From tim.one@home.com Mon Jan 1 00:13:12 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 31 Dec 2000 19:13:12 -0500 Subject: [Python-Dev] Re: Most everything is busted In-Reply-To: <14926.34447.60988.553140@anthem.concentric.net> Message-ID: [Barry A. Warsaw] > There's a stupid, stupid bug in Mailman 2.0, which I've just fixed > and (hopefully) unjammed things on the Mailman end[1]. We're still > probably subject to the Postfix delays unfortunately; I think those > are DNS related, and I've gotten a few other reports of DNS oddities, > which I've forwarded off to the DC sysadmins. I don't think that > particular problem will be fixed until after the New Year. > > relax-and-enjoy-the-quiet-ly y'rs, I would have, except you appear to have ruined it: hundreds of msgs disgorged overnight and into the afternoon. And echoes of email to c.l.py now routinely come back in minutes instead of days. Overall, ya, I liked it better when it was broken -- jerk . typical-user-ly y'rs - tim From tim.one@home.com Mon Jan 1 01:31:18 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 31 Dec 2000 20:31:18 -0500 Subject: [Python-Dev] Copyrights and licensing (was ... something irrelevant) In-Reply-To: <200012291652.RAA20251@pandora.informatik.hu-berlin.de> Message-ID: [Martin von Loewis] > I'd like to get an "official" clarification on this question. Is it > the case that patches containing copyright notices are only accepted > if they are accompanied with license information? It's nigh unto impossible to get Guido to pay attention to these kinds of issues until after it's too late -- guess who's still trying to get an FSF approved license for Python 1.6 . What I intend to push for is that nothing be accepted except under the understanding that copyright is assigned to the Python Software Foundation; but, since that doesn't exist yet, we're in limbo. 
> I agree that the changes are minor, I also believe that I hold the > copyright to the changes whether I attach a notice or not (at least > according to our local copyright law). Under U.S. law too. The difference is that, without an explicit copyright notice, it's a lot easier to get lawyers to ignore that reality <0.3 wink>. When the PSF does come into being, the lawyers will doubtless make us hassle everyone with an explicit copyright notice into signing reams of paperwork. It's a drain on time and money for all concerned, IMO, with no real payback. > What concerns me that without such a notice, gencodec.py looks as if > CNRI holds the copyright to it. I'm not willing to assign the > copyright of my changes to CNRI, and I'd like to avoid the impression > of doing so. Understood, and with sympathy. Since the status of JPython/Jython is still muddy, I urged Finn Bock to put his own copyright notice on his Jython work for exactly the same reason (i.e., to prevent CNRI claiming it later). Seems to me, though, that it may simplify life down the road if, whenever an author felt a similar need to assert copyright explicitly, they list Guido as the copyright holder. He's not going to screw Python! And it's inevitable that all Python copyrights will eventually be owned by him and/or the PSF anyway. But, for God's sake, whatever you do, *please* (anyone) don't make us look at a unique license! We're not lawyers, but we've been paying lawyers out of our own pockets to do this crap, and it's expensive and time-consuming. If you can't trust Guido to do a Right Thing with your code, Python is better off without it over the long haul. > What is even more concerning is that CNRI also holds the copyright to > the generated files, even though they are derived from information > made available by the Unicode consortium! It's no concern to me -- but then I'm not paranoid . 
cnri-and-the-uc-can-fight-it-out-if-it-comes-to-that-ly y'rs - tim From moshez@zadka.site.co.il Mon Jan 1 10:01:02 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 1 Jan 2001 12:01:02 +0200 (IST) Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20001231105812.A12168@newcnri.cnri.reston.va.us> References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> Message-ID: <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> On Sun, 31 Dec 2000, Andrew Kuchling wrote: > It also leads to one section of the FAQ (#3, I think) having something > like 60 questions jumbled together. IMHO the FAQ should be a text > file, perhaps in the PEP format so it can be converted to HTML, and it > should have an editor who'll arrange it into smaller sections. Any > volunteers? (Must ... resist ... urge to volunteer myself... help > me, Spock...) Well, Andrew, I know if I leave you any more time, you won't be able to resist the urge. OK, I'll volunteer. Can't do anything right now, but expect to see an updated version posted on my site soon. If people will think it's a good idea, I'll move it to Misc/. Fred, if the some-xml-format-to-HTML you're working on is in any sort of readiness, I'll use that to format the FAQ. Having used Perl in the last couple of weeks, I learned to appreciate the fact that the FAQ is a standard part of the documentation. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From loewis@informatik.hu-berlin.de Mon Jan 1 11:43:34 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 1 Jan 2001 12:43:34 +0100 (MET) Subject: [Python-Dev] Re: Copyrights and licensing (was ... 
something irrelevant) In-Reply-To: References: Message-ID: <200101011143.MAA11550@pandora.informatik.hu-berlin.de> > Seems to me, though, that it may simplify life down the road if, whenever an > author felt a similar need to assert copyright explicitly, they list Guido > as the copyright holder. He's not going to screw Python! That's a good solution, which I'll implement in a revised patch. Thanks for the advice, and Happy New Year, Martin From mal@lemburg.com Mon Jan 1 17:56:20 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 01 Jan 2001 18:56:20 +0100 Subject: [Python-Dev] Re: Copyright statements ([Patch #103002] Fix for #116285: Properly raise UnicodeErrors) References: <200012290957.KAA17936@pandora.informatik.hu-berlin.de> <3A4C757D.F64E9CEF@lemburg.com> Message-ID: <3A50C4C4.76A1C5B6@lemburg.com> Martin von Loewis wrote: > > > My only problem with it is your copyright notice. AFAIK, patches to > > the Python core cannot contain copyright notices without proper > > license information. OTOH, I don't think that these minor changes > > really warrant adding a complete license paragraph. > > I'd like to get an "official" clarification on this question. Is it > the case that patches containing copyright notices are only accepted > if they are accompanied with license information? > > I agree that the changes are minor, I also believe that I hold the > copyright to the changes whether I attach a notice or not (at least > according to our local copyright law). True. > What concerns me that without such a notice, gencodec.py looks as if > CNRI holds the copyright to it. I'm not willing to assign the > copyright of my changes to CNRI, and I'd like to avoid the impression > of doing so. > > What is even more concerning is that CNRI also holds the copyright to > the generated files, even though they are derived from information > made available by the Unicode consortium! 
The copyright for the files and changes needed for the Unicode support was indeed transferred to CNRI earlier this year. This was part of the contract I had with CNRI. I don't know why the copyright notice wasn't subsequently removed from the files after final checkin of the changes, though, because, as I remember, the copyright line was only added as "search&replace" token to the files in question in the sign over period. The codec files were part of the Unicode support patch, even though they were created by the gencodec.py tool I wrote to create them from the Unicode mapping files. That's why they also carry the copyright token. Note that with strict reading of the CNRI license, there's no problem with removing the notice from the files in question: """ ...provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) 1995-2000 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6 alone or in any derivative version prepared by Licensee... """ The copyright line in the Unicode files is "(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.", so this does not match the definition they gave in their license text. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@digicool.com Mon Jan 1 18:58:36 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 13:58:36 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: Your message of "Fri, 29 Dec 2000 21:59:16 +0100." 
             <20001229215915.L1281@xs4all.nl>
References: <20001229215915.L1281@xs4all.nl>
Message-ID: <200101011858.NAA09263@cj20424-a.reston1.va.home.com>

Thomas just checked this in, using Tim's words:

> *** ref7.tex 2000/07/16 19:05:38 1.20
> --- ref7.tex 2000/12/31 22:52:59 1.21
> ***************
> *** 243,249 ****
>   \ttindex{exc_value}\ttindex{exc_traceback}}
>
> ! The optional \keyword{else} clause is executed when no exception occurs
> ! in the \keyword{try} clause. Exceptions in the \keyword{else} clause are
> ! not handled by the preceding \keyword{except} clauses.
>   \kwindex{else}
>
> --- 243,251 ----
>   \ttindex{exc_value}\ttindex{exc_traceback}}
>
> ! The optional \keyword{else} clause is executed when the \keyword{try} clause
> ! terminates by any means other than an exception or executing a
> ! \keyword{return}, \keyword{continue} or \keyword{break} statement.
> ! Exceptions in the \keyword{else} clause are not handled by the preceding
> ! \keyword{except} clauses.
>   \kwindex{else}

How is this different from "when control flow reaches the end of the
try clause", which is what I really had in mind? Using the current
wording, this paragraph would have to be changed each time a new
control-flow keyword is added. Based upon the historical record
that's not a grave concern ;-), but I think the new wording relies too
much on accidentals such as the fact that these are the only control
flow altering events.

It may be that control flow is not rigidly defined -- but as it is
what was really intended, maybe the fix should be to explain the
right concept rather than the current ad-hoc solution. This also
avoids concerns of readers who are trying to read too much into the
words and might become worried that there are other ways of altering
the control flow that *would* cause the else clause to be executed;
and guides implementors of other Python-like languages (like vyper)
that might have more control-flow altering statements or events.
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Mon Jan 1 19:00:38 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 1 Jan 2001 20:00:38 +0100 Subject: [Python-Dev] PSA (Was: FAQ Horribly Out Of Date) Message-ID: <200101011900.UAA01672@loewis.home.cs.tu-berlin.de> > It appears that CNRI can only think about one thing at a time <0.5 > wink>. For the last 6 months, that thing has been the license. If > they ever resolve the GPL compatibility issue, maybe they can be > persuaded to think about the PSA. In the meantime, I'd suggest you > not renew . I think we need to find a better answer than that, and soon. While everybody reading this list probably knows not to renew, the PSA is the first thing that you see when selecting "Python Community" on python.org. The first paragraph reads # The continued, free existence of Python is promoted by the # contributed efforts of many people. The Python Software Activity # (PSA) supports those efforts by helping to coordinate them. The PSA # operates web, ftp, and email services, organizes conferences, and # engages in other activities that benefit the Python user # community. In order to continue, the PSA needs the membership of # people who value Python. If you look at the current members list (http://www.python.org/psa/Members.html), it appears that many long-time members indeed have not renewed. This page was last updated Nov 14 - so it appears that CNRI is still processing applications when they come. It may well be that many of the newer members ask themselves by now what happened to their money; it might not be easy to get an answer to that question. However, there is clearly somebody to blame here: The Python Community. 
So I'd like to request that somebody with write permissions to these
pages changes the text, to something along the lines of replacing the
first paragraph with

# The Python community organizes itself in different ways; people
# interested in discussing development of and with Python usually
# participate in mailing lists.
#
# Organizations that wish to influence further directions of the
# Python language may join the Python Consortium.
#
# The Corporation for National Research Initiatives hosts the Python
# Software Activity, which is described below. The PSA used to provide
# funding for the Python development; that is no longer the case.

If there is a factual error in this text, please let me know.

Regards,
Martin

From tim.one@home.com Mon Jan 1 19:20:53 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 1 Jan 2001 14:20:53 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To:
Message-ID:

[gvanrossum, in an SF patch comment]
> Bah. I don't like this one bit. More complexity for a little
> bit of extra speed.
> I'm keeping this open but expect to be closing it soon unless I
> hear a really good argument why more speed is really needed in
> this area. Down with code bloat and creeping featurism!

Without judging "the solution" here, "the problem" is that everyone's first
attempt to use line-at-a-time file input in Perl:

    while (} {
        ... $_ ...;
    }

runs 2-5x faster than everyone's first attempt in Python:

    while 1:
        line = f.readline()
        if not line:
            break
        ... line ...

It would be beneficial to address that *somehow*, cuz 2-5x isn't just "a
little bit"; and by the time you walk a newbie thru

    while 1:
        lines = f.readlines(hintsize)
        if not lines:
            break
        for line in lines:
            ... line ...

they feel like maybe Perl isn't so obscure after all .

Does someone have an elegant way to address this? I believe Jeff's shot at
elegance was the other part of the patch, using (his new) xreadlines under
the covers to speed the fileinput module.

reading-text-files-is-very-common-ly y'rs - tim

From guido@digicool.com Mon Jan 1 19:25:07 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 01 Jan 2001 14:25:07 -0500
Subject: [Python-Dev] PSA (Was: FAQ Horribly Out Of Date)
In-Reply-To: Your message of "Mon, 01 Jan 2001 20:00:38 +0100."
<200101011900.UAA01672@loewis.home.cs.tu-berlin.de> References: <200101011900.UAA01672@loewis.home.cs.tu-berlin.de> Message-ID: <200101011925.OAA09669@cj20424-a.reston1.va.home.com> > > It appears that CNRI can only think about one thing at a time <0.5 > > wink>. For the last 6 months, that thing has been the license. If > > they ever resolve the GPL compatibility issue, maybe they can be > > persuaded to think about the PSA. In the meantime, I'd suggest you > > not renew . > > I think we need to find a better answer than that, and soon. While > everybody reading this list probably knows not to renew, the PSA is > the first thing that you see when selecting "Python Community" on > python.org. The first paragraph reads > > # The continued, free existence of Python is promoted by the > # contributed efforts of many people. The Python Software Activity > # (PSA) supports those efforts by helping to coordinate them. The PSA > # operates web, ftp, and email services, organizes conferences, and > # engages in other activities that benefit the Python user > # community. In order to continue, the PSA needs the membership of > # people who value Python. > > If you look at the current members list > (http://www.python.org/psa/Members.html), it appears that many > long-time members indeed have not renewed. This page was last updated > Nov 14 - so it appears that CNRI is still processing applications when > they come. It may well be that many of the newer members ask > themselves by now what happened to their money; it might not be easy > to get an answer to that question. However, there is clearly somebody > to blame here: The Python Community. I don't know how many memberships CNRI has received, but it can't be many, since we sent out no reminders. I'll see if I can get an answer. 
> So I'd like to request that somebody with write permissions to these
> pages changes the text, to something along the lines of replacing the
> first paragraph with
>
> # The Python community organizes itself in different ways; people
> # interested in discussing development of and with Python usually
> # participate in mailing lists.
> #
> # Organizations that wish to influence further directions of the
> # Python language may join the Python Consortium.
> #
> # The Corporation for National Research Initiatives hosts the Python
> # Software Activity, which is described below. The PSA used to provide
> # funding for the Python development; that is no longer the case.
>
> If there is a factual error in this text, please let me
> know.

I've done something slightly different -- see
http://www.python.org/psa/. I've kept only your first paragraph, and
inserted a boldface note before that about the obsolescence (or
deprecation :-) of the PSA membership. I've removed the references to
the consortium, since that's also about to collapse under its own
inactivity; instead, the PSF will be formed, independent from CNRI, to
hold the IP rights (insofar they can be assigned to the PSF) and for
not much else.

I'll see if I can get some more news about the creation of the PSF
(which is supposed to be an initiative of ActiveState and Digital
Creations).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Mon Jan 1 19:35:24 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 01 Jan 2001 14:35:24 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Mon, 01 Jan 2001 14:20:53 EST."
References:
Message-ID: <200101011935.OAA09728@cj20424-a.reston1.va.home.com>

> [gvanrossum, in an SF patch comment]
> > Bah. I don't like this one bit. More complexity for a little
> > bit of extra speed.
> > I'm keeping this open but expect to be closing it soon unless I
> > hear a really good argument why more speed is really needed in
> > this area. Down with code bloat and creeping featurism!
>
> Without judging "the solution" here, "the problem" is that everyone's first
> attempt to use line-at-a-time file input in Perl:
>
>     while (} {
>         ... $_ ...;
>     }
>
> runs 2-5x faster than everyone's first attempt in Python:
>
>     while 1:
>         line = f.readline()
>         if not line:
>             break
>         ... line ...
But is everyone's first thought to time the speed of Python vs. Perl?
Why does it hurt so much that this is a bit slow?

> It would be beneficial to address that *somehow*, cuz 2-5x isn't just "a
> little bit"; and by the time you walk a newbie thru
>
>     while 1:
>         lines = f.readlines(hintsize)
>         if not lines:
>             break
>         for line in lines:
>             ... line ...
>
> they feel like maybe Perl isn't so obscure after all .
>
> Does someone have an elegant way to address this? I believe Jeff's shot at
> elegance was the other part of the patch, using (his new) xreadlines under
> the covers to speed the fileinput module.

But of course suggesting fileinput is also not a great solution --
it's relatively obscure (since it's not taught by most tutorials,
certainly not by the standard tutorial).

> reading-text-files-is-very-common-ly y'rs - tim

So is worrying about performance without a good reason...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Mon Jan 1 19:49:24 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 01 Jan 2001 14:49:24 -0500
Subject: [Python-Dev] FAQ Horribly Out Of Date
In-Reply-To: Your message of "Mon, 01 Jan 2001 12:01:02 +0200."
             <20010101100102.2360CA84F@darjeeling.zadka.site.co.il>
References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il>
            <20010101100102.2360CA84F@darjeeling.zadka.site.co.il>
Message-ID: <200101011949.OAA09804@cj20424-a.reston1.va.home.com>

[Moshe]
> Well, Andrew, I know if I leave you any more time, you won't be able
> to resist the urge. OK, I'll volunteer. Can't do anything right now,
> but expect to see an updated version posted on my site soon. If
> people will think it's a good idea, I'll move it to Misc/.
> Fred, if the some-xml-format-to-HTML you're working on is in any
> sort of readiness, I'll use that to format the FAQ.
Moshe, if your solution is to turn the FAQ into a document with a single editor again, I think you're not doing the community a favor. Granted, we could add some more sections (easy enough for me if someone tells me the new section headings and which existing questions go where) and there is a lot of obsolete information. But I would be very hesitant to drop the notion of maintaining the FAQ as a group collaboration project. There's nothing wrong with the FAQ wizard except that the password (Spam) should be made publicly known... I've also noticed that Bjorn Pettersen has made a whole slew of useful updates to various sections, mostly updates about new 2.0 features or syntax. > Having used Perl > in the last couple of weeks, I learned to appreciate the fact that > the FAQ is a standard part of the documentation. Does that mean more than that it should be linked to from http://www.python.org/doc/ ? It's already there in the side bar; does it need a more prominent position? I used to include the FAQ in Misc/ (Ping's Misc/faq2html.py script is a last remnant of that), but gave up after realizing that the on-line FAQ is much more useful than a single text file. In my eyes, the best thing you (and everyone else) could do, if you find the time, would be to use the FAQ wizard to fix or delete out-of-date entries. To delete an entry, change its subject to "Deleted" and remove its body; I'll figure out a way to delete them from the index. Because FAQ entries can refer to each other (and are referred to from elsewhere) by number, it's not safe to simply renumber entries. 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com Mon Jan 1 20:27:37 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 1 Jan 2001 15:27:37 -0500
Subject: [Python-Dev] Fwd: try...else
In-Reply-To: <200101011858.NAA09263@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> Thomas just checked this in, using Tim's words:

[
    The optional \keyword{else} clause is executed when no exception occurs
    in the \keyword{try} clause. Exceptions in the \keyword{else} clause are
    not handled by the preceding \keyword{except} clauses.
vs
    The optional \keyword{else} clause is executed when the \keyword{try} clause
    terminates by any means other than an exception or executing a
    \keyword{return}, \keyword{continue} or \keyword{break} statement.
    Exceptions in the \keyword{else} clause are not handled by the preceding
    \keyword{except} clauses.
]

> How is this different from "when control flow reaches the end of the
> try clause", which is what I really had in mind?

Only in that it doesn't appeal to a new undefined phrase, and is (I think)
unambiguous in the eyes of a non-specialist reader (like Robin's friend).
Note that "reaching the end of the try clause" is at best ambiguous, because
you *really* have in mind "falling off the end" of the try clause. It
wouldn't be unreasonable to say that in:

    try:
        x = 1
        y = 2
        return 1

"x=1" is the beginning of the try clause and "return 1" is the end. So if
the reader doesn't already know what you mean, saying "the end" doesn't nail
it (or, if like me, the reader does already know what you mean, it doesn't
matter one whit what it says ).

> Using the current wording, this paragraph would have to be
> changed each time a new control-flow keyword is added. Based
> upon the historical record that's not a grave concern ;-),

It was sure no concern of mine ...

> but I think the new wording relies too much on accidentals such
> as the fact that these are the only control flow altering events.
>
> It may be that control flow is not rigidly defined -- but as it is
> what was really intended, maybe the fix should be to explain the
> right concept rather than the current ad-hoc solution.
> ...

OK, except I don't know how to do that succinctly. For example, if Java had
an "else" clause, the Java spec would say:

    If present, the "else block" is executed if and only if execution
    of the "try block" completes normally, and then there is a choice:

        If the "else block" completes normally, then the "try"
        statement completes normally.

        If the "else block" completes abruptly for reason S, then
        the "try" statement completes abruptly for reason S.

That is, they deal with control-flow issues via appeal to "complete
normally" and "complete abruptly" (which latter comes in several flavors
("reasons"), such as returns and exceptions), and there are pages and pages
and pages of stuff throughout the spec inductively defining when these
conditions obtain. It's clear, precise and readable; but it's also wordy,
and we don't have anything similar to build on.

As a compromise, given that we're not going to take the time to be precise
(well, I'm sure not ...):

    The optional \keyword{else} clause is executed if and when control
    flows off the end of the \keyword{try} clause.\footnote{In Python 2.0,
    control "flows off the end" except in case of exception, or executing a
    \keyword{return}, \keyword{continue} or \keyword{break} statement.}
    Exceptions in the \keyword{else} clause are not handled by the preceding
    \keyword{except} clauses.

Now it's all of imprecise, almost precise, specific to Python 2.0, and
robust against any future changes .
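
The wording debated in this thread can be checked against an interpreter. The sketch below is an illustration only and is not part of the original messages: the names `probe`, `mode` and `else_raises` are invented here, and the code is written for a current Python 3 interpreter rather than the Python 2.0 under discussion. It records which clauses run for each way of leaving the try clause, and shows that an exception raised in the else clause is not caught by the preceding except clause.

```python
def probe(mode):
    """Leave a try clause in the way named by `mode` (a hypothetical
    parameter for this sketch) and return the list of clauses that ran."""
    ran = []
    for _ in range(1):              # a loop, so break/continue are legal
        try:
            if mode == "raise":
                raise ValueError("boom")
            if mode == "break":
                break
            if mode == "continue":
                continue
            if mode == "return":
                return ran + ["returned"]
            ran.append("fell off the end")
        except ValueError:
            ran.append("except")
        else:
            # Reached only when control falls off the end of the try clause.
            ran.append("else")
    return ran


def else_raises():
    """Raise inside the else clause; the except clause above it is skipped."""
    try:
        pass
    except ValueError:
        return "handled"
    else:
        raise ValueError("raised in else")


for mode in ("fall", "raise", "break", "continue", "return"):
    print(mode, probe(mode))

try:
    else_raises()
except ValueError as exc:
    print("escaped the try statement:", exc)
```

Only the "fall" case reports the else clause, which matches both "flows off the end" and the enumerated list of exceptions to it.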
From akuchlin@mems-exchange.org Mon Jan 1 20:35:27 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 1 Jan 2001 15:35:27 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <200101011949.OAA09804@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 01, 2001 at 02:49:24PM -0500 References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> <200101011949.OAA09804@cj20424-a.reston1.va.home.com> Message-ID: <20010101153527.A14116@newcnri.cnri.reston.va.us> On Mon, Jan 01, 2001 at 02:49:24PM -0500, Guido van Rossum wrote: >But I would be very hesitant to drop the notion of maintaining the FAQ >as a group collaboration project. There's nothing wrong with the FAQ >wizard except that the password (Spam) should be made publicly known... Why multiply the number of mechanisms required to maintain things? We already use CVS for other documentation; why not use it for the FAQ as well? --amk From tim.one@home.com Mon Jan 1 21:00:36 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 1 Jan 2001 16:00:36 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20010101153527.A14116@newcnri.cnri.reston.va.us> Message-ID: [Andrew Kuchling] > Why multiply the number of mechanisms required to maintain things? > We already use CVS for other documentation; why not use it for the > FAQ as well? The search facilities of the FAQ wizard are invaluable, and so is the ability for "just users" to update the info from within their browsers. There are two problems with the FAQ in practice: 1. It doesn't get updated enough. We can't fix that by making it harder to update! 2. It's *only* available via the web interface. We should ship a text or HTML snapshot with releases; perhaps even do the usual Usenet periodic FAQ-posting thing. 
From tim.one@home.com Mon Jan 1 22:34:03 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 1 Jan 2001 17:34:03 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <200101011935.OAA09728@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> But is everyone's first thought to time the speed of Python vs. Perl?

It's few peoples' first thought. It's impossible for bilingual programmers
(or dabblers, or evaluators) not to notice *soon*, though, because:

> Why does it hurt so much that this is a bit slow?

Factors of 2 to 5 aren't "a bit" -- they're obvious when they happen, but
the *cause* is not. To judge from a decade of c.l.py gripes, most people
write it off to "huh -- guess Python is just slow"; the rest eventually
figure out that their text input is the bottleneck (Tom Christiansen never
got this far <0.5 wink>), but then don't know what to do about it.

At this point I'm going to insert two anonymized pvt emails from last year:

-----Original Message #1 -----
From: TTT
Sent: Monday, March 13, 2000 2:29 AM
To: GGG
Subject: RE: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison

GGG, note especially figure 4 in Lutz Prechelt's report:

> http://wwwipd.ira.uka.de/~prechelt/Biblio/#jccpprtTR

The submitted Python programs had by far the largest variability in how long
it took to load the dictionary. My input loop is probably typical of the
"fast" Python programs, which indeed beat most (but not all) of the fastest
Perl ones here:

    class Dictionary:
        ...
        def fill_from_file(self, f, BUFFERSIZE=500000):
            """f, BUFFERSIZE=500000 -> fill dictionary from file f.

            f must be an open file, or other object with a readlines()
            method. It must contain one word per line.

            Optional arg BUFFERSIZE is used to chunk up input for
            efficiency, and is roughly the # of bytes read at a time.
            """

            addword = self.addword
            while 1:
                lines = f.readlines(BUFFERSIZE)
                if not lines:
                    break
                for line in lines:
                    addword(line[:-1])  # chop trailing newline

Comparable Perl may have been the one-liner:

    grep(&addword, chomp(<>));

which may account for why Perl's memory use was uniformly higher than
Python's.

Whatever, you really need to be a Python expert to dream up "the fast way"
to do Python input! Hire me, and I'll fix that .

nothing-like-blackmail-before-going-to-bed-ly y'rs - TTT

-----Original Message #2 -----
From: GGG
Sent: Monday, March 13, 2000 7:08 AM
To: TTT
Subject: Re: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison

Agreed. readlines(BUFFERSIZE) is a crock. In fact, ``for i in
f.readlines()'' should use lazy evaluation -- but that will have to
wait for Py3K unless we add hints so that readlines knows it is being
called from a for loop.

--GGG

-----Back to 2001 -----

I took TTT's advice and read Lutz's report . I agree with GGG that
hiding this in .readlines() would be maximally elegant. xreadlines supplies
most of the lazy machinery GGG favored. I don't know how hard it would be
to supply the rest of it, but it's such a frequent bitching point that I
would prefer pointing people to an explicit .xreadlines() hack than either
(a) try to convince them that they "shouldn't" care about the speed as much
as they claim to; or, (b) try to explain the double-loop buffering method.
I'd personally rather use an explicit .xreadlines() hack than code the
double-loop buffering too, and don't see an obvious way to do better than
that right now.

>> reading-text-files-is-very-common-ly y'rs - tim

> So is worrying about performance without a good reason...

Indeed it is. I'm persuaded that many people making this specific complaint
have a legitimate need for more speed, though, and that many don't persist
with Python long enough to find out how to address this complaint (because
the double-loop method is too obscure for a newbie to dream up).
That makes this hack score extraordinarily high on my benefit/harm ratio scale (in P3K xreadlines can be deprecated in favor of readlines <0.9 wink>). heck-it-doesn't-even-require-a-new-keyword-ly y'rs - tim From thomas@xs4all.net Mon Jan 1 22:46:45 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 1 Jan 2001 23:46:45 +0100 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101011935.OAA09728@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 01, 2001 at 02:35:24PM -0500 References: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <20010101234645.B5435@xs4all.nl> On Mon, Jan 01, 2001 at 02:35:24PM -0500, Guido van Rossum wrote: [ Python lacks a One True Way of doing Perl's 'while(<>)' ] > > Does someone have an elegant way to address this? I believe Jeff's shot at > > elegance was the other part of the patch, using (his new) xreadlines under > > the covers to speed the fileinput module. > But of course suggesting fileinput is also not a great solution -- > it's relatively obscure (since it's not taught by most tutorials, > certainly not by the standard tutorial). Is fileinput really obscure ? I personally quite like it. It is enough like the perl idiom to be very useful for people thinking that way, and it doesn't require special syntax or considerations. If tutorialization is the only problem, I'd be happy to fix that, provided Fred or Moshe can TeX my fix up. As for speed (which stays a secondary or tertiary consideration at best) do we really need the xreadlines method to accomplish that ? Couldn't fileinput get almost the same performance using readlines() with a sizehint ? I personally don't like the xreadlines because it adds yet another function to do the same, with a slight, subtle and to the untrained programmer unclear distinction from the rest. 
(I don't really like the range/xrange difference either -- I think Python code shouldn't care whether they're dealing with a real list or a generator, and as much as possible should just be generators. And in the case of simple (x)range()es, I have yet to see a case where a 'real' list had significantly better performance than a generator.)

If we *do* start adding methods to (the public API of) filemethods, I think we should consider more than just xreadlines() (I seem to recall other proposals, but my memory is hazy at the moment -- I haven't slept since last millennium) add whatever is necessary, and provide a UserFile in the std. lib that 'emulates' all fileobject functionality using a single readline() function.

Now, if you'll excuse me, I have a date with a soft bed I haven't seen in about 40 hours, a pair of aspirin my head is killing for and probably a hangover that I don't want to think about, right now ;)

Gelukkig-Nieuwjaar-iedereen-ly y'rs
-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From jepler@inetnebr.com Tue Jan 2 01:49:35 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Mon, 1 Jan 2001 19:49:35 -0600
Subject: [Python-Dev] Re: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: ; from Tim Peters on Mon, Jan 01, 2001 at 02:20:53PM -0500
Message-ID: <20010101194935.19672@falcon.inetnebr.com>

I'd like to speak up about this patch I've submitted on sourceforge.

I consider the xreadlines function/object to be the core of my proposal. The addition of a method to file objects, as well as the modifications to fileinput, are secondary in my opinion.

The desire is to iterate over file contents in a way that satisfies the following criteria:

* Uses the "for" syntax, because this clearly captures the underlying operation. (files can be viewed as sequences of lines when appropriate)
* Consumes small amounts of memory even when the file contents are large.
* Has the lowest overhead that can reasonably be attained.

I think that it is agreed that the ability to use the "for" syntax is important, since it was the impetus for the xrange function/object. After all, there's a "while" statement which will give the same effect, without introducing xrange.

The point under debate, as I see it, is the utility of speeding up the "benchmarks" of folks who compare the speed of Python and another language doing a very simple loop over the lines in a file. Since this advantage disappears once real work is being done on the file, maybe an XReadLines class, written in Python, would be more suitable. In fact, I've written such a class since I didn't know about fileinput and in any case I find it less useful to me because of all the weird stuff it does. (parsing argv, opening files by name, etc)

One shortcoming of my current patch, aside from the ones already named in another person's response to it, is that it fails when working on a file-like class which implements .readline but not .readlines.

In any case, I wrote xreadlines to learn how to write C extensions to Python, and submitted it at the suggestion of a fellow Python user in a private discussion. I'd like to extinguish one of these eternal comp.lang.python threads with it too, but maybe it's not to be.

Happy new year, all.
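
For readers following the thread, the three criteria listed in this message can be met in pure Python roughly as follows. This is only a sketch in modern Python 3 (generators did not exist in Python 2.0), not Jeff's C patch; the name `lazy_lines` and the 64 KB `sizehint` default are assumptions invented for illustration. It also falls back to readline() for file-like objects that lack readlines(), which is the shortcoming mentioned above.

```python
import io


def lazy_lines(f, sizehint=64 * 1024):
    """Yield the lines of f one at a time while reading them in chunks.

    Hypothetical helper for illustration only; it is not the xreadlines patch.
    """
    readlines = getattr(f, "readlines", None)
    if readlines is None:
        # Fallback: the object only implements readline().
        while True:
            line = f.readline()
            if not line:
                return
            yield line
    else:
        # The "double-loop" idiom from the thread, hidden behind a for loop.
        while True:
            chunk = readlines(sizehint)
            if not chunk:
                return
            for line in chunk:
                yield line


class OnlyReadline:
    """Minimal file-like object that offers readline() but not readlines()."""

    def __init__(self, text):
        self._inner = io.StringIO(text)

    def readline(self):
        return self._inner.readline()


if __name__ == "__main__":
    for line in lazy_lines(io.StringIO("spam\neggs\n")):
        print(line, end="")
```

Only about one chunk of lines is held in memory at a time, so memory use stays bounded by `sizehint` plus the length of one line, at the cost of one generator resumption per line.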
Jeff From gstein@lyra.org Tue Jan 2 03:34:31 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 1 Jan 2001 19:34:31 -0800 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20010101153527.A14116@newcnri.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Mon, Jan 01, 2001 at 03:35:27PM -0500 References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> <200101011949.OAA09804@cj20424-a.reston1.va.home.com> <20010101153527.A14116@newcnri.cnri.reston.va.us> Message-ID: <20010101193431.M10567@lyra.org> On Mon, Jan 01, 2001 at 03:35:27PM -0500, Andrew Kuchling wrote: > On Mon, Jan 01, 2001 at 02:49:24PM -0500, Guido van Rossum wrote: > >But I would be very hesitant to drop the notion of maintaining the FAQ > >as a group collaboration project. There's nothing wrong with the FAQ > >wizard except that the password (Spam) should be made publicly known... > > Why multiply the number of mechanisms required to maintain things? We > already use CVS for other documentation; why not use it for the FAQ as > well? That would limit the updaters to just those with CVS access. As Guido just pointed out, Bjorn made a bunch of updates. And he didn't need CVS to do that... Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one@home.com Tue Jan 2 03:44:05 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 1 Jan 2001 22:44:05 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <20010101194935.19672@falcon.inetnebr.com> Message-ID: [Jeff Epler] > I'd like to speak up about this patch I've submitted on sourceforge. I'm not sure that's allowed . > ... > The point under debate, as I see it, is the utility of speeding > up the "benchmarks" of folks who compare the speed of Python and > another language doing a very simple loop over the lines in a file. If that were true, I couldn't care less. 
> Since this advantage disappears once real work is being done on > the file, ... I agree that's true, but submit it's rarely relevant. *Most* file-crunching apps are dominated by I/O time, which is why this is so visible to so many; e.g., chewing over massive log files looking for patterns appears to be the growth industry of the 21st century . Even in Lutz's report (see reference from earlier mail), where the task to be solved was far from trivial, input time exceeded processing time across all languages (with some oddball exceptions, when the coder neglected to use a hash table to store info). That's thoroughly typical of real file-crunching applications, in my experience: Perl has a killer speed advantage in the single most time-consuming portion of the app, and due to one implementation trick. Take that advantage away, and Python holds its own in this domain. Coincidentally, I got pvt email from a newbie today, reading in part; > If Perl wasn't so gosh darn good and fast at text scrubbing, it > wouldn't really be a consideration, it's syntax is so clunky and > hard to learn by comparison to both Python and Ruby. This is just depressing, because I can predict every step of this dance. > ... > Happy new year, all. And to you! Just make sure it's a fast new year . From moshez@zadka.site.co.il Tue Jan 2 15:24:40 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Tue, 2 Jan 2001 17:24:40 +0200 (IST) Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <20010101234645.B5435@xs4all.nl> References: <20010101234645.B5435@xs4all.nl>, <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <20010102152440.9C26DA84F@darjeeling.zadka.site.co.il> On Mon, 1 Jan 2001, Thomas Wouters wrote: > As for speed (which stays a secondary or tertiary consideration at best) do > we really need the xreadlines method to accomplish that ? 
Couldn't fileinput
> get almost the same performance using readlines() with a sizehint ? I

me too

Adding xreadlines() to the interface would break half a dozen file-objects all around the world (just the standard library has StringIO, cStringIO, GzipFile and probably some others I can't remember)

Adding .readlines(sizehint) to fileinput, and adding a function to create something similar to fileinput from a file object (as opposed to a file name) would help everyone, and doesn't seem too hard. Is there a gotcha I'm just not seeing?

-- 
Moshe Zadka
This is a signature anti-virus. Please stop the spread of signature viruses!

From tim.one@home.com Tue Jan 2 08:06:32 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 2 Jan 2001 03:06:32 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <20010101234645.B5435@xs4all.nl>
Message-ID:

[Thomas Wouters]
> ...
> As for speed (which stays a secondary or tertiary consideration
> at best) do we really need the xreadlines method to accomplish
> that ? Couldn't fileinput get almost the same performance using
> readlines() with a sizehint ?

There was a long email discussion among Jeff, Paul Prescod, Neel Krishnaswami, and Alex Martelli about this. I started getting copied on it somewhere midstream, but didn't have time to follow it then (like I do now ). About two weeks ago Neel summarized all the approaches then under discussion:

"""
[Neel Krishnaswami]
...
Quick performance summary of the current solutions:

Slowest: for line in fileinput.input('foo'):     # Time 100
       : while 1: line = file.readline()         # Time 75
       : for line in LinesOf(open('foo')):       # Time 25
Fastest: for line in file.readlines():           # Time 10
         while 1: lines = file.readlines(hint)   # Time 10
         for line in xreadlines(file):           # Time 10

The difference in speed between the slowest and fastest is about a factor of 10.
LinesOf is Alex's Python wrapper class that takes a file and uses readlines() with a size-hint to present a sequence interface. It's around half as fast as the fastest idioms, and 3-4 times faster than while 1:. Jeff's xreadlines is essentially the same thing in C, and is indistinguishable in performance from the other fast idioms.
...
"""

On his box, line-at-a-time is >7x slower than the fastest Python methods, which latter are usually close (depending on the platform) to Perl line-at-a-time speeds. A factor of 7 is too large for most working programmers to ignore in the interest of somebody else's notion of theoretical purity . Seriously, speed is not a secondary consideration to me when the gap is this gross, and in an area so visible and common.

Alex's LinesOf appears a good predictor for how adding fileinput.readlines(hint) would perform, since it appears to *be* that (except off on its own). Then it buys a factor of 3 over line-at-a-time on Neel's box but leaves a factor of 2.5 on the table. The cause of the latter appears mostly to be the overhead of getting a Python method call into the equation for each line returned.

Note that Jeff added .xreadlines() as a file object method at Neel's urging. The way he started this is shown on the last line: a function. If we threw out the fileinput and file method aspects, and just added a new module xreadlines with a function xreadlines, then what? I bet it would become as popular as the string module, and for good reason: it's a specific approach that works, to a specific and common problem.

> ...
> And in the case of simple (x)range()es, I have yet to see a case
> where a 'real' list had significantly better performance than
> a generator.)

It varies by platform, but I don't think I've heard of variations larger than 20% in either direction. 20% is nothing, though; in *this* case we're talking order of magnitude. That's go/nogo territory.

> ...
> Gelukkig-Nieuwjaar-iedereen-ly y'rs I understand people are passionate when reality clashes with the dream of a wart-free language, but that's no reason to swear at me . wishing-you-a-happy-new-year-like-a-civilized-man-ly y'rs - tim From paulp@ActiveState.com Tue Jan 2 10:00:46 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 02:00:46 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <3A51A6CE.3B15371D@ActiveState.com> Guido van Rossum wrote: > > ... > > But is everyone's first thought to time the speed of Python vs. Perl? > Why does it hurt so much that this is a bit slow? I want to interject here that I asked Jeff to submit this patch because I don't see it as "a little bit slow." When someone transliterates a program from one scripting language to another and gets a program that is two to five times slower that is a big deal! > But of course suggesting fileinput is also not a great solution -- > it's relatively obscure (since it's not taught by most tutorials, > certainly not by the standard tutorial). Fileinput's primary problem is that IIRC, it is even slower than doing readline yourself! > > reading-text-files-is-very-common-ly y'rs - tim > > So is worrying about performance without a good reason... I don't understand what constitutes good reason. We're talking about a relatively minor change that will speed up thousands of programs, answer a frequently asked question from comp.lang.python, obliterate an obscure idiom and reduce the number of requests for a Python syntax change (assignment expression) all in one bold sweep. It seemed to me as if it was a "pure win." 
Paul Prescod From paulp@ActiveState.com Tue Jan 2 10:06:24 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 02:06:24 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: <20010101234645.B5435@xs4all.nl>, <200101011935.OAA09728@cj20424-a.reston1.va.home.com> <20010102152440.9C26DA84F@darjeeling.zadka.site.co.il> Message-ID: <3A51A820.50365F02@ActiveState.com> Moshe Zadka wrote: > > ... > > Adding .readlines(sizehint) to fileinput, and adding a function > to create something similar to fileinput from a file object (as opposed > to a file name) would help everyone, and doesn't seem to hard. > Is there a gotcha I'm just not seeing? Fileinput is inherently slow because there are too many layers of Python code. I started to consider ways of inverting the logic so that it only called into Python when it needed to switch files but it would have been a much larger patch than Jeff's and I thought that a conservative approach was important. Fileinput should someday be optimized but we can easily get a low-hanging fruit improvement with Jeff's patch. Paul Prescod From guido@digicool.com Tue Jan 2 14:56:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 09:56:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 03:06:32 EST." References: Message-ID: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Tim's almost as good at convincing me as he is at channeling me! The timings he showed almost convinced me that fileinput is hopeless and xreadlines should be added. But then I wrote a little timer of my own... I am including the timer program below my signature. The test input was the current access_log of dinsdale.python.org, which has about 119 Mbytes and 1M lines (as counted by the test program). 
I measure about a factor of 2 between readlines with a sizehint (of 1 MB) and fileinput; a change to fileinput that uses readline with a sizehint and in-lines the common case in __getitem__ (as suggested by Moshe), didn't make a difference.

Output (the first time is realtime seconds, the second CPU seconds):

total 119808333 chars and 1009350 lines
count_chars_lines     7.944  7.890
readlines_sizehint    5.375  5.320
using_fileinput      15.861 15.740
while_readline        8.648  8.570

This was on a 600 MHz Pentium-III Linux box (RH 6.2).

Note that count_chars_lines and readlines_sizehint use the same algorithm -- the difference is that readlines_sizehint uses 'pass' as the inner loop body, while count_chars_lines adds two counters.

Given that very light per-line processing (counting lines and characters) already increases the time considerably, I'm not sure I buy the arguments that the I/O overhead is always considerable. The fact that my change to fileinput.py didn't make a difference suggests that its lack of speed is purely caused by the Python code.

Now what to do? I still don't like xreadlines very much, but I do see that it can save some time. But my test doesn't confirm Neel's times as posted by Tim:

> Slowest: for line in fileinput.input('foo'):     # Time 100
>        : while 1: line = file.readline()         # Time 75
>        : for line in LinesOf(open('foo')):       # Time 25
> Fastest: for line in file.readlines():           # Time 10
>          while 1: lines = file.readlines(hint)   # Time 10
>          for line in xreadlines(file):           # Time 10

I only see a factor of 3 between fastest and slowest, and readline is only about 60% slower than readlines_sizehint.
--Guido van Rossum (home page: http://www.python.org/~guido/)

import time, fileinput, sys

def timer(func, *args):
    t0 = time.time()
    c0 = time.clock()
    func(*args)
    t1 = time.time()
    c1 = time.clock()
    print "%-20s %6.3f %6.3f" % (func.__name__, t1-t0, c1-c0)

def count_chars_lines(fn, bs=1024*1024):
    nl = 0
    nc = 0
    f = open(fn, "r")
    while 1:
        buf = f.readlines(bs)
        if not buf: break
        for line in buf:
            nl += 1
            nc += len(line)
    f.close()
    print "total", nc, "chars and", nl, "lines"

def readlines_sizehint(fn, bs=1024*1024):
    f = open(fn, "r")
    while 1:
        buf = f.readlines(bs)
        if not buf: break
        for line in buf:
            pass
    f.close()

def using_fileinput(fn):
    f = fileinput.FileInput(fn)
    for line in f:
        pass
    f.close()

def while_readline(fn):
    f = open(fn, "r")
    while 1:
        line = f.readline()
        if not line: break
        pass
    f.close()

fn = "/home/guido/access_log"
if sys.argv[1:]:
    fn = sys.argv[1]
timer(count_chars_lines, fn)
timer(readlines_sizehint, fn, 1024*1024)
timer(using_fileinput, fn)
timer(while_readline, fn)

From guido@digicool.com Tue Jan 2 15:07:06 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 02 Jan 2001 10:07:06 -0500
Subject: [Python-Dev] Fwd: try...else
In-Reply-To: Your message of "Mon, 01 Jan 2001 15:27:37 EST."
References:
Message-ID: <200101021507.KAA12796@cj20424-a.reston1.va.home.com>

> As a compromise, given that we're not going to take the time to be precise
> (well, I'm sure not ...):
>
> The optional \keyword{else} clause is executed if and
> when control flows off the end of the \keyword{try}
> clause.\foonote{In Python 2.0, control "flows off the
> end" except in case of exception, or executing a
> \keyword{return}, \keyword{continue} or \keyword{break}
> statement.}
> Exceptions in the \keyword{else} clause are not handled by
> the preceding \keyword{except} clauses.
>
> Now it's all of imprecise, almost precise, specific to Python 2.0, and
> robust against any future changes .

Sounds good to me. The reference to 2.0 could be changed to "Currently".
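
The proposed wording for the else clause can be checked with a small experiment. The sketch below is written for modern Python 3, where the same rules still hold for exceptions, return and break; the function names `clauses_run` and `else_escapes` are hypothetical and exist only for this illustration.

```python
def clauses_run(action):
    """Return which clauses executed when the try body performs `action`."""
    trace = []
    for _ in range(1):
        try:
            trace.append("try")
            if action == "raise":
                raise ValueError("boom")
            if action == "break":
                break
            if action == "return":
                return trace
        except ValueError:
            trace.append("except")
        else:
            # Reached only when control flows off the end of the try body.
            trace.append("else")
    return trace


def else_escapes():
    """Show that an exception raised in else skips the preceding except."""
    try:
        try:
            pass
        except ValueError:
            return "handled by inner except"
        else:
            raise ValueError("raised in else")
    except ValueError:
        return "propagated past inner except"
```

Calling `clauses_run` with "ok", "raise", "break" and "return" in turn shows that else runs only in the first case, and `else_escapes()` shows the exception from the else clause reaching the outer handler.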
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Jan 2 15:20:11 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 10:20:11 -0500 Subject: [Python-Dev] Re: curses in the core? In-Reply-To: Your message of "Thu, 28 Dec 2000 18:25:28 EST." <20001228182528.A10743@thyrsus.com> References: <200012282252.XAA18952@loewis.home.cs.tu-berlin.de> <20001228182528.A10743@thyrsus.com> Message-ID: <200101021520.KAA13222@cj20424-a.reston1.va.home.com> > What does being in the Python core mean? There are two potential definitions: > > 1. Documentation says it's available on all platforms. > > 2. Documentation restricts it to one of the three platform groups > (Unix/Windows/Mac) but implies that it will be available on any > OS in that group. > > I think the second one is closer to what application programmers > thinking about which batteries are included expect. But I could be > persuaded otherwise by a good argument. Actually, when *I* have used the term "core" I've typically thought of this as referring to anything that's in the standard source distribution, whether or not it is built on all platforms. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@arctrix.com Tue Jan 2 08:42:30 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 00:42:30 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021456.JAA12633@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 02, 2001 at 09:56:40AM -0500 References: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Message-ID: <20010102004230.A29700@glacier.fnational.com> On Tue, Jan 02, 2001 at 09:56:40AM -0500, Guido van Rossum wrote: > Now what to do? I still don't like xreadlines very much, but I do see > that it can save some time. 
But my test doesn't confirm Neel's times
> as posted by Tim:
>
> > Slowest: for line in fileinput.input('foo'):     # Time 100
> >        : while 1: line = file.readline()         # Time 75
> >        : for line in LinesOf(open('foo')):       # Time 25
> > Fastest: for line in file.readlines():           # Time 10
> >          while 1: lines = file.readlines(hint)   # Time 10
> >          for line in xreadlines(file):           # Time 10
>
> I only see a factor of 3 between fastest and slowest, and
> readline is only about 60% slower than readlines_sizehint.

Could it be that you're using the CVS version of Python which includes Andrew's cool glibc getline enhancement?

Neil

From guido@digicool.com Tue Jan 2 15:40:40 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 02 Jan 2001 10:40:40 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Tue, 02 Jan 2001 00:42:30 PST." <20010102004230.A29700@glacier.fnational.com>
References: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> <20010102004230.A29700@glacier.fnational.com>
Message-ID: <200101021540.KAA13446@cj20424-a.reston1.va.home.com>

[me]
> > I only see a factor of 3 between fastest and slowest, and
> > readline is only about 60% slower than readlines_sizehint.

[Neil]
> Could it be that you're using the CVS version of Python which
> includes Andrew's cool glibc getline enhancement?

Bingo!
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue Jan 2 16:34:31 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 2 Jan 2001 11:34:31 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: <200101021507.KAA12796@cj20424-a.reston1.va.home.com> Message-ID: >> The optional \keyword{else} clause is executed if and >> when control flows off the end of the \keyword{try} >> clause.\foonote{In Python 2.0, control "flows off the >> end" except in case of exception, or executing a >> \keyword{return}, \keyword{continue} or \keyword{break} >> statement.} >> Exceptions in the \keyword{else} clause are not handled by >> the preceding \keyword{except} clauses. [Guido] > Sounds good to me. The reference to 2.0 could be changed to > "Currently". Cool. See http://sourceforge.net/bugs/?group_id=5470&func=detailbug&bug_id=127098 From tim.one@home.com Tue Jan 2 20:48:08 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 2 Jan 2001 15:48:08 -0500 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom Message-ID: test_compare is broken because the expected-output file has bizarre stuff in it like: cmp(2, [1]) = -108 cmp(2, (2,)) = -116 cmp(2, None) = -78 What's up with that? I'll leave test_minidom to someone who thinks they know what it's doing. Both failures are very recent. From tim.one@home.com Tue Jan 2 20:48:09 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 2 Jan 2001 15:48:09 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I only see a factor of 3 between fastest and slowest, and > readline is only about 60% slower than readlines_sizehint. [Neil] > Could it be that your using the CVS version of Python which > includes Andrew's cool glibc getline enhancement? [Guido] > Bingo! 
It's a good thing I haven't yet had time to try any speed tests myself, since I don't have a glibc-enabled platform so Guido and I may have been tempted to disagree about numbers in public . I checked out the source for glibc's getline. It's pulling the same trick Perl uses, copying directly from the stdio buffer when it can, instead of (like Python, and like almost all vendor fgets implementations) doing getc-in-a-loop. The difference is that Perl can't do that without breaking into the FILE* representation in platform-dependent ways. It's a shame that almost all vendors missed that fgets was defined as a primitive by the C committee precisely so that vendors *could* pull this speed trick under the covers. It's also a shame that Perl did it for them . From barry@digicool.com Tue Jan 2 21:56:10 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 2 Jan 2001 16:56:10 -0500 Subject: [Python-Dev] testing, please ignore Message-ID: <14930.20090.283107.799626@anthem.wooz.org> Sorry folks, just making sure things are working again. you-really-didn't-want-email-this-millennium-didja?-ly y'rs, -Barry From guido@python.org Tue Jan 2 20:59:22 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 15:59:22 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 14:59:24 EST." References: Message-ID: <200101022059.PAA14845@cj20424-a.reston1.va.home.com> > [Guido] > > I only see a factor of 3 between fastest and slowest, and > > readline is only about 60% slower than readlines_sizehint. > > [Neil] > > Could it be that your using the CVS version of Python which > > includes Andrew's cool glibc getline enhancement? > > [Guido] > > Bingo! > > It's a good thing I haven't yet had time to try any speed tests myself, > since I don't have a glibc-enabled platform so Guido and I may have been > tempted to disagree about numbers in public . 
> > I checked out the source for glibc's getline. It's pulling the same trick > Perl uses, copying directly from the stdio buffer when it can, instead of > (like Python, and like almost all vendor fgets implementations) doing > getc-in-a-loop. The difference is that Perl can't do that without breaking > into the FILE* representation in platform-dependent ways. It's a shame that > almost all vendors missed that fgets was defined as a primitive by the C > committee precisely so that vendors *could* pull this speed trick under the > covers. It's also a shame that Perl did it for them . Quite apart from whether we should enable xreadlines(), could you look into doing a similar thing for MSVC stdio? For most Unix platforms, a cop-out answer is "use glibc" -- but for Windows it may pay to do our own hack. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Tue Jan 2 21:06:05 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 2 Jan 2001 16:06:05 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 03:48:09PM -0500 References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> Message-ID: <20010102160605.A5211@kronos.cnri.reston.va.us> On Tue, Jan 02, 2001 at 03:48:09PM -0500, Tim Peters wrote: >into the FILE* representation in platform-dependent ways. It's a shame that >almost all vendors missed that fgets was defined as a primitive by the C >committee precisely so that vendors *could* pull this speed trick under the >covers. It's also a shame that Perl did it for them . So, should Python be changed to use fgets(), available on all ANSI C platforms, rather than the glibc-specific getline()? That would be more complicated than the brain-dead easy course of using getline(), which is obviously why I didn't do it; PyFile_GetLine() had annoyingly complicated logic. 
When this was discussed in comp.lang.python, someone also mentioned getc_unlocked(), which saves the overhead of locking the stream every time, but that didn't seem a fruitful avenue for exploration. --amk From tim.one@home.com Tue Jan 2 22:00:37 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 2 Jan 2001 17:00:37 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101022059.PAA14845@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Quite apart from whether we should enable xreadlines(), could you look > into doing a similar thing for MSVC stdio? For most Unix platforms, a > cop-out answer is "use glibc" -- but for Windows it may pay to do our > own hack. There's no question about whether it would pay on Windows, because it pays big for Perl on Windows. The question is about cost. There's no way to *do* it short of the way Perl does it, which is to write a large pile of Windows-specific code (roughly the same size and complexity as the glibc getline implementation -- check it out, it's not trivial, and glibc exploits compiler inlining to make it bearable) relying on reverse-engineered accidents of how MS happens to use all the fields from this undocumented struct (from MS's stdio.h): struct _iobuf { char *_ptr; int _cnt; char *_base; int _flag; int _file; int _charbuf; int _bufsiz; char *_tmpfname; }; typedef struct _iobuf FILE; in their stdio implementation. Else it won't play correctly with MS's stdio. That's A Project. Last year I tried extracting the relevant code from Perl, but, as is usual, gave up after unraveling the third (whatever) layer of mystery macros with no end in sight. I bet it would take me a week. Is it worth that much to you and DC? Since the real Windows experts are hanging out at ActiveState, I bet one of them will volunteer to do it tonight . 
From tim.one@home.com Tue Jan 2 22:17:14 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 2 Jan 2001 17:17:14 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: <20010102160605.A5211@kronos.cnri.reston.va.us>
Message-ID:

[Tim]
> It's a shame that almost all vendors missed that fgets was defined
> as a primitive by the C committee precisely so that vendors *could*
> pull this speed trick under the covers. It's also a shame that Perl
> did it for them .

[Andrew Kuchling]
> So, should Python be changed to use fgets(), available on all ANSI C
> platforms, rather than the glibc-specific getline()? That would be
> more complicated than the brain-dead easy course of using getline(),
> which is obviously why I didn't do it; PyFile_GetLine() had annoyingly
> complicated logic.

The thrust of my original comment above is that fgets is almost never faster than what Python is doing now, because vendors overwhelmingly do *not* exploit the opportunity the std gave them. So, no, switching to fgets() wouldn't help.

> When this was discussed in comp.lang.python, someone also mentioned
> getc_unlocked(), which saves the overhead of locking the stream every
> time, but that didn't seem a fruitful avenue for exploration.

Well, getc_unlocked isn't std (not even in C99). Mentioning it did inspire me to discover, however, that while the MS fgets() is the typical "getc in a loop" thing, at least it locks/unlocks the stream once each at function entry/exit, and uses a special MS flavor of getc ("_getc_lk") inside the loop. However, that this helps is an illusion, because the body of their _getc_lk macro is identical to the body of their getc macro. Smells like a bug, or an unfinished project.
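
To make the difference between "getc in a loop" and copying straight out of the buffer concrete, here is a toy model in Python 3. It is not the glibc, Perl or MS code, and it says nothing about their actual speed; the names `readline_bytewise` and `BufferedLineReader` and the 8192-byte buffer size are assumptions made up for this sketch. It only shows the shape of the two strategies: one call per character versus one search and one bulk copy per buffer.

```python
import io


def readline_bytewise(stream):
    """Build a line one byte at a time, in the style of a getc() loop."""
    out = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            break  # end of file
        out += ch
        if ch == b"\n":
            break  # end of line
    return bytes(out)


class BufferedLineReader:
    """Find the newline inside an in-memory buffer and copy the span at once."""

    def __init__(self, stream, bufsize=8192):
        self._stream = stream
        self._bufsize = bufsize
        self._buf = b""

    def readline(self):
        parts = []
        while True:
            pos = self._buf.find(b"\n")
            if pos >= 0:
                # Newline found: take everything up to and including it.
                parts.append(self._buf[:pos + 1])
                self._buf = self._buf[pos + 1:]
                break
            # No newline yet: keep what we have and refill the buffer.
            parts.append(self._buf)
            self._buf = self._stream.read(self._bufsize)
            if not self._buf:
                break  # end of file
        return b"".join(parts)
```

Because both versions track lengths explicitly, embedded null bytes pass through unchanged; the buffered version simply makes far fewer calls per line.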
From paulp@ActiveState.com Tue Jan 2 22:40:39 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 14:40:39 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: Message-ID: <3A5258E7.D52CA2C@ActiveState.com> Tim Peters wrote: > > There's no question about whether it would pay on Windows, because it pays > big for Perl on Windows. The question is about cost. There's no way to > *do* it short of the way Perl does it, which is to write a large pile of > Windows-specific code > ... Since the real Windows experts > are hanging out at ActiveState, I bet one of them will volunteer to do it > tonight . Mark is busy tonight and the Perl guys are still recovering from implementing it the first time. :) Paul From guido@python.org Tue Jan 2 22:46:00 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 17:46:00 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 16:06:05 EST." <20010102160605.A5211@kronos.cnri.reston.va.us> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> Message-ID: <200101022246.RAA16384@cj20424-a.reston1.va.home.com> > On Tue, Jan 02, 2001 at 03:48:09PM -0500, Tim Peters wrote: > >into the FILE* representation in platform-dependent ways. It's a shame that > >almost all vendors missed that fgets was defined as a primitive by the C > >committee precisely so that vendors *could* pull this speed trick under the > >covers. It's also a shame that Perl did it for them . > > So, should Python be changed to use fgets(), available on all ANSI C > platforms, rather than the glibc-specific getline()? That would be > more complicated than the brain-dead easy course of using getline(), > which is obviously why I didn't do it; PyFile_GetLine() had annoyingly > complicated logic. 
You mean get_line(), which indeed has a complicated API and
corresponding logic: the argument may be a max length, or 0 to indicate
arbitrary length, or negative to indicate raw_input() semantics. :-(

Unfortunately we can't use fgets(), even if it were faster than
getline(), because it doesn't tell how many characters it read. On
files containing null bytes, readline() is supposed to treat these like
any other character; if your input is "abc\0def\nxyz\n", the first
readline() call should return "abc\0def\n". But with fgets(), you're
left to look in the returned buffer for a null byte, and there's no way
(in general) to distinguish this result from an input file that only
consisted of the three characters "abc". getline() doesn't seem to
have this problem, since its size is also an output parameter.

> When this was discussed in comp.lang.python, someone also mentioned
> getc_unlocked(), which saves the overhead of locking the stream every
> time, but that didn't seem a fruitful avenue for exploration.

I've never heard of getc_unlocked; it's not in the (old) C standard.
If it's also a glibc thing, I doubt that using it would be faster than
getline(). If it's a new C standard (C9x) thing, we'll have to wait.

Fred reminded me that for e.g. Solaris, while everybody probably
compiles with GCC, that doesn't mean they are using glibc, so in
practice getline() will only help on Linux.

I'm slowly warming up to xreadlines(), although we must be careful to
consider the consequences (do other file-like objects need to support
it too?).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com Tue Jan 2 22:46:18 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 2 Jan 2001 17:46:18 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <3A5258E7.D52CA2C@ActiveState.com>
Message-ID:

[Tim]
> ...
Since the real Windows experts are hanging out at ActiveState, > I bet one of them will volunteer to do it tonight . [Paul Prescod] > Mark is busy tonight and the Perl guys are still recovering from > implementing it the first time. :) I'm delighted, then, that you have nothing better to do than tease the decent, hard-working folks on Python-Dev! I'll be up until about 4am -- feel free to submit your patch anytime before then. in-a-pinch-i'll-even-accept-it-tomorrow-ly y'rs - tim From guido@python.org Tue Jan 2 22:53:14 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 17:53:14 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 17:00:37 EST." References: Message-ID: <200101022253.RAA16482@cj20424-a.reston1.va.home.com> > [Guido] > > Quite apart from whether we should enable xreadlines(), could you look > > into doing a similar thing for MSVC stdio? For most Unix platforms, a > > cop-out answer is "use glibc" -- but for Windows it may pay to do our > > own hack. > > There's no question about whether it would pay on Windows, because it pays > big for Perl on Windows. The question is about cost. There's no way to > *do* it short of the way Perl does it, which is to write a large pile of > Windows-specific code (roughly the same size and complexity as the glibc > getline implementation -- check it out, it's not trivial, and glibc exploits > compiler inlining to make it bearable) relying on reverse-engineered > accidents of how MS happens to use all the fields from this undocumented > struct (from MS's stdio.h): > > struct _iobuf { > char *_ptr; > int _cnt; > char *_base; > int _flag; > int _file; > int _charbuf; > int _bufsiz; > char *_tmpfname; > }; > typedef struct _iobuf FILE; > > in their stdio implementation. Else it won't play correctly with MS's > stdio. That's A Project. 
Last year I tried extracting the relevant code > from Perl, but, as is usual, gave up after unraveling the third (whatever) > layer of mystery macros with no end in sight. I bet it would take me a > week. Is it worth that much to you and DC? Since the real Windows experts > are hanging out at ActiveState, I bet one of them will volunteer to do it > tonight . Yeah. That's too much. Too bad. I'm not holding my breath for ActiveState though. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Tue Jan 2 22:52:58 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 2 Jan 2001 16:52:58 -0600 (CST) Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101022246.RAA16384@cj20424-a.reston1.va.home.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> Message-ID: <14930.23498.53540.401218@beluga.mojam.com> Guido> I'm slowly warming up to xreadlines(), ... I haven't followed this thread closely, and my brain is a bit frazzled at the moment, but is there some fundamental reason that the file object's readlines method can't be made lazy, perhaps only when given a sizehint? Skip From paulp@ActiveState.com Tue Jan 2 22:59:47 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 14:59:47 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> Message-ID: <3A525D63.17ABCC87@ActiveState.com> Skip Montanaro wrote: > > Guido> I'm slowly warming up to xreadlines(), ... 
> > I haven't followed this thread closely, and my brain is a bit frazzled at > the moment, but is there some fundamental reason that the file object's > readlines method can't be made lazy, perhaps only when given a sizehint? I suggested this at one point but it was pointed out that there is probably a lot of code that works with the resulting list *as a list* i.e. as a random-access, writable sequence object. I really wasn't thrilled with xreadlines at first either...it's the least of all possible evils (including the status quo). Paul From nas@arctrix.com Tue Jan 2 16:09:15 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 08:09:15 -0800 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 03:48:08PM -0500 References: Message-ID: <20010102080915.A30892@glacier.fnational.com> On Tue, Jan 02, 2001 at 03:48:08PM -0500, Tim Peters wrote: > test_compare is broken because the expected-output file has bizarre stuff in > it like: > > cmp(2, [1]) = -108 > cmp(2, (2,)) = -116 > cmp(2, None) = -78 > > What's up with that? My fault. I only ran regrtest.py and not "make test". I'm not sure why you say bizarre stuff though. Do you object to testing that 2 is less than None (something that is not part of the language spec) or do you think that the results from cmp() should be clamped between -1 and 1? Neil From guido@python.org Tue Jan 2 23:06:16 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 18:06:16 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 16:52:58 CST." 
<14930.23498.53540.401218@beluga.mojam.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> Message-ID: <200101022306.SAA16684@cj20424-a.reston1.va.home.com> > I haven't followed this thread closely, and my brain is a bit frazzled at > the moment, but is there some fundamental reason that the file object's > readlines method can't be made lazy, perhaps only when given a sizehint? Yes -- readlines() is documented to return a list, and some people do things to it that require it to be a real list (e.g. sort or reverse it or modify it in place or concatenate it with other lists). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue Jan 2 23:19:14 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 2 Jan 2001 18:19:14 -0500 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: <20010102080915.A30892@glacier.fnational.com> Message-ID: [Tim] > test_compare is broken because the expected-output file has > bizarre stuff in it like: > > cmp(2, [1]) = -108 > cmp(2, (2,)) = -116 > cmp(2, None) = -78 > > What's up with that? [Neil Schemenauer] > My fault. I only ran regrtest.py and not "make test". Neil, my platform doesn't even *have* a "make": are you saying the test passes for you when you run regrtest.py? That's what I did. > I'm not sure why you say bizarre stuff though. Do you object to > testing that 2 is less than None (something that is not part of the > language spec) Only in part. Lang Ref 2.1.3 (Comparisons) says you can compare them, and guarantees they won't compare equal, but doesn't define it beyond that. 
If Python actually says "less", fine, we can test for that, although to minimize maintenance down the road it would be better to test for no more than we expect Python to guarantee across releases and implementations (suppose Jython says 2 is greater than None: that's fine too, and it would be better if the test suite didn't say Jython was broken). > or do you think that the results from cmp() should be clamped > between -1 and 1? Not that either ; cmp() isn't documented that way. They're "bizarre" simply because they're not what Python returns! C:\Code\python\dist\src\PCbuild>python Python 2.0 (#8, Dec 17 2000, 01:39:08) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> cmp(2, [1]) -1 >>> cmp(2, (2,)) -1 >>> cmp(2, None) -1 >>> The expected-output file is supposed to match what Python actually does. I have no idea where things like "-108" came from. So things like -108 look bizarre to me. So long as cmp(2, [1]) returns -1 in reality, an expected-output file that claims it returns -108 will never work no matter how you run the tests. One of us is missing something obvious here . From paulp@ActiveState.com Tue Jan 2 23:26:39 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 15:26:39 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> Message-ID: <3A5263AF.CE6C8C81@ActiveState.com> Guido van Rossum wrote: > > ... > > I'm slowly warming up to xreadlines(), although we must be careful to > consider the consequences (do other file-like objects need to support > it too?). The implementation is such that it is pretty easy to add the method to other file-like objects. It is also easy to use the xreadlines module to get the same behavior for objects that do not have the method. 
Essentially, file.xreadlines is implemented like this:

    def xreadlines(self):
        import xreadlines
        return xreadlines.xreadlines(self)

Any object can add the method similarly.

 Paul Prescod

From nas@arctrix.com Tue Jan 2 16:51:48 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Tue, 2 Jan 2001 08:51:48 -0800
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 06:19:14PM -0500
References: <20010102080915.A30892@glacier.fnational.com>
Message-ID: <20010102085148.A30986@glacier.fnational.com>

On Tue, Jan 02, 2001 at 06:19:14PM -0500, Tim Peters wrote:
> Neil, my platform doesn't even *have* a "make": are you saying the test
> passes for you when you run regrtest.py?

Yes. Isn't checking in code without running regrtest a capital
offence? :)

> Lang Ref 2.1.3 (Comparisons) says you can compare them, and
> guarantees they won't compare equal, but doesn't define it beyond that.

Okay, I'll use == rather than cmp(). When I was working on the
coercion patch I found cmp() useful. I guess it shouldn't be in the
standard test suite, especially since Jython may implement things
differently.

[Neil]
> or, do you think that the results from cmp() should be clamped
> between -1 and 1?

[Tim]
> Not that either ; cmp() isn't documented that way.
>
> They're "bizarre" simply because they're not what Python returns!

They do on my box:

    Python 2.0 (#19, Nov 21 2000, 18:13:04)
    [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
    Type "copyright", "credits" or "license" for more information.
    >>> cmp(1, None)
    -78

I guess MS uses a different strcmp than GNU. Do you mind trying the
attached C code? I get "-78" as output. I should have thought a little
more before checking in the patch. -78 is quite obviously a
machine/library dependent thing.

[Tim again]
> One of us is missing something obvious here .

I don't know about that. The implementation of coercion and comparison
is not simple.
I've been studying it for some time now and I obviously still don't
know what the hell is going on. AFAICT, the problem is that instances
without a comparison method can compare larger or smaller than numbers
depending on where in memory the objects are stored.

 Neil

#include <stdio.h>
#include <string.h>

int main()
{
    printf("%d\n", strcmp("", "None"));
    return 0;
}

From tim.one@home.com Wed Jan 3 00:30:26 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 2 Jan 2001 19:30:26 -0500
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: <20010102085148.A30986@glacier.fnational.com>
Message-ID:

[Neil]
> They do on my box:
>
> Python 2.0 (#19, Nov 21 2000, 18:13:04)
> [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
> Type "copyright", "credits" or "license" for more information.
> >>> cmp(1, None)
> -78

Well, who cares about your silly box ?

Messier than I thought! Yes, Windows strcmp is always in {-1, 0, 1}.
Rather than run tests, here's the tail end of MS's strcmp.c:

        if ( ret < 0 )
                ret = -1 ;
        else if ( ret > 0 )
                ret = 1 ;

        return( ret );

Wasted cycles and stupid formatting .

> ...
> AFAICT, the problem is that instances without a comparison method can
> compare larger or smaller than numbers depending on where in memory
> the objects are stored.

If so, that's a bug ... OK, it *is* a bug, at least in current CVS. Did
you cause that, or was it always this way? I was able to provoke this
badness:

>>> j < c < i
1
>>> j < i
0
>>>

i.e. it violates transitivity, and that's never supposed to happen in
the absence of user-supplied __cmp__. Here c is an instance of
"class C: pass", and i and j are ints.

>>> type(i), type(j), type(c)
(<type 'int'>, <type 'int'>, <type 'instance'>)
>>> i, j, c
(999999, 1000000, <__main__.C instance at 00791B7C>)
>>> id(i), id(j), id(c)
(7941572, 7744676, 7936892)
>>>

Guido thought he fixed this kind of stuff once (and I believed him ) by
treating all numbers as if they had type name "" (i.e., yes, an empty
string) when compared to non-numbers.
Then the usual "mixed-type comparisons in the absence of __cmp__ compare via type name string" rule ensured that numbers would always compare "less than" instances of any other type. That's the intent of the tail end: else if (vtp->tp_as_number != NULL) vname = ""; else if (wtp->tp_as_number != NULL) wname = ""; /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); of PyObject_Compare. So, in the example above, we *should* have i < c == 1 j < c == 1 j < c < i == 0 Unfortunately, we actually have i < c == 0 in that example. We're apparently not getting to the "number hack" code because c is an instance, and I'll confess up front that my eyes always glazed over long before I got to PyInstance_HalfBinOp <0.half wink>. Whatever, there's at least one bug somewhere in that path! We should have n < i == 1 for any numeric type n and any non-numeric type i (in the absence of user-defined __cmp__). From skip@mojam.com (Skip Montanaro) Wed Jan 3 01:27:03 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 2 Jan 2001 19:27:03 -0600 (CST) Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <3A525D63.17ABCC87@ActiveState.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> <3A525D63.17ABCC87@ActiveState.com> Message-ID: <14930.32743.525564.69044@beluga.mojam.com> Paul> I suggested this at one point but it was pointed out that there is Paul> probably a lot of code that works with the resulting list *as a Paul> list* How about this idea? What if readlines() was allowed to return a lazy evaluator if a sizehint > 0 was given? 
I only saw one example outside of test cases in the current CVS tree where readlines(sizehint) was used (Tools/idle/GrepDialog.py), and it used it as expected: while 1: block = f.readlines(sizehint) if not block: break for line in block: more stuff My suspicion is that most uses of sizehint will be like this. It hasn't been around all that long in Python-years (since 1.5a2), so there's probably not tons of code to break (I agree the semantics would change), and the majority of code that uses it probably looks like the above, which is almost safe (if it returned "" instead of an empty evaluator when nothing was left to read it would be safe). The advantage would be that the above could become the more obvious for line in f.readlines(sizehint): more stuff and the change to file reading code that is "too slow" becomes much simpler. (Of course, xreadlines() has that advantage as well.) I scanned my own code quickly. I found about 10 uses with sizehint and 300 without. I presume we are talking about 2.1 here. In any case, it seems to me that in Py3k readlines should be lazy. Skip P.S. Why did FileInput class never grow a readlines method? From nas@arctrix.com Tue Jan 2 19:38:53 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 11:38:53 -0800 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 07:30:26PM -0500 References: <20010102085148.A30986@glacier.fnational.com> Message-ID: <20010102113853.A31341@glacier.fnational.com> On Tue, Jan 02, 2001 at 07:30:26PM -0500, Tim Peters wrote: > > AFAICT, the problem is that instances without a comparison method can > > compare larger or smaller than numbers depending on where in memory > > the objects are stored. > > If so, that's a bug ... OK, it *is* a bug, at least in current CVS. Did you > cause that, or was it always this way? To quote Bart Simpson: I didn't do it. I'm pretty sure the bug is in PyInstance_DoBinOp. 
I don't think it's worth fixing though. I'm ready to check in my
coercion overhaul patch, assuming no vetoes from the list. It should
fix this bug (and introduce a whole slew of new ones :).

Guido suggested that I remove the "number types compare smaller than
other types" behavior. What's your take on that? The current patch on
SF always uses the type names. It should be easy to implement the old
behavior though.

 Neil

From nas@arctrix.com Tue Jan 2 19:48:09 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Tue, 2 Jan 2001 11:48:09 -0800
Subject: [Python-Dev] Applying the PEP 208 (coercion overhaul) patch
Message-ID: <20010102114809.B31341@glacier.fnational.com>

I'm almost ready to apply SF patch #102652. Guido has given the okay
assuming there are no objections from the rest of python-dev. The
patch is large and modifies some complicated parts of the interpreter.
I expect there will be some bugs. If you would like me to wait, speak
now. Guido has sent me some comments on the patch today which I plan
to review and address tonight. I will probably apply the patch
tomorrow evening.

 Neil

From tim.one@home.com Wed Jan 3 03:05:59 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 2 Jan 2001 22:05:59 -0500
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: <20010102113853.A31341@glacier.fnational.com>
Message-ID:

[Neil Schemenauer, on a violation of transitivity
 j < c < i but not j < i]
> To quote Bart Simpson: I didn't do it. I'm pretty sure the bug
> is in PyInstance_DoBinOp. I don't think it's worth fixing though.
> I'm ready to check in my coercion overhaul patch, assuming no
> vetoes from the list. It should fix this bug (and introduce a
> whole slew of new ones :).

Sounds good to me!

> Guido suggested that I remove the "number types compare smaller
> than other types" behavior. What's your take on that? The
> current patch on SF always uses the type names. It should be
> easy to implement the old behavior though.
It doesn't matter that they're specifically smaller, it matters that they can't violate transitivity. "numbers compare smaller" was introduced deliberately (by Guido) because, e.g., before that we had 99 < [99] < 99L despite that 99 == 99L, because "int" < "list" < "long int" Even stranger, we had 100 < [99] < 0L < 100 and 100 < [] < -101L < -100 Making numbers compare smaller than other types is one way to ensure stuff like that can't happen; I can't think of a simpler way (although making them compare larger than other types would be equally simple, as would making them compare as if their type name were "Neil" ). From paulp@ActiveState.com Wed Jan 3 03:34:59 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 19:34:59 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> <3A525D63.17ABCC87@ActiveState.com> <14930.32743.525564.69044@beluga.mojam.com> Message-ID: <3A529DE3.D93C3916@ActiveState.com> Skip Montanaro wrote: > >... > > I presume we are talking about 2.1 here. In any case, it seems to me that > in Py3k readlines should be lazy. I agree, but I'm ambivalent about your suggestion for polymorphic return values from readlines(). Yet another option is a "lazy=1" option. Paul Prescod From tim.one@home.com Wed Jan 3 04:33:29 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 2 Jan 2001 23:33:29 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Message-ID: [Guido, writes a timing program] [Jeff, if you weren't copied on all this stuff, you can play catch-up by reading the archives, at http://mail.python.org/pipermail/python-dev/ ] > ... > I am including the timer program below my signature. 
The test input > was the current access_log of dinsdale.python.org, which has about 119 > Mbytes and 1M lines (as counted by the test program). For a contrast, I cobbled together a large test file out of various chunks of C source, .py source, HTML source, and email archives. I was shooting for the same size you used (~119Mb), but ended up with more than 3x as many lines. > I measure about a factor of 2 between readlines with a sizehint (of 1 > MB) and fileinput; Factor of 7 here (Jeff, NeilS eventually figured out that Guido was using a CVS version of Python that has AndrewK's glibc getline patch, a zippier line-input routine than Python 2.0 has; but it only applies to platforms using glibc). > ... > Output (the first time is realtime seconds, the second CPU seconds): > > total 119808333 chars and 1009350 lines > count_chars_lines 7.944 7.890 > readlines_sizehint 5.375 5.320 > using_fileinput 15.861 15.740 > while_readline 8.648 8.570 > > This was on a 600 MHz Pentium-III Linux box (RH 6.2). total 117615824 chars and 3237568 count_chars_lines 14.780 14.772 readlines_sizehint 9.390 9.375 using_fileinput 66.130 66.157 while_readline 30.380 30.337 866 MHz P3 Win98SE, current CVS Python. I have no handy explanation for why clock() and time() differ on my box (Win98 has no notions of "user time" or "CPU time" distinct from clock time). > Note that count_chars_lines and readlines_sizehint use the same > algorithm -- the difference is that readlines_sizehint uses 'pass' as > the inner loop body, while count_chars_lines adds two counters. > > Given that very light per-line processing (counting lines and > characters) already increases the time considerably, I'm not sure I > buy the arguments that the I/O overhead is always considerable. I disagree that this is "very light processing", although I agree it's hard to think of lighter processing : it's a few Python statements per line, which I'd say is pretty *typical* processing. 
Read a line, run a string find or regexp search on it, test the result,
sometimes fiddle the line accordingly and sometimes not. File-crunching
apps generally aren't rocket science! For example, I changed
count_chars_lines to tally the number of lines containing the string
"Guido" instead, and the runtime went up by just 0.8 seconds (BTW, it
found 13808 of them ): if you're thinking in C terms, millions of
failing searches for "Guido" may seem like more work, but the number of
Python stmts executed usually counts more than what the stmts do at the
C level.

> ...
> Now what to do? I still don't like xreadlines very much, but I do
> see that it can save some time. But my test doesn't confirm Neel's
> times as posted by Tim:
>
>> Slowest: for line in fileinput.input('foo'):     # Time 100
>>        : while 1: line = file.readline()         # Time 75
>>        : for line in LinesOf(open('foo')):       # Time 25
>> Fastest: for line in file.readlines():           # Time 10
>>          while 1: lines = file.readlines(hint)   # Time 10
>>          for line in xreadlines(file):           # Time 10
>
> I only see a factor of 3 between fastest and slowest, and
> readline is only about 60% slower than readlines_sizehint.

I don't know what Neel used for an input file, or which platform he
used either. And this is bound to vary a lot across platforms. As
above, I saw a factor of 7 between fastest and slowest and a factor of
3 between readline and readlines_sizehint.

BTW, on my platform the Perl script (using a recent ActiveState Windows
Perl)

    open(FILE, "ga.txt");
    while (<FILE>) {
        1;
    }

ran in about 6 seconds (I never figured how to get Perl to compute
usable timings itself)-- substantially faster than even
readlines_sizehint! --and changing the body to

    $nc = $nl = 0;
    while (<FILE>) {
        ++$nl;
        $nc += length;
    }
    print "$nc $nl\n";

boosted that to about 8 seconds. So Perl has gotten zippier too over
the years.
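
The kind of measurement traded in this thread can be approximated today
with a short script. The sketch below is a stand-in written under
several assumptions, not the timer program Guido posted (which is not
reproduced here): it targets modern Python 3, writes its own small
sample file, and the helper names make_sample and time_all are made up
for illustration. It compares a readline() loop, the
readlines(sizehint) block idiom, and plain file iteration, which later
Python versions offer in place of xreadlines().

```python
import os
import tempfile
import time


def while_readline(path):
    """Read one line per call; count the lines."""
    count = 0
    with open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                break
            count += 1
    return count


def readlines_sizehint(path, hint=1 << 20):
    """Fetch roughly `hint` bytes of whole lines per call, then loop over the block."""
    count = 0
    with open(path, "rb") as f:
        while True:
            block = f.readlines(hint)
            if not block:
                break
            for line in block:
                count += 1
    return count


def lazy_iteration(path):
    """Iterate over the file object directly, one line at a time."""
    count = 0
    with open(path, "rb") as f:
        for line in f:
            count += 1
    return count


def make_sample(num_lines=20000):
    """Write a hypothetical sample file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_lines):
            f.write(b"line %d of the sample file\n" % i)
    return path


def time_all(path):
    """Return {function name: (line count, elapsed seconds)}."""
    results = {}
    for func in (while_readline, readlines_sizehint, lazy_iteration):
        start = time.perf_counter()
        count = func(path)
        results[func.__name__] = (count, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    sample = make_sample()
    try:
        for name, (count, elapsed) in time_all(sample).items():
            print("%-20s %8d lines %8.3f s" % (name, count, elapsed))
    finally:
        os.remove(sample)
```

Absolute numbers will vary a lot with platform, stdio implementation,
and file contents, as the thread itself shows, so only the relative
ordering on one machine is worth reading off.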
From tim.one@home.com Wed Jan 3 09:32:55 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 3 Jan 2001 04:32:55 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <200101022253.RAA16482@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido & Tim, wonder about faking getline-like functionality for Windows]

The attached is kinda baffling. The std tests pass with it, and it
changes my test timings from:

count_chars_lines     14.780 14.772
readlines_sizehint     9.390  9.375
using_fileinput       66.130 66.157
while_readline        30.380 30.337

to:

count_chars_lines     14.880 14.854
readlines_sizehint     9.280  9.302
using_fileinput       48.610 48.589
while_readline        13.450 13.451

Big win? You bet. But ...

The baffling parts:

1. That Perl still takes only 6 seconds in line-at-a-time mode.

2. I originally wrote a getline workalike, instead of building directly
   into a PyString buffer. That made my test run *slower*, and I'm
   talking factor of 2, not a yawn. To judge from my usually silent
   disk (I've got 256Mb RAM on this box), I'm afraid the extra mallocs
   required may have triggered the horrid Win9x malloc-thrashing
   problem I wrote about while I was still at Dragon. Consider that
   another vote for Vlad's PyMalloc -- we've got no handle on
   x-platform dynamic memory behavior now. Python's destiny is to
   replace both the platform OS and libc anyway <0.9 wink>.

The scary parts:

+ As the "XXX" comments indicate, this is full of little insecurities.

+ Another one I just thought of: if the user's last operations on the
  fp were two or more consecutive ungetc calls, all bets are off. But
  then MS doesn't define what happens then either.

+ This is much less ambitious than I recall Perl's code being: it
  doesn't try to guess anything about the file, and effectively
  captures only what would happen if you could unroll the guts of a
  getc-in-a-loop and optimize the snot out of it.
  The good news is that this means it's much easier to maintain (it
  touches only two of the MS FILE* fields, and in ways that are pretty
  obviously correct). The bad news is that this seems also pretty
  clearly all there *is* to be gotten out of breaking into the FILE*
  abstraction for the particular test case I'm using; and increasing
  TUNEME doesn't save any time at all: the sucker is flying at full
  speed already.

+ It (line-at-a-time) drops to a little under 13 seconds if I comment
  out the thread macros.

+ I haven't looked at Perl's implementation in a year, and they must
  have dreamt up another trick since then. That's a "scary part"
  indeed to anyone who has ever looked at Perl's implementation.

retreating-into-a-fetal-position-ly y'rs - tim

Anyone wants to play, the sandbox is fileobject.c. Do two things:
insert this new chunk somewhere above get_line:

#ifdef MS_WIN32
static PyObject*
win32_getline(FILE *fp)
{
    /* XXX ignores thread safety -- but so does MS's getc macro! */
    PyObject* v;
    char* pBuf;     /* next free slot in v's buffer */
    /* MS's internals are declared in terms of ints, but it's a sure bet
     * that won't last forever -- use size_t now & live w/ the casting;
     * ditto for Python's routines
     */
    size_t total_buf_size = 100;
    size_t free_buf_size = total_buf_size;
#define TUNEME 1000 /* how much to boost the string buffer when exhausted */

    v = PyString_FromStringAndSize((char *)NULL, (int)total_buf_size);
    if (v == NULL)
        return NULL;
    pBuf = BUF(v);
    Py_BEGIN_ALLOW_THREADS
    for (;;) {
        int ch;         /* int, not char: getc's EOF must stay distinct
                         * from a 0xff data byte */
        size_t ms_cnt;  /* FILE->_cnt shadow */
        char* ms_ptr;   /* FILE->_ptr shadow */
        size_t max_to_copy, i;

        /* stdio buffer empty or in unknown state; rather
         * than try to simulate every quirk of MS's internals,
         * let the MS macros deal with it.
         */
        /* XXX we also wind up here when we simply run out of string
         * XXX buffer space, but I'm not sure I care: making this a
         * XXX double-nested loop doesn't seem worth it
         */
        ch = getc(fp);
        if (ch == EOF)
            break;
        /* make sure we've got some breathing room */
        if (free_buf_size < 100) {
            size_t currentoffset = pBuf - BUF(v);
            total_buf_size += TUNEME;
            /* XXX check for overflow */
            Py_BLOCK_THREADS
            if (_PyString_Resize(&v, (int)total_buf_size) < 0)
                return NULL;
            Py_UNBLOCK_THREADS
            pBuf = BUF(v) + currentoffset;
            free_buf_size = TUNEME;
        }
        /* ch wasn't EOF, so store it */
        *pBuf++ = (char)ch;
        --free_buf_size;
        if (ch == '\n') {
            break;
        }
        ms_cnt = (size_t)fp->_cnt;
        if (!ms_cnt) {
            /* XXX this is a slow way to read one character at
             * XXX a time if, e.g., the stream is unbuffered
             */
            continue;
        }
        /* payback! now we don't have to check for buffer overflows or
         * EOF inside the loop, nor does the macro _filbuf() branch force
         * _ptr and _cnt in and out of memory on each iteration
         */
        ms_ptr = fp->_ptr;
        assert(ms_cnt > 0);
        i = max_to_copy = ms_cnt < free_buf_size ? ms_cnt : free_buf_size;
        do {
            /* XXX unclear to me why MS's getc macro does "& 0xff" */
            *pBuf++ = ch = *ms_ptr++ & 0xff;
        } while (--i && ch != '\n');
        /* update the shadows & counters */
        fp->_ptr = ms_ptr;
        free_buf_size -= max_to_copy - i;
        fp->_cnt = ms_cnt - (max_to_copy - i);
        if (ch == '\n')
            break;
    }
    Py_END_ALLOW_THREADS
    _PyString_Resize(&v, pBuf - BUF(v));
    return v;
}
#endif

2. Within get_line, add this before the #endif (this is the getline #if
block):

#elif defined(MS_WIN32)
    if (n == 0) {
        return win32_getline(fp);
    }

From ping@lfw.org Wed Jan 3 11:40:47 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Wed, 3 Jan 2001 05:40:47 -0600 (CST)
Subject: [Python-Dev] inspect.py
In-Reply-To: <14840.19556.127151.457533@anthem.concentric.net>
Message-ID:

Uh... hi. I know i've all but dropped out of existence for a long
time, what with my simultaneous first stints as a grad student, a
teaching assistant, and a house cook (!)
and all, but i didn't want to let this work go to waste. Now that the holidays are here i can *finally* try to get some work done! So, i've updated inspect.py in response to Barry's comments, and below is my reply to this old thread. I also wrote some regression tests. I tried to submit inspect.py to SourceForge, but i got: ERROR Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 Does anyone know what's going on with that? Anyway, the latest module and regression tests are available at: http://www.lfw.org/python/inspect.py http://www.lfw.org/python/test_inspect.py for your perusal. On Thu, 26 Oct 2000 barry@wooz.org wrote: > Some thoughts after an initial scan of inspect.py: > > - The doc strings for the is*() functions aren't accurate. > E.g. ismodule() says that it asks whether "the object is a module > with the __file__ special attribute", but that isn't really what it > tests! Guido points out that builtin modules don't currently have > __file__ and besides, you're really testing that the type of the > object is ModuleType. Perhaps a different wording would be better, but i should at least clarify the intention: i wrote them that way because it seemed that the current objects export an unofficial "interface" by means of the special attributes they provide. The purpose of the "is*()" functions is to determine whether an object meets one of these interfaces. A complete interface would provide (1) a type-checker, (2) a constructor, and (3) the methods. As for (2), we don't normally allow construction of these things (except for wizards using the newmodule). As for (3), i suppose that one could further encapsulate these interfaces by providing spelled-out methods like "def getcode(f): return f.func_code", but it didn't seem worth the trouble. So that left just (1), and i had the other parts in mind while trying to describe (1). 
The type-checkers aren't of much use unless they accurately reflect the availability of the special attributes. Do you see what i'm trying to do? Maybe you can suggest a better way of doing it... anyway, i've tried to compromise in the docstrings as submitted.

> - Don't make the predicate in getmembers() default to "lambda x: 1"
>   Instead make the default None, and skip the predicate test if it is
>   None.

Okay, fine.

> - getdoc()'s docstring should describe the margin munging it does.

Okay, done.

> - findsource() seems off-by-one, e.g.
>
>   >>> x = inspect.findsource(inspect.findsource)
>   >>> x[1]
>   138
>
>   but the function really starts on line 139.

138 was the intended result here. Indeed the function starts on line 139 if you start counting from 1. The reason it returns 138 is that it's the index you would use for the array of lines (thus x[0][x[1]] or file.readlines()[138] is the first line of the function). Which way makes more sense? Should it be changed?

> - I notice that currentframe() still uses the try/except trick to get
>   the frame object. It's much more efficient to provide a C
>   trampoline for getting that information.

Sure, if there's a faster way, that's fine. It just wasn't something i expected to be used really often, and i wanted to write the module in pure Python so it could be easily maintained. I added a line to clobber the pure-Python currentframe() with sys._getframe() if it exists.

> - If this were included in the library, we might want to 2.0-ify it.

It currently doesn't rely on any 2.0 features, and it would be kind of nice to have it still work with 1.5 (especially if it is part of a drop-in documentation tool, as it is now, since it goes with htmldoc).


-- ?!ng

"Computers are useless. They can only give you answers."
    -- Pablo Picasso


From guido@python.org Wed Jan 3 12:06:33 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 07:06:33 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
Message-ID: <200101031206.HAA19182@cj20424-a.reston1.va.home.com>

Apparently getc_unlocked() is in the Single Unix spec. Not sure how widespread that is -- do Linux developers pay attention to this standard at all? According to the webpage it's (c) 1997.

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Wed, 03 Jan 2001 10:58:44 +0200
From: Erno Kuusela
To: guido@python.org
Subject: getc_unlocked note

hello,

i was reading the python-dev archives and saw that someone had noticed my getline/getc_unlocked post from the newsgroup.

a correction to the python-dev thread: getc_unlocked and friends are in fact standard (not c99 though since c99 doesn't specify threads); they are part of the single unix specification.

link: http://www.opennc.org/onlinepubs/007908799/xsh/getc_unlocked.html

-- erno

------- End of Forwarded Message


From guido@python.org Wed Jan 3 12:37:11 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 07:37:11 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Wed, 03 Jan 2001 04:32:55 EST."
References:
Message-ID: <200101031237.HAA19244@cj20424-a.reston1.va.home.com>

> 1. That Perl still takes only 6 seconds in line-at-a-time mode.

Are you sure Perl still uses stdio at all? If so, does it open the file in binary or in text mode? Based on the APIs in MS's libc, I presume that the crlf->lf translation is not done by stdio proper but by the Unix I/O emulation just underneath it (open() has an O_BINARY option flag, so read() probably does the translation). That comes down to copying most bytes an extra time. (To test this hypothesis, you could try to open the test file with mode "rb" and see if it makes a difference.)
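
The "rb" experiment suggested above could be sketched roughly as follows. This is only an illustration in present-day Python, not the rltest.py script used in this thread (which is not shown here); the names time_readline_loop and compare_modes are made up for the sketch. Note that a modern Python does its own newline translation in text mode, so the numbers say nothing about where MS's libc does the work.

```python
import time


def time_readline_loop(path, mode):
    """Return (seconds, line_count) for a plain readline() loop in `mode`."""
    start = time.perf_counter()
    count = 0
    with open(path, mode) as f:
        while True:
            line = f.readline()
            if not line:  # empty string/bytes means EOF
                break
            count += 1
    return time.perf_counter() - start, count


def compare_modes(path):
    """Time text mode ("r") against binary mode ("rb") on the same file."""
    return {mode: time_readline_loop(path, mode) for mode in ("r", "rb")}
```

Calling compare_modes() on a large CRLF-terminated file and comparing the two elapsed times gives a first impression of what the translation layer costs; a single run is noisy, so repeating it a few times would be prudent.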

> 2. I originally wrote a getline workalike, instead of building
>    directly into a PyString buffer. That made my test run *slower*,
>    and I'm talking factor of 2, not a yawn. To judge from my usually
>    silent disk (I've got 256Mb RAM on this box), I'm afraid the extra
>    mallocs required may have triggered the horrid Win9x
>    malloc-thrashing problem I wrote about while I was still at Dragon.
>    Consider that another vote for Vlad's PyMalloc -- we've got no
>    handle on x-platform dynamic memory behavior now. Python's destiny
>    is to replace both the platform OS and libc anyway <0.9 wink>.
>
> The scary parts:
>
> + As the "XXX" comments indicate, this is full of little
>   insecurities.

My biggest worry: thread-safety. There must be a way to lock the file (you indicated that fgets() uses it).

> + Another one I just thought of: if the user's last operations on
>   the fp were two or more consecutive ungetc calls, all bets are off.
>   But then MS doesn't define what happens then either.

Python doesn't have an interface to ungetc(), and I believe the stdio standard says you can only call ungetc() once consecutively. Assuming other C code linked with Python obeys this rule (a pretty safe assumption), we should be fine. And if the assumption is violated, I presume it's really that C code's fault -- plus, code that only uses getc() would be screwed just as badly.

> + This is much less ambitious than I recall Perl's code being: it
>   doesn't try to guess anything about the file, and effectively
>   captures only what would happen if you could unroll the guts of a
>   getc-in-a-loop and optimize the snot out of it. The good news is
>   that this means it's much easier to maintain (it touches only two of
>   the MS FILE* fields, and in ways that are pretty obviously correct).
>   The bad news is that this seems also pretty clearly all there *is*
>   to be gotten out of breaking into the FILE* abstraction for the
>   particular test case I'm using; and increasing TUNEME doesn't save
>   any time at all: the sucker is flying at full speed already.

You probably don't have many lines longer than 1000 characters.

> + It drops (line-at-a-time) to a little under 13 seconds if I
>   comment out the thread macros.

If you mean the Py_BLOCK_THREADS around the resize, that can be safely dropped. (If/when we introduce Vladimir's malloc, we'll have to decide whether it is threadsafe by itself or whether it requires the global interpreter lock. I vote to make it threadsafe by itself.)

> + I haven't looked at Perl's implementation in a year, and they must
>   have dreamt up another trick since then. That's a "scary part"
>   indeed to anyone who has ever looked at Perl's implementation.
>
> retreating-into-a-fetal-position-ly y'rs - tim
>
>
> Anyone wants to play, the sandbox is fileobject.c. Do two things:
> insert this new chunk somewhere above get_line:
>
> #ifdef MS_WIN32
> static PyObject*
> win32_getline(FILE *fp)
> {
>     /* XXX ignores thread safety -- but so does MS's getc macro! */
>     PyObject* v;
>     char* pBuf; /* next free slot in v's buffer */
>     /* MS's internals are declared in terms of ints, but it's a sure bet
>      * that won't last forever -- use size_t now & live w/ the casting;
>      * ditto for Python's routines
>      */
>     size_t total_buf_size = 100;
>     size_t free_buf_size = total_buf_size;
> #define TUNEME 1000 /* how much to boost the string buffer when exhausted */
>
>     v = PyString_FromStringAndSize((char *)NULL, (int)total_buf_size);
>     if (v == NULL)
>         return NULL;
>     pBuf = BUF(v);
>     Py_BEGIN_ALLOW_THREADS
>     for (;;) {
>         char ch;
>         size_t ms_cnt; /* FILE->_cnt shadow */
>         char* ms_ptr;  /* FILE->_ptr shadow */
>         size_t max_to_copy, i;
>         /* stdio buffer empty or in unknown state; rather
>          * than try to simulate every quirk of MS's internals,
>          * let the MS macros deal with it.
>          */
>         /* XXX we also wind up here when we simply run out of string
>          * XXX buffer space, but I'm not sure I care: making this a
>          * XXX double-nested loop doesn't seem worth it
>          */
>         ch = getc(fp);
>         if (ch == EOF)
>             break;
>         /* make sure we've got some breathing room */
>         if (free_buf_size < 100) {
>             size_t currentoffset = pBuf - BUF(v);
>             total_buf_size += TUNEME; /* XXX check for overflow */
>             Py_BLOCK_THREADS
>             if (_PyString_Resize(&v, (int)total_buf_size) < 0)
>                 return NULL;
>             Py_UNBLOCK_THREADS
>             pBuf = BUF(v) + currentoffset;
>             free_buf_size = TUNEME;
>         }
>         /* ch wasn't EOF, so store it */
>         *pBuf++ = ch;
>         --free_buf_size;
>         if (ch == '\n') {
>             break;
>         }
>         ms_cnt = (size_t)fp->_cnt;
>         if (!ms_cnt) {
>             /* XXX this is a slow way to read one character at
>              * XXX a time if, e.g., the stream is unbuffered
>              */
>             continue;
>         }
>         /* payback! now we don't have to check for buffer overflows or
>          * EOF inside the loop, nor does the macro _filbuf() branch force
>          * _ptr and _cnt in and out of memory on each iteration
>          */
>         ms_ptr = fp->_ptr;
>         assert(ms_cnt > 0);
>         i = max_to_copy = ms_cnt < free_buf_size ? ms_cnt : free_buf_size;

Doesn't it make more sense to delay the resize until this point? I don't know how much the character copying accounts for, but I could imagine a strategy based on memchr() and memcpy() that first searches for a \n, and if found, allocates to the right size before copying. Typically, the buffer contains many lines, so this could be optimized into requiring a single exactly-sized malloc() call in the common case (where the buffer doesn't wrap). But possibly scanning the buffer for \n and then copying the bytes separately, even with memchr() and memcpy(), slows things down too much for this to be faster.

>         do {
>             /* XXX unclear to me why MS's getc macro does "& 0xff" */
>             *pBuf++ = ch = *ms_ptr++ & 0xff;

I know why. getchar() returns an int in the range [-1, 255]. If chars are signed the &0xff is needed else you would get a return in the range [-128, 127] and -1 would be ambiguous (EOF==-1). Not sure if they *are* unsigned on any MS platform -- if they aren't, whoever coded this wasn't thinking -- on the other hand the compiler probably optimizes it out. But here since you're copying to another character, it's pointless.

>         } while (--i && ch != '\n');
>         /* update the shadows & counters */
>         fp->_ptr = ms_ptr;
>         free_buf_size -= max_to_copy - i;
>         fp->_cnt = ms_cnt - (max_to_copy - i);
>         if (ch == '\n')
>             break;
>     }
>     Py_END_ALLOW_THREADS
>     _PyString_Resize(&v, pBuf - BUF(v));
>     return v;
> }
> #endif
>
> 2. Within get_line, add this before the #endif (this is the getline #if block):
>
> #elif defined(MS_WIN32)
>     if (n == 0) {
>         return win32_getline(fp);
>     }

Note that get_line() with negative n could be implemented as get_line(0) with some post processing. This should be done completely separately, in PyFile_GetLine. The negative n case is only used by raw_input() -- it means strip the \n and raise EOFError for EOF, and I expect that this is rarely if ever used in a speed-conscious situation.
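
The post-processing described for the negative-n case (strip the \n, raise EOFError at EOF) might look like this in outline. It is a Python model of the idea only, not the C that would go into PyFile_GetLine; get_line here is a hypothetical stand-in for get_line(fp, 0), and get_line_stripped is a made-up name.

```python
import io


def get_line(fp):
    """Stand-in for get_line(fp, 0): next line with its newline, '' at EOF."""
    return fp.readline()


def get_line_stripped(fp):
    """Model of the n < 0 case, layered on top of the n == 0 case."""
    line = get_line(fp)
    if not line:
        # Nothing at all was read: that is EOF, which raw_input() reports
        # as an exception rather than as an empty string.
        raise EOFError("EOF when reading a line")
    if line.endswith("\n"):
        line = line[:-1]  # strip exactly one trailing newline
    return line
```

For example, with fp = io.StringIO("abc\n\nlast"), three successive calls return "abc", "" and "last", and a fourth raises EOFError. An empty line and EOF stay distinguishable because the empty line arrives as "\n" before stripping.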
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jan 3 14:56:31 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 09:56:31 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 03 Jan 2001 07:06:33 EST." <200101031206.HAA19182@cj20424-a.reston1.va.home.com> References: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Message-ID: <200101031456.JAA19990@cj20424-a.reston1.va.home.com> > Apparently getc_unlocked() is in the Single Unix spec. Not sure how > widespread that is -- do Linux developers pay attention to this > standard at all? According to the webpage it's (c) 1997. Erno Kuusela gave me some more info about this; glibc supports it. I did a quick test which suggests that it is a lot faster than regular getc() -- on a small test file it's actually faster than GNU getline(), even with the proper flockfile() / funlockfile() calls. (The test file was 6Mb -- 10 copies of /etc/termcap, which has short lines -- avg 43 chars.) This together with Tim's Win32x specific hacks might be the best we can do for get_line(). However, raw xreadlines is still almost twice as fast, so it's still under consideration. Maybe MS supports a similar unlocked getc macro, and a separate primitive to lock/unlock a file? That would allow more unified code. (Quick research shows that it exists, but only in internal form. We could probably call _lock_file() and _unlock_file(), and define our own getc_lk(), protected by the proper set of macros. This could all be presented by config.h as flockfile(), funlockfile(), and getc_unlocked() macros.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Wed Jan 3 15:27:09 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 10:27:09 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101031206.HAA19182@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 07:06:33AM -0500 References: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Message-ID: <20010103102709.A19451@kronos.cnri.reston.va.us> On Wed, Jan 03, 2001 at 07:06:33AM -0500, Guido van Rossum wrote: >Apparently getc_unlocked() is in the Single Unix spec. Not sure how >widespread that is -- do Linux developers pay attention to this >standard at all? According to the webpage it's (c) 1997. It seems to be in glibc 2.1, but I don't know how much it would help, and the added complexity of having to lock the file separately worries me, perhaps due to a superstitious fear of angering the Thread Gods. --amk From akuchlin@mems-exchange.org Wed Jan 3 15:44:57 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 10:44:57 -0500 Subject: [Python-Dev] Help wanted with setup.py script In-Reply-To: <017201c0759a$c2b180c0$e000a8c0@thomasnotebook>; from thomas.heller@ion-tof.com on Wed, Jan 03, 2001 at 04:35:10PM +0100 References: <200012290046.TAA01346@207-172-57-128.s128.tnt2.ann.va.dialup.rcn.com> <017201c0759a$c2b180c0$e000a8c0@thomasnotebook> Message-ID: <20010103104457.A19493@kronos.cnri.reston.va.us> [Cc'ing to python-dev]. On Wed, Jan 03, 2001 at 04:35:10PM +0100, Thomas Heller wrote: >You didn't expect this script run under windows? >(It does not run) It shouldn't matter, I think, since the makesetup stuff doesn't run on Windows either; presumably the compiled-in modules are specified by an MSVC project file, or something similar. Can anyone confirm that I don't care if setup.py works on Windows? (Well, I *know* for a fact I don't care; but should I? 
:) )

--amk


From guido@python.org Wed Jan 3 15:49:43 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 10:49:43 -0500
Subject: [Python-Dev] Help wanted with setup.py script
In-Reply-To: Your message of "Wed, 03 Jan 2001 10:44:57 EST." <20010103104457.A19493@kronos.cnri.reston.va.us>
References: <200012290046.TAA01346@207-172-57-128.s128.tnt2.ann.va.dialup.rcn.com> <017201c0759a$c2b180c0$e000a8c0@thomasnotebook> <20010103104457.A19493@kronos.cnri.reston.va.us>
Message-ID: <200101031549.KAA20188@cj20424-a.reston1.va.home.com>

> It shouldn't matter, I think, since the makesetup stuff doesn't run on
> Windows either; presumably the compiled-in modules are specified by an
> MSVC project file, or something similar. Can anyone confirm that I
> don't care if setup.py works on Windows? (Well, I *know* for a fact I
> don't care; but should I? :) )

Personally, I don't think it's worth making setup.py work for Windows.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From akuchlin@mems-exchange.org Wed Jan 3 20:04:07 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 3 Jan 2001 15:04:07 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked()
In-Reply-To: ; from noreply@sourceforge.net on Wed, Jan 03, 2001 at 08:47:30AM -0800
References:
Message-ID: <20010103150407.D20301@kronos.cnri.reston.va.us>

On Wed, Jan 03, 2001 at 08:47:30AM -0800, GvR wrote:
>Summary: speed up readline() using getc_unlocked()

So what does the performance of this version look like?

--amk


From guido@python.org Wed Jan 3 20:25:53 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 15:25:53 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked()
In-Reply-To: Your message of "Wed, 03 Jan 2001 15:04:07 EST."
    <20010103150407.D20301@kronos.cnri.reston.va.us>
References: <20010103150407.D20301@kronos.cnri.reston.va.us>
Message-ID: <200101032025.PAA27457@cj20424-a.reston1.va.home.com>

> >Summary: speed up readline() using getc_unlocked()
>
> So what does the performance of this version look like?

Very slightly faster than the GNU getline() version. Without GNU getline, the old code was about 3.5 times slower.

Here are the current times on a 6 Mb file (fileinput.py has my sourceforge speedup patch too):

$ ./python ~/rltest.py ~/termcapx10
total 6252720 chars and 146250 lines; average line length 42.8
count_chars_lines     0.943 0.930
readlines_sizehint    0.544 0.540
using_fileinput       2.089 2.090
while_readline        0.956 0.960

For comparison, here's what Python 1.5.2 does with the same test (which should be pretty close to what the released Python 2.0 does; I don't have a copy of that handy).

$ python1.5 ~/rltest.py ~/termcapx10
total 6252720 chars and 146250 lines; average line length 42.8
count_chars_lines     0.836 0.820
readlines_sizehint    0.523 0.520
using_fileinput       5.739 5.740
while_readline        3.670 3.670

I don't know why count_chars_lines got proportionally slower than readlines_sizehint. (The += operator didn't make a difference either way.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org Wed Jan 3 20:45:38 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 15:45:38 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked()
In-Reply-To: Your message of "Wed, 03 Jan 2001 15:25:53 EST."
    <200101032025.PAA27457@cj20424-a.reston1.va.home.com>
References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com>
Message-ID: <200101032045.PAA27595@cj20424-a.reston1.va.home.com>

I should add that the patches are on SourceForge:

fileinput.py:
http://sourceforge.net/patch/?func=detailpatch&patch_id=103081&group_id=5470

fileobject.c:
http://sourceforge.net/patch/?func=detailpatch&patch_id=103082&group_id=5470

I'm ready to check these in, but I'm waiting 24 hours in case there's something I've missed. (I haven't actually tested these on any other platform besides Linux.)

Jeff Epler's xreadlines patch is here:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102915&group_id=5470

Note that Jeff's patch includes a patch to fileinput.py that does the same thing as mine but using his xreadlines module instead of directly using readlines(sizehint) as does mine. I like my approach better, mostly because it reduces dependencies.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From akuchlin@mems-exchange.org Wed Jan 3 21:25:30 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 3 Jan 2001 16:25:30 -0500
Subject: [Python-Dev] speed up readline() using getc_unlocked()
In-Reply-To: <200101032045.PAA27595@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 03:45:38PM -0500
References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com> <200101032045.PAA27595@cj20424-a.reston1.va.home.com>
Message-ID: <20010103162530.A20433@kronos.cnri.reston.va.us>

On Wed, Jan 03, 2001 at 03:45:38PM -0500, Guido van Rossum wrote:
>I'm ready to check these in, but I'm waiting 24 hours in case there's
>something I've missed. (I haven't actually tested these on any other
>platform besides Linux.)

On Solaris 2.6, the configure script doesn't detect that getc_unlocked() & friends are supported; details available from the patch.
After editing config.h manually to enable them, the results are:

Before getc_unlocked patch:
total 1559913 chars and 32513 lines
count_chars_lines     0.892 0.730
readlines_sizehint    0.329 0.300
using_fileinput       4.612 4.470
while_readline        2.739 2.670

After patch:
total 1559913 chars and 32513 lines
count_chars_lines     0.698 0.680
readlines_sizehint    0.273 0.270
using_fileinput       2.707 2.700
while_readline        0.778 0.780
amarok src>

With a patched version of fileinput.py:
using_fileinput       1.675 1.680

--amk


From guido@python.org Wed Jan 3 21:36:07 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 16:36:07 -0500
Subject: [Python-Dev] speed up readline() using getc_unlocked()
In-Reply-To: Your message of "Wed, 03 Jan 2001 16:25:30 EST." <20010103162530.A20433@kronos.cnri.reston.va.us>
References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com> <200101032045.PAA27595@cj20424-a.reston1.va.home.com> <20010103162530.A20433@kronos.cnri.reston.va.us>
Message-ID: <200101032136.QAA07752@cj20424-a.reston1.va.home.com>

> On Solaris 2.6, the configure script doesn't detect that
> getc_unlocked() & friends are supported; details available from the
> patch.

(Fixed now, see the new patch.)

> After editing config.h manually to enable them, the results are:
>
> Before getc_unlocked patch:
> total 1559913 chars and 32513 lines
> count_chars_lines     0.892 0.730
> readlines_sizehint    0.329 0.300
> using_fileinput       4.612 4.470
> while_readline        2.739 2.670
>
> After patch:
> total 1559913 chars and 32513 lines
> count_chars_lines     0.698 0.680
> readlines_sizehint    0.273 0.270
> using_fileinput       2.707 2.700
> while_readline        0.778 0.780
> amarok src>
>
> With a patched version of fileinput.py:
> using_fileinput       1.675 1.680

Thanks!

The bottom line seems to be that your basic readline loop is still 3x as slow as the fastest way -- so there's still a lot to say for xreadlines...
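
For readers without the benchmark script: the two loop shapes being compared probably look something like the sketch below. The function names are borrowed from the timing labels above, but the bodies are a guess written in present-day Python, since rltest.py itself is not reproduced in this thread; the sizehint default is an arbitrary assumption.

```python
def while_readline(path):
    """Count lines with one readline() call per line."""
    count = 0
    with open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                break
            count += 1
    return count


def readlines_sizehint(path, sizehint=64 * 1024):
    """Count lines by fetching batches of whole lines at a time."""
    count = 0
    with open(path, "rb") as f:
        while True:
            lines = f.readlines(sizehint)  # roughly sizehint bytes per batch
            if not lines:
                break
            count += len(lines)
    return count
```

The second form makes far fewer calls into the file object, which is where the gap in the timings comes from; xreadlines wraps the same batching idea behind a line-at-a-time interface.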
--Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Jan 3 21:42:48 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 03 Jan 2001 22:42:48 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib codecs.py,1.13,1.14 References: Message-ID: <3A539CD8.367361B8@lemburg.com> "M.-A. Lemburg" wrote: > > Update of /cvsroot/python/python/dist/src/Lib > In directory usw-pr-cvs1:/tmp/cvs-serv26608/Lib > > Modified Files: > codecs.py > Log Message: > ... > > This patch closes the bugs #116285 and #119960. I was too fast... the subject line of #119960 was misleading. It is still open. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one@home.com Wed Jan 3 23:13:15 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 3 Jan 2001 18:13:15 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Are you sure Perl still uses stdio at all? Pretty sure, but there are so many layers of macros the code is undecipherable, and I can't step thru macros in the debugger either (that's assuming I wanted to devote N hours to building Perl from source too -- which I don't). Perl also makes heavy use of macroizing std library names, so e.g. when I see "fopen" (which I do!), that doesn't mean I'm getting the fopen I'm thinking of. But the MSVC config files define all sorts of macros to get at the MS stdio _cnt and _ptr (and most other) FILE* fields, and the version of fopen in the Win32 stuff appears to defer to the platform fopen (after doing Perlish stuff, like if someone passed "/dev/null" as the file name, Perl changes it to "NUL"). 
This is what it's like: the first line of Perl's win32_fopen is this: dTHXo; That's conditionally defined in perl.h, either as #define dTHXo dTHXoa(PERL_GET_THX) or, if pTHXo is not defined, as # define dTHXo dTHX dTHX in turn is #defined in 4 different places across 3 different files in 2 different directories. I'll skip those. OTOH, dTHXoa is easy! It's only defined once: #define dTHXoa(a) pTHXo = a Ah, *that* clears it up . Etc. 20 years ago I may have thought this was fun. I thought debugging large systems of m4 macros was fun then, and I'm not sure this is either better or worse than that -- well, it's worse, because I understood m4's implementation. > If so, does it open the file in binary or in text mode? Sorry, but I really don't know and it's a pit to pursue. If it's not native text mode, they do a good job of faking it (e.g., Ctrl-Z acts like an EOF when reading a text file from Perl on Windows -- not something even Larry would be likely to do on his own ). > Based on the APIs in MS's libc, I presume that the crlf->lf > translation is not done by stdio proper but by the Unix I/O > emulation just underneath it (open() has an O_BINARY option > flag, so read() probably does the translation). Yes; and late in the last release cycle, import.c's open_exclusive had a Windows bug related to this (fdopen() used "wb", but the earlier open() didn't use O_BINARY, and fdopen *acted* like it had used "w"). Also, the MS setmode() function works on file handles, not streams. > That comes down to copying most bytes an extra time. Understood. But the CRLF are stored physically on disk, so unless the disk controller is converting them, *someone's* software (whether MS's or Perl's) is doing it. By the time Perl is doing its fast line-input stuff, and doing what sure looks like a straight copy out of an IO buffer, it's clear from the code that CRLF has already been translated to LF. 

> (To test this hypothesis, you could try to open the test file
> with mode "rb" and see if it makes a difference.)

In Python, that saved about 10% (but got the wrong answers ). In Perl, about 15-20%. But I don't think that tells us who's doing the translation. Assuming that the translation takes about the same total time for each, it makes sense that the percentage would be higher for Perl (since its total runtime is lower: same-sized slice of a smaller pie).

> My biggest worry: thread-safety. There must be a way to lock
> the file (you indicated that fgets() uses it).

Yes, via the unadvertised _lock_str and _unlock_str macros defined in MS mtdll.h, which is not on the include path:

/*
 * This is an internal C runtime header file. It is used when building
 * the C runtimes only. It is not to be used as a public header file.
 */

The routines and macros it calls are also unadvertised. After an hour of thrashing I wasn't able to successfully link any code trying to call these routines. Doesn't mean it's impossible, does mean they're internal to MS libc and aren't meant to be called by anything else. That's why it's called "cheating" .

Perl appears to ignore the whole issue (but Perl's thread story is muddy at best).

[... ungetc ...]

Not worried here either.

> ...
> You probably don't have many lines longer than 1000 characters.

None, in fact.

>> + It drops (line-at-a-time) to a little under 13 seconds if I
>>   comment out the thread macros.

> If you mean the Py_BLOCK_THREADS around the resize, that can be safely
> dropped.

I meant *all* thread-related macros -- was just trying to get a feel for how much that fiddling cost (it's an expense Perl doesn't seem to have -- yet). Was measurable but not substantial. WRT the resize, there's now a "fast path" that avoids it.

> (If/when we introduce Vladimir's malloc, we'll have to decide whether
> it is threadsafe by itself or whether it requires the global
> interpreter lock. I vote to make it threadsafe by itself.)

As feared, this thread is going to consume my life <0.5 wink>.

> ...
> Doesn't it make more sense to delay the resize until this point? I
> don't know how much the character copying accounts for, but I could
> imagine a strategy based on memchr() and memcpy() that first searches
> for a \n, and if found, allocates to the right size before copying.
> Typically, the buffer contains many lines, so this could be optimized
> into requiring a single exactly-sized malloc() call in the common case
> (where the buffer doesn't wrap). But possibly scanning the buffer for
> \n and then copying the bytes separately, even with memchr() and
> memcpy(), slows things down too much for this to be faster.

Turns out that Perl does very much what I was doing; the Perl code is actually more burdensome, because its routine is trying to deal not only with \n-termination, but also arbitrary-string termination (Perl's Awk-like input record separator), and "paragraph mode", and fixed-size reads, and some other stuff I can't figure out from the macro names. In all cases with a terminator, though, it's doing the same business of both copying and testing in a very tight inner loop. It doesn't appear to make any serious attempts to avoid resizing the buffer. But, Perl has its own malloc routines, and I'm guessing they're highly tuned for this stuff.

Since we're stuck with the MS malloc-- and Win9x's in particular seems lame --adding this near the start of my stuff did yield a nice speedup:

    if (fp->_cnt > 0 &&
        (pBuf = (char *)memchr(fp->_ptr, '\n', fp->_cnt)) != NULL) {
        /* it's all in the buffer so don't bother releasing the
         * global lock
         */
        total_buf_size = pBuf - fp->_ptr + 1;
        v = PyString_FromStringAndSize(fp->_ptr, (int)total_buf_size);
        if (v != NULL) {
            pBuf = BUF(v) + total_buf_size;
            fp->_cnt -= total_buf_size;
            fp->_ptr += total_buf_size;
        }
        goto done;
    }

So that builds the result string directly from the stdio buffer when it can.
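
The shape of that fast path can be modelled in a few lines of Python. This is a toy sketch, not a translation of the patch: BufferedSource, buf, pos and fast_getline are hypothetical names, with buf/pos playing the roles that _ptr and _cnt play in the C code.

```python
class BufferedSource:
    """Toy stand-in for a stdio FILE: a byte buffer plus a read position."""

    def __init__(self, data):
        self.buf = data
        self.pos = 0

    def remaining(self):
        """Bytes still unread in the buffer (the analogue of _cnt)."""
        return len(self.buf) - self.pos


def fast_getline(src):
    """Return the next line if its newline is already buffered, else None."""
    if src.remaining() <= 0:
        return None
    end = src.buf.find(b"\n", src.pos)  # plays the role of memchr()
    if end == -1:
        return None  # no newline buffered: caller takes the slow path
    line = src.buf[src.pos:end + 1]  # one exactly-sized copy
    src.pos = end + 1  # consume the bytes, like updating _ptr and _cnt
    return line
```

With src = BufferedSource(b"ab\ncd\nef"), two calls return b"ab\n" and b"cd\n", and the third returns None because "ef" has no newline in the buffer yet, which is the case where the general loop has to take over.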
Times dropped from (before this particular small hack)

    count_chars_lines     14.880 14.854
    readlines_sizehint     9.280  9.302
    using_fileinput       48.610 48.589
    while_readline        13.450 13.451

to

    count_chars_lines     14.780 14.784
    readlines_sizehint     9.550  9.514
    using_fileinput       43.560 43.584
    while_readline        10.600 10.578

Since I have no long lines in this test data, and the stdio buffer typically contains thousands of chars, most calls should be satisfied by the fast path. Compared to the previous code, the fast path (1) avoids global lock fiddling (but that didn't account for much in a distinct test); (2) crawls over the buffer twice instead of once; and, (3) avoids one (shrinking!) realloc. So crawling over the buffer an extra time costs nothing compared to the cost of a resize; and that's likely just more evidence that malloc/realloc suck on this platform. CAUTION: no file locking is going on now (because I haven't found a way to do it). My previous claim that the MS getc macro did no locking was wrong, as I discovered by stepping thru the generated machine code. stdio.h #defines getc without locking, but in _MT mode it later gets #undef'ed and turned into a function call. >> /* XXX unclear to me why MS's getc macro does "& 0xff" */ >> *pBuf++ = ch = *ms_ptr++ & 0xff; > I know why. getchar() returns an int in the range [-1, 255]. If > chars are signed the &0xff is needed else you would get a return in > the range [-128, 127] and -1 would be ambiguous (EOF==-1). Bingo -- MS chars are signed. > ... > But here since you're copying to another character, it's pointless. Yup! Gone. > .... > Note that get_line() with negative n could be implemented as > get_line(0) with some post processing. Andrew's glibc getline code appears to have wanted to do that, but looks to me like it's unreachable (unless I'm hallucinating, the "n < 0" test after return from glibc getline can't succeed, because the enclosing block is guarded by an "n==0" test).
> This should be done completely separately, in PyFile_GetLine. I assume you have an editor. > The negative n case is only used by raw_input() -- it means strip > the \n and raise EOFError for EOF, and I expect that this is rarely > if ever used in a speed-conscious situation. I've never seen raw_input used except when stdin and stdout were connected to a tty. When I tried raw_input from a DOS box under the debugger, it never called get_line. Something trickier is going on there; I suspect it's actually calling fgets (eventually) instead in that case. more-mysteries-than-i-really-need-ly y'rs - tim From jeremy@alum.mit.edu Thu Jan 4 00:06:58 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 3 Jan 2001 19:06:58 -0500 (EST) Subject: [Python-Dev] Mailman problems? In-Reply-To: References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> Message-ID: <14931.48802.273143.209933@localhost.localdomain> Tim & Barry, It looks like there is some problem with Mailman that is garbling messages to python-dev. It may only affect lines that begin with a tab; not sure. Your most recent message came through with the following line > dTHXo; (This was not the only example.) I think this was supposed to be a line of C code, but whatever meaningful contents it had were rendered as gobbledygook. Jeremy From loewis@informatik.hu-berlin.de Thu Jan 4 00:13:16 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Thu, 4 Jan 2001 01:13:16 +0100 (MET) Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> > Apparently getc_unlocked() is in the Single Unix spec. Not sure how > widespread that is -- do Linux developers pay attention to this > standard at all? Ulrich Drepper, who is in charge of glibc, is always interested in following Single Unix to the letter; getc_unlocked is supported at least since glibc 2.0.
http://www.sun.com/smcc/solaris-migration/docs/courses/threadsHTML/adv.html claims that getc_unlocked is already in POSIX.1c; Solaris apparently supports it at least since Solaris 2.4. Irix has it since 6.5, Tru64 at least since 4.0d (probably much longer); HPUX since 11.0, AIX since at least 4.3. Of the BSDs, only OpenBSD appears to support it; it knows that it is in ANSI 1003.1 since 1996-07-12. SCO OpenServer doesn't support it. Regards, Martin From fredrik@effbot.org Thu Jan 4 00:20:41 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Thu, 4 Jan 2001 01:20:41 +0100 Subject: [Python-Dev] Mailman problems? References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> <14931.48802.273143.209933@localhost.localdomain> Message-ID: <011901c075e4$2ce96360$e46940d5@hagrid> > It looks like there is some problem with Mailman that is garbling > messages to python-dev. It may only affect lines that begin with a > tab; not sure. > > Your most recent message came through with the following line > > > dTHXo; > > (This was not the only example.) > > I think this was supposed to be a line of C code, but whatever > meaningful contents it had were rendered as gobbledygook. also looks like Mailman removed all smileys from Jeremy's post ;-) From thomas@xs4all.net Thu Jan 4 00:27:54 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 01:27:54 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101040013.BAA13436@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Thu, Jan 04, 2001 at 01:13:16AM +0100 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> Message-ID: <20010104012753.D2467@xs4all.nl> On Thu, Jan 04, 2001 at 01:13:16AM +0100, Martin von Loewis wrote: > Of the BSDs, only OpenBSD appears to support it; it knows that it is > in ANSI 1003.1 since 1996-07-12. BSDI supports getc_unlocked() at least since BSDI 3.1.
I don't have any older boxes to check, but the manpage for getc and all its friends carries the timestamp 'June 4, 1993', which implies it could have been available a lot longer. (Note that BSD was once known to *define* the standard ;-) I concur that FreeBSD does not currently support getc_unlocked, but since BSDI and FreeBSD are merging, I suspect it will, soonish. In other words: use it! :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry@wooz.org Thu Jan 4 02:59:01 2001 From: barry@wooz.org (Barry A. Warsaw) Date: Wed, 3 Jan 2001 21:59:01 -0500 Subject: [Python-Dev] Re: Mailman problems? References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> <14931.48802.273143.209933@localhost.localdomain> Message-ID: <14931.59125.391596.730296@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> It looks like the is some problem with Mailman that is JH> garbling messages to python-dev. It may only affect lines JH> that begin with a tab; not sure. JH> Your most recent message came through with the following line >> dTHXo; JH> (This was not the only example.) JH> I think this was supposed to be a line of C code, but whatever JH> meaningful contents it had were rendered as gobbledygook. Oh shoot, my bad. I dropped in an experimental Perl filter module in the delivery pipeline. It's been so long since I hacked Perl, I think I meant to write $%_-> when I really wrote %$_-> -Barry From tim.one@home.com Thu Jan 4 04:26:51 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 3 Jan 2001 23:26:51 -0500 Subject: [Python-Dev] RE: Mailman problems? In-Reply-To: <14931.48802.273143.209933@localhost.localdomain> Message-ID: [Jeremy] > It looks like the is some problem with Mailman that is garbling > messages to python-dev. It may only affect lines that begin with a > tab; not sure. > > Your most recent message came through with the following line > >> dTHXo; > > (This was not the only example.) 
> > I think this was supposed to be a line of C code, but whatever > meaningful contents it had were rendered as gobbledygook. I have no idea where that "o" came from! It was supposed to be "o". Barry, fix it! BTW, the second line of Perl implementation functions is usually a lot less mysterious than the first. If anyone wants the joy of reverse-engineering Perl's supernaturally fast input, it's function Perl_sv_gets in file sv.c. sv.c? Yes! The destination of a one-line input is a Scalar Value, hence, sc. I expect there's similar method behind all of this stuff, but I never stumbled into the key. To get you started, here's the first line of Perl_sv_gets: dTHR; The line you're looking for is 119 lines down from that: if ((*bp++ = *ptr++) == rslast) /* really | dust */ the-comment-makes-more-sense-in-context-ly y'rs - tim From thomas@xs4all.net Thu Jan 4 06:51:17 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 07:51:17 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101040037.TAA08699@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 07:37:22PM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> Message-ID: <20010104075116.J402@xs4all.nl> On Wed, Jan 03, 2001 at 07:37:22PM -0500, Guido van Rossum wrote: > > In other words: use it! :) > > Mind doing a few platform tests on the (new version of the) patch? Well, only a bit :) It's annoying that BSDI doesn't come with autoconf, but I managed to use all my early-morning wit (it's 6:30AM ) to work around it. I've tested it on BSDI 4.1 and FreeBSD 4.2-RELEASE. > I already know that it works on Red Hat Linux 6.2 (my box) and Solaris > 2.6 (Andrew's box). I would be delighted to know that it works on at > least one other platform that has getc_unlocked() and one platform > that doesn't have it! Sorry, I have to disappoint you. 
FreeBSD does have getc_unlocked, they just didn't document it. Hurrah for autoconf ;P Anyway, it worked like a charm on BSDI: (Python 2.0) total 1794310 chars and 37660 lines count_chars_lines 0.310 0.300 readlines_sizehint 0.150 0.150 using_fileinput 2.013 2.017 while_readline 1.006 1.000 (CVS Python + getc_unlocked) daemon2:~/python/python/dist/src > ./python test.py termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.354 0.350 readlines_sizehint 0.182 0.183 using_fileinput 1.594 1.583 while_readline 0.363 0.367 But something weird is going on on FreeBSD: (Standard CVS Python) > ./python ~thomas/test.py ~thomas/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.265 0.266 readlines_sizehint 0.148 0.148 using_fileinput 0.943 0.938 while_readline 0.214 0.219 (CVS+getc_unlocked) > ./python-getc-unlocked ~thomas/test.py ~thomas/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.266 0.266 readlines_sizehint 0.151 0.141 using_fileinput 1.066 1.078 while_readline 0.283 0.281 This was sufficiently unexpected that I looked a bit further. The FreeBSD Python was compiled without editing Modules/Setup, so it was statically linked, no readline etc, but *with* threads (which are on by default, and functional on both FreeBSD and BSDI 4.1.) Here's the timings after I enabled just '*shared*': (CVS + *shared*) > ./python ~thomas/test.py ~thomas/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.276 0.273 readlines_sizehint 0.150 0.156 using_fileinput 0.902 0.898 while_readline 0.206 0.203 (This was not a fluke, I repeated it several times, getting hardly any variation.) Enabling readline and cursesmodule had no additional effect. Adding *shared* to the getc_unlocked tree saw roughly the same improvement, but was still slower than without getc_unlocked. 
(CVS + *shared* + getc_unlocked) > ./python ~thomas/test.py ~thomas/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.272 0.273 readlines_sizehint 0.149 0.148 using_fileinput 1.031 1.031 while_readline 0.267 0.266 Increasing the size of the testfile didn't change anything, other than the absolute numbers. I browsed stdio.h, where both getc() and getc_unlocked() are defined as macros. getc_unlocked is defined as: #define __sgetc(p) (--(p)->_r < 0 ? __srget(p) : (int)(*(p)->_p++)) #define getc_unlocked(fp) __sgetc(fp) and getc either as #define getc(fp) getc_unlocked(fp) (without threads) or static __inline int \ __getc_locked(FILE *_fp) \ { \ extern int __isthreaded; \ int _ret; \ if (__isthreaded) \ _FLOCKFILE(_fp); \ _ret = getc_unlocked(_fp); \ if (__isthreaded) \ funlockfile(_fp); \ return (_ret); \ } #define getc(fp) __getc_locked(fp) _FLOCKFILE(x) is defined as flockfile(x), so that isn't the difference. The speed difference has to be in the quick-and-easy test for whether the locking is even necessary. Starting a thread on 'time.sleep(900)' in test.py shows these numbers: (standard CVS python) > ./python-shared-std ~/test.py ~/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.433 0.445 readlines_sizehint 0.204 0.188 using_fileinput 1.595 1.594 while_readline 0.456 0.453 (getc_unlocked) > ./python-getc-unlocked-shared ~/test.py ~/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.441 0.453 readlines_sizehint 0.206 0.195 using_fileinput 1.677 1.688 while_readline 0.509 0.508 So... using getc_unlocked manually for performance reasons isn't a cardinal sin on FreeBSD only if you are really using threads :-) Lets-outsmart-the-OS-scheduler-next!-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
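For readers following the locking discussion in the message above: the pattern being benchmarked, taking the stream lock once per line with flockfile() and then reading characters with getc_unlocked(), might be sketched roughly as follows. This is an illustration only and not the code from the patch; the name read_line_unlocked and the fixed-size caller-supplied buffer are assumptions made for the example (the real get_line grows a string object instead).

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>

/* Read at most size-1 bytes of one line into buf, locking the stream once
 * for the whole line instead of once per character.  Returns the number
 * of bytes stored (0 at EOF); buf is always NUL-terminated. */
static size_t
read_line_unlocked(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 0);
    flockfile(fp);                       /* one lock for the whole line */
    while (n + 1 < size && (c = getc_unlocked(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);
    buf[n] = '\0';
    return n;
}
```

Whether this beats plain getc() depends on how cheap the platform's own locking check is, which is what the FreeBSD numbers above are hinting at.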
From thomas@xs4all.net Thu Jan 4 07:57:26 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 08:57:26 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test/output test_coercion,1.2,1.3 In-Reply-To: ; from nascheme@users.sourceforge.net on Wed, Jan 03, 2001 at 05:36:27PM -0800 References: Message-ID: <20010104085726.E2467@xs4all.nl> On Wed, Jan 03, 2001 at 05:36:27PM -0800, Neil Schemenauer wrote: > Update of /cvsroot/python/python/dist/src/Lib/test/output > In directory usw-pr-cvs1:/tmp/cvs-serv21710/Lib/test/output > > Modified Files: > test_coercion > Log Message: > Sequence repeat works now for in-place multiply with an integer type > as the left operand. I don't know if this is a feature or a bug. > ! 2 *= [1] => [1, 1] It's a feature. x = 2 * [1] works, so x = 2 x *= [1] does, too. Obviously, '2 *= [1]' shouldn't, but I'm assuming you don't actually execute that (it should give a SyntaxError) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik@effbot.org Thu Jan 4 09:32:55 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Thu, 4 Jan 2001 10:32:55 +0100 Subject: [Python-Dev] RE: Mailman problems? References: Message-ID: <00a701c07631$531983b0$e46940d5@hagrid> tim wrote: > I have no idea where that "o" came from! It was supposed to be "o". > Barry, fix it! no need. from the perlguts man page: "You can ignore [pad]THX[xo] when browsing the Perl headers/sources." in-my-dictionary-perl's-an-american-physicist-ly yrs /F From mal@lemburg.com Thu Jan 4 10:02:35 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Thu, 04 Jan 2001 11:02:35 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 References: Message-ID: <3A544A3B.32B86792@lemburg.com> Neil Schemenauer wrote: > > Update of /cvsroot/python/python/dist/src/Include > In directory usw-pr-cvs1:/tmp/cvs-serv21006/Include > > Modified Files: > classobject.h > Log Message: > Remove PyInstance_*BinOp functions. > > Index: classobject.h > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Include/classobject.h,v > retrieving revision 2.33 > retrieving revision 2.34 > diff -C2 -r2.33 -r2.34 > *** classobject.h 2000/09/01 23:29:26 2.33 > --- classobject.h 2001/01/04 01:30:34 2.34 > *************** > *** 60,71 **** > extern DL_IMPORT(int) PyClass_IsSubclass(PyObject *, PyObject *); > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > - char *, char *, > - PyObject * (*)(PyObject *, > - PyObject *)); > - > - extern DL_IMPORT(int) > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > - PyObject * (*)(PyObject *, PyObject *), int); Wouldn't it be safer to provide emulation APIs for these ? There might be code out there using these APIs. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Thu Jan 4 14:06:53 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 09:06:53 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 In-Reply-To: Your message of "Thu, 04 Jan 2001 11:02:35 +0100." 
<3A544A3B.32B86792@lemburg.com> References: <3A544A3B.32B86792@lemburg.com> Message-ID: <200101041406.JAA11926@cj20424-a.reston1.va.home.com> > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > > - char *, char *, > > - PyObject * (*)(PyObject *, > > - PyObject *)); > > - > > - extern DL_IMPORT(int) > > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > > - PyObject * (*)(PyObject *, PyObject *), int); > > Wouldn't it be safer to provide emulation APIs for these ? There > might be code out there using these APIs. No. These were never intended to be part of the API (and it was a mistake that they used DL_IMPORT()). They had to be extern because they were defined in one file and used in another. I'm glad they're gone. They are so obscure that I'd be *very* surprised if anybody was using them, and even more if they even *wanted* emulation under the new scheme -- I'd expect them to eagerly convert their code to using new-style numbers right away. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jan 4 14:16:39 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 09:16:39 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 04 Jan 2001 07:51:17 +0100." <20010104075116.J402@xs4all.nl> References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> Message-ID: <200101041416.JAA11983@cj20424-a.reston1.va.home.com> [Thomas finds that on FreeBSD, getc() is faster than getc_unlocked().] Thomas, I really don't understand it. The getc() source code you showed calls getc_unlocked(). So how can it be faster? The answer must be somewhere else... Cache line conflicts, the rewriting of the loop that I did, a compiler bug, the inlining, who knows. Can you compare the generated assembly code? 
On other platforms, getc_unlocked() typically speeds the readline() test case up by a significant factor (as in your BSDI numbers, where it's almost 3x faster). Could it be that you're mistaken and that somehow getc_unlocked() is *not* chosen on FreeBSD? Then I could believe it, the rewritten loop is so different that the optimizer might have done something different to it. (Check config.h. When all else fails, I put an #error in the #ifdef branch that I expect not to be taken.) Could it be that somehow getc_unlocked() is later defined to be the same as getc(), so choosing it just adds the overhead of calling f[un]lockfile() for each line? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Thu Jan 4 14:59:05 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 15:59:05 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041416.JAA11983@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 04, 2001 at 09:16:39AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> Message-ID: <20010104155904.L402@xs4all.nl> On Thu, Jan 04, 2001 at 09:16:39AM -0500, Guido van Rossum wrote: > [Thomas finds that on FreeBSD, getc() is faster than getc_unlocked().] > Thomas, I really don't understand it. The getc() source code you > showed calls getc_unlocked(). So how can it be faster? The answer > must be somewhere else... Cache line conflicts, the rewriting of the > loop that I did, a compiler bug, the inlining, who knows. Can you > compare the generated assembly code? On other platforms, > getc_unlocked() typically speeds the readline() test case up by a > significant factor (as in your BSDI numbers, where it's almost 3x > faster). Nono, reread my message, and your code. 
getc() isn't faster than getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, etc.) Significantly so when there is only one thread running (which is still the common case, for most systems, and FreeBSD's libc has easy inside knowledge about) and marginally so when there is at least one other thread. The small advantage in the multi-threaded case can be explained by the rest of the changes. You see, I was comparing a patched tree versus a non-patched tree, not a getc_unlocked() enabled one versus a disabled one, so I was measuring the speed difference of the *patch*, not of the use of getc_unlocked() vs getc(). Here is the speed difference of just the use of getc() vs getc_unlocked() (same tree, hand-edited config.h) in a non-threaded environment: > ./python-getc-disabled ~/test.py ~/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.271 0.273 readlines_sizehint 0.149 0.148 using_fileinput 0.898 0.898 while_readline 0.214 0.211 > ./python-getc-enabled ~/test.py ~/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.271 0.273 readlines_sizehint 0.148 0.148 using_fileinput 0.898 0.898 while_readline 0.214 0.211 As you see, no significant difference. Here is the difference in a threaded environment (a second thread that does just 'time.sleep(900)'): > ./python-getc-disabled ~/test.py ~/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.429 0.422 readlines_sizehint 0.200 0.211 using_fileinput 1.604 1.594 while_readline 0.465 0.461 > ./python-getc-enabled ~/test.py ~/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.429 0.430 readlines_sizehint 0.201 0.203 using_fileinput 1.600 1.602 while_readline 0.463 0.461 ... where I have to note that the getc-disabled version's 'using_fileinput' time fluctuates a lot more, mostly upwards, in the threaded environment. (I see it jump to 1.609, 1.617 cputime, every few runs.) 
Still not a terribly significant difference, but a hint that we, too, can use inside knowledge ;) > Could it be that you're mistaken and that somehow getc_unlocked() is > *not* chosen on FreeBSD? Then I could believe it, the rewritten loop > is so different that the optimizer might have done something different > to it. (Check config.h. When all else fails, I put an #error in the > #ifdef branch that I expect not to be taken.) Yah, #error is great for debugging, I use it a lot ;) But I'm sure of this. FreeBSD's getc() is just craftily optimized. Note that if we can get get_line using getc_unlocked() to run as fast as get_line using getc() on FreeBSD, it should also benefit other platforms, because the only speed to be had is in our own code :) Not that I'm saying it can be improved, just that it apparently got slower, because of this patch. I can't be much help doing any performance tuning, though, I've about used up my lunch hour and I'm working late tonight ;P Good-thing-my-boss-can't-tell-the-difference-between-Apache-and-Python-src-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@python.org Thu Jan 4 15:27:28 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 10:27:28 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 04 Jan 2001 15:59:05 +0100." <20010104155904.L402@xs4all.nl> References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> Message-ID: <200101041527.KAA12181@cj20424-a.reston1.va.home.com> [Me & Thomas in violent agreement that there's something weird about the speed of getc_unlocked() vs. getc() on FreeBSD.] I just realized what's the probable cause.
Read your timing post again:

# BSDI:
#
# (Python 2.0)
# while_readline 1.006 1.000
#
# (CVS Python + getc_unlocked)
# while_readline 0.363 0.367

# FreeBSD:
#
# (Standard CVS Python)
# while_readline 0.214 0.219
#
# (CVS+getc_unlocked)
# while_readline 0.283 0.281

Standard CVS Python, as opposed to Python 2.0 as released, uses GNU getline()! So on FreeBSD, for this test case, GNU getline() is faster than getc_unlocked(). So the question is, should I leave the GNU getline() code in? I'm inclined against it -- it's not that much faster, and on other platforms getc_unlocked() is faster. Given that getc_unlocked() is a standard (of some sort) and GNU getline() is, well, just that, I'd say let's stick with getc_unlocked(). (Unfortunately, from a phone conversation I had last night with Tim, there's not much hope of doing something there -- and that platform sorely needs it! The hacks that Tim reported earlier are definitely not thread-safe. While it's easy to come up with getc_unlocked() for Windows, the locking operations used internally there by the /MT code are not exported from MSVCRT.DLL, and that's crucial.)
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Thu Jan 4 15:31:39 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 16:31:39 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041527.KAA12181@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 04, 2001 at 10:27:28AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <200101041527.KAA12181@cj20424-a.reston1.va.home.com> Message-ID: <20010104163139.M402@xs4all.nl> On Thu, Jan 04, 2001 at 10:27:28AM -0500, Guido van Rossum wrote: > [Me & Thomas in violent agreement that there's something weird about > the speed of getc_unlocked() vs. getc() on FreeBSD.] > I just realized what's the probable cause. Read your timing post > again: > Standard CVS Python, as opposed to Python 2.0 as released, uses GNU > getline()! Sorry, no go. You need two things to use getline(): getline() itself, and a GNU libc. FreeBSD has neither. (And autoconf agrees with me.) If you *really really* want me to, I can compile 2.0-standard on FreeBSD and show you. But I'd rather not :) Now go back and read my other mail about why FreeBSD is faster :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From akuchlin@mems-exchange.org Thu Jan 4 15:43:15 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 4 Jan 2001 10:43:15 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104155904.L402@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 04, 2001 at 03:59:05PM +0100 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> Message-ID: <20010104104315.C23803@kronos.cnri.reston.va.us> On Thu, Jan 04, 2001 at 03:59:05PM +0100, Thomas Wouters wrote: >getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ >the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, >etc.) Significantly so when there is only one thread running (which is still So it looks like the ALLOW_THREADS should be moved out of the for loop. This produced no measurable performance difference on Solaris; I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some unusually slow thread operation?
--amk From thomas@xs4all.net Thu Jan 4 15:59:25 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 16:59:25 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104104315.C23803@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 04, 2001 at 10:43:15AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> Message-ID: <20010104165925.G2467@xs4all.nl> On Thu, Jan 04, 2001 at 10:43:15AM -0500, Andrew Kuchling wrote: > On Thu, Jan 04, 2001 at 03:59:05PM +0100, Thomas Wouters wrote: > >getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ > >the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, > >etc.) Significantly so when there is only one thread running (which is still > So it looks like the ALLOW_THREADS should be moved out of the for > loop. This produced no measureable performance difference on Solaris; > I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some > unusually slow thread operation? Note that I was just guessing there. I did a quick scan of the function, and noticed that the ALLOW_THREADS statements had moved into the outer loop. I didn't even contemplate whether that made a difference, so don't trust that judgement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From akuchlin@mems-exchange.org Thu Jan 4 16:10:29 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 4 Jan 2001 11:10:29 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104165925.G2467@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 04, 2001 at 04:59:25PM +0100 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> <20010104165925.G2467@xs4all.nl> Message-ID: <20010104111029.A28510@kronos.cnri.reston.va.us> On Thu, Jan 04, 2001 at 04:59:25PM +0100, Thomas Wouters wrote: >Note that I was just guessing there. I did a quick scan of the function, and >noticed that the ALLOW_THREADS statements had moved into the outer loop. I >didn't even contemplate whether that made a difference, so don't trust that >judgement. According to your benchmark, the performance of the threaded version was the same whether or not getc_unlocked() was used, so it's not that flockfile() is really slow. I can't believe the compiler optimized the old, ungainly loop better than the newer, tighter loop. That leaves the ALLOW_THREADS as the most reasonable culprit. --amk From akuchlin@mems-exchange.org Thu Jan 4 17:10:11 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 04 Jan 2001 12:10:11 -0500 Subject: [Python-Dev] SGI's Digital Media SDK Message-ID: SGI just made a source release of their digital media SDK for IRIX and Linux at http://oss.sgi.com/projects/dmsdk/ . According to the FAQ, this is derived from previous SGI libraries, "including the Video Library (VL), the Audio Library (AL), Digital Media Image Convertor (DMIC), Digital Media Audio Convertor (DMAC), and the Compression Library (CL)."
Interested parties may want to look into this, because Python still has the al, cd, cl, and sv modules; maybe they'd work with the new software with a reasonable amount of fixing, and at least now there's a reasonable chance that non-IRIX platforms will be supported. --amk From guido@python.org Thu Jan 4 19:07:13 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 14:07:13 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 04 Jan 2001 10:43:15 EST." <20010104104315.C23803@kronos.cnri.reston.va.us> References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> Message-ID: <200101041907.OAA12573@cj20424-a.reston1.va.home.com> > So it looks like the ALLOW_THREADS should be moved out of the for > loop. This produced no measureable performance difference on Solaris; > I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some > unusually slow thread operation? I kind of doubt that it's Py_ALLOW_THREADS -- it's in the outer loop, which typically only gets executed once. It only goes around a second time when the line is longer than the initial buffer. We could tweak the initial buffer size (currently 100, with increments of 1000). --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Jan 4 19:32:15 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Thu, 04 Jan 2001 20:32:15 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 References: <3A544A3B.32B86792@lemburg.com> <200101041406.JAA11926@cj20424-a.reston1.va.home.com> Message-ID: <3A54CFBF.CDD2138B@lemburg.com> Guido van Rossum wrote: > > > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > > > - char *, char *, > > > - PyObject * (*)(PyObject *, > > > - PyObject *)); > > > - > > > - extern DL_IMPORT(int) > > > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > > > - PyObject * (*)(PyObject *, PyObject *), int); > > > > Wouldn't it be safer to provide emulation APIs for these ? There > > might be code out there using these APIs. > > No. These were never intended to be part of the API (and it was a > mistake that they used DL_IMPORT()). They had to be extern because > they were defined in one file and used in another. I'm glad they're > gone. They are so obscure that I'd be *very* surprised if anybody was > using them, and even more if they even *wanted* emulation under the > new scheme -- I'd expect them to eagerly convert their code to using > new-style numbers right away. I'll see whether I can get mxDateTime working with the new scheme later this year -- it would be really great to do away with the coercion hack I was using until now :-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one@home.com Fri Jan 5 06:04:56 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 5 Jan 2001 01:04:56 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041527.KAA12181@cj20424-a.reston1.va.home.com> Message-ID: [Guido van Rossum] > ... 
> (Unfortunately, from a phone conversation I had last night with
> Tim, there's not much hope of doing something there -- and that
> platform [Win32] sorely needs it! The hacks that Tim reported
> earlier are definitely not thread-safe. While it's easy to come
> up with getc_unlocked() for Windows, the locking operations used
> internally there by the /MT code are not exported from MSVCRT.DLL,
> and that's crucial.)

The short course is that I still haven't found a workable way to lock streams on Windows: they do have a complete set of stream-locking functions and macros, but there's no way short of deep magic I can find to get at them ("deep magic" == resort to assembler and patch in function addresses). The only file-locking functions advertised in the C and platform SDK libraries are trivial variants of Python's msvcrt.locking, but that has to do with locking specific file byte-position ranges across processes, not ensuring the integrity of runtime stream structures across threads. Perl appears to ignore the issue of thread safety here (on Windows and everywhere else).

Revealing experiment!

1. I threw away my changes and rebuilt from current CVS.

2. I made one change, expanding the getc() call in get_line to what MSVC *would* expand it to if we weren't building in thread mode:

    if ((c = (--fp->_cnt >= 0 ? 0xff & *fp->_ptr++ : _filbuf(fp))) == EOF) {

That alone reduced the runtime of my "while 1: readline" test case from over 30 seconds to 12.8. What I did before went beyond that, by also (in effect) unrolling the loop and optimizing it. That bought an additional ~2 seconds.

So compared to Perl's 6 seconds, it looks like we're paying (on Win98SE) approximately:

    17 seconds for compiling with _MT (threadsafe libc)
     6 seconds to do the work
     5 seconds for "other stuff", best guess mostly a poor platform malloc/realloc
     2 seconds for not optimizing the loop
    --
    30 total

Unfortunately, the smoking gun is the only one whose firing pin we can't file down on this platform.
so-the-good-news-is-that-it's-impossible-for-perl-not-to-be-at- least-twice-as-fast-ly y'rs - tim From guido@python.org Fri Jan 5 15:29:05 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 10:29:05 -0500 Subject: [Python-Dev] Python 2.1 release schedule (PEP 226) Message-ID: <200101051529.KAA19100@cj20424-a.reston1.va.home.com> We had our first PythonLabs meeting of the year yesterday, and we went over the 2.1 release schedule. The release schedule is posted in PEP 226: http://python.sourceforge.net/peps/pep-0226.html We found that the schedule previously posted there was a bit too aggressive, given our goals for this release, so we have adjusted the dates somewhat. We have also decided on a date for the first alpha release (previously unmentioned in the PEP). So, here are the relevant dates: 19-Jan-2001: First 2.1 alpha release 23-Feb-2001: First 2.1 beta release 01-Apr-2001: 2.1 final release We're already in PEP freeze mode -- no more PEPs will be considered for inclusion in 2.1. Below is a list of the PEPs that we are currently considering, with some comments. But first some general remarks: - The alpha release cycle is for testing of tentative features. Alpha releases contain working code that we want to see widely tested; however, it's possible that a feature present in an alpha release is changed or even retracted in a later release. - Beta releases represent a feature freeze -- after the first beta release, we will resign ourselves to fixing bugs. Once beta 1 is released, no new features will be introduced, and no features will be withdrawn. The alpha cycle is especially important for features (such as nested scopes) that (may) introduce backwards incompatibilities. There may be more than one alpha release depending on feedback on the alpha 1 release. (But having too many alpha releases is not good -- people won't bother downloading.) 
Thus, we can only introduce a new feature in beta 1 if we're very sure that it is mature enough to stay without interface changes. The final decision on all PEPs under consideration has to be made before the beta 1 release. The beta cycle is important to ensure stability of the final release. Specific PEPs under consideration: I 42 pep-0042.txt Small Feature Requests Hylton Actually, most of these won't be fulfilled in 2.1. SD 205 pep-0205.txt Weak References Drake Fred is still working on this. I hope Tim can assist. But we may have to postpone this. S 207 pep-0207.txt Rich Comparisons Lemburg, van Rossum I'm pretty sure that this is a piece of cake now that the coercion patches are checked in. S 208 pep-0208.txt Reworking the Coercion Model Schemenauer All checked in. Great work, Neil! S 217 pep-0217.txt Display Hook for Interactive Use Zadka Moshe, this was accepted ages ago. Would you mind submitting a patch to SourceForge? If you don't champion this (and nobody else does), we may have to postpone it still. S 222 pep-0222.txt Web Library Enhancements Kuchling This is really up to Andrew. It seems he plans to create new modules, so he won't be introducing incompatibilities in existing APIs. S 227 pep-0227.txt Statically Nested Scopes Hylton Jeremy is still working on a proper implementation, which he hopes to have ready in time for the first alpha release date. S 229 pep-0229.txt Using Distutils to Build Python Kuchling I just moved this from pie-in-the-sky to active. Andrew has a working prototype, it just doesn't work 100% yet, so I'm very hopeful. S 230 pep-0230.txt Warning Framework van Rossum All done. S 232 pep-0232.txt Function Attributes Warsaw Still waiting for Barry to implement this, but it's pretty straightforward. S 233 pep-0233.txt Python Online Help Prescod Paul, what's up with this? Tim & I recommended to do something simple and working, and then you disappeared from the face of the earth. 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org Fri Jan 5 15:28:16 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 5 Jan 2001 10:28:16 -0500 (EST)
Subject: [Python-Dev] new "theme" on SourceForge!
Message-ID: <14933.59408.512734.105160@cj42289-a.reston1.va.home.com>

While "theme-ability" is becoming very popular for desktop software (think about the latest Gnome and KDE systems for Unix, and some of the multimedia applications for Windows, and the newest MacOS desktops), it can be a huge drain on Web sites; too many graphics is a pain, and too many tables just makes it worse. SourceForge had definitely fallen prey to the overly-fancy themes, and all of us developers paid the price with slow rendering.

But they've fixed that! The SF crew has announced a new "theme" called "Ultra Light" which is optimized for slow connections. What that really means is fewer embedded graphics and fewer nested tables, so rendering is *much* faster.

To try the new theme, go to the "Change My Theme" link near the top of the left-hand navigation area. Use the form to select "Ultra Light"; you can preview the theme first if you want.

Guido also thinks it's cool that the bug & patch report pages are printable with this theme. (Sheesh... managers! ;)

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim.one@home.com Fri Jan 5 17:46:16 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 5 Jan 2001 12:46:16 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Lib fileinput.py,1.5,1.6
In-Reply-To:
Message-ID:

[Guido]
> Modified Files:
> fileinput.py
> Log Message:
> Speed it up by using readlines(sizehint). It's still slower than
> other ways of reading input.
:-( On my box, it's now head-to-head with (maybe even a little quicker than) the while 1: line-at-a-time way: total 117615824 chars and 3237568 lines readlines_sizehint 9.450 9.459 using_fileinput 29.880 29.884 while_readline 30.480 30.506 (stock CVS Python under Win98SE) So that's a huge improvement! the-two-people-using-fileinput-should-be-delighted-ly y'rs - tim From skip@mojam.com (Skip Montanaro) Fri Jan 5 19:05:14 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 5 Jan 2001 13:05:14 -0600 (CST) Subject: [Python-Dev] fileinput.py In-Reply-To: References: Message-ID: <14934.6890.160122.384692@beluga.mojam.com> Tim> the-two-people-using-fileinput-should-be-delighted-ly What do you think contributes to fileinput's relative disfavor? This whole thread on Python's file reading performance was started by the eternal whine "why is Python so much slower than Perl?" which really means why is line = f.readline() while line: process(line) so much slower than whatever that thing is in Perl that everybody uses as the be-all-end-all performance benchmark (something with <> in it). Given that fileinput is supposed to make the I/O loop in Python more familiar to those people wandering over from Perl (at least in part), you'd think that people would naturally gravitate to it. Would it benefit from some exposure in the Python tutorial? Is it fast enough now to warrant the extra exposure? just-whining-out-loud-ly y'rs Skip From tim.one@home.com Fri Jan 5 19:11:00 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 5 Jan 2001 14:11:00 -0500 Subject: [Python-Dev] new "theme" on SourceForge! In-Reply-To: <14933.59408.512734.105160@cj42289-a.reston1.va.home.com> Message-ID: [Fred L. Drake, Jr.] Who would have guessed that the "L." stands for Light? > ... > The SF crew has announced a new "theme" called "Ultra Light" which > is optimized for slow connections. Indeed, I think I can cancel my cable modem now and go back to a 28.8 phone modem. 
liking-it!-ly y'rs - tim From jeremy@alum.mit.edu Fri Jan 5 19:14:49 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 5 Jan 2001 14:14:49 -0500 (EST) Subject: [Python-Dev] unit testing bake-off Message-ID: <14934.7465.360749.199433@localhost.localdomain> There was a brief discussion of unit testing last millennium, which did not reach any conclusions. I'd like to restart the discussion and set some specific goals. The action item is a unit testing bake-off, held next week, to choose a tool. The primary goal is to choose a unit testing framework for the regression test suite. Tests written with this framework would eventually replace the current regrtest.py framework, based on comparing test output to expected output. For the 2.1 release, the goal would be to choose a test framework to include in the standard distribution and use it to write some or all of the new tests. We would need to integrate it in some way with regrtest.py, so that a single command can be used to run all the tests. In the long run, we can migrate existing tests to use the new system. The new system can help us address some other goals: - running an entire test suite to completion instead of stopping on the first failure - clearer reporting of what went wrong - better support for conditional tests, e.g. write a test for httplib that only runs if the network is up. This is tied into better error reporting, since the current test suite could only report that httplib succeeded or failed. Does anyone disagree with the goal? Three tools have been proposed: PyUnit, Quixote unittest, and doctest. doctest has been championed by Peter Funk, who wants a few new features, but Tim, its author, isn't pushing it as a tool for writing stand alone tests. I think the best way to use doctest is for module writers to consider it when writing a new module. If doctest is used from the start for a module, we could integrate it with the regression test. 
It seems quite useful for what it is intended for, but is not a general solution. That leaves PyUnit and Quixote's unittest. The two tools are fairly similar, but differ on a number of non-trivial details. Quixote also integrates code coverage, which is quite handy. If we don't adopt its unittest, we should add code coverage to PyUnit. Is anyone else interested in the choice between the two? If so, I suggest you try writing some tests with each tool and reporting back with your feedback. I propose leaving one week for such a bake-off and making a decision next Friday. Jeremy From fredrik@effbot.org Fri Jan 5 19:55:18 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 5 Jan 2001 20:55:18 +0100 Subject: [Python-Dev] unit testing bake-off References: <14934.7465.360749.199433@localhost.localdomain> Message-ID: <004c01c07751$6eed84d0$e46940d5@hagrid> Jeremy Hylton wrote: > Is anyone else interested in the choice between the two? yes. I suggest adding doctest.py plus one unit test implementation. > If so, I suggest you try writing some tests with each tool and > reporting back with your feedback. we've recently migrated from a 30-minute reimplementation of Kent Beck's original framework to one of the frameworks you mention. with that background, the choice was easy. let me know when it's time to vote... From guido@python.org Fri Jan 5 19:55:33 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 14:55:33 -0500 Subject: [Python-Dev] fileinput.py In-Reply-To: Your message of "Fri, 05 Jan 2001 13:05:14 CST." <14934.6890.160122.384692@beluga.mojam.com> References: <14934.6890.160122.384692@beluga.mojam.com> Message-ID: <200101051955.OAA20190@cj20424-a.reston1.va.home.com> > What do you think contributes to fileinput's relative disfavor? In my view, fileinput is one of those unfortunate features that exist solely to shut up a particular kind of criticism. 
Without fileinput, Perl zealots would have an easy argument for a "trivial reject" of even considering Python. Now, when somebody claims the superiority of Perl's "loop involving a <> thingie", you can point to fileinput to prevent them from scoring a point. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jan 5 20:01:13 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 15:01:13 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: Your message of "Fri, 05 Jan 2001 20:55:18 +0100." <004c01c07751$6eed84d0$e46940d5@hagrid> References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> Message-ID: <200101052001.PAA20238@cj20424-a.reston1.va.home.com> > yes. I suggest adding doctest.py plus one unit test implementation. I second this vote for doctest (in addition to a unittest thing). I propose that Tim checks in his latest version of doctest. It should go under Lib, not under Lib/test, I think. (Certainly that's how Tim has been proposing its use.) It requires LaTeX docs, but since it's got a great docstring, that should be easy. > > If so, I suggest you try writing some tests with each tool and > > reporting back with your feedback. > > we've recently migrated from a 30-minute reimplementation of Kent > Beck's original framework to one of the frameworks you mention. with > that background, the choice was easy. let me know when it's time to > vote... Which framework are you now using? 
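For readers comparing the two styles discussed in this thread, here is a minimal sketch that checks one small function both ways: once through doctest (examples embedded in the docstring) and once through a PyUnit-style TestCase. It uses the doctest and unittest modules as they exist in the current Python 3 standard library, which is an assumption relative to the 2.1-era tools being debated; clamp, ClampTest and run_both are hypothetical names invented for the illustration.

```python
import doctest
import unittest


def clamp(value, low, high):
    """Limit value to the closed range [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


class ClampTest(unittest.TestCase):
    """The same checks written as explicit test methods."""

    def test_inside_range(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_outside_range(self):
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(42, 0, 10), 10)


def run_both():
    """Run both styles; return (doctest_failures, unittest_succeeded)."""
    # doctest: collect the examples from clamp's docstring and execute them.
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(clamp, "clamp", globs={"clamp": clamp}):
        runner.run(test)
    doc_failures = runner.summarize(verbose=False).failed

    # unittest: load the TestCase and run every test method to completion.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return doc_failures, result.wasSuccessful()


if __name__ == "__main__":
    print(run_both())
```

The docstring version doubles as documentation, which seems to be the use suggested for it in the thread; the TestCase version runs every method even when one fails, which matches the "run to completion" goal stated earlier.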
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jan 5 20:14:41 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 15:14:41 -0500 Subject: [Python-Dev] Add __exports__ to modules Message-ID: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Please have a look at this SF patch: http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 This implements control over which names defined in a module are externally visible: if there's a variable __exports__ in the module, it is a list of identifiers, and any access from outside the module to names not in the list is disallowed. This affects access using the getattr and setattr protocols (which raise AttributeError for disallowed names), as well as "from M import v" (which raises ImportError). I like it. This has been asked for many times. Does anybody see a reason why this should *not* be added? Tim remarked that introducing this will prompt demands for a similar feature on classes and instances, where it will be hard to implement without causing a bit of a slowdown. It causes a slight slowdown (an extra dictionary lookup for each use of "M.v") even when it is not used, but for accessing module variables that's acceptable. I'm not so sure about instance variable references. 
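The semantics described above can be approximated in pure Python. The sketch below is not the SourceForge patch (which changes the interpreter itself); it is a hypothetical emulation in current Python 3 using a types.ModuleType subclass, and ExportCheckedModule and demo are names made up for the example. It only covers the getattr/setattr half of the proposal.

```python
import types


class ExportCheckedModule(types.ModuleType):
    """Hypothetical emulation of a module that honours __exports__."""

    def __getattribute__(self, name):
        # Dunder names stay reachable so introspection keeps working.
        if name.startswith("__") and name.endswith("__"):
            return super().__getattribute__(name)
        # This is the extra dictionary lookup paid on every "M.v" access.
        exports = super().__getattribute__("__dict__").get("__exports__")
        if exports is not None and name not in exports:
            raise AttributeError(f"{name!r} is not exported")
        return super().__getattribute__(name)

    def __setattr__(self, name, value):
        is_dunder = name.startswith("__") and name.endswith("__")
        exports = self.__dict__.get("__exports__")
        if exports is not None and not is_dunder and name not in exports:
            raise AttributeError(f"{name!r} is not exported")
        super().__setattr__(name, value)


demo = ExportCheckedModule("demo")
# Filling __dict__ directly plays the role of code running *inside* the
# module, which is not subject to the outside-access check.
demo.__dict__.update(__exports__=["public"], public=1, _private=2)

if __name__ == "__main__":
    print(demo.public)          # allowed: listed in __exports__
    try:
        demo._private           # rejected: not listed
    except AttributeError as exc:
        print("blocked:", exc)
```

A module without an __exports__ entry behaves as usual, but still pays for the lookup, which is the slowdown mentioned in the message above.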
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy@alum.mit.edu Fri Jan 5 20:19:55 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Fri, 5 Jan 2001 15:19:55 -0500 (EST)
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <200101052001.PAA20238@cj20424-a.reston1.va.home.com>
References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com>
Message-ID: <14934.11371.879059.610988@localhost.localdomain>

If anyone is interested in experimenting with a test suite, here is a summary of the code coverage for the current regression test suite as run on my Linux box. Pick a module with low code coverage and your experiment can also improve the regression test suite.

Jeremy

 67.42%    798 Modules/arraymodule.c
 74.39%    773 Modules/audioop.c
 81.84%    380 Modules/binascii.c
 62.36%    449 Modules/bsddbmodule.c
 78.29%    152 Modules/cmathmodule.c
 67.89%    246 Modules/_codecsmodule.c
 47.41%   2647 Modules/cPickle.c
 87.50%      8 Modules/cryptmodule.c
 64.34%    272 Modules/cStringIO.c
  0.00%   1351 Modules/_cursesmodule.c
  0.00%    202 Modules/_curses_panel.c
 99.28%    139 Modules/errnomodule.c
 30.71%    127 Modules/fcntlmodule.c
 81.90%    315 Modules/gcmodule.c
  0.00%      4 Modules/getbuildinfo.c
 47.29%    277 Modules/getpath.c
 72.22%     54 Modules/grpmodule.c
 79.95%    419 Modules/imageop.c
  0.00%     11 Modules/../Include/cStringIO.h
 13.25%    234 Modules/linuxaudiodev.c
 14.80%    223 Modules/_localemodule.c
 30.66%    137 Modules/main.c
 73.20%     97 Modules/mathmodule.c
 98.39%    124 Modules/md5c.c
 69.70%     66 Modules/md5module.c
 48.62%    362 Modules/mmapmodule.c
 66.22%     74 Modules/newmodule.c
 84.91%     53 Modules/operator.c
 50.57%   1236 Modules/parsermodule.c
  0.00%    350 Modules/pcremodule.c
 28.88%   1077 Modules/posixmodule.c
 82.05%     39 Modules/pwdmodule.c
 77.96%    431 Modules/pyexpat.c
  0.00%   1876 Modules/pypcre.c
 50.00%      2 Modules/python.c
  0.00%    189 Modules/readline.c
 78.35%    425 Modules/regexmodule.c
 72.93%    931 Modules/regexpr.c
  0.00%     81 Modules/resource.c
 76.98%    443 Modules/rgbimgmodule.c
 82.70%    289 Modules/rotormodule.c
 82.47%    291 Modules/selectmodule.c
 85.10%    208 Modules/shamodule.c
 81.52%    276 Modules/signalmodule.c
 51.18%    678 Modules/socketmodule.c
 78.64%   1105 Modules/_sre.c
 69.67%    689 Modules/stropmodule.c
 80.49%    656 Modules/structmodule.c
  4.88%    123 Modules/termios.c
 60.71%    140 Modules/threadmodule.c
 68.78%    205 Modules/timemodule.c
 76.92%     65 Modules/ucnhash.c
 87.50%     16 Modules/unicodedatabase.c
 65.83%    120 Modules/unicodedata.c
 68.81%    420 Modules/zlibmodule.c
 64.68%   1005 Objects/abstract.c
 18.77%    261 Objects/bufferobject.c
 68.77%   1204 Objects/classobject.c
 27.59%     58 Objects/cobject.c
 59.41%    271 Objects/complexobject.c
 78.32%    678 Objects/dictobject.c
 52.14%    723 Objects/fileobject.c
 80.43%    368 Objects/floatobject.c
 84.86%    185 Objects/frameobject.c
 60.40%    149 Objects/funcobject.c
 78.68%    455 Objects/intobject.c
 77.66%    779 Objects/listobject.c
 81.17%   1142 Objects/longobject.c
 50.68%    148 Objects/methodobject.c
 58.82%    136 Objects/moduleobject.c
 76.50%    549 Objects/object.c
 15.24%    105 Objects/rangeobject.c
 41.03%     78 Objects/sliceobject.c
 76.63%   1797 Objects/stringobject.c
 77.00%    287 Objects/tupleobject.c
 22.22%     18 Objects/typeobject.c
 84.26%    108 Objects/unicodectype.c
 66.61%   2743 Objects/unicodeobject.c
 90.79%     76 Parser/acceler.c
  0.00%     28 Parser/bitset.c
  0.00%     67 Parser/firstsets.c
 18.18%     22 Parser/grammar1.c
  0.00%    139 Parser/grammar.c
  0.00%     30 Parser/intrcheck.c
  0.00%     38 Parser/listnode.c
  0.00%      2 Parser/metagrammar.c
  0.00%     63 Parser/myreadline.c
 90.70%     43 Parser/node.c
 82.26%    124 Parser/parser.c
 79.38%     97 Parser/parsetok.c
  0.00%    366 Parser/pgen.c
  0.00%     85 Parser/pgenmain.c
  0.00%     60 Parser/printgrammar.c
 76.70%    588 Parser/tokenizer.c
 62.31%   1231 Python/bltinmodule.c
 76.55%   2021 Python/ceval.c
 64.78%    230 Python/codecs.c
 73.85%   2367 Python/compile.c
 76.67%     30 Python/dynload_shlib.c
 75.75%    301 Python/errors.c
 65.59%    401 Python/exceptions.c
  0.00%     31 Python/frozenmain.c
 56.83%    776 Python/getargs.c
100.00%      2 Python/getcompiler.c
100.00%      2 Python/getcopyright.c
 80.00%      5 Python/getmtime.c
 15.62%     32 Python/getopt.c
100.00%      2 Python/getplatform.c
100.00%      4 Python/getversion.c
 61.78%   1167 Python/import.c
 66.67%     42 Python/importdl.c
 51.35%    483 Python/marshal.c
 60.58%    274 Python/modsupport.c
 88.73%     71 Python/mystrtoul.c
  0.00%      2 Python/pyfpe.c
 91.15%    113 Python/pystate.c
 37.80%    635 Python/pythonrun.c
  0.00%      5 Python/sigcheck.c
 12.67%    150 Python/structmember.c
 53.87%    323 Python/sysmodule.c
100.00%      5 Python/thread.c
 53.47%    144 Python/thread_pthread.h
 21.74%    138 Python/traceback.c
 58.65%  48417 TOTAL

From tim.one@home.com Fri Jan 5 20:46:10 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 5 Jan 2001 15:46:10 -0500
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: <14934.6890.160122.384692@beluga.mojam.com>
Message-ID:

[Skip Montanaro]
> What do you think contributes to fileinput's relative disfavor?

Only half jokingly, because I never use it , and I don't think Fredrik or Alex Martelli do either. That means it rarely gets mentioned by the c.l.py reply bots. Plus it's not *used* anywhere in the Python distribution, so nobody stumbles into it that way either. Plus the docs require more than one line to explain what it does, and get bogged down describing the Awk-like (Perl took this from Awk) convolutions before the simplest (one explicitly named file) case. It *is* regularly mentioned in the eternal "while 1:" debate, but that's it.

> This whole thread on Python's file reading performance was started
> by the eternal whine "why is Python so much slower than Perl?"

No, it started with Guido's objections to Jeff's xreadlines patch. I dragged Perl into it -- because, like it or not, that was the right thing to do .

> which really means why is
>
> line = f.readline()
> while line:
> process(line)
>
> so much slower than whatever that thing is in Perl that everybody
> uses as the be-all-end-all performance benchmark (something with
> <> in it).
"" is simply Perl's way of spelling Python's FILE.readline() (and FILE.readlines(), when appears in an array context; and FILE.read() when Perl's Awkish "record separator" is disabled; and ...). "<>" without an explict filehandle does all the inherited-from-Awk magic with argv, else that stuff doesn't come into play. "<>" (wihtout a filehandle) seems rarely used in Perl practice, though, *except* in support of your_shell_prompt> some_perl_script < some_file That is, "<>" is usually used simply as an abbrevision for , and I bet *most* Perl programmers don't even know "<>" is more general than that. > Given that fileinput is supposed to make the I/O loop in Python more > familiar to those people wandering over from Perl (at least in part), > you'd think that people would naturally gravitate to it. I guess you didn't actually read the timing results . Really, it's been an outrageously slow way to do input. That's better now, and I'm much more likely now than I used to be to use for line in fileinput.input('file'): instead of f = open('file') while 1: line = f.readline() if not line: break The relative attraction of the former is obvious if it's reasonably quick. I don't really have any use for the Awk complications (note that I'm running on Windows, though, and the shells here don't expand wildcards -- the Awk gimmicks are much more useful on Unix systems). > Would it benefit from some exposure in the Python tutorial? Heh -- that's a tough one. The *simplest* case is the only one deserving of promotion. But in that case, Jeff's xreadlines is about as convenient and much quicker. I bet we'll all be afraid to change the tutorial to mention either <0.9 wink>. > Is it fast enough now to warrant the extra exposure? Don't know. It's the same speed as "while 1: on *my* box now, but still 3x slower than the double-loop method. 
> just-whining-out-loud-ly y'rs

so-do-*you*-want-to-use-it-now?-ly y'rs - tim

From thomas@xs4all.net Fri Jan 5 21:19:42 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 5 Jan 2001 22:19:42 +0100
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: ; from tim.one@home.com on Fri, Jan 05, 2001 at 03:46:10PM -0500
References: <14934.6890.160122.384692@beluga.mojam.com>
Message-ID: <20010105221942.J2467@xs4all.nl>

On Fri, Jan 05, 2001 at 03:46:10PM -0500, Tim Peters wrote:

> "<>" (without a filehandle) seems
> rarely used in Perl practice, though, *except* in support of
>
>     your_shell_prompt> some_perl_script < some_file
>
> That is, "<>" is usually used simply as an abbreviation for <STDIN>, and I
> bet *most* Perl programmers don't even know "<>" is more general than that.

Well, I can't say anything about *most* Perl programmers, but all Perl programmers I know (including me) know damned well what <> does, and use it frequently. And in all the ways: no arguments meaning <STDIN>, a list of files meaning open those files one at a time, using - to include stdin in that list, accessing the filename and linenumber, etc. None of them can be called newbies, though.

But then, I like using Python's fileinput, too, so maybe I'm just weird :)

--
Thomas Wouters
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From ping@lfw.org Fri Jan 5 22:01:53 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Fri, 5 Jan 2001 16:01:53 -0600 (CST)
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: <20010105221942.J2467@xs4all.nl>
Message-ID:

On Fri, 5 Jan 2001, Thomas Wouters wrote:
> On Fri, Jan 05, 2001 at 03:46:10PM -0500, Tim Peters wrote:
> > That is, "<>" is usually used simply as an abbreviation for <STDIN>, and I
> > bet *most* Perl programmers don't even know "<>" is more general than that.
>
> Well, I can't say anything about *most* Perl programmers, but all Perl
> programmers I know (including me) know damned well what <> does, and use it
> frequently. And in all the ways: no arguments meaning <STDIN>, a list of
> files meaning open those files one at a time, using - to include stdin in
> that list, accessing the filename and linenumber, etc.

I was just about to chime in and say the same thing. I don't even program in Perl any more, and I still remember all the ways that <> works. For text-processing scripts, it's unbeatable. It does pretty much exactly everything you want, and the idiom

    while (<>) {
        ...
    }

is simple, quickly learned, frequently used, and instantly recognizable.

    import sys
    if len(sys.argv) > 1:
        file = open(sys.argv[1])
    else:
        file = sys.stdin
    while 1:
        line = file.readline()
        if not line:
            break
        ...

is much more complex, harder to explain, harder to learn, and runs slower.

I have two separate suggestions:

1. Include 'sys' in builtins. It's silly to have to 'import sys' just to be able to see sys.argv and sys.stdin.

2. Put fileinput.input() in sys.

With both, the while (<>) idiom becomes:

    for line in sys.input():
        ...

-- ?!ng

"This code is better than any code that doesn't work has any right to be."
-- Roger Gregory, on Xanadu

From skip@mojam.com (Skip Montanaro) Fri Jan 5 22:19:36 2001
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Fri, 5 Jan 2001 16:19:36 -0600 (CST)
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <14934.11371.879059.610988@localhost.localdomain>
References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com> <14934.11371.879059.610988@localhost.localdomain>
Message-ID: <14934.18552.749081.871226@beluga.mojam.com>

Jeremy> If anyone is interested in experimenting with a test suite, here
Jeremy> is a summary of the code coverage for the current regression
Jeremy> test suite as run on my Linux box.
Speaking of which, I am still running my nightly code coverage thing (still with warts) whose results are available at http://musi-cal.mojam.com/~skip/python/Python/dist/src/ Does anyone care? Should I turn it off? Skip From thomas@xs4all.net Fri Jan 5 23:18:58 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 6 Jan 2001 00:18:58 +0100 Subject: [Python-Dev] RE: fileinput.py In-Reply-To: ; from ping@lfw.org on Fri, Jan 05, 2001 at 04:01:53PM -0600 References: <20010105221942.J2467@xs4all.nl> Message-ID: <20010106001858.B402@xs4all.nl> On Fri, Jan 05, 2001 at 04:01:53PM -0600, Ka-Ping Yee wrote: > while (<>) { > ... > } > is simple, quickly learned, frequently used, and instantly recognizable. > import sys > if len(sys.argv) > 1: > file = open(sys.argv[1]) > else: > file = sys.stdin > while 1: > line = file.readline() > if not line: > break > ... ... Except that it can take more than one filename, and will do the one after another, and that it takes "-" as a filename for stdin. Doing it in a script is not dead simple, unless you open up all files at once (which can be harmful, and Perl, for one, doesn't do) or you do most of the work fileinput does. That is why I use fileinput (and while-diamond) -- I might not need it now, but when I do need it, it already works :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
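As a rough illustration of the behaviour described in the message above (several filenames handled one after another, with the current filename and per-file line number available), here is a minimal sketch using the standard fileinput module. It is written in current Python 3 syntax rather than the Python 2.0 of this thread, and label_lines and demo are hypothetical names made up for the example.

```python
import fileinput
import os
import tempfile


def label_lines(paths):
    """Read each named file in turn, tagging every line with its source."""
    labelled = []
    # fileinput opens one file at a time rather than all at once;
    # a path of "-" in the list would stand for sys.stdin.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            name = os.path.basename(stream.filename())
            labelled.append(f"{name}:{stream.filelineno()}:{line.rstrip()}")
    return labelled


def demo():
    """Build two small files in a scratch directory and label their lines."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "a.txt")
        second = os.path.join(tmp, "b.txt")
        with open(first, "w") as f:
            f.write("one\ntwo\n")
        with open(second, "w") as f:
            f.write("three\n")
        return label_lines([first, second])


if __name__ == "__main__":
    print(demo())
```

In a command-line script the usual call is fileinput.input() with no arguments, which takes the file list from sys.argv[1:] and falls back to standard input when that list is empty.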
From moshez@zadka.site.co.il Sat Jan 6 11:00:33 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Sat, 6 Jan 2001 13:00:33 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Message-ID: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> On Fri, 05 Jan 2001 15:14:41 -0500, Guido van Rossum wrote: > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). Ummmmm.....why do we want this? What's wrong with the current suggestion of using "_"? __exports__ feels somehow wrong to me. None of the rest of Python has any access control, and I really like that. A big -1 from me, for what it's worth. > I like it. I'm surprised. Why do you like that? > This has been asked for many times. So has adding curly-braces as control structure, with all due respect. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From billtut@microsoft.com Sat Jan 6 03:43:06 2001 From: billtut@microsoft.com (Bill Tutt) Date: Fri, 5 Jan 2001 19:43:06 -0800 Subject: [Python-Dev] Add __exports__ to modules Message-ID: <58C671173DB6174A93E9ED88DCB0883DB8637E@red-msg-07.redmond.corp.microsoft.com> I think I'm with Moshe on this one, whats wrong with just using underscores (__) to play the hiding game. Here's my silly language suggestion for this week: with self: .bar = foo bar.blah = .fubar .bar = .bar + 1 # etc.... 
Bill From skip@mojam.com (Skip Montanaro) Sat Jan 6 04:15:12 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 5 Jan 2001 22:15:12 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <14934.39888.908416.983794@beluga.mojam.com> > On Fri, 05 Jan 2001 15:14:41 -0500, Guido van Rossum wrote: > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). I have to agree with Moshe. If __exports__ is implemented for modules we'll have multiple, different access control mechanisms for different things, some of which thoughtful programmers would be able to get around, some of which they wouldn't. Here are the ways I'm aware of to control attribute visibility (there may be others - I don't usually delve too deeply into this stuff): * preface module globals with "_": This just prevents those globals from being added to the current namespace when a programmer executes "from module import *". Programmers can workaround this by attribute access through the module object or by explicitly importing it: "from module import _foo" works, yes? * preface class or instance attributes with "__": This just mangles the name by prefacing the visible name with _. The programmer can still access it by knowing the simple name mangling rule. 
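
The two existing mechanisms in the list above can be seen in a short sketch. It assumes a current Python 3 interpreter (the rules themselves are the ones described in the list), and the module name `demo_mod` and class `Account` are invented for illustration.

```python
import sys
import types

# A throwaway module (the name "demo_mod" is made up for this sketch).
demo_mod = types.ModuleType("demo_mod")
exec("public = 1\n_private = 2\n", demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# 1. A leading underscore only affects "from module import *".
star_namespace = {}
exec("from demo_mod import *", star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith("__"))

# Explicit import and attribute access still reach the name.
from demo_mod import _private
explicit_value = _private
attribute_value = demo_mod._private


# 2. Two leading underscores inside a class body trigger name mangling.
class Account:
    def __init__(self):
        self.__balance = 10   # stored as _Account__balance


acct = Account()
mangled_value = acct._Account__balance            # reachable via the mangled name
plain_lookup_fails = not hasattr(acct, "__balance")
```

In both cases the name stays reachable by anyone who knows the rule, which is the contrast with the stricter proposal being discussed.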
In both cases the programmer can still get at the attribute value when necessary. If you were to add some sort of access control to module globals, I would have thought it would have been along the same lines as the existing mechanisms in place to "hide" class/instance attributes. Would it be possible (or desirable) to add the name mangling restriction to module globals as an alternative to this more restrictive implementation? What about the chances that class/instance attribute hiding will get more restrictive in the future? Finally, are the motivations for wanting to restrict access to module globals and class/instance attributes that much different from one another that they call for fundamentally different mechanisms? Skip From barry@digicool.com Sat Jan 6 05:15:20 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Sat, 6 Jan 2001 00:15:20 -0500 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <14934.43496.322436.612746@anthem.wooz.org> I'm -0 on this, largely for the reasons already brought up: if modules grow __exports__ then there will be pressure to add it to classes, and modules already have a limited version of access control through leading underscore names. I might be more positive on the addition if __exports__ were added to classes, because at least there'd be a consistently stronger fence added to name access rules that prevented even consenting adults from fiddling with the naughty bits. 
-Barry From nas@arctrix.com Fri Jan 5 23:20:58 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Fri, 5 Jan 2001 15:20:58 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <14934.43496.322436.612746@anthem.wooz.org>; from barry@digicool.com on Sat, Jan 06, 2001 at 12:15:20AM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> Message-ID: <20010105152058.A6016@glacier.fnational.com> On Sat, Jan 06, 2001 at 12:15:20AM -0500, Barry A. Warsaw wrote: > I might be more positive on the addition if __exports__ were added to > classes, because at least there'd be a consistently stronger fence > added to name access rules that prevented even consenting adults from > fiddling with the naughty bits. I think you, Skip and Moshe are missing a big advantage of having the __exports__ mechanism. It should allow some attribute access inside of modules to become faster (like LOAD_FAST for locals). I think that optimization could be implemented without too much difficultly. I've never channeled Guido before so I could be off the mark. If the only advantage is encapsulation then I'm -0. Neil From barry@digicool.com Sat Jan 6 07:09:31 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Sat, 6 Jan 2001 02:09:31 -0500 Subject: [Python-Dev] PEP 232 update and patch Message-ID: <14934.50347.851118.581484@anthem.wooz.org> --qCCjmbam8k Content-Type: text/plain; charset=us-ascii Content-Description: message body text Content-Transfer-Encoding: 7bit I've updated PEP 232, function attributes, and uploaded a patch to SF. I couldn't coax cvs diff into including the new files Lib/test/test_funcattrs.py and Lib/test/output/test_funcattrs so I'll attach them below. 
PEP 232: http://python.sourceforge.net/peps/pep-0232.html

SF patch #103123: http://sourceforge.net/patch/?func=detailpatch&patch_id=103123&group_id=5470

Enjoy,
-Barry

--qCCjmbam8k
Content-Type: text/plain
Content-Description: regrtest for function attributes
Content-Disposition: inline; filename="test_funcattrs.py"
Content-Transfer-Encoding: 7bit

from test_support import verbose, TestFailed

class F:
    def a(self):
        pass

def b():
    'my docstring'
    pass

# setting attributes on functions
try:
    b.publish
except AttributeError:
    pass
else:
    raise TestFailed, 'expected AttributeError'

b.publish = 1
if b.publish <> 1:
    raise TestFailed, 'function attribute not set to expected value'

docstring = 'its docstring'
b.__doc__ = docstring
if b.__doc__ <> docstring:
    raise TestFailed, 'problem with setting __doc__ attribute'

if 'publish' not in dir(b):
    raise TestFailed, 'attribute not in dir()'

f1 = F()
f2 = F()

try:
    F.a.publish
except AttributeError:
    pass
else:
    raise TestFailed, 'expected AttributeError'

try:
    f1.a.publish
except AttributeError:
    pass
else:
    raise TestFailed, 'expected AttributeError'

F.a.publish = 1
if F.a.publish <> 1:
    raise TestFailed, 'unbound method attribute not set to expected value'

if f1.a.publish <> 1:
    raise TestFailed, 'bound method attribute access did not work'

if f2.a.publish <> 1:
    raise TestFailed, 'bound method attribute access did not work'

if 'publish' not in dir(F.a):
    raise TestFailed, 'attribute not in dir()'

try:
    f1.a.publish = 0
except TypeError:
    pass
else:
    raise TestFailed, 'expected TypeError'

F.a.myclass = F
f1.a.myclass
f2.a.myclass
f1.a.myclass
F.a.myclass

if f1.a.myclass is not f2.a.myclass or \
   f1.a.myclass is not F.a.myclass:
    raise TestFailed, 'attributes were not the same'

# try setting __dict__
try:
    F.a.__dict__ = (1, 2, 3)
except TypeError:
    pass
else:
    raise TestFailed, 'expected TypeError'

F.a.__dict__ = {'one': 11, 'two': 22, 'three': 33}
if f1.a.two <> 22:
    raise TestFailed, 'setting __dict__'

from UserDict import UserDict
d = UserDict({'four': 44, 'five': 55})

try:
    F.a.__dict__ = d
except TypeError:
    pass
else:
    raise TestFailed

if f2.a.one <> f1.a.one <> F.a.one <> 11:
    raise TestFailed

--qCCjmbam8k
Content-Type: text/plain
Content-Description: output of regrtest for function attributes
Content-Disposition: inline; filename="test_funcattrs"
Content-Transfer-Encoding: 7bit

test_funcattrs

--qCCjmbam8k--


From martin@loewis.home.cs.tu-berlin.de Sat Jan 6 10:06:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 6 Jan 2001 11:06:49 +0100
Subject: [Python-Dev] PEP 208 comment
Message-ID: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de>

I just studied PEP 208 for the first time. Overall, it seems all
natural and nice, but there is one aspect I'd like to see changed:
the naming of the type flag.

Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a
program should be called "new". The flag will still be there five
years from now, but it won't be new anymore. Also, while the flag
indicates that style of the numbers is new, it does not say what it
does. So I propose to rename it; if nobody finds a better name, I
propose to call it Py_TPFLAGS_UNCOERCED.

Regards,
Martin


From thomas@xs4all.net Sat Jan 6 12:52:19 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Sat, 6 Jan 2001 13:52:19 +0100
Subject: [Python-Dev] PEP 208 comment
In-Reply-To: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sat, Jan 06, 2001 at 11:06:49AM +0100
References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de>
Message-ID: <20010106135219.L2467@xs4all.nl>

On Sat, Jan 06, 2001 at 11:06:49AM +0100, Martin v. Loewis wrote:

> Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a
> program should be called "new". The flag will still be there five
> years from now, but it won't be new anymore. Also, while the flag
> indicates that style of the numbers is new, it does not say what it
> does.
So I propose to rename it; if nobody finds a better name, I
> propose to call it Py_TPFLAGS_UNCOERCED.

Wrong name. The TPFLAGs only indicate whether a struct is large enough to
contain a particular member, not whether that member is going to contain or
do anything. 'Py_TPFLAGS_HASCOERCE' or some such would seem more appropriate
to me.

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From martin@loewis.home.cs.tu-berlin.de Sat Jan 6 13:36:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 6 Jan 2001 14:36:39 +0100
Subject: [Python-Dev] PEP 208 comment
In-Reply-To: <20010106135219.L2467@xs4all.nl> (message from Thomas Wouters on Sat, 6 Jan 2001 13:52:19 +0100)
References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <20010106135219.L2467@xs4all.nl>
Message-ID: <200101061336.f06DadP02895@mira.informatik.hu-berlin.de>

> Wrong name. The TPFLAGs only indicate whether a struct is large enough to
> contain a particular member, not whether that member is going to contain or
> do anything.

That may have been the original intention; *this* specific flag is not
of that kind. Please look at abstract.c:binary_op1, which has

    if (v->ob_type->tp_as_number != NULL && NEW_STYLE_NUMBER(v)) {
        slot = NB_BINOP(v->ob_type->tp_as_number, op_slot);
        if (*slot) {
            x = (*slot)(v, w);
            if (x != Py_NotImplemented) {
                return x;
            }
            Py_DECREF(x); /* can't do it */
        }
        if (v->ob_type == w->ob_type) {
            goto binop_error;
        }
    }

Here, no additional member was added: there always was tp_as_number,
and that also supported all possible op_slot values. What is new here
is that the slot may be called even if v and w have different types;
that was not allowed before the PEP 208 changes.

Yet it tests for NEW_STYLE_NUMBER(v), which is

    PyType_HasFeature((o)->ob_type, Py_TPFLAGS_NEWSTYLENUMBER)

So the presence of this flag is indeed a promise that a specific
member will do something that it normally wouldn't do.
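
The dispatch quoted above has a visible counterpart at the Python level in current interpreters: a numeric method may be called with operands of different types, and it can decline by returning NotImplemented, after which the other operand gets a chance. A minimal Python 3 sketch follows; the `Metres` class is invented for illustration and is not part of PEP 208 or of the patch under discussion.

```python
class Metres:
    """Toy numeric type (hypothetical) that handles mixed-type operands itself."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Called even when `other` is not a Metres instance: no prior coercion.
        if isinstance(other, Metres):
            return Metres(self.value + other.value)
        if isinstance(other, (int, float)):
            return Metres(self.value + other)
        return NotImplemented   # "can't do it": let the other operand try

    __radd__ = __add__          # addition is symmetric for this toy type


same_type = (Metres(2) + Metres(3)).value
mixed_left = (Metres(2) + 1.5).value
mixed_right = (1 + Metres(2)).value    # int declines, then Metres.__radd__ runs

try:
    Metres(2) + "x"                    # both operands decline
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
```

The point of the sketch is only the calling convention: the method sees the raw second operand and is responsible for checking its type.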
> 'Py_TPFLAGS_HASCOERCE' or some such would seem more appropriate to > me. Well, all numbers still have coercion - it just may not be used if the flag is present. It's not a matter of having or not having something (well, only the "new style" numbers may have nb_cmp, but calling it Py_TPFLAGS_HAS_NB_CMP would be besides the point, IMO). Anyway, I don't want to defend my version too much - I just want to request that the current name is changed to *something* more descriptive. Regards, Martin From skip@mojam.com (Skip Montanaro) Sat Jan 6 14:40:30 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 6 Jan 2001 08:40:30 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010105152058.A6016@glacier.fnational.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> <20010105152058.A6016@glacier.fnational.com> Message-ID: <14935.11870.360839.235102@beluga.mojam.com> Neil> I think you, Skip and Moshe are missing a big advantage of having Neil> the __exports__ mechanism. It should allow some attribute access Neil> inside of modules to become faster (like LOAD_FAST for locals). I Neil> think that optimization could be implemented without too much Neil> difficultly. True enough, that hadn't occurred to me. Knowing that now, I still don't think consistency of the interface should suffer as a result of under-the-covers performance gains. Skip From skip@mojam.com (Skip Montanaro) Sat Jan 6 14:42:25 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 6 Jan 2001 08:42:25 -0600 (CST) Subject: [Python-Dev] Re: [Patches] [Patch #103123] PEP 232 implementation (function attributes) In-Reply-To: References: Message-ID: <14935.11985.972526.108391@beluga.mojam.com> Oooo... 
I tried went to check out Barry's function attribute patch at http://sourceforge.net/patch/?func=detailpatch&patch_id=103123&group_id=5470 and got Fatal error: Call to a member function on a non-object in /usr/local/htdocs/alexandria/www/patch/index.php on line 55 in response. Any idea whazzup? Skip From akuchlin@mems-exchange.org Sat Jan 6 14:47:59 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Sat, 6 Jan 2001 09:47:59 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14934.18552.749081.871226@beluga.mojam.com>; from skip@mojam.com on Fri, Jan 05, 2001 at 04:19:36PM -0600 References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com> <14934.11371.879059.610988@localhost.localdomain> <14934.18552.749081.871226@beluga.mojam.com> Message-ID: <20010106094759.A13723@newcnri.cnri.reston.va.us> On Fri, Jan 05, 2001 at 04:19:36PM -0600, Skip Montanaro wrote: >Speaking of which, I am still running my nightly code coverage thing (still >with warts) whose results are available at > http://musi-cal.mojam.com/~skip/python/Python/dist/src/ Add a link to it from the Python development pages on SourceForge; I suspect much of the problem is that people don't remember the URL for it, and don't want to dig through the archives to find it. --amk From mal@lemburg.com Sat Jan 6 15:15:27 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 06 Jan 2001 16:15:27 +0100 Subject: [Python-Dev] PEP 208 comment References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> Message-ID: <3A57368F.FC01F78@lemburg.com> "Martin v. Loewis" wrote: > > I just studied PEP 208 for the first time. Overall, it seems all > natural and nice, but there is one one aspect I'd like to see changed: > the naming of the type flag. > > Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a > program should be called "new". 
The flag will still be there five > years from now, but it won't be new anymore. Also, while the flag > indicates that style of the numbers is new, it does not say what it > does. So I propose to rename it; if nobody finds a better name, I > propose to call it Py_TPFLAGS_UNCOERCED. Given that the design could well be applied to other slots as well, I think you've got a point there. The idea behind the flag was to signal that slots will no longer make object type assumptions which they could previously. Right now, only numeric types support this feature. In the future I could imaging strings and other types involving coercion would also want to use the feature. Given this design idea, how about calling the flag Py_TPFLAGS_CHECKTYPES ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip@mojam.com (Skip Montanaro) Sat Jan 6 15:35:20 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 6 Jan 2001 09:35:20 -0600 (CST) Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error Message-ID: <14935.15160.130742.390323@beluga.mojam.com> You know, I thought of something (which was probably already obvious to the rest of you) while perusing Barry's patch. Attaching function attributes to unbound methods could really function like C++ static data members. You'd have to write accessor functions to make setting the attributes look clean, but that wouldn't be all bad. Precisely because you couldn't modify them through the bound method, there's be no chance you could make the mistake of modifying them that way and having them transmogrify into instance attributes. 
Here's a quick example:

    class C:
        def __init__(self):
            self.just_resting()
        __init__.howmany = 0

        def __del__(self):
            self.hes_dead()

        def hes_dead(self):
            C.__init__.howmany -= 1

        def just_resting(self):
            C.__init__.howmany += 1

        def howmany(self):
            return C.__init__.howmany

    def howmany():
        return C.__init__.howmany

    c = C()
    print c.howmany()
    d = C()
    print d.howmany()
    del c
    print d.howmany()

After applying Barry's patch, if I execute this script from the command line
it displays

    1
    2
    1

as one would expect, but then catches an attribute error during cleanup:

    Exception exceptions.AttributeError: "'None' object has no attribute '__init__'" in ignored

If I add "del d" to the end of the script the exception disappears. I
suspect there is a cleanup order problem of some sort. It seems like C is
getting reclaimed before d (not possible), or that d's __class__ attribute
is set to None before its __del__ method is called. Is this a known problem
or something introduced by Barry's patch?

Skip


From barry@digicool.com Sat Jan 6 16:09:47 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Sat, 6 Jan 2001 11:09:47 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103123] PEP 232 implementation (function attributes)
References: <14935.11985.972526.108391@beluga.mojam.com>
Message-ID: <14935.17227.634808.132783@anthem.wooz.org>

>>>>> "SM" == Skip Montanaro writes:

    SM> and got

    | Fatal error: Call to a member function on a non-object in
    | /usr/local/htdocs/alexandria/www/patch/index.php on line 55

    SM> in response. Any idea whazzup?

I got a similar error on SF when I tried to find my patch on the
patches page. I still think the patch manager just gives you no way
to see all the patches when there's more than what fits on one page.

The error dropped a cookie in my lap that logged me out too. After I
logged in again, it all seemed to work.

-Barry


From martin@loewis.home.cs.tu-berlin.de Sat Jan 6 15:20:51 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 6 Jan 2001 16:20:51 +0100 Subject: [Python-Dev] PEP 208 comment In-Reply-To: <3A57368F.FC01F78@lemburg.com> (mal@lemburg.com) References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <3A57368F.FC01F78@lemburg.com> Message-ID: <200101061520.f06FKpu03218@mira.informatik.hu-berlin.de> > Given this design idea, how about calling the flag > Py_TPFLAGS_CHECKTYPES ?! Sounds good to me. Martin From thomas@xs4all.net Sat Jan 6 16:47:24 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 6 Jan 2001 17:47:24 +0100 Subject: [Python-Dev] PEP 208 comment In-Reply-To: <200101061336.f06DadP02895@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sat, Jan 06, 2001 at 02:36:39PM +0100 References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <20010106135219.L2467@xs4all.nl> <200101061336.f06DadP02895@mira.informatik.hu-berlin.de> Message-ID: <20010106174724.M2467@xs4all.nl> On Sat, Jan 06, 2001 at 02:36:39PM +0100, Martin v. Loewis wrote: > That may have been the original intention; *this* specific flag is not > of that kind. Please look at abstract.c:binary_op1, which has You're right, I stand corrected, I retract my proposal :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@python.org Sat Jan 6 22:05:23 2001 From: guido@python.org (Guido van Rossum) Date: Sat, 06 Jan 2001 17:05:23 -0500 Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error In-Reply-To: Your message of "Sat, 06 Jan 2001 09:35:20 CST." <14935.15160.130742.390323@beluga.mojam.com> References: <14935.15160.130742.390323@beluga.mojam.com> Message-ID: <200101062205.RAA23603@cj20424-a.reston1.va.home.com> > You know, I thought of something (which was probably already obvious to the > rest of you) while perusing Barry's patch. Attaching function attributes to > unbound methods could really function like C++ static data members. 
You'd
> have to write accessor functions to make setting the attributes look clean,
> but that wouldn't be all bad. Precisely because you couldn't modify them
> through the bound method, there'd be no chance you could make the mistake of
> modifying them that way and having them transmogrify into instance
> attributes.
>
> Here's a quick example:
>
>     class C:
>         def __init__(self):
>             self.just_resting()
>         __init__.howmany = 0
>
>         def __del__(self):
>             self.hes_dead()
>
>         def hes_dead(self):
>             C.__init__.howmany -= 1
>
>         def just_resting(self):
>             C.__init__.howmany += 1
>
>         def howmany(self):
>             return C.__init__.howmany
>
>     def howmany():
>         return C.__init__.howmany
>
>     c = C()
>     print c.howmany()
>     d = C()
>     print d.howmany()
>     del c
>     print d.howmany()

Skip, I don't find this better than the existing solution, which uses
C._howmany instead of C.__init__.howmany. True, you can access it as
self._howmany and if you assign to self._howmany you'd transform it
into an instance attribute -- but that falls in the "then don't do
that" category.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim_one@email.msn.com Sat Jan 6 22:14:44 2001
From: tim_one@email.msn.com (Tim Peters)
Date: Sat, 6 Jan 2001 17:14:44 -0500
Subject: [Python-Dev] Rehabilitating fgets
Message-ID:

[Guido]
> ...
> Unfortunately we can't use fgets(), even if it were faster than
> getline(), because it doesn't tell how many characters it read.

Let's think about that a little harder, because it appears to be our only
hope on Windows (the MS fgets isn't optimized like the Perl inner loop, but
it does lock/unlock the stream only at routine entry/exit, and uses a hidden
non-locking (== much faster) variant of getc in the guts -- we've seen that
the "locking" part of MS getc accounts for 17 of 30 seconds in my test
case).
> On files containing null bytes, readline() is supposed to treat > these like any other character; fgets does too (at least it does on Windows, and I believe that's std behavior). The problem is that it also makes up a null byte on its own. > If your input is "abc\0def\nxyz\n", the first readline() call > should return "abc\0def\n". Yes. > But with fgets(), you're left to look in the returned buffer for > a null byte, Also yes. But suppose I search "from the right", and ensure the buffer is free of null bytes before the fgets. For your input file above, fgets overwrites the initial 9 bytes of the buffer (assuming the buffer is at least 9 bytes long ...) with "abc\0def\n\0" and there's no problem if I search from the right. > and there's no way (in general) to distinguish this result from > an input file that only consisted of the three characters "abc". As above, I'm not convinced of that. The input file "abc" would overwrite the first four bytes of the buffer with "abc\0" and leave the tail end alone (well, the MS fgets leaves the tail alone, although I'm not sure ANSI C guarantees that). Of course I've *read* any number of Unix(tm) FAQs that also claim it's impossible, but I never believed them either . This extra buffer fiddling is surely an expense I don't want to pay, but the timing evidence on Windows so far says that I can probably search and/or copy the whole buffer 100 times and still be faster than enduring the threadsafe getc. Am I missing something obvious? From guido@python.org Sat Jan 6 22:33:00 2001 From: guido@python.org (Guido van Rossum) Date: Sat, 06 Jan 2001 17:33:00 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Your message of "Sat, 06 Jan 2001 17:14:44 EST." References: Message-ID: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> [Tim suggests to use fgets(), preparing the buffer with non-null bytes, and searching for a null byte from the right.] If this is really sufficiently fast, I'd say, go for it. 
Looks bullet-proof as long as the source code to MSVCRT doesn't change. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Sat Jan 6 22:34:42 2001 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 17:34:42 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Message-ID: [Tim, pondering] > ... But suppose I search "from the right", and ensure the buffer is > free of null bytes before the fgets. Even better, suppose I ensure the buffer is free of both null bytes and newlines before the fgets; then if I search from the *left* for a newline and find one, it must be that fgets found a line and it ends right there, and this should usually obtain. There's no need to search from the right unless I don't find a newline ... From skip@mojam.com (Skip Montanaro) Sun Jan 7 01:15:08 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 6 Jan 2001 19:15:08 -0600 (CST) Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error In-Reply-To: <200101062205.RAA23603@cj20424-a.reston1.va.home.com> References: <14935.15160.130742.390323@beluga.mojam.com> <200101062205.RAA23603@cj20424-a.reston1.va.home.com> Message-ID: <14935.49948.574427.668588@beluga.mojam.com> Skip> Attaching function attributes to unbound methods could really Skip> function like C++ static data members.... Guido> Skip, I don't find this better than the existing solution, which Guido> uses C._howmany instead of C.__init__.howmany. It was more a "hey, I never thought of it quite that way" than a "hey, I think this would be a great new idiom". In fact, I believe the more important part of my note was the bit about the attribute error on exit. I'm sure function attributes will attract their fair share of abuse. 
;-) Skip From tim_one@email.msn.com Sun Jan 7 03:16:31 2001 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 22:16:31 -0500 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow Message-ID: I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing. test_builtin fails because raw_input() isn't stripping a trailing newline. I've got my own code in this area that *may* be to blame, but I don't see how it could be. I note that fileobject.c's new function get_line_raw has the comment /* Internal routine to get a line for raw_input(): strip trailing '\n', raise EOFError if EOF reached immediately */ but the code doesn't look for a trailing newline (let alone strip one). From tim_one@email.msn.com Sun Jan 7 03:33:02 2001 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 22:33:02 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> Message-ID: > [Tim suggests to use fgets(), preparing the buffer with non-null > bytes, and searching for a null byte from the right.] [Guido] > If this is really sufficiently fast, I'd say, go for it. Looks > bullet-proof as long as the source code to MSVCRT doesn't change. :-) Surprise? Despite all the memsets, memchrs (looking for a newline), and one-at-a-time backward searches (looking for a null byte), it's a huge win on Windows: total 117615824 chars and 3237568 lines readlines_sizehint 9.550 9.578 using_fileinput 28.790 28.781 while_readline 13.120 13.134 The last one was 30.5 seconds before the fgets hackery. 
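
The buffer trick discussed in this thread (pre-fill the buffer with bytes that are neither NUL nor newline, call fgets, look for a newline from the left, and only fall back to searching for the terminating NUL from the right) can be illustrated without any C. The following is a simulation in Python 3 and nothing more: `fake_fgets`, `read_chunk` and `FILL` are invented names, and the actual change to fileobject.c is not shown in this thread.

```python
import io

FILL = 0x01   # any byte that is neither NUL nor newline


def fake_fgets(buf, stream):
    """Mimic C fgets(): copy up to len(buf)-1 bytes, stopping after a
    newline, then append a NUL. Returns False if nothing could be read."""
    count = 0
    while count < len(buf) - 1:
        ch = stream.read(1)
        if not ch:
            break
        buf[count] = ch[0]
        count += 1
        if ch == b"\n":
            break
    if count == 0:
        return False
    buf[count] = 0          # the NUL that fgets "makes up on its own"
    return True


def read_chunk(stream, bufsize=16):
    """Return the bytes one fgets() call delivered, embedded NULs included."""
    buf = bytearray([FILL]) * bufsize
    if not fake_fgets(buf, stream):
        return b""
    newline_at = buf.find(b"\n")
    if newline_at >= 0:
        # The fill has no newlines and fgets stops at the first one it
        # reads, so the line ends right here.
        return bytes(buf[:newline_at + 1])
    # No newline: the fill has no NULs, so the rightmost NUL is the
    # terminator fgets added; everything before it is real data.
    return bytes(buf[:buf.rfind(b"\x00")])


first = read_chunk(io.BytesIO(b"abc\0def\nxyz\n"))
short = read_chunk(io.BytesIO(b"abc"))
trailing_nul = read_chunk(io.BytesIO(b"ab\0"))
```

The sketch relies on the same assumption flagged in the thread: that fgets leaves the part of the buffer beyond its terminating NUL untouched.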
I'll check it in tomorrow after sleeping on it (there's a large pile of messy endcases (not only does fgets() invent a null byte, it can't tell you whether it stopped reading due to EOF, so maybe the last line in the file ends with 10000 null bytes + no newline + exactly lines up with a buffer boundary -- etc); test_builtin is failing in a closely related area but nobody would have checked in code that failed a std test ; and it's been a frustrating day all around). i-want-my-cable-modem-back-now-ly y'rs - tim From esr@thyrsus.com Sun Jan 7 04:01:25 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sat, 6 Jan 2001 23:01:25 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: ; from tim_one@email.msn.com on Sat, Jan 06, 2001 at 10:33:02PM -0500 References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> Message-ID: <20010106230125.A29058@thyrsus.com> Tim Peters : > > [Tim suggests to use fgets(), preparing the buffer with non-null > > bytes, and searching for a null byte from the right.] No, I haven't forgotten about the curses autoconfig stuff. But... This mess reminds me. For some work I'm doing right now, it would be very useful if there were a way to query the end-of-file status of a file descriptor without actually doing a read. I don't see this ability anywhere in the 2.0 API. Questions: 1. Am I missing something obvious? 2. If the answer to 1 is that I am not, in fact, being a dumbass, what is the right way to support this? The obvious alternatives are an eof member (analogous to the existing `closed' member, or an eof() method. I favor the latter. 3. If we agree on a design, I'm willing to implement this at least for Unix. Should be a small project. -- Eric S. Raymond The direct use of physical force is so poor a solution to the problem of limited resources that it is commonly employed only by small children and great nations. 
    -- David Friedman


From skip@mojam.com (Skip Montanaro) Sun Jan 7 04:05:22 2001
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Sat, 6 Jan 2001 22:05:22 -0600 (CST)
Subject: [Python-Dev] readline module seems crippled - am I missing something?
Message-ID: <14935.60162.726131.593211@beluga.mojam.com>

For a more-or-less throwaway script I'm working on I need a little input
function similar to Emacs's read-from-minibuffer, which accepts both a
prompt and an initial string for the input buffer. Seems like I ought to be
able to whip something up using readline, but it's not happening. GNU
readline's docs aren't the greatest, but I thought this simple script would
work:

    import readline

    readline.insert_text("default")
    x = raw_input("?")
    print x

I expected to see an editable "default" displayed after the prompt and have
x default to "default" if I just hit the return key. I see nothing
displayed after the question mark, and x is the empty string if I just hit
return. This does print "default":

    readline.insert_text("default")
    x = readline.get_line_buffer()
    print x

so I know that insert_text and get_line_buffer seem to be working as
intended. Looking at call_readline in Modules/readline.c I see nothing that
would disrupt the line buffer before the call to readline().

Am I missing something totally obvious about how GNU readline works or the
conditions under which readline is used (only at the interactive prompt?) or
is some required bit of GNU readline not exposed through Python's readline
module?

Skip


From tim.one@home.com Sun Jan 7 10:09:02 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 7 Jan 2001 05:09:02 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: <20010106230125.A29058@thyrsus.com>
Message-ID:

[Eric S. Raymond]
> ...
> For some work I'm doing right now, it would be very useful if
> there were a way to query the end-of-file status of a file
> descriptor without actually doing a read.
> > I don't see this ability anywhere in the 2.0 API. When someone says "API", I think "C API". In that case you can use feof(stream) directly, or whatever the heck your platform supports for handles (_eof(handle) on Windows, which I know is an OS you're secretly longing to master ). I don't believe there's a way to find out from Python short of trying to read, though. Well, I suppose you could try to compare f.tell() to the size, if you knew that f.tell() and "the size" made sense for f ... > 1. Am I missing something obvious? I don't know! I never asked Guido about this, and given that he's not on vacation now I'm not allowed to channel him. I would hazard a guess, though, that he thinks "you do or don't get something back when you read" is clearer than "you may or may not get something back when you read, regardless of which answer I give you in response to .eof() -- depending". The latter is particularly muddy in a threaded environment, even for plain old disk files. > 2. If the answer to 1 is that I am not, in fact, being a dumbass, > what is the right way to support this? The obvious alternatives > are an eof member (analogous to the existing `closed' member, or > an eof() method. I favor the latter. > > 3. If we agree on a design, I'm willing to implement this at least > for Unix. Should be a small project. I agree an .eof() method would be better than a data member. Note that whenever Python internals hit stream EOF today, they call clearerr(), so simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to make sure that feof() would never be useful <0.8 wink>. 
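
A rough sketch of the tell()-versus-size comparison suggested above, written in present-day Python 3 rather than the 2.0 of this thread. The helper name `at_eof` is made up for illustration, and the check only makes sense for seekable regular files; another process appending to the file can invalidate the answer at any moment.

```python
import os
import tempfile

def at_eof(f):
    """Hypothetical helper: True if f's position is at or past the file size.

    Only meaningful for seekable regular files opened in binary mode.
    """
    size = os.fstat(f.fileno()).st_size
    return f.tell() >= size

# Small demonstration on a temporary file.
with tempfile.TemporaryFile() as f:
    f.write(b"one\ntwo\n")
    f.flush()               # make sure fstat() sees the written bytes
    f.seek(0)
    before = at_eof(f)      # nothing consumed yet
    f.read()                # consume everything
    after = at_eof(f)       # position now equals the size
```

Note that no read is needed for the test itself, but the result says nothing useful for pipes, sockets or terminals.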
one-of-life's-little-mysteries-ly y'rs - tim From gstein@lyra.org Sun Jan 7 10:46:54 2001 From: gstein@lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 02:46:54 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.96,2.97 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Fri, Jan 05, 2001 at 06:43:07AM -0800 References: Message-ID: <20010107024654.W17220@lyra.org> On Fri, Jan 05, 2001 at 06:43:07AM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv3183 > > Modified Files: > fileobject.c > Log Message: > Restructured get_line() for clarity and speed. > > - The raw_input() functionality is moved to a separate function. > > - Drop GNU getline() in favor of getc_unlocked(), which exists on more > platforms (and is even a tad faster on my system). The "configure" tests for getline() can be punted if we won't use it any more... Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Jan 7 12:27:57 2001 From: gstein@lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 04:27:57 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101052014.PAA20328@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 05, 2001 at 03:14:41PM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Message-ID: <20010107042757.X17220@lyra.org> It feels wrong. Whatever happened to the "we're all adults here" mantra. Besides people asking for it, what is a good reason *for* it to be added? Cheers, -g On Fri, Jan 05, 2001 at 03:14:41PM -0500, Guido van Rossum wrote: > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. 
This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). > > I like it. This has been asked for many times. Does anybody see a > reason why this should *not* be added? > > Tim remarked that introducing this will prompt demands for a similar > feature on classes and instances, where it will be hard to implement > without causing a bit of a slowdown. It causes a slight slowdown (an > extra dictionary lookup for each use of "M.v") even when it is not > used, but for accessing module variables that's acceptable. I'm not > so sure about instance variable references. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev -- Greg Stein, http://www.lyra.org/ From guido@python.org Sun Jan 7 16:52:11 2001 From: guido@python.org (Guido van Rossum) Date: Sun, 07 Jan 2001 11:52:11 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Sat, 06 Jan 2001 23:01:25 EST." <20010106230125.A29058@thyrsus.com> References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> <20010106230125.A29058@thyrsus.com> Message-ID: <200101071652.LAA31411@cj20424-a.reston1.va.home.com> > This mess reminds me. For some work I'm doing right now, it would be > very useful if there were a way to query the end-of-file status of a > file descriptor without actually doing a read. I hope you really mean file object (== wrapper around stdio FILE object). A file descriptor (small little integer in Unix) doesn't have a way to find this out. Even for file objects, it is typically only known that there's an EOF condition after a lowest-level read operation returned 0 bytes. So in effect you must still do a read in order to determine EOF status. 
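
A minimal sketch of the point that some read has to happen before EOF is known, using today's Python 3 `io` module (which did not exist when this was written). `BufferedReader.peek()` looks ahead without consuming data, but underneath it still performs a read to fill the buffer. The helper name `eof_by_peeking` is an assumption for illustration.

```python
import io

def eof_by_peeking(buffered):
    """Hypothetical helper: report EOF by looking ahead without consuming."""
    # peek() returns b"" only when the underlying stream has no more data.
    return buffered.peek(1) == b""

stream = io.BufferedReader(io.BytesIO(b"line 1\nline 2\n"))
lines = []
while not eof_by_peeking(stream):
    lines.append(stream.readline())
```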
I just ran a small test program, and fread() appears to set the eof status when it returns a short count. Normally, Python's read() uses fread() so this might be useful. However after a readline(), you can't know the eof status (unless the last line of the file doesn't end in a newline). > I don't see this ability anywhere in the 2.0 API. Questions: > > 1. Am I missing something obvious? > > 2. If the answer to 1 is that I am not, in fact, being a dumbass, what > is the right way to support this? The obvious alternatives are an > eof member (analogous to the existing `closed' member, or an eof() > method. I favor the latter. > > 3. If we agree on a design, I'm willing to implement this at least for > Unix. Should be a small project. Before adding an eof() method, can you explain what your program is trying to do? Is it reading from a pipe or socket? Then select() or poll() might be useful. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Sun Jan 7 18:30:32 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 13:30:32 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: ; from tim.one@home.com on Sun, Jan 07, 2001 at 05:09:02AM -0500 References: <20010106230125.A29058@thyrsus.com> Message-ID: <20010107133032.F4586@thyrsus.com> Tim Peters : > I agree an .eof() method would be better than a data member. Note that > whenever Python internals hit stream EOF today, they call clearerr(), so > simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to > make sure that feof() would never be useful <0.8 wink>. That's inconvenient, but only means the internal Python state flag that feof() would inspect would have to be checked after each read. -- Eric S. Raymond "...The Bill of Rights is a literal and absolute document. The First Amendment doesn't say you have a right to speak out unless the government has a 'compelling interest' in censoring the Internet. 
The Second Amendment doesn't say you have the right to keep and bear arms until some madman plants a bomb. The Fourth Amendment doesn't say you have the right to be secure from search and seizure unless some FBI agent thinks you fit the profile of a terrorist. The government has no right to interfere with any of these freedoms under any circumstances." -- Harry Browne, 1996 USA presidential candidate, Libertarian Party From esr@thyrsus.com Sun Jan 7 18:45:41 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 13:45:41 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101071652.LAA31411@cj20424-a.reston1.va.home.com>; from guido@python.org on Sun, Jan 07, 2001 at 11:52:11AM -0500 References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> <20010106230125.A29058@thyrsus.com> <200101071652.LAA31411@cj20424-a.reston1.va.home.com> Message-ID: <20010107134541.G4586@thyrsus.com> Guido van Rossum : > > This mess reminds me. For some work I'm doing right now, it would be > > very useful if there were a way to query the end-of-file status of a > > file descriptor without actually doing a read. > > I hope you really mean file object (== wrapper around stdio FILE > object). A file descriptor (small little integer in Unix) doesn't > have a way to find this out. You're right, my bad. > Even for file objects, it is typically only known that there's an EOF > condition after a lowest-level read operation returned 0 bytes. So in > effect you must still do a read in order to determine EOF status. > > I just ran a small test program, and fread() appears to set the eof > status when it returns a short count. Normally, Python's read() uses > fread() so this might be useful. However after a readline(), you > can't know the eof status (unless the last line of the file doesn't > end in a newline). I considered trying a zero-length read() in Python, but this strikes me as inelegant even if it would work. 
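
A small sketch, in present-day Python 3, of what a zero-length read reports. It uses an in-memory `io.BytesIO` stream as a stand-in for a real file, which is an assumption made to keep the example self-contained.

```python
import io

f = io.BytesIO(b"data")
mid_stream = f.read(0)   # zero-length read with data still pending
f.read()                 # drain the stream
at_end = f.read(0)       # zero-length read at EOF
```

Both results are the same empty value, so the two situations cannot be told apart this way.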
> Before adding an eof() method, can you explain what your program is
> trying to do? Is it reading from a pipe or socket? Then select() or
> poll() might be useful.

Sadly, it's exactly the wrong case. Hmmm...omitting irrelevant details,
it's a situation where a markup file can contain sections in two different
languages. The design requires the first interpreter to exit on seeing
either EOF or a marker that says "switching to second language". For
reasons too complicated to explain, it would be best if the parser for
the first language didn't simply call the second parser.

The logic I wanted to write amounts to:

    while 1:
        line = fp.readline()
        if not line or line == "history":
            break
        interpret_in_language_1(line)

    if not fp.feof():
        while 1:
            line = fp.readline()
            if not line:
                break
            interpret_in_language_2(line)

I just tested the zero-length-read method. That worked. I guess I'll
use it.
-- 
Eric S. Raymond

"Today, we need a nation of Minutemen, citizens who are not only prepared to
take arms, but citizens who regard the preservation of freedom as the basic
purpose of their daily life and who are willing to consciously work and
sacrifice for that freedom."
        -- John F. Kennedy

From martin@loewis.home.cs.tu-berlin.de Sun Jan 7 18:45:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 7 Jan 2001 19:45:15 +0100
Subject: [Python-Dev] Extending startup code: PEP needed?
Message-ID: <200101071845.f07IjFi01249@mira.informatik.hu-berlin.de>

Authors of extension packages often find the need to auto-import some
of their modules. This is often needed for registration, e.g. a codec
author (like Tamito KAJIYAMA, who wrote the JapaneseCodecs package)
may need to register a search function with codecs.register. This is
currently only possible by writing into sitecustomize.py, which must
be done by the system administrator manually.
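
A minimal sketch of the kind of registration described above, in present-day Python 3: a search function handed to `codecs.register`, as a line in sitecustomize.py might arrange. The codec name `x_demo_alias` and the function `demo_search` are made up for illustration, and the search function simply reuses the built-in UTF-8 codec instead of shipping a real one as a package like JapaneseCodecs would.

```python
import codecs

def demo_search(name):
    """Hypothetical search function: resolve one made-up codec name."""
    if name == "x_demo_alias":
        # Reuse an existing codec; a real package would return its own.
        return codecs.lookup("utf-8")
    return None  # unknown name: let other search functions try

# This is the call that has to run at startup for the codec to be found.
codecs.register(demo_search)

encoded = "caf\u00e9".encode("x_demo_alias")
```

Until something imports the module that makes this call, the name is unknown to the codec machinery, which is the gap the message is about.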
To enhance the service of site.py, I've written the patch http://sourceforge.net/patch/?func=detailpatch&patch_id=103134&group_id=5470 which treats lines in PTH files which start with "import" as statements and executes them, instead of appending these lines to sys.path. The patch is relatively small, but since it is an extension: Do I need to write a PEP for it? Regards, Martin From tismer@tismer.com Sun Jan 7 18:05:21 2001 From: tismer@tismer.com (Christian Tismer) Date: Sun, 07 Jan 2001 20:05:21 +0200 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> <20010105152058.A6016@glacier.fnational.com> <14935.11870.360839.235102@beluga.mojam.com> Message-ID: <3A58AFE1.3AB619BD@tismer.com> Skip Montanaro wrote: > > Neil> I think you, Skip and Moshe are missing a big advantage of having > Neil> the __exports__ mechanism. It should allow some attribute access > Neil> inside of modules to become faster (like LOAD_FAST for locals). I > Neil> think that optimization could be implemented without too much > Neil> difficultly. > > True enough, that hadn't occurred to me. Knowing that now, I still don't > think consistency of the interface should suffer as a result of > under-the-covers performance gains. Ok, vice versa: Given that we can support access control via __exports__ for modules, classes and instances as well, *and* if we can think up a scheme that allows a LOAD_FAST like speedup for all of these cases at the same time, then I would say +1, otherwise -0, half-hearted solution. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com

From guido@python.org Sun Jan 7 21:13:01 2001
From: guido@python.org (Guido van Rossum)
Date: Sun, 07 Jan 2001 16:13:01 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: Your message of "Sun, 07 Jan 2001 13:30:32 EST." <20010107133032.F4586@thyrsus.com>
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com>
Message-ID: <200101072113.QAA32467@cj20424-a.reston1.va.home.com>

> Tim Peters :
> > I agree an .eof() method would be better than a data member. Note that
> > whenever Python internals hit stream EOF today, they call clearerr(), so
> > simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to
> > make sure that feof() would never be useful <0.8 wink>.
>
[ESR]
> That's inconvenient, but only means the internal Python state flag
> that feof() would inspect would have to be checked after each read.

This was done because some platforms set feof() when there's still a
possibility to read more (e.g. after an interactive user typed ^D),
while others don't. It's inconvenient to get an endless stream of
EOFs from stdin when a user typed ^D to one particular prompt, so I
decided to clear the EOF status.

[ESR in a later message]
> I considered trying a zero-length read() in Python, but this strikes me
> as inelegant even if it would work.

I doubt that a zero-length read conveys any information. It should
return "" whether or not there is more to read! Plus, look at the
implementation of readline() (file_readline() in
Objects/fileobject.c): it shortcuts the n == 0 case and returns an
empty string without touching the file.

[me]
> > Before adding an eof() method, can you explain what your program is
> > trying to do? Is it reading from a pipe or socket? Then select() or
> > poll() might be useful.

[ESR again]
> Sadly, it's exactly the wrong case. Hmmm...omitting irrelevant details,
> it's a situation where a markup file can contain sections in two different
> languages.
The design requires the first interpreter to exit on seeing
> either EOF or a marker that says "switching to second language". For
> reasons too complicated to explain, it would be best if the parser for
> the first language didn't simply call the second parser.
>
> The logic I wanted to write amounts to:
>
>     while 1:
>         line = fp.readline()
>         if not line or line == "history":
>             break
>         interpret_in_language_1(line)
>
>     if not fp.feof():
>         while 1:
>             line = fp.readline()
>             if not line:
>                 break
>             interpret_in_language_2(line)
>
> I just tested the zero-length-read method. That worked. I guess I'll
> use it.

Bizarre (given what I know about zero-length read). But in the above
code, you can replace "if not fp.feof()" with "if line". In other
words, you just have to carry the state over within your program.

So, I see no reason why the logic in your program couldn't take care
of this, which in general is a preferred way to solve a problem than
to change the language.

Also note that in Python it's no sin to attempt to read a line even
when the file is already at EOF -- you will simply get an empty line
again.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik@effbot.org Sun Jan 7 21:29:46 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Sun, 7 Jan 2001 22:29:46 +0100
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com>
Message-ID: <035901c078f0$f6180f70$e46940d5@hagrid>

Guido van Rossum wrote:
> Bizarre (given what I know about zero-length read). But in the above
> code, you can replace "if not fp.feof()" with "if line". In other
> words, you just have to carry the state over within your program.
and if that's too hard, just hide the state in a class:

    class FileWrapper:

        def __init__(self, file):
            self.__file = file
            self.__line = None

        def __more(self):
            # try reading another line
            if not self.__line:
                self.__line = self.__file.readline()

        def eof(self):
            self.__more()
            return not self.__line

        def readline(self):
            self.__more()
            line = self.__line
            self.__line = None
            return line

    file = open("myfile.txt")

    file = FileWrapper(file)

    while not file.eof():
        print repr(file.readline())

From guido@python.org Sun Jan 7 21:32:26 2001
From: guido@python.org (Guido van Rossum)
Date: Sun, 07 Jan 2001 16:32:26 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: Your message of "Sat, 06 Jan 2001 22:16:31 EST."
References:
Message-ID: <200101072132.QAA32627@cj20424-a.reston1.va.home.com>

> I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing.
>
> test_builtin fails because raw_input() isn't stripping a trailing newline.
> I've got my own code in this area that *may* be to blame, but I don't see
> how it could be. I note that fileobject.c's new function get_line_raw has
> the comment
>
> /* Internal routine to get a line for raw_input():
>    strip trailing '\n', raise EOFError if EOF reached immediately
> */
>
> but the code doesn't look for a trailing newline (let alone strip one).

My bad. Try the latest CVS now.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From esr@thyrsus.com Sun Jan 7 22:15:27 2001
From: esr@thyrsus.com (Eric S.
Raymond) Date: Sun, 7 Jan 2001 17:15:27 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101072113.QAA32467@cj20424-a.reston1.va.home.com>; from guido@python.org on Sun, Jan 07, 2001 at 04:13:01PM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> Message-ID: <20010107171527.A5093@thyrsus.com> Guido van Rossum : > [ESR in a later message] > > I considered trying a zero-length read() in Python, but this strikes me > > as inelegant even if it would work. > > I doubt that a zero-length read conveys any information. It should > return "" whether or not there is more to read! Duh. Of course it would. You know, I've always been half-consciously dissatisfied with Python's use of "" as an EOF marker, and now I know why. It's precisely because there's no way to distinguish these cases. I think a zero-length read ought to return "" and a read on EOF ought to return None. > Bizarre (given what I know about zero-length read). But in the above > code, you can replace "if not fp.feof()" with "if line". In other > words, you just have to carry the state over within your program. > > So, I see no reason why the logic in your program couldn't take care > of this, which in general is a preferred way to solve a problem than > to change the language. OK, two objections, one practical and one (more important) esthetic: Practical: I guess I oversimplified the code for expository purposes. What's actually going on is that I have two parser classes both based on shlex -- they do character-at-a-time input and don't actually *have* accessible line buffers. Esthetic: Yes, I can have the first parser set a flag, or return some EOF token. But this seems deeply wrong to me, because EOFness is not a property of the parser but of the underlying stream object. 
It seems to me that my program ought to be able to ask the stream
object whether it's at EOF rather than carrying its own flag for that
state.

In Python as it is, there's no clean way to do this. I'd have to do a
nonzero-length read to test it (I failed to check the right alternate
case before when I tried zero-length). That's really broken. What if
neither the underlying stream nor the parser supports pushback?

Do you see now why I think this is a more general issue?

Now, another and more general way to handle this would be to make an
equivalent of the old FIONREAD ioctl part of Python's standard set of
file object methods -- a way to ask "how many bytes are ready to be
read in this stream?"

Trivial to make it work for plain files, of course. Harder to make it
work usefully for pipes/fifos/sockets/terminals. Having it pass up the
results of the fstat() st_size field (corrected for the current seek
address if you're reading a plain file) would be a good start.
-- 
Eric S. Raymond

Live free or die; death is not the worst of evils.
        -- General George Stark.

From tismer@tismer.com Sun Jan 7 22:37:55 2001
From: tismer@tismer.com (Christian Tismer)
Date: Mon, 08 Jan 2001 00:37:55 +0200
Subject: [Python-Dev] ANN: Stackless Python 2.0
Message-ID: <3A58EFC3.5A722FF0@tismer.com>

Dear community,

I'm happy to announce that

    Stackless Python 2.0

is finally ready and available for download.

Stackless Python for Python 1.5.2+ also got some minor enhancements.
Both versions are available as Win32 installer files here:

    http://www.stackless.com/spc20-win32.exe
    http://www.stackless.com/spc15-win32.exe

Speed:
Stackless Python for Python 2.0 is again a bit faster than the
original. This time even better: About 9-10 percent.
I have to say that optimization was much harder this time.
My speed patches are now done by a Python script, which will
make maintenance and diff reading much easier in the future.
There is now also a bit of example code available, like the uthread9.py Microthreads module from Will Ware, Just van Rossum, and Mike Fletcher. Source code and an update to the website will become available in the next days. enjoy - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mal@lemburg.com Mon Jan 8 00:26:00 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 01:26:00 +0100 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow References: Message-ID: <3A590918.E90031AA@lemburg.com> Tim Peters wrote: > > I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing. test_charmapcodec is my fault... I should run the tests in a clean room environment before checkin: my PYTHONPATH picked up some other file which it was not supposed to do. I'll fix it next week. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one@home.com Mon Jan 8 04:13:26 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 7 Jan 2001 23:13:26 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Message-ID: The "Win32" readline() hack is now checked in, but there's really nothing Win32-specific about it anymore. 
It makes one mild assumption about what the C std doesn't clearly address but may have intended: that in case of a non-NULL return, fgets doesn't overwrite any of the buffer positions beyond the terminating null byte (the std is clear that it doesn't overwrite anything at all in case of a NULL-because-EOF return, but I can't say whether they're pointing that out as a consequence, or pointing that out as an exception). I'm curious about how it performs (relative to the getc_unlocked hack) on other platforms. If you'd like to try that, just recompile fileobject.c with USE_MS_GETLINE_HACK #define'd. It should *work* on any platform with fgets() meeting the assumption. The new test_bufio.py std test gives it a pretty good correctness workout, if you're worried about that. From esr@snark.thyrsus.com Mon Jan 8 04:16:53 2001 From: esr@snark.thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 23:16:53 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge Message-ID: <200101080416.f084GrM10912@snark.thyrsus.com> Setting things up so curses is autoconfigured into the default build if your system has it in the expected places turned out to be dead easy. Some clever person (the BDFL himself?) wrote the build process so that there is *already* a Setup.config.in that gets configure expansions done on it, with the generated Setup.config used when makesetup does its magic. As a bonus, I've also added autoconfiguration for readline. A small detail, but one which I suspect many people building their own Pythons frequently trip over. The technique generalizes easily. 
The archetype for a facility for autoconfiguring libfoo with a Python
extension foo.c if it's present has just two steps:

Add this to Modules/Setup.config.in:

    @USE_FOO_MODULE@foo foo.c -lfoo

Add this to configure.in:

    # This is used to generate Setup.config
    AC_SUBST(USE_FOO_MODULE)
    AC_CHECK_LIB(foo, random_foo_function,
                 [USE_FOO_MODULE=""],
                 [USE_FOO_MODULE="#"])

(Apologies for the lack of description with the patch. I tripped over a
SourceForge interface bug.)
-- 
Eric S. Raymond

The possession of arms by the people is the ultimate warrant
that government governs only with the consent of the governed.
        -- Jeff Snyder

From tim.one@home.com Mon Jan 8 05:34:20 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 8 Jan 2001 00:34:20 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: <3A590918.E90031AA@lemburg.com>
Message-ID:

An update: test_builtin works again (thanks, Guido!), and test_charmapcodec
will "next week" (thanks, MAL!).

Still unknown (to me): is the test_pow failure unique to Windows? One
response from a Unix(tm) geek would settle that.

From nas@arctrix.com Sun Jan 7 22:59:49 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Sun, 7 Jan 2001 14:59:49 -0800
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 12:34:20AM -0500
References: <3A590918.E90031AA@lemburg.com>
Message-ID: <20010107145949.A14166@glacier.fnational.com>

On Mon, Jan 08, 2001 at 12:34:20AM -0500, Tim Peters wrote:
> Still unknown (to me): is the test_pow failure unique to Windows? One
> response from a Unix(tm) geek would settle that.

It works fine for me on Linux. I thought I tested on Windows
before checking in the coerce patch. I'll try again.
Neil

From nas@arctrix.com Sun Jan 7 23:29:14 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Sun, 7 Jan 2001 15:29:14 -0800
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: <20010107145949.A14166@glacier.fnational.com>; from nas@arctrix.com on Sun, Jan 07, 2001 at 02:59:49PM -0800
References: <3A590918.E90031AA@lemburg.com> <20010107145949.A14166@glacier.fnational.com>
Message-ID: <20010107152914.A14228@glacier.fnational.com>

On Sun, Jan 07, 2001 at 02:59:49PM -0800, Neil Schemenauer wrote:
> It works fine for me on Linux. I thought I tested on Windows
> before checking in the coerce patch. I'll try again.

Weird. rt.bat does not run the test_pow script. If I run
"regrtet test_pow" then the test fails. It could be a problem
with line endings (I copied the source for a Unix CVS checkout).

Anyhow, I found the bug. I don't know how test_pow was passing
under Linux. Time to reboot again.

Neil

From tim.one@home.com Mon Jan 8 06:39:20 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 8 Jan 2001 01:39:20 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: <20010107152914.A14228@glacier.fnational.com>
Message-ID:

[NeilS]
> Weird. rt.bat does not run the test_pow script.

Works for me, else I never would have noticed . Also works for me in
single-test mode:

C:\Code\python\dist\src\PCbuild>rt test_pow

C:\Code\python\dist\src\PCbuild>python ../lib/test/regrtest.py test_pow
test_pow
The actual stdout doesn't match the expected stdout.
This much did match (between asterisk lines):
**********************************************************************
test_pow
Testing integer mode...
Testing 2-argument pow() function...
Testing 3-argument pow() function...
Testing long integer mode...
Testing 2-argument pow() function...
Testing 3-argument pow() function...
Testing floating point mode...
Testing 3-argument pow() function...
The number in both columns should match.
3 3
-5 -5
-1 -1
5 5
-3 -3
-7 -7

3L 3L
-5L -5L
-1L -1L
5L 5L
-3L -3L
-7L -7L

3.0 3.0
-5.0 -5.0
-1.0 -1.0
-7.0 -7.0
**********************************************************************
Then ...
We expected (repr): ''
But instead we got: 'Float mismatch:'
test test_pow failed -- Writing: 'Float mismatch:', expected: ''
1 test failed: test_pow

C:\Code\python\dist\src\PCbuild>

That may point to the problem, too: the canned output file is truncated?

> If I run "regrtet test_pow" then the test fails. It could be a
> problem with line endings (I copied the source for a Unix CVS
> checkout).

Don't understand; e.g., "copied" what, from where to where? I'm not sure I
gave you write access to my box, and hacking into Windows machines is uncool
because it's not challenging .

> Anyhow, I found the bug. I don't know how test_pow was passing
> under Linux. Time to reboot again.

Cool! BTW, Windows solves the "don't reboot enough" problem for you via
automation, sometimes on an hourly basis.

Thanks for sharing the brain cells, Neil!

From thomas@xs4all.net Mon Jan 8 06:44:11 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 8 Jan 2001 07:44:11 +0100
Subject: [Python-Dev] autoconfigure patch submitted on SourceForge
In-Reply-To: <200101080416.f084GrM10912@snark.thyrsus.com>; from esr@snark.thyrsus.com on Sun, Jan 07, 2001 at 11:16:53PM -0500
References: <200101080416.f084GrM10912@snark.thyrsus.com>
Message-ID: <20010108074411.N2467@xs4all.nl>

On Sun, Jan 07, 2001 at 11:16:53PM -0500, Eric S. Raymond wrote:
> Setting things up so curses is autoconfigured into the default build
> if your system has it in the expected places turned out to be dead
> easy. Some clever person (the BDFL himself?) wrote the build process
> so that there is *already* a Setup.config.in that gets configure
> expansions done on it, with the generated Setup.config used when
> makesetup does its magic.

Skip, actually, IIRC. It was added in the last stages of 2.0 development, to
auto-detect bsddb.
However, I still think it should be a separate 'configure', in the Modules directory. Especially now that Andrew is practically checking in the distutils setup ;) The main configure can make an educated guess whether Python and distutils are available, and call configure with some passed-through options if not. It does depend on what the distutils setup does, though, and I'll shamefully admit that I haven't looked at that ;P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From nas@arctrix.com Sun Jan 7 23:51:16 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Sun, 7 Jan 2001 15:51:16 -0800 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 01:39:20AM -0500 References: <20010107152914.A14228@glacier.fnational.com> Message-ID: <20010107155116.A14312@glacier.fnational.com> On Mon, Jan 08, 2001 at 01:39:20AM -0500, Tim Peters wrote: > [NeilS] > > If I run "regrtet test_pow" then the test fails. It could be a > > problem with line endings (I copied the source for a Unix CVS > > checkout). > > Don't understand; e.g., "copied" what, from where to where? I should have been clearer. I mean the problem with rt.bat not running test_pow. I copied the CVS source from my Linux ext2 filesystem to a VFAT filesystem. I was too lazy to fix the line endings. 
Neil

From nas@arctrix.com Sun Jan 7 23:52:38 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Sun, 7 Jan 2001 15:52:38 -0800
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: <20010107152914.A14228@glacier.fnational.com>; from nas@arctrix.com on Sun, Jan 07, 2001 at 03:29:14PM -0800
References: <3A590918.E90031AA@lemburg.com> <20010107145949.A14166@glacier.fnational.com> <20010107152914.A14228@glacier.fnational.com>
Message-ID: <20010107155238.A14291@glacier.fnational.com>

On Sun, Jan 07, 2001 at 03:29:14PM -0800, Neil Schemenauer wrote:
> I don't know how test_pow was passing under Linux.

Under Linux with the buggy float_pow:

>>> pow(10.0, 0, 10)
nan
>>> pow(10.0, 0, 10) == 1
1
>>> pow(10.0, 0, 10) == 0
1

Under Windows NAN obviously behaves differently.

floating-point-is-fun-ly y'rs Neil

From esr@thyrsus.com Mon Jan 8 06:49:45 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Mon, 8 Jan 2001 01:49:45 -0500
Subject: [Python-Dev] autoconfigure patch submitted on SourceForge
In-Reply-To: <20010108074411.N2467@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 08, 2001 at 07:44:11AM +0100
References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl>
Message-ID: <20010108014945.A19516@thyrsus.com>

Thomas Wouters :
> On Sun, Jan 07, 2001 at 11:16:53PM -0500, Eric S. Raymond wrote:
> > Setting things up so curses is autoconfigured into the default build
> > if your system has it in the expected places turned out to be dead
> > easy. Some clever person (the BDFL himself?) wrote the build process
> > so that there is *already* a Setup.config.in that gets configure
> > expansions done on it, with the generated Setup.config used when
> > makesetup does its magic.
>
> Skip, actually, IIRC. It was added in the last stages of 2.0 development, to
> auto-detect bsddb. However, I still think it should be a separate
> 'configure', in the Modules directory.

You may be right.
Still, this patch solves the immediate problem in a reasonably clean way, and I urge that it should go in. We can do a more complete reorganization of the build process later. (I'll help with that; I'm pretty expert with autoconf and friends.) -- Eric S. Raymond "As to the species of exercise, I advise the gun. While this gives [only] moderate exercise to the body, it gives boldness, enterprise, and independence to the mind. Games played with the ball and others of that nature, are too violent for the body and stamp no character on the mind. Let your gun, therefore, be the constant companion to your walks." -- Thomas Jefferson, writing to his teenaged nephew. From tim.one@home.com Mon Jan 8 07:05:46 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 02:05:46 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: Well, I like __exports__ (but not some details of the patch, for which see my SF comments). Guido is aware of the optimization possibilities, but that's not what's driving it. I don't know why he likes it; I like it because the only normal use for a module is to do module.attr, or "from module import attr", and dir(module) very often exposes stuff today that the module author had no intention of exporting. For example, if I do import os dir(os) under CVS Python today, on my box I see that os exports "i". It's bound to _exit. That's baffling, and is purely an accident of how module os.py initialization works when you're running on Windows. Couple that with that I've hardly ever seen (or bothered to write) a module docstring spelling out everything a module *intends* to export, and an __exports__ line near the top (when present) would also automagically give a solid answer to that question. modules aren't classes or instances, and in normal practice modules accumulate all sorts of accidental attrs (due to careless (== normal) imports, and module init code). 
It doesn't make any *sense* that os exports "sys" either, or that random exports "cos", or that cgi exports "string", or ... this inelegance is ubiquitous. In a world with an __exports__ that gets used, though, I do wonder whether people will or won't export their test() functions. I really like that they do now. or-maybe-it's-just-that-i-like-modules-that-*have*-a- test-function-ly y'rs - tim From gstein@lyra.org Mon Jan 8 07:25:32 2001 From: gstein@lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 23:25:32 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 02:05:46AM -0500 References: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <20010107232532.V17220@lyra.org> On Mon, Jan 08, 2001 at 02:05:46AM -0500, Tim Peters wrote: >... > modules aren't classes or instances, and in normal practice modules > accumulate all sorts of accidental attrs (due to careless (== normal) > imports, and module init code). It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. Simple question: so what? "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one@home.com Mon Jan 8 07:29:39 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 02:29:39 -0500 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <20010107155238.A14291@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Under Linux with the buggy float_pow: > > >>> pow(10.0, 0, 10) > nan > >>> pow(10.0, 0, 10) == 1 > 1 > >>> pow(10.0, 0, 10) == 0 > 1 > > Under Windows NAN obviously behaves differently. Comparisons with NaN are a platform-dependent accident, partly because some C compilers generate nonsense code, partly because Python isn't coded to cater to NaN's peculiarities either. 
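As a point of reference for the NaN comparisons discussed in this message, here is a minimal sketch of how a present-day Python 3 interpreter treats them. This is an assumption about modern CPython, not a description of the Python 2.0 builds in this thread, where the results were platform-dependent. The names `nan` and `other` are illustrative only.

```python
import math

# Assumption: present-day CPython 3, not the Python 2.0 builds in this thread.
nan = float("nan")

# IEEE 754 treats NaN as unordered, so it compares unequal even to itself.
print(nan == nan)        # False
print(nan != nan)        # True

# The portable way to test for NaN is an explicit predicate.
print(math.isnan(nan))   # True

# Containers check identity before equality, so the very same NaN
# object is still "found" in a list, while a distinct NaN is not.
other = float("nan")
print(nan in [nan])      # True
print(other in [nan])    # False
```

The last two lines show where an identity shortcut survives today: in container membership tests rather than in `==` itself.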
The behavior under Windows is (accidentally) better in these cases today (NaN should never compare equal to anything -- not even to itself -- and, curiously, MSVC's codegen mistakes cancel out Python's mistakes in this case!). Thank you for fixing the bug. Only test_charmapcodec is failing for me now, and MAL knows the cause and cure. nothing-can-stop-the-alpha-now-ly y'rs - tim From thomas@xs4all.net Mon Jan 8 07:42:30 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 08:42:30 +0100 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 02:29:39AM -0500 References: <20010107155238.A14291@glacier.fnational.com> Message-ID: <20010108084230.O2467@xs4all.nl> On Mon, Jan 08, 2001 at 02:29:39AM -0500, Tim Peters wrote: > (NaN should never compare equal to anything -- not even to itself You know that's impossible, in Python, right ? (Due to the shortcut taken by '==', based on object identity.) Is that going to be 'fixed', too ? :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From ping@lfw.org Mon Jan 8 07:51:11 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 7 Jan 2001 23:51:11 -0800 (PST) Subject: [Python-Dev] inspect.py In-Reply-To: Message-ID: Hi again. Sorry to bother you if you're busy -- i haven't seen any responses about inspect.py for a few days and wanted to know what your reactions were. The module and test suite are still at: http://www.lfw.org/python/inspect.py http://www.lfw.org/python/test_inspect.py The only change since my announcement last Wednesday is that getframe() has been renamed to getframeinfo(). Thanks, -- ?!ng "Old code doesn't die -- it just smells that way." 
-- Bill Frantz

From tim.one@home.com Mon Jan 8 08:17:57 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 8 Jan 2001 03:17:57 -0500
Subject: NaN nonsense (was RE: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow)
In-Reply-To: <20010108084230.O2467@xs4all.nl>
Message-ID:

>> (NaN should never compare equal to anything -- not even to itself

[Thomas Wouters]
> You know that's impossible, in Python, right ? (Due to the
> shortcut taken by '==', based on object identity.)

Surely you jest: I probably knew that while you were still nursing . OTOH, Python on WinTel comes remarkably close (by accident):

C:\Code\python\dist\src\PCbuild>python
Python 2.0 (#8, Jan 5 2001, 00:33:19) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> inf = 1e300**2
>>> inf
1.#INF
>>> nan = inf - inf
>>> nan
-1.#IND
>>> nan2 = nan * 1.0
>>> nan2
-1.#IND
>>> nan == nan2
0
>>>

> Is that going to be 'fixed', too ? :)

Not if I can help it. I'd be in favor of adding an fcmp function that needs to be called explicitly when you want the full complexity of 754 comparisons. Count them all up, and there are 32 distinct 754 binary float comparison operators! The 754 std says 26 (from memory, may be 2 more or less) of those have to be supplied, but-- since 754 is not a language std --says nothing about how they're to be spelled. OTOH, C99 resolutely tries to map that into C, and 754 True Believers will use that as a club. On the third hand, as Tom MacDonald posted here earlier (he was X3J11 chair), he's not sure anyone will ever implement C99 in whole. The complexities of full 754 support are a large part of why he worries about that.

too-much-too-late-ly y'rs - tim

From tim.one@home.com Mon Jan 8 08:17:59 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 8 Jan 2001 03:17:59 -0500
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <20010107232532.V17220@lyra.org>
Message-ID:

[Greg Stein]
> Simple question: so what?
> > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Couldn't care less about the module author. It's the module user who has to sort this stuff out. "Don't use 'import *'" is good advice but not followed either, and after I do from MyPackage import sys # intentionally exports its own sys from GregSnort import * # accidentally exports some other sys madness ensues. Like I said, it's inelegant, and at best. Simple question for you: what would __exports__ hurt? "Oh, no! Tim's module explicitly lists what it intended to export! Oh, woe is me!". Gimme a break. From gstein@lyra.org Mon Jan 8 08:26:03 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 00:26:03 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:17:59AM -0500 References: <20010107232532.V17220@lyra.org> Message-ID: <20010108002603.X17220@lyra.org> On Mon, Jan 08, 2001 at 03:17:59AM -0500, Tim Peters wrote: > [Greg Stein] > > Simple question: so what? > > > > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* > > Couldn't care less about the module author. It's the module user who has to > sort this stuff out. "Don't use 'import *'" is good advice but not followed > either, and after I do > > from MyPackage import sys # intentionally exports its own sys > from GregSnort import * # accidentally exports some other sys > > madness ensues. Like I said, it's inelegant, and at best. > > Simple question for you: what would __exports__ hurt? "Oh, no! Tim's > module explicitly lists what it intended to export! Oh, woe is me!". Gimme > a break. hehe... adding __exports__ to your module is fine. Adding more crud to Python, in opposition to the "we're all adults" motto, doesn't seem Right. Somebody wants to use "from foo import *" on a module not designed for it? Too bad for them. 
If you're suggesting __exports__ is to patch over problems caused by "from foo import *", then I think you're barking up the wrong tree :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From moshez@zadka.site.co.il Mon Jan 8 16:50:57 2001
From: moshez@zadka.site.co.il (Moshe Zadka)
Date: Mon, 8 Jan 2001 18:50:57 +0200 (IST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <20010107232532.V17220@lyra.org>
References: <20010107232532.V17220@lyra.org>, <20010106110033.52127A84F@darjeeling.zadka.site.co.il>
Message-ID: <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il>

[Tim Peters]
> modules aren't classes or instances, and in normal practice modules
> accumulate all sorts of accidental attrs (due to careless (== normal)
> imports, and module init code). It doesn't make any *sense* that os exports
> "sys" either, or that random exports "cos", or that cgi exports "string", or
> ... this inelegance is ubiquitous.

[Greg Stein]
> Simple question: so what?
>
> "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort*

Let me "me too" here: Put another way, what Greg said is just a rephrase of "don't use from foo import * unless foo's docos say it's OK". Add to that the simple access control of a leading underscore, and I don't see any place which needs it. Something better to do would be to use

import foo as _foo

In some standard library modules, and minimize using

from foo import bar

in them. Since everyone knows that leading underscore means "implementation detail - ignore at your convenience, use at your peril", this would keep the "we're all adults" philosophy of Python, with all the advantages *I* see in __exports__.

One more point against __exports__, which I hoped I would not have to make (but when I'm up against the timbot *and* Guido, I need to pull out the heavy artillery): it would *totally* stop any hope in the future of module level __getattr__ (or at least complicate the semantics). I think Alex M.
is thinking of a PEP, but he's taking his time, since no PEPs can be considered until 2.1 is out. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From tim.one@home.com Mon Jan 8 08:49:58 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 03:49:58 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010108002603.X17220@lyra.org> Message-ID: [Greg Stein] > hehe... adding __exports__ to your module is fine. Adding more > crud to Python, in opposition to the "we're all adults" motto, > doesn't seem Right. My idea of what's Right is copied from my boss . > Somebody wants to use "from foo import *" on a module not designed > for it? Too bad for them. How is someone supposed to know whether a module "was designed" for import*? Even Tkinter (which just about everyone does "import *" on) also exports sys, and everything from the "types" module, by accident too. > If you're suggesting __exports__ is to patch over problems > caused by "from foo import *", then I think you're barking up the > wrong tree > :-) Indeed. But I'm suggesting that the problems that *can* arise from "import*" illustrate the fundamental silliness of exporting things by accident. It's come up much more often for me when I'm looking over someone's shoulder, teaching them how to use dir() in an interactive shell to answer their own damn questions <0.5 wink>. It's usually the case that dir(M) shows them something that isn't documented, and over time I am *not* pleased that "oh, I guess the 'string' in there is just crap" is how they learn to view it. I can live without __exports__; but I'd prefer not to, because I would always use it if it were there. 
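The dir() complaint above can be reproduced with a small sketch. It is written for present-day Python 3 (an assumption; the thread concerns Python 2.0), and the module name `demo` and its `greet` function are made up for illustration.

```python
import types

# Hypothetical module body: it imports sys purely for internal use.
source = '''
import sys

def greet():
    return "hello from " + sys.platform
'''

# Build the module in memory; "demo" is a made-up name.
demo = types.ModuleType("demo")
exec(source, demo.__dict__)

# Everything without a leading underscore looks "public" to dir().
public = [name for name in dir(demo) if not name.startswith("_")]
print(public)      # ['greet', 'sys']

# Names bound to other modules are almost always accidental exports.
accidental = [name for name in public
              if isinstance(getattr(demo, name), types.ModuleType)]
print(accidental)  # ['sys']
```

Nothing in the module says whether `sys` was meant to be part of its interface, which is the gap an explicit export list would close.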
if-i'd-both-use-it-and-heartily-recommend-it-it's-hard-to-
oppose-it-ly y'rs - tim

From m.favas@per.dem.csiro.au Mon Jan 8 11:48:40 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Mon, 08 Jan 2001 19:48:40 +0800
Subject: [Python-Dev] _cursesmodule.c clobbered since Christmas
Message-ID: <3A59A918.E0D02E0D@per.dem.csiro.au>

I last successfully downloaded from CVS, compiled, linked and tested on Dec. 22 last year. For the last week or so, the current CVS _cursesmodule.c gives a bunch of compiler warning messages of the form:

cc: Warning: ./_cursesmodule.c, line 619: In this statement, "derwin(...)" of
type "int", is being converted to "pointer to struct _win_st". (cvtdiftypes)
win = derwin(self->win,nlines,ncols,begin_y,begin_x);
--^
cc: Warning: ./_cursesmodule.c, line 1259: In this statement, "subpad(...)" of
type "int", is being converted to "pointer to struct _win_st". (cvtdiftypes)
win = subpad(self->win, nlines, ncols, begin_y, begin_x);
----^
cc: Warning: ./_cursesmodule.c, line 1488: In this statement, "termname(...)" of
type "int", is being converted to "pointer to const char". (cvtdiftypes)
NoArgReturnStringFunction(termname)
^
(more elided)

and

cc: Warning: ./_cursesmodule.c, line 305: The scalar variable "arg1" is fetched
but not initialized. And there may be other such fetches of this variable that
have not been reported in this compilation. (uninit1)
Window_NoArg2TupleReturnFunction(getparyx, int, "(ii)")
^
cc: Warning: ./_cursesmodule.c, line 305: The scalar variable "arg2" is fetched
but not initialized. And there may be other such fetches of this variable that
have not been reported in this compilation. (uninit1)
Window_NoArg2TupleReturnFunction(getparyx, int, "(ii)")
^
(more elided)

and at link time, fails with:

ld: Unresolved:
getbegyx
getmaxyx
getparyx

I've held off bothering anyone about this, but it begins to look as though no-one else has noticed... My platform? Tru64 Unix, V4.0F (aka OSF1). The recent pow() bug hit this platform, too.
Happy to do any testing... -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From guido@python.org Mon Jan 8 14:27:50 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 09:27:50 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: Your message of "Mon, 08 Jan 2001 01:49:45 EST." <20010108014945.A19516@thyrsus.com> References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> Message-ID: <200101081427.JAA03146@cj20424-a.reston1.va.home.com> > You may be right. Still, this patch solves the immediate problem in a > reasonably clean way, and I urge that it should go in. We can do a > more complete reorganization of the build process later. (I'll help with > that; I'm pretty expert with autoconf and friends.) I expect Andrew's code to go in before 2.1 is released. So I don't see a reason why we should hurry and check in a stop-gap measure. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Jan 8 14:33:09 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 09:33:09 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Mon, 08 Jan 2001 00:26:03 PST." <20010108002603.X17220@lyra.org> References: <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> Message-ID: <200101081433.JAA03185@cj20424-a.reston1.va.home.com> > hehe... adding __exports__ to your module is fine. Adding more crud to > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > Somebody wants to use "from foo import *" on a module not designed for it? > Too bad for them. If you're suggesting __exports__ is to patch over problems > caused by "from foo import *", then I think you're barking up the wrong tree > :-) You haven't been answering many newbie questions lately, have you? 
:-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Mon Jan 8 15:06:28 2001
From: guido@python.org (Guido van Rossum)
Date: Mon, 08 Jan 2001 10:06:28 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: Your message of "Sun, 07 Jan 2001 17:15:27 EST." <20010107171527.A5093@thyrsus.com>
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com>
Message-ID: <200101081506.KAA03404@cj20424-a.reston1.va.home.com>

> > So, I see no reason why the logic in your program couldn't take care
> > of this, which in general is a preferred way to solve a problem than
> > to change the language.
>
> OK, two objections, one practical and one (more important) esthetic:
>
> Practical: I guess I oversimplified the code for expository purposes.
> What's actually going on is that I have two parser classes both based
> on shlex -- they do character-at-a-time input and don't actually
> *have* accessible line buffers.

And what's wrong with always starting the second parser? If the stream was at EOF it will simply process zero lines. Or does your parser have a problem with empty input?

> Esthetic: Yes, I can have the first parser set a flag, or return some
> EOF token. But this seems deeply wrong to me, because EOFness is not
> a property of the parser but of the underlying stream object. It
> seems to me that my program ought to be able to ask the stream object
> whether it's at EOF rather than carrying its own flag for that state.

Eric, before we go further, can you give an exact definition of EOFness to me?

> In Python as it is, there's no clean way to do this. I'd have to do a
> nonzero-length read to test it (I failed to check the right alternate
> case before when I tried zero-length). That's really broken. What if the
> neither the underlying stream nor the parser supports pushback?
> > Do you see now why I think this is a more general issue? No. What's wrong with just setting the parser loose on the input and letting it deal with EOF? In your example, apparently a line containing the word "history" signals that the rest of the file must be parsed by the second parser. What if "history" is the last line of the file? The eof() test can't tell you *that*! > Now, another and more general way to handle this would be to make an > equivalent of the old FIONCLEX ioctl part of Python's standard set of > file object methods -- a way to ask "how many bytes are ready to be > read in this stream? There's no portable way to do that. > Trivial to make it work for plain files, of course. Harder to make it > work usefully for pipes/fifos/sockets/terminals. Having it pass up the > results of the fstat.size field (corrected for the current seek address > if you're reading a plain file) would be a good start. This seems totally the wrong level to solve your problem. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez@zadka.site.co.il Mon Jan 8 23:13:21 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 01:13:21 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101081433.JAA03185@cj20424-a.reston1.va.home.com> References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> Message-ID: <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 09:33:09 -0500, Guido van Rossum wrote: > > hehe... adding __exports__ to your module is fine. Adding more crud to > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > Too bad for them. 
If you're suggesting __exports__ is to patch over problems > > caused by "from foo import *", then I think you're barking up the wrong tree > > :-) > > You haven't been answering many newbie questions lately, have you? :-) Well, I have. And frankly, I think having "from foo import *" issue a warning at 2.1 a *much* better solution. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido@python.org Mon Jan 8 15:15:20 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 10:15:20 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Tue, 09 Jan 2001 01:13:21 +0200." <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> [Greg] > > > hehe... adding __exports__ to your module is fine. Adding more crud to > > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > > Too bad for them. If you're suggesting __exports__ is to patch over problems > > > caused by "from foo import *", then I think you're barking up the wrong tree > > > :-) [Guido] > > You haven't been answering many newbie questions lately, have you? :-) [Moshe] > Well, I have. > And frankly, I think having "from foo import *" issue a warning at 2.1 > a *much* better solution. (1) For what problem? (2) Under exactly what circumstances do you want from foo import * issue a warning? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Mon Jan 8 15:26:21 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 16:26:21 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? 
References: <200101071845.f07IjFi01249@mira.informatik.hu-berlin.de>
Message-ID: <3A59DC1D.29DE500B@lemburg.com>

"Martin v. Loewis" wrote:
>
> Authors of extension packages often find the need to auto-import some
> of their modules. This is often needed for registration, e.g. a codec
> author (like Tamito KAJIYAMA, who wrote the JapaneseCodecs package)
> may need to register a search function with codecs.register. This is
> currently only possible by writing into sitecustomize.py, which must
> be done by the system administrator manually.
>
> To enhance the service of site.py, I've written the patch
>
> http://sourceforge.net/patch/?func=detailpatch&patch_id=103134&group_id=5470
>
> which treats lines in PTH files which start with "import" as
> statements and executes them, instead of appending these lines to
> sys.path.
>
> The patch is relatively small, but since it is an extension: Do I need
> to write a PEP for it?

Just curious: wouldn't this introduce a /tmp-style problem to Python ? The scenario is quite simple: a Python script runs under root. The script could pick up a lingering .pth file (e.g. from /tmp or one of its subdirs -- distutils does this !) and then executes arbitrary code as *root*.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jim@interet.com Mon Jan 8 15:43:05 2001
From: jim@interet.com (James C. Ahlstrom)
Date: Mon, 08 Jan 2001 10:43:05 -0500
Subject: [Python-Dev] Create a synthetic stdout for Windows?
Message-ID: <3A59E009.96922CA5@interet.com>

There are a number of problems which frequently recur on c.l.p that can serve as a source of Python improvement ideas. On December 30, 2000 gerson.kurz@t-online.de (Gerson Kurz) writes:

If I embed Python in a Win32 console application (using Demo\embed.c), everything works fine.
If I take the very same piece of code and put it in a Win32 Windows application (not MFC, just a plain WinMain()) I see no output (and more importantly so, no errors), because the application does not have a stdout/stderr set up.

This is well known. Windows developers must replace sys.stdout and sys.stderr with alternative mechanisms. Unfortunately this solution does not completely work because errors can occur before sys.stdout is replaced.

I propose patching pythonw.exe (WinMain.c) and adding a new module to fix this so it Just Works. The patch is completely Windows specific. I am not sure if this constitutes a PEP, but would like everyone's feedback anyway.

Design Requirements

1) "pythonw.exe myfile.py" will give the usual error message if myfile.py does not exist.
2) "pythonw.exe myfile.py" will give the usual traceback for a syntax error in myfile.py.
3) python.exe will provide a useful C-language stdout/stderr so the user does not have to replace sys.stdout/err herself.
4) None of the above will interfere with the user's replacement of sys.stdout/err for her own purposes.

Description of Patch

A new module winstdoutmodule.c (138 lines) is included in Windows builds. It contains a C entry point PyWin_StdoutReplace() which creates a valid C stdout/err, and code to display output in a popup dialog box. There is a Python entry point winstdout.print() to display output, but it is only used for special purposes, and the typical user will never import winstdout.

The file WinMain.c calls PyWin_StdoutReplace() before it calls Py_Main(), and PyWin_StdoutPrint() afterwards. This is meant to display startup error messages. Normally, any available output is displayed when the system is idle.

Technical Details

Some experimentation (as opposed to documentation) shows that Win32 programs have a valid FILE * stdout, but fileno(stdout) gives INVALID_HANDLE_VALUE; the FILE * has an invalid OS file object. It is tempting to hack the FILE structure directly.
But it is more prudent to use the only documented way to replace stdout, namely the standard call "freopen()" (also available on Unix). The design uses this call to open a temporary file to append stdout and stderr output. To display output, the file is checked when the system is idle, and MessageBox() is called with the file contents if any.

Status

After a few false starts, I now have working code. Is this a good idea? If so, is the implementation optimal (comments from MarkH especially welcome)?

JimA

From mal@lemburg.com Mon Jan 8 15:52:32 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 08 Jan 2001 16:52:32 +0100
Subject: [Python-Dev] Add __exports__ to modules
References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il>
Message-ID: <3A59E240.7F77790E@lemburg.com>

Moshe Zadka wrote:
>
> On Mon, 08 Jan 2001 09:33:09 -0500, Guido van Rossum wrote:
> > > hehe... adding __exports__ to your module is fine. Adding more crud to
> > > Python, in opposition to the "we're all adults" motto, doesn't seem Right.
> > >
> > > Somebody wants to use "from foo import *" on a module not designed for it?
> > > Too bad for them. If you're suggesting __exports__ is to patch over problems
> > > caused by "from foo import *", then I think you're barking up the wrong tree
> > > :-)
> >
> > You haven't been answering many newbie questions lately, have you? :-)
>
> Well, I have.
> And frankly, I think having "from foo import *" issue a warning at 2.1
> a *much* better solution.

Why raise a warning ? "from xyz import *" is still very useful in interactive sessions and also has some merits when it comes to importing all subpackages of a package (well, at least those listed in __all__).
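For context on the `__all__` remark, here is a minimal sketch of how a star import behaves in present-day Python 3, where `__all__` limits the names copied for modules as well as packages. This is an assumption about modern behaviour, not about the 2.0 interpreter under discussion, and `demo_exports` and `shout` are made-up names.

```python
import sys
import types

# Hypothetical module body with an explicit export list.
source = '''
import string            # internal helper, not meant for export
__all__ = ["shout"]      # the names the author intends to export

def shout(text):
    return text.upper() + "!"
'''

# Build the module in memory and register it so "import" can find it.
mod = types.ModuleType("demo_exports")
exec(source, mod.__dict__)
sys.modules["demo_exports"] = mod

# Run the star import in a fresh namespace and inspect what arrived.
namespace = {}
exec("from demo_exports import *", namespace)

print("shout" in namespace)    # True
print("string" in namespace)   # False
```

Without the `__all__` line, the same star import would also copy `string` into the importing namespace, since it has no leading underscore.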
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From barry@digicool.com Mon Jan 8 15:54:10 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 8 Jan 2001 10:54:10 -0500 Subject: [Python-Dev] Add __exports__ to modules References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> Message-ID: <14937.58018.792925.31985@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> it would *totally* stop any hope in the future of module level MZ> __getattr__ (or at least complicate the semantics). I think MZ> Alex M. is thinking of a PEP, but he's taking his time, since MZ> no PEPs can be considered until 2.1 is out. Given the current discussion, I'm now -1 on __exports__ unless a PEP is written. I think enough issues and interactions have been brought up that a PEP is warranted first. -Barry From moshez@zadka.site.co.il Tue Jan 9 00:03:00 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 02:03:00 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com>, <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 10:15:20 -0500, Guido van Rossum wrote: > (1) For what problem? Users seeing things they didn't expect in their modules. > (2) Under exactly what circumstances do you want from foo import * > issue a warning? All. If you want to be less extreme, don't warn if the module defines a __from_star_ok__ But in any case, I'm done with this thread. 
We'll probably won't manage to convince each other. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido@python.org Mon Jan 8 16:04:58 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 11:04:58 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Mon, 08 Jan 2001 10:54:10 EST." <14937.58018.792925.31985@anthem.wooz.org> References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <14937.58018.792925.31985@anthem.wooz.org> Message-ID: <200101081604.LAA04464@cj20424-a.reston1.va.home.com> > Given the current discussion, I'm now -1 on __exports__ unless a PEP > is written. I think enough issues and interactions have been brought > up that a PEP is warranted first. I have to agree. I am no longer championing this patch. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Mon Jan 8 16:27:17 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 8 Jan 2001 10:27:17 -0600 (CST) Subject: [Python-Dev] inspect.py In-Reply-To: References: Message-ID: <14937.60005.951163.80255@beluga.mojam.com> Ping> Sorry to bother you if you're busy -- i haven't seen any responses Ping> about inspect.py for a few days and wanted to know what your Ping> reactions were. Fiddling code bits is not the sort of stuff I do very often, but every time I do I wind up having to reacquaint myself with all sorts of object details that slip out of my brain shortly after the latest need is gone. Having a module that hides the details seems like a good idea to me. +1. I vote it go into 2.1 assuming a bit for the library reference can be written in time. 
Skip From akuchlin@mems-exchange.org Mon Jan 8 16:31:09 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:31:09 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <200101081427.JAA03146@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 09:27:50AM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> Message-ID: <20010108113109.C7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 09:27:50AM -0500, Guido van Rossum wrote: >I expect Andrew's code to go in before 2.1 is released. So I don't >see a reason why we should hurry and check in a stop-gap measure. But it might not; the final version might be unacceptable or run into some intractable problem. Assuming the patch is correct (I haven't looked at it), why not check it in? The work has already been done to write it, after all. --amk From akuchlin@mems-exchange.org Mon Jan 8 16:41:10 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:41:10 -0500 Subject: [Python-Dev] _cursesmodule.c clobbered since Christmas In-Reply-To: <3A59A918.E0D02E0D@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Mon, Jan 08, 2001 at 07:48:40PM +0800 References: <3A59A918.E0D02E0D@per.dem.csiro.au> Message-ID: <20010108114110.D7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 07:48:40PM +0800, Mark Favas wrote: >I last successfully downloaded from CVS, compiled, linked and tested on >Dec. 22 last year. For the last week or so, the current CVS >_cursesmodule.c gives a bunch of compiler warning messages of the form: Hmm... on Dec. 22 there was a sizable change to export a C API from the module; since then there's only been one minor change. Perhaps the last version you compiled successfully was from before I checked in those changes. 
In any case, I'll look into it as soon as my Compaq test drive account is usable and I have access to a Tru64 4.0 machine again. Thanks for the report! Once the PEP 229 changes go in, many more modules will be tried on many more platforms. It might be worth considering setting up a Tinderbox for Python, or at least doing a systematic test on several platforms before releases. --amk From paulp@ActiveState.com Mon Jan 8 16:46:47 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:46:47 -0800 Subject: [Python-Dev] Add __exports__ to modules References: Message-ID: <3A59EEF7.BB4118BD@ActiveState.com> Tim Peters wrote: > > ... It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. I agree strongly. I think that Python people are careless about what their module dictionaries look like. My two main annoyances are modules that export other modules randomly and modules that export huge wacks of constants. > Indeed. But I'm suggesting that the problems that *can* arise from > "import*" illustrate the fundamental silliness of exporting things by > accident. It's come up much more often for me when I'm looking over > someone's shoulder, teaching them how to use dir() in an interactive shell > to answer their own damn questions <0.5 wink>. It's usually the case that > dir(M) shows them something that isn't documented, and over time I am *not* > pleased that "oh, I guess the 'string' in there is just crap" is how they > learn to view it. Screw dir()! Let's talk about important stuff: Komodo. And Idle. And WingIDE. And PythonWorks and PythonWin. :) How are class browsers and "intellisense prompters" supposed to know that it "makes sense" to prompt the user with os.path but not CGIHTTPServer.os.path. Overall, I think Tim is right. We are all adults here and part of being adults is keeping your privates private and your nose clean. 
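
To make the "modules that export other modules" complaint concrete, here is a rough sketch in present-day Python 3. The helper `leaked_modules` and the in-memory module `sample_server` are hypothetical names invented for this example; they stand in for a real module such as CGIHTTPServer.

```python
import types


def leaked_modules(mod):
    """Return the public names in `mod` that are bound to other modules."""
    return sorted(
        name
        for name, value in vars(mod).items()
        if isinstance(value, types.ModuleType) and not name.startswith("_")
    )


# A hypothetical module that imports two helpers and defines one class.
sample = types.ModuleType("sample_server")
exec(
    "import os\n"
    "import string\n"
    "class Handler:\n"
    "    pass\n",
    sample.__dict__,
)

print(leaked_modules(sample))
```

A browser that walks `dir(sample)` would offer `sample.os` and `sample.string` right next to `sample.Handler`, which is exactly the noise being objected to.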
Paul Prescod From paulp@ActiveState.com Mon Jan 8 16:47:39 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:47:39 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <20010107232532.V17220@lyra.org>, <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> Message-ID: <3A59EF2B.792801E5@ActiveState.com> Moshe Zadka wrote: > > ... > Let me "me to" here: > Put another way, what Greg said is just a rephrase of "don't use from > foo import * unless foo's docos say it's OK". That's not the issue. It's not about keeping people out of your module. In fact I would propose that mod.__dict__ should be as loose as ever. It's a user interface issue. If we encourage people to learn about modules in interactive environments like the prompt using dir(), class browsers and IDEs then we need to create modules that are friendly for those users. I think that the current situation is pretty bad that way. what does CGIHTTPServer export BaseHTTPServer? And why is CGIHTTPServer.CGIHTTPServer a class but CGIHTTPServer.BaseHTTPServer is a module? We go to great lengths to make the syntax newbie friendly. I think that we should make similar efforts in a cleanly reflective class library. > Add to that the simple > access control of a leading underscore, and I don't see any place > which needs it. > > Something better to do would be to use > import foo as _foo It's pretty clear that nobody does this now and nobody is going to start doing it in the near future. It's too invasive and it makes the code too ugly. Why obfuscate thousands of lines of code when a simple feature can mitigate that? >... > One more point against __exports__, which I hoped I would not have to > make (but when I'm up against the timbot *and* Guido, I need to pull > out the heavy artillery): it would *totally* stop any hope in the > future of module level __getattr__ (or at least complicate the semantics). > I think Alex M. 
is thinking of a PEP, but he's taking his time, since > no PEPs can be considered until 2.1 is out. __exports__ would merely be considered an implementation detail of the "default __getattr__". Custom __getattr__'s could decide whether to respect it or not. It doesn't complicate anything much. Paul Prescod From nas@arctrix.com Mon Jan 8 09:54:55 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 8 Jan 2001 01:54:55 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A59E009.96922CA5@interet.com>; from jim@interet.com on Mon, Jan 08, 2001 at 10:43:05AM -0500 References: <3A59E009.96922CA5@interet.com> Message-ID: <20010108015455.A15138@glacier.fnational.com> On Mon, Jan 08, 2001 at 10:43:05AM -0500, James C. Ahlstrom wrote: > Is this a good idea? If so, is the implementation optimal > (comments from MarkH especially welcome)? The general idea sounds good to me. Having tracebacks go nowhere when running pythonw is un-Python-like. I don't know enough about MFC, etc. to comment on the specifics of your patch. Neil From akuchlin@mems-exchange.org Mon Jan 8 16:49:13 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:49:13 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59EEF7.BB4118BD@ActiveState.com>; from paulp@ActiveState.com on Mon, Jan 08, 2001 at 08:46:47AM -0800 References: <3A59EEF7.BB4118BD@ActiveState.com> Message-ID: <20010108114913.E7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 08:46:47AM -0800, Paul Prescod wrote: >How are class browsers and "intellisense prompters" supposed to know >that it "makes sense" to prompt the user with os.path but not >CGIHTTPServer.os.path. Could we then simply adopt __exports__ as a convention for such browsers, but with no changes to core Python to support it? Browsers would then follow the algorithm "Use __exports__ if present, dir() if not." 
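
The browser algorithm quoted above ("Use __exports__ if present, dir() if not") might look something like the following sketch, in present-day Python 3. The function name `browsable_names` and the two in-memory modules are assumptions made for illustration; `__exports__` is the attribute name from this thread and is treated here purely as a convention, with no interpreter support assumed.

```python
import types


def browsable_names(obj):
    """Names a browser might show: __exports__ if present, dir() if not."""
    exports = getattr(obj, "__exports__", None)
    if exports is not None:
        return sorted(exports)
    return dir(obj)


# Hypothetical module that declares what it means to export.
tidy = types.ModuleType("tidy")
exec("import os\n__exports__ = ['run']\ndef run():\n    pass\n", tidy.__dict__)

# Hypothetical module that declares nothing, so dir() is the fallback.
plain = types.ModuleType("plain")
exec("import os\ndef run():\n    pass\n", plain.__dict__)

print(browsable_names(tidy))
print("os" in browsable_names(plain))
```

The fallback keeps undecorated modules working exactly as they do today, which is the point of treating the list as a convention rather than a language change.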
--amk From paulp@ActiveState.com Mon Jan 8 16:51:26 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:51:26 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <3A59F00E.53A0A32A@ActiveState.com> Tim Peters wrote: > > .... > > Perl appears to ignore the issue of thread safety here (on Windows and > everywhere else). If you can create a sample program that demonstrates the unsafety I'll anonymously submit it as a bug on our internal system and ensure that the next version of Perl is as slow as Python. :) Seriously: If someone comes at me with Perl-IO-is-way-faster-than-Python-IO, I'd like to know what concretely they've given up in order to achieve that performance. And even just for my own interest I'd like to understand the cost/benefit of stream thread safety. For instance would it make sense to just write a thread-safe wrapper for streams used from multiple threads? Paul Prescod From paulp@ActiveState.com Mon Jan 8 17:01:49 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 09:01:49 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <3A59EEF7.BB4118BD@ActiveState.com> <20010108114913.E7563@kronos.cnri.reston.va.us> Message-ID: <3A59F27D.C27B8CD0@ActiveState.com> Andrew Kuchling wrote: > > ... > > Could we then simply adopt __exports__ as a convention for such > browsers, but with no changes to core Python to support it? Browsers > would then follow the algorithm "Use __exports__ if present, dir() if > not." dir() is one of the "interactive tools" I'd like to work better in the presence of __exports__. On the other hand, dir() works pretty poorly for object instances today so maybe we need something new anyhow. Perhaps attrs()? If there were an "attrs()" and it basically returned __exports__ if it existed and dir() if it didn't, then I would buy it. Graphical apps would just build on attrs(). 
Paul From MarkH@ActiveState.com Mon Jan 8 17:04:31 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Mon, 8 Jan 2001 09:04:31 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A59E009.96922CA5@interet.com> Message-ID: > Is this a good idea? If so, is the implementation optimal Im really on the fence here. Note however that your solution does not solve the original problem. Eg, your example is: > On December 30, 2000 gerson.kurz@t-online.de (Gerson Kurz) writes: > > If I embedd Python in a Win32 console application (using > Demo\embed.c), everything works fine. If I take the very same piece But your solution involves: > The file WinMain.c calls PyWin_StdoutReplace() before it > calls Py_Main(), and PyWin_StdoutPrint() afterwards. This Note that the original problem was _embedding_ Python - thus, you need to patch _their_ WinMain to make it work for them - something you can't do. Even if PyWin_StdoutReplace() was a public symbol so they _could_ call it, I am not convinced they would - it is almost certain they will still need to redirect output to somewhere useful, so why bother redirecting it temporarily just to redirect it for real immediately after? Finally, I am slightly concerned about the possibility of "hanging" certain programs. For example, I believe that DCOM will often invoke a COM server in a different "desktop" than the user (this is also true for Services, but Python services don't use pythonw.exe). Thus, a Python program may end up hanging with a dialog box, but in the context where no user is able to see it. However, this could be addressed by adding a command-line option to prevent this new behaviour kicking in. I would prefer to see a decent API for extracting error and traceback information from Python. On the other hand, I _do_ see the problem for "newbies" trying to use pythonw.exe. 
So - I guess I am saying that I don't see this as optimal, and it doesnt solve the original problem you pointed at - but in the interests of making pythonw.exe seem "less broken" for newbies, I could live with this as long as I could prevent it when necessary. Another option would be to use the Win32 Console APIs, and simply attempt to create a console for the error message. Eg, maybe PyErr_Print() could be changed to check for the existance of a console, and if not found, create it. However, the problem with this approach is that the error message will often be printed just as the process is terminating - meaning you will see a new console with the error message for about 0.025 of a second before it vanishes due to process termination. Any sort of "press any key to terminate" option then leaves us in the same position - if no user can see the message, the process appears hung. Mark. From Andreas Jung Mon Jan 8 17:06:16 2001 From: Andreas Jung (Andreas Jung) Date: Mon, 8 Jan 2001 18:06:16 +0100 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 In-Reply-To: <3A58EFC3.5A722FF0@tismer.com>; from tismer@tismer.com on Mon, Jan 08, 2001 at 12:37:55AM +0200 References: <3A58EFC3.5A722FF0@tismer.com> Message-ID: <20010108180616.A18993@yetix.sz-sb.de> On Mon, Jan 08, 2001 at 12:37:55AM +0200, Christian Tismer wrote: > Dear community, > > I'm happy to announce that > > Stackless Python 2.0 > > is finally ready and available for download. > > Stackless Python for Python 1.5.2+ also got some minor > enhancements. Both versions are available as Win32 > installer files here: Are there patches available against the standard Python 2.0 source code tree ? 
Andreas From tismer@tismer.com Mon Jan 8 16:15:55 2001 From: tismer@tismer.com (Christian Tismer) Date: Mon, 08 Jan 2001 18:15:55 +0200 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 References: <3A58EFC3.5A722FF0@tismer.com> <20010108180616.A18993@yetix.sz-sb.de> Message-ID: <3A59E7BB.6908B7E2@tismer.com> Andreas Jung wrote: > > On Mon, Jan 08, 2001 at 12:37:55AM +0200, Christian Tismer wrote: > > Dear community, > > > > I'm happy to announce that > > > > Stackless Python 2.0 > > > > is finally ready and available for download. > > > > Stackless Python for Python 1.5.2+ also got some minor > > enhancements. Both versions are available as Win32 > > installer files here: > > Are there patches available against the standard Python 2.0 > source code tree ? I had no time yet to put the source trees on the web. Should happen in one or two days. The I will probably not provide patches, hoping that some other Unix people will catch up and provide that part. This worked the same for the 1.5.2 version. The 2.0 port consists of 10 or so files, which can be used as direct replacements for the same files in the 2.0 distro. I think on Unix this is the right way to go. For me it is simpler to have my own litle tree, since I'm working with Windows, and I just have to modify my VC++ project file. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From moshez@zadka.site.co.il Tue Jan 9 01:30:09 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 03:30:09 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59F27D.C27B8CD0@ActiveState.com> References: <3A59F27D.C27B8CD0@ActiveState.com>, <3A59EEF7.BB4118BD@ActiveState.com> <20010108114913.E7563@kronos.cnri.reston.va.us> Message-ID: <20010109013009.37D6DA82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 09:01:49 -0800, Paul Prescod wrote: > dir() is one of the "interactive tools" I'd like to work better in the > presence of __exports__. On the other hand, dir() works pretty poorly > for object instances today so maybe we need something new anyhow. > Perhaps attrs()? > > If there were an "attrs()" and it basically returned __exports__ if it > existed and dir() if it didn't, then I would buy it. Graphical apps > would just build on attrs(). Even better, __exports__ could be what was imported in from foo import *. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From Andreas Jung Mon Jan 8 17:25:36 2001 From: Andreas Jung (Andreas Jung) Date: Mon, 8 Jan 2001 18:25:36 +0100 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 In-Reply-To: <3A59E7BB.6908B7E2@tismer.com>; from tismer@tismer.com on Mon, Jan 08, 2001 at 06:15:55PM +0200 References: <3A58EFC3.5A722FF0@tismer.com> <20010108180616.A18993@yetix.sz-sb.de> <3A59E7BB.6908B7E2@tismer.com> Message-ID: <20010108182536.A20361@yetix.sz-sb.de> On Mon, Jan 08, 2001 at 06:15:55PM +0200, Christian Tismer wrote: > > The 2.0 port consists of 10 or so files, which can be used > as direct replacements for the same files in the 2.0 distro. > I think on Unix this is the right way to go. > For me it is simpler to have my own litle tree, since I'm > working with Windows, and I just have to modify my VC++ > project file. I would prefer a tar.gz archive that contains just the modified files. 
With this approach it is easy possible to extract the archive inside the Python source tree. Andreas From loewis@informatik.hu-berlin.de Mon Jan 8 17:51:28 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 8 Jan 2001 18:51:28 +0100 (MET) Subject: [Python-Dev] Extending startup code: PEP needed? Message-ID: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> > Just curious: wouldn't this introduce a /tmp-style problem to > Python ? I tried, but I could not produce such a problem. > The scenario is quite simple: a Python script runs under root. > The script could pick up a lingering .pth file (e.g. from /tmp > or one of its subdirs -- distutils does this !) and then executes > arbitrary code as *root*. No, Python looks only in a few places for pth file: {,}{,/lib/python/site-packages,/lib/site-python} so it won't pick up pth files in /tmp. Regards, Martin From esr@thyrsus.com Mon Jan 8 18:01:37 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 8 Jan 2001 13:01:37 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101081506.KAA03404@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 10:06:28AM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> Message-ID: <20010108130137.E22834@thyrsus.com> Guido van Rossum : > Eric, before we go furhter, can you give an exact definition of > EOFness to me? A file is at EOF when attempts to read more data from it will fail returning no data. > What's wrong with just setting the parser loose on the input and > letting it deal with EOF? Nothing wrong in theory, but it's a problem in practice. I don't want to import the second parser unless it's actually needed, because it's much larger than the first one. 
> In your example, apparently a line > containing the word "history" signals that the rest of the file must > be parsed by the second parser. What if "history" is the last line of > the file? The eof() test can't tell you *that*! Right. That case never happens. I mean it *really* never happens :-). What we're talking about is a game system. The first parser recognizes a spec language for describing games of a particular class (variants of Diplomacy, if that's meaningful to you). The system keeps logfiles which consist of a a section in the game description language, optionally followed by the token "history" and an order log. The parser for the order log language is a *lot* larger than the one for the description language. This is why I said I don't want the first parser to just call the second. I want to test for EOF to know whether I have to import the second parser at all! Here's the beginning of my problem: the first parser can't export a line buffer, because it doesn't *have* a line buffer. It's a subclass of shlex and does single-character reads. There are two ways I can cope with this. One is to do a (nonzero) length read after the first parser exits; the other is to have the first parser set a state flag controlling whether the second parser loads. This is where it bites that I can't test for EOF with a read(0). The second shlex parser only has token-level pushback! If do a nonzero-length read and I get data, I'm screwed. On the other hand (as I said before) setting a lexer state flag seems wrong, because EOFness is a property of the underlying stream rather than the parser. I'd be duplicating state that exists in the stdio stream structure anyway; it ought to be accessible. > > Now, another and more general way to handle this would be to make an > > equivalent of the old FIONCLEX ioctl part of Python's standard set of > > file object methods -- a way to ask "how many bytes are ready to be > > read in this stream? > > There's no portable way to do that. 
Actually, fstat(2) is portable enough to support a very useful approximation of FIONCLEX. I know, because I tried it. Last night I coded up a "waiting" method for file objects that calls fstat(2) on the associated file descriptor. For a plain file, it then subtracts the result of ftell() from the fstat size field and returns that -- for other files, it simply returns the size field. I then tested this on plain files, FIFOs, and sockets under Linux. It turns out fstat(2) gives useful information in all three cases (a count of characters waiting in the buffer in the latter two). I expected this; it should be true under all current Unixes. fstat(2) does not give useful size-field results for Linux block devices. I didn't test the character (terminal) devices. (I documented my results in Python's Doc/lib/stat.tex, in a patch I have already submitted to SourceForge.) I would be quite surprised if the plain-file case didn't work on Mac and Windows. I would be a little surprised if the socket case failed, because all three probably inherited fstat(2) from the ancestral BSD TCP/IP stack. Just having the plain-file case work would, IMHO, be justification enough for this method. If it turns out to be portable across Mac and Windows sockets as well, *huge* win. Could this be tested by someone with access to Windows and Mac systems? -- Eric S. Raymond An armed society is a polite society. Manners are good when one may have to back up his acts with his life. -- Robert A. Heinlein, "Beyond This Horizon", 1942 From mal@lemburg.com Mon Jan 8 18:10:50 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 19:10:50 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> Message-ID: <3A5A02AA.675A35D1@lemburg.com> Martin von Loewis wrote: > > > Just curious: wouldn't this introduce a /tmp-style problem to > > Python ? > > I tried, but I could not produce such a problem. 
> > > The scenario is quite simple: a Python script runs under root. > > The script could pick up a lingering .pth file (e.g. from /tmp > > or one of its subdirs -- distutils does this !) and then executes > > arbitrary code as *root*. > > No, Python looks only in a few places for pth file: > {,}{,/lib/python/site-packages,/lib/site-python} > > so it won't pick up pth files in /tmp. Hmm, but what if the Python script picks up a site.py which is different from the standard one distributed with Python ? The code adding (and with the patch: executing) the .pth files is defined in site.py and it is rather easy to override this file by adding a modified site.py file to the current working dir... a potential security hole in its own right, I guess :( -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Jan 8 18:30:34 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 13:30:34 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Mon, 08 Jan 2001 13:01:37 EST." <20010108130137.E22834@thyrsus.com> References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> <20010108130137.E22834@thyrsus.com> Message-ID: <200101081830.NAA05301@cj20424-a.reston1.va.home.com> Eric, take a hint. You're not going to get your eof() method no matter what arguments you bring up. But I'll explain it to you again anyway... :-) > Guido van Rossum : > > Eric, before we go furhter, can you give an exact definition of > > EOFness to me? [Eric] > A file is at EOF when attempts to read more data from it will fail > returning no data. I was afraid you would say this. 
That's not a condition that's easy to calculate without doing I/O, *and*
that's not the condition that you are interested in for your problem.

According to your definition, f.eof() should be true in this example:

    f = open("/etc/passwd")
    f.seek(0, 2)            # Seek to end of file
    print f.eof()           # What will this print???
    print `f.readline()`    # Will print ''

But getting the right result here requires a lot of knowledge about how the
file is implemented! While you've explained how this can be implemented on
Unix, it can't be implemented with just the tools that stdio gives us. Going
beyond stdio in order to implement a feature is a grave decision. After all,
Python is portable to many less-than-mainstream operating systems (VxWorks,
OS/9, VMS...).

Now, if this was just a speed hack (like xreadlines) I could accept having
some platform-dependent code, if at least there was a portable way to do it
that was just a bit slower. But here you can't convince me that this can be
done in a portable way, and I don't want to force porters to figure out how
to do this for their platform before their port can work. I also don't want
to make f.eof() a non-portable feature: *if* it is provided, it's too
important for that.

Note that stdio's feof() doesn't have this definition! It is set when the
last *read* (or getc(), etc.) stumbled upon an EOF condition. That's also of
limited value; it's mostly defined so you can distinguish between errors and
EOF when you get a short read. The stdio feof() flag would be false in the
above example.

> > What's wrong with just setting the parser loose on the input and
> > letting it deal with EOF?
>
> Nothing wrong in theory, but it's a problem in practice. I don't want
> to import the second parser unless it's actually needed, because it's much
> larger than the first one.

So be practical and let the first parser set a global flag that tells you
whether it's necessary to load the second one.
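
A minimal sketch of the flag approach suggested above, in present-day Python 3. Everything named here (`DescriptionParser`, `saw_history`, `parse_logfile`, the `load_second_parser` callable) is hypothetical; the flag is kept on the parser instance rather than in a true global, and the actual import of the larger parser is left to the caller-supplied callable.

```python
import io
import shlex


class DescriptionParser(shlex.shlex):
    """Hypothetical first-stage parser that notes whether "history" was seen."""

    def __init__(self, stream):
        super().__init__(stream)
        self.saw_history = False
        self.description_tokens = []

    def parse(self):
        while True:
            token = self.get_token()
            if token == self.eof:       # shlex hands back self.eof at end of input
                break
            if token == "history":      # the order log follows this token
                self.saw_history = True
                break
            self.description_tokens.append(token)
        return self.description_tokens


def parse_logfile(stream, load_second_parser):
    """Run the small parser; call the loader only if "history" turned up."""
    first = DescriptionParser(stream)
    description = first.parse()
    second = load_second_parser() if first.saw_history else None
    return description, second


# Stand-in loader; a real one would import the large order-log parser module.
loads = []
loader = lambda: loads.append("loaded") or "order-log-parser"

print(parse_logfile(io.StringIO("name standard"), loader))
print(parse_logfile(io.StringIO("name standard history move a b"), loader))
```

No end-of-file test on the stream is needed: the decision to load the second parser depends only on what the first parser has already read.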
> > In your example, apparently a line > > containing the word "history" signals that the rest of the file must > > be parsed by the second parser. What if "history" is the last line of > > the file? The eof() test can't tell you *that*! > > Right. That case never happens. I mean it *really* never happens :-). > > What we're talking about is a game system. The first parser recognizes > a spec language for describing games of a particular class (variants of > Diplomacy, if that's meaningful to you). The system keeps logfiles which > consist of a a section in the game description language, optionally > followed by the token "history" and an order log. > > The parser for the order log language is a *lot* larger than the one > for the description language. This is why I said I don't want the > first parser to just call the second. I want to test for EOF to > know whether I have to import the second parser at all! > > Here's the beginning of my problem: the first parser can't export a line > buffer, because it doesn't *have* a line buffer. It's a subclass of > shlex and does single-character reads. > > There are two ways I can cope with this. One is to do a (nonzero) > length read after the first parser exits; the other is to have the > first parser set a state flag controlling whether the second parser > loads. Do the latter. Nothing wrong with it that I can see. > This is where it bites that I can't test for EOF with a read(0). And can you tell me a system where you *can* test for EOF with a read(0)? I've never heard of such a thing. The Unix read() system call has the same properties as Python's f.read(). I'm pretty sure that fread() with a zero count also doesn't give you the information you're after. > The > second shlex parser only has token-level pushback! If do a > nonzero-length read and I get data, I'm screwed. 
On the other hand > (as I said before) setting a lexer state flag seems wrong, because > EOFness is a property of the underlying stream rather than the parser. > I'd be duplicating state that exists in the stdio stream structure > anyway; it ought to be accessible. Bullshit. The EOFness that you're after (according to your own definition) is not the same as the EOFness of the stdio stream. The EOFness in the stdio stream could help you, but Python resets it -- so that making it available wouldn't be as easy as you claim. Anyway, you seem to have a sufficiently vague idea of what "EOFness" means that I don't think providing access to whatever low-level EOFness condition might exist would do you much good. > > > Now, another and more general way to handle this would be to make an > > > equivalent of the old FIONCLEX ioctl part of Python's standard set of > > > file object methods -- a way to ask "how many bytes are ready to be > > > read in this stream? > > > > There's no portable way to do that. > > Actually, fstat(2) is portable enough to support a very useful > approximation of FIONCLEX. I know, because I tried it. > > Last night I coded up a "waiting" method for file objects that calls > fstat(2) on the associated file descriptor. For a plain file, it > then subtracts the result of ftell() from the fstat size field and > returns that -- for other files, it simply returns the size field. > > I then tested this on plain files, FIFOs, and sockets under Linux. It > turns out fstat(2) gives useful information in all three cases (a > count of characters waiting in the buffer in the latter two). I expected > this; it should be true under all current Unixes. > > fstat(2) does not give useful size-field results for Linux block > devices. I didn't test the character (terminal) devices. (I > documented my results in Python's Doc/lib/stat.tex, in a patch I have > already submitted to SourceForge.) 
> > I would be quite surprised if the plain-file case didn't work on Mac > and Windows. I would be a little surprised if the socket case failed, > because all three probably inherited fstat(2) from the ancestral BSD > TCP/IP stack. > > Just having the plain-file case work would, IMHO, be justification > enough for this method. If it turns out to be portable across Mac and > Windows sockets as well, *huge* win. Could this be tested by someone > with access to Windows and Mac systems? I don't see the huge win. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Jan 8 18:33:26 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 13:33:26 -0500 Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: Your message of "Mon, 08 Jan 2001 19:10:50 +0100." <3A5A02AA.675A35D1@lemburg.com> References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> Message-ID: <200101081833.NAA05325@cj20424-a.reston1.va.home.com> Discussions based on Python running as root and picking up untrusted code from $PYTHONPATH are pointless. Of course this is a security hole. If root runs *any* Python script in a way that could pick up even a single untrusted module, there's a security hole. site.py or *.pth files are just a special case of this, so I don't see why this is used as an example. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Mon Jan 8 18:48:40 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 13:48:40 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59EF2B.792801E5@ActiveState.com> Message-ID: [Moshe] > Something better to do would be to use > import foo as _foo [Paul] > It's pretty clear that nobody does this now and nobody is going > to start doing it in the near future. It's too invasive and it > makes the code too ugly. 
Actually, this function is one of my std utilities:

def _pvt_import(globs, modname, *items):
    """globs, modname, *items -> import into globs with leading "_".

    If *items is empty, set globs["_" + modname] to module modname.
    If *items is not empty, import each item similarly but don't
    import the module into globs.
    Leave names that already begin with an underscore as-is.

    # import math as _math
    >>> _pvt_import(globals(), "math")
    >>> round(_math.pi, 0)
    3.0

    # import math.sin as _sin and math.floor as _floor
    >>> _pvt_import(globals(), "math", "sin", "floor")
    >>> _floor(3.14)
    3.0
    """

    mod = __import__(modname, globals())
    if items:
        for name in items:
            xname = name
            if xname[0] != "_":
                xname = "_" + xname
            globs[xname] = getattr(mod, name)
    else:
        xname = modname
        if xname[0] != "_":
            xname = "_" + xname
        globs[xname] = mod

Note that it begins with an underscore because it's *meant* to be exported
<0.5 wink>. That is, the module importing this does

    from utils import _pvt_import

because they don't already have _pvt_import to automate adding the
underscore, and without the underscore almost everyone would accidentally
export "pvt_import" in turn.

IOW,

    import M
    from N import M

not only import M, by default they usually export it too, but the latter is
rarely *intended*. So, over the years, I've gone thru several phases of
naming objects I *intend* to export with a leading underscore. That's the
only way to prevent later imports from exporting by accident.

I don't believe I've distributed any code using _pvt_import, though, because
it fights against the language and expectations. Metaprogramming against
the grain should be a private sin <0.9 wink>.

_metaprogramming-ly y'rs - tim

From mal@lemburg.com Mon Jan 8 18:40:37 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 08 Jan 2001 19:40:37 +0100
Subject: [Python-Dev] Extending startup code: PEP needed?
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> Message-ID: <3A5A09A5.D0DC33A1@lemburg.com> Guido van Rossum wrote: > > Discussions based on Python running as root and picking up untrusted > code from $PYTHONPATH are pointless. Of course this is a security > hole. If root runs *any* Python script in a way that could pick up > even a single untrusted module, there's a security hole. site.py or > *.pth files are just a special case of this, so I don't see why this > is used as an example. Agreed; see my reply to Martin. Still, wouldn't it be wise to add some logic to Python to prevent importing untrusted modules, e.g. by making sys.path read-only and disabling the import hook usage using a command line ? This would at least prevent the most obvious attacks. I wonder how RedHat works around these problems. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim@interet.com Mon Jan 8 19:16:45 2001 From: jim@interet.com (James C. Ahlstrom) Date: Mon, 08 Jan 2001 14:16:45 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? References: Message-ID: <3A5A121D.FDD8C2C1@interet.com> Mark Hammond wrote: > Note that the original problem was _embedding_ Python - thus, you need to > patch _their_ WinMain to make it work for them - something you can't do. Correct, if they don't use pythonw.exe, but use a different main program, the new stdout will not be installed. But then they must have their own main.c, and they can add the C call. > Even if PyWin_StdoutReplace() was a public symbol so they _could_ call it, I Yes, the symbol PyWin_StdoutReplace() is public, and they can call it. 
> am not convinced they would - it is almost certain they will still need to > redirect output to somewhere useful, so why bother redirecting it > temporarily just to redirect it for real immediately after? Redirecting it temporarily is valuable, because if the sys.stdout replacement occurs in (for example) myprog.py, then "pythonw.exe myprog.py" will fail to produce any error messages for a syntax error in myprog.py. Also, I was hoping further sys.stdout redirection would be unnecessary. > Finally, I am slightly concerned about the possibility of "hanging" certain > programs. For example, I believe that DCOM will often invoke a COM server in > a different "desktop" than the user (this is also true for Services, but > Python services don't use pythonw.exe). Thus, a Python program may end up > hanging with a dialog box, but in the context where no user is able to see > it. However, this could be addressed by adding a command-line option to > prevent this new behaviour kicking in. Limiting the code to pythonw.exe instead of trying to install it in python20.dll was supposed to prevent damage to the use of Python in servers. Since pythonw.exe is a Windows (GUI) program, I am assuming there is a screen. The dialog box is started with MessageBox() and a window handle of GetForegroundWindow(). So there doesn't need to be an application window. I have tested it with GUI programs, and it also works when run from a console. Having said that, you may be right that there is some way to hang on a dialog box which can not be seen. It depends on what MessageBox() and GetForegroundWindow() actually do. If it seems that this patch has merit, I would be grateful if you would review the code to look for issues of this type. > I would prefer to see a decent API for extracting error and traceback > information from Python. On the other hand, I _do_ see the problem for > "newbies" trying to use pythonw.exe. 
There could be an API added to the winstdout module such as

    msg = winstdout.GetMessageText()

which would return saved text, control its display etc. But then
the problem remains of actually displaying the messages especially
in the context of tracebacks and errors. And it is probably easier
to redirect sys.stdout so it does what you want rather than use the API.

I do not view winstdout as a "newbie" feature, but rather a generally
useful C-language addition to Python.

> So - I guess I am saying that I don't see this as optimal, and it doesn't
> solve the original problem you pointed at - but in the interests of making
> pythonw.exe seem "less broken" for newbies, I could live with this as long
> as I could prevent it when necessary.

I guess I am saying, perhaps incorrectly, that the mechanism provided
will make further redirection of sys.stdout unnecessary 99% of the time.
Experimentation shows that Python composes tracebacks and error messages
a line or partial line at a time. That is, you can not display each call
to printf(), but must wait until the system is idle to be sure that
multiple calls to printf() are complete. So this forces you to use the
idle processing loop, not rocket science but at least inconvenient. And
the only source of stdout/err is tracebacks, error messages and the
"print" statement. What would you do with these in a Windows program
except display an "OK" dialog box?

If someone out there knows of a different example of sys.stdout
redirection in use in the real world, it would be helpful if they
would describe it. Maybe it could be incorporated.

> Another option would be to use the Win32 Console APIs, and simply attempt to
> create a console for the error message. Eg, maybe PyErr_Print() could be
> changed to check for the existence of a console, and if not found, create
> it.
> However, the problem with this approach is that the error message will
> often be printed just as the process is terminating - meaning you will see a
> new console with the error message for about 0.025 of a second before it
> vanishes due to process termination. Any sort of "press any key to
> terminate" option then leaves us in the same position - if no user can see
> the message, the process appears hung.

Yes, this is a problem with the console API approach. Another is that
popping up a black console for output instead of the usual "OK"
dialog box is unnatural, and will force the user to replace
sys.stdout. I was hoping this C stdout will make this unnecessary.

JimA

From esr@thyrsus.com Mon Jan 8 19:17:50 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Mon, 8 Jan 2001 14:17:50 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: <200101081830.NAA05301@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 01:30:34PM -0500
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> <20010108130137.E22834@thyrsus.com> <200101081830.NAA05301@cj20424-a.reston1.va.home.com>
Message-ID: <20010108141750.C23214@thyrsus.com>

Guido van Rossum :
> [Eric]
> > A file is at EOF when attempts to read more data from it will fail
> > returning no data.
>
> I was afraid you would say this. That's not a condition that's easy
> to calculate without doing I/O, *and* that's not the condition that
> you are interested in for your problem. According to your definition,
> f.eof() should be true in this example:
>
>   f = open("/etc/passwd")
>   f.seek(0, 2)          # Seek to end of file
>   print f.eof()         # What will this print???
>   print `f.readline()`  # Will print ''

I agree that after f.seek(0, 2) f is in an end-of-file condition.
But I think it's precisely the definition that would be useful for my problem. Contrary to what you say, I think my definition of EOF is quite sharp -- a sequential read would return no data. Better to think of what I need as an "is there data waiting?" query. I should have framed it that way, rather than about EOFness, from the beginning. > But getting the right result here requires a lot of knowledge about > how the file is implemented! While you've explained how this can be > implemented on Unix, it can't be implemented with just the tools that > stdio gives us. Granted. However, it looks possible that "is there data waiting" *can* be portably implemented with the help of fstat(2), which by precedent is also part of Python's toolkit. > I also don't want to make f.eof() a non-portable feature: *if* > it is provided, it's too important for that. Agreed. > Note that stdio's feof() doesn't have this definition! It is set when > the last *read* (or getc(), etc.) stumbled upon an EOF condition. > That's also of limited value; it's mostly defined so you can > distinguish between errors and EOF when you get a short read. The > stdio feof() flag would be false in the above example. OK. You're right about that. I should have thought more clearly about the difference between the state of stdio and the state of the underlying file or device. Access to stdio state won't do by itself. > > This is where it bites that I can't test for EOF with a read(0). > > And can you tell me a system where you *can* test for EOF with a > read(0)? I've never heard of such a thing. The Unix read() system > call has the same properties as Python's f.read(). I'm pretty sure > that fread() with a zero count also doesn't give you the information > you're after. I'd have to test -- but what Unix read(2) does in this case isn't really my point. My real point is that I can't probe for whether there's data waiting to be read in what seems like the obvious way. 
I expect Python to compensate for the deficiencies of the underlying C,
not reflect them.

> > Just having the plain-file case work would, IMHO, be justification
> > enough for this method. If it turns out to be portable across Mac and
> > Windows sockets as well, *huge* win. Could this be tested by someone
> > with access to Windows and Mac systems?
>
> I don't see the huge win.

Try "polling after a non-blocking open". A lower-overhead and more
natural way to do it than with a poller object. (This is on my mind
because I used a poller object to query FIFOs just last week.)

The game system I'm working on, BTW, has another point of interest for
this list. It is a rather large and complex suite of C programs that
makes heavy use of dynamic-memory allocation; I am translating to
Python partly in order to avoid chronic misallocation problems (leaks
and wild pointers) and partly because the thing needed to be rewritten
anyway to eliminate global state so I can embed it in a multithreaded
server.

Side-by-side comparison of the original C and its translation should
be quite an interesting educational experience once it's done. That
just might be my next year's paper.
--
Eric S. Raymond

It is the assumption of this book that a work of art is a gift, not a
commodity. Or, to state the modern case with more precision, that works of
art exist simultaneously in two "economies," a market economy and a gift
economy. Only one of these is essential, however: a work of art can
survive without the market, but where there is no gift there is no art.
        -- Lewis Hyde, The Gift: Imagination and the Erotic Life of Property

From guido@python.org Mon Jan 8 19:36:02 2001
From: guido@python.org (Guido van Rossum)
Date: Mon, 08 Jan 2001 14:36:02 -0500
Subject: [Python-Dev] Extending startup code: PEP needed?
In-Reply-To: Your message of "Mon, 08 Jan 2001 19:40:37 +0100."
             <3A5A09A5.D0DC33A1@lemburg.com>
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com>
Message-ID: <200101081936.OAA05440@cj20424-a.reston1.va.home.com>

> Still, wouldn't it be wise to add some logic to Python to prevent
> importing untrusted modules, e.g. by making sys.path read-only and
> disabling the import hook usage using a command line ?
>
> This would at least prevent the most obvious attacks. I wonder how
> RedHat works around these problems.

I don't understand what kind of attacks you are thinking of. What
would making sys.path read-only prevent? You seem to be thinking that
some malicious piece of code could try to subvert you by setting
sys.path. But what you forget is that if this piece of code cannot be
trusted with sys.path, it should not be trusted to run at all!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From loewis@informatik.hu-berlin.de Mon Jan 8 19:45:44 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 8 Jan 2001 20:45:44 +0100 (MET)
Subject: [Python-Dev] Extending startup code: PEP needed?
In-Reply-To: <3A5A02AA.675A35D1@lemburg.com> (mal@lemburg.com)
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com>
Message-ID: <200101081945.UAA12178@pandora.informatik.hu-berlin.de>

> The code adding (and with the patch: executing) the .pth files
> is defined in site.py and it is rather easy to override this
> file by adding a modified site.py file to the current working dir...
> a potential security hole in its own right, I guess :( Indeed - independent of my patch changing the other site.py :-) Regards, Martin From skip@mojam.com (Skip Montanaro) Mon Jan 8 19:49:22 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 8 Jan 2001 13:49:22 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59EF2B.792801E5@ActiveState.com> References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <3A59EF2B.792801E5@ActiveState.com> Message-ID: <14938.6594.44596.509259@beluga.mojam.com> Paul> It's not about keeping people out of your module. In fact I would Paul> propose that mod.__dict__ should be as loose as ever. Okay, how about this as a compromise first step? Allow programmers to put __exports__ lists in their modules but don't do anything with them *except* modify dir() to respect that if it exists? That would pretty up dir() output for newbies, almost certainly not break anything, improve the internal documentation of the modules that use __exports__, and still allow us to move in a more restrictive direction at a later time if we so choose. Skip From moshez@zadka.site.co.il Tue Jan 9 04:04:23 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 06:04:23 +0200 (IST) Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: <3A5A02AA.675A35D1@lemburg.com> References: <3A5A02AA.675A35D1@lemburg.com>, <200101081751.SAA08918@pandora.informatik.hu-berlin.de> Message-ID: <20010109040423.68AA4A82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 19:10:50 +0100, "M.-A. Lemburg" wrote: > Hmm, but what if the Python script picks up a site.py which is > different from the standard one distributed with Python ? Then the site.py can do whatever it wants. No need to go through PTHs -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
From tim.one@home.com Mon Jan 8 19:59:48 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 8 Jan 2001 14:59:48 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: <20010108130137.E22834@thyrsus.com>
Message-ID:

Quickie:

[Guido]
> Eric, before we go further, can you give an exact definition of
> EOFness to me?

[Eric]
> A file is at EOF when attempts to read more data from it will fail
> returning no data.

To be very clear about this, that's not what C's feof() means: in general,
the end-of-file indicator in std C stream input is set only *after* you've
attempted a read that "didn't work". For example,

#include <stdio.h>

void main() {
    FILE* fp = fopen("guts", "wb");
    fputs("abc", fp);
    fclose(fp);
    fp = fopen("guts", "rb");
    for (;;) {
        int c;
        c = getc(fp);
        printf("getc returned %c (%d)\n", c, c);
        printf("At EOF after getc? %d\n", feof(fp));
        if (c == EOF)
            break;
    }
}

Unless your C is broken, feof() will return 0 after getc() returns 'a', and
again after 'b', and again after 'c'. It's not until getc() returns EOF
that feof() first returns a non-zero result.

Then add these two lines after the "for":

    fseek(fp, 0L, SEEK_END);
    printf("after seeking to the end, feof() says %d\n", feof(fp));

Unless your fseek() is non-std, that clears the end-of-file indicator, and
regardless of to where you seek. So the std behavior throughout libc is
much like Python's behavior: there's nothing that can tell you whether
you're at the end of the file, in general, short of trying to read and
failing to get something back.

In your case you seem to *know* that you have a "plain old file", meaning
that its size is well-defined and that ftell() makes sense for it. You also
seem to know that you don't have to worry about anyone else, e.g., appending
to it (or in any other way changing its size, or changing your stream's file
position), while you're mucking with it. So why not just do f.tell() and
compare that to the size yourself?
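
A minimal sketch of that tell()-versus-size comparison, written in present-day Python rather than the Python of the thread. The helper name `at_eof` is hypothetical, and the sketch leans on exactly the assumptions listed above: a plain file, opened in binary mode so that tell() is a byte offset, with nobody else changing its size.

```python
import os


def at_eof(f):
    """Return True if a sequential read on f would return no data.

    Hypothetical helper.  Only meaningful for a regular file opened in
    binary mode whose size is not changing underneath us.
    """
    size = os.fstat(f.fileno()).st_size   # size of the underlying file
    return f.tell() >= size               # position at or past the end?
```

Unlike C's feof(), this reports true immediately after seeking to the end of the file, without needing a failed read first.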
This sounds easy for you to do, but in this particular case you enjoy
the benefits of a world of assumptions that aren't true in general.

> ...
> This is where it bites that I can't test for EOF with a read(0).

You can't in std C using an fread of 0 bytes either -- that has no effect on
the end-of-file indicator. Add

        if (c == 'c') {
            char buf[100];
            size_t i = fread(buf, 1, 0, fp);
            printf("after fread of 0 bytes, feof() says %d\n", feof(fp));
        }

before the "(c == EOF)" test above to try that on your platform.

> ...
> I would be quite surprised if the plain-file case didn't work on Mac
> and Windows.

Don't know about Mac. On Windows everything is grossly complicated because
of line-end translations in text mode. Like the C std says, the only
*portable* thing you can do with an ftell() result for a text file is feed
it back unaltered to fseek(). It so happens that on Windows, using MS's
libc, if f.readline() returns "abc\n" for the first line of a native text
file, f.tell() returns 5, reflecting the actual byte offset in the file
(including the \r that .readline() doesn't show you). So you *can* get away
with comparing f.tell() to the file's size on Windows too (using the MS C
compiler; don't know about others).

the-operational-defn-of-eof-is-the-only-portable-defn-
    there-is-ly y'rs - tim

From moshez@zadka.site.co.il Tue Jan 9 04:08:29 2001
From: moshez@zadka.site.co.il (Moshe Zadka)
Date: Tue, 9 Jan 2001 06:08:29 +0200 (IST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <14938.6594.44596.509259@beluga.mojam.com>
References: <14938.6594.44596.509259@beluga.mojam.com>, <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <3A59EF2B.792801E5@ActiveState.com>
Message-ID: <20010109040829.BDB66A82D@darjeeling.zadka.site.co.il>

[Paul Prescod]
> It's not about keeping people out of your module. In fact I would
> propose that mod.__dict__ should be as loose as ever.

[Skip Montanaro]
> Okay, how about this as a compromise first step? Allow programmers to put
> __exports__ lists in their modules but don't do anything with them *except*
> modify dir() to respect that if it exists? That would pretty up dir()
> output for newbies, almost certainly not break anything, improve the
> internal documentation of the modules that use __exports__, and still allow
> us to move in a more restrictive direction at a later time if we so choose.

I'm +1 on that personally.
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From mal@lemburg.com Mon Jan 8 20:38:00 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 08 Jan 2001 21:38:00 +0100
Subject: [Python-Dev] Extending startup code: PEP needed?
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com>
Message-ID: <3A5A2528.C289BE1D@lemburg.com>

Guido van Rossum wrote:
>
> > Still, wouldn't it be wise to add some logic to Python to prevent
> > importing untrusted modules, e.g. by making sys.path read-only and
> > disabling the import hook usage using a command line ?
> >
> > This would at least prevent the most obvious attacks. I wonder how
> > RedHat works around these problems.
>
> I don't understand what kind of attacks you are thinking of. What
> would making sys.path read-only prevent? You seem to be thinking that
> some malicious piece of code could try to subvert you by setting
> sys.path. But what you forget is that if this piece of code cannot be
> trusted with sys.path, it should not be trusted to run at all!

I was thinking of an attack where knowledge of common temporary execution
locations is used to trick Python into executing untrusted
code -- the untrusted code would only have to be copied to the
known temporary execution directory and then gets executed by
Python next time the program using the temporary location is
invoked.

But you're right: this is possible whether sys.path is writeable or not.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From thomas@xs4all.net Mon Jan 8 20:45:57 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 8 Jan 2001 21:45:57 +0100
Subject: [Python-Dev] autoconfigure patch submitted on SourceForge
In-Reply-To: <200101081427.JAA03146@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 09:27:50AM -0500
References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com>
Message-ID: <20010108214557.H402@xs4all.nl>

On Mon, Jan 08, 2001 at 09:27:50AM -0500, Guido van Rossum wrote:

> > You may be right. Still, this patch solves the immediate problem in a
> > reasonably clean way, and I urge that it should go in. We can do a
> > more complete reorganization of the build process later. (I'll help with
> > that; I'm pretty expert with autoconf and friends.)

> I expect Andrew's code to go in before 2.1 is released. So I don't
> see a reason why we should hurry and check in a stop-gap measure.

Oh, we're gonna distribute binaries of Python 2.0/1.5.2-with-distutils for
every known platform that can run configure ? :) I still think there are
more than enough platforms without Python to warrant using autoconf for
configuring modules. The module list and their demands are stable enough to
make maintenance a fair breeze, IMHO.
-- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From akuchlin@mems-exchange.org Mon Jan 8 21:57:58 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 16:57:58 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108214557.H402@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 08, 2001 at 09:45:57PM +0100 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108214557.H402@xs4all.nl> Message-ID: <20010108165758.B9260@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 09:45:57PM +0100, Thomas Wouters wrote: >every known platform that can run configure ? :) I still think there are >more than enough platforms without Python to warrant using autoconf for >configuring modules. The module list and their demands are stable enough to >make maintenance a fair breeze, IMHO. Umm... the proposed PEP 229 patch would compile a Python binary with sre, posix, and strop statically linked; this minimal Python is then used to run the setup.py script. You shouldn't require a preinstalled Python, though the current version of the patch doesn't meet this requirement yet. --amk From tim.one@home.com Mon Jan 8 20:59:40 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 15:59:40 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: [Tim] > Perl appears to ignore the issue of thread safety here (on Windows and > everywhere else). [Paul Prescod] > If you can create a sample program that demonstrates the unsafety > I'll anonymously submit it as a bug on our internal system I don't want to spend time on that, as I *assume* it's already well-known within the Perl thread community. 
Besides, the last version of Perl I got from ActiveState complains: No threads in this perl at temp.pl line 14 if I try to use Perl threads. That's: > \perl\bin\perl -v This is perl, v5.6.0 built for MSWin32-x86-multi-thread (with 1 registered patch, see perl -V for more detail) Copyright 1987-2000, Larry Wall Binary build 620 provided by ActiveState Tool Corp. http://www.ActiveState.com Built 18:31:05 Oct 31 2000 ... If I can repair that by downloading a more recent release, let me know. > and ensure that the next version of Perl is as slow as Python. :) I don't want to slow them down! To the contrary, now I've got a solid reason for why I keep using Perl for simple high-volume text-crunching jobs . > Seriously: If someone comes at me with Perl-IO-is-way-faster-than- > Python-IO, I'd like to know what concretely they've given up in order > to achieve that performance. My line-at-a-time test case used (rounding to nearest whole integers) 30 seconds in Python and 6 in Perl. The result of testing many changes to Python's implementation was that the excess 24 seconds broke down like so: 17 spent inside internal MS threadsafe getc() lock/unlock routines 5 uncertain, but evidence suggests much of it due to MS malloc/realloc (Perl does its own memory mgmt) 2 for not copying directly out of the platform FILE* implementation struct in a highly optimized loop (like Perl does) My last checkin to fileobject.c reclaimed 17 seconds on Win98SE while remaining threadsafe, via a combination of locking per line instead of per character, and invoking realloc much less often (only for lines exceeding 200 chars). (BTW, I'm still curious to know how that compares to the getc_unlocked hack on a platform other than Windows!) > And even just for my own interest I'd like to understand the cost/ > benefit of stream thread safety. If you're not *using* threads, or not using them to muck with the same stream at the same time, the ratio is infinite. And that's usually the case. 
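
The "locking per line instead of per character" idea mentioned above can be illustrated with a small Python analogy. This is not the fileobject.c change itself, which is C code working against the platform libc; the class name `LockedLineReader` is made up for the sketch, and the underlying stream is assumed to be any object with a readline() method. The only point is that the lock is taken once per line, so its cost is amortized over all the characters in that line.

```python
import threading


class LockedLineReader:
    """Hypothetical wrapper: one lock acquisition per line read.

    Threads sharing one reader each get whole lines, never fragments,
    and the locking overhead is paid per line rather than per character.
    """

    def __init__(self, stream):
        self._stream = stream            # any object with readline()
        self._lock = threading.Lock()

    def readline(self):
        # Hold the lock for the whole line, not for each character.
        with self._lock:
            return self._stream.readline()
```

A program that never shares the stream between threads pays only for an uncontended lock per line, which is the cheap case described above.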

> For instance would it make sense to just write a thread-safe
> wrapper for streams used from multiple threads?

Alas, on Windows you can't pick and choose: you get the threadsafe libc, or
you don't. So long as anyone may want to use threads for any reason
whatsoever, we must link with threadsafe libraries. But, as above, on
Windows we're not paying much for that anymore in this case (unless maybe
the threadsafe MS malloc family is also outrageously slower than its
careless counterpart ...). It does prevent me from pursuing the "optimized
inner loop" business, because MS doesn't expose its locking primitives (so I
can't do in C everything I would need to do to optimize the inner loop while
remaining threadsafe).

there-are-damn-few-pieces-of-libc-we-wouldn't-be-better-off-
    writing-ourselves-but-then-we'd-have-a-much-harder-time-
    playing-with-others'-code-ly y'rs - tim

From akuchlin@mems-exchange.org Mon Jan 8 21:15:34 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 8 Jan 2001 16:15:34 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:59:40PM -0500
References: <3A59F00E.53A0A32A@ActiveState.com>
Message-ID: <20010108161534.A2392@kronos.cnri.reston.va.us>

On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote:
>200 chars). (BTW, I'm still curious to know how that compares to the
>getc_unlocked hack on a platform other than Windows!)

On Solaris and Linux, the results seemed to be lost in the noise.
Repeated runs of filetest.py were sometimes faster than without
USE_MS_GETLINE_HACK, so the variation is probably large enough to
swamp any difference between the two. (Assuming I enabled the getline
hack correctly of course; someone please replicate...)
--amk

Linux: w/o USE_MS_GETLINE_HACK
kronos Python-2.0>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.186  0.190
readlines_sizehint    0.108  0.110
using_fileinput       0.447  0.450
while_readline        0.184  0.180

Linux w/ USE_MS_GETLINE_HACK:
kronos Python-2.0>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.178  0.180
readlines_sizehint    0.108  0.110
using_fileinput       0.434  0.430
while_readline        0.183  0.190

Solaris w/o USE_MS_GETLINE_HACK:
amarok src>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.640  0.630
readlines_sizehint    0.278  0.280
using_fileinput       1.874  1.820
while_readline        0.839  0.840

Solaris w/ USE_MS_GETLINE_HACK:
amarok src>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.569  0.570
readlines_sizehint    0.275  0.280
using_fileinput       1.902  1.900
while_readline        0.769  0.770

From gstein@lyra.org Mon Jan 8 21:29:40 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 13:29:40 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108161534.A2392@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Jan 08, 2001 at 04:15:34PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> <20010108161534.A2392@kronos.cnri.reston.va.us> Message-ID: <20010108132940.G4141@lyra.org> On Mon, Jan 08, 2001 at 04:15:34PM -0500, Andrew Kuchling wrote: > On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: > >200 chars). (BTW, I'm still curious to know how that compares to the > >getc_unlocked hack on a platform other than Windows!) > > On Solaris and Linux, the results seemed to be lost in the noise. Your times are so small... I'd suggest doing a few iterations within filetest.py so your margin of error isn't so noticeable. Cheers, -g >...
> Linux: w/o USE_MS_GETLINE_HACK > kronos Python-2.0>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.186 0.190 > readlines_sizehint 0.108 0.110 > using_fileinput 0.447 0.450 > while_readline 0.184 0.180 > > Linux w/ USE_MS_GETLINE_HACK: > kronos Python-2.0>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.178 0.180 > readlines_sizehint 0.108 0.110 > using_fileinput 0.434 0.430 > while_readline 0.183 0.190 > Solaris w/o USE_MS_GETLINE_HACK: > amarok src>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.640 0.630 > readlines_sizehint 0.278 0.280 > using_fileinput 1.874 1.820 > while_readline 0.839 0.840 > > Solaris w/ USE_MS_GETLINE_HACK: > amarok src>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.569 0.570 > readlines_sizehint 0.275 0.280 > using_fileinput 1.902 1.900 > while_readline 0.769 0.770 -- Greg Stein, http://www.lyra.org/ From thomas@xs4all.net Mon Jan 8 21:59:17 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 22:59:17 +0100 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108165758.B9260@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Jan 08, 2001 at 04:57:58PM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108214557.H402@xs4all.nl> <20010108165758.B9260@kronos.cnri.reston.va.us> Message-ID: <20010108225916.P2467@xs4all.nl> On Mon, Jan 08, 2001 at 04:57:58PM -0500, Andrew Kuchling wrote: > Umm... the proposed PEP 229 patch would compile a Python binary with > sre, posix, and strop statically linked; this minimal Python is then > used to run the setup.py script. You shouldn't require a preinstalled > Python, though the current version of the patch doesn't meet this > requirement yet. Apologies. 
I should've bothered to read the PEP first, but I haven't found the time yet :P I retract all my comments on the subject until I do. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Mon Jan 8 22:08:50 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 23:08:50 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Tue, Jan 09, 2001 at 02:03:00AM +0200 References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com>, <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> <200101081515.KAA03474@cj20424-a.reston1.va.home.com> <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> Message-ID: <20010108230850.Q2467@xs4all.nl> On Tue, Jan 09, 2001 at 02:03:00AM +0200, Moshe Zadka wrote: > > (2) Under exactly what circumstances do you want from foo import * > > issue a warning? > All. > If you want to be less extreme, don't warn if the module defines > a __from_star_ok__ We already have a perfectly acceptable way of turning off warnings in particular circumstances. I'm +1 on warning against using 'from spam import *' by the way, though it would be even better (+2!) if there was a 'import * considered harmful' page/chapter in the documentation somewhere, so we could point to it. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@python.org Mon Jan 8 22:23:02 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 17:23:02 -0500 Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: Your message of "Mon, 08 Jan 2001 21:38:00 +0100." 
<3A5A2528.C289BE1D@lemburg.com> References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> <3A5A2528.C289BE1D@lemburg.com> Message-ID: <200101082223.RAA05858@cj20424-a.reston1.va.home.com> > I was thinking an attack where knowledge of common temporary > execution locations is used to trick Python into executing > untrusted code -- the untrusted code would only have to be > copied to the known temporary execution directory and then > gets executed by Python next time the program using the temporary > location is invoked. When does Python execute code from a predictable common temporary location? When is that likely to be used from a Python script running as root? Note that if you use tempfile.TemporaryFile(), you can create a temporary file that's not subvertible. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Mon Jan 8 22:35:17 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 8 Jan 2001 17:35:17 -0500 (EST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010108230850.Q2467@xs4all.nl> References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> <200101081433.JAA03185@cj20424-a.reston1.va.home.com> <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> <20010108230850.Q2467@xs4all.nl> Message-ID: <14938.16549.944123.917467@cj42289-a.reston1.va.home.com> Thomas Wouters writes: > *' by the way, though it would be even better (+2!) if there was a 'import * > considered harmful' page/chapter in the documentation somewhere, so we could > point to it. Care to write it? -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From MarkH@ActiveState.com Mon Jan 8 23:00:01 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Mon, 8 Jan 2001 15:00:01 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A5A05DA.86B3EB86@interet.com> Message-ID: > Limiting the code to pythonw.exe instead of trying to install > it in python20.dll was supposed to prevent damage to the use > of Python in servers. Since pythonw.exe is a Windows (GUI) program, > I am assuming there is a screen. Sometimes _no_ screen at all is wanted - i.e., no main GUI window, and no console window. pythonw is used in this case. COM uses pythonw.exe in just this way, and when executed by DCOM, it will be executed in a context where the user cannot see any such dialog. However, I would be happy to ensure the correct command-line is used to prevent this behaviour in this case. Indeed, in _every_ case I use pythonw.exe I would disable this - but I accept that other users have simpler requirements. > Having said that, you may be right that there is some way to > hang on a dialog box which can not be seen. It depends on what > MessageBox() and GetForegroundWindow() actually do. If it seems > that this patch has merit, I would be grateful if you would review > the code to look for issues of this type. There will be no issues in the code - it is just that Win2k will execute in a different "workspace" (I think that is the term). This is identical to the problem of a service attempting to display a messagebox - the code is perfect and works perfectly - just in a context where no one can see it, or dismiss it. > > I would prefer to see a decent API for extracting error and traceback > > information from Python. On the other hand, I _do_ see the problem for > > "newbies" trying to use pythonw.exe. > > There could be an API added to the winstdout module such as > msg = winstdout.GetMessageText() > which would return saved text, control its display etc.
I was thinking more of a "Py_GetTraceback()", which would return a complete exception string. Thus, embedders could write code similar to:

    whatever = Py_BuildValue(...);
    ret = PyObject_Call(foo, whatever);
    ...
    if (!ok) {
        char *text = Py_GetTraceback();
        MsgBox(text);
    }

Thus, with only a small amount of work, they have _complete_ control over the output. However, I agree this doesn't really solve pythonw.exe's problems. > I do not view winstdout as a "newbie" feature, but rather a > generally useful C-language addition to Python. Hrm. I don't believe a commercial app, for example, would find this suitable - they would roll their own solution. Hence I see this purely for newbie users. Advanced users have complete control now - a simple try/except block around their main code, and you are pretty good. A builtin module for displaying a messagebox is as robust as an experienced user needs to emulate this, IMO. > I guess I am saying, perhaps incorrectly, that the mechanism provided > will make further redirection of sys.stdout unnecessary 99% of the > time. Yes, I disagree here. IMO it is no good for a commercial, real app. As I said, I see this as a feature so the newbie will not believe pythonw.exe is broken. Advanced users can already do similar things themselves. > Experimentation shows that Python composes tracebacks and > error messages a line or partial line at a time. That is, you can > not display each call to printf(), but must wait until the system is > idle to be sure that multiple calls to printf() are complete. So this > forces you to use the idle processing loop, not rocket science but > at least inconvenient. What "idle processing loop"? > And the only source of stdout/err is tracebacks, > error messages and the "print" statement. What would you do with > these in a Windows program except display an "OK" dialog box? Log the error to a file, and display a "friendly" dialog - possibly offering to automatically submit a support request/bug report.
The casual user is going to be _very_ scared by a Python traceback. This is a sin of a similar magnitude to those crappy applications with unhandled VB exceptions. IMO, nothing looks more unprofessional than an app that displays an internal VB error message. Python is no different IMO. For real applications, there is a good chance that the majority of your users have never heard of Python. Thus, I don't believe your solution suitable for the real, professional, commercial user. However, I agree that your solution does not prevent this user doing the "right thing"... But all this does keep me believing this is a "newbie" helper. > > If someone out there knows of a different example of sys.stdout > redirection in use in the real world, it would be helpful if > they would describe it. Maybe it could be incorporated. Sure. Komodo to a file with a friendly dialog (sometimes ;-). Pythonwin actually attempts a few things first - eg, not every exception Pythonwin causes at startup should be logged. Python services write unhandled errors to the event log. I don't believe I have worked on 2 projects with the same requirement here!!! Mark. From nas@arctrix.com Mon Jan 8 16:22:10 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 8 Jan 2001 08:22:10 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:59:40PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: <20010108082210.A16149@glacier.fnational.com> On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: > My line-at-a-time test case used (rounding to nearest whole integers) 30 > seconds in Python and 6 in Perl.
The result of testing many changes to > Python's implementation was that the excess 24 seconds broke down like so: > > 17 spent inside internal MS threadsafe getc() lock/unlock > routines > 5 uncertain, but evidence suggests much of it due to MS > malloc/realloc (Perl does its own memory mgmt) > 2 for not copying directly out of the platform FILE* > implementation struct in a highly optimized loop (like > Perl does) Have you tried pymalloc? Neil From billtut@microsoft.com Tue Jan 9 00:38:14 2001 From: billtut@microsoft.com (Bill Tutt) Date: Mon, 8 Jan 2001 16:38:14 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? Message-ID: <58C671173DB6174A93E9ED88DCB0883D0A6202@red-msg-07.redmond.corp.microsoft.com> > From: Mark Hammond [mailto:MarkH@ActiveState.com] > There will be no issues in the code - it is just that Win2k will execute in > a different "workspace" (I think that is the term). This is identical to > the problem of a service attempting to display a messagebox - the code is > perfect and works perfectly - just in a context where noone can see it, or > dismiss it. The term Mark is looking for here is Windowstation, and it's an NT thing, not just a Win2k thing. Windowstations have been around for ages. Bill From ping@lfw.org Tue Jan 9 01:51:15 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 8 Jan 2001 17:51:15 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <14938.6594.44596.509259@beluga.mojam.com> Message-ID: On Mon, 8 Jan 2001, Skip Montanaro wrote: > Okay, how about this as a compromise first step? Allow programmers to put > __exports__ lists in their modules but don't do anything with them *except* > modify dir() to respect that if it exists? I'd say: Just have dir() and import * pay attention to __exports__. Don't mess with getattr or __dict__. -- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is is only our vanity bleeding. 
To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E. Buchrose From ping@lfw.org Tue Jan 9 02:00:08 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 8 Jan 2001 18:00:08 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59F27D.C27B8CD0@ActiveState.com> Message-ID: On Mon, 8 Jan 2001, Paul Prescod wrote: > dir() is one of the "interactive tools" I'd like to work better in the > presence of __exports__. On the other hand, dir() works pretty poorly > for object instances today so maybe we need something new anyhow. I suggest a built-in function "methods()" that works like this:

    from types import InstanceType, MethodType, BuiltinMethodType

    def methods(obj):
        if type(obj) is InstanceType:
            return methods(obj.__class__)
        results = []
        if hasattr(obj, '__bases__'):
            for base in obj.__bases__:
                results.extend(methods(base))
        results.extend(
            filter(lambda k, o=obj: type(getattr(o, k)) in
                       [MethodType, BuiltinMethodType], dir(obj)))
        return unique(results)

    def unique(seq):
        dict = {}
        for item in seq:
            dict[item] = 1
        results = dict.keys()
        results.sort()
        return results

    >>> import sys
    >>>
    >>> methods(sys.stdin)
    ['close', 'fileno', 'flush', 'isatty', 'read', 'readinto', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
    >>>
    >>> import SocketServer
    >>>
    >>> methods(SocketServer.ForkingTCPServer)
    ['__init__', 'collect_children', 'fileno', 'finish_request', 'get_request', 'handle_error', 'handle_request', 'process_request', 'serve_forever', 'server_activate', 'server_bind', 'verify_request']
    >>>

-- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is only our vanity bleeding. To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E.
Buchrose From gstein@lyra.org Tue Jan 9 02:20:56 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 18:20:56 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.102,2.103 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, Jan 08, 2001 at 06:00:13PM -0800 References: Message-ID: <20010108182056.C4640@lyra.org> On Mon, Jan 08, 2001 at 06:00:13PM -0800, Guido van Rossum wrote: >... > Modified Files: > fileobject.c > Log Message: > Tsk, tsk, tsk. Treat FreeBSD the same as the other BSDs when defining > a fallback for TELL64. Fixes SF Bug #128119. >... > *** fileobject.c 2001/01/08 04:02:07 2.102 > --- fileobject.c 2001/01/09 02:00:11 2.103 > *************** > *** 59,63 **** > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) > /* NOTE: this is only used on older > NetBSD prior to f*o() funcions */ > --- 59,63 ---- > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) > /* NOTE: this is only used on older > NetBSD prior to f*o() funcions */ All of those #ifdefs could be tossed and it would be more robust (long term) if an autoconf macro were used to specify when TELL64 should be defined. [ I've looked thru fileobject.c and am a bit confused: the conditions for defining TELL64 do not match the conditions for *using* it. that would seem to imply a semantic error somewhere and/or a potential gotcha when they get skewed (like I assume what happened to FreeBSD). simplifying with an autoconf macro may help to rationalize it. 
] Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one@home.com Tue Jan 9 04:29:02 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:29:02 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108161534.A2392@kronos.cnri.reston.va.us> Message-ID: [Andrew Kuchling] I'll chop everything except while_readline (which is most affected by this stuff): > Linux: w/o USE_MS_GETLINE_HACK > while_readline 0.184 0.180 > > Linux w/ USE_MS_GETLINE_HACK: > while_readline 0.183 0.190 > > Solaris w/o USE_MS_GETLINE_HACK: > while_readline 0.839 0.840 > > Solaris w/ USE_MS_GETLINE_HACK: > while_readline 0.769 0.770 So it's probably a wash. In that case, do we want to maintain two hacks for this? I can't use the FLOCKFILE/etc approach on Windows, while "the Windows" approach probably works everywhere (although its speed relies on the platform factoring out at least the locking/unlocking in fgets). Both methods lack a refinement I would like to see, but can't achieve in "the Windows way": ensure that consistency is on no worse than a per-line basis. Right now, both methods lock/unlock the file only for the extent of the current buffer size, so that two threads *can* get back different interleaved pieces of a single long line. 
Like so:

    import thread

    def read(f):
        x = f.readline()
        print "thread saw " + `len(x)` + " chars"
        m.release()

    f = open("ga", "w")  # a file with one long line
    f.write("x" * 100000 + "\n")
    f.close()

    m = thread.allocate_lock()
    for i in range(10):
        print i
        f = open("ga", "r")
        m.acquire()
        thread.start_new_thread(read, (f,))
        x = f.readline()
        print "main saw " + `len(x)` + " chars"
        m.acquire(); m.release()
        f.close()

Here's a typical run on Windows (current CVS Python):

    0
    main saw 95439 chars
    thread saw 4562 chars
    1
    main saw 97941 chars
    thread saw 2060 chars
    2
    thread saw 43801 chars
    main saw 56200 chars
    3
    thread saw 8011 chars
    main saw 91990 chars
    4
    main saw 46546 chars
    thread saw 53455 chars
    5
    thread saw 53125 chars
    main saw 46876 chars
    6
    main saw 98638 chars
    thread saw 1363 chars
    7
    main saw 72121 chars
    thread saw 27880 chars
    8
    thread saw 70031 chars
    main saw 29970 chars
    9
    thread saw 27555 chars
    main saw 72446 chars

So, yes, it's threadsafe now: between them, the threads always see a grand total of 100001 characters. But what friggin' good is that ? If, e.g., Guido wants multiple threads to chew over his giant logfile, there's no guarantee that .readline() ever returns an actual line from the file. Not that Python 2.0 was any better in this respect ... From tim.one@home.com Tue Jan 9 04:48:25 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:48:25 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108082210.A16149@glacier.fnational.com> Message-ID: [Tim] > 5 uncertain, but evidence suggests much of it due to MS > malloc/realloc (Perl does its own memory mgmt) [NeilS] > Have you tried pymalloc? Not recently, and don't expect to find time for it this week. IIRC, Vladimir did get significant speedups-- lo those many years ago! --when he tried it on Windows, though.
Maybe (or maybe not) that was due to exploiting the global lock (i.e., exploiting that pymalloc didn't need to do its own serialization, when called from the Python core). From tim.one@home.com Tue Jan 9 04:52:25 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:52:25 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: [Tim] > ... > Here's a typical run on Windows (current CVS Python): > > 0 > main saw 95439 chars > thread saw 4562 chars > 1 > main saw 97941 chars > thread saw 2060 chars > 2 > thread saw 43801 chars > main saw 56200 chars > 3 > thread saw 8011 chars > main saw 91990 chars > 4 > main saw 46546 chars > thread saw 53455 chars > 5 > thread saw 53125 chars > main saw 46876 chars > 6 > main saw 98638 chars > thread saw 1363 chars > 7 > main saw 72121 chars > thread saw 27880 chars > 8 > thread saw 70031 chars > main saw 29970 chars > 9 > thread saw 27555 chars > main saw 72446 chars Oops! I lied. That was the released 2.0. Current CVS is either better or worse, depending on whether you think "working" by accident more often is a good thing or leads to false confidence : 0 main saw 100001 chars thread saw 0 chars 1 main saw 100001 chars thread saw 0 chars 2 main saw 100001 chars thread saw 0 chars 3 main saw 100001 chars thread saw 0 chars 4 main saw 100001 chars thread saw 0 chars 5 thread saw 25802 chars main saw 74199 chars 6 thread saw 802 chars main saw 99199 chars 7 main saw 100001 chars thread saw 0 chars 8 main saw 100001 chars thread saw 0 chars 9 main saw 100001 chars thread saw 0 chars From mal@lemburg.com Tue Jan 9 07:23:42 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 09 Jan 2001 08:23:42 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? 
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> <3A5A2528.C289BE1D@lemburg.com> <200101082223.RAA05858@cj20424-a.reston1.va.home.com> Message-ID: <3A5ABC7E.E953962B@lemburg.com> Guido van Rossum wrote: > > > I was thinking an attack where knowledge of common temporary > > execution locations is used to trick Python into executing > > untrusted code -- the untrusted code would only have to be > > copied to the known temporary execution directory and then > > gets executed by Python next time the program using the temporary > > location is invoked. > > When does Python execute code from a predictable common temporary > location? When is that likely to be used from a Python script running > as root? > > Note that if you use tempfile.TemporaryFile(), you can create a > temporary file that's not subvertible. It's not Python itself that's running temporary files. Tools like distutils, RPM, etc. tend to run Python code in temporary locations during build stages. That's what I was thinking about. OTOH, root should know where these tools run their code, so I guess it's moot to discuss whose fault this really is, e.g. distutils style distributions should never be unzipped to /tmp for subsequent installation, but nobody will prevent root from doing so.
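A minimal sketch of the two tempfile ideas touched on in this exchange, written in present-day Python 3 rather than the Python 2.0 of the thread. The helper names scratch_roundtrip and run_in_private_dir are hypothetical. tempfile.TemporaryFile() returns an open file object instead of a guessable path name, and tempfile.mkdtemp() creates a fresh directory accessible only to the creating user, which is one way a build tool could avoid working in a predictable location under /tmp.

```python
import os
import shutil
import tempfile


def scratch_roundtrip(payload):
    """Write bytes to an anonymous temporary file and read them back."""
    # The caller only ever sees a file object, never a path name that
    # another user could have guessed or pre-created.
    with tempfile.TemporaryFile() as tmp:
        tmp.write(payload)
        tmp.seek(0)
        return tmp.read()


def run_in_private_dir(action):
    """Call action(path) inside a freshly created private directory."""
    # mkdtemp() picks an unpredictable name and creates the directory
    # readable, writable, and searchable only by the creating user.
    workdir = tempfile.mkdtemp(prefix="build-")
    try:
        return action(workdir)
    finally:
        shutil.rmtree(workdir)  # clean up even if action() raised


if __name__ == "__main__":
    print(scratch_roundtrip(b"unpacked in a private scratch file\n"))
    print(run_in_private_dir(os.path.isdir))
```

Neither helper protects a script that root deliberately unpacks and runs from a shared directory; that remains the operator's call, as the message above notes.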
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one@home.com Tue Jan 9 07:35:09 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 9 Jan 2001 02:35:09 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Are you sure Perl still uses stdio at all? I've got solid answers now, but I'll paraphrase them anonymously to save the bother of untangling multi-person email etiquette snarls:

+ Yes, Perl uses platform stdio. Usually. Yes on Windows anyway.
+ But Perl "cheats" on Windows (well, everywhere it can ...), as I've explained in great detail half a dozen times over the years. No reason to retract any of that.
+ The cheating is not thread-safe.
+ The last stab at threads accessible from Perl was an experiment that got dropped. There are no user-muckable threads in std Perl builds.
+ But there is a notion of threads available at the C level.
+ This latter notion of threads is used to implement Perl's fork() on Windows, so can be exploited to test Windows Perl thread safety without writing a Perl extension module in C.
+ This Perl program (very much like the 2-threaded one I just posted for Python) uses that trick:

-------------------------------------------------------------------
sub counter {
    my $nc = 0;
    while (<FILE>) {
        $nc += length;
    }
    print "num bytes seen = $nc\n";
}

open(FILE, "ga");
binmode FILE;
fork();
&counter();
-------------------------------------------------------------------

Under the covers, that really shares the FILE filehandle on Windows via threads. Running it multiple times yields multiple wild results; the number of bytes seen by parent and child rarely sum to the number of bytes actually in the input file ("ga").
The most common output for me is that one thread sees the entire file, while the other sees "a lot" of it (since the Perl inner loop registerizes its FILE* struct member shadows for as long as possible, that's actually what I expected). So the code is exactly as thread-unsafe as it looked. bosses-demand-answers-but-they-forget-their-questions-ly y'rs - tim From guido@python.org Tue Jan 9 13:41:24 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 09 Jan 2001 08:41:24 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Mon, 08 Jan 2001 23:29:02 EST." References: Message-ID: <200101091341.IAA09132@cj20424-a.reston1.va.home.com> > So it's probably a wash. In that case, do we want to maintain two hacks for > this? I can't use the FLOCKFILE/etc approach on Windows, while "the > Windows" approach probably works everywhere (although its speed relies on > the platform factoring out at least the locking/unlocking in fgets). I'm much more confident about the getc_unlocked() approach than about fgets() -- with the latter we need much more faith in the C library implementers. (E.g. that fgets() never writes beyond the null bytes it promises, and that it locks/unlocks only once.) Also, you're relying on blindingly fast memchr() and memset() implementations. > Both methods lack a refinement I would like to see, but can't achieve in > "the Windows way": ensure that consistency is on no worse than a per-line > basis. [Example omitted] The only portable way to ensure this that I can see, is to have a separate mutex in the Python file object. Since this is hardly a common thing to do, I think it's better to let the application manage that lock if they need it. (Then why are we bothering with flockfile(), you may ask? Because otherwise, accidental multithreaded reading from the same file could cause core dumps.) 
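The "let the application manage that lock" option might look roughly like this in present-day Python 3 (using the threading module rather than the thread module that appears elsewhere in this thread). LockedLineReader and drain are hypothetical names, not anything in the standard library. The only idea shown is that every readline() call on the shared file happens under one mutex, so each caller gets a whole line or nothing.

```python
import io
import threading


class LockedLineReader:
    """Serialize readline() calls on a file object shared by threads."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._lock = threading.Lock()

    def readline(self):
        # Hold the lock for the whole call so that no other thread can
        # take part of the same line.
        with self._lock:
            return self._file.readline()


def drain(reader, sink):
    """Read lines until EOF, appending each complete line to sink."""
    while True:
        line = reader.readline()
        if not line:
            break
        sink.append(line)


if __name__ == "__main__":
    # Illustrative input: an in-memory file with 1000 short lines.
    source = io.StringIO("".join("line %d\n" % i for i in range(1000)))
    reader = LockedLineReader(source)
    seen = []
    workers = [threading.Thread(target=drain, args=(reader, seen))
               for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print("%d complete lines read" % len(seen))
```

The order in which threads receive lines is still unspecified; the wrapper only keeps individual lines intact.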
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Tue Jan 9 15:48:13 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 9 Jan 2001 10:48:13 -0500 Subject: [Python-Dev] Python 2.1 release schedule (PEP 226) In-Reply-To: <200101051529.KAA19100@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 05, 2001 at 10:29:05AM -0500 References: <200101051529.KAA19100@cj20424-a.reston1.va.home.com> Message-ID: <20010109104813.D6203@kronos.cnri.reston.va.us> On Fri, Jan 05, 2001 at 10:29:05AM -0500, Guido van Rossum wrote: > S 222 pep-0222.txt Web Library Enhancements Kuchling > > This is really up to Andrew. It seems he plans to create > new modules, so he won't be introducing incompatibilities in > existing APIs. I don't think PEP 222 will be worked on for 2.1; there have only been a few reactions, and none at all on the python-web-modules mailing list, so I don't think anyone really cares very much at this point. Maybe for 2.2, or maybe I'll just write new classes for Quixote. That leaves PEP 229 as the only PEP I need to work on for 2.1. --amk From tim.one@home.com Tue Jan 9 21:12:42 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 9 Jan 2001 16:12:42 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101091341.IAA09132@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I'm much more confident about the getc_unlocked() approach than about > fgets() -- with the latter we need much more faith in the C library > implementers. (E.g. that fgets() never writes beyond the null bytes > it promises, and that it locks/unlocks only once.) Also, you're > relying on blindingly fast memchr() and memset() implementations. Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a bit quicker on Solaris, despite that it's paying an extra layer of function call per line, to keep it out of get_line proper). That tells me the assumptions are indeed mild. 
The business about not writing beyond the null byte is a concern only I would have raised: the possibility is an aggressively paranoid reading of the std (I do *lots* of things with libc I'm paranoid about <0.9 wink>). If even *Microsoft* didn't blow these things, it's hard to imagine any other vendor exploding ... Still, I'd rather get rid of ms_getline_hack if I could, because the code is so much more complicated. >> Both methods lack a refinement I would like to see, but can't >> achieve in "the Windows way": ensure that consistency is on no >> worse than a per-line basis. [Example omitted] > The only portable way to ensure this that I can see, is to have a > separate mutex in the Python file object. Since this is hardly a > common thing to do, I think it's better to let the application manage > that lock if they need it. Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the file locked until the line was complete, and I wouldn't be opposed to making life saner on platforms that allow it. But there's another problem here: part of the reason we release Python threads around the fgets is in case some other thread is trying to write the data we're trying to read, yes? But since FLOCKFILE is in effect, other threads *trying* to write to the stream we're reading will get blocked anyway. Seems to give us potential for deadlocks. > (Then why are we bothering with flockfile(), you may ask? I wouldn't ask that, no . > Because otherwise, accidental multithreaded reading from the same > file could cause core dumps.) Ugh ... turns out that on my box I can provoke core dumps anyway, with this program. 
Blows up under released 2.0 and CVS Pythons (so it's not due to anything new): import thread def read(f): import time time.sleep(.01) n = 0 while n < 1000000: x = f.readline() n += len(x) print "r", print "read " + `n` m.release() m = thread.allocate_lock() f = open("ga", "w+") print "opened" m.acquire() thread.start_new_thread(read, (f,)) n = 0 x = "x" * 113 + "\n" while n < 1000000: f.write(x) print "w", n += len(x) m.acquire() print "done" Typical run: C:\Python20>\code\python\dist\src\pcbuild\python temp.py opened w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r r w r w r w r w r w r and then it dies in msvcrt.dll with a bad pointer. Also dies under the debugger (yay!) ... always dies like so: + We (Python) call the MS fwrite, from fileobject.c file_write. + MS fwrite succeeds with its _lock_str(stream) call. + MS fwrite then calls MS _fwrite_lk. + MS _fwrite_lk calls memcpy, which blows up for a non-obvious reason. Looks like the stream's _cnt member has gone mildly negative, which _fwrite_lk casts to unsigned and so treats like a giant positive count, and so memcpy eventually runs off the end of the process address space. Only thing I can conclude from this is that MS's internal stream-locking implementation is buggy. At least on W98SE. Other flavors of Windows? Other platforms? Note that I don't claim the program above is *sensible*, just that it shouldn't blow up. 
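A minimal sketch of the "let the application manage that lock" idea quoted above, assuming modern Python 3 and the threading module rather than the Python 2.0 thread module used in this thread. The names LockedFile, write_line, read_line and demo are hypothetical, invented for illustration; nothing here is taken from fileobject.c or from any patch discussed on the list. One mutex serializes every read and write, and each operation seeks explicitly, so the reader and the writer never touch the underlying stream at the same time.

```python
import os
import tempfile
import threading


class LockedFile:
    """Hypothetical wrapper: a single application-level mutex guards all I/O."""

    def __init__(self, path):
        self._f = open(path, "w+b")      # binary mode keeps tell()/seek() simple
        self._lock = threading.Lock()
        self._read_pos = 0               # where the reader left off

    def write_line(self, data):
        with self._lock:
            self._f.seek(0, os.SEEK_END)  # always append
            self._f.write(data)
            self._f.flush()

    def read_line(self):
        with self._lock:
            self._f.seek(self._read_pos)
            line = self._f.readline()     # b"" when nothing new has been written
            self._read_pos = self._f.tell()
            return line

    def close(self):
        with self._lock:
            self._f.close()


def demo(total=20000):
    """One writer and one reader share a file, as in the crashing program."""
    line = b"x" * 113 + b"\n"
    counts = {"read": 0}
    with tempfile.TemporaryDirectory() as d:
        shared = LockedFile(os.path.join(d, "ga"))

        def reader():
            while counts["read"] < total:
                counts["read"] += len(shared.read_line())

        t = threading.Thread(target=reader)
        t.start()
        written = 0
        while written < total:
            shared.write_line(line)
            written += len(line)
        t.join()
        shared.close()
    return written, counts["read"]


if __name__ == "__main__":
    print(demo())
```

Because every write is a whole line made under the lock, the reader only ever sees complete lines, which is the per-line consistency asked for earlier in the thread. The cost is that the two threads never overlap their I/O.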
Alas, short of indeed adding a separate mutex in Python file objects-- or writing our own stdio --I don't believe I can fix this. the-best-thing-to-do-with-threads-is-don't-ly y'rs - tim From fdrake@acm.org Tue Jan 9 22:58:49 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 9 Jan 2001 17:58:49 -0500 (EST) Subject: [Python-Dev] Updated development documentation Message-ID: <14939.38825.218757.535010@cj42289-a.reston1.va.home.com> I've just updated the development version of the documentation, but am not sure the automated notice got sent. This version contains a wide variety of smaller updates, plus added documentation on the fpectl and xreadlines modules. http://python.sourceforge.net/devel-docs/ -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From MarkH@ActiveState.com Wed Jan 10 00:00:03 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Tue, 9 Jan 2001 16:00:03 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: > Only thing I can conclude from this is that MS's internal stream-locking > implementation is buggy. At least on W98SE. Other flavors of Windows? > Other platforms? Same behaviour on Win2k for me. Mark. From tim.one@home.com Wed Jan 10 00:55:11 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 9 Jan 2001 19:55:11 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: Final report (I've spent way more time on this than I can afford already, so it's "final" by defn <0.3 wink>). 
We started here (on my Win98SE box, using Guido's test program):

total 117615824 chars and 3237568 lines
count_chars_lines     14.780 14.772
readlines_sizehint     9.390  9.375
using_fileinput       66.130 66.157
while_readline        30.380 30.337

Here's where we are today:

total 117615824 chars and 3237568 lines
count_chars_lines     14.670 14.667
readlines_sizehint     9.500  9.506
using_fileinput       28.670 28.708
while_readline        13.680 13.676
for_xreadlines         7.630  7.635

Same box, same input file, same test program except for this addition:

def for_xreadlines(fn):
    f = open(fn, MODE)
    for line in xreadlines.xreadlines(f):
        pass
    f.close()

This last is within 25% of Perl "while (<>)" speed, but-- unlike Perl --is thread-safe. Good show!

The other speedups are nothing to snort at either. The strangest thing left to my eye is why xreadlines enjoys a significant advantage over the double-loop buffering method (readlines_sizehint) on my box; reducing the very large (1Mb) buffer in Guido's test program made no material difference to that.

nothing's-ever-finished-but-everything-ends-ly y'rs - tim

From tim.one@home.com Wed Jan 10 05:46:24 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 10 Jan 2001 00:46:24 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To:
Message-ID:

[Tim]
> Only thing I can conclude from this is that MS's internal stream-
> locking implementation is buggy. At least on W98SE. Other flavors
> of Windows? Other platforms?

[Mark Hammond]
> Same behaviour on Win2k for me.

Thanks, Mark! I opened a bug on SF to record more clues:

http://sourceforge.net/bugs/?func=detailbug&bug_id=128210&group_id=5470

I didn't assign it to anyone because-- best I can tell --there's nothing realistic we can do about it. Probably won't happen in practice anyway .
there's-a-reason-thread-problems-pop-up-on-windows-first-but- ms-isn't-it-ly y'rs - tim From billtut@microsoft.com Wed Jan 10 09:10:51 2001 From: billtut@microsoft.com (Bill Tutt) Date: Wed, 10 Jan 2001 01:10:51 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> With a nice simple C test case from Tim, I've submitted this one to internal support. I'll let everybody know what happens when I know more. Bill -----Original Message----- From: Tim Peters [mailto:tim.one@home.com] Sent: Tuesday, January 09, 2001 9:46 PM To: python-dev@python.org Subject: RE: [Python-Dev] xreadlines : readlines :: xrange : range [Tim] > Only thing I can conclude from this is that MS's internal stream- > locking implementation is buggy. At least on W98SE. Other flavors > of Windows? Other platforms? [Mark Hammond] > Same behaviour on Win2k for me. Thanks, Mark! I opened a bug on SF to record more clues: http://sourceforge.net/bugs/?func=detailbug&bug_id=128210&group_id=5470 I didn't assign it to anyone because-- best I can tell --there's nothing realistic we can do about it. Probably won't happen in practice anyway . 
there's-a-reason-thread-problems-pop-up-on-windows-first-but-
ms-isn't-it-ly y'rs - tim

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://www.python.org/mailman/listinfo/python-dev

From m.favas@per.dem.csiro.au Wed Jan 10 11:57:56 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Wed, 10 Jan 2001 19:57:56 +0800
Subject: [Python-Dev] xreadline speed vs readlines_sizehint
Message-ID: <3A5C4E44.23B593E9@per.dem.csiro.au>

Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same behaviour as Tim's WinBox wrt the new xreadline and the double-loop readlines (so it's not just something funny with MS (not that there's not anything funny with MS...)):

total 131426612 chars and 514216 lines
count_chars_lines      5.450  5.066
readlines_sizehint     4.112  4.083
using_fileinput       10.928 10.916
while_readline        11.766 11.733
for_xreadlines         3.569  3.533

--
Mark Favas - m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tismer@tismer.com Wed Jan 10 11:06:42 2001
From: tismer@tismer.com (Christian Tismer)
Date: Wed, 10 Jan 2001 13:06:42 +0200
Subject: [Python-Dev] Add __exports__ to modules
References:
Message-ID: <3A5C4242.E445C3A1@tismer.com>

Ka-Ping Yee wrote:
>
> On Mon, 8 Jan 2001, Skip Montanaro wrote:
> > Okay, how about this as a compromise first step? Allow programmers to put
> > __exports__ lists in their modules but don't do anything with them *except*
> > modify dir() to respect that if it exists?
>
> I'd say: Just have dir() and import * pay attention to __exports__.
> Don't mess with getattr or __dict__.

quadruple-nodd - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today?
http://www.stackless.com From mal@lemburg.com Wed Jan 10 13:21:28 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 10 Jan 2001 14:21:28 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Message-ID: <3A5C61D8.2E5D098C@lemburg.com> Guido van Rossum wrote: > > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). Can't we use the existing attribute __all__ (this is currently only used for packages) for this kind of thing? As others have already remarked: I would rather like to see this attribute being used as the basis for 'from M import *' rather than enforcing the access restrictions like the patch suggests. Access control mechanisms should be treated in different ways such as wrapping objects using access-control proxies (see mx.Proxy for an example of such an implementation) and on-demand only. I wouldn't want to pay the performance hit for each and every lookup in all my Python applications just because someone out there feels that "from M import *" has a meaning in life apart from being useful in interactive sessions to ease typing ;-) > I like it. This has been asked for many times. Does anybody see a > reason why this should *not* be added? > > Tim remarked that introducing this will prompt demands for a similar > feature on classes and instances, where it will be hard to implement > without causing a bit of a slowdown.
It causes a slight slowdown (an > extra dictionary lookup for each use of "M.v") even when it is not > used, but for accessing module variables that's acceptable. I'm not > so sure about instance variable references. Again, I'd rather see these implemented using different techniques which are under programmer control and made explicit and visible in the program flow. Proxies are ideal for these things, since they allow great flexibility while still providing reasonable security at Python level. I have been using the proxy approach for years now and so far with great success. What's even better is that weak references and garbage finalization aids come along with it for free. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Jan 10 15:12:56 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 10:12:56 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 09 Jan 2001 19:55:11 EST." References: Message-ID: <200101101512.KAA26193@cj20424-a.reston1.va.home.com> > The strangest thing left to my eye is why xreadlines enjoys a significant > advantage over the double-loop buffering method (readlines_sizehint) on my > box; reducing the very large (1Mb) buffer in Guido's test program made no > material difference to that. I was baffled at this too (same difference on my box), until I discovered that the buffer size is specified *twice*: once as a default in the arg list of readlines_sizehint(), then *again* in the call to timer() near the bottom of the file. Take the latter one out and the times are comparable, in fact readlines_sizehint() is a few percent quicker. --Guido van Rossum (home page: http://www.python.org/~guido/) From jim@interet.com Wed Jan 10 15:19:01 2001 From: jim@interet.com (James C. 
Ahlstrom) Date: Wed, 10 Jan 2001 10:19:01 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? References: Message-ID: <3A5C7D65.780065C6@interet.com> Mark Hammond wrote: > Sometimes _no_ screen at all is wanted - ie, no main GUI window, and no > console window. pythonw is used in this case. COM uses pythonw.exe in just > this way, and when executed by DCOM, it will be executed in a context where > the user can not see any such dialog. > > However, I would be happy to ensure the correct command-line is used to > prevent this behaviour in this case. > > Indeed, in _every_ case I use pythonw.exe I would disable this - but I > accept that other users have simpler requirements. It would be easier to have a pythonw2.exe where this feature is built in, rather than a command line option. But see below. > > I do not view winstdout as a "newbie" feature, but rather a > > generally useful C-language addition to Python. > > Hrm. I dont believe a commercial app, for example, would find this > suitable - they would roll their own solution. ... > > I guess I am saying, perhaps incorrectly, that the mechanism provided > > will make further redirection of sys.stdout unnecessary 99% of the > > time. > > Yes, I disagree here. IMO it is no good for a commercial, real app. As I ... > > If someone out there knows of a different example of sys.stdout > > redirection in use in the real world, it would be helpful if > > they would describe it. Maybe it could be incorporated. > > Sure. Komodo to a file with a friendly dialog (sometimes ;-). ... > I don't believe I have worked on 2 projects with the same requirement > here!!! Well, that is the problem. Is this feature "generally useful"? I am writing Windows programs in which Python is the "main" and provides the GUI, so I find this useful. And I do show my users tracebacks. But perhaps this is unique to me. I don't see users of wxPython nor tkinter replying "great idea" so maybe they don't use pythonw. 
Absent more support, I don't think this idea has enough merit to justify a patch. JimA From guido@python.org Wed Jan 10 16:39:34 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:39:34 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 01:10:51 PST." <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> References: <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> Message-ID: <200101101639.LAA26776@cj20424-a.reston1.va.home.com> > With a nice simple C test case from Tim, I've submitted this one to internal > support. > I'll let everybody know what happens when I know more. I bet you it's rejected on the basis of "the docs tell you not to mix reading and writing on the same stream without intervening seek or flush." If I were on the support line I would do that. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jan 10 16:38:16 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:38:16 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 09 Jan 2001 16:12:42 EST." References: Message-ID: <200101101638.LAA26759@cj20424-a.reston1.va.home.com> > [Guido] > > I'm much more confident about the getc_unlocked() approach than about > > fgets() -- with the latter we need much more faith in the C library > > implementers. (E.g. that fgets() never writes beyond the null bytes > > it promises, and that it locks/unlocks only once.) Also, you're > > relying on blindingly fast memchr() and memset() implementations. [Tim] > Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a > bit quicker on Solaris, despite that it's paying an extra layer of function > call per line, to keep it out of get_line proper). That tells me the > assumptions are indeed mild. 
The business about not writing beyond the null > byte is a concern only I would have raised: the possibility is an > aggressively paranoid reading of the std (I do *lots* of things with libc > I'm paranoid about <0.9 wink>). If even *Microsoft* didn't blow these > things, it's hard to imagine any other vendor exploding ... > > Still, I'd rather get rid of ms_getline_hack if I could, because the code is > so much more complicated. Which is another argument to prefer the getc_unlocked() code when it works -- it's obviously correct. :-) > >> Both methods lack a refinement I would like to see, but can't > >> achieve in "the Windows way": ensure that consistency is on no > >> worse than a per-line basis. [Example omitted] > > > The only portable way to ensure this that I can see, is to have a > > separate mutex in the Python file object. Since this is hardly a > > common thing to do, I think it's better to let the application manage > > that lock if they need it. > > Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the > file locked until the line was complete, and I wouldn't be opposed to making > life saner on platforms that allow it. Hm... That would be possible, except for one unfortunate detail: _PyString_Resize() may call PyErr_BadInternalCall() which touches thread state. > But there's another problem here: > part of the reason we release Python threads around the fgets is in case > some other thread is trying to write the data we're trying to read, yes? NO, NO NO! Mixing reads and writes on the same stream wasn't what we are locking against at all. (As you've found out, it doesn't even work.) We're only trying to protect against concurrent *reads*. > But since FLOCKFILE is in effect, other threads *trying* to write to the > stream we're reading will get blocked anyway. Seems to give us potential > for deadlocks. Only if they are holding other locks at the same time.
I haven't done a thorough survey of fileobject.c, but I've skimmed it, I believe it's religious about releasing the Global Interpreter Lock around I/O calls. But, of course, 3rd party C code might not be. > > (Then why are we bothering with flockfile(), you may ask? > > I wouldn't ask that, no . > > > Because otherwise, accidental multithreaded reading from the same > > file could cause core dumps.) > > Ugh ... turns out that on my box I can provoke core dumps anyway, with this > program. Blows up under released 2.0 and CVS Pythons (so it's not due to > anything new): Yeah. But this is insane use -- see my comments on SF. It's only worth fixing because it could be used to intentionally crash Python -- but there are easier ways... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Wed Jan 10 16:41:47 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 10 Jan 2001 10:41:47 -0600 (CST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? Message-ID: <14940.37067.893679.750918@beluga.mojam.com> I just noticed that the "Environment" options for Python on the SF site are listed as Console (Text Based), Win32 (MS Windows), X11 Applications Shouldn't something Macintosh-related be in that list as well? Skip From guido@python.org Wed Jan 10 16:53:16 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:53:16 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Wed, 10 Jan 2001 14:21:28 +0100." 
<3A5C61D8.2E5D098C@lemburg.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> Message-ID: <200101101653.LAA28986@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > > > Please have a look at this SF patch: > > > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > > > This implements control over which names defined in a module are > > externally visible: if there's a variable __exports__ in the module, > > it is a list of identifiers, and any access from outside the module to > > names not in the list is disallowed. This affects access using the > > getattr and setattr protocols (which raise AttributeError for > > disallowed names), as well as "from M import v" (which raises > > ImportError). [Marc-Andre] > Can't we use the existing attribute __all__ (this is currently > only used for packages) for this kind of thing. As other have already > remarked: I would rather like to see this attribute being used > as basis for 'from M import *' rather than enforce the access > restrictions like the patch suggests. Yes -- I came up with the same thought. So here's a plan: somebody please submit a patch that does only one thing: from...import * looks for __all__ and if it exists, imports exactly those names. No changes to dir(), or anything. > Access control mechanisms should be treated in different ways > such as wrapping objects using access-control proxies (see mx.Proxy > for an example of such an implementation) and on-demand only. > I wouldn't wan't to pay the performance hit for each and every > lookup in all my Python applications just because someone out > there feels that "from M import *" has a meaning in life > apart from being useful in interactive sessions to ease typing ;-) In the process of looking into Zope internals I've noticed that proxies are indeed very useful! 
I note that the IMPORT opcodes in ceval.c require that the imported module (as found in sys.modules[name] or returned by __import__()) is a real module object. I think this is unnecessary -- at least IMPORT_FROM should work even if the module is a proxy or some other thing (I've been known to smuggle class instances into sys.modules :-) and IMPORT_STAR should work with a non-module at least if it has an __all__ attribute. > > I like it. This has been asked for many times. Does anybody see a > > reason why this should *not* be added? > > > > Tim remarked that introducing this will prompt demands for a similar > > feature on classes and instances, where it will be hard to implement > > without causing a bit of a slowdown. It causes a slight slowdown (an > > extra dictionary lookup for each use of "M.v") even when it is not > > used, but for accessing module variables that's acceptable. I'm not > > so sure about instance variable references. > > Again, I'd rather see these implemented using different > techniques which are under programmer control and made > explicit and visible in the program flow. Proxies are ideal > for these things, since they allow great flexibility while > still providing reasonable security at Python level. > > I have been using the proxy approach for years now and > so far with great success. What's even better is that > weak references and garbage finalization aids come along with > it for free. Agreed. Which reminds me -- would you mind reviewing Fred's new version of PEP 205 (weak refs)? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Jan 10 17:12:20 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Wed, 10 Jan 2001 18:12:20 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <3A5C97F4.945D0C1@lemburg.com> Guido van Rossum wrote: > > > Guido van Rossum wrote: > > > > > > Please have a look at this SF patch: > > > > > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > > > > > This implements control over which names defined in a module are > > > externally visible: if there's a variable __exports__ in the module, > > > it is a list of identifiers, and any access from outside the module to > > > names not in the list is disallowed. This affects access using the > > > getattr and setattr protocols (which raise AttributeError for > > > disallowed names), as well as "from M import v" (which raises > > > ImportError). > > [Marc-Andre] > > Can't we use the existing attribute __all__ (this is currently > > only used for packages) for this kind of thing. As other have already > > remarked: I would rather like to see this attribute being used > > as basis for 'from M import *' rather than enforce the access > > restrictions like the patch suggests. > > Yes -- I came up with the same thought. Sorry, I didn't read the whole thread on the topic. Rereading the above paragraph I guess I should have had some more coffee at the time of writing ;-) > So here's a plan: somebody please submit a patch that does only one > thing: from...import * looks for __all__ and if it exists, imports > exactly those names. No changes to dir(), or anything. +1 -- this won't be me though (at least not this week). > > Access control mechanisms should be treated in different ways > > such as wrapping objects using access-control proxies (see mx.Proxy > > for an example of such an implementation) and on-demand only. 
> > I wouldn't wan't to pay the performance hit for each and every > > lookup in all my Python applications just because someone out > > there feels that "from M import *" has a meaning in life > > apart from being useful in interactive sessions to ease typing ;-) > > In the process of looking into Zope internals I've noticed that > proxies are indeed very useful! > > I note that the IMPORT opcodes in ceval.c require that the imported > module (as found in sys.modules[name] or returned by __import__()) is > a real module object. I think this is unnecessary -- at least > IMPORT_FROM should work even if the module is a proxy or some other > thing (I've been known to smuggle class instances into sys.modules :-) > and IMPORT_STAR should work with a non-module at least if it has an > __all__ attribute. Cool. This could make Python instances usable as "modules" -- with full getattr() hook support ! For IMPORT_STAR I'd suggest first looking for __all__ and then reverting to __dict__.items() in case this fails. BTW, is __dict__ needed by the import mechanism or would the getattr/setattr slots suffice ? And if yes, must it be a real Python dictionary ? > > > I like it. This has been asked for many times. Does anybody see a > > > reason why this should *not* be added? > > > > > > Tim remarked that introducing this will prompt demands for a similar > > > feature on classes and instances, where it will be hard to implement > > > without causing a bit of a slowdown. It causes a slight slowdown (an > > > extra dictionary lookup for each use of "M.v") even when it is not > > > used, but for accessing module variables that's acceptable. I'm not > > > so sure about instance variable references. > > > > Again, I'd rather see these implemented using different > > techniques which are under programmer control and made > > explicit and visible in the program flow. 
Proxies are ideal > > for these things, since they allow great flexibility while > > still providing reasonable security at Python level. > > > > I have been using the proxy approach for years now and > > so far with great success. What's even better is that > > weak references and garbage finalization aids come along with > > it for free. > > Agreed. Which reminds me -- would you mind reviewing Fred's new > version of PEP 205 (weak refs)? I'll have a look at it next week. Is that OK ? > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake@acm.org Wed Jan 10 17:37:58 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 10 Jan 2001 12:37:58 -0500 (EST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: <14940.37067.893679.750918@beluga.mojam.com> References: <14940.37067.893679.750918@beluga.mojam.com> Message-ID: <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > I just noticed that the "Environment" options for Python on the SF site are > listed as > > Console (Text Based), Win32 (MS Windows), X11 Applications > > Shouldn't something Macintosh-related be in that list as well? Are the maintainers of the MacOS port using the SF bug tracker or something else? If they're using it, then by all means we should add it. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From thomas@xs4all.net Wed Jan 10 18:06:06 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 10 Jan 2001 19:06:06 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,NONE,1.1 Setup.dist,1.3,1.4 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Tue, Jan 09, 2001 at 01:46:53PM -0800 References: Message-ID: <20010110190606.T2467@xs4all.nl> On Tue, Jan 09, 2001 at 01:46:53PM -0800, Guido van Rossum wrote: > static void > xreadlines_dealloc(PyXReadlinesObject *op) { > Py_XDECREF(op->file); > Py_XDECREF(op->lines); > PyObject_DEL(op); > } I'm confuzzled. Is this breach of the style guidelines intentional, accidental, or just not cared enough about ? The style isn't even consistent in that single module! > void > initxreadlines(void) > { > PyObject *m; > > m = Py_InitModule("xreadlines", xreadlines_methods); > } -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip@mojam.com (Skip Montanaro) Wed Jan 10 18:11:52 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 10 Jan 2001 12:11:52 -0600 (CST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> References: <14940.37067.893679.750918@beluga.mojam.com> <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> Message-ID: <14940.42472.174920.866172@beluga.mojam.com> Fred> Are the maintainers of the MacOS port using the SF bug tracker or Fred> something else? If they're using it, then by all means we should Fred> add it. Even if they aren't, I think it would be valuable to list. There aren't all that many tools (open source or otherwise) that run on Unix, Windows and Mac and can be used as either a console app or a GUI. I assume the reason Fred asks is that the Environment: list is generated on-the-fly and somehow ties into use of the SF bug tracker. 
Skip From thomas@xs4all.net Wed Jan 10 18:45:44 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 10 Jan 2001 19:45:44 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101101653.LAA28986@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 10, 2001 at 11:53:16AM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <20010110194544.V2467@xs4all.nl> On Wed, Jan 10, 2001 at 11:53:16AM -0500, Guido van Rossum wrote: > I note that the IMPORT opcodes in ceval.c require that the imported > module (as found in sys.modules[name] or returned by __import__()) is > a real module object. I think this is unnecessary -- at least > IMPORT_FROM should work even if the module is a proxy or some other > thing (I've been known to smuggle class instances into sys.modules :-) > and IMPORT_STAR should work with a non-module at least if it has an > __all__ attribute. Hmm.... Have you been sneaking looks at python-list again, Guido ? :-) I'm certain the expanding of IMPORT would make a lot of people very happy. Alex Martelli only just discovered the fact you can populate sys.modules yourself, with non-module objects, and was wondering about its legality and compatibility. I, for one, am very +1 on the idea, also on MAL's idea to do our best in the IMPORT_STAR case (try dict.items(), etc.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
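A small sketch of the two ideas in this sub-thread: a non-module object placed in sys.modules, and "from M import *" honouring __all__. It assumes a modern Python 3 interpreter, where both behaviours are available; it does not show the Python 2.0 ceval.c behaviour that Guido describes as too strict. The names FakeModule and fake_mod are made up for illustration.

```python
import sys


class FakeModule:
    """Hypothetical stand-in: a plain class instance, not a real module object."""

    __all__ = ["greet"]                  # the only name 'import *' should export

    def __init__(self):
        self.greet = lambda name: "hello, " + name
        self.internal = "reachable by attribute, but not exported by import *"


# Smuggle the instance into sys.modules under a made-up name.
sys.modules["fake_mod"] = FakeModule()

# 'from ... import name' only needs attribute lookup on whatever
# sys.modules holds, so it works on the instance.
from fake_mod import greet

# 'import *' is only legal at module level, so run it in a scratch namespace.
namespace = {}
exec("from fake_mod import *", namespace)
exported = sorted(k for k in namespace if not k.startswith("__"))
```

Here the star-import picks up only the names listed in __all__, while fake_mod.internal stays reachable through ordinary attribute access. That matches the "no access restrictions, just control what import * copies" plan agreed on above.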
From tim.one@home.com Wed Jan 10 18:49:40 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 10 Jan 2001 13:49:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101101512.KAA26193@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > The strangest thing left to my eye is why xreadlines enjoys a > significant advantage over the double-loop buffering method > (readlines_sizehint) on my box; reducing the very large > (1Mb) buffer in Guido's test program made no material difference > to that. [Guido] > I was baffled at this too (same difference on my box), until I > discovered that the buffer size is specified *twice*: once as a > default in the arg list of readlines_sizehint(), then *again* in > the call to timer() near the bottom of the file. Bingo! > Take the latter one out and the times are comparable, in fact > readlines_sizehint() is a few percent quicker. They're indistinguishable then on my box (on one run xreadlines is .1 seconds (out of around 7.6 total) quicker, on another readlines_sizehint), *provided* that I specify the same buffer size (8192) that xreadlines uses internally. However, if I even double that, readlines_sizehint is uniformly about 10% slower. It's also a tiny bit slower if I cut the sizehint buffer size to 4096. I'm afraid Mysteries will remain no matter how many person-decades we spend staring at this <0.5 wink> ... From guido@python.org Wed Jan 10 18:50:10 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 13:50:10 -0500 Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: Your message of "Wed, 10 Jan 2001 10:41:47 CST." 
<14940.37067.893679.750918@beluga.mojam.com> References: <14940.37067.893679.750918@beluga.mojam.com> Message-ID: <200101101850.NAA29744@cj20424-a.reston1.va.home.com> > I just noticed that the "Environment" options for Python on the SF site are > listed as > > Console (Text Based), Win32 (MS Windows), X11 Applications > > Shouldn't something Macintosh-related be in that list as well? Yeah, except for two problems: :-) (1) This is a selection from a drop-down menu that doesn't have a Mac option; (2) There are only three slots allowed. So this is the best we can do. --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Wed Jan 10 18:53:32 2001 From: gstein@lyra.org (Greg Stein) Date: Wed, 10 Jan 2001 10:53:32 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010110194544.V2467@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 10, 2001 at 07:45:44PM +0100 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> <20010110194544.V2467@xs4all.nl> Message-ID: <20010110105332.T4640@lyra.org> On Wed, Jan 10, 2001 at 07:45:44PM +0100, Thomas Wouters wrote: > On Wed, Jan 10, 2001 at 11:53:16AM -0500, Guido van Rossum wrote: > > > I note that the IMPORT opcodes in ceval.c require that the imported > > module (as found in sys.modules[name] or returned by __import__()) is > > a real module object. I think this is unnecessary -- at least > > IMPORT_FROM should work even if the module is a proxy or some other > > thing (I've been known to smuggle class instances into sys.modules :-) > > and IMPORT_STAR should work with a non-module at least if it has an > > __all__ attribute. > > Hmm.... Have you been sneaking looks at python-list again, Guido ? :-) I'm > certain the expanding of IMPORT would make a lot of people very happy. 
Alex > Martelli only just discovered the fact you can populate sys.modules > yourself, with non-module objects, and was wondering about its legality and > compatibility. > > I, for one, am very +1 on the idea, also on MAL's idea to do our best in the > IMPORT_STAR case (try dict.items(), etc.) +1 ... I'm always up for removing type restrictions. Did that with the bytecodes in function objects a while back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From MarkH@ActiveState.com Wed Jan 10 18:54:34 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Wed, 10 Jan 2001 10:54:34 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,NONE,1.1 Setup.dist,1.3,1.4 In-Reply-To: <20010110190606.T2467@xs4all.nl> Message-ID: > I'm confuzzled. Is this breach of the style guidelines intentional, > accidental, or just not cared enough about ? I vote the latter! Who-really-cares ly, Mark. From guido@python.org Wed Jan 10 19:00:24 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 14:00:24 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: Your message of "Mon, 08 Jan 2001 11:31:09 EST." <20010108113109.C7563@kronos.cnri.reston.va.us> References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108113109.C7563@kronos.cnri.reston.va.us> Message-ID: <200101101900.OAA30486@cj20424-a.reston1.va.home.com> [me] > >I expect Andrew's code to go in before 2.1 is released. So I don't > >see a reason why we should hurry and check in a stop-gap measure. [Andrew] > But it might not; the final version might be unacceptable or run into > some intractable problem. Assuming the patch is correct (I haven't > looked at it), why not check it in? The work has already been done to > write it, after all. OK, done. 
It was more work than I had hoped for, because Eric apparently (despite having developer privileges!) doesn't use the CVS tree -- he sent in a diff relative to the 2.0 release. I munged it into place, adding the feature that readline, _curses and bsddb are built as shared libraries by default. You'd have to edit Setup.config.in to change this.

Hope this doesn't break anybody's setup. (Skip???)

Question for Eric: do you still want developer privileges? They come with responsibilities too. Please check out the @#$%& CVS tree! :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Jan 10 19:03:07 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jan 2001 14:03:07 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Mon, 01 Jan 2001 19:49:35 CST." <20010101194935.19672@falcon.inetnebr.com>
References: <20010101194935.19672@falcon.inetnebr.com>
Message-ID: <200101101903.OAA30522@cj20424-a.reston1.va.home.com>

Hi Jeff,

I'm glad to tell you that I've accepted your xreadlines patches. It's all checked into the CVS tree now, except for your patch to fileinput.py, where I had already checked in a similar change using readlines(sizehint) directly.

Thanks again for your contribution!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From paulp@ActiveState.com Wed Jan 10 20:08:31 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 10 Jan 2001 12:08:31 -0800
Subject: [Python-Dev] Add __exports__ to modules
References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com>
Message-ID: <3A5CC13F.DFB26A0B@ActiveState.com>

Guido van Rossum wrote:
>
> ...
>
> Yes -- I came up with the same thought.
>
> So here's a plan: somebody please submit a patch that does only one
> thing: from...import * looks for __all__ and if it exists, imports
> exactly those names.
No changes to dir(), or anything. Why? From my point of view, the changes to dir() are much more important. I seldom tell newbies about import * but I always tell them how they can browse objects (especially modules) with dir. If dir() is changed then IDEs and so forth would use that and inherit the right behavior. If the module exporting behavior gets more sophisticated in a future version of Python they will continue to inherit the behavior. Also, dir() could look for an __all__ on all objects including "module proxies", classes and "plain old instances". In other words we can extend the convention to other objects "for free". Paul From tim.one@home.com Wed Jan 10 20:25:24 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 10 Jan 2001 15:25:24 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101101638.LAA26759@cj20424-a.reston1.va.home.com> Message-ID: [Tim] >> Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method >> to keep the file locked until the line was complete, and I >> wouldn't be opposed to making life saner on platforms that allow it. [Guido] > Hm... That would be possible, except for one unfortunate detail: > _PyString_Resize() may call PyErr_BadInternalCall() which touches > thread state. FLOCKFILE/FUNLOCKFILE are independent of Python's notion of thread state. IOW, do FLOCKFILE once before the for(;;), and FUNLOCKFILE once on every *exit* path thereafter. We can block/unblock Python threads as often as desired between those *file*-locking brackets. The only thing the repeated FLOCKFILE/FUNLOCKFILE calls do to my eyes now is to create the *possibility* for multiple readers to get partial lines of the file. > ... > NO, NO NO! Mixing reads and writes on the same stream wasn't what we > are locking against at all. (As you've found out, it doesn't even > work.) On Windows, yes, but that still seems to me to be a bug in MS's code. 
If anyone had reported a core dump on any other platform, I'd be more tractable on this point.

> We're only trying to protect against concurrent *reads*.

As above, I believe that we could do a better job of that, then, on platforms that HAVE_GETC_UNLOCKED, by protecting not only against core dumps but also against .readline() not delivering an intact line from the file.

>> But since FLOCKFILE is in effect, other threads *trying* to write
>> to the stream we're reading will get blocked anyway. Seems to give us
>> potential for deadlocks.

> Only if they are holding other locks at the same time.

I'm not being clear, then. Thread X does f.readline(), on a HAVE_GETC_UNLOCKED platform. get_line allows other threads to run and invokes FLOCKFILE on f->f_fp. get_line's GETC in thread X eventually hits the end of the stdio buffer, and does its platform's version of _filbuf. _filbuf may wait (depending on the nature of the stream) for more input to show up. Simultaneously, thread Y attempts to write some data to f. But the *FLOCKFILE* lock prevents it from doing anything with f. So X is waiting for Y to write data inside platform _filbuf, but Y is waiting for X to release the platform stream lock inside some platform stream-output routine (if I'm being clear now, Python locks have nothing to do with this scenario: it's the platform stream lock).

I think this is purely the user's fault if it happens. Just pointing it out as another insecurity we're probably not able to protect users from.

> ...
> Yeah. But this is insane use -- see my comments on SF. It's only
> worth fixing because it could be used to intentionally crash Python --
> but there are easier ways...

If it's unique to MS (as I suspect), I see no reason to even consider trying to fix it in Python. Unless the Perl Mongers use it to crash Zope .
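[Editorial illustration, not part of the original thread.] The suggestion above, taking the stream lock once and holding it until the line is complete, can be modelled in pure Python. This is only an analogy and not the fileobject.c code: a threading.Lock stands in for the platform's FLOCKFILE lock, a one-character read stands in for getc_unlocked(), and LockedReader and read_all_lines are invented names.

```python
import io
import threading


class LockedReader:
    """Toy stand-in for a stdio stream that carries its own lock."""

    def __init__(self, text):
        self._buf = io.StringIO(text)
        self._lock = threading.Lock()   # plays the role of the stream lock

    def readline(self):
        # Take the stream lock once and keep it until the line is complete,
        # so no other reader can grab part of this line.
        chars = []
        with self._lock:
            while True:
                ch = self._buf.read(1)  # plays the role of getc_unlocked()
                if not ch:
                    break
                chars.append(ch)
                if ch == "\n":
                    break
        return "".join(chars)


def read_all_lines(text, nthreads=4):
    """Let several threads pull lines from one shared reader."""
    reader = LockedReader(text)
    seen = []
    seen_lock = threading.Lock()

    def worker():
        while True:
            line = reader.readline()
            if not line:
                return
            with seen_lock:
                seen.append(line)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Because the lock is held for the whole line, every string the workers collect is an intact line, although the order in which lines are seen is not defined. Locking around each character instead would reopen the partial-line possibility described in the message.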
From cgw@fnal.gov Wed Jan 10 21:57:41 2001 From: cgw@fnal.gov (Charles G Waldman) Date: Wed, 10 Jan 2001 15:57:41 -0600 (CST) Subject: [Python-Dev] Interning filenames of imported modules Message-ID: <14940.56021.646147.770080@buffalo.fnal.gov> I have a question about the following code in compile.c:jcompile (line 3678) filename = PyString_InternFromString(sc.c_filename); name = PyString_InternFromString(sc.c_name); In the case of a long-running server which constantly imports modules, this causes the interned string dict to grow without bound. Is there a strong reason that the filename needs to be interned? How about the module name? How about some way to enforce a limit on the size of the interned strings dictionary? From mwh21@cam.ac.uk Wed Jan 10 22:02:49 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: Wed, 10 Jan 2001 22:02:49 +0000 (GMT) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5CC13F.DFB26A0B@ActiveState.com> Message-ID: On Wed, 10 Jan 2001, Paul Prescod wrote: > Guido van Rossum wrote: > > > > ... > > > > Yes -- I came up with the same thought. > > > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > Why? From my point of view, the changes to dir() are much more > important. I seldom tell newbies about import * but I always tell them > how they can browse objects (especially modules) with dir. If dir() is > changed then IDEs and so forth would use that and inherit the right > behavior. If the module exporting behavior gets more sophisticated in a > future version of Python they will continue to inherit the behavior. Changing dir would also make rlcompleter nicer - it's something of a pain to use with a module that has, eg, "from TERMIOS import *"-ed. This might also make "from ... import *" less of a pariah... Sounds good to me, IOW. Cheers, M. 
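[Editorial illustration, not part of the original thread.] The proposal quoted above has from...import * honour __all__ while leaving dir() alone. A minimal sketch of that split, as it behaves in current CPython 3, follows; the module name demo_exports and the helper star_import_names are invented for the example.

```python
import sys
import types


def star_import_names(module_name):
    """Return the names that a star-import of module_name would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return sorted(namespace)


# Build a throwaway module in memory; "demo_exports" is a made-up name.
demo = types.ModuleType("demo_exports")
demo.public = 1
demo.helper = 2
demo._private = 3
demo.__all__ = ["public"]
sys.modules["demo_exports"] = demo

print(star_import_names("demo_exports"))  # ['public']
print("helper" in dir(demo))              # True: dir() ignores __all__
```

The star-import binds only what __all__ lists, while dir() still reports every attribute, which is the gap the replies above would like dir() to close.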
From tim.one@home.com Wed Jan 10 22:23:14 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 10 Jan 2001 17:23:14 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101101639.LAA26776@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I bet you it's rejected on the basis of "the docs tell you not to mix > reading and writing on the same stream without intervening seek or > flush." If I were on the support line I would do that. So would I if I were a typical first-line support idiot . But the *implementers*-- if they ever see it --should be very keen to figure out how they managed to let the _iobuf get corrupted. *I'm* not mucking with their internals, nor doing wild pointer stores, nor anything else sneaky to subvert their locking protection. I wasn't even trying to break it. The only code reading from or storing into the _iobuf is theirs. They're ordinary stdio calls with ordinary arguments, and if *any* sequence of those can cause internal corruption, they've almost certainly got a problem that will manifest in other situations too. Think like an implementer here <0.5 wink>: they've lost track of how many characters are in the buffer despite a locking scheme whose purpose is to prevent that. If it were my implementation, that would be a top-priority bug no matter how silly the first program I saw that triggered it. 
but-willing-to-let-them-decide-whether-they-care-ly y'rs - tim From skip@mojam.com (Skip Montanaro) Wed Jan 10 22:52:55 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 10 Jan 2001 16:52:55 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5CC13F.DFB26A0B@ActiveState.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> <3A5CC13F.DFB26A0B@ActiveState.com> Message-ID: <14940.59335.723701.574821@beluga.mojam.com> Paul> Also, dir() could look for an __all__ on all objects including Paul> "module proxies", classes and "plain old instances". In other Paul> words we can extend the convention to other objects "for free". The __exports__/dir() patch I submitted will do this if you remove the PyModule_Check that guards it. Skip From tim.one@home.com Wed Jan 10 23:06:05 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 10 Jan 2001 18:06:05 -0500 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: <3A5C4E44.23B593E9@per.dem.csiro.au> Message-ID: [Mark Favas] > Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same > behaviour as Tim's WinBox wrt the new xreadline and the double-loop > readlines (so it's not just something funny with MS (not that there's > not anything funny with MS...)): > > total 131426612 chars and 514216 lines You average over 255 chars/line? Really? What kind of file are you reading? I don't really want to measure the speed of line-at-a-time input on binary files where "line" doesn't actually make sense <0.6 wink>. > count_chars_lines 5.450 5.066 > readlines_sizehint 4.112 4.083 > using_fileinput 10.928 10.916 > while_readline 11.766 11.733 > for_xreadlines 3.569 3.533 Guido pointed out that his readlines_sizehint test forced use of a 1Mb buffer (in the call, not only the default value). 
For whatever reason, that was significantly slower than using an 8Kb sizehint on my box. Another oddity is that while_readline is slower than using_fileinput for you. From that I take it Python config does *not* #define HAVE_GETC_UNLOCKED on your platform. If that's true (or esp. if it's not!), would you do me a favor? Recompile fileobject.c with USE_MS_GETLINE_HACK #define'd, try the timing test again (while_readline is the most interesting test for this), and run the test_bufio.py std test to make sure you're actually getting the right answers. At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack whenever HAVE_GETC_UNLOCKED isn't available. I'd be surprised if ms_getline_hack failed to work correctly on any platform; a bigger unknown (to me) is whether it will yield a speedup. So far it yields a large speedup on Windows, and looks like a speedup equal to getc_unlocked() yields on Linux and Solaris. Info on a platform from Mars (like Tru64 Unix ) would be valuable in deciding whether to boost +0.5. don't-want-your-python-to-run-slower-than-possible-if-possible-ly y'rs - tim From tismer@tismer.com Wed Jan 10 22:38:57 2001 From: tismer@tismer.com (Christian Tismer) Date: Thu, 11 Jan 2001 00:38:57 +0200 Subject: [Python-Dev] [Stackless] ANN: Sourcecode for Stackless Python 2.0 Message-ID: <3A5CE481.24A7656@tismer.com> On Monday, Jan 8th, I spake """ Source code and an update to the website will become available in the next days. """ Now, here it is, together with a slightly updated website, which tries to mention all the people who are helping or sponsoring me (yes, there are sponsors!). If somebody feels ignored by me, let me know. I'm good at making mistakes. Let me also know if there are problems building the code, or if there are *no* problems understanding the code. I don't expect either :-) There is nearly no support for Unix, but Stackless *should* build on Unix as it did before without problems. 
enjoy - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

From nas@arctrix.com Wed Jan 10 18:15:45 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Wed, 10 Jan 2001 10:15:45 -0800
Subject: [Python-Dev] xreadline speed vs readlines_sizehint
In-Reply-To: ; from tim.one@home.com on Wed, Jan 10, 2001 at 06:06:05PM -0500
References: <3A5C4E44.23B593E9@per.dem.csiro.au>
Message-ID: <20010110101545.A21305@glacier.fnational.com>

On Wed, Jan 10, 2001 at 06:06:05PM -0500, Tim Peters wrote:
> At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack
> whenever HAVE_GETC_UNLOCKED isn't available.

Leave it to the timbot to use floating point votes. :) Compare ms_getline_hack to what Perl does in order to speed up IO. I think it's worth maintaining that piece of relatively portable code given the benefit. If the code has to be maintained then it might as well be used. If we find a platform that breaks we can always disable it before the final release.

  Neil

From m.favas@per.dem.csiro.au Thu Jan 11 01:28:59 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Thu, 11 Jan 2001 09:28:59 +0800
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
Message-ID: <3A5D0C5B.162F624A@per.dem.csiro.au>

[Tim produces a warped threader that crashes on MS OS's]
>> ...
>> NO, NO NO! Mixing reads and writes on the same stream wasn't what
>> we are locking against at all. (As you've found out, it doesn't
>> even work.)

>On Windows, yes, but that still seems to me to be a bug in MS's code.
>If anyone had reported a core dump on any other platform, I'd be more
>tractable on this point.

On Tru64 Unix, I get an infinite generator of 'r's (after an initial few 'w's) to the screen (but no crashes).
If I reduce the size of the loop counters from 1000000 to 3000, I get the following output:

opened w w w w w w w w w w w w w w w w w w w w w w w w w w w r read 5114 done

--
Mark Favas - m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From m.favas@per.dem.csiro.au Thu Jan 11 03:40:18 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Thu, 11 Jan 2001 11:40:18 +0800
Subject: [Python-Dev] xreadline speed vs readlines_sizehint
Message-ID: <3A5D2B22.B8028AC@per.dem.csiro.au>

[Tim responded]
>>
>> total 131426612 chars and 514216 lines

>You average over 255 chars/line? Really? What kind of file are you
>reading? I don't really want to measure the speed of line-at-a-time
>input on binary files where "line" doesn't actually make sense <0.6 wink>.

Real-life input, my boy! It's actually a syslog from my mailserver, consisting mainly of sendmail log messages, and I have a current need to process these things (MS Exchange, corrupted database, clobbered backup tapes), so this thread came along at the right time...

>Guido pointed out that his readlines_sizehint test forced use of a 1Mb
>buffer (in the call, not only the default value). For whatever
>reason, that was significantly slower than using an 8Kb sizehint on my
>box.

Removing the buffer size arg in the call to readlines_sizehint results in this (using up-to-the-minute CVS):

total 131426612 chars and 514216 lines
count_chars_lines      4.922  4.916
readlines_sizehint     3.881  3.850
using_fileinput       10.371 10.366
while_readline        10.943 10.916
for_xreadlines         2.990  2.967

and with an 8Kb sizehint:

total 131426612 chars and 514216 lines
count_chars_lines      5.241  5.216
readlines_sizehint     2.917  2.900
using_fileinput       10.351 10.333
while_readline        10.990 10.983
for_xreadlines         2.877  2.867

>Another oddity is that while_readline is slower than using_fileinput
>for you. From that I take it Python config does *not* #define
>
>    HAVE_GETC_UNLOCKED
>
>on your platform.
If that's true

Nope, HAVE_GETC_UNLOCKED is indeed #define'd

>(or esp. if it's not!), would you do me a
>favor? Recompile fileobject.c with
>
>    USE_MS_GETLINE_HACK
>
>#define'd, try the timing test again (while_readline is the most
>interesting test for this), and run the test_bufio.py std test to make
>sure you're actually getting the right answers.

Sure: With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd (although defining the former makes the latter def irrelevant): (test_bufio also OK)

total 131426612 chars and 514216 lines
count_chars_lines      5.056  5.050
readlines_sizehint     3.771  3.667
using_fileinput       11.128 11.116
while_readline         8.287  8.233
for_xreadlines         3.090  3.083

With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just for completeness):

total 131426612 chars and 514216 lines
count_chars_lines      4.916   4.900
readlines_sizehint     3.875   3.867
using_fileinput       14.404  14.383
while_readline       322.728 321.837
for_xreadlines         7.113   7.100

So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement

--
Mark Favas - m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From nas@arctrix.com Wed Jan 10 21:55:23 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Wed, 10 Jan 2001 13:55:23 -0800
Subject: [Python-Dev] xreadline speed vs readlines_sizehint
In-Reply-To: <3A5D2B22.B8028AC@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Thu, Jan 11, 2001 at 11:40:18AM +0800
References: <3A5D2B22.B8028AC@per.dem.csiro.au>
Message-ID: <20010110135523.A21894@glacier.fnational.com>

On Thu, Jan 11, 2001 at 11:40:18AM +0800, Mark Favas wrote:
[with getc_unlocked]
> while_readline 10.943 10.916
[without]
> while_readline 322.728 321.837

Holy crap. Great work team.
  Neil

From tim.one@home.com Thu Jan 11 05:03:51 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 11 Jan 2001 00:03:51 -0500
Subject: [Python-Dev] Baffled on Windows
Message-ID:

In version 2.26 of mmapmodule.c, Guido replaced (as part of a contributed Cygwin patch):

    #ifdef MS_WIN32
    __declspec(dllexport) void
    #endif /* MS_WIN32 */
    #ifdef UNIX
    extern void
    #endif

by:

    DL_EXPORT(void)

before initmmap.

1. Windows Python can no longer import mmap:

    >>> import mmap
    Traceback (most recent call last):
      File "", line 1, in ?
    ImportError: dynamic module does not define init function (initmmap)
    >>>

   This is because GetProcAddress returns NULL.

2. Everything's fine if I revert Guido's change (although I assume that breaks Cygwin then).

3. DL_EXPORT(void) expands to "void".

4. The way mmapmodule.c is coded and built after Guido's change appears to me to be the same as how every other non-builtin module is coded and built on Windows. For example, winsound.c, which uses DL_EXPORT(void) before its initwinsound and where that macro also expands to "void". But importing winsound works fine.

Since what I'm seeing makes no consistent sense, I'm at a loss how to fix it. But then I'm punch-drunk too <0.7 wink>. Any Windows geek got a clue?

From tim.one@home.com Thu Jan 11 06:10:40 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 11 Jan 2001 01:10:40 -0500
Subject: [Python-Dev] RE: xreadline speed vs readlines_sizehint
In-Reply-To: <3A5D2B22.B8028AC@per.dem.csiro.au>
Message-ID:

[Tim, to MarkF]
>> You average over 255 chars/line? [nag, nag, nag]

[Mark Favas]
> Real-life input, my boy! It's actually a syslog from my
> mailserver, consisting mainly of sendmail log messages, and I
> have a current need to process these things (MS Exchange,
> corrupted database, clobbered backup tapes), so this thread
> came along at the right time...

Hmm. I tuned ms_getline_hack for Guido's logfiles, which he said don't often exceed 160 chars/line.
I guess if you're on a 64-bit platform, though, it must take about twice as many chars per line to record a log msg . > ... > Removing the buffer size arg in the call to readlines_sizehint results > in this (using up-to-the-minute CVS): > total 131426612 chars and 514216 lines > count_chars_lines 4.922 4.916 > readlines_sizehint 3.881 3.850 > using_fileinput 10.371 10.366 > while_readline 10.943 10.916 > for_xreadlines 2.990 2.967 > > and with an 8Kb sizehint: > total 131426612 chars and 514216 lines > count_chars_lines 5.241 5.216 > readlines_sizehint 2.917 2.900 > using_fileinput 10.351 10.333 > while_readline 10.990 10.983 > for_xreadlines 2.877 2.867 That's sure consistent across platforms, then. I guess we'll write it off to "cache effects" (a catch-all explanation for any timing mystery -- go ahead, just *try* to prove it's wrong <0.5 wink>). [and Mark has HAVE_GETC_UNLOCKED on his Tru64 Unix box, yet using_fileinput is quicker than while_readline] > With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd > (although defining the former makes the latter def irrelevant): > (test_bufio also OK) > total 131426612 chars and 514216 lines > count_chars_lines 5.056 5.050 > readlines_sizehint 3.771 3.667 > using_fileinput 11.128 11.116 > while_readline 8.287 8.233 > for_xreadlines 3.090 3.083 So ms_getline_hack is significantly faster on your box (I'm only looking at while_readline: 11 using getc_unlocked, 8.3 using ms_getline_hack). There are only two reasons I can imagine for that: 1. Your vendor optimizes the inner loop in fgets (as all vendors should, but few do). and/or 2. Despite the long average length of your lines, many of them are nevertheless shorter than 200 chars, and so all the pain ms_getline_hack endures to avoid a realloc pays off. Unfortunately, there's not enough info to figure out if either, both, or none of those are on-target. 
It's such a large percentage speedup, though, that my bet goes primarily to #1 -- unless realloc is really pig slow on your box. Which some things *are*: > With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just > for completeness): > total 131426612 chars and 514216 lines > count_chars_lines 4.916 4.900 > readlines_sizehint 3.875 3.867 > using_fileinput 14.404 14.383 > while_readline 322.728 321.837 > for_xreadlines 7.113 7.100 > > So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement > Yes, that's the "platform from Mars" evidence I was seeking: if ms_getline_hack survives test_bufio on *your* crazy box, it's as close to provably correct as any algorithm in all of Computer Science . a-factor-of-39-is-almost-big-enough-to-notice!-ly y'rs - tim From m.favas@per.dem.csiro.au Thu Jan 11 07:26:37 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Thu, 11 Jan 2001 15:26:37 +0800 Subject: [Python-Dev] Re: xreadline speed vs readlines_sizehint References: Message-ID: <3A5D602D.9DC991CB@per.dem.csiro.au> [Tim speculates on getc_unlocked and his ms_getline_hack]: > > So ms_getline_hack is significantly faster on your box (I'm only > looking at while_readline: 11 using getc_unlocked, 8.3 using > ms_getline_hack). There are only two reasons I can imagine for that: > > 1. Your vendor optimizes the inner loop in fgets (as all vendors > should, but few do). Digital engineering, Compaq management/marketing <0.6 wink> > > and/or > > 2. Despite the long average length of your lines, many of them are > nevertheless shorter than 200 chars, and so all the pain > ms_getline_hack endures to avoid a realloc pays off. > > Unfortunately, there's not enough info to figure out if either, both, > or none of those are on-target. It's such a large percentage > speedup, though, that my bet goes primarily to #1 -- unless realloc > is really pig slow on your box. 
The lines range in length from 96 to 747 characters, with 11% @ 233, 17% @ 252 and 52% @ 254 characters, so #1 looks promising - most lines are long enough to trigger a realloc. Cranking up INITBUFSIZE in ms_getline_hack to 260 from 200 improves things again, by another 25%:

total 131426612 chars and 514216 lines
count_chars_lines      5.081  5.066
readlines_sizehint     3.743  3.717
using_fileinput       11.113 11.100
while_readline         6.100  6.083
for_xreadlines         3.027  3.033

Apart from the name , I like ms_getline_hack...

tho'-a-factor-of-100-makes-xreadlines-a-welcome-addition!-ly y'rs

--
Mark Favas - m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From m.favas@per.dem.csiro.au Thu Jan 11 09:08:29 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Thu, 11 Jan 2001 17:08:29 +0800
Subject: [Python-Dev] Current CVS version of sysmodule.c fails to compile
Message-ID: <3A5D780D.62D0F473@per.dem.csiro.au>

On Tru64 Unix, with Compaq's C/CXX compilers, the current CVS version of sysmodule.c produces the following errors:

cc -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c -o sysmodule.o sysmodule.c
cc: Error: sysmodule.c, line 73: Invalid declarator. (declarator)
        PyObject *o, *stdout;
----------------------^
cc: Error: sysmodule.c, line 79: In this statement, "o" is not declared. (undeclared)
        if (!PyArg_ParseTuple(args, "O:displayhook", &o))
------------------------------------------------------^
cc: Error: sysmodule.c, line 93: In this statement, "(&_iob[1])" is not an lvalue, but occurs in a context that requires one. (needlvalue)
        stdout = PySys_GetObject("stdout");
--------^
cc: Warning: sysmodule.c, line 98: In this statement, the referenced type of the pointer value "(&_iob[1])" is "struct declared without a tag", which is not compatible with "struct _object".
(ptrmismatch) if (PyFile_WriteObject(o, stdout, 0) != 0) ----------------------------------^ cc: Warning: sysmodule.c, line 100: In this statement, the referenced type of the pointer value "(&_iob[1])" is "struct declared without a tag", which is not compatible with "struct _object". (ptrmismatch) PyFile_SoftSpace(stdout, 1); -------------------------^ The problem is that stdout is a macro #define'd in stdio.h as (&_iob[1]) (stdin and stderr also are similarly #define'd). -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From gstein@lyra.org Thu Jan 11 09:18:44 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 11 Jan 2001 01:18:44 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.216,2.217 sysmodule.c,2.80,2.81 In-Reply-To: ; from moshez@users.sourceforge.net on Wed, Jan 10, 2001 at 09:41:29PM -0800 References: Message-ID: <20010111011843.W4640@lyra.org> On Wed, Jan 10, 2001 at 09:41:29PM -0800, Moshe Zadka wrote: > Update of /cvsroot/python/python/dist/src/Python > In directory usw-pr-cvs1:/tmp/cvs-serv21213/Python > > Modified Files: > ceval.c sysmodule.c >... > --- 1246,1269 ---- > case PRINT_EXPR: > v = POP(); > ! w = PySys_GetObject("displayhook"); > ! if (w == NULL) { > ! PyErr_SetString(PyExc_RuntimeError, > ! "lost sys.displayhook"); > ! err = -1; > } > + if (err == 0) { > + x = Py_BuildValue("(O)", v); > + if (x == NULL) > + err = -1; > + } > + if (err == 0) { > + w = PyEval_CallObject(w, x); > + if (w == NULL) > + err = -1; > + } > Py_DECREF(v); > + Py_XDECREF(x); x was never initialized to NULL. In fact, the loop sets it to Py_None. If you get an error in the initial "w" setup case, then you could erroneously decref None. Further, there is no DECREF for the CallObject result ("w"). But watch out: you don't want to DECREF the PySys_GetObject result (that is a borrowed reference). 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Jan 11 09:28:16 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 11 Jan 2001 01:28:16 -0800 Subject: [Python-Dev] Current CVS version of sysmodule.c fails to compile In-Reply-To: <3A5D780D.62D0F473@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Thu, Jan 11, 2001 at 05:08:29PM +0800 References: <3A5D780D.62D0F473@per.dem.csiro.au> Message-ID: <20010111012815.X4640@lyra.org> You're quite right! I've checked in a change, renaming it to "outf". Cheers, -g On Thu, Jan 11, 2001 at 05:08:29PM +0800, Mark Favas wrote: > On Tru64 Unix, with Compaq's C/CXX compilers, the current CVS version of > sysmodule.c produces the following errors: > > cc -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c -o > sysmodule.o sysmodule.c > cc: Error: sysmodule.c, line 73: Invalid declarator. (declarator) > PyObject *o, *stdout; > ----------------------^ > cc: Error: sysmodule.c, line 79: In this statement, "o" is not declared. > (undeclared) > if (!PyArg_ParseTuple(args, "O:displayhook", &o)) > ------------------------------------------------------^ > cc: Error: sysmodule.c, line 93: In this statement, "(&_iob[1])" is not > an lvalue, but occurs in a context that requires one. (needlvalue) > stdout = PySys_GetObject("stdout"); > --------^ > cc: Warning: sysmodule.c, line 98: In this statement, the referenced > type of the pointer value "(&_iob[1])" is "struct declared without a > tag", which is not compatible with "struct _object". (ptrmismatch) > if (PyFile_WriteObject(o, stdout, 0) != 0) > ----------------------------------^ > cc: Warning: sysmodule.c, line 100: In this statement, the referenced > type of the pointer value "(&_iob[1])" is "struct declared without a > tag", which is not compatible with "struct _object". 
(ptrmismatch) > PyFile_SoftSpace(stdout, 1); > -------------------------^ > > The problem is that stdout is a macro #define'd in stdio.h as (&_iob[1]) > (stdin and stderr also are similarly #define'd). > > -- > Mark Favas - m.favas@per.dem.csiro.au > CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev -- Greg Stein, http://www.lyra.org/ From skip@mojam.com (Skip Montanaro) Thu Jan 11 14:13:55 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 11 Jan 2001 08:13:55 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: References: Message-ID: <14941.49059.26189.733094@beluga.mojam.com> Moshe> * Did not DECREF result from displayhook function ... Moshe> w = PyEval_CallObject(w, x); Moshe> + Py_XDECREF(w); Moshe> if (w == NULL) ... While it works, is it really kosher to test w's value after the DECREF? Just seems like an odd construct to me. I'm used to seeing the test immediately after it's been set. Skip From guido@python.org Thu Jan 11 14:44:58 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 09:44:58 -0500 Subject: [Python-Dev] Interning filenames of imported modules In-Reply-To: Your message of "Wed, 10 Jan 2001 15:57:41 CST." <14940.56021.646147.770080@buffalo.fnal.gov> References: <14940.56021.646147.770080@buffalo.fnal.gov> Message-ID: <200101111444.JAA14597@cj20424-a.reston1.va.home.com> > I have a question about the following code in compile.c:jcompile (line 3678) > > filename = PyString_InternFromString(sc.c_filename); > name = PyString_InternFromString(sc.c_name); > > In the case of a long-running server which constantly imports modules, > this causes the interned string dict to grow without bound. Is there > a strong reason that the filename needs to be interned? 
How about the > module name? It's probably not *necessary* for the filename, but I know why I am interning it: since a module typically contains a bunch of functions, and each function has its own code object with a reference to the filename, I'm trying to save memory (the filename is a C string pointer in the "sc" structure, so it has to be turned into a Python string when creating the code object). The module name is used as an identifier elsewhere so will become interned anyway. > How about some way to enforce a limit on the size of the interned > strings dictionary? I've never thought of this -- but I suppose that a weak dictionary could be used. Fred's working on a PEP for weak references, so there's a chance that we might use this eventually. In the mean time, a possibility would be to provide a service function that goes through the "interned" dictionary and looks for values with a reference count of 1, and deletes them. You could then explicitly call this service function occasionally in your program. I would let it return a tuple: (number of values kept, number of values deleted). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jan 11 15:08:48 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:08:48 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 13:49:40 EST." References: Message-ID: <200101111508.KAA14870@cj20424-a.reston1.va.home.com> > They're indistinguishable then on my box (on one run xreadlines is .1 > seconds (out of around 7.6 total) quicker, on another readlines_sizehint), > *provided* that I specify the same buffer size (8192) that xreadlines uses > internally. However, if I even double that, readlines_sizehint is uniformly > about 10% slower. It's also a tiny bit slower if I cut the sizehint buffer > size to 4096. 
> > I'm afraid Mysteries will remain no matter how many person-decades we spend > staring at this <0.5 wink> ... 8192 happens to be the size of the stack-allocated buffer readlines() uses, and also the stdio BUFSIZ parameter, on many systems. Look for SMALLCHUNK in fileobject.c. Would it make sense to tie the two constants together more to tune this optimally even when BUFSIZ is different? --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Thu Jan 11 15:09:54 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Thu, 11 Jan 2001 10:09:54 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108113109.C7563@kronos.cnri.reston.va.us> <200101101900.OAA30486@cj20424-a.reston1.va.home.com> Message-ID: <14941.52418.18484.898061@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> It was more work than I had hoped for, because Eric GvR> apparently (despite having developer privileges!) doesn't use GvR> the CVS tree -- he sent in a diff relative to the 2.0 GvR> release. I munged it into place, adding the feature that GvR> readline, _curses and bsdddb are built as shared libraries by GvR> default. You'd have to edit Setup.config.in to change this. GvR> Hope this doesn't break anybody's setup. (Skip???) We may need to move dbm module to Setup.config from Setup and build it shared too. The problem I ran into when building the pybsddb3 module was that even though I'd built the standard bsddb shared, I was also building in dbm statically. This pulled in a dependency to the old db.so module (under RH6.1) and core dumped me during the test suite for pybsddb. Commenting out dbm did the trick, so building it shared should work too. 
Couple of things: dbm isn't enabled by default I believe so moving it to Setup.config may not be the right thing after all (would that imply an autoconf test and auto-enabling if it's detected?) Also, Andrew's distutils-based build procedure may obviate the need for this change. -Barry From ping@lfw.org Thu Jan 11 15:14:17 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 07:14:17 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: On Wed, 10 Jan 2001, Guido van Rossum wrote: > Yes -- I came up with the same thought. > > So here's a plan: somebody please submit a patch that does only one > thing: from...import * looks for __all__ and if it exists, imports > exactly those names. No changes to dir(), or anything. Please don't use __all__. At the moment, __all__ is the only way to easily tell whether a particular module object really represents a package, and the only way to get the list of submodule names. If __all__ is overloaded to also represent exportable symbols in modules, these two pieces of information will be impossible (or require much ugly hackery) to obtain. -- ?!ng From guido@python.org Thu Jan 11 15:23:26 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:23:26 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 15:25:24 EST." References: Message-ID: <200101111523.KAA14982@cj20424-a.reston1.va.home.com> > [Tim] > >> Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method > >> to keep the file locked until the line was complete, and I > >> wouldn't be opposed to making life saner on platforms that allow it. > > [Guido] > > Hm... That would be possible, except for one unfortunate detail: > > _PyString_Resize() may call PyErr_BadInternalCall() which touches > > thread state. [Tim] > FLOCKFILE/FUNLOCKFILE are independent of Python's notion of thread state. 
> IOW, do FLOCKFILE once before the for(;;), and FUNLOCKFILE once on every > *exit* path thereafter. We can block/unblock Python threads as often as > desired between those *file*-locking brackets. The only thing the repeated > FLOCKFILE/FUNLOCKFILE calls do to my eyes now is to create the *possibility* > for multiple readers to get partial lines of the file. I don't want to call FLOCKFILE while holding the Python lock, as this means that *if* we're blocked in FLOCKFILE (e.g. we're reading from a pipe or socket), no other Python thread can run! > > ... > > NO, NO NO! Mixing reads and writes on the same stream wasn't what we > > are locking against at all. (As you've found out, it doesn't even > > work.) > > On Windows, yes, but that still seems to me to be a bug in MS's code. If > anyone had reported a core dump on any other platform, I'd be more tractable > on this point. Yes, it's a Windows bug. > > We're only trying to protect against concurrent *reads*. > > As above, I believe that we could do a better job of that, then, on > platforms that HAVE_GETC_UNLOCKED, by protecting not only against core dumps > but also against .readline() not delivering an intact line from the file. See above for a reason why I think that's not safe. I think that applications that want to do this can do their own locking. (They'll find out soon enough that readline() isn't atomic. :-) > >> But since FLOCKFILE is in effect, other threads *trying* to write > >> to the stream we're reading will get blocked anyway. Seems to give us > >> potential for deadlocks. > > > Only if tyeh are holding other locks at the same time. > > I'm not being clear, then. Thread X does f.readline(), on a > HAVE_GETC_UNLOCKED platform. get_line allows other threads to run and > invokes FLOCKFILE on f->f_fp. get_line's GETC in thread X eventually hits > the end of the stdio buffer, and does its platform's version of _filbuf. > _filbuf may wait (depending on the nature of the stream) for more input to > show up. 
Simultaneously, thread Y attempts to write some data to f. But > the *FLOCKFILE* lock prevents it from doing anything with f. So X is > waiting for Y to write data inside platform _filbuf, but Y is waiting for X > to release the platform stream lock inside some platform stream-output > routine (if I'm being clear now, Python locks have nothing to do with this > scenario: it's the platform stream lock). I don't think that _filbuf can possibly wait for another thread to write data to the same stream object. A single stream object doesn't act like a pipe, even if it is open for simultaneous reading and writing. So if there's no more data in the file, _filbuf will simply return with an EOF status, not wait for the data that the other thread would write. > I think this is purely the user's fault if it happens. Just pointing it out > as another insecurity we're probably not able to protect users from. I don't think this can happen. > > ... > > Yeah. But this is insane use -- see my comments on SF. It's only > > worth fixing because it could be used to intentionally crash Python -- > > but there are easier ways... > > If it's unique to MS (as I suspect), I see no reason to even consider trying > to fix it in Python. Unless the Perl Mongers use it to crash Zope. OK. It's unique to MS. So close the bug report with a "won't fix" resolution. There's no point in having bug reports remain open that we know we can't fix. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jan 11 15:27:05 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:27:05 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 17:23:14 EST." References: Message-ID: <200101111527.KAA15005@cj20424-a.reston1.va.home.com> > Think like an implementer here <0.5 wink>: they've lost track of how many > characters are in the buffer despite a locking scheme whose purpose is to > prevent that.
If it were my implementation, that would be a top-priority > bug no matter how silly the first program I saw that triggered it. The locking prevents concurrent threads accessing the stream. But mixing reads and writes (without intervening fseek etc.) is illegal use of the stream, and the C standard allows them to be lax here, even if the program was single-threaded. In other words: the locking is so good that it serializes the sequence of reads and writes; but if the sequence of reads and writes is illegal, they don't guarantee anything. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jan 11 15:28:23 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:28:23 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 11 Jan 2001 09:28:59 +0800." <3A5D0C5B.162F624A@per.dem.csiro.au> References: <3A5D0C5B.162F624A@per.dem.csiro.au> Message-ID: <200101111528.KAA15021@cj20424-a.reston1.va.home.com> > On Tru64 Unix, I get an infinite generator of 'r's (after an initial few > 'w's) to the screen (but no crashes). Same here on Linux. > If I reduce the size of the loop > counters from 1000000 to 3000, I get the following output: > opened > w w w w w w w w w w w w w w w w w w w w w w w w w w w r read 5114 > done I still get an infinite amount of 'r's. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Thu Jan 11 15:28:21 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 16:28:21 +0100 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: ; from tim.one@home.com on Sun, Jan 07, 2001 at 11:13:26PM -0500 References: Message-ID: <20010111162820.W2467@xs4all.nl> On Sun, Jan 07, 2001 at 11:13:26PM -0500, Tim Peters wrote: > I'm curious about how it performs (relative to the getc_unlocked hack) on > other platforms. If you'd like to try that, just recompile fileobject.c > with > USE_MS_GETLINE_HACK > #define'd. 
It should *work* on any platform with fgets() meeting the > assumption. The new test_bufio.py std test gives it a pretty good > correctness workout, if you're worried about that. FreeBSD seems to work fine. Speed is practically the same as without USE_MS_GETLINE_HACK (but with HAVE_GETC_UNLOCKED), though still not quite the same as before all this hackery :-) Not by much though. For most tests it's smaller than the margin of error, though the difference is still as much as 20, 30% for the while_readline test. When using a second thread somewhere in the test, the difference vanishes further. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Thu Jan 11 15:33:28 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 11 Jan 2001 16:33:28 +0100 Subject: [Python-Dev] Add __exports__ to modules References: Message-ID: <3A5DD248.8EE0DF63@lemburg.com> Ka-Ping Yee wrote: > > On Wed, 10 Jan 2001, Guido van Rossum wrote: > > Yes -- I came up with the same thought. > > > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > Please don't use __all__. At the moment, __all__ is the only way > to easily tell whether a particular module object really represents > a package, and the only way to get the list of submodule names. But __all__ has to be user-defined, so I don't buy that argument. Note that the only true way to recognize a package is by looking for an attribute "__path__" since Python adds this for packages only. > If __all__ is overloaded to also represent exportable symbols in > modules, these two pieces of information will be impossible (or > require much ugly hackery) to obtain. Again, __all__ is not automatically generated, so trusting it doesn't get you very far. 
To be able to find subpackages you will always have to apply some hackery (based on __path__) in order to be sure. It would be better to add a helper function to packages to query this kind of information -- the package usually knows best where to look and what to look for. Note that __all__ was explicitly invented to be used by from package import * so I think it is the right choice here. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr@thyrsus.com Thu Jan 11 15:37:19 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 11 Jan 2001 10:37:19 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <14941.52418.18484.898061@anthem.wooz.org>; from barry@digicool.com on Thu, Jan 11, 2001 at 10:09:54AM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108113109.C7563@kronos.cnri.reston.va.us> <200101101900.OAA30486@cj20424-a.reston1.va.home.com> <14941.52418.18484.898061@anthem.wooz.org> Message-ID: <20010111103719.A7191@thyrsus.com> GvR> It was more work than I had hoped for, because Eric GvR> apparently (despite having developer privileges!) doesn't use GvR> the CVS tree -- he sent in a diff relative to the 2.0 GvR> release. I'm using the CVS tree now. I did that patch relative to 2.0 for boring reasons having to do with the state of my laptop. -- Eric S. Raymond The IRS has become morally corrupted by the enormous power which we in Congress have unwisely entrusted to it. Too often it acts like a Gestapo preying upon defenseless citizens. -- Senator Edward V. 
Long From thomas@xs4all.net Thu Jan 11 15:48:32 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 16:48:32 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5DD248.8EE0DF63@lemburg.com>; from mal@lemburg.com on Thu, Jan 11, 2001 at 04:33:28PM +0100 References: <3A5DD248.8EE0DF63@lemburg.com> Message-ID: <20010111164831.X2467@xs4all.nl> On Thu, Jan 11, 2001 at 04:33:28PM +0100, M.-A. Lemburg wrote: > > Please don't use __all__. At the moment, __all__ is the only way > > to easily tell whether a particular module object really represents > > a package, and the only way to get the list of submodule names. > > But __all__ has to be user-defined, so I don't buy that argument. > Note that the only true way to recognize a package is by looking > for an attribute "__path__" since Python adds this for packages > only. Ehm.... What, exactly, prevents usercode from doing __path__ = "neener, neener" ? In other words, even *that* isn't a true way to recognize a package. You can see what isn't a package, but not what is. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@python.org Thu Jan 11 15:58:55 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:58:55 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Thu, 11 Jan 2001 07:14:17 PST." References: Message-ID: <200101111558.KAA15447@cj20424-a.reston1.va.home.com> > Please don't use __all__. At the moment, __all__ is the only way > to easily tell whether a particular module object really represents > a package, and the only way to get the list of submodule names. > > If __all__ is overloaded to also represent exportable symbols in > modules, these two pieces of information will be impossible (or > require much ugly hackery) to obtain. Marc-Andre already explained that __all__ is not to be trusted. 
If you want a reasonably good test for package-ness, use the presence of __path__. For a really good test, check whether __file__ ends in __init__.py[c]. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Jan 11 16:14:00 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 11 Jan 2001 11:14:00 -0500 Subject: [Python-Dev] PEP 229: setup.py revised Message-ID: I've put a new version of the setup.py script at http://www.mems-exchange.org/software/files/python/setup.py (I'm at work and can't remember the password to get into www.amk.ca. :) ) This version improves the detection of Tcl/Tk, handles the _curses_panel module, and doesn't do a chdir(). Same drill as before: just grab the script, drop it in the root of your Python source tree (2.0 or current CVS), run "./python setup.py build", and look at the modules it compiles. I can try it on Linux, so I'm most interested in hearing reports for other Unix versions (*BSD, HP-UX, etc.) --amk From ping@lfw.org Thu Jan 11 16:36:36 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 08:36:36 -0800 (PST) Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) Message-ID: I'm pleased to announce a reasonable first pass at a documentation utility for interactive use. "pydoc" is usable in three ways: 1. At the shell prompt, "pydoc " displays documentation on , very much like "man". 2. At the shell prompt, "pydoc -k " lists modules whose one-line descriptions mention the keyword, like "man -k". 3. Within Python, "from pydoc import help" provides a "help" function to display documentation at the interpreter prompt. All of them use sys.path in order to guarantee that the documentation you see matches the modules you get. 
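A minimal sketch of the third usage mode for readers on a current interpreter. It assumes the `pydoc` module bundled with present-day Python 3, whose interface may differ from the script announced in this message; `render_doc` and `plaintext` are names from that bundled module, not from the January 2001 download.

```python
import pydoc

# Render the documentation for a built-in as a plain string rather than
# sending it through a pager. pydoc.plaintext leaves out the overstrike
# "bold" sequences that the default text renderer emits for terminals.
text = pydoc.render_doc(abs, renderer=pydoc.plaintext)
print(text)
```

Returning a string instead of paging makes the result easy to search or embed elsewhere; interactive use would normally go through `help()` instead.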
To try "pydoc", download: http://www.lfw.org/python/pydoc.py http://www.lfw.org/python/htmldoc.py http://www.lfw.org/python/textdoc.py http://www.lfw.org/python/inspect.py I would very much appreciate your feedback, especially from testing on non-Unix platforms. Thank you! I've pasted some examples from my shell below (when you actually run pydoc, the output is piped through "less", "more", or a pager implemented in Python, depending on what is available). -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. Eric Drexler skuld[1268]% pydoc -k mail mailbox - Classes to handle Unix style, MMDF style, and MH style mailboxes. mailcap - Mailcap file handling. See RFC 1524. mimify - Mimification and unmimification of mail messages. test.test_mailbox - (no description) skuld[1269]% pydoc -k text textdoc - Generate text documentation from live Python objects. collab - Routines for collaboration, especially group editing of text documents. gettext - Internationalization and localization support. test.test_gettext - (no description) curses.textpad - Simple textbox editing widget with Emacs-like keybindings. distutils.text_file - text_file ScrolledText - (no description) skuld[1270]% pydoc -k html htmldoc - Generate HTML documentation from live Python objects. htmlentitydefs - HTML character entity references. htmllib - HTML 2.0 parser. skuld[1271]% pydoc md5 Python Library Documentation: built-in module md5 NAME md5 FILE (built-in) DESCRIPTION This module implements the interface to RSA's MD5 message digest algorithm (see also Internet RFC 1321). Its use is quite straightforward: use the new() to create an md5 object. You can now feed this object with arbitrary strings using the update() method, and at any point you can ask it for the digest (a strong kind of 128-bit checksum, a.k.a. ``fingerprint'') of the contatenation of the strings fed to it so far using the digest() method. 
Functions: new([arg]) -- return a new md5 object, initialized with arg if provided md5([arg]) -- DEPRECATED, same as new, but for compatibility Special Objects: MD5Type -- type object for md5 objects FUNCTIONS md5(no arg info) new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. new(no arg info) new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. skuld[1272]% pydoc types Python Library Documentation: module types NAME types FILE /home/ping/sw/Python-1.5.2/Lib/types.py DESCRIPTION # Define names for all type symbols known in the standard interpreter. # Types that are part of optional modules (e.g. array) are not listed. skuld[1273]% pydoc abs Python Library Documentation: built-in function abs abs (no arg info) abs(number) -> number Return the absolute value of the argument. skuld[1274]% pydoc repr Python Library Documentation: built-in function repr repr (no arg info) repr(object) -> string Return the canonical string representation of the object. For most object types, eval(repr(object)) == object. Python Library Documentation: module repr NAME repr - # Redo the `...` (representation) but with limits on most sizes. 
FILE /home/ping/sw/Python-1.5.2/Lib/repr.py CLASSES Repr class Repr __init__(self) repr(self, x) repr1(self, x, level) repr_dictionary(self, x, level) repr_instance(self, x, level) repr_list(self, x, level) repr_long_int(self, x, level) repr_string(self, x, level) repr_tuple(self, x, level) FUNCTIONS repr(no arg info) skuld[1275]% pydoc re.MatchObject Python Library Documentation: class MatchObject in re class MatchObject __init__(self, re, string, pos, endpos, regs) end(self, g=0) Return the end of the substring matched by group g group(self, *groups) Return one or more groups of the match groupdict(self, default=None) Return a dictionary containing all named subgroups of the match groups(self, default=None) Return a tuple containing all subgroups of the match object span(self, g=0) Return (start, end) of the substring matched by group g start(self, g=0) Return the start of the substring matched by group g skuld[1276]% pydoc xml Python Library Documentation: package xml NAME xml - Core XML support for Python. FILE /home/ping/dev/python/dist/src/Lib/xml/__init__.py DESCRIPTION This package contains three sub-packages: dom -- The W3C Document Object Model. This supports DOM Level 1 + Namespaces. parsers -- Python wrappers for XML parsers (currently only supports Expat). sax -- The Simple API for XML, developed by XML-Dev, led by David Megginson and ported to Python by Lars Marius Garshol. This supports the SAX 2 API. VERSION 1.8 skuld[1277]% pydoc lovelyspam no Python documentation found for lovelyspam skuld[1278]% python Python 1.5.2 (#1, Dec 12 2000, 02:25:44) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> >>> from pydoc import help >>> help(int) Help on built-in function int: int (no arg info) int(x) -> integer Convert a string or number to an integer, if possible. A floating point argument will be truncated towards zero. 
>>> help("urlparse.urljoin") Help on function urljoin in module urlparse: urljoin(base, url, allow_fragments=1) # Join a base URL and a possibly relative URL to form an absolute # interpretation of the latter. >>> import random >>> help(random.generator) Help on class generator in module random: class generator(whrandom.whrandom) Random generator class. __init__(self, a=None) Constructor. Seed from current time or hashable value. seed(self, a=None) Seed the generator from current time or hashable value. >>> From moshez@zadka.site.co.il Fri Jan 12 00:48:30 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 02:48:30 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5C97F4.945D0C1@lemburg.com> References: <3A5C97F4.945D0C1@lemburg.com>, <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il> On Wed, 10 Jan 2001 18:12:20 +0100, "M.-A. Lemburg" wrote: > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > +1 -- this won't be me though (at least not this week). I'm working on it -- I'll have a patch ready as soon as my slow modem will manage to finish the "cvs diff". Guido, I'll assign it to you, OK? > Cool. This could make Python instances usable as "modules" > -- with full getattr() hook support ! My Patch already does that -- if the instance supports __all__ > For IMPORT_STAR I'd suggest first looking for __all__ and > then reverting to __dict__.items() in case this fails. That's what my patch is doing. > BTW, is __dict__ needed by the import mechanism or would > the getattr/setattr slots suffice ? And if yes, must it > be a real Python dictionary ? 
My patch works with getattr (no setattr) as long as there is an __all__ attribute. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From ping@lfw.org Thu Jan 11 16:42:44 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 08:42:44 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101111558.KAA15447@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Guido van Rossum wrote: > > Marc-Andre already explained that __all__ is not to be trusted. > > If you want a reasonably good test for package-ness, use the presence > of __path__. Sorry, you're right. I retract my comment about __all__. -- ?!ng From skip@mojam.com (Skip Montanaro) Thu Jan 11 16:47:13 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 11 Jan 2001 10:47:13 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010111164831.X2467@xs4all.nl> References: <3A5DD248.8EE0DF63@lemburg.com> <20010111164831.X2467@xs4all.nl> Message-ID: <14941.58257.304339.437443@beluga.mojam.com> Thomas> __path__ = "neener, neener" I believe correct English usage here is "neener, neener, neener", with a little extra emphasis on the first syllable of the third "neener"... does-that-help?-ly y'rs, Skip From MarkH@ActiveState.com Fri Jan 12 16:55:29 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Fri, 12 Jan 2001 08:55:29 -0800 Subject: [Python-Dev] RE: Baffled on Windows In-Reply-To: Message-ID: > 4. The way mmapmodule.c is coded and built after Guido's change appears to > me to be the same as how every other non-builtin module is coded and built > on Windows. For example, winsound.c, which uses DL_EXPORT(void) > before its > initwinsound and where that macro also expands to "void". But importing > winsound works fine. winsound adds "/export:initwinsound" to the link line. This is an alternative to __declspec in the sources.
This all gets back to a discussion we had here nearly a year or so ago - that "DL_EXPORT" isn't capturing our semantics, and that we should probably create #defines that match the _intent_ of the definition, rather than the implementation details - i.e., replace DL_EXPORT with (say) PY_API_DECL and PY_MODULEINIT_DECL or some such. I'm happy to think about this and help implement it if the time is now right... > Any Windows geek got a clue? Isn't that question a paradox? ;-) Mark. From skip@mojam.com (Skip Montanaro) Thu Jan 11 17:11:23 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 11 Jan 2001 11:11:23 -0600 (CST) Subject: [Python-Dev] dir()/__all__/etc Message-ID: <14941.59707.632995.224116@beluga.mojam.com> I know Guido has said he doesn't want to fiddle with dir(), but my sense of things from the overall discussion of the __exports__ concept tells me that when used interactively dir() often presents confusing output for new Python users. I twiddled CGIHTTPServer to have __all__ and added the following dir() function to my PYTHONSTARTUP file:

    def dir(o,showall=0):
        if not showall and hasattr(o, "__all__"):
            x = list(o.__all__)
            x.sort()
            return x
        from __builtin__ import dir as d
        return d(o)

Compare its output with and without showall set:

    >>> dir(CGIHTTPServer)
    ['CGIHTTPRequestHandler', 'test']
    >>> dir(CGIHTTPServer,1)
    ['BaseHTTPServer', 'CGIHTTPRequestHandler', 'SimpleHTTPServer', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__version__', 'executable', 'nobody', 'nobody_uid', 'os', 'string', 'sys', 'test', 'urllib']

I haven't demonstrated any great programming prowess with this little function, but I rather suspect it may be beyond most brand new users. If Guido can't be convinced to allow dir() to change, how about adding a sample PYTHONSTARTUP file to the distribution that contains little bits like this and Ping's pydoc.help stuff (assuming it gets into the distro, which I hope it does)?
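The wrapper in this message imports from `__builtin__`, the module name used by the interpreters of the time. What follows is a hedged sketch of the same idea for a current Python 3, where that module is called `builtins`. The names `dir_public` and `demo` are hypothetical and appear nowhere in the thread; the wrapper is renamed so that it does not shadow the real `dir()`.

```python
import builtins
import types


def dir_public(obj, showall=False):
    """Like dir(), but prefer obj.__all__ when it exists and showall is false."""
    if not showall and hasattr(obj, "__all__"):
        return sorted(obj.__all__)
    return builtins.dir(obj)


# Hypothetical stand-in module, built in memory purely for illustration.
demo = types.ModuleType("demo")
demo.__all__ = ["test", "Handler"]
demo.Handler = object
demo.test = lambda: None
demo.os = None  # simulates an imported helper name that is not exported

print(dir_public(demo))                  # only the names listed in __all__
print("os" in dir_public(demo, True))    # the full listing still shows helpers
```

Sorting a copy with `sorted()` leaves the module's own `__all__` list untouched, which is the same precaution the original takes with `list(o.__all__)`.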
Skip From mal@lemburg.com Thu Jan 11 17:25:20 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 11 Jan 2001 18:25:20 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <3A5DD248.8EE0DF63@lemburg.com> <20010111164831.X2467@xs4all.nl> Message-ID: <3A5DEC80.596F0818@lemburg.com> Thomas Wouters wrote: > > On Thu, Jan 11, 2001 at 04:33:28PM +0100, M.-A. Lemburg wrote: > > > > Please don't use __all__. At the moment, __all__ is the only way > > > to easily tell whether a particular module object really represents > > > a package, and the only way to get the list of submodule names. > > > > But __all__ has to be user-defined, so I don't buy that argument. > > Note that the only true way to recognize a package is by looking > > for an attribute "__path__" since Python adds this for packages > > only. > > Ehm.... What, exactly, prevents usercode from doing > > __path__ = "neener, neener" > > ? In other words, even *that* isn't a true way to recognize a package. You > can see what isn't a package, but not what is. Purists.... ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez@zadka.site.co.il Fri Jan 12 02:06:37 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 04:06:37 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: <14941.49059.26189.733094@beluga.mojam.com> References: <14941.49059.26189.733094@beluga.mojam.com>, Message-ID: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001 08:13:55 -0600 (CST), Skip Montanaro wrote: > While it works, is it really kosher to test w's value after the DECREF? Yes. It may not point to anything valid, but it won't be NULL. > Just seems like an odd construct to me. I'm used to seeing the test > immediately after it's been set. 
It was more convenient that way. And I'm pretty certain the _DECREF macros do not change their arguments. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From moshez@zadka.site.co.il Fri Jan 12 02:09:13 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 04:09:13 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: References: Message-ID: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001 07:14:17 -0800 (PST), Ka-Ping Yee wrote: > On Wed, 10 Jan 2001, Guido van Rossum wrote: > > Yes -- I came up with the same thought. > > > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > Please don't use __all__. At the moment, __all__ is the only way > to easily tell whether a particular module object really represents > a package Why not __init__? It has to be there, and is in no other module object. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From moshez@zadka.site.co.il Fri Jan 12 02:23:16 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 04:23:16 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il> References: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il>, <3A5C97F4.945D0C1@lemburg.com>, <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <20010112022316.BE682A82D@darjeeling.zadka.site.co.il> On Fri, 12 Jan 2001, Moshe Zadka wrote: > I'm working on it -- I'll have a patch ready as soon as my slow > modem will manage to finish the "cvs diff". Guido, I'll > assign it to you, OK? OK, it's 103200. 
Unfortunately, I couldn't assign it to Guido, since I couldn't upload it at all (yeah, still those lynx problems). This time I managed to get one specific person to upload for me, but someone else will have to assign to Guido. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From nas@arctrix.com Thu Jan 11 11:42:51 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Thu, 11 Jan 2001 03:42:51 -0800 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: ; from akuchlin@mems-exchange.org on Thu, Jan 11, 2001 at 11:14:00AM -0500 References: Message-ID: <20010111034251.A23512@glacier.fnational.com> Here is what I get on my Debian Linux machine: _codecs.so cPickle.so imageop.so pwd.so termios.so _curses.so cStringIO.so linuxaudiodev.so regex.so time.so _curses_panel.so cmath.so math.so resource.so timing.so _locale.so crypt.so md5.so rgbimg.so ucnhash.so _socket.so dbm.so mmap.so rotor.so unicodedata.so _tkinter.so errno.so new.so select.so zlib.so array.so fcntl.so nis.so sha.so audioop.so fpectl.so operator.so signal.so binascii.so gdbm.so parser.so strop.so bsddb.so grp.so pcre.so syslog.so I think that is every module which can be compiled on my machine. Great work Andrew (and the distutil developers). Neil From nas@arctrix.com Thu Jan 11 11:47:09 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Thu, 11 Jan 2001 03:47:09 -0800 Subject: [Python-Dev] dir()/__all__/etc In-Reply-To: <14941.59707.632995.224116@beluga.mojam.com>; from skip@mojam.com on Thu, Jan 11, 2001 at 11:11:23AM -0600 References: <14941.59707.632995.224116@beluga.mojam.com> Message-ID: <20010111034709.C23512@glacier.fnational.com> I'm -1 on making dir() pay attention to __all__. I'm +1 on adding a help() function which pays attention to __all__ and (optionally?) prints doc strings. 
Neil From gstein@lyra.org Thu Jan 11 19:38:50 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 11 Jan 2001 11:38:50 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101111558.KAA15447@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 11, 2001 at 10:58:55AM -0500 References: <200101111558.KAA15447@cj20424-a.reston1.va.home.com> Message-ID: <20010111113850.F4640@lyra.org> On Thu, Jan 11, 2001 at 10:58:55AM -0500, Guido van Rossum wrote: > > Please don't use __all__. At the moment, __all__ is the only way > > to easily tell whether a particular module object really represents > > a package, and the only way to get the list of submodule names. > > > > If __all__ is overloaded to also represent exportable symbols in > > modules, these two pieces of information will be impossible (or > > require much ugly hackery) to obtain. > > Marc-Andre already explained that __all__ is not to be trusted. > > If you want a reasonably good test for package-ness, use the presence > of __path__. > > For a really good test, check whether __file__ ends in __init__.py[c]. Even that isn't safe: if the module was pulled from an archive, __file__ might not get set. Determining whether something is a package is highly dependent upon how it was brought into the system. It is entirely possible that you *can't* know something represents a package. You can get close by looking in sys.modules to look for modules "below" the given module. But if none have been imported yet, then you're out of luck. If you're using imputil, then you can look for __ispkg__ in the module.
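
The three heuristics discussed in this thread (__path__, the __file__ suffix, and children in sys.modules) can be put side by side in a rough Python 3 sketch. The function name `looks_like_package` is hypothetical, the imputil `__ispkg__` marker is left out, and none of the three hints is conclusive on its own, which is the point being made above.

```python
import sys
import types


def looks_like_package(module):
    """Heuristic only: report which hints suggest `module` is a package.

    `looks_like_package` is a hypothetical name used for illustration.
    """
    name = getattr(module, "__name__", "")
    filename = getattr(module, "__file__", None) or ""
    return {
        # Set by the import machinery for packages, but user code can fake it.
        "has_path": hasattr(module, "__path__"),
        # Missing or empty when the module did not come from a plain file.
        "file_is_init": filename.endswith(("__init__.py", "__init__.pyc")),
        # Only helps if some submodule has already been imported.
        "has_loaded_children": any(
            key.startswith(name + ".") for key in list(sys.modules)
        ),
    }


import json          # a package in the standard library
import json.decoder  # make sure at least one submodule is loaded

fake = types.ModuleType("fake")
fake.__path__ = "neener, neener"  # the spoof from earlier in the thread

print(looks_like_package(json))
print(looks_like_package(fake))
```

The spoofed module passes the __path__ test and fails the other two, so a caller that needs more confidence would have to combine the hints and still accept that the answer may be wrong.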
Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas@xs4all.net Thu Jan 11 19:50:24 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 20:50:24 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Fri, Jan 12, 2001 at 04:09:13AM +0200 References: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il> Message-ID: <20010111205024.Z2467@xs4all.nl> On Fri, Jan 12, 2001 at 04:09:13AM +0200, Moshe Zadka wrote: > Why not __init__? It has to be there, and is in no other module object. Wrong association... __init__ would be a method that gets executed. (At least that's what I'd expect :) 'sides,-everyone-was-in-agreement-on-__all__-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From MarkH@ActiveState.com Thu Jan 11 20:25:30 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Thu, 11 Jan 2001 12:25:30 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> Message-ID: > It was more convenient that way. And I'm pretty certain the _DECREF > macros do not change their arguments. Pretty certain??? That doesn't inspire confidence . How certain are you that this will be true in the future? I think it bad style indeed - for example, I could see benefit in having DECREF (or _Py_Dealloc, called by decref) set the object to NULL in debug builds. What if that decision is taken in the future? I thought rules were pretty clear with reference counting - dont assume _anything_ about the object unless you hold a reference (or are damn sure someone else does!) Mark. 
From thomas@xs4all.net Thu Jan 11 21:41:57 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 22:41:57 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: ; from MarkH@ActiveState.com on Thu, Jan 11, 2001 at 12:25:30PM -0800 References: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> Message-ID: <20010111224157.A2467@xs4all.nl> On Thu, Jan 11, 2001 at 12:25:30PM -0800, Mark Hammond wrote: > I thought rules were pretty clear with reference counting - dont assume > _anything_ about the object unless you hold a reference (or are damn sure > someone else does!) Moshe isn't breaking that rule. He isn't assuming anything about the object, just about the value of the pointer to that object. I agree, though, that it's bad practice to rely on it having the old value, after DECREFing it. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@python.org Thu Jan 11 21:48:46 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 16:48:46 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Thu, 11 Jan 2001 08:42:44 PST." References: Message-ID: <200101112148.QAA16227@cj20424-a.reston1.va.home.com> > Sorry, you're right. I retract my comment about __all__. Can you explain *why* you wanted to test for package-ness? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jan 11 21:55:24 2001 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 16:55:24 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Thu, 11 Jan 2001 11:14:00 EST." References: Message-ID: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> > I've put a new version of the setup.py script at > http://www.mems-exchange.org/software/files/python/setup.py > > (I'm at work and can't remember the password to get into > www.amk.ca. 
:) ) > > This version improves the detection of Tcl/Tk, handles the > _curses_panel module, and doesn't do a chdir(). Same drill as before: > just grab the script, drop it in the root of your Python source tree > (2.0 or current CVS), run "./python setup.py build", and look at the > modules it compiles. I can try it on Linux, so I'm most interested in > hearing reports for other Unix versions (*BSD, HP-UX, etc.) Good work -- but I still can't run this inside a platform-specific subdirectory. Are you planning on supporting this? --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Thu Jan 11 21:20:45 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 11 Jan 2001 22:20:45 +0100 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) Message-ID: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de> > I would very much appreciate your feedback At the first glance, it looks *very* promising. I really look forward to see it in 2.1. However, robustness probably needs to be improved: >>> help() Traceback (most recent call last): File "", line 1, in ? TypeError: not enough arguments to help(); expected 1, got 0 Wasn't there even a proposal that >>> help should do something meaningful (by implementing __repr__)? >>> import string >>> help(string) Traceback (most recent call last): File "", line 1, in ? 
File "pydoc.py", line 183, in help pager('Help on %s:\n\n' % desc + textdoc.document(thing)) File "./textdoc.py", line 171, in document if inspect.ismodule(object): results = document_module(object) File "./textdoc.py", line 87, in document_module if (inspect.getmodule(value) or object) is object: File "./inspect.py", line 190, in getmodule file = getsourcefile(object) File "./inspect.py", line 204, in getsourcefile filename = getfile(object) File "./inspect.py", line 172, in getfile raise TypeError, 'arg is a built-in class' TypeError: arg is a built-in class Also, the tools could use some command line options: martin@mira:~/pydoc > ./pydoc.py --help Traceback (most recent call last): File "./pydoc.py", line 190, in ? opts[args[i][1:]] = args[i+1] IndexError: list index out of range At a minimum, I propose -h, --help, -v, -V. Regards, Martin From fdrake@acm.org Thu Jan 11 22:11:24 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 11 Jan 2001 17:11:24 -0500 (EST) Subject: [Python-Dev] [PEP 205] Weak References PEP updated, patch available! Message-ID: <14942.12172.129547.770776@cj42289-a.reston1.va.home.com> I've updated the Weak References PEP a little: http://python.sourceforge.net/peps/pep-0205.html A preliminary version of the implementation and documentation is available as well: http://sourceforge.net/patch/?func=detailpatch&patch_id=103203&group_id=5470 Please send feedback on the PEP or implementation to me. Thanks! -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From akuchlin@mems-exchange.org Thu Jan 11 22:26:33 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 11 Jan 2001 17:26:33 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: <200101112155.QAA16678@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 11, 2001 at 04:55:24PM -0500 References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> Message-ID: <20010111172633.A26249@kronos.cnri.reston.va.us> On Thu, Jan 11, 2001 at 04:55:24PM -0500, Guido van Rossum wrote: >Good work -- but I still can't run this inside a platform-specific >subdirectory. Are you planning on supporting this? I didn't really understand this when you pointed it out, but forgot to ask for clarification. What does your directory layout look like? --amk From ping@lfw.org Thu Jan 11 22:26:53 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 14:26:53 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de> Message-ID: On Thu, 11 Jan 2001, Martin v. Loewis wrote: > > However, robustness probably needs to be improved: Agreed. > Wasn't there even a proposal that > > >>> help > > should do something meaningful (by implementing __repr__)? There was. I am planning to incorporate Paul Prescod's mechanism for doing this; i just didn't have time to throw in that feature yet, and wanted feedback on the man-like stuff first. My next two targets are: 1. Generating text from the HTML documentation files using Paul Prescod's stuff in onlinehelp.py. 2. Running a background HTTP server that produces its pages using htmldoc.py. Both are pieces we already have and only need to integrate; i just wanted to get at least a working candidate done first. Did using pydoc like "man" work okay for you? > >>> import string > >>> help(string) > Traceback (most recent call last): ... 
> TypeError: arg is a built-in class Mine doesn't do this for me. I think i may have left up an older version of inspect.py by mistake. Try downloading http://www.lfw.org/python/inspect.py again -- apologies for the hassle. > Also, the tools could use some command line options: > > martin@mira:~/pydoc > ./pydoc.py --help > Traceback (most recent call last): > File "./pydoc.py", line 190, in ? > opts[args[i][1:]] = args[i+1] > IndexError: list index out of range > > At a minimum, I propose -h, --help, -v, -V. Okay. There is usage help already; i just failed to make it sufficiently robust about deciding when to show it. skuld[1010]% pydoc /home/ping/bin/pydoc ... Show documentation on something. may be the name of a Python function, module, package, or a dotted reference to a class or function within a module or module in a package. /home/ping/bin/pydoc -k Search for a keyword in the short descriptions of modules. -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. Eric Drexler From ping@lfw.org Thu Jan 11 22:28:44 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 14:28:44 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101112148.QAA16227@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Guido van Rossum wrote: > > Sorry, you're right. I retract my comment about __all__. > > Can you explain *why* you wanted to test for package-ness? Auto-generating documentation. pydoc.py currently tests for __path__, and looks for the presence of __init__.py in a subdirectory to mean that the subdirectory name is a package name. Is it safe on all platforms to just list all .py files in the subdirectory to get all submodules? -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. 
Eric Drexler From tim.one@home.com Thu Jan 11 23:17:06 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 11 Jan 2001 18:17:06 -0500 Subject: [Python-Dev] RE: Baffled on Windows In-Reply-To: Message-ID: [Mark Hammond] > winsound adds "/export:initwinsound" to the link line. This is an > alternative to __declspec in the sources. Yup/arghghghgh. It's fixed now. Thanks! > This all gets back to a discussion we had here nearly a year > or so ago - Yup/arghghghgh. . > that "DL_EXPORT" isnt capturing our semantics, and that we should > probably create #defines that match the _intent_ of the > definition, rather than the implementation details - ie, replace > DL_EXPORT with (say) PY_API_DECL and PY_MODULEINIT_DECL or some > such. Yup/noarghghghgh. > I'm happy to think about this and help implement it if the time > is now right... Same here. Now how can we tell whether the time is right? I must say, it hasn't gotten better by leaving it alone for a year. I think we need a Unix dweeb to play along, though -- if only to confirm that their compilers are no help. >> Any Windows geek got a clue? > Isn't that question a paradox? ;-) Well, nobody else will understand this, but *we* know that Windows geeks need more clues than everyone else put together just to get the box booted each day (or hour <0.9 wink>). From michel@digicool.com Fri Jan 12 01:15:52 2001 From: michel@digicool.com (Michel Pelletier) Date: Thu, 11 Jan 2001 20:15:52 -0500 Subject: [Python-Dev] New Draft PEP: Python Interfaces Message-ID: Hello, I have roughed out a draft PEP that proposes the extension of Python to include an interface framework. It is posted online here: http://www.zope.org/Members/michel/InterfacesPEP/PEP.txt This is my first revision and stab at a PEP. I'd like to find out what you think about the PEP and maybe discuss it some more offline on a different list. Thanks! -Michel From martin@loewis.home.cs.tu-berlin.de Fri Jan 12 01:15:25 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 12 Jan 2001 02:15:25 +0100 Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: (message from Ka-Ping Yee on Thu, 11 Jan 2001 14:26:53 -0800 (PST)) References: Message-ID: <200101120115.f0C1FPx03702@mira.informatik.hu-berlin.de> > Did using pydoc like "man" work okay for you? Yes, that is very impressive. > Mine doesn't do this for me. I think i may have left up an older version > of inspect.py by mistake. Try downloading > > http://www.lfw.org/python/inspect.py > > again -- apologies for the hassle. No need to apologize. It works fine now. Thanks, Martin From moshez@zadka.site.co.il Fri Jan 12 09:53:35 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 11:53:35 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: References: Message-ID: <20010112095335.E8A15A82D@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001, "Mark Hammond" wrote: > I think it bad style indeed - for example, I could see benefit in having > DECREF (or _Py_Dealloc, called by decref) set the object to NULL in debug > builds. What if that decision is taken in the future? > > I thought rules were pretty clear with reference counting - dont assume > _anything_ about the object unless you hold a reference (or are damn sure > someone else does!) I'm not assuming anything about the object -- I'm assuming something about the pointer. And macros should not change their arguments -- DECREF is basically a wrapper around _Py_Dealloc((PyObject *)(op)). Just like free(pointer); if (pointer == NULL) do_something(); is perfectly legal C. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
From moshez@zadka.site.co.il Fri Jan 12 09:57:32 2001
From: moshez@zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 11:57:32 +0200 (IST)
Subject: [Python-Dev] dir()/__all__/etc
In-Reply-To: <14941.59707.632995.224116@beluga.mojam.com>
References: <14941.59707.632995.224116@beluga.mojam.com>
Message-ID: <20010112095732.1F65BA82D@darjeeling.zadka.site.co.il>

On Thu, 11 Jan 2001 11:11:23 -0600 (CST), Skip Montanaro wrote:
>
> I know Guido has said he doesn't want to fiddle with dir(), but my sense of
> things from the overall discussion of the __exports__ concept tells me that
> when used interactively dir() often presents confusing output for new Python
> users.
>
> I twiddled CGIHTTPServer to have __all__ and added the following dir()
> function to my PYTHONSTARTUP file:
>
>     def dir(o,showall=0):
>         if not showall and hasattr(o, "__all__"):
>             x = list(o.__all__)
>             x.sort()
>             return x
>         from __builtin__ import dir as d
>         return d(o)
>
> Compare its output with and without showall set:
>
>     >>> dir(CGIHTTPServer)
>     ['CGIHTTPRequestHandler', 'test']
>     >>> dir(CGIHTTPServer,1)
>     ['BaseHTTPServer', 'CGIHTTPRequestHandler', 'SimpleHTTPServer', '__all__',
>     '__builtins__', '__doc__', '__file__', '__name__', '__version__',
>     'executable', 'nobody', 'nobody_uid', 'os', 'string', 'sys', 'test',
>     'urllib']
>
> I haven't demonstrated any great programming prowess with this little
> function, but I rather suspect it may be beyond most brand new users. If
> Guido can't be convinced to allow dir() to change, how about adding a sample
> PYTHONSTARTUP file to the distribution that contains little bits like this
> and Ping's pydoc.help stuff (assuming it gets into the distro, which I hope
> it does)?

And, while we're at it, the following bit too can be in the PYTHONSTARTUP:

    def display(x):
        import __builtin__
        # clear _ first so printing x cannot pick up a stale value
        __builtin__._ = None
        if type(x) == type(''):
            print `x`
        else:
            print x
        # remember the last displayed value, as the default hook does
        __builtin__._ = x

    import sys
    sys.displayhook = display

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses! From tim.one@home.com Fri Jan 12 02:33:59 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 11 Jan 2001 21:33:59 -0500 Subject: [Python-Dev] dir()/__all__/etc In-Reply-To: <20010111034709.C23512@glacier.fnational.com> Message-ID: [Neil Schemenauer] > I'm -1 on making dir() pay attention to __all__. Me too. The original __exports__ idea was an ironclad guarantee about which names were externally visible for *any* purpose. Then it made sense to restrict dir() accordingly. But if __all__ is just "a hint" (to be ignored or honored at whim, by whoever chooses), the introspective uses of dir() must be served too. > I'm +1 on adding a help() function which pays attention to > __all__ and (optionally?) prints doc strings. I can't be +1 on anything that vague -- although I'm +1 on each part of it if done in exactly the way I envision . From ping@lfw.org Fri Jan 12 02:51:54 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 18:51:54 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101120115.f0C1FPx03702@mira.informatik.hu-berlin.de> Message-ID: On Fri, 12 Jan 2001, Martin v. Loewis wrote: > > Did using pydoc like "man" work okay for you? > > Yes, that is very impressive. Good. What platform did you try it on? I have updated the scripts now to provide a very rudimentary HTTP server feature: skuld[1316]% pydoc -p 8080 starting server on port 8080 This starts a server on port 8080 that generates HTML documentation for modules on the fly. The root page (http://localhost:8080/) shows an index of modules -- it badly needs some cleaning up, but at least it provides access to all the documentation. http://www.lfw.org/python/pydoc.py http://www.lfw.org/python/htmldoc.py Also, as you requested: skuld[1324]% pydoc -h /home/ping/bin/pydoc ... Show documentation on something. 
may be the name of a Python function, module, package, or a dotted reference to a class or function within a module or module in a package. /home/ping/bin/pydoc -k Search for a keyword in the short descriptions of modules. /home/ping/bin/pydoc -p Start an HTTP server on the given port on the local machine. More to come. -- ?!ng From fdrake@acm.org Fri Jan 12 03:02:00 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 11 Jan 2001 22:02:00 -0500 (EST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: References: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de> Message-ID: <14942.29609.19618.534613@cj42289-a.reston1.va.home.com> Ka-Ping Yee writes: > My next two targets are: > 1. Generating text from the HTML documentation files > using Paul Prescod's stuff in onlinehelp.py. You mean the ones I publish as the standard documentation? Relying on the structure of that HTML is pure folly! I don't think I can make any guarantees that the HTML structures won't change as the processing evolves. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim.one@home.com Fri Jan 12 03:49:47 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 11 Jan 2001 22:49:47 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101111523.KAA14982@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I don't want to call FLOCKFILE while holding the Python lock, as > this means that *if* we're blocked in FLOCKFILE (e.g. we're reading > from a pipe or socket), no other Python thread can run! Ah, good point! Doesn't appear an essential point, though: the HAVE_GETC_UNLOCKED code could still be fiddled easily enough to call FLOCKFILE and FUNLOCKFILE exactly once per line, but with the first thread release before the (dynamically only) FLOCKFILE and the last thread grab after the (dynamically only) FUNLOCKFILE. It's just a question of will, but since that's lacking I'll drop it. > ...
> I don't think that _filbuf can possibly wait for another thread to > write data to the same stream object. OK, I'll buy that. Dropped too. > ... > OK. It's unique to MS. So close the bug report with a "won't fix" > resolution. There's no point in having bug reports remain open that > we know we can't fix. We don't really have a policy about that. Perhaps you're articulating one here, though! I've always left bugs open if they're (a) bugs, and (b) open . For example, I left the Norton Blue-Screen crash bug open (although I see now you eventually closed that). Ditto the "Rare hangs in w9xpopen.exe" bug (which is still open, but will never be fixed by *us*). Just other examples of things we'll almost certainly never fix ourselves (we have no handle on them, and all evidence says the OS is screwing up). My view has been that if a user comes to the bug site, it's most helpful for them if active (== "still happens") crashes and hangs appear among the open problems. Now that your view of it is clearer, I'll switch to yours. too-easy-ly y'rs - tim From tim.one@home.com Fri Jan 12 04:22:40 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 11 Jan 2001 23:22:40 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101111527.KAA15005@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > The locking prevents concurrent threads accessing the stream. > > But mixing reads and writes (without intervening fseek etc.) is > illegal use of the stream, and the C standard allows them to be lax > here, even if the program was single-threaded. > > In other words: the locking is so good that it serializes the > sequence of reads and writes; but if the sequence of reads and > writes is illegal, they don't guarantee anything. We're never going to agree on this one, you know. My definition of "bug" here has nothing to do with the std: something's "a bug" if it's not functioning as designed. That's all. So if the implementers would say "oops! 
that should not have happened!", then to me it's "a bug". It so happens I believe the MS implementers would consider this to be a bug under that defn. Multi-threaded libraries have to be written to a much higher level than the C std guarantees (been there, done that, and so have you), and this is specifically corruption in a crucial area vulnerable to races. They have a timing hole! That's clear. If the MS implementers don't believe that's "a bug", then I'd say they're too unprofessional to be allowed in the same country as a multithreaded library <0.1 wink>. Your definition of "bug" seems to be more "I don't want it in Python's open bug list, so I'll do what Tim usually does and appeal to the std in a transparent effort to convince someone that it's not really 'a bug' -- then maybe I'll get it off of Python's bug list". I'm sure you'll agree that's a fair summary of both sides . it's-a-bug-and-it's-no-longer-on-python's-open-bug-list-ly y'rs - tim From tim.one@home.com Fri Jan 12 06:54:47 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 12 Jan 2001 01:54:47 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101111508.KAA14870@cj20424-a.reston1.va.home.com> Message-ID: [Tim, on for_xreadlines vs readlines_sizehint, after disabling the default 1Mb buffer size in the latter] > They're indistinguishable then on my box (on one run xreadlines > is .1 seconds (out of around 7.6 total) quicker, on another > readlines_sizehint), *provided* that I specify the same buffer > size (8192) that xreadlines uses internally. However, if I even > double that, readlines_sizehint is uniformly about 10% slower. It's > also a tiny bit slower if I cut the sizehint buffer size to 4096. [Guido] > 8192 happens to be the size of the stack-allocated buffer readlines() > uses, and also the stdio BUFSIZ parameter, on many systems. Look for > SMALLCHUNK in fileobject.c. 
> > Would it make sense to tie the two constants together more to tune > this optimally even when BUFSIZ is different? Have to repeat what I first said: > I'm afraid Mysteries will remain no matter how many > person-decades we spend staring at this <0.5 wink> ... I'm repeating that because BUFSIZ is 4096 on WinTel, but SMALLCHUNK (8192) worked best for me. Now we're in some complex balancing act among how often the outer loop needs to refill the readlines_sizehint buffer;, how out of whack the latter is with the platform stdio buffer; whether platform malloc takes only twice as long to allocate space for 2*N strings as for N; and, if the readlines buffer is too large, at exactly which point the known Win9x eventually-quadratic-time behavior of PyList_Append starts to kick in. I can't out-think all that. Indeed, I can't out-think any of it . After staring at the code, I expect my "only a tiny bit slower" was an illusion: if 0 < sizehint <= SMALLCHUNK, sizehint appears to have no effect on the operation on file_readline. BTW, changing fileobject.c's SMALLCHUNK to a copy of BUFSIZ didn't make any difference on Windows. From moshez@zadka.site.co.il Fri Jan 12 16:03:58 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 18:03:58 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,1.2,1.3 In-Reply-To: References: Message-ID: <20010112160358.B0AC0A82D@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001, Thomas Wouters wrote: > Noone but me cares, but Guido said to go ahead and fix it if it bothered me. I think you meant no one. Noone is an archaic spelling of noon. quid-pro-quo-ly y'rs, Z. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
From fredrik@effbot.org Fri Jan 12 08:17:11 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 12 Jan 2001 09:17:11 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,1.2,1.3 References: <20010112160358.B0AC0A82D@darjeeling.zadka.site.co.il> Message-ID: <012a01c07c70$11aac700$e46940d5@hagrid> > > Noone but me cares, but Guido said to go ahead and fix it if it bothered me. > > I think you meant no one. Noone is an archaic spelling of noon. no, he meant me. I care. From martin@loewis.home.cs.tu-berlin.de Fri Jan 12 08:09:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 12 Jan 2001 09:09:00 +0100 Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: (message from Ka-Ping Yee on Thu, 11 Jan 2001 18:51:54 -0800 (PST)) References: Message-ID: <200101120809.f0C890B00802@mira.informatik.hu-berlin.de> > Good. What platform did you try it on? Linux, in a Konsole. I guess that is an environment you'd been using as well :-) Martin From jack@oratrix.nl Fri Jan 12 09:57:27 2001 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 12 Jan 2001 10:57:27 +0100 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message by Ka-Ping Yee , Thu, 11 Jan 2001 08:36:36 -0800 (PST) , Message-ID: <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> > I'm pleased to announce a reasonable first pass at a documentation > utility for interactive use. "pydoc" is usable in three ways: [...] > I would very much appreciate your feedback, especially from testing > on non-Unix platforms. Thank you! Wow, I'm impressed! To make it run on the mac I had to add tests for the existence of os.system only. (So all statements "if os.system(...) > 0:" got to be "if hasattr(os, "system") and os.system(...) > 0:"). There are however various other niceties that could be added to make it more useful, can this be put into the repository or something? 
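The os.system guard described above can be sketched as follows. This is only an illustration of the pattern, not the code in pydoc.py: the helper name command_failed and its os_module parameter are hypothetical, added here so the check can be exercised without running a real command.

```python
import os


def command_failed(command, os_module=os):
    """Return True only if os.system exists and the command reports failure.

    Hypothetical helper: on a platform whose os module has no system(),
    the hasattr() test short-circuits and nothing is executed.
    """
    return hasattr(os_module, "system") and os_module.system(command) > 0
```

On a platform without os.system the helper simply reports "no failure", which matches the effect of the rewritten "if hasattr(os, "system") and os.system(...) > 0:" statements: the branch that reacts to a failed command is skipped.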
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein@lyra.org Fri Jan 12 10:31:53 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 12 Jan 2001 02:31:53 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: <20010111224157.A2467@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 11, 2001 at 10:41:57PM +0100 References: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> <20010111224157.A2467@xs4all.nl> Message-ID: <20010112023153.Q4640@lyra.org> On Thu, Jan 11, 2001 at 10:41:57PM +0100, Thomas Wouters wrote: > On Thu, Jan 11, 2001 at 12:25:30PM -0800, Mark Hammond wrote: > > > I thought rules were pretty clear with reference counting - dont assume > > _anything_ about the object unless you hold a reference (or are damn sure > > someone else does!) > > Moshe isn't breaking that rule. He isn't assuming anything about the object, > just about the value of the pointer to that object. I agree, though, that > it's bad practice to rely on it having the old value, after DECREFing it. Oh, that is just so much baloney. If I said Py_DECREF(&ptr), *then* I'd be worried. But if I ever call Py_DECREF(foo) and it modifies foo, then I'd be quite upset. "functions" just aren't supposed to do that. -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Fri Jan 12 13:51:51 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 08:51:51 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Thu, 11 Jan 2001 17:26:33 EST." 
<20010111172633.A26249@kronos.cnri.reston.va.us> References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> Message-ID: <200101121351.IAA19676@cj20424-a.reston1.va.home.com> > >Good work -- but I still can't run this inside a platform-specific > >subdirectory. Are you planning on supporting this? > > I didn't really understand this when you pointed it out, but forgot to > ask for clarification. What does your directory layout look like? Ah. It's very simple. I create a directory "linux" as a subdirectory of the Python source tree (i.e. at the same level as Lib, Objects, etc.). Then I chdir into that directory, and I say "../configure". The configure script creates subdirectories to hold the object files for me: Grammar, Parser, Objects, Python, Modules, and sticks Makefiles in them. The "srcdir" variable in the Makefiles is set to "..". Then I say "make" and it builds Python. The source directories are used but no files are created or modified there: all files are created in the "linux" directory. This lets me have several separate configurations: the feature used to be intended for sharing a source tree between multiple platforms, but now I use it to have threaded, nonthreaded, debugging, and regular builds under a single source tree. This also works where the build directory is completely outside the source tree (some people apparently mount the source tree read-only). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jan 12 13:54:12 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 08:54:12 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Thu, 11 Jan 2001 14:28:44 PST." References: Message-ID: <200101121354.IAA19700@cj20424-a.reston1.va.home.com> > > Can you explain *why* you wanted to test for package-ness? > > Auto-generating documentation. 
pydoc.py currently tests for __path__, > and looks for the presence of __init__.py in a subdirectory to mean > that the subdirectory name is a package name. Is it safe on all platforms > to just list all .py files in the subdirectory to get all submodules? Yes, that should work. Of course there could also be extension modules or .pyc-only files there -- you could use imp.get_suffixes() to find out all modules (even if that means you don't always have the source code available). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jan 12 14:07:30 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 09:07:30 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 11 Jan 2001 22:49:47 EST." References: Message-ID: <200101121407.JAA19781@cj20424-a.reston1.va.home.com> > [Guido] > > I don't want to call FLOCKFILE while holding the Python lock, as > > this means that *if* we're blocked in FLOCKFILE (e.g. we're reading > > from a pipe or socket), no other Python thread can run! [Tim] > Ah, good point! Doesn't appear an essential point, though: the > HAVE_GETC_UNLOCKED code could still be fiddled easily enough to call > FLOCKFILE and FUNLOCKFILE exactly once per line, but with the first thread > release before the (dynamically only) FLOCKFILE and the last thread grab > after the (dynamically only) FUNLOCKFILE. It's just a question of will, but > since that's lacking I'll drop it. Yes, but if the line is very long, you'd have to use malloc() -- you can't use _PyString_Resize() since that can access the thread state. You're right that I don't want to do this. > > OK. It's unique to MS. So close the bug report with a "won't fix" > > resolution. There's no point in having bug reports remain open that > > we know we can't fix. > > We don't really have a policy about that. Perhaps you're articulating one > here, though! 
I've always left bugs open if they're (a) bugs, and (b) open > . For example, I left the Norton Blue-Screen crash bug open (although > I see now you eventually closed that). Ditto the "Rare hangs in > w9xpopen.exe" bug (which is still open, but will never be fixed by *us*). > Just other examples of things we'll almost certainly never fix ourselves (we > have no handle on them, and all evidence says the OS is screwing up). Yes, as I was thinking about this I realized that that was the policy I wanted. So, yes, the w9xpopen popen bug can be closed as WontFix too. > My view has been that if a user comes to the bug site, it's most helpful for > them if active (== "still happens") crashes and hangs appear among the open > problems. Now that your view of it is clearer, I'll switch to yours. I find it more important that the bug list gives us developers an overview of tasks to be tackled. The problems that won't go away can be listed in the Python 2.0 MoinMoin web! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jan 12 14:27:43 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 09:27:43 -0500 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Your message of "Fri, 12 Jan 2001 10:57:27 +0100." <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> References: <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> Message-ID: <200101121427.JAA20034@cj20424-a.reston1.va.home.com> > There are however various other niceties that could be added to make it more > useful, can this be put into the repository or something? Ping, do you think you could check this in into the nondist tree? nondist/sandbox/help would seem a good name (next to Paul's nondist/sandbox/doctools). 
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Fri Jan 12 16:37:57 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 12 Jan 2001 10:37:57 -0600 (CST) Subject: [Python-Dev] [Patch #103154] Cygwin Check Import Case Patch In-Reply-To: References: Message-ID: <14943.13029.103771.261362@beluga.mojam.com> Guido> Summary: Cygwin Check Import Case Patch ... Guido> But I believe the solution is that the TERMIOS module should be Guido> renamed. Isn't this a general problem? As I recall, the convention when generating Python modules from C header files is to simply convert the base name to upper case and replace ".h" with ".py" (errno.h -> ERRNO.py). From h2py.py: # Without filename arguments, acts as a filter. # If one or more filenames are given, output is written to corresponding # filenames in the local directory, translated to all uppercase, with # the extension replaced by ".py". Perhaps the convention should be instead to append "d" or "data" to the base name (errno.h -> errnodata.py). Skip From guido@python.org Fri Jan 12 17:47:46 2001 From: guido@python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 12:47:46 -0500 Subject: [Python-Dev] [Patch #103154] Cygwin Check Import Case Patch In-Reply-To: Your message of "Fri, 12 Jan 2001 10:37:57 CST." <14943.13029.103771.261362@beluga.mojam.com> References: <14943.13029.103771.261362@beluga.mojam.com> Message-ID: <200101121747.MAA27504@cj20424-a.reston1.va.home.com> > Guido> Summary: Cygwin Check Import Case Patch > ... > Guido> But I believe the solution is that the TERMIOS module should be > Guido> renamed. > > Isn't this a general problem? As I recall, the convention when generating > Python modules from C header files is to simply convert the base name to > upper case and replace ".h" with ".py" (errno.h -> ERRNO.py). From h2py.py: > > # Without filename arguments, acts as a filter. 
> # If one or more filenames are given, output is written to corresponding > # filenames in the local directory, translated to all uppercase, with > # the extension replaced by ".py". > > Perhaps the convention should be instead to append "d" or "data" to the base > name (errno.h -> errnodata.py). An even better solution is to get rid of those generated headers and incorporate the desired symbols directly in the C extension modules. That's happened for errno and socket, for example; maybe it's time to do that for termios, too! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Fri Jan 12 18:54:47 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 12 Jan 2001 13:54:47 -0500 Subject: [Python-Dev] Patch 103216 - dbmmodule Setup changes Message-ID: <14943.21239.382891.661026@anthem.wooz.org> I've just uploaded patch 103216 to the Python project at SF. This does a couple of things. First, it auto-detects (in configure) whether dbmmodule can be built, and if so whether the -lndbm library needs to be specified. Second, it moves the entry for dbmmodule to Setup.conf, after the *shared* key so that it'll be built as a dynamic library by default. This should fix the problem where compiling in dbmmodule sets up a dependency to libdb which later hoses pybsddb3. I'd have just checked it in, but I'd like someone else to just proof it first. I've only tested this with the current CVS tree on a fairly stock RH6.1. BTW, I didn't include the changes to configure in the patch, because it's large and made SF's patch manager cough. Besides it can be generated from configure.in and config.h.in which are included in the patch. Cheers, -Barry From martin@loewis.home.cs.tu-berlin.de Fri Jan 12 22:19:57 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 12 Jan 2001 23:19:57 +0100 Subject: [Python-Dev] PEP 205 comments Message-ID: <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de> Before commenting on the patch itself, I'd like to comment on the patch describing it. I'm missing a discussion as to why weak references don't act as proxies (or why they do now). A weak proxy would provide the same attributes as the object which it encapsulates, so it could be used transparently in place of the original object. I can think of a number of reasons why it is not done this way (e.g. complete transparency is impossible to achieve); now that a revision of the patch provides proxies, the documentation should state which features are forwarded to the proxy and which aren't (it lists the type() as a difference, but I doubt that is the only difference - repr is also different). Next, I wonder whether weakref.new is allowed to return an existing weak reference to the same object. If that is not acceptable, I'd like to know why - if it was acceptable, then weakref.new(instance) (i.e. without callback) could return the same weak reference all the time. A smart implementation might choose to put the weak reference with no callback in the start of the list, so creation of additional weak references to the same object would be inexpensive. Likewise, I'd like to know the rationale for the clear method. Why is it desirable to drop the object, yet keep the weak reference? Isn't it easier for the application to either ignore clearing altogether, or drop the reference to the weak reference? So I'd propose to kill the clear method. Again on proxies, there is no discussion or documentation of the ReferenceError. Why is it a RuntimeError? LookupError, ValueError, and AttributeError seem to be just as fine or better. On to the type type extensions: Should there be a type flag indicating presence of tp_weaklistoffset? 
It appears that the type structure had tp_xxx7 for a long time, so likely all in-use binary modules have that field set to zero. Is that sufficient? Thanks for reading all of this message, Martin From skip@mojam.com (Skip Montanaro) Sat Jan 13 15:37:55 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 13 Jan 2001 09:37:55 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tempfile.py,1.23,1.24 In-Reply-To: References: Message-ID: <14944.30291.658931.489979@beluga.mojam.com> Tim> On Linux, someone please run that standalone with more files and/or Tim> more threads; e.g., Tim> python lib/test/test_threadedtempfile.py -f 1000 -t 10 Tim> to run with 10 threads each creating (and deleting) 1000 temp files. After capitalizing "Lib", it worked fine for me: % ./python Lib/test/test_threadedtempfile.py -f 1000 -t 10 Creating Starting Reaping Done: errors 0 ok 10000 Skip From dkwolfe@pacbell.net Sat Jan 13 18:48:21 2001 From: dkwolfe@pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 10:48:21 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore Message-ID: <0G740027Q6Q1KL@mta6.snfc21.pbi.net> Howdy Folks, I need some help here. I'd like to see Python build out of the box with a ./configure, make, make test, and make install on Darwin and Mac OS X. Having it build out of the box will make it easier to be incorporated into both Darwin and the base Mac OS X distribution - although not for the initial release of the latter but definitely doable for subsequent releases. In order to do this, I need to have it build cleanly on HFS and UFS filesystems. Under HFS system, I've got a name conflict due to case insensitivity between the build target and the "Python" directory that forces me to build with a -with-suffix command on HFS and manually change the name after install - which is an automatic knockout factor when it comes to incorporating it in an automatic build system. 
Not to mention a problem with unix newbies trying to build from source... Last night, I did some quick investigation to determine the best way to fix this problem as documented in PEP-42 in the build section and Sourceforge bug 122215 and determined that the easiest and least error prone way was to change the directory name Python to PyCore. It's apparent from the comments that I'm missing something here as the reaction has been negative so far - to the point where Guido has rejected the patch. Can someone explain what I'm missing that's causing such strong feelings? My second question is how do I resolve the name conflict in an approved way? It's been suggested that a build directory be created (/src/build ?) and that the target be placed here. The problem that I had with this suggestion is that it would require an additional layer to execute the target and I wasn't sure what impact it would have on running python from a new directory... which is the reason I took the more known path. :-) Bottom line, come March 24th, Mac OS X 1.0 will be released and as of July 2001 all Macintoshes will come with Mac OS X. I'd like to see Python be easily built "out of the box" on these machines - rather than come with a haphazard list of instructions or commands as currently needed for 1.5.2 and 2.0 releases. And hopefully, at some point be incorporated into the base Mac OS X installation... - Dan Wolfe From esr@thyrsus.com Sat Jan 13 20:23:50 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sat, 13 Jan 2001 15:23:50 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? Message-ID: <20010113152350.A17338@thyrsus.com> I have a new goodie for the 2.1 standard library, a module called "simil" that supports computation of similarity indices between strings such as one might use for recovery-matching of misspellings against a dictionary. 
The three methods supported are stemming, normalized Hamming similarity, and (the star of the show) Ratcliff-Obershelp gestalt subpattern matching. The latter is spookily effective for detecting not just substitution typos but insertions and deletions. The module is a C extension (my first!) for speed and because the Ratcliff-Obershelp implementation uses pointer arithmetic heavily. It's documented, tested, and ready to go. But having written it, I now have a question: why is soundex marked obsolete? Is there something wrong with the algorithm or implementation? If not, then it would be natural for simil to absorb the existing soundex implementation as a fourth entry point. -- Eric S. Raymond Whether the authorities be invaders or merely local tyrants, the effect of such [gun control] laws is to place the individual at the mercy of the state, unable to resist. -- Robert Anson Heinlein, 1949 -- Eric S. Raymond Americans have the right and advantage of being armed - unlike the citizens of other countries whose governments are afraid to trust the people with arms. -- James Madison, The Federalist Papers From tim.one@home.com Sat Jan 13 21:34:10 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 13 Jan 2001 16:34:10 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113152350.A17338@thyrsus.com> Message-ID: [Eric S. Raymond] > I have a new goodie for the 2.1 standard library, a module called > "simil" that supports computation of similarity indices between > strings such as one might use for recovery-matching of misspellings > against a dictionary. My guess is that Guido won't accept it. > The three methods supported are stemming, normalized Hamming > similarity, and (the star of the show) Ratcliff-Obershelp gestalt > subpattern matching. The latter is spookily effective for detecting > not just substitution typos but insertions and deletions. The module is > a C extension (my first!) 
for speed and because the Ratcliff-Obershelp > implementation uses pointer arithmetic heavily. Never heard of R-O, so tracked down some C code via google. It appears I invented the same algorithm at Cray Research in the early 80's for a diff generator, which later got reincarnated in my ndiff.py (in the Tools/scripts/ directory). ndiff generates "human-friendly" diffs between text files, at both the "file is a sequence of lines" and "line is a sequence of characters" levels. I didn't have the hyperbolic marketing genius to call it "gestalt subpattern matching", though -- I thought of it as what Unix diff *would* do if it constrained itself to matching *contiguous* subsequences, and under the theory people would find that more natural because contiguity is something the human visual system naturally latches on to. ndiff can be spookily natural in practice too. > It's documented, tested, and ready to go. But having written it, I > now have a question: why is soundex marked obsolete? Is there > something wrong with the algorithm or implementation? What is the soundex algorithm? Not joking. Skip Montanaro and I were unable to find the algorithm implemented by soundex.c anywhere in the literature, and I never found *any* two definitions that were the same. Even Knuth changed his description of Soundex between editions 2 and 3 of volume 3. Skip eventually merged my and Fred Drake's Python implementations of Knuth Vol 3 Ed 3 Soundex (see the Vaults of Parnassus). > If not, then it would be natural for simil to absorb the existing > soundex implementation as a fourth entry point. Well, soundex.c doesn't match any other Soundex on earth, so it's not worth reproducing in new code. Guido doesn't want to be in the middle of fighting over ill-defined algorithms, so booted Soundex entirely. Another candidate for inclusion is the NYSIIS algorithm, which is probably in more "serious" use than Soundex anyway. 
Same thing with NYSIIS, though (i.e., what-- exactly --is "the NYSIIS algorithm"?), except that Knuth didn't do us the favor of making up his own variation that will *become* "the std" via force of reputation. Sean True implemented *a* NYSIIS in Python (and again see the Vaults for a link to that). So that's why the module is unlikely to make it into the core: + There are any number of algorithms people may want to see (I don't know what "normalized Hamming similarity" means, but if it's not the same as Levenshtein edit distance then add the latter to the pot too). + Each algorithm on its own is likely controversial. + Computing string similarity is something few apps need anyway. Lots of hassle + little demand == not a natural for the core. ndiff is in the core only because many people found the *app* useful; its SequenceMatcher class isn't even advertised. may-never-understand-how-bigints-got-into-python-ly y'rs - tim From fdrake@acm.org Sat Jan 13 21:45:12 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 13 Jan 2001 16:45:12 -0500 (EST) Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: References: <20010113152350.A17338@thyrsus.com> Message-ID: <14944.52328.558763.46161@cj42289-a.reston1.va.home.com> Tim Peters writes: > + Computing string similarity is something few apps need anyway. And this is a biggie. > Lots of hassle + little demand == not a natural for the core. ndiff is in But it *is* an excellent type of thing to have around -- Eric: just post it on your Web site and register it with the Vaults. > the core only because many people found the *app* useful; its > SequenceMatcher class isn't even advertised. Did you ever write documentation for it? ;-) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From nas@arctrix.com Sat Jan 13 15:17:58 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Sat, 13 Jan 2001 07:17:58 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Sat, Jan 13, 2001 at 02:06:07PM -0800 References: Message-ID: <20010113071758.C28643@glacier.fnational.com> [Guido van Rossum on Demo/embed/loop] > (Except it still leaks, but that's probably a separate issue.) Could this be caused by modules adding things to their dict and then forgetting to decref them? I know I've been guilty of that. Neil From esr@thyrsus.com Sat Jan 13 22:15:28 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sat, 13 Jan 2001 17:15:28 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 04:34:10PM -0500 References: <20010113152350.A17338@thyrsus.com> Message-ID: <20010113171528.A17480@thyrsus.com> OK, now I understand why soundex isn't in the core -- there's no canonical version. Tim Peters : > + There are any number of algorithms people may want to see (I don't know > what "normalized Hamming similarity" means, but if it's not the same as > Levenshtein edit distance then add the latter to the pot too). Normalized Hamming similarity: it's an inversion of Hamming distance -- number of pairwise matches in two strings of the same length, divided by the common string length. Gives a measure in [0.0, 1.0]. I've looked up "Levenshtein edit distance" and you're right. I'll add it as a fourth entry point as soon as I can find C source to crib. (Would you happen to have a pointer?) > + Each algorithm on its own is likely controversial. Not these. There *are* canonical versions of all these, and exact equivalents are all heavily used in commercial OCR software. > + Computing string similarity is something few apps need anyway. Tim, this isn't true. 
Any time you need to validate user input against a controlled vocabulary and give feedback on probable right choices, R/O similarity is *very* useful. I've had it in my personal toolkit for a decade and used it heavily for this -- you take your unknown input, check it against a dictionary and kick "maybe you meant foo?" to the user for every foo with an R/O similarity above 0.6 or so. The effects look like black magic. Users love it. -- Eric S. Raymond "I hold it, that a little rebellion, now and then, is a good thing, and as necessary in the political world as storms in the physical." -- Thomas Jefferson, Letter to James Madison, January 30, 1787 From guido@python.org Sat Jan 13 22:25:12 2001 From: guido@python.org (Guido van Rossum) Date: Sat, 13 Jan 2001 17:25:12 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: Your message of "Sat, 13 Jan 2001 07:17:58 PST." <20010113071758.C28643@glacier.fnational.com> References: <20010113071758.C28643@glacier.fnational.com> Message-ID: <200101132225.RAA03197@cj20424-a.reston1.va.home.com> > [Guido van Rossum on Demo/embed/loop] > > (Except it still leaks, but that's probably a separate issue.) > > Could this be caused by modules adding things to their dict and > then forgetting to decref them? I know I've been guilty of that. Do you have a tool that detects leaks? Barry has one: Insure++. It's expensive and we don't have a site license, so I'll ask Barry to investigate this. (Barry: go to Demo/embed and do "make looptest". Then in another shell window use "top" to watch the "loop" process grow slowly. I'd love to find out what's the problem here. It's not dependent on what you ask it to loop over; "./loop pass" also grows. Of course it could be one of the modules loaded during initialization...) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Jan 13 22:33:34 2001 From: guido@python.org (Guido van Rossum) Date: Sat, 13 Jan 2001 17:33:34 -0500 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Your message of "Sat, 13 Jan 2001 10:48:21 PST." <0G740027Q6Q1KL@mta6.snfc21.pbi.net> References: <0G740027Q6Q1KL@mta6.snfc21.pbi.net> Message-ID: <200101132233.RAA03229@cj20424-a.reston1.va.home.com> > Howdy Folks, > > I need some help here. I'd like to see Python build out of the box with a > ./configure, make, make test, and make install on Darwin and Mac OS X. > Having it build out of the box will make it easier to be incorporated > into both Darwin and the base Mac OS X distribution - although not for > the initial release of the latter but definitely doable for subsequent > releases. In order to do this, I need to have it build cleanly on HFS and > UFS filesystems. > > Under HFS system, I've got a name conflict due to case insensitivity > between the build target and the "Python" directory that forces me to > build with a -with-suffix command on HFS and manually change the name > after install - which is an automatic knockout factor when it comes to > incorporating it in an automatic build system. Not to mention a problem > with unix newbies trying to build from source... > > Last night, I did some quick investigation to determine the best way to > fix this problem as documented in PEP-42 in the build section and > Sourceforge bug 122215 and determined that the easiest and least error > prone way was to change the directory name Python to PyCore. > > It's apparent from the comments that I'm missing something here as the > reaction has been negative so far - to the point where Guido has rejected > the patch. Can someone explain what I'm missing that's causing such > strong feelings? We use CVS to manage the sources. 
CVS makes it very hard to move a directory; it doesn't have a command for this, so you have to do the move directly in the repository, which will then break checkouts for everyone who has a work directory linked to the CVS repository. Using SourceForge makes it a bit harder still: we have to ask the SF sysadmins to do the move for us. And if we did the move, it would be much harder to reproduce old versions of the source tree with a single CVS command. A way around that would be to do a copy instead of a move, but that would cause the directory "PyCore" to pop up in all old versions, too. I just don't want to go through this hassle in order to make building easier for one relatively little-used platform. > My second question is how do I resolve the name conflict in an approved > way? It's been suggested that a build directory be created (/src/build > ?) and that the target be placed here. The problem that I had with this > suggestion is that it would require an additional layer to execute the > target and I wasn't sure what impact it would have on running python > from a new directory... which is the reason I took the more known path. > :-) I don't understand what you are proposing here; I can't imagine that an extra directory level could cause a slowdown. A suggestion I would be open to: change the executable name during build (currently a .exe suffix is added), but change it back (removing the .exe suffix) during the install. That should be a small change to the Makefile. > Bottom line, come March 24th, Mac OS X 1.0 will be released and as of > July 2001 all Macintoshes will come with Mac OS X. I'd like to see > Python be easily built "out of the box" on these machines - rather than come > with a haphazard list of instructions or commands as currently needed > for 1.5.2 and 2.0 releases. And hopefully, at some point be incorporated > into the base Mac OS X installation... 
Just get Apple to include Python with their standard distribution and nobody will *have* to build Python on Mac OSX. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Sat Jan 13 23:59:44 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 13 Jan 2001 18:59:44 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113171528.A17480@thyrsus.com> Message-ID: [Eric] > OK, now I understand why soundex isn't in the core -- there's no > canonical version. Actually, I think Knuth Vol 3 Ed 3 is canonical *now* -- nobody would dare to oppose him <0.5 wink>. > Normalized Hamming similarity: it's an inversion of Hamming distance > -- number of pairwise matches in two strings of the same length, > divided by the common string length. Gives a measure in [0.0, 1.0]. > > I've looked up "Levenshtein edit distance" and you're rigbt. I'll add > it as a fourth entry point as soon as I can find C source to crib. > (Would you happen to have a pointer?) If you throw almost everything out of Unix diff, that's what you'll be left with. Offhand I don't know of enencumbered, industrial-strength C source; a problem is that writing a program to compute this is a std homework exercise (it's a common first "dynamic programming" example), so you can find tons of bad C source. Caution: many people want small variations of "edit distance", usually via assigning different weights to insertions, replacements and deletions. A less common but still popular variant is to say that a transposition ("xy" vs "yx") is less costly than a delete plus an insert. Etc. "edit distance" is really a family of algorithms. >> + Each algorithm on its own is likely controversial. > Not these. There *are* canonical versions of all these, See the "edit distance" gloss above. > and exact equivalents are all heavily used in commercial OCR > software. God forbid that core Python may lose the commercial OCR developer market . 
It's not accepted that for every field F, core Python needs to supply the algorithms F uses heavily. Heck, core Python doesn't even ship with an FFT! Doesn't bother the folks working in signal processing. >> + Computing string similarity is something few apps need anyway. > Tim, this isn't true. Any time you need to validate user input > against a controlled vocabulary and give feedback on probable right > choices, Which is something few apps need anyway -- in my experience, but more so in my *primary* role here of trying to channel for you (& Guido) what Guido will say. It should be clear that I've got some familiarity with these schemes, so it should also be clear that Guido is likely to ask me about them whenever they pop up. But Guido has hardly ever asked me about them over the past decade, with the exception of the short-lived Soundex brouhaha. From that I guess hardly anyone ever asks *him* about them, and that's how channeling works: if this were an area where Guido felt core Python needed beefier libraries, I'm pretty sure I would have heard about it by now. But now Guido can speak for himself. There's no conceivable argument that could change what I *predict* he'll say. > R/O similarity is *very* useful. I've had it in my personal > toolkit for a decade and used it heavily for this -- you take your > unknown input, check it against a dictionary and kick "maybe you meant > foo?" to the user for every foo with an R/O similarity above 0.6 or so. > > The effects look like black magic. Users love it. I believe that. And I'd guess we all have things in our personal toolkits our users love. That isn't enough to get into the core, as I expect Guido will belabor on the next iteration of this . 
doesn't-mean-the-code-isn't-mondo-cool-ly y'rs - tim From dkwolfe@pacbell.net Sun Jan 14 00:19:56 2001 From: dkwolfe@pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 16:19:56 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore Message-ID: <0G7400EZQM2TXD@mta5.snfc21.pbi.net> >CVS makes it very hard to move a directory... >which will then break checkouts for everyone... with the potential to cause development code to be lost >Using SourceForge...have to ask the SF sysadmins I understand... we also use CVS and periodically (usually pre alpha) reorganize the source... going thru SF sysadmin makes it doubly hard... yuck! However, since you have "released" tarball archives, it seems to me that the loss of the diffs and log notes is more troubling than the need to create an old version.... at least that's been my experience when building software. ;-) >I just don't want to go through this hassle in order to make building >easier for one relatively little-used platform. humph. Ok, I'll accept that for now as we've only sold 100,000 Beta copies of Mac OS X... but if we're not over 1 million users by this time next year... I'll eat my words. ;-) >> It's been suggested that a build directory be created (/src/build ?) >> and that the target be placed here. >I don't understand what you are proposing here; I can't imagine that >an extra directory level could cause a slowdown. moshez suggested this in his comment on the patch - moving the target to a separate directory. I'm not sure of the implications of doing this however, and wondered if it might affect the running of the regression suite and the executable before it was installed. >A suggestion I would be open to: change the executable name during >build (currently a .exe suffix is added), but change it back (removing >the .exe suffix) during the install. That should be a small change to >the Makefile. You mean without using the -with-suffix command? That can probably be done...
but based on my readings, I'd thought you reject it as not being "clean" and complicating the build process more than it should - not to mention renaming the executable behind the builder's back... Lesser of two evils I guess - I'll investigate this however... >> I'd like to see Python be easily built on "out of the box"... >> [and] incorporated into the base Mac OS X installation... > >Just get Apple to include Python with their standard distribution and >nobody will *have* to build Python on Mac OSX. :-) Easier said that done as they already have the other P language installed. ;-) But then on the other hand, there are quite a few Pythonatic including me who use it in daily work at Apple. As I mentioned, the road to getting it in Mac OS X begins with getting it to build cleanly with the automated build system... so I've got to get this problem fixed before I start working on getting it in the build. - Dan (yes, I work for Apple, but this is something that I'm doing on my own!) From mwh21@cam.ac.uk Sun Jan 14 00:41:35 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 14 Jan 2001 00:41:35 +0000 Subject: [Python-Dev] a readline replacement? In-Reply-To: Michael Hudson's message of "17 Dec 2000 18:18:24 +0000" References: <200012142319.MAA02140@s454.cosc.canterbury.ac.nz> <3A39ED07.6B3EE68E@lemburg.com> <14906.17412.221040.895357@anthem.concentric.net> <20001215040304.A22056@glacier.fnational.com> <20001215235425.A29681@xs4all.nl> Message-ID: Michael Hudson writes: > It wouldn't be particularly hard to rewrite editline in Python (we > have termios & the terminal handling functions in curses - and even > ioctl if we get really keen). 
> > I've been hacking on my own Python line reader on and off for a while; > it's still pretty buggy, but if you're feeling brave you could look at: > > http://www-jcsu.jesus.cam.ac.uk/~mwh21/hacks/pyrl-0.0.0.tar.gz As I secretly planned , the embarrassment of having code that full of holes publicly accessible spurred me to writing a much better version, to be found at: http://www-jcsu.jesus.cam.ac.uk/~mwh21/hacks/pyrl-0.2.0.tar.gz (or, now rsync works there again, in the equivalent place on the starship...). If you unpack it and execute $ python python_reader.py you should get something that closely mimics the current interpreter top level. It supports a wide range of cursor motion commands, built-in support for multiple line input and history (including incremental search). It doesn't do completion, basically because I haven't got round to it yet, and it will get into severe trouble if you enter an input that is taller than your terminal (I think this should be surmountable, but I haven't gotten round to this either). Another thing that I haven't gotten round to yet is documentation. After I've tackled these points I'll probably stick it up on parnassus. I've been using it as my standard python shell for a week or so, and quite like it, though the lack of completion is a drag. It is probably staggeringly unportable, so I'd appreciate finding out how it breaks on systems other that Linux with terminals other than xterms... Have the changes to enable use of editline been checked in yet? I worry that the licensing situation around the readline module is grey at best... Cheers, M. -- That's why the smartest companies use Common Lisp, but lie about it so all their competitors think Lisp is slow and C++ is fast. (This rumor has, however, gotten a little out of hand. :) -- Erik Naggum, comp.lang.lisp From esr@thyrsus.com Sun Jan 14 00:58:08 2001 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Sat, 13 Jan 2001 19:58:08 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 06:59:44PM -0500 References: <20010113171528.A17480@thyrsus.com> Message-ID: <20010113195808.B17712@thyrsus.com> Tim Peters : > If you throw almost everything out of Unix diff, that's what you'll be left > with. Offhand I don't know of enencumbered, industrial-strength C source; a > problem is that writing a program to compute this is a std homework exercise > (it's a common first "dynamic programming" example), so you can find tons of > bad C source. I found some formal descriptions of the algorithm and some unencumbered Oberon source. I'm coding up C now. It's not complicated if you're willing to hold the cost matrix in memory, which is reasonable for a string comparator in a way it wouldn't be for a file diff. > Caution: many people want small variations of "edit distance", usually via > assigning different weights to insertions, replacements and deletions. A > less common but still popular variant is to say that a transposition ("xy" > vs "yx") is less costly than a delete plus an insert. Etc. "edit distance" > is really a family of algorithms. Which about collapse into one if your function has three weight arguments for insert/replace/delete weights, as mine does. It don't get more general than that -- I can see that by looking at the formal description. OK, so I'll give you that I don't weight transpositions separately, but neither does any other variant I found on the web nor the formal descriptions. A fourth optional weight agument someday, maybe :-). > God forbid that core Python may lose the commercial OCR developer market > . It's not accepted that for every field F, core Python needs to > supply the algorithms F uses heavily. That's not my point -- I don't see OCR as a big Python market either. 
My point in observing that OCR uses Ratcliff/Obershelp heavily was simply to show that it's a well-established algorithm, not `controversial'. > Heck, core Python doesn't even ship > with an FFT! Doesn't bother the folks working in signal processing. It probably won't surprise you that I considered writing an FFT extension module at one point :-). > > Tim, this isn't true. Any time you need to validate user input > > against a controlled vocabulary and give feedback on probable right > > choices, > > Which is something few apps need anyway I fundamentally disagree. Few application designers *know* they need it, but user interfaces would get a hell of a lot better if the technique were more commonly applied -- and that's why I want it in the Python library, so doing the right thing in Python will be a minimum-effort proposition. -- Eric S. Raymond What if you were an idiot, and what if you were a member of Congress? But I repeat myself. -- Mark Twain From tim.one@home.com Sun Jan 14 03:17:34 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 13 Jan 2001 22:17:34 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <14944.52328.558763.46161@cj42289-a.reston1.va.home.com> Message-ID: [Fred] > Did you ever write documentation for it? ;-) A lot more than you did . just-show-me-"write-docs"-in-my-job-description-ly y'rs - tim From tim.one@home.com Sun Jan 14 04:39:59 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 13 Jan 2001 23:39:59 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113195808.B17712@thyrsus.com> Message-ID: [Eric, on "edit distance"] > I found some formal descriptions of the algorithm and some > unencumbered Oberon source. I'm coding up C now. It's not > complicated if you're willing to hold the cost matrix in memory, > which is reasonable for a string comparator in a way it wouldn't > be for a file diff. All agreed, and it should be a straightforward task then.
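[Editorial sketch, not part of the original messages] The three-weight edit distance discussed here, with the whole cost matrix held in memory, might look roughly like this in present-day Python 3. The function name `edit_distance` and its parameter names are illustrative assumptions; this is not Eric's C code or its interface, only the textbook dynamic-programming recurrence with separate insert/replace/delete weights.

```python
def edit_distance(a, b, insert_cost=1, replace_cost=1, delete_cost=1):
    """Weighted edit distance for turning sequence a into sequence b.

    Hypothetical helper for illustration: fills a (len(a)+1) x (len(b)+1)
    cost matrix that is kept entirely in memory.
    """
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]

    # First column: delete every element of a[:i] to reach the empty string.
    for i in range(1, rows):
        cost[i][0] = i * delete_cost
    # First row: insert every element of b[:j] starting from the empty string.
    for j in range(1, cols):
        cost[0][j] = j * insert_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + delete_cost,      # drop a[i-1]
                cost[i][j - 1] + insert_cost,      # add b[j-1]
                cost[i - 1][j - 1] + (0 if same else replace_cost),
            )
    return cost[-1][-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("xy", "yx"))
```

With unit weights, `edit_distance("xy", "yx")` comes out as 2: a transposition is charged as two edits, which is the limitation raised elsewhere in this thread. Because it only indexes and compares elements, the sketch works on any pair of sequences, not just 8-bit strings.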
I'm assuming it will work with Unicode strings too . [on differing weights] > Which about collapse into one if your function has three weight > arguments for insert/replace/delete weights, as mine does. It don't > get more general than that -- I can see that by looking at the formal > description. > > OK, so I'll give you that I don't weight transpositions separately, > but neither does any other variant I found on the web nor the formal > descriptions. A fourth optional weight agument someday, maybe :-). > ... > and that's why I want it in the Python library, so doing the right > thing in Python will be a minimum-effort proposition. Guido will depart from you at a different point. I depart here: it's not "the right thing". It's a bunch of hacks that appeal not because they solve a problem, but because they're cute algorithms that are pretty easy to implement and kinda solve part of a problem. "The right thing"-- which you can buy --at least involves capturing a large base of knowledge about phonetics and spelling. In high school, one of my buddies was Dan Pryzbylski. If anyone who knew him (other than me ) were to type his name into the class reunion guy's web page, they'd probably spell it the way they remember him pronouncing it: sha-bill-skey (and that's how he pronounced "Dan" ). If that hit on the text string "Pryzbylski", *then* it would be "the right thing" in a way that makes sense to real people, not just to implementers. Working six years in commercial speech recog really hammered that home to me: 95% solutions are on the margin of unsellable, because an error one try in 20 is intolerable for real people. Developers writing for developers get "whoa! cool!" where my sisters walk away going "what good is that?". Edit distance doesn't get within screaming range of 95% in real life. 
Even for most developers, it would be better to package up the single best approach you've got (f(list, word) -> list of possible matches sorted in confidence order), instead of a module with 6 (or so) functions they don't understand and a pile of equally mysterious knobs. Then it may actually get used! Developers of the breed who would actually take the time to understand what you've done are, I suggest, similar to us: they'd skim the docs, ignore the code, and write their own variations. Or, IOW: > so doing the right thing in Python will be a minimum-effort > proposition. Make someone think first, and 95% of developers will just skip over it too. BTW, the theoretical literature ignored transposition at first, because it didn't fit well in the machinery. IIRC, I first read about it in an issue of SP&E (Software Practice & Experience), where the authors were forced into it because the "traditional" edit sequence measure sucked in their practice. They were much happier after taking transposition into account. The theoreticians have more than caught up since, and research is still active; e.g., 1997's PATTERN RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS, DELETIONS AND GENERALIZED TRANSPOSITIONS B. J. Oommen and R. K. S. Loke http://www.scs.carleton.ca/~oommen/papers/GnTrnsJ2.PDF is a good read. As they say there, If one views the elements of the confusion matrices as probabilities, this [treating each character independent of all others, as "edit distance" does] is equivalent to assuming that the transformation probabilities at each position in the string are statistically independent and possess first-order Markovian characteristics. This model is usually assumed for simplicity rather it [sic] having any statistical significance. 
IOW, because it's easy to analyze, not because it solves a real problem -- and they're complaining about an earlier generalization of edit distance that makes the weights depend on the individual symbols involved as well as on the edit/delete/insert distinction (another variation trying to make this approach genuinely useful in real life). The Oommen-Loke algorithm appears much more realistic, taking into account the observed probabilities of mistyping specific letter pairs (although it still ignores phonetics), and they report accuracies approaching 98% in correctly identifying mangled words. 98% (more than twice as good as 95% -- the error rate is actually more useful to think about, 2% vs 5%) is truly useful for non-geek end users, and the state of the art here is far beyond what's easy to find and dead easy to implement. > ... > It probably won't surprise you that I considered writing an FFT > extension module at one point :-). Nope! More power to you, Eric. At least FFTs *are* state of the art, although *coding* them optimally is likely beyond human ability on modern machines: http://www.fftw.org/ (short course: they've generally got the fastest FFTs available, and their code is generated by program, systematically *trying* every trick in the book, timing it on a given box, and synthesizing a complete strategy out of the quickest pieces). sooner-or-later-the-only-code-real-people-will-use-won't-be-written- by-people-at-all-ly y'rs - tim From tim.one@home.com Sun Jan 14 05:38:52 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 14 Jan 2001 00:38:52 -0500 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: <0G7400EZQM2TXD@mta5.snfc21.pbi.net> Message-ID: [Dan Wolfe] > ... > As I mentioned, the road to getting it in Mac OS X begins with > getting it to build cleanly with the automated build system... so > I've got to get this problem fixed before I start working on > getting it in the build. 
> > - Dan > (yes, I work for Apple, but this is something that I'm doing > on my own!) Hang in there, Dan! I did the first Python port to the KSR-1 on my own time too, despite working for the visionless bastards at the time. The rest is history: the glory, the fame, the riches, the groupies, the adulation of my peers. We won't mention the financial scandal and subsequent bankruptcy lest it discourage you for no good reason . BTW, "do the simplest thing that can possibly work"! It's OK if it's a little ugly. Better that than force hundreds of Python-builders to get divorced from a decade-old directory naming scheme. From esr@thyrsus.com Sun Jan 14 07:08:57 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 02:08:57 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 11:39:59PM -0500 References: <20010113195808.B17712@thyrsus.com> Message-ID: <20010114020857.E19782@thyrsus.com> Tim Peters : > All agreed, and it should be a straightforward task then. I'm assuming it > will work with Unicode strings too . Thought about that. Want to get it working for 8 bits first. > Guido will depart from you at a different point. I depart here: it's not > "the right thing". It's a bunch of hacks that appeal not because they solve > a problem, but because they're cute algorithms that are pretty easy to > implement and kinda solve part of a problem. Again, my experience says differently. I have actually *used* Ratcliff-Obershelp to implement Do What I Mean (actually, Tell Me What I Mean) -- and had it work very well for non-geek users. That's why I want other Python programmers to have easy access to the capability. > Working six years in commercial speech recog really hammered that home to > me: 95% solutions are on the margin of unsellable, because an error one try > in 20 is intolerable for real people. Developers writing for developers get > "whoa! cool!" 
where my sisters walk away going "what good is that?". Edit > distance doesn't get within screaming range of 95% in real life. I suspect your speech recognition experience has given you an unhelpful bias. For English, what you say is certainly true -- but that's a gross worst-case application of R/O and Levenshtein that I'm not interested in pursuing. Nor do I expect Python hackers to use my module for that. Where techniques like Ratcliff-Obershelp really shine (and what I expect the module to be used for) is with controlled vocabularies such as command interfaces. These tend to have better orthogonality than NL, so antinoise filtering by R/O or Levenshtein distance (a kindred technique I somehow didn't learn until today -- there are disadvantages to being an autodidact) can really go to town on them. (Actually, my gut after thinking about both algorithms hard is that R/O is still a better technique than Levenshtein for the kind of application I have in mind. But I also suspect the difference is marginal.) (Other good uses for algorithms in this class include cladistics and genomic analysis.) > Even for most developers, it would be better to package up the single best > approach you've got (f(list, word) -> list of possible matches sorted in > confidence order), instead of a module with 6 (or so) functions they don't > understand and a pile of equally mysterious knobs. That's why good documentation, with motivating usage hints, is important. I write good documentation, Tim. > PATTERN RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS, > DELETIONS AND GENERALIZED TRANSPOSITIONS > B. J. Oommen and R. K. S. Loke > http://www.scs.carleton.ca/~oommen/papers/GnTrnsJ2.PDF Thanks for the pointer; I've downloaded it and will read it. If the description of Ooomen's algorithm is good enough, I'll implement it and add it to the module. -- Eric S. Raymond Power concedes nothing without a demand. It never did, and it never will. 
Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From dkwolfe@pacbell.net Sun Jan 14 07:48:51 2001 From: dkwolfe@pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 23:48:51 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Message-ID: <0G75009ZD6UYYE@mta5.snfc21.pbi.net> --Apple-Mail-1687604877-3 content-transfer-encoding: 7bit content-type: text/plain; format=flowed; charset=us-ascii On Saturday, January 13, 2001, at 09:38 PM, Tim Peters wrote: > [Dan Wolfe] >> ... >> As I mentioned, the road to getting it in Mac OS X begins with >> getting it to build cleanly with the automated build system... so >> I've got to get this problem fixed before I start working on >> getting it in the build. >> >> - Dan >> (yes, I work for Apple, but this is something that I'm doing >> on my own!) > > Hang in there, Dan! I did the first Python port to the KSR-1 on my own > time > too, despite working for the visionless bastards at the time. Well, I won't go that far..... some of them are quite visionaries (I can't stop drooling over a Ti portable....). > The rest is > history: the glory, the fame, the riches, the groupies, the adulation > of my > peers. We won't mention the financial scandal and subsequent bankruptcy > lest it discourage you for no good reason . You left out the part where they turn ya into a timbot... > BTW, "do the simplest thing that can possibly work"! It's OK if it's a > little ugly. Better that than force hundreds of Python-builders to get > divorced from a decade-old directory naming scheme. Well the mv Python to PyCore was the simplest... but obviously the most painful.... 
The longer ugly fix is working but it's such a hack that I'd rather not show it off...I need to fix it so that it allow nice things such allowing the -with-suffix to be used...and then testing all the edge cases such as clobber, etc so that I don't break anything. :-) appreciating-your-note-after-attempting-to-understand-makefiles-on-Saturday-night' ly yours, - Dan --Apple-Mail-1687604877-3--
From tim.one@home.com Sun Jan 14 10:45:53 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 14 Jan 2001 05:45:53 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010114020857.E19782@thyrsus.com> Message-ID: [Tim] >> ...It's a bunch of hacks that appeal not because they solve >> a problem, but because they're cute algorithms that are pretty >> easy to implement and kinda solve part of a problem. [Eric] > Again, my experience says differently. I have actually *used* > Ratcliff-Obershelp to implement Do What I Mean (actually, Tell Me What > I Mean) -- and had it work very well for non-geek users. That's why I > want other Python programmers to have easy access to the capability. > ... > Where techniques like Ratcliff-Obershelp really shine (and what I > expect the module to be used for) is with controlled vocabularies > such as command interfaces. Yet the narrower the domain, the less call for a library with multiple approaches. If R-O really shone for you, why bother with anything else? Seriously. You haven't used some (most?) of these. The core isn't a place for research modules either (note that I have no objection whatsoever to writing any module you like -- the only question here is what belongs in the core, and any algorithm *nobody* here has experience with in your target domain is plainly a poor *core* candidate for that reason alone -- we have to maintain, justify and explain it for years to come). > I suspect your speech recognition experience has given you an > unhelpful bias. Try to think of it as a helpfully different perspective <0.5 wink>. It's in favor of measuring error rate by controlled experiments, skeptical of intuition, and dismissive of anecdotal evidence.
I may well agree you don't need all that heavy machinery if I had a clear definition of what problem it is you're trying to solve (I've learned it's not the kinds of problems *I* had in mind when I first read your description!). BTW, telephone speech recog requires controlled vocabularies because phone acoustics are too poor for the customary close-talking microphone approaches to work well enough. A std technique there is to build a "confusability matrix" of the words *in* the vocabulary, to spot trouble before it happens: if two words are acoustically confusable, it flags them and bounces that info back to the vocabulary designer. A similar approach should work well in your domain: if you get to define the cmd interface, run all the words in it pairwise through your similarity measure of choice, and dream up new words whenever a pair is "too close". That all but ensures that even a naive similarity algorithm will perform well (in telephone speech recog, the unconstrained error rate is up to 70% on cell phones; by constraining the vocabulary with the aid of confusability measures, we cut that to under 1%). > ... > (Actually, my gut after thinking about both algorithms hard is that > R/O is still a better technique than Levenshtein for the kind of > application I have in mind. But I also suspect the difference is > marginal.) So drop Levenshtein -- go with your best shot. Do note that they both (usually) consider a single transposition to be as much a mutation as two replacements (or an insert plus a delete -- "pure" Levenshtein treats those the same). What happens when the user doesn't enter an exact match? Does the kind of app you have in mind then just present them with a list of choices? 
If that's all (as opposed to, e.g., substituting its best guess for what the user actually typed and proceeding as if the user had given that from the start), then the evidence from studies says users are almost as pleased when the correct choice appears somewhere in the first three choices as when it appears as *the* top choice. A well-designed vocabulary can almost guarantee that happy result (note that most of the current research is aimed at the much harder job of getting the intended word into the #1 slot on the choice list). > (Other good uses for algorithms in this class include cladistics and > genomic analysis.) I believe you'll find current work in those fields has moved far beyond these simplest algorithms too, although they remain inspirational (for example, see "Protein Sequence Alignment and Database Scanning" at http://barton.ebi.ac.uk/papers/rev93_1/rev93_1.html Much as in typing, some mutations are more likely than others for *physical* reasons, so treating all pairs of symbols in the alphabet alike is too gross a simplification.). >> Even for most developers, it would be better to package up the >> single best approach you've got (f(list, word) -> list of possible >> matches sorted in confidence order), instead of a module with 6 >> (or so) functions they don't understand and a pile of equally >> mysterious knobs. > That's why good documentation, with motivating usage hints, is > important. I write good documentation, Tim. You're not going to find offense here even if you look for it, Eric : while only a small percentage of developers don't read docs at all, everyone else spaces out at least in linear proportion to the length of the docs. Most people will be looking for "a solution", not for "a toolkit". If the docs read like a toolkit, it doesn't matter how good they are, the bulk of the people you're trying to reach will pass on it. 
If you really want this to be *used*, supply one class that does *all* the work, including making the expert-level choices of which algorithm is used under the covers and how it's tuned. That's good advice. I still expect Guido won't want it in the core before wide use is a demonstrated fact, though (and no, that's not a chicken-vs-egg thing: "wide use" for a thing outside the core is narrower than "wide use" for a thing in the core). An exception would likely get made if he tried it and liked it a lot. But to get it under his radar, it's again much easier if the usage docs are no longer than a couple paragraphs. I'll attach a tiny program that uses ndiff's SequenceMatcher to guess which of the 147 std 2.0 top-level library modules a user may be thinking of (and best I can tell, these are the same results case-folding R/O would yield): Module name? random Hmm. My best guesses are random, whrandom, anydbm (BTW, the first choice was an exact match) Module name? disect Hmm. My best guesses are bisect, dis, UserDict Module name? password Hmm. My best guesses are keyword, getpass, asyncore Module name? chitchat Hmm. My best guesses are whichdb, stat, asynchat Module name? xml Hmm. My best guesses are xmllib, mhlib, xdrlib [So far so good] Module name? http Hmm. My best guesses are httplib, tty, stat [I was thinking of httplib, but note that it missed SimpleHTTPServer: a name that long just isn't going to score high when the input is that short] Module name? dictionary Hmm. My best guesses are Bastion, ConfigParser, tabnanny [darn, I *think* I was thinking of UserDict there] Module name? uuencode Hmm. My best guesses are code, codeop, codecs [Missed uu] Module name? parse Hmm. My best guesses are tzparse, urlparse, pre Module name? browser Hmm. My best guesses are webbrowser, robotparser, user Module name? brower Hmm. My best guesses are webbrowser, repr, reconvert Module name? Thread Hmm. My best guesses are threading, whrandom, sched Module name? pickle Hmm. 
My best guesses are pickle, profile, tempfile (BTW, the first choice was an exact match) Module name? shelf Hmm. My best guesses are shelve, shlex, sched Module name? katmandu Hmm. My best guesses are commands, random, anydbm [I really was thinking of "commands"!] Module name? temporary Hmm. My best guesses are tzparse, tempfile, fpformat So it gets what I was thinking of into the top 3 very often, and despite some wildly poor guesses at the correct spelling -- you'd *almost* think it was doing a keyword search, except the *unintended* choices on the list are so often insane . Something like that may be a nice addition to Paul/Ping's help facility someday too. Hard question: is that "good enough" for what you want? Checking against 147 things took no perceptible time, because SequenceMatcher is already optimized for "compare one thing against N", doing preprocessing work on the "one thing" that greatly speeds the N similarity computations (I suspect you're not -- yet). It's been tuned and tested in practice for years; it works for any sequence type with hashable elements (so Unicode strings are already covered); it works for long sequences too. And if R-O is the best trick we've got, I believe it already does it. Do we need more? Of course *I'm* not convinced we even need *it* in the core, but packaging a match-1-against-N class is just a few minutes' editing of what follows. 
something-to-play-with-anyway-ly y'rs - tim

NDIFFPATH = "/Python20/Tools/Scripts"
LIBPATH = "/Python20/Lib"

import sys, os
sys.path.append(NDIFFPATH)
from ndiff import SequenceMatcher

modules = {}  # map lowercase module stem to module name
for f in os.listdir(LIBPATH):
    if f.endswith(".py"):
        f = f[:-3]
        modules[f.lower()] = f

def match(fname, numchoices=3):
    lower = fname.lower()
    s = SequenceMatcher()
    s.set_seq2(lower)
    scores = []
    for lowermod, mod in modules.items():
        s.set_seq1(lowermod)
        scores.append((s.ratio(), mod))
    scores.sort()
    scores.reverse()
    return modules.has_key(lower), [x[1] for x in scores[:numchoices]]

while 1:
    name = raw_input("Module name? ")
    is_exact, choices = match(name)
    print "Hmm. My best guesses are", ", ".join(choices)
    if is_exact:
        print "(BTW, the first choice was an exact match)"

From esr@thyrsus.com Sun Jan 14 12:15:33 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Sun, 14 Jan 2001 07:15:33 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 05:45:53AM -0500
References: <20010114020857.E19782@thyrsus.com>
Message-ID: <20010114071533.A5812@thyrsus.com>

Tim Peters :
> Yet the narrower the domain, the less call for a library with multiple
> approaches. If R-O really shone for you, why bother with anything else?

Well, I was bothering with Levenshtein because *you* suggested it. :-)

I put in Hamming similarity and stemming because they're O(n) where R/O
is quadratic, and both widely used in situations where a fast sloppy job
is preferable to a good but slow one. My documentation page is explicit
about the tradeoff.

> Seriously. You haven't used some (most?) of these.

I've used stemming and R-O. Haven't used Hamming or Levenshtein.
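
[A rough Python 3 sketch of the two linear-time measures mentioned above, written from the definitions in the draft simil documentation quoted later in this thread. The pure-Python form and the handling of empty strings are assumptions for illustration; this is not Eric's C implementation.]

```python
def stem(a, b):
    """Length of the longest common prefix divided by the length of the longer string."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0  # assumption: two empty strings are treated as identical
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    return prefix / longer


def hamming(a, b):
    """Fraction of positions that agree; None if the lengths differ."""
    if len(a) != len(b):
        return None
    if not a:
        return 1.0  # assumption: two empty strings are treated as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)
```

Both functions make a single pass over the inputs, which is the O(n) behaviour Eric contrasts with the quadratic worst case of Ratcliff-Obershelp.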
> The core isn't a place > for research modules either (note that I have no objection whatsoever to > writing any module you like -- the only question here is what belongs in the > core, and any algorithm *nobody* here has experience with in your target > domain is plainly a poor *core* candidate for that reason alone -- we have > to maintain, justify and explain it for years to come). Fair point. I read it, in this context, as good advice to drop the Hamming entry point and forget about the Levenshtein implementation -- stick to what I've used and know is useful as opposed to what I think might be useful. > I may well agree you don't > need all that heavy machinery if I had a clear definition of what problem it > is you're trying to solve (I've learned it's not the kinds of problems *I* > had in mind when I first read your description!). I think you have it by now, judging by the following... > What happens when the user doesn't enter an exact match? Does the kind of > app you have in mind then just present them with a list of choices? Yes. I've used this technique a lot. It gives users not just guidance but warm fuzzy feelings -- they react as though there's a friendly homunculus inside the software looking out for them. Actually, in my experience, the less techie they are the more they like this. > If that's all (as opposed to, e.g., substituting its best guess for what the > user actually typed and proceeding as if the user had given that from the > start), then the evidence from studies says users are almost as pleased when > the correct choice appears somewhere in the first three choices as when it > appears as *the* top choice. Interesting. That does fit what I've seen. > A well-designed vocabulary can almost > guarantee that happy result (note that most of the current research is aimed > at the much harder job of getting the intended word into the #1 slot on the > choice list). Yes. 
One of my other tricks is to design command vocabularies so the first
three characters are close to unique. This means R/O will almost always
nail the right thing.

> Much as in typing, some mutations are more likely than others for *physical*
> reasons, so treating all pairs of symbols in the alphabet alike is too gross
> a simplification.).

Indeed. Couple weeks ago I was a speaker at a conference called "After
the Genome 6" at which one of the most interesting papers was given by a
lady mathematician who designs algorithms for DNA sequence matching. She
made exactly this point.

> > That's why good documentation, with motivating usage hints, is
> > important. I write good documentation, Tim.
>
> You're not going to find offense here even if you look for it, Eric :

No worries, I wasn't looking. :-)

> Most people will be looking for "a solution", not for "a toolkit". If the
> docs read like a toolkit, it doesn't matter how good they are, the bulk of
> the people you're trying to reach will pass on it. If you really want this
> to be *used*, supply one class that does *all* the work, including making
> the expert-level choices of which algorithm is used under the covers and how
> it's tuned. That's good advice.

I don't think that's possible in this case -- the proper domains for
stemming and R-O are too different. But maybe this is another nudge to
drop the Hamming code.

> But to get it under his radar, it's again much easier if the usage
> docs are no longer than a couple paragraphs.

How's this?

\section{\module{simil} -- String similarity metrics}

\declaremodule{standard}{simil}
\moduleauthor{Eric S. Raymond}{esr@thyrsus.com}
\modulesynopsis{String similarity metrics.}

\sectionauthor{Eric S. Raymond}

The \module{simil} module provides similarity functions for approximate word or string matching. One important application is for checking input words against a dictionary to match possible misspellings with the right terms in a controlled vocabulary.
The entry points provide different tradeoffs ranging from crude and fast (stemming) to effective but slow (Ratcliff-Obershelp gestalt subpattern matching). The latter is one of the standard techniques used in commercial OCR software. The \module{simil} module defines the following functions: \begin{funcdesc}{stem}{} Returns the length of the longest common prefix of two strings divided by the length of the longer. Similarity scores range from 0.0 (no common prefix) to 1.0 (identity). Running time is linear in string length. \end{funcdesc} \begin{funcdesc}{hamming}{} Computes a normalized Hamming similarity between two strings of equal length -- the number of pairwise matches in the strings, divided by their common length. It returns None if the strings are of unequal length. Similarity scores range from 0.0 (no positions equal) to 1.0 (identity). Running time is linear in string length. \end{funcdesc} \begin{funcdesc}{ratcliff}{} Returns a Ratcliff/Obershelp gestalt similarity score based on co-occurrence of subpatterns. Similarity scores range from 0.0 (no common subpatterns) to 1.0 (identity). Running time is best-case linear, worst-case quadratic in string length. \end{funcdesc} > Module name? http > Hmm. My best guesses are httplib, tty, stat > > [I was thinking of httplib, but note that it missed > SimpleHTTPServer: a name that long just isn't going to score > high when the input is that short] >>> simil.ratcliff("http", "httplib") 0.72727274894714355 >>> simil.ratcliff("http", "tty") 0.57142859697341919 >>> simil.ratcliff("http", "stat") 0.5 >>> simil.ratcliff("http", "simplehttpserver") 0.40000000596046448 So with the 0.6 threshold I normally use R-O does better at eliminating the false matches but doesn't catch SimpleHTTPServer (case is, I'm sure you'll agree, an irrelevant detail here). > Module name? dictionary > Hmm. 
My best guesses are Bastion, ConfigParser, tabnanny
>
> [darn, I *think* I was thinking of UserDict there]

>>> simil.ratcliff("dictionary", "bastion")
0.47058823704719543
>>> simil.ratcliff("dictionary", "configparser")
0.45454546809196472
>>> simil.ratcliff("dictionary", "tabnanny")
0.4444444477558136
>>> simil.ratcliff("dictionary", "userdict")
0.4444444477558136

R-O would have booted all of these. Highest score to configparser.
Interesting -- I'm beginning to think R-O overweights lots of small
subpattern matches relative to a few big ones, something I didn't notice
before because the statistics of my vocabularies masked it.

> Module name? uuencode
> Hmm. My best guesses are code, codeop, codecs

>>> simil.ratcliff("uuencode", "code")
0.66666668653488159
>>> simil.ratcliff("uuencode", "codeops")
0.53333336114883423
>>> simil.ratcliff("uuencode", "codecs")
0.57142859697341919
>>> simil.ratcliff("uuencode", "uu")
0.40000000596046448

R-O would pick "code" and boot the rest.

> [Missed uu]
>
> Module name? parse
> Hmm. My best guesses are tzparse, urlparse, pre

>>> simil.ratcliff("parse", "tzparse")
0.83333331346511841
>>> simil.ratcliff("parse", "urlparse")
0.76923078298568726
>>> simil.ratcliff("parse", "pre")
0.75

Same result.

> Module name? browser
> Hmm. My best guesses are webbrowser, robotparser, user

>>> simil.ratcliff("browser", "webbrowser")
0.82352942228317261
>>> simil.ratcliff("browser", "robotparser")
0.55555558204650879
>>> simil.ratcliff("browser", "user")
0.54545456171035767

Big win for R-O. Picks the right one, boots the wrong two.

> Module name? brower
> Hmm. My best guesses are webbrowser, repr, reconvert

>>> simil.ratcliff("brower", "webbrowser")
0.75
>>> simil.ratcliff("brower", "repr")
0.60000002384185791
>>> simil.ratcliff("brower", "reconvert")
0.53333336114883423

Small win for R/O -- boots reconvert, and repr squeaks in under the wire.

> Module name? Thread
> Hmm.
My best guesses are threading, whrandom, sched

>>> simil.ratcliff("thread", "threading")
0.80000001192092896
>>> simil.ratcliff("thread", "whrandom")
0.57142859697341919
>>> simil.ratcliff("thread", "sched")
0.54545456171035767

Big win for R-O.

> Module name? pickle
> Hmm. My best guesses are pickle, profile, tempfile

>>> simil.ratcliff("pickle", "pickle")
1.0
>>> simil.ratcliff("pickle", "profile")
0.61538463830947876
>>> simil.ratcliff("pickle", "tempfile")
0.57142859697341919

R-O wins again.

> (BTW, the first choice was an exact match)
> Module name? shelf
> Hmm. My best guesses are shelve, shlex, sched

>>> simil.ratcliff("shelf", "shelve")
0.72727274894714355
>>> simil.ratcliff("shelf", "shlex")
0.60000002384185791
>>> simil.ratcliff("shelf", "sched")
0.60000002384185791

Interesting. Shelve scores highest, both the others squeak in.

> Module name? katmandu
> Hmm. My best guesses are commands, random, anydbm
>
> [I really was thinking of "commands"!]

>>> simil.ratcliff("commands", "commands")
1.0
>>> simil.ratcliff("commands", "random")
0.4285714328289032
>>> simil.ratcliff("commands", "anydbm")
0.4285714328289032

R-O wins big.

> Module name? temporary
> Hmm. My best guesses are tzparse, tempfile, fpformat

>>> simil.ratcliff("temporary", "tzparse")
0.5
>>> simil.ratcliff("temporary", "tempfile")
0.47058823704719543
>>> simil.ratcliff("temporary", "fpformat")
0.47058823704719543

R-O boots all of these.

> Hard question: is that "good enough" for what you want?

Um...notice that R-O filtering, even though it seems to be underweighting
large matches, did a rather better job on your examples! With a 0.66
threshold it would have done *much* better.

I think you've just made an argument for replacing your SequenceMatcher
with simil.ratcliff. Mine's even documented. :-).
--
Eric S. Raymond

Militias, when properly formed, are in fact the people themselves and include all men capable of bearing arms. [...]
To preserve liberty it is essential that the whole body of the people always possess arms and be taught alike, especially when young, how to use them. -- Senator Richard Henry Lee, 1788, on "militia" in the 2nd Amendment From ping@lfw.org Sun Jan 14 12:38:42 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 04:38:42 -0800 (PST) Subject: [Python-Dev] Why both r'' and R'', u'' and U''? Message-ID: Sorry i'm being forgetful -- could someone please refresh my memory: Was there a good reason for allowing both lowercase and capital 'r' as a prefix for raw-strings? I assume that the availability of both r'' and R'' is what led to having both u'' and U''. Is there any good reason for that either? This just seems to lead to ambiguity and unneeded complexity: more cases in tokenize.py, more cases in tokenize.c, more work for IDLE, more annoying when searching for u' in your editor. (I was about to fix the lack of u'' support in tokenize.py and that made me think about this.) What happened to TOOWTDI? Would you believe we now have 36 different ways of starting a string: ' " ''' """ r' r" r''' r""" u' u" u''' u""" ur' ur" ur''' ur""" R' R" R''' R""" U' U" U''' U""" uR' uR" uR''' uR""" Ur' Ur" Ur''' Ur""" UR' UR" UR''' UR""" Would it be outrageous to suggest deprecating the last five rows? -- ?!ng [1] We started with 4. Perl has (by my count) 381 ways of starting a string literal, so we're halfway there, logarithmically speaking. Perl has 757 if you count the fancier operators qx, qw, s, and tr. From mal@lemburg.com Sun Jan 14 13:33:29 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 14 Jan 2001 14:33:29 +0100 Subject: [Python-Dev] Why is soundex marked obsolete? 
References: Message-ID: <3A61AAA9.F6F1EA9F@lemburg.com> [Lots of talk about interesting algorithms for "human" pattern matching] I just want to add my 2 cents to the discussion: * Eric's package seems very useful for pattern matching, but that is a very specific domain -- not main stream * I would opt to create a neat distutils style package for it for people to install at their own liking (I would certainly like it :) * If wrapped up as a separate package, I'd suggest to add all known algorithms to the package and also make it Unicode aware. There are similar package for e.g. RNGs on Parnassus. BTW, are there less English centric "sounds alike" matchers around ? The NIST soundex algorithm as published on the internet: http://physics.nist.gov/cuu/Reference/soundex.html works fine for English texts, but other languages of course have different letter coding requirements (or even different alphabets). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sun Jan 14 13:53:03 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 14 Jan 2001 14:53:03 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? References: Message-ID: <3A61AF3F.EE6DAB88@lemburg.com> Ka-Ping Yee wrote: > > Sorry i'm being forgetful -- could someone please refresh my memory: > > Was there a good reason for allowing both lowercase and capital 'r' > as a prefix for raw-strings? I assume that the availability of both > r'' and R'' is what led to having both u'' and U''. Right. > Is there any > good reason for that either? No idea... I have never used anything other than the lowercase versions. > This just seems to lead to ambiguity and unneeded complexity: > more cases in tokenize.py, more cases in tokenize.c, more work > for IDLE, more annoying when searching for u' in your editor. 
> (I was about to fix the lack of u'' support in tokenize.py and > that made me think about this.) > > What happened to TOOWTDI? > > Would you believe we now have 36 different ways of starting a string: > > ' " ''' """ > r' r" r''' r""" > u' u" u''' u""" > ur' ur" ur''' ur""" > R' R" R''' R""" > U' U" U''' U""" > uR' uR" uR''' uR""" > Ur' Ur" Ur''' Ur""" > UR' UR" UR''' UR""" > > Would it be outrageous to suggest deprecating the last five rows? No. + 1 on the idea. > -- ?!ng > > [1] We started with 4. Perl has (by my count) 381 ways of starting > a string literal, so we're halfway there, logarithmically speaking. > Perl has 757 if you count the fancier operators qx, qw, s, and tr. > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas@xs4all.net Sun Jan 14 14:24:08 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 14 Jan 2001 15:24:08 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: ; from ping@lfw.org on Sun, Jan 14, 2001 at 04:38:42AM -0800 References: Message-ID: <20010114152408.G1005@xs4all.nl> On Sun, Jan 14, 2001 at 04:38:42AM -0800, Ka-Ping Yee wrote: > [1] We started with 4. Perl has (by my count) 381 ways of starting > a string literal, so we're halfway there, logarithmically speaking. > Perl has 757 if you count the fancier operators qx, qw, s, and tr. Don't forget 'qr//', which is quite like a raw string, except that Perl uses it to 'precompile' regular expressions as a side effect. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
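
[The count of 36 in Ka-Ping's table can be checked mechanically: an optional u/U, an optional r/R, and four quote styles. A small Python 3 sketch; the variable names are illustrative only, and the prefixes are treated as plain text rather than tested against any particular Python version's tokenizer.]

```python
from itertools import product

unicode_marks = ["", "u", "U"]      # no prefix, or either case of u
raw_marks = ["", "r", "R"]          # no prefix, or either case of r
quotes = ["'", '"', "'''", '"""']   # the four quote styles

# Every way of starting a string literal listed in the table above.
all_openers = [u + r + q for u, r, q in product(unicode_marks, raw_marks, quotes)]

# What would remain if the five rows containing a capital letter were dropped.
lowercase_openers = [s for s in all_openers if not any(c.isupper() for c in s)]

print(len(all_openers), len(lowercase_openers))
```

Dropping the capitalised rows leaves 4 prefixes times 4 quote styles, which is the 16 forms in the first four rows of the table.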
From guido@python.org Sun Jan 14 17:08:28 2001 From: guido@python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 12:08:28 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Your message of "Sun, 14 Jan 2001 14:53:03 +0100." <3A61AF3F.EE6DAB88@lemburg.com> References: <3A61AF3F.EE6DAB88@lemburg.com> Message-ID: <200101141708.MAA11161@cj20424-a.reston1.va.home.com> > Ka-Ping Yee wrote: > > > > Sorry i'm being forgetful -- could someone please refresh my memory: > > > > Was there a good reason for allowing both lowercase and capital 'r' > > as a prefix for raw-strings? I assume that the availability of both > > r'' and R'' is what led to having both u'' and U''. > > Right. > > > Is there any > > good reason for that either? > > No idea... I have never used anything other than the lowercase > versions. It comes from the numeric literals. C allows 0x0 and 0X0, and 0L as well as 0l. So does Python (and also 0j == 0J). > > This just seems to lead to ambiguity and unneeded complexity: > > more cases in tokenize.py, more cases in tokenize.c, more work > > for IDLE, more annoying when searching for u' in your editor. > > (I was about to fix the lack of u'' support in tokenize.py and > > that made me think about this.) > > > > What happened to TOOWTDI? > > > > Would you believe we now have 36 different ways of starting a string: > > > > ' " ''' """ > > r' r" r''' r""" > > u' u" u''' u""" > > ur' ur" ur''' ur""" > > R' R" R''' R""" > > U' U" U''' U""" > > uR' uR" uR''' uR""" > > Ur' Ur" Ur''' Ur""" > > UR' UR" UR''' UR""" > > > > Would it be outrageous to suggest deprecating the last five rows? > > No. + 1 on the idea. Why bother? All that does is outdate a bunch of documentation. I don't see the extra effort in various parsers as a big deal. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@effbot.org Sun Jan 14 17:53:32 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Sun, 14 Jan 2001 18:53:32 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? Message-ID: <010f01c07e52$e9801fc0$e46940d5@hagrid> The name database portions of SF task 17335 ("add compressed unicode database") were postponed to 2.1. My current patch replaces the ~450k large ucnhash module with a new ~160k large module. (See earlier posts for more info on how the new database works). Should I check it in? From skip@mojam.com (Skip Montanaro) Sun Jan 14 17:51:52 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sun, 14 Jan 2001 11:51:52 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core Message-ID: <14945.59192.400783.403810@beluga.mojam.com> Ping's pydoc is awesome! Move it out of the sandbox and put it in the standard distribution. Biggest hook for me: 1. execute "pydoc -p 3200" 2. visit "http://localhost:3200/" 3. knock yourself out Skip From martin@mira.cs.tu-berlin.de Sun Jan 14 17:57:57 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 14 Jan 2001 18:57:57 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? Message-ID: <200101141757.f0EHvvt01407@mira.informatik.hu-berlin.de> > > Would it be outrageous to suggest deprecating the last five rows? > Why bother? All that does is outdate a bunch of documentation. He suggested to deprecate it, not to remove it. By the time it is removed, the documentation still mentioning it should be outdated for other reasons (e.g. the string module might have disappeared). In general, the rationale for deprecating things would be that the simplification will make everybody's life easier in the long run. In the case of a small change (such as this one), that advantage would be small. 
OTOH, the hassle for users that rely on the then-removed feature will be also small; I see it as quite unlikely that anybody uses that feature actively (although I do think that people use 0X10 and 100L; the latter is common since 100l is oft confused with 1001). Regards, Martin From tim.one@home.com Sun Jan 14 19:00:21 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 14 Jan 2001 14:00:21 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010114071533.A5812@thyrsus.com> Message-ID: Very quick (swamped): > I think you've just made an argument for replacing your > SequenceMatcher with simil.ratcliff. Actually, I'm certain they're the same algorithm now, except the C is showing through in ratcliff to the floating-point eye . For demonstration, I *always* printed the top three scorers (that's logic in the little driver I posted, not in SequenceMatcher), without any notion of cutoff (ndiff does use a cutoff). Add this line before the return (in the posted driver) to see the actual scores: print scores[:numchoices] For example: Module name? browser [(0.82352941176470584, 'webbrowser'), (0.55555555555555558, 'robotparser'), (0.54545454545454541, 'user')] Hmm. My best guesses are webbrowser, robotparser, user Module name? On this example you reported: >>> simil.ratcliff("browser", "webbrowser") 0.82352942228317261 >>> simil.ratcliff("browser", "robotparser") 0.55555558204650879 >>> simil.ratcliff("browser", "user") 0.54545456171035767 which strongly suggests you're using C floats instead of Python floats to compute the final score. I didn't try every example in your email, but it's the same story on the three I did try (scores identical modulo simil.ratcliff dropping about 30 of the low-order result bits -- which is about the difference between a C double and a C float on most boxes). > Mine's even documented. :-). Which I appreciate! 
I dreamt up the SequenceMatcher algorithm going on 20 years ago for a friendly diff generator, and never even considered using it for other purposes. But then I may have mentioned that these other purposes never come up in my apps . or-at-least-they-haven't-in-contexts-where-r/o-would-have-been- strong-enough-ly y'rs - tim From bckfnn@worldonline.dk Sun Jan 14 19:00:33 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Sun, 14 Jan 2001 19:00:33 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <010f01c07e52$e9801fc0$e46940d5@hagrid> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: <3a61f12a.36601630@smtp.worldonline.dk> On Sun, 14 Jan 2001 18:53:32 +0100, you wrote: >The name database portions of SF task 17335 ("add >compressed unicode database") were postponed to >2.1. > >My current patch replaces the ~450k large ucnhash >module with a new ~160k large module. (See earlier >posts for more info on how the new database works). Do you have a link or an approx date of this earlier posts? I must have missed it. The patch on sourceforge seems a bit empty: https://sourceforge.net/patch/index.php?func=detailpatch&patch_id=100899&group_id=5470 As a result I invented my own compression format for the ucnhash for jython. I managed to achive ~100k but that probably have different performance properties. regards, finn From esr@thyrsus.com Sun Jan 14 19:09:01 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 14:09:01 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 02:00:21PM -0500 References: <20010114071533.A5812@thyrsus.com> Message-ID: <20010114140901.A6431@thyrsus.com> Tim Peters : > > I think you've just made an argument for replacing your > > SequenceMatcher with simil.ratcliff. > > Actually, I'm certain they're the same algorithm now, except the C is > showing through in ratcliff to the floating-point eye . 
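
[The float-versus-double effect Tim describes can be reproduced without any C, by rounding a Python float through a 4-byte C float with the standard struct module. A Python 3 sketch, assuming C float is IEEE 754 single precision on the platform; the helper name is made up for illustration.]

```python
import struct


def as_c_float(x):
    """Round a Python float (a C double) to the nearest C float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# "browser" vs "webbrowser": 7 matched characters, lengths 7 and 10.
score = 2.0 * 7 / (7 + 10)

print(repr(score))              # full double precision, as in Tim's output
print(repr(as_c_float(score)))  # what survives a trip through a C float
```

The two printed values agree to about seven significant digits and then diverge, which is the pattern visible when the SequenceMatcher and simil.ratcliff scores for the same pair are put side by side.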
Take a look:

/*****************************************************************************
 *
 * Ratcliff-Obershelp common-subpattern similarity.
 *
 * This code first appeared in a letter to the editor in Doctor
 * Dobbs's Journal, 11/1988. The original article on the algorithm,
 * "Pattern Matching by Gestalt" by John Ratcliff, had appeared in the
 * July 1988 issue (#181) but the algorithm was presented in assembly.
 * The main drawback of the Ratcliff-Obershelp algorithm is the cost
 * of the pairwise comparisons. It is significantly more expensive
 * than stemming, Hamming distance, soundex, and the like.
 *
 * Running time quadratic in the data size, memory usage constant.
 *
 *****************************************************************************/

static int RatcliffObershelp(char *st1, char *end1, char *st2, char *end2)
{
    register char *a1, *a2;
    char *b1, *b2;
    char *s1 = st1, *s2 = st2;  /* initializations are just to pacify GCC */
    short max, i;

    if (end1 <= st1 || end2 <= st2)
        return(0);
    if (end1 == st1 + 1 && end2 == st2 + 1)
        return(0);

    max = 0;
    b1 = end1;
    b2 = end2;

    for (a1 = st1; a1 < b1; a1++)
    {
        for (a2 = st2; a2 < b2; a2++)
        {
            if (*a1 == *a2)
            {
                /* determine length of common substring */
                for (i = 1; a1[i] && (a1[i] == a2[i]); i++)
                    continue;
                if (i > max)
                {
                    max = i;
                    s1 = a1;
                    s2 = a2;
                    b1 = end1 - max;
                    b2 = end2 - max;
                }
            }
        }
    }
    if (!max)
        return(0);
    max += RatcliffObershelp(s1 + max, end1, s2 + max, end2);  /* rhs */
    max += RatcliffObershelp(st1, s1, st2, s2);                /* lhs */
    return max;
}

static float ratcliff(char *s1, char *s2)
/* compute Ratcliff-Obershelp similarity of two strings */
{
    short l1, l2;

    l1 = strlen(s1);
    l2 = strlen(s2);

    /* exact match end-case */
    if (l1 == 1 && l2 == 1 && *s1 == *s2)
        return(1.0);

    return 2.0 * RatcliffObershelp(s1, s1 + l1, s2, s2 + l2) / (l1 + l2);
}

static PyObject *
simil_ratcliff(PyObject *self, PyObject *args)
{
    char *str1, *str2;

    if(!PyArg_ParseTuple(args, "ss:ratcliff", &str1, &str2))
        return NULL;

    return Py_BuildValue("f", ratcliff(str1, str2));
}
--
Eric S. Raymond

"Taking my gun away because I might shoot someone is like cutting my tongue out because I might yell `Fire!' in a crowded theater."
	-- Peter Venetoklis

From fredrik@effbot.org Sun Jan 14 19:31:06 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Sun, 14 Jan 2001 20:31:06 +0100
Subject: [Python-Dev] 2.1 alpha: what about the unicode name database?
References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3a61f12a.36601630@smtp.worldonline.dk>
Message-ID: <040e01c07e60$8c74d100$e46940d5@hagrid>

finn wrote:
> As a result I invented my own compression format for the ucnhash for
> jython. I managed to achive ~100k but that probably have different
> performance properties.

here's the description:

---

From: "Fredrik Lundh"
Date: Sun, 16 Jul 2000 20:40:46 +0200

/.../

The unicodenames database consists of two parts: a name database which
maps character codes to names, and a code database, mapping names to
codes.

* The Name Database (getname)

First, the 10538 text strings are split into 42193 words, and combined
into a 4949-word lexicon (a 29k array).

Each word is given a unique index number (common words get lower
numbers), and there's a "lexicon offset" table mapping from numbers to
words (10k).

To get back to the original text strings, I use a "phrase book". For
each original string, the phrase book stores a list of word numbers.
Numbers 0-127 are stored in one byte, higher numbers (less common words)
use two bytes. At this time, about 65% of the words can be represented
by a single byte. The result is a 56k array.

The final data structure is an offset table, which maps code points to
phrase book offsets. Instead of using one big table, I split each code
point into a "page number" and a "line number" on that page.

offset = line[ (page[code>>SHIFT]<

From tim.one@home.com Sun Jan 14 19:46:44 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 14 Jan 2001 14:46:44 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
In-Reply-To: <3A61AAA9.F6F1EA9F@lemburg.com> Message-ID: [M.-A. Lemburg] > BTW, are there less English centric "sounds alike" matchers > around ? Yes, but if anything there are far too many of them: like Soundex, they're just heuristics, and *everybody* who cares adds their own unique twists, while proper studies are almost non-existent. Few variants appear to be in use much beyond their inventor's friends; one notable exception in the Jewish community is the Daitch-Mokotoff variation, originally tailored to their unique needs but later generalized; a brief description here: http://www.avotaynu.com/soundex.html The similarly involved NYSIIS algorithm (New York State Identification Intelligence System -- look for NYSIIS on Parnassus) was the winner from a field of about two dozen competing algorithms, after measuring their effectiveness on assorted databases maintained by the state of New York. Since New York has a large immigrant population, NYSIIS isn't as Anglocentric as Soundex either. But state-of-the-art has given up on purely computational algorithms for these purposes: proper names are simply too much a mess. For example, if I search for "Richard", it *ought* to match on "Dick"; if my Arab buddy searches on "Mohammed", it *ought* to match on "Mhd"; "the rules" people actually use just aren't reducible to pure computation -- it takes a large knowledge base to capture what people "just know". You may enjoy visiting this commercial site (AFAIK, nobody is giving away state-of-the-art for free): http://www.las-inc.com/ > ... > http://physics.nist.gov/cuu/Reference/soundex.html > > works fine for English texts, If that were true, the English-speaking researchers would have declared victory 120 years ago . But English pronunciation is *notoriously* difficult to predict from spelling, partly because English is the Perl of human languages. 
or-maybe-the-borg-assuming-there's-a-difference-ly y'rs - tim

From esr@thyrsus.com Sun Jan 14 20:17:53 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Sun, 14 Jan 2001 15:17:53 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 02:46:44PM -0500
References: <3A61AAA9.F6F1EA9F@lemburg.com>
Message-ID: <20010114151753.A6671@thyrsus.com>

Tim Peters :
> If that were true, the English-speaking researchers would have declared
> victory 120 years ago . But English pronunciation is *notoriously*
> difficult to predict from spelling, partly because English is the Perl of
> human languages.

Actually, according to the Oxford Encyclopedia of Linguistics, this is
an urban myth. The orthography of English is, in fact, quite consistent;
it looks much more wacked out than it is because the maddening
irregularities are concentrated in the 400 most commonly used words.

The situation is much like that with French verb forms -- most French
verbs have a very regular inflection pattern, but the twenty or so
exceptions are the most commonly used ones.

In fact it's a general rule in language evolution that irregularities are
preserved in common forms and not rare ones -- in the rare ones they get
forgotten.

American personal names are a problem precisely because they sometimes
do *not* have English orthography.
--
Eric S. Raymond

"...quemadmodum gladius neminem occidit, occidentis telum est."
[...a sword never kills anybody; it's a tool in the killer's hand.]
	-- (Lucius Annaeus) Seneca "the Younger" (ca. 4 BC-65 AD),

From tim.one@home.com Sun Jan 14 20:31:06 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 14 Jan 2001 15:31:06 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
In-Reply-To: <20010114140901.A6431@thyrsus.com>
Message-ID:

[Tim]
> Actually, I'm certain they're the same algorithm now, except the C is
> showing through in ratcliff to the floating-point eye .
[Eric]
> Take a look:

Yup, same thing, except:

> static float ratcliff(char *s1, char *s2)

accounts for the numeric differences (change "float"->"double" and they'd be the same; Python has to convert it to a double anyway, lacking any internal support for C's floats; and the C code is *computing* in double regardless, cutting it back to a float upon return just because of the "float" decl).

The code in SequenceMatcher doesn't *look* anything like it, though, due to years of dreaming up faster ways to do this (in its original role as a diff generator, it routinely had to deal with sequences containing 10s of thousands of elements, and code very much like the code you posted was just too slow for that).

One simple trick that can enormously speed the worst cases: the "find the longest match starting here" innermost loop is guarded by

> if (*a1 == *a2)

However, it can't possibly find a *bigger* max unless it's also the case that

    a1[max] == a2[max]

That's usually false in real life, so by adding that test to the guard you usually get to skip the innermost loop entirely. Probably more important in a diff-generator role, though.

SequenceMatcher's prime trick is to preprocess one of the strings, in linear time building up a hash table mapping each character in the string to a list of the indices at which it appears. Then the second-innermost loop is saved from needing to do any search: when we get to, e.g., 'x' in the other string, the precomputed hash table tells us directly where to find all the x's in the original string. And in the match-1-against-N case, this hash table can be computed once & reused N times. That's a monster win.

However, I never had the patience to code that in C, so I never *did* that before I reimplemented my stuff in Python. Now the Python ndiff runs circles around the old Pascal and C versions.
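The two tricks described above (the extra guard and the precomputed index) can be sketched roughly as follows. This is not the actual SequenceMatcher code from ndiff; `build_index` and `longest_match` are hypothetical names, and the loops are deliberately naive so that the effect of the guard and of the index stays visible.

```python
from collections import defaultdict


def build_index(b):
    """Map each element of b to the list of positions where it occurs.

    Built once in linear time; it can be reused for every string
    that is later matched against b.
    """
    index = defaultdict(list)
    for j, ch in enumerate(b):
        index[ch].append(j)
    return index


def longest_match(a, b, index):
    """Return (i, j, size) with a[i:i+size] == b[j:j+size] and size maximal."""
    best_i = best_j = best_size = 0
    for i, ch in enumerate(a):
        # The index tells us directly where ch occurs in b: no searching.
        for j in index.get(ch, ()):
            # Guard: a longer match than the current best must also agree
            # at offset best_size, so skip the inner loop when it doesn't.
            if (i + best_size >= len(a) or j + best_size >= len(b)
                    or a[i + best_size] != b[j + best_size]):
                continue
            size = 0
            while (i + size < len(a) and j + size < len(b)
                   and a[i + size] == b[j + size]):
                size += 1
            if size > best_size:
                best_i, best_j, best_size = i, j, size
    return best_i, best_j, best_size


if __name__ == "__main__":
    target = "xabcd"
    index = build_index(target)          # computed once ...
    for candidate in ("abxcd", "zzz"):   # ... reused for each candidate
        print(candidate, longest_match(candidate, target, index))
```

For a short 1-against-1 comparison the index buys little; the saving shows up when one string is matched against many others and the index is built only once.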
I'm sure that has nothing to do with machines having gotten 100x faster in the meantime > for-short-1-against-1-matches-yours-will-certainly-be-quicker-ly y'rs - tim From guido@python.org Sun Jan 14 20:55:21 2001 From: guido@python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 15:55:21 -0500 Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: Your message of "Sun, 14 Jan 2001 11:51:52 CST." <14945.59192.400783.403810@beluga.mojam.com> References: <14945.59192.400783.403810@beluga.mojam.com> Message-ID: <200101142055.PAA13041@cj20424-a.reston1.va.home.com> > Ping's pydoc is awesome! Move it out of the sandbox and put it in the > standard distribution. > > Biggest hook for me: > > 1. execute "pydoc -p 3200" > 2. visit "http://localhost:3200/" > 3. knock yourself out Yes, wow! Now, if we could somehow get this to show both the docs that Fred maintains and the stuff that Ping extracts from the source code, that would be even better! (I think that Ping's stuff should also run on the python.org site, by the way.) --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Sun Jan 14 20:59:28 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 15:59:28 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 03:31:06PM -0500 References: <20010114140901.A6431@thyrsus.com> Message-ID: <20010114155928.A6793@thyrsus.com> Tim Peters : > [Tim] > > Actually, I'm certain they're the same algorithm now, except the C is > > showing through in ratcliff to the floating-point eye . 
> > [Eric] > > Take a look: > > Yup, same thing, except: > > > static float ratcliff(char *s1, char *s2) > > accounts for the numeric differences (change "float"->"double" and they'd be > the same; Python has to convert it to a double anyway, lacking any internal > support for C's floats; and the C code is *computing* in double regardless, > cutting it back to a float upon return just because of the "float" decl). OK, so the right answer is to make your version visible and documented in the library. -- Eric S. Raymond No one is bound to obey an unconstitutional law and no courts are bound to enforce it. -- 16 Am. Jur. Sec. 177 late 2d, Sec 256 From tim.one@home.com Sun Jan 14 21:01:19 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 14 Jan 2001 16:01:19 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Message-ID: [?!ng] > [1] We started with 4. Na, *we* started with two, just ' and ". And at the time, I thought that was arguably one too many already . Allowing the modifiers to be case-insensitive seems to me much more Pythonic than the original sin of making ' and " mean the same thing. OTOH, if only " had been allowed at the start, we'd probably spell raw strings with ' today, and that doesn't really scream that they're so very different from " strings. leaving-this-one-be-ly y'rs - tim From barry@digicool.com Sun Jan 14 21:02:07 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Sun, 14 Jan 2001 16:02:07 -0500 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> Message-ID: <14946.5071.92879.789400@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> Ping's pydoc is awesome! Move it out of the sandbox and put SM> it in the standard distribution. SM> Biggest hook for me: | 1. execute "pydoc -p 3200" | 2. visit "http://localhost:3200/" | 3. knock yourself out Whoa. Awesome. 
From ping@lfw.org Sun Jan 14 21:01:45 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:01:45 -0800 (PST) Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: <200101141708.MAA11161@cj20424-a.reston1.va.home.com> Message-ID: On Sun, 14 Jan 2001, Guido van Rossum wrote: > > It comes from the numeric literals. C allows 0x0 and 0X0, and 0L as > well as 0l. So does Python (and also 0j == 0J). I just did a little test. Neither Python, Perl, nor Tcl support "\X66", only "\x66". Perl doesn't support 0X1234, only 0x1234. Tcl's "expr" routine does support 0X1234. Javascript supports 0X1234, but not "\X66". I'd bet that no one really relies on or expects the uppercase forms except L. -- ?!ng From ping@lfw.org Sun Jan 14 21:14:34 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:14:34 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <14942.29609.19618.534613@cj42289-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Fred L. Drake, Jr. wrote: > Ka-Ping Yee writes: > > My next two targets are: > > 1. Generating text from the HTML documentation files > > using Paul Prescod's stuff in onlinehelp.py. > > You mean the ones I publish as the standard documentation? Relying > on the structure of that HTML is pure folly! Paul's onlinehelp.py is using the HTMLParser and AbstractFormatter to turn HTML into text. It also contains paths to specific files, e.g. help('assert') looks for "ref/assert.html". Are you okay with this technique? Have you tried onlinehelp.py? I was planning to do the same to provide help on the language in pydoc. 
-- ?!ng From skip@mojam.com (Skip Montanaro) Sun Jan 14 21:26:48 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sun, 14 Jan 2001 15:26:48 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <200101142055.PAA13041@cj20424-a.reston1.va.home.com> References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> Message-ID: <14946.6552.542015.620760@beluga.mojam.com> Guido> Now, if we could somehow get this to show both the docs that Fred Guido> maintains and the stuff that Ping extracts from the source code, Guido> that would be even better! I had exactly the same thought. I suspect that if the install target were modified to install the html-ized sections of the lib reference manual pydoc could grovel around in sys and find the root of the library reference manual pretty easily. If not, it could simply redirect to the relevant section of http://www.python.org/doc/current/lib/. Skip From tim.one@home.com Sun Jan 14 21:45:48 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 14 Jan 2001 16:45:48 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Message-ID: [?!ng] > ... > I'd bet that no one really relies on or expects the uppercase > forms except L. And 0X. I don't think it's in the std library, but I've certainly seen Python code do stuff like magic = 0XFEEDFACE Plus it's always good for a language to be able parse the stuff it prints, and "0X..." is generated by Python's %#X format code. Don't believe I've ever seen the "u" or "r" string modifiers in uppercase, though, but really don't see the harm in allowing that. From ping@lfw.org Sun Jan 14 21:50:43 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:50:43 -0800 (PST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <14946.5071.92879.789400@anthem.wooz.org> Message-ID: On Sun, 14 Jan 2001, Barry A. Warsaw wrote: > Whoa. Awesome. Thanks! 
Two things added recently: constants (any numbers, lists, tuples, strings, or types) in modules are shown; and packages are listed in the index as they should be. -- ?!ng From bckfnn@worldonline.dk Sun Jan 14 22:20:51 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Sun, 14 Jan 2001 22:20:51 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <040e01c07e60$8c74d100$e46940d5@hagrid> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3a61f12a.36601630@smtp.worldonline.dk> <040e01c07e60$8c74d100$e46940d5@hagrid> Message-ID: <3a622615.50148579@smtp.worldonline.dk> [/F] >here's the description: Thanks. >From: "Fredrik Lundh" >Date: Sun, 16 Jul 2000 20:40:46 +0200 > >/.../ > > The unicodenames database consists of two parts: a name > database which maps character codes to names, and a code > database, mapping names to codes. > >* The Name Database (getname) > > First, the 10538 text strings are split into 42193 words, > and combined into a 4949-word lexicon (a 29k array). I only added a word to the lexicon if it was used more than once and if the length was larger then the lexicon index. I ended up with 1385 entries in the lexicon. (a 7k array) > Each word is given a unique index number (common words get > lower numbers), and there's a "lexicon offset" table mapping > from numbers to words (10k). My lexicon offset table is 3k and I also use 4k on a perfect hash of the words. > To get back to the original text strings, I use a "phrase > book". For each original string, the phrase book stores a a > list of word numbers. Numbers 0-127 are stored in one byte, > higher numbers (less common words) use two bytes. At this > time, about 65% of the words can be represented by a single > byte. The result is a 56k array. Because not all words are looked up in the lexicon, I used the values 0-38 for the letters and number, 39-250 are used for one byte lexicon index, and 251-255 are combined with following byte to form a two byte. 
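The one-byte/two-byte phrase book encoding from the quoted description can be sketched like this. Only the split "0-127 in one byte, higher numbers in two bytes" comes from the description; the exact two-byte layout used here (high bit set on the first byte, remaining 15 bits holding the word number) is an assumption, and `encode_phrase` / `decode_phrase` are hypothetical names, not functions from either implementation.

```python
def encode_phrase(word_numbers):
    """Pack lexicon word numbers into bytes: 0-127 in one byte, others in two."""
    out = bytearray()
    for n in word_numbers:
        if 0 <= n < 128:
            out.append(n)                  # common word: single byte
        elif n < (1 << 15):
            out.append(0x80 | (n >> 8))    # assumed layout: flag bit + high bits
            out.append(n & 0xFF)           # low eight bits
        else:
            raise ValueError("word number out of range: %r" % (n,))
    return bytes(out)


def decode_phrase(data):
    """Inverse of encode_phrase: recover the list of word numbers."""
    numbers = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 128:
            numbers.append(first)
            i += 1
        else:
            numbers.append(((first & 0x7F) << 8) | data[i + 1])
            i += 2
    return numbers


if __name__ == "__main__":
    phrase = [5, 127, 128, 4948]
    packed = encode_phrase(phrase)
    print(len(packed), decode_phrase(packed) == phrase)
```

The more often the low word numbers occur, the closer the phrase book gets to one byte per word, which is why the common words are given the low numbers.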
This also results in a 57k array.

So far it is only minor variations.

> The final data structure is an offset table, which maps code
> points to phrase book offsets. Instead of using one big
> table, I split each code point into a "page number" and a
> "line number" on that page.
>
>     offset = line[ (page[code>>SHIFT]<<SHIFT) + (code&MASK) ]
>
> Since the unicode space is sparsely populated, it's possible
> to split the code so that lots of pages gets no contents. I
> use a brute force search to find the optimal SHIFT value.
>
> In the current database, the page table has 1024 entries
> (SHIFT is 6), and there are 199 unique pages in the line
> table. The total size of the offset table is 26k.
>
>* The code database (getcode)
>
> For the code table, I use a straight-forward hash table to store
> name to code mappings. It's basically the same implementation
> as in Python's dictionary type, but a different hash algorithm.
> The table lookup loop simply uses the name database to check
> for hits.
>
> In the current database, the hash table is 32k.

I chose to split a unicode name into words even when looking up a unicode name. Each word is hashed to a lexicon index and a "phrase book string" is created. The sorted phrase book is then searched with a binary search among 858 entries that can be addressed directly, followed by a sequential search among 12 entries. The phrase book search index is 8k and a table that maps phrase book indexes to codepoints is another 20k.

The searching I do makes jython slower than the direct calculation you do. I'll take another look at this after jython 2.0 to see if I can improve performance with your page/line number scheme and a total hashing of all the unicode names.

regards,
finn

From ping@lfw.org Sun Jan 14 22:44:47 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Sun, 14 Jan 2001 14:44:47 -0800 (PST)
Subject: [Python-Dev] SourceForge and long patches
Message-ID:

Okay, this is getting really annoying. SourceForge won't accept any patches > 16k. Why not?
Is there a way around this? SourceForge: Exiting with Error ERROR Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 I'm trying to submit the update to tokenize.py, but it's too long because i've changed test/output/test_tokenize and that's a big file. -- ?!ng From guido@python.org Sun Jan 14 22:58:03 2001 From: guido@python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 17:58:03 -0500 Subject: [Python-Dev] SourceForge and long patches In-Reply-To: Your message of "Sun, 14 Jan 2001 14:44:47 PST." References: Message-ID: <200101142258.RAA13606@cj20424-a.reston1.va.home.com> > Okay, this is getting really annoying. SourceForge won't accept > any patches > 16k. Why not? Is there a way around this? I have no idea why; can only assume it's a limitation in the database package they use. The standard workaround is to upload a URL pointing to the patch. :-( > SourceForge: Exiting with Error > > ERROR > > Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Sun Jan 14 23:35:51 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 00:35:51 +0100 Subject: [Python-Dev] Where's Greg Ward ? Message-ID: <3A6237D7.673BBB30@lemburg.com> He seems to be offline and the people on the distutils list have some patches and other things which would be nice to have in distutils for 2.1. I suppose we could simply check in the patches, but we still want to get his OK on things before applying patches to the distutils tree. Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one@home.com Sun Jan 14 23:57:45 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 14 Jan 2001 18:57:45 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
In-Reply-To: <3A6237D7.673BBB30@lemburg.com> Message-ID: [MAL] > He seems to be offline and the people on the distutils list have > some patches and other things which would be nice to have in > distutils for 2.1. Greg's somewhere near the end of the process of moving from Virginia to Canada; I expect he'll become visible again Real Soon. > I suppose we could simply check in the patches, but we still want > to get his OK on things before applying patches to the distutils > tree. The distutils SIG could elect a Shadow Dictator in his place; if everyone agrees to vote for Andrew, you save the effort of counting votes . From tismer@tismer.com Mon Jan 15 01:35:57 2001 From: tismer@tismer.com (Christian Tismer) Date: Mon, 15 Jan 2001 02:35:57 +0100 Subject: [Python-Dev] Minor Bug-fix release for Stackless Python 2.0 Message-ID: <3A6253FD.E9B30462@tismer.com> Wolfgang Lipp reported that Microthreads were executing sequentially with SLP 2.0 . The bug fix is available on the website. Please use this new version, or microthreads will not give you much fun. http://www.stackless.com/spc20-win32.exe http://www.stackless.com/spc-src-010115.zip enjoy - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tommy@ilm.com Mon Jan 15 02:18:20 2001 From: tommy@ilm.com (Captain Senorita) Date: Sun, 14 Jan 2001 18:18:20 -0800 (PST) Subject: [Python-Dev] chomp()? In-Reply-To: <14923.31238.65155.496546@buffalo.fnal.gov> References: <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> Message-ID: <14946.23981.694472.406438@mace.lucasdigital.com> Charles G Waldman writes: | | P=NP (Python is not Perl) Is it too late to suggest this for the SPAM9 t-shirt? 
:) From guido@python.org Mon Jan 15 02:24:36 2001 From: guido@python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 21:24:36 -0500 Subject: [Python-Dev] chomp()? In-Reply-To: Your message of "Sun, 14 Jan 2001 18:18:20 PST." <14946.23981.694472.406438@mace.lucasdigital.com> References: <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> <14946.23981.694472.406438@mace.lucasdigital.com> Message-ID: <200101150224.VAA15254@cj20424-a.reston1.va.home.com> > Charles G Waldman writes: > | > | P=NP (Python is not Perl) > > Is it too late to suggest this for the SPAM9 t-shirt? :) By just about a day -- I haven't seen the new design yet, but Just & Eric were supposed to design it today and hand in the final proofs tomorrow. I believe the slogan will be "it fits your brain" (or "it fits my brain"). But if you print a bunch of P=NP shirts, I'm sure you can sell them with a profit, both in Long Beach and in San Diego (at the O'Reilly Open Source conference)... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Mon Jan 15 06:35:05 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 01:35:05 -0500 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: <20010110101545.A21305@glacier.fnational.com> Message-ID: [Timmy] > At this point I'm +0.5 on the idea of fileobject.c using > ms_getline_hack whenever HAVE_GETC_UNLOCKED isn't available. [NeilS, from Wednesday] > Compare ms_getline_hack to what Perl does in order speed up IO. Believe me, I have . > I think its worth maintaining that piece of relatively portable > code given the benefit. If the code has to be maintained then it > might was well be used. If we find a platform the breaks we can > always disable it before the final release. Given that hearty encouragement, and the utterly non-scary results so far, I just checked in a new scheme: On a platform with getc_unlocked(): By default, use getc_unlocked(). 
If you want to use fgets() instead, #define USE_FGETS_IN_GETLINE. [so motivated people can use fgets() instead if it's faster on their platform] On a platform without getc_unlocked(): By default, use fgets(). If you don't want to use fgets(), #define DONT_USE_FGETS_IN_GETLINE. [so if we stumble into a platform it fails on between releases, the user will have an easy time turning it off themself] From gstein@lyra.org Mon Jan 15 07:18:20 2001 From: gstein@lyra.org (Greg Stein) Date: Sun, 14 Jan 2001 23:18:20 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Sat, Jan 13, 2001 at 08:55:35AM -0800 References: Message-ID: <20010114231820.C6081@lyra.org> On Sat, Jan 13, 2001 at 08:55:35AM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Lib > In directory usw-pr-cvs1:/tmp/cvs-serv14586 > > Modified Files: > httplib.py > Log Message: > SF Patch #103225 by Ping: httplib: smallest Python patch ever >... Not so small: >... > *** 333,337 **** > i = host.find(':') > if i >= 0: > ! port = int(host[i+1:]) > host = host[:i] > else: > --- 333,340 ---- > i = host.find(':') > if i >= 0: > ! try: > ! port = int(host[i+1:]) > ! except ValueError, msg: > ! raise socket.error, str(msg) > host = host[:i] > else: Did you intend to commit this? Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez@zadka.site.co.il Mon Jan 15 15:53:58 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 17:53:58 +0200 (IST) Subject: [Python-Dev] chomp()? 
In-Reply-To: <200101150224.VAA15254@cj20424-a.reston1.va.home.com> References: <200101150224.VAA15254@cj20424-a.reston1.va.home.com>, <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> <14946.23981.694472.406438@mace.lucasdigital.com> Message-ID: <20010115155358.86E5AA828@darjeeling.zadka.site.co.il> On Sun, 14 Jan 2001 21:24:36 -0500, Guido van Rossum wrote: > But if you print a bunch of P=NP shirts, I'm sure you can sell them > with a profit, both in Long Beach and in San Diego (at the O'Reilly > Open Source conference)... And the Libre Software Meeting (http://lsm.abul.org), which has a Python subtopic too. (Since it's in France, no one is calling it "free", so it's probable you can sell those T-shirts there...) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal@lemburg.com Mon Jan 15 09:44:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 10:44:14 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: <3A62C66E.2BB69E61@lemburg.com> Fredrik Lundh wrote: > > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? Since the Unicode character names are probably not used for performance sensitive tasks, I suggest to checkin the smallest version possible. If it is too much work to get Finn's version recoded in C (presuming it's written in Java), then I'd suggest checking in your version until someone comes up with a yet smaller edition. 
Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jan 15 09:48:49 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 10:48:49 +0100 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> <14946.6552.542015.620760@beluga.mojam.com> Message-ID: <3A62C781.22240D3C@lemburg.com> Skip Montanaro wrote: > > Guido> Now, if we could somehow get this to show both the docs that Fred > Guido> maintains and the stuff that Ping extracts from the source code, > Guido> that would be even better! > > I had exactly the same thought. I suspect that if the install target were > modified to install the html-ized sections of the lib reference manual pydoc > could grovel around in sys and find the root of the library reference manual > pretty easily. If not, it could simply redirect to the relevant section of > http://www.python.org/doc/current/lib/. Since Fred remarked that the URLs for the different docs are not fixed, how about adding a __onlinedocs__ attribute to the standard Python modules providing the correct URL ? Or, alternatively, pass the module's name through some Google like "I feel lucky" documentation search engine... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jan 15 09:51:40 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 10:51:40 +0100 Subject: [Python-Dev] Where's Greg Ward ? 
References: Message-ID: <3A62C82C.EA25AAF5@lemburg.com> [CCed to distutils, since it matters there] Tim Peters wrote: > > [MAL] > > He seems to be offline and the people on the distutils list have > > some patches and other things which would be nice to have in > > distutils for 2.1. > > Greg's somewhere near the end of the process of moving from Virginia to > Canada; I expect he'll become visible again Real Soon. Great :) > > I suppose we could simply check in the patches, but we still want > > to get his OK on things before applying patches to the distutils > > tree. > > The distutils SIG could elect a Shadow Dictator in his place; if everyone > agrees to vote for Andrew, you save the effort of counting votes . Ok, let's agree to vote for Andrew :) Andrew, is that OK with you ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one@home.com Mon Jan 15 10:52:09 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 05:52:09 -0500 Subject: [Python-Dev] RE: xreadline speed vs readlines_sizehint In-Reply-To: <3A5D602D.9DC991CB@per.dem.csiro.au> Message-ID: [Mark Favas] > ... > The lines range in length from 96 to 747 characters, with > 11% @ 233, 17% @ 252 and 52% @ 254 characters, so #1 [a vendor > who actually optimized fgets()] looks promising - most lines are > long enough to trigger a realloc. Plus as soon as you spill over the stack buffer, I make you pay for filling 1024 new bytes with newlines before the next fgets() call, and almost all of those are irrelevant to you. It doesn't degrade gracefully. Alas, I tried several "adaptive" schemes (adjusting how much of the initial segment of a larger stack buffer they would use, based on the actual line lengths seen in the past), but the costs always exceeded the savings on my box. 
> Cranking up INITBUFSIZE in ms_getline_hack to 260 from 200 > improves thing again, by another 25%: > total 131426612 chars and 514216 lines > count_chars_lines 5.081 5.066 > readlines_sizehint 3.743 3.717 > using_fileinput 11.113 11.100 > while_readline 6.100 6.083 > for_xreadlines 3.027 3.033 Well, I couldn't let you forego *all* of 25%. The current fileobject.c has a stack buffer of 300 bytes, but only uses 100 of them on the first gets() call. On a very quiet machine, that saved 3-4% of the runtime on *my* test case, whose line lengths are typical of the text files I crunch over, so I'm happy for me. If 100 bytes aren't enough, it must call fgets() again, but just appends the next call into the full 300-byte buffer. So it saves the realloc for lines under 300 chars. > Apart from the name , I like ms_getline_hack... Ya, it's now the non-pejorative getline_via_fgets(). I hate that I became a grown-up <0.9 wink>. time-to-pick-wings-off-of-flies-ly y'rs - tim From ping@lfw.org Mon Jan 15 11:11:16 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 03:11:16 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: <20010114231820.C6081@lyra.org> Message-ID: On Sun, 14 Jan 2001, Greg Stein wrote: > Not so small: > > >... > > *** 333,337 **** > > i = host.find(':') > > if i >= 0: > > ! port = int(host[i+1:]) > > host = host[:i] > > else: > > --- 333,340 ---- > > i = host.find(':') > > if i >= 0: > > ! try: > > ! port = int(host[i+1:]) > > ! except ValueError, msg: > > ! raise socket.error, str(msg) > > host = host[:i] > > else: The above changes were not part of the patch i submitted; the patch i submitted was exactly a one-character change. Guido has already edited the file, so there's no need to commit anything further here. -- ?!ng From mal@lemburg.com Mon Jan 15 11:56:37 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 15 Jan 2001 12:56:37 +0100 Subject: [Python-Dev] Why is soundex marked obsolete? References: Message-ID: <3A62E575.9A584108@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > BTW, are there less English centric "sounds alike" matchers > > around ? > > Yes, but if anything there are far too many of them: like Soundex, they're > just heuristics, and *everybody* who cares adds their own unique twists, > while proper studies are almost non-existent. Few variants appear to be in > use much beyond their inventor's friends; one notable exception in the > Jewish community is the Daitch-Mokotoff variation, originally tailored to > their unique needs but later generalized; a brief description here: > > http://www.avotaynu.com/soundex.html > > The similarly involved NYSIIS algorithm (New York State Identification > Intelligence System -- look for NYSIIS on Parnassus) was the winner from a > field of about two dozen competing algorithms, after measuring their > effectiveness on assorted databases maintained by the state of New York. > Since New York has a large immigrant population, NYSIIS isn't as > Anglocentric as Soundex either. Thanks for the pointer. I'll add that module to my lib :) http://metagram.webreply.com/downloads/nysiis.py Perhaps Eric ought to add this one to his package as well ?! BTW, where can I find your package on the web, Eric ? I'd like to give it a ride under German language conditions ;) > But state-of-the-art has given up on purely computational algorithms for > these purposes: proper names are simply too much a mess. For example, if I > search for "Richard", it *ought* to match on "Dick"; if my Arab buddy > searches on "Mohammed", it *ought* to match on "Mhd"; "the rules" people > actually use just aren't reducible to pure computation -- it takes a large > knowledge base to capture what people "just know". 
You may enjoy visiting > this commercial site (AFAIK, nobody is giving away state-of-the-art for > free): > > http://www.las-inc.com/ Sad -- "patent pending" algorithms don't help anyone on this planet :( > > ... > > http://physics.nist.gov/cuu/Reference/soundex.html > > > > works fine for English texts, > > If that were true, the English-speaking researchers would have declared > victory 120 years ago . But English pronunciation is *notoriously* > difficult to predict from spelling, partly because English is the Perl of > human languages. Then Dutch must be the Python of human languages... ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez@zadka.site.co.il Mon Jan 15 20:13:18 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:13:18 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 In-Reply-To: References: Message-ID: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> On Sun, 14 Jan 2001 19:26:38 -0800, Tim Peters wrote: > Modified Files: > tabnanny.py > Log Message: > Whitespace normalization. hmmmmmm....... -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal@lemburg.com Mon Jan 15 12:10:30 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 15 Jan 2001 13:10:30 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 References: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: <3A62E8B6.3DFC1FA2@lemburg.com> Moshe Zadka wrote: > > On Sun, 14 Jan 2001 19:26:38 -0800, Tim Peters wrote: > > Modified Files: > > tabnanny.py > > Log Message: > > Whitespace normalization. > > hmmmmmm....... Perhaps you ought to make this a CRON job ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez@zadka.site.co.il Mon Jan 15 20:24:48 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:24:48 +0200 (IST) Subject: [Python-Dev] Someone should be shot In-Reply-To: <3A62E8B6.3DFC1FA2@lemburg.com> References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: <20010115202448.38F60A828@darjeeling.zadka.site.co.il> I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! Of course, the real culprit is the person who fixed up the reply-to in the checkin messages to point to python-dev. Why was it done, and isn't there a better way? This makes it painful to personally comment on people's checkin messages. I suggest instead to add a mail-followup-to header (Didn't anyone read "Reply-To Munging Considered Harmful"?) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From esr@thyrsus.com Mon Jan 15 12:23:25 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 07:23:25 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? 
In-Reply-To: <3A62E575.9A584108@lemburg.com>; from mal@lemburg.com on Mon, Jan 15, 2001 at 12:56:37PM +0100 References: <3A62E575.9A584108@lemburg.com> Message-ID: <20010115072325.A10377@thyrsus.com> M.-A. Lemburg : > Perhaps Eric ought to add this one to his package as well ?! Actually, at this point, my plan is to give Tim a decent interval to refactor ndiff so his SequenceMatcher class is exposed and documented -- otherwise *I'll* go in and do it (har! waving a bloody knife!). His turns out to be the same as the Ratcliff-Obershelp technique I was using, except Tim had his bullshit threshold set too low (:-)) and let through matches I wouldn't have. -- Eric S. Raymond The only purpose for which power can be rightfully exercised over any member of a civilized community, against his will, is to prevent harm to others. His own good, either physical or moral, is not a sufficient warrant -- John Stuart Mill, "On Liberty", 1859 From mal@lemburg.com Mon Jan 15 12:26:59 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 13:26:59 +0100 Subject: [Python-Dev] Re: Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <3A62EC93.9AA60ABA@lemburg.com> Moshe Zadka wrote: > > I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! > Of course, the real culprit is the person who fixed up the reply-to in > the checkin messages to point to python-dev. Why was it done, and > isn't there a better way? This makes it painful to personally comment > on people's checkin messages. I suggest instead to add a mail-followup-to > header > > (Didn't anyone read "Reply-To Munging Considered Harmful"?) Naa, noone needs to be shot in the foot ;) In fact I like it, that replies go to python-dev ... after all, that's where these things should be discussed. 
BTW, in case you misunderstood my reply: it would indeed make sense to automate these kinds of check (tabnanny et al). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez@zadka.site.co.il Mon Jan 15 20:42:15 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:42:15 +0200 (IST) Subject: [Python-Dev] Re: Someone should be shot In-Reply-To: <3A62EC93.9AA60ABA@lemburg.com> References: <3A62EC93.9AA60ABA@lemburg.com>, <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <20010115204215.84F0CA828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001 13:26:59 +0100, "M.-A. Lemburg" wrote: > In fact I like it, that replies go to python-dev ... after all, > that's where these things should be discussed. Well, that's the mailing list where things should be discussed. But when I press the "Reply" button (as opposed to "Reply to List" button) I expect my e-mail to go to the person originating the e-mail. Reply-To: means "I'd like to get replies to some other address". What if, say, a checkin message relates to some private topic I'd discussed with someone: I'd like to reply to him personally. I agree that responses to Python-Checkins should be handled on Python-Dev: that's what the mail-followup-to header is for. > BTW, in case you misunderstood my reply: it would indeed make > sense to automate these kinds of check (tabnanny et al). Oh, ok. The "cron" part threw me off (why cron?) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From barry@digicool.com Mon Jan 15 13:15:28 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 08:15:28 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
References: <3A62C82C.EA25AAF5@lemburg.com> Message-ID: <14946.63472.282750.828218@anthem.wooz.org> >>>>> "M" == M writes: >> The distutils SIG could elect a Shadow Dictator in his place; >> if everyone agrees to vote for Andrew, you save the effort of >> counting votes . M> Ok, let's agree to vote for Andrew :) M> Andrew, is that OK with you ? He's got my vote. I've been experiencing some weird problems with the distutils installation of pybsddb3 out of the current Python cvs tree. It'd be nice if the outstanding distutils patches are integrated before I dive in. I don't see anything relevant in patches or bugs, but I don't know if there are other repositories of distutils fixes (like the archives?). -Barry From barry@digicool.com Mon Jan 15 13:27:02 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 08:27:02 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <14946.64166.348139.425223@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> I'm sorry! I meant to reply to tim alone, and ended up MZ> spamming python-dev! Of course, the real culprit is the MZ> person who fixed up the reply-to in the checkin messages to MZ> point to python-dev. Why was it done, and isn't there a better MZ> way? This makes it painful to personally comment on people's MZ> checkin messages. I suggest instead to add a mail-followup-to MZ> header MZ> (Didn't anyone read "Reply-To Munging Considered Harmful"?) Or how about http://www.metasystema.org/essays/reply-to-useful.mhtml for a dissenting view. Of course Mail-Followup-To is completely non-standard, but even if it were, having the mailing list munge it in isn't recommended: http://cr.yp.to/proto/replyto.html Bottom line (IMHO), this is just something about email that is and will forever remain broken. 
Given that, it was voted a long while back to make Reply-To for checkins point to python-dev so until there's a hue and cry to change it back, I'll leave it as is. And yeah, it bites me sometimes too! -Barry From tony@lsl.co.uk Mon Jan 15 14:18:36 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 15 Jan 2001 14:18:36 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message-ID: <002801c07efe$0c728a80$f05aa8c0@lslp7o.int.lsl.co.uk> Neat stuff. Ka-Ping Yee strikes again. And it works with Python 1.5.2. Running on NT (4.00.1381) in an "MS-DOS" window, using Python 1.5.2 installed in the effbot manner, it works, with the slight strangeness that if I do: python pydoc.py I get the documentation for OK, but it is preceded with a line claiming that: The system cannot find the path specified. I don't have the time to pursue this at the moment - it's possibly an artefact of our system? (one minor "prettiness" hack - those of us who have been tainted by Emacs Lisp programming tend to start module documentation off with a line of the form: .py -- information about the module which, when pydoc'ed, results in a NAME line which starts with twice... Of course, if I'm the only person doing this, I'll just have to, well, stop...) A request - a "-f" switch to allow the user to specify a particular Python file (i.e., something not on the PYTHONPATH). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
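Tibs's request for a way to point the tool at a file that is not on the module search path could be approached roughly as follows. This is only a sketch built on the importlib machinery of current Python 3 (nothing like it existed for 1.5.2), and doc_from_file is a made-up name for illustration, not part of pydoc.

```python
import importlib.util
import inspect
import pathlib
import tempfile


def doc_from_file(path):
    """Load the Python source file at `path` and return its module docstring.

    Hypothetical helper: the file does not need to be on sys.path, but it
    must end in ".py" so that a source loader can be found for it.
    """
    path = pathlib.Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %s as a Python module" % path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # note: this runs the file's top-level code
    return inspect.getdoc(module) or ""


if __name__ == "__main__":
    # Demonstrate on a throwaway file outside the module search path.
    with tempfile.TemporaryDirectory() as tmp:
        demo = pathlib.Path(tmp) / "demo.py"
        demo.write_text('"""demo -- information about the module"""\n')
        print(doc_from_file(demo))
```

Because the file is executed in order to read its docstring, a real tool would want to think about side effects before doing this on arbitrary files.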
From jack@oratrix.nl Mon Jan 15 14:32:02 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 15 Jan 2001 15:32:02 +0100
Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore
In-Reply-To: Message by Guido van Rossum , Sat, 13 Jan 2001 17:33:34 -0500 , <200101132233.RAA03229@cj20424-a.reston1.va.home.com>
Message-ID: <20010115143203.A44B63C2031@snelboot.oratrix.nl>

Also note that the problem only occurs when trying to build a
unix-Python out-of-the-box on MacOSX. If you're building a Carbon
Python from the MacPython sources (something very few people can do
right now:-) the executable isn't called "python". And when a real
MacOSX-Python will be done it'll have all the nifty packaging stuff
that will also make sure that there's nothing called "python" in the
toplevel folder.

And the two workarounds (1-Use a UFS filesystem, 2-Put a ".exe"
extension in the Makefile) work fine for the mean time.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From guido@python.org Mon Jan 15 14:33:23 2001
From: guido@python.org (Guido van Rossum)
Date: Mon, 15 Jan 2001 09:33:23 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27
In-Reply-To: Your message of "Sun, 14 Jan 2001 23:18:20 PST." <20010114231820.C6081@lyra.org>
References: <20010114231820.C6081@lyra.org>
Message-ID: <200101151433.JAA17944@cj20424-a.reston1.va.home.com>

> >...
> > *** 333,337 ****
> >   i = host.find(':')
> >   if i >= 0:
> > !     port = int(host[i+1:])
> >       host = host[:i]
> >   else:
> > --- 333,340 ----
> >   i = host.find(':')
> >   if i >= 0:
> > !     try:
> > !         port = int(host[i+1:])
> > !     except ValueError, msg:
> > !         raise socket.error, str(msg)
> >       host = host[:i]
> >   else:
>
> Did you intend to commit this?

Oops.
That was a patch submitted a while ago that I applied as an experiment but then decided I didn't like (argument: why bother). I've reverted it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Jan 15 14:40:30 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 09:40:30 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 22:24:48 +0200." <20010115202448.38F60A828@darjeeling.zadka.site.co.il> References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <200101151440.JAA18045@cj20424-a.reston1.va.home.com> > I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! > Of course, the real culprit is the person who fixed up the reply-to in > the checkin messages to point to python-dev. Why was it done, and > isn't there a better way? This makes it painful to personally comment > on people's checkin messages. I suggest instead to add a mail-followup-to > header > > (Didn't anyone read "Reply-To Munging Considered Harmful"?) I agree with you, but Barry (who set this up) seems to believe that there's a good reason to do it this way. Barry, do you still feel that way? The auto-reply-all has probably tripped me up more than anyone. Anyone else have a strong reason why this should be set? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez@zadka.site.co.il Mon Jan 15 23:03:25 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Tue, 16 Jan 2001 01:03:25 +0200 (IST) Subject: [Python-Dev] Someone should be shot In-Reply-To: <14946.64166.348139.425223@anthem.wooz.org> References: <14946.64166.348139.425223@anthem.wooz.org>, <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <20010115230325.1C7F5A828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001 08:27:02 -0500, barry@digicool.com (Barry A. Warsaw) wrote: > > Or how about > > http://www.metasystema.org/essays/reply-to-useful.mhtml If your mailer doesn't have this option, you should request it from its development team. Any mailer, whose development team refuses this simple request due to some ideological position, cannot be said to be reasonable. As some people here know, I'm my mailer's "development team". I refuse to add it due to an ideological position. Anyone who knows me know I'm quite unreasonable. Hmmm....I'm not making much headway, am I ;-) > for a dissenting view. Of course Mail-Followup-To is completely > non-standard, but even if it were, having the mailing list munge it in > isn't recommended: > > http://cr.yp.to/proto/replyto.html This has no relevance to the current case, since python-checkin messages are machine-generated -- so this is closer to doing this in the script generating the checkin message, and only differes in implementation. > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! I won't continue this thread, but remember that my vote is "no". 
I simply shudder at the thought that I might send someone e-mail with something like "nice bugfix. Didn't know you were back from the sex-change operation", and it would be broadcast out to all Python-Dev *and* the archives, for posterity. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From thomas@xs4all.net Mon Jan 15 15:31:22 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 16:31:22 +0100 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14946.64166.348139.425223@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 08:27:02AM -0500 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> Message-ID: <20010115163122.I1005@xs4all.nl> On Mon, Jan 15, 2001 at 08:27:02AM -0500, Barry A. Warsaw wrote: > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! I've said this before, on the Mailman-devel list, but I'll repeat it here for the record (in case this issue ever comes up for vote again :) The main bite (for me) is that to reply to a person in private, you have to cut&paste the 'From' header from the original mail, and edit your new mail's headers, in order to reply to a specific person. My mailer is mature enough to have a 'reply', 'reply-group' and 'reply-list' keybinding, so the 'Reply-To' only interferes. 
There probably is a 'reply-to-from-ignoring-replyto' keybinding in there, too, somewhere, or it could be added, but remembering to type that different key is almost as much trouble as typing the email address by hand ;P So, my vote, like Moshe's, is just back from a sex change, and reads 'no'. Recount-recount-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@python.org Mon Jan 15 15:38:01 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 10:38:01 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 08:27:02 EST." <14946.64166.348139.425223@anthem.wooz.org> References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> Message-ID: <200101151538.KAA21937@cj20424-a.reston1.va.home.com> > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! It sounds like a hue and cry to change it to me! It looks like it's time for a BDFL Pronouncement. I pronounce: Given that: - we all know how to mail to python-dev; - replying to the sender is by far the most common kind of reply; - the mistake of replying to the sender when a reply-all was intended does much less potential harm than the mistake of replying to all when reply-to-sender was intended, the reply-to header shall be removed. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Mon Jan 15 16:57:19 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 11:57:19 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
In-Reply-To: <14946.63472.282750.828218@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 08:15:28AM -0500 References: <3A62C82C.EA25AAF5@lemburg.com> <14946.63472.282750.828218@anthem.wooz.org> Message-ID: <20010115115719.B919@kronos.cnri.reston.va.us> On Mon, Jan 15, 2001 at 08:15:28AM -0500, Barry A. Warsaw wrote: >tree. It'd be nice if the outstanding distutils patches are >integrated before I dive in. I don't see anything relevant in patches >or bugs, but I don't know if there are other repositories of distutils >fixes (like the archives?). There are a few patches buried in the back archives, but I don't know of any outstanding bugfixes, so please report whatever problem you're seeing. Oh, and Barry, did the issue holding up your patch for adding shar support (#102313) ever get resolved? --amk From guido@python.org Mon Jan 15 16:02:39 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 11:02:39 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Mon, 08 Jan 2001 18:20:56 PST." <20010108182056.C4640@lyra.org> References: <20010108182056.C4640@lyra.org> Message-ID: <200101151602.LAA22272@cj20424-a.reston1.va.home.com> Greg Stein noticed me checking in *yet* another system that needs the fallback TELL64() definition in fileobjects.c, and wrote: > All of those #ifdefs could be tossed and it would be more robust (long term) > if an autoconf macro were used to specify when TELL64 should be defined. > > [ I've looked thru fileobject.c and am a bit confused: the conditions for > defining TELL64 do not match the conditions for *using* it. that would > seem to imply a semantic error somewhere and/or a potential gotcha when > they get skewed (like I assume what happened to FreeBSD). simplifying with > an autoconf macro may help to rationalize it. ] I have a better idea. 
Since "lseek((fd),0,SEEK_CUR)" seems to be the universal fallback, why
not just define TELL64 to be that if it's not previously defined
(currently only MS_WIN64 has a different definition)? It isn't always
*used* (the conditions under which _portable_fseek() uses it are quite
complex), but *when* it is used, this seems to be the most common
definition...

Patch:

*** fileobject.c 2001/01/15 10:36:56 2.106
--- fileobject.c 2001/01/15 16:02:06
***************
*** 58,66 ****
  /* define the appropriate 64-bit capable tell() function */
  #if defined(MS_WIN64)
  #define TELL64 _telli64
! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__)
! /* NOTE: this is only used on older
!    NetBSD prior to f*o() funcions */
  #define TELL64(fd) lseek((fd),0,SEEK_CUR)
  #endif

--- 58,65 ----
  /* define the appropriate 64-bit capable tell() function */
  #if defined(MS_WIN64)
  #define TELL64 _telli64
! #else
! /* Fallback for older systems that don't have the f*o() funcions */
  #define TELL64(fd) lseek((fd),0,SEEK_CUR)
  #endif

I'll check this in after 24 hours unless a better idea comes up.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Mon Jan 15 16:17:07 2001
From: guido@python.org (Guido van Rossum)
Date: Mon, 15 Jan 2001 11:17:07 -0500
Subject: [Python-Dev] PEP 205 comments
In-Reply-To: Your message of "Fri, 12 Jan 2001 23:19:57 +0100." <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de>
References: <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de>
Message-ID: <200101151617.LAA22359@cj20424-a.reston1.va.home.com>

I'll leave most of this to Fred, but I'll reply to two items (Fred can
add these replies to the PEP):

> Again on proxies, there is no discussion or documentation of the
> ReferenceError. Why is it a RuntimeError? LookupError, ValueError, and
> AttributeError seem to be just as fine or better.

RuntimeError was my suggestion.
The error doesn't really qualify as a LookupError in my view (there's
no key that could be valid or invalid) and ValueError seems too general
(that's typically used for out-of-range arguments and unparseable
strings and the like). Do you have a reason why RuntimeError is
inappropriate?

> On to the type type extensions: Should there be a type flag indicating
> presence of tp_weaklistoffset? It appears that the type structure had
> tp_xxx7 for a long time, so likely all in-use binary modules have
> that field set to zero. Is that sufficient?

Yes, that should be sufficient. (I'm also going to claim tp_xxx7 for
the rich comparison function slot, but either patch can be modified to
use tp_xxx8 instead.) Maybe it's time to add a bunch of new spares?

> Thanks for reading all of this message,

You're welcome.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com Mon Jan 15 16:39:03 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Mon, 15 Jan 2001 11:39:03 -0500
Subject: [Python-Dev] Someone should be shot
References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com>
Message-ID: <14947.10151.575008.869188@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    GvR> the reply-to header shall be removed.

I'm more than happy to do this (I remember adding the reply-to munging
reluctantly). Understand one thing: anybody who naively replies to
the whole list will send those replies to python-checkins, not
python-dev.

Still want it?
-Barry

From barry@digicool.com Mon Jan 15 16:46:28 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Mon, 15 Jan 2001 11:46:28 -0500
Subject: [Python-Dev] Where's Greg Ward ?
References: <3A62C82C.EA25AAF5@lemburg.com> <14946.63472.282750.828218@anthem.wooz.org> <20010115115719.B919@kronos.cnri.reston.va.us> Message-ID: <14947.10596.733726.995351@anthem.wooz.org> >>>>> "AK" == Andrew Kuchling writes: AK> There are a few patches buried in the back archives, but I AK> don't know of any outstanding bugfixes, so please report AK> whatever problem you're seeing. Okay, will do. AK> Oh, and Barry, did the issue holding up your patch for adding AK> shar support (#102313) ever get resolved? No, but I'll try to take another poke at it. -Barry From moshez@zadka.site.co.il Tue Jan 16 01:07:48 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Tue, 16 Jan 2001 03:07:48 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: References: Message-ID: <20010116010748.41869A828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001, Guido van Rossum wrote: > Modified Files: > Meta.py > Log Message: > Geoffrey Gerrietts discovered that a KeyError was caught that probably > should have been a NameError. I'm checking in a change that catches > both, just to be sure -- I can't be bothered trying to understand this > code any more. :-) ... > ! except (KeyError, AttributeError): Ummmm....can you be bothered to make sure you really meant AttributeError when you said NameError? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido@python.org Mon Jan 15 17:06:07 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 12:06:07 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 11:39:03 EST." 
<14947.10151.575008.869188@anthem.wooz.org> References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> Message-ID: <200101151706.MAA22884@cj20424-a.reston1.va.home.com> > I'm more than happy to do this (I remember adding the reply-to munging > reluctantly). Understand one thing: anybody who naively replies to > the whole list will send those replies to python-checkins, not > python-dev. > > Still want it? Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Mon Jan 15 17:11:29 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 12:11:29 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <200101151706.MAA22884@cj20424-a.reston1.va.home.com> Message-ID: <14947.12097.613433.580928@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> I'm more than happy to do this (I remember adding the reply-to >> munging reluctantly). Understand one thing: anybody who >> naively replies to the whole list will send those replies to >> python-checkins, not python-dev. Still want it? GvR> Yes. Done. 
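
The two header policies argued over in this thread can be sketched with Python's standard email package. This is a minimal illustration, not the actual syncmail or Mailman code: the helper name build_checkin_mail, the follow_up flag, and the example addresses are assumptions made for the sketch, and Mail-Followup-To remains a non-standard header that only some mail clients honor.

```python
from email.message import EmailMessage

CHECKINS_LIST = "python-checkins@python.org"  # assumed list address
DEV_LIST = "python-dev@python.org"            # assumed list address


def build_checkin_mail(sender, subject, body, follow_up=True):
    """Build a checkin notice that carries no Reply-To header.

    Hypothetical helper. Without Reply-To, a plain "reply" goes to the
    committer named in From. If follow_up is true, a Mail-Followup-To
    header suggests where group follow-ups should be sent instead.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = CHECKINS_LIST
    msg["Subject"] = subject
    if follow_up:
        msg["Mail-Followup-To"] = DEV_LIST
    msg.set_content(body)
    return msg


msg = build_checkin_mail(
    "committer@example.com",
    "CVS: python/dist/src/Lib tabnanny.py,1.10,1.11",
    "Log Message:\nWhitespace normalization.\n",
)
print(msg["Reply-To"])          # None: a plain reply falls back to From
print(msg["Mail-Followup-To"])  # the python-dev address
```

With follow_up=False the message matches the pronouncement as stated: no Reply-To and no extra header, so a group reply lands on python-checkins, as Barry warns.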
From thomas@xs4all.net Mon Jan 15 17:34:37 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 15 Jan 2001 18:34:37 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ftplib.py,1.47,1.48
In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, Jan 15, 2001 at 08:32:52AM -0800
References:
Message-ID: <20010115183437.J1005@xs4all.nl>

On Mon, Jan 15, 2001 at 08:32:52AM -0800, Guido van Rossum wrote:

> This is slightly controversial, but after reading the argumentation in
> the bug tracker for and against, I believe this is the right solution.

It's really only slightly controversial. 'mfisk' convinced me too, and
I used to use ftp to a server behind a firewall :-)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mal@lemburg.com Mon Jan 15 18:21:54 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 15 Jan 2001 19:21:54 +0100
Subject: [Python-Dev] Re: Someone should be shot
References: <3A62EC93.9AA60ABA@lemburg.com>, <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <20010115204215.84F0CA828@darjeeling.zadka.site.co.il>
Message-ID: <3A633FC2.11F90E94@lemburg.com>

Moshe Zadka wrote:
>
> On Mon, 15 Jan 2001 13:26:59 +0100, "M.-A. Lemburg" wrote:
>
> > In fact I like it, that replies go to python-dev ... after all,
> > that's where these things should be discussed.
>
> Well, that's the mailing list where things should be discussed.
> But when I press the "Reply" button (as opposed to "Reply to List" button)
> I expect my e-mail to go to the person originating the e-mail.
> Reply-To: means "I'd like to get replies to some other address".
> What if, say, a checkin message relates to some private topic
> I'd discussed with someone: I'd like to reply to him personally.
>
> I agree that responses to Python-Checkins should be handled on Python-Dev:
> that's what the mail-followup-to header is for.

Ah, ok. I thought you pressed Reply-All and then wondered why
your message got copied to python-dev...

> > BTW, in case you misunderstood my reply: it would indeed make
> > sense to automate these kinds of check (tabnanny et al).
>
> Oh, ok. The "cron" part threw me off (why cron?)

CRON is what's used on Unix to implement jobs which run on
a regular basis... perhaps we just need to set up the CRON job
in timbot though ;)

--
Marc-Andre Lemburg
______________________________________________________________________
Company:       http://www.egenix.com/
Consulting:    http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From guido@python.org Mon Jan 15 18:35:54 2001
From: guido@python.org (Guido van Rossum)
Date: Mon, 15 Jan 2001 13:35:54 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4
In-Reply-To: Your message of "Tue, 16 Jan 2001 03:07:48 +0200." <20010116010748.41869A828@darjeeling.zadka.site.co.il>
References: <20010116010748.41869A828@darjeeling.zadka.site.co.il>
Message-ID: <200101151835.NAA26712@cj20424-a.reston1.va.home.com>

> > Modified Files:
> > Meta.py
> > Log Message:
> > Geoffrey Gerrietts discovered that a KeyError was caught that probably
> > should have been a NameError. I'm checking in a change that catches
> > both, just to be sure -- I can't be bothered trying to understand this
> > code any more. :-)
> ...
> > ! except (KeyError, AttributeError):
>
> Ummmm....can you be bothered to make sure you really meant AttributeError
> when you said NameError?

The code is correct. Ignore the comment.
:-( --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@arctrix.com Mon Jan 15 11:55:51 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 15 Jan 2001 03:55:51 -0800 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14947.10151.575008.869188@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 11:39:03AM -0500 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> Message-ID: <20010115035550.B4336@glacier.fnational.com> [Barry on removing the reply-to header on python-checkins messages] > I'm more than happy to do this (I remember adding the reply-to munging > reluctantly). Understand one thing: anybody who naively replies to > the whole list will send those replies to python-checkins, not > python-dev. Could you make the script generate mail-followup-to instead of reply-to? I know its not a standard header but some MUA understand it and it is exactly what is needed to solve this problem. I think promoting it is a good thing. 
Neil From thomas@xs4all.net Mon Jan 15 18:59:12 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 19:59:12 +0100 Subject: [Python-Dev] Someone should be shot In-Reply-To: <20010115035550.B4336@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 15, 2001 at 03:55:51AM -0800 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <20010115035550.B4336@glacier.fnational.com> Message-ID: <20010115195912.K1005@xs4all.nl> On Mon, Jan 15, 2001 at 03:55:51AM -0800, Neil Schemenauer wrote: > [Barry on removing the reply-to header on python-checkins messages] > > I'm more than happy to do this (I remember adding the reply-to munging > > reluctantly). Understand one thing: anybody who naively replies to > > the whole list will send those replies to python-checkins, not > > python-dev. > Could you make the script generate mail-followup-to instead of > reply-to? I know its not a standard header but some MUA > understand it and it is exactly what is needed to solve this > problem. I think promoting it is a good thing. The script just calls '/bin/mail'. The Reply-To munging is done by Mailman, which is slightly more than 'a script'. syncmail could do it, but that would mean using sendmail instead of mail, and writing all headers itself. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@python.org Mon Jan 15 19:17:27 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 14:17:27 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: Your message of "Fri, 05 Jan 2001 14:14:49 EST." 
<14934.7465.360749.199433@localhost.localdomain> References: <14934.7465.360749.199433@localhost.localdomain> Message-ID: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> There doesn't seem to be a lot of enthusiasm for a Unittest bakeoff... Certainly I don't think I'll get to this myself before the conference. How about the following though: talking of low-hanging fruit, Tim's doctest module is an excellent thing even if it isn't a unit testing framework! (I found this out when I played with it -- it's real easy to get used to...) Would anyone object to Tim checking this in? Since it isn't a contender in the unit test bake-off, it shouldn't affect the outcome there at all. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Mon Jan 15 19:40:03 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 14:40:03 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <20010115035550.B4336@glacier.fnational.com> <20010115195912.K1005@xs4all.nl> Message-ID: <14947.21011.310090.686632@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> Could you make the script generate mail-followup-to instead of >> reply-to? I know it's not a standard header but some MUAs >> understand it and it is exactly what is needed to solve this >> problem. I think promoting it is a good thing. TW> The script just calls '/bin/mail'. The Reply-To munging is TW> done by Mailman, which is slightly more than 'a TW> script'. syncmail could do it, but that would mean using TW> sendmail instead of mail, and writing all headers itself. I'm sure Fred or I would be happy to review such a patch to syncmail .
-Barry From jeremy@alum.mit.edu Mon Jan 15 19:31:44 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 15 Jan 2001 14:31:44 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> Message-ID: <14947.20512.140859.119597@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: GvR> There doesn't seem to be a lot of enthusiasm for a Unittest GvR> bakeoff... Certainly I don't think I'll get to this myself GvR> before the conference. Let's have all the interested parties vote now, then. It would certainly be helpful to have the new unittest module in the alpha release of 2.1. I'd like to write some new tests and I'd rather use the new stuff than the old stuff. Jeremy From tim.one@home.com Mon Jan 15 20:01:52 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:01:52 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14947.10151.575008.869188@anthem.wooz.org> Message-ID: [Barry] > ... > Understand one thing: anybody who naively replies to the whole > list will send those replies to python-checkins, not python-dev. IIRC, that's why the redirect to python-dev was added to begin with: of course people will reply to python-checkins, and then the next guy x-posts to python-dev too, and the next three in turn variously remove one or the other groups, or keep both or add c.l.py too. In the end, no single archive contains a coherent record on its own, and the random mix of "[Python-Dev]" and "[Python-checkins]" Subject tags even makes it impossible to sort by (true) subject easily in your own mail client. > Still want it? Don't care .
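As a minimal sketch of what Thomas describes above (a check-in script that writes every header itself rather than calling /bin/mail), assuming a current Python 3 with the standard email package. The function name, subject text and sender address are hypothetical placeholders, and Mail-Followup-To remains a non-standard header that only some mail clients honour:

```python
from email.message import EmailMessage


def build_checkin_message(sender, body):
    """Build a check-in notice that sets all of its own headers.

    Hypothetical helper for illustration; not taken from syncmail.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "python-checkins@python.org"
    msg["Subject"] = "[Python-checkins] CVS: example"
    # Non-standard header: asks clients that understand it to send
    # follow-ups to python-dev instead of the check-in list.
    msg["Mail-Followup-To"] = "python-dev@python.org"
    msg.set_content(body)
    return msg


msg = build_checkin_message("someone@example.com", "Log Message:\nExample.\n")
print(msg.as_string())
```

Unlike Reply-To munging, this leaves the ordinary reply address alone; a client that ignores the header simply behaves as before.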
From tim.one@home.com Mon Jan 15 20:08:15 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:08:15 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 In-Reply-To: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: [] > Modified Files: > tabnanny.py > Log Message: > Whitespace normalization. [Moshe] > hmmmmmm....... LOL! I was hoping nobody would notice that <0.7 wink>. The appalling truth is that late in tabnanny's development I deliberately indented a large block of code by one column, and actually thought it was a good idea at the time. I'm as delighted to see that finally fixed as I am embarrassed by the necessity. although-perhaps-more-appalled-that-there-was-followup-debate-about-followups-containing-more-msgs-than-there-were-characters-in-moshe's-followup-ly y'rs - tim From ping@lfw.org Mon Jan 15 20:10:10 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 12:10:10 -0800 (PST) Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: <002801c07efe$0c728a80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: On Mon, 15 Jan 2001, Tony J Ibbs (Tibs) wrote: > I get the documentation for OK, but it is preceded with a line > claiming that: > > The system cannot find the path specified. Thanks for the NT testing. That's funny -- i put in a special case for Windows to avoid messages like the above a couple of days ago. How recently did you download pydoc.py? Does your copy contain: if hasattr(sys, 'winver'): return lambda text: tempfilepager(text, 'more') ? > .py -- information about the module > > which, when pydoc'ed, results in a NAME line which starts with > twice... > Of course, if I'm the only person doing this, I'll just have to, well, > stop...)
I think i'm going to ask you to stop, unless Guido prefers otherwise. Guido, do you have a style pronouncement for module docstrings? > A request - a "-f" switch to allow the user to specify a particular > Python file (i.e., something not on the PYTHONPATH). Yes, it's on my to-do list. So you can see what i'm up to, here's my current to-do list: make boldness optional (only if using more/less? only Unix?) document a .py file given on the command line + webserver in background help should have a repr write a better htmlrepr (\n should look special, max length limit, etc.) generate docs from lib HTML generate HTML index from precis and __path__ and package contents list have help(...) produce a directory of available things to ask for help on curses.wrapper is broken: both function and package respect package __all__ coherent answer to .py vs .pyc: do we show .pyc? fix getcomments() bug: last two lines stuck together + grey out shadowed modules/packages refactor .py/.pyc/.module.so/.module.so.1 listers in htmldoc, textdoc skip __main__ module + index built-in modules too Windows and Mac testing default to HTTP mode on GUI platforms? (win, mac) The ones marked with + i consider done. Feel free to comment on or suggest priorities for the others; in particular, what do you think of the last one? The idea is that double-clicking on pydoc.py in Windows or MacOS could launch the server and then open the localhost URL using webbrowser.py to display the documentation index. Should it do this by default? -- ?!ng From guido@python.org Mon Jan 15 20:41:25 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 15:41:25 -0500 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Your message of "Mon, 15 Jan 2001 12:10:10 PST." 
References: Message-ID: <200101152041.PAA32298@cj20424-a.reston1.va.home.com> > > .py -- information about the module > > > > which, when pydoc'ed, results in a NAME line which starts with > > twice... > > Of course, if I'm the only person doing this, I'll just have to, well, > > stop...) > > I think i'm going to ask you to stop, unless Guido prefers > otherwise. Guido, do you have a style pronouncement for module > docstrings? I'm with Ping. None of the examples in the style guide start the docstring with the function name. Almost none of the standard library modules start their module docstring with the module name (codecs is an exception, but I didn't write it :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn@worldonline.dk Mon Jan 15 20:45:02 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Mon, 15 Jan 2001 20:45:02 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <3A62C66E.2BB69E61@lemburg.com> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3A62C66E.2BB69E61@lemburg.com> Message-ID: <3a636122.45847835@smtp.worldonline.dk> [Fredrik Lundh] > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? [M.-A. Lemburg] >Since the Unicode character names are probably >not used for performance sensitive tasks, I suggest to >checkin the smallest version possible. > >If it is too much work to get Finn's version recoded in C >(presuming it's written in Java), then I'd suggest checking >in your version until someone comes up with a yet smaller >edition. FWIW, I agree that the 160k module should be used. Please, nobody should use the jython compression as an argument to delay any improvements in CPython.
I certainly didn't post because I wanted to complicate your processes. I just wanted to show off . regards, finn From fredrik@effbot.org Mon Jan 15 20:58:11 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Mon, 15 Jan 2001 21:58:11 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3A62C66E.2BB69E61@lemburg.com> <3a636122.45847835@smtp.worldonline.dk> Message-ID: <001f01c07f35$e2c09500$e46940d5@hagrid> mal, finn: > >If it is too much work to get Finn's version recoded in C > >(presuming it's written in Java), then I'd suggest checking > >in your version until someone comes up with a yet smaller > >edition. > > FWIW, I agree the that 160k module should be used. Please, nobody should > use the jython compression as an argument to delay any improvements in > CPython. okay, unless someone throws in a -1 vote, I'll check this in tomorrow. Cheers /F From tim.one@home.com Mon Jan 15 20:57:26 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:57:26 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: [Fredrik Lundh] > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? Absolutely! But not like as for 2.0: check it in *now*, so we have a few days to deal with surprises before the alpha release. With 300K sitting on the table waiting to be taken, it's not worth delaying one hour to worry about 60K additional that may or may not be achievable later. 
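For context, a name database of this kind is what character-name lookups and the \N{...} string escape rely on. A small sketch, assuming a current Python 3 where the standard unicodedata module exposes name() and lookup(); it does not touch the 2.0-era ucnhash module discussed above, and says nothing about the size of either implementation:

```python
import unicodedata

# The \N{...} escape resolves a character by its Unicode name.
ch = "\N{LATIN SMALL LETTER A WITH DIAERESIS}"

# name() maps a character back to its name; lookup() goes the other way.
print(unicodedata.name(ch))
print(unicodedata.lookup("GREEK SMALL LETTER ALPHA"))
```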
From ping@lfw.org Mon Jan 15 21:02:38 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 13:02:38 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: <20010116010748.41869A828@darjeeling.zadka.site.co.il> Message-ID: On Tue, 16 Jan 2001, Moshe Zadka wrote: > Ummmm....can you be bothered to make sure you really meant AttributeError > when you said NameError? Nice bugfix. Didn't know you were back from the sex-change operation. -- ?!ng From tim.one@home.com Mon Jan 15 21:15:54 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 16:15:54 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > There doesn't seem to be a lot of enthusiasm for a Unittest > bakeoff... I'm enthusiastic, but ... > Certainly I don't think I'll get to this myself before the > conference. Ditto. Takes time that's not there. > ... > Would anyone object against Tim checking [doctest] in? You suggested that before, and so it was already on my 2.1a1 todo list. Hoped to get to it over the weekend but didn't. Hope to get to it today, but won't . On the chance that I do, anyone inclined to object should do so before the sun sets in Reston.
or-if-it-never-sets-the-world-ends-anyway-ly y'rs - tim From akuchlin@mems-exchange.org Mon Jan 15 21:26:19 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 16:26:19 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14947.20512.140859.119597@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 15, 2001 at 02:31:44PM -0500 References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> <14947.20512.140859.119597@localhost.localdomain> Message-ID: <20010115162619.A19484@kronos.cnri.reston.va.us> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: >Let's have all the interested parties vote now, then. It would >certainly be helpful to have the new unittest module in the alpha >release of 2.1. I'd like to write some new tests and I'd rather use >the new stuff than the old stuff. Huh? If no one has tried the different modules, what's the point of having a vote? (Given that doctest is going to be added, though, it should be checked in ASAP.) --amk From trentm@ActiveState.com Mon Jan 15 22:10:26 2001 From: trentm@ActiveState.com (Trent Mick) Date: Mon, 15 Jan 2001 14:10:26 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <200101151602.LAA22272@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 11:02:39AM -0500 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> Message-ID: <20010115141026.I29870@ActiveState.com> On Mon, Jan 15, 2001 at 11:02:39AM -0500, Guido van Rossum wrote: > Greg Stein noticed me checking in *yet* another system that needs > the fallback TELL64() definition in fileobject.c, and wrote: > > > All of those #ifdefs could be tossed and it would be more robust (long term) > > if an autoconf macro were used to specify when TELL64 should be defined.
> > > > [ I've looked thru fileobject.c and am a bit confused: the conditions for > > defining TELL64 do not match the conditions for *using* it. that would > > seem to imply a semantic error somewhere and/or a potential gotcha when > > they get skewed (like I assume what happened to FreeBSD). simplifying with > > an autoconf macro may help to rationalize it. ] The problem is that these systems lie when they "say" (according to Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have largefile support. This seems to have happened for a particular release of BSD (which has since been fixed). I think that the Right(tm) (meaning the cleanest solution where the tests and definitions in the code actually represent the truth) answer is a proper configure test (sort of as Greg suggests). I don't really feel comfortable writing that patch (because (1) lack of time and (2) inability to test, I don't have any access to any of these BSD machines). [Guido] > > I have a better idea. Since "lseek((fd),0,SEEK_CUR)" seems to be the > universal fallback, why not just define TELL64 to be that if it's not > previously defined (currently only MS_WIN64 has a different > definition)? It isn't always *used* (the conditions under which > _portable_fseek() uses it are quite complex), but *when* it is used, > this seems to be the most common definition... While I agree that it is annoying that the build breaks for these platforms I think that it is appropriate that the build breaks. Having to put these: #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) definitions here gives a nice list of those platforms that *do* lie. I would prefer that to having an "#else" block that just captures all other cases, but that is just my opinion. Options (in order of preference): (1) Update the configure test for HAVE_LARGEFILE_SUPPORT such that the proper versions of these OSes do *not* #define it. (2) Guido's suggestion. 
(2) Keep extending the "#elif" list. ^---- using (2) twice was intentional Trent > > *** fileobject.c 2001/01/15 10:36:56 2.106 > --- fileobject.c 2001/01/15 16:02:06 > *************** > *** 58,66 **** > /* define the appropriate 64-bit capable tell() function */ > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) > ! /* NOTE: this is only used on older > ! NetBSD prior to f*o() funcions */ > #define TELL64(fd) lseek((fd),0,SEEK_CUR) > #endif > > --- 58,65 ---- > /* define the appropriate 64-bit capable tell() function */ > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #else > ! /* Fallback for older systems that don't have the f*o() funcions */ > #define TELL64(fd) lseek((fd),0,SEEK_CUR) > #endif > > > I'll check this in after 24 hours unless a better idea comes up. > Better idea but no patch. :( Trent -- Trent Mick TrentM@ActiveState.com From skip@mojam.com (Skip Montanaro) Mon Jan 15 22:10:36 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 15 Jan 2001 16:10:36 -0600 (CST) Subject: [Python-Dev] should we start instrumenting modules with __all__? Message-ID: <14947.30044.934204.951564@beluga.mojam.com> I see the from-import-* patch for __all__ has been checked in. Should we make an effort to add __all__ to at least some modules before 2.1a1? 
Skip From akuchlin@mems-exchange.org Mon Jan 15 22:13:03 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 17:13:03 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: <200101121351.IAA19676@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 12, 2001 at 08:51:51AM -0500 References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> <200101121351.IAA19676@cj20424-a.reston1.va.home.com> Message-ID: <20010115171303.A23626@kronos.cnri.reston.va.us> On Fri, Jan 12, 2001 at 08:51:51AM -0500, Guido van Rossum wrote: >Ah. It's very simple. I create a directory "linux" as a subdirectory >of the Python source tree (i.e. at the same level as Lib, Objects, >etc.). Then I chdir into that directory, and I say "../configure". >The configure script creates subdirectories to hold the object files ... >Then I say "make" and it builds Python. This doesn't work at all for me in my copy of the CVS tree. Are there other steps or requirements to make this work? (Transcript available upon request, but I suspect I'm missing something simple.) --amk From tim.one@home.com Mon Jan 15 22:32:51 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 17:32:51 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: [Jeremy] > Let's have all the interested parties vote now, then. It would > certainly be helpful to have the new unittest module in the alpha > release of 2.1. I'd like to write some new tests and I'd rather use > the new stuff than the old stuff. [Andrew] > Huh? If no one has tried the different modules, what's the point of > having a vote? Presumably so that *something* gets into 2.1a1. At least you, Jeremy and Fredrik have tried them, and if that's all there can't be a tie . I would agree this is not an ideal decision procedure.
the-question-is-whether-it's-better-than-paralysis-ly y'rs - tim From ping@lfw.org Mon Jan 15 22:35:47 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 14:35:47 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' Message-ID: I don't know whether this is going to be obvious or controversial, but here goes. Most of the time we're used to seeing a newline as '\n', not as '\012', and newlines are typed in as '\n'. A newcomer to Python is likely to do >>> 'hello\n' 'hello\012' and ask "what's \012?" -- whereupon one has to explain that it's an octal escape, that 012 in octal equals 10, and that chr(10) is newline, which is the same as '\n'. You're bound to run into this, and you'll see \012 a lot, because \n is such a common character. Aside from being slightly more frightening, '\012' also takes up twice as many characters as necessary. So... i'm submitting a patch that causes the three most common special whitespace characters, '\n', '\r', and '\t', to appear in their natural form rather than as octal escapes when strings are printed and repr()ed. Mm? -- ?!ng From esr@thyrsus.com Mon Jan 15 23:15:50 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 18:15:50 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from ping@lfw.org on Mon, Jan 15, 2001 at 02:35:47PM -0800 References: Message-ID: <20010115181550.A11566@thyrsus.com> Ka-Ping Yee : > I don't know whether this is going to be obvious or controversial, > but here goes. Most of the time we're used to seeing a newline as > '\n', not as '\012', and newlines are typed in as '\n'. > > A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > > and ask "what's \012?" -- whereupon one has to explain that it's an > octal escape, that 012 in octal equals 10, and that chr(10) is > newline, which is the same as '\n'. You're bound to run into this, > and you'll see \012 a lot, because \n is such a common character. 
> Aside from being slightly more frightening, '\012' also takes up > twice as many characters as necessary. > > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. Works for me. I'd add \v, \b and \a to cover the whole ANSI C standard escape set (hmmm...am I missing any?) -- Eric S. Raymond Live free or die; death is not the worst of evils. -- General George Stark. From thomas@xs4all.net Mon Jan 15 23:49:30 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 00:49:30 +0100 Subject: [Python-Dev] time functions Message-ID: <20010116004930.L1005@xs4all.nl> Maybe this is a dead and buried subject, but I'm going to try anyway, since everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood lately :) Why do we need the following atrocity : timestr = time.strftime("", time.localtime(time.time())) To do the simple task of 'date +' ? I never really understood why there isn't a way to get a timetuple directly from C, rather than converting a float that we got from C a bytecode before, even though the higher level almost always deals with timetuples. How about making the float-to-tuple functions (time.localtime, time.gmtime) accept 0 arguments as well, and defaulting to time.time() in that case ? Even better, how about doing the same for the other functions, too ? (where it makes sense, of course :) Actually, I'll split it up in three proposals: - Making the time in time.strftime default to 'now', so that the above becomes the ever so slightly confusing: timestr = time.strftime("") (confusing because it looks a bit like a regexp constructor...) 
- Making the time in time.asctime and time.ctime optional, defaulting to 'now', so you can just call 'time.ctime()' without having to pass time.time() (which are about half the calls in my own code :) - Making the time in time.localtime and time.gmtime default to 'now'. I'm 0/+1/+1 myself :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Mon Jan 15 23:55:36 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 00:55:36 +0100 Subject: [Python-Dev] TELL64 In-Reply-To: <20010115141026.I29870@ActiveState.com>; from trentm@ActiveState.com on Mon, Jan 15, 2001 at 02:10:26PM -0800 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> Message-ID: <20010116005536.M1005@xs4all.nl> On Mon, Jan 15, 2001 at 02:10:26PM -0800, Trent Mick wrote: > > > [ I've looked thru fileobject.c and am a bit confused: the conditions > > > for defining TELL64 do not match the conditions for *using* it. that > > > would seem to imply a semantic error somewhere and/or a potential > > > gotcha when they get skewed (like I assume what happened to > > > FreeBSD). simplifying with an autoconf macro may help to rationalize > > > it. ] > The problem is that these systems lie when they "say" (according to > Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have > largefile support. This seems to have happened for a particular release of > BSD (which has since been fixed). I think that the Right(tm) (meaning the > cleanest solution where the tests and definitions in the code actually > represent the truth) answer is a proper configure test (sort of as Greg > suggests). I don't really feel comfortable writing that patch (because (1) > lack of time and (2) inability to test, I don't have any access to any of > these BSD machines). 
There is no (longer any) 'single BSD release', so I doubt it has 'since been fixed' :) We should consider the different BSD derived OSes as separate, if slightly related, systems (much like SunOS <-> BSD.) The problem in the BSDI case is really simple: the autoconf test doesn't test whether the fs really supports large files, but rather whether the system has an off_t type that is 64 bits. BSDI has that type, but does not actually use it in any of the seek/tell functions. This has not been 'fixed' as far as I know, precisely because it isn't 'broken' :) I tried to fix the test, but I have been completely unable to find a proper test. There doesn't seem to be a 'standard' one, and I wasn't able to figure out what, say, 'zsh' uses -- black autoconf magic, for sure. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From trentm@ActiveState.com Tue Jan 16 00:24:54 2001 From: trentm@ActiveState.com (Trent Mick) Date: Mon, 15 Jan 2001 16:24:54 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <20010116005536.M1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 12:55:36AM +0100 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> Message-ID: <20010115162454.D3864@ActiveState.com> On Tue, Jan 16, 2001 at 12:55:36AM +0100, Thomas Wouters wrote: > On Mon, Jan 15, 2001 at 02:10:26PM -0800, Trent Mick wrote: > > > The problem is that these systems lie when they "say" (according to > > Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have > > largefile support. This seems to have happened for a particular release of > > BSD (which has since been fixed). I think that the Right(tm) (meaning the > > cleanest solution where the tests and definitions in the code actually > > represent the truth) answer is a proper configure test (sort of as Greg > > suggests). 
I don't really feel comfortable writing that patch (because (1) > > lack of time and (2) inability to test, I don't have any access to any of > > these BSD machines). > > There is no (longer any) 'single BSD release', so I doubt it has 'since been > fixed' :) Okay sure (showing my ignorance). My only understanding was that this "lying" was the case for some unspecified BSDs a while ago but that the latest releases of any of them *did* have largefile support. > > I tried to fix the test, but I have been completely unable to find a proper > test. There doesn't seem to be a 'standard' one, and I wasn't able to figure > out what, say, 'zsh' uses -- black autoconf magic, for sure. Hmmm... if one could encode whether or not a 64-bit fseek could be implemented (either using fseek, fseeko, fseek64, _fseek, fsetpos/fgetpos, etc.) in a short C program then that would be the test (or at least most of the test, might have to see if ftell could be implemented as well). Or are there other requirements? Trent -- Trent Mick TrentM@ActiveState.com From esr@thyrsus.com Tue Jan 16 01:26:14 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 20:26:14 -0500 Subject: [Python-Dev] time functions In-Reply-To: <20010116004930.L1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 12:49:30AM +0100 References: <20010116004930.L1005@xs4all.nl> Message-ID: <20010115202614.A11732@thyrsus.com> Thomas Wouters : > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'.
> > I'm 0/+1/+1 myself :) Likewise. -- Eric S. Raymond Never trust a man who praises compassion while pointing a gun at you. From barry@digicool.com Tue Jan 16 02:14:33 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 21:14:33 -0500 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <14947.44681.254332.976234@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> I'm 0/+1/+1 myself :) Maybe I'm an inch on the +0/+1/+1 side. :) From jeremy@alum.mit.edu Tue Jan 16 00:11:59 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 15 Jan 2001 19:11:59 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115162619.A19484@kronos.cnri.reston.va.us> References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> <14947.20512.140859.119597@localhost.localdomain> <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: <14947.37327.395622.66435@localhost.localdomain> >>>>> "AMK" == Andrew Kuchling writes: AMK> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: >> Let's have all the interested parties vote now, then. It would >> certainly be helpful to have the new unittest module in the alpha >> release of 2.1. I'd like to write some new tests and I'd rather >> use the new stuff than the old stuff. AMK> Huh? If no one has tried the different modules, what's the AMK> point of having a vote? (Given that doctest is going to be AMK> added, though, it should be checked in ASAP.) Guido is the only person that said he hadn't tried anything. If others have given it a whirl, they ought to chime in now. If very few people have given them a try, we should decide whether we wait for them or proceed without them. We can't wait indefinitely. I'm not sure when we need to decide. 
Jeremy From nas@arctrix.com Mon Jan 15 19:40:55 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 15 Jan 2001 11:40:55 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: <200101132225.RAA03197@cj20424-a.reston1.va.home.com>; from guido@python.org on Sat, Jan 13, 2001 at 05:25:12PM -0500 References: <20010113071758.C28643@glacier.fnational.com> <200101132225.RAA03197@cj20424-a.reston1.va.home.com> Message-ID: <20010115114055.A5879@glacier.fnational.com> On Sat, Jan 13, 2001 at 05:25:12PM -0500, Guido van Rossum wrote: > Do you have a tool that detects leaks? debauch is showing promise although it is still pretty rough around the edges. memprof is another option. It looks like init_exceptions may be leaking memory. Some debauch output: 1 Leaked Memory 0x0849cf98, size 44 (from 0x0) AllocTime: 79269 FreeTime: 43436 return stack: ???:?? (0x40016005) classobject.c:84 (0x805c16d) exceptions.c:337 (0x8088594) exceptions.c:1061 (0x80898dc) pythonrun.c:151 (0x8053581) loop.c:23 (0x8053305) I haven't figured out if this is a real leak yet. Neil From michel@digicool.com Tue Jan 16 06:33:00 2001 From: michel@digicool.com (Michel Pelletier) Date: Mon, 15 Jan 2001 22:33:00 -0800 (PST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14947.37327.395622.66435@localhost.localdomain> Message-ID: On Mon, 15 Jan 2001, Jeremy Hylton wrote: > >>>>> "AMK" == Andrew Kuchling writes: > > AMK> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: > >> Let's have all the interested parties vote now, then. It would > >> certainly be helpful to have the new unittest module in the alpha > >> release of 2.1. I'd like to write some new tests and I'd rather > >> use the new stuff than the old stuff. > > AMK> Huh? If no one has tried the different modules, what's the > AMK> point of having a vote? (Given that doctest is going to be > AMK> added, though, it should be checked in ASAP.)
> > Guido is the only person that said he hadn't tried anything. If > others have given it a whirl, they ought to chime in now. I have used pyunit to create a simple set of tests. It seemed to do the job well and it was very easy. I'd never done it before and the docs were fat and A+. I can only give a one-sided opinion. I know of AMK's work but I have not used it, are there others? -Michel From akuchlin@mems-exchange.org Tue Jan 16 03:03:31 2001 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Mon, 15 Jan 2001 22:03:31 -0500 Subject: [Python-Dev] Detecting install time Message-ID: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> For PEP 229, the setup.py script needs to figure out if it's running from the build directory, because then distutils.sysconfig needs to look at different config files; ./Modules/Makefile instead of /usr/lib/python2.0/config/Makefile, and so forth. Is there a simple/clean way to do this? --amk From guido@python.org Tue Jan 16 03:21:43 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:21:43 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Mon, 15 Jan 2001 17:13:03 EST." <20010115171303.A23626@kronos.cnri.reston.va.us> References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> <200101121351.IAA19676@cj20424-a.reston1.va.home.com> <20010115171303.A23626@kronos.cnri.reston.va.us> Message-ID: <200101160321.WAA00648@cj20424-a.reston1.va.home.com> > On Fri, Jan 12, 2001 at 08:51:51AM -0500, Guido van Rossum wrote: > >Ah. It's very simple. I create a directory "linux" as a subdirectory > >of the Python source tree (i.e. at the same level as Lib, Objects, > >etc.). Then I chdir into that directory, and I say "../configure". > >The configure script creates subdirectories to hold the object files ... > >Then I say "make" and it builds Python. > > This doesn't work at all for me in my copy of the CVS tree. 
Are there > other steps or requirements to make this work. (Transcript available > upon request, but I suspect I'm missing something simple.) You can't start doing this in a tree where you have already built Python using the default way -- you have to use a pristine tree. The reason is the funny way Make's VPATH feature works: it sees the .o files in the source directory and then thinks it doesn't have to create the .o file in the build directory. I think a "make clobber" at the top level would probably eradicate everything that confuses Make. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 03:24:04 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:24:04 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 14:35:47 PST." References: Message-ID: <200101160324.WAA00677@cj20424-a.reston1.va.home.com> > I don't know whether this is going to be obvious or controversial, > but here goes. Most of the time we're used to seeing a newline as > '\n', not as '\012', and newlines are typed in as '\n'. > > A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > > and ask "what's \012?" -- whereupon one has to explain that it's an > octal escape, that 012 in octal equals 10, and that chr(10) is > newline, which is the same as '\n'. You're bound to run into this, > and you'll see \012 a lot, because \n is such a common character. > Aside from being slightly more frightening, '\012' also takes up > twice as many characters as necessary. > > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. +1 on the idea; no time to study the patch tonight.
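For illustration only (not part of the original message): a small sketch of the behaviour being discussed, assuming a present-day Python 3 interpreter rather than the 2.1-era one in this thread. The helper name old_style_repr is hypothetical and merely approximates the octal-escape display that the patch set out to replace; it is not the interpreter's own code.

```python
def old_style_repr(s):
    """Approximate the older display: non-printable characters become
    3-digit octal escapes.  Hypothetical helper, for illustration only."""
    out = []
    for ch in s:
        if ch in ("'", "\\"):
            out.append("\\" + ch)            # escape quote and backslash
        elif " " <= ch <= "~":
            out.append(ch)                   # printable ASCII passes through
        else:
            out.append("\\%03o" % ord(ch))   # e.g. newline -> \012
    return "'" + "".join(out) + "'"


newline = "\012"                  # octal escape: 012 octal == 10 decimal
print(newline == "\n")            # same character, spelled two ways
print(old_style_repr("hello\n"))  # the display the newcomer would puzzle over
print(repr("hello\n\r\t"))        # current repr() uses the short escapes
```

The point of the comparison is only that both spellings denote the same character; which escapes repr() should emit is the question debated in the rest of the thread.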
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 03:28:38 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:28:38 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 18:15:50 EST." <20010115181550.A11566@thyrsus.com> References: <20010115181550.A11566@thyrsus.com> Message-ID: <200101160328.WAA00723@cj20424-a.reston1.va.home.com> > > So... i'm submitting a patch that causes the three most common > > special whitespace characters, '\n', '\r', and '\t', to appear in > > their natural form rather than as octal escapes when strings are > > printed and repr()ed. > > Works for me. I'd add \v, \b and \a to cover the whole ANSI C > standard escape set (hmmm...am I missing any?) You missed \f [*]. Unclear to me whether it's a good idea to add the lesser-known ones; they are just as likely binary gobbledegook rather than what their escapes stand for. [*] http://www.python.org/doc/current/ref/strings.html --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 03:31:19 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:31:19 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 00:49:30 +0100." <20010116004930.L1005@xs4all.nl> References: <20010116004930.L1005@xs4all.nl> Message-ID: <200101160331.WAA00780@cj20424-a.reston1.va.home.com> > Maybe this is a dead and buried subject, but I'm going to try anyway, since > everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood > lately :) > > Why do we need the following atrocity : > > timestr = time.strftime("", time.localtime(time.time())) > > To do the simple task of 'date +' ? I never really understood why > there isn't a way to get a timetuple directly from C, rather than converting > a float that we got from C a bytecode before, even though the higher level > almost always deals with timetuples. 
How about making the float-to-tuple > functions (time.localtime, time.gmtime) accept 0 arguments as well, and > defaulting to time.time() in that case ? Even better, how about doing the > same for the other functions, too ? (where it makes sense, of course :) > > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) I don't see the confusion. > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. > > I'm 0/+1/+1 myself :) Yes, I've wondered this myself too. I guess the current API is based too much on the C API... +1/+1/+1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 03:47:32 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:47:32 -0500 Subject: [Python-Dev] Detecting install time In-Reply-To: Your message of "Mon, 15 Jan 2001 22:03:31 EST." <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> References: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> Message-ID: <200101160347.WAA01132@cj20424-a.reston1.va.home.com> > For PEP 229, the setup.py script needs to figure out if it's running > from the build directory, because then distutils.sysconfig needs to > look at different config files; ./Modules/Makefile instead of > /usr/lib/python2.0/config/Makefile, and so forth. Is there a > simple/clean way to do this? You could check for the presence of config.status -- that file is not installed. 
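For illustration only (not part of the original message): a minimal sketch of the config.status check suggested above. The function name looks_like_build_dir is hypothetical, and the sketch assumes that the directory to test is known to the caller; it says nothing about how setup.py or distutils.sysconfig actually resolved the question.

```python
import os


def looks_like_build_dir(directory):
    """Return True if `directory` contains a config.status file.

    Assumption taken from the message above: configure writes
    config.status into the build directory and the file is never
    installed, so its presence hints at a build tree.
    """
    return os.path.isfile(os.path.join(directory, "config.status"))


if __name__ == "__main__":
    # Check the current working directory as a simple demonstration.
    print(looks_like_build_dir(os.getcwd()))
```

A caller could use the result to choose between ./Modules/Makefile and the installed config directory; that choice is left out here because the thread does not spell it out.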
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue Jan 16 03:53:16 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 22:53:16 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Message-ID: [?!ng] > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. -1 on doing that when they're printed (although I probably misunderstand what you mean there). +1 for changing repr() as suggested. -0 on generalizing to \a \b \f \v too (I've never used one of those in a string literal in my life, so would be more baffled by seeing one come back than I would the octal equivalent). I would also be +1 on using hex escapes instead of octal (I grew up on 36- and 60-bit machines, but that was the last time octal looked *natural*!). Octal and hex escapes both consume 4 characters, so I can't imagine what octal has going for it in the 21st century . 377-is-an-irritating-way-to-spell-ff-ly y'rs - tim PS: Note that C doesn't define what numerical values \a etc have, just that: Each of these escape sequences shall produce a unique implementation-defined value which can be stored in a single char object. The external representations in a text file need not be identical to the internal representations, and are outside the scope of this International Standard. The current method does have the advantage of extreme clarity. From guido@python.org Tue Jan 16 04:08:46 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 23:08:46 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Mon, 15 Jan 2001 16:24:54 PST." 
<20010115162454.D3864@ActiveState.com> References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> Message-ID: <200101160408.XAA01368@cj20424-a.reston1.va.home.com> Looking at the code (in _portable_fseek()) that uses TELL64, I don't understand why it can't use fgetpos(). That code is used only when fpos_t -- the type used by fgetpos() and fsetpos() -- is 64-bit. Trent, you wrote that code. Why wouldn't this work just as well?
(your code):
    if ((pos = TELL64(fileno(fp))) == -1L)
        return -1;
(my suggestion):
    if (fgetpos(fp, &pos) != 0)
        return -1;
It can't be because fgetpos() doesn't exist or is otherwise unusable, because the SEEK_CUR case uses it. We also know that offset is 8-byte capable (the #if around the declaration of _portable_fseek() ensures that). I would even go as far as to collapse the entire switch as follows:
    fpos_t pos;
    switch (whence) {
    case SEEK_END:
        /* do a "no-op" seek first to sync the buffering so that
           the low-level tell() can be used correctly */
        if (fseek(fp, 0, SEEK_END) != 0)
            return -1;
        /* fall through */
    case SEEK_CUR:
        if (fgetpos(fp, &pos) != 0)
            return -1;
        offset += pos;
        break;
    /* case SEEK_SET: break; */
    }
    return fsetpos(fp, &offset);
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 04:13:40 2001 From: guido@python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 23:13:40 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 22:53:16 EST." References: Message-ID: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> > [?!ng] > > So... i'm submitting a patch that causes the three most common > > special whitespace characters, '\n', '\r', and '\t', to appear in > > their natural form rather than as octal escapes when strings are > > printed and repr()ed.
> > -1 on doing that when they're printed (although I probably misunderstand > what you mean there). Ping was using imprecise language here -- he meant repr() and "printed at the command line prompt." > +1 for changing repr() as suggested. > > -0 on generalizing to \a \b \f \v too (I've never used one of those in a > string literal in my life, so would be more baffled by seeing one come back > than I would the octal equivalent). > > I would also be +1 on using hex escapes instead of octal (I grew up on 36- > and 60-bit machines, but that was the last time octal looked *natural*!). Me too. One summer vacation while in college I had nothing better to do than decode the Pascal runtime system for the University's CDC-6600 from an octal dump into assembly. Learned lots! > Octal and hex escapes both consume 4 characters, so I can't imagine what > octal has going for it in the 21st century . Originally, using \x for these was impractical (at least) because of the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics of the \x escape. Now we've fixed this, I agree. > 377-is-an-irritating-way-to-spell-ff-ly y'rs - tim > > > PS: Note that C doesn't define what numerical values \a etc have, just > that: > > Each of these escape sequences shall produce a unique > implementation-defined value which can be stored in a single > char object. The external representations in a text file need > not be identical to the internal representations, and are > outside the scope of this International Standard. > > The current method does have the advantage of extreme clarity. Python doesn't support non-ASCII machines, like the C standard (pretends to). --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Tue Jan 16 04:26:13 2001 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Mon, 15 Jan 2001 23:26:13 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101160328.WAA00723@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 10:28:38PM -0500 References: <20010115181550.A11566@thyrsus.com> <200101160328.WAA00723@cj20424-a.reston1.va.home.com> Message-ID: <20010115232613.B12166@thyrsus.com> Guido van Rossum : > > > So... i'm submitting a patch that causes the three most common > > > special whitespace characters, '\n', '\r', and '\t', to appear in > > > their natural form rather than as octal escapes when strings are > > > printed and repr()ed. > > > > Works for me. I'd add \v, \b and \a to cover the whole ANSI C > > standard escape set (hmmm...am I missing any?) > > You missed \f [*]. Unclear to me whether it's a good idea to add the > lesser-known ones; they are just as likely binary gobbledegook rather > than what their escapes stand for. > > [*] http://www.python.org/doc/current/ref/strings.html Truth is, Guido, I'm kind of iffy about whether there'd be a gain in clarity myself. But I find I'm rather attached to the idea of maintaining strictest possible symmetry between what Python handles on input and what it emits on output. So unless we think adding \f, \v, \b, and \a to the special set would actually produce a *loss* of clarity relative to octal gibberish (!), I say do 'em all. Aesthetically, that feels to me like the right thing, and the *Pythonic* thing, to do here. Have I erred in my intuition, O BDFL? -- Eric S. Raymond A man who has nothing which he is willing to fight for, nothing which he cares about more than he does about his personal safety, is a miserable creature who has no chance of being free, unless made and kept so by the exertions of better men than himself. -- John Stuart Mill, writing on the U.S. 
Civil War in 1862 From nas@arctrix.com Mon Jan 15 21:45:28 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 15 Jan 2001 13:45:28 -0800 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <20010115232613.B12166@thyrsus.com>; from esr@thyrsus.com on Mon, Jan 15, 2001 at 11:26:13PM -0500 References: <20010115181550.A11566@thyrsus.com> <200101160328.WAA00723@cj20424-a.reston1.va.home.com> <20010115232613.B12166@thyrsus.com> Message-ID: <20010115134528.B6193@glacier.fnational.com> On Mon, Jan 15, 2001 at 11:26:13PM -0500, Eric S. Raymond wrote: > [...] I find I'm rather attached to the idea of maintaining > strictest possible symmetry between what Python handles on > input and what it emits on output. > > So unless we think adding \f, \v, \b, and \a to the special set would > actually produce a *loss* of clarity relative to octal gibberish (!), > I say do 'em all. Symmetry is good but I bet most people who would see \f, \v, \b, \a wouldn't have entered those characters using escapes. Most likely those characters would have been read from a binary file. That said, I don't really mind either way. Neil From tim.one@home.com Tue Jan 16 04:43:06 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 15 Jan 2001 23:43:06 -0500 Subject: [Python-Dev] Whitesapce normalization Message-ID: You may have noticed that I checked in changes to most of the modules in the top level of Lib yesterday (Sunday). This is part of a Crusade that was supposed to happen before 2.0a1, but got dropped on the floor then due to misunderstandings: make the Python code we distribute adhere to Guido's style guide (4-space indents, no hard tabs), + clean up minor whitespace nits (no stray blank lines at the ends of files, no trailing whitespace on lines, last line of the file should end with a newline). It would be nice if people cleaned up their code this way too; I'm not going to go thru the entire distribution doing this.
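For illustration only (not part of the original message): a rough sketch of the minor whitespace nits listed above, assuming the whole file is held in one string. The function name normalize_whitespace is hypothetical. This is not how Tools/scripts/reindent.py works: it does not re-indent code to 4 spaces, and because it expands tabs everywhere it would also alter tabs inside string literals.

```python
def normalize_whitespace(text, tabsize=8):
    """Clean up the minor nits only:
    - expand hard tabs (everywhere, including inside string literals!)
    - strip trailing whitespace from every line
    - drop stray blank lines at the end of the file
    - make sure the last line ends with a newline
    """
    lines = [line.expandtabs(tabsize).rstrip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()                       # remove trailing blank lines
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = "def f():   \n\treturn 1\n\n\n"
    print(repr(normalize_whitespace(sample)))
```

Anything that changes indentation depth needs an understanding of Python's block structure, which is why a dedicated tool and a look at the diffs are still called for.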
So, if you give a rip, pick a directory or some modules you're fond of, and clean 'em up. The program Tools/scripts/reindent.py does all of the above for you, so it's not hard. But it takes some care in two areas, which is why I did the top level of Lib one file at a time by hand, and studied diffs by eyeball before checking in any changes: + It's unlikely but possible that some program file *depends* on trailing whitespace. That plain sucks (it's *going* to break sooner or later), but reindent.py can't help you there. + While reindent should never otherwise damage program logic, very strange commenting or docstring styles may get mangled by it, making code and/or docs hard to read. reindent works very hard to do a good job on that, and indeed I found no need to make manual changes to anything it did in the top level of Lib. But check anyway. Especially some of the very oldest modules are littered with ugly stuff like # all over the place, from back when nobody had an editor smart enough to skip over preceding blank lines when suggesting indentation for the current line. Then again, maybe we should just drop the Irix5 directory . voice-in-the-wilderness-ly y'rs - tim From esr@thyrsus.com Tue Jan 16 04:43:24 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 23:43:24 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from tim.one@home.com on Mon, Jan 15, 2001 at 10:53:16PM -0500 References: Message-ID: <20010115234324.C12166@thyrsus.com> Tim Peters : > I would also be +1 on using hex escapes instead of octal (I grew up on 36- > and 60-bit machines, but that was the last time octal looked *natural*!). > Octal and hex escapes both consume 4 characters, so I can't imagine what > octal has going for it in the 21st century . Tim, on the level of aesthetic preference I'm totally with you. I've always found octal really ugly myself. Hex fits my brain better; somehow I find it easier to visualize the bit patterns from. 
Sadly, there are so many other related ways in which Python intelligently follows C/Unix conventions that I think changing to a default of hex escapes rather than octal would violate the Rule of Least Surprise. One of the things I like about Python is precisely its conservatism in areas like string escapes, that Guido refrained from inventing new OS APIs or new conventions for things like string escapes in places where Unix and C did them in a well-established and reasonable way. He didn't make the mistake, all too typical in academic languages, of confusing novelty with value... This conservatism is valuable because it frees the C-experienced programmer's mind from having to think about where the language is trivially different, so he can concentrate on where it's importantly different. It's worth maintaining. On the other hand, the change would mesh well with the Unicode support. Hmm. Tough call. I could go either way, I guess. -- Eric S. Raymond The politician attempts to remedy the evil by increasing the very thing that caused the evil in the first place: legal plunder. -- Frederick Bastiat From tim.one@home.com Tue Jan 16 05:07:16 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 16 Jan 2001 00:07:16 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <20010115234324.C12166@thyrsus.com> Message-ID: [Eric] > Tim, on the level of aesthetic preference I'm totally with you. > I've always found octal really ugly myself. Hex fits my brain > better; somehow I find it easier to visualize the bit patterns from. > > Sadly, there are so many other related ways in which Python > intelligently follows C/Unix conventions that I think changing to > a default of hex escapes rather than octal would violate the Rule > of Least Surprise. > > ... [and skipping nice stuff I *do* agree with ] ... The saving grace here is that repr() is a form of ASCII dump. 
C has nothing to say about that, while last time I used Unix it was real easy to get dumps in hex (and indeed that's what everyone I knew routinely did). I expect that od retains both its name and its octal defaults on most systems simply due to inertia. An octal dump would be infinitely surprising on Windows (I'm not sure I can even get one without writing it myself). Do people actually use octal dumps on Unices anymore? I'd be surprised, if they're running on power-of-2 boxes. Defaults aren't conventions when *everyone* overrides them, they're just old and in the way. takes-one-to-know-one-ly y'rs - tim From ping@lfw.org Tue Jan 16 05:27:33 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 21:27:33 -0800 (PST) Subject: [Python-Dev] time functions In-Reply-To: <20010116004930.L1005@xs4all.nl> Message-ID: On Tue, 16 Jan 2001, Thomas Wouters wrote: > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. I like all of these suggestions. Go for it! -- ?!ng From esr@thyrsus.com Tue Jan 16 05:31:14 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 16 Jan 2001 00:31:14 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from tim.one@home.com on Tue, Jan 16, 2001 at 12:07:16AM -0500 References: <20010115234324.C12166@thyrsus.com> Message-ID: <20010116003114.A12365@thyrsus.com> Tim Peters : > Do people actually use octal dumps on Unices anymore? 
Well, we do when we momentarily forget to give od(1) the -x escape :-) This so annoyed me that back around 1983 I wrote my own hex dumper specifically to emulate the 16-hex-bytes-with-midpage-gutter-and-ASCII- over-on-the-right-side format that CP/M used and DOS inherited. It's still available at . Do you know the history on this? C speaks octal because a bunch of mode fields in the PDP-11 instruction word were three bits wide. Time was it was actually useful to have the output from (say) core files chunk that way. But I haven't seen an octal code dump in over a decade, probably pushing fifteen years now. -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From ping@lfw.org Tue Jan 16 05:33:42 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 21:33:42 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> Message-ID: On Mon, 15 Jan 2001, Guido van Rossum wrote: > > > special whitespace characters, '\n', '\r', and '\t', to appear in > > > their natural form rather than as octal escapes when strings are > > > printed and repr()ed. > > > > -1 on doing that when they're printed (although I probably misunderstand > > what you mean there). > > Ping was using imprecise language here -- he meant repr() and "printed > at the command line prompt." Yes, i referred to "when strings are printed and repr()ed" as two cases because both string_print() and string_repr() have to be changed. 
(Side question: when are *_print() and *_repr() ever different, and why?) > Originally, using \x for these was impractical (at least) because of > the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics > of the \x escape. Now we've fixed this, I agree. Oh, now i understand. Good point. I'll update the patch to do hex. 0xdeadbeef-ly yours, -- ?!ng From fredrik@effbot.org Tue Jan 16 07:11:38 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Tue, 16 Jan 2001 08:11:38 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <00b201c07f8b$93996820$e46940d5@hagrid> thomas wrote: > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) where "now" is local time, I assume? since you're assuming a time zone, you could make it accept an integer as well... > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) same here. From thomas@xs4all.net Tue Jan 16 07:18:38 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 08:18:38 +0100 Subject: [Python-Dev] time functions In-Reply-To: <00b201c07f8b$93996820$e46940d5@hagrid>; from fredrik@effbot.org on Tue, Jan 16, 2001 at 08:11:38AM +0100 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> Message-ID: <20010116081838.N1005@xs4all.nl> On Tue, Jan 16, 2001 at 08:11:38AM +0100, Fredrik Lundh wrote: > thomas wrote: > > - Making the time in time.strftime default to 'now', so that the above > > becomes the ever so slightly confusing: > > > > timestr = time.strftime("") > > (confusing because it looks a bit like a regexp constructor...) > where "now" is local time, I assume? Yes. 
See the patch I'll upload later today (meetings first, grrr) > since you're assuming a time zone, you could make it accept > an integer as well... Could, yes... I'll include it in the 2nd revision of the patch, it can be rejected (or accepted) separately. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Tue Jan 16 08:22:11 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 09:22:11 +0100 Subject: [Python-Dev] time functions In-Reply-To: <20010116081838.N1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 08:18:38AM +0100 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> Message-ID: <20010116092211.O1005@xs4all.nl> On Tue, Jan 16, 2001 at 08:18:38AM +0100, Thomas Wouters wrote: > On Tue, Jan 16, 2001 at 08:11:38AM +0100, Fredrik Lundh wrote: > > > timestr = time.strftime("") > > since you're assuming a time zone, you could make it accept > > an integer as well... > Could, yes... Actually, on second thought, lets not, not just yet anyway. Doing that for all functions in the time module would continue to pollute the already toxic waters of a C API translated into Python :P Who knows what 'ctime' stands for, anyway ? And 'asctime' ? How can we expect Python programmers who think 'C' is a high note or average grade, to understand how the time module is supposed to be used ? 
:) We now have: time() -- return current time in seconds since the Epoch as a float gmtime() -- convert seconds since Epoch to UTC tuple localtime() -- convert seconds since Epoch to local time tuple asctime() -- convert time tuple to string ctime() -- convert time in seconds to string mktime() -- convert local time tuple to seconds since Epoch strftime() -- convert time tuple to string according to format specification where asctime and ctime are basically wrappers around strftime, and would do the exact same thing if they both accepted tuples and floats. I think we should have something like: time() -- current time in float timetuple() -- current (local) time in timetuple tuple2time(tuple) -- tuple -> float time2tuple(float, tz=local) -- float -> tuple using timezone tz stringtime(time=now, format="ctimeformat") -- convert time value to string Those are just working names, to make the point, I don't have time to think up better ones :) I'm not sure if the timezone support in the above list is extensive enough, mostly because I hardly use timezones myself. Also, tuple2time() could be merged with time(), and likewise for time2tuple() and timetuple(). I think keeping strftime() and maybe ctime() for ease-of-use is a good idea, but the rest could eventually be deprecated. Off-to-important-meetings-*cough*-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik@effbot.org Tue Jan 16 08:30:28 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Tue, 16 Jan 2001 09:30:28 +0100 Subject: [Python-Dev] unit testing bake-off References: Message-ID: <01ba01c07f96$967b7870$e46940d5@hagrid> Tim Peters wrote: > At least you, Jeremy and Fredrik have tried them, and > if that's all there can't be a tie . let me guess: Jeremy: PyUnit Andrew: unittest Fredrik: unittest (I find pyunit a bit unpythonic, and both overengineered and underengineered at the same time... 
hard to explain, but I strongly prefer unittest) > I would agree this is not an ideal decision procedure. well, any decision procedure that comes up with what I want just has to be ideal ;-) From andy@reportlab.com Tue Jan 16 09:20:45 2001 From: andy@reportlab.com (Andy Robinson) Date: Tue, 16 Jan 2001 09:20:45 -0000 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115204701.11972EA6B@mail.python.org> Message-ID: > Subject: Re: [Python-Dev] unit testing bake-off > From: Guido van Rossum > Date: Mon, 15 Jan 2001 14:17:27 -0500 > > There doesn't seem to be a lot of enthousiasm for a Unittest > bakeoff... Certainly I don't think I'll get to this myself before the > conference. > > How about the following though: talking of low-hanging fruit, Tim's > doctest module is an excellent thing even if it isn't a unit testing > framework! (I found this out when I played with it -- it's real easy > to get used to...) > > Would anyone object against Tim checking this in? Since it isn't a > contender in the unit test bake-off, it shouldn't affect the outcome > there at all. > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think it should definitely go in. Ditto with whatever testing framework and documentation tools (pydoc etc.) shortly emerge as "best of breed". I spend my time on corporate consulting projects, and saying things like "Python has standard tools for unit testing and documentation" is even better than saying "We have standard tools for unit testing and documentation". BTW, ReportLab has recently adopted PyUnit's unittest.py It feels a bit Java-like to me - a few more lines of code than needed - but it certainly works. One key feature is aggregating test suites; a big app we installed on a customer site can run the test suite for itself, the ReportLab library (whose test suite we are just getting to work on) and four or five dependent utilities; another is that people have heard of JUnit. 
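For illustration only (not part of the original message): a short sketch of the suite aggregation mentioned above, written against the unittest module in today's standard library. The test classes and the build_suite name are made up for the example; they stand in for the separate application, library and utility suites described in the message.

```python
import unittest


class ArithmeticTest(unittest.TestCase):
    """Stand-in for one component's tests (hypothetical)."""

    def test_add(self):
        self.assertEqual(1 + 1, 2)


class StringTest(unittest.TestCase):
    """Stand-in for another component's tests (hypothetical)."""

    def test_upper(self):
        self.assertEqual("abc".upper(), "ABC")


def build_suite():
    """Combine the tests of several components into one suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(ArithmeticTest))
    suite.addTests(loader.loadTestsFromTestCase(StringTest))
    return suite


if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(build_suite())
```

Because a TestSuite can itself contain other suites, the same pattern scales from two classes to a whole tree of dependent packages.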
Just my 2p worth, Andy Robinson From tony@lsl.co.uk Tue Jan 16 09:47:01 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 16 Jan 2001 09:47:01 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101152041.PAA32298@cj20424-a.reston1.va.home.com> Message-ID: <003901c07fa1$46e10c70$f05aa8c0@lslp7o.int.lsl.co.uk> In the context of my starting doc strings in an Emacs Lisp manner, Ka-Ping Yee said: > I think i'm going to ask you to stop, unless Guido prefers > otherwise. Guido, do you have a style pronouncement for module > docstrings? and since Guido replied > I'm with Ping. None of the examples in the style guide start the > docstring with the function name. Almost none of the standard library > modules start their module docstring with the module name (codecs is > an exception, but I didn't write it :-). I shall indeed stop (of course, my habit started before we HAD documentation tools, and if we're going to browse things with pydoc, et al, then there's no need for it. To be honest, it's the answer I expected. Oh dear, another item for my TO DO list (i.e., remove the offending nits). Still, if it's only me it's hardly high impact! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Which is safer, driving or cycling? Cycling - it's harder to kill people with a bike... My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Tue Jan 16 10:13:31 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 16 Jan 2001 10:13:31 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message-ID: <003a01c07fa4$fa0883c0$f05aa8c0@lslp7o.int.lsl.co.uk> I mentioned a "spurious" > The system cannot find the path specified. on NT, and Ka-Ping Yee said: > Thanks for the NT testing. That's funny -- i put in a special case > for Windows to avoid messages like the above a couple of days ago. 
> How recently did you download pydoc.py? Does your copy contain: > > if hasattr(sys, 'winver'): > return lambda text: tempfilepager(text, 'more') Hmm. I downloaded it when I read the email message announcing it, which was yesterday some time. But it doesn't look like the lines you mention are there - I'll try re-downloading... ...I've redownloaded the files from http://www.lfw.org/python/pydoc.py, etc., and done a grep for hasattr within them. There's no check such as the one you mention, so I guess it's "download impedance". > So you can see what i'm up to, here's my current to-do list: > > make boldness optional (only if using more/less? only Unix?) probably sensible. By the way, I don't get boldness on the NT box - any chance (he says, not intending to help *at all* in doing it!) of it happening there as well? (or would that depend on what curses support is built into the Python?) > document a .py file given on the command line also allow for a directory module (i.e., something with __init__.py in it) given on the command line? > write a better htmlrepr (\n should look special, max > length limit, etc.) yes, but these things can always get better - the fact it's working allows for improoooovement down the line. > generate HTML index from precis and __path__ and package a neat idea - definitely Good Stuff! > contents list well, I always do these, so I'm for this one as well > have help(...) produce a directory of available things to > ask for help on bouncy fun! > Windows and Mac testing I'm running Windows 98 with Python 1.5.2 at home, and will willingly try it out on that (after all, it's not a very big download) - although it might sometimes take a day or two to get round to it (for instance, I haven't yet done so!). But I suspect I shan't be a very demanding user... > default to HTTP mode on GUI platforms? (win, mac) > > The ones marked with + i consider done. 
Feel free to comment on > or suggest priorities for the others; in particular, what do you > think of the last one? The idea is that double-clicking on > pydoc.py in Windows or MacOS could launch the server and then open > the localhost URL using webbrowser.py to display the documentation > index. Should it do this by default? I'll leave that to better designers than myself (although if one is to *have* a double click action, that seems sensible to me). (looks up webbrowser.py - ah, a 2.0 module). Personally, I'd also like to have the option of having a "mini-browser" supported directly, perhaps in Tkinter, so I don't need to start up a whole web browser. But again I may be odd in that wish (I can't remember what IDLE does). Oh - that also means "integrate into IDLE" presumably goes on at least a WishList as well... Other ideas: * command line switch to *output* HTML to a file (i.e., documentation generation) (presumably something like "-o .html", where the "html" indicates the output format - an alternative being "txt" * if I ever finish the docutils effort (I should be getting back to it soon) then use that to format the texts (this would mean I need not worry about the "frontend" to docutils too much, since pydoc is already doing so much). Or maybe the docutils tool should be importing pydoc... Tibs (must do some (paid) work now!) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From mal@lemburg.com Tue Jan 16 10:18:44 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Tue, 16 Jan 2001 11:18:44 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <3A642004.F6197E86@lemburg.com> Thomas Wouters wrote: > > Maybe this is a dead and buried subject, but I'm going to try anyway, since > everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood > lately :) > > Why do we need the following atrocity : > > timestr = time.strftime("", time.localtime(time.time())) > > To do the simple task of 'date +' ? I never really understood why > there isn't a way to get a timetuple directly from C, rather than converting > a float that we got from C a bytecode before, even though the higher level > almost always deals with timetuples. How about making the float-to-tuple > functions (time.localtime, time.gmtime) accept 0 arguments as well, and > defaulting to time.time() in that case ? Even better, how about doing the > same for the other functions, too ? (where it makes sense, of course :) > > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. > > I'm 0/+1/+1 myself :) +1 all the way -- though these days I tend not to use the time module anymore. mxDateTime already does everything I want and there date/time values are objects rather than Python integers or tuples... 
ok, I'm just showing off a little :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Jan 16 10:32:21 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 16 Jan 2001 11:32:21 +0100 Subject: [Python-Dev] Strings: '\012' -> '\n' References: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> Message-ID: <3A642335.82358B02@lemburg.com> Minor nit about this idea: it makes decoding repr() style strings harder for external tools and it could cause breakage (e.g. if "\n" is used by the encoding for some other purpose). BTW, since there are a gazillion ways to encode strings into 7-bit ASCII, why not use the new codec design to add additional output schemes for 8-bit strings ?! Strings have an .encode() method as well... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping@lfw.org Tue Jan 16 10:37:42 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 16 Jan 2001 02:37:42 -0800 (PST) Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) In-Reply-To: <003a01c07fa4$fa0883c0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: Before somebody decides to shoot us for spamming both lists, i'm taking this thread off of python-dev and solely to doc-sig. Please continue further discussion there...
-- ?!ng From ping@lfw.org Tue Jan 16 10:47:02 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 16 Jan 2001 02:47:02 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Message-ID: On Mon, 15 Jan 2001, Ka-Ping Yee wrote: > On Mon, 15 Jan 2001, Guido van Rossum wrote: > > Originally, using \x for these was impractical (at least) because of > > the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics > > of the \x escape. Now we've fixed this, I agree. > > Oh, now i understand. Good point. I'll update the patch to do hex. I assume you would like Unicode strings to do the same (\n, \t, \r, and \xff rather than \377). Guido, do you have a Pronouncement on \v, \f, \b, \a? By the way, why do Unicode escapes appear in capitals? >>> u'\uface' u'\uFACE' (If someone tells me that there happens to be a picture of a face at that code point, i'll laugh. Is there a cow at \uBEEF?) Does anyone care that \x will be followed by lowercase and \u by uppercase? I noticed that the tutorial claims Unicode strings can be str()-ified and will encode themselves using UTF-8 as default. But this doesn't actually work for me: >>> us = u'\uface' >>> us u'\uFACE' >>> str(us) Traceback (most recent call last): File "", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) >>> us.encode() Traceback (most recent call last): File "", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) >>> us.encode('UTF-8') '\xef\xab\x8e' Assuming i have understood this correctly, i have submitted a patch to correct tut.tex. -- ?!ng From bckfnn@worldonline.dk Tue Jan 16 10:52:10 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Tue, 16 Jan 2001 10:52:10 GMT Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: References: Message-ID: <3a642768.6426631@smtp.worldonline.dk> [Ping] >I don't know whether this is going to be obvious or controversial, >but here goes. 
Most of the time we're used to seeing a newline as >'\n', not as '\012', and newlines are typed in as '\n'. > >A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > >and ask "what's \012?" -- whereupon one has to explain that it's an >octal escape, that 012 in octal equals 10, and that chr(10) is >newline, which is the same as '\n'. You're bound to run into this, >and you'll see \012 a lot, because \n is such a common character. >Aside from being slightly more frightening, '\012' also takes up >twice as many characters as necessary. > >So... i'm submitting a patch that causes the three most common >special whitespace characters, '\n', '\r', and '\t', to appear in >their natural form rather than as octal escapes when strings are >printed and repr()ed. I like it, because it removes yet another difference between Python and Jython. Jython happens to handle these chars specially: \n, \t, \b, \f and \r. regards, finn From esr@thyrsus.com Tue Jan 16 10:53:00 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 16 Jan 2001 05:53:00 -0500 Subject: [Python-Dev] time functions In-Reply-To: <3A642004.F6197E86@lemburg.com>; from mal@lemburg.com on Tue, Jan 16, 2001 at 11:18:44AM +0100 References: <20010116004930.L1005@xs4all.nl> <3A642004.F6197E86@lemburg.com> Message-ID: <20010116055300.C12847@thyrsus.com> M.-A. Lemburg : > +1 all the way -- though these days I tend not to use the > time module anymore. mxDateTime already does everything I want > and there date/time values are objects rather than Python integers > or tuples... ok, I'm just showing opff a little :) mxDateTime is on my short list of "why isn't this in the Python library already?" Has it ever been discussed? -- Eric S. Raymond You need only reflect that one of the best ways to get yourself a reputation as a dangerous citizen these days is to go about repeating the very phrases which our founding fathers used in the great struggle for independence. 
-- Attributed to Charles Austin Beard (1874-1948) From mal@lemburg.com Tue Jan 16 11:18:24 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 16 Jan 2001 12:18:24 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> <3A642004.F6197E86@lemburg.com> <20010116055300.C12847@thyrsus.com> Message-ID: <3A642E00.BD330647@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > +1 all the way -- though these days I tend not to use the > > time module anymore. mxDateTime already does everything I want > > and there date/time values are objects rather than Python integers > > or tuples... ok, I'm just showing opff a little :) > > mxDateTime is on my short list of "why isn't this in the Python library > already?" Has it ever been discussed? Yes. I'd rather keep it separate from the standard dist for various reasons. One of these reasons is that I will be moving the mx tools into a new packaging scheme built on distutils -- installing it should then boil down to a simple RPM install or maybe a "python setup.py install" thanks to distutils. The package will then become a subpackage of the mx package. BTW, I see distutils as strong argument for *not* including more exotic packages in Python's stdlib. If this catches on, I expect that together with the Vaults we are not far away from having our own CPAN style archive of add-on packages. I also expect the commercial vendors like ActiveState et al. to take care of wrapping SUMO distributions of Python and the existing add-ons. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr@thyrsus.com Tue Jan 16 11:20:18 2001 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Tue, 16 Jan 2001 06:20:18 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <3a642768.6426631@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Tue, Jan 16, 2001 at 10:52:10AM +0000 References: <3a642768.6426631@smtp.worldonline.dk> Message-ID: <20010116062018.A12935@thyrsus.com> Finn Bock : > I like it, because it removes yet another difference between Python and > Jython. Jython happens to handle these chars specially: \n, \t, \b, \f > and \r. This is an argument for adding \b and \f to the special set in CPython. If the BDFL looks benignly on adding \v and \a, those should go into Jython's special set too. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? -- Thomas Jefferson, in his 1801 inaugural address From fredrik@pythonware.com Tue Jan 16 11:37:10 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 16 Jan 2001 12:37:10 +0100 Subject: [Python-Dev] Strings: '\012' -> '\n' References: Message-ID: <03eb01c07fb0$aaaa19e0$0900a8c0@SPIFF> ping wrote: > By the way, why do Unicode escapes appear in capitals? > > >>> u'\uface' > u'\uFACE' > > (If someone tells me that there happens to be a picture of a face at > that code point, i'll laugh. Is there a cow at \uBEEF?) iirc, 0xFACE and 0xBEEF are part of the CJK and Hangul spaces. not sure 0xFACE is assigned, but 0xBEEF glyph looks like a ribcage with four legs... you'll find faces at 0x263A etc. From skip@mojam.com (Skip Montanaro) Tue Jan 16 13:09:51 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 16 Jan 2001 07:09:51 -0600 (CST) Subject: [Python-Dev] bummer - regsub/regex no longer in module index Message-ID: <14948.18463.971334.401426@beluga.mojam.com> I am now getting deprecation warnings about regsub so I decided to start replacing it with more zeal than I had previously. First thing I wanted to replace were some regsub.split calls. 
I went to the module index to look up the description but regsub was nowhere to be found. (I know, I know. I can use pydoc.) Still... how about continuing to include deprecated modules in the library reference manual but in a separate Deprecated Modules section and annotate them as such in the module index? Skip From guido@python.org Tue Jan 16 13:44:01 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 08:44:01 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 08:11:38 +0100." <00b201c07f8b$93996820$e46940d5@hagrid> References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> Message-ID: <200101161344.IAA04513@cj20424-a.reston1.va.home.com> > thomas wrote: > > - Making the time in time.strftime default to 'now', so that the above > > becomes the ever so slightly confusing: > > > > timestr = time.strftime("") > > (confusing because it looks a bit like a regexp constructor...) > > where "now" is local time, I assume? > > since you're assuming a time zone, you could make it accept > an integer as well... What would the integer mean? > > - Making the time in time.asctime and time.ctime optional, defaulting to > > 'now', so you can just call 'time.ctime()' without having to pass > > time.time() (which are about half the calls in my own code :) > > same here. Same what here? "now" == local time, sure. But accept an integer? It already accepts an integer! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 13:55:01 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 08:55:01 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 09:22:11 +0100." 
<20010116092211.O1005@xs4all.nl> References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> <20010116092211.O1005@xs4all.nl> Message-ID: <200101161355.IAA04802@cj20424-a.reston1.va.home.com> Let's not redesign the time module API too much. I'm all for adding the default argument values that Thomas proposes. Then, instead of changing the API, we should look into a higher-level Python module. That's how those things typically go. Digital Creations has its own time extension type somewhere in Zope, a bit similar to mxDateTime. I looked into making this a standard Python extension but quickly gave up. The problems with these things seems to be that it's hard to come up with a design that makes everyone happy: some people want small objects (because they have a lot of them around, e.g. a timestamp on almost every other object); others want timezone support; yet others want microsecond resolution; leap-second support; pre-Christian era support; support for nonstandard calendars; interval arithmetic; support for dates without times or times without dates... Python could use a better time type, but we'll have to look into which requirements make sense for a generalized type, and which don't. I fear that a committee could easily pee away years designing an interface to satisfy absolutely every wish. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 14:02:29 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:02:29 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 21:33:42 PST." References: Message-ID: <200101161402.JAA05045@cj20424-a.reston1.va.home.com> > Yes, i referred to "when strings are printed and repr()ed" as two cases > because both string_print() and string_repr() have to be changed. > > (Side question: when are *_print() and *_repr() ever different, and why?) 
You mean the tp_print and tp_str function slots in type objects, right? tp_print *should* always render exactly the same as tp_str. tp_print is used by the print statement, not by value display at the interactive prompt. tp_print and tp_str have differed historically for 3rd party extension types by accident. So, string_print most definitely should *not* be changed -- only string_repr! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 14:06:23 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:06:23 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Tue, 16 Jan 2001 02:47:02 PST." References: Message-ID: <200101161406.JAA05153@cj20424-a.reston1.va.home.com> > I assume you would like Unicode strings to do the same (\n, \t, \r, > and \xff rather than \377). Yeah. > Guido, do you have a Pronouncement on \v, \f, \b, \a? Practicality beats purity: these will remain octal. > By the way, why do Unicode escapes appear in capitals? > > >>> u'\uface' > u'\uFACE' Could it be just that that's what Unicode folks are expecting? > (If someone tells me that there happens to be a picture of a face at > that code point, i'll laugh. Is there a cow at \uBEEF?) I'm laughing even though I don't see pictures. :-) > Does anyone care that \x will be followed by lowercase and \u by uppercase? It's mildly weird, and I think hex escapes in lowercase are more Pythonic than in upper case. > I noticed that the tutorial claims Unicode strings can be str()-ified > and will encode themselves using UTF-8 as default. But this doesn't > actually work for me: > > >>> us = u'\uface' > >>> us > u'\uFACE' > >>> str(us) > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > >>> us.encode() > Traceback (most recent call last): > File "", line 1, in ? 
> UnicodeError: ASCII encoding error: ordinal not in range(128) > >>> us.encode('UTF-8') > '\xef\xab\x8e' > > Assuming i have understood this correctly, i have submitted a patch > to correct tut.tex. Yeah, I guess that part of the tutorial was written before we changed our minds about this. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 14:09:56 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:09:56 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Tue, 16 Jan 2001 11:32:21 +0100." <3A642335.82358B02@lemburg.com> References: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> <3A642335.82358B02@lemburg.com> Message-ID: <200101161409.JAA05268@cj20424-a.reston1.va.home.com> > Minor nit about this idea: it makes decoding repr() style > strings harder for external tools and it could cause breakage > (e.g. if "\n" is usedby the encoding for some other purpose). Such a tool would be broken. If it accepts string literals it should accept all forms of escapes. > BTW, since there are a gazillion ways to encode strings into > 7-bit ASCII, why not use the new codec design to add additional > output schemes for 8-bit strings ?! > > Strings have an .encode() method as well... Good idea! This could also be used to "hexify" a string, for which currently one of the quickest ways is still the hack "%02x"*len(s) % tuple(s) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jan 16 14:11:53 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:11:53 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Tue, 16 Jan 2001 06:20:18 EST." 
<20010116062018.A12935@thyrsus.com> References: <3a642768.6426631@smtp.worldonline.dk> <20010116062018.A12935@thyrsus.com> Message-ID: <200101161411.JAA05336@cj20424-a.reston1.va.home.com> > Finn Bock : > > I like it, because it removes yet another difference between Python and > > Jython. Jython happens to handle these chars specially: \n, \t, \b, \f > > and \r. [ESR] > This is an argument for adding \b and \f to the special set in > CPython. If the BDFL looks benignly on adding \v and \a, those > should go into Jython's special set too. No, I think Jython should remove \b and \f. Or the language standard could allow implementations some freedom here (as long as the output is a string literal). --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Tue Jan 16 15:06:34 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 16 Jan 2001 10:06:34 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: References: <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: <14948.25466.698063.240902@cj42289-a.reston1.va.home.com> Tim Peters writes: > Presumably so that *something* gets into 2.1a1. At least you, Jeremy and > Fredrik have tried them, and if that's all there can't be a tie . I > would agree this is not an ideal decision procedure. I've been using PyUNIT some, but haven't tried the Quixote unittest module, which tells me I can't make a particularly informed recommendation (vote, whatever). -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From thomas@xs4all.net Tue Jan 16 15:23:52 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 16:23:52 +0100 Subject: [Python-Dev] time functions In-Reply-To: <200101161355.IAA04802@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 08:55:01AM -0500 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> <20010116092211.O1005@xs4all.nl> <200101161355.IAA04802@cj20424-a.reston1.va.home.com> Message-ID: <20010116162350.A21010@xs4all.nl> On Tue, Jan 16, 2001 at 08:55:01AM -0500, Guido van Rossum wrote: > Let's not redesign the time module API too much. [snip] Agreed. > I fear that a committee could easily pee away years designing an > interface to satisfy absolutely every wish. A committee is a life form with six or more legs and no brain. Lazarus Long in "Time Enough For Love", by R. A. Heinlein. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip@mojam.com (Skip Montanaro) Tue Jan 16 17:23:56 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 16 Jan 2001 11:23:56 -0600 (CST) Subject: [Python-Dev] Re: [Patches] [Patch #102891] Alternative readline module In-Reply-To: References: Message-ID: <14948.33708.332464.107009@beluga.mojam.com> Michael> ... (or I'll just call it pyttyinput) Which, like "Guido", when properly pronounced should leave your monitor slightly moist... 
;-) Skip From thomas@xs4all.net Tue Jan 16 17:36:03 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 18:36:03 +0100 Subject: [Python-Dev] Re: [Patches] [Patch #102891] Alternative readline module In-Reply-To: <14948.33708.332464.107009@beluga.mojam.com>; from skip@mojam.com on Tue, Jan 16, 2001 at 11:23:56AM -0600 References: <14948.33708.332464.107009@beluga.mojam.com> Message-ID: <20010116183603.B2776@xs4all.nl> On Tue, Jan 16, 2001 at 11:23:56AM -0600, Skip Montanaro wrote: > Which, like "Guido", when properly pronounced should leave your monitor > slightly moist... ;-) Nono, 'Guido' should be pronounced using a hard, back-of-your-throat 'G', more like a growl than a hiss. The less moisture the better :) You-were-thinking-of-Centraal-Wiskunde-Instituut-(cwi.nl)-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From trentm@ActiveState.com Tue Jan 16 18:36:29 2001 From: trentm@ActiveState.com (Trent Mick) Date: Tue, 16 Jan 2001 10:36:29 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <200101160408.XAA01368@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 11:08:46PM -0500 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> Message-ID: <20010116103626.D30209@ActiveState.com> On Mon, Jan 15, 2001 at 11:08:46PM -0500, Guido van Rossum wrote: > > Trent, you wrote that code. Why wouldn't this work just as well? > > (your code): > if ((pos = TELL64(fileno(fp))) == -1L) > return -1; > (my suggestion): > if (fgetpos(fp, &pos) != 0) > return -1; I agree, that looks to me like it would. I guess I just missed that when I wrote it. 
> > I would even go as far as to collapse the entire switch as follows: > > fpos_t pos; > switch (whence) { > case SEEK_END: > /* do a "no-op" seek first to sync the buffering so that > the low-level tell() can be used correctly */ > if (fseek(fp, 0, SEEK_END) != 0) > return -1; > /* fall through */ > case SEEK_CUR: > if (fgetpos(fp, &pos) != 0) > return -1; > offset += pos; > break; > /* case SEEK_SET: break; */ > } > return fsetpos(fp, &offset); Sure. Just get rid of the """do a "no-op" seek...""" comment because it is no longer applicable. I am not setup to test this on Win64 right and I don't suppose there are a lot of you out there with your own Win64 setups. I will be able to test this before the scheduled 2.1 beta (late Feb), though. Trent -- Trent Mick TrentM@ActiveState.com From trentm@ActiveState.com Tue Jan 16 19:34:17 2001 From: trentm@ActiveState.com (Trent Mick) Date: Tue, 16 Jan 2001 11:34:17 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <20010116103626.D30209@ActiveState.com>; from trentm@ActiveState.com on Tue, Jan 16, 2001 at 10:36:29AM -0800 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> <20010116103626.D30209@ActiveState.com> Message-ID: <20010116113417.I30209@ActiveState.com> On Tue, Jan 16, 2001 at 10:36:29AM -0800, Trent Mick wrote: > Sure. Just get rid of the """do a "no-op" seek...""" comment because it is no > longer applicable. 
I am not setup to test this on Win64 right and I don't s/right/right now/ Trent -- Trent Mick TrentM@ActiveState.com From cgw@fnal.gov Tue Jan 16 20:19:09 2001 From: cgw@fnal.gov (Charles G Waldman) Date: Tue, 16 Jan 2001 14:19:09 -0600 (CST) Subject: [Python-Dev] Re: [Patch #103248] Fix a memory leak in _sre.c Message-ID: <14948.44221.876681.838046@buffalo.fnal.gov> Fredrik - I noticed that you chose to check in a slightly different patch than the one I submitted. I wonder why you chose to do this? In particular at line 1238 I had: if (PyErr_Occurred()) { Py_DECREF(self); return NULL; } and you changed this to if (PyErr_Occurred()) { PyObject_DEL(self); return NULL; } Can you explain why you made this (seemingly arbitrary) change? I think that since "self" was created via: self = PyObject_NEW_VAR(PatternObject, &Pattern_Type, n); which calls PyObject_INIT, which in turn calls _Py_NewReference, which increments _Py_RefTotal, it is incorrect to simply do a PyObject_DEL to de-allocate it -- won't this screw up the value of _Py_RefTotal? Admittedly this is a minor nit and only matters if Py_TRACE_REFS is defined - I just wanted to check to make sure my understanding of reference counting w.r.t. memory allocation and deallocation is correct - if the above is in error, I'd appreciate any corrections... From guido@python.org Tue Jan 16 20:53:41 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 15:53:41 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Tue, 16 Jan 2001 10:36:29 PST." <20010116103626.D30209@ActiveState.com> References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> <20010116103626.D30209@ActiveState.com> Message-ID: <200101162053.PAA13099@cj20424-a.reston1.va.home.com> > I agree, that looks to me like it would.
I guess I just missed that when I > wrote it. Excellent! I've checked this in now -- we'll hear if it breaks anywhere soon enough. >I am not setup to test this on Win64 right [now] and I don't > suppose there are a lot of you out there with your own Win64 setups. What happened to ActiveState's Itanium boxes? --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Tue Jan 16 21:53:22 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 16 Jan 2001 16:53:22 -0500 Subject: [Python-Dev] Re: Detecting install time In-Reply-To: <200101160347.WAA01132@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 10:47:32PM -0500 References: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> <200101160347.WAA01132@cj20424-a.reston1.va.home.com> Message-ID: <20010116165322.B29674@kronos.cnri.reston.va.us> [CC'ing to the distutils-sig] On Mon, Jan 15, 2001 at 10:47:32PM -0500, Guido van Rossum wrote: >> For PEP 229, the setup.py script needs to figure out if it's running >> from the build directory, because then distutils.sysconfig needs to > >You could check for the presence of config.status -- that file is not >installed. This isn't a check suitable for inclusion in distutils.sysconfig, though, because it's so liable to being fooled (consider a Distutils-packaged module that comes with a configure script to build some library). Right now I'm using a hacked version of sysconfig with several patches like this: @@ -120,12 +121,16 @@ def get_config_h_filename(): """Return full pathname of installed config.h file.""" inc_dir = get_python_inc(plat_specific=1) + # XXX + if 1: inc_dir = '.' return os.path.join(inc_dir, "config.h") One hackish approach would be to add a assume_build_directories() to distutils.sysconfig, a little back door to be used by the setup.py script that comes with Python, so the above would become 'if build_time_flag: ...'. Anyone have a cleaner idea? 
--amk From akuchlin@mems-exchange.org Wed Jan 17 01:46:47 2001 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Tue, 16 Jan 2001 20:46:47 -0500 Subject: [Python-Dev] PEP 229 issues Message-ID: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> I'm in a quandary about the patch implementing PEP 229. The patch is quite close to being ready, with only a few minor issues remaining, but to fix those issues, I need to make some changes to the Distutils, such as the sysconfig modification I recently suggested. Problem: I believe the patch *must* go in at the alpha stage, because there are bound to be lots of platform-specific problems that will show up; it should not be added in the beta stage, because it'll need time to get tested and debugged, and I wouldn't be surprised if it has to be reverted later because of some insurmountable problem. Problem: Greg Ward, the Distutils maintainer, is away at the moment. I can check in changes to the Distutils without his say-so, but when Greg gets back he might shriek in horror and rip all of the changes out again. (Or he's stuck with maintaining them until 2.2.) Problem: 2.1alpha1 is due on Friday. So, what to do? If I know there's going to be an alpha2, that's probably fine; Greg should have resurfaced by then, and the patch can go in for alpha2. Or, I can check in the changes before Friday, and if they're unacceptable, they can be fixed for alpha2/beta1, or simply backed out. Or, I can leave Distutils alone and make setup.py a tissue of hacks and workarounds. For example, it might insert new versions of various functions into the distutils.sysconfig module. Icky and fragile, but cleaning it up for beta1 would then be a priority. Suggestions? Pronouncements? --amk From guido@python.org Wed Jan 17 01:39:35 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 20:39:35 -0500 Subject: [Python-Dev] PEP 229 issues In-Reply-To: Your message of "Tue, 16 Jan 2001 20:46:47 EST."
<200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> References: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> Message-ID: <200101170139.UAA17954@cj20424-a.reston1.va.home.com> I expect that there will be an alpha2, but I still recommend that you check in *something* that works for alpha1, to get maximal testing coverage. Alpha1 may slip a day or so (Jeremy and I are both late with our big patches, respectively nested scopes and rich comparisons, that we really want to have in alpha1). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Wed Jan 17 02:04:53 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 16 Jan 2001 21:04:53 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101161409.JAA05268@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Good idea [using string.encode()]! This could also be used to > "hexify" a string, for which currently one of the quickest ways > is still the hack > > "%02x"*len(s) % tuple(s) Note that as of 2.0, a far quicker way is to use binascii.b2a_hex(), or its absurdist (read "Barry" ) synonym binascii.hexlify(). I'm wary of using string.encode() for this, because one normally hexlifies binary data (e.g., like sha checksums), and 4 days of 7 we're more than not in favor of moving away from strings to carry binary data. Of course we can change our minds about this across releases, and have even-numbered releases deprecate the function forms while odd-numbered ones abjure methods. Works for me . From nas@arctrix.com Tue Jan 16 21:08:23 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Tue, 16 Jan 2001 13:08:23 -0800 Subject: [Python-Dev] [droux@tuks.co.za: Our application doesn't work with Debian packaged Python] Message-ID: <20010116130823.C9640@glacier.fnational.com> This message was on the debian-python list. Does anyone know why the patch is needed? 
Neil ----- Forwarded message from Danie Roux ----- Date: Tue, 16 Jan 2001 11:44:48 +0200 From: Danie Roux Subject: Our application doesn't work with Debian packaged Python To: Debian Python Good they all, Our program is an archiver for gnome that uses gnome-python with one widget written in C. I converted our program to autoconf and automake so anyone can (and please do!) compile it and see what I mean. Everything compiles fine. But when it runs it just throws a weird exception. The funny thing is, if I alien RedHat 6.2's python package, and install that, it works! I need to change nothing else. Only the python package. I then went and look at the source rpm. They have this patch in there: --- Python-1.5.2/Python/importdl.c.global Sat Jul 17 16:52:26 1999 +++ Python-1.5.2/Python/importdl.c Sat Jul 17 16:53:19 1999 @@ -441,13 +441,13 @@ #ifdef RTLD_NOW /* RTLD_NOW: resolve externals now (i.e. core dump now if some are missing) */ - void *handle = dlopen(pathname, RTLD_NOW); + void *handle = dlopen(pathname, RTLD_NOW | RTLD_GLOBAL); #else void *handle; if (Py_VerboseFlag) printf("dlopen(\"%s\", %d);\n", pathname, - RTLD_LAZY); - handle = dlopen(pathname, RTLD_LAZY); + RTLD_LAZY | RTLD_GLOBAL); + handle = dlopen(pathname, RTLD_LAZY | RTLD_GLOBAL); #endif /* RTLD_NOW */ if (handle == NULL) { PyErr_SetString(PyExc_ImportError, dlerror()); Sure enough this fixes my problem. The thing is that this means our program only works on Redhat (and who ever patched python 1.5.2 with this). So what can I do now? How can I get this patch into debian-python? How can I change my program to not need the patch? btw the program is garchiver, it will be hosted at sourceforge as soon as they get back to me, in the mean time I will mail anyone a copy of the sources. -- Danie Roux *shuffle* Adore Unix -- To UNSUBSCRIBE, email to debian-python-request@lists.debian.org with a subject of "unsubscribe". Trouble? 
Contact listmaster@lists.debian.org ----- End forwarded message ----- From guido@python.org Wed Jan 17 04:16:48 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 23:16:48 -0500 Subject: [Python-Dev] [droux@tuks.co.za: Our application doesn't work with Debian packaged Python] In-Reply-To: Your message of "Tue, 16 Jan 2001 13:08:23 PST." <20010116130823.C9640@glacier.fnational.com> References: <20010116130823.C9640@glacier.fnational.com> Message-ID: <200101170416.XAA20515@cj20424-a.reston1.va.home.com> > This message was on the debian-python list. Does anyone know why > the patch is needed? > - handle = dlopen(pathname, RTLD_LAZY); > + handle = dlopen(pathname, RTLD_LAZY | RTLD_GLOBAL); This comes back every once in a while. It means that they have an module whose shared library implementation exports symbols that are needed by another shared library (probably another module). IMO this approach is evil, because RTLD_GLOBAL means that *all* external symbols defined by any module are exported to all other shared libraries, and this will cause conflicts if the same symbol is exported by two different modules -- which can happen quite easily. (I don't know what happens on conflicts -- maybe you get an error, maybe it links to the wrong symbol.) The proper solution would be to put the needed entry points beside the init entry point in a separate shared library. But that's often not how quick-and-dirty extension modules are designed... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jan 17 04:22:54 2001 From: guido@python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 23:22:54 -0500 Subject: [Python-Dev] Rich Comparisons technical prerelease Message-ID: <200101170422.XAA20626@cj20424-a.reston1.va.home.com> I've got a working version of the rich comparisons ready for preview. 
The patch is here: http://www.python.org/~guido/richdiff.txt It's also referenced at sourceforge: http://sourceforge.net/patch/?func=detailpatch&patch_id=103283&group_id=5470 Here's a summary: - The comparison operators support "rich comparison overloading" (PEP 207). C extension types can provide a rich comparison function in the new tp_richcompare slot in the type object. The cmp() function and the C function PyObject_Compare() first try the new rich comparison operators before trying the old 3-way comparison. There is also a new C API PyObject_RichCompare() (which also falls back on the old 3-way comparison, but does not constrain the outcome of the rich comparison to a Boolean result). The rich comparison function takes two objects (at least one of which is guaranteed to have the type that provided the function) and an integer indicating the opcode, which can be Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE (for <, <=, ==, !=, >, >=), and returns a Python object, which may be NotImplemented (in which case the tp_compare slot function is used as a fallback, if defined). Classes can overload individual comparison operators by defining one or more of the methods __lt__, __le__, __eq__, __ne__, __gt__, __ge__. There are no explicit "reversed argument" versions of these; instead, __lt__ and __gt__ are each other's reverse, likewise for __le__ and __ge__; __eq__ and __ne__ are their own reverse (similar at the C level). No other implications are made; in particular, Python does not assume that == is the inverse of !=, or that < is the inverse of >=. This makes it possible to define types with partial orderings. Classes or types that want to implement (in)equality tests but not the ordering operators (i.e. unordered types) should implement == and !=, and raise an error for the ordering operators. It is possible to define types whose comparison results are not Boolean; e.g. a matrix type might want to return a matrix of bits for A < B, giving elementwise comparisons.
Such types should ensure that any interpretation of their value in a Boolean context raises an exception, e.g. by defining __nonzero__ (or the tp_nonzero slot at the C level) to always raise an exception. XXX TO DO for this feature: - the test "test_compare" fails, because of the changed semantics for complex number comparisons (1j<2j raises an error now) - tuple, dict should implement EQ/NE so containers containing complex numbers can be compared for equality (list is already done) -- or complex numbers should be reverted to old behavior - list.sort() should use rich comparison - check for memory leaks - int, long, float contain new-style-cmp functions that aren't used to their full potential any more (the new-style-cmp functions introduced by Neil's coercion work are gone again) - decide on unresolved issues from PEP 207 - documentation - more testing - compare performance to 2.0 (microbench?) Please give this a good spin -- I'm hoping to check this in and make it part of the alpha 1 release Friday... --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Wed Jan 17 04:50:25 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 16 Jan 2001 23:50:25 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' References: <200101161409.JAA05268@cj20424-a.reston1.va.home.com> Message-ID: <14949.9361.591610.684695@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Note that as of 2.0, a far quicker way is to use TP> binascii.b2a_hex(), or its absurdist (read "Barry" ) TP> synonym binascii.hexlify(). Thanks for the compliment Tim, but I can't take credit for that name. If it was me I'd have called it wudduptify() (and its inverse, notmuchlify()). I stole the name from Emacs's hexlify-buffer function which kind of does the same thing.
would-converting-to-octal-digits-be-called-octopuslify-ly y'rs, -Barry From fredrik@effbot.org Wed Jan 17 08:12:32 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 09:12:32 +0100 Subject: [Python-Dev] Re: [Patch #103248] Fix a memory leak in _sre.c References: <14948.44221.876681.838046@buffalo.fnal.gov> Message-ID: <00fe01c0805d$432d4cd0$e46940d5@hagrid> Charles G Waldman wrote: > Can you explain why you made this (seemingly arbitrary) change? > > I think that since "self" was created via: > > self = PyObject_NEW_VAR(PatternObject, &Pattern_Type, n); > > which calls PyObjectINIT, which in turn calls _Py_NewReference, which > increments _Py_RefTotal, it is incorrect to simply do a PyObject_DEL > to de-allocate it -- won't this screw up the value of _Py_RefTotal? and what do you think will happen if you call the destructor before you've initialized all pointer fields in the object? (according to the docs, the NEW/New functions return uninitialized memory. in this case, we're bailing out before the object has been fully initialized. pattern_dealloc definitely isn't prepared to deal with random pointer values...) > Admittedly this is a minor nit and only matters if Py_TRACE_REFS is > defined - I just wanted to check to make sure my understanding of > reference counting w.r.t. memory allocation and deallocation is > correct - if the above is in error, I'd apprecate any corrections... same here. I don't doubt it's working as you say it does, but I find it strange that you shouldn't be able to DEL an object you just created with NEW... maybe DEL should be fixed? 
Cheers /F From thomas@xs4all.net Wed Jan 17 09:48:12 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 17 Jan 2001 10:48:12 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules Setup.config.in,1.7,1.8 Setup.dist,1.7,1.8 In-Reply-To: ; from esr@users.sourceforge.net on Wed, Jan 17, 2001 at 12:25:13AM -0800 References: Message-ID: <20010117104812.F2776@xs4all.nl> On Wed, Jan 17, 2001 at 12:25:13AM -0800, Eric S. Raymond wrote: > + # ndbm(3) may require -lndbm or similar > + @USE_NDBM_MODULE@ndbm ndbmmodule.c @HAVE_LIBNDBM@ This is an interesting module... It's not in the Modules/ directory :-) Did you mean 'dbmmodule.c' with a different library argument ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip@mojam.com (Skip Montanaro) Wed Jan 17 15:17:39 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 17 Jan 2001 09:17:39 -0600 (CST) Subject: [Python-Dev] Rich comparison confusion Message-ID: <14949.46995.259157.871323@beluga.mojam.com> I'm a bit confused about Guido's rich comparison stuff. In the description he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. From a boolean standpoint this just can't be so. Guido mentions partial orderings, but I'm still confused. Consider this example: Objects of type A implement rich comparisons. Objects of type B don't. If my code looks like a = A() b = B() ... if b < a: ... My interpretation of the rich comparison stuff is that either 1. Since b doesn't implement rich comparisons, the interpreter falls back to old fashioned comparisons which may or may not allow the comparison of B objects and A objects. or 2. The sense of the inequality is switched (a > b) and the rich comparison code in A's implementation is called. That's my reading of it. It has to be wrong. The inverse comparison should be a >= b, not a > b, but the described pairing of comparison functions would imply otherwise.
I'm sure I'm missing something obvious or revealing some fundamental failure of my grade school education. Please explain... Skip From akuchlin@mems-exchange.org Wed Jan 17 15:42:13 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 10:42:13 -0500 Subject: [Python-Dev] PEP 229 issues In-Reply-To: <200101170139.UAA17954@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 08:39:35PM -0500 References: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> <200101170139.UAA17954@cj20424-a.reston1.va.home.com> Message-ID: <20010117104213.B490@kronos.cnri.reston.va.us> On Tue, Jan 16, 2001 at 08:39:35PM -0500, Guido van Rossum wrote: >I expect that there will be an alpha2, but I still recommend that you >check in *something* that works for alpha1, to get maximal testing >coverage. Alpha1 may slip a day or so (Jeremy and I are both late >with our big patches, respectively nested scopes and rich comparisons, >that we really want to have in alpha1). OK; thanks for the pronouncement! I've checked in all the smaller changes that shouldn't break anything. All that's left now is to actually enable the new feature, which requires the nasty changes: * In the top-level Makefile.in, the "sharedmods" target simply runs "./python setup.py build", and "sharedinstall" runs "./python setup.py install". The "clobber" target also deletes the build/ subdirectory where Distutils puts its output. * Rip stuff out of the Setup files. Modules/Setup.config.in only contains entries for the gc and thread modules; the readline, curses, and db modules are removed because it's now setup.py's job to handle them. * Modules/Setup.dist now contains entries for only 3 modules -- _sre, posix, and strop. Guido and Jeremy are rushing to finish their patches in time for the alpha release, though Guido seems to be checking in the rich comparison stuff now. 
I don't want to impede them by making them stop to debug build problems, so I can either wait until they've landed their changes (at which point there's nothing major left, I think), or they can simply not do a 'cvs update' after the serious changes go in. Thoughts? --amk From barry@digicool.com Wed Jan 17 15:54:06 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 17 Jan 2001 10:54:06 -0500 Subject: [Python-Dev] Breakage in latest CVS Message-ID: <14949.49182.636526.292265@anthem.wooz.org> Looks like the latest CVS (updated just minutes ago) is broken. I'm trying to fix some of these complaints, but thought I'd at least report what I've found... -Barry ... gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -I./../Include -I.. -DHAVE_CONFIG_H -c floatobject.c -o floatobject.o floatobject.c:675: warning: excess elements in struct initializer after `float_as_number' floatobject.c:700: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) floatobject.c:700: initializer element for `PyFloat_Type.tp_flags' is not constant ... intobject.c:800: warning: excess elements in struct initializer after `int_as_number' intobject.c:825: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) intobject.c:825: initializer element for `PyInt_Type.tp_flags' is not constant make[1]: *** [intobject.o] Error 1 ... gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -I./../Include -I.. -DHAVE_CONFIG_H -c longobject.c -o longobject.o longobject.c:1865: warning: excess elements in struct initializer after `long_as_number' longobject.c:1890: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) longobject.c:1890: initializer element for `PyLong_Type.tp_flags' is not constant make[1]: *** [longobject.o] Error 1 From guido@python.org Wed Jan 17 16:09:27 2001 From: guido@python.org (Guido van Rossum) Date: Wed, 17 Jan 2001 11:09:27 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Wed, 17 Jan 2001 09:17:39 CST." 
<14949.46995.259157.871323@beluga.mojam.com> References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: <200101171609.LAA04102@cj20424-a.reston1.va.home.com> > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. Yes. By this I mean that A < B and B > A are interchangeable, ditto for A <= B and B >= A. Also A == B interchanges for B == A, and A != B for B != A. > From a boolean standpoint this just can't be so. Guido mentions partial > orderings, but I'm still confused. Consider this example: Objects of type A > implement rich comparisons. Objects of type B don't. If my code looks like > > a = A() > b = B() > ... > if b < a: > ... > > My interpretation of the rich comparison stuff is that either > > 1. Since b doesn't implement rich comparisons, the interpreter falls > back to old fashioned comparisons which may or may not allow the > comparison of B objects and A objects. > > or > > 2. The sense of the inequality is switched (a > b) and the rich > comparison code in A's implementation is called. It's case 2. > That's my reading of it. It has to be wrong. The inverse comparison should > be a >= b, not a > b, but the described pairing of comparison functions > would imply otherwise. We're trying very hard *not* to make any connections between a < b and a >= b. You've learned in grade school that these are each other's Boolean inverse (a < b is true iff a >= b is false). However, for partial orderings this may not be true: for unordered a and b, none of a < b, a <= b, a > b, a >= b, a == b may be true. On the other hand, even for partially ordered types, a < b and b > a (note: swapped arguments *and* swapped sense of comparison) always give the same outcome! > I'm sure I'm missing something obvious or revealing some fundamental failure > of my grade school education. Please explain... I think what threw you off was the ambiguity of "inverse". This means Boolean negation.
I'm not relying on Boolean negation here -- I'm relying on the more fundamental property that a < b and b > a have the same outcome. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh21@cam.ac.uk Wed Jan 17 16:13:32 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 17 Jan 2001 16:13:32 +0000 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Skip Montanaro's message of "Wed, 17 Jan 2001 09:17:39 -0600 (CST)" References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: Skip Montanaro writes: > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. > From a boolean standpoint this just can't be so. Guido mentions partial > orderings, but I'm still confused. Consider this example: Objects of type A > implement rich comparisons. Objects of type B don't. If my code looks like > > a = A() > b = B() > ... > if b < a: > ... > > My interpretation of the rich comparison stuff is that either > > 1. Since b doesn't implement rich comparisons, the interpreter falls > back to old fashioned comparisons which may or may not allow the > comparison of B objects and A objects. > > or > > 2. The sense of the inequality is switched (a > b) and the rich > comparison code in A's implementation is called. > > That's my reading of it. It has to be wrong. The inverse comparison should > be a >= b, not a > b, but the described pairing of comparison functions > would imply otherwise. > > I'm sure I'm missing something obvious or revealing some fundamental failure > of my grade school education. Please explain... For a total order: a < b if and only if b > a. This is what the rich comparison code does. a < b if and only if not a >= b. This is what the rich comparison code doesn't do. Does this make sense? Cheers, M. -- Presumably pronging in the wrong place zogs it.
-- Aldabra Stoddart, ucam.chat From moshez@zadka.site.co.il Thu Jan 18 00:08:06 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Thu, 18 Jan 2001 02:08:06 +0200 (IST) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <14949.46995.259157.871323@beluga.mojam.com> References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: <20010118000806.D1C04A828@darjeeling.zadka.site.co.il> On Wed, 17 Jan 2001 09:17:39 -0600 (CST), Skip Montanaro wrote: > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. I think that you're confused between two meanings of inverses. You think: op is an inverse of op' if for every a,b (a op b) = not (a op' b) Guido meant (and I hope, implemented): op is an inverse of op' if for every a,b (a op b) = (b op' a) And a < b iff b > a, and a <= b iff b >= a. Sounds sane. Unless I'm the one confused.... -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From fredrik@effbot.org Wed Jan 17 16:47:29 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 17:47:29 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: Message-ID: <012901c080a5$306023a0$e46940d5@hagrid> tim wrote: > > Should I check it in? > > Absolutely! But not like as for 2.0: check it in *now*, so we have a few > days to deal with surprises before the alpha release. as it turned out, the source I had didn't build, and the table- building python script generated something that wasn't quite compatible with the C code. bit rot. I've almost sorted it all out. will check it in later tonight (local time).
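The two meanings of "inverse" untangled in the rich comparison thread above can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the patch: the classes `Legacy` and `Tracked` are hypothetical, and the sketch assumes an interpreter that implements PEP 207 reflection as described in the thread (it is written for a current Python 3).

```python
class Legacy:
    """Hypothetical type that defines no rich comparison methods."""


class Tracked:
    """Hypothetical type that records which rich comparison method ran."""

    def __init__(self):
        self.calls = []

    def __gt__(self, other):
        self.calls.append("__gt__")
        return True

    def __ge__(self, other):
        self.calls.append("__ge__")
        return True


a = Tracked()
b = Legacy()

# Legacy cannot answer b < a, so the interpreter retries with swapped
# operands and the reflected operator: a.__gt__(b), never a.__ge__(b).
result = b < a

# Built-in sets are only partially ordered (subset tests): for two
# unrelated sets both x < y and x >= y are false, so "<" is not the
# Boolean negation of ">=".
x, y = {1}, {2}
neither = (x < y, x >= y)
```

The first half shows case 2 from Skip's message (swap the arguments and the sense of the comparison); the second half shows why no Boolean-negation pairing is assumed.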
From tim.one@home.com Wed Jan 17 18:27:11 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 17 Jan 2001 13:27:11 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Tools/idle CallTipWindow.py,1.2,1.3 CallTips.py,1.7,1.8 ClassBrowser.py,1.11,1.12 Debugger.py,1.14,1.15 Delegator.py,1.2,1.3 FileList.py,1.7,1.8 FormatParagraph.py,1.8,1.9 IdleConf.py,1.5,1.6 IdleHistory.py,1.3,1 In-Reply-To: <200101171358.IAA27661@cj20424-a.reston1.va.home.com> Message-ID: [an anonymous developer panics, after Tim "reindent"s the IDLE dir] > Oh no! > > I have a whole slew of changes to IDLE sitting in my work directory. > If I do an update half of these will turn into merge conflicts. :-( > > Don't worry, I'll get over it. I imagine this will pop up from time to time until everything is normalized. If it's about to burn you, run reindent.py on the affected directory *before* you update ("python reindent.py -v ."). That will make all the same changes to your local versions as were checked in, modulo the rare hand-edit (of which there were none in the IDLE directory). From akuchlin@mems-exchange.org Wed Jan 17 19:04:04 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 14:04:04 -0500 Subject: [Python-Dev] PEP 229 checked in Message-ID: I've checked in the last bit of the PEP 229 changes. Be sure to rename your Modules/Setup file (or do a 'make distclean') before rebuilding. Squeal if you run into trouble, or file bugs on SF. --am"Aieee!"k From jeremy@alum.mit.edu Wed Jan 17 19:12:47 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 17 Jan 2001 14:12:47 -0500 (EST) Subject: [Python-Dev] unexpected consequence of function attributes Message-ID: <14949.61103.258714.325465@localhost.localdomain> I have found one place in the library that depended on hasattr(func, '__dict__') to return false -- dis.dis. You might want to check and see if there is any other code that doesn't expect functions to have extra attributes.
I expect that only introspective code would be affected. Jeremy From barry@wooz.org Wed Jan 17 19:46:36 2001 From: barry@wooz.org (Barry A. Warsaw) Date: Wed, 17 Jan 2001 14:46:36 -0500 Subject: [Python-Dev] Re: unexpected consequence of function attributes References: <14949.61103.258714.325465@localhost.localdomain> Message-ID: <14949.63132.583025.303677@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> I have found one place in the library that depended on JH> hasattr(func, '__dict__') to return false -- dis.dis. You JH> might want to check and see if there is anything other code JH> that doesn't expect function's to have extra attributes. I JH> expect that only introspective code would be affected. I guess we need a test_dis.py in the regression test suite, eh? :) Here's an extremely quick and dirty fix to dis.py. -Barry -------------------- snip snip -------------------- Index: dis.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/dis.py,v retrieving revision 1.28 diff -u -r1.28 dis.py --- dis.py 2001/01/14 23:36:05 1.28 +++ dis.py 2001/01/17 19:45:40 @@ -15,6 +15,10 @@ return if type(x) is types.InstanceType: x = x.__class__ + if hasattr(x, 'func_code'): + x = x.func_code + if hasattr(x, 'im_func'): + x = x.im_func if hasattr(x, '__dict__'): items = x.__dict__.items() items.sort() @@ -28,17 +32,12 @@ except TypeError, msg: print "Sorry:", msg print + elif hasattr(x, 'co_code'): + disassemble(x) else: - if hasattr(x, 'im_func'): - x = x.im_func - if hasattr(x, 'func_code'): - x = x.func_code - if hasattr(x, 'co_code'): - disassemble(x) - else: - raise TypeError, \ - "don't know how to disassemble %s objects" % \ - type(x).__name__ + raise TypeError, \ + "don't know how to disassemble %s objects" % \ + type(x).__name__ def distb(tb=None): """Disassemble a traceback (default: last traceback).""" From barry@wooz.org Wed Jan 17 19:49:51 2001 From: barry@wooz.org (Barry A. 
Warsaw) Date: Wed, 17 Jan 2001 14:49:51 -0500 Subject: [Python-Dev] Re: unexpected consequence of function attributes References: <14949.61103.258714.325465@localhost.localdomain> Message-ID: <14949.63327.22745.359978@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> I have found one place in the library that depended on JH> hasattr(func, '__dict__') to return false -- dis.dis. You JH> might want to check and see if there is anything other code JH> that doesn't expect function's to have extra attributes. I JH> expect that only introspective code would be affected. Patch #103303 http://sourceforge.net/patch/?func=detailpatch&patch_id=103303&group_id=5470 From tim.one@home.com Wed Jan 17 20:51:57 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 17 Jan 2001 15:51:57 -0500 Subject: [Python-Dev] Windows Python totally hosed Message-ID: Failures range from test test_winsound skipped -- Module use of python20.dll conflicts with this version of Python. to test test_tokenize crashed -- exceptions.AttributeError: 're' module has no attribute 'compile' I suspect the latter is really a disguised version of C:\Code\python\dist\src\PCbuild>python Python 2.1a1 (#8, Jan 17 2001, 13:15:23) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import re Traceback (most recent call last): File "<stdin>", line 1, in ? File "c:\code\python\dist\src\lib\re.py", line 28, in ? from sre import * File "c:\code\python\dist\src\lib\sre.py", line 17, in ? import sre_compile File "c:\code\python\dist\src\lib\sre_compile.py", line 11, in ? import _sre ImportError: Module use of python20.dll conflicts with this version of Python. >>> Suspect all of this has to do with patchlevel.h changing. I'll try to dope it out, but if anyone knows the cure off the top of their head, don't be shy!
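The dis.dis breakage from the function-attributes thread above comes down to the order of attribute checks: once functions carry a `__dict__`, a test for `__dict__` can no longer tell a class-like object from a function. A minimal sketch of that ordering issue follows. The `describe` helper is hypothetical, and the sketch uses the current spelling `__code__` as an assumed stand-in for the `func_code` attribute that appears in the patch.

```python
def describe(obj):
    """Hypothetical dispatcher: look for a code object before __dict__."""
    # Functions have a __dict__ too, so checking __dict__ first would
    # wrongly send them down the class-like branch.
    if hasattr(obj, "__code__"):
        return "function"
    if hasattr(obj, "__dict__"):
        return "namespace"
    return "other"


def sample():
    pass


class Holder:
    pass


# Attaching an attribute to a function is what gives __dict__ a use here.
sample.note = "extra attribute"

kinds = [describe(sample), describe(Holder), describe(42)]
```

Putting the more specific test first is the same reordering the quick fix to dis.py applies.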

From akuchlin@mems-exchange.org Wed Jan 17 21:00:56 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 17 Jan 2001 16:00:56 -0500
Subject: [Python-Dev] Re: 'Setup' buglet
In-Reply-To: <200101171928.OAA21460@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 17, 2001 at 02:28:36PM -0500
References: <200101171928.OAA21460@cj20424-a.reston1.va.home.com>
Message-ID: <20010117160056.A20603@kronos.cnri.reston.va.us>

[Taking this bug public]

On Wed, Jan 17, 2001 at 02:28:36PM -0500, Guido van Rossum wrote:
>One problem seems to be that the creation
>of the (minimal) Modules/Setup file doesn't seem to be doing the right
>thing. When I delete Modules/Setup, the next "make" doesn't create
>it; it used to be copied from Setup.dist if it doesn't exist.

This seems to have been removed from Modules/Makefile.pre.in in
revision 1.69 by Fred; instead the configure script now copies
Setup.dist to Setup, so you have to rerun configure in order to create
Modules/Setup after deleting it.

--amk

From mal@lemburg.com Wed Jan 17 21:04:29 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 17 Jan 2001 22:04:29 +0100
Subject: [Python-Dev] Usage of "assert" in regression tests
Message-ID: <3A6608DD.E12A2422@lemburg.com>

I've just checked in a patch which removes all uses of the
assert statement in the regression tests. This makes the tests
compatible with the -O mode of Python and also allows centralizing
error reporting (many tests already provide their own little
test function for this purpose).

I urge you to only check in tests which use the new API
verify() to verify a certain condition. The API is defined
in the regression tools module test_support.

Thanks,
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fredrik@effbot.org Wed Jan 17 21:21:56 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Wed, 17 Jan 2001 22:21:56 +0100
Subject: [Python-Dev] Windows Python totally hosed
References:
Message-ID: <028801c080cb$86658350$e46940d5@hagrid>

tim wrote:
> Suspect all of this has to do with patchlevel.h changing. I'll try to dope
> it out, but if anyone knows the cure off the top of their head, don't be
> shy!

text.replace("python20", "python21") for all files in
the PCBuild directory, plus PC/config.h

From tim.one@home.com Wed Jan 17 21:42:13 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 17 Jan 2001 16:42:13 -0500
Subject: [Python-Dev] Windows Python totally hosed
In-Reply-To: <028801c080cb$86658350$e46940d5@hagrid>
Message-ID:

[/F]
> text.replace("python20", "python21") for all files in
> the PCBuild directory, plus PC/config.h

Brrrr. It strikes me as insane to have the core Python files in an MS
project file *named* after the release number (python20.dsp). So I'm going
to change that to core.dsp so that at least that much never needs to be
changed again.

gratefully y'rs - tim

From fredrik@effbot.org Wed Jan 17 21:47:28 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Wed, 17 Jan 2001 22:47:28 +0100
Subject: [Python-Dev] Usage of "assert" in regression tests
References: <3A6608DD.E12A2422@lemburg.com>
Message-ID: <02b401c080cf$1a3a5530$e46940d5@hagrid>

mal wrote:
> I urge you to only check in tests which use the new API
> verify() to verify a certain condition. The API is defined
> in the regression tools module test_support.

did you run the test yourself after applying that patch?

(a patch to the patch is on the way in. please check
that the test suite still runs on non-Windows boxes...)
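
For readers following the verify() discussion, here is a minimal sketch of the idea in present-day Python. The names TestFailed and verify follow the ones mentioned for test_support, but the bodies below are assumptions written for illustration, not the checked-in implementation. The point is that an explicit function call still runs under -O, whereas an assert statement is compiled away.

```python
class TestFailed(Exception):
    """Raised when a regression-test check does not hold (illustrative)."""


def verify(condition, reason="test failed"):
    """Raise TestFailed unless condition is true.

    This is an ordinary function call, so unlike an assert statement
    it is not removed when Python is started with -O.
    """
    if not condition:
        raise TestFailed(reason)


def check_sorted(values):
    """Hypothetical test body that reports failures through verify()."""
    verify(list(values) == sorted(values),
           "values are not sorted: %r" % (values,))
    return True
```

A test written this way fails with the given reason in both normal and -O runs, and a driver can catch TestFailed in one place to centralize reporting.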

From gstein@lyra.org Wed Jan 17 21:45:44 2001
From: gstein@lyra.org (Greg Stein)
Date: Wed, 17 Jan 2001 13:45:44 -0800
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.106,2.107
In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jan 17, 2001 at 01:27:04PM -0800
References:
Message-ID: <20010117134544.H7731@lyra.org>

On Wed, Jan 17, 2001 at 01:27:04PM -0800, Guido van Rossum wrote:
> Update of /cvsroot/python/python/dist/src/Objects
> In directory usw-pr-cvs1:/tmp/cvs-serv14991
>
> Modified Files:
> 	object.c
> Log Message:
> Deal properly (?) with comparing recursive datastructures.
>...
> - Change the in-progress code to use static variables instead of
>   globals (both the nesting level and the key for the thread dict were
>   globals but have no reason to be globals; the key can even be a
>   function-static variable in get_inprogress_dict()).

The "compare_nesting" variable is a bit troublesome long-term -- it will
cause threading issues in a free-threaded implementation. The solution is to
put the value into the thread-state.

[ not sure if it matters right now, but just bringing it up ]

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From fdrake@acm.org Wed Jan 17 21:55:02 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 17 Jan 2001 16:55:02 -0500 (EST)
Subject: [Python-Dev] [PEP 205] weak references patch
Message-ID: <14950.5302.356566.778486@cj42289-a.reston1.va.home.com>

I've updated the patch that implements PEP 205:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103203&group_id=5470

The actual patch is too big for SF:

http://starship.python.net/crew/fdrake/patches/weakref.patch-5

One thing about this is that it changes some of the low-level
object creation macros, so you'll need to do a "make clean" before
"make" when testing it.

Have fun!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From mal@lemburg.com Wed Jan 17 22:16:29 2001
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Wed, 17 Jan 2001 23:16:29 +0100
Subject: [Python-Dev] Usage of "assert" in regression tests
References: <3A6608DD.E12A2422@lemburg.com> <02b401c080cf$1a3a5530$e46940d5@hagrid>
Message-ID: <3A6619BD.2AC8F6D3@lemburg.com>

Fredrik Lundh wrote:
>
> mal wrote:
> > I urge you to only check in tests which use the new API
> > verify() to verify a certain condition. The API is defined
> > in the regression tools module test_support.
>
> did you run the test yourself after applying that patch?

Yes, but as I wrote in the SF patch message: I can only test it
on Linux and there not all tests are run due to missing extensions.
The alpha testing will hopefully catch all possible bugs this
patch introduced.

> (a patch to the patch is on the way in. please check
> that the test suite still runs on non-Windows boxes...)

I'll have to leave that to the Windows wizards, sorry.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From thomas@xs4all.net Wed Jan 17 22:49:25 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Wed, 17 Jan 2001 23:49:25 +0100
Subject: [Python-Dev] PEP 229 checked in
In-Reply-To: ; from akuchlin@mems-exchange.org on Wed, Jan 17, 2001 at 02:04:04PM -0500
References:
Message-ID: <20010117234925.A17392@xs4all.nl>

On Wed, Jan 17, 2001 at 02:04:04PM -0500, Andrew Kuchling wrote:

> I've checked in the last bit of the PEP 229 changes. Be sure to
> rename your Modules/Setup file (or do a 'make distclean' before
> rebuilding.

make distclean doesn't remove Modules/Setup anymore :) Also, I couldn't get
it to work with an old tree, even after several make distclean/reconfigures.
I got tired looking for it, so I just grabbed a new tree.

> Squeal if you run into trouble, or file bugs on SF.

I have a couple of questions: what to do when setup.py doesn't work ? Is
there a way to make it bypass a module ?
What about specifying include dirs
manually, for some modules (for instance, when you have readline source in a
separate directory, and want to link it statically.)

Here are some specific squeals. See at the bottom for the most important
one :)

On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by
setup.py. Also, SSL support for the socket module was not enabled, though
OpenSSL is installed, in the default path.

On Debian GNU/Linux' 'woody', the 'testing' (soon 'stable') branch, I can't
compile dbmmodule:

building 'dbm' extension
gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -fpic -I. -I/home/thomas/python/python/dist/src/./Include -IInclude/ -c /home/thomas/python/python/dist/src/Modules/dbmmodule.c -o build/temp.linux-i686-2.1/dbmmodule.o
/home/thomas/python/python/dist/src/Modules/dbmmodule.c:24: #error "No ndbm.h available!"
error: command 'gcc' failed with exit status 1
make: *** [sharedmods] Error 1

(ndbm.h does exist, as /usr/include/db1/ndbm.h. There is also
/usr/include/gdbm-ndbm.h, but I'm not sure if that's the same.)

Nor can I build the _tkinter module there:

building '_tkinter' extension
gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -fpic -DWITH_APPINIT=1 -I/usr/X11R6/include -I. -I/home/thomas/python/python/dist/src/./Include -IInclude/ -c /home/thomas/python/python/dist/src/Modules/_tkinter.c -o build/temp.linux-i686-2.1/_tkinter.o
/home/thomas/python/python/dist/src/Modules/_tkinter.c:44: tcl.h: No such file or directory
In file included from /home/thomas/python/python/dist/src/Modules/_tkinter.c:45:
/usr/include/tk.h:66: tcl.h: No such file or directory
error: command 'gcc' failed with exit status 1
make: *** [sharedmods] Error 1

The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian,
which I personally like a lot, though it's probably a bitch to autodetect.
(I tried, using autoconf ;-P)

On Debian GNU/Linux 'sid', the current unstable branch, I can't compile
Python at all, now:

c++ -Xlinker -export-dynamic python.o \
          ../libpython2.1.a -lpthread -ldl -lutil -lm -o python
../libpython2.1.a(posixmodule.o): In function `posix_tmpnam':
/home/thomas/python/python-write/dist/src/Modules/./posixmodule.c:4115: the use of `tmpnam_r' is dangerous, better use `mkstemp'
../libpython2.1.a(posixmodule.o): In function `posix_tempnam':
/home/thomas/python/python-write/dist/src/Modules/./posixmodule.c:4071: the use of `tempnam' is dangerous, better use `mkstemp'
mv python ../python
make[1]: Leaving directory `/home/thomas/python/python-write/dist/src/Modules'
./python ./setup.py build
running build
running build_ext
Traceback (most recent call last):
  File "./setup.py", line 460, in ?
    main()
  File "./setup.py", line 455, in main
    ext_modules=[Extension('struct', ['structmodule.c'])]
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/core.py", line 138, in setup
    dist.run_commands()
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 871, in run_commands
    self.run_command(cmd)
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 891, in run_command
    cmd_obj.run()
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/command/build.py", line 106, in run
    self.run_command(cmd_name)
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/cmd.py", line 328, in run_command
    self.distribution.run_command(command)
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 891, in run_command
    cmd_obj.run()
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/command/build_ext.py", line 202, in run
    customize_compiler(self.compiler)
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", line 121, in customize_compiler
    (cc, opt, ccshared, ldshared, so_ext) = \
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py",
line 389, in get_config_vars
    func()
  File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", line 302, in _init_posix
    raise DistutilsPlatformError, my_msg
distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory)
make: *** [sharedmods] Error 1

For the record, I don't have a /usr/lib/python2.1 directory on the other
machines either.

I haven't been able to test FreeBSD yet, will get to that later tonight.

And most importantly(!), on all these machines, 'make test' stops
functioning. In fact, after setup.py started building, you can't run 'make'
without 'make clean' anymore. You get a lot of undefined-symbol warnings
(see below.) If you run 'make clean;make test' it also doesn't work, because
the build directory is not in the Python library path, and regrtest.py
requires (at least) the time module.

c++ -Xlinker -export-dynamic python.o \
          ../libpython2.1.a -lpthread -ldl -lutil -lm -o python
../libpython2.1.a(posixmodule.o): In function `posix_tmpnam':
/home/thomas/python/python/dist/src/Modules/./posixmodule.c:4115: the use of `tmpnam_r' is dangerous, better use `mkstemp'
../libpython2.1.a(posixmodule.o): In function `posix_tempnam':
/home/thomas/python/python/dist/src/Modules/./posixmodule.c:4071: the use of `tempnam' is dangerous, better use `mkstemp'
../libpython2.1.a(myreadline.o): In function `my_fgets':
/home/thomas/python/python/dist/src/Parser/myreadline.c:41: undefined reference to `PyOS_InterruptOccurred'
/home/thomas/python/python/dist/src/Parser/myreadline.c:35: undefined reference to `PyOS_InterruptOccurred'
../libpython2.1.a(errors.o): In function `PyErr_SetFromErrnoWithFilename':
/home/thomas/python/python/dist/src/Python/errors.c:260: undefined reference to `PyErr_CheckSignals'
../libpython2.1.a(pythonrun.o): In function `Py_Finalize':
/home/thomas/python/python/dist/src/Python/pythonrun.c:193: undefined reference to `PyOS_FiniInterrupts'
../libpython2.1.a(pythonrun.o): In function `initsigs':
/home/thomas/python/python/dist/src/Python/pythonrun.c:1161: undefined reference to `PyOS_InitInterrupts'
../libpython2.1.a(traceback.o): In function `tb_printinternal':
/home/thomas/python/python/dist/src/Python/traceback.c:213: undefined reference to `PyErr_CheckSignals'
../libpython2.1.a(fileobject.o): In function `get_line':
/home/thomas/python/python/dist/src/Objects/fileobject.c:883: undefined reference to `PyErr_CheckSignals'
../libpython2.1.a(longobject.o): In function `long_format':
/home/thomas/python/python/dist/src/Objects/longobject.c:644: undefined reference to `PyErr_CheckSignals'
../libpython2.1.a(longobject.o): In function `x_divrem':
/home/thomas/python/python/dist/src/Objects/longobject.c:855: undefined reference to `PyErr_CheckSignals'
../libpython2.1.a(longobject.o): In function `long_mul':
/home/thomas/python/python/dist/src/Objects/longobject.c:1193: undefined reference to `PyErr_CheckSignals'
../libpython2.1.a(object.o):/home/thomas/python/python/dist/src/Objects/object.c:174: more undefined references to `PyErr_CheckSignals' follow
../libpython2.1.a(posixmodule.o): In function `posix_fork':
/home/thomas/python/python/dist/src/Modules/./posixmodule.c:1666: undefined reference to `PyOS_AfterFork'
../libpython2.1.a(posixmodule.o): In function `posix_forkpty':
/home/thomas/python/python/dist/src/Modules/./posixmodule.c:1733: undefined reference to `PyOS_AfterFork'
collect2: ld returned 1 exit status
make[1]: *** [link] Error 1
make[1]: Leaving directory `/home/thomas/python/python/dist/src/Modules'
make: *** [python] Error 2

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mal@lemburg.com Wed Jan 17 22:56:58 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 17 Jan 2001 23:56:58 +0100
Subject: [Python-Dev] Standard install locations for Python ?
Message-ID: <3A66233A.A6AE07BD@lemburg.com>

I'm currently busy building new versions of my mx packages.
While trying to convert all of them to distutils I found that
there seems to be no standard for installing documentation
or other data files of Python extensions.

I also noted, that for Windows the standard extension installation
defaults to \Python instead of some \Python\Site-Packages.

So the general question is: Where should Python extensions
install themselves and their docs ?

(On Linux the typical place for docs is /usr/doc/packages, for
Python code it is /usr/local/lib/pythonX.X/site-packages, BTW)

Thanks,
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr@thyrsus.com Wed Jan 17 23:04:09 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 17 Jan 2001 18:04:09 -0500
Subject: [Python-Dev] Rich Comparisons technical prerelease
In-Reply-To: <200101170422.XAA20626@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 11:22:54PM -0500
References: <200101170422.XAA20626@cj20424-a.reston1.va.home.com>
Message-ID: <20010117180409.A17897@thyrsus.com>

Guido van Rossum :
> This makes it possible to define types with partial orderings.

Guido's time machine is working again, and seems now to have been augmented
by telepathy. I was just thinking about bugging him about this...

I will definitely check this out with my set() class -- it was waiting on
rich comparisons so I could do partial-orderings properly. If it works,
we'll have set algebra for the standard library. Coolness.
--
Eric S. Raymond

Under democracy one party always devotes its chief energies
to trying to prove that the other party is unfit to rule--and
both commonly succeed, and are right... The United States
has never developed an aristocracy really disinterested or
an intelligentsia really intelligent.
Its history is simply a record of vacillations between two gangs of frauds.
	--- H. L. Mencken

From akuchlin@mems-exchange.org Wed Jan 17 23:09:47 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 17 Jan 2001 18:09:47 -0500
Subject: [Python-Dev] PEP 229 checked in
In-Reply-To: <20010117234925.A17392@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 17, 2001 at 11:49:25PM +0100
References: <20010117234925.A17392@xs4all.nl>
Message-ID: <20010117180947.E9384@kronos.cnri.reston.va.us>

On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote:
>I have a couple of questions: what to do when setup.py doesn't work ? Is
>there a way to make it bypass a module ? What about specifying include dirs

There's a 'disabled_module_list' global in the code, but no way to set
it from the command-line yet, since I couldn't figure out how to do
that in time.

>On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by
>setup.py. Also, SSL support for the socket module was not enabled, though
>OpenSSL is installed, in the default path.

Can you take a look at the detection code in setup.py and see what's
going wrong. I believe it should be found if OpenSSL is in
/usr/local/, but /usr/contrib isn't checked currently.

>The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian,
>which I personally like a lot, though it's probably a bitch to autodetect.
>(I tried, using autoconf ;-P)

There's code to handle Debian, though I have no way of testing it, and
it worked on Neil's Debian box for some reason. Search for
debian_tcl_include in setup.py, and see if you can fix it.

>distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory)

Are you sure setup.py is up to date; do a 'cvs update setup.py' to
check. You might get a "setup.py is in the way; remove it" message if
you downloaded the first setup.py script manually.

>without 'make clean' anymore.
You get a lot of undefined-symbol warnings
>(see below.) If you run 'make clean;make test' it also doesn't work, because
>the build directory is not in the Python library path, and regrtest.py
>requires (at least) the time module.

Again, be sure the tree is up to date; I think this stems from
attempting to compile the signal module as shared, which doesn't work.
I know that "make test" doesn't work, but am not sure how to fix it yet.

--amk

From tim.one@home.com Wed Jan 17 23:42:24 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 17 Jan 2001 18:42:24 -0500
Subject: [Python-Dev] Windows Python totally rad
Message-ID:

Windows Python runs normally again, modulo four test failures I figure are
due to the "get rid of assert" patch.

Note that the python20 DevStudio subproject is gone. It's been replaced by
a new subproject named pythoncore.

From thomas@xs4all.net Wed Jan 17 23:44:00 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Thu, 18 Jan 2001 00:44:00 +0100
Subject: [Python-Dev] PEP 229 checked in
In-Reply-To: <20010117234925.A17392@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 17, 2001 at 11:49:25PM +0100
References: <20010117234925.A17392@xs4all.nl>
Message-ID: <20010118004400.B17392@xs4all.nl>

On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote:

I got around to testing on FreeBSD now, and it actually went pretty smooth!
However, some small points:

> On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by
> setup.py. Also, SSL support for the socket module was not enabled, though
> OpenSSL is installed, in the default path.

Curiously enough, FreeBSD, with OpenSSL installed in /usr/include/openssl,
*did* get the socketmodule compiled with SSL support, but without the
necessary -I directive, so the compile failed.

> And most importantly(!), on all these machines, 'make test' stops
> functioning. In fact, after setup.py started building, you can't run 'make'
> without 'make clean' anymore.
You get a lot of undefined-symbol warnings

Strangely enough, this problem does not exist on FreeBSD. I can run 'make'
or 'make test' after 'make' just fine. 'make test' still doesn't work
because of the incorrect library path, but it doesn't barf like the other
systems (BSDI and Debian Linux)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From esr@thyrsus.com Thu Jan 18 00:32:53 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 17 Jan 2001 19:32:53 -0500
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To: <20010118000806.D1C04A828@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 18, 2001 at 02:08:06AM +0200
References: <14949.46995.259157.871323@beluga.mojam.com> <20010118000806.D1C04A828@darjeeling.zadka.site.co.il>
Message-ID: <20010117193253.A18565@thyrsus.com>

Moshe Zadka :
> I think that you're confused between two meanings of inverses.
>
> You think:
> op is an inverse of op' if for every a,b (a op b) = not (a op' b)
>
> Guido meant (and I hope, implemented):
> op is an inverse of op' if for every a,b (a op b) = (b op' a)

I thought the same. if (a op1 b) <=> (b op2 a), op2 is properly described
as the "reflection" of op1, and vice-versa.
--
Eric S. Raymond

Sometimes the law defends plunder and participates in it. Sometimes
the law places the whole apparatus of judges, police, prisons and
gendarmes at the service of the plunderers, and treats the victim --
when he defends himself -- as a criminal.
	-- Frederic Bastiat, "The Law"

From greg@cosc.canterbury.ac.nz Thu Jan 18 00:22:11 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 18 Jan 2001 13:22:11 +1300 (NZDT)
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To:
Message-ID: <200101180022.NAA00898@s454.cosc.canterbury.ac.nz>

Michael Hudson :

> a < b if and only if b > a.
> This is what the rich comparison code does.

Someone is bound to come up with a use for
comparison operator overloading in which this
isn't true, just to be difficult!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From guido@python.org Thu Jan 18 03:40:31 2001
From: guido@python.org (Guido van Rossum)
Date: Wed, 17 Jan 2001 22:40:31 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.106,2.107
In-Reply-To: Your message of "Wed, 17 Jan 2001 13:45:44 PST." <20010117134544.H7731@lyra.org>
References: <20010117134544.H7731@lyra.org>
Message-ID: <200101180340.WAA00655@cj20424-a.reston1.va.home.com>

> > - Change the in-progress code to use static variables instead of
> >   globals (both the nesting level and the key for the thread dict were
> >   globals but have no reason to be globals; the key can even be a
> >   function-static variable in get_inprogress_dict()).
>
> The "compare_nesting" variable is a bit troublesome long-term -- it will
> cause threading issues in a free-threaded implementation. The solution is to
> put the value into the thread-state.
>
> [ not sure if it matters right now, but just bringing it up ]

Good point -- especially since the in-progress-dict is already part of
the thread state. Jeremy explained to me that the compare_nesting
variable is mostly an optimization (avoiding the work with the
in-progress-dict when we don't know for sure that it's worth it) but
yes, mixing nesting levels (even if the dicts are separate) could
cause coupling or interference between threads...
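
The concern about a shared nesting counter can be sketched at the Python level. What follows is only an analogy for the C-level change being discussed, written in present-day Python: threading.local stands in for the per-thread state, and the names _state, current_level and nesting are hypothetical. Each thread sees its own counter, so a deep comparison in one thread cannot disturb the level seen by another.

```python
import threading
from contextlib import contextmanager

# Per-thread storage; stands in for a field kept in the C thread state.
_state = threading.local()


def current_level():
    """Return the calling thread's nesting level (0 if never set)."""
    return getattr(_state, "level", 0)


@contextmanager
def nesting():
    """Raise the calling thread's nesting level for the duration of a block."""
    _state.level = current_level() + 1
    try:
        yield _state.level
    finally:
        _state.level -= 1


def levels_seen_by_two_threads():
    """Nest twice in a worker thread while the main thread is nested once."""
    seen = {}

    def worker():
        with nesting():
            with nesting() as level:
                seen["worker"] = level

    with nesting() as level:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
        seen["main"] = level
    return seen
```

With a single module-level counter instead of threading.local, the worker's two increments would be added on top of the main thread's, which is the kind of coupling described above.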

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@mojam.com (Skip Montanaro) Thu Jan 18 04:20:30 2001
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Wed, 17 Jan 2001 22:20:30 -0600 (CST)
Subject: [Python-Dev] urllib.urlencode & repeated values
Message-ID: <14950.28430.572215.10643@beluga.mojam.com>

I'm pretty sure this has come up before, but urllib.urlencode doesn't handle
repeated parameters properly. If I call

    urllib.urlencode({"performers": ("U2","Lawrence Martin")})

instead of getting

    performers=U2&performers=Lawrence+Martin

I get a quoted stringified tuple:

    performers=%28%27U2%27%2c+%27Lawrence+Martin%27%29

Obviously, fixing this will change the function's current semantics, but I
think it's worth treating lists and tuples (actually, any sequence) as
repeated values. If the existing semantics are deemed valuable enough, a
third default parameter could be added to switch on the new behavior when
desired.

If others agree I'd be happy to whip up a patch. I think it's a bug.

Skip

From jeremy@alum.mit.edu Thu Jan 18 02:58:19 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 17 Jan 2001 21:58:19 -0500 (EST)
Subject: [Python-Dev] bug in grammar
Message-ID: <14950.23499.275398.963621@localhost.localdomain>

As part of the implementation of PEP 227 (and in an attempt to reach
some low-hanging fruit Guido mentioned on the types-sig long ago), I
have been working on a compiler pass that generates a module-level
symbol table. I recently discovered a bug in the handling of list
comprehensions that was giving me headaches.

I realize now that the problem is with the current grammar and/or
compiler. Here's a simple demonstration; try it in your friendly
python 2.0 interpreter.

>>> [i for i in range(10)] = (1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: unpack list of wrong size

The generated bytecode is:

          0 SET_LINENO          0

          3 SET_LINENO          1
          6 LOAD_CONST          0 (1)
          9 LOAD_CONST          1 (2)
         12 LOAD_CONST          2 (3)
         15 BUILD_TUPLE         3
         18 UNPACK_SEQUENCE     1
         21 STORE_NAME          0 (i)
         24 LOAD_CONST          3 (None)
         27 RETURN_VALUE

I assume this isn't intended :-). The compiler is ignoring everything
after the initial atom in the list comprehension. It's basically
compiling the code as if it were:

[i] = (1, 2, 3)

I'm not sure how to try and fix this. Should the grammar allow one to
construct the example statement above? If not, I'm not sure how to
fix the grammar. If not, I suppose the compiler should detect that
the list comp is misplaced. This seems fairly messy, since there are
about 10 nodes between the expr_stmt and the list_for.

Or is this a cool way to use list comprehensions to generate
ValueErrors?

Jeremy

From akuchlin@mems-exchange.org Thu Jan 18 05:19:31 2001
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Thu, 18 Jan 2001 00:19:31 -0500
Subject: [Python-Dev] Embedded language discussion
Message-ID: <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com>

http://www.kuro5hin.org/?op=displaystory;sid=2001/1/16/11334/2280

The poster is on a project that's trying to use Python, but they're
encountering unspecified problems (perhaps because of the global
interpreter lock).

--amk

From mal@lemburg.com Thu Jan 18 09:32:54 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 18 Jan 2001 10:32:54 +0100
Subject: [Python-Dev] Windows Python totally rad
References:
Message-ID: <3A66B846.3D24B959@lemburg.com>

Tim Peters wrote:
>
> Windows Python runs normally again, modulo four test failures I figure are
> due to the "get rid of assert" patch.

Could you tell me which these are ? The tests tested all passed
just fine, so I guess these must be Windows-related problems.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fredrik@effbot.org Thu Jan 18 06:48:41 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Thu, 18 Jan 2001 07:48:41 +0100
Subject: [Python-Dev] 2.1 alpha: what about the unicode name database?
References: <012901c080a5$306023a0$e46940d5@hagrid>
Message-ID: <008701c0811a$b3371c00$e46940d5@hagrid>

I wrote:
> I've almost sorted it all out. will check it in later tonight (local
> time).

python build problems and real life got in the way.

will 2.1a1 be released according to plan?

will there be a 2.1a2 release?

maybe I should postpone this?

From esr@thyrsus.com Thu Jan 18 07:23:21 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 18 Jan 2001 02:23:21 -0500
Subject: [Python-Dev] Weird use of hash() -- will this work?
Message-ID: <20010118022321.A9021@thyrsus.com>

So I'm writing a module that needs to generate unique cookies. The
module will run inside one of two environments: (1) a trivial test
wrapper, not threaded, and (2) a long-running multithreaded server.

Because Python garbage-collects, hash() of a just-created object isn't
good enough. Because we may be threading, millisecond time isn't good
enough. Because we may *not* be threading, thread ID isn't good
either.

On the other hand, I'm on Linux getting millisecond time resolution.
And it's not hard to notice that an object hash is a memory address.

So, how about `time.time()` + hex(hash([]))?

It looks to me like this will remain unique forever, because another thread
would have to create an object at the same memory address during the
same millisecond to collide.

Furthermore, it looks to me like this hack might be portable to any OS
with a clock tick shorter than its timeslice.

Comments?
--
Eric S. Raymond

Good intentions will always be pleaded for every assumption of
authority.
It is hardly too strong to say that the Constitution was made
to guard the people against the dangers of good intentions. There are
men in all ages who mean to govern well, but they mean to govern. They
promise to be good masters, but they mean to be masters.
	-- Daniel Webster

From ping@lfw.org Thu Jan 18 09:29:13 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Thu, 18 Jan 2001 01:29:13 -0800 (PST)
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: <200101161402.JAA05045@cj20424-a.reston1.va.home.com>
Message-ID:

On Tue, 16 Jan 2001, Guido van Rossum wrote:
> You mean the tp_print and tp_str function slots in type objects,
> right? tp_print *should* always render exactly the same as tp_str.
> tp_print is used by the print statement, not by value display at the
> interactive prompt.

Uh, i hate to disagree with you about your own interpreter, but:

    com_expr_stmt in Python/compile.c inserts a PRINT_EXPR opcode
    if c_interactive is true;

    eval_code2 in Python/ceval.c handles PRINT_EXPR by calling displayhook;

    sys_displayhook in Python/sysmodule.c prints the object by calling
    PyFile_WriteObject on sys.stdout;

    PyFile_WriteObject in Objects/fileobject.c calls PyObject_Print
    if the file is really a PyFileObject;

    PyObject_Print in Objects/object.c calls op->ob_type->tp_print
    if it's not NULL.

The print statement produces a PRINT_ITEM opcode, which invokes
PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW
flag is propagated down to PyObject_Print and into string_print,
where it causes the string to fwrite itself directly without quoting.

> So, string_print most definitely should *not* be changed -- only
> string_repr!

I had to change them both before i actually saw the change in the
interactive interpreter. Actually, your statement above (that the
two should always render the same) seems to imply that if i change
one, i must also change the other.
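
The difference between the two paths can be observed from Python itself. This is a sketch in present-day Python 3 (where print is a function), not code from the thread, and the two helper names are hypothetical: the interactive display hook writes the repr() of a value, while print writes its str().

```python
import io
import sys
from contextlib import redirect_stdout


def shown_at_prompt(value):
    """Capture what the default interactive display hook writes for value.

    Note: the display hook also binds the value to builtins._ as a side
    effect, just as it does at the interactive prompt.
    """
    buf = io.StringIO()
    with redirect_stdout(buf):
        sys.__displayhook__(value)
    return buf.getvalue()


def shown_by_print(value):
    """Capture what print() writes for value."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(value)
    return buf.getvalue()
```

For a string containing a newline, shown_at_prompt() returns the quoted, escaped form, and shown_by_print() returns the raw characters, which mirrors the quoted versus Py_PRINT_RAW distinction traced through the C code above.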
-- ?!ng From sjoerd@oratrix.nl Thu Jan 18 10:11:09 2001 From: sjoerd@oratrix.nl (Sjoerd Mullender) Date: Thu, 18 Jan 2001 11:11:09 +0100 Subject: [Python-Dev] distutils in Python 2.1 not ready for prime time Message-ID: <20010118101110.6D29C31E1B8@bireme.oratrix.nl> I just updated my copy of python with the current CVS version and I am not happy. The current version uses distutils for configuring and compiling most modules that are written in C. That is a nice idea in theory, but in practice it's not ready for prime time yet. The major advantage of using a Setup file is that you can add your own -I and -L compiler flags on a module-by-module basis. I *need* those flags since not all libraries and include files are in standard places (e.g. I need -I/usr/local/include and -L/usr/local/lib for some modules which my compiler doesn't provide by itself). There seems to be no way to tell distutils to supply those flags. The documentation (only on the web site, also not great, but I assume more documentation (at least an up-to-date README) will be provided in the final release) says that that has not yet been implemented. -- Sjoerd Mullender From ping@lfw.org Thu Jan 18 10:14:19 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 18 Jan 2001 02:14:19 -0800 (PST) Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: <3A66BCCC.14997FE3@lemburg.com> Message-ID: I hope you don't mind that i'm taking this over to python-dev, because it led me to discover a more general issue (see below). For the others on python-dev, here's the background: MAL was about to check in the unistr() function, described as follows: > This patch adds a utility function unistr() which works just like > the standard builtin str() -- only that the return value will > always be a Unicode object. > > The patch also adds a new object level C API PyObject_Unicode() > which complements PyObject_Str(). I responded: > Why are unistr() and unicode() two separate functions? 
> > str() performs one task: convert to string. It can convert anything, > including strings or Unicode strings, numbers, instances, etc. > > The other type-named functions e.g. int(), long(), float(), list(), > tuple() are similar in intent. > > Why have unicode() just for converting strings to Unicode strings, > and unistr() for converting everything else to a Unicode string? > What does unistr(x) do differently from unicode(x) if x is a string? MAL responded: > unistr() is meant to complement str() very closely. unicode() > works as constructor for Unicode objects which can also take > care of decoding encoded data. str() and unistr() don't provide > this capability but instead always assume the default encoding. > > There's also a subtle difference in that str() and unistr() > try the tp_str slot which unicode() doesn't. unicode() > supports any character buffer which str() and unistr() don't. Okay, given this explanation, i still feel fairly confident that unicode() should subsume unistr(). Many of the other type-named functions try various slots: int() looks for __int__ float() looks for __float__ long() looks for __long__ str() looks for __str__ In testing this i also discovered the following: >>> class Foo: ... def __int__(self): ... return 3 ... >>> f = Foo() >>> int(f) 3 >>> long(f) Traceback (most recent call last): File "", line 1, in ? AttributeError: Foo instance has no attribute '__long__' >>> float(f) Traceback (most recent call last): File "", line 1, in ? AttributeError: Foo instance has no attribute '__float__' This is kind of surprising. How about: int() looks for __int__ float() looks for __float__, then tries __int__ long() looks for __long__, then tries __int__ str() looks for __str__ unicode() looks for __unicode__, then tries __str__ The extra parameter to unicode() is very similar to the extra parameter to int(), so i think there is a natural parallel here. Hmm... what about the other types? Wow!! __complex__ can produce a segfault! 
>>> complex >>> class Foo: ... def __complex__(self): return 3 ... >>> Foo() <__main__.Foo instance at 0x81e8684> >>> f = _ >>> complex(f) Segmentation fault (core dumped) This happens because builtin_complex first retrieves and saves the PyNumberMethods of the argument (in this case, from the instance), then tries to call __complex__ (in this case, returning 3), and THEN coerces the result using nbr->nb_float if the result is not complex! (This calls the instance's nb_float method on the integer object 3!!) I think __complex__ should probably look for __complex__, then __float__, then __int__. One could argue for __list__, __tuple__, or __dict__, but that seems much weaker; the Pythonic way has always been to implement __getitem__ instead. There is no built-in dict(); if it existed i suppose it would do the opposite of x.items(); again a weak argument, though i might have found such a function useful once or twice. And that about covers the built-in types for data. -- ?!ng From ping@lfw.org Thu Jan 18 10:16:42 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 18 Jan 2001 02:16:42 -0800 (PST) Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Message-ID: On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > str() looks for __str__ Oops. I forgot that str() looks for __str__, then tries __repr__ So, presumably, unicode() should look for __unicode__, then __str__, then __repr__ -- ?!ng From mal@lemburg.com Thu Jan 18 10:51:46 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jan 2001 11:51:46 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: Message-ID: <3A66CAC2.74FC894@lemburg.com> Ka-Ping Yee wrote: > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > str() looks for __str__ > > Oops. I forgot that > > str() looks for __str__, then tries __repr__ > > So, presumably, > > unicode() should look for __unicode__, then __str__, then __repr__ Not quite... str() does this: 1. 
strings are passed back as-is 2. the type slot tp_str is tried 3. the method __str__ is tried 4. Unicode returns are converted to strings 5. anything other than a string return value is rejected unistr() does the same, but makes sure that the return value is a Unicode object. unicode() does the following: 1. for instances, __str__ is called 2. Unicode objects are returned as-is 3. string objects or character buffers are used as basis for decoding 4. decoding is applied to the character buffer and the results are returned I think we should perhaps merge the two approaches into one which then applies all of the above in unicode() (and then forget about unistr()). This might hide some type errors, but since all other generic constructors behave more or less in the same way, I think unicode() should too. Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin@mira.cs.tu-berlin.de Thu Jan 18 10:48:30 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 18 Jan 2001 11:48:30 +0100 Subject: [Python-Dev] Having extensions builtin Message-ID: <200101181048.f0IAmU210251@mira.informatik.hu-berlin.de> With the new distutils configuration scheme, it appears to be difficult to build modules in a non-shared way. Building modules non-shared is desirable when freezing is attempted, and also to reduce the startup time and memory consumption. It is still possible to add modules to Setup or Setup.local, so that they will be built into the interpreter. However, setup.py will still build them in a shared way afterwards. I propose that setup.py builds only those modules that are not builtin. Regards, Martin From martin@mira.cs.tu-berlin.de Thu Jan 18 12:20:06 2001 From: martin@mira.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 18 Jan 2001 13:20:06 +0100 Subject: [Python-Dev] Standard install locations for Python ? Message-ID: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> > Where should Python extensions install themselves and their docs? I feel that extensions should not need to care. For extensions, distutils will pick a location, and the system administrator configuring the package can choose a different location. Unfortunately, distutils does not support the installation of documentation, which I think it should. Now switching sides, as an administrator, I'd wish distutils to follow the system conventions by default. That means on Linux, documentation should go into the system's directory, which is /usr/share/doc according to latest standards. Distributions vary, so distutils should find out - e.g. by querying the location from rpm. In addition, when building RPMs, distutils should declare these files as %doc in the spec file, so RPM will install them following the system conventions. On Windows, the convention apparently is to put the documentation "nearby" the software, so it should probably go into Doc or a subdirectory thereof. On Unix, there appears to be no standard location, unless the documentation consists of man pages or perhaps info files. So /share/doc is probably a place as good as any other. Regards, Martin From martin@mira.cs.tu-berlin.de Thu Jan 18 10:39:30 2001 From: martin@mira.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 18 Jan 2001 11:39:30 +0100 Subject: [Python-Dev] SSL detection problem Message-ID: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> The distutils-based configuration fails to build on my system (SuSE 7.0) with the error

/usr/src/python/Modules/socketmodule.c:159: rsa.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:160: crypto.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:161: x509.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:162: pem.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:163: ssl.h: No such file or directory

The problem is that these header files are in /usr/include/openssl, which is not in the standard include search path. So the obvious request is: could this be fixed? I guess when setup.py finds the openssl library, it should also try to find ssl.h, in some obvious locations. The not-so-obvious question: How can one work around such a problem with the new setup scheme? In the old scheme, I could have chosen to either provide the right -I option in Modules/Setup, to disable SSL support, or to disable the _socket module altogether. How can I achieve either configuration with the new scheme? Regards, Martin P.S. As a quick hack, I added a custom include_dirs parameter to the SSL extension. From martin@mira.cs.tu-berlin.de Thu Jan 18 12:39:54 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 18 Jan 2001 13:39:54 +0100 Subject: [Python-Dev] bug in grammar Message-ID: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> > Should the grammar allow one to construct the example statement > above? It should not. Please note that the grammar allows a number of other things, e.g. a+b = c (pass this to parser.suite to see details) > If not, I'm not sure how to fix the grammar.
The central problem is that it allows testlist on the LHS of an augassign or '=', whereas the language only allows a small subset in that position. It is not possible to restrict the grammar in itself, as that will necessarily produce a conflict - you only know that the '+' was incorrect when you see the '='. > I suppose the compiler should detect that the list comp is misplaced I think there should be a well-formedness pass in-between. I.e. after the AST has been built, a single pass should descend through the tree, looking for an expr_statement with more than a single testlist. Once it finds one, it should confirm that this really is a well-formed lvalue (in C speak). In this case, the test should be that each term is an atom without factors. If the parser itself performs such checks, the compiler could be simplified in many places, I guess. Regards, Martin From thomas@xs4all.net Thu Jan 18 09:53:14 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 18 Jan 2001 10:53:14 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010117180947.E9384@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Wed, Jan 17, 2001 at 06:09:47PM -0500 References: <20010117234925.A17392@xs4all.nl> <20010117180947.E9384@kronos.cnri.reston.va.us> Message-ID: <20010118105314.D17392@xs4all.nl> On Wed, Jan 17, 2001 at 06:09:47PM -0500, Andrew Kuchling wrote: > >On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by > >setup.py. Also, SSL support for the socket module was not enabled, though > >OpenSSL is installed, in the default path. > > Can you take a look at the detection code in setup.py and see what's > going wrong. I believe it should be found if OpenSSL is in > /usr/local/, but /usr/contrib isn't checked currently. Well, OpenSSL rests in the default location, which is /usr/local/ssl/include/openssl. Haven't the time to look into it right now, sorry.
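The kind of header search under discussion here can be sketched roughly as follows. This is not the detection code from setup.py: the helper name find_header_dir and the CANDIDATES list are hypothetical, made up purely to show the idea of probing several include directories for openssl/ssl.h.

```python
import os

def find_header_dir(header, candidate_dirs):
    """Return the first directory in candidate_dirs that contains header, else None."""
    for directory in candidate_dirs:
        if os.path.isfile(os.path.join(directory, header)):
            return directory
    return None

# Hypothetical search list built from the paths mentioned in this thread;
# real systems differ and a real build script would need more care.
CANDIDATES = [
    "/usr/include",
    "/usr/local/include",
    "/usr/local/ssl/include",
    "/usr/contrib/include",
]

if __name__ == "__main__":
    # Prints the matching directory, or None if the header is not found.
    print(find_header_dir(os.path.join("openssl", "ssl.h"), CANDIDATES))
```

Searching for "openssl/ssl.h" relative to each candidate, rather than for a bare "ssl.h", would let a single include directory serve both the module and the OpenSSL headers that include each other in that form.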
> >The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian, > >which I personally like a lot, though it's probably a bitch to autodetect. > >(I tried, using autoconf ;-P) > There's code to handle Debian, though I have no way of testing it, and > it worked on Neil's Debian box for some reason. Search for > debian_tcl_include in setup.py, and see if you can fix it. Ah, yes. The problem in my case is that the *library* files are just in /usr/lib, but the include files are not. I re-indented the code to pull the debian-specific code out of the 'if prefix + os.sep + 'lib' not in lib_dirs' block, and it works now. Haven't tested it on other code yet, but I think it should work regardless. > >distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory) > Are you sure setup.py is up to date; do a 'cvs update setup.py' to check. > You might get a "setup.py is in the way; remove it' message if you > downloaded the first setup.py script manually. D'oh, I guess not. I thought I did (I did on all other platforms :) but I guess I didn't, 'cause it works now. Thanx. > >without 'make clean' anymore. You get a lot of undefined-symbol warnings > >(see below.) If you run 'make clean;make test' it also doesn't work, because > >the build directory is not in the Python library path, and regrtest.py > >requires (at least) the time module. > Again, be sure the tree is up to date; I think this stems from > attempting to compile the signal module as shared, which doesn't work. This happened even with completely fresh, newly checked out trees, on all but FreeBSD (three different trees: Debian woody, BSDI 4.0 and BSDI 4.1) so I'm pretty sure that's not it. 
It works now, though, so I guess the move from a dynamic signalmodule to a static one does the trick ;) I got 'make test' working by applying the following patch to Makefile{,.in}, and running 'make PYTHONPATH=.: test' (determining builddir by hand, for now.):

***************
*** 216,223 ****
  TESTPYTHON=	./python$(EXE) -tt
  test:		all
  		-rm -f $(srcdir)/Lib/test/*.py[co]
! 		-PYTHONPATH= $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)
! 		PYTHONPATH= $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)
  
  # Install everything
  install:	altinstall bininstall maninstall
--- 216,223 ----
  TESTPYTHON=	./python$(EXE) -tt
  test:		all
  		-rm -f $(srcdir)/Lib/test/*.py[co]
! 		-PYTHONPATH=$(PYTHONPATH) $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)
! 		PYTHONPATH=$(PYTHONPATH) $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)
  
  # Install everything
  install:	altinstall bininstall maninstall

And because of that, I also noticed something funny: BSDI calls itself 'BSD/OS ', so distutils actually makes a directory called 'lib.bsd' and 'temp.bsd', with inside those a directory 'os--i386-2.1'. Is that a distutils bug, a setup.py bug, or intentional behaviour of one of the two ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From nas@arctrix.com Thu Jan 18 07:59:22 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Wed, 17 Jan 2001 23:59:22 -0800 Subject: [Python-Dev] new Makefile.in Message-ID: <20010117235922.A12356@glacier.fnational.com> Spurred on by comments made by Andrew, I spent some time last night overhauling the Python Makefiles. I now have a toplevel non-recursive Makefile.in that seems to work fairly well. I'm pretty sure it still should be portable. It doesn't use includes or any special GNU make features. It is half the size of the old Makefiles. The build is faster and it's now easier to follow if something goes wrong. A question: is it possible to break the Python static library up? For example, instead of having libpython.a have Parser/parser.a, Objects/objects.a, etc?
There would still only be one shared library. This would speed up incremental builds and also help Andrew with PEP 229. I'm thinking that the Makefile could do something like this:

    all: python$(EXE)

    PYLIBS= Parser/parser.a Objects/objects.a ... Modules/modules.a

    python$(EXE): $(PYLIBS)
    	$(LINKCC) -o python$(EXE) $(PYLIBS) ...

    Modules/modules.a: minpython$(EXE)
    	./minpython$(EXE) setup.py

AFAICT, the only thing affected by splitting up the static library is Misc/Makefile.pre.in. Is this correct? Neil From guido@digicool.com Thu Jan 18 14:52:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 09:52:23 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Thu, 18 Jan 2001 13:22:11 +1300." <200101180022.NAA00898@s454.cosc.canterbury.ac.nz> References: <200101180022.NAA00898@s454.cosc.canterbury.ac.nz> Message-ID: <200101181452.JAA06899@cj20424-a.reston1.va.home.com> > > a < b if and only if b > a. > > This is what the rich comparison code does. > > Someone is bound to come up with a use for comparison > operator overloading in which this isn't true, just > to be difficult! They'll get what they deserve -- this will be clearly documented! --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Thu Jan 18 15:15:25 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jan 2001 10:15:25 -0500 (EST) Subject: [Python-Dev] Re: bug in grammar In-Reply-To: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> References: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> Message-ID: <14951.2189.14393.52725@localhost.localdomain> If I summarize your suggestion, I think you've said that ideally the grammar should not allow assignment to list comprehensions (or a variety of other constructs) -- but it doesn't so the compiler has to deal with it. This morning it seemed a lot easier to fix the bug than it did last night :-).
com_assign() already has a number of checks for syntax errors in assignments. A test for list comprehensions belongs at the same place as tests for assignment to [] and augmented assignments applied to lists. I'll include a fix for assignment to list comprehensions in my big compiler patch. Jeremy From akuchlin@mems-exchange.org Thu Jan 18 15:28:19 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 10:28:19 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <20010118022321.A9021@thyrsus.com>; from esr@thyrsus.com on Thu, Jan 18, 2001 at 02:23:21AM -0500 References: <20010118022321.A9021@thyrsus.com> Message-ID: <20010118102819.A21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 02:23:21AM -0500, Eric S. Raymond wrote: >And it's not hard to notice that an object hash is a memory address. Unless the object defines __hash__()! If you want the memory address, use id() instead. --amk From akuchlin@mems-exchange.org Thu Jan 18 15:30:36 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 10:30:36 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010118004400.B17392@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 18, 2001 at 12:44:00AM +0100 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> Message-ID: <20010118103036.B21503@kronos.cnri.reston.va.us> >On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: >> On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by >> setup.py. Also, SSL support for the socket module was not enabled, though >> OpenSSL is installed, in the default path. What does the layout of /usr/contrib look like? Is it /usr/contrib/openssl/include/, /usr/contrib/include/, or something else? >Strangely enough, this problem does not exist on FreeBSD. I can run 'make' >or 'make test' after 'make' just fine. 
'make test' still doesn't work >because of the incorrect library path, but it doesn't barf like the other >systems (BSDI and Debian Linux) Have you already run "make install"? Perhaps it's picking up the already-installed modules when running "make test", because it really shouldn't be working. --amk From gward@cnri.reston.va.us Thu Jan 18 15:42:51 2001 From: gward@cnri.reston.va.us (Greg Ward) Date: Thu, 18 Jan 2001 10:42:51 -0500 Subject: [Python-Dev] Where's Greg Ward ? In-Reply-To: <3A6237D7.673BBB30@lemburg.com>; from mal@lemburg.com on Mon, Jan 15, 2001 at 12:35:51AM +0100 References: <3A6237D7.673BBB30@lemburg.com> Message-ID: <20010118104250.A27049@thrak.cnri.reston.va.us> On 15 January 2001, M.-A. Lemburg said: > He seems to be offline and the people on the distutils list have some > patches and other things which would be nice to have in distutils > for 2.1. Tim was right -- I'm *really* close to being back online. Just have to figure out why qmail's not answering port 25 and why LILO doesn't like my newly repartitioned hard drive, and all will be well. Oh yeah, and getting insurance, and a credit card, and unpacking all these cardboard boxes, and getting some furniture, ... (If anyone is considering it, I do *not* recommend buying a new computer, moving internationally, and getting a high speed home Internet connection all at the same time.) BTW I quite approve of Andrew being temporary Distutils dictator. Should have done it in December, but I didn't think I'd be out of commission for so long. Sigh. 
Greg From moshez@zadka.site.co.il Fri Jan 19 00:19:45 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 19 Jan 2001 02:19:45 +0200 (IST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: References: Message-ID: <20010119001945.80DC8A83E@darjeeling.zadka.site.co.il> On Thu, 18 Jan 2001 01:29:13 -0800 (PST), Ka-Ping Yee wrote: > On Tue, 16 Jan 2001, Guido van Rossum wrote: > > You mean the tp_print and tp_str function slots in type objects, > > right? tp_print *should* always render exactly the same as tp_str. > > tp_print is used by the print statement, not by value display at the > > interactive prompt. > > Uh, i hate to disagree with you about your own interpreter, but: > > com_expr_stmt in Python/compile.c > inserts a PRINT_EXPR opcode if c_interactive is true; > eval_code2 in Python/ceval.c > handles PRINT_EXPR by calling displayhook; > sys_displayhook in Python/sysmodule.c > prints the object by calling PyFile_WriteObject on sys.stdout; > PyFile_WriteObject in Objects/fileobject.c > calls PyObject_Print if the file is really a PyFileObject; > PyObject_Print in Objects/object.c > calls op->ob_type->tp_print if it's not NULL. > > The print statement produces a PRINT_ITEM opcode, which invokes > PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW > flag is propagated down to PyObject_Print and into string_print, > where it causes the string to fwrite itself directly without quoting. > > > So, string_print most definitely should *not* be changed -- only > > string_repr! > > I had to change them both before i actually saw the change in the > interactive interpreter. Actually, your statement above (that the > two should always render the same) seems to imply that if i change > one, i must also change the other. > > > -- ?!ng > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > > -- Moshe Zadka This is a signature anti-virus. 
Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From guido@digicool.com Thu Jan 18 16:23:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:23:19 -0500 Subject: [Python-Dev] unistr() vs. unicode() Message-ID: <200101181623.LAA07389@cj20424-a.reston1.va.home.com> Ping wrote in response to a SourceForge mail about MAL's unistr() checking: ------- Forwarded Message Date: Wed, 17 Jan 2001 23:51:48 -0800 From: Ka-Ping Yee To: noreply@sourceforge.net cc: mal@lemburg.com, guido@python.org, patches@python.org Subject: Re: [Patches] [Patch #101664] Add new unistr() builtin + PyObject_Unicode() C API On Wed, 17 Jan 2001 noreply@sourceforge.net wrote: > Comment: > This patch adds a utility function unistr() which works just like > the standard builtin str() -- only that the return value will > always be a Unicode object. Sorry for barging in, but i have an issue/question: Why are unistr() and unicode() two separate functions? str() performs one task: convert to string. It can convert anything, including strings or Unicode strings, numbers, instances, etc. The other type-named functions e.g. int(), long(), float(), list(), tuple() are similar in intent. Why have unicode() just for converting strings to Unicode strings, and unistr() for converting everything else to a Unicode string? What does unistr(x) do differently from unicode(x) if x is a string? - -- ?!ng ------- End of Forwarded Message (And no, Tim, this did *not* end up in the patches list because I made Barry remove the reply-to. SourceForge mails never had reply-to to begin with.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Jan 18 16:28:12 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:28:12 -0500 Subject: [Python-Dev] urllib.urlencode & repeated values In-Reply-To: Your message of "Wed, 17 Jan 2001 22:20:30 CST."
<14950.28430.572215.10643@beluga.mojam.com> References: <14950.28430.572215.10643@beluga.mojam.com> Message-ID: <200101181628.LAA07406@cj20424-a.reston1.va.home.com> > I'm pretty sure this has come up before, but urllib.urlencode doesn't handle > repeated parameters properly. If I call > > urllib.urlencode({"performers": ("U2","Lawrence Martin")}) > > instead of getting > > performers=U2&performers=Lawrence+Martin > > I get a quoted stringified tuple: > > performers=%28%27U2%27%2c+%27Lawrence+Martin%27%29 > > Obviously, fixing this will change the function's current semantics, but I > think it's worth treating lists and tuples (actually, any sequence) as > repeated values. If the existing semantics are deemed valuable enough, a > third default parameter could be added to switch on the new behavior when > desired. > > If others agree I'd be happy to whip up a patch. I think it's a bug. Agreed. If you can come up with something that supports all sequence types, and treats singleton sequences the same as their one and only item, it would even be the inverse of cgi.parse_qs()! --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Thu Jan 18 16:43:49 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 18 Jan 2001 17:43:49 +0100 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: ; from ping@lfw.org on Thu, Jan 18, 2001 at 02:14:19AM -0800 References: <3A66BCCC.14997FE3@lemburg.com> Message-ID: <20010118174349.E17392@xs4all.nl> On Thu, Jan 18, 2001 at 02:14:19AM -0800, Ka-Ping Yee wrote: > Wow!! __complex__ can produce a segfault! > >>> complex > > >>> class Foo: > ... def __complex__(self): return 3 > ... 
> >>> Foo() > <__main__.Foo instance at 0x81e8684> > >>> f = _ > >>> complex(f) > Segmentation fault (core dumped) > This happens because builtin_complex first retrieves and saves > the PyNumberMethods of the argument (in this case, from the > instance), then tries to call __complex__ (in this case, returning 3), > and THEN coerces the result using nbr->nb_float if the result is > not complex! (This calls the instance's nb_float method on the > integer object 3!!) I've noticed that lurking bug in the coercion code when I added augmented assignment, though I don't recall whether I fixed it then, nor do I know if that part's been "touched" by the recent coercion changes. If none of the coercion champions speak up, I'll look at this sometime this weekend. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From akuchlin@mems-exchange.org Thu Jan 18 16:50:28 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 11:50:28 -0500 Subject: [Python-Dev] SSL detection problem In-Reply-To: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 11:39:30AM +0100 References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> Message-ID: <20010118115028.D21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 11:39:30AM +0100, Martin v. Loewis wrote: >The problem is that these header files are in /usr/include/openssl, >which is not in the standard include search path. I have an improved version of setup.py (not checked in yet) that tries to do better, checking for both header and library files. One point: the OpenSSL docs imply that the headers should be loaded as , not as ; the header files themselves use the openssl/*.h form, which means you'd need two -I directives.. I'll patch the socket module accordingly. >The not-so-obvious question: How can one work-around such a problem >with the new setup scheme? 
In the old scheme, I could have chosen to >either provide the right -I option in Modules/Setup, to disable SSL >support, or to disable the _socket module altogether. How can I >achieve either configuration with the new scheme? I still need to implement command-line options to specify such overrides, but that couldn't possibly get done in time for alpha1. I was thinking of something like ---libs="foo bar", ---includes="/usr/include/blah/", and so forth. Suggestions for a better interface welcomed... --amk From guido@digicool.com Thu Jan 18 16:55:39 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:55:39 -0500 Subject: [Python-Dev] bug in grammar In-Reply-To: Your message of "Wed, 17 Jan 2001 21:58:19 EST." <14950.23499.275398.963621@localhost.localdomain> References: <14950.23499.275398.963621@localhost.localdomain> Message-ID: <200101181655.LAA08001@cj20424-a.reston1.va.home.com> > As part of the implementation of PEP 227 (and in an attempt to reach > some low-hanging fruit Guido mentioned on the types-sig long ago), I > have been working on a compiler pass that generates a module-level > symbol table. I recently discovered a bug in the handling of list > comprehensions that was giving me headaches. > > I realize now that the problem is with the current grammar and/or > compiler. Here's a simple demonstration; try it in your friendly > python 2.0 interpreter. > > >>> [i for i in range(10)] = (1, 2, 3) > Traceback (most recent call last): > File "", line 1, in ? > ValueError: unpack list of wrong size > > The generated bytecode is: > > 0 SET_LINENO 0 > > 3 SET_LINENO 1 > 6 LOAD_CONST 0 (1) > 9 LOAD_CONST 1 (2) > 12 LOAD_CONST 2 (3) > 15 BUILD_TUPLE 3 > 18 UNPACK_SEQUENCE 1 > 21 STORE_NAME 0 (i) > 24 LOAD_CONST 3 (None) > 27 RETURN_VALUE > > I assume this isn't intended :-). The compiler is ignoring everything > after the initial atom in the list comprehension. 
It's basically > compiling the code as if it were: > > [i] = (1, 2, 3) > > I'm not sure how to try and fix this. Should the grammar allow one to > construct the example statement above? If not, I'm not sure how to > fix the grammar. If not, I suppose the compiler should detect that > the list comp is misplaced. This seems fairly messy, since there are > about 10 nodes between the expr_stmt and the list_for. > > Or is this a cool way to use list comprehensions to generate > ValueErrors? Good catch! Not everything cool deserves to be preserved. It looks like this happens because the code that traverses lists on the left-hand side of an assignment was never told about list comprehensions. You're right that the grammar can't be fixed; it's for the same reason that it can't be fixed to disallow "f() = 1". The solution is to add a test for this to the compiler that flags this as an error. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Jan 18 17:01:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 12:01:02 -0500 Subject: [Python-Dev] Embedded language discussion In-Reply-To: Your message of "Thu, 18 Jan 2001 00:19:31 EST." <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com> References: <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com> Message-ID: <200101181701.MAA08046@cj20424-a.reston1.va.home.com> > http://www.kuro5hin.org/?op=displaystory;sid=2001/1/16/11334/2280 > > The poster is on a project that's trying to use Python, but they're > encountering unspecified problems (perhaps because of the global > interpreter lock). I've sent the poster an email asking to be more specific about his questions; probably doing the right dance when calling Python from a thread created in C++ should do the trick. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Jan 18 17:04:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 12:04:43 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Thu, 18 Jan 2001 01:29:13 PST." References: Message-ID: <200101181704.MAA08074@cj20424-a.reston1.va.home.com> > On Tue, 16 Jan 2001, Guido van Rossum wrote: > > You mean the tp_print and tp_str function slots in type objects, > > right? tp_print *should* always render exactly the same as tp_str. > > tp_print is used by the print statement, not by value display at the > > interactive prompt. > > Uh, i hate to disagree with you about your own interpreter, but: > > com_expr_stmt in Python/compile.c > inserts a PRINT_EXPR opcode if c_interactive is true; > eval_code2 in Python/ceval.c > handles PRINT_EXPR by calling displayhook; > sys_displayhook in Python/sysmodule.c > prints the object by calling PyFile_WriteObject on sys.stdout; > PyFile_WriteObject in Objects/fileobject.c > calls PyObject_Print if the file is really a PyFileObject; > PyObject_Print in Objects/object.c > calls op->ob_type->tp_print if it's not NULL. > > The print statement produces a PRINT_ITEM opcode, which invokes > PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW > flag is propagated down to PyObject_Print and into string_print, > where it causes the string to fwrite itself directly without quoting. > > > So, string_print most definitely should *not* be changed -- only > > string_repr! > > I had to change them both before i actually saw the change in the > interactive interpreter. Actually, your statement above (that the > two should always render the same) seems to imply that if i change > one, i must also change the other. Oops. I'm so grateful that we have a collective memory! :-) You're right: tp_print() can be invoked in two modes: with or without Py_PRINT_RAW flag. 
In raw mode, it should behave exactly like str(); in cooked mode exactly like repr().

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin@mira.cs.tu-berlin.de Thu Jan 18 19:31:29 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 18 Jan 2001 20:31:29 +0100
Subject: [Python-Dev] Weird use of hash() -- will this work?
Message-ID: <200101181931.f0IJVTc00932@mira.informatik.hu-berlin.de>

> Comments?

Yes, three of them:

1. To guarantee uniqueness at least within the process, the easiest solution would be

import time

if using_threads:
    import thread
    lock = thread.allocate_lock()
    _acquire = lock.acquire_lock
    _release = lock.release_lock
else:
    # Without threads, acquiring and releasing are no-ops.
    _acquire = _release = lambda: None

# Seed the counter with the start-up time.
_cookie = time.time()

def getCookie():
    global _cookie
    _acquire()
    _cookie += 1
    result = _cookie
    _release()
    return result

2. Invoking [] repeatedly likely returns an object with the same id() when called twice in a row (i.e. with no intermediate objects allocated in-between).

3. Why did you send this question to python-dev? python-list is more appropriate.

Regards, Martin

From tim.one@home.com Thu Jan 18 19:49:12 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 18 Jan 2001 14:49:12 -0500
Subject: [Python-Dev] Windows Python totally rad
In-Reply-To: <3A66B846.3D24B959@lemburg.com>
Message-ID:

[MAL]
> Could you tell me which these are [new test failures on Windows]?
> The tests tested all passed just fine, so I guess these must be
> Windows-related problems.

Not to worry, all the tests pass now. Don't want to spend time backtracking, as I'm not the one who fixed them and don't know who did. FWIW, they "smelled like" shallow failures (== easy to diagnose & fix).

onward!-ly y'rs - tim

From martin@mira.cs.tu-berlin.de Thu Jan 18 19:37:04 2001
From: martin@mira.cs.tu-berlin.de (Martin v.
Loewis)
Date: Thu, 18 Jan 2001 20:37:04 +0100
Subject: [Python-Dev] new Makefile.in
Message-ID: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de>

> A question: is it possible to break the Python static library up?
> For example, instead of having libpython.a have
> Parser/parser.a, Objects/objects.a, etc?

Please, no. It was that way in Python 1.4 (libModules, libObjects, and I forgot which the others were :-). We had that all documented in our book, then Guido tried to build an extension module for the first time, saw that these many libraries were terrible, and combined them into a single one.

That was a good thing, and we have it documented in our book. I'm not at all looking forward to answering all the questions why the build infrastructure of Python changed yet again...

Regards, Martin

From fdrake@acm.org Thu Jan 18 20:22:30 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 18 Jan 2001 15:22:30 -0500 (EST)
Subject: [Python-Dev] weak references in 2.1alpha
Message-ID: <14951.20614.176140.672447@cj42289-a.reston1.va.home.com>

I'd like to put the weak references patch into the alpha, but haven't received any feedback on the latest patch. I have some comments from Martin von Löwis on the PEP that need to be addressed, and that could change the implementation a bit, but the basic machinery seems to be pretty reasonable and works for me.

Does anyone have any objections to it going into the alpha? I'd like to enable more wide-spread testing.

Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From mal@lemburg.com Thu Jan 18 17:10:14 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 18 Jan 2001 18:10:14 +0100
Subject: [Python-Dev] Weird use of hash() -- will this work?
References: <20010118022321.A9021@thyrsus.com>
Message-ID: <3A672376.4B951848@lemburg.com>

"Eric S. Raymond" wrote:
>
> So I'm writing a module that needs to generate unique cookies.
The > module will run inside one of two environments: (1) a trivial test wrapper, > not threaded, and (2) a lomg-running multithreaded server. > > Because Python garbage-collects, hash() of a just-created object isn't > good enough. Because we may be threading, millisecond time isn't > good enough. Because we may *not* be threading, thread ID isn't good > either. > > On the other hand, I'm on Linux getting millisecond time resolution. > And it's not hard to notice that an object hash is a memory address. > > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because another thread > would have to create an object at the same memory address during the same > millisecond to collide. > > Furthermore, it looks to me like this hack might be portable to any OS > with a clock tick shorter than its timeslice. > > Comments? A combination of time.time(), process id and counter should work in all cases. Make sure you use a lock around the counter, though. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Thu Jan 18 17:30:52 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jan 2001 18:30:52 +0100 Subject: [Python-Dev] Standard install locations for Python ? References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> Message-ID: <3A67284C.B6C617A@lemburg.com> "Martin v. Loewis" wrote: > > > Where should Python extensions install themselves and their docs? > > I feel that extensions should not need to care. For extensions, > distutils will pick a location, and the system administrator > configuration the package can chose a different location. > > Unfortunately, distutils does not support the installation of > documentation, which I think it should. Right. 
> Now switching sides, as an administrator, I'd wish distutils to follow
> the system conventions by default.
>
> That means on Linux, documentation should go into the system's
> directory, which is /usr/share/doc according to latest
> standards. Distributions vary, so distutils should find out - e.g. by
> querying the location from rpm. In addition, when building RPMs,
> distutils should declare these files as %doc in the spec file, so RPM
> will install it following the system conventions.

You currently have to do this by hand (e.g. in setup.cfg or using the doc_files option). It should be fairly easy, though, to add a command similar to install_data which then applies all the necessary magic to the paths.

Is there a common landmark to look for on Unix (e.g. in case the system does not use RPM)? Which paths should distutils check? (/usr/share/doc/packages, /usr/share/doc, /usr/doc/packages, /usr/doc in that order?)

> On Windows, the convention apparently is to put the documentation
> "nearby" the software, so it should probably go into Doc or a
> subdirectory thereof.

Na, I'd rather have \Python\Site-Packages and \Python\Site-Docs for that purpose.

> On Unix, there appears to be no standard location, unless the
> documentation consists of man pages or perhaps info files. So
> /share/doc is probably a place as good as any other.
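
As a rough illustration of the lookup order asked about above (/usr/share/doc/packages, /usr/share/doc, /usr/doc/packages, /usr/doc), here is a minimal sketch in present-day Python 3. It is not distutils code: find_doc_dir, CANDIDATE_DOC_DIRS and the injectable isdir argument are hypothetical names invented for this example.

```python
import os

# Candidate locations, in the order floated in the message above.
CANDIDATE_DOC_DIRS = [
    "/usr/share/doc/packages",
    "/usr/share/doc",
    "/usr/doc/packages",
    "/usr/doc",
]


def find_doc_dir(candidates=CANDIDATE_DOC_DIRS, isdir=os.path.isdir):
    """Return the first candidate that exists as a directory, or None."""
    for path in candidates:
        if isdir(path):
            return path
    return None


# Which directory (if any) this machine offers; the answer depends on the system.
doc_dir = find_doc_dir()
```

Passing isdir as a parameter is only there so the order can be checked without touching the real filesystem; whether probing like this beats asking rpm is exactly the open question in the thread.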
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip@mojam.com (Skip Montanaro) Thu Jan 18 17:45:29 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 18 Jan 2001 11:45:29 -0600 (CST) Subject: [Python-Dev] urllib.urlencode & repeated values In-Reply-To: <200101181628.LAA07406@cj20424-a.reston1.va.home.com> References: <14950.28430.572215.10643@beluga.mojam.com> <200101181628.LAA07406@cj20424-a.reston1.va.home.com> Message-ID: <14951.11193.150232.564700@beluga.mojam.com> >> If others agree I'd be happy to whip up a patch. I think it's a bug. Guido> Agreed. Patch #103314: http://sourceforge.net/patch/?func=detailpatch&patch_id=103314&group_id=5470 I assigned it to Fred for doc review. Skip From akuchlin@mems-exchange.org Thu Jan 18 18:56:40 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 13:56:40 -0500 Subject: [Python-Dev] Standard install locations for Python ? In-Reply-To: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 01:20:06PM +0100 References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> Message-ID: <20010118135640.G21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 01:20:06PM +0100, Martin v. Loewis wrote: >On Unix, there appears to be no standard location, unless the >documentation consists of man pages or perhaps info files. So >/share/doc is probably a place as good as any other. This seems like a good suggestion. Should docs go in /share/doc/python/, then? Perhaps with subdirectories for different extensions? 
--amk

From tismer@tismer.com Thu Jan 18 21:39:18 2001
From: tismer@tismer.com (Christian Tismer)
Date: Thu, 18 Jan 2001 22:39:18 +0100
Subject: [Python-Dev] Rich comparison confusion
References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com>
Message-ID: <3A676286.C33823B4@tismer.com>

Guido van Rossum wrote:
>
> > I'm a bit confused about Guido's rich comparison stuff. In the description
> > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__.
>
> Yes. By this I mean that A<B and B>A are interchangeable, ditto for
> A<=B and B>=A. Also A==B interchanges for B==A, and A!=B for B!=A.

...

> I think what threw you off was the ambiguity of "inverse". This means
> Boolean negation. I'm not relying on Boolean negation here -- I'm
> relying on the more fundamental property that a<b and b>a have the
> same outcome.

Yes, the "inverse" is confusing. Is what you mean the "reverse"? Like the other right-side operators such as __radd__, is it correct to think of __ge__ == __rle__ if __rle__ was written in the same fashion as __radd__? It looks semantically the same, although the reason for a call might be different.

And if my above view is right, would it perhaps be less confusing to use in fact __rle__ and __rlt__, or would it be more confusing, since __rlt__ would also be invoked left-to-right, implementing ">"?

Not sure if I added even more confusion.

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

From tim.one@home.com Thu Jan 18 21:53:44 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 18 Jan 2001 16:53:44 -0500
Subject: [Python-Dev] Weird use of hash() -- will this work?
In-Reply-To: <20010118022321.A9021@thyrsus.com>
Message-ID:

[Eric S.
Raymond, in search of uniqueness] > ... > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because > another thread would have to create an object at the same memory > address during the same millisecond to collide. I'm afraid it's much more vulnerable than that: Python's thread granularity is at the bytecode level, not the statement level. It's very easy for thread A and B to see the same `time.time()` value, and after that arbitrarily long amounts of time may pass before they get around to doing the hash([]) business. When hash() completes, the storage for [] is immediately reclaimed under CPython, and it's again very easy for another thread to reuse the storage. I'm attaching an executable test case. It uses time.clock() because that has much higher resolution than time.time() on Windows (better than microsecond), but rounds it back to three decimal places to simulate millisecond resolution. The first three runs: saw 14600 unique in 30000 total saw 14597 unique in 30000 total saw 14645 unique in 30000 total So it sucks bigtime on my box. Better idea: borrow the _ThreadSafeCounter class from the tail end of the current CVS tempfile.py. The code works whether or not threads are available. Then `time.time()` + str(_counter.get_next()) is thread-safe. For that matter, plain old str(_counter.get_next()) will always be unique within a single run. However, in either case you're still not safe against concurrent *processes* generating the same cookies. tempfile.py has to worry about that too, of course, so the *best* idea is to call tempfile.mktemp() and leave it at that. It wastes some time checking the filesystem for a file of the same name (which, btw, goes much quicker on Linux than on Windows). >From time to time, somebody suggests adding a uuid generator to Python. Not a bad idea, but nobody wants to do all the x-platform work. 
like-capturing-snowflakes-ly y'rs - tim

from threading import Thread
import time

N = 1000
NTHREADS = 30

class Worker(Thread):
    def __init__(self):
        Thread.__init__(self)

    def run(self):
        # Round the clock to three decimals to simulate millisecond resolution.
        self.generated = [`round(time.clock(), 3)` + hex(hash([]))
                          for i in range(N)]

threads = []
for i in range(NTHREADS):
    threads.append(Worker())
for t in threads:
    t.start()

# Count the distinct cookies generated across all threads.
d = {}
total = 0
for t in threads:
    t.join()
    total += len(t.generated)
    for g in t.generated:
        d[g] = 1

print "saw", len(d), "unique in", total, "total"

From tismer@tismer.com Thu Jan 18 21:56:08 2001
From: tismer@tismer.com (Christian Tismer)
Date: Thu, 18 Jan 2001 22:56:08 +0100
Subject: [Python-Dev] Weird use of hash() -- will this work?
References: <20010118022321.A9021@thyrsus.com>
Message-ID: <3A676678.7E4AF278@tismer.com>

"Eric S. Raymond" wrote:
>
> So I'm writing a module that needs to generate unique cookies. The
> module will run inside one of two environments: (1) a trivial test wrapper,
> not threaded, and (2) a long-running multithreaded server.

What do you mean by "unique"? Unique regarding your long-running server? If so, then I wonder why one should do

>
> So, how about `time.time()` + hex(hash([]))?
>

instead of using a single, simple counter for all sessions?

> It looks to me like this will remain unique forever, because another thread
> would have to create an object at the same memory address during the same
> millisecond to collide.
>
> Furthermore, it looks to me like this hack might be portable to any OS
> with a clock tick shorter than its timeslice.
>
> Comments?

If I'm not overlooking something fundamental, the counter approach seems to be simpler and most portable. :-)

but-sometimes-my-brain-malfunctions-badly-ly y'rs - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From nas@arctrix.com Thu Jan 18 15:07:13 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 07:07:13 -0800 Subject: [Python-Dev] Re: new Makefile.in In-Reply-To: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 08:37:04PM +0100 References: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de> Message-ID: <20010118070713.A13581@glacier.fnational.com> On Thu, Jan 18, 2001 at 08:37:04PM +0100, Martin v. Loewis wrote: > > A question: is it possible to break the Python static library up? > > For example, instead of having libpython.a have > > Parser/parser.a, Objects/objects.a, etc? > > Please, no. Okay. > I'm not at all looking forward to answering all the questions > why the build infrastructure of Python changed yet again... My Makefile patch shouldn't change the way you build extensions. Neil From tim.one@home.com Fri Jan 19 01:45:42 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 18 Jan 2001 20:45:42 -0500 Subject: [Python-Dev] unistr() vs. unicode() In-Reply-To: <200101181623.LAA07389@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > (And no, Tim, this did *not* end up in the patches list because I made > Barry remove the reply-to. SourceForge mails never had reply-to to > begin with.) Aha! Another thing to blame Barry for . From tim.one@home.com Thu Jan 18 22:11:23 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 18 Jan 2001 17:11:23 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <008701c0811a$b3371c00$e46940d5@hagrid> Message-ID: [/F] > python build problems and real life got in the way. > > will 2.1a1 be released according to plan? will there > be a 2.1a2 release? maybe I should postpone this? 
Depends on how confident you are. Since this is purely an optimization, I don't think it *needs* to get into a1 in order to make the final release; postponing a few days would be better than pushing too hard on something that's proved hairier than anticipated. do-the-right-thing-whatever-that-is-ly y'rs - tim From guido@digicool.com Fri Jan 19 02:17:36 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 21:17:36 -0500 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Your message of "Thu, 18 Jan 2001 02:14:19 PST." References: Message-ID: <200101190217.VAA01497@cj20424-a.reston1.va.home.com> > I hope you don't mind that i'm taking this over to python-dev, > because it led me to discover a more general issue (see below). No -- in fact I wanted to see this here! (My mail backlog seems to be clearing -- or maybe it was only a temporary unclogging... :-) > For the others on python-dev, here's the background: MAL was > about to check in the unistr() function, described as follows: > > > This patch adds a utility function unistr() which works just like > > the standard builtin str() -- only that the return value will > > always be a Unicode object. > > > > The patch also adds a new object level C API PyObject_Unicode() > > which complements PyObject_Str(). > > I responded: > > Why are unistr() and unicode() two separate functions? > > > > str() performs one task: convert to string. It can convert anything, > > including strings or Unicode strings, numbers, instances, etc. > > > > The other type-named functions e.g. int(), long(), float(), list(), > > tuple() are similar in intent. > > > > Why have unicode() just for converting strings to Unicode strings, > > and unistr() for converting everything else to a Unicode string? > > What does unistr(x) do differently from unicode(x) if x is a string? > > MAL responded: > > unistr() is meant to complement str() very closely. 
unicode()
> > works as constructor for Unicode objects which can also take
> > care of decoding encoded data. str() and unistr() don't provide
> > this capability but instead always assume the default encoding.
> >
> > There's also a subtle difference in that str() and unistr()
> > try the tp_str slot which unicode() doesn't. unicode()
> > supports any character buffer which str() and unistr() don't.
>
> Okay, given this explanation, i still feel fairly confident
> that unicode() should subsume unistr(). Many of the other
> type-named functions try various slots:
>
>     int() looks for __int__
>     float() looks for __float__
>     long() looks for __long__
>     str() looks for __str__
>
> In testing this i also discovered the following:
>
>     >>> class Foo:
>     ...     def __int__(self):
>     ...         return 3
>     ...
>     >>> f = Foo()
>     >>> int(f)
>     3
>     >>> long(f)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     AttributeError: Foo instance has no attribute '__long__'
>     >>> float(f)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     AttributeError: Foo instance has no attribute '__float__'
>
> This is kind of surprising. How about:
>
>     int() looks for __int__
>     float() looks for __float__, then tries __int__
>     long() looks for __long__, then tries __int__
>     str() looks for __str__
>     unicode() looks for __unicode__, then tries __str__

For the numeric types this could perhaps be done by calling PyNumber_Long() from PyNumber_Float(), calling PyNumber_Int() from PyNumber_Long(). Complex is a bit of an exception -- there's no PyNumber_Complex(), just because I felt that nobody would need it. :-)

> The extra parameter to unicode() is very similar to the extra
> parameter to int(), so i think there is a natural parallel here.

Makes sense.

> Hmm... what about the other types?
>
> Wow!! __complex__ can produce a segfault!
>
>     >>> complex
>     <built-in function complex>
>     >>> class Foo:
>     ...     def __complex__(self): return 3
>     ...
>     >>> Foo()
>     <__main__.Foo instance at 0x81e8684>
>     >>> f = _
>     >>> complex(f)
>     Segmentation fault (core dumped)
>
> This happens because builtin_complex first retrieves and saves
> the PyNumberMethods of the argument (in this case, from the
> instance), then tries to call __complex__ (in this case, returning 3),
> and THEN coerces the result using nbr->nb_float if the result is
> not complex! (This calls the instance's nb_float method on the
> integer object 3!!)

Thanks! Fixed now in CVS.

> I think __complex__ should probably look for __complex__, then
> __float__, then __int__.

I made it call PyNumber_Float(), which could be made smarter as explained above.

> One could argue for __list__, __tuple__, or __dict__, but that
> seems much weaker; the Pythonic way has always been to implement
> __getitem__ instead.

Yes -- since __list__ etc. aren't used, let's not add them.

> There is no built-in dict(); if it existed
> i suppose it would do the opposite of x.items(); again a weak
> argument, though i might have found such a function useful once
> or twice.

Yeah, it's not very common. Dict comprehensions anyone?

    d = {k:v for k,v in zip(range(10), range(10))} # :-)

> And that about covers the built-in types for data.

Thanks!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com Thu Jan 18 22:13:14 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 18 Jan 2001 17:13:14 -0500
Subject: [Python-Dev] Weird use of hash() -- will this work?
In-Reply-To: <20010118022321.A9021@thyrsus.com>
Message-ID:

BTW, why doesn't hash([]) blow up in 2.1a1? In 2.0 it raised

    TypeError: unhashable type

Did someone change this deliberately?

From tim.one@home.com Thu Jan 18 22:58:22 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 18 Jan 2001 17:58:22 -0500
Subject: [Python-Dev] Weird use of hash() -- will this work?
In-Reply-To:
Message-ID:

[Tim whined]
> BTW, why doesn't hash([]) blow up in 2.1a1?
In 2.0 it raised > > TypeError: unhashable type > > Did someone change this deliberately? Answer: it's an unintended consequence of the rich-comparison changes. Guido knows how to fix it and probably will. The list type grew a tp_richcompare slot but lost its non-NULL tp_compare pointer. PyObject_Hash wasn't changed accordingly (it now believes lists support neither direct hashing nor comparison, so does them a favor and hashes their memory addresses). Something trickier is probably going wrong elsewhere too, but I won't try to remember what that is unless Guido gets hit by a bus tonight. in-which-case-we-can-push-off-the-funeral-until-after-the-release-ly y'rs - tim From thomas@xs4all.net Thu Jan 18 23:02:09 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 00:02:09 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010118103036.B21503@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 18, 2001 at 10:30:36AM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> Message-ID: <20010119000209.F17392@xs4all.nl> On Thu, Jan 18, 2001 at 10:30:36AM -0500, Andrew Kuchling wrote: > >On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: > >> On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by > >> setup.py. Also, SSL support for the socket module was not enabled, though > >> OpenSSL is installed, in the default path. > What does the layout of /usr/contrib look like? Is it > /usr/contrib/openssl/include/, /usr/contrib/include/, or something > else? Actually, it's /usr/local, not /usr/contrib. I've never installed OpenSSL in /usr/contrib, though I could, and maybe BSDI will, in the future. (BSDI installs its own software in /usr, and optional free, pre-compiled software in /usr/contrib.) 
OpenSSL installs into /usr/local/ssl/include/openssl by default, and installing into /usr/contrib would make it /usr/contrib/ssl/include/openssl. > >Strangely enough, this problem does not exist on FreeBSD. I can run 'make' > >or 'make test' after 'make' just fine. 'make test' still doesn't work > >because of the incorrect library path, but it doesn't barf like the other > >systems (BSDI and Debian Linux) > Have you already run "make install"? Perhaps it's picking up the > already-installed modules when running "make test", because it really > shouldn't be working. Hm, I think you misread my statement. 'make test' *doesn't* work. But it doesn't barf on the signal module being built dynamically either. You fixed that for every platform now, I was just pointing out that this was not a problem for FreeBSD for some reason. 'make test' still doesn't work, but I can make it work by specifying a hand-tweaked PYTHONPATH that includes the OS/arch-dependant build directory. This brings me to another point: how can 'make test' work at all ? Does python always check for './Lib' (and './Modules') for modules ? If that's specific for 'make test' and running python in the source distribution, that sounds like a bit of a weird hack. I can't find any such hackery in the source, but I also can't figure out how else it's working :) More-later--Meteor-((c)-1979)-is-on-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin@mira.cs.tu-berlin.de Thu Jan 18 23:14:05 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 19 Jan 2001 00:14:05 +0100 Subject: [Python-Dev] weak references in 2.1alpha Message-ID: <200101182314.f0INE5B00338@mira.informatik.hu-berlin.de> > Does anyone have any objections to it going into the alpha? I'd like to request that the .clear() method is removed from the patch for this alpha, and also that the weak dictionaries are removed until their semantics is clarified. 
It's always easier to add stuff later than to remove it.

Regards, Martin

From nas@arctrix.com Thu Jan 18 16:31:09 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 18 Jan 2001 08:31:09 -0800
Subject: [Python-Dev] SSL detection problem
In-Reply-To: <20010118115028.D21503@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 18, 2001 at 11:50:28AM -0500
References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> <20010118115028.D21503@kronos.cnri.reston.va.us>
Message-ID: <20010118083109.A13972@glacier.fnational.com>

On Thu, Jan 18, 2001 at 11:50:28AM -0500, Andrew Kuchling wrote:
> On Thu, Jan 18, 2001 at 11:39:30AM +0100, Martin v. Loewis wrote:
> >The not-so-obvious question: How can one work-around such a problem
> >with the new setup scheme?
>
> I still need to implement command-line options to specify such
> overrides, but that couldn't possibly get done in time for alpha1.

My non-recursive makefile patch allows you to use both Setup and setup.py. It's not quite ready for prime time but it's getting close. I would be interested if someone could point me to the source for some crappy makes. I've tried GNU make, BSD 4.4 pmake and whatever comes with SunOS 5.6. Searching for "make" doesn't work too well. :-(

Neil

From thomas@xs4all.net Thu Jan 18 23:45:32 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 19 Jan 2001 00:45:32 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1
In-Reply-To: ; from gvanrossum@users.sourceforge.net on Thu, Jan 18, 2001 at 08:46:54AM -0800
References:
Message-ID: <20010119004532.G17392@xs4all.nl>

On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote:

> filename = '/tmp/delete_me'

This reminds me: we need a portable way to handle test-files :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
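
On the portable test-file question: a minimal sketch, assuming present-day Python 3 and not anything that existed in the 2.1 test suite, of letting the tempfile module pick the location instead of hard-coding '/tmp/delete_me'. The helper names make_test_filename and remove_quietly are made up for illustration.

```python
import os
import tempfile


def make_test_filename(suffix=".tmp"):
    """Create an empty scratch file in the platform's temp directory; return its path."""
    fd, path = tempfile.mkstemp(prefix="delete_me_", suffix=suffix)
    os.close(fd)  # the test reopens the file by name, as with the fixed path
    return path


def remove_quietly(path):
    """Delete a scratch file, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


filename = make_test_filename()
try:
    with open(filename, "w") as f:
        f.write("scratch data")
    with open(filename) as f:
        content = f.read()
finally:
    # Clean up even if the test body raised.
    remove_quietly(filename)
```

The directory is whatever tempfile.gettempdir() reports on the host, so the same test can run on systems that have no /tmp.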
From guido@digicool.com Thu Jan 18 23:56:04 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 18:56:04 -0500 Subject: [Python-Dev] new Makefile.in In-Reply-To: Your message of "Wed, 17 Jan 2001 23:59:22 PST." <20010117235922.A12356@glacier.fnational.com> References: <20010117235922.A12356@glacier.fnational.com> Message-ID: <200101182356.SAA19616@cj20424-a.reston1.va.home.com> Hi Neil, My mail suffers delays of 12-24 hours while mail.python.org is working on some enormous backlog. So I just saw your message about a new Makefile... > Spurred on by comments made by Andrew, I spent some time last > night overhauling the Python Makefiles. I now have a toplevel > non-recursive Makefile.in that seems to work fairly well. I'm > pretty sure it still should be portable. It doesn't use includes > or any special GNU make features. It is half the size of the old > Makefiles. The build is faster and its now easier to follow if > something goes wrong. I'd like to see this! > A question: is it possible to break the Python static library up? > For example, instead of having libpython.a have > Parser/parser.a, Objects/objects.a, etc? There > would still only be one shared library. This would speed up > incremental builds and also help Andrew with PEP 229. I'm > thinking that the Makefile do something like this: > > all: python$(EXE) > > PYLIBS= Parser/parser.a Objects/objects.a ... Modules/modules.a > > python$(EXE): $(PYLIBS) > $(LINKCC) -o python$(EXE) $(PYLIBS) ... > > Modules/modules.a: minpython$(EXE) > ./minpython$(EXE) setup.py Sounds cool to me. (Where's the patch for a shared libpython???) > AFACT, the only thing affected by splitting up the static library > is Misc/Makefile.pre.in. Is this correct? Yeah, and that should be phased out in favor of distutils anyway. Now would be a great time! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 00:34:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:34:02 -0500 Subject: [Python-Dev] Mail delays and SourceForge bugs Message-ID: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Through no fault of my own, email to guido@python.org (which includes the python-dev list) is currently suffering delays of 12-24 hours. I have a feeling this is probably true for all mail going through python.org, so checkin messages ans python-dev discussion have been greatly frustrated, with about 1 day to go until the planned 2.1a1 release date! On top of that, the SourceForge bug manager has developed a problem: all references to http://sourceforge.net/bugs/?group_id=5470/ come back with this error: An error occured in the logger. ERROR: pg_atoi: error in "5470/": can't parse "/" I'm still hoping to release Python 2.1a1 tomorrow, unless Jeremy tells me that he needs more time for his nested scopes patch. In the mean time, please everybody, do check out the latest CVS version and give it a good workout! Andrew's setup.py still has some rough edges, I believe that in order to run it from the build directory you still have to point PYTHONPATH to the build/lib* directory, where he hides the shared libraries for all modules. Andrew, are you planning to fix this? If there's anything that you need me to know about, please mail to guido@digicool.com -- that address suffers no delays. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Fri Jan 19 00:51:19 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 18 Jan 2001 19:51:19 -0500 Subject: [Python-Dev] RE: [Pycabal] Mail delays and SourceForge bugs In-Reply-To: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Message-ID: [Guido. 
notes current woes w/ python.org email, and SourceForge] Note too that, over the past two days, it's not possible to follow Python-Dev email via http://mail.python.org/pipermail/python-dev/2001-January/date.html either, as (unlike during previous occurrences of python.org email delays) msgs aren't showing up there in a timely fashion either (for example, the msg of Guido's to which I'm replying isn't there). good-thing-guido's-so-easy-to-channel-ly y'rs - tim From guido@digicool.com Fri Jan 19 00:52:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:52:02 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Your message of "Thu, 18 Jan 2001 02:23:21 EST." <20010118022321.A9021@thyrsus.com> References: <20010118022321.A9021@thyrsus.com> Message-ID: <200101190052.TAA26849@cj20424-a.reston1.va.home.com> > So I'm writing a module that needs to generate unique cookies. The > module will run inside one of two environments: (1) a trivial test wrapper, > not threaded, and (2) a long-running multithreaded server. > > Because Python garbage-collects, hash() of a just-created object isn't > good enough. Because we may be threading, millisecond time isn't > good enough. Because we may *not* be threading, thread ID isn't good > either. > > On the other hand, I'm on Linux getting millisecond time resolution. > And it's not hard to notice that an object hash is a memory address. > > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because another thread > would have to create an object at the same memory address during the same > millisecond to collide. > > Furthermore, it looks to me like this hack might be portable to any OS > with a clock tick shorter than its timeslice. Argh! hash([]) should raise TypeError, since lists are not hashable objects -- mutable objects can't be allowed as dictionary keys.
This (hash([])) accidentally returned a value for a brief period after I checked in the rich comparisons -- I've fixed that now. But not to worry: instead of using hash([]), you can use hex(id([])). Same thing. On the other hand, remember how much you can do in a millisecond! (E.g. I can call tempfile.mktemp() 5 times in that time.) And when you create an object and immediately delete it, the next object created is very likely to have the same address. But what's wrong with this:

    try:
        from thread import get_ident as unique_id
    except ImportError:
        def unique_id(): return id([])

--Guido van Rossum (home page: http://www.python.org/~guido/) From billtut@microsoft.com Fri Jan 19 00:53:15 2001 From: billtut@microsoft.com (Bill Tutt) Date: Thu, 18 Jan 2001 16:53:15 -0800 Subject: [Python-Dev] MS CRT crashing: Message-ID: <58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> From the internal support squad: Turns out the C standard explicitly says you can't have an input follow output on a stream without doing fflush or fseek in-between, to make sure the stdio buffer is cleared. So this program is illegal. They've gone and resolved it by design. FYI, Bill -----Original Message----- From: Bill Tutt Sent: Wednesday, January 10, 2001 1:09 AM To: 'Tim Peters' Cc: 'Mark Hammond' Subject: RE: [Python-Dev] xreadlines : readlines :: xrange : range Heh. I've tossed this code to internal support. I'll give a yell if I hear anything interesting. Thanks for the C test case, Bill From guido@digicool.com Fri Jan 19 00:53:13 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:53:13 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: Your message of "Thu, 18 Jan 2001 07:48:41 +0100."
<008701c0811a$b3371c00$e46940d5@hagrid> References: <012901c080a5$306023a0$e46940d5@hagrid> <008701c0811a$b3371c00$e46940d5@hagrid> Message-ID: <200101190053.TAA26862@cj20424-a.reston1.va.home.com> > I wrote: > > I've almost sorted it all out. will check it in later tonight (local > > time). > > python build problems and real life got in the way. What? You've got a real life? Can't be allowed, not when we're working on a release! > will 2.1a1 be released according to plan? will there > be a 2.1a2 release? maybe I should postpone this? Please check it in, there's still time (2.1a1 won't go out before Friday night, possibly it'll be delayed until Monday). And yes, there will be a 2.1a2. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 00:55:15 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:55:15 -0500 Subject: [Python-Dev] SSL detection problem In-Reply-To: Your message of "Thu, 18 Jan 2001 11:39:30 +0100." <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> Message-ID: <200101190055.TAA26905@cj20424-a.reston1.va.home.com> > The distutils-based configuration fails to build on my system (SuSE > 7.0) with the error > > /usr/src/python/Modules/socketmodule.c:159: rsa.h: Datei oder Verzeichnis nicht > gefunden > /usr/src/python/Modules/socketmodule.c:160: crypto.h: Datei oder Verzeichnis nicht gefunden > /usr/src/python/Modules/socketmodule.c:161: x509.h: Datei oder Verzeichnis nicht gefunden > /usr/src/python/Modules/socketmodule.c:162: pem.h: Datei oder Verzeichnis nicht > gefunden > /usr/src/python/Modules/socketmodule.c:163: ssl.h: Datei oder Verzeichnis nicht > gefunden The same happened to Fred on Mandrake 7.0 (except for the German messages :-). 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 00:58:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:58:16 -0500 Subject: [Python-Dev] Re: unistr() vs. unicode() Message-ID: <200101190058.TAA26931@cj20424-a.reston1.va.home.com> MAL's reply to Ping in this thread. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Thu, 18 Jan 2001 10:52:12 +0100 From: "M.-A. Lemburg" To: Ka-Ping Yee cc: guido@python.org, patches@python.org Subject: Re: [Patches] [Patch #101664] Add new unistr() builtin + PyObject_Unic ode()C API Ka-Ping Yee wrote: > > On Wed, 17 Jan 2001 noreply@sourceforge.net wrote: > > Comment: > > This patch adds a utility function unistr() which works just like > > the standard builtin str() -- only that the return value will > > always be a Unicode object. > > Sorry for barging in, but i have an issue/question: > > Why are unistr() and unicode() two separate functions? > > str() performs one task: convert to string. It can convert anything, > including strings or Unicode strings, numbers, instances, etc. > > The other type-named functions e.g. int(), long(), float(), list(), > tuple() are similar in intent. > > Why have unicode() just for converting strings to Unicode strings, > and unistr() for converting everything else to a Unicode string? > What does unistr(x) do differently from unicode(x) if x is a string? unistr() is meant to complement str() very closely. unicode() works as constructor for Unicode objects which can also take care of decoding encoded data. str() and unistr() don't provide this capability but instead always assume the default encoding. There's also a subtle difference in that str() and unistr() try the tp_str slot which unicode() doesn't. unicode() supports any character buffer which str() and unistr() don't. 
Perhaps you are right though in that we should make all three APIs behave in the same way with respect to coercing their arguments. This could hide some errors... still in the long run, I agree that the existing setup probably causes more confusion than good. Guido ? - -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ _______________________________________________ Patches mailing list Patches@python.org http://mail.python.org/mailman/listinfo/patches ------- End of Forwarded Message From guido@digicool.com Fri Jan 19 01:04:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 20:04:22 -0500 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Your message of "Thu, 18 Jan 2001 11:51:46 +0100." <3A66CAC2.74FC894@lemburg.com> References: <3A66CAC2.74FC894@lemburg.com> Message-ID: <200101190104.UAA27056@cj20424-a.reston1.va.home.com> > Ka-Ping Yee wrote: > > > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > > str() looks for __str__ > > > > Oops. I forgot that > > > > str() looks for __str__, then tries __repr__ > > > > So, presumably, > > > > unicode() should look for __unicode__, then __str__, then __repr__ > > Not quite... str() does this: > > 1. strings are passed back as-is > 2. the type slot tp_str is tried > 3. the method __str__ is tried > 4. Unicode returns are converted to strings > 5. anything other than a string return value is rejected > > unistr() does the same, but makes sure that the return > value is an Unicode object. > > unicode() does the following: > > 1. for instances, __str__ is called > 2. Unicode objects are returned as-is > 3. string objects or character buffers are used as basis for decoding > 4. 
decoding is applied to the character buffer and the results > are returned > > I think we should perhaps merge the two approaches into one > which then applies all of the above in unicode() (and then > forget about unistr()). This might hide some type errors, > but since all other generic constructors behave more or less > in the same way, I think unicode() should too. Yes, I would like to see these merged. I noticed that e.g. there is special code to compare Unicode strings in the comparison code (I think I *could* get rid of this now we have rich comparisons, but I decided to put that off), and when I looked at it, it uses the same set of conversions as unicode(). Some of these seem questionable to me -- why do you try so many ways to get a string out of an object? (On the other hand the merge of unicode() and unistr() might have this effect anyway...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 01:06:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 20:06:23 -0500 Subject: [Python-Dev] bug in grammar In-Reply-To: Your message of "Thu, 18 Jan 2001 13:39:54 +0100." <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> References: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> Message-ID: <200101190106.UAA27073@cj20424-a.reston1.va.home.com> > I think there should be a well-formedness pass in-between. I.e. after > the AST has been built, a single pass should descend through the tree, > looking for an expr_statement with more than a single testlist. Once > it finds one, it should confirm that this really is a well-formed > lvalue (in C speak). In this case, the test should be that each term > is an atom without factors. Good idea. > If the parser itself performs such checks, the compiler could be > simplified in many places, I guess.
Not sure that in practice it makes much of a difference: there aren't that many of these kinds of checks, and writing a separate pass is expensive. On the other hand, Jeremy is just writing a separate pass anyway, to collect name usage information for the nested scopes. Maybe it could be folded into that pass... --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Fri Jan 19 03:20:08 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jan 2001 22:20:08 -0500 (EST) Subject: [Python-Dev] deprecated regex used by un-deprecated modules Message-ID: <14951.45672.806978.600944@localhost.localdomain> There are several modules in the standard library that use the regex module. When they are imported, they print a warning about using a deprecated module. I think this is bad form. Either the modules that depend on regex should be updated to use re or they should be deprecated themselves. I discovered the following offenders: asynchat, knee, poplib, reconvert. I would suggest fixing asynchat and poplib and deprecating knee. The reconvert module may be a special case. Jeremy From jeremy@alum.mit.edu Fri Jan 19 03:31:02 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jan 2001 22:31:02 -0500 (EST) Subject: [Python-Dev] setup.py and build subdirectories Message-ID: <14951.46326.743921.988828@localhost.localdomain> I have a bunch of build directories under the source tree, e.g. src/python/dist/src/build src/python/dist/src/build-pg src/python/dist/src/build-O3 ... The new setup.py did not successfully build in these directories. I hacked distutils a tiny bit and had some success. Patch below. I'm not sure if the approach is kosher, but it allows me to build successfully. I also have a problem running 'make test' from these build directories. The reference to the distutils build directory has '..' prepended to it that shouldn't exist.
Jeremy Index: setup.py =================================================================== RCS file: /cvsroot/python/python/dist/src/setup.py,v retrieving revision 1.8 diff -c -r1.8 setup.py *** setup.py 2001/01/18 20:39:34 1.8 --- setup.py 2001/01/19 03:26:55 *************** *** 536,540 **** # --install-platlib if __name__ == '__main__': ! sysconfig.set_python_build() main() --- 536,541 ---- # --install-platlib if __name__ == '__main__': ! path, file = os.path.split(sys.argv[0]) ! sysconfig.set_python_build(path) main() Index: Lib/distutils/sysconfig.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/distutils/sysconfig.py,v retrieving revision 1.31 diff -c -r1.31 sysconfig.py *** Lib/distutils/sysconfig.py 2001/01/17 15:16:52 1.31 --- Lib/distutils/sysconfig.py 2001/01/19 03:27:01 *************** *** 24,37 **** python_build = 0 ! def set_python_build(): """Set the python_build flag to true; this means that we're building Python itself. Only called from the setup.py script shipped with Python. """ global python_build ! python_build = 1 def get_python_inc(plat_specific=0, prefix=None): """Return the directory containing installed Python header files. --- 24,37 ---- python_build = 0 ! def set_python_build(loc): """Set the python_build flag to true; this means that we're building Python itself. Only called from the setup.py script shipped with Python. """ global python_build ! python_build = loc + "/" def get_python_inc(plat_specific=0, prefix=None): """Return the directory containing installed Python header files. *************** *** 48,54 **** prefix = (plat_specific and EXEC_PREFIX or PREFIX) if os.name == "posix": if python_build: ! return "Include/" return os.path.join(prefix, "include", "python" + sys.version[:3]) elif os.name == "nt": return os.path.join(prefix, "Include") # include or Include? 
--- 48,54 ---- prefix = (plat_specific and EXEC_PREFIX or PREFIX) if os.name == "posix": if python_build: ! return python_build + "Include/" return os.path.join(prefix, "include", "python" + sys.version[:3]) elif os.name == "nt": return os.path.join(prefix, "Include") # include or Include? From tim.one@home.com Fri Jan 19 03:46:16 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 18 Jan 2001 22:46:16 -0500 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() Message-ID: [attribution lost] > There is no built-in dict(); if it existed i suppose it would do > the opposite of x.items(); again a weak argument, though i might > have found such a function useful once or twice. [Guido] > Yeah, it's not very common. Dict comprehensions anyone? > > d = {k:v for k,v in zip(range(10), range(10))} # :-) It's very common in Perl code, but is in no sense the inverse of .items() there: when you build a dict from a list L in Perl, it acts like Python {L[0]: L[1], L[2]: L[3], L[4]: L[5], ... } That's what seems most practical most often; e.g., when crunching over text files with records of the form key value (e.g., mail headers are of this form; simple contact databases; to-do lists segregated by date; etc), whatever fancy re.split() is used to break things apart naturally returns a flat list. A list of two-tuples is natural only if it was obtained from another dict's .items() <0.9 wink>. 
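For illustration, the Perl-style pairing described above can be sketched in present-day Python. This is only a sketch under stated assumptions: the dict() builtin and the helper name dict_from_flat are not part of the Python discussed in this thread (the thread itself notes there is no built-in dict()), and the sample header text is made up.

```python
import re


def dict_from_flat(items):
    """Pair items[0]:items[1], items[2]:items[3], ... (hypothetical helper)."""
    if len(items) % 2:
        raise ValueError("need an even number of items")
    # Even-indexed entries become keys, odd-indexed entries become values.
    return dict(zip(items[0::2], items[1::2]))


# A "key: value" text split by one regular expression yields a flat list,
# which is the shape the pairing rule expects.
fields = re.split(r":\s*|\n", "From: tim\nSubject: dicts\nDate: today")
headers = dict_from_flat(fields)
```

A list of two-tuples, by contrast, would need no helper at all in modern Python, which is the .items() inverse mentioned at the start of the thread.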
pushing-the-limits-of-"practicality-beats-purity"?-ly y'rs - tim From tim.one@home.com Fri Jan 19 06:00:27 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 19 Jan 2001 01:00:27 -0500 Subject: [Python-Dev] test_urllib failing on Windows Message-ID: test test_urllib crashed -- exceptions.AssertionError: urllib.quote problem From tim.one@home.com Fri Jan 19 06:39:30 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 19 Jan 2001 01:39:30 -0500 Subject: [Python-Dev] (no subject) Message-ID: [some MS internal support group] > Turns out the C standard explicitly says you can't have an input > follow output on a stream without doing fflush or fseek in-between, > to make sure the stdio buffer is cleared. So this program is illegal. It's undefined (there are no "illegal" programs -- that word doesn't appear in the std; "undefined" does and has a precise technical meaning). In the presence of threads -- which the C std doesn't mention -- you have to address issues the std doesn't touch. To date, MS's is the only C runtime we've seen that corrupts itself in this situation. It can do anything it likes short of blowing up and still be considered a good threaded implementation. As is, it has to be considered sub-standard, in the ordinary sense of displaying worse behavior than other threaded C stdio implementations. It falls short there on other counts too (like the lack of getc_unlocked() & friends), but internal corruption is a particularly egregious failing.
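For illustration only, the rule quoted above (flush or reposition between writing to a stream and reading from it) can be mirrored in Python rather than C. This is a sketch, not the C test case from the thread: write_then_read is a hypothetical helper, and present-day Python file objects use their own buffering layer, so the example shows the pattern rather than the MS CRT behaviour.

```python
import tempfile


def write_then_read(text):
    """Write text to a scratch file, reposition, then read it back."""
    with tempfile.TemporaryFile("w+") as f:
        f.write(text)
        # Reposition between output and input, as the C rule requires
        # (fseek or fflush in C; seek here).
        f.seek(0)
        return f.read()
```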
and-that's-the-end-of-it-for-me-ly y'rs - tim From mwh21@cam.ac.uk Fri Jan 19 08:31:18 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 19 Jan 2001 08:31:18 +0000 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Thomas Wouters's message of "Fri, 19 Jan 2001 00:02:09 +0100" References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: Thomas Wouters writes: > This brings me to another point: how can 'make test' work at all ? Does > python always check for './Lib' (and './Modules') for modules ? If that's > specific for 'make test' and running python in the source distribution, that > sounds like a bit of a weird hack. I can't find any such hackery in the > source, but I also can't figure out how else it's working :) It's in Modules/getpath.c Cheers, M. -- I really hope there's a catastrophic bug insome future e-mail program where if you try and send an attachment it cancels your ISP account, deletes your harddrive, and pisses in your coffee -- Adam Rixey From gstein@lyra.org Fri Jan 19 08:38:54 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 19 Jan 2001 00:38:54 -0800 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) In-Reply-To: ; from gvanrossum@users.sourceforge.net on Thu, Jan 18, 2001 at 04:28:10PM -0800 References: Message-ID: <20010119003854.F7731@lyra.org> On Thu, Jan 18, 2001 at 04:28:10PM -0800, Guido van Rossum wrote: >... > PyTypeObject PyCursesWindow_Type = { > ! PyObject_HEAD_INIT(NULL) > 0, /*ob_size*/ > "curses window", /*tp_name*/ >... > --- 2432,2443 ---- > /* Initialization function for the module */ > > ! 
DL_EXPORT(void) > init_curses(void) > { > PyObject *m, *d, *v, *c_api_object; > static void *PyCurses_API[PyCurses_API_pointers]; > + > + /* Initialize object type */ > + PyCursesWindow_Type.ob_type = &PyType_Type; > > /* Initialize the C API pointer array */ I've never truly understood this. Is it because Windows cannot initialize (at load-time) a pointer to a data structure that is located in a different DLL? It is a bit painful to keep moving inits from load-time to run-time. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one@home.com Fri Jan 19 09:01:22 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 19 Jan 2001 04:01:22 -0500 Subject: [Python-Dev] test_urllib failing on Windows Message-ID: Bet it was failing everywhere; it's fixed now. From moshez@zadka.site.co.il Fri Jan 19 17:53:36 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Fri, 19 Jan 2001 19:53:36 +0200 (IST) Subject: [Python-Dev] Dbm failure Message-ID: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il> test test_dbm skipped -- /home/moshez/prog/src/python/python/dist/src/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey Did it happen to anyone else? Anything else you need to know? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From mal@lemburg.com Fri Jan 19 09:58:08 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 10:58:08 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> Message-ID: <3A680FB0.AED2DB55@lemburg.com> Guido van Rossum wrote: > > > Ka-Ping Yee wrote: > > > > > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > > > str() looks for __str__ > > > > > > Oops. 
I forgot that > > > > > > str() looks for __str__, then tries __repr__ > > > > > > So, presumably, > > > > > > unicode() should look for __unicode__, then __str__, then __repr__ > > > > Not quite... str() does this: > > > > 1. strings are passed back as-is > > 2. the type slot tp_str is tried > > 3. the method __str__ is tried > > 4. Unicode returns are converted to strings > > 5. anything other than a string return value is rejected > > > > unistr() does the same, but makes sure that the return > > value is an Unicode object. > > > > unicode() does the following: > > > > 1. for instances, __str__ is called > > 2. Unicode objects are returned as-is > > 3. string objects or character buffers are used as basis for decoding > > 4. decoding is applied to the character buffer and the results > > are returned > > > > I think we should perhaps merge the two approaches into one > > which then applies all of the above in unicode() (and then > > forget about unistr()). This might hide some type errors, > > but since all other generic constructors behave more or less > > in the same way, I think unicode() should too. > > Yes, I would like to see these merged. I noticed that e.g. there is > special code to compare Unicode strings in the comparison code (I > think I *could* get rid of this now we have rich comparisons, but I > decided to put that off), and when I looked at it, it uses the same set > of conversions as unicode(). Some of these seem questionable to me -- > why do you try so many ways to get a string out of an object? (On the > other hand the merge of unicode() and unistr() might have this effect > anyway...) ... because there are so many ways to get at string representations of objects in Python at C level. If we agree to merge the semantics of the two APIs, then str() would have to change too: is this desirable ?
(IMHO, yes)

Here's what we could do:

a) merge the semantics of unistr() into unicode()
b) apply the same semantics in str()
c) remove unistr() -- how's that for a short-living builtin ;)

About the semantics: These should be backward compatible to str() in that everything that worked before should continue to work after the merge.

A strawman for processing str() and unicode():

1. strings/Unicode is passed back as-is
2. tp_str is tried
3. the method __str__ is tried
4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer)
5. for str(): Unicode return values are converted to strings using the default encoding
   for unicode(): Unicode return values are passed back as-is; string return values are decoded according to the encoding parameter
6. the return object is type-checked: str() will always return a string object, unicode() always a Unicode object

Note that passing back Unicode is only allowed in case no encoding was given. Otherwise an exception is raised: you can't decode Unicode.

As extension we could add encoding and error parameters to str() as well. The result would be either an encoding of Unicode objects passed back by tp_str or __str__ or a recoding of string objects returned by checks 2, 3 or 4.

If we agree to take this approach, then we should remove the unistr() Python API before the alpha ships.

-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@effbot.org Fri Jan 19 10:19:06 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 11:19:06 +0100 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) References: <20010119003854.F7731@lyra.org> Message-ID: <010c01c08201$4b0ec050$e46940d5@hagrid> greg wrote: > I've never truly understood this.
Is it because Windows cannot initialize > (at load-time) a pointer to a data structure that is located in a different > DLL? Windows can do it (via DLL initialization code), but the compiler doesn't generate initialization code for C programs. you can compile the module as C++, but that's also a bit painful... From jack@oratrix.nl Fri Jan 19 11:02:00 2001 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 19 Jan 2001 12:02:00 +0100 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments Message-ID: <20010119110200.9E455373C95@snelboot.oratrix.nl> I get the impression that I'm currently seeing a non-NULL third argument in my (C) methods even though the method is called without keyword arguments. Is this new semantics that I missed the discussion about, or is this a bug? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From thomas@xs4all.net Fri Jan 19 12:22:06 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:22:06 +0100 Subject: [Python-Dev] deprecated regex used by un-deprecated modules In-Reply-To: <14951.45672.806978.600944@localhost.localdomain>; from jeremy@alum.mit.edu on Thu, Jan 18, 2001 at 10:20:08PM -0500 References: <14951.45672.806978.600944@localhost.localdomain> Message-ID: <20010119132206.H17392@xs4all.nl> On Thu, Jan 18, 2001 at 10:20:08PM -0500, Jeremy Hylton wrote: > I would suggest fixing asynchat and poplib and deprecating knee. The > reconvert module may be a special case. Can't reconvert just disable the warning before importing regex ? That would seem the sane thing to do, at least to me. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
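For illustration, the suggestion above (have reconvert silence the warning before importing regex) might look like the following in present-day Python. This is a sketch under stated assumptions: warnings.catch_warnings is a later addition to the warnings module than the code under discussion, and load_legacy and load_quietly are hypothetical stand-ins, since the regex module itself no longer exists to import.

```python
import warnings


def load_legacy():
    """Stand-in for 'import regex': warns the way a deprecated module would."""
    warnings.warn("the regex module is deprecated; use re",
                  DeprecationWarning, stacklevel=2)
    return "regex-stand-in"


def load_quietly(loader):
    """Run loader() with DeprecationWarning suppressed, then restore filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return loader()


module = load_quietly(load_legacy)
```

The filter change is scoped to the with block, so other deprecated imports elsewhere in the program would still warn.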
From thomas@xs4all.net Fri Jan 19 12:26:31 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:26:31 +0100 Subject: [Python-Dev] Mail delays and SourceForge bugs In-Reply-To: <200101190034.TAA26664@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jan 18, 2001 at 07:34:02PM -0500 References: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Message-ID: <20010119132631.I17392@xs4all.nl> On Thu, Jan 18, 2001 at 07:34:02PM -0500, Guido van Rossum wrote: > Through no fault of my own, email to guido@python.org (which includes > the python-dev list) is currently suffering delays of 12-24 hours. I > have a feeling this is probably true for all mail going through > python.org, so checkin messages and python-dev discussion have been > greatly frustrated, with about 1 day to go until the planned 2.1a1 > release date! I doubt it's (just) you, Guido. I'm seeing similar delays, and I already talked with Barry about it, too. It looks like it's clearing up a bit, now, but it's confusing as hell, for sure ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Fri Jan 19 12:33:47 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:33:47 +0100 Subject: [Python-Dev] Dbm failure In-Reply-To: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Fri, Jan 19, 2001 at 07:53:36PM +0200 References: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il> Message-ID: <20010119133347.J17392@xs4all.nl> On Fri, Jan 19, 2001 at 07:53:36PM +0200, Moshe Zadka wrote: > test test_dbm skipped -- /home/moshez/prog/src/python/python/dist/src/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey > Did it happen to anyone else? Yes, to me. You're suffering from the same thing I did: GNU sucks. Okay, okay, not as much as MS products or most other UNIX software, but still ;) The problem is a conflict between gdbm and glibc.
gdbm (1.7.3, which is what woody currently carries, not sure why it isn't updated) offers a dbm interface/replacement, which includes a libdbm.(so|a) and /usr/include/gdbm-ndbm.h. Glibc (or at least the debian package) *also* offers a dbm interface/replacement, which consists of libdb1.(so|a) and /usr/include/db1/ndbm.h (which needs /usr/include/db1/*.h). If you add /usr/include/db1 to your include path, and -ldbm to the dbmmodule, you end up with the wrong versions. You need either to include /usr/include/db1 in your includepath and use -ldb1, or fix up dbmmodule.c so it includes gdbm-ndbm.h and uses -ldbm. I only figured this out yesterday, and sent Andrew a mail about that... I'm not sure what the Right(tm) way to fix this is :( I've always loathed these library/version mismatches :P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Fri Jan 19 13:07:00 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 14:07:00 +0100 Subject: [Python-Dev] Standard install locations for Python ? References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> <20010118135640.G21503@kronos.cnri.reston.va.us> Message-ID: <3A683BF4.BD74A979@lemburg.com> Andrew Kuchling wrote: > > On Thu, Jan 18, 2001 at 01:20:06PM +0100, Martin v. Loewis wrote: > >On Unix, there appears to be no standard location, unless the > >documentation consists of man pages or perhaps info files. So > >/share/doc is probably a place as good as any other. > > This seems like a good suggestion. Should docs go in > /share/doc/python/, then? Perhaps with > subdirectories for different extensions? Hmm, I guess it's better to follow bdist_rpm here: put the docs into a subdir under .../doc/ using the package name and version. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy@alum.mit.edu Fri Jan 19 14:39:13 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 19 Jan 2001 09:39:13 -0500 (EST) Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: <20010119110200.9E455373C95@snelboot.oratrix.nl> References: <20010119110200.9E455373C95@snelboot.oratrix.nl> Message-ID: <14952.20881.848489.869512@localhost.localdomain> >>>>> "JJ" == Jack Jansen writes: JJ> I get the impression that I'm currently seeing a non-NULL third JJ> argument in my (C) methods even though the method is called JJ> without keyword arguments. JJ> Is this new semantics that I missed the discussion about, or is JJ> this a bug? This is a bug in the changes I made to the call function implementation. I wasn't sure what was supposed to happen to a function that expected a kw argument but was called without one. I thought I saw some crashes when I passed NULL, so I changed the implementation to pass an empty dictionary. (Is the correct behavior documented anywhere?) If a NULL value is correct, I'll update the implementation and see if I can rediscover those crashes. 
Jeremy From nas@arctrix.com Fri Jan 19 07:39:50 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 23:39:50 -0800 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010119000209.F17392@xs4all.nl>; from thomas@xs4all.net on Fri, Jan 19, 2001 at 12:02:09AM +0100 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: <20010118233950.A15636@glacier.fnational.com> On Fri, Jan 19, 2001 at 12:02:09AM +0100, Thomas Wouters wrote: > I can't find any such hackery in the source, but I also can't > figure out how else it's working :) I think you want to look at getpath.c. Neil From jeremy@alum.mit.edu Fri Jan 19 14:44:50 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 19 Jan 2001 09:44:50 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.107,2.108 In-Reply-To: References: Message-ID: <14952.21218.416551.695660@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: GvR> Log Message: Changes to recursive-object comparisons, having to GvR> do with a test case I found where rich comparison of unequal GvR> recursive objects gave unintuitive results. In a discussion GvR> with Tim, where we discovered that our intuition on when a<=b GvR> should be true was failing, we decided to outlaw ordering GvR> comparisons on recursive objects. (Once we have fixed our GvR> intuition and designed a matching algorithm that's practical GvR> and reasonable to implement, we can allow such orderings GvR> again.) Sounds sensible to me! I was quite puzzled about what <= should return for recursive objects.
GvR> - Changed the nesting limit to a more reasonable small 20; this GvR> only slows down comparisons of very deeply nested objects GvR> (unlikely to occur in practice), while speeding up GvR> comparisons of recursive objects (previously, this would GvR> first waste time and space on 500 nested comparisons before GvR> it would start detecting recursion). After we talked through this code yesterday, I was also thinking that the limit was too high :-). Jeremy From guido@digicool.com Fri Jan 19 15:49:54 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 10:49:54 -0500 Subject: [Python-Dev] new Makefile.in In-Reply-To: Your message of "Thu, 18 Jan 2001 18:56:04 EST." <200101182356.SAA19616@cj20424-a.reston1.va.home.com> References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> Message-ID: <200101191549.KAA28699@cj20424-a.reston1.va.home.com> [Neil] > > A question: is it possible to break the Python static library up? [me] > Sounds cool to me. Of course after Martin's response I agree with him -- let's keep it one library. (Although I expect that the combined effect of setup.py and Neil's flat Makefile will still affect the infrastructure to build extensions... :-( ) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 15:56:58 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 10:56:58 -0500 Subject: [Python-Dev] MS CRT crashing: In-Reply-To: Your message of "Thu, 18 Jan 2001 16:53:15 PST." 
<58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> References: <58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> Message-ID: <200101191556.KAA28761@cj20424-a.reston1.va.home.com> Bill Tutt writes: > From the internal support squad: > Turns out the C standard explicitly says you can't have an input follow > output on a stream without doing fflush or fseek in-between, to make sure > the stdio buffer is cleared. So this program is illegal. > > They've gone and resolved it by design. I'd just like to note for the record that this is exactly what I had predicted. I'd also like to note that I *agree*. Tim seems to think there's a race condition in the threading code, but it's really much simpler than that: the same bug can easily be provoked with a single-threaded program: just randomly read and write alternatingly. So obviously the people who wrote the threading code aren't interested in the bug, because it's not in their code -- and the people who wrote the code that doesn't behave well when abused are protected by the C standard... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 16:00:30 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:00:30 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Thu, 18 Jan 2001 22:39:18 +0100." <3A676286.C33823B4@tismer.com> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> Message-ID: <200101191600.LAA28788@cj20424-a.reston1.va.home.com> > Yes, the "inverse" is confusing. Is what you mean the "reverse" ? > Like the other right-side operators __radd__, is it correct to > think of > > __ge__ == __rle__ > > if __rle__ was written in the same fashion like __radd__ ? > It looks semantically the same, although the reason for a > call might be different. 
Yes, it's semantically the same, and the reason for the call is the same too ("the left argument doesn't support the operator so let's try if the right one knows"). > And if my above view is right, would it perhaps be less > confusing to use in fact __rle__ and __rlt__, > or would it be more confusing, since __rlt__ would also be > invoked left-to-right, implementing ">". I prefer 6 new operators over 12 any day. I can see no valid reason why someone would want to overload a>b different than b<a; but there are plenty of reasons why a+b and b+a should be different: e.g. string concatenation. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Fri Jan 19 16:14:55 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Fri, 19 Jan 2001 11:14:55 -0500 Subject: [Python-Dev] new Makefile.in In-Reply-To: <200101191549.KAA28699@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 10:49:54AM -0500 References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> <200101191549.KAA28699@cj20424-a.reston1.va.home.com> Message-ID: <20010119111455.C25056@kronos.cnri.reston.va.us> On Fri, Jan 19, 2001 at 10:49:54AM -0500, Guido van Rossum wrote: >Of course after Martin's response I agree with him -- let's keep it >one library. (Although I expect that the combined effect of setup.py >and Neil's flat Makefile will still affect the infrastructure to build >extensions... :-( ) Which reminds me... there should really be a way to ignore the setup.py stuff and use the old method. How should that be done? A --use-makesetup flag to configure, maybe? --amk From guido@digicool.com Fri Jan 19 16:14:20 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:14:20 -0500 Subject: [Python-Dev] Re: test_support.py In-Reply-To: Your message of "Thu, 18 Jan 2001 21:59:23 PST." References: Message-ID: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> > if not condition: > ! raise AssertionError(reason) Wouldn't it be better if this raised TestFailed rather than AssertionError? Or is there code that catches the AssertionError? [...grep...] Yes, there's code that catches AssertionError: (1) in Marc-Andre's own test_unicode.py; (2) in test_re, which catches AssertionError and raises TestFailed instead.
Proposal: (1) change verify() to raise TestFailed; (2) change test_unicode.py to catch TestFailed instead. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Fri Jan 19 16:17:06 2001 From: tismer@tismer.com (Christian Tismer) Date: Fri, 19 Jan 2001 17:17:06 +0100 Subject: [Python-Dev] Rich comparison confusion References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> Message-ID: <3A686882.F78C1268@tismer.com> Guido van Rossum wrote: > > > Yes, the "inverse" is confusing. Is what you mean the "reverse" ? > > Like the other right-side operators __radd__, is it correct to > > think of > > > > __ge__ == __rle__ > > > > if __rle__ was written in the same fashion like __radd__ ? > > It looks semantically the same, although the reason for a > > call might be different. > > Yes, it's semantically the same, and the reason for the call is the > same too ("the left argument doesn't support the operator so let's try > if the right one knows"). > > > And if my above view is right, would it perhaps be less > > confusing to use in fact __rle__ and __rlt__, > > or would it be more confusing, since __rlt__ would also be > > invoked left-to-right, implementing ">". > > I prefer 6 new operators over 12 any day. I can see no valid reason > why someone would want to overload a>b different than b<a; but > there are plenty of reasons why a+b and b+a should be different: > e.g. string concatenation. Sure, I didn't want to introduce new operators, but use the "r" versions for three of the six new operators. But I should have read your proposal before. The confusion is not due to you, but Skip had a read error, since you don't talk about inverses at all: Skip==""" In the description he states that __le__ and __ge__ are inverses as are __lt__ and __gt__.
""" Truth==""" There are no explicit "reversed argument" versions of these; instead, __lt__ and __gt__ are each other's reverse, likewise for__le__ and __ge__; __eq__ and __ne__ are their own reverse (similar at the C level). """ No reason for confusion at all > python-dev/null - ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From thomas@xs4all.net Fri Jan 19 16:20:56 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 17:20:56 +0100 Subject: [Python-Dev] test_ucn errors ? Message-ID: <20010119172056.K17392@xs4all.nl> I'm currently seeing a failure in test_ucn: test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding error: Illegal Unicode character It looks like one of the unicode literals in test_ucn is invalid, but it's damned hard to pin down which: Python 2.1a1 (#7, Jan 19 2001, 17:06:32) [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import test.test_ucn Traceback (most recent call last): File "", line 1, in ? UnicodeError: Unicode-Escape decoding error: Illegal Unicode character >>> I get the same crashes on FreeBSD and (Debian) Linux. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@digicool.com Fri Jan 19 16:26:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:26:34 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Your message of "Fri, 19 Jan 2001 00:02:09 +0100." 
<20010119000209.F17392@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: <200101191626.LAA29165@cj20424-a.reston1.va.home.com> > This brings me to another point: how can 'make test' work at all ? Does > python always check for './Lib' (and './Modules') for modules ? Look at the logic in Modules/getpath.c, which calculates the initial (default) sys.path. It detects that it's running from the build tree and then modifies the default path a bit to include Lib and Modules relative to where the python executable was found. > If that's > specific for 'make test' and running python in the source distribution, that > sounds like a bit of a weird hack. I can't find any such hackery in the > source, but I also can't figure out how else it's working :) It's not just for 'make test' -- it's to make life easy for developers in general (and me in particular :-) who want to try out their hacks without going through 'make install'. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Jan 19 16:34:58 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 17:34:58 +0100 Subject: [Python-Dev] Re: test_support.py References: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> Message-ID: <3A686CB2.C75D184D@lemburg.com> Guido van Rossum wrote: > > > if not condition: > > ! raise AssertionError(reason) > > Wouldn't it be better if this raised TestFailed rather than > AssertionError? Or is there code that catches the AssertionError? > > [...grep...] > > Yes, there's code that catches AssertionError: > > (1) in Marc-Andre's own test_unicode.py; > > (2) in test_re, which catches AssertionError and raises TestFailed > instead. > > Proposal: > > (1) change verify() to raise TestFailed; > > (2) change test_unicode.py to catch TestFailed instead. +1 Why not simply make TestFailed a subclass of AssertionError ?
Then we wouldn't have to fear about breaking test code... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas@xs4all.net Fri Jan 19 16:34:15 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 17:34:15 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <200101191626.LAA29165@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 11:26:34AM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> Message-ID: <20010119173415.M17295@xs4all.nl> On Fri, Jan 19, 2001 at 11:26:34AM -0500, Guido van Rossum wrote: > > This brings me to another point: how can 'make test' work at all ? Does > > python always check for './Lib' (and './Modules') for modules ? > Look at the logic in Modules/getpath.c, which calculates the initial > (default) sys.path. It detects that it's running from the build tree > and then modifies the default path a bit to include Lib and Modules > relative to where the python executable was found. Aye, I found it now. > > If that's > > specific for 'make test' and running python in the source distribution, that > > sounds like a bit of a weird hack. I can't find any such hackery in the > > source, but I also can't figure out how else it's working :) > It's not just for 'make test' -- it's to make life easy for developers > in general (and me in particular :-) who want to try out their hacks > without going through 'make install'. Well, after some old SF movies & some sleep, I realized that :) But it is going to have to change: you now have to include the build tree as well, and that is quite a bit more difficult to figure out.
I'd suggest a 'make run' that calls python with the appropriate PYTHONPATH environment variable, but that doesn't cover test-scripts (which I use a lot myself.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@digicool.com Fri Jan 19 16:34:45 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:34:45 -0500 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Your message of "Fri, 19 Jan 2001 12:02:00 +0100." <20010119110200.9E455373C95@snelboot.oratrix.nl> References: <20010119110200.9E455373C95@snelboot.oratrix.nl> Message-ID: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> > I get the impression that I'm currently seeing a non-NULL third > argument in my (C) methods even though the method is called without > keyword arguments. > Is this new semantics that I missed the discussion about, or is this a bug? Can't tell without spending more time looking at the code and experimenting than I can afford today; but Jeremy refactored the calling code, and it could be that you're seeing an empty dictionary instead of a NULL. Do you really need the NULL? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 16:41:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:41:02 -0500 Subject: [Python-Dev] Mail delays and SourceForge bugs In-Reply-To: Your message of "Fri, 19 Jan 2001 13:26:31 +0100." <20010119132631.I17392@xs4all.nl> References: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> <20010119132631.I17392@xs4all.nl> Message-ID: <200101191641.LAA29324@cj20424-a.reston1.va.home.com> > I doubt it's (just) you, Guido. I'm seeing similar delays, and I already > talked with Barry about it, too. 
It looks like it's clearing up a bit, now, > but it's confusing as hell, for sure ;) It's worse for me though than for most people: for others, only mail sent through mailman at mail.python.org is affected. For me, mail sent directly to guido@python.org is affected too (which is why I've changed my From address again to that old standby, guido@digicool.com). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 16:53:39 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:53:39 -0500 Subject: [Python-Dev] deprecated regex used by un-deprecated modules In-Reply-To: Your message of "Thu, 18 Jan 2001 22:20:08 EST." <14951.45672.806978.600944@localhost.localdomain> References: <14951.45672.806978.600944@localhost.localdomain> Message-ID: <200101191653.LAA29774@cj20424-a.reston1.va.home.com> > There are several modules in the standard library that use the regex > module. When they are imported, they print a warning about using a > deprecated module. I think this is bad form. Either the modules that > depend on regex should be updated to use re or they should be > deprecated themselves. > > I discovered the following offenders: > asynchat > knee > poplib > reconvert > > I would suggest fixing asynchat and poplib and deprecating knee. The > reconvert module may be a special case. Agreed. There's an idiom to disable the warning, which you can find in regsub.py: import warnings warnings.filterwarnings("ignore", "", DeprecationWarning, __name__) (The "" should be replaced by the specific warning message though.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jan 19 17:21:28 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 12:21:28 -0500 Subject: [Python-Dev] test_ucn errors ? In-Reply-To: Your message of "Fri, 19 Jan 2001 17:20:56 +0100."
<20010119172056.K17392@xs4all.nl> References: <20010119172056.K17392@xs4all.nl> Message-ID: <200101191721.MAA31937@cj20424-a.reston1.va.home.com> > I'm currently seeing a failure in test_ucn: > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > error: Illegal Unicode character > > It looks like one of the unicode literals in test_ucn is invalid, but it's > damned hard to pin down which: Feels to me like there's a bug in the string literal processing that makes *any* string literal containing \N{...} fail during code generation. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@effbot.org Fri Jan 19 17:37:41 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 18:37:41 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> Message-ID: <023801c0823e$86fcedc0$e46940d5@hagrid> > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > error: Illegal Unicode character Make sure you rebuild Objects/unicodeobject.o and the ucnhash extension. If they build without warnings, run the following script. import ucnhash count = 0 for code in range(65536): try: name = ucnhash.getname(code) if ucnhash.getcode(name) != code: print name count += 1 except ValueError: pass print count if it prints anything but "10538", let me know. > It looks like one of the unicode literals in test_ucn is invalid, but it's > damned hard to pin down which: If the ucnhash extension cannot be found, the script won't even compile... shouldn't be too hard to fix. 
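The ucnhash extension used in the script above does not exist in current Python releases. As a rough sketch of the same round-trip check, assuming the standard unicodedata module as a stand-in (its name() and lookup() calls are a substitute, not what the original script used, and the helper name roundtrip_failures is made up for illustration):

```python
import unicodedata


def roundtrip_failures(limit=0x10000):
    """Check name -> code point round trips for code points below `limit`.

    Returns (named, failures): `named` counts the code points that have a
    character name, `failures` lists names whose lookup does not map back
    to the code point they came from.
    """
    named = 0
    failures = []
    for code in range(limit):
        char = chr(code)
        try:
            name = unicodedata.name(char)
        except ValueError:
            # Unassigned code points, control characters and surrogates
            # have no name; skip them, as the original script does.
            continue
        named += 1
        if unicodedata.lookup(name) != char:
            failures.append(name)
    return named, failures


if __name__ == "__main__":
    named, failures = roundtrip_failures()
    for name in failures:
        print(name)
    print(named)
```

The count printed at the end depends on the Unicode database shipped with the interpreter, so it will not match the "10538" quoted above.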
From Barrett@stsci.edu Fri Jan 19 17:32:26 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Fri, 19 Jan 2001 12:32:26 -0500 (EST) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <200101191600.LAA28788@cj20424-a.reston1.va.home.com> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> Message-ID: <14952.30800.112503.123675@nem-srvr.stsci.edu> Guido van Rossum writes: > > ... I can see no valid reason why someone would want to overload > a>b different than b<a > I agree. But this assumes that the result of A<B and B>A is a collection of Booleans. In the Interactive Data Language (IDL) these operators are essentially mapped to ceiling and floor functions which are not commutative. I personally find this silly, but IDL users coming to Python may be surprised when the comparison of two Numeric arrays returns a Boolean-like result. -- Dr. Paul Barrett Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From nas@arctrix.com Fri Jan 19 10:43:12 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Fri, 19 Jan 2001 02:43:12 -0800 Subject: [Python-Dev] new Makefile.in In-Reply-To: <20010119111455.C25056@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Fri, Jan 19, 2001 at 11:14:55AM -0500 References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> <200101191549.KAA28699@cj20424-a.reston1.va.home.com> <20010119111455.C25056@kronos.cnri.reston.va.us> Message-ID: <20010119024312.A16179@glacier.fnational.com> On Fri, Jan 19, 2001 at 11:14:55AM -0500, Andrew Kuchling wrote: > Which reminds me... there should really be a way to ignore the > setup.py stuff and use the old method. How should that be done. A > --use-makesetup flag to configure, maybe? A different target for make would be easy.
Neil From fredrik@effbot.org Fri Jan 19 18:13:15 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 19:13:15 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> Message-ID: <03a201c08243$7fa62af0$e46940d5@hagrid> thomas wrote: > > I'm currently seeing a failure in test_ucn: > > > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > > error: Illegal Unicode character > > > > It looks like one of the unicode literals in test_ucn is invalid, but it's > > damned hard to pin down which: > > Feels to me like there's a bug in the string literal processing that > makes *any* string literal containing \N{...} fail during code > generation. I took another look at the error message: the only explanation I can see here is that the lookup succeeds, but the call to ucnhash returns a value larger than 0x10ffff. What is Py_UCS4 set to under gcc? Confusing /F From guido@digicool.com Fri Jan 19 18:11:21 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 13:11:21 -0500 Subject: [Python-Dev] Re: test_support.py In-Reply-To: Your message of "Fri, 19 Jan 2001 17:34:58 +0100." <3A686CB2.C75D184D@lemburg.com> References: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> <3A686CB2.C75D184D@lemburg.com> Message-ID: <200101191811.NAA32539@cj20424-a.reston1.va.home.com> > > Proposal: > > > > (1) change verify() to raise TestFailed; > > > > (2) change test_unicode.py to catch TestFailed instead. > > +1 > > Why not simply make TestFailed a subclass of AssertionError ? > Then we wouldn't have to fear about breaking test code... No, I'd rather see the two separated. There can be assert statements in the modules we're testing, and I'd prefer not to see those caught by test code that is trying to catch TestFailed. I'll check this in momentarily.
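The separation argued for above can be sketched as follows. This is only a minimal illustration, not the actual test_support.py source; buggy_module_function and classify are hypothetical helpers added to show the difference.

```python
class TestFailed(Exception):
    """Raised by test helpers when a check fails.

    Deliberately NOT a subclass of AssertionError, so that asserts in the
    code under test are not mistaken for test failures.
    """


def verify(condition, reason="test failed"):
    """Raise TestFailed (rather than AssertionError) if `condition` is false."""
    if not condition:
        raise TestFailed(reason)


def buggy_module_function():
    # Hypothetical code under test whose own internal assertion fires.
    raise AssertionError("internal invariant broken")


def classify(func):
    """Report which kind of failure, if any, calling `func` produced."""
    try:
        func()
    except TestFailed:
        return "test failed"
    except AssertionError:
        return "assert in tested code"
    return "ok"
```

With TestFailed as a subclass of AssertionError, a handler written as "except AssertionError" would catch both kinds of failure and the two cases could no longer be told apart there.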
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@effbot.org Fri Jan 19 18:19:37 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 19:19:37 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> Message-ID: <03b301c08244$627f22a0$e46940d5@hagrid> > Feels to me like there's a bug in the string literal processing that > makes *any* string literal containing \N{...} fail during code > generation. umm. can anyone explain how this can happen: python ../lib/test/regrtest.py test_ucn test_ucn 1 test OK. python ../lib/test/test_ucn.py UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name how can a test that works under regrtest.py fail when it's run separately? what am I missing here? From mal@lemburg.com Fri Jan 19 18:48:53 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 19:48:53 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> <03a201c08243$7fa62af0$e46940d5@hagrid> Message-ID: <3A688C15.8C9CFF46@lemburg.com> Fredrik Lundh wrote: > > thomas wrote: > > > I'm currently seeing a failure in test_ucn: > > > > > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > > > error: Illegal Unicode character > > > > > > It looks like one of the unicode literals in test_ucn is invalid, but it's > > > damned hard to pin down which: > > > > Feels to me like there's a bug in the string literal processing that > > makes *any* string literal containing \N{...} fail during code > > generation. > > I took another look at the error message: the only explanation > I can see here is that the lookup succeeds, but the call to ucn- > hash returns a value larger than 0x10ffff. > > What is Py_UCS4 set to under gcc? Should be "unsigned int" on all modern Intel platforms. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@digicool.com Fri Jan 19 18:48:45 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 13:48:45 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Fri, 19 Jan 2001 12:32:26 EST." <14952.30800.112503.123675@nem-srvr.stsci.edu> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> <14952.30800.112503.123675@nem-srvr.stsci.edu> Message-ID: <200101191848.NAA02765@cj20424-a.reston1.va.home.com> > > ... I can see no valid reason why someone would want to overload > > a>b different than b<a > > > > I agree. But this assumes that the result of A<B and B>A is a > collection of Booleans. In the Interactive Data Language (IDL) these > operators are essentially mapped to ceiling and floor functions which > are not commutative. I personally find this silly, but IDL users > coming to Python may be surprised when the comparison of two Numeric > arrays returns a Boolean-like result. This means that Python can't be used to emulate this part of IDL. I don't understand how these can be not commutative unless they have a side effect on the left argument, and that's not possible in Python anyway. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Fri Jan 19 19:18:04 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 19 Jan 2001 14:18:04 -0500 Subject: [Python-Dev] test_ucn errors ? Message-ID: [/F] > umm. can anyone explain how this can happen: > > python ../lib/test/regrtest.py test_ucn > test_ucn > test OK.
> > python ../lib/test/test_ucn.py > UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name > > how can a test that works under regrtest.py fail when > it's run separately? what am I missing here? Dunno, but add to the pile of mysteries that you're unique. Here on Win98SE: python ../lib/test/regrtest.py test_ucn test_ucn test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name 1 test failed: test_ucn python ../lib/test/test_ucn.py UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name I suggest you reformat your hard drive, and reinstall Windows . From mwh21@cam.ac.uk Fri Jan 19 19:25:03 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 19 Jan 2001 19:25:03 +0000 Subject: [Python-Dev] test_ucn errors ? In-Reply-To: "Fredrik Lundh"'s message of "Fri, 19 Jan 2001 19:19:37 +0100" References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> <03b301c08244$627f22a0$e46940d5@hagrid> Message-ID: "Fredrik Lundh" writes: > > Feels to me like there's a bug in the string literal processing that > > makes *any* string literal containing \N{...} fail during code > > generation. > > umm. can anyone explain how this can happen: > > python ../lib/test/regrtest.py test_ucn > test_ucn > 1 test OK. This will run the .pyc if present? > python ../lib/test/test_ucn.py > UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name This won't? Note: no traceback -> (in effect, if not design) compile time error. > how can a test that works under regrtest.py fail when > it's run separately? what am I missing here? Well, this is just my guess. Cheers, M. -- Well, you pretty much need Microsoft stuff to get misbehaviours bad enough to actually tear the time-space continuum. Luckily for you, MS Internet Explorer is available for Solaris. 
-- Calle Dybedahl, alt.sysadmin.recovery From skip@mojam.com (Skip Montanaro) Fri Jan 19 19:55:29 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 19 Jan 2001 13:55:29 -0600 (CST) Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010119173415.M17295@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> Message-ID: <14952.39857.83065.24889@beluga.mojam.com> Thomas> But it is going to have to change: you now have to include the Thomas> build tree as well, and that is quite a bit more difficult to Thomas> figure out. I'd suggest a 'make run' that calls python with the Thomas> appropriate PYTHONPATH environment variable, but that doesn't Thomas> cover test-scripts (which I use a lot myself.) Doesn't Andrew's new "platform" target in the top-level Makefile do the right thing? It *should* generate a platform-specific path to the correct build subdirectory. Skip From MarkH@ActiveState.com Fri Jan 19 20:11:02 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Fri, 19 Jan 2001 12:11:02 -0800 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) In-Reply-To: <010c01c08201$4b0ec050$e46940d5@hagrid> Message-ID: > you can compile the module as C++, but that's also a bit painful... My understanding is that the C std doesn't guarantee the order of static object initialization, whereas C++ does provide these semantics. At least that is the excuse I found when digging into this some years ago. Can't-believe-I-mentioned-the-C-standard-while-Tim-is-listening ly, Mark. From guido@digicool.com Fri Jan 19 20:44:53 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 15:44:53 -0500 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. 
unistr() In-Reply-To: Your message of "Fri, 19 Jan 2001 10:58:08 +0100." <3A680FB0.AED2DB55@lemburg.com> References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> <3A680FB0.AED2DB55@lemburg.com> Message-ID: <200101192044.PAA04154@cj20424-a.reston1.va.home.com> > If we agree to merge the semantics of the two APIs, then str() > would have to change too: is this desirable ? (IMHO, yes) Not clear. Which is why I'm backing off from my initial support for merging the two. I believe unicode() (which is really just an interface to PyUnicode_FromEncodedObject()) currently already does too much. In particular this whole business with calling __str__ on instances seems to me to be unnecessary. I think it should *only* bother to look for something that supports the buffer interface (checking for regular strings only as a tiny optimization), or existing unicode objects. > Here's what we could do: > > a) merge the semantics of unistr() into unicode() > b) apply the same semantics in str() > c) remove unistr() -- how's that for a short-living builtin ;) > > About the semantics: > > These should be backward compatible to str() in that everything > that worked before should continue to work after the merge. > > A strawman for processing str() and unicode(): > > 1. strings/Unicode is passed back as-is I hope you mean str() passes 8-bit strings back as-is, unicode() passes Unicode strings back as-is, right? > 2. tp_str is tried > 3. the method __str__ is tried Shouldn't have to -- instances should define tp_str and all the magic for calling __str__ should be there. I don't understand why it's not done that way, probably just for historical reasons. I also don't think __str__ should be tried for non-instance types. But, more seriously, I believe tp_str or __str__ shouldn't be tried at all by unicode(). > 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer) > 5. 
for str(): Unicode return values are converted to strings using > the default encoding > for unicode(): Unicode return values are passed back as-is; > string return values are decoded according to the > encoding parameter > 6. the return object is type-checked: str() will always return > a string object, unicode() always a Unicode object > > Note that passing back Unicode is only allowed in case no encoding > was given. Otherwise an exception is raised: you can't decode > Unicode. > > As extension we could add encoding and error parameters to str() > as well. The result would be either an encoding of Unicode objects > passed back by tp_str or __str__ or a recoding of string objects > returned by checks 2, 3 or 4. Naaaah! > If we agree to take this approach, then we should remove the > unistr() Python API before the alpha ships. Frankly, I believe we need more time to sort this out, and therefore I propose to remove the unistr() built-in before the release. Marc, would you do the honors? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Fri Jan 19 20:55:53 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 21:55:53 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <14952.39857.83065.24889@beluga.mojam.com>; from skip@mojam.com on Fri, Jan 19, 2001 at 01:55:29PM -0600 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> <14952.39857.83065.24889@beluga.mojam.com> Message-ID: <20010119215552.O17295@xs4all.nl> On Fri, Jan 19, 2001 at 01:55:29PM -0600, Skip Montanaro wrote: > > Thomas> But it is going to have to change: you now have to include the > Thomas> build tree as well, and that is quite a bit more difficult to > Thomas> figure out.
I'd suggest a 'make run' that calls python with the > Thomas> appropriate PYTHONPATH environment variable, but that doesn't > Thomas> cover test-scripts (which I use a lot myself.) > Doesn't Andrew's new "platform" target in the top-level Makefile do the > right thing? It *should* generate a platform-specific path to the correct > build subdirectory. Yes, it does, that's what I meant with 'make run'. But that isn't quite as user-friendly as the current method. How would you run a script with the current python ? 'make SCRIPT=./spamtest.py runscript' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@digicool.com Fri Jan 19 22:06:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 17:06:03 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Your message of "Fri, 19 Jan 2001 17:34:15 +0100." <20010119173415.M17295@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> Message-ID: <200101192206.RAA12072@cj20424-a.reston1.va.home.com> I finally figured the best way to fix sys.path to find shared modules built by setup.py. At first I thought I had to add it to getpath.c, but the problem is that the name is calculated by calling distutils.util.get_platform(), and that requires a working Python interpreter, so we'd end up with a chicken-or-egg situation. So instead I added 5 lines to site.py, which tests for os.name=='posix', then for sys.path[-1] ending in '/Modules' -- this test only succeeds when running from the build directory. Then it calls distutils.util.get_platform() and uses the result to calculate the correct directory name, which is then appended to sys.path.
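A minimal sketch of the site.py check described above, written for a current Python 3 and not copied from the five lines that were checked in. The function name build_lib_dir and its parameters are hypothetical; the platform string is passed in (in 2.1 it would come from distutils.util.get_platform()), and the build/lib.<platform>-<version> layout is the usual distutils build directory naming, assumed here:

```python
import posixpath


def build_lib_dir(path, platform, version, os_name="posix"):
    """Return the setup.py build directory to append to `path`, or None.

    Hypothetical helper: `path` plays the role of sys.path, `platform` the
    result of a get_platform() call, `version` the short Python version.
    """
    # Only POSIX builds are handled, as in the description.
    if os_name != "posix" or not path:
        return None
    last = path[-1]
    # Running from the build directory leaves '<srcdir>/Modules' last on the path.
    if not last.endswith("/Modules"):
        return None
    root = posixpath.dirname(last)
    # Assumed layout: <srcdir>/build/lib.<platform>-<version>
    return posixpath.join(root, "build", "lib.%s-%s" % (platform, version))


search_path = ["/src/python/Lib", "/src/python/Modules"]
extra = build_lib_dir(search_path, "linux-i686", "2.1")
if extra is not None:
    search_path.append(extra)
print(search_path[-1])  # /src/python/build/lib.linux-i686-2.1
```

Passing the platform in as an argument keeps the check itself free of any distutils import, which is where the startup cost comes from.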
Yes, this slows down startup (it imports a large portion of the distutils package), but I don't care -- after all this is mostly for me so I can play with the interpreter right after I've built it, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Jan 19 21:32:34 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:32:34 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> <3A680FB0.AED2DB55@lemburg.com> <200101192044.PAA04154@cj20424-a.reston1.va.home.com> Message-ID: <3A68B272.BBBAECD1@lemburg.com> Guido van Rossum wrote: > > > If we agree to merge the semantics of the two APIs, then str() > > would have to change too: is this desirable ? (IMHO, yes) > > Not clear. Which is why I'm backing off from my initial support for > merging the two. > > I believe unicode() (which is really just an interface to > PyUnicode_FromEncodedObject()) currently already does too much. In > particular this whole business with calling __str__ on instances seems > to me to be unnecessary. I think it should *only* bother to look for > something that supports the buffer interface (checking for regular > strings only as a tiny optimization), or existing unicode objects. Hmm, unicode() should (just like str()) take an object and convert it to a Unicode string. Since many objects either don't support the tp_str slot (instances don't for some reason -- just like they don't tp_call), I had to add some special cases to make Python instances compatible to Unicode in the same way str() does. What I think is really needed is a concept for "stringification" in Python. We currently have these schemes: 1. tp_str 2. method __str__ (not only of Python instances, but any object) 3. character buffer interface These three could easily be unified into the tp_str slot: e.g. 
tp_str could do the necessary magic to call __str__ or the buffer interface. Note that the same is true for e.g. tp_call -- the special cases we have in ceval.c for the different builtin callable objects would not be necessary if they would implement tp_call. > > Here's what we could do: > > > > a) merge the semantics of unistr() into unicode() > > b) apply the same semantics in str() > > c) remove unistr() -- how's that for a short-living builtin ;) > > > > About the semantics: > > > > These should be backward compatible to str() in that everything > > that worked before should continue to work after the merge. > > > > A strawman for processing str() and unicode(): > > > > 1. strings/Unicode is passed back as-is > > I hope you mean str() passes 8-bit strings back as-is, unicode() > passes Unicode strings back as-is, right? Right. > > 2. tp_str is tried > > 3. the method __str__ is tried > > Shouldn't have to -- instances should define tp_str and all the magic > for calling __str__ should be there. I don't understand why it's not > done that way, probably just for historical reasons. I also don't > think __str__ should be tried for non-instance types. Ok. > But, more seriously, I believe tp_str or __str__ shouldn't be tried at > all by unicode(). Hmm, but how would you implement generic conversion to Unicode then ? We'll need some way for instances (and other types) to provide a conversion to Unicode. Some time ago we discussed this issue and came to the conclusion that tp_str should be allowed to return Unicode data instead of inventing a new tp_unicode slot for this purpose. > > 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer) > > 5. for str(): Unicode return values are converted to strings using > > the default encoding > > for unicode(): Unicode return values are passed back as-is; > > string return values are decoded according to the > > encoding parameter > > 6. 
the return object is type-checked: str() will always return > > a string object, unicode() always a Unicode object > > > > Note that passing back Unicode is only allowed in case no encoding > > was given. Otherwise an exception is raised: you can't decode > > Unicode. > > > > As extension we could add encoding and error parameters to str() > > as well. The result would be either an encoding of Unicode objects > > passed back by tp_str or __str__ or a recoding of string objects > > returned by checks 2, 3 or 4. > > Naaaah! Would be nice for symmetry and useful in the light of making Unicode the only string type in Py4k ;-) > > If we agree to take this approach, then we should remove the > > unistr() Python API before the alpha ships. > > Frankly, I believe we need more time to sort this out, and therefore I > propose to remove the unistr() built-in before the release. Marc, > would you do the honors? Ok. I'll remove the builtin and the docs, but will leave the PyObject_Unicode() API enabled. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From uche.ogbuji@fourthought.com Fri Jan 19 21:42:40 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 19 Jan 2001 14:42:40 -0700 Subject: [Python-Dev] Extension doc bugs Message-ID: <200101192142.OAA29168@localhost.localdomain> I'm using the bleeding-edge documentation at http://python.sourceforge.net/devel-docs/api/api.html I know that it's not complete until someone has the time to do so, but I've run into a few places where it's completely wrong. For instance, from the object protocol docs: """ int PyObject_Cmp (PyObject *o1, PyObject *o2, int *result) Compare the values of o1 and o2 using a routine provided by o1, if one exists, otherwise with a routine provided by o2. The result of the comparison is returned in result. Returns -1 on failure.
This is the equivalent of the Python statement "result = cmp(o1, o2)". """ After getting weird behavior implementing this, and then squinting at the relevant Python 2.0 code, it appears that in actuality the Cmp function is to return the direct comparison results (-1, 0, 1 based on ordering of the parameters) furthermore, there is no such "result" argument. 4Suite has a lot of C extension code developed by squinting at Python sources and long gdb sessions and I have a feeling that in many cases we're taking up hacks that would get us into trouble across versions, and all that; but the "official" interfaces and behaviors are not documented (or only poorly documented). In general, the C API docs are in a rather sorry state and though I doubt I could do a great deal about fixing it, I'd be interested in discussion of the matter, and perhaps making what contribution I can. Is the doc-sig the best place for this? My experience there wouldn't seem to encourage this conclusion (most of the discussion is of docstring syntax and neat-o automagic document generators). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mal@lemburg.com Fri Jan 19 21:46:24 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:46:24 +0100 Subject: [Python-Dev] readline and setup.py Message-ID: <3A68B5B0.771412F7@lemburg.com> The new setup.py procedure for Python causes readline not to be built on my machine. Instead I get a linker error telling me that termcap is not found. Looking at my old Setup file, I have this line: readline readline.c \ -I/usr/include/readline -L/usr/lib/termcap \ -lreadline -lterm I guess, setup.py should be modified to include additional library search paths -- shouldn't hurt on platforms which don't need them. 
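One way to read that suggestion, as a hedged sketch only: keep a list of extra candidate directories and hand the build just the ones that exist, so the extra paths are harmless on platforms without them. The helper name existing_dirs and the candidate list are hypothetical (only /usr/lib/termcap comes from the Setup line above), and this is not setup.py's actual code:

```python
import os

# Candidate directories to probe. Only the first comes from the old Setup
# line; the second is an illustrative assumption.
EXTRA_LIB_DIRS = ["/usr/lib/termcap", "/usr/local/lib"]


def existing_dirs(candidates):
    """Return the candidates that exist as directories, preserving order."""
    return [d for d in candidates if os.path.isdir(d)]


# The filtered list is what a build script could pass on as library search paths.
library_dirs = existing_dirs(EXTRA_LIB_DIRS)
print(library_dirs)
```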
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Jan 19 21:50:53 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:50:53 +0100 Subject: [Python-Dev] _tkinter and setup.py Message-ID: <3A68B6BD.BAD038D6@lemburg.com> Why does setup.py stop with an error in case _tkinter cannot be built (due to an old Tk/Tcl version in my case) ? I think the policy in setup.py should be to output warnings, but continue building the rest of the Python modules. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@digicool.com Fri Jan 19 22:38:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 17:38:22 -0500 Subject: [Python-Dev] 2.1 alpha 1 release schedule Message-ID: <200101192238.RAA12413@cj20424-a.reston1.va.home.com> Practicality beats purity: we're very close to a release, but I've decided to hold off to give Jeremy a chance to finish the nested scopes, to give Fred a chance to revise the weak references according to Martin's wishes, and in general for things to settle. Most likely we'll be able to release Monday night (Jan 22). Unfortunately email through python.org seems to be wedged again (I swear, it seems like it starts getting wedged every afternoon between 3 and 4!) 
so I don't have a clear view of what the latest checkins were; but from cvs update it seems that the following things happened this afternoon: - Barry fixed a core dump in function attribute assignments - Marc-Andre withdrew unistr(), pending more discussion - Fredrik fixed the ucnhash problem - I fixed two path problems in the new build process that only occurred when you were building in a subdirectory of the source tree Good work, crew! I'm taking the weekend off. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@oratrix.nl Fri Jan 19 23:23:18 2001 From: jack@oratrix.nl (Jack Jansen) Date: Sat, 20 Jan 2001 00:23:18 +0100 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Message by Guido van Rossum , Fri, 19 Jan 2001 11:34:45 -0500 , <200101191634.LAA29239@cj20424-a.reston1.va.home.com> Message-ID: <20010119232323.70B03116392@oratrix.oratrix.nl> Recently, Guido van Rossum said: > > I get the impression that I'm currently seeing a non-NULL third > > argument in my (C) methods even though the method is called without > > keyword arguments. > > > Is this new semantics that I missed the discussion about, or is this a bug? > > [...] > Do you really need the NULL? The places that I know I was counting on the NULL now have "if ( kw && PyObject_IsTrue(kw))", so I'll just have to hope there aren't any more lingering in there. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim.one@home.com Sat Jan 20 00:04:10 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 19 Jan 2001 19:04:10 -0500 Subject: [Python-Dev] MS CRT crashing: In-Reply-To: <200101191556.KAA28761@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I'd just like to note for the record that this is exactly what I had > predicted.
I would have hoped you'd be content to let the record speak for itself . > I'd also like to note that I *agree*. With what? That the program is undefined by the C std was never in dispute. > Tim seems to think there's a race condition in the threading code, > but it's really much simpler than that: the same bug can easily be > provoked with a single-threaded program: just randomly read and > write alternatingly. And this is a point in their favor?! "It's OK that the MT library corrupts itself, because even the single-threaded library does"? > So obviously the people who wrote the threading code aren't interested > in the bug, I don't know that it ever got as far as the people who wrote the threading code, but I sure doubt it: when the reply starts "Turns out the C standard explicitly says ...", it strongly suggests it was written by someone who didn't already know what the C std says, and went looking for an excuse to get it off their plate without further effort. Par for the course, if so. > because it's not in their code -- and the people who wrote the code > that doesn't behave well when abused are protected by the C standard... The behavior of things designated "undefined" and "implementation-defined" by the std fall under "quality of implementation". In the real world, the latter is what vendors compete on; meeting the letter of the std is a bare minimum for playing the game at all. The plain fact is that their library is less robust than others in this case. I worked on a multithreaded stdio implementation at KSR, and that sure couldn't corrupt itself. Looks like no flavor of Linux does either. It's not *reasonable* for a library to corrupt itself in this case, although it's certainly reasonable for its behavior to vary from run to run. There's nothing in the C std that says a conforming implementation can't *crash* on the program void main() {int i = 1;} either . 
a-std-is-a-floor-on-acceptable-behavior-not-a-ceiling-ly y'rs - tim From gstein@lyra.org Sat Jan 20 01:21:56 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 19 Jan 2001 17:21:56 -0800 Subject: [Python-Dev] initializing ob_type In-Reply-To: ; from MarkH@ActiveState.com on Fri, Jan 19, 2001 at 12:11:02PM -0800 References: <010c01c08201$4b0ec050$e46940d5@hagrid> Message-ID: <20010119172156.Y7731@lyra.org> On Fri, Jan 19, 2001 at 12:11:02PM -0800, Mark Hammond wrote: > > you can compile the module as C++, but that's also a bit painful... > > My understanding is that the C std doesn't guarantee the order of static > object initialization, whereas C++ does provide these semantics. At least > that is the excuse I found when digging into this some years ago. True, but when PyWhatever_Type is initialized, &PyType_Type ought to be ready (even if it isn't initialized). Heck, &PyType_Type points into the Python core which is *definitely* loaded by that point. Now, if "initialization" also means "relocation to a specific address" then I can understand. Hrm... I've just spent some time with the Windows SDK docs, and I can't find anything that really discusses the problem and resolution. There certainly isn't any warning about "don't do this." It all talks about how fixups are stored with the DLL, how you can optionally use BIND to pre-bind the values, blah blah blah. But nothing saying "it doesn't work." It would be interesting to know more about the actual symptoms that appears when the ob_type init is performed by the structure (rather than at runtime). What happens? Bad address? NULL value? Failure to resolve and load? Is PyType_Type not exported correctly or something? Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@digicool.com Sat Jan 20 02:05:39 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 21:05:39 -0500 Subject: [Python-Dev] How to get setup.py to build expat? 
Message-ID: <200101200205.VAA13299@cj20424-a.reston1.va.home.com> The setup.py script does not build the expat module for me. I have expat installed in /usr/local, at least I believe so: I have /usr/local/include/xmlparse.h and /usr/local/lib/libexpat.a -- do I need more? How can I get setup.py to spit out what it tries, and why it fails? setup.py -v build doesn't give any extra output. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@effbot.org Sat Jan 20 02:41:43 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Sat, 20 Jan 2001 03:41:43 +0100 Subject: [Python-Dev] initializing ob_type References: <010c01c08201$4b0ec050$e46940d5@hagrid> <20010119172156.Y7731@lyra.org> Message-ID: <00f001c0828a$bc903900$e46940d5@hagrid> greg wrote: > It would be interesting to know more about the actual symptoms that appears > when the ob_type init is performed by the structure (rather than at runtime). > What happens? http://www.python.org/doc/FAQ.html#3.24 "3.24. "Initializer not a constant" while building DLL on MS-Windows "Static type object initializers in extension modules may cause compiles to fail with an error message like "initializer not a constant" Cheers /F From uche.ogbuji@fourthought.com Sat Jan 20 05:29:23 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Fri, 19 Jan 2001 22:29:23 -0700 Subject: [Python-Dev] Extension doc bugs In-Reply-To: Message from uche.ogbuji@fourthought.com of "Fri, 19 Jan 2001 14:42:40 MST." <200101192142.OAA29168@localhost.localdomain> Message-ID: <200101200529.WAA30349@localhost.localdomain> > For instance, from the object protocol docs: > > """ > int PyObject_Cmp (PyObject *o1, PyObject *o2, int *result) > Compare the values of o1 and o2 using a routine provided by o1, if one > exists, otherwise with a routine provided by o2. The result of the > comparison is returned in result. Returns -1 on failure. This is the > equivalent of the Python statement "result = cmp(o1, o2)". 
> """ > > After getting weird behavior implementing this, and then squinting at the > relevant Python 2.0 code, it appears that in actuality the Cmp function is to > return the direct comparison results (-1, 0, 1 based on ordering of the > parameters) furthermore, there is no such "result" argument. Bother. I didn't squint hard enough. I mistook the tp_compare slot for the PyObject_Cmp equivalent. I have indeed run into what I'm sure are nits in the Python/C API but given that my greatest alarm was false, I'll be more careful before bringing up the others. I'm still curious as to the best forum for this. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one@home.com Sat Jan 20 05:36:12 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 20 Jan 2001 00:36:12 -0500 Subject: [Python-Dev] Extension doc bugs In-Reply-To: <200101192142.OAA29168@localhost.localdomain> Message-ID: [uche.ogbuji@fourthought.com] > ... > In general, the C API docs are in a rather sorry state and though > I doubt I could do a great deal about fixing it, I'd be interested in > discussion of the matter, and perhaps making what contribution I can. > > Is the doc-sig the best place for this? Nope! Discussing it won't do any good, there or anywhere else. What it needs is for people to send better docs to python-docs@python.org or upload LaTeX patches to SourceForge, and to report doc bugs on SourceForge (which is where the start of this msg should have gone!). Most days we just work on whatever is backed up at SourceForge; if doc bugs don't show up there, they won't get repaired. 
the-docs-are-only-10x-better-than-the-sum-of-the-individual-contributions-ly y'rs - tim From tim.one@home.com Sat Jan 20 06:17:04 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 20 Jan 2001 01:17:04 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Objects object.c,2.109,2.110 In-Reply-To: Message-ID: [Barry] > Modified Files: > object.c > Log Message: > default_3way_compare(): When comparing the pointers, they must be cast > to integer types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t). > ANSI specifies that pointer compares other than == and != to > non-related structures are undefined. This quiets an Insure > portability warning. Barry, that comment belongs in the code, not in the checkin msg. The code *used* to do this correctly (as you well know, since you & I went thru considerable pain to fix this the first time). However, because the *reason* for the convolution wasn't recorded in the code as a comment, somebody threw it all away the first time it got reworked. c-code-isn't-often-self-explanatory-ly y'rs - tim From tim.one@home.com Sat Jan 20 06:30:42 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 20 Jan 2001 01:30:42 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 Message-ID: I had a huge string and wanted to put a double-quote on each end. The boring: '"' + huge + '"' does the job, but is inefficient . Then this transparent variation sprang unbidden from my hoary brow: huge.join('""') *That* should put to rest the argument over whether .join() is more properly a method of the separator or the sequence -- '""'.join(huge) instead would look plain silly .
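For anyone squinting at why that works, a small check written for a current Python (the sample value of huge is made up): str.join() treats its argument '""' as a sequence of two one-character strings and puts the string it is called on between them.

```python
huge = "spam and eggs " * 3  # stand-in for the huge string

boring = '"' + huge + '"'
tricky = huge.join('""')   # huge placed between the two quote characters
silly = '""'.join("abc")   # two quotes placed between each character

print(tricky == boring)  # True
print(silly)             # a""b""c
```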
not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim From tim.one@home.com Sat Jan 20 09:28:18 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 20 Jan 2001 04:28:18 -0500 Subject: [Python-Dev] Comparison of recursive objects In-Reply-To: <14952.21218.416551.695660@localhost.localdomain> Message-ID: This is a multi-part message in MIME format. ------=_NextPart_000_0000_01C08299.69A67E20 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit [Guido's checkin msg] > ... > In a discussion with Tim, where we discovered that our intuition > on when a<=b should be true was failing, we decided to outlaw > ordering comparisons on recursive objects. (Once we have fixed our > intuition and designed a matching algorithm that's practical and > reasonable to implement, we can allow such orderings again.) [Jeremy] > Sounds sensible to me! I was quite puzzled about what <= should > return for recursive objects. That's easy: x <= y for recursive objects should return true if and only if x < y or x == y return true <0.9 wink>. x == y isn't a problem, although Python gives a remarkable answer: recursive objects in Python are instances of rooted, ordered, directed, finite, node-labeled graphs, and "x == y" in Python answers whether their graphs are isomorphic. Viewed that way (which is the correct way <0.5 wink>), the *natural* meaning for "x <= y" is "y contains a subgraph isomorphic to x". And that has *almost* all the nice properties we like: x <= x is true (x <= y and y <= z) implies x <= z (x <= y and y <= x) if and only if x == y However, 1. That's much harder to compute. 2. It implies, e.g., [2] <= [1, 2], and that's not what we *want* non-recursive sequence comparison to mean. 3. It's a partial ordering: given arbitrary x and y, it may be that neither contains an isomorphic image of the other. 4. 
We've again given up on avoiding surprises in *simple* comparisons among builtin types, like (under current CVS): >>> 1 < [1] < 0L < 1 1 >>> 1 < 1 0 >>> so it's hard to see why we should do any work at all to avoid violating "intuition" when comparing recursive objects: we're already scrubbing the face of intuition with steel wool, setting it on fire, then putting it out with an axe . Now let's look at Guido's example (or one of them, anyway): >>> a = [] >>> a.append(a) >>> a.append("x") >>> b = [] >>> b.append(b) >>> b.append("y") >>> a [[...], 'x'] >>> b [[...], 'y'] >>> I think it's a trick of *typography* that caused my first thought to be "well, clearly, a < b". That is, the *display* shows me two 2-element lists, each with the same "blob" as the first element, and where a[1] is obviously less than b[1]. Since "the blobs" are the same, the second elements control the outcome. But those "blobs" aren't really the same: a[0] is a, and b[0] is b, so asking whether a < b by looking first at their first elements just leads back to the original question: asking whether a[0] < b[0] is again asking whether a < b, and that makes no progress. Saying that a is less than b by fiat is *consistent* with the rules for lexicographic ordering, but so is insisting that a is greater than b. There's no basis for picking one over the other, and so no clear hope of coming up with a generally consistent scheme. Well, one clear hope: if recursive comparison says "not equal", it could resolve the dilemma by comparing object id instead. That would be consistent (I mostly think at the moment ...), but if you run the program above multiple times it may say a < b on some runs and b < a on others. WRT "the right way", it should be clear from the attached picture that neither a nor b contains an isomorphic image of the other, so from that POV they're not comparable (a != b, but neither a <= b nor b <= a holds). 
So this is what Guido made Python do: >>> a == b # still cool: they're not isomorphic and Python knows it 0 >>> a < b Traceback (most recent call last): File "", line 1, in ? ValueError: can't order recursive values >>> a <= b Traceback (most recent call last): File "", line 1, in ? ValueError: can't order recursive values In light of that, I still find these mildly surprising: >>> a < a 0 >>> a <= a 1 >>> I guess some recursive values are more orderable than others . >>> import copy >>> c = copy.deepcopy(a) >>> c [[...], 'x'] >>> a == c 1 >>> a <= c 1 >>> a < c 0 >>> BTW, this kind of construction appears to give equality-testing that's at best(!) exponential-time in the size of the dicts: def timeeq(x, y): from time import clock import sys s = clock() result = x == y f = clock() print x, result, round(f-s, 1), "seconds" sys.stdout.flush() d = {} e = {} timeeq(d, e) d[0] = d e[0] = e timeeq(d, e) d[1] = d e[1] = e timeeq(d, e) d[2] = d e[2] = e timeeq(d, e) Output: {} 1 0.0 seconds {0: {...}} 1 0.0 seconds {1: {...}, 0: {...}} 1 6.5 seconds After more than 15 minutes, the 3-element dict comparison still hasn't completed (yikes!). 
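The "isomorphic graphs" reading of == can be sketched in a few lines. This is an illustration for lists only, written for a current Python, and not the comparison code in CPython; the name rec_eq and the set of pending pairs are assumptions of the sketch. A pair of lists that is already being compared is taken to be equal, which is what lets the comparison terminate on self-referential lists:

```python
import copy


def rec_eq(x, y, _pending=None):
    """Structural equality for possibly self-referential lists (sketch)."""
    if _pending is None:
        _pending = set()
    if isinstance(x, list) and isinstance(y, list):
        key = (id(x), id(y))
        if key in _pending:
            # Already comparing this pair further up: assume equal for now.
            return True
        if len(x) != len(y):
            return False
        # Pairs are never removed: any mismatch makes the whole result False,
        # so remembering an assumed-equal pair cannot produce a wrong True.
        _pending.add(key)
        return all(rec_eq(p, q, _pending) for p, q in zip(x, y))
    return x == y


a = []
a.append(a)
a.append("x")
b = []
b.append(b)
b.append("y")
c = copy.deepcopy(a)

print(rec_eq(a, b))  # False: the labels 'x' and 'y' differ
print(rec_eq(a, c))  # True: same shape, same labels
```

Because each pair of lists is recorded once and never revisited, the sketch examines every pair at most one time; that is a property of this sketch, not a claim about the dict comparison timed above.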
ackerman's-function-eat-your-heart-out-ly y'rs - tim

------=_NextPart_000_0000_01C08299.69A67E20
Content-Type: image/jpeg; name="loopy.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="loopy.jpg"

[base64-encoded image data for loopy.jpg omitted]
Ybc4VcNRWquOKJq+rKp5thsPZnQ7QfDmoDHzlVa1UWcj9bcoiO0NTg1Fwv1uw5U1KilwCbDosOKd 4jn62bxHtNxDiFJx5L7VnpXlHsnrMe8PCm7QoM3ccGvTNMgxKtB06Y6qvW3g1ytzpVyeZyoqphML wTHugs2rFvGN2JizZMMzOO013jadp25T4x7AAHpEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAhZSWrFeuS6mtuqr02Wp9QhykCXk4MmrGsWTl4qqqxYD3 Kquiv/a+o9DmvV+3lyegp3wosz6RX531D/DpIqgJXmvV+3lyegp3wo5r1ft5cnoKd8KVQAlea9X7 eXJ6CnfCjmvV+3lyegp3wpVACV5r1ft5cnoKd8KOa9X7eXJ6CnfClUAJXmvV+3lyegp3wo5r1ft5 cnoKd8KVQAlea9X7eXJ6CnfCjmvV+3lyegp3wpVACV5r1ft5cnoKd8KfK11qUnelcpE/W56rS0Cn yU3CdOQoDHw3xYk016IsGHDRUVILOtF8/wBZXkrTfKncXctM9vPgVQAAAAAAAPKuyr837VrNZ3HK Pk6SjTm516N5u2Ofp1YXGdOM4XH1Hlcuvfs7bfr6P8GNrPksvLuWc9g8qgJXl179nbb9fR/gxy69 +ztt+vo/wZVACV5de/Z22/X0f4Mcuvfs7bfr6P8ABlUAJXl179nbb9fR/gxy69+ztt+vo/wZVACV 5de/Z22/X0f4Mcuvfs7bfr6P8GVQAleXXv2dtv19H+DHLr37O236+j/BlUAJXl179nbb9fR/gxy6 9+ztt+vo/wAGVQA8q0qvzgtWjVnccn+UZKDObnXr3e8Yj9OrCZxnGcJn6j1SV2TeSyze5ZL2DCqA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlbM+kV+d9Q/w6SKolbM+kV+d9Q/w6SKoAAAAAAAAAAAA AAErTfKncXctM9vPlUStN8qdxdy0z28+BVAAAAAAAAldrPksvLuWc9g8qiV2s+Sy8u5Zz2DyqAAA AAAAAAAAAAAAAAldk3kss3uWS9gwqiV2TeSyze5ZL2DCqAAAAAAAAAAAAAAAAAAAAAAAOdb626Vy 2tuaWi6BRIVAZOycGNNzLHtiQ4UVkJ0R6v3iNTGty5VMIiJnJqtm7VLKvKfdI27cEtMzqfNl4jHw IkTg5eg2I1qvwjXKunOE68ZQC1AAAAAARe12WuGZs2Mlpxo0KdhxGxIqS7lbGiQkRcthqnHVnSuE VFVEVEznCz2xHaHCr9Ng0SrzL1rku1UZEjORVm2JlUVF87mpwVFyqomrK9LEE6itcsYrct/Bq4+E 5cuhtrsUxaKztaI8Yjzn1fvz2q7M+kV+d9Q/w6SKolbM+kV+d9Q/w6SKonZQAAAAAAAAAAAAAErT fKncXctM9vPlUStN8qdxdy0z28+BVAAAAAAI/ardE/aNqrU6XJMmo2/ZCVYiOWHBauem5G4XGURv WnFyf0X8GzTaZTLvgQJSYeyUrytXXK8UbE0omXQ1XgqKnHTnUmHdaJqWGc9IyejmebRpwrVZNJOt pXfHE7Ttz226zHSPW9Xaz5LLy7lnPYPKoldrPksvLuWc9g8qiZnAAAErX9oVsW/dNNtur1Pk9aqO 75LL8niv3m8esNnSa1WplzVTiqY8/AqjKtoGynnVtbs27kmt3L0nHK4axMOdunrFgbtNCov6xzkf lU6OMYXiBV2LtCti++Xc1Kny/kWjlH+niwtGvVp+e1uc6HdWeoqjluubJarsv2S33P0KtVeFUflC 
FNyjqXNPa9ZKE5WtSMrWNXKMjRXvx0egxeCNXOdXftGuB+yHZ5Gpt11ttSbGqcvUY0OdjMiPiNiQ nsR78or8Q4rMLlURHY+tAOqprbHYkrOVyVj13TMUXVy9vJI67nTGbBXijMO/WPa3o5689WVKq1rh pd1UKVrNBmuV02Z1bqNu3Q9Wlysd0XIip0mqnFPMYUzwcd1G2g//ADMzMtrMkyDTo01Na4zoutkd 75l2645jwofFuVVivz0lRU2DZXanMjZ9RLedF30WTg/rno7U1Yr3K+JpXS3o63O05TOMZ4gVQAAA ACV2TeSyze5ZL2DCqJXZN5LLN7lkvYMKoAAAAAAAAAAAAAAAAAAAAAA4m2rS0Cc8L6FKzkGFHlo9 WpcOLCisRzIjFhy6K1yLwVFRVRUUbKJaBJ+F9GlZODCgS0CrVSHChQmI1kNjYcwiNaicEREREREP jtinfk7ws1nuTTM3yap0yNyeVh640XTDgLoY39py4wiedVQ9DYdJVG4fCgqVfk6XPQZGWqE/NTaT MPdvlEipGaxkVFXhE1ORNKKq8HeZqqgdZXpdVOtCjfKVW3zoSxGwWMgs1Pe9crhMqidSOXiqdX14 RYPx82x9grXoYX5ho9w0KmXFTXSFalGTUqrkfocqorXJ1KjkVFRetMovUqp1KpLeKOyP3J97j++V c1dRNv8AamNvW3uG5eD0w7a6l5vv/Tttt8YeD4+bY+wVr0ML8wePm2PsFa9DC/MPe8UdkfuT73H9 8eKOyP3J97j++Rd3Wfar+/cv+n7N/dZfl+Z4Pj5tj7BWvQwvzDHr8uKkz95Q7itNs7JzLojZiIyY gQ2tZGaqKj26XORcqmVRydeVyurCbDe9kbPrStyZq05Qt7u8NhQUnozXRoirhGoqv/qq4yqIirhc GebENn0C6pqZqVbg72jS+YKMSKrFixsIuF08dKNdnrTiretNSFPURqMlow2mJmefLo6Lg9+EaTBl 4lgpetK/VnvbfW36RG879PL/ALaXsNujnO+6pqPB3M7HnYM1Gaz/AG0zLQoPRVVzxWWc7C9SORMr xU1Ei9nEhK0qfvKn06C2BJStWhQ4MJucMb8nya+f61VVVetVVVXiqloa+Kt60iLzvL871+TBl1F7 6WncpPhEzvt+/l4AAJFQAAAAAAAAAAAx7aFfbrD2lTcdaak+yoUiTYico3SsWFGm1X9l2c71P+jY SGfRqZVtqdd+VabJT26otN3fKYDYujMeezjUi4zhP+kI8tb2rtjnaV3h+XTYs8W1ePv058onb2eC B/SD/ln7/wD4x+kH/LP3/wDxmtczbY7OUX/wwvdHM22OzlF/8ML3Sp6HV/eR8I/B0H0j2f8A+Hb+ +35mS/pB/wAs/f8A/GP0g/5Z+/8A+M1rmbbHZyi/+GF7o5m2x2cov/hhe6PQ6v7yPhH4H0j2f/4d v77fmZDMbfYUzAiQJi1GRYMVqsfDfPI5r2qmFRUWFhUVPMZTcdYp05WUqVv0yNRIu83yshzWtkN/ BUWFhjVZhUVetetMaUTB1pzNtjs5Rf8AwwvdMh251G3aDAbQ6DRaE2px2qszFZJw95KsVEwidHCO cirxzlqJnHSaqVdVhyxTvZbxtHqbvAOJ8Ptqow8P01q2t4/XmY26zMTMxy9nqjmkou1msTliVO26 rBhTizkm6SbOucqRWsejmvV/Wj3aXYReHFMu1cTouxKzNXDaNMqs/KckmZmHqfCRFROtURyZ46XI iOTr4OTivWuPSmzWVoGx66azWILY1bi0SZjwmxYapyL9S5zUajkykRFRFV3WiphOpVdvxb0ePNEd 7Lbp4Oe7SavhmTJOLQYtpi282jwnptEeW/s8OXiAAvOWAAB59w0mBXaBU6ROPislqhKxZSK6EqI9 
rIjFaqtVUVM4VcZRT/PCgQJW5KVZtqMnNxOzNwTDIrt0rtzDmGycNj/MjuMOJwRc9HjjKKd/39K3 DO2nPy9lz8tT6+/d8mmZlqOhsxEar8orH9bEcnzV4qnV1mC7O/B8uOU2owLsvuqUidbDmn1F7ZN0 TXGmldqa5U0Q0aiPXXwynRRunCrgOmgAAAAAAASuybyWWb3LJewYVRK7JvJZZvcsl7BhVAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAGCbZKDdt23/KUyWpz1pUJqJKTO7xBaj2osR8SImcLlqpheOGtw1VX pbZQaVK0OjSdMkGaZaVhpDZlERXY63LhERXKuVVccVVVP3k9f9yQrUtWeqj1YsZjdEvDdj9ZFdwa mMoqpniqIudKOXzFauKuG18sz4trNxDPxLFp+H46xEV5REdZnrPr/GZ6vJsyrU7nhesn8oSnLI9Z asKBvm7yIjZCVa5WtzlcOhvRcdSscnmUtzljYVCnKltUk5xyvmHwmx5mZivfl2HMc3UqquXKr3t+ teOfrOpz5pM86ik3mNub1x/hNOE6munpfvfViZ9szPL5b+8ABaYYAAAAAAAAAABK03yp3F3LTPbz 5VErTfKncXctM9vPgVQAAAADwr7lqvN2jU4FtxtzVnw8QXo7QvWmpEd5nK3UiLwwqouU60zvZRsu j06fdcF4t31YWIsSDAiREi7t+crFe5FVHPVeKcVx1/O+bsIIL6el7xkt0+DT03FtRpdLfSYdoi88 52+tt5b+X76yldrPksvLuWc9g8qiV2s+Sy8u5Zz2DyqJ2YAAAAAAAAAAAAAAAAldk3kss3uWS9gw qiV2TeSyze5ZL2DCqAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABh/hO1XTIUWkMfBXeRHzcVuf1jdKa WLjPBq64nWnFW8OpTcDmrwl/p1Id2w/axSlxC01wTt1dR2Pw1y8VpNv6Ymfl+u6u8F2FAdbFemWw tMz8o8mfE1KutjYMN7eHUmFiv/7/AKY2chtltP8AkiLdVN3u+5HUZaX3mnTr0UyRbnGVxnHVkuSx gx+jx1p5MbimrnW6zLqJnfvTO3s6fCNoAASqAAAAAAAAAAABK03yp3F3LTPbz5VErTfKncXctM9v PgVQAAAAAAAJXaz5LLy7lnPYPKoldrPksvLuWc9g8qgAAAAAAAAAAAAAAAAJXZN5LLN7lkvYMKol dk3kss3uWS9gwqgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAc3eExBitvOmx3Q3pBfT2sbEVq6XObEi KqIvUqojm5T+KfWdIma7e7Y+XbNdPwG5naTqmG8fnQsfrU4qidSI7PFehhOsqa7HOTDMR7XQ9ltZ TR8Tx2yeFvq/Hw+exsLrESv0u4qpH3qxZipQt4+IxG64jJCUhvciJwwr2Oxj/pOo0o5I2RTVrQLj iy950ikTkpOMbDhzM/JsjbiIirp6Ts6GLqdlcdelVwiKp0b4uLI7G236rge6fdHnjNjieseKPtFw q3DNbakR9S3Ovlt5e7wVQJXxcWR2Ntv1XA90eLiyOxtt+q4HulphKoEr4uLI7G236rge6PFxZHY2 2/VcD3QKoGSbVaBa1o2qtTpdgWvNRt+yEqxKVCWHBauem5GtRcZRG9acXJ/RcapNiVu96g2p0m2q bJS6u0Mjy8nCk4ENqucnRwiK9G9JFVNTsIiLngVM2rjHbuVrM28nQcM4BfW4f4rLlrjxRO28z19n 4zDoi4dqlpURjtVUZPRtKPbCkP1yuRVx85F0IqcVwrkXH9UzkVzbcq9UN2yhy8GkQ0wrncJiI5eO Uy5ulG8U4ac5Tr44PBrtBte09/LT9TjVyuwtKLKSjN1LQYqY1MixF6T28cdDS7oqi6VXKS0vLTlx 
VyHLUyRYs1MuSHBlpZmlrURMIiZ8yInFzlVeCucqrlTLz6zPae7vtPlH4u84R2c4Vir6eaTesc+9 flHurO3L1zHsnZumzHa7U7lumWo1Vp0knKtW7jSyuZu9LHvXLXK7VnSidaY49ZtByjsmo0i/aYyh 3RR5afzv5d8vM6YkOFFYiqqq3i1+NDkwvDKo5OKIdCeLiyOxtt+q4Humjw/JfJimbzvO7jO12j02 k1ta6WndrNYnl4TznnHOfL9+M1RK03yp3F3LTPbz48XFkdjbb9VwPdPVoVt0O39/8g0am0zf6d7y KVZB3mnOnVpRM4yuM9WVLzlnqgAAAAAAAldrPksvLuWc9g8qjz7hpMCu0Cp0icfFZLVCViykV0JU R7WRGK1VaqoqZwq4yini816v28uT0FO+FAqgSvNer9vLk9BTvhRzXq/by5PQU74UCqBK816v28uT 0FO+FHNer9vLk9BTvhQKoErzXq/by5PQU74Uc16v28uT0FO+FAqgSvNer9vLk9BTvhRzXq/by5PQ U74UCqBK816v28uT0FO+FHNer9vLk9BTvhQKoErzXq/by5PQU74Uc16v28uT0FO+FAbJvJZZvcsl 7BhVHn29SYFBoFMpEm+K+Wp8rClIToqor3MhsRqK5UREzhEzhEPQAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAA5q2mbIajSZ+LOWtKRp6kv6W4h9ONLqqomnT857ePBUyqIi6urUvg2TtSuG191A3/wAo U1mG8lmlV2lqaUwx/W3DW4ROLUyq6VOsyTvHZ7b12aolSk91Or/9yWVIcb9nrXCo7g1E6SLhM4wZ mTQWrbv6e20+TudF2sxZ8UaXi+P0lftdffHn64mJ9svMtHaxbNfgIkxOMpU41uXwZ16Mb1JnTEXo uTK4TqcuFXSiFFzytjtHRf8A3QveMl/R8/mb7h/kH6Pn8zfcP8h6rl1kRtNIn3x+KDNoezl7zbHq rVjy7tp/8ta55Wx2jov/ALoXvHmVPabZ1NjtgzFelXvc3Wiy6OjtxlU+dDRyIvDqznq+tDOP0fP5 m+4f5B+j5/M33D/IfZy6zpjj4/q804f2difr6u0x6qzH/mXzuHb7FV7mW5R2NYjkVI0+5VVzccU3 bFTC58+teCdXHhnFbuy672n3ysWZnZrlGUbT5Nrt2rUVXoiQ2/O09eVyuGpleBvFE2LWpTJ9k1Fb O1DRhWwpyK10NHIqKiqjWt1dWMLlFRVyimhyMlK0+VZKyEtBlZaHnRCgw0YxuVyuETgnFVX/AJIv 4XUZv/tfaPKF2OP8H4bt9H6bvW+1b9d5+GznG1tiFenZqC+vug02SSJiKxIqRI7mImcs05bx6sqv DiuF4Iu8WhadItKQdK0WW3W80rGivcrokZyJhFcq/wDK4TCIqrhEyp7oLmDSYsHOsc/NzvFO0Ot4 p9XNbav2Y5R+vvYvbOz6vSG2mbr0zBgtpPKZmZZHSKi60io9GtRvztSbzjlETorhV4Z2gAkw4a4Y mK9Z3VOI8SzcRvS+bbetYrG3lG/z5gAJWeAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAP/9k= ------=_NextPart_000_0000_01C08299.69A67E20-- From thomas@xs4all.net Sat Jan 20 14:30:26 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 15:30:26 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <200101192206.RAA12072@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 05:06:03PM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> <200101192206.RAA12072@cj20424-a.reston1.va.home.com> Message-ID: <20010120153026.L17392@xs4all.nl> On Fri, Jan 19, 2001 at 05:06:03PM -0500, Guido van Rossum wrote: > So instead I added 5 lines to site.py, which tests for > os.name=='posix', then for sys.path[-1] ending in '/Modules' -- this > tests only succeeds when running from the build directory. Then it > calls distutils.util.get_platform() and uses the result to calculate > the correct directory name, which is then appended to sys.path. > Yes, this slows down startup (it imports a large portion of the > distutils package), but I don't care -- after all this is mostly for > me so I can play with the interpreter right after I've built it, > right? Right. The only downside (as far as I can tell) is that 'python -S' no longer works, in the build tree. I don't think that's that big a deal, but it should be documented somewhere, so we don't end up being boggled by it once we forget about it :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
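[Editorial sketch] The check Guido describes in the quoted text above can be approximated as a small pure function. This is not the code that went into site.py; the function name extension_build_dir, its parameters, and the "build/lib.<platform>-<version>" directory layout are assumptions made for illustration, and the sketch is written for a current Python rather than 2.1.

```python
import posixpath


def extension_build_dir(sys_path, os_name, platform, version):
    """Return the directory to append to sys.path, or None.

    Mirrors the steps in the message: only on posix, and only when the
    last sys.path entry ends in '/Modules' (i.e. we appear to be running
    from the build tree). The directory naming is an assumption.
    """
    if os_name != "posix" or not sys_path:
        return None
    last = sys_path[-1]
    if not last.endswith("/Modules"):
        return None
    top = posixpath.dirname(last)  # top of the build tree
    return posixpath.join(top, "build", "lib.%s-%s" % (platform, version))


if __name__ == "__main__":
    import os
    import sys
    import sysconfig

    # sysconfig.get_platform() stands in here for the
    # distutils.util.get_platform() call mentioned in the message.
    extra = extension_build_dir(
        sys.path,
        os.name,
        sysconfig.get_platform(),
        "%d.%d" % sys.version_info[:2],
    )
    if extra is not None:
        sys.path.append(extra)
    print(extra)
```

Because a hook like this would live in site.py, it is skipped whenever site is not imported, which is consistent with Thomas's remark that 'python -S' stops working in the build tree.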
From guido@digicool.com Sat Jan 20 16:18:39 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 11:18:39 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: Your message of "Fri, 19 Jan 2001 00:45:32 +0100." <20010119004532.G17392@xs4all.nl> References: <20010119004532.G17392@xs4all.nl> Message-ID: <200101201618.LAA15675@cj20424-a.reston1.va.home.com> > On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote: > > > filename = '/tmp/delete_me' > > This reminds me: we need a portable way to handle test-files :) Yeah, I noticed that this test failed on Windows -- fixed now. The test_support module exports TESTFN; there's also tempfile.mktemp() which should generate temporary files on all platforms. Is that enough? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Sat Jan 20 16:36:05 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 17:36:05 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201618.LAA15675@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Jan 20, 2001 at 11:18:39AM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> Message-ID: <20010120173605.P17295@xs4all.nl> On Sat, Jan 20, 2001 at 11:18:39AM -0500, Guido van Rossum wrote: > > On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote: > > > > > filename = '/tmp/delete_me' > > > > This reminds me: we need a portable way to handle test-files :) > Yeah, I noticed that this test failed on Windows -- fixed now. > The test_support module exports TESTFN; there's also tempfile.mktemp() > which should generate temporary files on all platforms. > Is that enough? Well, there is one more issue, which we can't fix terribly easy: test_fcntl tries to flock() the file. 
flock() doesn't work on all filesystems (like NFS) :P If we cared a lot, we could try several alternatives (current dir, /tmp, /var/tmp) in the specific case of flock, but personally I don't want to bother, and real sysadmins (who should care about the test failure) are more likely to build Python on a local disk than in their NFS-mounted homedirectory. At least that's how we do it :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@digicool.com Sat Jan 20 16:43:49 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 11:43:49 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 In-Reply-To: Your message of "Sat, 20 Jan 2001 01:30:42 EST." References: Message-ID: <200101201643.LAA16269@cj20424-a.reston1.va.home.com> > I had a huge string and wanted to put a double-quote on each end. The > boring: > > '"' + huge + '"' > > does the job, but is inefficient . Then this transparent variation > sprang unbidden from my hoary brow: > > huge.join('""') Points off for obscurity though! My favorite for this is: '"%s"' % huge Worth a microbenchmark? > *That* should put to rest the argument over whether .join() is more properly > a method of the separator or the sequence -- '""'.join(huge) instead would > look plain silly . > > not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim Give up the channeling for a while -- there's too much interference in the air from the Microsoft threaded stdio debate still. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Sat Jan 20 16:47:44 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 20 Jan 2001 10:47:44 -0600 (CST) Subject: [Python-Dev] how to test my __all__ lists? Message-ID: <14953.49456.654121.987189@beluga.mojam.com> How do I test the __all__ lists I'm building? I'm worried about a couple things: 1. I may have typos 2.
I may leave something out of a list that should be imported by from-module-import-*. Thoughts? Skip From guido@digicool.com Sat Jan 20 17:00:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 12:00:05 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: Your message of "Sat, 20 Jan 2001 17:36:05 +0100." <20010120173605.P17295@xs4all.nl> References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> Message-ID: <200101201700.MAA16491@cj20424-a.reston1.va.home.com> > > > > filename = '/tmp/delete_me' > > > > > > This reminds me: we need a portable way to handle test-files :) > > Yeah, I noticed that this test failed on Windows -- fixed now. > > > The test_support module exports TESTFN; there's also tempfile.mktemp() > > which should generate temporary files on all platforms. > > Is that enough? > > Well, there is one more issue, which we can't fix terribly easy: test_fcntl > tries to flock() the file. flock() doesn't work on all filesystems (like > NFS) :P If we cared a lot, we could try several alternatives (current dir, > /tmp, /var/tmp) in the specific case of flock, but personally I don't want to > bother, and real sysadmins (who should care about the test failure) are more > likely to build Python on a local disk than in their NFS-mounted > homedirectory. At least that's how we do it :-) These days, I would think that it's a pretty sure bet that the system's tmp directory is not on NFS. Then we could just use tempfile.mktemp() in that module, right? Or does the /tmp filesystem on Linux (which AFAIK is a RAM disk implemented in virtual memory so it uses swap space when it runs out of RAM) not support locking? I don't particularly care about fixing this -- I haven't seen bug reports about this. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sat Jan 20 17:38:38 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 12:38:38 -0500 Subject: [Python-Dev] how to test my __all__ lists? In-Reply-To: Your message of "Sat, 20 Jan 2001 10:47:44 CST." <14953.49456.654121.987189@beluga.mojam.com> References: <14953.49456.654121.987189@beluga.mojam.com> Message-ID: <200101201738.MAA16636@cj20424-a.reston1.va.home.com> > How do I test the __all__ lists I'm building? I'm worried about a couple > things: > > 1. I may have typos Do "from M import *" -- this will raise an AttributeError if there's something in __all__ that's not defined in the module. > 2. I may leave something out of a list that should be imported by > from-module-import-*. That's what alpha-testing's for. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@netaxs.com Sat Jan 20 17:49:43 2001 From: esr@netaxs.com (Eric Raymond) Date: Sat, 20 Jan 2001 12:49:43 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <3A672376.4B951848@lemburg.com>; from M.-A. Lemburg on Thu, Jan 18, 2001 at 06:10:14PM +0100 References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> Message-ID: <20010120124943.C6073@unix3.netaxs.com> > A combination of time.time(), process id and counter should > work in all cases. Make sure you use a lock around the counter, > though. Yes, but...this hack has to work in a multithreaded environment, so process ID isn't good enough. And I don't want to keep a counter around if I don't have to. -- Eric S. Raymond From guido@digicool.com Sat Jan 20 18:01:04 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 13:01:04 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Your message of "Sat, 20 Jan 2001 12:49:43 EST." 
<20010120124943.C6073@unix3.netaxs.com> References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> <20010120124943.C6073@unix3.netaxs.com> Message-ID: <200101201801.NAA16880@cj20424-a.reston1.va.home.com> > > A combination of time.time(), process id and counter should > > work in all cases. Make sure you use a lock around the counter, > > though. > > Yes, but...this hack has to work in a multithreaded environment, > so process ID isn't good enough. And I don't want to keep a counter > around if I don't have to. Sorry Eric, this just doesn't make sense. Keeping a counter around in your module (protected by a semaphore) is obviously the right solution. Why are you fighting it? --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@netaxs.com Sat Jan 20 18:20:26 2001 From: esr@netaxs.com (Eric Raymond) Date: Sat, 20 Jan 2001 13:20:26 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <200101201801.NAA16880@cj20424-a.reston1.va.home.com>; from Guido van Rossum on Sat, Jan 20, 2001 at 01:01:04PM -0500 References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> <20010120124943.C6073@unix3.netaxs.com> <200101201801.NAA16880@cj20424-a.reston1.va.home.com> Message-ID: <20010120132026.E6073@unix3.netaxs.com> On Sat, Jan 20, 2001 at 01:01:04PM -0500, Guido van Rossum wrote: > > Yes, but...this hack has to work in a multithreaded environment, > > so process ID isn't good enough. And I don't want to keep a counter > > around if I don't have to. > > Sorry Eric, this just doesn't make sense. Keeping a counter around in > your module (protected by a semaphore) is obviously the right > solution. Why are you fighting it? Actually, I'm not fighting it any more. I changed my mind a few minutes after shipping that response. -- Eric S. 
Raymond From thomas@xs4all.net Sat Jan 20 18:37:10 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 19:37:10 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201700.MAA16491@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Jan 20, 2001 at 12:00:05PM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> Message-ID: <20010120193710.Q17295@xs4all.nl> On Sat, Jan 20, 2001 at 12:00:05PM -0500, Guido van Rossum wrote: > > Well, there is one more issue, which we can't fix terribly easy: test_fcntl > > tries to flock() the file. flock() doesn't work on all filesystems (like > > NFS) :P If we cared a lot, we could try several alternatives (current dir, > > /tmp, /var/tmp) in the specific case of flock, but personally I don't want to > > bother, and real sysadmins (who should care about the test failure) are > > more likely to build Python on a local disk than in their NFS-mounted > > homedirectory. At least that's how we do it :-) > These days, I would think that it's a pretty sure bet that the > system's tmp directory is not on NFS. Then we could just use > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > it uses swap space when it runs out of RAM) not support locking? Actually, most Linux distributions don't care enough about /tmp to make it a RAM-based filesystem. At least Debian and RedHat don't :) (There's a good reason for that: Linux's disk-data cache rocks if you have enough RAM, so there's no real gain in using a ramdisk) BSDI does (optionally) have such a /tmp, and probably the other BSD derived systems as well. But that doesn't mean it doesn't support locking, so that's not a real excuse. 
But like I said, I don't care enough to worry about it. I'll look at it before alpha2. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Sat Jan 20 20:10:51 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 20 Jan 2001 15:10:51 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Message-ID:

[Tim]
> ...
> 4. We've again given up on avoiding surprises in *simple* comparisons
> among builtin types, like (under current CVS):
>
> >>> 1 < [1] < 0L < 1
> 1
> >>> 1 < 1
> 0
> >>>

I really dislike that. Here's a consequence at a higher level:

    N = 5
    x = [1 for i in range(N)] + \
        [[1] for i in range(N)] + \
        [0L for i in range(N)]
    x.sort()
    print x

    from random import shuffle
    tries = failures = 0
    while failures < 5:
        tries += 1
        y = x[:]
        shuffle(y)
        y.sort()
        if x != y:
            print "oops, on try number", tries
            print y
            failures += 1

and here's a typical run (2.1a1):

    [1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L, 0L]
    oops, on try number 3
    [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
    oops, on try number 5
    [[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]]
    oops, on try number 6
    [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
    oops, on try number 7
    [[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]]
    oops, on try number 8
    [0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L]

I've often used list.sort() on a heterogeneous list simply to bring the elements of the same type next to each other. But as "try number 5" shows, I can no longer rely on even getting all the lists together. Indeed, heterogeneous list.sort() has become a very bad (biased and slow) implementation of random.shuffle() .
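[Editorial sketch] On a current Python, where comparing an int with a list raises TypeError, the "bring elements of the same type together" use of sort can be reproduced with an explicit key. The sketch below is not the 2.0 or 2.1 comparison code; group_key, the "number" label, and the use of 0.0 as a second numeric type (there is no 0L any more) are assumptions made for illustration. It repeats the shuffle-and-sort experiment to check that the result no longer depends on the input order.

```python
import numbers
import random


def group_key(obj):
    """Sort key that groups values by kind before comparing them.

    All real numbers share one label, so ints and floats sort together
    by value; every other object is labelled with its type name.
    """
    if isinstance(obj, numbers.Real):
        label = "number"  # hypothetical label, chosen for this sketch
    else:
        label = type(obj).__name__
    return (label, obj)


def sort_is_order_independent(items, tries=100, seed=0):
    """Return True if sorting every tried shuffle gives the same list."""
    rng = random.Random(seed)
    expected = sorted(items, key=group_key)
    for _ in range(tries):
        shuffled = list(items)
        rng.shuffle(shuffled)
        if sorted(shuffled, key=group_key) != expected:
            return False
    return True


if __name__ == "__main__":
    n = 5
    data = [1] * n + [[1]] * n + [0.0] * n
    print(sorted(data, key=group_key))
    print(sort_is_order_independent(data))
```

The key works because values are only ever compared with each other when their labels are equal, so the ordering is a consistent total order over the items in the list.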
Under 2.0, the program never prints "oops", because the only violations of transitivity in 2.0's ordering of builtin types were bugs in the implementation (none of which show up in this simple test case); 2.0's .sort() *always* produces [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]] The base trick in 2.0 was sound: when falling back to the "compare by name of the type" last resort, treat all numeric types as if they had the same name. While Python can't enforce that any user-defined __cmp__ is consistent, I think it should continue to set a good example in the way it implements its own comparisons. grumblingly y'rs - tim From skip@mojam.com (Skip Montanaro) Sat Jan 20 20:42:27 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 20 Jan 2001 14:42:27 -0600 (CST) Subject: [Python-Dev] should a module's thread safety be documented? Message-ID: <14953.63539.629197.232848@beluga.mojam.com> A bit late for 2.1alpha1, but it just occurred to me that perhaps there should be an annotation in the documentation that indicates whether or not a module is thread-safe. For example, many functions in fileinput rely on a module global called _state. It strikes me that this module is not likely to be thread-safe, yet the documentation doesn't appear to mention this, certainly not in an obvious fashion. Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of LaTex macros in Fred's arsenal? This would make documenting these properties both easy and consistent across modules. Skip From tim.one@home.com Sat Jan 20 21:13:41 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 20 Jan 2001 16:13:41 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 In-Reply-To: <200101201643.LAA16269@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > huge.join('""') [Guido] > Points off for obscurity though! The Subject line was "Stupid Python Tricks" for a reason . 
Those who don't know the language inside-out should be tickled by figuring out why it even *works* (hint for the baffled: you have to view '""' as a sequence rather than as an atomic string).

> My favorite for this is:
>
> '"%s"' % huge
>
> Worth a microbenchmark?

Absolutely! I get:

         obvious  15.574
         obscure   8.165
         sprintf   8.133

after running:

    ITERS = 1000
    indices = [0] * ITERS

    def obvious(huge):
        for i in indices:
            '"' + huge + '"'

    def obscure(huge):
        for i in indices:
            huge.join('""')

    def sprintf(huge):
        for i in indices:
            '"%s"' % huge

    def runtimes(huge):
        from time import clock
        for f in obvious, obscure, sprintf:
            start = clock()
            f(huge)
            finish = clock()
            print "%12s %7.3f" % (f.__name__, finish - start)

    runtimes("x" * 1000000)

under current 2.1a1. Not a dead-quiet machine, but the difference is too small to care. Speed up huge.join attr lookup, and it would probably be faster . Hmm: if I boost ITERS high enough and cut back the size of huge, "obscure" eventually becomes *slower* than "obvious", and even if the "huge.join" lookup is floated out of the loop. I guess that points to the relative burden of calling a bound method. So, in real life, the huge.join approach may well be the slowest!

>> not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim
> Give up the channeling for a while -- there's too much interference in
> the air from the Microsoft threaded stdio debate still. :-)

What debate? You need two arguably valid points of view for a debate to even start .

gloating-in-victory-vicious-in-defeat-but-simply-unbearable-in-ambiguity-ly y'rs - tim
From fdrake@acm.org Sat Jan 20 21:23:58 2001 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat, 20 Jan 2001 16:23:58 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201700.MAA16491@cj20424-a.reston1.va.home.com> References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> Message-ID: <14954.494.223724.705495@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > it uses swap space when it runs out of RAM) not support locking? I thought it was Solaris that used available+virtual memory for /tmp; that was what we ran into at CNRI. (Which doesn't preclude Linux from doing the same, I just don't recall that we've encountered that.) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Sat Jan 20 22:05:27 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 20 Jan 2001 17:05:27 -0500 (EST) Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: <14953.63539.629197.232848@beluga.mojam.com> References: <14953.63539.629197.232848@beluga.mojam.com> Message-ID: <14954.2983.450755.761653@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > A bit late for 2.1alpha1, but it just occurred to me that perhaps there > should be an annotation in the documentation that indicates whether or not a > module is thread-safe. For example, many functions in fileinput rely on a If you can create a list of the known thread safe and known thread unsafe modules, I'll come up with appropriate annotations for the documentation. > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of > LaTex macros in Fred's arsenal? This would make documenting these > properties both easy and consistent across modules. 
Not sure that this is exactly the right approach to the markup; I'll think about this one. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip@mojam.com (Skip Montanaro) Sat Jan 20 22:31:52 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sat, 20 Jan 2001 16:31:52 -0600 (CST) Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: <14954.2983.450755.761653@cj42289-a.reston1.va.home.com> References: <14953.63539.629197.232848@beluga.mojam.com> <14954.2983.450755.761653@cj42289-a.reston1.va.home.com> Message-ID: <14954.4568.460875.662560@beluga.mojam.com> Fred> If you can create a list of the known thread safe and known thread Fred> unsafe modules, I'll come up with appropriate annotations for the Fred> documentation. I think that's going to be a significant undertaking, requiring examination of a lot of Python and C code. I'd rather approach it incrementally, which was why I suggested the LaTeX macros. As modules are determined to be safe or unsafe, the appropriate safety macro could just be inserted into the correct lib*.tex file. It would (in my mind) expand to a stock bit of text inserted at a standard place in the file. Skip From tim.one@home.com Sat Jan 20 22:52:09 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 20 Jan 2001 17:52:09 -0500 Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: <14953.63539.629197.232848@beluga.mojam.com> Message-ID: [Skip Montanaro] > ... > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to > the litany of LaTex macros in Fred's arsenal? This would make > documenting these properties both easy and consistent across > modules. When a module is *not* threadsafe, that's usually considered "a bug" in the module. So we should just point out modules that aren't threadsafe by design. Alas, that's A Project. 
From nas@arctrix.com Sat Jan 20 15:59:14 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Sat, 20 Jan 2001 07:59:14 -0800 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: ; from tim.one@home.com on Sat, Jan 20, 2001 at 03:10:51PM -0500 References: Message-ID: <20010120075914.B18840@glacier.fnational.com> On Sat, Jan 20, 2001 at 03:10:51PM -0500, Tim Peters wrote: > While Python can't enforce that any user-defined __cmp__ is consistent, I > think it should continue to set a good example in the way it implements its > own comparisons. I think the 2.0 behavior should be fairly easy to restore. I'll leave it up to Guido though since he's "Mr. Comparison" now and I haven't looked at the code since I checked in the coercion patch. Neil From nas@arctrix.com Sat Jan 20 16:03:36 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Sat, 20 Jan 2001 08:03:36 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <14954.494.223724.705495@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Sat, Jan 20, 2001 at 04:23:58PM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> <14954.494.223724.705495@cj42289-a.reston1.va.home.com> Message-ID: <20010120080336.C18840@glacier.fnational.com> On Sat, Jan 20, 2001 at 04:23:58PM -0500, Fred L. Drake, Jr. wrote: > > Guido van Rossum writes: > > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > > it uses swap space when it runs out of RAM) not support locking? > > I thought it was Solaris that used available+virtual memory for > /tmp; that was what we ran into at CNRI. (Which doesn't preclude > Linux from doing the same, I just don't recall that we've encountered > that.) 
I don't know of any Linux system that uses a RAM based /tmp.  The
Linux implementation of ext2 is so fast it doesn't make any sense.
If you have enough memory all the data is stored in the buffer,
page, and inode caches anyhow.

  Neil

From trentm@ActiveState.com Sat Jan 20 23:35:56 2001
From: trentm@ActiveState.com (Trent Mick)
Date: Sat, 20 Jan 2001 15:35:56 -0800
Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...?
Message-ID: <20010120153556.C18375@ActiveState.com>

... or am I missing something? With Python 2.0 on Windows 2000, when playing
with sys.exit() and sys.argv I get some unexpected results.

First, here is a simple case that shows what I expect. I run "caller_good.py",
which calls "callee_good.py" and prints its return value. "callee_good.py"
returns 42, so "42" is printed:

----------------- caller_good.py --------------------
import os
retval = os.system("python callee_good.py")
print "caller: the retval is", retval
-----------------------------------------------------

----------------- callee_good.py --------------------
import sys
sys.exit(42)
-----------------------------------------------------

D:\trentm\tmp>python caller_good.py
caller: the retval is 42

Now here is what I didn't expect. I changed "caller_bad.py" to pass, as an
argument, the value that "callee_bad.py" should return.

----------------- caller_bad.py ---------------------
import os
retval = os.system("python callee_bad.py 42")
print "caller: the retval is", retval
-----------------------------------------------------

----------------- callee_bad.py ---------------------
import sys
firstarg = sys.argv[1]
print "callee_bad: firstarg is", firstarg
sys.exit(firstarg)
-----------------------------------------------------

D:\trentm\tmp>python caller_bad.py
callee_bad: firstarg is 42
42                          # <---- where did *this* print come from?
caller: the retval is 1     # <---- and this retval is incorrect

Any ideas?
I have not tried to track this down yet nor have I tried the latest
Python-CVS state.

Trent

--
Trent Mick
TrentM@ActiveState.com

From moshez@zadka.site.co.il Sun Jan 21 12:37:57 2001
From: moshez@zadka.site.co.il (Moshe Zadka)
Date: Sun, 21 Jan 2001 14:37:57 +0200 (IST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,NONE,1.1
In-Reply-To:
References:
Message-ID: <20010121123757.D897BA83E@darjeeling.zadka.site.co.il>

Yay! I can change to python-dev manually!
(hear sounds of the timbot's teeth grinding)

On Sat, 20 Jan 2001, Skip Montanaro wrote:

> def check_all(_modname):
>     exec "import %s" % _modname
>     verify(hasattr(sys.modules[_modname],"__all__"),
>            "%s has no __all__ attribute" % _modname)
>     exec "del %s" % _modname
>     exec "from %s import *" % _modname
>
>     _keys = locals().keys()
....

Wouldn't it be better to use the

d = {}
exec "foo", d

And verify "d" instead?
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From guido@digicool.com Sun Jan 21 16:51:45 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 21 Jan 2001 11:51:45 -0500
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: Your message of "Sat, 20 Jan 2001 15:10:51 EST."
References:
Message-ID: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>

[Tim, complaining that numerical types are no longer lumped together
in default comparisons:]

> I've often used list.sort() on a heterogeneous list simply to bring the
> elements of the same type next to each other.  But as "try number 5" shows,
> I can no longer rely on even getting all the lists together.  Indeed,
> heterogeneous list.sort() has become a very bad (biased and slow)
> implementation of random.shuffle().
>
> Under 2.0, the program never prints "oops", because the only violations of
> transitivity in 2.0's ordering of builtin types were bugs in the
> implementation (none of which show up in this simple test case); 2.0's
> .sort() *always* produces
>
>     [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
>
> The base trick in 2.0 was sound:  when falling back to the "compare by name
> of the type" last resort, treat all numeric types as if they had the same
> name.
>
> While Python can't enforce that any user-defined __cmp__ is consistent, I
> think it should continue to set a good example in the way it implements its
> own comparisons.

I think I can put this behavior back.  (I believe that before I
reorganized the comparison code, it seemed really tricky to do this,
but after refactoring the code, it's quite easy to do.)

My only concern is that under the old scheme, two different numeric
extension types that somehow can't be compared will end up being
*equal*.  To fix this, I propose that if the names compare equal, as a
last resort we compare the type pointers -- this should be consistent
too.

Here's a patch that stops your test program from reporting failures:

*** object.c	2001/01/21 16:25:18	2.112
--- object.c	2001/01/21 16:50:16
***************
*** 522,527 ****
--- 522,528 ----
  default_3way_compare(PyObject *v, PyObject *w)
  {
      int c;
+     char *vname, *wname;

      if (v->ob_type == w->ob_type) {
          /* When comparing these pointers, they must be cast to
***************
*** 550,557 ****
      }

      /* different type: compare type names */
!     c = strcmp(v->ob_type->tp_name, w->ob_type->tp_name);
!     return (c < 0) ? -1 : (c > 0) ? 1 : 0;
  }

  #define CHECK_TYPES(o) PyType_HasFeature((o)->ob_type, Py_TPFLAGS_CHECKTYPES)
--- 551,571 ----
      }

      /* different type: compare type names */
!     if (v->ob_type->tp_as_number)
!         vname = "";
!     else
!         vname = v->ob_type->tp_name;
!     if (w->ob_type->tp_as_number)
!         wname = "";
!     else
!         wname = w->ob_type->tp_name;
!     c = strcmp(vname, wname);
!     if (c < 0)
!         return -1;
!     if (c > 0)
!         return 1;
!     /* Same type name, or (more likely) incomparable numeric types */
!     return (v->ob_type < w->ob_type) ? -1 : 1;
  }

  #define CHECK_TYPES(o) PyType_HasFeature((o)->ob_type, Py_TPFLAGS_CHECKTYPES)

Let me know if you agree with this.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Sun Jan 21 17:00:02 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 21 Jan 2001 12:00:02 -0500
Subject: [Python-Dev] should a module's thread safety be documented?
In-Reply-To: Your message of "Sat, 20 Jan 2001 14:42:27 CST." <14953.63539.629197.232848@beluga.mojam.com>
References: <14953.63539.629197.232848@beluga.mojam.com>
Message-ID: <200101211700.MAA25479@cj20424-a.reston1.va.home.com>

> A bit late for 2.1alpha1, but it just occurred to me that perhaps there
> should be an annotation in the documentation that indicates whether or not a
> module is thread-safe.  For example, many functions in fileinput rely on a
> module global called _state.  It strikes me that this module is not likely
> to be thread-safe, yet the documentation doesn't appear to mention this,
> certainly not in an obvious fashion.
>
> Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of
> LaTex macros in Fred's arsenal?  This would make documenting these
> properties both easy and consistent across modules.

It's hard to say whether a *whole module* is threadsafe.  E.g. in the
fileinput example, there's the clear implication that if you use this
in multiple threads, you should instantiate your own FileInput
instances, and then you're totally thread-safe.  Clearly the semantics
of the module-global functions are thread-unsafe though.
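
The fileinput point above can be sketched roughly as follows.  This is
only an illustration in present-day Python 3, not the 2.1-era code under
discussion; the helper names count_lines and demo and the temporary file
names are hypothetical.  Each thread builds its own fileinput.FileInput
instance instead of calling the module-level fileinput.input(), which
keeps its state in one module global.

```python
import fileinput
import os
import tempfile
import threading


def count_lines(path, results, key):
    # Hypothetical helper: each thread owns a private FileInput instance,
    # so nothing is shared through the fileinput module's global state.
    with fileinput.FileInput(files=[path]) as reader:
        results[key] = sum(1 for _ in reader)


def demo():
    # Write two small files, then count their lines from two threads.
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        jobs = []
        for n in (3, 5):
            path = os.path.join(tmp, "lines_%d.txt" % n)
            with open(path, "w") as f:
                f.write("x\n" * n)
            jobs.append((n, path))
        threads = [
            threading.Thread(target=count_lines, args=(path, results, n))
            for n, path in jobs
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results
```

Calling demo() should map each file's expected line count to the count
its thread observed.  The same two threads calling fileinput.input()
would contend for the single module-level reader instead.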
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com Sun Jan 21 18:45:07 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 21 Jan 2001 13:45:07 -0500
Subject: [Python-Dev] test_sax failing (Windows)
Message-ID:

test test_sax crashed -- exceptions.SystemError: 'finally' pops bad
exception

Sometimes it crashes (some flavor of memory fault) instead.

Elsewhere?

From nas@arctrix.com Sun Jan 21 12:28:35 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Sun, 21 Jan 2001 04:28:35 -0800
Subject: [Python-Dev] autoconf --enable vs. --with
Message-ID: <20010121042835.A19774@glacier.fnational.com>

I've been working a bit on the build process lately.  I came
across this in the autoconf documentation:

    If a software package has optional compile-time features, the
    user can give `configure' command line options to specify
    whether to compile them.  The options have one of these
    forms:

        --enable-FEATURE[=ARG]
        --disable-FEATURE

    Some packages require, or can optionally use, other software
    packages which are already installed.  The user can give
    `configure' command line options to specify which such
    external software to use.  The options have one of these
    forms:

        --with-package[=ARG]
        --without-package

Is it worth fixing the Python configure script to comply with
these definitions?  It looks like with-cycle-gc and maybe
with-pydebug would have to be changed.

  Neil AC_ARG_ENABLE

From tim.one@home.com Sun Jan 21 19:44:38 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 21 Jan 2001 14:44:38 -0500
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido, on again lumping numbers together]
> I think I can put this behavior back.  (I believe that before I
> reorganized the comparison code, it seemed really tricky to do this,
> but after refactoring the code, it's quite easy to do.)
I can believe that; and I believe the "bugs" in 2.0 ended up somewhere in or around the bowels of the xxxHalfBinOp-like routines (which were really tricky to my eyes -- the interactions among coercions and comparisons were hard to keep straight). > My only concern is that under the old schele, two different numeric > extension types that somehow can't be compared will end up being > *equal*. To fix this, I propose that if the names compare equal, as a > last resort we compare the type pointers -- this should be consistent > too. Agreed, and sounds fine! Save Barry a little work, though: > ! /* Same type name, or (more likely) incomparable numeric types */ > ! return (v->ob_type < w->ob_type) ? -1 : 1; That's non-std C in a way Insure complains about elsewhere; change to return ((Py_uintptr_t)v->ob_type < (Py_uintptr_t)w->ob_type) ? -1 : 1; if-vendors-stuck-to-the-letter-of-the-c-std-python-wouldn't- compile-at-all-ly y'rs - tim From trentm@ActiveState.com Sun Jan 21 20:01:44 2001 From: trentm@ActiveState.com (Trent Mick) Date: Sun, 21 Jan 2001 12:01:44 -0800 Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...? In-Reply-To: <20010120153556.C18375@ActiveState.com>; from trentm@ActiveState.com on Sat, Jan 20, 2001 at 03:35:56PM -0800 References: <20010120153556.C18375@ActiveState.com> Message-ID: <20010121120144.B28643@ActiveState.com> On Sat, Jan 20, 2001 at 03:35:56PM -0800, Trent Mick wrote: > > ... or am I missing something? Ignore me. RTFM (sys.exit), Trent. Sorry, Trent -- Trent Mick TrentM@ActiveState.com From tim.one@home.com Sun Jan 21 20:13:02 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 15:13:02 -0500 Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...? In-Reply-To: <20010121120144.B28643@ActiveState.com> Message-ID: [Trent, quoting Trent] >> >> ... or am I missing something? [and back to Trent] > Ignore me. RTFM (sys.exit), Trent. Nobody wants to ignore *you*, Trent! 
If it's not the case that you wanted to code sys.exit(int(firstarg)) instead, holler, cuz if that wasn't the problem I'm still baffled. or-if-it-was-it-caught-you-because-sys.exit's-tricks-aren't- really-pythonic-ly y'rs - tim From loewis@informatik.hu-berlin.de Sun Jan 21 21:21:24 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 22:21:24 +0100 (MET) Subject: [Python-Dev] test_sax failing (Windows) Message-ID: <200101212121.WAA16327@pandora.informatik.hu-berlin.de> > Elsewhere? Not for me, on neither Solaris nor Linux. What expat version? Regards, Martin From loewis@informatik.hu-berlin.de Sun Jan 21 21:22:44 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 22:22:44 +0100 (MET) Subject: [Python-Dev] autoconf --enable vs. --with Message-ID: <200101212122.WAA16371@pandora.informatik.hu-berlin.de> > It looks like with-cycle-gc and mybe with-pydebug would have to be > changed. I'm in favour of changing it. Regards, Martin From loewis@informatik.hu-berlin.de Sun Jan 21 21:34:08 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 22:34:08 +0100 (MET) Subject: [Python-Dev] test___all__ fails with no bsddb Message-ID: <200101212134.WAA16446@pandora.informatik.hu-berlin.de> On my Solaris 2.6 installation, with no bsddb module, I get test test___all__ failed -- dbhash has no __all__ attribute This is caused by anydbm importing dbhash first. After that fails, dbhash is still in sys.modules, and the next import of dbhash silently loads an incomplete module. Regards, Martin From tim.one@home.com Sun Jan 21 21:38:11 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 16:38:11 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: <200101212121.WAA16327@pandora.informatik.hu-berlin.de> Message-ID: [Martin von Loewis] > Not for me, on neither Solaris nor Linux. What expat version? 
Tell me how to answer the question, and I'll be happy to (I have no idea
what any of this stuff is or does).

My pyexpat.c (well, my *everything*) is current CVS, pyexpat.c in
particular is revision 2.33.

xmltok.dll and xmlparse.dll were obtained from

    ftp://ftp.jclark.com/pub/xml/expat.zip

for the 2.0 release.

Is any of that relevant?

The tests passed in the wee hours (EST; UTC -0500) this morning.  They began
failing after I updated around 1pm EST today.

From thomas@xs4all.net Sun Jan 21 21:54:05 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Sun, 21 Jan 2001 22:54:05 +0100
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 02:44:38PM -0500
References: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>
Message-ID: <20010121225405.M17392@xs4all.nl>

On Sun, Jan 21, 2001 at 02:44:38PM -0500, Tim Peters wrote:

> > ! 	/* Same type name, or (more likely) incomparable numeric types */
> > ! 	return (v->ob_type < w->ob_type) ? -1 : 1;

> That's non-std C in a way Insure complains about elsewhere; change to

>     return ((Py_uintptr_t)v->ob_type <
>             (Py_uintptr_t)w->ob_type) ? -1 : 1;

Why is comparing v->ob_type with w->ob_type illegal?  They're both pointers
to the same type, aren't they?

> if-vendors-stuck-to-the-letter-of-the-c-std-python-wouldn't-
>     compile-at-all-ly y'rs  - tim

That's easy to check, gcc has these nice (and from a user's point of view,
fairly useless) options: '-ansi', '-pedantic' and '-pedantic-errors'.
'-ansi' disables some GCC-specific features, -pedantic turns gcc into a
whiney pedantic I'm sure you'd get along with just fine, and
-pedantic-errors turns those whines into errors.

Doing a quick check I see one error I added myself (but haven't committed) in
the continue-inside-try patch (a trailing comma in an enumerator definition),
and one error in configure (it mis-detects the arguments to setpgrp() in
strict-ANSI mode, for some reason.)
I don't see any errors in the core Python. I see an error in the nis module (missing function prototype, and broken system-include file) and a *lot* of errors in linuxaudiodev, but nothing else in the set of modules I can compile. Not bad! Note that this was tested in a current tree. I couldn't find either Guido's 'broken' code or your proposed 'good' code, so I don't know if you checked in a fix yet. If you didn't, don't bother, it's not broken :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis@informatik.hu-berlin.de Sun Jan 21 22:00:47 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 23:00:47 +0100 (MET) Subject: [Python-Dev] Re: test_sax failing (Windows) In-Reply-To: References: Message-ID: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> > [Martin von Loewis] > > Not for me, on neither Solaris nor Linux. What expat version? > > Tell me how to answer the question, and I'll be happy to (I have no idea > what any of this stuff is or does). > > My pyexpat.c (well, my *everything*) is current CVS, pyexpat.c in > particular is revision 2.33. That's good; mine too. > xmltok.dll and xmlparse.dll were obtained from > > ftp://ftp.jclark.com/pub/xml/expat.zip > > for the 2.0 release. > > Is any of that relevant? That gives some clue, yes. Unfortunately, that URL itself is a symlink that was expat1_1.zip (157936 bytes) at some point, and now is expat1_2.zip (153591 bytes). The files themselves are not self-identifying, it's hard to tell once unzipped... Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either works for me. I never tested 1.95.x (which is also not available from jclark.com). > The tests passed in the wee hours (EST; UTC -0500) this morning. > They began failing after I updated around 1pm EST today. I just merged pyexpat changes from PyXML into Python 2 so that could be the cause. 
However, this very code has been used for some time by PyXML users, why it crashes for you is a mystery to me. Any chance of producing a C backtrace? Regards, Martin From tim.one@home.com Sun Jan 21 22:09:30 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 17:09:30 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: Message-ID: FYI, under the debug-build Python, running test_sax.py under the debugger dies like so: Passed test_attrs_empty Passed test_attrs_wattr Passed test_escape_all Passed test_escape_basic Passed test_escape_extra Passed test_expat_attrs_empty Passed test_expat_attrs_wattr Passed test_expat_dtdhandler Passed test_expat_entityresolver Passed test_expat_file Traceback (most recent call last): File "../lib/test/test_sax.py", line 603, in ? confirm(value(), name) File "../lib/test/test_sax.py", line 435, in test_expat_incomplete parser.parse(StringIO("")) File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 42, in parse xmlreader.IncrementalParser.parse(self, source) File "c:\code\python\dist\src\lib\xml\sax\xmlreader.py", line 122, in parse self.close() File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 91, in close self.feed("", isFinal = 1) File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 82, in feed except expat.error: SystemError: 'finally' pops bad exception Running it from a command line instead produces the same output up to but not including the traceback, and Python crashes with a memory fault then. Attaching to the process with a debugger at that point shows it trying to do _Py_Dealloc on an op whose op->op_type member is NULL. 
Here's the call stack at that point: _Py_Dealloc(_object * 0x007af100) line 1304 + 6 bytes insertdict(dictobject * 0x007637ec, _object * 0x007a8270, long -1601350627, _object * 0x1e1eff18 __Py_NoneStruct) line 364 + 48 bytes PyDict_SetItem(_object * 0x007637ec, _object * 0x007a8270, _object * 0x1e1eff18 __Py_NoneStruct) line 498 + 21 bytes PyDict_SetItemString(_object * 0x007637ec, char * 0x1e1d84fc, _object * 0x1e1eff18 __Py_NoneStruct) line 1272 + 17 bytes PySys_SetObject(char * 0x1e1d84fc, _object * 0x1e1eff18 __Py_NoneStruct) line 67 + 17 bytes reset_exc_info(_ts * 0x00760630) line 2207 + 17 bytes eval_code2(PyCodeObject * 0x00993df0, _object * 0x0098794c, _object * 0x00000000, _object * * 0x007a9d28, int 2, _object * * 0x007a9d30, int 1, _object * * 0x009a0b60, int 1) line 2125 + 9 bytes fast_function(_object * 0x009a4f6c, _object * * * 0x0063f5a0, int 4, int 2, int 1) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x00993910, _object * 0x0098794c, _object * 0x00000000, _object * * 0x007a05e8, int 1, _object * * 0x007a05ec, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes fast_function(_object * 0x009a549c, _object * * * 0x0063f738, int 1, int 1, int 0) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x007b35e0, _object * 0x0098110c, _object * 0x00000000, _object * * 0x009beb10, int 2, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes call_eval_code2(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2765 + 57 bytes call_object(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2594 + 17 bytes call_method(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2717 + 17 bytes call_object(_object * 0x007e125c, _object * 0x009beafc, _object * 0x00000000) line 2592 + 17 bytes do_call(_object * 0x007e125c, _object * * * 0x0063f96c, int 2, int 0) line 2915 + 17 bytes eval_code2(PyCodeObject * 0x00991560, _object * 0x0098794c, _object * 0x00000000, _object * * 
0x009bce98, int 2, _object * * 0x009bcea0, int 0, _object * * 0x00000000, int 0) line 1863 + 30 bytes fast_function(_object * 0x009a7dfc, _object * * * 0x0063fb04, int 2, int 2, int 0) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x009f7e00, _object * 0x0076f14c, _object * 0x00000000, _object * * 0x00775904, int 0, _object * * 0x00775904, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes fast_function(_object * 0x009bc8ac, _object * * * 0x0063fc9c, int 0, int 0, int 0) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x009f86d0, _object * 0x0076f14c, _object * 0x0076f14c, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes PyEval_EvalCode(PyCodeObject * 0x009f86d0, _object * 0x0076f14c, _object * 0x0076f14c) line 338 + 29 bytes run_node(_node * 0x007aa740, char * 0x00760dd9, _object * 0x0076f14c, _object * 0x0076f14c) line 919 + 17 bytes run_err_node(_node * 0x007aa740, char * 0x00760dd9, _object * 0x0076f14c, _object * 0x0076f14c) line 907 + 21 bytes PyRun_FileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 257, _object * 0x0076f14c, _object * 0x0076f14c, int 1) line 899 + 21 bytes PyRun_SimpleFileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 1) line 612 + 30 bytes PyRun_AnyFileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 1) line 466 + 17 bytes Py_Main(int 2, char * * 0x00760da0) line 295 + 44 bytes main(int 2, char * * 0x00760da0) line 10 + 13 bytes insertdict is doing Py_DECREF(old_value); reset_exc_info is doing PySys_SetObject("exc_type", frame->f_exc_type); Bet that's as helpful to you as it was to me . 
From thomas@xs4all.net Sun Jan 21 22:13:02 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 21 Jan 2001 23:13:02 +0100 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <20010121225405.M17392@xs4all.nl>; from thomas@xs4all.net on Sun, Jan 21, 2001 at 10:54:05PM +0100 References: <200101211651.LAA25346@cj20424-a.reston1.va.home.com> <20010121225405.M17392@xs4all.nl> Message-ID: <20010121231302.N17392@xs4all.nl> On Sun, Jan 21, 2001 at 10:54:05PM +0100, Thomas Wouters wrote: > I see an error in the nis module (missing function prototype, and broken > system-include file) and a *lot* of errors in linuxaudiodev The errors in linuxaudiodev are only errors because for some reason, in -ansi -pedantic-errors mode, gcc doesn't define the 'linux' symbol. IMHO, not worth fixing. The nismodule is 'broken' because of this: static nismaplist * nis_maplist (void) { nisresp_maplist *list; char *dom; CLIENT *cl, *clnt_create(); clnt_create() should be declared by the system include files. Anyone have objections to me moving it to pyport.h, inside the '#if 0' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Sun Jan 21 22:28:45 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 17:28:45 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <20010121225405.M17392@xs4all.nl> Message-ID: [Thomas Wouters] > Why is comparing v->ob_type with w->ob_type illegal ? They're > both pointers to the same type, aren't they ? Non-equality comparison of pointers is defined if and only if the pointers are both addresses in the same contiguous structure (think struct or array); an exception is made for a pointer "one beyond the end" of an array, i.e. 
if sometype a[N]; then &a[0] < &a[N] == 1 is guaranteed despite that &a[N] is outside the bounds of a; but &a[0] < &a[N+1] is undefined (which *means* undefined! e.g., it's OK if they compare equal, or if the comparison causes a hardware fault, or ...). > That's easy to check, gcc has these nice (and from a users point of view, > fairly useless) options: '-ansi', '-pedantic' and '-pedantic-errors'. > '-ansi' disables some GCC-specific features, -pedantic turns gcc into a > whiney pedantic I'm sure you'd get along with just fine , and > -pedantic-errors turns those whines into errors. Your faith in gcc is as charming as it is naive : the most interesting cases of undefined behavior can't be checked no-way, no-how at compile-time. That's why Barry keeps talking employers into dumping thousands of dollars into a single Insure++ license. Insure++ actually tags every pointer at runtime with its source, and gripes if non-equality comparisons are done on a pair not derived from the same array or malloc etc. Since Python type objects are individually allocated (not taken from a preallocated contiguous vector), Insure++ should complain about that compare. > ... > Note that this was tested in a current tree. I couldn't find > either Guido's 'broken' code or your proposed 'good' code, so I > don't know if you checked in a fix yet. If you didn't, don't bother, > it's not broken :-) Guido hasn't checked it in yet, but gcc isn't smart enough to detect *this* breakage anyway. From fredrik@effbot.org Sun Jan 21 23:02:10 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Mon, 22 Jan 2001 00:02:10 +0100 Subject: [Python-Dev] more unicode database changes Message-ID: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Just checked in another unicode database patch, which saves another ~60k. On my Windows box, the Unicode tables are now about 200k (down from 600k in 2.0). After this change, Modules/unicodedatabase.[ch] are no longer used. 
Since I'm on a Windows box with MSVC 5.0, I don't really
want to try removing them from the official build files.
Instead, I've checked in empty versions of the files.

Can anyone help me get rid of all references to them from
the build files (and CVS)?

PS. btw, if my changes broke the build somewhere, let me
know asap!

From tim.one@home.com Sun Jan 21 23:07:14 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 21 Jan 2001 18:07:14 -0500
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To: <200101212200.XAA16672@pandora.informatik.hu-berlin.de>
Message-ID:

[Martin, on ftp://ftp.jclark.com/pub/xml/expat.zip]
> ...
> That gives some clue, yes. Unfortunately, that URL itself is a symlink
> that was expat1_1.zip (157936 bytes) at some point,

That's the one I've been using.

> and now is expat1_2.zip (153591 bytes).

I'm assuming you're recommending that one!  Based on that assumption, I've
downloaded a new one and will put that in the 2.1a1 Windows release.  Scream
if that's not what you want.

> ...
> Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either
> works for me. I never tested 1.95.x (which is also not available from
> jclark.com).

If you do and love it, let me know where to get it and I'll ship that
instead.

>> The tests passed in the wee hours (EST; UTC -0500) this morning.
>> They began failing after I updated around 1pm EST today.

> I just merged pyexpat changes from PyXML into Python 2 so that could
> be the cause. However, this very code has been used for some time by
> PyXML users, why it crashes for you is a mystery to me.

Perhaps gc, perhaps uninitialized vars, ..., hard to say.  Unfortunately,
it's not unusual for flawed code to display different behavior across
platforms; or, from the long-term QA perspective, it's *great* that flawed
code doesn't always appear to work on all platforms.

> Any chance of producing a C backtrace?
Sent that before; doesn't look like much help; we're seeing a NULL type pointer, but at that stage there's no telling when or where or why it *became* NULL. I'm going to rebuild the world from scratch, and use the new DLLs. You should assume that didn't help unless I say otherwise within 15 minutes. From tim.one@home.com Sun Jan 21 23:09:51 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:09:51 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: [/F] > Just checked in another unicode database patch, which > saves another ~60k. On my Windows box, the Unicode > tables are now about 200k (down from 600k in 2.0). Yay! I take it CNRI wasn't paying you by the byte . > After this change, Modules/unicodedatabase.[ch] are no > longer used. > > Since I'm on a Windows box with MSVC 5.0, I don't really > want to try removing them from the official build files. In- > stead, I've checked in empty versions of the files. That's fine. > Can anyone help me get rid of all references to them from > the build files (and CVS)? > > > > PS. btw, if my changes broke the build somewhere, let me > know asap! I'll take care of the MS project files -- and I was just about to rebuild the world from scratch anyway. From tim.one@home.com Sun Jan 21 23:20:03 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:20:03 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: > After this change, Modules/unicodedatabase.[ch] are no > longer used. Not so: unicodedata.c still #includes unicodedatabase.h. From tim.one@home.com Sun Jan 21 23:53:13 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:53:13 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: [/F] > ... > PS. btw, if my changes broke the build somewhere, let me > know asap! 
The Windows build is fine now and changes checked-in. You can remove Modules/unicodedatabase.[ch] from the project without hurting it (although I imagine the Unixish builds still need to learn about this!). From tim.one@home.com Mon Jan 22 00:12:21 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 19:12:21 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> Message-ID: More FYI: With the new expat1_2.zip (153591 bytes) DLLs, all tests pass on Windows except for test_sax. No change in symptoms. The failure modes for test_sax depend on all of: + Whether run in release or debug builds. + Whether text_sax.py is run directly or via regrtest.py. + Whether I delete all .pyc/.pyo files first, or use precomplied ones. + In debug builds, whether the test is started from within the debugger, or I start it via cmdline and attach to the process after it crashes (with a memory fault). Here's a new failure mode: test test_sax crashed -- XMLParserType: no element found: line 1, column 5 So this smells to high heaven of either a nasty gc problem or referencing uninitialized memory. Symptoms don't change if I stick import gc gc.disable() at the start of test_sax.py. Barry, can you try running test_sax under Insure? I've got little chance of making enough time tonight to figure this out the hard way ... From nas@arctrix.com Sun Jan 21 17:28:52 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Sun, 21 Jan 2001 09:28:52 -0800 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 07:12:21PM -0500 References: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> Message-ID: <20010121092852.A24605@glacier.fnational.com> On Sun, Jan 21, 2001 at 07:12:21PM -0500, Tim Peters wrote: > So this smells to high heaven of either a nasty gc problem or referencing > uninitialized memory. 
Symptoms don't change if I stick
>
>     import gc
>     gc.disable()
>
> at the start of test_sax.py.

Can you try it with WITH_CYCLE_GC undefined?

  Neil

From greg@cosc.canterbury.ac.nz Mon Jan 22 00:25:08 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jan 2001 13:25:08 +1300 (NZDT)
Subject: [Python-Dev] a>b == b
Message-ID: <200101220025.NAA01809@s454.cosc.canterbury.ac.nz>

Suppose I have a class which checks whether it knows
how to do a comparison, and if not, wants to pass it
on to the other operand in case it knows:

  class Foo:

    def __lt__(self, other):
      if I_know_about(other):
        # do the comparison
      else:
        return other.__gt__(self)

If the other operand has a __gt__ method which is
doing similar tricks, infinite recursion could result.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From greg@cosc.canterbury.ac.nz Mon Jan 22 00:36:51 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jan 2001 13:36:51 +1300 (NZDT)
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To: <200101191848.NAA02765@cj20424-a.reston1.va.home.com>
Message-ID: <200101220036.NAA01813@s454.cosc.canterbury.ac.nz>

Guido:

> I don't understand how these can be not commutative unless they have a
> side effect on the left argument

I think he meant "not reflective". If a<b == floor(a,b) and a>b ==
ceil(a,b), then clearly a<b != b>a.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From mwh21@cam.ac.uk Mon Jan 22 00:48:16 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 22 Jan 2001 00:48:16 +0000 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Greg Ewing's message of "Mon, 22 Jan 2001 13:36:51 +1300 (NZDT)" References: <200101220036.NAA01813@s454.cosc.canterbury.ac.nz> Message-ID: Greg Ewing writes: > Guido: > > > I don't understand how these can be not commutative unless they have a > > side effect on the left argument > > I think he meant "not reflective". If ab == > ceil(a,b), then clearly aa. What's floor of two arguments? In common lisp, (floor a b) is the largest integer n such that (<= n (/ a b)), in Python it's a type error... if you meant min(a,b), then I then think the programmer who thinks "min(a,b)" is spelt "a Message-ID: <200101220052.NAA01817@s454.cosc.canterbury.ac.nz> > Non-equality comparison of pointers is defined if and only if the pointers > are both addresses in the same contiguous structure I'm not sure that the proposed alternative (casting both pointers to ints and comparing the ints) is any better. Does the C std define the result of doing that to two unrelated pointers? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Mon Jan 22 00:56:16 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 19:56:16 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: <20010121092852.A24605@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Can you try it with WITH_CYCLE_GC undefined? Good idea -- for someone with an infinite amount of free time . But being a good sport, I did as you asked with giddy cheer. 
Alas, it didn't help (all the same bizarre context-dependent test_sax failure modes). I'm sure I disabled WITH_CYCLE_GC correctly, because "import gc" now fails with ImportError in both release and debug builds. BTW, a refcount-too-low problem is another good candidate. From greg@cosc.canterbury.ac.nz Mon Jan 22 01:00:46 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jan 2001 14:00:46 +1300 (NZDT) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: <200101220100.OAA01820@s454.cosc.canterbury.ac.nz> Michael Hudson : > if you meant min(a,b), Yes, sorry, that's what I meant. Or at least that's what I thought the original poster meant - if he didn't, then I'm confused, too! Anyway, I agree that it's a silly thing to want to make a>b mean, and I'm not all that disappointed that it won't be possible. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Mon Jan 22 01:11:52 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 20:11:52 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101220052.NAA01817@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I'm not sure that the proposed alternative (casting both > pointers to ints and comparing the ints) is any better. > Does the C std define the result of doing that to two > unrelated pointers? C99 guarantees that, if the type exists, casting a pointer to type uintptr_t won't blow up, and also guarantees that comparisons between (at least) ints of the same type won't blow up. Beyond that, we don't care what it returns. Mostly we're trying to eliminate warnings Barry has to wade thru from Insure++ -- same reason we have a "no compiler warnings!" build policy. 
Doing the cast is obviously "better" when viewed through Barry's 4AM eyes. You can find out *why* C has this rule (which was in C89, not new in C99) by reading the C FAQ. From tim.one@home.com Mon Jan 22 01:23:27 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 21 Jan 2001 20:23:27 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: [Michael Hudson] > ... > if you meant min(a,b), then I then think the programmer who > thinks "min(a,b)" is spelt "a deal with (if min has a symbol it's /\, but never mind that). Curiously, in the Icon language, if a is less than b then a < b returns b while b > a returns a. In this way they get the same effect as Python's chained comparisons a < b < c < d via purely binary operators (if a is *not* less than b, a < b in Icon "fails", which is a silent event that causes the expression's context to backtrack -- but we won't go into that here ). Anyway, that accounts for this curious Icon idiom: a <:= b which is short for a := a < b and binds a to max(a, b) (if a is smaller, a < b returns b and the assignment proceeds; but if a is not smaller, a < b fails and that propagates into its context, which here has no other possibilities to backtrack into, so the stmt just ends leaving a alone). "<"-and-">"-are-just-bags-of-pixels-ly y'rs - tim From uche.ogbuji@fourthought.com Mon Jan 22 01:24:46 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Sun, 21 Jan 2001 18:24:46 -0700 Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: Message from Guido van Rossum of "Sun, 21 Jan 2001 12:00:02 EST." <200101211700.MAA25479@cj20424-a.reston1.va.home.com> Message-ID: <200101220124.SAA08868@localhost.localdomain> > > A bit late for 2.1alpha1, but it just occurred to me that perhaps there > > should be an annotation in the documentation that indicates whether or not a > > module is thread-safe. 
For example, many functions in fileinput rely on a > > module global called _state. It strikes me that this module is not likely > > to be thread-safe, yet the documentation doesn't appear to mention this, > > certainly not in an obvious fashion. > > > > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of > > LaTex macros in Fred's arsenal? This would make documenting these > > properties both easy and consistent across modules. > > It's hard to say whether a *whole module* is threadsafe. E.g. in the > fileinput example, there's the clear implication that if you use this > in multiple threads, you should instantiate your own FileInput > instances, and then you're totally thread-safe. Clearly the semantics > of the module-global functions are thread-unsafe though. Perhaps what is needed rather is a prose annotation for thread-safety issues. My TeX is rusty, but in Docbook, with the use of role attributes, one could have, taking your FileInput example The module-global functions are not safe, but if you instantiate your own FileInput instances, they will be totally thread-safe. That way the MT issues could be styled differently on rendering, gathered into separate documentation, stripped by those who don't care, etc. I imagine this is also possible in TeX. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From tim.one@home.com Mon Jan 22 01:32:30 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 21 Jan 2001 20:32:30 -0500
Subject: [Python-Dev] a>b == b
Message-ID:

[Greg Ewing]
> Suppose I have a class which checks whether it knows
> how to do a comparison, and if not, wants to pass it
> on to the other operand in case it knows:
>
>   class Foo:
>
>     def __lt__(self, other):
>       if I_know_about(other):
>         # do the comparison
>       else:
>         return other.__gt__(self)
>
> If the other operand has a __gt__ method which is
> doing similar tricks, infinite recursion could result.

Does this have something to do with comparisons?  That is, wouldn't the same
be true if you coded two methods named "spam" and "eggs" in this way?

    whatever = 0

    class Foo:
        def spam(self, other):
            if whatever:
                return 1
            else:
                return other.eggs(self)

    class Bar:
        def eggs(self, other):
            if whatever:
                return 1
            else:
                return other.spam(self)

    Foo().spam(Bar())  # RuntimeError: Maximum recursion depth exceeded

If that's all there is to it, you got what you asked for.

From greg@cosc.canterbury.ac.nz Mon Jan 22 03:31:41 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jan 2001 16:31:41 +1300 (NZDT)
Subject: [Python-Dev] a>b == b
Message-ID: <200101220331.QAA01833@s454.cosc.canterbury.ac.nz>

Tim Peters :

> Does this have something to do with comparisons?  That is, wouldn't the same
> be true if you coded two methods named "spam" and "eggs" in this
> way?

Yes, but Guido hasn't decreed that a.spam(b) and b.eggs(a)
are to have a reflective relationship with each other.

But don't worry - I've belatedly realised that the correct
way to do what I was talking about is to return NotImplemented
and let the interpreter take care of calling the reflected
method. So I withdraw my objection.
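
The NotImplemented approach described above can be sketched as follows. This is a minimal illustration written for current Python 3, not code from the thread: the classes Meters and Feet and the conversion factor are hypothetical, chosen only so that one class knows about the other while the reverse is not true.

```python
class Meters:
    """Hypothetical type that only knows how to compare with itself."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, Meters):
            return self.value < other.value
        # Do not call other.__gt__(self) by hand; let the interpreter
        # try the reflected method on the other operand.
        return NotImplemented

    def __gt__(self, other):
        if isinstance(other, Meters):
            return self.value > other.value
        return NotImplemented


class Feet:
    """Hypothetical type that also knows how to compare with Meters."""

    METERS_PER_FOOT = 0.3048

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, Feet):
            return self.value < other.value
        if isinstance(other, Meters):
            return self.value * self.METERS_PER_FOOT < other.value
        return NotImplemented

    def __gt__(self, other):
        if isinstance(other, Feet):
            return self.value > other.value
        if isinstance(other, Meters):
            return self.value * self.METERS_PER_FOOT > other.value
        return NotImplemented


# Meters.__lt__ declines, so the interpreter calls Feet.__gt__ instead.
one_meter_vs_ten_feet = Meters(1) < Feet(10)
five_meters_vs_ten_feet = Meters(5) < Feet(10)
```

If both operands return NotImplemented, Python 3 raises TypeError for ordering comparisons rather than recursing, which is the property that removes the infinite-recursion worry.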
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Mon Jan 22 07:54:32 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 02:54:32 -0500 Subject: [Python-Dev] Worse news Message-ID: I still don't have a clue about test_sax, but have stumbled into more failure modes. Most of them seem related to the SystemError ("'finally' pops bad exception"). Around that part of ceval.c, sometimes the v popped off the stack has a NULL type pointer, other times it's a pointer to a damaged PyTuple_Type (for example, with a tp_dealloc field of 0x61, which leads to an illegal instruction exception). The MS debug heap routines fill all newly malloc'ed memory with 0Xcd ("clean landfill"), fill free'ed memory with 0Xdd ("dead landfill"), and *pad* malloc'ed memory with some number of 0xfd bytes on both sides ("no-man's land"). The clean landfill and no-man's land patterns are showing up more often they should "by chance", and especially in high-order bytes. Just more evidence of the obvious: something is really screwed up . I cannot get the subtest that test_sax is calling (test_expat_incomplete) to fail in isolation. Next headache: If I delete all .pyc files from Lib/ and Lib/test/, and then run: python ../lib/test/regrtest.py -x test_sax by hand, all the 98 tests that *should* run on Windows (excluding, of course, test_sax, which is no longer tried) pass. If I immediately run them again (without deleting .pyc) by hand: python ../lib/test/regrtest.py -x test_sax then they again pass. 
However, if I do rt -x test_sax which does exactly the steps (delete .pyc, run regrest excluding test_sax, run regrtest again) via the little MS batch file rt.bat, then on the second time thru regrtest, and 5 times out of 5, it died in test_extcall with an "illegal operation", while executing if (TYPE(c) == DOUBLESTAR) { near the end of symtable_params in compile.c. This is an optimized build, and the debugger has no idea what's in c at this point; to judge from the offending machine instruction and register contents, though, c is a bad pointer. Have not been able to get test_extcall to fail in isolation. Have also been unable to get test_extcall to fail in the debug build. So there's evidence of Deep Rot beyond test_sax, but test_sax remains the only test that fails every time and under both build types. Running regrtest with -r (randomize test order) is also "interesting": first time I tried that, test_cpickle failed (truncated output) as well as test_sax. I doubt anyone has run the tests more often than me over the last week, so I'm not surprised I'm seeing the most problems. However, since *nobody* is seeing anything on Linux, I'd at least like to get *someone* else to run the tests on Windows. While I'm not having any unusual problems with my box, it's certainly possible that I've got a corrupted file or a flaky memory chip etc, or that MSVC is generating bad code for some recent change (although that's unlikely since the debug build generates *really* straightforward code). Deleting my entire PCbuild subtree and refetching it from CVS didn't make any difference. From esr@thyrsus.com Mon Jan 22 08:01:27 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 03:01:27 -0500 Subject: [Python-Dev] autoconf --enable vs. 
--with
In-Reply-To: <200101212122.WAA16371@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Sun, Jan 21, 2001 at 10:22:44PM +0100
References: <200101212122.WAA16371@pandora.informatik.hu-berlin.de>
Message-ID: <20010122030127.C20804@thyrsus.com>

Martin von Loewis :
> > It looks like with-cycle-gc and maybe with-pydebug would have to be
> > changed.
>
> I'm in favour of changing it.

Likewise.  Let's be good neighbors.
--
		Eric S. Raymond

Where rights secured by the Constitution are involved, there can be no
rule making or legislation which would abrogate them.
	-- Miranda vs. Arizona, 384 US 436 p. 491

From loewis@informatik.hu-berlin.de Mon Jan 22 08:26:15 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 22 Jan 2001 09:26:15 +0100 (MET)
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To:
References:
Message-ID: <200101220826.JAA20819@pandora.informatik.hu-berlin.de>

> Running it from a command line instead produces the same output up to but
> not including the traceback, and Python crashes with a memory fault then.
> Attaching to the process with a debugger at that point shows it trying to do
> _Py_Dealloc on an op whose op->op_type member is NULL.
[...]
> Bet that's as helpful to you as it was to me .

Well, it was at least motivating enough to try it out on my Whistler
installation. Purify would probably find this rather quickly; the code
writes into the 257th element of a 256-element array.

I've committed a fix. Depending on the exact organization of globals,
this could have easily gone unnoticed. MSVC packs variables more than
gcc does, so the write would overwrite one byte in ErrorObject, which
would then not point to a PyObject anymore.
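
The effect described above (a write one past the end of a 256-element array clobbering one byte of a neighbouring pointer) can be imitated in a small simulation. This is only a sketch under stated assumptions: the layout (a 256-byte table immediately followed by a 4-byte little-endian pointer), the pointer value, and the helper name write_element are all hypothetical, and a Python bytearray stands in for the process's global data.

```python
import struct

TABLE_SIZE = 256
POINTER_OFFSET = TABLE_SIZE  # assumed: the pointer sits right after the table

# Simulated global data: the table followed by a 4-byte pointer slot.
memory = bytearray(TABLE_SIZE + 4)
ORIGINAL_POINTER = 0x00A1B2C3  # made-up address, for illustration only
struct.pack_into("<I", memory, POINTER_OFFSET, ORIGINAL_POINTER)


def write_element(index, value):
    """Unchecked store, mimicking C's table[index] = value."""
    memory[index] = value


def read_pointer():
    """Read the simulated pointer back out of the byte buffer."""
    return struct.unpack_from("<I", memory, POINTER_OFFSET)[0]


write_element(255, 0x7F)        # last valid element: pointer untouched
after_valid_write = read_pointer()

write_element(256, 0x00)        # the "257th element": lands in the pointer
after_overflow = read_pointer()
```

With a looser layout (padding or an unrelated variable after the table), the same stray write would hit something harmless, which matches the remark that the bug could easily go unnoticed with another compiler.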
Thanks for your patience, Martin From tim.one@home.com Mon Jan 22 09:18:04 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 04:18:04 -0500 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <200101220826.JAA20819@pandora.informatik.hu-berlin.de> Message-ID: [Martin] > Well, it was atleast motivating enough to try it out on my Whistler > installation. Purify would probably find this rather quickly; the code > writes into the 257th element of a 256-elements array. Ah! You shouldn't do that . > I've committed a fix. But you should do that. Thank you! Here's where I am now: ========================================================================= All test_sax failures have gone away (yay!). ========================================================================= Running rt -x test_sax on Windows still blows up in test_extcall on the 2nd pass. It does not blow up: using the debug build; or if test_sax is *not* excluded; or in the 1st pass; or when running text_extcall in isolation; or if the steps rt performs are done by hand ========================================================================= Running rt -r on Windows still sees test_cpickle fail in the first pass (with truncated output), but succeed in the second pass. First-pass failure is always like so (modulo line breaks I'm inserting by hand): test test_cpickle failed -- Tail of expected stdout unseen: 'dumps()\012 loads()\012 ok\012 loads() DATA\012 ok\012 dumps() binary\012 loads() binary\012 ok\012 loads() BINDATA\012 ok\012 dumps() RECURSIVE\012 ok\012' I've also seen it fail at least once when doing the same thing by hand: del ..\lib\*.pyc del ..\lib\test\*.pyc python ../lib/test/regrtest.py -r else-i-would-have-asked-martin-to-look-for-a digit-to-change-in- command.com-ly y'rs - tim From mal@lemburg.com Mon Jan 22 10:19:18 2001 From: mal@lemburg.com (M.-A. 
Lemburg)
Date: Mon, 22 Jan 2001 11:19:18 +0100
Subject: [Python-Dev] more unicode database changes
References: <030501c083fe$2fe7dbf0$e46940d5@hagrid>
Message-ID: <3A6C0926.D0A004E4@lemburg.com>

Fredrik Lundh wrote:
>
> Just checked in another unicode database patch, which
> saves another ~60k.  On my Windows box, the Unicode
> tables are now about 200k (down from 600k in 2.0).

Great work, Fredrik :)

--
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From mal@lemburg.com Mon Jan 22 10:42:52 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 22 Jan 2001 11:42:52 +0100
Subject: [Python-Dev] readline and setup.py
References: <3A68B5B0.771412F7@lemburg.com>
Message-ID: <3A6C0EAC.7D322174@lemburg.com>

"M.-A. Lemburg" wrote:
>
> The new setup.py procedure for Python causes readline not to
> be built on my machine. Instead I get a linker error telling
> me that termcap is not found.
>
> Looking at my old Setup file, I have this line:
>
> readline readline.c \
>          -I/usr/include/readline -L/usr/lib/termcap \
>          -lreadline -lterm
>
> I guess, setup.py should be modified to include additional
> library search paths -- shouldn't hurt on platforms which
> don't need them.

Here's a patch which works for me:

projects/Python> diff CVS-Python/setup.py Dev-Python/
--- CVS-Python/setup.py Mon Jan 22 11:36:56 2001
+++ Dev-Python/setup.py Mon Jan 22 11:40:15 2001
@@ -216,10 +216,11 @@ class PyBuildExt(build_ext):
         exts.append( Extension('rgbimg', ['rgbimgmodule.c']) )

         # readline
         if (self.compiler.find_library_file(lib_dirs, 'readline')):
             exts.append( Extension('readline', ['readline.c'],
+                                   library_dirs=['/usr/lib/termcap'],
                                    libraries=['readline', 'termcap']) )

         # The crypt module is now disabled by default because it breaks builds
         # on many systems (where -lcrypt is needed), e.g. Linux (I believe).
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jan 22 10:52:17 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 11:52:17 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> Message-ID: <3A6C10E1.EF890356@lemburg.com> "M.-A. Lemburg" wrote: > > Why does setup.py stop with an error in case _tkinter cannot > be built (due to an old Tk/Tcl version in my case) ? > > I think the policy in setup.py should be to output warnings, > but continue building the rest of the Python modules. I haven't heard anything from the powers to be... what should the policy be for auto-detected and -configured modules ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas@xs4all.net Mon Jan 22 12:37:04 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 13:37:04 +0100 Subject: [Python-Dev] _tkinter and setup.py In-Reply-To: <3A6C10E1.EF890356@lemburg.com>; from mal@lemburg.com on Mon, Jan 22, 2001 at 11:52:17AM +0100 References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> Message-ID: <20010122133704.O17392@xs4all.nl> On Mon, Jan 22, 2001 at 11:52:17AM +0100, M.-A. Lemburg wrote: > "M.-A. Lemburg" wrote: > > I think the policy in setup.py should be to output warnings, > > but continue building the rest of the Python modules. > I haven't heard anything from the powers to be... what should the > policy be for auto-detected and -configured modules ? I think Andrew is still working on a way to disable modules from the command line somehow. 
(I think moving setup.py to setup.py.in, and using autoconf --options would be easiest on both developer and user, but that's just me.) I also think everyone agrees with you that a module that can't be build shouldn't stop the entire process in the final release (and possibly the betas) but that it's definately a good way to debug setup.py in the alphas. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tismer@tismer.com Mon Jan 22 13:13:46 2001 From: tismer@tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 14:13:46 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: Message-ID: <3A6C320A.37CBB4E5@tismer.com> Maybe I can help. Tim Peters wrote: ... > Here's where I am now: > > ========================================================================= > All test_sax failures have gone away (yay!). > ========================================================================= > Running > > rt -x test_sax > > on Windows still blows up in test_extcall on the 2nd pass. It does not blow > up: > > using the debug build; or > if test_sax is *not* excluded; or > in the 1st pass; or > when running text_extcall in isolation; or > if the steps rt performs are done by hand ... I got problems with XML as well. I'm not using SAX, but plain expat for speed. The following error happens after parsing thousands of small XML files: from_my_log_window=""" \\bned-s1\tismer\pxml\sdf\mdl\DisplayRGB\1 \\bned-s1\tismer\pxml\sdf\mdl\DisplayVideo\1 Traceback (innermost last): File "", line 1, in ? 
File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 149, in getall res.append(p.parse()) File "D:\crml_doc\pxml\clean.py", line 81, in parse self.parsers[0].Parse(self.txt1, 1) File "D:\crml_doc\pxml\clean.py", line 53, in endElementMaster if self.txt2: self.parsers[1].Parse(self.txt2, 1) File "D:\crml_doc\pxml\clean.py", line 46, in startElementOther if name <> "MASTER": UnicodeError: UTF-8 decoding error: invalid data """ The good news: The error is reproducible, happens the same under PythonWin and DOS Python, and I can reduce it to a single XML file. That indicates to me that I am near the reason of the bug, not at late, indirect effects. It also *might* be related to Unicode. I will now try to create a minimized script and XML data that produces the above again. back in an hour - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From thomas@xs4all.net Mon Jan 22 13:52:44 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 14:52:44 +0100 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 05:28:45PM -0500 References: <20010121225405.M17392@xs4all.nl> Message-ID: <20010122145244.Y17295@xs4all.nl> On Sun, Jan 21, 2001 at 05:28:45PM -0500, Tim Peters wrote: > [Thomas Wouters] > > Why is comparing v->ob_type with w->ob_type illegal ? They're > > both pointers to the same type, aren't they ? 
> Non-equality comparison of pointers is defined if and only if the pointers > are both addresses in the same contiguous structure (think struct or array); > an exception is made for a pointer "one beyond the end" of an array, i.e. if > sometype a[N]; > then &a[0] < &a[N] == 1 is guaranteed despite that &a[N] is outside the > bounds of a; but &a[0] < &a[N+1] is undefined (which *means* undefined! > e.g., it's OK if they compare equal, or if the comparison causes a hardware > fault, or ...). Ok, I guess I stand corrected. I was confused by the name of Py_uintptr_t: I thought it was a pointer-to-int, not an int large enough to hold a pointer. I'm also positively appalled by the fact the standard refuses to define sane behaviour for out-of-bounds access on an array, but attaches some weird significance to what pointers are pointing *to*, when comparing the values of those pointers, regardless of what type of object they are stored in. But I guess I don't have to whine about that to you, Tim :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tismer@tismer.com Mon Jan 22 14:03:25 2001 From: tismer@tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 15:03:25 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> Message-ID: <3A6C3DAD.522CE623@tismer.com> Christian Tismer wrote: > > Maybe I can help. ... ... > I will now try to create a minimized script and XML data that > produces the above again. > > back in an hour - chris Here we go. The following session produces the mentioned UTF8 error: >>> txt = "" >>> def startelt(name, dic): ... print name, dic ... >>> p=expat.ParserCreate() >>> p.StartElementHandler = startelt >>> p.Parse(txt) Traceback (innermost last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data Behavior depends of the ASCII code. 
>From code 128 (0200) to 191 (0277) the parser gives an not well-formed exception, as it should be. The codes from 192 to 236, 238-243 produce "UTF-8 decoding error: invalid data", the rest gives "not well-formed". I would like to know if this happens with your (Tim) modified version as well. I'm using plain vanilla BeOpen Python 2.0 . cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From jeremy@alum.mit.edu Mon Jan 22 14:19:34 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 09:19:34 -0500 (EST) Subject: [Python-Dev] Worse news In-Reply-To: References: Message-ID: <14956.16758.68050.257212@localhost.localdomain> Tim, Funny (strange or haha?) that test_extcall is failing since the two pieces of code I've modified most recently are compile.c and the section of ceval.c that handles extended call syntax. I just got through my mail this morning and I'll see what I can reproduce on Linux. As for the test_sax failure, is any of the Python code being executed conditional on platform? The compiler may be generating bad bytecode for a code path that is only executed on Windows. Jeremy From mal@lemburg.com Mon Jan 22 14:27:38 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 15:27:38 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> Message-ID: <3A6C4359.BCB06252@lemburg.com> Christian Tismer wrote: > > Christian Tismer wrote: > > > > Maybe I can help. > > ... > > ... > > I will now try to create a minimized script and XML data that > > produces the above again. > > > > back in an hour - chris > > Here we go. 
> The following session produces the mentioned UTF8 error:
>
> >>> txt = ""
> >>> def startelt(name, dic):
> ...     print name, dic
> ...
> >>> p=expat.ParserCreate()
> >>> p.StartElementHandler = startelt
> >>> p.Parse(txt)
> Traceback (innermost last):
>   File "", line 1, in ?
> UnicodeError: UTF-8 decoding error: invalid data
>
> Behavior depends on the ASCII code.
> From code 128 (0200) to 191 (0277) the parser gives a
> not well-formed exception, as it should be.
>
> The codes from 192 to 236, 238-243 produce
> "UTF-8 decoding error: invalid data",
> the rest gives "not well-formed".
>
> I would like to know if this happens with your (Tim) modified
> version as well. I'm using plain vanilla BeOpen Python 2.0 .

This has nothing to do with Python. UTF-8 marks the codes from 128-191 as illegal prefix. See Objects/unicodeobject.c:

static
char utf8_code_length[256] = {
    /* Map UTF-8 encoded prefix byte to sequence length.  zero means
       illegal prefix.  see RFC 2279 for details */
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 0, 0
};

Perhaps the parser should catch the UnicodeError and instead return a not-wellformed exception ?!
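
The prefix table quoted above can also be written as a handful of range checks. What follows is only a minimal Python 3 sketch, not code from the Python source tree: the function name `utf8_sequence_length` is made up for illustration, and it follows the RFC 2279 layout of the table (later revisions of UTF-8 restrict sequences to four bytes).

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the sequence length announced by a UTF-8 lead byte.

    Mirrors the RFC 2279 table quoted above; 0 means "illegal prefix".
    The function name is hypothetical and exists only for this sketch.
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must be in range(256)")
    if lead_byte < 0x80:   # 0-127: plain ASCII, one byte
        return 1
    if lead_byte < 0xC0:   # 128-191: continuation bytes, illegal as a prefix
        return 0
    if lead_byte < 0xE0:   # 192-223
        return 2
    if lead_byte < 0xF0:   # 224-239
        return 3
    if lead_byte < 0xF8:   # 240-247
        return 4
    if lead_byte < 0xFC:   # 248-251
        return 5
    if lead_byte < 0xFE:   # 252-253
        return 6
    return 0               # 254-255: never valid in UTF-8
```

Under this table a lone Latin-1 byte such as 0xE4 is read as the start of a three-byte sequence rather than as an illegal prefix, while anything in 128-191 is rejected outright.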
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jan 22 14:38:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 15:38:14 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> Message-ID: <3A6C45D5.9A6FA25C@lemburg.com> Thomas Wouters wrote: > > On Mon, Jan 22, 2001 at 11:52:17AM +0100, M.-A. Lemburg wrote: > > "M.-A. Lemburg" wrote: > > > > I think the policy in setup.py should be to output warnings, > > > but continue building the rest of the Python modules. > > > I haven't heard anything from the powers to be... what should the > > policy be for auto-detected and -configured modules ? > > I think Andrew is still working on a way to disable modules from the command > line somehow. (I think moving setup.py to setup.py.in, and using autoconf > --options would be easiest on both developer and user, but that's just me.) This is fairly simple to do: distutils allows great flexibility when it comes to adding user options, e.g. we could have python setup.py --enable-tkinter --disable-readline or more generic python setup.py --enable-package tkinter --disable-package readline The options could then be edited in setup.cfg. > I also think everyone agrees with you that a module that can't be build > shouldn't stop the entire process in the final release (and possibly the > betas) but that it's definately a good way to debug setup.py in the alphas. True... but currently the only way to get Python to compile is to hand-edit setup.py and this is not easy for people with no prior distutils experience. BTW, in my case, setup.py did find the TK-libs for 8.0, but for a beta version -- as a result, _tkinter.c's version #error line triggered and the build failed. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@digicool.com Mon Jan 22 14:38:30 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 09:38:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,NONE,1.1 In-Reply-To: Your message of "Sun, 21 Jan 2001 14:37:57 +0200." <20010121123757.D897BA83E@darjeeling.zadka.site.co.il> References: <20010121123757.D897BA83E@darjeeling.zadka.site.co.il> Message-ID: <200101221438.JAA29303@cj20424-a.reston1.va.home.com> > Wouldn't it be better to use the > > d = {} > exec "foo", d Surely you meant exec "foo" in d --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Mon Jan 22 14:43:42 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 15:43:42 +0100 Subject: [Python-Dev] _tkinter and setup.py In-Reply-To: <3A6C45D5.9A6FA25C@lemburg.com>; from mal@lemburg.com on Mon, Jan 22, 2001 at 03:38:14PM +0100 References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> <3A6C45D5.9A6FA25C@lemburg.com> Message-ID: <20010122154342.B17295@xs4all.nl> On Mon, Jan 22, 2001 at 03:38:14PM +0100, M.-A. Lemburg wrote: > > I think Andrew is still working on a way to disable modules from the command > > line somehow. (I think moving setup.py to setup.py.in, and using autoconf > > --options would be easiest on both developer and user, but that's just me.) > This is fairly simple to do: distutils allows great flexibility > when it comes to adding user options, e.g. we could have > > python setup.py --enable-tkinter --disable-readline > > or more generic > > python setup.py --enable-package tkinter --disable-package readline > > The options could then be edited in setup.cfg. 
Note that the 'user' only has 'configure' and 'make' to run, so optimally, the options would have to be given to one of those (preferably to 'configure', to keep it similar to 90% of the packages out there.) > but currently the only way to get Python to compile is > to hand-edit setup.py and this is not easy for people with no > prior distutils experience. You only have to edit the 'disabled_module_list' variable... not too hard even if you don't have distutils experience (though you do need some python experience.) I don't think its wrong to expect people who compile alpha versions to have at least that much knowledge (though it should be noted in the README somewhere.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis@informatik.hu-berlin.de Mon Jan 22 14:46:39 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 22 Jan 2001 15:46:39 +0100 (MET) Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <3A6C4359.BCB06252@lemburg.com> (mal@lemburg.com) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> <3A6C4359.BCB06252@lemburg.com> Message-ID: <200101221446.PAA05164@pandora.informatik.hu-berlin.de> > This has nothing to do with Python. UTF-8 marks the codes > from 128-191 as illegal prefix. [...] > Perhaps the parser should catch the UnicodeError and > instead return a not-wellformed exception ?! Right on both accounts. If no encoding is specified, and if the document appears not to be UTF-16 in any endianness, an XML processor shall assume it is UTF-8. As Marc-Andre explains, your document is not proper UTF-8, hence the error. The confusing thing is that expat itself does not care about it not being UTF-8; that is only detected when the callback is invoked in pyexpat, and therefore conversion to a Unicode object is attempted. 
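
As a rough illustration of the point above, the following Python 3 sketch checks whether a document with no declared encoding is valid UTF-8 before it is handed to a parser. It is not expat or pyexpat code; the helper name `first_invalid_utf8_offset` is hypothetical, and it relies only on the standard `bytes.decode` behaviour.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None.

    Hypothetical helper for illustration: a caller could report a
    "not well-formed" error at this offset instead of letting a decoding
    error surface later from inside a handler callback.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first offending byte
    return None
```

For example, a document containing a bare Latin-1 byte is flagged at the position of that byte, while a pure ASCII document passes.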
The right solution probably would be to change expat so that it determines correctness of the encoding for each string it gets as part of the wellformedness analysis, and produces illformedness exceptions when an encoding error occurs. Patches are welcome, although they probably should go to sourceforge.net/projects/expat.

Regards,
Martin

From jack@oratrix.nl Mon Jan 22 14:57:33 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 22 Jan 2001 15:57:33 +0100
Subject: [Python-Dev] test_sax and site-python
Message-ID: <20010122145733.85E51373C95@snelboot.oratrix.nl>

I'm not sure whether this is really a bug, but I had the problem that there was something wrong with the xml package I had installed into my Lib/site-python, and this caused test_sax to complain.

If the test stuff is expected to test only the core functionality maybe sys.path should be edited so that it only contains directories that are part of the core distribution?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++

From tismer@tismer.com Mon Jan 22 15:05:24 2001
From: tismer@tismer.com (Christian Tismer)
Date: Mon, 22 Jan 2001 16:05:24 +0100
Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows))
References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> <3A6C4359.BCB06252@lemburg.com>
Message-ID: <3A6C4C34.4D1252C9@tismer.com>

"M.-A. Lemburg" wrote:
...
> > The codes from 192 to 236, 238-243 produce
> > "UTF-8 decoding error: invalid data",
> > the rest gives "not well-formed".
> >
> > I would like to know if this happens with your (Tim) modified
> > version as well. I'm using plain vanilla BeOpen Python 2.0 .
>
> This has nothing to do with Python. UTF-8 marks the codes
> from 128-191 as illegal prefix. See Objects/unicodeobject.c:
...

Pity.
> Perhaps the parser should catch the UnicodeError and > instead return a not-wellformed exception ?! I belive it would be better. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido@digicool.com Mon Jan 22 15:06:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:06:06 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.24,2.25 In-Reply-To: Your message of "Sun, 21 Jan 2001 15:34:14 PST." References: Message-ID: <200101221506.KAA29773@cj20424-a.reston1.va.home.com> > Move declaration of 'clnt_create()' NIS function to pyport.h, as it's > supposed to be declared in system include files (with a proper prototype.) > Should be moved to a platform-specific block if anyone finds out which > broken platforms need it :-) [The following is inside #if 0] > + /* From Modules/nismodule.c */ > + CLIENT *clnt_create(); > + Thomas, I'm not sure if this particular declaration belongs in pyport.h, even inside #if 0. CLIENT is declared in a NIS-specific header file that's not included by pyport.h, but which *is* included by nismodule.c. I think you did the right thing to nismodule.c; the pyport.h patch is redundant in my eyes. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Mon Jan 22 15:12:49 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 22 Jan 2001 16:12:49 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> <3A6C45D5.9A6FA25C@lemburg.com> <20010122154342.B17295@xs4all.nl> Message-ID: <3A6C4DF1.F71AA631@lemburg.com> Thomas Wouters wrote: > > On Mon, Jan 22, 2001 at 03:38:14PM +0100, M.-A. Lemburg wrote: > > > > I think Andrew is still working on a way to disable modules from the command > > > line somehow. (I think moving setup.py to setup.py.in, and using autoconf > > > --options would be easiest on both developer and user, but that's just me.) > > > This is fairly simple to do: distutils allows great flexibility > > when it comes to adding user options, e.g. we could have > > > > python setup.py --enable-tkinter --disable-readline > > > > or more generic > > > > python setup.py --enable-package tkinter --disable-package readline > > > > The options could then be edited in setup.cfg. > > Note that the 'user' only has 'configure' and 'make' to run, so optimally, > the options would have to be given to one of those (preferably to > 'configure', to keep it similar to 90% of the packages out there.) Hmm, but then you'll have to hack autoconf again... (even if only to pass the options to setup.py somehow, e.g. via your proposed setup.cfg.in trick). > > but currently the only way to get Python to compile is > > to hand-edit setup.py and this is not easy for people with no > > prior distutils experience. > > You only have to edit the 'disabled_module_list' variable... not too hard > even if you don't have distutils experience (though you do need some python > experience.) I don't think its wrong to expect people who compile alpha > versions to have at least that much knowledge (though it should be noted in > the README somewhere.) Oops, you're right; must have overlooked that one in setup.py. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas@xs4all.net Mon Jan 22 15:14:02 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 16:14:02 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.24,2.25 In-Reply-To: <200101221506.KAA29773@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 10:06:06AM -0500 References: <200101221506.KAA29773@cj20424-a.reston1.va.home.com> Message-ID: <20010122161402.D17295@xs4all.nl> On Mon, Jan 22, 2001 at 10:06:06AM -0500, Guido van Rossum wrote: > > Move declaration of 'clnt_create()' NIS function to pyport.h, as it's > > supposed to be declared in system include files (with a proper prototype.) > > Should be moved to a platform-specific block if anyone finds out which > > broken platforms need it :-) > > [The following is inside #if 0] > > + /* From Modules/nismodule.c */ > > + CLIENT *clnt_create(); > > + > > Thomas, I'm not sure if this particular declaration belongs in > pyport.h, even inside #if 0. > > CLIENT is declared in a NIS-specific header file that's not included by > pyport.h, but which *is* included by nismodule.c. > > I think you did the right thing to nismodule.c; the pyport.h patch is > redundant in my eyes. The same goes for most prototypes inside that '#if 0'. I see it more as an easy list to see what prototypes were removed than as proper examples of the prototype. You're right about CLIENT being defined in system-specific include files, I just wasn't worried about it because it was inside an '#if 0' that will never be turned into an '#if 1'. If a specific platform needs that prototype, we'll figure out how to arrange the prototype then :) But if you want me to remove it, that's fine. -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread!

From guido@digicool.com Mon Jan 22 15:22:29 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 10:22:29 -0500
Subject: [Python-Dev] autoconf --enable vs. --with
In-Reply-To: Your message of "Mon, 22 Jan 2001 03:01:27 EST." <20010122030127.C20804@thyrsus.com>
References: <200101212122.WAA16371@pandora.informatik.hu-berlin.de> <20010122030127.C20804@thyrsus.com>
Message-ID: <200101221522.KAA30287@cj20424-a.reston1.va.home.com>

> I've been working a bit on the build process lately. I came
> across this in the autoconf documentation:
>
>
>     If a software package has optional compile-time features, the
>     user can give `configure' command line options to specify
>     whether to compile them. The options have one of these forms:
>
>         --enable-FEATURE[=ARG]
>         --disable-FEATURE
>
>     Some packages require, or can optionally use, other software
>     packages which are already installed. The user can give
>     `configure' command line options to specify which such
>     external software to use. The options have one of these
>     forms:
>
>         --with-package[=ARG]
>         --without-package
>
>
> Is it worth fixing the Python configure script to comply with
> these definitions? It looks like with-cycle-gc and maybe
> with-pydebug would have to be changed.

OK, but please add explicit checks for the old --with[out]-cycle-gc and --with[out]-pydebug flags that cause errors (not just warnings) when these forms are used. It's bad enough that configure doesn't flag typos in such options as errors; if we change the option names, we really owe users who were using the old forms a clear error.

(Is this stupid autoconf behavior changeable? Does it also apply to enable/disable?)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org Mon Jan 22 15:19:49 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 22 Jan 2001 10:19:49 -0500 (EST)
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To:
References: <200101212200.XAA16672@pandora.informatik.hu-berlin.de>
Message-ID: <14956.20373.104748.573294@cj42289-a.reston1.va.home.com>

[Martin, on ftp://ftp.jclark.com/pub/xml/expat.zip]
> Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either
> works for me. I never tested 1.95.x (which is also not available from
> jclark.com).

Tim Peters writes:
> If you do and love it, let me know where to get it and I'll ship that
> instead.

I'll recommend not updating to 1.95.1; let's wait at least until 1.95.2 is out. These are really just pre-2.0 releases to shake things out.

I have been using the current Expat CVS lightly, but need to do more testing before I can be confident in it and our bindings (not yet checked in anywhere; should be in PyXML soon).

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From jeremy@alum.mit.edu Mon Jan 22 15:44:41 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jan 2001 10:44:41 -0500 (EST)
Subject: [Python-Dev] Worse news
In-Reply-To:
References:
Message-ID: <14956.21865.943601.735426@localhost.localdomain>

On Linux, I am also seeing test_cpickle failures. I have not been able to reproduce failures in test_extcall or test_sax.

I ran 'regrtest.py -r -x test_thread test_unicodedata test_signal test_select test_poll' 10 times and test_cpickle failed five times. (I did the peculiar run because excluding those five tests shaves two minutes off the running time of the test suite.)

No more time to look into this...

Jeremy

From jeremy@alum.mit.edu Mon Jan 22 15:26:27 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jan 2001 10:26:27 -0500 (EST)
Subject: [Python-Dev] getcode() function in pyexpat.c
Message-ID: <14956.20771.447958.389724@localhost.localdomain>

The pyexpat module uses functions named getcode() and call_with_frame() for handlers of some sort.
I can make this much out from the code, but the rest is a bit of a mystery. I was trying to read this code because of the errors Tim is seeing with test_sax on Windows. A few comments to explain this highly stylized and macro-laden code would be appreciated. The module appears to be creating empty code objects and calling them. I say they appear to be empty, because when they are created they don't appear to have anything initialized except name, filename, and firstlineno. getcode(EndNamespaceDecl, 419) (The freevars and cellvars entries are part of the support for nested scopes. They can be safely ignored for the moment.) I simply don't understand what's going on -- and I'm deeply suspicious that it is the source of whatever problems Tim is seeing with test_sax. Jeremy From thomas@xs4all.net Mon Jan 22 15:55:35 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 16:55:35 +0100 Subject: [Python-Dev] 'make distclean' broken. Message-ID: <20010122165535.P17392@xs4all.nl> 'make distclean' seems broken, at least on non-GNU make's: [snip] clobbering subdirectory Modules rm -f *.o python core *~ [@,#]* *.old *.orig *.rej rm -f add2lib hassignal rm -f *.a tags TAGS config.c Makefile.pre rm -f *.so *.sl so_locations make -f ./Makefile.in SUBDIRS="Include Lib Misc Demo" clobber "./Makefile.in", line 134: Need an operator make: fatal errors encountered -- cannot continue *** Error code 1 (ignored) rm -f config.status config.log config.cache config.h Makefile rm -f buildno platform rm -f Modules/Makefile [snip] (This is using FreeBSD's 'make'.) Looking at line 134, I'm not sure why it works with GNU make other than that it avoids complaining about syntax errors it doesn't run into (which could be both bad and good :) or that it avoids complaining about obvious GNU autoconf tricks. But I don't know enough about make to say for sure, nor to fix the above problem. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From guido@digicool.com Mon Jan 22 15:55:42 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:55:42 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Your message of "Sun, 21 Jan 2001 17:28:45 EST." References: Message-ID: <200101221555.KAA30935@cj20424-a.reston1.va.home.com> > Your faith in gcc is as charming as it is naive : the most > interesting cases of undefined behavior can't be checked no-way, no-how at > compile-time. That's why Barry keeps talking employers into dumping > thousands of dollars into a single Insure++ license. Insure++ actually tags > every pointer at runtime with its source, and gripes if non-equality > comparisons are done on a pair not derived from the same array or malloc > etc. Since Python type objects are individually allocated (not taken from a > preallocated contiguous vector), Insure++ should complain about that > compare. IMHO, *this* *particular* gripe of Insure++ is just a pain in the butt, and I wish there was a way to turn it off in Insure++ without having to fix the code. IMHO, this was included in the standard to allow segmented-memory implementations of C. Think certain DOS or Windows 3.1 memory models where a pointer is a segment plus an offset. This is not current practice even on Palmpilots! The standard may say that such comparisons are undefined, but I don't care about this particular undefinedness, and I'm annoyed by the required patches. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jan 22 16:02:15 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 11:02:15 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Your message of "Sun, 21 Jan 2001 14:44:38 EST." 
References:
Message-ID: <200101221602.LAA31103@cj20424-a.reston1.va.home.com>

> > My only concern is that under the old scheme, two different numeric
> > extension types that somehow can't be compared will end up being
> > *equal*. To fix this, I propose that if the names compare equal, as a
> > last resort we compare the type pointers -- this should be consistent
> > too.
>
> Agreed, and sounds fine!

Checked in now.

While fixing the test_b1 code again, which depends on this behavior, I thought of a refinement: it wouldn't be hard to make None compare smaller than *anything* (including numbers).

Is this worth it?

diff -c -r2.113 object.c
*** object.c	2001/01/22 15:59:32	2.113
--- object.c	2001/01/22 16:03:38
***************
*** 550,555 ****
--- 550,561 ----
  		PyErr_Clear();
  	}
  
+ 	/* None is smaller than anything */
+ 	if (v == Py_None)
+ 		return -1;
+ 	if (w == Py_None)
+ 		return 1;
+ 
  	/* different type: compare type names */
  	if (v->ob_type->tp_as_number)
  		vname = "";

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mwh21@cam.ac.uk Mon Jan 22 16:12:47 2001
From: mwh21@cam.ac.uk (Michael Hudson)
Date: Mon, 22 Jan 2001 16:12:47 +0000 (GMT)
Subject: [Python-Dev] Worse news
In-Reply-To: <14956.21865.943601.735426@localhost.localdomain>
Message-ID:

On Mon, 22 Jan 2001, Jeremy Hylton wrote:

> On Linux, I am also seeing test_cpickle failures. I have not been
> able to reproduce failures in test_extcall or test_sax.

Hmm - my machine's done 28 exemplary "make clean; make test" runs this morning. I last updated yesterday afternoon my time (~1700 GMT). Of course, I don't build pyexpat...

> No more time to look into this...

Don't you just love memory corruption bugs?

Cheers,
M.

From akuchlin@mems-exchange.org Mon Jan 22 16:28:59 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 22 Jan 2001 11:28:59 -0500
Subject: [Python-Dev] Python 2.1 article
Message-ID:

I've put together an almost-complete first draft of a "What's New in 2.1" article.
The only missing piece is a section on the Nested Scopes PEP, which obviously has to wait for the changes to get checked in.

http://www.amk.ca/python/2.1/ ; as usual, nitpicking comments are welcomed.

--amk

From nas@arctrix.com Mon Jan 22 10:00:43 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Mon, 22 Jan 2001 02:00:43 -0800
Subject: [Python-Dev] Worse news
In-Reply-To: ; from mwh21@cam.ac.uk on Mon, Jan 22, 2001 at 04:12:47PM +0000
References: <14956.21865.943601.735426@localhost.localdomain>
Message-ID: <20010122020043.A25687@glacier.fnational.com>

On Mon, Jan 22, 2001 at 04:12:47PM +0000, Michael Hudson wrote:
> Don't you just love memory corruption bugs?

Great fun. I've played around with efence and debauch on the weekend. I even went as far as merging an updated fmalloc from the XFree source tree into debauch and writing a reporting script in Python. I probably would have caught the pyexpat overrun if I had used efence with EF_ALIGNMENT=0 and compiled with -fpack-struct. I'll have to try it tonight. Maybe something else will turn up.

  Neil

From guido@digicool.com Mon Jan 22 17:12:29 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 12:12:29 -0500
Subject: [Python-Dev] 'make distclean' broken.
In-Reply-To: Your message of "Mon, 22 Jan 2001 16:55:35 +0100."
<20010122165535.P17392@xs4all.nl>
References: <20010122165535.P17392@xs4all.nl>
Message-ID: <200101221712.MAA00694@cj20424-a.reston1.va.home.com>

> 'make distclean' seems broken, at least on non-GNU make's:
>
> [snip]
> clobbering subdirectory Modules
> rm -f *.o python core *~ [@,#]* *.old *.orig *.rej
> rm -f add2lib hassignal
> rm -f *.a tags TAGS config.c Makefile.pre
> rm -f *.so *.sl so_locations
> make -f ./Makefile.in SUBDIRS="Include Lib Misc Demo" clobber
> "./Makefile.in", line 134: Need an operator
> make: fatal errors encountered -- cannot continue
> *** Error code 1 (ignored)
> rm -f config.status config.log config.cache config.h Makefile
> rm -f buildno platform
> rm -f Modules/Makefile
> [snip]
>
> (This is using FreeBSD's 'make'.)
>
> Looking at line 134, I'm not sure why it works with GNU make other than that
> it avoids complaining about syntax errors it doesn't run into (which could
> be both bad and good :) or that it avoids complaining about obvious GNU
> autoconf tricks. But I don't know enough about make to say for sure, nor to
> fix the above problem.

There's one line in Makefile.in that trips over Make (mine also complains about it):

    @SET_DLLLIBRARY@

Looking at the code in configure.in that generates this macro:

    AC_SUBST(SET_DLLLIBRARY)
    LDLIBRARY=''
    SET_DLLLIBRARY=''
    .
    . (and later)
    .
        cygwin*)
            LDLIBRARY='libpython$(VERSION).dll.a'
            SET_DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))'
            ;;

I don't see why we couldn't change this so that Makefile.in just contains

    DLLLIBRARY= @DLLLIBRARY@

and then configure.in could be changed to

    AC_SUBST(DLLLIBRARY)
    LDLIBRARY=''
    DLLLIBRARY=''
    .
    . (and later)
    .
        cygwin*)
            LDLIBRARY='libpython$(VERSION).dll.a'
            DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))'
            ;;

Or am I missing something? Does this fix the problem?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From esr@thyrsus.com Mon Jan 22 17:21:09 2001
From: esr@thyrsus.com (Eric S.
Raymond) Date: Mon, 22 Jan 2001 12:21:09 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221602.LAA31103@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 11:02:15AM -0500 References: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> Message-ID: <20010122122109.A14952@thyrsus.com> Guido van Rossum : > While fixing the test_b1 code again, which depends on this behavior, I > thought of a refinement: it wouldn't be hard to make None compare > smaller than *anything* (including numbers). > > Is this worth it? I think so, if only for the sake of well-definedness. -- Eric S. Raymond "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- Benjamin Franklin, Historical Review of Pennsylvania, 1759. From thomas@xs4all.net Mon Jan 22 17:25:30 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 18:25:30 +0100 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: <200101221712.MAA00694@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 12:12:29PM -0500 References: <20010122165535.P17392@xs4all.nl> <200101221712.MAA00694@cj20424-a.reston1.va.home.com> Message-ID: <20010122182530.E17295@xs4all.nl> On Mon, Jan 22, 2001 at 12:12:29PM -0500, Guido van Rossum wrote: > and then configure.in could be changed to > AC_SUBST(DLLLIBRARY) > LDLIBRARY='' > DLLLIBRARY='' > . > . (and later) > . > cygwin*) > LDLIBRARY='libpython$(VERSION).dll.a' > DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))' > ;; You mean DLLLIBRARY='$(basename $(LDLIBRARY))' But yes, that fixes it. > Or am I missing something? Well, on *that* I'm not sure, that's why I asked :P If things in the Python source boggle me, they are always there for a good reason. Well, maybe just 'almost always', but practically always :) -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread! From nas@arctrix.com Mon Jan 22 10:39:59 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 02:39:59 -0800 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: <200101221712.MAA00694@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 12:12:29PM -0500 References: <20010122165535.P17392@xs4all.nl> <200101221712.MAA00694@cj20424-a.reston1.va.home.com> Message-ID: <20010122023959.A25798@glacier.fnational.com> [Guido on change SET_DLLLIBRARY] > Or am I missing something? I don't think so. My new Makefile uses "FOO = @FOO@" everywhere. SET_CXX is the same way in the current Makefile. Neil From esr@thyrsus.com Mon Jan 22 17:41:59 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 12:41:59 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? Message-ID: <20010122124159.A14999@thyrsus.com> \section{\module{set} --- Basic set algebra for Python} \declaremodule{standard}{set} \modulesynopsis{Basic set algebra operations on sequences.} \moduleauthor{Eric S. Raymond}{esr@thyrsus.com} \sectionauthor{Eric S. Raymond}{esr@thyrsus.com} The \module{set} module defines functions for treating lists and other sequences as mathematical sets, and defines a set class that uses these operations natively and overloads Python's standard operator set. The \module{set} functions work on any sequence type and return lists. The set methods can take a set or any sequence type as an argument. Set or sequence elements may be of any type and may be mutable. Comparisons and membership tests of elements against sequence objects are done using \keyword{in}, and so can be customized by supplying a suitable \method{__getattr__} method for the sequence type. The running time of these functions is O(n**2) in the worst case unless otherwise noted. For cases that can be short-circuited by cardinality comparisons, this has been done. 
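
To make the quadratic behaviour concrete, here is a minimal Python sketch of list-based set operations that rely only on `in`. It is an assumption about how such functions could be written, not the module's actual code: only the names `setify`, `union`, `intersection`, and `difference` are taken from the documentation in this message, and preserving first-seen order is a choice made for this sketch.

```python
def setify(seq):
    """Return a list of seq's elements with duplicates removed.

    Uses only `in` (equality tests), so elements may be unhashable;
    the price is O(n**2) comparisons in the worst case.
    """
    result = []
    for item in seq:
        if item not in result:
            result.append(item)
    return result


def union(seq1, seq2):
    """All elements of either sequence, without duplicates."""
    return setify(list(seq1) + list(seq2))


def intersection(seq1, seq2):
    """Elements of seq1 that also occur in seq2."""
    return [item for item in setify(seq1) if item in seq2]


def difference(seq1, seq2):
    """Elements of seq1 that do not occur in seq2."""
    return [item for item in setify(seq1) if item not in seq2]
```

Because membership is decided by `in` rather than by hashing, a call such as `setify([1, 2, 2, [3], [3]])` works even though lists cannot be dictionary keys.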
\begin{funcdesc}{setify}{list1}
Returns a list of the argument sequence's elements with duplicates removed.
\end{funcdesc}

\begin{funcdesc}{union}{list1, list2}
Set union. All elements of both sets or sequences are returned.
\end{funcdesc}

\begin{funcdesc}{intersection}{list1, list2}
Set intersection. All elements common to both sets or sequences are returned.
\end{funcdesc}

\begin{funcdesc}{difference}{list1, list2}
Set difference. All elements of the first set or sequence not present in the second are returned.
\end{funcdesc}

\begin{funcdesc}{symmetric_difference}{list1, list2}
Set symmetric difference. All elements present in one sequence or the other but not in both are returned.
\end{funcdesc}

\begin{funcdesc}{cartesian}{list1, list2}
Returns a list of tuples consisting of all possible pairs of elements from the first and second sequences or sets.
\end{funcdesc}

\begin{funcdesc}{equality}{list1, list2}
Set comparison. Return 1 if the two sets or sequences contain exactly the same elements, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{subset}{list1, list2}
Set subset test. Return 1 if all elements of the first set or sequence are members of the second, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{proper_subset}{list1, list2}
Set subset test, excluding equality. Return 1 if the arguments fail a set equality test, and all elements of the first set or sequence are members of the second, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{powerset}{list1}
Return the set of all subsets of the argument set or sequence. Warning: this produces huge results from small arguments and is O(2**n) in both running time and space requirements; you can readily run yourself out of memory using it.
\end{funcdesc}

\subsection{set Objects \label{set-objects}}

A \class{set} instance uses the \module{set} module functions to implement set semantics on the list it contains, and to support a full set of Python list methods and operators.
Thus, the set methods can take a set or any sequence type as an argument. A set object contains a single data member: \begin{memberdesc}{elements} List containing the elements of the set. \end{memberdesc} Set objects can be treated as mutable sequences; they support the special methods \method{__len__}, \method{__getitem__}, \method{__setitem__}, and \method{__delitem__}. Through \method{__getitem__}, they support the membership test via \keyword{in}. All the standard mutable-sequence methods \method{list}, \method{append}, \method{extend}, \method{count}, \method{index}, \method{insert} (the index argument is ignored), \method{pop}, \method{remove}, \method{reverse}, and \method{sort} are also supported. After method calls that add elements (\method{__setitem__}, \method{append}, \method{extend}, \method{insert}), the elements of the data member are re-setified, so it is not possible to introduce duplicates. Calling \function{repr()} on a set returns the result of calling \function{repr()} on its element list. Calling \function{str()} returns a representation resembling mathematical notation for the set: an open set bracket, followed by a comma-separated list of \function{str()} representations of the elements, followed by a close set bracket. Set objects support the following Python operators: \begin {tableiii}{l|l|l}{code}{Operator}{Function}{Description} \lineiii{|,+}{union}{Union} \lineiii{&}{intersection}{Intersection} \lineiii{-}{difference}{Difference} \lineiii{^}{symmetric_difference}{Symmetric difference} \lineiii{*}{cartesian}{Cartesian product} \lineiii{==}{equality}{Equality test} \lineiii{!=,<>}{}{Inequality test} \lineiii{<}{proper_subset}{Proper-subset test} \lineiii{<=}{subset}{Subset test} \lineiii{>}{}{Proper superset test} \lineiii{>=}{}{Superset test} \end {tableiii} -- Eric S. Raymond Government is actually the worst failure of civilized man.
There has never been a really good one, and even those that are most tolerable are arbitrary, cruel, grasping and unintelligent. -- H. L. Mencken From esr@snark.thyrsus.com Mon Jan 22 18:28:57 2001 From: esr@snark.thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 13:28:57 -0500 Subject: [Python-Dev] I still can't build HTML in a current CVS tree. Message-ID: <200101221828.f0MISvH15121@snark.thyrsus.com> Fred, I still can't build HTML documentation in a current CVS tree -- same complaint about lib/modindex.html being absent. Can we get this fixed before 2.1 ships? -- Eric S. Raymond ...Virtually never are murderers the ordinary, law-abiding people against whom gun bans are aimed. Almost without exception, murderers are extreme aberrants with lifelong histories of crime, substance abuse, psychopathology, mental retardation and/or irrational violence against those around them, as well as other hazardous behavior, e.g., automobile and gun accidents." -- Don B. Kates, writing on statistical patterns in gun crime From fredrik@effbot.org Mon Jan 22 18:33:56 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Mon, 22 Jan 2001 19:33:56 +0100 Subject: [Python-Dev] Python 2.1 article References: Message-ID: <059b01c084a1$e431e490$e46940d5@hagrid> > I've put together an almost-complete first draft of a "What's New in > 2.1" article. The only missing piece is a section on the Nested > Scopes PEP, which obviously has to wait for the changes to get checked > in. what's the current 2.1a1 eta? (pep 226 still says last friday) today? wednesday? this week? this month? Curious /F From mal@lemburg.com Mon Jan 22 18:33:24 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 19:33:24 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
References: <20010122124159.A14999@thyrsus.com> Message-ID: <3A6C7CF4.F10AA77B@lemburg.com> [LaTeX file] Eric, we are all hackers, but plain LaTeX is not really the right format for a posting to a mailing list... at least not if you really expect feedback ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin@mira.cs.tu-berlin.de Mon Jan 22 18:36:16 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 22 Jan 2001 19:36:16 +0100 Subject: [Python-Dev] getcode() function in pyexpat.c Message-ID: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> > A few comments to explain this highly stylized and macro-laden code > would be appreciated. I probably can't do that before 2.1a1, but I promise to suggest something right afterwards. In general, the macro magic is designed to make the many expat callbacks available to Python. RC_HANDLER (for return code) is the most general template; VOID_HANDLER and INT_HANDLER are common specializations. In the core of RC_HANDLER, there a tuple is built and a Python function is called. The code used to do PyEval_CallObject right inside the macro; the call_with_frame feature is new compared to 2.0. It solves the specific problem of incomprehensible tracebacks. In a typical SAX application, the user code calls expatreader.ExpatParser.parse, which in turn calls self._parser.Parse(data, isFinal) Now, in 2.0, a common problem was a traceback self._parser.Parse(data, isFinal) TypeError: not enough arguments; expected 4, got 2 Everybody assumes a problem in the call to Parse; the real problem is in the call to the callback inside RC_HANDLER, which tried to call a user's function with two arguments that expected four. 
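The failure mode described above can be reproduced in miniature with a present-day xml.parsers.expat. This is only a sketch under assumptions: the four-argument characters function is a hypothetical stand-in for the user's SAX callback, parse_with_bad_handler is an illustrative name, and the exact wording of the message differs between Python versions.

```python
import xml.parsers.expat


def characters(self, data, start, length):
    """Hypothetical user callback that wrongly expects four arguments."""


def parse_with_bad_handler():
    """Feed a tiny document to expat with a mismatched handler installed.

    Returns the TypeError message raised from inside Parse(), or None if
    no error occurred.
    """
    parser = xml.parsers.expat.ParserCreate()
    # expat calls the character-data handler with a single argument (the text).
    parser.CharacterDataHandler = characters
    try:
        parser.Parse("<doc>text</doc>", True)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(parse_with_bad_handler())
```

The error surfaces from the Parse() call even though that call itself is well formed; the message names the callback, which is the hint that the mismatch lies between the C-level handler and the user's function.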
2.1 would improve this slightly on its own, writing self._parser.Parse(data, isFinal) TypeError: characters() takes exactly 4 arguments (2 given) With that code, you get File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 81, in feed self._parser.Parse(data, isFinal) File "pyexpat.c", line 379, in CharacterData TypeError: characters() takes exactly 4 arguments (2 given) So that tells you that it is the CharacterData handler that invokes characters(). You are right that the frame object is not used otherwise; it is just there to make a nice traceback. > I simply don't understand what's going on -- and I'm deeply > suspicious that it is the source of whatever problems Tim is seeing > with test_sax. I thought so, too, at first; it turned out that the problem was elsewhere. Regards, Martin From guido@digicool.com Mon Jan 22 19:04:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 14:04:02 -0500 Subject: [Python-Dev] Python 2.1 article In-Reply-To: Your message of "Mon, 22 Jan 2001 19:33:56 +0100." <059b01c084a1$e431e490$e46940d5@hagrid> References: <059b01c084a1$e431e490$e46940d5@hagrid> Message-ID: <200101221904.OAA01170@cj20424-a.reston1.va.home.com> > what's the current 2.1a1 eta? (pep 226 still > says last friday) You missed my email that I sent out Friday. Tentatively it's going out tonight. No point in updating the PEP each time there's slippage. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jan 22 19:10:54 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 14:10:54 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Mon, 22 Jan 2001 12:41:59 EST." 
<20010122124159.A14999@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> Message-ID: <200101221910.OAA01218@cj20424-a.reston1.va.home.com> Eric, There's already a PEP on a set object type, and everybody and their aunt has already implemented a set datatype. If *your* set module is ready for prime time, why not publish it in the Vaults of Parnassus? --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Mon Jan 22 19:29:18 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 14:29:18 -0500 (EST) Subject: [Python-Dev] Re: getcode() function in pyexpat.c In-Reply-To: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> References: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> Message-ID: <14956.35342.724657.865367@localhost.localdomain> >>>>> "MvL" == Martin v Loewis writes: >> I simply don't understand what's going on -- and I'm deeply >> suspicious that it is the source of whatever problems Tim is >> seeing with test_sax. MvL> I thought so, too, at first; it turned out that the problem was MvL> elsewhere. What was the cause of that problem? I didn't see any mail after Tim's middle-of-the-night message "Worse news." Jeremy From tim.one@home.com Mon Jan 22 20:01:59 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:01:59 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > While fixing the test_b1 code again, which depends on this behavior, I > thought of a refinement: it wouldn't be hard to make None compare > smaller than *anything* (including numbers). > > Is this worth it? 
First, an attempt to see what Python did in this morning's CVS turned up an internal error for Jeremy: >>> [None < x for x in (1, 1L, 1j, 1.0, [1], {}, (1,))] name: None, in ?, file '', line 1 locals: {'[1]': 0, 'x': 1} globals: {} Fatal Python error: compiler did not label name as local or global abnormal program termination A simpler way to provoke that: >>> [None < 2 for x in "x"] name: None, in ?, file '', line 1 locals: {'[1]': 0, 'x': 1} globals: {} Fatal Python error: compiler did not label name as local or global Anyway, I think forcing None to be "the smallest" is cute! Inexpensive to do, and while I don't see a compelling *use* for it, I bet it would be least surprising to newbies. +1. From fdrake@acm.org Mon Jan 22 20:08:54 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 22 Jan 2001 15:08:54 -0500 (EST) Subject: [Python-Dev] Re: I still can't build HTML in a current CVS tree. In-Reply-To: <200101221828.f0MISvH15121@snark.thyrsus.com> References: <200101221828.f0MISvH15121@snark.thyrsus.com> Message-ID: <14956.37718.968912.189834@cj42289-a.reston1.va.home.com> Eric S. Raymond writes: > Fred, I still can't build HTML documentation in a current CVS tree -- same > complaint about lib/modindex.html being absent. Can we get this fixed > before 2.1 ships? I'm guessing I've lost a previous email on the topic, or it's buried in my inbox. If this is still a problem after today's checkins, could you please file a bug report and assign it to me? Thanks! -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From tim.one@home.com Mon Jan 22 20:26:15 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:26:15 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221555.KAA30935@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > IMHO, *this* *particular* gripe of Insure++ is just a pain in the > butt, and I wish there was a way to turn it off in Insure++ without > having to fix the code. Maybe there is. Barry? > IMHO, this was included in the standard to allow segmented-memory > implementations of C. Think certain DOS or Windows 3.1 memory models > where a pointer is a segment plus an offset. This is not current > practice even on Palmpilots! I could ask Tom MacDonald (former X3J11 chair), but don't want to bother him. The way these things usually turn out: the committee debated it 100 times over 10 years, but some committee member steadfastly claimed it was important. Since ANSI/ISO committees work via consensus, one implacable objector is enough. WRT pointers, I know that while the C committee did worry about segmented architectures a lot in the past, tagged architectures gave them much thornier problems (the HW tags each "word" with some manner of metadata (such as a busy/free or empty/full bit, or read+write permission bits, or a data type identifier, or a "capability" tag tying into a HW-enforced security architecture, ...), and checks those on each access, and some of the metadata can propagate into a pointer, and the HW can raise faults on pointer comparisons if the metadata doesn't match). While such machines aren't in common use, the US Govt does all sorts of things they don't talk about -- if it's not IBM's representative protecting a 40-year old architecture, it's someone emphatically not from the NSA protecting something they're not at liberty to discuss. Of course Python wants to run there too, even if we never hear about it ... 
> The standard may say that such comparisons are undefined, but I don't > care about this particular undefinedness, and I'm annoyed by the > required patches. Ya, and I'm annoyed that MS stdio corrupts itself -- but they're just clinging to the letter of the std too, and I've learned to live with it gracefully . pointer-ordering-comparisons-should-be-very-rare-anyway-ly y'rs - tim From tim.one@home.com Mon Jan 22 20:55:30 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:55:30 -0500 Subject: [Python-Dev] Worse news In-Reply-To: Message-ID: [Michael Hudson] > Hmm - my machine's done 28 exemplary "make clean; make test" runs this > morning. I last updated yesterday afternoon my time (~1700 GMT). So does mine now. The remaining failures require *unusual* ways of running the test suite (with -r to get test_cpickle to fail, confirmed now by Jeremy under Linux; and in an extremely specialized and seemingly Windows-specific way to get test_extcall to blow up w/ a bad pointer). From tim.one@home.com Mon Jan 22 21:07:27 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 16:07:27 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <14956.16758.68050.257212@localhost.localdomain> Message-ID: [Jeremy Hylton] > Funny (strange or haha?) that test_extcall is failing since the two > pieces of code I've modified most recently are compile.c and the > section of ceval.c that handles extended call syntax. Ya, I knew that, but I avoided wagging a Finger of Shame in your direction because coincidence isn't proof . > ... > As for the test_sax failure, There is no test_sax failure anywhere anymore that I know of (Martin found a dead-wrong array decl in contributed pyexpat.c code and repaired it). 
And I believe my "rt -x test_sax" failure in test_extcall almost certainly has nothing to do with test_sax -- far more likely the connection to test_sax is an accident, and that if I spend umpteen hours trying other things at random I'll provoke the same memory accident leading to a bad pointer via excluding some other test. I just picked test_sax because that *was* broken and I wanted to get thru the rest of the tests. BTW, delighted(?) to hear that test_cpickle fails for you too! I'm sure test_extcall is going to blow up for other people eventually too -- but it is sooooo hard to provoke even for me. I've dropped the effort pending news from someone running Insure++ or efence or whatever. From guido@digicool.com Mon Jan 22 21:18:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 16:18:26 -0500 Subject: [Python-Dev] Worse news In-Reply-To: Your message of "Mon, 22 Jan 2001 16:07:27 EST." References: Message-ID: <200101222118.QAA28305@cj20424-a.reston1.va.home.com> [Tim] > So does mine now. The remaining failures require *unusual* ways of running > the test suite (with -r to get test_cpickle to fail, confirmed now by Jeremy > under Linux; [and later] > BTW, delighted(?) to hear that test_cpickle fails for you too! This (test_cpickle) is a red herring -- it's a shallow failure in the test suite. test_cpickle imports test_pickle, but test_pickle first outputs the test output from testing pickle -- unless test_pickle has been run before! This succeeds: ./python Lib/test/regrtest.py test_cpickle test_pickle and this fails: ./python Lib/test/regrtest.py test_pickle test_cpickle Use regrtest.py -v to find out why. :-) I'm not sure how to restructure this, but it's not of the same quality as test_extcall or test_sax failing. Neither of those has failed for me on Linux during hours of testing. However on Windows I get an occasional appfail dialog box when using rt.bat.
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas@arctrix.com Mon Jan 22 14:44:00 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 06:44:00 -0800 Subject: [Python-Dev] Worse news In-Reply-To: ; from tim.one@home.com on Mon, Jan 22, 2001 at 04:07:27PM -0500 References: <14956.16758.68050.257212@localhost.localdomain> Message-ID: <20010122064400.A26543@glacier.fnational.com> On Mon, Jan 22, 2001 at 04:07:27PM -0500, Tim Peters wrote: > I've dropped the effort pending news from someone running > Insure++ or efence or whatever. efence to the rescue! I compiled with -fstruct-pack and used EF_ALIGNMENT=0 and now I can trigger a core dump by running test_extcall. More news coming... Neil From tim.one@home.com Mon Jan 22 21:41:08 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 16:41:08 -0500 Subject: [Python-Dev] test_sax and site-python In-Reply-To: <20010122145733.85E51373C95@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > I'm not sure whether this is really a bug, but I had the problem > that there was something wrong with the xml package I had > installed into my Lib/site-python, and this caused test_sax to > complain. > > If the test stuff is expected to test only the core functionality > maybe sys.path should be edited so that it only contains directories > that are part of the core distribution? AFAIK, xml *is* considered part of the core now, and has been since 2.0 was released. The wisdom of that decision is debatable with hindsight, but AFAICT xml is in the same boat as, say, zlib now: not builtin, and requires 3rd-party code to work, but part of the core all the same. The Windows installer comes w/ the necessary xml (and zlib) pieces, and I suppose the Mac Python package also should.
From nas@arctrix.com Mon Jan 22 15:00:57 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 07:00:57 -0800 Subject: [Python-Dev] Worse news In-Reply-To: ; from tim.one@home.com on Mon, Jan 22, 2001 at 04:07:27PM -0500 References: <14956.16758.68050.257212@localhost.localdomain> Message-ID: <20010122070057.A26575@glacier.fnational.com> Perhaps this will help somone track down the bug: [running test_extcall...] unbound method method() must be called with instance as first argument unbound method method() must be called with instance as first argument Program received signal SIGSEGV, Segmentation fault. symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330 4330 if (TYPE(c) == DOUBLESTAR) { (gdb) l 4325 symtable_add_def(st, STR(CHILD(n, i)), 4326 DEF_PARAM | DEF_STAR); 4327 i += 2; 4328 c = CHILD(n, i); 4329 } 4330 if (TYPE(c) == DOUBLESTAR) { 4331 i++; 4332 symtable_add_def(st, STR(CHILD(n, i)), 4333 DEF_PARAM | DEF_DOUBLESTAR); 4334 } (gdb) p c $3 = (node *) 0x42a43fff (gdb) p *c $4 = {n_type = 0, n_str = 0x0, n_lineno = 0, n_nchildren = 0, n_child = 0x0} (gdb) p n $5 = (node *) 0x42a3ffd7 (gdb) p *n $6 = {n_type = 261, n_str = 0x0, n_lineno = 1, n_nchildren = 2, n_child = 0x42a43fc3} (gdb) bt 10 #0 symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330 #1 0x8060126 in symtable_funcdef (st=0x429bafd0, n=0x42a23feb) at Python/compile.c:4245 #2 0x805fd29 in symtable_node (st=0x429bafd0, n=0x429b0fc3) at Python/compile.c:4128 #3 0x80600da in symtable_node (st=0x429bafd0, n=0x4290cfeb) at Python/compile.c:4232 #4 0x805f443 in symtable_build (c=0xbffff5c8, n=0x4290cfeb) at Python/compile.c:3816 #5 0x805f130 in jcompile (n=0x4290cfeb, filename=0x80a040f "", base=0x0) at Python/compile.c:3720 #6 0x805f0c2 in PyNode_Compile (n=0x4290cfeb, filename=0x80a040f "") at Python/compile.c:3699 #7 0x8069adf in run_node (n=0x4290cfeb, filename=0x80a040f "", globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:915 #8 0x8069ac0 
in run_err_node (n=0x4290cfeb, filename=0x80a040f "", globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:907 #9 0x8069a30 in PyRun_String ( str=0x429f9fd1 "def zv(*v): print \"ok zv\", a, b, d, e, v, k", start=257, globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:881 (More stack frames follow...) From thomas@xs4all.net Mon Jan 22 22:13:29 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 23:13:29 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <20010122070057.A26575@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 22, 2001 at 07:00:57AM -0800 References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> Message-ID: <20010122231329.A27785@xs4all.nl> On Mon, Jan 22, 2001 at 07:00:57AM -0800, Neil Schemenauer wrote: > Perhaps this will help somone track down the bug: > [running test_extcall...] > unbound method method() must be called with instance as first argument > unbound method method() must be called with instance as first argument > > Program received signal SIGSEGV, Segmentation fault. > symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330 > 4330 if (TYPE(c) == DOUBLESTAR) { > (gdb) l > 4325 symtable_add_def(st, STR(CHILD(n, i)), > 4326 DEF_PARAM | DEF_STAR); > 4327 i += 2; > 4328 c = CHILD(n, i); > 4329 } > 4330 if (TYPE(c) == DOUBLESTAR) { > 4331 i++; > 4332 symtable_add_def(st, STR(CHILD(n, i)), > 4333 DEF_PARAM | DEF_DOUBLESTAR); > 4334 } > (gdb) p c > $3 = (node *) 0x42a43fff > (gdb) p *c > $4 = {n_type = 0, n_str = 0x0, n_lineno = 0, n_nchildren = 0, n_child = 0x0} > (gdb) p n > $5 = (node *) 0x42a3ffd7 > (gdb) p *n > $6 = {n_type = 261, n_str = 0x0, n_lineno = 1, n_nchildren = 2, > n_child = 0x42a43fc3} n_child is 0x42a43fc3. That's n_child[0]. 0x42a43fff is the child being handled now. That would be n_child[3] (0x42a43fff - 0x42a43fc3 == 60, a struct node is 20 bytes.)
But n_nchildren is 2, so it's an off-by-two error somewhere -- and look, there's an "i += 2" right above it! It *looks* like this code will blow up whenever you use '*eggs' without '**spam' in a function definition. That's a fairly wild guess, but it's worth a try. Try this patch: Index: Python/compile.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/compile.c,v retrieving revision 2.148 diff -c -c -r2.148 compile.c *** Python/compile.c 2001/01/22 04:35:57 2.148 --- Python/compile.c 2001/01/22 22:12:31 *************** *** 4324,4329 **** --- 4324,4331 ---- i++; symtable_add_def(st, STR(CHILD(n, i)), DEF_PARAM | DEF_STAR); + if (NCH(n) <= i+2) + return; i += 2; c = CHILD(n, i); } -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From esr@thyrsus.com Mon Jan 22 20:13:09 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 15:13:09 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101221910.OAA01218@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 02:10:54PM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> Message-ID: <20010122151309.C15236@thyrsus.com> Guido van Rossum : > There's already a PEP on a set object type, and everybody and their > aunt has already implemented a set datatype. I've just read the PEP. Greg's proposal has a couple of problems. The biggest one is that the interface design isn't very Pythonic -- it's formally adequate, but doesn't exploit the extent to which sets naturally have common semantics with existing Python sequence types. This is bad; it means that a lot of code that could otherwise ignore the difference between lists and sets would have to be specialized one way or the other for no good reason.
The only other set module I can find in the Vaults or anywhere else is kjBuckets (which I knew about before). Looks like a good design, but complicated -- and requires installation of an extension. > If *your* set module is ready for prime time, why not publish it in > the Vaults of Parnassus? I suppose that's what I'll do if you don't bless it for the standard library. But here are the reasons I suggest you should do so: 1. It supports a set of operations that are both often useful and fiddly to get right, thus enhancing the "batteries are included" effect. (I used its ancestor for representing seen-message numbers in a specialized mailreader, for example.) 2. It's simple for application programmers to use. No extension module to integrate. 3. It's unsurprising. My set objects behave almost exactly like other mutable sequences, with all the same built-in methods working, except for the fact that you can't introduce duplicates with the mutators. 4. It's already completely documented in a form suitable for the library. 5. It's simple enough not to cause you maintenance hassles down the road, and even if it did the maintainer is unlikely to disappear :-). -- Eric S. Raymond The United States is in no way founded upon the Christian religion -- George Washington & John Adams, in a diplomatic message to Malta. From guido@digicool.com Mon Jan 22 22:29:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 17:29:26 -0500 Subject: [Python-Dev] test_sax and site-python In-Reply-To: Your message of "Mon, 22 Jan 2001 16:41:08 EST." References: Message-ID: <200101222229.RAA28667@cj20424-a.reston1.va.home.com> > [Jack Jansen] > > I'm not sure whether this is really a bug, but I had the problem > > that there was something wrong with the xml package I had > > installed into my Lib/site-python, and this caused test_sax to > > complain.
> > > > If the test stuff is expected to test only the core functionality > > maybe sys.path should be edited so that it only contains directories > > that are part of the core distribution? > [Tim] > AFAIK, xml *is* considered part of the core now, and has been since 2.0 was > released. The wisdom of that decision is debatable with hindsight, but > AFAICT xml is in the same boat as, say, zlib now: not builtin, and requires > 3rd-party code to work, but part of the core all the same. The Windows > installer comes w/ the necessary xml (and zlib) pieces, and I suppose the > Mac Python package also should. Yes, but Jack was talking about a non-std xml package in site-python... I agree that this shouldn't be picked up. But is it worth taking draconian measures to avoid this? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Mon Jan 22 22:35:08 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 22 Jan 2001 17:35:08 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <200101222118.QAA28305@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This (test_cpickle) is a red herring -- it's a shallow failure in the > test suite. Fixed now -- thanks! Please note that Neil got text_extcall to fail in exactly the same place (see his recent Python-Dev) mail. That's the only remaining failure I know of. > ... > However on Windows I get an occasional appfail dialog box when > using rt.bat. I don't believe I've ever seen one of those ("appfail" rings no bells), and rt has never acted strangely for me. Your DOS-box properties may be screwed up: use Start -> Find -> Files or Folders ...; set "Look in" to C:; enter *.pif in the "Named:" box; click Find. You'll probably get a dozen hits. One of them will correspond to the method you use to open a DOS box (which I don't know). Right-click on that one and select Properties. On the Memory tab of the dialog that pops up, the four dropdown lists should have "Auto" selected. "Uses HMA" should be checked. 
Hmm ... looks like "Protected" *should* be checked but mine isn't ... oh, this goes on and on. I don't even know which version of Windows you're using here! How about I look at it next time I'm at your house ... From greg@cosc.canterbury.ac.nz Mon Jan 22 22:50:07 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Jan 2001 11:50:07 +1300 (NZDT) Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl> Message-ID: <200101222250.LAA01929@s454.cosc.canterbury.ac.nz> > 4330 if (TYPE(c) == DOUBLESTAR) { > 4325 symtable_add_def(st, STR(CHILD(n, i)), > 4326 DEF_PARAM | DEF_STAR); Shouldn't line 4330 say if (TYPE(c) == STAR) ? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From thomas@xs4all.net Mon Jan 22 22:56:02 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 23:56:02 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <200101222250.LAA01929@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Tue, Jan 23, 2001 at 11:50:07AM +1300 References: <20010122231329.A27785@xs4all.nl> <200101222250.LAA01929@s454.cosc.canterbury.ac.nz> Message-ID: <20010122235602.B27785@xs4all.nl> On Tue, Jan 23, 2001 at 11:50:07AM +1300, Greg Ewing wrote: > > 4330 if (TYPE(c) == DOUBLESTAR) { > > 4325 symtable_add_def(st, STR(CHILD(n, i)), > > 4326 DEF_PARAM | DEF_STAR); > Shouldn't line 4330 say if (TYPE(c) == STAR) ? No, that's line 4323. You can't have doublestar without having star, and star should precede doublestar. (Grammar should enforce that.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
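Which combinations of '*' and '**' parameters the grammar accepts can be checked directly with compile(). The following is a minimal sketch in present-day Python, so the results are not guaranteed to match the 2.1 tree under discussion; accepts is a hypothetical helper name.

```python
def accepts(source):
    """Return True if `source` compiles, False if it is a SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    for src in ("def f(*args, **kw): pass",
                "def f(**kw): pass",
                "def f(**kw, *args): pass"):
        print(src, "->", accepts(src))
```

In current Python a '**' parameter on its own is accepted; what is rejected is a '*' parameter placed after a '**' one.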
From paulp@ActiveState.com Mon Jan 22 23:02:07 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 22 Jan 2001 15:02:07 -0800 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> Message-ID: <3A6CBBEF.4732BFF2@ActiveState.com> Guido van Rossum wrote: > > .... > > Yes, wow! > > .... I apologize but I'm not clear on my responsibilities here, if any. I wrote a PEP for online help. I submitted a partial implementation. Ping wrote a full implementation that basically supersedes mine. There are various ideas for improving it, but I think that we agree that the core is solid. Several people have said that it should be moved into the core library. Nobody has said that it shouldn't. Whose move is it? What's next? Paul Prescod From fredrik@effbot.org Mon Jan 22 23:08:40 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Tue, 23 Jan 2001 00:08:40 +0100 Subject: [Python-Dev] test___all__ fails if bsddb not available Message-ID: <079a01c084c8$43023e40$e46940d5@hagrid> test___all__ test test___all__ failed -- dbhash has no __all__ attribute maybe this test shouldn't depend on optional modules? From nas@arctrix.com Mon Jan 22 16:24:34 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 08:24:34 -0800 Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 22, 2001 at 11:13:29PM +0100 References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> <20010122231329.A27785@xs4all.nl> Message-ID: <20010122082433.B26765@glacier.fnational.com> On Mon, Jan 22, 2001 at 11:13:29PM +0100, Thomas Wouters wrote: > That's a fairly wild guess, but it's worth a try. Try this > patch: [...] Works for me.
Neil From greg@cosc.canterbury.ac.nz Mon Jan 22 23:21:14 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Jan 2001 12:21:14 +1300 (NZDT) Subject: [Python-Dev] Worse news In-Reply-To: <20010122235602.B27785@xs4all.nl> Message-ID: <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> Thomas Wouters : > You can't have doublestar without having star What?!? You could in 1.5.2. Has that changed? Anyway, it just looked a bit odd that it seemed to be testing for DOUBLESTAR and then adding a DEF_STAR thing to the symtab. But I guess I should shut up until I've seen all of the code. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From thomas@xs4all.net Mon Jan 22 23:26:02 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 23 Jan 2001 00:26:02 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <200101222321.MAA01957@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Tue, Jan 23, 2001 at 12:21:14PM +1300 References: <20010122235602.B27785@xs4all.nl> <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> Message-ID: <20010123002602.C27785@xs4all.nl> On Tue, Jan 23, 2001 at 12:21:14PM +1300, Greg Ewing wrote: > Thomas Wouters : > > You can't have doublestar without having star > What?!? You could in 1.5.2. Has that changed? Sorry, my bad, I'm wrong. (I just tested this.) I could swear it was that way, but it's 0:25 right now, after a night with about 2 hours decent sleep, so ignore my delusions :) > Anyway, it just looked a bit odd that it seemed to be testing > for DOUBLESTAR and then adding a DEF_STAR thing to the symtab. > But I guess I should shut up until I've seen all of the code. No, it's not doing that. It's adding the symbol name to the symtab, with DEF_DOUBLESTAR as one of its flags. 
Not sure what the flag does, but I could guess. (But see the
above mentioned delusions as to why I'm not doing that out loud anymore :-)
The 'if' in front of it adds the symbol to the symtab with DEF_STAR as a
flag, in the case of 'STAR' (rather than DOUBLESTAR).

Really. go check :)

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas@xs4all.net Mon Jan 22 23:31:03 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 23 Jan 2001 00:31:03 +0100
Subject: [Python-Dev] Worse news
In-Reply-To: <20010123002602.C27785@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 23, 2001 at 12:26:02AM +0100
References: <20010122235602.B27785@xs4all.nl> <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> <20010123002602.C27785@xs4all.nl>
Message-ID: <20010123003103.D27785@xs4all.nl>

On Tue, Jan 23, 2001 at 12:26:02AM +0100, Thomas Wouters wrote:
> On Tue, Jan 23, 2001 at 12:21:14PM +1300, Greg Ewing wrote:
> > Thomas Wouters :
>
> > > You can't have doublestar without having star
>
> > What?!? You could in 1.5.2. Has that changed?

> Sorry, my bad, I'm wrong. (I just tested this.) I could swear it was that
> way, but it's 0:25 right now, after a night with about 2 hours decent sleep,
> so ignore my delusions :)

Ah, yeah, what I meant to *think* was: you can't have *spam *after* **eggs:

>>> def foo(x, **kwarg, *arg)
  File "", line 1
    def foo(x, **kwarg, *arg)
                        ^
SyntaxError: invalid syntax

So the logic of the latter part of the function seems okay (after the little
patch I posted before.) Jeremy should give his expert opinion before it goes
in, though, since it's his code :)

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido@digicool.com Mon Jan 22 23:36:17 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 18:36:17 -0500
Subject: [Python-Dev] test___all__ fails if bsddb not available
In-Reply-To: Your message of "Tue, 23 Jan 2001 00:08:40 +0100."
             <079a01c084c8$43023e40$e46940d5@hagrid>
References: <079a01c084c8$43023e40$e46940d5@hagrid>
Message-ID: <200101222336.SAA30480@cj20424-a.reston1.va.home.com>

> test test___all__ failed -- dbhash has no __all__ attribute
>
> maybe this test shouldn't depend on optional modules?

Fixed -- I just skip dbhash if bsddb can't be imported.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy@alum.mit.edu Tue Jan 23 00:38:28 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jan 2001 19:38:28 -0500 (EST)
Subject: [Python-Dev] Worse news
In-Reply-To: <20010122231329.A27785@xs4all.nl>
References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> <20010122231329.A27785@xs4all.nl>
Message-ID: <14956.53892.651549.493268@localhost.localdomain>

Thomas,

Your patch has the right diagnosis, although I would write it a tad
differently. NCH(n) <= i + 2 should be NCH(n) < i + 2, because
CHILD(n, NCH(i)) is not valid. I'll check it in.

Jeremy

From jeremy@alum.mit.edu Tue Jan 23 01:23:56 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jan 2001 20:23:56 -0500 (EST)
Subject: [Python-Dev] Keyword arg dictionary without keyword arguments
In-Reply-To: <20010119232323.70B03116392@oratrix.oratrix.nl>
References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl>
Message-ID: <14956.56620.706531.647341@localhost.localdomain>

>>>>> "JJ" == Jack Jansen writes:

  JJ> Recently, Guido van Rossum said:
  >> > I get the impression that I'm currently seeing a non-NULL third
  >> > argument in my (C) methods even though the method is called
  >> > without keyword arguments.
  >>
  >> > Is this new semantics that I missed the discussion about, or is
  >> > this a bug?
  >>
  >> [...] Do you really need the NULL?

  JJ> The places that I know I was counting on the NULL now have "if (
  JJ> kw && PyObject_IsTrue(kw))", so I'll just have to hope there
  JJ> aren't any more lingering in there.
Guido,

Does your query ("Do you really need the NULL?") mean that you don't
care whether the argument is NULL or an empty dictionary? I could
change the code to do either for 2.1a2, if you have a preference.

Jeremy

From guido@digicool.com Tue Jan 23 01:33:20 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 20:33:20 -0500
Subject: [Python-Dev] Keyword arg dictionary without keyword arguments
In-Reply-To: Your message of "Mon, 22 Jan 2001 20:23:56 EST."
             <14956.56620.706531.647341@localhost.localdomain>
References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain>
Message-ID: <200101230133.UAA04378@cj20424-a.reston1.va.home.com>

> Guido,
>
> Does your query ("Do you really need the NULL?") mean that you don't
> care whether the argument is NULL or an empty dictionary? I could
> change the code to do either for 2.1a2, if you have a preference.
>
> Jeremy

Robust code IMO should treat NULL and {} the same. But since
traditionally we passed NULL, it's better to pass NULL rather than {}.
I believe that's the status quo now, right?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy@alum.mit.edu Tue Jan 23 01:54:53 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jan 2001 20:54:53 -0500 (EST)
Subject: [Python-Dev] Keyword arg dictionary without keyword arguments
In-Reply-To: <200101230133.UAA04378@cj20424-a.reston1.va.home.com>
References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> <200101230133.UAA04378@cj20424-a.reston1.va.home.com>
Message-ID: <14956.58477.874472.190937@localhost.localdomain>

>>>>> "GvR" == Guido van Rossum writes:

  [Jeremy wrote:]
  >> Does your query ("Do you really need the NULL?") mean that you
  >> don't care whether the argument is NULL or an empty dictionary?
  >> I could change the code to do either for 2.1a2, if you have a
  >> preference.

  GvR> Robust code IMO should treat NULL and {} the same. But since
  GvR> traditionally we passed NULL, it's better to pass NULL rather
  GvR> than {}. I believe that's the status quo now, right?

The current status in CVS is to pass {}, because there appeared to be
some case where a PyCFunction was not expecting NULL. I assumed,
without checking, that {} was required and changed the implementation
to always pass a dictionary to METH_KEYWORDS functions. I could
change it back to NULL and see if I can reproduce the error I was
seeing.

Jeremy

From guido@digicool.com Tue Jan 23 02:01:12 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 21:01:12 -0500
Subject: [Python-Dev] Keyword arg dictionary without keyword arguments
In-Reply-To: Your message of "Mon, 22 Jan 2001 20:54:53 EST."
             <14956.58477.874472.190937@localhost.localdomain>
References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> <200101230133.UAA04378@cj20424-a.reston1.va.home.com> <14956.58477.874472.190937@localhost.localdomain>
Message-ID: <200101230201.VAA15993@cj20424-a.reston1.va.home.com>

> [Jeremy wrote:]
> >> Does your query ("Do you really need the NULL?") mean that you
> >> don't care whether the argument is NULL or an empty dictionary?
> >> I could change the code to do either for 2.1a2, if you have a
> >> preference.
>
> GvR> Robust code IMO should treat NULL and {} the same. But since
> GvR> traditionally we passed NULL, it's better to pass NULL rather
> GvR> than {}. I believe that's the status quo now, right?
>
> The current status in CVS is to pass {}, because there appeared to be
> some case where a PyCFunction was not expecting NULL. I assumed,
> without checking, that {} was required and changed the implementation
> to always pass a dictionary to METH_KEYWORDS functions.
> I could change it back to NULL and see if I can reproduce the error
> I was seeing.

Yes, that's a good idea. I hope that the {} in alpha 1 won't make
folks think that they will never see a NULL in the future and code
accordingly...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Tue Jan 23 02:15:11 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 21:15:11 -0500
Subject: [Python-Dev] 2.1a1 release tonight -- but no nested scopes or weak refs
Message-ID: <200101230215.VAA16577@cj20424-a.reston1.va.home.com>

We've decided to release 2.1a1 without further ado, but without two
big hopeful patches: Jeremy's nested scopes aren't finished and will
take considerably more time, and Fred's weak references need more
review (I haven't had the time to look at the code).

Rather than wait longer, I've decided to try and release 2.1a1 tonight
-- there's nothing I'm waiting for now before I can cut a tarball.
There will be an alpha2 release around February 1.

Please don't make any check-ins until I announce the 2.1a1 release
here. (PythonLabs: please mail or phone me if you need to check in a
last-minute thing -- I'm tagging the tree now.)

More news as it happens,

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@mojam.com (Skip Montanaro) Tue Jan 23 02:36:24 2001
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Mon, 22 Jan 2001 20:36:24 -0600 (CST)
Subject: [Python-Dev] test_grammar failing
Message-ID: <14956.60968.363878.643640@beluga.mojam.com>

At the end of this:

    make distclean ; ./configure ; make OPT='-g -pipe' ; make test

I get this:

    rm -f ./Lib/test/*.py[co]
    PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l
    test_grammar
    name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617
    locals: {'x': 2, '[1]': 1, 'l': 0}
    globals: {}
    Fatal Python error: compiler did not label name as local or global
    make: *** [test] Aborted
    PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l
    test_grammar
    name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617
    locals: {'x': 2, '[1]': 1, 'l': 0}
    globals: {}
    Fatal Python error: compiler did not label name as local or global
    make: *** [test] Aborted

Any ideas? I notice that Jeremy checked in some changes to test_grammar.py
this evening.

Skip

From gvwilson@nevex.com Tue Jan 23 02:47:33 2001
From: gvwilson@nevex.com (Greg Wilson)
Date: Mon, 22 Jan 2001 21:47:33 -0500 (EST)
Subject: [Python-Dev] re: I think my set module is ready for prime time
Message-ID:

> > Guido van Rossum:
> > There's already a PEP on a set object type, and everybody and their
> > aunt has already implemented a set datatype.

> Eric Raymond:
> Greg's proposal has a couple of problems.
> The biggest one is that the interface design isn't very Pythonic --
> ...doesn't exploit the extent to which sets
> naturally have common semantics with existing Python sequence types.
> This is bad; it means that a lot of code that could otherwise ignore
> the difference between lists and sets would have to be specialized
> one way or the other for no good reason.

I agree with Eric's point; I put the interface design on hold while I
went off to try to find an efficient implementation capable of
handling mutable values (i.e. one that would allow things like sets of
sets). I'm still looking :-(, but would appreciate comments from this
list on Eric's interface.

Thanks,
Greg

From guido@digicool.com Tue Jan 23 03:02:50 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 22:02:50 -0500
Subject: [Python-Dev] test_grammar failing
In-Reply-To: Your message of "Mon, 22 Jan 2001 20:36:24 CST."
             <14956.60968.363878.643640@beluga.mojam.com>
References: <14956.60968.363878.643640@beluga.mojam.com>
Message-ID: <200101230302.WAA27104@cj20424-a.reston1.va.home.com>

> At the end of this:
>
>     make distclean ; ./configure ; make OPT='-g -pipe' ; make test
>
> I get this:
>
>     rm -f ./Lib/test/*.py[co]
>     PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l
>     test_grammar
>     name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617
>     locals: {'x': 2, '[1]': 1, 'l': 0}
>     globals: {}
>     Fatal Python error: compiler did not label name as local or global
>     make: *** [test] Aborted
>     PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l
>     test_grammar
>     name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617
>     locals: {'x': 2, '[1]': 1, 'l': 0}
>     globals: {}
>     Fatal Python error: compiler did not label name as local or global
>     make: *** [test] Aborted
>
> Any ideas? I notice that Jeremy checked in some changes to test_grammar.py
> this evening.

Try another cvs update and rebuild. The test that Jeremy checked in
is supposed to catch a bug in the compiler code that he checked in.

The latest compile.c is 103277 bytes long (in Unix).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Tue Jan 23 03:33:02 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 22:33:02 -0500
Subject: [Python-Dev] Python 2.1 alpha 1 released!
Message-ID: <200101230333.WAA28376@cj20424-a.reston1.va.home.com>

Thanks to the PythonLabs developers and the many hard-working
volunteers, I'm proud to release Python 2.1a1 -- the first alpha
release of Python version 2.1.

The release mechanics are different than for previous releases: we're
only releasing through SourceForge for now. The official source
tarball is already available from the download page:

  http://sourceforge.net/project/showfiles.php?group_id=5470

Additional files will be released soon: a Windows installer, Linux
RPMs, and documentation.

Please give it a good try! The only way Python 2.1 can become a
rock-solid product is if people test the alpha releases. Especially
if you are using Python for demanding applications or on extreme
platforms we are interested in hearing your feedback. Are you
embedding Python or using threads? Please test your application using
Python 2.1a1! Please submit all bug reports through SourceForge:

  http://sourceforge.net/bugs/?group_id=5470

Here's the NEWS file:

What's New in Python 2.1 alpha 1?
=================================

Core language, builtins, and interpreter

- There is a new Unicode companion to the PyObject_Str() API
  called PyObject_Unicode(). It behaves in the same way as the
  former, but assures that the returned value is a Unicode object
  (applying the usual coercion if necessary).

- The comparison operators support "rich comparison overloading" (PEP
  207). C extension types can provide a rich comparison function in
  the new tp_richcompare slot in the type object. The cmp() function
  and the C function PyObject_Compare() first try the new rich
  comparison operators before trying the old 3-way comparison.
  There is also a new C API PyObject_RichCompare() (which also falls
  back on the old 3-way comparison, but does not constrain the outcome
  of the rich comparison to a Boolean result).

  The rich comparison function takes two objects (at least one of
  which is guaranteed to have the type that provided the function) and
  an integer indicating the opcode, which can be Py_LT, Py_LE, Py_EQ,
  Py_NE, Py_GT, Py_GE (for <, <=, ==, !=, >, >=), and returns a Python
  object, which may be NotImplemented (in which case the tp_compare
  slot function is used as a fallback, if defined).

  Classes can overload individual comparison operators by defining one
  or more of the methods __lt__, __le__, __eq__, __ne__, __gt__,
  __ge__. There are no explicit "reflected argument" versions of
  these; instead, __lt__ and __gt__ are each other's reflection,
  likewise for __le__ and __ge__; __eq__ and __ne__ are their own
  reflection (similar at the C level). No other implications are
  made; in particular, Python does not assume that == is the Boolean
  inverse of !=, or that < is the Boolean inverse of >=. This makes
  it possible to define types with partial orderings.

  Classes or types that want to implement (in)equality tests but not
  the ordering operators (i.e. unordered types) should implement ==
  and !=, and raise an error for the ordering operators.

  It is possible to define types whose rich comparison results are not
  Boolean; e.g. a matrix type might want to return a matrix of bits
  for A < B, giving elementwise comparisons. Such types should ensure
  that any interpretation of their value in a Boolean context raises
  an exception, e.g. by defining __nonzero__ (or the tp_nonzero slot
  at the C level) to always raise an exception.

- Complex numbers use rich comparisons to define == and != but raise
  an exception for <, <=, > and >=. Unfortunately, this also means
  that cmp() of two complex numbers raises an exception when the two
  numbers differ.
  Since it is not mathematically meaningful to compare
  complex numbers except for equality, I hope that this doesn't break
  too much code.

- Functions and methods now support getting and setting arbitrarily
  named attributes (PEP 232). Functions have a new __dict__
  (a.k.a. func_dict) which holds the function attributes. Methods get
  and set attributes on their underlying im_func. It is a TypeError
  to set an attribute on a bound method.

- The xrange() object implementation has been improved so that
  xrange(sys.maxint) can be used on 64-bit platforms. There's still a
  limitation that in this case len(xrange(sys.maxint)) can't be
  calculated, but the common idiom "for i in xrange(sys.maxint)" will
  work fine as long as the index i doesn't actually reach 2**31.
  (Python uses regular ints for sequence and string indices; fixing
  that is much more work.)

- Two changes to from...import:

  1) "from M import X" now works even if M is not a real module; it's
     basically a getattr() operation with AttributeError exceptions
     changed into ImportError.

  2) "from M import *" now looks for M.__all__ to decide which names to
     import; if M.__all__ doesn't exist, it uses M.__dict__.keys() but
     filters out names starting with '_' as before. Whether or not
     __all__ exists, there's no restriction on the type of M.

- File objects have a new method, xreadlines(). This is the fastest
  way to iterate over all lines in a file:

    for line in file.xreadlines():
        ...do something to line...

  See the xreadlines module (mentioned below) for how to do this for
  other file-like objects.

- Even if you don't use file.xreadlines(), you may expect a speedup on
  line-by-line input. The file.readline() method has been optimized
  quite a bit in platform-specific ways: on systems (like Linux) that
  support flockfile(), getc_unlocked(), and funlockfile(), those are
  used by default. On systems (like Windows) without getc_unlocked(),
  a complicated (but still thread-safe) method using fgets() is used by
  default.
  You can force use of the fgets() method by #define'ing
  USE_FGETS_IN_GETLINE at build time (it may be faster than
  getc_unlocked()).

  You can force fgets() not to be used by #define'ing
  DONT_USE_FGETS_IN_GETLINE (this is the first thing to try if std test
  test_bufio.py fails -- and let us know if it does!).

- In addition, the fileinput module, while still slower than the other
  methods on most platforms, has been sped up too, by using
  file.readlines(sizehint).

- Support for run-time warnings has been added, including a new
  command line option (-W) to specify the disposition of warnings.
  See the description of the warnings module below.

- Extensive changes have been made to the coercion code. This mostly
  affects extension modules (which can now implement mixed-type
  numerical operators without having to use coercion), but
  occasionally, in boundary cases the coercion semantics have changed
  subtly. Since this was a terrible gray area of the language, this
  is considered an improvement. Also note that __rcmp__ is no longer
  supported -- instead of calling __rcmp__, __cmp__ is called with
  reflected arguments.

- In connection with the coercion changes, a new built-in singleton
  object, NotImplemented is defined. This can be returned for
  operations that wish to indicate they are not implemented for a
  particular combination of arguments. From C, this is
  Py_NotImplemented.

- The interpreter now accepts bytecode files on the command line even
  if they do not have a .pyc or .pyo extension. On Linux, after
  executing

    echo ':pyc:M::\x87\xc6\x0d\x0a::/usr/local/bin/python:' > /proc/sys/fs/binfmt_misc/register

  any byte code file can be used as an executable (i.e. as an argument
  to execve(2)).

- %[xXo] formats of negative Python longs now produce a sign
  character. In 1.6 and earlier, they never produced a sign,
  and raised an error if the value of the long was too large
  to fit in a Python int. In 2.0, they produced a sign if and
  only if too large to fit in an int.
  This was inconsistent across platforms (because the size of an int
  varies across platforms), and inconsistent with hex() and oct().
  Example:

    >>> "%x" % -0x42L
    '-42'      # in 2.1
    'ffffffbe' # in 2.0 and before, on 32-bit machines

    >>> hex(-0x42L)
    '-0x42L'   # in all versions of Python

  The behavior of %d formats for negative Python longs remains
  the same as in 2.0 (although in 1.6 and before, they raised
  an error if the long didn't fit in a Python int).

  %u formats don't make sense for Python longs, but are allowed
  and treated the same as %d in 2.1. In 2.0, a negative long
  formatted via %u produced a sign if and only if too large to
  fit in an int. In 1.6 and earlier, a negative long formatted
  via %u raised an error if it was too big to fit in an int.

- Dictionary objects have an odd new method, popitem(). This removes
  an arbitrary item from the dictionary and returns it (in the form of
  a (key, value) pair). This can be useful for algorithms that use a
  dictionary as a bag of "to do" items and repeatedly need to pick one
  item. Such algorithms normally end up running in quadratic time;
  using popitem() they can usually be made to run in linear time.

Standard library

- In the time module, the time argument to the functions strftime,
  localtime, gmtime, asctime and ctime is now optional, defaulting to
  the current time (in the local timezone).

- The ftplib module now defaults to passive mode, which is deemed a
  more useful default given that clients are often inside firewalls
  these days. Note that this could break if ftplib is used to connect
  to a *server* that is inside a firewall, from outside; this is
  expected to be a very rare situation. To fix that, you can call
  ftp.set_pasv(0).

- The module site now treats .pth files not only for path configuration,
  but also supports extensions to the initialization code: Lines starting
  with import are executed.

- There's a new module, warnings, which implements a mechanism for
  issuing and filtering warnings.
  There are some new built-in
  exceptions that serve as warning categories, and a new command line
  option, -W, to control warnings (e.g. -Wi ignores all warnings, -We
  turns warnings into errors). warnings.warn(message[, category])
  issues a warning message; this can also be called from C as
  PyErr_Warn(category, message).

- A new module xreadlines was added. This exports a single factory
  function, xreadlines(). The intention is that this code is the
  absolutely fastest way to iterate over all lines in an open
  file(-like) object:

    import xreadlines
    for line in xreadlines.xreadlines(file):
        ...do something to line...

  This is equivalent to the previous speed record holder using
  file.readlines(sizehint). Note that if file is a real file object
  (as opposed to a file-like object), this is equivalent:

    for line in file.xreadlines():
        ...do something to line...

- The bisect module has new functions bisect_left, insort_left,
  bisect_right and insort_right. The old names bisect and insort
  are now aliases for bisect_right and insort_right. XXX_right
  and XXX_left methods differ in what happens when the new element
  compares equal to one or more elements already in the list: the
  XXX_left methods insert to the left, the XXX_right methods to the
  right. Code that doesn't care where equal elements end up should
  continue to use the old, short names ("bisect" and "insort").

- The new curses.panel module wraps the panel library that forms part
  of SYSV curses and ncurses. Contributed by Thomas Gellekum.

- The SocketServer module now sets the allow_reuse_address flag by
  default in the TCPServer class.

- A new function, sys._getframe(), returns the stack frame pointer of
  the caller. This is intended only as a building block for
  higher-level mechanisms such as string interpolation.

Build issues

- For Unix (and Unix-compatible) builds, configuration and building of
  extension modules is now greatly automated.
  Rather than having to
  edit the Modules/Setup file to indicate which modules should be
  built and where their include files and libraries are, a
  distutils-based setup.py script now takes care of building most
  extension modules. All extension modules built this way are built
  as shared libraries. Only a few modules that must be linked
  statically are still listed in the Setup file; you won't need to
  edit their configuration.

- Python should now build out of the box on Cygwin. If it doesn't,
  mail to Jason Tishler (jlt63 at users.sourceforge.net).

- Python now always uses its own (renamed) implementation of getopt()
  -- there's too much variation among C library getopt()
  implementations.

- C++ compilers are better supported; the CXX macro is always set to a
  C++ compiler if one is found.

Windows changes

- select module: By default under Windows, a select() call
  can specify no more than 64 sockets. Python now boosts
  this Microsoft default to 512. If you need even more than
  that, see the MS docs (you'll need to #define FD_SETSIZE
  and recompile Python from source).

- Support for Windows 3.1, DOS and OS/2 is gone. The Lib/dos-8x3
  subdirectory is no more!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From ping@lfw.org Tue Jan 23 04:11:09 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Mon, 22 Jan 2001 20:11:09 -0800 (PST)
Subject: [Python-Dev] pydoc - put it in the core
In-Reply-To: <3A6CBBEF.4732BFF2@ActiveState.com>
Message-ID:

Guido van Rossum wrote:
> Yes, wow!

Paul Prescod wrote:
> I apologize but I'm not clear on my responsibilities here, if any. I
> wrote a PEP for online help. I submitted a partial implementation.

Hi, guys. Sorry i haven't been sending updates on what i'm doing.
Here's the current picture as i see it.

> Ping wrote a full implementation that basically supersedes mine.

My implementation is "full" in that it deploys and seems to work on
arbitrary modules as it stands, but it doesn't really supersede Paul's
because it leaves out the big piece of Paul's work that did conversion
from packaged HTML docs to plain text.

It also has the deficiency that it imports modules live; for untrusted
modules, this is a security risk. I know Paul has been working on
stuff to compile a module into a kind of skeleton object that has all
the same name bindings but no live contents, and if that works reliably,
we should definitely try plugging that in.

> There are various ideas for improving it, but I think that we agree
> that the core is solid.

Yes. I believe that as it stands, pydoc is useful enough to be a
net positive addition to the core. inspect.py alone has been stable
and alpha-ready for some time, i believe.

Here is a summary of its status and work that remains.

pydoc has:

    inspecting live objects
    generating text docs from live objects
    generating HTML docs from live objects
    serving HTML docs from a little web server
    showing docs from the command line
    showing docs from within the interactive interpreter
    apropos-style module listing

It's missing the following, and Paul had stuff for this:

    inspecting unsafe modules
    generating text docs from packaged HTML (e.g. language reference)

It also needs these:

    generating docs from a file given on the command line (easy)
    more Windows and Mac testing and decisions
    various small bugfixes

This past week i've been messing around with Windows and Mac stuff,
trying to see whether it's possible to reliably spawn a webserver
and launch a web browser at the same time (this would seem to be a
good default action to do on GUI platforms).

In trying to do the latter i've found the webbrowser module pretty
unreliable, by the way.
For example, it relies on a constant delay
of 4 seconds to launch a new browser that can't be expected on all
platforms, and fails to launch Netscape 3 because it supplies an
illegal command-line option. When i've found good cross-platform
ways to make this work i'll suggest some patches.

I've so far considered this project blocked only on cross-platform
testing -- do you agree? While i know that inspecting unsafe modules
and processing packaged HTML are important features, i don't consider
them essential.

-- ?!ng

From ping@lfw.org Tue Jan 23 04:14:50 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Mon, 22 Jan 2001 20:14:50 -0800 (PST)
Subject: [Python-Dev] webbrowser.py
In-Reply-To:
Message-ID:

On Mon, 22 Jan 2001, Ka-Ping Yee wrote:
> In trying to do the latter i've found the webbrowser module pretty
> unreliable, by the way. For example, it relies on a constant delay
> of 4 seconds to launch a new browser that can't be expected on all
> platforms, and fails to launch Netscape 3 because it supplies an
> illegal command-line option. When i've found good cross-platform
> ways to make this work i'll suggest some patches.

Oh, and i forgot to mention... i was pretty disappointed that:

    setenv BROWSER my_browser_program
    python -c 'import webbrowser; webbrowser.open("http://python.org/")'

doesn't execute "my_browser_program http://python.org/" as i would
have hoped. Even for a known browser type:

    setenv BROWSER lynx
    python -c 'import webbrowser; webbrowser.open("http://python.org/")'

does not work as expected, either. (Red Hat Linux here.)

-- ?!ng

From ping@lfw.org Tue Jan 23 04:22:56 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Mon, 22 Jan 2001 20:22:56 -0800 (PST)
Subject: [Python-Dev] Is X a (sequence|mapping)?
Message-ID:

We can implement abstract interfaces (sequence, mapping, number) in
Python with the appropriate __special__ methods, but i don't see an
easy way to test if something supports one of these abstract interfaces
in Python.

At the moment, to see if something is a sequence i believe i have to
say something like

    try:
        x[0]
    except:
        # not a sequence
    else:
        # okay, it's a sequence

or

    if hasattr(x, '__getitem__') or type(x) in [type(()), type([])]:
        ...

Is there, or should there be, a better way to do this?

-- ?!ng

From greg@cosc.canterbury.ac.nz Tue Jan 23 04:46:26 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 23 Jan 2001 17:46:26 +1300 (NZDT)
Subject: [Python-Dev] re: I think my set module is ready for prime time
In-Reply-To:
Message-ID: <200101230446.RAA01992@s454.cosc.canterbury.ac.nz>

Greg Wilson :

> an efficient implementation capable of
> handling mutable values (i.e. one that would allow things like sets of
> sets)

I suspect that such a thing is impossible. To avoid a linear
search you have to take advantage of some kind of hashing or
ordering, which you can't do if your objects can change their
values out from under you.

Also, there's nothing to stop someone from mutating two
previously unequal elements so that they're equal. Then
you have a "set" with two identical elements, which isn't
a set any more, it's just a collection.

So, I submit that the very concept of a set only makes
sense for immutable values.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From tim.one@home.com Tue Jan 23 05:03:18 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 23 Jan 2001 00:03:18 -0500
Subject: [Python-Dev] Is X a (sequence|mapping)?
In-Reply-To:
Message-ID:

[?!ng]
> ...
> At the moment, to see if something is a sequence i believe i have to
> say something like
>
>     try:
>         x[0]
>     except:
>         # not a sequence
>     else:
>         # okay, it's a sequence
>
> or
>
>     if hasattr(x, '__getitem__') or type(x) in [type(()), type([])]:
>         ...
>
> Is there, or should there be, a better way to do this?

Dunno. What's a sequence? If you want to know whether x[0] will
blow up, trying x[0] is the most obvious way. BTW, I expect trying x[:0]
is a better idea: doesn't succeed for dicts, and doesn't blow up for an
irrelevant reason if x is an empty sequence.

BTW2, your second method suggests an uncomfortable truth: many contexts
that want "a sequence" don't want strings to pass the test, despite that
strings are as much sequences as lists in Python, no matter how
"a sequence" is defined.

afraid that-what-you-want-to-do-with-it-is-more-important-than-what-
python-calls-it-ly y'rs - tim

From ping@lfw.org Tue Jan 23 05:27:30 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Mon, 22 Jan 2001 21:27:30 -0800 (PST)
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010122124159.A14999@thyrsus.com>
Message-ID:

On Mon, 22 Jan 2001, Eric S. Raymond wrote:
> \section{\module{set} ---
> Basic set algebra for Python}

I'd like to look at the module. Did you actually show us the code for
this, or am i a blind doofus? (Please, no answers to the unasked
question of whether i am a doofus.)

-- ?!ng

From tim.one@home.com Tue Jan 23 06:05:26 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 23 Jan 2001 01:05:26 -0500
Subject: [Python-Dev] Worse news
In-Reply-To: <20010122064400.A26543@glacier.fnational.com>
Message-ID:

In finding and repairing the test_extcall bug, Neil and Thomas have once
again contributed beyond the call of duty. Thank you!

It took some doing to convince Guido to release his Dutch Death Grip on
the PythonLabs coffers, but in the end he was overcome by the moral
necessity of rewarding you sterling fellows for your golden deeds: you're
both entitled to free(*)-- yes, FREE(*)! --copies of all Python 2.1 alpha,
*and* beta, releases(*)!

you-wouldn't-believe-how-much-he-charges-us-ly y'rs - tim

(*) Does not apply to Jython releases.
All applicable taxes are the responsibility of the recipient. No warranty is expressed or implied. This offer has not been reviewed or approved by CWI, CNRI, BeOpen.com, or Digital Creations 2. Export restrictions may apply. By acceptance of this offer, recipient grants perpetual license to use their name, image and likeness in Python promotional materials without compensation. Packaging, handling, shipping and insurance costs to be borne by recipient, but in no case to exceed 1 (one) US$/byte. This offer may be withdrawn at any time, including but not limited to retroactively, at the sole discretion of Guido van Rossum, or such of his heirs and successors as he may designate from time to time. From martin@mira.cs.tu-berlin.de Tue Jan 23 08:14:32 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 23 Jan 2001 09:14:32 +0100 Subject: [Python-Dev] Is X a (sequence|mapping)? Message-ID: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> > i don't see an easy way to test if something supports one of these > abstract interfaces in Python. Why do you want to test for that? If you have an algorithm that only operates on integer-indexed things, what can you do if the test fails? So it is always better to just use the object in the algorithm, and let it break with an exception if somebody passes a bad object. Regards, Martin From mal@lemburg.com Tue Jan 23 09:08:24 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 10:08:24 +0100 Subject: [Python-Dev] webbrowser.py References: Message-ID: <3A6D4A08.B3806984@lemburg.com> Ka-Ping Yee wrote: > > On Mon, 22 Jan 2001, Ka-Ping Yee wrote: > > In trying to do the latter i've found the webbrowser module pretty > > unreliable, by the way. For example, it relies on a constant delay > > of 4 seconds to launch a new browser that can't be expected on all > > platforms, and fails to launch Netscape 3 because it supplies an > > illegal command-line option. 
When i've found good cross-platform > > ways to make this work i'll suggest some patches. > > Oh, and i forgot to mention... i was pretty disappointed that: > > setenv BROWSER my_browser_program > python -c 'import webbrowser; webbrowser.open("http://python.org/")' > > doesn't execute "my_browser_program http://python.org/" as i would > have hoped. Even for a known browser type: > > setenv BROWSER lynx > python -c 'import webbrowser; webbrowser.open("http://python.org/")' > > does not work as expected, either. (Red Hat Linux here.) Hmm, lynx should work (the module has explicit support for it) and yes, I agree, webbrowser should trust BROWSER and use a generic calling mechanism (program ) for opening the URL. Too late for 2.1a1, but maybe for a2 ?! BTW, I think that the second line here is causing the problem: class CommandLineBrowser: _browsers = [] # <- this overrides the global of the same name if os.environ.get("DISPLAY"): _browsers.extend([ ("netscape", "netscape %s >/dev/null &"), ("mosaic", "mosaic %s >/dev/null &"), ]) _browsers.extend([ ("lynx", "lynx %s"), ("w3m", "w3m %s"), ]) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Jan 23 09:15:11 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 10:15:11 +0100 Subject: [Python-Dev] Is X a (sequence|mapping)? References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> Message-ID: <3A6D4B9F.38B17046@lemburg.com> "Martin v. Loewis" wrote: > > > i don't see an easy way to test if something supports one of these > > abstract interfaces in Python. > > Why do you want to test for that? If you have an algorithm that only > operates on integer-indexed things, what can you do if the test fails? 
> > So it is always better to just use the object in the algorithm, and > let it break with an exception if somebody passes a bad object. Right. Polymorphic code will usually get you more out of an algorithm than type-safe or interface-safe code. BTW, there are Python interfaces to PySequence_Check() and PyMapping_Check() buried in the builtin operator module in case you really do care ;) ... operator.isSequenceType() operator.isMappingType() + some other C style _Check() APIs These only look at the type slots though, so Python instances will appear to support everything but when used fail with an exception if they don't provide the proper __xxx__ hooks. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr@thyrsus.com Tue Jan 23 09:17:30 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 04:17:30 -0500 Subject: [Python-Dev] webbrowser.py Message-ID: <20010123041730.A25165@thyrsus.com> Ping's complaints are justified -- I've been looking at and testing webbrowser.py and it's a mess. Among other things: 1. The BROWSER variable is not interpreted properly. 2. The code is stupid about loading platform support it doesn't need. 3. It's not possible to specify lynx as a browser under Unix, because the computation of available browsers is split in two and partly done inside the CommandLineBrowser class. 4. The module code is excessively hard to read, obscuring these bugs. Our mistake was hurriedly merging the launcher code from IDLE with the browser-finder hack I wrote (the guts of CommandLineBrowser). The resulting code is a bad, overcomplicated architecture with a nasty seam in it. As co-designer/implementor I should have caught this sooner, but I was in a hurry to get a CML2 prototype out the door and didn't test anything but the case I needed. My apologies to all.
I'm rewriting to fix these problems now. Documented semantics of entry points will be preserved. -- Eric S. Raymond The politician attempts to remedy the evil by increasing the very thing that caused the evil in the first place: legal plunder. -- Frederick Bastiat From mal@lemburg.com Tue Jan 23 10:26:16 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 11:26:16 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> Message-ID: <3A6D5C48.A076DA0@lemburg.com> "Eric S. Raymond" wrote: > > Guido van Rossum : > > There's already a PEP on a set object type, and everybody and their > > aunt has already implemented a set datatype. > > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. > > The only other set module I can find in the Vaults or anywhere else is > kjBuckets (which I knew about before). Looks like a good design, but > complicated -- and requires installation of an extension. There's also a kjSet.py available at Aaron's site: http://www.chordate.com/kwParsing/index.html which is a pure Python version of the C extension's kjSet type. > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. But here are the reasons I suggest you should do so: > > 1.
It supports a set of operations that are both often useful and > fiddly to get right, thus enhancing the "batteries are included" > effect. (I used its ancestor for representing seen-message numbers in > a specialized mailreader, for example.) > > 2. It's simple for application programmers to use. No extension module > to integrate. > > 3. It's unsurprising. My set objects behave almost exactly like other > mutable sequences, with all the same built-in methods working, except for > the fact that you can't introduce duplicates with the mutators. > > 4. It's already completely documented in a form suitable for the library. > > 5. It's simple enough not to cause you maintenance hassles down the > road, and even if it did the maintainer is unlikely to disappear :-). All very well, but are sets really that essential to every day Python programming ? If we include sets then we ought to also include graphs, tries, btrees and all those other goodies we have in computer science. All of these types are available out there, but I believe the audience who really cares for these types is also capable of downloading the extensions and installing them. It would be nice if all of these extensions could go into a SUMO edition of Python though... together with your set module. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr@thyrsus.com Tue Jan 23 11:08:06 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 06:08:06 -0500 Subject: [Python-Dev] What does "batteries are included" mean?
In-Reply-To: <3A6D5C48.A076DA0@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 11:26:16AM +0100 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> Message-ID: <20010123060806.A25436@thyrsus.com> M.-A. Lemburg : > All very well, but are sets really that essential to every > day Python programming ? If we include sets then we ought to > also include graphs, tries, btrees and all those other goodies > we have in computer science. I use sets a lot. And there was enough demand to generate a PEP. But the wider question here is how seriously we take "batteries are included" as a design principle. Does a facility have to be useful *every day* to be worth being in the standard library? And if so, what are things like the POP3 and IMAP libraries (or, for that matter, my own shlex and netrc modules) doing there? I don't think so. I think there are at least four different possible reasons for something to be in the standard library: 1. It's useful every day. 2. It's useful less frequently than every day, but is a stable cross-platform implementation of a wheel that would otherwise have to be reinvented frequently. That is, you can solve it *once* and have a zero-maintenance increment to the power of the language. 3. It's a technique that's not often used, and not necessarily stable in the face of platform variations, but nothing else will do when you need it and it's notably difficult to get right. (popen2 and BaseHTTPServer would be good examples of this.) 4. It's a developer checklist feature that improves Python's competitive position against Perl, Tcl, and other contenders for the same ecological niche. IMO a lightweight set facility, like POP3 and IMAP, qualifies under 2 and 4 even if not under 1 and 3. This question keeps coming up in different guises.
I'm often the one to raise it, because I favor an aggressive interpretation of "batteries are included" that would pull in a lot of stuff. Yes, this makes more work for us -- but I think it's work we should be doing. While minimalism is an excellent design heuristic for the core language, I think it's a bad one for the libraries. Python is a high-level language and programmers using it both expect and deserve high-level libraries -- yes, including graphs/tries/btrees and all that computer science stuff. Just as much to the point, Python is competing against languages like Perl that frequently get design wins against it because of the richness of the environment *they* are willing to carry around. Guido and Tim and others are more conservative than I, which would be OK -- but it seems to me that the conservatives do not have consistent or well-thought-out criteria for what to include, which is *not* OK. We need to solve this problem. Some time back I initiated a library guidelines PEP, then dropped it due to press of overwork. But the general question is going to keep coming up and we ought to have policy guidelines that potential library developers can understand. Should I pick this up again? -- Eric S. Raymond I do not find in orthodox Christianity one redeeming feature. -- Thomas Jefferson From mal@lemburg.com Tue Jan 23 11:50:39 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 12:50:39 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> Message-ID: <3A6D700F.7A9E2509@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > All very well, but are sets really that essential to every > > day Python programming ?
If we include sets then we ought to > > also include graphs, tries, btrees and all those other goodies > > we have in computer science. > > I use sets a lot. And there was enough demand to generate a PEP. Sure, but sets are fairly easy to implement using Python dictionaries -- at least at the level normally needed by Python programs. Sets, queues and graphs are examples of data types which can have many different faces; it is hard to design APIs for these which meet everyone's needs. > But the wider question here is how seriously we take "batteries are > included" as a design principle. Does a facility have to be useful > *every day* to be worth being in the standard library? And if so, > what are things like the POP3 and IMAP libraries (or, for that matter, > my own shlex and netrc modules) doing there? You can argue the same way for all kinds of extensions and packages you find in the Vaults. That's why there's demand for a different packaging of Python and this is what Moshe's PEP 206 addresses: http://python.sourceforge.net/peps/pep-0206.html > I don't think so. I think there are at least four different > possible reasons for something to be in the standard library: > > 1. It's useful every day. > > 2. It's useful less frequently than every day, but is a stable > cross-platform implementation of a wheel that would otherwise have to > be reinvented frequently. That is, you can solve it *once* and have a > zero-maintenance increment to the power of the language. > > 3. It's a technique that's not often used, and not necessarily stable > in the face of platform variations, but nothing else will do > when you need it and it's notably difficult to get right. (popen2 and > BaseHTTPServer would be good examples of this.) > > 4. It's a developer checklist feature that improves Python's competitive > position against Perl, Tcl, and other contenders for the same ecological > niche.
> > IMO a lightweight set facility, like POP3 and IMAP, qualifies under 2 and 4 > even if not under 1 and 3. > > This question keeps coming up in different guises. I'm often the one to > raise it, because I favor an aggressive interpretation of "batteries > are included" that would pull in a lot of stuff. Yes, this makes more > work for us -- but I think it's work we should be doing. > > While minimalism is an excellent design heuristic for the core language, > I think it's a bad one for the libraries. Python is a high-level language > and programmers using it both expect and deserve high-level libraries -- > yes, including graphs/tries/btrees and all that computer science stuff. > > Just as much to the point, Python competing against languages like > Perl that frequently get design wins against it because of the > richness of the environment *they* are willing to carry around. > > Guido and Tim and others are more conservative than I, which would be > OK -- but it seems to me that the conservatives do not have consistent > or well-thought-out criteria for what to include, which is *not* OK. > We need to solve this problem. > > Some time back I initiated a library guidelines PEP, then dropped it > due to press of overwork. But the general question is going to keep > coming up and we ought to have policy guidelines that potential > library developers can understand. > > Should I pick this up again? Hmm, we already have the PEP 206 which focusses on the topic. Perhaps you could work with Moshe to sort out the "which batteries do we need" sub-topic ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr@thyrsus.com Tue Jan 23 12:20:46 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 07:20:46 -0500 Subject: [Python-Dev] What does "batteries are included" mean? 
In-Reply-To: <3A6D700F.7A9E2509@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 12:50:39PM +0100 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> Message-ID: <20010123072046.A25593@thyrsus.com> M.-A. Lemburg : > > But the wider question here is how seriously we take "batteries are > > included" as a design principle. Does a facility have to be useful > > *every day* to be worth being in the standard library? And if so, > > what are things like the POP3 and IMAP libraries (or, for that matter, > > my own shlex and netrc modules) doing there? > > You can argue the same way for all kinds of extensions and > packages you find in the Vaults. That's why there's demand for > a different packaging of Python and this is what Moshe's > PEP 206 addresses: > > http://python.sourceforge.net/peps/pep-0206.html Muttering "PEP 206" evades the fundamental problem rather than solving it. Not that I'm saying Moshe hasn't made a valiant effort, within the political constraint that the BDFL and others seem unwilling to confront the deeper issue. But PEP 206 is not enough. Here is why: 1. If the "Sumo" packaging ever happens, the vanilla non-Sumo version that Guido issues will quickly become of mostly theoretical interest -- because Red Hat and everybody else will move to Sumo instantly, figuring they have nothing to lose by including more features. 2. If by some chance I'm wrong about 1, the outcome will be worse; we'll in effect have fragmented the language, because there won't be consistency in what library stuff is available between Sumo and non-Sumo builds on the same platform. 3. There are documentation issues as well. It's already a blot on Python that the standard documentation set doesn't cover Tkinter.
In the Sumo distribution, the gap between what's installed and what's documented is likely to widen further. Developers will see this as pointlessly irritating -- and they'll be right. The stock distribution should *be* the Sumo distribution. If we're really so terrified of the extra maintenance load, then the right fix is to mark some modules and documentation as "externally maintained" with prominent pointers back to the responsible people. -- Eric S. Raymond The day will come when the mystical generation of Jesus by the Supreme Being as his father, in the womb of a virgin, will be classed with the fable of the generation of Minerva in the brain of Jupiter. -- Thomas Jefferson, 1823 From mal@lemburg.com Tue Jan 23 12:48:09 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 13:48:09 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> Message-ID: <3A6D7D89.A6BE1B74@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > > But the wider question here is how seriously we take "batteries are > > > included" as a design principle. Does a facility have to be useful > > > *every day* to be worth being in the standard library? And if so, > > > what are things like the POP3 and IMAP libraries (or, for that matter, > > > my own shlex and netrc modules) doing there? > > > > You can argue the same way for all kinds of extensions and > > packages you find in the Vaults. That's why there's demand for > > a different packaging of Python and this is what Moshe's > > PEP 206 addresses: > > > > http://python.sourceforge.net/peps/pep-0206.html > > Muttering "PEP 206" evades the fundamental problem rather than solving it.
> > Not that I'm saying Moshe hasn't made a valiant effort, within the political > constraint that the BDFL and others seem unwilling to confront the deeper > issue. But PEP 206 is not enough. Here is why: > > 1. If the "Sumo" packaging ever happens, the vanilla non-Sumo version that > Guido issues will quickly become of mostly theoretical interest -- because > Red Hat and everybody else will move to Sumo instantly, figuring they have > nothing to lose by including more features. > > 2. If by some chance I'm wrong about 1, the outcome will be worse; > we'll in effect have fragmented the language, because there won't be > consistency in what library stuff is available between Sumo and > non-Sumo builds on the same platform. > > 3. There are documentation issues as well. It's already a blot on > Python that the standard documentation set doesn't cover Tkinter. In > the Sumo distribution, the gap between what's installed and what's > documented is likely to widen further. Developers will see this as > pointlessly irritating -- and they'll be right. > > The stock distribution should *be* the Sumo distribution. If we're really > so terrified of the extra maintenance load, then the right fix is to > mark some modules and documentation as "externally maintained" with > prominent pointers back to the responsible people. That's your POV; others think differently, and since this is not a democracy, the Sumo distribution is a feasible way of satisfying both needs.
There are a few other issues to consider as well: * licensing is a problem (and this is also mentioned in the PEP 206) since some of the nicer additions are GPLed and thus not in the spirit of Python's closed-source friendliness which has provided it with a large user base in the commercial field * package authors are not all the same and some may not want to split their distribution due to the integration of their package in a Sumo-distribution * the packages mentioned in PEP 206 are very complex and usually largish; maintaining them will cause much more effort compared to the standard lib modules and extensions * the build process varies widely between packages; even though we have distutils, some of the packages extend it to fit their specific needs (which is OK, but causes extra effort in getting the build process combined) I'm not objecting to the Sumo-distribution project; to the contrary -- I tried a similar project a few years ago: the Python PowerTools distribution which you can download from: http://www.lemburg.com/python/PowerTools-0.2.zip The project died quickly though, as I wasn't able to keep up with the maintenance effort. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin@mems-exchange.org Tue Jan 23 13:40:06 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 23 Jan 2001 08:40:06 -0500 Subject: [Python-Dev] What does "batteries are included" mean?
In-Reply-To: <3A6D7D89.A6BE1B74@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 01:48:09PM +0100 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> <3A6D7D89.A6BE1B74@lemburg.com> Message-ID: <20010123084006.A23485@newcnri.cnri.reston.va.us> On Tue, Jan 23, 2001 at 01:48:09PM +0100, M.-A. Lemburg wrote: >There are a few other issues to consider as well: > To add a few: * The larger the amount of code in the distribution, the more effort it is to maintain it all. * Minor fixes aren't available until the next Python release. For example, to drag out the XML code again: there have been two PyXML releases since Python 2.0 fixing various bugs, but someone who sticks to installing just Python will not be able to get at those bugfixes until April (when 2.1 is supposed to get finalized). If there were a core Python distribution and a sumo distribution, and the sumo distribution was the one that most people downloaded and used, that would be perfectly OK. Practically no one assembles their own Linux distribution, and that's not considered a problem. To some degree, if you're using a well-packaged Linux distribution such as Debian, you also have a Python distribution mechanism with intermodule dependencies; we just have to reinvent the wheel for people on other platforms. >The project died quickly though, as I wasn't able to keep >up with the maintenance effort. Interesting. Did you get much feedback indicating that people used it much? Perhaps when you were doing that effort the Python community was composed more of self-reliant early adopter types; there are probably more newbies around now. --amk From mal@lemburg.com Tue Jan 23 14:05:13 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Tue, 23 Jan 2001 15:05:13 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> <3A6D7D89.A6BE1B74@lemburg.com> <20010123084006.A23485@newcnri.cnri.reston.va.us> Message-ID: <3A6D8F99.53A0F411@lemburg.com> Andrew Kuchling wrote: > > On Tue, Jan 23, 2001 at 01:48:09PM +0100, M.-A. Lemburg wrote: > >There are a few other issues to consider as well: > > > > To add a few: > > * The larger the amount of code in the distribution, the more effort it is > maintain it all. > > * Minor fixes aren't available until the next Python release. For example, > to drag out the XML code again: there have been two PyXML releases since > Python 2.0 fixing various bugs, but someone who sticks to installing just > Python will not be able to get at those bugfixes until April (when 2.1 > is supposed to get finalized). > > If there were a core Python distribution and a sumo distribution, and the > sumo distribution was the one that most people downloaded and used, that > would be perfectly OK. Practically no one assembles their own Linux > distribution, and that's not considered a problem. To some degree, if > you're using a well-packaged Linux distribution such as Debian, you also > have Python distribution mechanism with intermodule dependencies; we just > have to reinvent the wheel for people on other platforms. > > >The project died quickly though, as I wasn't able to keep > >up with the maintenance effort. > > Interesting. Did you get much feedback indicating that people used it much? Not much -- the interested parties were mostly Python experts (the lib started out as a project called expert-lib). 
> Perhaps when you were doing that effort the Python community was composed > more of self-reliant early adopter types; there are probably more newbies > around now. True. The included packages are dated 1997-1998 -- at that time Starship was just starting to get off the ground (things are moving at a much faster pace now). The PowerTools package still uses the Makefile.pre.in mechanism (with much success though) as distutils wasn't even considered at the time. Perhaps Moshe could pick this up to have a head start for Sumo-Python ?! Some of the included packages are not available elsewhere, AFAIK, so it may well be worthwhile having a look (e.g. the LGPLed trie and btree implementations donated by John W. M. Stevens). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@digicool.com Tue Jan 23 14:06:47 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 09:06:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: Your message of "Tue, 23 Jan 2001 04:17:30 EST." <20010123041730.A25165@thyrsus.com> References: <20010123041730.A25165@thyrsus.com> Message-ID: <200101231406.JAA04765@cj20424-a.reston1.va.home.com> > Ping's complaints are justified -- I've been looking at and testing > webbrowser.py and it's a mess. Among other things: > > 1. The BROWSER variable is not interpreted properly. > > 2. The code is stupid about loading platform support it doesn't need. > > 3. It's not possible to specify lynx as a browser under Unix, because the > computation of available browsers is split in two and partly done inside > the CommandLineBrowser class. > > 4. The module code is excessively hard to read, obscuring these bugs. > > Our mistake was hurriedly merging the launcher code from IDLE with the > browser-finder hack I wrote (the guts of CommandLineBrowser).
The resulting > code is a bad, overcomplicated architecture with a nasty seam in it. > > As co-designer/implementor I should have caught this sooner, but I was > in a hurry to get a CML2 prototype out the door and didn't test > anything but the case I needed. My apologies to all. > > I'm rewriting to fix these problems now. Documented semantics of entry > points will be preserved. Excellent, Eric! That's the spirit. Can you point me to docs explaining the meaning of the BROWSER environment variable? I've never heard of it... The last new environment variables I learned were PAGER and EDITOR, probably 15 years ago when 4.1BSD was released... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Tue Jan 23 14:22:26 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 09:22:26 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101231406.JAA04765@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 09:06:47AM -0500 References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> Message-ID: <20010123092226.A25968@thyrsus.com> Guido van Rossum : > Can you point me to docs explaining the meaning of the BROWSER > environment variable? I've never heard of it... The last new > environment variables I learned were PAGER and EDITOR, probably 15 > years ago when 4.1BSD was released... :-) You've never heard of BROWSER because I invented it and have not widely popularized it yet :-). Ping knew about it either because he read the module code and saw that it was supposed to work, or because he remembered the design discussion when webbrowser.py was first implemented. I've had conversations with some key Perl and Tcl people (Larry Wall, Tom Christiansen, Clif Flynt) about the BROWSER convention, and they agree it's a good idea. I'll probably hack support for it into Perl's browser launcher next. 
It's documented in the version of libwebbrowser.tex now in the CVS tree.
-- 
Eric S. Raymond

Power concedes nothing without a demand. It never did, and it never will.
Find out just what people will submit to, and you have found out the exact
amount of injustice and wrong which will be imposed upon them; and these will
continue until they are resisted with either words or blows, or with both.
The limits of tyrants are prescribed by the endurance of those whom they
oppress.
	-- Frederick Douglass, August 4, 1857

From nas@arctrix.com Tue Jan 23 08:30:56 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Tue, 23 Jan 2001 00:30:56 -0800
Subject: [Python-Dev] Does autoconfig detect INSTALL incorrectly?
Message-ID: <20010123003056.A28309@glacier.fnational.com>

Why is the configure.in file set to always use "install-sh"?
There is a comment that says:

    # Install just never works :-(

I don't think that statement is accurate. /usr/bin/install works
quite well on my machine. The only comments I can find in the
changelog are:

    revision 1.16
    date: 1995/01/20 14:12:16; author: guido; state: Exp; lines: +27 -2
    add INSTALL_PROGRAM and INSTALL_DATA; check for getopt

and:

    revision 1.5
    date: 1994/08/19 15:33:51; author: guido; state: Exp; lines: +14 -6
    Simplify value of INSTALL (always 'cp').

Is there any reason why the autoconf macro AC_PROG_INSTALL is not
used? The documentation seems to indicate that it does what we
want.

  Neil

From guido@digicool.com Tue Jan 23 15:31:39 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 23 Jan 2001 10:31:39 -0500
Subject: [Python-Dev] Is X a (sequence|mapping)?
In-Reply-To: Your message of "Tue, 23 Jan 2001 10:15:11 +0100." <3A6D4B9F.38B17046@lemburg.com>
References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> <3A6D4B9F.38B17046@lemburg.com>
Message-ID: <200101231531.KAA05122@cj20424-a.reston1.va.home.com>

> Polymorphic code will usually get you more out of an
> algorithm, than type-safe or interface-safe code.

Right.
But there are times when people want to write methods that
take e.g. either a sequence or a mapping, and need to distinguish
between the two. That's not easy in Python! Java and C++ support it
very well though, and thus we'll always keep seeing this kind of
complaint.

Not sure what to do, except to recommend "find out which methods you
expect in one case but not in the other (e.g. keys()) and do a
hasattr() test for that."

> BTW, there are Python interfaces to PySequence_Check() and
> PyMapping_Check() buried in the builtin operator module in case
> you really do care ;) ...
>
> operator.isSequenceType()
> operator.isMappingType()
> + some other C style _Check() APIs
>
> These only look at the type slots though, so Python instances
> will appear to support everything but when used fail with
> an exception if they don't provide the proper __xxx__ hooks.

Yes, these should probably be deprecated. I certainly have never used
them! (The operator module doesn't seem to get much use in
general... Was it a bad idea?)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Tue Jan 23 15:49:23 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 23 Jan 2001 10:49:23 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: Your message of "Mon, 22 Jan 2001 15:13:09 EST." <20010122151309.C15236@thyrsus.com>
References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com>
Message-ID: <200101231549.KAA05172@cj20424-a.reston1.va.home.com>

> I've just read the PEP. Greg's proposal has a couple of problems.
> The biggest one is that the interface design isn't very Pythonic --
> it's formally adequate, but doesn't exploit the extent to which sets
> naturally have common semantics with existing Python sequence types.
> This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. Actually, I thought that Greg's proposal has some charm: it seems to be using a natural extension of the existing dictionary syntax, where a set is a dictionary without the values. I haven't thought about this deeply enough, but I see a lot of potential here. I understand that you have probably given this more thought than I have recently, so I'd like to see your more detailed analysis of what you do and don't like about Greg's proposal! > The only other set module I can find in the Vaults or anywhere else is > kjBuckets (which I knew about before). Looks like a good design, but > complicated -- and requires installation of an extension. > > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. But here are the reasons I suggest you should do so: > > 1. It supports a set of operations that are both often useful and > fiddly to get right, thus enhancing the "batteries are included" > effect. (I used its ancestor for representing seen-message numbers in > a specialized mailreader, for example.) I haven't read your docs yet (and no time because Digital Creations is requiring my attention all of today), but I expect that designing a universal set type, one that is good enough to be used in all sorts of applications, is very difficult. > 2. It's simple for application programmers to use. No extension module > to integrate. This is a silly argument for wanting something to be added to the core. If it's part of the core, the need for an extension is immaterial because that extension will always be available. So I conclude that your module is set up perfectly for a popular module in the Vaults. :-) > 3. It's unsurprising. 
My set objects behave almost exactly like other
> mutable sequences, with all the same built-in methods working, except for
> the fact that you can't introduce duplicates with the mutators.

Ah, so you see a set as an extension of a sequence. That may be the
big rift between your version and Greg's PEP: are sets more like
sequences or more like dictionaries?

> 4. It's already completely documented in a form suitable for the library.

Much appreciated.

> 5. It's simple enough not to cause you maintenance hassles down the
> road, and even if it did the maintainer is unlikely to disappear :-).

I'll be the judge of that, and since you prefer not to show your
source code (why is that?), I can't tell yet.

[...time flows...]

Having just skimmed your docs, I'm disappointed that you choose lists
as your fundamental representation type -- this makes it slow to test
for membership and hence makes intersection and union slow. I suppose
that you have evidence from using this that those operations aren't
used much, or not for large sets? This is one of the problems with
coming up with a set type for the core: it has to work for (nearly)
everybody. It's no big deal if the Vaults contain three or more set
modules -- perfect even, people can choose the best one for their
purpose. But in the core, there's only room for one set type or
module.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From esr@thyrsus.com Tue Jan 23 16:30:50 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Tue, 23 Jan 2001 11:30:50 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <200101231549.KAA05172@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 10:49:23AM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> Message-ID: <20010123113050.A26162@thyrsus.com> --tKW2IUtsqtDRztdT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Guido van Rossum : > I understand that you have probably given this more thought than I > have recently, so I'd like to see your more detailed analysis of what > you do and don't like about Greg's proposal! I've already covered my big objection, the fact that it doesn't support the degree of polymorphic crossover one might expect with sequence types (and Greg has agreed that I have a point there). Another problem is the lack of support for mutable elements (and yes, I'm quite aware of the problems with this.) One thing I do like is the proposal for an actual set input syntax. Of course this would require that the set type become one of the builtins, with compiler support. > I haven't read your docs yet (and no time because Digital Creations is > requiring my attention all of today), but I expect that designing a > universal set type, one that is good enough to be used in all sorts of > applications, is very difficult. For "difficult" read "can't be done". This is one of those cases where no matter what implementation you choose, some of the operations you want to be cheap will be worst-case quadratic. Life is like that. So I chose a dead-simple representation and accepted quadratic times for union/intersection. > > 2. It's simple for application programmers to use. No extension module > > to integrate. > > This is a silly argument for wanting something to be added to the > core. If it's part of the core, the need for an extension is > immaterial because that extension will always be available. 
So
> I conclude that your module is set up perfectly for a popular module
> in the Vaults. :-)

Reasonable point.

> > 3. It's unsurprising. My set objects behave almost exactly like other
> > mutable sequences, with all the same built-in methods working, except for
> > the fact that you can't introduce duplicates with the mutators.
>
> Ah, so you see a set as an extension of a sequence. That may be the
> big rift between your version and Greg's PEP: are sets more like
> sequences or more like dictionaries?

Indeed it is.

> > 5. It's simple enough not to cause you maintenance hassles down the
> > road, and even if it did the maintainer is unlikely to disappear :-).
>
> I'll be the judge of that, and since you prefer not to show your
> source code (why is that?), I can't tell yet.

No nefarious concealment going on here :-), I've sent versions of
the code to Greg and Ping already. I'll shoot you a copy too.

> Having just skimmed your docs, I'm disappointed that you choose lists
> as your fundamental representation type -- this makes it slow to test
> for membership and hence makes intersection and union slow.

Not quite. Membership test is still linear-time; so is adding and
deleting elements. It's true that union and intersection are
quadratic, but see below.

> I suppose
> that you have evidence from using this that those operations aren't
> used much, or not for large sets?

Exactly! In my experience the usage pattern of a class like this runs
heavily to small sets (usually < 64 elements); membership tests
dominate usage, with addition and deletion of elements running second
and the "classical" boolean operations like union and intersection
being uncommon.

What you get by going with a dictionary representation is that
membership test becomes close to constant-time, while insertion and
deletion become sometimes cheap and sometimes quite expensive
(depending of course on whether you have to allocate a new
hash bucket).
Given the usage pattern I described, the overall
difference in performance is marginal.

> This is one of the problems with
> coming up with a set type for the core: it has to work for (nearly)
> everybody.

As I pointed out above (and someone else on the list had made the same
point earlier), "works for everybody" isn't really possible here. So my
solution does the next best thing -- pick a choice of tradeoffs that
isn't obviously worse than the alternatives and keeps things
bog-simple.
-- 
Eric S. Raymond

Alcohol still kills more people every year than all `illegal' drugs put
together, and Prohibition only made it worse. Oppose the War On Some Drugs!

--tKW2IUtsqtDRztdT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="set.py"

"""
A set-algebra module for Python.

The functions work on any sequence type and return lists.
The set methods can take a set or any sequence type as an argument.
They are insensitive to the types of the elements.

Lists are used rather than dictionaries so the elements can be mutable.

"""
# Design and implementation by ESR, January 2001.

def setify(list1):                      # Used by set constructor
    "Remove duplicates in sequence."
    res = []
    for i in range(len(list1)):
        # Keep list1[i] only if no earlier element compares equal to it.
        duplicate = 0
        for j in range(i):
            if list1[i] == list1[j]:
                duplicate = 1
                break
        if not duplicate:
            res.append(list1[i])
    return res

def union(list1, list2):                # Used for |
    "Compute set union of sequences."
    res = list1[:]
    for x in list2:
        if not x in list1:
            res.append(x)
    return res

def intersection(list1, list2):         # Used for &
    "Compute set intersection of sequences."
    res = []
    for x in list1:
        if x in list2:
            res.append(x)
    return res

def difference(list1, list2):           # Used for -
    "Compute set difference of sequences."
    res = []
    for x in list1:
        if not x in list2:
            res.append(x)
    return res

def symmetric_difference(list1, list2): # Used for ^
    "Compute set symmetric-difference of sequences."
    res = []
    for x in list1:
        if not x in list2:
            res.append(x)
    for x in list2:
        if not x in list1:
            res.append(x)
    return res

def cartesian(list1, list2):            # Used for *
    "Cartesian product of sequences considered as sets."
    res = []
    for x in list1:
        for y in list2:
            res.append((x,y))
    return res

def equality(list1, list2):             # Used for == and !=
    "Test sequences considered as sets for equality."
    if len(list1) != len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    for x in list2:
        if not x in list1:
            return 0
    return 1

def proper_subset(list1, list2):        # Used for < and >
    "Return 1 if first argument is a proper subset of second, 0 otherwise."
    if not len(list1) < len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    return 1

def subset(list1, list2):               # Used for <= and >=
    "Return 1 if first argument is a subset of second, 0 otherwise."
    if not len(list1) <= len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    return 1

def powerset(base):
    "Compute the set of all subsets of a set."
    powerset = []
    # Each n is a bit mask selecting which elements of base to include.
    for n in xrange(2 ** len(base)):
        subset = []
        for e in xrange(len(base)):
            if n & 2 ** e:
                subset.append(base[e])
        powerset.append(subset)
    return powerset

class set:
    "Lists with set-theoretic operations."

    def __init__(self, value):
        self.elements = setify(value)

    def __len__(self):
        return len(self.elements)

    def __getitem__(self, ind):
        return self.elements[ind]

    def __setitem__(self, ind, val):
        if val not in self.elements:
            self.elements[ind] = val

    def __delitem__(self, ind):
        del self.elements[ind]

    def list(self):
        return self.elements

    def append(self, new):
        if new not in self.elements:
            self.elements.append(new)

    def extend(self, new):
        self.elements.extend(new)
        self.elements = setify(self.elements)

    def count(self, x):
        return self.elements.count(x)

    def index(self, x):
        return self.elements.index(x)

    def insert(self, i, x):
        if x not in self.elements:
            self.elements.insert(i, x)

    def pop(self, i=-1):
        # Default to the last element, as list.pop() does.
        return self.elements.pop(i)

    def remove(self, x):
        self.elements.remove(x)

    def reverse(self):
        self.elements.reverse()

    def sort(self, cmp=None):
        self.elements.sort(cmp)

    # The operators below accept either another set or a plain sequence.

    def __or__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(union(self.elements, other))

    __add__ = __or__

    def __and__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(intersection(self.elements, other))

    def __sub__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(difference(self.elements, other))

    def __xor__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(symmetric_difference(self.elements, other))

    def __mul__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(cartesian(self.elements, other))

    def __eq__(self, other):
        # Compare as sets, so element order does not matter.
        if type(other) == type(self):
            other = other.elements
        return equality(self.elements, other)

    def __ne__(self, other):
        if type(other) == type(self):
            other = other.elements
        return not equality(self.elements, other)

    def __lt__(self, other):
        if type(other) == type(self):
            other = other.elements
        return proper_subset(self.elements, other)

    def __le__(self, other):
        if type(other) == type(self):
            other = other.elements
        return subset(self.elements, other)

    def __gt__(self, other):
        if type(other) == type(self):
            other = other.elements
        return proper_subset(other, self.elements)

    def __ge__(self, other):
        if type(other) == type(self):
            other = other.elements
        return subset(other, self.elements)

    def __str__(self):
        # Joining the elements also renders the empty set correctly, as "{}".
        return "{" + ", ".join(map(str, self.elements)) + "}"

    def __repr__(self):
        return repr(self.elements)

if __name__ == '__main__':
    a = set([1, 2, 3, 4])
    b = set([1, 4])
    c = set([5, 6])
    d = [1, 1, 2, 1]
    print `d`, "setifies to", set(d)
    print `a`, "|", `b`, "is", `a | b`
    print `a`, "^", `b`, "is", `a ^ b`
    print `a`, "&", `b`, "is", `a & b`
    print `b`, "*", `c`, "is", `b * c`
    print `a`, '<', `b`, "is", `a < b`
    print `a`, '>', `b`, "is", `a > b`
    print `b`, '<', `c`, "is", `b < c`
    print `b`, '>', `c`, "is", `b > c`
    print "Power set of", `c`, "is", powerset(c)

# end

--tKW2IUtsqtDRztdT--

From sdm7g@virginia.edu Tue Jan 23 17:12:22 2001
From: sdm7g@virginia.edu (Steven D. Majewski)
Date: Tue, 23 Jan 2001 12:12:22 -0500 (EST)
Subject: [Python-Dev] libraries=['m'] in config.py [Re: Python 2.1 alpha 1 released!]
In-Reply-To: <200101230333.WAA28376@cj20424-a.reston1.va.home.com>
Message-ID:

Is there a simple way (other than editing config.py) to remove the effect
of all of the "libraries=['m']" options from config.py ?

This breaks the MacOSX build as there's no libm -- that functionality
is built into the System.framework .

Shouldn't these types of flags be acquired from configure or the make
environment somehow ?

-- Steve Majewski

( BTW: OSX build also needs a "-traditional-cpp" flag to get thru
  compiling classobject.c without error. )

From uche.ogbuji@fourthought.com Tue Jan 23 17:28:18 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 23 Jan 2001 10:28:18 -0700
Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows))
In-Reply-To: Message from Martin von Loewis of "Mon, 22 Jan 2001 15:46:39 +0100."
    <200101221446.PAA05164@pandora.informatik.hu-berlin.de>
Message-ID: <200101231728.KAA03408@localhost.localdomain>

> > This has nothing to do with Python. UTF-8 marks the codes
> > from 128-191 as illegal prefix.
> [...]
> > Perhaps the parser should catch the UnicodeError and
> > instead return a not-wellformed exception ?!
>
> Right on both accounts. If no encoding is specified, and if the
> document appears not to be UTF-16 in any endianness, an XML processor
> shall assume it is UTF-8. As Marc-Andre explains, your document is not
> proper UTF-8, hence the error.
>
> The confusing thing is that expat itself does not care about it not
> being UTF-8; that is only detected when the callback is invoked in
> pyexpat, and therefore conversion to a Unicode object is attempted.

Pyexpat violates the XML spec here. XML parsers are not allowed to "recover"
from well-formedness errors. And I would classify blithely reporting the
character data as "recovery".

However, I'm amazed that this wouldn't have come up before, considering the
pedigree of expat. I'll poke around, and raise a bug on the expat site if
need be.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From tismer@tismer.com Tue Jan 23 17:35:08 2001
From: tismer@tismer.com (Christian Tismer)
Date: Tue, 23 Jan 2001 18:35:08 +0100
Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows))
References: <200101231728.KAA03408@localhost.localdomain>
Message-ID: <3A6DC0CC.C4FF83DF@tismer.com>

uche.ogbuji@fourthought.com wrote:
>
> > > This has nothing to do with Python. UTF-8 marks the codes
> > > from 128-191 as illegal prefix.
> > [...]
> > > Perhaps the parser should catch the UnicodeError and
> > > instead return a not-wellformed exception ?!
> >
> > Right on both accounts.
If no encoding is specified, and if the
> > document appears not to be UTF-16 in any endianness, an XML processor
> > shall assume it is UTF-8. As Marc-Andre explains, your document is not
> > proper UTF-8, hence the error.
> >
> > The confusing thing is that expat itself does not care about it not
> > being UTF-8; that is only detected when the callback is invoked in
> > pyexpat, and therefore conversion to a Unicode object is attempted.
>
> Pyexpat violates the XML spec here. XML parsers are not allowed to "recover"
> from well-formedness errors. And I would classify blithely reporting the
> character data as "recovery".
>
> However, I'm amazed that this wouldn't have come up before, considering the
> pedigree of expat.

Well, I had to write a preprocessor which turns some
"xml-like" but not well-formed stuff into something
usable. This was a bulk of 100 MB of data, partially
hand-written, partially machine-generated, but not really
well-formed. Some special characters appeared very late
in the data set, raising an error in Python 2.0, but not in
1.5.2, so I perceived it as an error in the parser first,
not the data. :-)

ciao - chris

-- 
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

From uche.ogbuji@fourthought.com Tue Jan 23 17:55:12 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 23 Jan 2001 10:55:12 -0700
Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows))
In-Reply-To: Message from Christian Tismer of "Mon, 22 Jan 2001 16:05:24 +0100." <3A6C4C34.4D1252C9@tismer.com>
Message-ID: <200101231755.KAA03471@localhost.localdomain>

> "M.-A. Lemburg" wrote:
> ...
> > > The codes from 192 to 236, 238-243 produce
> > > "UTF-8 decoding error: invalid data",
> > > the rest gives "not well-formed".
> > >
> > > I would like to know if this happens with your (Tim) modified
> > > version as well. I'm using plain vanilla BeOpen Python 2.0 .
> >
> > This has nothing to do with Python. UTF-8 marks the codes
> > from 128-191 as illegal prefix. See Object/unicodeobject.c:
> ...
>
> Schade.
>
> > Perhaps the parser should catch the UnicodeError and
> > instead return a not-wellformed exception ?!
>
> I believe it would be better.

Yes, and given there is not much time before the 2.1 release, doing so is an
acceptable stop-gap.

However, I think the real fix has to lie in expat. I just had a *very* quick
and dirty perusal of expat 1.2 and 1.95.1, and not only do the UTF-8 validity
checks (at the top of xmltok.c) seem wrong, but it doesn't look as if they're
ever invoked.

I'll try to find some time to look into this more closely, or perhaps someone
will straighten me out if I'm on the wrong trail.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From fredrik@effbot.org Tue Jan 23 18:03:42 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Tue, 23 Jan 2001 19:03:42 +0100
Subject: [Python-Dev] getting rid of ucnhash
Message-ID: <013901c08566$d2a8f360$e46940d5@hagrid>

It's probably just me, but the names of the two unicode
modules tend to irritate me:

    > ls u*.pyd
    ucnhash.pyd
    unicodedata.pyd

(the former contains names, the latter data)

I've been meaning to rename the former, but I just realized
that it might be better to get rid of it completely, and move
its functionality into the unicodedata module.
The result is a single 200k unicodedata module, which
contains the name database as well as two new functions:

    name(character [, default]) => map unicode
    character to name. if the name doesn't exist,
    return the default object, or raise ValueError.

    lookup(name) => unicode character (or raise
    KeyError if it doesn't exist)

Should I check it in now, change the names/semantics and
check it in, or post it to sourceforge?

Cheers /F

From uche.ogbuji@fourthought.com Tue Jan 23 18:00:19 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 23 Jan 2001 11:00:19 -0700
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: Message from "Eric S. Raymond" of "Mon, 22 Jan 2001 12:41:59 EST." <20010122124159.A14999@thyrsus.com>
Message-ID: <200101231800.LAA03515@localhost.localdomain>

> \section{\module{set} ---
> Basic set algebra for Python}

Looks good. Are you making this available for download? I could put this to
experimental use right away (experimental since, IIRC, you are using the new
rich comparisons).

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From uche.ogbuji@fourthought.com Tue Jan 23 18:16:27 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 23 Jan 2001 11:16:27 -0700
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: Message from "Eric S. Raymond" of "Mon, 22 Jan 2001 15:13:09 EST." <20010122151309.C15236@thyrsus.com>
Message-ID: <200101231816.LAA03551@localhost.localdomain>

> Guido van Rossum :
> > There's already a PEP on a set object type, and everybody and their
> > aunt has already implemented a set datatype.
Tim mentioned that he had one, and he also claimed that every other dodder had a set class, but the only one listed in the vaults is kjBuckets, which I'm not sure is maintained any more. (Is Aaron Watters hereabouts?) > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. IMO, Eric's Set interface is close to perfect. PEP 218 is interesting, but I'm not sure it's worth slogging through the inevitable uproar over an entirely new syntactic construct (the "{}" notation) before getting something as useful as a set class into the standard library. > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. But here are the reasons I suggest you should do so: For what it's worth, I'm +1 on adding this to the standard library. I've seen so many set hacks with dictionaries (memory ouch) and list hacks (speed ouch) in Python code out there, that I'm convinced it would meet much more common usage than, say zlib, xdr, or even expat. On this hacker list everyone's aunt might whip up set extensions on boring weekends, but I doubt this describes the overall Python populace. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji@fourthought.com Tue Jan 23 18:29:36 2001 From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com) Date: Tue, 23 Jan 2001 11:29:36 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "M.-A. Lemburg" of "Tue, 23 Jan 2001 11:26:16 +0100." <3A6D5C48.A076DA0@lemburg.com> Message-ID: <200101231829.LAA03575@localhost.localdomain> > All very well, but are sets really that essential to every > day Python programming ? Not everyday, but as I said, the standard library has zlib, expat, tkinter, colorsys, and a whole lot of other stuff that is undoubtedly less useful than a set class. > If we include sets then we ought to > also include graphs, tries, btrees I see all of these as far less commonly useful than sets (at least in situations where implementations using existing data structures won't suffice). I run into needs for sets all the time. I don't have as much trouble with your other examples, though I've always considered tries as a possible performance boost in XPath. Oddly enough another data structure I often wish I had is a splay tree, and I hope to wrap my old C++ splay tree implementation for Python one of these days. > and all those other goodies > we have in computer science. All of these types are available > out there, but I believe the audience who really cares for these > types is also capable of downloading the extensions and installing > them. > > It would be nice if all of these extension could go into a SUMO > edition of Python though... together with your set module. Considering "batteries included", it's worth considering these very important "batteries". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From skip@mojam.com (Skip Montanaro) Tue Jan 23 18:35:04 2001
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Tue, 23 Jan 2001 12:35:04 -0600 (CST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4
In-Reply-To:
References:
Message-ID: <14957.52952.48739.53360@beluga.mojam.com>

    Guido> - Use "exec ... in dict" to avoid having to walk on eggshells;
    Guido>   locals now don't have to start with underscore.

Thanks. I have just been incredibly short on time lately.

    Guido> - Only test dbhash if bsddb can be imported. (Wonder if there
    Guido>   are more like this?)

Alpha testing should pick those up, yes? ;-)

    Guido> ! try:
    Guido> !     import bsddb
    Guido> ! except ImportError:
    Guido> !     if verbose:
    Guido> !         print "can't import bsddb, so skipping dbhash"
    Guido> ! else:
    Guido> !     check_all("dbhash")

Instead of having to know that dbhash includes bsddb, shouldn't dbhash be
the module that's imported here?

Skip

From uche.ogbuji@fourthought.com Tue Jan 23 18:36:59 2001
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 23 Jan 2001 11:36:59 -0700
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: Message from "Eric S. Raymond" of "Tue, 23 Jan 2001 11:30:50 EST." <20010123113050.A26162@thyrsus.com>
Message-ID: <200101231836.LAA03655@localhost.localdomain>

> """
> A set-algebra module for Python.
>
> The functions work on any sequence type and return lists.
> The set methods can take a set or any sequence type as an argument.
> They are insensitive to the types of the elements.
>
> Lists are used rather than dictionaries so the elements can be mutable.
>
> """

Hmm. I was hoping this was actually a C extension for the performance boost,
esp. given the number of __foo__ methods in the set class.
Implementation in Python makes my interest in adding it to the standard lib more tepid (not to cast the least bit of aspersion on your work). -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From skip@mojam.com (Skip Montanaro) Tue Jan 23 18:37:44 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 23 Jan 2001 12:37:44 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <3A6CBBEF.4732BFF2@ActiveState.com> References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> <3A6CBBEF.4732BFF2@ActiveState.com> Message-ID: <14957.53112.119272.797494@beluga.mojam.com> Paul> I apologize but I'm not clear on my responsibilities here, if Paul> any. I wrote a PEP for online help. I submitted a partial Paul> implementation. Perhaps I am the one who should apologize. I started the thread. I tried Ping's code and was simply amazed at how useful it was. I didn't bother checking the list of PEPs to see if it overlapped with something there, and I suspect any discussion of this stuff has taken place in the doc sig, where I don't hang out. Skip From esr@thyrsus.com Tue Jan 23 18:39:04 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 13:39:04 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <200101231816.LAA03551@localhost.localdomain>; from uche.ogbuji@fourthought.com on Tue, Jan 23, 2001 at 11:16:27AM -0700 References: <200101231816.LAA03551@localhost.localdomain> Message-ID: <20010123133904.B26487@thyrsus.com> uche.ogbuji@fourthought.com : > I've seen so many set hacks with dictionaries (memory ouch) and list > hacks (speed ouch) in Python code out there, that I'm convinced it > would meet much more common usage than, say zlib, xdr, or even > expat. Uche brings up a point I meant to make in my reply to Guido. The dict- vs.-list choice in set representation is indeed a choice between memory ouch and speed ouch. I believe most uses of sets are small sets. That reduces the speed ouch of using a list representation and increases the proportional memory ouch of a dictionary implementation. -- Eric S. Raymond Question with boldness even the existence of a God; because, if there be one, he must more approve the homage of reason, than that of blindfolded fear.... Do not be frightened from this inquiry from any fear of its consequences. If it ends in the belief that there is no God, you will find incitements to virtue in the comfort and pleasantness you feel in its exercise... -- Thomas Jefferson, in a 1787 letter to his nephew From jeremy@alum.mit.edu Tue Jan 23 18:41:23 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Tue, 23 Jan 2001 13:41:23 -0500 (EST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123113050.A26162@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> Message-ID: <14957.53331.342827.462297@localhost.localdomain> --OvJPdPv5cJ Content-Type: text/plain; charset=us-ascii Content-Description: message body text Content-Transfer-Encoding: 7bit >>>>> "ESR" == Eric S Raymond writes: ESR> Guido van Rossum : >> Having just skimmed your docs, I'm disappointed that you choose >> lists as your fundamental representation type -- this makes it >> slow to test for membership and hence makes intersection and >> union slow. ESR> Not quite. Membership test is still linear-time; so is adding ESR> and deleting elements. It's true that union and intersection ESR> are quadratic, but see below. >> I suppose that you have evidence from using this that those >> operations aren't used much, or not for large sets? ESR> Exactly! In my experience the usage pattern of a class like ESR> this runs heavily to small sets (usually < 64 elements); ESR> membership tests dominate usage, with addition and deletion of ESR> elements running second and the "classical" boolean operations ESR> like union and intersection being uncommon. I use a Set type in the compiler package (Tools/compiler/compiler) to collect the names for a code block. I implemented a trivial Set type using a dictionary, because it supported the operations I was most interested in: addition, membership tests, intersection, and get elements as sequence (in arbitrary order). Those are the only operations the compiler uses. I think I use sets for this purpose frequently, although I can't think of any other good examples at the moment. I usually just use a dictionary explicitly. 
In the compiler, I chose an explicit Set class with unique method names (add, has_elt, elements) to make it obvious for readers that I was using a set. ESR> What you get by going with a dictionary representation is that ESR> membership test becomes close to constant-time, while insertion ESR> and deletion become sometimes cheap and sometimes quite ESR> expensive (depending of course on whether you have to allocate ESR> a new hash bucket). Given the usage pattern I described, the ESR> overall difference in performance is marginal. The cost of insertion would presumably be dominated by the frequency of dictionary resizes. I don't know how often they occur, but I assume the dictionary type is designed to accommodate efficient insertion. I did a quick and dirty performance comparison of dictionary-based and list-based sets. (I'll include the code below.) It uses sample data collected from running the compiler, so it is measuring actual usage. The tests showed that dictionary-based sets were always faster. For small tests (3 operations), the difference was about 10 percent. For larger tests (88 operations), the difference ranged from 180 to almost 700 percent. >> This is one of the problems with coming up with a set type for >> the core: it has to work for (nearly) everybody. ESR> As I pointed out above (and someone else on the list had made ESR> the same point earlier), "works for everybody" isn't really ESR> possible here. So my solution does the next best thing -- pick ESR> a choice of tradeoffs that isn't obviously worse than the ESR> alternatives and keeps things bog-simple. For my applications, the dictionary-based approach is faster and offers a natural interface. If a set implementation were included in the standard library, I would like to see either (1) the implementation that favors my needs or (2) multiple implementations tuned for different uses. I think it would be just as easy to make set implementations available separately, though. 
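A minimal sketch of the two representations being compared here, written in present-day Python rather than the Python 2.0 of this thread. The method names add, has_elt and elements come from the message above; the class names DictSet and ListSet, the exercise() workload and the iteration count are assumptions made for illustration, and the sketch makes no attempt to reproduce the percentages quoted.

```python
import timeit


class DictSet:
    """Set backed by a dictionary: elements are keys, values are unused."""

    def __init__(self):
        self.elts = {}

    def add(self, elt):
        self.elts[elt] = None

    def has_elt(self, elt):
        # Hash lookup: close to constant time on average.
        return elt in self.elts

    def elements(self):
        return list(self.elts)


class ListSet:
    """Set backed by a list: elements may be unhashable, lookups are linear."""

    def __init__(self):
        self.elts = []

    def add(self, elt):
        # The duplicate check is itself a linear scan.
        if elt not in self.elts:
            self.elts.append(elt)

    def has_elt(self, elt):
        return elt in self.elts

    def elements(self):
        return list(self.elts)


def exercise(factory, names):
    """Hypothetical workload: add every name, then look every name up."""
    s = factory()
    for name in names:
        s.add(name)
    return sum(1 for name in names if s.has_elt(name))


if __name__ == "__main__":
    names = ["name%d" % i for i in range(88)]
    for factory in (DictSet, ListSet):
        seconds = timeit.timeit(lambda: exercise(factory, names), number=200)
        print(factory.__name__, seconds)
```

Both classes expose the same three methods, so a caller can swap one for the other; only the list-backed version accepts unhashable elements such as lists.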
Jeremy --OvJPdPv5cJ Content-Type: text/plain Content-Disposition: inline; filename="sets.tar" Content-Transfer-Encoding: base64 [base64-encoded attachment omitted: a tar archive containing sets/testset18.py, sets/testset88.py, sets/testset98.py, sets/timeset.py, sets/esrset.py and sets/jahset.py; the sender's follow-up message notes that this attachment was garbled] --OvJPdPv5cJ-- From loewis@informatik.hu-berlin.de Tue Jan 23 18:51:37 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 23 Jan 2001 19:51:37 +0100 (MET) Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: 
<200101231755.KAA03471@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101231755.KAA03471@localhost.localdomain> Message-ID: <200101231851.TAA19488@pandora.informatik.hu-berlin.de> > I'll try to some time to look into this more closely, or perhaps > someone will straighten me out if I'm on the wrong trail. Spending only a little time myself, either, I'd agree with your conclusions. Regards, Martin From esr@thyrsus.com Tue Jan 23 18:55:30 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 13:55:30 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <14957.53331.342827.462297@localhost.localdomain>; from jeremy@alum.mit.edu on Tue, Jan 23, 2001 at 01:41:23PM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> <14957.53331.342827.462297@localhost.localdomain> Message-ID: <20010123135530.A26565@thyrsus.com> Jeremy Hylton : Content-Description: message body text > The tests showed that dictionary-based sets were always faster. For > small tests (3 operations), the difference was about 10 percent. For > larger tests (88 operations), the difference ranged from 180 to almost > 700 percent. Not surprising. 88 elements is getting pretty large. -- Eric S. Raymond Hoplophobia (n.): The irrational fear of weapons, correctly described by Freud as "a sign of emotional and sexual immaturity". Hoplophobia, like homophobia, is a displacement symptom; hoplophobes fear their own "forbidden" feelings and urges to commit violence. This would be harmless, except that they project these feelings onto others. The sequelae of this neurosis include irrational and dangerous behaviors such as passing "gun-control" laws and trashing the Constitution. 
From petrilli@amber.org Tue Jan 23 19:06:05 2001 From: petrilli@amber.org (Christopher Petrilli) Date: Tue, 23 Jan 2001 14:06:05 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123133904.B26487@thyrsus.com>; from esr@thyrsus.com on Tue, Jan 23, 2001 at 01:39:04PM -0500 References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> Message-ID: <20010123140604.E18796@trump.amber.org> Eric S. Raymond [esr@thyrsus.com] wrote: > I believe most uses of sets are small sets. That reduces the speed ouch > of using a list representation and increases the proportional memory > ouch of a dictionary implementation. The problem is that there are a lot of uses for large sets, especially when you begin to introduce intersections and unions. If an implementation is only useful for a few dozen (or a hundred) items in the set, that eliminates a lot of places where the real use of set types is useful: optimizing large scale manipulations. Zope, for example, manipulates sets with 10,000 items in them on a regular basis when doing text index manipulation. The data structures are heavily optimized for this kind of behaviour, without a major sacrifice in space. I think Jim perhaps can talk to this. Unfortunately, for me, a Python implementation of Sets is only interesting academically. Any time I've needed to work with them at a large scale, I've needed them *much* faster than Python could achieve without a C extension. Perhaps the difference is in problem domain. In the "scripting" problem domain, I would agree that Sets would rarely reach large sizes, and so an algorithm which performed in quadratic time might be fine, because the actual resultant time is small. However, in more full-blown applications, this would be counterproductive, and the user would be forced to implement their own (or use Aaron's excellent kjBuckets). Just my opinion, of course. 
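To make the scaling concern concrete, here is a small sketch in present-day Python of the same intersection computed two ways. The function names are hypothetical, and this is not the code used by Zope or kjBuckets; it only illustrates why a list-only representation goes quadratic while a hash-based one does not.

```python
def intersect_lists(a, b):
    """List-based intersection.

    Each `x in b` test scans b from the start, so the total work grows
    as len(a) * len(b).
    """
    return [x for x in a if x in b]


def intersect_hashed(a, b):
    """Hash-based intersection (elements must be hashable).

    The lookup table is built once; each membership test is then close
    to constant time, so the total work grows as len(a) + len(b).
    """
    lookup = dict.fromkeys(b)
    return [x for x in a if x in lookup]


evens = list(range(0, 30, 2))
threes = list(range(0, 30, 3))
assert intersect_lists(evens, threes) == intersect_hashed(evens, threes)
```

With a few dozen elements the two are hard to tell apart; with 10,000 elements on each side the list version performs on the order of tens of millions of comparisons, which is the gap described above.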
Chris -- | Christopher Petrilli | petrilli@amber.org From ping@lfw.org Tue Jan 23 19:27:38 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 23 Jan 2001 11:27:38 -0800 (PST) Subject: [Python-Dev] Sets: elt in dict, lst.include In-Reply-To: <14957.53331.342827.462297@localhost.localdomain> Message-ID: On Tue, 23 Jan 2001, Jeremy Hylton wrote: > For my applications, the dictionary-based approach is faster and > offers a natural interface. The only change that needs to be made to support sets of immutable elements is to provide "in" on dictionaries. The rest is then all quite natural: dict[key] = 1 if key in dict: ... for key in dict: ... (Then we can also get rid of the ugly has_key method.) For those that need mutable set elements badly enough to sacrifice a little speed, we can add two methods to lists: lst.include(elt) # same as - if elt not in lst: lst.append(elt) lst.exclude(elt) # same as - while elt in lst: lst.remove(elt) (These are generally useful methods to have anyway.) This proposal has the following advantages: 1. You still get to choose which implementation best suits your needs. 2. No new types are introduced; lists and dicts are well understood. 3. Both features are extremely simple to understand and explain. 4. Both features are useful in their own right, and could stand as independent proposals to improve lists and dicts respectively. (For instance, i spotted about 10 places in the std library where the 'include' method could be used, and i know i would use it myself -- certainly more often than pop or reverse!) 5. In all cases this is faster than a new Python class. (For instance, Jeremy's implementation even contained a commented-out optimization that stored self.elts.has_key as self.has_elt to speed things up a bit. Using straight dicts would see this optimization and raise it one, with no effort at all.) 6. Either feature can be independently approved or rejected without affecting the other. 
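The proposal above can be sketched in present-day Python, where `key in dict` and iteration over a dictionary's keys are both available. Lists have no include or exclude methods, so the two free functions below are hypothetical stand-ins that follow the "same as" definitions given in the message.

```python
def include(lst, elt):
    """Append elt only if it is not already present (the proposed lst.include)."""
    if elt not in lst:
        lst.append(elt)


def exclude(lst, elt):
    """Remove every occurrence of elt (the proposed lst.exclude)."""
    while elt in lst:
        lst.remove(elt)


# A dictionary used as a set of hashable elements.
seen = {}
seen["spam"] = 1
assert "spam" in seen             # membership test without has_key
assert sorted(seen) == ["spam"]   # iterating a dict yields its keys

# A list used as a set of mutable (unhashable) elements.
groups = []
include(groups, [1, 2])
include(groups, [1, 2])           # duplicate, ignored
include(groups, [3])
exclude(groups, [3])
assert groups == [[1, 2]]
```

The dictionary form needs hashable keys; the list form trades speed for the ability to hold elements such as lists, which is the split between the two halves of the proposal.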
-- ?!ng

From loewis@informatik.hu-berlin.de Tue Jan 23 19:33:00 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Tue, 23 Jan 2001 20:33:00 +0100 (MET)
Subject: [Python-Dev] getting rid of ucnhash
Message-ID: <200101231933.UAA02223@pandora.informatik.hu-berlin.de>

> Should I check it in now, change the names/semantics and check it
> in, or post it to sourceforge?

Is that two or three options? If three, what change in semantics did
you propose?

Anyway, I feel it could go in right now; the only breakage would be to
applications that use ucnhash.ucnhashAPI, right?

Regards,
Martin

From fredrik@effbot.org Tue Jan 23 19:49:09 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Tue, 23 Jan 2001 20:49:09 +0100
Subject: [Python-Dev] Re: getting rid of ucnhash
References: <200101231933.UAA02223@pandora.informatik.hu-berlin.de>
Message-ID: <01e801c08575$8f71c680$e46940d5@hagrid>

martin wrote:
> > Should I check it in now, change the names/semantics and check it
> > in, or post it to sourceforge?
>
> Is that two or three options?

three, I think.

> If three, what change in semantics did you propose?

none -- but maybe someone else has a better name for "lookup"?

(the "name" function behaves like the existing property methods
in 2.0's unicodedata)

> Anyway, I feel it could go in right now; the only breakage would be to
> applications that use ucnhash.ucnhashAPI, right?

yup -- and those applications are already broken, since the CObject
was renamed in 2.1a1.

(well, any code using 2.1a1's new ucnhash.getcode/getname functions
will of course also break.  but I think we can live with that ;-)

Cheers /F

From ping@lfw.org Tue Jan 23 19:43:50 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Tue, 23 Jan 2001 11:43:50 -0800 (PST)
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To:
Message-ID:

Christopher Petrilli wrote:
> The problem is that there are a lot of uses for large sets, especially
> when you begin to introduce intersections and unions.
[...]
> Unfortunately, for me, a Python implementation of Sets is only
> interesting academically.  Any time I've needed to work with them at a
> large scale, I've needed them *much* faster than Python could achieve
> without a C extension.

On Tue, 23 Jan 2001, Ka-Ping Yee wrote:
> This proposal has the following advantages:
[six nice things about 'in dict' and 'lst.include']

I forgot to mention an important seventh advantage:

    7.  The list and dictionary data structures are implemented in
        the C core, so we leave open the possibility of a wizard
        going and optimizing the snot out of them later.

Just as there's e.g. a boundary on recursion levels before Python
invokes the cycle detection algorithm during comparison, if we
decide we need more speed for big sets, Python could notice when
a list or dictionary gets very big and invoke more powerful
optimizations.  We don't have to do this now, but the important
thing is that we will always have the option to make Christopher's
dream come true.  (A wizard can do this once, and every Python
script on the planet benefits.)

In general i support Python deciding on the Right Thing to do
under the hood, performance-wise, so that the programmer doesn't
have to think too hard about what data structure to choose.

-- ?!ng

From nas@arctrix.com Tue Jan 23 13:08:07 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Tue, 23 Jan 2001 05:08:07 -0800
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010123140604.E18796@trump.amber.org>; from petrilli@amber.org on Tue, Jan 23, 2001 at 02:06:05PM -0500
References: <200101231816.LAA03551@localhost.localdomain>
    <20010123133904.B26487@thyrsus.com>
    <20010123140604.E18796@trump.amber.org>
Message-ID: <20010123050807.A29115@glacier.fnational.com>

On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote:
> Unfortunately, for me, a Python implementation of Sets is only
> interesting academically.
> Any time I've needed to work with them at a
> large scale, I've needed them *much* faster than Python could achieve
> without a C extension.

I think this argues that if sets are added to the core they
should be implemented as an extension type with the speed of
dictionaries and the memory usage of lists.  Basically, we would
use the implementation of PyDict but drop the values.

  Neil

From jeremy@alum.mit.edu Tue Jan 23 19:48:18 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Tue, 23 Jan 2001 14:48:18 -0500 (EST)
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <14957.53331.342827.462297@localhost.localdomain>
References: <20010122124159.A14999@thyrsus.com>
    <200101221910.OAA01218@cj20424-a.reston1.va.home.com>
    <20010122151309.C15236@thyrsus.com>
    <200101231549.KAA05172@cj20424-a.reston1.va.home.com>
    <20010123113050.A26162@thyrsus.com>
    <14957.53331.342827.462297@localhost.localdomain>
Message-ID: <14957.57346.248852.656387@localhost.localdomain>

--lebymX04xi
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit

Sorry about the garbled attachment on the previous message; I think I
got the content-type wrong.  Here's a second try.
Jeremy

--lebymX04xi
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="sets.tar"
Content-Transfer-Encoding: base64

[base64 payload omitted: a tar archive holding sets/testset18.py,
sets/testset88.py, sets/testset98.py, sets/timeset.py, sets/esrset.py
and sets/jahset.py]

--lebymX04xi--

From petrilli@amber.org Tue Jan 23 20:06:16 2001
From: petrilli@amber.org (Christopher Petrilli)
Date: Tue, 23 Jan 2001 15:06:16 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010123050807.A29115@glacier.fnational.com>; from nas@arctrix.com on Tue, Jan 23, 2001 at 05:08:07AM -0800
References: <200101231816.LAA03551@localhost.localdomain>
    <20010123133904.B26487@thyrsus.com>
    <20010123140604.E18796@trump.amber.org>
    <20010123050807.A29115@glacier.fnational.com>
Message-ID: <20010123150616.F18796@trump.amber.org>

Neil Schemenauer [nas@arctrix.com] wrote:
> On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote:
> > Unfortunately, for me, a Python implementation of Sets is only
> > interesting academically.
> > Any time I've needed to work with them at a
> > large scale, I've needed them *much* faster than Python could achieve
> > without a C extension.
>
> I think this argues that if sets are added to the core they
> should be implemented as an extension type with the speed of
> dictionaries and the memory usage of lists.  Basically, we would
> use the implementation of PyDict but drop the values.

This is effectively the implementation that Zope has for Sets.  In
addition we have "buckets" that have scores on them (which are
implemented as a modified BTree).

Unfortunately Jim Fulton (who wrote all the code for that level) is in
a meeting, but I hope he'll comment on the implementation that was
chosen for our software.

Chris
--
| Christopher Petrilli
| petrilli@amber.org

From jeremy@alum.mit.edu Tue Jan 23 19:56:05 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Tue, 23 Jan 2001 14:56:05 -0500 (EST)
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010123135530.A26565@thyrsus.com>
References: <20010122124159.A14999@thyrsus.com>
    <200101221910.OAA01218@cj20424-a.reston1.va.home.com>
    <20010122151309.C15236@thyrsus.com>
    <200101231549.KAA05172@cj20424-a.reston1.va.home.com>
    <20010123113050.A26162@thyrsus.com>
    <14957.53331.342827.462297@localhost.localdomain>
    <20010123135530.A26565@thyrsus.com>
Message-ID: <14957.57813.23072.723418@localhost.localdomain>

>>>>> "ESR" == Eric S Raymond writes:

  ESR> Jeremy Hylton : Content-Description:
  ESR> message body text
  >> The tests showed that dictionary-based sets were always faster.
  >> For small tests (3 operations), the difference was about 10
  >> percent.  For larger tests (88 operations), the difference ranged
  >> from 180 to almost 700 percent.

  ESR> Not surprising.  88 elements is getting pretty large.

Large for what?  I've got directories with that many files and modules
with that many names defined at the top-level :-).
I'm just reporting the range of set sizes I've encountered for a real
application.  In general, I expect a few hundred elements should be
handled without trouble by most Python containers.

Jeremy

From gvwilson@nevex.com Tue Jan 23 20:26:22 2001
From: gvwilson@nevex.com (Greg Wilson)
Date: Tue, 23 Jan 2001 15:26:22 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010123200601.87817EF68@mail.python.org>
Message-ID: <001101c0857a$c0dce420$770a0a0a@nevex.com>

Greg Wilson:
Meta-question: do people want to continue to discuss sets on the
general python-dev list, or take it out-of-line (e.g. to an egroups
list)?  I'm finding all of the discussion very useful, but I realize
that many readers might prefer to concentrate on the 2.1 release...

> Jeremy Hylton :
> > The tests showed that dictionary-based sets were always faster.
> > small tests (3 operations), the difference was about 10 percent.
> > larger tests (88 operations), the difference ranged from
> > 180 to almost 700 percent.

> Eric Raymond :
> Not surprising.  88 elements is getting pretty large.

Greg Wilson:
Really?  I was testing my implementation with sets of email addresses
grep'd out of old mail folders --- typical sizes were several thousand
elements.

> From: Christopher Petrilli
> Unfortunately, for me, a Python implementation of Sets is only
> interesting academically.  Any time I've needed to work with them at a
> large scale, I've needed them *much* faster than Python could achieve
> without a C extension.

Greg Wilson:
I had been expecting to implement this in C, not in pure Python, for
performance.

> From: Christopher Petrilli
> In the "scripting" problem domain, I would agree that Sets would
> rarely reach large sizes,
> and so an algorithm which performed in quadratic time might be fine,

Greg Wilson:
I strongly disagree (see the email address example above --- it was
the first thing that occurred to me to try).
I am still hoping to find a sub-quadratic (preferably sub-linear)
implementation.  I can do it in C++ with observer/observable
(contained items notify containers of changes in value, sets store
all equivalent items in the same bucket), but that doesn't really
help...

> From: Ka-Ping Yee
> The only change that needs to be made to support sets of immutable
> elements is to provide "in" on dictionaries...

and:

> From: Neil Schemenauer
> ...if sets are added to the core...we would
> use the implementation of PyDict but drop the values.

Unfortunately, if values are required to be immutable, then sets of
sets aren't possible... :-(

Thanks, everyone,
Greg

From esr@thyrsus.com Tue Jan 23 20:38:39 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Tue, 23 Jan 2001 15:38:39 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: ; from ping@lfw.org on Tue, Jan 23, 2001 at 11:27:38AM -0800
References: <14957.53331.342827.462297@localhost.localdomain>
Message-ID: <20010123153839.B26676@thyrsus.com>

Ka-Ping Yee :
> The only change that needs to be made to support sets of immutable
> elements is to provide "in" on dictionaries.  The rest is then all
> quite natural:
>
>     dict[key] = 1
>     if key in dict: ...
>     for key in dict: ...

Independently of implementation issues about sets, I think this is a
damn fine idea.  +1.

> (Then we can also get rid of the ugly has_key method.)
>
> For those that need mutable set elements badly enough to sacrifice
> a little speed, we can add two methods to lists:
>
>     lst.include(elt)  # same as - if elt not in lst: lst.append(elt)
>     lst.exclude(elt)  # same as - while elt in lst: lst.remove(elt)

+1 on the concept, -0 on the names.
--
Eric S. Raymond

[The disarming of citizens] has a double effect, it palsies the hand
and brutalizes the mind: a habitual disuse of physical forces totally
destroys the moral [force]; and men lose at once the power of
protecting themselves, and of discerning the cause of their
oppression.
        -- Joel Barlow, "Advice to the Privileged Orders", 1792-93

From tim.one@home.com Tue Jan 23 22:02:41 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 23 Jan 2001 17:02:41 -0500
Subject: [Python-Dev] Is X a (sequence|mapping)?
In-Reply-To: <200101231531.KAA05122@cj20424-a.reston1.va.home.com>
Message-ID:

>> operator.isMappingType()
>> + some other C style _Check() APIs

[Guido]
> Yes, these should probably be deprecated.  I certainly have never
> used them!  (The operator module doesn't seem to get much use in
> general...

It's used heavily by test_operator.py.

Outside of that, it's used maybe three times in the std distribution,
nowhere essential; the

    return map(operator.__div__, rgbtuple, _maxtuple)

in Pynche's ColorDB.py is typical.  2.0's

    return [x / 256. for x in rgbtuple]

does the same thing more clearly (_maxtuple is a module constant).

It appeals to functional-language fans and extreme micro-optimizers,
so they don't have to type "lambda" in the simplest cases.  At least
operator.truth(x) is *clearer* than "not not x".

> Was it a bad idea?)

Mixed, but I'd say more bad than good overall.

From thomas@xs4all.net Tue Jan 23 23:38:14 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Wed, 24 Jan 2001 00:38:14 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <20010123153839.B26676@thyrsus.com>; from esr@thyrsus.com on Tue, Jan 23, 2001 at 03:38:39PM -0500
References: <14957.53331.342827.462297@localhost.localdomain>
    <20010123153839.B26676@thyrsus.com>
Message-ID: <20010124003814.F27785@xs4all.nl>

On Tue, Jan 23, 2001 at 03:38:39PM -0500, Eric S. Raymond wrote:

> > The only change that needs to be made to support sets of immutable
> > elements is to provide "in" on dictionaries.  The rest is then all
> > quite natural:

> >     dict[key] = 1
> >     if key in dict: ...
> >     for key in dict: ...

> Independently of implementation issues about sets, I think this is a
> damn fine idea.  +1.

It's come up before.
The problem with it is that it's not quite obvious whether it is 'if key in dict' or 'if value in dict'. Sure, from the above example it's obvious what you *expect*, but I suspect that 'for x in dict' will result in a 40/60 split in expectations, and like American voters, the 20% middle section will change their vote each recount :-) Now, if only there was a terribly obvious way to spell it... so that it's immediately obvious which of the two you wanted.... something like, oh, I donno, this, maybe: if key in dict.keys: ... if value in dict.values: ... Ponder-ponder--Guido-should-use-the-time-machine-for-this-one!-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik@effbot.org Wed Jan 24 00:13:20 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 01:13:20 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> Message-ID: <02f401c0859a$765d07c0$e46940d5@hagrid> > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. you forgot "if (key, value) in dict" on the other hand, it's not quite obvious that "list.sort" doesn't return the sorted list, "print >>None" prints to standard output, "except KeyError, ValueError" doesn't catch a ValueError exception, etc, etc, etc. (nor that it's "has_key" and "hasattr", and not "has_key" and "has_attr" or "haskey" and "hasattr" ;-) let's just say that "in" is the same thing as "has_key", and be done with it. Cheers /F From tim.one@home.com Wed Jan 24 01:51:22 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 23 Jan 2001 20:51:22 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123140604.E18796@trump.amber.org> Message-ID: [Christopher Petrilli] > .... 
> Unfortunately, for me, a Python implementation of Sets is only
> interesting academically. Any time I've needed to work with them at a
> large scale, I've needed them *much* faster than Python could achieve
> without a C extension.

How do you know that? I've used large sets in Python happily without resorting to C or kjbuckets (which is really aiming at fast operations on *graphs*, in which area it has no equal). Everyone (except Eric ) uses dicts to implement sets in Python, and "most" set operations can work at full C speed then; e.g., assuming both sets have N elements:

membership testing
    O(1) -- it's just dict.has_key()
element insertion
    O(1) -- dict[element] = 1
element removal
    O(1) -- del dict[element]
union
    O(N), but at full C speed -- dict1.update(dict2)
intersection
    O(N), but at Python speed (the only 2.1 dog in the bunch!)
choose some element and remove it
    took O(N) time and additional space in 2.0, but is O(1) in both since dict.popitem() was introduced
iteration
    O(N), with O(N) additional space using dict.keys(), or O(1) additional space using dict.popitem() repeatedly

What are you going to do in C that's faster than using a Python dict for this purpose? Most key set operations are straightforward Python dict 1-liners then, and Python dicts are very fast. kjbuckets sets were slower last time I timed them (several years ago, but Python dicts have gotten faster since then while kjbuckets has been stagnant).

There's a long tradition in the Lisp world of using unordered lists to represent sets (when the only tool you have is a hammer ... <0.5 wink>), but it's been easy to do much better than that in Python almost since the start. Even in the Python list world, enormous improvements for large sets can be gotten by maintaining lists in sorted order (then most O(N) operations drop to O(log2(N)), and O(N**2) to O(N)). Curiously, though, in 2.1 we can still use a dict-set for complex numbers, but no longer a sorted-list-set!
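A minimal sketch of the dict-as-set idioms listed above, spelled for a current Python 3 rather than 2.1 (so `in` stands in for has_key()). The names evens, small and cset are illustrative assumptions, not anything from the original post:

```python
# Dict-as-set idioms, Python 3 spelling. All names here are illustrative.
evens = dict.fromkeys([0, 2, 4, 6], 1)
small = dict.fromkeys([0, 1, 2, 3], 1)

assert 2 in evens              # membership testing (was dict.has_key())
evens[8] = 1                   # element insertion
del evens[8]                   # element removal

union = dict(evens)            # copy, so the operands stay untouched
union.update(small)            # union, done inside the dict implementation

# Intersection has no one-call spelling on dicts, so it loops in Python.
inter = {k: 1 for k in evens if k in small}

# Choose some element and remove it, in O(1), via popitem().
scratch = dict(evens)
elt, _ = scratch.popitem()

# Complex numbers hash fine, so a dict-set works, but they cannot be
# ordered, so a sorted-list-set does not.
cset = {1 + 2j: 1, 3 - 1j: 1}
try:
    sorted(cset)
    orderable = True
except TypeError:
    orderable = False
```

The last few lines are the complex-number case: hashing is enough for the dict version, while any sorted representation needs an ordering that complex numbers do not supply.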
Requiring a total ordering can get in the way more than requiring hashability (and vice versa -- that's a tough one). measurement-is-the-measure-of-all-measurable-things-ly y'rs - tim From greg@cosc.canterbury.ac.nz Wed Jan 24 02:45:01 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:45:01 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010124003814.F27785@xs4all.nl> Message-ID: <200101240245.PAA02098@s454.cosc.canterbury.ac.nz> Thomas Wouters : > Now, if only there was a terribly obvious way to spell it... so that it's > immediately obvious which of the two you wanted... Well, in the case of for key in d: or for value in d: it's immediately obvious to a *human* reader what is meant, so all we need to do is make the compiler a bit smarter. This can easily be done by the use of a small table, containing the equivalents of the words 'key' and 'value' in all known natural languages, against which the target variable name is matched using some suitable fuzzy matching algorithm. Soundex could be used for this, if we can decide on which version to use... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@digicool.com Wed Jan 24 02:46:37 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 21:46:37 -0500 Subject: [Python-Dev] getting rid of ucnhash In-Reply-To: Your message of "Tue, 23 Jan 2001 19:03:42 +0100." 
<013901c08566$d2a8f360$e46940d5@hagrid> References: <013901c08566$d2a8f360$e46940d5@hagrid> Message-ID: <200101240246.VAA06336@cj20424-a.reston1.va.home.com> > It's probably just me, but the names of the two unicode > modules tend to irritate me: > > > ls u*.pyd > ucnhash.pyd unicodedata.pyd > > (the former contains names, the latter data) > > I've been meaning to rename the former, but I just realized > that it might be better to get rid of it completely, and move > its functionality into the unicodedata module. > > The result is a single 200k unicodedata module, which > contains the name database as well as two new functions: > > name(character [, default]) => map unicode > character to name. if the name doesn't exist, > return the default object, or raise ValueError. > > lookup(name) => unicode character > (or raise KeyError if it doesn't exist) > > Should I check it in now, change the names/semantics and check > it in, or post it to sourceforge? To me, both of these are irrelevant details of the Unicode implementation. :-) IOW, feel free to check it in. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Wed Jan 24 02:49:21 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:49:21 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message-ID: <200101240249.PAA02101@s454.cosc.canterbury.ac.nz> Tim Peters : > Requiring a total ordering can get in the way more than requiring > hashability Often it's useful to have *some* total ordering, and you don't really care what it is as long as it's consistent. Maybe all types should be required to support cmp(x,y) even if doing x < y via the rich comparison route raises a NotOrderable exception. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Jan 24 02:52:43 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:52:43 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123050807.A29115@glacier.fnational.com> Message-ID: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Neil Schemenauer : > Basically, we would > use the implementation of PyDict but drop the values. This could be incorporated into PyDict. Instead of storing keys and values in the same array, keep them in separate arrays and only allocate the values array the first time someone stores a value other than 1. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@digicool.com Wed Jan 24 02:58:59 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 21:58:59 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Wed, 24 Jan 2001 01:13:20 +0100." <02f401c0859a$765d07c0$e46940d5@hagrid> References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> Message-ID: <200101240258.VAA06479@cj20424-a.reston1.va.home.com> > let's just say that "in" is the same thing as "has_key", > and be done with it. You know, I've long resisted this, but I agree now -- this is the right thing.
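For anyone reading this on a current Python, where that reading of "in" has long since shipped, a short sketch of the three spellings argued over in this thread. The dict contents are made up purely for illustration:

```python
# Plain `in` on a dict tests keys, i.e. the has_key() reading.
# The sample data below is illustrative only.
stock = {"spam": 1, "eggs": 2}

key_hit = "spam" in stock                  # key membership
value_hit = 2 in stock.values()            # value membership, spelled out
pair_hit = ("eggs", 2) in stock.items()    # (key, value) membership
value_as_key = 2 in stock                  # a value is not found by plain `in`
keys_seen = sorted(k for k in stock)       # iterating a dict yields its keys
```

The explicit .values() and .items() spellings cover the other two readings, so nothing is lost by giving the bare form to keys.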
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Jan 24 03:11:30 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:11:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: Your message of "Tue, 23 Jan 2001 12:35:04 CST." <14957.52952.48739.53360@beluga.mojam.com> References: <14957.52952.48739.53360@beluga.mojam.com> Message-ID: <200101240311.WAA06582@cj20424-a.reston1.va.home.com> > Guido> - Use "exec ... in dict" to avoid having to walk on eggshells; > Guido> locals no don't have to start with underscore. > > Thanks. I have just been incredibly short on time lately. You're welcome. > Guido> - Only test dbhash if bsddb can be imported. (Wonder if there > Guido> are more like this?) > > Alpha testing should pick those up, yes? ;-) Yes. :-) > Guido> ! try: > Guido> ! import bsddb > Guido> ! except ImportError: > Guido> ! if verbose: > Guido> ! print "can't import bsddb, so skipping dbhash" > Guido> ! else: > Guido> ! check_all("dbhash") > > Instead of having to know that dbhash includes bsddb, shouldn't dbhash be > the module that's imported here? I think I saw a complaint about this that specifically said that when dbhash is imported when bsddb can't be imported, an incomplete dbhash is left behind in sys.modules, and then a second import of dbhash will succeed -- but of course it will define no objects. Since dbhash may be imported elsewhere, testing for bsddb is safer. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Jan 24 03:22:14 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:22:14 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.114,2.115 In-Reply-To: Your message of "Tue, 23 Jan 2001 08:24:38 PST." References: Message-ID: <200101240322.WAA06671@cj20424-a.reston1.va.home.com> > A few miscellaneous helpers. 
> > PyObject_Dump(): New function that is useful when debugging Python's C > runtime. In something like gdb it can be a pain to get some useful > information out of PyObject*'s. This function prints the str() of the > object to stderr, along with the object's refcount and hex address. > > PyGC_Dump(): Similar to PyObject_Dump() but knows how to cast from the > garbage collector prefix back to the PyObject* structure. > > [See Misc/gdbinit for some useful gdb hooks] > > none_dealloc(): Rather than SEGV if we accidentally decref None out of > existence, we assign None's and NotImplemented's destructor slot to > this function, which just calls abort(). Barry, since these are only gdb helpers, would it perhaps be better if their names started with "_Py" to indicate that they aren't part of the regular API? They violate an important rule: you shouldn't write to stderr directly, but always to sys.stderr. (There's a helper routine to write to stderr: PySys_WriteStderr().) I understand that for the gdb helper it's important to use the real stderr, and I don't object to having these functions present at all times (they're so small), but I do think that we should make it clear (by a _Py name, and also by a comment) that they should not be called! --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Wed Jan 24 03:29:24 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 23 Jan 2001 19:29:24 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010124003814.F27785@xs4all.nl> Message-ID: I wrote: > The only change that needs to be made to support sets of immutable > elements is to provide "in" on dictionaries. Thomas Wouters wrote: > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. Yes, and i've seen this objection before, and i think it's silly.
> Sure, from the above > example it's obvious what you *expect*, but I suspect that 'for x in dict' > will result in a 40/60 split in expectations, No way... it's at least 90/10. How often do you write 'dict.has_key(x)'? (std lib says: 206) How often do you write 'for x in dict.keys()'? (std lib says: 49) How often do you write 'x in dict.values()'? (std lib says: 0) How often do you write 'for x in dict.values()'? (std lib says: 3) I rest my case. -- ?!ng From barry@digicool.com Wed Jan 24 03:44:31 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 23 Jan 2001 22:44:31 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.114,2.115 References: <200101240322.WAA06671@cj20424-a.reston1.va.home.com> Message-ID: <14958.20383.795064.832967@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Barry, since these are only gdb helpers, would it perhaps be GvR> better if their names started with "_Py" to indicate that GvR> they aren't part of the regular API? They violate an GvR> important rule: you shouldn't write to stderr directly, but GvR> always to sys.stderr. (There's a helper routine to write to GvR> stderr: PySys_WriteStderr().) I understand that for the gdb GvR> helper it's important to use the real stderr, and I don't GvR> object to having these functions present at all times GvR> (they're so small), but I do think that we should make it GvR> clear (by a _Py name, and also by a comment) that they should GvR> not be called! I thought about it, couldn't decide and figured I'd check it in anyway, knowing that you'd let me know. See how wise I was? :) I will rename them as _Py* and fix the gdbinit file accordingly. One note: these functions /ought/ to be useful for dbx or any other command line debugger. I just haven't used anything but gdb for years. If anybody's got a dbxinit equivalent I could add that to Misc too.
nothing-an-adjacent-office-wouldn't-have-solved-much-more-quick-ly y'rs, -Barry From guido@digicool.com Wed Jan 24 03:46:47 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:46:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: Your message of "Tue, 23 Jan 2001 09:22:26 EST." <20010123092226.A25968@thyrsus.com> References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> Message-ID: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> > Guido van Rossum : > > Can you point me to docs explaining the meaning of the BROWSER > > environment variable? I've never heard of it... The last new > > environment variables I learned were PAGER and EDITOR, probably 15 > > years ago when 4.1BSD was released... :-) ESR replies: > You've never heard of BROWSER because I invented it and have not > widely popularized it yet :-). Ping knew about it either because he > read the module code and saw that it was supposed to work, or because > he remembered the design discussion when webbrowser.py was first > implemented. > > I've had conversations with some key Perl and Tcl people (Larry Wall, > Tom Christiansen, Clif Flynt) about the BROWSER convention, and they > agree it's a good idea. I'll probably hack support for it into Perl's > browser launcher next. > > It's documented in the version of libwebbrowser.tex now in the CVS > tree. Grumble. That wasn't the kind of answer I expected. I don't like it if Python is used as a wedge to get a particular thing introduced to the rest of the world, no matter how useful it may seem at the time. If something is already a popular convention, I'll happily adopt it, but I'm not comfortable being put in front of somebody else's cart. There just are too many carts that would like to be pulled by a horse as strong as Python, and I don't want to take sides if I can avoid it. 
BROWSER seems unlikely to take the world by storm and I don't feel I need to be involved in the effort to get it accepted. (And yes, I know there are enough cases where I *did* take sides. There were some cases where I *do* want to take a side, and there were some mistakes -- which is one of the reasons why I'm shy about taking sides now.) Anyway, shouldn't you also talk to the developers of packages like KDE and Gnome? Surely their users would like to be able to configure the default webbrowser. Talking just to the scripting language people seems like you're thinking too small. There must be lots of C apps with the desire to invoke a browser. Also Emacs, which has an extensive list of browse-url-* functions (you might even learn a few tricks from it about how to invoke various external browsers) but AFAIK no default browser selection. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Jan 24 03:54:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:54:25 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Wed, 24 Jan 2001 15:52:43 +1300." <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Message-ID: <200101240354.WAA06903@cj20424-a.reston1.va.home.com> > Neil Schemenauer : > > > Basically, we would > > use the implementation of PyDict but drop the values. > > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. Not a bad idea! (But shouldn't the default value be something else, like None?)
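A toy model of the lazy values array, in pure Python rather than inside PyDict, just to make the bookkeeping concrete. Everything here (the class name LazyValueDict, its attributes, has_values_array()) is hypothetical, deletion is left out, and the default is a parameter so either 1 or None can be tried:

```python
class LazyValueDict:
    """Toy sketch: keys are always stored; the values list is only
    allocated once a value different from the default is stored."""

    def __init__(self, default=1):
        self._default = default
        self._slots = {}       # key -> slot index (stands in for the key array)
        self._values = None    # allocated lazily, parallel to the slots

    def __setitem__(self, key, value):
        if key not in self._slots:
            self._slots[key] = len(self._slots)
            if self._values is not None:
                self._values.append(self._default)
        if self._values is None and value != self._default:
            # First non-default value: pay for the values array now.
            self._values = [self._default] * len(self._slots)
        if self._values is not None:
            self._values[self._slots[key]] = value

    def __getitem__(self, key):
        slot = self._slots[key]          # raises KeyError for missing keys
        return self._default if self._values is None else self._values[slot]

    def __contains__(self, key):
        return key in self._slots

    def has_values_array(self):
        return self._values is not None
```

Used purely as a set (every value equal to the default), an instance never allocates the second list; the first "real" value triggers a one-time O(N) fill.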
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Jan 24 04:20:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 23:20:56 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Wed, 24 Jan 2001 00:38:14 +0100." <20010124003814.F27785@xs4all.nl> References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> Message-ID: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> > > > dict[key] = 1 > > > if key in dict: ... > > > for key in dict: ... > > > Independently of implementation issues about sets, I think this is a > > damn fine idea. +1. > > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. Sure, from the above > example it's obvious what you *expect*, but I suspect that 'for x in dict' > will result in a 40/60 split in expectations, and like American voters, the > 20% middle section will change their vote each recount :-) > > Now, if only there was a terribly obvious way to spell it... so that it's > immediately obvious which of the two you wanted.... something like, oh, I > donno, this, maybe: > > if key in dict.keys: ... > if value in dict.values: ... > > Ponder-ponder--Guido-should-use-the-time-machine-for-this-one!-ly y'rs, No chance of a time-machine escape, but I *can* say that I agree that Ping's proposal makes a lot of sense. This is a reversal of my previous opinion on this matter. (Take note -- those don't happen very often! 
:-) First to submit a working patch gets a free copy of 2.1a2 and subsequent releases, --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Wed Jan 24 04:50:49 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 23 Jan 2001 23:50:49 -0500 Subject: [Python-Dev] getting rid of ucnhash In-Reply-To: <013901c08566$d2a8f360$e46940d5@hagrid> Message-ID: [/F] > It's probably just me, but the names of the two unicode > modules tend to irritate me: I don't care much about the names, but having two Unicode subprojects in the MS build seems overkill . > ls u*.pyd > ucnhash.pyd unicodedata.pyd > > (the former contains names, the latter data) Maybe that's the reason: the names don't get loaded at all unless you *use* one of the name APIs? Hard to say whether that's worth the bother; now that everything has been nicely compressed, it's sure not as compelling as it may have been earlier. > I've been meaning to rename the former, but I just realized > that it might be better to get rid of it completely, and move > its functionality into the unicodedata module. > > The result is a single 200k unicodedata module, which con- > tains the name database as well as two new functions: > > name(character [, default]) => map unicode > character to name. if the name doesn't exist, > return the default object, or raise ValueError. > > lookup(name) => unicode character > (or raise KeyError if it doesn't exist) > > Should I check it in now, change the names/semantics and check > it in, or post it to sourceforge? I have no opinion on what's best: you're working with it, you're the best judge of that. I only vote for checking in whatever you decide sooner rather than later; I'll fiddle the MS project files and readmes accordingly ASAP after that. From moshez@zadka.site.co.il Wed Jan 24 14:07:08 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Wed, 24 Jan 2001 16:07:08 +0200 (IST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Message-ID: <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001, Greg Ewing wrote: > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. Cool idea, but even cooler (would catch more idioms, that is) is "the first time someone stores something not 'is' something in the dict, allocate the values array". This would catch small numbers, None and identifier-looking strings, for the measly cost of one pointer/dict object. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From moshez@zadka.site.co.il Wed Jan 24 14:15:39 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Wed, 24 Jan 2001 16:15:39 +0200 (IST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>, <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> Message-ID: <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il> On Tue, 23 Jan 2001 22:46:47 -0500, Guido van Rossum wrote: [ESR] > You've never heard of BROWSER because I invented it and have not > widely popularized it yet :-). [Guido v. Rossum] > Grumble. That wasn't the kind of answer I expected. I don't like it > if Python is used as a wedge to get a particular thing introduced to > the rest of the world, no matter how useful it may seem at the time. Guido, I think you're being over-dramatic. BROWSER is right in the tradition of PAGER and EDITOR, and a lot of other programs need it. 
I know Eric uses RH and mutt, so probably RH's urlview program (which mutt uses to jump to URLs) uses BROWSER. I was just about to submit a bug report to Debian that their urlview doesn't respect it. And if you really don't want to be a horse in front of a cart... > Anyway, shouldn't you also talk to the developers of packages like KDE > and Gnome? Surely their users would like to be able to configure the > default webbrowser. Yes -- via GNOME/KDE specific mechanisms. I have 0 experience with KDE, but I'm guessing the GNOME guys would do it via the GNOME "registry". KDE probably has something similar. I'm sure you wouldn't want Python to depend on GNOME, though it would be nice to make the browser-choosing part pluggable so when "import gnome" is done, it automatically tries to choose the user's browser. On UNIX (as opposed to GNOME/KDE, which are pretty much operating systems themselves), these things are done via environment variable. And $BROWSER doesn't seem like that much of an innovation. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From skip@mojam.com (Skip Montanaro) Wed Jan 24 06:28:21 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 24 Jan 2001 00:28:21 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: <200101240311.WAA06582@cj20424-a.reston1.va.home.com> References: <14957.52952.48739.53360@beluga.mojam.com> <200101240311.WAA06582@cj20424-a.reston1.va.home.com> Message-ID: <14958.30213.325584.373062@beluga.mojam.com> Guido> I think I saw a complaint about this that specifically said that Guido> when dbhash is imported when bsddb can't be imported, an Guido> incomplete dbhash is left behind in sys.modules, and then a Guido> second import of dbhash will succeed -- but of course it will Guido> define no objects. 
So it does: % ./python Python 2.1a1 (#2, Jan 23 2001, 23:30:41) [GCC 2.95.3 19991030 (prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import dbhash Traceback (most recent call last): File "", line 1, in ? File "/home/beluga/skip/src/python/dist/src/Lib/dbhash.py", line 3, in ? import bsddb ImportError: No module named bsddb >>> import dbhash >>> Can that be construed as a bug? If import fails, shouldn't the stub module that was inserted in sys.modules be removed? Skip From skip@mojam.com (Skip Montanaro) Wed Jan 24 06:31:08 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 24 Jan 2001 00:31:08 -0600 (CST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <14958.30380.851599.764535@beluga.mojam.com> Guido> BROWSER seems unlikely to take the world by storm and I don't Guido> feel I need to be involved in the effort to get it accepted. Editors and web browsers are classes of tools which (one would hope) will always come in several varieties. Users have to have some way to specify what to launch. BROWSER seems analogous to the EDITOR environment variable which is commonly used in Unix environments for just that purpose. 
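A rough sketch of how a launcher might honour such a variable, assuming BROWSER holds an os.pathsep-separated list of commands to try ahead of the built-in defaults. The function name browser_candidates and the fallback names are hypothetical, and the real webbrowser module does considerably more than this:

```python
import os

def browser_candidates(environ, defaults=("netscape", "lynx")):
    """Return browser commands to try, with $BROWSER entries first.

    `environ` is any mapping (os.environ in real use); `defaults` is an
    illustrative fallback list, not what any real launcher ships with.
    """
    raw = environ.get("BROWSER", "")
    # Drop empty fields so a stray separator is harmless.
    chosen = [cmd for cmd in raw.split(os.pathsep) if cmd]
    # Fall back to the defaults the user did not already list.
    return chosen + [d for d in defaults if d not in chosen]
```

Calling browser_candidates(os.environ) gives the user's choices first, exactly as PAGER and EDITOR let the user override a program's built-in guess.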
Skip From thomas@xs4all.net Wed Jan 24 07:03:09 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 24 Jan 2001 08:03:09 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101240420.XAA07153@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 11:20:56PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <200101240420.XAA07153@cj20424-a.reston1.va.home.com> Message-ID: <20010124080308.G27785@xs4all.nl> On Tue, Jan 23, 2001 at 11:20:56PM -0500, Guido van Rossum wrote: > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, Patch submitted. It only implements 'if key in dict', not 'for key in dict'. The latter is kind of hard until we have a separate iteration protocol. (PEP, anyone ?) Once we have it, we could consider 'for key, value in dict', which is now easily explained with 'dict.popitem()'. Does this mean I get a legally sound and thus empty legal statement with every Python release for the rest of your, its or my life, Guido, or will you just make me 'Free Python Release Receiver For Life' ? :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From pf@artcom-gmbh.de Wed Jan 24 07:31:30 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 24 Jan 2001 08:31:30 +0100 (MET) Subject: OT: contribution rewards (was Re: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> from Guido van Rossum at "Jan 23, 2001 11:20:56 pm" Message-ID: Hi, Guido van Rossum: [...] > Ping's proposal makes a lot of sense. This is a reversal of my > previous opinion on this matter. (Take note -- those don't happen > very often! :-) It gives a warm and fuzzy feeling to see that happen sometimes at all.
;-) > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, This repeated offer of free copies of Python becomes increasingly boring. For quite a while I myself have not contributed anything useful and I am nevertheless hoarding free copies of Python here. ;-) What about offering another immaterial reward to potential contributors instead? What about "fame points"? Anybody contributing something useful to Python receives a certain number of "fame points": These fame points will be added and placed in front of the name of the contributor into the ACKS file and the file will be sorted accordingly turning the ACKS file effectively into some kind of "Python contribution high score" ... ;-) Just kidding, Peter From tim.one@home.com Wed Jan 24 08:08:50 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 03:08:50 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123050807.A29115@glacier.fnational.com> Message-ID: [Neil Schemenauer] > I think this argues that if sets are added to the core they > should be implemented as an extension type with the speed of > dictionaries and the memory usage of lists. Basically, we would > use the implementation of PyDict but drop the values. They'll be slower than dicts and take more memory than lists then. WRT memory, dicts cache the hash code with each entry for speed (so double the memory of a list even without the value field), and are never more than 2/3 full anyway. The dict implementation also gets low-level speed benefits out of using both the key and value fields to characterize the nature of a slot (the key field is NULL iff the slot is virgin; the value field is NULL iff the slot is available (virgin or dummy)).
Dummy slots can be avoided (and so also the need for runtime code to distinguish them from active slots) by using a hash table of pointers to linked lists-- or flex vectors, or linked lists of small vectors --instead, and in most ways that leads to much simpler code (no more fiddling with dummies, no more probe-sequence hassles, no more boosting the size before the table is full). But without fine control over the internals of malloc, that takes even more memory in the end. Interesting twist: "a dict" *is* "a set", but a set of (key, value) pairs further constrained so that no two elements have the same key. So any set implementation can be used as-is to implement a dict as a set of 2-tuples, customizing the hash and "is equal" functions to look at just the tuples' first elements. This was the view taken by SETL in 1969, although their "map" (dict) type was eventually optimized to get away from actually constructing 2-tuples. Indeed, SETL eventually grew an elaborate optional type declaration sublanguage, allowing the user to influence many details of its many internal set-storage schemes; e.g., from pg 399 of "Programming With Sets: An Introduction to SETL": For example, we can declare [I'm putting their keywords in UPPERCASE for, umm, clarity] successors: LOCAL MMAP(ELMT b) REMOTE SET(ELMT b); This declaration specifies that for each x in b the image set successors{x} is stored in the element block of x, and that this image set is always to be represented as a bit vector. Similarly, the declaration successors: LOCAL MMAP(ELMT b) SPARSE SET(ELMT b); specifies that for each x in b the image set successors{x} is to be stored as a hash table containing pointers to elements of b. Note that the attribute LOCAL cannot be used for image sets of multivalued maps. This follows from the remarks in section 10.4.3 on the awkwardness of making local objects into subparts of composite objects. Clear? Snort.
Here are some citations lifted from the web for their experience in trying to make these kinds of decisions by magic:

@article{dewar:79,
  title="Programming by Refinement, as Exemplified by the {SETL} Representation Sublanguage",
  author="Robert B. K. Dewar and Arthur Grand and Ssu-Cheng Liu and Jacob T. Schwartz and Edmond Schonberg",
  journal=toplas, year=1979, month=jul, volume=1, number=1, pages="27--49"
}

@article{schonberg:81,
  title="An Automatic Technique for Selection of Data Structures in {SETL} Programs",
  author="Edmond Schonberg and Jacob T. Schwartz and Micha Sharir",
  journal=toplas, year=1981, month=apr, volume=3, number=2, pages="126--143"
}

@article{freudenberger:83,
  title="Experience with the {SETL} Optimizer",
  author="Stefan M. Freudenberger and Jacob T. Schwartz and Micha Sharir",
  pages="26--45", journal=toplas, year=1983, month=jan, volume=5, number=1
}

If someone wanted to take sets seriously today, a better approach would be to define a minimal "set interface" ("abstract base class" in C++ terms), then supply multiple implementations of that interface, letting the user choose directly which implementation strategy they want for each of their sets. And people are doing just that in the C++ and Java worlds; e.g.,

http://developer.java.sun.com/developer/onlineTraining/collections/Collection.html#SetInterface

Curiously, the newer Java Collections Framework (covering multiple implementations of list, set, and dict interfaces) gave up on thread-safety by default, because it cost too much at runtime. Just another thing to argue about .
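A minimal sketch of that "one interface, several implementations" approach, written against a current Python 3 and its abc and bisect modules. The names MiniSet, DictSet and SortedListSet are invented for illustration and are not part of any proposal in this thread:

```python
from abc import ABC, abstractmethod
import bisect

class MiniSet(ABC):
    """Hypothetical minimal set interface: add, membership, size."""

    @abstractmethod
    def add(self, x): ...

    @abstractmethod
    def __contains__(self, x): ...

    @abstractmethod
    def __len__(self): ...

class DictSet(MiniSet):
    """Hash-based: elements must be hashable, no ordering needed."""

    def __init__(self):
        self._d = {}

    def add(self, x):
        self._d[x] = 1

    def __contains__(self, x):
        return x in self._d

    def __len__(self):
        return len(self._d)

class SortedListSet(MiniSet):
    """Order-based: elements must be mutually comparable, no hashing."""

    def __init__(self):
        self._items = []

    def add(self, x):
        i = bisect.bisect_left(self._items, x)
        if i == len(self._items) or self._items[i] != x:
            self._items.insert(i, x)   # keep the list sorted, skip duplicates

    def __contains__(self, x):
        i = bisect.bisect_left(self._items, x)   # O(log N) binary search
        return i < len(self._items) and self._items[i] == x

    def __len__(self):
        return len(self._items)
```

The caller picks the class per set: DictSet where elements hash (complex numbers, say), SortedListSet where they only compare (lists, say), and the rest of the code sees only MiniSet.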
we're-not-exactly-pioneers-here-ly y'rs - tim From fredrik@effbot.org Wed Jan 24 08:29:30 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 09:29:30 +0100 Subject: [Python-Dev] getting rid of ucnhash References: <013901c08566$d2a8f360$e46940d5@hagrid> <200101240246.VAA06336@cj20424-a.reston1.va.home.com> Message-ID: <019801c085df$c7ee0540$e46940d5@hagrid> guido wrote: > > It's probably just me, but the names of the two unicode > > modules tend to irritate me: > > > > > ls u*.pyd > > ucnhash.pyd unicodedata.pyd > > To me, both of these are irrelevant details of the Unicode > implementation. :-) IOW, feel free to check it in. Done. Note that Include/ucnhash.h is still there; it declares the "ucnhash_CAPI" structure used to access names from the unicodeobject module. (and all name-related tests are still kept in test_ucn) I'll leave it to Tim to update the MSVC build files. Cheers /F From tim.one@home.com Wed Jan 24 08:28:34 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 03:28:34 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can you point me to docs explaining the meaning of the BROWSER > environment variable? I've never heard of it... The last new > environment variables I learned were PAGER and EDITOR, probably 15 > years ago when 4.1BSD was released... :-) I gotta say, politics aside, BROWSER is a screamingly natural answer to the question "what comes next in this sequence?": PAGER, EDITOR, ... Dear Lord, even *I* use a browser almost every week . explicit-is-better-than-implicit-ly y'rs - tim From esr@thyrsus.com Wed Jan 24 09:02:59 2001 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Wed, 24 Jan 2001 04:02:59 -0500 Subject: OT: contribution rewards (was Re: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: ; from pf@artcom-gmbh.de on Wed, Jan 24, 2001 at 08:31:30AM +0100 References: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> Message-ID: <20010124040259.A28086@thyrsus.com> Peter Funk : > What about offering another immaterial reward to potential contributors > instead? What about "fame points"? Anybody contributing something > useful to Python receives a certain number of "fame points": These > fame points will be added and placed in front of the name of > the contributor into the ACKS file and the file will be sorted > accordingly turning the ACKS file effectively into some kind of > "Python contribution high score" ... ;-) > > Just kidding, Peter You may be joking, but as an observer of how gift cultures work I say this isn't a bad idea. -- Eric S. Raymond "One of the ordinary modes, by which tyrants accomplish their purposes without resistance, is, by disarming the people, and making it an offense to keep arms." -- Constitutional scholar and Supreme Court Justice Joseph Story, 1840 From esr@thyrsus.com Wed Jan 24 09:09:18 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:09:18 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101240258.VAA06479@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 09:58:59PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> <200101240258.VAA06479@cj20424-a.reston1.va.home.com> Message-ID: <20010124040918.B28086@thyrsus.com> Guido van Rossum : > > let's just say that "in" is the same thing as "has_key", > > and be done with it. > > You know, I've long resisted this, but I agree now -- this is the > right thing. 
I think we've just justified the time and energy that went into this discussion. -- Eric S. Raymond What is a magician but a practicing theorist? -- Obi-Wan Kenobi, 'Return of the Jedi' From esr@thyrsus.com Wed Jan 24 09:14:27 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:14:27 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 10:46:47PM -0500 References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010124041427.D28086@thyrsus.com> Guido van Rossum : > Grumble. That wasn't the kind of answer I expected. I don't like it > if Python is used as a wedge to get a particular thing introduced to > the rest of the world, no matter how useful it may seem at the time. Oh, stop! I'm not using Python as an argument for other people to adopt the BROWSER convention. The idea sells itself quite nicely by analogy to EDITOR and PAGER the second people hear it. > Anyway, shouldn't you also talk to the developers of packages like KDE > and Gnome? Surely their users would like to be able to configure the > default webbrowser. Talking just to the scripting language people > seems like you're thinking too small. There must be lots of C apps > with the desire to invoke a browser. Also Emacs, which has an > extensive list of browser-url-* functions (you might even learn a few > tricks from it about how to invoke various external browsers) but > AFAIK no default browser selection. All on my TO-DO list. -- Eric S. Raymond It is proper to take alarm at the first experiment on our liberties. We hold this prudent jealousy to be the first duty of citizens and one of the noblest characteristics of the late Revolution. 
The freemen of America did not wait till usurped power had strengthened itself by exercise and entangled the question in precedents. They saw all the consequences in the principle, and they avoided the consequences by denying the principle. We revere this lesson too much ... to forget it -- James Madison. From esr@thyrsus.com Wed Jan 24 09:16:12 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:16:12 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 03:28:34AM -0500 References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010124041612.E28086@thyrsus.com> Tim Peters : > I gotta say, politics aside, BROWSER is a screamingly natural answer to the > question "what comes next in this sequence?": > > PAGER, EDITOR, ... That's exactly what I thought when I was struck by the obvious. Everybody I spread this meme to seems to agree. -- Eric S. Raymond Government is actually the worst failure of civilized man. There has never been a really good one, and even those that are most tolerable are arbitrary, cruel, grasping and unintelligent. -- H. L. Mencken From esr@thyrsus.com Wed Jan 24 09:21:56 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:21:56 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Wed, Jan 24, 2001 at 04:15:39PM +0200 References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>, <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il> Message-ID: <20010124042156.F28086@thyrsus.com> Moshe Zadka : > I know Eric uses RH and mutt, so probably RH's urlview program (which > mutt uses to jump to URLs) uses BROWSER. 
I was just about to submit > a bug report to Debian that their urlview doesn't respect it. Oh, *do* that! Note: BROWSER may consist of a colon-separated series of parts, browser commands to be tried in order (this is useful so you can put an X browser first, then a console browser, and have the right thing happen). If a part contains %s, the URL is substituted there; otherwise, the URL is concatenated to the command after a space. -- Eric S. Raymond Gun Control: The theory that a woman found dead in an alley, raped and strangled with her panty hose, is somehow morally superior to a woman explaining to police how her attacker got that fatal bullet wound. -- L. Neil Smith From tim.one@home.com Wed Jan 24 09:24:26 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 04:24:26 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101240354.WAA06903@cj20424-a.reston1.va.home.com> Message-ID: [Greg Ewing] > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. [Guido] > Not a bad idea! In theory, but if Vladimir were here he'd bust a gut over the possibly bad cache effects on "real dicts" (by keeping everything together, simply accessing the cached hash code brings both the key and value pointers into L1 cache too). We would need to quantify the effect of breaking that connection. > (But shouldn't the default value be something else, > like none?) Bleech. I hate the idiom of using a false value to mean "present". d = {} for x in seq: d[x] = 1 runs faster too (None needs a LOAD_GLOBAL now). 
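
A small sketch of the set-as-dict idiom under discussion, written for current Python, where "x in d" works on dicts (at the time of this thread that was still only a proposal). The helper name as_set_dict is made up for illustration.

```python
def as_set_dict(seq):
    """Build a dict used purely as a set: the keys are the members."""
    d = {}
    for x in seq:
        d[x] = 1  # throwaway value; a true value reads naturally as "present"
    return d


members = as_set_dict(["spam", "eggs", "spam"])
print("spam" in members)  # True
print("ham" in members)   # False
print(len(members))       # 2
```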
From tim.one@home.com Wed Jan 24 10:01:36 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 05:01:36 -0500 Subject: [Python-Dev] test___all__ failing; Windows Message-ID: > python ../lib/test/regrtest.py test___all__ test___all__ test test___all__ crashed -- exceptions.AttributeError: 'locale' module has no attribute 'LC_MESSAGES' And indeed it does not: > python Python 2.1a1 (#9, Jan 24 2001, 04:40:55) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import locale >>> dir(locale) ['CHAR_MAX', 'Error', 'LC_ALL', 'LC_COLLATE', 'LC_CTYPE', 'LC_MONETARY', 'LC_NUMERIC', 'LC_TIME', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '_build_localename', '_group', '_parse_localename', '_print_locale', '_setlocale', '_test', 'atof', 'atoi', 'encoding_alias', 'format', 'getdefaultlocale', 'getlocale', 'locale_alias', 'localeconv', 'normalize', 'resetlocale', 'setlocale', 'str', 'strcoll', 'string', 'strxfrm', 'sys', 'windows_locale'] >>> Nor is LC_MESSAGES std C (the other LC_XXX guys are). I pin the blame on from _locale import * in locale.py -- who knows what that's supposed to export? Certainly not Skip . From tim.one@home.com Wed Jan 24 10:17:47 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 05:17:47 -0500 Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: Message-ID: Nevermind; checked in a hack to stop the error on Windows. From mal@lemburg.com Wed Jan 24 13:00:28 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 14:00:28 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> Message-ID: <3A6ED1EC.237B5B1D@lemburg.com> Fredrik Lundh wrote: > > > It's come up before. The problem with it is that it's not quite obvious > > whether it is 'if key in dict' or 'if value in dict'. 
> > you forgot "if (key, value) in dict" > > on the other hand, it's not quite obvious that "list.sort" > doesn't return the sorted list, "print >>None" prints to > standard output, "except KeyError, ValueError" doesn't > catch a ValueError exception, etc, etc, etc. > > (nor that it's "has_key" and "hasattr", and not "has_key" > and "has_attr" or "haskey" and "hasattr" ;-) > > let's just say that "in" is the same thing as "has_key", > and be done with it. +1 all the way :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jan 24 14:01:33 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 15:01:33 +0100 Subject: [Python-Dev] Interfaces (Is X a (sequence|mapping)?) References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> <3A6D4B9F.38B17046@lemburg.com> <200101231531.KAA05122@cj20424-a.reston1.va.home.com> Message-ID: <3A6EE03D.4D5DFD17@lemburg.com> Guido van Rossum wrote: > > > Polymorphic code will usually get you more out of an > > algorithm, than type-safe or interface-safe code. > > Right. > > But there are times when people want to write methods that take > e.g. either a sequence or a mapping, and need to distinguish between > the two. That's not easy in Python! Java and C++ support it very > well though, and thus we'll always keep seeing this kind of > complaint. Not sure what to do, except to recommend "find out which > methods you expect in one case but not in the other (e.g. keys()) and > do a hasattr() test for that." Perhaps we should provide simple means for testing a set of available methods and slots ?! E.g. hasinterface(obj, ('keys', 'items', '__len__')) Objects could provide an __interface__ special attribute for this purpose (since not all slots can be auto-detected and -verified without side-effects). 
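
A rough sketch of what such a check might look like. hasinterface is only the name proposed above, not an existing builtin, and treating __interface__ as a plain collection of attribute names is an assumption made here for illustration.

```python
def hasinterface(obj, names):
    """Return True if obj appears to provide every attribute in names.

    Hypothetical helper. If the object declares __interface__ (assumed
    here to be a collection of names), trust the declaration; otherwise
    fall back to hasattr() probing.
    """
    declared = getattr(obj, "__interface__", None)
    if declared is not None:
        return all(name in declared for name in names)
    # hasattr() actually fetches the attribute, so it can run property or
    # __getattr__ code; that is the side-effect risk mentioned above.
    return all(hasattr(obj, name) for name in names)


print(hasinterface({}, ("keys", "items", "__len__")))  # True
print(hasinterface([], ("keys", "items", "__len__")))  # False
```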
> > BTW, there are Python interfaces to PySequence_Check() and
> > PyMapping_Check() buried in the builtin operator module in case
> > you really do care ;) ...
> >
> >     operator.isSequenceType()
> >     operator.isMappingType()
> >     + some other C style _Check() APIs
> >
> > These only look at the type slots though, so Python instances
> > will appear to support everything but when used fail with
> > an exception if they don't provide the proper __xxx__ hooks.
>
> Yes, these should probably be deprecated. I certainly have never used
> them! (The operator module doesn't seem to get much use in
> general... Was it a bad idea?)

Some of these are nice to have and provide some good performance
boost (e.g. the numeric slot access APIs). The type slot
checking APIs are not too useful though.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jim@digicool.com Wed Jan 24 09:05:44 2001
From: jim@digicool.com (Jim Fulton)
Date: Wed, 24 Jan 2001 04:05:44 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> <20010123150616.F18796@trump.amber.org>
Message-ID: <3A6E9AE8.6C2D3CF0@digicool.com>

Christopher Petrilli wrote:
>
> Neil Schemenauer [nas@arctrix.com] wrote:
> > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote:
> > > Unfortunately, for me, a Python implementation of Sets is only
> > > interesting academically. Any time I've needed to work with them at a
> > > large scale, I've needed them *much* faster than Python could achieve
> > > without a C extension.
> >
> > I think this argues that if sets are added to the core they
> > should be implemented as an extension type with the speed of
> > dictionaries and the memory usage of lists. Basically, we would
> > use the implementation of PyDict but drop the values.
>
> This is effectively the implementation that Zope has for Sets.

Except we use sorted collections with binary search for sets.

I think that a simple hash-based set would make a lot of sense.

> In
> addition we have "buckets" that have scores on them (which are
> implemented as a modified BTree).
>
> Unfortunately Jim Fulton (who wrote all the code for that level) is in
> a meeting, but I hope he'll comment on the implementation that was
> chosen for our software.

We have a number of special needs:

  - Scalability is critical. We make some special optimizations,
    like sets of integers and mapping objects with integer keys
    and values. In these cases, data are stored using C int arrays,
    allowing very efficient data storage and manipulation, especially
    when using integer keys.

  - We need to spread data over multiple database records. Our data
    structures may be hundreds of megabytes in size. We have ZODB-aware
    structures that use multiple independently stored database objects.

  - Range searches are very common, and under some circumstances,
    sorted collections and BTrees can have very little overhead
    compared to dictionaries. For this reason, our mapping objects
    and sets have been based on BTrees and sorted collections.

Unfortunately, our current BTree implementation has a flaw that
causes an excessive number of objects to be updated when items are
added and removed. (Each BTree internal node keeps track of the number
of objects contained in it.) Also, our current sets are limited
to integers and cannot be spread over multiple database records.

We are completing a new BTree implementation that overcomes these
limitations. In this implementation, we will provide sets as
value-less BTrees.
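
To illustrate why a sorted collection suits range searches, here is a minimal in-memory sketch using binary search from the standard bisect module. It is not Zope's implementation: there is no BTree, no C int array and no persistence, and the class name SortedSet and its range() method are made up for illustration.

```python
from bisect import bisect_left, bisect_right


class SortedSet:
    """Set kept as a sorted list; membership and ranges use binary search."""

    def __init__(self, items=()):
        self._items = sorted(set(items))

    def add(self, item):
        i = bisect_left(self._items, item)
        if i == len(self._items) or self._items[i] != item:
            self._items.insert(i, item)  # O(n) shift; lookups stay O(log n)

    def __contains__(self, item):
        i = bisect_left(self._items, item)
        return i < len(self._items) and self._items[i] == item

    def __len__(self):
        return len(self._items)

    def range(self, lo, hi):
        """Return the members m with lo <= m <= hi, in sorted order."""
        start = bisect_left(self._items, lo)
        stop = bisect_right(self._items, hi)
        return self._items[start:stop]


s = SortedSet([5, 1, 9, 3])
s.add(7)
print(s.range(3, 7))  # [3, 5, 7]
```

A hash-based set answers membership faster on average, but it would have to scan every member to answer the same range query.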
Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org From gvwilson@nevex.com Wed Jan 24 14:10:41 2001 From: gvwilson@nevex.com (Greg Wilson) Date: Wed, 24 Jan 2001 09:10:41 -0500 Subject: [Python-Dev] re: sets In-Reply-To: <20010124032401.EB329F199@mail.python.org> Message-ID: <000301c0860f$6fa29010$770a0a0a@nevex.com> 1. I did a poll overnight by email of 22 friends and colleagues, none of whom are regular Python users (yet). My question was, "Would you expect the interface of a set class to be like the interface of a vector or list, or like the interface of a map or hash?" 15 people have replied; all 15 have said, "map or hash". Several respondents are Perl hackers, so I'm sure the answer is influenced by previous exposure to the set-as-valueless-hash idiom. Still, I think 15-0 is a pretty convincing score... Four, unprompted, said that they thought the STL's hierarchy of containers was as good as it gets, and that other languages should mirror it. (One of those added that this makes teaching much simpler --- students can transfer instincts from one language to another.) 2. Is there enough interest in sets for a BOF at IPC9? Please reply to me point-to-point if you're interested; I'll summarize and post the result. I volunteer to bring the donuts... > > Ka-Ping Yee: > > The only change that needs to be made to support sets of immutable > > elements is to provide "in" on dictionaries. The rest is then all > > quite natural: > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > > various: > > [but what about 'value in dict' or '(key, value) in dict'?] > Fredrik Lundh: > let's just say that "in" is the same thing as "has_key", > and be done with it. > Guido van Rossum: > You know, I've long resisted this, but I agree now -- this is the > right thing. Greg Wilson: Woo hoo! 
Now, on a related note, what is the status of the 'indices()' proposal,
as in:

    for i in indices(someList):

instead of:

    for i in range(len(someList)):

Would 'indices(dict)' be the same as 'dict.keys()', to allow uniform
iteration? Or would it be more economical to introduce a 'keys()' method
on lists and tuples, so that:

    for i in collection.keys():

would work on dicts, lists, and tuples? I know that 'keys()' is the
wrong name for lists and tuples, but dicts are already using it, and
it's completely unambiguous...

Thanks,
Greg

From mal@lemburg.com Wed Jan 24 14:46:10 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 24 Jan 2001 15:46:10 +0100
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> <20010123150616.F18796@trump.amber.org> <3A6E9AE8.6C2D3CF0@digicool.com>
Message-ID: <3A6EEAB2.5E6A4E83@lemburg.com>

Jim Fulton wrote:
>
> Christopher Petrilli wrote:
> >
> > Neil Schemenauer [nas@arctrix.com] wrote:
> > > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote:
> > > > Unfortunately, for me, a Python implementation of Sets is only
> > > > interesting academically. Any time I've needed to work with them at a
> > > > large scale, I've needed them *much* faster than Python could achieve
> > > > without a C extension.
> > >
> > > I think this argues that if sets are added to the core they
> > > should be implemented as an extension type with the speed of
> > > dictionaries and the memory usage of lists. Basically, we would
> > > use the implementation of PyDict but drop the values.
> >
> > This is effectively the implementation that Zope has for Sets.
>
> Except we use sorted collections with binary search for sets.
>
> I think that a simple hash-based set would make a lot of sense.
>
> > In
> > addition we have "buckets" that have scores on them (which are
> > implemented as a modified BTree).
> >
> > Unfortunately Jim Fulton (who wrote all the code for that level) is in
> > a meeting, but I hope he'll comment on the implementation that was
> > chosen for our software.
>
> We have a number of special needs:
>
>   - Scalability is critical. We make some special optimizations,
>     like sets of integers and mapping objects with integer keys
>     and values. In these cases, data are stored using C int arrays,
>     allowing very efficient data storage and manipulation, especially
>     when using integer keys.
>
>   - We need to spread data over multiple database records. Our data
>     structures may be hundreds of megabytes in size. We have ZODB-aware
>     structures that use multiple independently stored database objects.
>
>   - Range searches are very common, and under some circumstances,
>     sorted collections and BTrees can have very little overhead
>     compared to dictionaries. For this reason, our mapping objects
>     and sets have been based on BTrees and sorted collections.
>
> Unfortunately, our current BTree implementation has a flaw that
> causes an excessive number of objects to be updated when items are
> added and removed. (Each BTree internal node keeps track of the number
> of objects contained in it.) Also, our current sets are limited
> to integers and cannot be spread over multiple database records.
>
> We are completing a new BTree implementation that overcomes these
> limitations. In this implementation, we will provide sets as
> value-less BTrees.

You may want to check out a soon to be released new mx package:
mxBeeBase. This is an on-disk b+tree implementation which supports
data files up to 2GB on 32-bit platforms.

Here's a preview:

    http://www.lemburg.com/python/mxBeeBase.html

(The links on that page are not functional.)
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip@mojam.com (Skip Montanaro) Wed Jan 24 14:42:23 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 24 Jan 2001 08:42:23 -0600 (CST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: References: Message-ID: <14958.59855.4855.52638@beluga.mojam.com> Tim> Nor is LC_MESSAGES std C (the other LC_XXX guys are). Tim> I pin the blame on Tim> from _locale import * Tim> in locale.py -- who knows what that's supposed to export? Tim> Certainly not Skip . Was that a roundabout way of complimenting me for having found a bug? ;-) Skip From skip@mojam.com (Skip Montanaro) Wed Jan 24 14:50:02 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 24 Jan 2001 08:50:02 -0600 (CST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: References: Message-ID: <14958.60314.482226.825611@beluga.mojam.com> Tim> Nevermind; checked in a hack to stop the error on Windows. Probably should file a bug report (if you haven't already) so the root problem isn't forgotten because the hack obscures it. I see this code in localemodule.c: #ifdef LC_MESSAGES x = PyInt_FromLong(LC_MESSAGES); PyDict_SetItemString(d, "LC_MESSAGES", x); Py_XDECREF(x); #endif /* LC_MESSAGES */ Martin, looks like this module is your baby. Care to hazard a guess about whether LC_MESSAGES should always or never be there? Skip From fredrik@effbot.org Wed Jan 24 15:11:33 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 16:11:33 +0100 Subject: [Python-Dev] test___all__ failing; Windows References: <14958.60314.482226.825611@beluga.mojam.com> Message-ID: <04de01c08617$f56216f0$e46940d5@hagrid> Skip wrote: > Probably should file a bug report (if you haven't already) so the root > problem isn't forgotten because the hack obscures it. 
I see this code in
> localemodule.c:
>
>     #ifdef LC_MESSAGES
>         x = PyInt_FromLong(LC_MESSAGES);
>         PyDict_SetItemString(d, "LC_MESSAGES", x);
>         Py_XDECREF(x);
>     #endif /* LC_MESSAGES */
>
> Martin, looks like this module is your baby. Care to hazard a guess about
> whether LC_MESSAGES should always or never be there?

I think the correct answer is "sometimes":

    ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE,
    LC_MONETARY, LC_NUMERIC, and LC_TIME

    Unix mandates LC_ALL, LC_COLLATE, LC_CTYPE,
    LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and
    LC_TIME

in other words, if it's supported, it should be exposed by
the Python bindings.

Cheers /F

From tismer@tismer.com Wed Jan 24 14:40:04 2001
From: tismer@tismer.com (Christian Tismer)
Date: Wed, 24 Jan 2001 16:40:04 +0200
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz>
Message-ID: <3A6EE944.C8CC6EF7@tismer.com>

Greg Ewing wrote:
>
> Neil Schemenauer :
>
> > Basically, we would
> > use the implementation of PyDict but drop the values.
>
> This could be incorporated into PyDict. Instead of storing keys and
> values in the same array, keep them in separate arrays and only
> allocate the values array the first time someone stores a value other
> than 1.

Very good idea. It fits also in my view of how dicts should be
implemented: Keep keys and values apart, since this information has
different access patterns. I think (or at least hope) that dictionaries
become faster, when hashes, keys and values are in separate areas,
giving more cache hits. Not sure if hashes and keys should be apart,
but sure for values.

ciao - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today?
http://www.stackless.com

From guido@digicool.com Wed Jan 24 15:37:03 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 24 Jan 2001 10:37:03 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4
In-Reply-To: Your message of "Wed, 24 Jan 2001 00:28:21 CST." <14958.30213.325584.373062@beluga.mojam.com>
References: <14957.52952.48739.53360@beluga.mojam.com> <200101240311.WAA06582@cj20424-a.reston1.va.home.com> <14958.30213.325584.373062@beluga.mojam.com>
Message-ID: <200101241537.KAA27039@cj20424-a.reston1.va.home.com>

>     Guido> I think I saw a complaint about this that specifically said that
>     Guido> when dbhash is imported when bsddb can't be imported, an
>     Guido> incomplete dbhash is left behind in sys.modules, and then a
>     Guido> second import of dbhash will succeed -- but of course it will
>     Guido> define no objects.
>
> So it does:
>
>     % ./python
>     Python 2.1a1 (#2, Jan 23 2001, 23:30:41)
>     [GCC 2.95.3 19991030 (prerelease)] on linux2
>     Type "copyright", "credits" or "license" for more information.
>     >>> import dbhash
>     Traceback (most recent call last):
>       File "", line 1, in ?
>       File "/home/beluga/skip/src/python/dist/src/Lib/dbhash.py", line 3, in ?
>         import bsddb
>     ImportError: No module named bsddb
>     >>> import dbhash
>     >>>
>
> Can that be construed as a bug? If import fails, shouldn't the stub module
> that was inserted in sys.modules be removed?

Yep, but not a very important bug -- typically this isn't caught.
Feel free to check in a change; I think you should be able to insert
something like

    import sys
    try:
        import bsddb
    except ImportError:
        del sys.modules[__name__]
        raise

into dbhash. If this works for you in testing, forget the patch
manager, just check it in. (I'm too busy to do much myself, the
company needs me.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From pf@artcom-gmbh.de Wed Jan 24 15:32:55 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 24 Jan 2001 16:32:55 +0100 (MET) Subject: LC_MESSAGES (was Re: [Python-Dev] test___all__ failing; Windows) In-Reply-To: <14958.60314.482226.825611@beluga.mojam.com> from Skip Montanaro at "Jan 24, 2001 8:50: 2 am" Message-ID: Hi, Skip Montanaro: > > Tim> Nevermind; checked in a hack to stop the error on Windows. > > Probably should file a bug report (if you haven't already) so the root > problem isn't forgotten because the hack obscures it. I see this code in > localemodule.c: > > #ifdef LC_MESSAGES > x = PyInt_FromLong(LC_MESSAGES); > PyDict_SetItemString(d, "LC_MESSAGES", x); > Py_XDECREF(x); > #endif /* LC_MESSAGES */ > > Martin, looks like this module is your baby. Care to hazard a guess about > whether LC_MESSAGES should always or never be there? AFAI found out, LC_MESSAGES was added to the POSIX "standard" in Posix.2. Non-posix2 compatible systems probably miss the proper functionality behind 'setlocale()'. So the best solution would be to add a clever emulation/approximation of this feature, if the underlying platform (here windows) doesn't provide it. This would require to wrap 'setlocale()'. But I'm not sure how to emulate for example 'setlocale(LC_MESSAGES, 'DE_de') on a Windows box. May be it is impossible to achieve. What I would love to see is that the typical query 'setlocale(LC_MESSAGES)' would return 'DE_de' on a Box running for example the german version of Windows or MacOS. This would eliminate the need for ugly language selection menus on these platforms in a portable fashion. Regards, Peter From guido@digicool.com Wed Jan 24 15:41:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 24 Jan 2001 10:41:07 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Wed, 24 Jan 2001 16:07:08 +0200." 
<20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> Message-ID: <200101241541.KAA27082@cj20424-a.reston1.va.home.com> > > This could be incorporated into PyDict. Instead of storing keys and > > values in the same array, keep them in separate arrays and only > > allocate the values array the first time someone stores a value other > > than 1. > > Cool idea, but even cooler (would catch more idioms, that is) is > "the first time someone stores something not 'is' something in the ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > dict, allocate the values array". This would catch small numbers, > None and identifier-looking strings, for the measly cost of one > pointer/dict object. Sorry, but I don't understand what you mean by the ^^^ marked phrase. Can you please elaborate? Regarding storing one for "present", that's all well and fine, but it suggests to me that storing a false value could mean "not present". Do we really want that? --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez@zadka.site.co.il Thu Jan 25 00:50:13 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 02:50:13 +0200 (IST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101241541.KAA27082@cj20424-a.reston1.va.home.com> References: <200101241541.KAA27082@cj20424-a.reston1.va.home.com>, <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> Message-ID: <20010125005013.58C12A840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 10:41:07 -0500, Guido van Rossum wrote: > > Cool idea, but even cooler (would catch more idioms, that is) is > > "the first time someone stores something not 'is' something in the > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > dict, allocate the values array". 
This would catch small numbers,
> > None and identifier-looking strings, for the measly cost of one
> > pointer/dict object.
>
> Sorry, but I don't understand what you mean by the ^^^ marked phrase.
> Can you please elaborate?

I should really stop writing incomprehensible bits like that. Heck, I can't even understand it on second reading.

I meant that the dictionary would keep a slot for "the one and only value". First time someone puts a value in the dict, it puts it in the "one and only value" slot, and doesn't initialize the value array. The second time someone puts a value, it checks for pointer equality with that "one and only value". If it is the same, it still doesn't initialize the value array. The only time when the dictionary initializes the value array is when two pointer-different values are put in.

This would let me code

    a[key] = None

For my sets (but consistent in the same set!)

    a[key] = 1

When the timbot codes (again, consistent in the same set)

and

    a[key] = 'present'

If you're really weird.

(identifier-like strings get interned)

That's not *semantics*, that's *optimization* for a commonly used (I think) idiom with dictionaries -- you can't predict the value, but it will probably remain the same.

--
Moshe Zadka
This is a signature anti-virus. Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From skip@mojam.com (Skip Montanaro) Wed Jan 24 16:44:17 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 24 Jan 2001 10:44:17 -0600 (CST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <04de01c08617$f56216f0$e46940d5@hagrid> References: <14958.60314.482226.825611@beluga.mojam.com> <04de01c08617$f56216f0$e46940d5@hagrid> Message-ID: <14959.1633.163407.779930@beluga.mojam.com> Fredrik> I think the correct answer is "sometimes": Fredrik> ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, Fredrik> LC_MONETARY, LC_NUMERIC, and LC_TIME Fredrik> Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, Fredrik> LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and Fredrik> LC_TIME Fredrik> in other words, if it's supported, it should be exposed by Fredrik> the Python bindings. Then this suggests that either Tim's hack is the correct fix (leave it out because we can't rely on it always being there) or I should add it to __all__ at the bottom of the file if and only if it's present in the module's namespace. Skip From moshez@zadka.site.co.il Thu Jan 25 00:57:22 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 02:57:22 +0200 (IST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <04de01c08617$f56216f0$e46940d5@hagrid> References: <04de01c08617$f56216f0$e46940d5@hagrid>, <14958.60314.482226.825611@beluga.mojam.com> Message-ID: <20010125005722.D2229A840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 16:11:33 +0100, "Fredrik Lundh" wrote: > I think the correct answer is "sometimes": > > ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, > LC_MONETARY, LC_NUMERIC, and LC_TIME > > Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, > LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and > LC_TIME > > in other words, if it's supported, it should be exposed by > the Python bindings. In that case, the __all__ attribute in the module has to be calculated dynamically. 
Say, adding code like

    try:
        LC_MESSAGES
    except NameError:
        pass
    else:
        __all__.append('LC_MESSAGES')

Ditto for anything else.

Should I check in a patch?

--
Moshe Zadka
This is a signature anti-virus. Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From trentm@ActiveState.com Wed Jan 24 16:49:17 2001
From: trentm@ActiveState.com (Trent Mick)
Date: Wed, 24 Jan 2001 08:49:17 -0800
Subject: [Python-Dev] webbrowser.py
In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 03:28:34AM -0500
References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>
Message-ID: <20010124084917.C29977@ActiveState.com>

How will the expected adherence of apps to BROWSER jive with the current (and poorly understood by me) Windows convention of specifying the "default" browser somewhere in the registry?

Trent

--
Trent Mick
TrentM@ActiveState.com

From skip@mojam.com (Skip Montanaro) Wed Jan 24 16:49:23 2001
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Wed, 24 Jan 2001 10:49:23 -0600 (CST)
Subject: [Python-Dev] test___all__ failing; Windows
In-Reply-To: <20010125005722.D2229A840@darjeeling.zadka.site.co.il>
References: <04de01c08617$f56216f0$e46940d5@hagrid> <14958.60314.482226.825611@beluga.mojam.com> <20010125005722.D2229A840@darjeeling.zadka.site.co.il>
Message-ID: <14959.1939.398029.896891@beluga.mojam.com>

    Moshe> In that case, the __all__ attribute in the module has to be
    Moshe> calculated dynamically. Say, adding code like

No need. I've already got this exact change in my local copy and I'll be adding a few more __all__ lists later today.

Skip

From paulp@ActiveState.com Wed Jan 24 16:56:26 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 24 Jan 2001 08:56:26 -0800
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> <200101241541.KAA27082@cj20424-a.reston1.va.home.com>
Message-ID: <3A6F093A.A311C71E@ActiveState.com>

Guido van Rossum wrote:
> >...
>
> > Cool idea, but even cooler (would catch more idioms, that is) is
> > "the first time someone stores something not 'is' something in the
>
> Sorry, but I don't understand what you mean by the ^^^ marked phrase.
> Can you please elaborate?

I wasn't clear about that either. The idea is:

    def add(new_value):
        if not values_array:
            if self.magic_value is NULL:
                self.magic_value = new_value
            elif new_value is not self.magic_value:
                self.values_array=[self.magic_value, new_value, ... ]
            else:
                # new_value is self.magic_value: do nothing

I am neutral on this proposal myself. I think that even if we optimize any code where you pass the same thing over and over again, we should document a convention for consistency. So I'm not sure there is much advantage.

Paul Prescod

From esr@thyrsus.com Wed Jan 24 16:53:31 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 24 Jan 2001 11:53:31 -0500
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010124084917.C29977@ActiveState.com>; from trentm@ActiveState.com on Wed, Jan 24, 2001 at 08:49:17AM -0800
References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com>
Message-ID: <20010124115331.A15059@thyrsus.com>

Trent Mick :
> How will the expected adherence of apps to BROWSER jive with the current (and
> poorly understood by me) Windows convention of specifying the "default"
> browser somewhere in the registry?

BROWSER overrides the registry setting. Which is OK; under Windows, only wizards are going to muck with it.
--
Eric S. Raymond

Ideology, politics and journalism, which luxuriate in failure, are impotent in the face of hope and joy. -- P. J.
O'Rourke From guido@digicool.com Wed Jan 24 16:59:00 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 24 Jan 2001 11:59:00 -0500 Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: Your message of "Wed, 24 Jan 2001 10:44:17 CST." <14959.1633.163407.779930@beluga.mojam.com> References: <14958.60314.482226.825611@beluga.mojam.com> <04de01c08617$f56216f0$e46940d5@hagrid> <14959.1633.163407.779930@beluga.mojam.com> Message-ID: <200101241659.LAA27650@cj20424-a.reston1.va.home.com> > Fredrik> I think the correct answer is "sometimes": > > Fredrik> ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, > Fredrik> LC_MONETARY, LC_NUMERIC, and LC_TIME > > Fredrik> Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, > Fredrik> LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and > Fredrik> LC_TIME > > Fredrik> in other words, if it's supported, it should be exposed by > Fredrik> the Python bindings. > > Then this suggests that either Tim's hack is the correct fix (leave it out > because we can't rely on it always being there) or I should add it to > __all__ at the bottom of the file if and only if it's present in the > module's namespace. The latter. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez@zadka.site.co.il Thu Jan 25 17:05:44 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 19:05:44 +0200 (IST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124084917.C29977@ActiveState.com> References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 08:49:17 -0800, Trent Mick wrote: > How will the expected adherence of apps to BROWSER jive with the current (and > poorly understood by me) Windows convention of specifying the "default" > browser somewhere in the registry? The "webbrowser" module should prefer to take the setting from the registry on windows. 
--
Moshe Zadka
This is a signature anti-virus. Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From guido@digicool.com Wed Jan 24 17:17:09 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 24 Jan 2001 12:17:09 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: Your message of "Thu, 25 Jan 2001 02:50:13 +0200." <20010125005013.58C12A840@darjeeling.zadka.site.co.il>
References: <200101241541.KAA27082@cj20424-a.reston1.va.home.com>, <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> <20010125005013.58C12A840@darjeeling.zadka.site.co.il>
Message-ID: <200101241717.MAA27852@cj20424-a.reston1.va.home.com>

> I meant that the dictionary would keep a slot for "the one and only
> value". First time someone puts a value in the dict, it puts it
> in the "one and only value" slot, and doesn't initialize the value
> array. The second time someone puts a value, it checks for pointer
> equality with that "one and only value". If it is the same, it
> still doesn't initialize the value array. The only time when
> the dictionary initializes the value array is when two pointer-different
> values are put in.
>
> This would let me code
>
>     a[key] = None
>
> For my sets (but consistent in the same set!)
>
>     a[key] = 1
>
> When the timbot codes (again, consistent in the same set)
>
> and
>
>     a[key] = 'present'
>
> If you're really weird.
>
> (identifier-like strings get interned)
>
> That's not *semantics*, that's *optimization* for a commonly
> used (I think) idiom with dictionaries -- you can't predict
> the value, but it will probably remain the same.

This I like! But note that a dict currently uses 12 bytes per slot in the hash table (on a 32-bit platform: long me_hash; PyObject *me_key, *me_value). The hash table's fill factor is typically between 50 and 67%.
I think removing the hashes would slow down lookups too much, so optimizing identical values out would only save 6-8 bytes per existing key on average. Not clear if it's worth enough. I think I have to agree with Tim's expectation that two (or three) separate parallel arrays will reduce the cache locality and thus slow things down. Once you start probing, you jump through the hashtable at large random strides, causing bad cache performance (for largeish hash tables); but since often enough the first slot tried is right, you have the hash, key and value right next together, typically on the same cache line. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Wed Jan 24 17:31:55 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 12:31:55 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 25, 2001 at 07:05:44PM +0200 References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> Message-ID: <20010124123155.A15203@thyrsus.com> Moshe Zadka : > > How will the expected adherence of apps to BROWSER jive with the > > current (and poorly understood by me) Windows convention of > > specifying the "default" browser somewhere in the registry? > > The "webbrowser" module should prefer to take the setting from the > registry on windows. Um, that's not the way it works right now. The windows-default browser choice launches the registered default browser, but BROWSER may have something else in its search list first. -- Eric S. Raymond The real point of audits is to instill fear, not to extract revenue; the IRS aims at winning through intimidation and (thereby) getting maximum voluntary compliance -- Paul Strassel, former IRS Headquarters Agent Wall St. 
Journal 1980 From esr@thyrsus.com Wed Jan 24 17:52:11 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 12:52:11 -0500 Subject: [Python-Dev] BROWSER status Message-ID: <20010124125211.A15276@thyrsus.com> I spent the morning writing and testing patches to make urlview and GNU Emacs BROWSER-aware, and have sent them off to the relevant maintainers. I've also sent a patch to Andries Brouwer for the environ(5) man page. Those of you interested in my latest bit of social engineering can take a look at http://www.tuxedo.org/~esr/BROWSER/ A bow in Guido's direction -- if he hadn't been grouchy about this I probably wouldn't have gotten to shipping those patches for a while. -- Eric S. Raymond A right is not what someone gives you; it's what no one can take from you. -- Ramsey Clark From thomas@xs4all.net Wed Jan 24 18:33:27 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 24 Jan 2001 19:33:27 +0100 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 25, 2001 at 07:05:44PM +0200 References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> Message-ID: <20010124193326.B962@xs4all.nl> On Thu, Jan 25, 2001 at 07:05:44PM +0200, Moshe Zadka wrote: > On Wed, 24 Jan 2001 08:49:17 -0800, Trent Mick wrote: > > > How will the expected adherence of apps to BROWSER jive with the current (and > > poorly understood by me) Windows convention of specifying the "default" > > browser somewhere in the registry? > The "webbrowser" module should prefer to take the setting from the > registry on windows. Why ? That's a lot harder to change, and not settable per 'shell'/'thread'/'process'. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
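
The search order discussed in this thread (entries from BROWSER first, then the platform's registered default) can be sketched as follows. This is an illustration only: `browser_candidates` and its parameters are hypothetical names, not the webbrowser module's API, and the ':' separator is an assumption based on Unix-style search lists.

```python
def browser_candidates(environ, platform_default="windows-default", sep=":"):
    """Return the browser names to try, in order.

    Entries from BROWSER come first; the platform default is appended
    as a last resort unless the user already listed it.
    """
    raw = environ.get("BROWSER", "")
    names = [name.strip() for name in raw.split(sep) if name.strip()]
    if platform_default not in names:
        names.append(platform_default)
    return names


print(browser_candidates({"BROWSER": "netscape:lynx"}))
print(browser_candidates({}))
```

With no BROWSER set, only the platform default remains, which matches the behaviour Windows users would expect.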
From tim.one@home.com Wed Jan 24 19:54:47 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 14:54:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124115331.A15059@thyrsus.com> Message-ID: Guys, while I like BROWSER, don't think it has anything to do with Windows! Windows is not Unix; doesn't have PAGER or EDITOR either; and, in general, use of envars is an abomination under Windows. The old webbrowser.py uses the Windows-specific os.startfile(url) because that's the *right* way to do it on Windows, wizard or not. And you would have to be a Windows wizard to succeed in launching a browser under Windows in any other way anyway. You may as well try to sell the notion that, on Unix, Python should maintain a dict mapping file extensions to the user's preferred ways of opening such files <0.9 wink>. From tim.one@home.com Wed Jan 24 19:56:32 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 14:56:32 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124193326.B962@xs4all.nl> Message-ID: >> The "webbrowser" module should prefer to take the setting from the >> registry on windows. > Why ? That's a lot harder to change, and not settable per > 'shell'/'thread'/'process'. A Windows user has a legitimate expectation that *every* time an .html file is opened, it will come up in their browser of choice. That choice is made via the registry, and this is how *all* apps work under Windows. Ditto for .htm files (and that may be a different browser than is used for .html files, but again the user has set up their registry to do what *they* want done with it). It's not supposed to be easy to change; it is supposed to be consistent. Using a different browser per shell/thread/process is a foreign concept; it's also a useless concept on Windows <0.5 wink>. 
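
Tim's point about os.startfile can be sketched like this. os.startfile is a real, Windows-only function; the `open_url` wrapper and its injectable `startfile` and `fallback` parameters are hypothetical, added only so the dispatch can be exercised on any platform.

```python
import os
import sys
import webbrowser


def open_url(url, platform=sys.platform, startfile=None, fallback=webbrowser.open):
    """Open url the way the host platform expects.

    On Windows, hand the URL to the shell via os.startfile, which
    consults the user's registered association. Elsewhere, defer to
    `fallback`. Returns the name of the path taken, for easy testing.
    """
    if platform.startswith("win"):
        if startfile is None:
            startfile = os.startfile  # exists only on Windows
        startfile(url)
        return "startfile"
    fallback(url)
    return "fallback"
```

Passing a list's `append` method as the launcher is enough to check which branch ran without actually starting a browser.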
From tim.one@home.com Wed Jan 24 20:32:35 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 15:32:35 -0500 Subject: LC_MESSAGES (was Re: [Python-Dev] test___all__ failing; Windows) In-Reply-To: Message-ID: [Peter Funk] > ... > AFAI found out, LC_MESSAGES was added to the POSIX "standard" in Posix.2. FYI, it appears that C99 declined to adopt this extension to C89, but don't know why (the C99 Rationale doesn't mention it). That means the vendors who don't already support it can (well, *will*) use the new C99 std as "a reason" to continue leaving it out. From tim.one@home.com Wed Jan 24 20:15:28 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 15:15:28 -0500 Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <14959.1633.163407.779930@beluga.mojam.com> Message-ID: [Skip] > Then this suggests that either Tim's hack is the correct fix (leave it out > because we can't rely on it always being there) or I should add it to > __all__ at the bottom of the file if and only if it's present in the > module's namespace. What you suggest at the end *is* the hack I checked in. That is, it's already done. The existence of LC_MESSAGES is clearly platform-specific; if anyone can say for sure a priori *which* platforms it's available on, tell Fred Drake so he can update the docs accordingly. From skip@mojam.com (Skip Montanaro) Wed Jan 24 21:25:45 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 24 Jan 2001 15:25:45 -0600 (CST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124123155.A15203@thyrsus.com> References: <20010124084917.C29977@ActiveState.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> <20010124123155.A15203@thyrsus.com> Message-ID: <14959.18521.648454.488731@beluga.mojam.com> >>>>> "Eric" == Eric S Raymond writes: Moshe Zadka : >> The "webbrowser" module should prefer to take the setting from the >> registry on windows. 
Eric> Um, that's not the way it works right now. The windows-default Eric> browser choice launches the registered default browser, but Eric> BROWSER may have something else in its search list first. Why not have a special REGISTRY token you can place in the BROWSER path to tell it when to consult the registry? On non-Windows platforms it can simply be ignored: BROWSER=netscape:REGISTRY:explorer Skip From esr@thyrsus.com Wed Jan 24 21:30:44 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 16:30:44 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <14959.18521.648454.488731@beluga.mojam.com>; from skip@mojam.com on Wed, Jan 24, 2001 at 03:25:45PM -0600 References: <20010124084917.C29977@ActiveState.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> <20010124123155.A15203@thyrsus.com> <14959.18521.648454.488731@beluga.mojam.com> Message-ID: <20010124163044.A15877@thyrsus.com> Skip Montanaro : > Why not have a special REGISTRY token you can place in the BROWSER path to > tell it when to consult the registry? On non-Windows platforms it can > simply be ignored: In effect, windows-default is that special token. -- Eric S. Raymond The Bible is not my book, and Christianity is not my religion. I could never give assent to the long, complicated statements of Christian dogma. -- Abraham Lincoln From martin@mira.cs.tu-berlin.de Wed Jan 24 21:41:11 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 24 Jan 2001 22:41:11 +0100 Subject: [Python-Dev] Tkinter documentation (Was: What does "batteries are included" mean?) Message-ID: <200101242141.f0OLfBT01812@mira.informatik.hu-berlin.de> > It's already a blot on Python that the standard documentation set > doesn't cover Tkinter. Just point your friendly web browser to Ping's HTML generator and ask for Tkinter, or invoke "pydoc.py Tkinter". 
[I wouldn't have brought this up if it hadn't been the contribution of my friend Nils Fischbeck:-] Regards, Martin From nas@arctrix.com Wed Jan 24 15:31:55 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Wed, 24 Jan 2001 07:31:55 -0800 Subject: [Python-Dev] Makefile changes Message-ID: <20010124073155.B32266@glacier.fnational.com> I've checked in my new makefile. Hopefully everything goes well. The following files are no longer used so please don't patch them: Grammar/Makefile.in Include/Makefile Lib/Makefile Modules/Makefile.pre.in Objects/Makefile.in Parser/Makefile.in Python/Makefile.in Makefile.in They will be removed in a few days assuming all goes well. You should re-run configure to use the new makefile. I would appreciate it if people using platforms other than Linux and GNU make could give me some feedback on the build process. Does configure and make work okay? Does "make test" and "make install" work? Thanks. Neil From greg@cosc.canterbury.ac.nz Wed Jan 24 22:55:00 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Jan 2001 11:55:00 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101240354.WAA06903@cj20424-a.reston1.va.home.com> Message-ID: <200101242255.LAA02208@s454.cosc.canterbury.ac.nz> Guido: > But shouldn't the default value be something else, > like none? It should really be whatever is the first value that gets stored after the dict is created. That way people can use whatever they want for their dummy value and it will Just Work. And it will probably catch most existing uses of a dict as a set as well. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+

From ping@lfw.org Wed Jan 24 20:33:43 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Wed, 24 Jan 2001 12:33:43 -0800 (PST)
Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm!
Message-ID:

Hi -- after updating my CVS tree today with Python 2.1a1, i ran the tests and test_inspect failed. This revealed that the format of code.co_varnames has changed. At first i tried to update the inspect.py module to check the Python version number and track the change, but now i believe this is actually symptomatic of a real interpreter problem.

Consider the function:

    def f(a, (b, c), *d):
        x = 1
        print a, b, c, d, x

Whereas in Python 1.5.2:

    f.func_code.co_argcount = 2
    f.func_code.co_nlocals = 6
    f.func_code.co_names = ('x', 'a', 'b', 'c', 'd')
    f.func_code.co_varnames = ('a', '.2', 'd', 'b', 'c', 'x')

In Python 2.1a1:

    f.func_code.co_argcount = 2
    f.func_code.co_nlocals = 6
    f.func_code.co_names = ('b', 'c', 'x', 'a', 'd')
    f.func_code.co_varnames = ('a', '.2', 'b', 'c', 'd', 'x')

Notice how the ordering of the variable names has changed. I went and looked at the CO_VARARGS clause in eval_code2 to see if it put the varargs and kwdict arguments in different slots, but it appears unchanged! It still puts varargs at locals[co_argcount] and kwdict at locals[co_argcount + 1].

Please try:

    >>> def f(a, (b, c), *d):
    ...     x = 1
    ...     print a, b, c, d, x
    ...
    >>> f(1, (2, 3), 4)
    1 2 3
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 3, in f
    UnboundLocalError: local variable 'd' referenced before assignment
    >>>

In Python 1.5.2, this prints "1 2 3 (4,)" as expected.

I only have 1.5.2 and 2.1a1 to test. I hope this problem isn't present in 2.0...

Note that test_inspect was the only test to fail! It might be the only test that checks anonymous and *varargs at the same time. (Yet another reason to put inspect in the core...)
I did recently check in additions to test_extcall that made the test much beefier -- but that only tested combinations of regular, keyword, varargs, and kwdict arguments; it neglected to test anonymous (tuple) arguments as well. -- ?!ng From tim.one@home.com Wed Jan 24 23:56:25 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 18:56:25 -0500 Subject: [Python-Dev] Re: test___all__ failing; Windows Message-ID: > In that case, the __all__ attribute in the module has to be calculated > dynamically. Say, adding code like > > try: > LC_MESSAGES > except NameError: > pass > else: > __all__.append('LC_MESSAGES') > > Ditto for anything else. > > Should I check in a patch? SourceForge CVS doesn't appear to be broken, so I can only conclude everyone decided this was a bad to stop taking drugs <0.9 wink>. From tim.one@home.com Thu Jan 25 00:04:50 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 19:04:50 -0500 Subject: [Python-Dev] (no subject) Message-ID: [Skip] > Why not have a special REGISTRY token you can place in the BROWSER > path to tell it when to consult the registry? On non-Windows > platforms it can simply be ignored: > > BROWSER=netscape:REGISTRY:explorer Because non-Windows platforms shouldn't be bothered with Windows silliness any more than Windows users should be bothered with Unix silliness. BROWSER isn't of any use on Windows, and REGISTRY isn't of any use on Unix. Eric may still *think* BROWSER is of use on Windows, but if so that's not really a technical problem . 
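
The "Ditto for anything else" part of the dynamic `__all__` idea quoted above generalizes to a loop. A minimal sketch, assuming a helper named `exported_names` (hypothetical) and using stand-in namespace objects with made-up constant values rather than the real locale module:

```python
from types import SimpleNamespace


def exported_names(namespace, required, optional):
    """Build an __all__ list: every required name, plus each optional
    name that the platform actually defined in `namespace`."""
    names = list(required)
    names.extend(name for name in optional if hasattr(namespace, name))
    return names


# Stand-ins for a platform with and without LC_MESSAGES (values are arbitrary).
posix_like = SimpleNamespace(LC_ALL=6, LC_MESSAGES=5)
windows_like = SimpleNamespace(LC_ALL=0)

print(exported_names(posix_like, ["LC_ALL"], ["LC_MESSAGES"]))
print(exported_names(windows_like, ["LC_ALL"], ["LC_MESSAGES"]))
```

The design choice is the one Guido endorsed in the thread: a name is exported if and only if it is present in the module's namespace.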
From thomas@xs4all.net Thu Jan 25 00:25:54 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 25 Jan 2001 01:25:54 +0100 Subject: [Python-Dev] Makefile changes In-Reply-To: <20010124073155.B32266@glacier.fnational.com>; from nas@arctrix.com on Wed, Jan 24, 2001 at 07:31:55AM -0800 References: <20010124073155.B32266@glacier.fnational.com> Message-ID: <20010125012554.F962@xs4all.nl> On Wed, Jan 24, 2001 at 07:31:55AM -0800, Neil Schemenauer wrote: > I would appreciate it if people using platforms other than Linux > and GNU make could give me some feedback on the build process. > Does configure and make work okay? Does "make test" and "make > install" work? Thanks. Only have time for a quick check now, and no time what so ever tomorrow, but at first glance, it looks okay (read: it compiles Python) on BSDI 4.0.1, BSDI 4.1 and FreeBSD 4.2. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From esr@thyrsus.com Thu Jan 25 00:15:10 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 19:15:10 -0500 Subject: [Python-Dev] (no subject) In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 07:04:50PM -0500 References: Message-ID: <20010124191510.A17782@thyrsus.com> Tim Peters : > Because non-Windows platforms shouldn't be bothered with Windows silliness > any more than Windows users should be bothered with Unix silliness. BROWSER > isn't of any use on Windows, and REGISTRY isn't of any use on Unix. Eric > may still *think* BROWSER is of use on Windows, but if so that's not really > a technical problem . Actually that's not something I have an opinion on. I addressed the original question because I know it would be technically possible to set a BROWSER variable under Windows. Yes, an unlikely move, but possible. -- Eric S. 
Raymond A man who has nothing which he is willing to fight for, nothing which he cares about more than he does about his personal safety, is a miserable creature who has no chance of being free, unless made and kept so by the exertions of better men than himself. -- John Stuart Mill, writing on the U.S. Civil War in 1862 From tim.one@home.com Thu Jan 25 04:38:54 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 24 Jan 2001 23:38:54 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <3A6EE944.C8CC6EF7@tismer.com> Message-ID: [Christian Tismer] > ... > Not sure if hashes and keys should be apart, but > sure for values. How so? That is, under what assumptions? Any savings from separation would appear to require that I look up keys a lot more than I access the associated values; while trivially true for dicts used as sets, it seems dubious to me for use of dicts as mappings (count[word] += 1, etc). From Jason.Tishler@dothill.com Thu Jan 25 06:09:47 2001 From: Jason.Tishler@dothill.com (Jason Tishler) Date: Thu, 25 Jan 2001 01:09:47 -0500 Subject: [Python-Dev] Re: Python 2.1 alpha 1 released! In-Reply-To: <200101230333.WAA28376@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 10:33:02PM -0500 References: <200101230333.WAA28376@cj20424-a.reston1.va.home.com> Message-ID: <20010125010947.M1256@dothill.com> On Mon, Jan 22, 2001 at 10:33:02PM -0500, Guido van Rossum wrote: > - Python should now build out of the box on Cygwin. If it doesn't, > mail to Jason Tishler (jlt63 at users.sourceforge.net). Although Python CVS built OOTB under Cygwin until 2001/01/17 18:54:54, Python 2.1a1 needs a small patch in order to build cleanly under Cygwin. If interested, please see the following for details: http://www.cygwin.com/ml/cygwin-apps/2001-01/msg00019.html Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: +1 (732) 264-8770 x235 Dot Hill Systems Corp. 
Fax: +1 (732) 264-8798 82 Bethany Road, Suite 7 Email: Jason.Tishler@dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com From tim.one@home.com Thu Jan 25 07:29:19 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 25 Jan 2001 02:29:19 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101231549.KAA05172@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > It's no big deal if the Vaults contain three or more set modules -- > perfect even, people can choose the best one for their purpose. They really can't, not realistically, unless all the modules in question conform to the same interface (which users can't control), and users restrict themselves to methods defined only in the interface (which users can control). The problem is that "their purpose" changes over time, and in some cases the effects of representation on performance simply can't be out-guessed in advance of actual measurement. If people need to change any more than just the import statement, *then* a single implementation has to be all things to all people. I hate to say this (bet ?), but I suspect the fact that Python's basic types are all builtin and not classes has kept us from fully appreciating the class-based "1 interface, N implementations" approach that C++ and Java hackers are having so much fun with. They're not all that easy to find, but people who have climbed the steep STL learning curve often end up in the same ecstatic trance I used to see only among fellow Pythoneers. > But in the core, there's only room for one set type or module. I don't like the conclusion: it implies there's no room in the core for more than one implementation of anything, yet one-size-fits-all doesn't. I have no problem with the idea that there's only room for one Set *interface* in the core. 
Then you only need to Pronounce on a reasonable set of abstract operations, and leave the implementation tradeoffs to be made by different people in different ways (I've really got no use for Eric's list-based sets; he's really got no use for my sets-of-sets).

That said, if there can be at most one, and must be at least one, a hashtable-based set is the best compromise there is, and mutable objects as elements should not be supported (they add great implementation complexity for the benefit of relatively few applications).

jeremy's-set-class-couldn't-be-accused-of-overkill-ly y'rs - tim

From tim.one@home.com Thu Jan 25 07:57:18 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 25 Jan 2001 02:57:18 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010123113050.A26162@thyrsus.com>
Message-ID:

[Eric S. Raymond]
> ...
> What you get by going with a dictionary representation is that
> membership test becomes close to constant-time, while insertion and
> deletion become sometimes cheap and sometimes quite expensive
> (depending of course on whether you have to allocate a new
> hash bucket).

Note that Python's dicts aren't vulnerable to that: they use open addressing in a contiguous, preallocated vector. There are no mallocs() or free()s going on for lookups, deletes, or inserts, unless an insert happens to hit a "time to double the size of the vector" boundary. Deletes never cost more than a lookup; inserts never more unless the table-size boundary is hit (one in 2**N unique inserts, at which point N goes up too).

> ...
> "works for everybody" isn't really possible here. So my solution
> does the next best thing -- pick a choice of tradeoffs that isn't
> obviously worse than the alternatives and keeps things bog-simple.
I agree that this shouldn't be an either/or choice, but if it's going to be forced into that mold I have to protest that the performance of unordered lists would kill most of the set applications I've ever had. I typically have a small number of very large sets (and I'm talking not 100s, but often 100s of 1000s of elements). The relatively large memory burden of a dict representation wouldn't bother me unless I instead had 100s of 1000s of very small sets.

which-we-may-happen-in-my-next-life-but-not-in-this-one-ly y'rs - tim

From tim.one@home.com Thu Jan 25 08:08:30 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 25 Jan 2001 03:08:30 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <001101c0857a$c0dce420$770a0a0a@nevex.com>
Message-ID:

[Greg Wilson]
> ...
> Unfortunately, if values are required to be immutable, then sets of
> sets aren't possible... :-(

Sure they are. I wrote about how before, and Moshe put up a simple implementation as a SourceForge patch. Not bulletproof, though: "consenting adults". No matter *what* you implement, I'll find *some* way to trick it into believing my sets are immutable, so don't worry about that. Bulletproof is very hard, and is a minority distraction at best.

IIRC, SETL had "by value" semantics when inserting a set into another set as an element, and had some exceedingly hairy copy-on-write scheme under the covers to make that bearably quick. That may be wrong, though. Herman Venter's Slim (Sets, Lists and Maps) language does work that way (Guido, Herman was a friend of the departed Stoffel Erasmus, who you may recall fondly from Python's very early days -- if *that* doesn't make sets attractive to you, nothing will).

Ah! Meant to post this before:

    http://birch.eecs.lehigh.edu/~bacon/setlprog.ps.gz

That's a readable and very good intro to SETL Classic. People pondering computerized sets should at least catch up with what was common knowledge 30 years ago.
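
The "by value" idea above (a set inserted into another set as an immutable element) can be sketched roughly as follows. This is only an illustration under stated assumptions: it relies on the builtin frozenset type of much later Pythons, which did not exist when this thread was written, and add_set is a hypothetical helper name. It is not Moshe's SourceForge patch and makes no attempt at SETL-style copy-on-write.

```python
# Illustration only: frozenset is a modern builtin, not part of the
# Python discussed in this thread; add_set is a hypothetical helper.

def add_set(outer, inner):
    """Insert an immutable, by-value snapshot of `inner` into `outer`."""
    outer.add(frozenset(inner))


outer = set()
inner = {1, 2, 3}
add_set(outer, inner)

# Mutating the original afterwards does not disturb the stored snapshot.
inner.add(4)

print(frozenset({1, 2, 3}) in outer)   # membership is by value
print(frozenset(inner) in outer)       # the mutated set was never stored
```

The snapshot is hashable because it can no longer change, which is what lets a hashtable-based set hold it. The cost is a full copy on every insertion.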
From thomas@xs4all.net Thu Jan 25 09:24:24 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 25 Jan 2001 10:24:24 +0100 Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm! In-Reply-To: ; from ping@lfw.org on Wed, Jan 24, 2001 at 12:33:43PM -0800 References: Message-ID: <20010125102424.G962@xs4all.nl> On Wed, Jan 24, 2001 at 12:33:43PM -0800, Ka-Ping Yee wrote: > Please try: > >>> def f(a, (b, c), *d): > ... x = 1 > ... print a, b, c, d, x > ... > >>> f(1, (2, 3), 4) > 1 2 3 > Traceback (most recent call last): > File "", line 1, in ? > File "", line 3, in f > UnboundLocalError: local variable 'd' referenced before assignment > >>> > In Python 1.5.2, this prints "1 2 3 (4,)" as expected. > I only have 1.5.2 and 2.1a1 to test. I hope this problem > isn't present in 2.0... It isn't present in 2.0. This is probably related to Jeremy's changes in the call mechanism or the compiler track, though Jeremy himself is the best person to claim that for sure :) > Note that test_inspect was the only test to fail! It might be the > only test that checks anonymous and *varargs at the same time. > (Yet another reason to put inspect in the core...) Well, this is not an inspect-specific test, so it shouldn't *be* in test_inspect, it should be in test_extcall :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik@effbot.org Thu Jan 25 09:45:31 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Thu, 25 Jan 2001 10:45:31 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/lib libwinsound.tex,1.5,1.6 References: Message-ID: <003801c086b3$8ff41560$e46940d5@hagrid> tim accidentally wrote: > \versionadded{1.5.3} % XXX fix this version number when release is scheduled! 1.5.3? time for a 1.5.3 => 1.6 query replace? 
> fgrep 1.5.3 doc/*/*.tex doc/lib/libcmp.tex:\deprecated{1.5.3}{Use the \module{filecmp} module inste doc/lib/libcmpcache.tex:\deprecated{1.5.3}{Use the \module{filecmp} module ad.} doc/lib/libwinsound.tex: \versionadded{1.5.3} % XXX fix this version number or am I missing something? Cheers /F From tim.one@home.com Thu Jan 25 11:20:18 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 25 Jan 2001 06:20:18 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/lib libwinsound.tex,1.5,1.6 In-Reply-To: <003801c086b3$8ff41560$e46940d5@hagrid> Message-ID: Gotta ask Fred about this one! > or am I missing something? Yes, the Python 1.5.3 release. I use it all the time . From tismer@tismer.com Thu Jan 25 12:22:32 2001 From: tismer@tismer.com (Christian Tismer) Date: Thu, 25 Jan 2001 14:22:32 +0200 Subject: [Python-Dev] Intended to work? (lambda x,y:map(eval, ["x", "y"]))(2,3) Message-ID: <3A701A88.F2C68635@tismer.com> In a function like this: def f(x): return eval("x") , eval uses the local function namespace, and the above works. This is according to chapter 2.3 of the Python library ref. Now on my problem: When eval() is used with map, the same mechanism takes place: def f(x): return map(eval,["x"]) It works the same as the above, because map is a builtin function that does not modify the frame chain, so eval finds the local namespace. Not so with Stackless Python (at the moment), since Stackless map assigns an own frame to map without passing the correct namespaces to it. (Reported by Bernd Rinn) Question: Is this by chance, or is eval() *meant* to function with the local namespace, even if it is executed in the context of a function like map() ? The description of map() does not state whether it has to pass its surrounding namespace to the mapped function, and if one simulates map() by writing one's own python implementation, it will fail exactly like Stackless does today. The same applies to apply(). I think I should fix Stackless here, anyway? 
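
To make the question concrete, here is a small sketch of the two cases described above. It assumes CPython and is written for a modern Python 3, where map() is lazy and so has to be consumed inside the function (hence the list() call); my_map is a hypothetical pure-Python stand-in for map(), not anything from Stackless.

```python
def f(x):
    # The builtin map is implemented in C and pushes no Python frame of
    # its own, so eval() still sees f's local namespace here.
    # list() forces evaluation while f's frame is the current one.
    return list(map(eval, ["x"]))


def my_map(func, seq):
    # Hypothetical pure-Python simulation of map(). Calls to func now
    # happen inside *this* frame, whose locals do not include x.
    result = []
    for item in seq:
        result.append(func(item))
    return result


def g(x):
    return my_map(eval, ["x"])


print(f(2))

try:
    g(2)
except NameError as exc:
    print("simulated map fails:", exc)
```

The simulated version fails for the reason given above: eval() looks names up in whatever Python frame is current when it is called, and that is my_map's frame rather than g's.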
ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido@digicool.com Thu Jan 25 13:35:12 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 25 Jan 2001 08:35:12 -0500 Subject: [Python-Dev] Re: Intended to work? (lambda x,y:map(eval, ["x", "y"]))(2,3) In-Reply-To: Your message of "Thu, 25 Jan 2001 14:22:32 +0200." <3A701A88.F2C68635@tismer.com> References: <3A701A88.F2C68635@tismer.com> Message-ID: <200101251335.IAA16713@cj20424-a.reston1.va.home.com> > In a function like this: > > def f(x): > return eval("x") > > , eval uses the local function namespace, and the above works. > This is according to chapter 2.3 of the Python library ref. > > Now on my problem: When eval() is used with map, the same > mechanism takes place: > > def f(x): > return map(eval,["x"]) > > It works the same as the above, because map is a builtin function > that does not modify the frame chain, so eval finds the local > namespace. > Not so with Stackless Python (at the moment), since Stackless map > assigns an own frame to map without passing the correct namespaces > to it. (Reported by Bernd Rinn) > > Question: Is this by chance, or is eval() *meant* to function with > the local namespace, even if it is executed in the context of > a function like map() ? Map, being a built-in, is transparent to namespaces. > The description of map() does not state whether it has to pass > its surrounding namespace to the mapped function, and if one > simulates map() by writing one's own python implementation, > it will fail exactly like Stackless does today. The same > applies to apply(). So you can't simulate a built-in. > I think I should fix Stackless here, anyway? Yes. Note: beware of Jeremy's nested scopes. 
That adds a whole slew of namespaces! (But eval() is more crippled there.) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Thu Jan 25 15:20:45 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 10:20:45 -0500 (EST) Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm! In-Reply-To: <20010125102424.G962@xs4all.nl> References: <20010125102424.G962@xs4all.nl> Message-ID: <14960.17485.549337.5476@localhost.localdomain> >>>>> "TW" == Thomas Wouters writes: TW> On Wed, Jan 24, 2001 at 12:33:43PM -0800, Ka-Ping Yee wrote: >> Please try: >> >>> def f(a, (b, c), *d): >> ... x = 1 ... print a, b, c, d, x ... >> >>> f(1, (2, 3), 4) >> 1 2 3 Traceback (most recent call last): File "", line 1, >> in ? File "", line 3, in f UnboundLocalError: local >> variable 'd' referenced before assignment >> >>> >> In Python 1.5.2, this prints "1 2 3 (4,)" as expected. >> I only have 1.5.2 and 2.1a1 to test. I hope this problem isn't >> present in 2.0... TW> It isn't present in 2.0. This is probably related to Jeremy's TW> changes in the call mechanism or the compiler track, though TW> Jeremy himself is the best person to claim that for sure :) The bug is in the compiler. It creates varnames while it is parsing the argument list. While I got the handling of the anonymous tuples right, I forgot to insert *varargs or **kwargs in varnames *before* the names defined in the tuple. I will fix it real soon now. >> Note that test_inspect was the only test to fail! It might be >> the only test that checks anonymous and *varargs at the same >> time. (Yet another reason to put inspect in the core...) TW> Well, this is not an inspect-specific test, so it shouldn't *be* TW> in test_inspect, it should be in test_extcall :) It should probably be in test_grammar. The ext call mechanism is only invoked when the caller uses a form like 'f(*arg)'. Perhaps the name "ext call" isn't very clear. 
Jeremy From esr@thyrsus.com Thu Jan 25 16:19:36 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 25 Jan 2001 11:19:36 -0500 Subject: [Python-Dev] Waiting method for file objects Message-ID: <20010125111936.A23512@thyrsus.com> --UugvWAfsgieZRqgk Content-Type: text/plain; charset=us-ascii Content-Disposition: inline I have been researching the question of how to ask a file descriptor how much data it has waiting for the next sequential read, with a view to discovering what cross-platform behavior we could count on for a hypothetical `waiting' method in Python's built-in file class. 1: Why bother? I have these main applications in mind: 1. Detecting EOF on a static plain file. 2. Non-blocking poll of a socket opened in non-blocking mode. 3. Non-blocking poll of a FIFO opened in non-blocking mode. 4. Non-blocking poll of a terminal device opened in non-blocking mode. These are all frequently requested capabilities on C newsgroups -- how often have *you* seen the "how do I detect an individual keypress" question from beginning programmers? I believe having these capabilities would substantially enhance Python's appeal. 2: What would be under the hood? Summary: We can do this portably, and we can do it with only one (1) new #ifdef. Our tools for this purpose will be the fstat(2) st_size field and the FIONREAD ioctl(2) call. They are complementary. In all supposedly POSIX-conformant environments I know of, the st_size field has a documented meaning for plain files (S_IFREG) and may or may not give a meaningful number for FIFOs, sockets, and tty devices. The Single Unix Specification is silent on the meaning of st_size for file types other than regular files (S_IFREG). I have filed a defect report about this with OpenGroup and am discussing appropriate language with them. 
(The last sentence of the Inferno operating system's language on stat(2) is interesting: "If the file resides on permanent storage and is not a directory, the length returned by stat is the number of bytes in the file. For directories, the length returned is zero. Some devices report a length that is the number of bytes that may be read from the device without blocking.") The FIONREAD ioctl(2) call, on the other hand, returns bytes waiting on character devices such as FIFOs, sockets, or ttys -- but does not return a useful value for files or directories or block devices. The FIONREAD ioctl was supported in both SVr4 and 4.2BSD. It's present in all the open-source Unixes, SunOS, Solaris, and AIX. Via Google search I have discovered that it's also supported in the Windows Sockets API and the GUSI POSIX libraries for the Macintosh. Thus, it can be considered portable for Python's purposes even though it's rather sparsely documented. I was able to obtain confirming information on Linux from Linus Torvalds himself. My information on Windows and the Mac is from Gavriel State, formerly a lead developer on Corel's WINE team and a programmer with extensive cross-platform experience. Gavriel reported on the MSCRT POSIX environment, on the Metrowerks Standard Library POSIX implementation for the Mac, and on the GUSI POSIX implementation for the Mac. 2.1: Plain files Torvalds and State confirm that for plain files (S_IFREG) the st_size field is reliable on all three platforms. On the Mac it gives the file's data fork size. One apparent difficulty with the plain-file case is that POSIX does not guarantee anything about seek_t quantities such as lseek(2) returns and the st_size field except that they can be compared for equality. Thus, under the strict letter of POSIX law, `waiting' can be used to detect EOF but not to get a reliable read-size return in any other file position. Fortunately, this is less an issue than it appears. 
The weakness of the POSIX language was a 1980s-era concession to a generation of mainframe operating systems with record-oriented file structures -- all of which are now either thoroughly obsolete or (in the case of IBM VM/CMS) have become Linux emulators :-). On modern operating systems under which files have character granularity, stat(2) emulations can be and are written to give the right result.

2.2: Directories and block devices

The directory case (S_IFDIR) is a complete loss. Under Unixes, including Linux, the fstat(2) size field gives the allocated size of the directory as if it were a plain file. Under MSCRT POSIX the meaning is undocumented and unclear. Metrowerks returns garbage. GUSI POSIX returns the number of files in the directory! FIONREAD cannot be used on directories.

Block devices (S_IFBLK) are a mess again. Linus points out that a system with removable or unmountable volumes *cannot* return a useful st_size field -- what happens when the device is dismounted?

2.3: Character devices

Pipes and FIFOs (S_IFIFO) look better. On MSCRT the fstat(2) size field returns the number of bytes waiting to be read. This is also true under current Linuxes, though Torvalds says it is "an implementation detail" and recommends polling with the FIONREAD ioctl instead. Fortunately, FIONREAD is available under Unix, Windows, and the Mac.

Sockets (S_IFSOCK) look better too. Under Linux, the fstat(2) size field gives the number of bytes waiting. Torvalds again says this is "an implementation detail" and recommends polling with the FIONREAD ioctl. Neither MSCRT POSIX nor Metrowerks has direct support for sockets. GUSI POSIX returns 1 (!) in the st_size field. But FIONREAD is available under Unix, Windows, and the GUSI POSIX libraries on the Mac.

Character devices (S_IFCHR) can be polled with FIONREAD. This technique has a long history of use with tty devices under Unix. I don't know whether it will work with the equivalents of terminal devices for Windows and the Mac.
Fortunately this is not a very important question, as those are GUI environments in which terminal devices are rarely if ever used.

3: How does this turn into Python?

The upshot of our portability analysis is that by using FIONREAD and fstat(2), we can get useful results for plain files, pipes, and sockets on all three platforms. Directories and block devices are a complete loss. Character devices (in particular, ttys) we can poll reliably under Unix. What we'll get polling the equivalents of tty or character devices under Windows and the Mac is presently unknown, but also unimportant.

My proposed semantics for a Python `waiting' method is that it reports the amount of data that would be returned by a read() call at the time of the waiting-method invocation. The interpreter throws OSError if such a report is impossible or forbidden.

I have enclosed a patch against the current CVS sources, including documentation. This patch is tested and working against plain files, sockets, and FIFOs under Linux. I have also attached the Python test program I used under Linux.

I would appreciate it if those of you on Windows and Macintosh machines would test the waiting method. The test program will take some porting, because it needs to write to a FIFO in the background. Under Linux I do it this way:

    (echo -n '%s' >testfifo; echo 'Data written to FIFO.') &

I don't know how to do the equivalent under Windows or Mac.

When you run this program, it will try to mail me your test results.
--
Eric S. Raymond

Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others?
-- Thomas Jefferson, in his 1801 inaugural address --UugvWAfsgieZRqgk Content-Type: text/plain; charset=us-ascii Content-Description: Patch implementing the waiting method Content-Disposition: attachment; filename="waiting.patch" Index: fileobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/fileobject.c,v retrieving revision 2.108 diff -c -r2.108 fileobject.c *** fileobject.c 2001/01/18 03:03:16 2.108 --- fileobject.c 2001/01/25 16:16:10 *************** *** 35,40 **** --- 35,44 ---- #include #endif + #ifndef DONT_HAVE_IOCTL_H + #include + #endif + typedef struct { PyObject_HEAD *************** *** 423,428 **** --- 427,513 ---- } static PyObject * + file_waiting(PyFileObject *f, PyObject *args) + { + struct stat stbuf; + #ifdef HAVE_FSTAT + int ret; + #endif + + if (f->f_fp == NULL) + return err_closed(); + if (!PyArg_NoArgs(args)) + return NULL; + #ifndef HAVE_FSTAT + PyErr_SetString(PyExc_OSError, "fstat(2) is not available."); + clearerr(f->f_fp); + return NULL; + #else + Py_BEGIN_ALLOW_THREADS + errno = 0; + ret = fstat(fileno(f->f_fp), &stbuf); + Py_END_ALLOW_THREADS + if (ret == -1) { /* the fstat failed */ + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } else if (S_ISDIR(stbuf.st_mode) || S_ISBLK(stbuf.st_mode)) { + PyErr_SetString(PyExc_IOError, + "Can't poll a block device or directory."); + clearerr(f->f_fp); + return NULL; + } else if (S_ISREG(stbuf.st_mode)) { /* plain file */ + #if defined(HAVE_LARGEFILE_SUPPORT) && SIZEOF_OFF_T < 8 && SIZEOF_FPOS_T >= 8 + fpos_t pos; + #else + off_t pos; + #endif + Py_BEGIN_ALLOW_THREADS + errno = 0; + pos = _portable_ftell(f->f_fp); + Py_END_ALLOW_THREADS + if (pos == -1) { + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } + #if !defined(HAVE_LARGEFILE_SUPPORT) + return PyInt_FromLong(stbuf.st_size - pos); + #else + return PyLong_FromLongLong(stbuf.st_size - pos); + #endif + } else if 
(S_ISFIFO(stbuf.st_mode) + || S_ISSOCK(stbuf.st_mode) + || S_ISCHR(stbuf.st_mode)) { /* stream device */ + #ifndef FIONREAD + PyErr_SetString(PyExc_OSError, + "FIONREAD is not available."); + clearerr(f->f_fp); + return NULL; + #else + int waiting; + + Py_BEGIN_ALLOW_THREADS + errno = 0; + ret = ioctl(fileno(f->f_fp), FIONREAD, &waiting); + Py_END_ALLOW_THREADS + if (ret == -1) { + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } + + return Py_BuildValue("i", waiting); + #endif /* FIONREAD */ + } else { /* should never happen! */ + PyErr_SetString(PyExc_OSError, "Unknown file type."); + clearerr(f->f_fp); + return NULL; + } + #endif /* HAVE_FSTAT */ + } + + static PyObject * file_fileno(PyFileObject *f, PyObject *args) { if (f->f_fp == NULL) *************** *** 1263,1268 **** --- 1348,1354 ---- {"truncate", (PyCFunction)file_truncate, 1}, #endif {"tell", (PyCFunction)file_tell, 0}, + {"waiting", (PyCFunction)file_waiting, 0}, {"readinto", (PyCFunction)file_readinto, 0}, {"readlines", (PyCFunction)file_readlines, 1}, {"xreadlines", (PyCFunction)file_xreadlines, 1}, --UugvWAfsgieZRqgk Content-Type: text/plain; charset=us-ascii Content-Description: Test program for the waiting method Content-Disposition: attachment; filename="waiting_test.py" #!/usr/bin/env python import sys, os, random, string, time, socket, smtplib, readline print "This program tests the `waiting' method of file objects." fp = open("waiting_test.py") if hasattr(fp, "waiting"): print "Good, you're running a patched Python with `waiting' available." else: print "You haven't installed the `waiting' patch yet. This won't work." sys.exit(1) successes = "" failures = "" nogo = "" print "" print "First, plain files:" filesize = fp.waiting() print "There are %d bytes waiting to be read in this file." % filesize if os.name == 'posix': os.system("ls -l waiting_test.py") print "That should match the number in the ls listing above." 
else: print "Please check this with your OS's directory tools." get = random.randrange(fp.waiting()) print "I'll now read a random number (%d) of bytes." % get fp.read(get) print "The waiting method sees %d bytes left." % fp.waiting() if get + fp.waiting() == filesize: print "%d + %d = %d. That's consistent. Test passed." % \ (get, fp.waiting(), filesize) successes += "Plain file random-read test passed.\n" else: print "That's not consistent. Test failed." failures += "Plain file random-read test failed\n" print "Now let's see if we can detect EOF reliably." fp.read() left = fp.waiting() print "I'll do a read()...the waiting method now returns %d" % left if left == 0: print "That looks like EOF." successes += "Plain file EOF test passed.\n" else: print "%d bytes left. Test failed." % left failures += "Plain file EOF test failed\n" fp.close() print "" print "Now sockets:" print "Connecting to imap.netaxs.com's IMAP server now..." sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) file = sock.makefile('rb') sock.connect(("imap.netaxs.com", 143)) print "Waiting a few seconds to avoid a race condition..." time.sleep(3) greetsize = file.waiting() print "There appear to be %d bytes waiting..." % greetsize greeting = file.readline() print "I just read the greeting line..." sys.stdout.write(greeting) if len(greeting) == greetsize: print "...and the size matches. Test passed." successes += "Socket test passed.\n" else: print "That's not right. Test failed." failures += "Socket test failed.\n" sock.close() print "" if not hasattr(os, "mkfifo"): print "Your platform doesn't have FIFOs (mkfifo() is absent), so I can't test them." nogo = "FIFO test could not be performed." else: print "Now FIFOs:" print "I'm making a FIFO named testfifo."; os.mkfifo("testfifo") str = string.letters[:random.randrange(len(string.letters))] print "I'm going to send it the following string '%s' of random length %d:" \ % (str, len(str),) # Note: Unix dependency here! 
os.system("(echo -n '%s' >testfifo; echo 'Data written to FIFO.') &" % str) fp = open("testfifo", "r") print "Waiting a few seconds to avoid a race condition..." time.sleep(3) ready = fp.waiting() print "I see %d bytes waiting in the FIFO." % ready if ready == len(str): print "That's consistent. Test passed." successes += "FIFO test passed.\n" else: print "That's not consistent. Test failed." failures += "FIFO test failed\n" os.remove("testfifo") print "\nSummary:" report = "Platform is: %s, version is %s\n" % (sys.platform, sys.version) if successes: report += "The following tests succeeded:\n" + successes if failures: report += "The following tests failed:\n" + failures if nogo: report += "The following tests could not be performed:\n" + nogo if not nogo: report += "No tests were skipped.\n" if not failures: report += "All tests succeeded.\n" print report if os.name == 'posix': me = os.environ["USER"] + "@" + socket.getfqdn() else: me = raw_input("Enter your emasil address, please?") try: server = smtplib.SMTP('localhost') report = ("From: %s\nTo: esr@thyrsus.com\nSubject: waiting_test\n\n" % me) + report server.sendmail(me, ["esr@thyrsus.com"], report) server.quit() except: print "The attempt to mail your test result failed.\n" --UugvWAfsgieZRqgk-- From esr@snark.thyrsus.com Thu Jan 25 16:46:20 2001 From: esr@snark.thyrsus.com (Eric S. Raymond) Date: Thu, 25 Jan 2001 11:46:20 -0500 Subject: [Python-Dev] Documentation patch for waiting method. Message-ID: <200101251646.f0PGkKM23567@snark.thyrsus.com> Index: libstdtypes.tex =================================================================== RCS file: /cvsroot/python/python/dist/src/Doc/lib/libstdtypes.tex,v retrieving revision 1.50 diff -u -r1.50 libstdtypes.tex --- libstdtypes.tex 2001/01/17 01:18:00 1.50 +++ libstdtypes.tex 2001/01/25 16:46:40 @@ -1142,6 +1142,24 @@ \UNIX{} versions support this operation). 
\end{methoddesc} +\begin{methoddesc}[file]{waiting}{} + Return the number of bytes waiting to be read from this file object. + For regular files, this returns the size of the file in bytes minus + the current seek address, as would be returned by \method{tell()}; a + zero return can be used to detect EOF. For streams such as FIFOs, + sockets, Unix ttys, and other Unix character devices, this method + returns the number of bytes currently buffered up and waiting to be + read. Attempts to call this method on Unix block devices or + on directories will raise an error. + \footnote{The \method{waiting()} method uses + \cfunction{fstat(2)} and \cfunction{lseek(2)} on plain files; + these should be reliable on all of Unix, Windows, and MacOS. + It uses the FIONREAD ioctl(2) call to query FIFOs, sockets, + Unix ttys, and other POSIX character devices; FIFO and socket + behavior should be consistent across all three platforms, but + the results from querying other character devices may vary.} +\end{methoddesc} + \begin{methoddesc}[file]{write}{str} Write a string to the file. There is no return value. Note: Due to buffering, the string may not actually show up in the file until -- Eric S. Raymond "To disarm the people... was the best and most effectual way to enslave them." -- George Mason, speech of June 14, 1788 From fredrik@effbot.org Thu Jan 25 19:23:50 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Thu, 25 Jan 2001 20:23:50 +0100 Subject: [Python-Dev] Fw: random.py gives wrong results (+ a solution) Message-ID: <00f701c08704$59bde510$e46940d5@hagrid> I'm pretty sure Tim's seen this already, but just in case... ----- Original Message ----- From: "Ivan Frohne" Newsgroups: comp.lang.python Sent: Thursday, January 25, 2001 5:20 PM Subject: Re: random.py gives wrong results (+ a solution) > > "Janne Sinkkonen" wrote in message > news:m3u26oy1rw.fsf@kinos.nnets.fi... 
> > > > At least in Python 2.0 and earlier, the samples returned by the > > function betavariate() of random.py are not from a beta distribution > > although the function name misleadingly suggests so. > > > > The following would give beta-distributed samples: > > > > def betavariate(alpha, beta): > > y = gammavariate(alpha,1) > > if y==0: return 0.0 > > else: return y/(y+gammavariate(beta,1)) > > > > This is from matlab. A comment in the original matlab code refers to > > Devroye, L. (1986) Non-Uniform Random Variate Generation, theorem 4.1A > > (p. 430). Another reference would be Gelman, A. et al. (1995) Bayesian > > data analysis, p. 481, which I have checked and found to agree with > > the code above. > > > I'm convinced that Janne Sinkkonen is right: The beta distribution > generator in module random.py does not return Beta-distributed > random numbers. Janne's suggested fix should work just fine. > > Here's my guess on how and why this bug bit -- it won't be of interest to > most but > this subject is so obscure sometimes that there needs to be a detailed > analysis. > > The probability density function of the gamma distribution with (positive) > parameters > A and B is usually written > > g(x; A, B) = (x**(A-1) * exp(x/B)) / (Gamma(A) * B**A), where x, A, and > B > 0. > > Here Gamma(A) is the gamma function -- for A a positive integer, Gamma(A) is > the > factorial of A - 1, Gamma(A) = (A-1)!. In fact, this is the definition used > by the authors of random.py in defining gammavariate(alpha, beta), the gamma > distribution random number generator. > > Now it happens that a gamma-distributed random variable with parameters A = > 1 and > B has the (much simpler) exponential distribution with density function > > g(x; 1, B) = exp(-x/B) / B. > > Keep that in mind. > > The reference "Discrete Event Simulation in ," by Kevin Watkins > (McGraw-Hill, 1993) > was consulted by the random.py authors. 
But this reference defines the > gamma probability distribution a little differently, as > > g1(x; A, B) = (B**A * x**(A-1) * exp(B*x)) / Gamma(A), where x, A, B > > 0. > > (See p. 85). On page 87, Watkins states (incorrectly) that if grv(A, B) is > a function which > returns a gamma random variable with parameters A and B (using his > definition on p. 85), > then the function > > brv(A, B) = grv(1, 1/B) / ( grv(1, 1/B) + grv(1, A) ) [ not > true!] > > will return a random variable which has the beta distribution with > parameters A and B. > > Believing Watkins to be correct, the random.py authors remembered that a > gamma > random variable with parameter A = 1 is just an exponential random variable > and > further simplified their beta generator to > > brv(A, B) = erv(1/B) / (erv(1/B) + erv(A)), where erv(K) is a random > variable > > having the exponential distribution with > > parameter K. > > The corrected equation for a beta random variable, using Watkins' definition > of the > gamma density, is > > brv(A, B) = grv(A, 1) / ( grv(A, 1) + grv(1/B, 1) ), > > which translates to > > brv(A, B) = grv(A, 1) / (grv(A, 1) + grv(B, 1) > > using the more common gamma density definition (the one used in random.py). > Many standard statistical references give this equation -- two are > "Non-Uniform random Variate Generation," by Luc Devroye, Springer-Verlag, > 1986, > p. 432, and "Monte Carlo Concepts, Algorithms and Applications," by > George S. Fishman, Springer, 1996, p. 200. > > --Ivan Frohne > > > > > >>> > > > > From jeremy@alum.mit.edu Thu Jan 25 17:13:03 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 12:13:03 -0500 (EST) Subject: [Python-Dev] Makefile changes In-Reply-To: <20010124073155.B32266@glacier.fnational.com> References: <20010124073155.B32266@glacier.fnational.com> Message-ID: <14960.24223.599357.388059@localhost.localdomain> Neil, What would it take to add useful dependency information to the Makefile? Or does it already exist? 
When I was working the nested scopes, building was tedious at times because a change to funcobject.h meant that, e.g., newmodule.c needed to be recompiled. The Makefiles didn't capture that information, so I had been adding it to the individual Makefiles, e.g.

    newmodule.o: newmodule.c ../Include/funcobject.h

(I think this worked.)

It would be great if the Makefile captured all the dependencies. Could we just use makedepend?

Jeremy

From MarkH@ActiveState.com Thu Jan 25 19:43:35 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Thu, 25 Jan 2001 11:43:35 -0800
Subject: [Python-Dev] Waiting method for file objects
In-Reply-To: <20010125111936.A23512@thyrsus.com>
Message-ID:

> I would appreciate it if those of you on Windows and Macintosh
> machines would test the waiting method. The test program will take
> some porting, because it needs to write to a FIFO in background.

This didn't compile under Windows. I have a patch (against CVS) that compiles, but doesn't appear to work (and will be forwarded to Eric under separate cover) [news flash :-) Changing the open call to add "rb" as the mode makes it work - text v binary bites again]

I didn't try any sort of fifo test. The sockets test failed with a socket error, but would certainly have failed had the socket connected, as my patch includes:

#ifndef S_ISSOCK
# define S_ISSOCK(mode) (0)
#endif

I have no idea if it managed to mail the results, but I guess not, so the output is below. The test file (after some small mods, including the "rb" param) is indeed 4252 bytes long.

Hope this is useful!

Mark.

This program tests the `waiting' method of file objects.
Good, you're running a patched Python with `waiting' available.

First, plain files:
There are 4252 bytes waiting to be read in this file.
Please check this with your OS's directory tools.
I'll now read a random number (3091) of bytes.
The waiting method sees 1161 bytes left.
3091 + 1161 = 4252. That's consistent. Test passed.
Now let's see if we can detect EOF reliably.
I'll do a read()...the waiting method now returns 0
That looks like EOF.
Now sockets:
Connecting to imap.netaxs.com's IMAP server now...
Traceback (most recent call last):
  File "c:\temp\waiting_test.py", line 57, in ?
    sock.connect(("imap.netaxs.com", 143))
  File "", line 1, in connect
socket.error: (10060, 'Operation timed out')

From nas@arctrix.com Thu Jan 25 13:07:53 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 05:07:53 -0800
Subject: [Python-Dev] Makefile changes
In-Reply-To: <14960.24223.599357.388059@localhost.localdomain>; from jeremy@alum.mit.edu on Thu, Jan 25, 2001 at 12:13:03PM -0500
References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain>
Message-ID: <20010125050753.A1573@glacier.fnational.com>

On Thu, Jan 25, 2001 at 12:13:03PM -0500, Jeremy Hylton wrote:
> What would it take to add useful dependency information to the
> Makefile? Or does it already exist?

Some of it exists but I don't think it's complete.

> When I was working the nested scopes, building was tedious at times
> because a change to funcobject.h meant that, e.g., newmodule.c needed
> to be recompiled. The Makefiles didn't capture that information, so I
> had been adding it to the individual Makefiles, e.g.
>
> newmodule.o: newmodule.c ../Include/funcobject.h
>
> (I think this worked.)

Hmm, I don't think so. Which makefile did you add this to? Are
you using the new makefile? The Makefile.pre.in file contains a
line like:

    $(LIBRARY_OBJS) $(MAINOBJ): $(PYTHON_HEADERS)

but newmodule.o is not in LIBRARY_OBJS. By default it's not
compiled by make but with distutils. If you add newmodule to
Setup then a line like:

    Modules/newmodule.o: $(PYTHON_HEADERS)

would do the trick. I think I will add a line like:

    $(MODOBJS): $(PYTHON_HEADERS)

to fix the problem.

I could easily restore the mkdep target but my feeling right now
is that explicitly including the header dependencies is better.
What do you think?
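
For comparison with the blanket $(PYTHON_HEADERS) rule, here is a rough
sketch of the finer-grained, makedepend-style alternative: scan a C
source for quoted #include lines and emit one make rule. It is
illustrative only and is not code from this thread. The helper names
(local_includes, make_rule) and the include_dir default are assumptions,
and the scan sees direct includes only, so headers pulled in indirectly
through Python.h are missed.

```python
import re

# Matches lines such as:  #include "funcobject.h"
# Angle-bracket (system) includes are deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def local_includes(c_source):
    """Return quoted header names from c_source, in order, without duplicates."""
    seen = []
    for name in INCLUDE_RE.findall(c_source):
        if name not in seen:
            seen.append(name)
    return seen


def make_rule(obj, src, c_source, include_dir="../Include"):
    """Build one make dependency rule from the headers src includes directly.

    include_dir is a hypothetical default chosen to mirror the
    hand-written rule quoted above.
    """
    deps = [src] + ["%s/%s" % (include_dir, h) for h in local_includes(c_source)]
    return "%s: %s" % (obj, " ".join(deps))


# Hypothetical file contents, used only to exercise the helpers.
example = '#include "Python.h"\n#include <stdio.h>\n#include "funcobject.h"\n'
rule = make_rule("newmodule.o", "newmodule.c", example)
print(rule)
```

Because Python.h includes most of the other headers, a scan this shallow
would under-report real dependencies, which is one argument for the
coarse rule.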
Neil

From jeremy@alum.mit.edu Thu Jan 25 20:02:46 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 15:02:46 -0500 (EST)
Subject: [Python-Dev] PEP 227 checkins to follow
Message-ID: <14960.34406.342961.834827@localhost.localdomain>

I am about to check in the changes that implement PEP 227. There
are many changes, which I will make via separate commits. You might
want to wait until the checkins are done to do an update. I'll send a
note when I'm done.

I also wanted to mention that the PEP has fallen a little out of date.
There are a few wrinkles that it doesn't deal with, e.g.

def f(x):
    def g(y):
        return x + y
    del x
    return g

For now, this raises a SyntaxError. I'll flesh out the PEP to reflect
the current implementation and spec out some of the less obvious cases.

I'd welcome any comments on the code itself. I know there are a
number of rough edges and also, most likely, a bunch of memory leaks.
I'll be working to clean things up before 2.1a2, but wanted to get the
code into CVS ASAP.

Jeremy

From jeremy@alum.mit.edu Thu Jan 25 20:15:01 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 15:15:01 -0500 (EST)
Subject: [Python-Dev] checkins done for PEP 227
Message-ID: <14960.35141.237252.468467@localhost.localdomain>

It looks like python-dev is very slow, so you'll see my original
warning well after the checkins occurred. Oh, well. They're done.

Jeremy

From tim.one@home.com Thu Jan 25 20:58:03 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 25 Jan 2001 15:58:03 -0500
Subject: [Python-Dev] Fw: random.py gives wrong results (+ a solution)
Message-ID:

[/F, fwds a c.l.py claim that random.betavariate is dead wrong]

Not to worry; I had already entered that into the SF bug database and
assigned it to me (hmm: why would you send it to Python-Dev instead of
putting it in the database?).

I suspect he's correct, and, more importantly, so does Ivan Frohne.
We'll settle it before 2.1a2, but perhaps not today.
Alas, I have no idea where the original code came from ("Guido" isn't a
useful answer -- he was just converting somebody else's C++ code to
Python).

From fredrik@effbot.org Thu Jan 25 20:42:05 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Thu, 25 Jan 2001 21:42:05 +0100
Subject: [Python-Dev] Waiting method for file objects
References: <20010125111936.A23512@thyrsus.com>
Message-ID: <01fb01c0870f$48517110$e46940d5@hagrid>

eric wrote:
> Fortunately, this is less an issue than it appears.

only if you ignore Windows...

-1 on making this a file method
+0 on adding it as an optional support function to the os module.

From martin@mira.cs.tu-berlin.de Thu Jan 25 20:42:39 2001
From: martin@mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 25 Jan 2001 21:42:39 +0100
Subject: [Python-Dev] jeremy@alum.mit.edu
Message-ID: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de>

> It would be great if the Makefile captured all the dependencies.

That would be great, yes. However, setup.py should probably also
consider dependencies.

> Could we just use makedepend?

Not sure. Certainly not in the build process. I dislike distributions
which, as the first thing, perform dependency generation. Dependencies
change less often than the actual source, so it should be
sufficient to update them manually. Furthermore, generated files as
part of the CVS repository fail to work properly unless everybody uses
the exact same generator. For autoconf alone, that's a problem because
of multiple autoconf versions. I don't know how many different
makedepend versions are in use.

Regards,
Martin

From tim.one@home.com Thu Jan 25 21:02:11 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 25 Jan 2001 16:02:11 -0500
Subject: [Python-Dev] Windows compile broken
Message-ID:

Linking...
Creating library ./python21.lib and object ./python21.exp
ceval.obj : error LNK2001: unresolved external symbol _PyCell_Set
ceval.obj : error LNK2001: unresolved external symbol _PyCell_Get
frameobject.obj : error LNK2001: unresolved external symbol _PyCell_New
./python21.dll : fatal error LNK1120: 3 unresolved externals
Error executing link.exe.

Sorry if this has already been discussed. I don't see mention of it in the
Python-Dev archive, and my email is almost worse than useless (random delays
of minutes to days, due to what appears to be the simultaneous worldwide
wedging of every email server servicing every email account I have).

From esr@thyrsus.com Thu Jan 25 21:12:25 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 25 Jan 2001 16:12:25 -0500
Subject: [Python-Dev] Waiting method for file objects
In-Reply-To: <01fb01c0870f$48517110$e46940d5@hagrid>; from fredrik@effbot.org on Thu, Jan 25, 2001 at 09:42:05PM +0100
References: <20010125111936.A23512@thyrsus.com> <01fb01c0870f$48517110$e46940d5@hagrid>
Message-ID: <20010125161225.A24305@thyrsus.com>

Fredrik Lundh :
> > Fortunately, this is less an issue than it appears.
>
> only if you ignore Windows...

I don't understand this. Explain?
--
Eric S. Raymond

Sometimes the law defends plunder and participates in it. Sometimes
the law places the whole apparatus of judges, police, prisons and
gendarmes at the service of the plunderers, and treats the victim --
when he defends himself -- as a criminal.
	-- Frederic Bastiat, "The Law"

From esr@thyrsus.com Thu Jan 25 21:13:31 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 25 Jan 2001 16:13:31 -0500
Subject: [Python-Dev] jeremy@alum.mit.edu
In-Reply-To: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 25, 2001 at 09:42:39PM +0100
References: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de>
Message-ID: <20010125161331.B24305@thyrsus.com>

Martin v. Loewis :
> Not sure.
Certainly not in the build process. I dislike distributions
> which, as the first thing, perform dependency generation. Dependencies
> change less often than the actual source, so it should be
> sufficient to update them manually. Furthermore, generated files as
> part of the CVS repository fail to work properly unless everybody uses
> the exact same generator. For autoconf alone, that's a problem because
> of multiple autoconf versions. I don't know how many different
> makedepend versions are in use.

Easily solved -- there are script versions of makedepend we can just
ship with the distribution.
--
Eric S. Raymond

Morality is always the product of terror; its chains and
strait-waistcoats are fashioned by those who dare not trust others,
because they dare not trust themselves, to walk in liberty.
	-- Aldous Huxley

From mal@lemburg.com Thu Jan 25 21:26:04 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 25 Jan 2001 22:26:04 +0100
Subject: [Python-Dev] Windows compile broken
References:
Message-ID: <3A7099EC.81689EA5@lemburg.com>

Tim Peters wrote:
>
> Linking...
> Creating library ./python21.lib and object ./python21.exp
> ceval.obj : error LNK2001: unresolved external symbol _PyCell_Set
> ceval.obj : error LNK2001: unresolved external symbol _PyCell_Get
> frameobject.obj : error LNK2001: unresolved external symbol _PyCell_New
> ./python21.dll : fatal error LNK1120: 3 unresolved externals
> Error executing link.exe.
>
> Sorry if this has already been discussed. I don't see mention of it in the
> Python-Dev archive, and my email is almost worse than useless (random delays
> of minutes to days, due to what appears to be the simultaneous worldwide
> wedging of every email server servicing every email account I have).

These must be related to checkins by Jeremy and his nested scopes...
(I knew these would get us into trouble ;-)

I think Jeremy forgot to check in the needed change for
Objects/Makefile.in and probably the Windows project file is
missing the new object type too (Objects/cellobject.c).

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jeremy@alum.mit.edu Thu Jan 25 21:14:52 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 16:14:52 -0500 (EST)
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <3A7099EC.81689EA5@lemburg.com>
References: <3A7099EC.81689EA5@lemburg.com>
Message-ID: <14960.38732.773129.793360@localhost.localdomain>

>>>>> "MAL" == M -A Lemburg writes:

  MAL> Tim Peters wrote:
  >>
  >> Linking... Creating library ./python21.lib and object
  >> ./python21.exp ceval.obj : error LNK2001: unresolved external
  >> symbol _PyCell_Set ceval.obj : error LNK2001: unresolved external
  >> symbol _PyCell_Get frameobject.obj : error LNK2001: unresolved
  >> external symbol _PyCell_New ./python21.dll : fatal error LNK1120:
  >> 3 unresolved externals Error executing link.exe.
  >>
  >> Sorry if this has already been discussed. I don't see mention of
  >> it in the Python-Dev archive, and my email is almost worse than
  >> useless (random delays of minutes to days, due to what appears to
  >> be the simultaneous worldwide wedging of every email server
  >> servicing every email account I have).

  MAL> These must be related to checkins by Jeremy and his nested
  MAL> scopes... (I knew these would get us into trouble ;-)

Just you wait and see!

  MAL> I think Jeremy forgot to check in the needed change for
  MAL> Objects/Makefile.in and probably the Windows project file is
  MAL> missing the new object type too (Objects/cellobject.c).

That's right. I didn't change the Makefile in Objects or do anything
with Windows.
Don't know how to do the latter, but perhaps Tim will
stop by my desk next week and show me. As for the Makefile, I thought
I saw a message from Neil saying not to update those anymore.

Jeremy

From nas@arctrix.com Thu Jan 25 15:10:56 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 07:10:56 -0800
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include cellobject.h,NONE,2.1 Python.h,2.30,2.31
In-Reply-To: ; from jhylton@users.sourceforge.net on Thu, Jan 25, 2001 at 12:04:16PM -0800
References:
Message-ID: <20010125071056.A2390@glacier.fnational.com>

On Thu, Jan 25, 2001 at 12:04:16PM -0800, Jeremy Hylton wrote:
> A cell contains a reference to a single PyObject. It could be
> implemented as a mutable, one-element sequence, but the separate type
> has less overhead.

Can this object be involved in reference cycles? If so, it
should probably have the GC methods added to it.

Neil

From jeremy@alum.mit.edu Thu Jan 25 21:42:04 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 16:42:04 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include cellobject.h,NONE,2.1 Python.h,2.30,2.31
In-Reply-To: <20010125071056.A2390@glacier.fnational.com>
References: <20010125071056.A2390@glacier.fnational.com>
Message-ID: <14960.40364.594582.353511@localhost.localdomain>

>>>>> "NS" == Neil Schemenauer writes:

  NS> On Thu, Jan 25, 2001 at 12:04:16PM -0800, Jeremy Hylton wrote:
  >> A cell contains a reference to a single PyObject. It could be
  >> implemented as a mutable, one-element sequence, but the separate
  >> type has less overhead.

  NS> Can this object be involved in reference cycles? If so, it
  NS> should probably have the GC methods added to it.

It's already there. (Last five lines of cellobject.c quoted as proof.)
> Py_TPFLAGS_DEFAULT | Py_TPFLAGS_GC, /* tp_flags */
> 0, /* tp_doc */
> (traverseproc)cell_traverse, /* tp_traverse */
> (inquiry)cell_clear, /* tp_clear */
>};

From nas@arctrix.com Thu Jan 25 15:19:22 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 07:19:22 -0800
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <3A7099EC.81689EA5@lemburg.com>; from mal@lemburg.com on Thu, Jan 25, 2001 at 10:26:04PM +0100
References: <3A7099EC.81689EA5@lemburg.com>
Message-ID: <20010125071922.B2390@glacier.fnational.com>

On Thu, Jan 25, 2001 at 10:26:04PM +0100, M.-A. Lemburg wrote:
> I think Jeremy forgot to check in the needed change for
> Objects/Makefile.in

That file is dead. Should I remove it now? I haven't heard any
major complaints about Makefile.pre.in yet. Maybe the messages
are all sitting in the python.org mail spool. Barry, what the
hell is going on? You need to drop that Postfix crap and get
qmail. :-)

Neil

From thomas@xs4all.net Thu Jan 25 22:19:37 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Thu, 25 Jan 2001 23:19:37 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include modsupport.h,2.35,2.36
In-Reply-To: ; from fdrake@users.sourceforge.net on Thu, Jan 25, 2001 at 02:13:36PM -0800
References:
Message-ID: <20010125231937.I962@xs4all.nl>

On Thu, Jan 25, 2001 at 02:13:36PM -0800, Fred L. Drake wrote:

> The addition of new parameters to functions in the Python/C API requires
> that PYTHON_API_VERSION be incremented.

When we update the API version, isn't it time to clean up the TP_HASFEATURE
stuff ? Since we updated the API, all the current slots should be there,
right ?

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From guido@digicool.com Thu Jan 25 22:32:32 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 25 Jan 2001 17:32:32 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include modsupport.h,2.35,2.36
In-Reply-To: Your message of "Thu, 25 Jan 2001 23:19:37 +0100." <20010125231937.I962@xs4all.nl>
References: <20010125231937.I962@xs4all.nl>
Message-ID: <200101252232.RAA20013@cj20424-a.reston1.va.home.com>

> > The addition of new parameters to functions in the Python/C API requires
> > that PYTHON_API_VERSION be incremented.
>
> When we update the API version, isn't it time to clean up the TP_HASFEATURE
> stuff ? Since we updated the API, all the current slots should be there,
> right ?

No, we're issuing a warning about old API versions but still try to
work with them. After all most extensions don't create frame or code
objects.

I added the flags for the tp_richcompare field when I tried 2.1a1 with
Zope's ExtensionClasses and Acquisition modules. Turns out I got a
core dump, while 2.1 ran flawlessly. The reason: they have their own
type struct which has the same lay-out as the Python 1.5.2 (or even
older) type struct, followed by fields of their own. They have the
tp_flags field set to 0, so up to 2.0, it was compatible. I expect
that 2.1a2 will work with the unchanged Zope code because of the flag
I added.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal@lemburg.com Thu Jan 25 23:04:54 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 00:04:54 +0100
Subject: [Python-Dev] Windows compile broken
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com>
Message-ID: <3A70B116.12BF756B@lemburg.com>

Neil Schemenauer wrote:
>
> On Thu, Jan 25, 2001 at 10:26:04PM +0100, M.-A. Lemburg wrote:
> > I think Jeremy forgot to check in the needed change for
> > Objects/Makefile.in
>
> That file is dead. Should I remove it now?
I haven't heard any
> major complaints about Makefile.pre.in yet.

What about that file ? Are you saying that Makefile.pre.in
will no longer work in 2.1 ???

Please don't remove that mechanism -- it has been in use for
quite a while and is much more stable than distutils. We should
at least wait a few more distutils releases for the dust to
settle before removing the old fallback solution.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From guido@digicool.com Thu Jan 25 23:06:40 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 25 Jan 2001 18:06:40 -0500
Subject: [Python-Dev] Windows compile broken
In-Reply-To: Your message of "Fri, 26 Jan 2001 00:04:54 +0100." <3A70B116.12BF756B@lemburg.com>
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com>
Message-ID: <200101252306.SAA20173@cj20424-a.reston1.va.home.com>

> > That file is dead. Should I remove it now? I haven't heard any
> > major complaints about Makefile.pre.in yet.
>
> What about that file ? Are you saying that Makefile.pre.in
> will no longer work in 2.1 ???
>
> Please don't remove that mechanism -- it has been in use for
> quite a while and is much more stable than distutils. We should
> at least wait a few more distutils releases for the dust to
> settle before removing the old fallback solution.

Let's at least mark it clearly as obsolete though -- it's a pain to
maintain.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nas@arctrix.com Thu Jan 25 16:31:28 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 08:31:28 -0800
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <3A70B116.12BF756B@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 12:04:54AM +0100
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com>
Message-ID: <20010125083128.A2699@glacier.fnational.com>

On Fri, Jan 26, 2001 at 12:04:54AM +0100, M.-A. Lemburg wrote:
> What about that file ? Are you saying that Makefile.pre.in
> will no longer work in 2.1 ???

I'm talking about Objects/Makefile.in. Which Makefile.pre.in are
you talking about? Modules/Makefile.pre.in is dead too. There
is a Makefile.pre.in in the toplevel directory which does the
same thing. There is also Misc/Makefile.pre.in. That file gets
installed into lib and still works as it always did. The toplevel
Makefile.pre.in can use Modules/Setup* just like the old
Modules/Makefile.pre.in could. Does this address your concerns?

> Please don't remove that mechanism -- it has been in use for
> quite a while and is much more stable than distutils. We should
> at least wait a few more distutils releases for the dust to
> settle before removing the old fallback solution.

No doubt.
Neil

From nas@arctrix.com Thu Jan 25 16:33:48 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 08:33:48 -0800
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <200101252306.SAA20173@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jan 25, 2001 at 06:06:40PM -0500
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> <200101252306.SAA20173@cj20424-a.reston1.va.home.com>
Message-ID: <20010125083348.B2699@glacier.fnational.com>

On Thu, Jan 25, 2001 at 06:06:40PM -0500, Guido van Rossum wrote:
> Let's at least mark it clearly as obsolete though -- it's a pain to
> maintain.

Are you talking about Misc/Makefile.pre.in? If so, how do you
suggest we mark it? I don't think Modules/Setup should go away
any time soon. I often like to build lots of modules statically
into the interpreter. setup.py has no support for building
static modules.

Neil

From tim.one@home.com Thu Jan 25 23:27:52 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 25 Jan 2001 18:27:52 -0500
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <14960.38732.773129.793360@localhost.localdomain>
Message-ID:

Thanks for the clues, everyone! I'll fix it for Windows.

Note that I'm getting email in wild bursts, and most often delayed. So I'm
generally not seeing any checkin msgs, or SF bug email, or Python-Dev email,
..., anywhere near the time (or, alas, sometimes even day) they're
generated. So I simply didn't see the checkin msg introducing cellobject.c.

all's-well-that-looks-like-it-may-end-ly y'rs - tim

From mal@lemburg.com Fri Jan 26 09:32:14 2001
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Fri, 26 Jan 2001 10:32:14 +0100
Subject: [Python-Dev] Makefile.pre.in (Windows compile broken)
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> <20010125083128.A2699@glacier.fnational.com>
Message-ID: <3A71441E.4584A5C8@lemburg.com>

Neil Schemenauer wrote:
>
> On Fri, Jan 26, 2001 at 12:04:54AM +0100, M.-A. Lemburg wrote:
> > What about that file ? Are you saying that Makefile.pre.in
> > will no longer work in 2.1 ???
>
> I'm talking about Objects/Makefile.in. Which Makefile.pre.in are
> you talking about? Modules/Makefile.pre.in is dead too. There
> is a Makefile.pre.in in the toplevel directory which does the
> same thing. There is also Misc/Makefile.pre.in. That file gets
> installed into lib and still works as it always did. The toplevel
> Makefile.pre.in can use Modules/Setup* just like the old
> Modules/Makefile.pre.in could. Does this address your concerns?

Yes. Thanks. I was talking about the Misc/Makefile.pre.in mechanism
which was used in the past by many Python C extensions to provide
a portable way of compiling the extension into a shared module
or statically into the Python interpreter.

I have been using that mechanism for years now and with much
success. Even though I am currently moving to distutils I have
no idea how stable distutils is on exotic platforms or ones
which have special needs (like e.g. AIX).

> > Please don't remove that mechanism -- it has been in use for
> > quite a while and is much more stable than distutils. We should
> > at least wait a few more distutils releases for the dust to
> > settle before removing the old fallback solution.
>
> No doubt.

Ok.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal@lemburg.com Fri Jan 26 09:37:12 2001
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Fri, 26 Jan 2001 10:37:12 +0100
Subject: [Python-Dev] setup.py
Message-ID: <3A714548.C487DCC9@lemburg.com>

I have posted two messages here regarding the new setup.py
mechanism for building Modules/ but have received no comments on
them so far. Here's another go:

1. I think that setup.py should output warnings about modules
   which cannot be built for some reason rather than having
   ot the build process completely.

2. I suggest adding -L/usr/lib/termcap to the readline extension.
   This doesn't hurt anywhere and will get this extension to compile
   on SuSE Linux too.

Thoughts ?

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr@thyrsus.com Fri Jan 26 12:27:56 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Fri, 26 Jan 2001 07:27:56 -0500
Subject: [Python-Dev] setup.py
In-Reply-To: <3A714548.C487DCC9@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 10:37:12AM +0100
References: <3A714548.C487DCC9@lemburg.com>
Message-ID: <20010126072756.A5013@thyrsus.com>

M.-A. Lemburg :
> 1. I think that setup.py should output warnings about modules
>    which cannot be built for some reason rather than having
>    ot the build process completely.
>
> 2. I suggest adding -L/usr/lib/termcap to the readline extension.
>    This doesn't hurt anywhere and will get this extension to compile
>    on SuSE Linux too.

Both good ideas.
--
Eric S. Raymond

Such are a well regulated militia, composed of the freeholders,
citizen and husbandman, who take up arms to preserve their property,
as individuals, and their rights as freemen.
	-- "M.T. Cicero", in a newspaper letter of 1788 touching the "militia"
	referred to in the Second Amendment to the Constitution.

From mal@lemburg.com Fri Jan 26 14:13:45 2001
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Fri, 26 Jan 2001 15:13:45 +0100
Subject: [Python-Dev] setup.py
References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com>
Message-ID: <3A718619.6278AF41@lemburg.com>

"Eric S. Raymond" wrote:
>
> M.-A. Lemburg :
> > 1. I think that setup.py should output warnings about modules
> >    which cannot be built for some reason rather than having
> >    ot the build process completely.
> >
> > 2. I suggest adding -L/usr/lib/termcap to the readline extension.
> >    This doesn't hurt anywhere and will get this extension to compile
> >    on SuSE Linux too.
>
> Both good ideas.

Should I implement the two and check these in ?

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr@thyrsus.com Fri Jan 26 14:25:59 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Fri, 26 Jan 2001 09:25:59 -0500
Subject: [Python-Dev] setup.py
In-Reply-To: <3A718619.6278AF41@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 03:13:45PM +0100
References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com> <3A718619.6278AF41@lemburg.com>
Message-ID: <20010126092559.A5623@thyrsus.com>

M.-A. Lemburg :
> "Eric S. Raymond" wrote:
> >
> > M.-A. Lemburg :
> > > 1. I think that setup.py should output warnings about modules
> > >    which cannot be built for some reason rather than having
> > >    ot the build process completely.
> > >
> > > 2. I suggest adding -L/usr/lib/termcap to the readline extension.
> > >    This doesn't hurt anywhere and will get this extension to compile
> > >    on SuSE Linux too.
> >
> > Both good ideas.
>
> Should I implement the two and check these in ?

I may not channel Guido the way Tim does, but I suspect he gave you
developer privileges because he trusts you to do routine stuff like this.
--
Eric S. Raymond

The saddest life is that of a political aspirant under democracy.
His failure is ignominious and his success is disgraceful.
	-- H.L. Mencken

From mal@lemburg.com Fri Jan 26 14:29:18 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 15:29:18 +0100
Subject: [Python-Dev] setup.py
References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com> <3A718619.6278AF41@lemburg.com> <20010126092559.A5623@thyrsus.com>
Message-ID: <3A7189BE.C6C2806E@lemburg.com>

"Eric S. Raymond" wrote:
>
> M.-A. Lemburg :
> > "Eric S. Raymond" wrote:
> > >
> > > M.-A. Lemburg :
> > > > 1. I think that setup.py should output warnings about modules
> > > >    which cannot be built for some reason rather than having
> > > >    ot the build process completely.
> > > >
> > > > 2. I suggest adding -L/usr/lib/termcap to the readline extension.
> > > >    This doesn't hurt anywhere and will get this extension to compile
> > > >    on SuSE Linux too.
> > >
> > > Both good ideas.
> >
> > Should I implement the two and check these in ?
>
> I may not channel Guido the way Tim does, but I suspect he gave you
> developer privileges because he trusts you to do routine stuff like this.

Just asking because setup.py is Andrew's baby. I'll add the above
two later today.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mwh21@cam.ac.uk Fri Jan 26 16:40:47 2001
From: mwh21@cam.ac.uk (Michael Hudson)
Date: 26 Jan 2001 16:40:47 +0000
Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman.
Message-ID:

Following discussion on c.l.py I've just submitted:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103441&group_id=5470

which implements a syntax for adding function attributes inline:

>>> def f(a) having (publish=1):
...     print 1
...
>>> f.publish
1

It uses an "import-as" like strategy to avoid making "having" a
keyword (which interacts a bit badly with error reporting, as it
happens). Obviously, it would be easy to change "having" to a
different word.

Another idea I had was:

>>> def f(a) having (.publish=1):
...     print 1
...
>>> f.publish
1

to emphasize the attributeness of what's going on, but I didn't like
this as much in practice (I always forgot the period!).

Emile van Sebille also suggested

>>> d = {'a':1}
>>> def f(a) having (**d):
...     print 1
...
>>> f.a
1

which I haven't implemented, because I didn't really like it, but I
thought I'd mention.

I'll do test suites and documentation in time, but I thought I'd call
in here to check the idea wasn't DOA.

What do you all think?

Cheers,
M.

--
surely, somewhere, somehow, in the history of computing, at least
one manual has been written that you could at least remotely
attempt to consider possibly glancing at. -- Adam Rixey

From nas@arctrix.com Fri Jan 26 09:55:57 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Fri, 26 Jan 2001 01:55:57 -0800
Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman.
In-Reply-To: ; from mwh21@cam.ac.uk on Fri, Jan 26, 2001 at 04:40:47PM +0000
References:
Message-ID: <20010126015556.A4215@glacier.fnational.com>

I don't see what's wrong with:

    def f(a):
        print 1
    f.publish = 1

It's perfectly clear to me. As a bonus it works already. I'm -1
on inventing more syntax.

Neil

From evan@digicool.com Fri Jan 26 17:12:43 2001
From: evan@digicool.com (Evan Simpson)
Date: Fri, 26 Jan 2001 12:12:43 -0500
Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman.
References:
Message-ID: <00c001c087bb$322a9720$3e48a4d8@digicool.com>

From: Michael Hudson
> >>> def f(a) having (publish=1):
> ...     print 1

This doesn't really need special syntax. I would much rather have this (or
something like it) as a way of spelling initialized local variables.
That is, when I want static local variables, instead of corrupting the
function signature by writing:

def f(x, marker=[], foo=foo)

...I could write:

def f(x) having (marker=[], foo)

Cheers,

Evan @ digicool

From jeremy@alum.mit.edu Fri Jan 26 17:58:24 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Fri, 26 Jan 2001 12:58:24 -0500 (EST)
Subject: [Python-Dev] Makefile changes
In-Reply-To: <20010125050753.A1573@glacier.fnational.com>
References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com>
Message-ID: <14961.47808.315324.734238@localhost.localdomain>

>>>>> "NS" == Neil Schemenauer writes:

  >> When I was working the nested scopes, building was tedious at
  >> times because a change to funcobject.h meant that, e.g.,
  >> newmodule.c needed to be recompiled. The Makefiles didn't
  >> capture that information, so I had been adding it to the
  >> individual Makefiles, e.g.
  >>
  >> newmodule.o: newmodule.c ../Include/funcobject.h
  >>
  >> (I think this worked.)

  NS> Hmm, I don't think so. Which makefile did you add this to?

Just to clarify: I added this line to the old Makefile before you
checked the new one in.

  NS> Hmm, I don't think so. Which makefile did you add this to? Are
  NS> you using the new makefile? The Makefile.pre.in file contains a
  NS> line like:

  NS> $(LIBRARY_OBJS) $(MAINOBJ): $(PYTHON_HEADERS)

  NS> but newmodule.o is not in LIBRARY_OBJS. By default it's not
  NS> compiled by make but with distutils. If you add newmodule to
  NS> Setup then a line like:

  NS> Modules/newmodule.o: $(PYTHON_HEADERS)

  NS> would do the trick. I think I will add a line like:

  NS> $(MODOBJS): $(PYTHON_HEADERS)

  NS> to fix the problem.

  NS> I could easily restore the mkdep target but my feeling right now
  NS> is that explicitly including the header dependencies is better.
  NS> What do you think?

Isn't it overkill to have every .o file depend on all the .h files?
If I change cobject.h, there are very few .o files that depend on this change. I suppose, however, it's not worth the effort to get it right at a finer granularity, e.g. that the only files that depend on cobject.h are cobject, cStringIO, unicodedata, _cursesmodule, object, and unicodeobject. Jeremy From fdrake@acm.org Fri Jan 26 20:36:18 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 26 Jan 2001 15:36:18 -0500 (EST) Subject: [Python-Dev] Makefile changes In-Reply-To: <14961.47808.315324.734238@localhost.localdomain> References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com> <14961.47808.315324.734238@localhost.localdomain> Message-ID: <14961.57282.880552.358709@cj42289-a.reston1.va.home.com> Jeremy Hylton writes: > Isn't it overkill to have every .o file depend on all the .h files? > If I change cobject.h, there are very few .o files that depend on this > change. I suppose, however, it's not worth the effort to get it right Perhaps. It's definitely easier to maintain than tracking it more specifically and better than what we had, so I'll live with it. ;) > at a finer granularity, e.g. that the only files that depend on > cobject.h are cobject, cStringIO, unicodedata, _cursesmodule, object, > and unicodeobject. And py_curses.h, which is also used in _curses_panel.c. -Fred -- Fred L. Drake, Jr.
PythonLabs at Digital Creations From nas@arctrix.com Fri Jan 26 13:58:50 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 05:58:50 -0800 Subject: [Python-Dev] Makefile changes In-Reply-To: <14961.47808.315324.734238@localhost.localdomain>; from jeremy@alum.mit.edu on Fri, Jan 26, 2001 at 12:58:24PM -0500 References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com> <14961.47808.315324.734238@localhost.localdomain> Message-ID: <20010126055850.C4918@glacier.fnational.com> On Fri, Jan 26, 2001 at 12:58:24PM -0500, Jeremy Hylton wrote: > Isn't it overkill to have every .o file depend on all the .h files? Maybe, but Python compiles pretty fast anyhow. I'd rather err on the safe side (i.e., compiling too much). Trying to figure out which of the subheaders a .c file uses when it includes Python.h would be a lot of work and error-prone. More power to you if you want to do it. ;-) Neil From dgoodger@atsautomation.com Fri Jan 26 21:46:13 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Fri, 26 Jan 2001 16:46:13 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very rusty (long live Python!), I don't know my way around configure, and am not familiar with Python's Makefile. Python 2.0 compiled fine (with a couple of tweaks), but I'm getting caught by the new way of building things. Please help if you can! Many thanks in advance. Here's an excerpt of my efforts: # cd /tmp/py # gunzip -c < python-2.1a1.tgz | tar -rf - # cd Python-2.1a1 # ./configure 2>&1 | tee ../configure.1 # make 2>&1 | tee ../make.1 ... ./python //5/tmp/py/Python-2.1a1/setup.py build 'import site' failed; use -v for traceback Traceback (most recent call last): File "//5/tmp/py/Python-2.1a1/setup.py", line 4, in ?
import sys, os, string, getopt ImportError: No module named string Running ./python results in stack overflow. The old QNX instructions in README recommend editing Modules/Makefile: LDFLAGS= -N 64k # make 2>&1 | tee ../make.2 Same error as first make. But now the stack doesn't overflow. # python 'import site' failed; use -v for traceback Python 2.1a1 (#2, Jan 26 2001, 11:38:55) [C] on qnxJ Type "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path ['', '/usr/local/lib/python', '/home/dgoodger/lib/python', '/5/tmp/py/Python-2.1a1/Lib', '/5/tmp/py/Python-2.1a1/Lib/plat-qnxJ', '/tmp/py/Python-2.1a1/Modules'] >>> ^D # fullpath . . is //5/tmp/py/Python-2.1a1 The QNX node number prefix '//5' (machine or host number, equivalent to a 'hostname:' prefix for network paths) is being reduced somehow (path normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are required at the head of the path. Is this something that can be fixed? I added a prefix (QNX virtual-to-real path mapping on the filesystem tree) to correct this: # prefix -A /5=//5 Now /5 points to //5, similar to a link. # make 2>&1 | tee ../make.3 ... ./python //5/tmp/py/Python-2.1a1/setup.py build unable to execute ld: No such file or directory running build running build_ext building 'struct' extension creating build creating build/temp.qnx-J-PCI-2.1 cc -O -I. -I/5/tmp/py/Python-2.1a1/./Include -IInclude/ -I/usr/local/include -c /5/tmp/py/Python-2.1a1/Modules/structmodule.c -o build/temp.qnx-J-PCI-2.1/structmodule.o creating build/lib.qnx-J-PCI-2.1 ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o build/lib.qnx-J-PCI-2.1/struct.so error: command 'ld' failed with exit status 1 make: *** [sharedmods] Error 1 QNX doesn't have an 'ld' command. Is configure not getting its info to setup.py? (Is it supposed to?) What should I check? I have logs of each of the configure & make runs. Should I submit this as a bug on SourceForge? Hope to hear from somebody soon. 
David Goodger Systems Administrator & Programmer, Advanced Systems Automation Tooling Systems Inc., Automation Systems Division direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695 e-mail: dgoodger@atsautomation.com From guido@digicool.com Fri Jan 26 21:52:47 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 26 Jan 2001 16:52:47 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: Your message of "Fri, 26 Jan 2001 16:46:13 EST." References: Message-ID: <200101262152.QAA26624@cj20424-a.reston1.va.home.com> > [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] > > I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very > rusty (long live Python!), I don't know my way around configure, and am not > familiar with Python's Makefile. Python 2.0 compiled fine (with a couple of > tweaks), but I'm getting caught by the new way of building things. Please > help if you can! Many thanks in advance. > > Here's an excerpt of my efforts: > > # cd /tmp/py > # gunzip -c < python-2.1a1.tgz | tar -rf - > # cd Python-2.1a1 > # ./configure 2>&1 | tee ../configure.1 > # make 2>&1 | tee ../make.1 > ... > ./python //5/tmp/py/Python-2.1a1/setup.py build > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "//5/tmp/py/Python-2.1a1/setup.py", line 4, in ? > import sys, os, string, getopt > ImportError: No module named string > > Running ./python results in stack overflow. The old QNX instructions in > README recommend editing Modules/Makefile: > LDFLAGS= -N 64k > > # make 2>&1 | tee ../make.2 > > Same error as first make. But now the stack doesn't overflow. > > # python > 'import site' failed; use -v for traceback > Python 2.1a1 (#2, Jan 26 2001, 11:38:55) [C] on qnxJ > Type "copyright", "credits" or "license" for more information. 
> >>> import sys > >>> sys.path > ['', '/usr/local/lib/python', '/home/dgoodger/lib/python', > '/5/tmp/py/Python-2.1a1/Lib', '/5/tmp/py/Python-2.1a1/Lib/plat-qnxJ', > '/tmp/py/Python-2.1a1/Modules'] > >>> ^D > > # fullpath . > . is //5/tmp/py/Python-2.1a1 > > The QNX node number prefix '//5' (machine or host number, equivalent to a > 'hostname:' prefix for network paths) is being reduced somehow (path > normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are > required at the head of the path. Is this something that can be fixed? Aha -- you may need QNX-specific path manipulation functions. What's going on is that site.py normalizes the entries in sys.path, using this function: def makepath(*paths): dir = os.path.join(*paths) return os.path.normcase(os.path.abspath(dir)) I've got a feeling that os.path.abspath(dir) here is the culprit in posixpath.py: def abspath(path): """Return an absolute path.""" if not isabs(path): path = join(os.getcwd(), path) return normpath(path) And here I think that normpath(path) is the routine that actually gets rid of the double leading /. Feel free to submit a patch that leaves double leading slashes in if on QNX. > I added a prefix (QNX virtual-to-real path mapping on the filesystem tree) > to correct this: > > # prefix -A /5=//5 > > Now /5 points to //5, similar to a link. > > # make 2>&1 | tee ../make.3 > ... > ./python //5/tmp/py/Python-2.1a1/setup.py build > unable to execute ld: No such file or directory > running build > running build_ext > building 'struct' extension > creating build > creating build/temp.qnx-J-PCI-2.1 > cc -O -I. 
-I/5/tmp/py/Python-2.1a1/./Include -IInclude/ > -I/usr/local/include -c /5/tmp/py/Python-2.1a1/Modules/structmodule.c -o > build/temp.qnx-J-PCI-2.1/structmodule.o > creating build/lib.qnx-J-PCI-2.1 > ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o > build/lib.qnx-J-PCI-2.1/struct.so > error: command 'ld' failed with exit status 1 > make: *** [sharedmods] Error 1 > > QNX doesn't have an 'ld' command. Is configure not getting its info to > setup.py? (Is it supposed to?) > > What should I check? I have logs of each of the configure & make runs. > Should I submit this as a bug on SourceForge? > > Hope to hear from somebody soon. This is probably in the realm of the distutils. I have no idea how to teach it to build on QNX, sorry! --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Fri Jan 26 22:01:01 2001 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 26 Jan 2001 17:01:01 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 04:46:13PM -0500 References: Message-ID: <20010126170101.B2762@amarok.cnri.reston.va.us> On Fri, Jan 26, 2001 at 04:46:13PM -0500, Goodger, David wrote: > ImportError: No module named string The 'import string' in setup.py actually seems to be redundant now, since nothing seems to actually refer to the string module. I've removed it from CVS. >The QNX node number prefix '//5' (machine or host number, equivalent to a >'hostname:' prefix for network paths) is being reduced somehow (path >normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are >required at the head of the path. Is this something that can be fixed? Ooh, very likely: >>> os.path.normpath('//5/foo/bar') '/5/foo/bar' Isn't // at the root a Unix convention of some sort for some network filesystems? Probably normpath() should just leave it alone. >QNX doesn't have an 'ld' command. 
Is configure not getting its info to >setup.py? (Is it supposed to?) setup.py should be parsing the Makefile. The old QNX instructions say Modules/Makefile should be edited, but with Neil's non-recursive Makefile patch (committed after alpha1's release), editing Modules/Makefile will have no effect. Try editing just the top-level Makefile, which should affect setup.py. --amk From mal@lemburg.com Fri Jan 26 22:15:09 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jan 2001 23:15:09 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> Message-ID: <3A71F6ED.D6D642A7@lemburg.com> "Andrew M. Kuchling" wrote: > >The QNX node number prefix '//5' (machine or host number, equivalent to a > >'hostname:' prefix for network paths) is being reduced somehow (path > >normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are > >required at the head of the path. Is this something that can be fixed? > > Ooh, very likely: > >>> os.path.normpath('//5/foo/bar') > '/5/foo/bar' > > Isn't // at the root a Unix convention of some sort for some > network filesystems? Probably normpath() should just leave it alone. Samba uses ////. os.path.normpath() should probably leave the leading '//' untouched (having too many of those in the path doesn't do any harm, AFAIK). 
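The "leave the leading slashes alone" behaviour discussed here can be sketched roughly as follows. This is only an illustration in present-day Python, not the code from any posixpath revision: the name normpath_keep_slashes is hypothetical, and it assumes the goal is to keep every leading slash (as the 1.5.2 version did) while still collapsing '.', '..' and repeated inner slashes.

```python
def normpath_keep_slashes(path):
    """Normalize a POSIX-style path but keep all leading slashes.

    Hypothetical helper for illustration; not the stdlib implementation.
    """
    if path == '':
        return '.'
    # Peel off the leading slashes so they can be restored untouched.
    stripped = path.lstrip('/')
    slashes = path[:len(path) - len(stripped)]
    new_comps = []
    for comp in stripped.split('/'):
        if comp in ('', '.'):
            continue  # drop empty and "current directory" components
        if comp != '..':
            new_comps.append(comp)
        elif new_comps and new_comps[-1] != '..':
            new_comps.pop()  # ".." cancels the previous real component
        elif not slashes:
            new_comps.append(comp)  # relative path: keep a leading ".."
        # absolute path: a ".." at the root is simply dropped
    return (slashes + '/'.join(new_comps)) or '.'
```

With this sketch a QNX-style node prefix such as '//5/tmp/py' would pass through unchanged, while '/a/./b/../c' would still collapse to '/a/c'. Whether three or more leading slashes should also be kept is a separate design question that the sketch does not try to settle.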
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nas@arctrix.com Fri Jan 26 15:26:12 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 07:26:12 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 04:46:13PM -0500 References: Message-ID: <20010126072611.A5345@glacier.fnational.com> On Fri, Jan 26, 2001 at 04:46:13PM -0500, Goodger, David wrote: > Running ./python results in stack overflow. The old QNX instructions in > README recommend editing Modules/Makefile: > LDFLAGS= -N 64k > > # make 2>&1 | tee ../make.2 The README should be changed to say edit the toplevel Makefile. Should those flags be the default? If you can give me the MACHDEP from your Makefile I can add it to configure.in. > QNX doesn't have an 'ld' command. Is configure not getting its info to > setup.py? (Is it supposed to?) I'm not sure how distutils figures out what to use for ld. It doesn't appear in the Makefile. It think this is probably some distutils thing. Andrew? Neil From fredrik@effbot.org Fri Jan 26 22:25:34 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 26 Jan 2001 23:25:34 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> Message-ID: <001a01c087e6$ec3b9710$e46940d5@hagrid> mal wrote:> > Ooh, very likely: > > >>> os.path.normpath('//5/foo/bar') > > '/5/foo/bar' > > > > Isn't // at the root a Unix convention of some sort for some > > network filesystems? Probably normpath() should just leave it alone. > > Samba uses ////. os.path.normpath() > should probably leave the leading '//' untouched (having too > many of those in the path doesn't do any harm, AFAIK). 
from 1.5.2's posixpath: def normpath(path): """Normalize path, eliminating double slashes, etc.""" import string # Treat initial slashes specially slashes = '' while path[:1] == '/': slashes = slashes + '/' path = path[1:] ... return slashes + string.joinfields(comps, '/') from 2.0's posixpath: def normpath(path): """Normalize path, eliminating double slashes, etc.""" if path == '': return '.' import string initial_slash = (path[0] == '/') ... if initial_slash: path = '/' + path return path or '.' interesting... Cheers /F From akuchlin@mems-exchange.org Fri Jan 26 22:28:03 2001 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 26 Jan 2001 17:28:03 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <20010126072611.A5345@glacier.fnational.com>; from nas@arctrix.com on Fri, Jan 26, 2001 at 07:26:12AM -0800 References: <20010126072611.A5345@glacier.fnational.com> Message-ID: <20010126172803.A2817@amarok.cnri.reston.va.us> On Fri, Jan 26, 2001 at 07:26:12AM -0800, Neil Schemenauer wrote: >I'm not sure how distutils figures out what to use for ld. It >doesn't appear in the Makefile. It think this is probably some >distutils thing. Andrew? It looks at LDSHARED. See customize_compiler in Lib/distutils/sysconfig.py. Looking in Modules/Makefile, LDFLAGS is only used for the final link to produce a Python executable, so I think this is up to the Makefile, not setup.py. --amk From nas@arctrix.com Fri Jan 26 15:56:41 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 07:56:41 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <20010126172803.A2817@amarok.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Fri, Jan 26, 2001 at 05:28:03PM -0500 References: <20010126072611.A5345@glacier.fnational.com> <20010126172803.A2817@amarok.cnri.reston.va.us> Message-ID: <20010126075641.A5534@glacier.fnational.com> On Fri, Jan 26, 2001 at 05:28:03PM -0500, Andrew M. 
Kuchling wrote: > On Fri, Jan 26, 2001 at 07:26:12AM -0800, Neil Schemenauer wrote: > >I'm not sure how distutils figures out what to use for ld. > > It looks at LDSHARED. Okay. David, what should LDSHARED say for QNX? I can add the magic to configure.in. Neil From mal@lemburg.com Fri Jan 26 22:51:09 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jan 2001 23:51:09 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> Message-ID: <3A71FF5D.DC609775@lemburg.com> Fredrik Lundh wrote: > > mal wrote:> > Ooh, very likely: > > > >>> os.path.normpath('//5/foo/bar') > > > '/5/foo/bar' > > > > > > Isn't // at the root a Unix convention of some sort for some > > > network filesystems? Probably normpath() should just leave it alone. > > > > Samba uses ////. os.path.normpath() > > should probably leave the leading '//' untouched (having too > > many of those in the path doesn't do any harm, AFAIK). > > from 1.5.2's posixpath: > > def normpath(path): > """Normalize path, eliminating double slashes, etc.""" > import string > # Treat initial slashes specially > slashes = '' > while path[:1] == '/': > slashes = slashes + '/' > path = path[1:] > ... > return slashes + string.joinfields(comps, '/') > > from 2.0's posixpath: > > def normpath(path): > """Normalize path, eliminating double slashes, etc.""" > if path == '': > return '.' > import string > initial_slash = (path[0] == '/') > ... > if initial_slash: > path = '/' + path > return path or '.' > > interesting... Here's the log message: revision 1.34 date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23 added rewritten normpath from Moshe Zadka that does the right thing with paths containing .. and the diff: diff -r1.34 -r1.33 349,350d348 < if path == '': < return '.' 
352,367c350,372 < initial_slash = (path[0] == '/') < comps = string.split(path, '/') < new_comps = [] < for comp in comps: < if comp in ('', '.'): < continue < if (comp != '..' or (not initial_slash and not new_comps) or < (new_comps and new_comps[-1] == '..')): < new_comps.append(comp) < elif new_comps: < new_comps.pop() < comps = new_comps < path = string.join(comps, '/') < if initial_slash: < path = '/' + path < return path or '.' --- > # Treat initial slashes specially > slashes = '' > while path[:1] == '/': > slashes = slashes + '/' > path = path[1:] > comps = string.splitfields(path, '/') > i = 0 > while i < len(comps): > if comps[i] == '.': > del comps[i] > while i < len(comps) and comps[i] == '': > del comps[i] > elif comps[i] == '..' and i > 0 and comps[i-1] not in ('', '..'): > del comps[i-1:i+1] > i = i-1 > elif comps[i] == '' and i > 0 and comps[i-1] <> '': > del comps[i] > else: > i = i+1 > # If the path is now empty, substitute '.' > if not comps and not slashes: > comps.append('.') > return slashes + string.joinfields(comps, '/') Revision 1.33 clearly leaves initial slashes untouched. I guess we should restore this... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nas@arctrix.com Fri Jan 26 16:12:15 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 08:12:15 -0800 Subject: [Python-Dev] LINKCC defaults to CXX Message-ID: <20010126081215.B5534@glacier.fnational.com> Dear lord why? So people can develop extensions using C++? Its not worth the pain inflicted on everyone else. Let them recompile with LINKCC=CXX. Linking with CXX opens a huge can of stinky worms. First of all, just because configure found a value for CXX doesn't mean it works. Even if it does that doesn't mean that using it is a good idea. Linking with CXX will bring in the C++ runtime. 
There are a large number of platforms where the C++ ABI has not been standardized; for example, anything that used g++. Can we please leave LINKCC default to CXX? It's easy enough for the crazies to override if they like. I'll even create a configure option for them. Neil From barry@digicool.com Fri Jan 26 23:09:57 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 26 Jan 2001 18:09:57 -0500 Subject: [Python-Dev] LINKCC defaults to CXX References: <20010126081215.B5534@glacier.fnational.com> Message-ID: <14962.965.464326.794431@anthem.wooz.org> >>>>> "NS" == Neil Schemenauer writes: NS> Can we please leave LINKCC default to CXX? I think you mean default it to CC, eh? +1 From mal@lemburg.com Sat Jan 27 00:16:01 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 27 Jan 2001 01:16:01 +0100 Subject: [Python-Dev] Nightly CVS tarballs Message-ID: <3A721341.3F348E51@lemburg.com> I just got a request from someone who wants to test the latest CVS version but unfortunately can't because he's behind a firewall. Is there any chance of reactivating the nightly tarball generation that was once in place? http://www.python.org/download/cvs.html Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From dgoodger@atsautomation.com Sat Jan 27 00:30:21 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Fri, 26 Jan 2001 19:30:21 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: Thank you all for your prompt replies. (Guido's was within seconds! Well, minutes, certainly.) I'll give it another go on Monday. I've got renovations to fill my weekend.
/David From thomas@xs4all.net Sat Jan 27 00:35:41 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 27 Jan 2001 01:35:41 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 07:30:21PM -0500 References: Message-ID: <20010127013541.N962@xs4all.nl> On Fri, Jan 26, 2001 at 07:30:21PM -0500, Goodger, David wrote: > Thank you all for your prompt replies. (Guido's was within seconds! Well, > minutes, certainly.) Oh, the wonderful things one can do with a time machine.... -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From jeremy@alum.mit.edu Fri Jan 26 22:14:26 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 26 Jan 2001 17:14:26 -0500 (EST) Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <3A721341.3F348E51@lemburg.com> References: <3A721341.3F348E51@lemburg.com> Message-ID: <14961.63170.394043.790610@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: MAL> I just got a request from someone who wants to test the latest MAL> CVS version but unfortunately can't because he's behind a MAL> firewall. MAL> Is there any chance of reactivating the nightly tarball MAL> generation that was once in place ? MAL> http://www.python.org/download/cvs.html I plan to set up nightly cvs snapshots soon. We should be moving into our new office next week; I hope to have a machine that is on the net 24x7 shortly after that. 
Jeremy From bckfnn@worldonline.dk Sat Jan 27 07:58:38 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Sat, 27 Jan 2001 07:58:38 GMT Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <14961.63170.394043.790610@localhost.localdomain> References: <3A721341.3F348E51@lemburg.com> <14961.63170.394043.790610@localhost.localdomain> Message-ID: <3a727e79.835771@smtp.worldonline.dk> >>>>>> "MAL" == M -A Lemburg writes: > > MAL> I just got a request from someone who wants to test the latest > MAL> CVS version but unfortunately can't because he's behind a > MAL> firewall. > > MAL> Is there any chance of reactivating the nightly tarball > MAL> generation that was once in place ? > > MAL> http://www.python.org/download/cvs.html [Jeremy] >I plan to set up nightly cvs snapshots soon. We should be moving into >our new office next week; I hope to have a machine that is on the net >24x7 shortly after that. FWIW, I have been using this cron and shell script running on shell.sourceforge.net. This way I don't need 24x7 in order to make a cvs tarball (and .zip) available. 22 2 * * * $HOME/bin/jython-snap SHOTLABEL=`date +%Y%m%d` LOGLABEL=log.`date +%Y%m%d` cd /home/groups/jython/htdocs/cvssnaps (cvs -Qd :pserver:anonymous@cvs1:/cvsroot/jython checkout -d jython-$SHOTLABEL jython && \ tar zcf jython-nightly.tar.gz jython-$SHOTLABEL && \ rm -fr jython-nightly.zip && \ zip -qr9 jython-nightly.zip jython-$SHOTLABEL && \ rm -fr jython-$SHOTLABEL) >$LOGLABEL 2>&1 regards, finn From tim.one@home.com Sat Jan 27 09:35:14 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 27 Jan 2001 04:35:14 -0500 Subject: [Python-Dev] setup.py In-Reply-To: <20010126092559.A5623@thyrsus.com> Message-ID: [Eric S. Raymond] > I may not channel Guido the way Tim does, but I suspect he gave you > developer privileges because he trusts you to do routine stuff like this. Excellent, Eric! You're batting 1%. Here's how to boost it to 93%: whenever a new idea comes up, just grumble "no". 
You'll be right 92% of the time . Reminds me of a friend who got sucked into working at a neural-net startup trying to build a black box to predict whether the daily close of the S&P 500 would be above or below the previous day's. He was greatly impressed by the research they had done, showing that the prototype got the right answer more than half the time when fed historical data, and at a very high significance level (i.e., it almost certainly did better than flipping a coin). What he didn't realize at the time is that if they had written the prototype in Python: # S&P close daily direction predictor print "higher" it would have been right about 2/3rds the time <0.33 wink>. never-ascribe-to-insight-what-can-be-explained-by-idiocy-ly y'rs - tim From martin@mira.cs.tu-berlin.de Sat Jan 27 09:38:41 2001 From: martin@mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 27 Jan 2001 10:38:41 +0100 Subject: [Python-Dev] Nightly CVS tarballs Message-ID: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> > Is there any chance of reactivating the nightly tarball generation > that was once in place ? What's wrong with http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz ? Regards, Martin From fredrik@effbot.org Sat Jan 27 10:43:50 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Sat, 27 Jan 2001 11:43:50 +0100 Subject: [Python-Dev] setup.py References: Message-ID: <008c01c0884e$09bd2030$e46940d5@hagrid> tim wrote: > Reminds me of a friend who got sucked into working at a neural-net startup > trying to build a black box to predict whether the daily close of the S&P > 500 would be above or below the previous day's. /.../ > > # S&P close daily direction predictor > print "higher" replace "higher" with "same", and you have a pretty decent weather predictor. Cheers /F From mal@lemburg.com Sat Jan 27 12:01:30 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 27 Jan 2001 13:01:30 +0100 Subject: [Python-Dev] Nightly CVS tarballs References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> Message-ID: <3A72B89A.E03C1912@lemburg.com> "Martin v. Loewis" wrote: > > > Is there any chance of reactivating the nightly tarball generation > > that was once in place ? > > What's wrong with > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz > > ? I didn't realize that SF does this automagically. Could someone please redirect the link on the python.org cvs page to the above address (David Ascher's tarball generation stopped in February 2000 !). Thanks for the hint, Martin. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake@acm.org Sat Jan 27 13:16:01 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 27 Jan 2001 08:16:01 -0500 (EST) Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <3A72B89A.E03C1912@lemburg.com> References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> <3A72B89A.E03C1912@lemburg.com> Message-ID: <14962.51729.905084.154359@cj42289-a.reston1.va.home.com> "Martin v. Loewis" wrote: > What's wrong with > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz M.-A. Lemburg writes: > I didn't realize that SF does this automagically. Could someone > please redirect the link on the python.org cvs page to the > above address (David Ascher's tarball generation stopped in > February 2000 !). Did you want a "snapshot" or a copy of the repository? What SF produces is a tarball of the repository, not a snapshot. We still need to do something to create snapshots. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal@lemburg.com Sat Jan 27 13:28:40 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 27 Jan 2001 14:28:40 +0100 Subject: [Python-Dev] Nightly CVS tarballs References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> <3A72B89A.E03C1912@lemburg.com> <14962.51729.905084.154359@cj42289-a.reston1.va.home.com> Message-ID: <3A72CD08.F47DAA69@lemburg.com> "Fred L. Drake, Jr." wrote: > > "Martin v. Loewis" wrote: > > What's wrong with > > > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz > > M.-A. Lemburg writes: > > I didn't realize that SF does this automagically. Could someone > > please redirect the link on the python.org cvs page to the > > above address (David Ascher's tarball generation stopped in > > February 2000 !). > > Did you want a "snapshot" or a copy of the repository? What SF > produces is a tarball of the repository, not a snapshot. I meant a copy of what you get when you check out the Python CVS tree wrapped into a .tar.gz file. The size of the above archive (16MB) suggests that a lot more is going into the .tar.gz file. A .tar.gz of the CVS checkout is around 4MB in size. Looks like we still need to do something after all ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From armin@steinhoff.de Sat Jan 27 16:24:57 2001 From: armin@steinhoff.de (Armin Steinhoff) Date: Sat, 27 Jan 2001 17:24:57 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: <4.3.2.7.2.20010127170125.00b2ee80@mail.secureweb.de> Hello Guido, nice to see the first 2.1 version :) At 16:52 26.01.01 -0500, you wrote: > > [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] > > > > I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very > > rusty (long live Python!), I don't know my way around configure, and am not > > familiar with Python's Makefile. 
Python 2.0 compiled fine (with a couple of > > tweaks), but I'm getting caught by the new way of building things. Please > > help if you can! Many thanks in advance. > > > > Here's an excerpt of my efforts: > > > > # cd /tmp/py > > # gunzip -c < python-2.1a1.tgz | tar -rf - > > # cd Python-2.1a1 > > # ./configure 2>&1 | tee ../configure.1 I did a fast hack with the new 2.1 version: CC=cc LINKCC=cc configure --without-gcc --shared=no --without-threads (Hope '--shared=no' works ... QNX4 doesn't support dynamic loading) Please replace all references to g++ by cc -> in the main Makefile and the Modules/Makefile. In the Modules/Makefile set LDFLAGS=250K ... the default stacksize of 32K seems to be too small. > > # make 2>&1 | tee ../make.1 > > ... > > ./python //5/tmp/py/Python-2.1a1/setup.py build > > 'import site' failed; use -v for traceback 'python -v' shows that the module 'distutils.util' isn't there .... it seems to be not included in the source distribution. 'import site' failed; traceback: Traceback (most recent call last): File "//1/Python-2.1a1/Lib/site.py", line 85, in ? from distutils.util import get_platform ImportError: No module named distutils.util ^^^^^^^^^^^^^^ [ clip ..] >This is probably in the realm of the distutils. I have no idea how to >teach it to build on QNX, sorry! IMHO ... it is not a path problem. In the moment there is no time left for me to go into these details. A clean port will happen in a few weeks. Please check out PyQNX for news regarding QNX4.25 and QNX6.0 (aka QNX Neutrino). Greetings Armin Steinhoff Life-Demo of PyDACHS http://www.dachs.net/PyDACHS_python-tilcon.htm in our booth at Embedded Systems 2001, Nuremberg, GER http://www.embedded-systems-messe.de Febr. 14-16, 2000 Hall 11, Booth P 04 From guido@digicool.com Sat Jan 27 16:50:50 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:50:50 -0500 Subject: [Python-Dev] LINKCC defaults to CXX In-Reply-To: Your message of "Fri, 26 Jan 2001 08:12:15 PST." 
<20010126081215.B5534@glacier.fnational.com> References: <20010126081215.B5534@glacier.fnational.com> Message-ID: <200101271650.LAA30720@cj20424-a.reston1.va.home.com> > Dear lord why? So people can develop extensions using C++? It's > not worth the pain inflicted on everyone else. Let them > recompile with LINKCC=CXX. > > Linking with CXX opens a huge can of stinky worms. First of all, > just because configure found a value for CXX doesn't mean it > works. Even if it does that doesn't mean that using it is a good > idea. Linking with CXX will bring in the C++ runtime. There are > a large number of platforms where the C++ ABI has not been > standardized; for example, anything that used g++. > > Can we please leave LINKCC default to CC? It's easy enough for > the crazies to override if they like. I'll even create a > configure option for them. Arg. My bad. I did this as an experiment; it didn't break on my machine, but I didn't intend this to become the standard! Thanks for changing it back. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sat Jan 27 16:52:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:52:23 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: Your message of "Fri, 26 Jan 2001 23:51:09 +0100." <3A71FF5D.DC609775@lemburg.com> References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> Message-ID: <200101271652.LAA30750@cj20424-a.reston1.va.home.com> > revision 1.34 > date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23 > added rewritten normpath from Moshe Zadka that does the right thing with > paths containing .. [...] > Revision 1.33 clearly leaves initial slashes untouched. > I guess we should restore this... Yes, please! (Just the "leading extra slashes stay" behavior.)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sat Jan 27 16:57:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:57:40 -0500 Subject: [Python-Dev] New bug in function object hash() and comparisons In-Reply-To: Your message of "Fri, 26 Jan 2001 17:02:09 EST." References: Message-ID: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> Barry noticed: > Anyway, did you know that you can use functions as keys to a > dictionary, but that you can mutate them to "lose" the element? > > -------------------- snip snip -------------------- > Python 2.0 (#13, Jan 10 2001, 13:06:39) > [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> d = {} > >>> def foo(): pass > ... > >>> def bar(): pass > ... > >>> d[foo] = 1 > >>> d[foo] > 1 > >>> foocode = foo.func_code > >>> foo.func_code = bar.func_code > >>> d[foo] > Traceback (most recent call last): > File "", line 1, in ? > KeyError: > >>> d[bar] = 2 > >>> d[bar] > 2 > >>> d[foo] > 2 > >>> foo.func_code = foocode > >>> d[foo] > 1 > -------------------- snip snip -------------------- > > It's because a function's func_code attribute is used in its hash > calculation, but func_code is writable! Clearly, something changed. I'm pretty sure it's the function attributes. Either the function attributes shouldn't be used in comparing function objects, or hash() on functions should be unimplemented, or comparison on functions should use simple pointer compares. What's the right solution? Do people use functions as dict keys? If not, we can remove the hash() implementation. But I suspect they *are* used as dict keys. Not using the __dict__ on comparisons appears ugly, so probably the best solution is to change function comparisons to use simple pointer compares. That removes the possibility to see whether two different functions implement the same code -- but does anybody really use that? 
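The pointer-compare option floated above can be tried out directly. What follows is only a minimal sketch, assuming a Python 3 interpreter (where func_code is spelled __code__); it is not the 2.0/2.1 session quoted in the thread. It repeats Barry's experiment and records what happens when functions hash and compare by identity:

```python
# Assumption: Python 3, where the func_code attribute is named __code__.
def foo():
    return "foo"

def bar():
    return "bar"

d = {foo: 1}

original_code = foo.__code__
foo.__code__ = bar.__code__      # mutate the function that is used as a key

still_found = d[foo]             # lookup succeeds: the hash ignores the code object
same_code = foo.__code__ is bar.__code__
equal = (foo == bar)             # distinct objects, so they do not compare equal

foo.__code__ = original_code     # restore the original behaviour
```

With identity-based hashing the key is never "lost", at the cost that two distinct functions sharing one code object no longer compare equal.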
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez@zadka.site.co.il Sat Jan 27 17:17:50 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Sat, 27 Jan 2001 19:17:50 +0200 (IST) Subject: [Python-Dev] New bug in function object hash() and comparisons In-Reply-To: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com>, Message-ID: <20010127171750.91412A840@darjeeling.zadka.site.co.il> On Sat, 27 Jan 2001 11:57:40 -0500, Guido van Rossum wrote: (about function hash doing the wrong thing) > What's the right solution? I have no idea... > Do people use functions as dict keys? If > not, we can remove the hash() implementation. ...but this ain't it. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From gvwilson@ca.baltimore.com Sat Jan 27 17:23:42 2001 From: gvwilson@ca.baltimore.com (Greg Wilson) Date: Sat, 27 Jan 2001 12:23:42 -0500 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1119 - 17 msgs In-Reply-To: <20010127170103.DA6DEEA44@mail.python.org> Message-ID: <000001c08885$e5418c40$770a0a0a@nevex.com> > Guido wrote: > What's the right solution? Do people use functions as dict keys? Yup --- even use this as an example in the course (part of drumming home to students that functions are just a special kind of data). Greg From barry@digicool.com Sat Jan 27 17:43:43 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Sat, 27 Jan 2001 12:43:43 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> Message-ID: <14963.2255.268933.615456@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Clearly, something changed. I'm pretty sure it's the GvR> function attributes. Actually no. 
func_code is used in func_hash() but somewhere in the Python 1.6 cycle, func_code was made assignable. GvR> Either the function attributes shouldn't be used in comparing GvR> function objects, or hash() on functions should be GvR> unimplemented, or comparison on functions should use simple GvR> pointer compares. GvR> What's the right solution? We should definitely continue to allow functions as keys to dictionaries, but probably just remove func_code as an input to the function's hash. -Barry From barry@digicool.com Sat Jan 27 17:48:33 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Sat, 27 Jan 2001 12:48:33 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> <14963.2255.268933.615456@anthem.wooz.org> Message-ID: <14963.2545.14600.667505@anthem.wooz.org> Me> We should definitely continue to allow functions as keys to Me> dictionaries, but probably just remove func_code as an input Me> to the function's hash. But of course, func_globals won't be sufficient as a hash for functions. Probably changing the hash to a pointer compare is the best thing after all. -Barry From guido@digicool.com Sat Jan 27 17:49:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 12:49:16 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons In-Reply-To: Your message of "Sat, 27 Jan 2001 12:43:43 EST." <14963.2255.268933.615456@anthem.wooz.org> References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> <14963.2255.268933.615456@anthem.wooz.org> Message-ID: <200101271749.MAA32025@cj20424-a.reston1.va.home.com> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Clearly, something changed. I'm pretty sure it's the > GvR> function attributes. > > Actually no. func_code is used in func_hash() but somewhere in the > Python 1.6 cycle, func_code was made assignable. Argh! You're right. 
> GvR> Either the function attributes shouldn't be used in comparing > GvR> function objects, or hash() on functions should be > GvR> unimplemented, or comparison on functions should use simple > GvR> pointer compares. > > GvR> What's the right solution? > > We should definitely continue to allow functions as keys to > dictionaries, but probably just remove func_code as an input to the > function's hash. OK, that settles it. There's not much point in having a function compare do anything besides a pointer comparison when the code objects aren't compared. (Two completely different functions could compare equal e.g. if they has the same attribute dict.) So we should just punt, and compare functions by object pointer. The proper way to do this is to *delete* func_hash and func_compare from funcobject.c -- the default comparison will take care of this. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Sat Jan 27 18:58:30 2001 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Sat, 27 Jan 2001 13:58:30 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 References: Message-ID: <200101271858.NAA04898@mira.erols.com> On Sat, 27 Jan 2001 18:28:02 +0100, Andreas Jung wrote: >Is there a reason why 2.1 runs significantly slower ? >Both Python versions were compiled with -g -O2 only. [CC'ing to python-dev] Confirmed: [amk@mira Python-2.0]$ ./python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.14 This machine benchmarks at 3184.71 pystones/second [amk@mira Python-2.0]$ python2.1 Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.81 This machine benchmarks at 2624.67 pystones/second The ceval.c changes seem a likely candidate to have caused this. Anyone want to run Marc-Andre's microbenchmarks and see how the numbers have changed? 
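As a rough check on the pystone figures just quoted, the reported rate is simply passes divided by elapsed seconds. The helper name below is made up for illustration and is not part of pystone.py:

```python
PASSES = 10000  # pass count shown in both runs above

def pystones_per_second(seconds, passes=PASSES):
    """Hypothetical helper: throughput as passes per elapsed second."""
    return passes / seconds

rate_20 = pystones_per_second(3.14)   # Python 2.0 run
rate_21 = pystones_per_second(3.81)   # Python 2.1a1 run

# Fractional extra wall-clock time taken by the 2.1a1 run.
extra_time = 3.81 / 3.14 - 1
```

On these two timings the 2.1a1 run takes a little over a fifth longer; a single pair of runs is of course a thin basis for any conclusion.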
--amk From moshez@zadka.site.co.il Sat Jan 27 19:14:28 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Sat, 27 Jan 2001 21:14:28 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? Message-ID: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Attached is an example Python session after I patched the intepreter. The test-suite passes all right. I want an OK to check this in. Here is the patch: Index: Objects/funcobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/funcobject.c,v retrieving revision 2.33 diff -c -r2.33 funcobject.c *** Objects/funcobject.c 2001/01/25 20:06:59 2.33 --- Objects/funcobject.c 2001/01/27 19:13:08 *************** *** 347,358 **** 0, /*tp_print*/ 0, /*tp_getattr*/ 0, /*tp_setattr*/ ! (cmpfunc)func_compare, /*tp_compare*/ (reprfunc)func_repr, /*tp_repr*/ 0, /*tp_as_number*/ 0, /*tp_as_sequence*/ 0, /*tp_as_mapping*/ ! (hashfunc)func_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ (getattrofunc)func_getattro, /*tp_getattro*/ --- 347,358 ---- 0, /*tp_print*/ 0, /*tp_getattr*/ 0, /*tp_setattr*/ ! 0, /*tp_compare*/ (reprfunc)func_repr, /*tp_repr*/ 0, /*tp_as_number*/ 0, /*tp_as_sequence*/ 0, /*tp_as_mapping*/ ! 0, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ (getattrofunc)func_getattro, /*tp_getattro*/ Python 2.1a1 (#1, Jan 27 2001, 21:01:24) [GCC 2.95.3 20010111 (prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. >>> def foo(): ... pass ... >>> def bar(): ... pass ... >>> hash(foo) 135484636 >>> hash(bar) 135481676 >>> foo == bar 0 >>> d = {} >>> d[foo] =1 >>> def temp(): ... print "baz" ... >>> foo.func_code = temp.func_code >>> d[foo] 1 -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From tim.one@home.com Sat Jan 27 20:06:20 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 27 Jan 2001 15:06:20 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <200101271858.NAA04898@mira.erols.com> Message-ID: [A.M. Kuchling] > [CC'ing to python-dev] Confirmed: > > [amk@mira Python-2.0]$ ./python Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.14 > This machine benchmarks at 3184.71 pystones/second > [amk@mira Python-2.0]$ python2.1 Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.81 > This machine benchmarks at 2624.67 pystones/second > > The ceval.c changes seem a likely candidate to have caused this. > Anyone want to run Marc-Andre's microbenchmarks and see how the > numbers have changed? Want to, yes, but it looks hopeless on my box: **** 2.0 C:\Python20>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 0.851013 This machine benchmarks at 11750.7 pystones/second C:\Python20>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.24279 This machine benchmarks at 8046.41 pystones/second **** 2.1a1 C:\Python21a1>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 0.823313 This machine benchmarks at 12146 pystones/second C:\Python21a1>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.27046 This machine benchmarks at 7871.15 pystones/second **** CVS C:\Code\python\dist\src\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.836391 This machine benchmarks at 11956.1 pystones/second C:\Code\python\dist\src\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 1.3055 This machine benchmarks at 7659.9 pystones/second That's after a reboot: no matter which Python I use, it gets about 12000 on the first run with a given python.exe, and about 8000 on the second. Not shown is that it *stays* at about 8000 until the next reboot. 
So there's a Windows (W98SE) Mystery, but also no evidence that timings have changed worth spit under the MS compiler. The eval loop is very touchy, and I suspect you won't track this down on your box until staring at the code gcc (I presume you're using gcc) generates. May be sensitive to which release of gcc you're using too. switch-to-windows-and-you'll-have-easier-things-to-worry-about-ly y'rs - tim From fredrik@pythonware.com Sun Jan 28 09:37:45 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 28 Jan 2001 10:37:45 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com> Message-ID: <00ed01c0890e$e3bf5ad0$e46940d5@hagrid> guido wrote: > > Revision 1.33 clearly leaves initial slashes untouched. > > I guess we should restore this... > > Yes, please! (Just the "leading extra slashes stay" behavior.) just looked this up in the specs, and POSIX seem to require that leading slashes are preserved only if there are exactly two of them: A pathname that begins with two successive slashes may be interpreted in an implementation-dependent manner, although more than two leading slashes are treated as a single slash. (from susv2) maybe we should add a if len(slashes) > 2: slashes = "/" test to the patch? 
Cheers /F From thomas@xs4all.net Sun Jan 28 17:39:58 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 28 Jan 2001 18:39:58 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <00ed01c0890e$e3bf5ad0$e46940d5@hagrid>; from fredrik@pythonware.com on Sun, Jan 28, 2001 at 10:37:45AM +0100 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com> <00ed01c0890e$e3bf5ad0$e46940d5@hagrid> Message-ID: <20010128183958.Q962@xs4all.nl> On Sun, Jan 28, 2001 at 10:37:45AM +0100, Fredrik Lundh wrote: > guido wrote: > > > Revision 1.33 clearly leaves initial slashes untouched. > > > I guess we should restore this... > > > > Yes, please! (Just the "leading extra slashes stay" behavior.) > just looked this up in the specs, and POSIX seem to > require that leading slashes are preserved only if there > are exactly two of them: > A pathname that begins with two successive slashes > may be interpreted in an implementation-dependent > manner, although more than two leading slashes are > treated as a single slash. > (from susv2) > maybe we should add a if len(slashes) > 2: slashes = "/" > test to the patch? How strictly do we need (or want, for that matter) to follow POSIX here ? I'm aware the module is called 'posixpath', but it's used in a bit more than just POSIX environments (or POSIX behaviours) so it might make sense to ignore this particular tidbit. What if there is a system that attaches a special meaning to ///, should we create a new path module for it ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin@mira.cs.tu-berlin.de Sun Jan 28 20:50:35 2001 From: martin@mira.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 28 Jan 2001 21:50:35 +0100 Subject: [Python-Dev] XSLT parser interface Message-ID: <200101282050.f0SKoZr08809@mira.informatik.hu-berlin.de>

Based on my previous IDL interface for XPath parsers, I've defined an API for a parser that parses XSLT pattern expressions. It is an extension to the XPath API, so I attach only the additional functions. Any comments are appreciated.

Martin

module XPath{
  // XSLT exprType values
  const unsigned short PATTERN = 17;
  const unsigned short LOCATION_PATTERN = 18;
  const unsigned short RELATIVE_PATH_PATTERN = 19;
  const unsigned short STEP_PATTERN = 20;

  interface Pattern;
  interface LocationPathPattern;
  interface RelativePathPattern;
  interface StepPattern;

  interface PatternFactory:ExprFactory{
    Pattern createPattern(in LocationPathPattern first);

    // idkey may be null, represents IdKeyPattern
    // if parent is true, it is '/', else '//'
    // rel may be null
    LocationPathPattern createLocationPathPattern(in FunctionCall idkey,
                                                  in boolean parent,
                                                  in RelativePathPattern rel);

    // if parent is true, it is /, else //
    RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
                                                  in boolean parent,
                                                  in StepPattern step);

    StepPattern createStepPattern(in AxisSpecifier axis,
                                  in NodeTest test,
                                  in PredicateList predicates);
  };

  typedef sequence<LocationPathPattern> LocationPathPatterns;

  interface Pattern:Expr{
    readonly attribute LocationPathPatterns patterns;
    void append(in LocationPathPattern pattern);
  };

  interface LocationPathPattern:Expr{
    readonly attribute FunctionCall idkey;
    readonly attribute boolean parent;
    readonly attribute RelativePathPattern relative_pattern;
  };

  interface RelativePathPattern:Expr{
    readonly attribute RelativePathPattern relative;
    readonly attribute boolean parent;
    readonly attribute StepPattern step;
  };

  interface StepPattern:Expr{
    readonly attribute AxisSpecifier axis;
    readonly attribute NodeTest test;
    readonly attribute PredicateList predicates;
  };

  interface XSLTParser:Parser{
    Pattern parsePattern(in DOMString pattern);
  };
};
From skip@mojam.com (Skip Montanaro) Sun Jan 28 21:40:28 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sun, 28 Jan 2001 15:40:28 -0600 (CST) Subject: [Python-Dev] What happened to Setup.local's functionality? Message-ID: <14964.37324.642566.602319@beluga.mojam.com> I just remembered Modules/Setup.local. I peeked at mine and noticed it had been zeroed out. I then copied a version of it over from another machine and reran make a couple times. Makesetup ran but nothing mentioned in Setup.local got built. I don't think 2.1 can be released without providing a way for users to recover from this change. I didn't see anything obvious in setup.py. Am I missing something? Skip From thomas@xs4all.net Mon Jan 29 00:39:04 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 29 Jan 2001 01:39:04 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.20,2.21 In-Reply-To: <20001104001415.A2093@53b.hoffleit.de>; from gregor@hoffleit.de on Sat, Nov 04, 2000 at 12:14:15AM +0100 References: <200010050142.SAA08326@slayer.i.sourceforge.net> <20001104001415.A2093@53b.hoffleit.de> Message-ID: <20010129013904.R962@xs4all.nl> On Sat, Nov 04, 2000 at 12:14:15AM +0100, Gregor Hoffleit wrote: > FYI: This misdefinition with LONG_BIT was due to a bug in glibc's limits.h. It > has been fixed in glibc 2.96. Do you mean gcc 2.96, or glibc 2.(1|2).96 ? Or is 2.96 some internal versioning for glibc that I was unaware of ? :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry@digicool.com Mon Jan 29 05:03:45 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 29 Jan 2001 00:03:45 -0500 Subject: [Python-Dev] Function Hash: Check it in? References: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <14964.63921.966960.445548@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> Attached is an example Python session after I patched the MZ> intepreter. 
The test-suite passes all right. MZ> I want an OK to check this in. Moshe, please remove the func_hash() and func_compare() functions, and if the patch passes the test suite, go ahead and check it all in. Please also check in a test case. Thanks, -Barry From barry@digicool.com Mon Jan 29 05:04:12 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 29 Jan 2001 00:04:12 -0500 Subject: [Python-Dev] Function Hash: Check it in? References: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <14964.63948.492662.775413@anthem.wooz.org> Oh yeah, please also add an entry to the NEWS file. Thanks, -Barry From moshez@zadka.site.co.il Mon Jan 29 06:26:25 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 08:26:25 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <14964.63948.492662.775413@anthem.wooz.org> References: <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 00:04:12 -0500, barry@digicool.com (Barry A. Warsaw) wrote: > Oh yeah, please also add an entry to the NEWS file. Done. The checkin to the NEWS file will be done in about a million years, when my antique of a modem finishes sending the data. I had to change test_opcodes since it tested that functions with the same code compare equal. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From gregor@hoffleit.de Mon Jan 29 11:13:39 2001 From: gregor@hoffleit.de (Gregor Hoffleit) Date: Mon, 29 Jan 2001 12:13:39 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.20,2.21 In-Reply-To: <20010129013904.R962@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 29, 2001 at 01:39:04AM +0100 References: <200010050142.SAA08326@slayer.i.sourceforge.net> <20001104001415.A2093@53b.hoffleit.de> <20010129013904.R962@xs4all.nl> Message-ID: <20010129121339.A1166@mediasupervision.de> On Mon, Jan 29, 2001 at 01:39:04AM +0100, Thomas Wouters wrote: > On Sat, Nov 04, 2000 at 12:14:15AM +0100, Gregor Hoffleit wrote: > > FYI: This misdefinition with LONG_BIT was due to a bug in glibc's limits.h. It > > has been fixed in glibc 2.96. > > Do you mean gcc 2.96, or glibc 2.(1|2).96 ? Or is 2.96 some internal > versioning for glibc that I was unaware of ? :) Sorry, it was fixed in glibc 2.1.96. Gregor From mal@lemburg.com Mon Jan 29 11:31:11 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 12:31:11 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com> Message-ID: <3A75547F.A601E219@lemburg.com> Guido van Rossum wrote: > > > revision 1.34 > > date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23 > > added rewritten normpath from Moshe Zadka that does the right thing with > > paths containing .. > [...] > > Revision 1.33 clearly leaves initial slashes untouched. > > I guess we should restore this... > > Yes, please! (Just the "leading extra slashes stay" behavior.) Checked in a patch which preserves '/' and '//' but converts three or more initial slashes into one (see Fredrik's note about the POSIX standard on this).
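The leading-slash rule from Fredrik's POSIX quote (exactly two leading slashes are kept, three or more collapse to one) can be illustrated with posixpath.normpath. This is a small sketch assuming a current Python 3 standard library rather than the 2.1 checkin itself; the //5/... path mimics the QNX node-prefixed path from the build log earlier in the thread:

```python
import posixpath

# Ordinary cleanup: duplicate slashes and ".." components are resolved.
plain = posixpath.normpath("/usr//lib/../bin")

# Exactly two leading slashes are preserved (implementation-defined in POSIX).
two = posixpath.normpath("//5/tmp/py")

# Three or more leading slashes are treated as a single slash.
three = posixpath.normpath("///5/tmp/py")
```

Keeping the double slash matters on systems such as QNX 4, where //5/ names a node rather than a directory under the root.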
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jan 29 12:24:15 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 13:24:15 +0100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 References: <200101271858.NAA04898@mira.erols.com> Message-ID: <3A7560EF.39D6CF@lemburg.com> Here the results of my micro benckmark pybench 0.7: PYBENCH 0.7 Benchmark: /home/lemburg/tmp/pybench-2.1a1.pyb (rounds=10, warp=20) Tests: per run per oper. diff * ------------------------------------------------------------------------ BuiltinFunctionCalls: 1102.30 ms 8.65 us +7.56% BuiltinMethodLookup: 966.75 ms 1.84 us +4.56% ConcatStrings: 1198.55 ms 7.99 us +11.63% ConcatUnicode: 1835.60 ms 12.24 us +19.29% CreateInstances: 1556.40 ms 37.06 us +2.49% CreateStringsWithConcat: 1396.70 ms 6.98 us +5.44% CreateUnicodeWithConcat: 1895.80 ms 9.48 us +31.61% DictCreation: 1760.50 ms 11.74 us +2.43% ForLoops: 1426.90 ms 142.69 us -7.51% IfThenElse: 1155.25 ms 1.71 us -6.24% ListSlicing: 555.40 ms 158.69 us -4.14% NestedForLoops: 784.55 ms 2.24 us -6.33% NormalClassAttribute: 1052.80 ms 1.75 us -10.42% NormalInstanceAttribute: 1053.80 ms 1.76 us +0.89% PythonFunctionCalls: 1127.50 ms 6.83 us +12.56% PythonMethodCalls: 909.10 ms 12.12 us +9.70% Recursion: 942.40 ms 75.39 us +23.74% SecondImport: 924.20 ms 36.97 us +3.98% SecondPackageImport: 951.10 ms 38.04 us +6.16% SecondSubmoduleImport: 1211.30 ms 48.45 us +7.69% SimpleComplexArithmetic: 1635.30 ms 7.43 us +5.58% SimpleDictManipulation: 963.35 ms 3.21 us -0.57% SimpleFloatArithmetic: 877.00 ms 1.59 us -2.92% SimpleIntFloatArithmetic: 851.10 ms 1.29 us -5.89% SimpleIntegerArithmetic: 850.05 ms 1.29 us -6.41% SimpleListManipulation: 1168.50 ms 4.33 us +8.14% SimpleLongArithmetic: 1231.15 ms 7.46 us +1.52% SmallLists: 2153.35 ms 8.44 us +10.77% 
SmallTuples: 1314.65 ms 5.48 us +3.80% SpecialClassAttribute: 1050.80 ms 1.75 us +1.48% SpecialInstanceAttribute: 1248.75 ms 2.08 us -2.32% StringMappings: 1702.60 ms 13.51 us +19.69% StringPredicates: 1024.25 ms 3.66 us -25.49% StringSlicing: 1093.35 ms 6.25 us +4.35% TryExcept: 1584.85 ms 1.06 us -10.90% TryRaiseExcept: 1239.50 ms 82.63 us +4.64% TupleSlicing: 983.00 ms 9.36 us +3.36% UnicodeMappings: 1631.65 ms 90.65 us +42.76% UnicodePredicates: 1762.10 ms 7.83 us +15.99% UnicodeProperties: 1410.80 ms 7.05 us +19.57% UnicodeSlicing: 1366.20 ms 7.81 us +19.23% ------------------------------------------------------------------------ Average round time: 58001.00 ms +3.30% *) measured against: /home/lemburg/tmp/pybench-2.0.pyb (rounds=10, warp=20) The benchmark is available here in case someone wants to verify the results on different platforms: http://www.lemburg.com/python/pybench-0.7.zip The above tests were done on a Linux 2.2 system, AMD K6 233MHz. The figures shown compare CVS Python (2.1a1) against stock Python 2.0. As you can see, Python function calls have suffered a lot for some reason. Unicode mappings and other Unicode database related methods show the effect of the compression of the Unicode database -- a clear space/speed tradeoff. I can't really explain why Unicode concatenation has had a slowdown -- perhaps the new coercion logic has something to do with this ?! On the nice side: attribute lookups are faster; probably due to the string key optimizations in the dictionary implementation. Loops and exceptions are also a tad faster. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@pythonware.com Mon Jan 29 12:30:32 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 29 Jan 2001 13:30:32 +0100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> Message-ID: <01fc01c089ef$48072230$0900a8c0@SPIFF> mal wrote: > UnicodeMappings: 1631.65 ms 90.65 us +42.76% > UnicodePredicates: 1762.10 ms 7.83 us +15.99% > UnicodeProperties: 1410.80 ms 7.05 us +19.57% > UnicodeSlicing: 1366.20 ms 7.81 us +19.23% > > Unicode mappings and other Unicode database related methods > show the effect of the compression of the Unicode database -- a > clear space/speed tradeoff. umm. the tests don't seem to test the "\N{name}" escapes, so the only thing that has changed in 2.1 is the "decomposition" method (used in the UnicodeProperties test). are you sure you're comparing against 2.0 final? Cheers /F From mal@lemburg.com Mon Jan 29 12:52:12 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 13:52:12 +0100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> Message-ID: <3A75677C.E4FA82A0@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > UnicodeMappings: 1631.65 ms 90.65 us +42.76% > > UnicodePredicates: 1762.10 ms 7.83 us +15.99% > > UnicodeProperties: 1410.80 ms 7.05 us +19.57% > > UnicodeSlicing: 1366.20 ms 7.81 us +19.23% > > > > Unicode mappings and other Unicode database related methods > > show the effect of the compression of the Unicode database -- a > > clear space/speed tradeoff. > > umm. 
the tests don't seem to test the "\N{name}" escapes, so the > only thing that has changed in 2.1 is the "decomposition" method > (used in the UnicodeProperties test). The mappings figure surprised me too: the code has not changed, but the unicodetype_db.h look different. Don't know how this affects performance though. The differences could also be explained by a increase in Unicode object creation time (the concatenation is also a lot slower), so perhaps that's where we should look... > are you sure you're comparing against 2.0 final? Yes... after a check of the Makefile I found that I had compiled Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this makes a difference w/r to inlining of code. I'll recompile and rerun the benchmark. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one@home.com Mon Jan 29 12:56:49 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 07:56:49 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include Message-ID: [Ping] > dict[key] = 1 > if key in dict: ... > for key in dict: ... [Guido] > No chance of a time-machine escape, but I *can* say that I agree that > Ping's proposal makes a lot of sense. This is a reversal of my > previous opinion on this matter. (Take note -- those don't happen > very often! :-) > > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, Thomas since submitted a patch to do the "if key in dict" part (which I reviewed and accepted, pending resolution of doc issues). It does not do the "for key in dict" part. It's not entirely clear whether you intended to approve that part too (I've simplified away many layers of quoting in the above ). In any case, nobody is working on that part. 
WRT that part, Ping produced some stats in: http://mail.python.org/pipermail/python-dev/2001-January/012106.html > How often do you write 'dict.has_key(x)'? (std lib says: 206) > How often do you write 'for x in dict.keys()'? (std lib says: 49) > > How often do you write 'x in dict.values()'? (std lib says: 0) > How often do you write 'for x in dict.values()'? (std lib says: 3) However, he did not report on occurrences of for k, v in dict.items() I'm not clear exactly which files he examined in the above, or how the counts were obtained. So I don't know how this compares: I counted 188 instances of the string ".items(" in 122 .py files, under the dist/ portion of current CVS. A number of those were assignment and return stmts, others were dict.items() in an arglist, and at least one was in a comment. After weeding those out, I was left with 153 legit "for" loops iterating over x.items(). In all: 153 iterating over x.items() 118 " over x.keys() 17 " over x.values() So I conclude that iterating over .items() is significantly more common than iterating over .keys(). On c.l.py about an hour ago, Thomas complained that two (out of two) of his coworkers guessed wrong about what for x in dict: would do, but didn't say what they *did* think it would do. Since Thomas doesn't work with idiots, I'm guessing they *didn't* guess it would iterate over either values or the lines of a freshly-opened file named "dict" . So if you did intend to approve "for x in dict" iterating over dict.keys(), maybe you want to call me out on that "approval post" I forged under your name. falls-on-swords-so-often-there's-nothing-left-to-puncture-ly y'rs - tim From mal@lemburg.com Mon Jan 29 13:18:52 2001 From: mal@lemburg.com (M.-A.
Lemburg)
Date: Mon, 29 Jan 2001 14:18:52 +0100
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com>
Message-ID: <3A756DBC.8EAC42F5@lemburg.com>

"M.-A. Lemburg" wrote:
>
> Fredrik Lundh wrote:
> >
> > mal wrote:
> > > UnicodeMappings:          1631.65 ms    90.65 us  +42.76%
> > > UnicodePredicates:        1762.10 ms     7.83 us  +15.99%
> > > UnicodeProperties:        1410.80 ms     7.05 us  +19.57%
> > > UnicodeSlicing:           1366.20 ms     7.81 us  +19.23%
> > >
> > > Unicode mappings and other Unicode database related methods
> > > show the effect of the compression of the Unicode database -- a
> > > clear space/speed tradeoff.
> >
> > umm. the tests don't seem to test the "\N{name}" escapes, so the
> > only thing that has changed in 2.1 is the "decomposition" method
> > (used in the UnicodeProperties test).
>
> The mappings figure surprised me too: the code has not changed,
> but the unicodetype_db.h look different. Don't know how this
> affects performance though.
>
> The differences could also be explained by a increase in Unicode
> object creation time (the concatenation is also a lot slower),
> so perhaps that's where we should look...
>
> > are you sure you're comparing against 2.0 final?
>
> Yes... after a check of the Makefile I found that I had compiled
> Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this makes
> a difference w/r to inlining of code. I'll recompile and rerun
> the benchmark.

Looks like there is an effect of choosing -O3 over -O2 (even
though not necessarily positive all the way); what results do you
get on Windows ?

--
PYBENCH 0.7

Benchmark: /home/lemburg/tmp/pybench-2.1a1.pyb (rounds=10, warp=20)

Tests:                          per run   per oper.   diff *
------------------------------------------------------------------------
BuiltinFunctionCalls:        1065.10 ms     8.35 us   +3.93%
BuiltinMethodLookup:         1286.30 ms     2.45 us  +39.12%
ConcatStrings:               1243.30 ms     8.29 us  +15.80%
ConcatUnicode:               1449.10 ms     9.66 us   -5.83%
CreateInstances:             1639.25 ms    39.03 us   +7.95%
CreateStringsWithConcat:     1453.45 ms     7.27 us   +9.73%
CreateUnicodeWithConcat:     1558.45 ms     7.79 us   +8.19%
DictCreation:                1869.35 ms    12.46 us   +8.77%
ForLoops:                    1526.85 ms   152.69 us   -1.03%
IfThenElse:                  1381.00 ms     2.05 us  +12.09%
ListSlicing:                  547.40 ms   156.40 us   -5.52%
NestedForLoops:               824.50 ms     2.36 us   -1.56%
NormalClassAttribute:        1233.55 ms     2.06 us   +4.96%
NormalInstanceAttribute:     1215.50 ms     2.03 us  +16.37%
PythonFunctionCalls:         1107.30 ms     6.71 us  +10.55%
PythonMethodCalls:           1047.00 ms    13.96 us  +26.34%
Recursion:                    940.35 ms    75.23 us  +23.47%
SecondImport:                 894.05 ms    35.76 us   +0.59%
SecondPackageImport:          915.05 ms    36.60 us   +2.14%
SecondSubmoduleImport:       1131.10 ms    45.24 us   +0.56%
SimpleComplexArithmetic:     1652.05 ms     7.51 us   +6.67%
SimpleDictManipulation:      1150.25 ms     3.83 us  +18.72%
SimpleFloatArithmetic:        889.65 ms     1.62 us   -1.52%
SimpleIntFloatArithmetic:     900.80 ms     1.36 us   -0.40%
SimpleIntegerArithmetic:      901.75 ms     1.37 us   -0.72%
SimpleListManipulation:      1125.40 ms     4.17 us   +4.15%
SimpleLongArithmetic:        1305.15 ms     7.91 us   +7.62%
SmallLists:                  2102.85 ms     8.25 us   +8.18%
SmallTuples:                 1329.55 ms     5.54 us   +4.98%
SpecialClassAttribute:       1234.60 ms     2.06 us  +19.23%
SpecialInstanceAttribute:    1422.55 ms     2.37 us  +11.28%
StringMappings:              1585.55 ms    12.58 us  +11.46%
StringPredicates:            1241.35 ms     4.43 us   -9.69%
StringSlicing:               1206.20 ms     6.89 us  +15.12%
TryExcept:                   1764.35 ms     1.18 us   -0.81%
TryRaiseExcept:              1217.40 ms    81.16 us   +2.77%
TupleSlicing:                 933.00 ms     8.89 us   -1.90%
UnicodeMappings:             1137.35 ms    63.19 us   -0.49%
UnicodePredicates:           1632.05 ms     7.25 us   +7.43%
UnicodeProperties:           1244.05 ms     6.22 us   +5.44%
UnicodeSlicing:              1252.10 ms     7.15 us   +9.27%
------------------------------------------------------------------------
Average round time:         58804.00 ms               +4.73%

*) measured against: /home/lemburg/tmp/pybench-2.0.pyb (rounds=10, warp=20)

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal@lemburg.com Mon Jan 29 13:28:24 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 14:28:24 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
References:
Message-ID: <3A756FF8.B7185FA2@lemburg.com>

Tim Peters wrote:
>
> [Ping]
> >     dict[key] = 1
> >     if key in dict: ...
> >     for key in dict: ...
>
> [Guido]
> > No chance of a time-machine escape, but I *can* say that I agree that
> > Ping's proposal makes a lot of sense. This is a reversal of my
> > previous opinion on this matter. (Take note -- those don't happen
> > very often! :-)
> >
> > First to submit a working patch gets a free copy of 2.1a2 and
> > subsequent releases,
>
> Thomas since submitted a patch to do the "if key in dict" part (which I
> reviewed and accepted, pending resolution of doc issues).
>
> It does not do the "for key in dict" part. It's not entirely clear whether
> you intended to approve that part too (I've simplified away many layers of
> quoting in the above ). In any case, nobody is working on that part.
>
> WRT that part, Ping produced some stats in:
>
> http://mail.python.org/pipermail/python-dev/2001-January/012106.html
>
> > How often do you write 'dict.has_key(x)'? (std lib says: 206)
> > How often do you write 'for x in dict.keys()'? (std lib says: 49)
> >
> > How often do you write 'x in dict.values()'? (std lib says: 0)
> > How often do you write 'for x in dict.values()'?
(std lib says: 3) > > However, he did not report on occurrences of > > for k, v in dict.items() > > I'm not clear exactly which files he examined in the above, or how the > counts were obtained. So I don't know how this compares: I counted 188 > instances of the string ".items(" in 122 .py files, under the dist/ portion > of current CVS. A number of those were assignment and return stmts, others > were dict.items() in an arglist, and at least one was in a comment. After > weeding those out, I was left with 153 legit "for" loops iterating over > x.items(). In all: > > 153 iterating over x.items() > 118 " over x.keys() > 17 " over x.values() > > So I conclude that iterating over .values() is significantly more common > than iterating over .keys(). > > On c.l.py about an hour ago, Thomas complained that two (out of two) of his > coworkers guessed wrong about what > > for x in dict: > > would do, but didn't say what they *did* think it would do. Since Thomas > doesn't work with idiots, I'm guessing they *didn't* guess it would iterate > over either values or the lines of a freshly-opened file named "dict" > . > > So if you did intend to approve "for x in dict" iterating over dict.keys(), > maybe you want to call me out on that "approval post" I forged under your > name. Dictionaries are not sequences. I wonder what order a user of for k,v in dict: (or whatever other of this proposal you choose) will expect... Please also take into account that dictionaries are *mutable* and their internal state is not defined to e.g. not change due to lookups (take the string optimization for example...), so exposing PyDict_Next() in any to Python will cause trouble. In the end, you will need to create a list or tuple to iterate over one way or another, so why bother overloading for-loops w/r to dictionaries ? 
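
A minimal sketch of the mutation hazard raised above, written against a present-day CPython 3 rather than the 2.1 sources under discussion (an assumption; nothing here is claimed about 2.1 behaviour). The helper names are hypothetical. It contrasts looping over a list snapshot of the keys with looping over the live dict while inserting new keys:

```python
def add_upper_keys_snapshot(d):
    """Loop over a list copy of the keys, so inserting into d is safe."""
    for key in list(d.keys()):
        d[key.upper()] = d[key]
    return d


def add_upper_keys_live(d):
    """Loop over the dict itself while inserting new keys.

    Returns the error message if the interpreter objects, else None.
    """
    try:
        for key in d:
            d[key.upper()] = d[key]
    except RuntimeError as exc:
        return str(exc)
    return None


# The snapshot version sees only the original keys and finishes normally.
snapshot_result = add_upper_keys_snapshot({"a": 1, "b": 2})

# The live version inserts a key mid-loop; a modern CPython detects the
# size change on the next step of the loop and raises RuntimeError.
live_error = add_upper_keys_live({"a": 1, "b": 2})

print(snapshot_result)
print(live_error)
```

The snapshot costs one extra list per loop, which is the overhead the message above argues cannot be avoided; the live loop avoids it but is only safe when the body does not insert keys.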
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bckfnn@worldonline.dk Mon Jan 29 13:48:44 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Mon, 29 Jan 2001 13:48:44 GMT Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> References: <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> Message-ID: <3a75747e.17414620@smtp.worldonline.dk> On Mon, 29 Jan 2001 08:26:25 +0200 (IST), you wrote: >I had to change test_opcodes since it tested that functions with the >same code compare equal. Thanks. With this change, Jython too can complete the test_opcodes. In Jython a code object can never compare equal to anything but itself. regards, finn From moshez@zadka.site.co.il Mon Jan 29 14:04:47 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 16:04:47 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <3a75747e.17414620@smtp.worldonline.dk> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> Message-ID: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 13:48:44 GMT, bckfnn@worldonline.dk (Finn Bock) wrote: > Thanks. With this change, Jython too can complete the test_opcodes. In > Jython a code object can never compare equal to anything but itself. Great! I'm happy to have helped. I'm starting to wonder what the tests really test: the language definition, or accidents of the implementation? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From MarkH@ActiveState.com Mon Jan 29 14:35:25 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Tue, 30 Jan 2001 01:35:25 +1100
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
In-Reply-To: <3A756DBC.8EAC42F5@lemburg.com>
Message-ID:

"M.-A. Lemburg" wrote:
> what results do you get on Windows ?

Win2k, dual 800, relatively quiet!

Python 2.0

F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.847605
This machine benchmarks at 11798 pystones/second

F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.845104
This machine benchmarks at 11832.9 pystones/second

F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.846069
This machine benchmarks at 11819.4 pystones/second

F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.849447
This machine benchmarks at 11772.4 pystones/second

Python from CVS today:

F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.885801
This machine benchmarks at 11289.2 pystones/second

F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.889048
This machine benchmarks at 11248 pystones/second

F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.892422
This machine benchmarks at 11205.5 pystones/second

Although I deleted Tim's earlier mail, from memory this is pretty similar in
terms of performance lost.

I'm afraid I have no idea what your benchmarks are or how to build them, but
did check that the optimizer is set for "maximize for speed" (/O2). Other
compiler options gave significantly smaller results (no optimizations around
8500, and "optimize for space" (/O1) at around 10000). Other fiddling with
the optimizer couldn't get better results than the existing settings.

Mark.
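
The pystone figures in the message above can be boiled down to a single relative slowdown. The sketch below is only an illustration: the function name is hypothetical, the numbers are copied from the runs quoted above, and averaging the runs before comparing is an assumption about how one might summarize them, not something the poster did.

```python
from statistics import mean

# pystones/second as reported above (higher is faster).
PY20_RUNS = [11798.0, 11832.9, 11819.4, 11772.4]
CVS_RUNS = [11289.2, 11248.0, 11205.5]


def slowdown_percent(baseline_runs, candidate_runs):
    """Percentage by which the candidate's mean throughput trails the baseline's."""
    baseline = mean(baseline_runs)
    candidate = mean(candidate_runs)
    return (baseline - candidate) / baseline * 100.0


slowdown = slowdown_percent(PY20_RUNS, CVS_RUNS)
print(f"CVS build is {slowdown:.2f}% slower than 2.0 on pystone")
```

On these numbers the result lands a little under 5%, the same ballpark as the average round-time difference in the pybench run posted earlier in the thread.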
From guido@digicool.com Mon Jan 29 14:48:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 09:48:22 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 07:56:49 EST." References: Message-ID: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> > [Ping] > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > > [Guido] > > No chance of a time-machine escape, but I *can* say that I agree that > > Ping's proposal makes a lot of sense. This is a reversal of my > > previous opinion on this matter. (Take note -- those don't happen > > very often! :-) > > > > First to submit a working patch gets a free copy of 2.1a2 and > > subsequent releases, > > Thomas since submitted a patch to do the "if key in dict" part (which I > reviewed and accepted, pending resolution of doc issues). > > It does not do the "for key in dict" part. It's not entirely clear whether > you intended to approve that part too (I've simplified away many layers of > quoting in the above ). In any case, nobody is working on that part. > > WRT that part, Ping produced some stats in: > > http://mail.python.org/pipermail/python-dev/2001-January/012106.html > > > How often do you write 'dict.has_key(x)'? (std lib says: 206) > > How often do you write 'for x in dict.keys()'? (std lib says: 49) > > > > How often do you write 'x in dict.values()'? (std lib says: 0) > > How often do you write 'for x in dict.values()'? (std lib says: 3) > > However, he did not report on occurrences of > > for k, v in dict.items() > > I'm not clear exactly which files he examined in the above, or how the > counts were obtained. So I don't know how this compares: I counted 188 > instances of the string ".items(" in 122 .py files, under the dist/ portion > of current CVS. A number of those were assignment and return stmts, others > were dict.items() in an arglist, and at least one was in a comment. 
After > weeding those out, I was left with 153 legit "for" loops iterating over > x.items(). In all: > > 153 iterating over x.items() > 118 " over x.keys() > 17 " over x.values() > > So I conclude that iterating over .values() is significantly more common > than iterating over .keys(). I did a less sophisticated count but come to the same conclusion: iterations over items() are (somewhat) more common than over keys(), and values() are 1-2 orders of magnitude less common. My numbers: $ cd python/src/Lib $ grep 'for .*items():' *.py | wc -l 47 $ grep 'for .*keys():' *.py | wc -l 43 $ grep 'for .*values():' *.py | wc -l 2 > On c.l.py about an hour ago, Thomas complained that two (out of two) of his > coworkers guessed wrong about what > > for x in dict: > > would do, but didn't say what they *did* think it would do. Since Thomas > doesn't work with idiots, I'm guessing they *didn't* guess it would iterate > over either values or the lines of a freshly-opened file named "dict" > . I don't much value to the readability argument: typically, one will write "for key in dict" or "for name in dict" and then it's obvious what is meant. > So if you did intend to approve "for x in dict" iterating over dict.keys(), > maybe you want to call me out on that "approval post" I forged under your > name. But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody has even asked me for a has_item() method). I can live with "x in list" checking the values and "x in dict" checking the keys. But I can *not* live with "x in dict" equivalent to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()". I also think that defining "x in dict" but not "for x in dict" will be confusing. So we need to think more. How about: for key in dict: ... # ... over keys for key:value in dict: ... # ... over items This is syntactically unambiguous (a colon is currently illegal in that position). This also suggests: for index:value in list: ... # ... 
over zip(range(len(list)), list)

which doesn't strike me as bad or ugly, and would fulfill my brother's
dearest wish.

(And why didn't we think of this before?)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas@xs4all.net Mon Jan 29 14:58:16 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 29 Jan 2001 15:58:16 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <200101291448.JAA11473@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:48:22AM -0500
References: <200101291448.JAA11473@cj20424-a.reston1.va.home.com>
Message-ID: <20010129155816.T962@xs4all.nl>

On Mon, Jan 29, 2001 at 09:48:22AM -0500, Guido van Rossum wrote:

> How about:
>     for key in dict: ...         # ... over keys
>     for key:value in dict: ...   # ... over items
> This is syntactically unambiguous (a colon is currently illegal in
> that position).

I won't comment on the syntax right now, I need to look at it for a while
first :-) However, what about MAL's point about dict ordering, internally ?
Wouldn't FOR_LOOP be forced to generate a list of keys anyway, to avoid
skipping keys ? I know currently the dict implementation doesn't do any
reordering except during adds/deletes, but there is nothing in the language
ref that supports that -- it's an implementation detail. Would we make a
future enhancement where (some form of) gc would 'clean up' large
dictionaries impossible ?

--
Thomas Wouters
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido@digicool.com Mon Jan 29 15:00:38 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 10:00:38 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: Your message of "Mon, 29 Jan 2001 14:28:24 +0100." <3A756FF8.B7185FA2@lemburg.com>
References: <3A756FF8.B7185FA2@lemburg.com>
Message-ID: <200101291500.KAA11569@cj20424-a.reston1.va.home.com>

> Dictionaries are not sequences.
I wonder what order a user of > for k,v in dict: (or whatever other of this proposal you choose) > will expect... The same order that for k,v in dict.items() will yield, of course. > Please also take into account that dictionaries are *mutable* > and their internal state is not defined to e.g. not change due to > lookups (take the string optimization for example...), so exposing > PyDict_Next() in any to Python will cause trouble. In the end, > you will need to create a list or tuple to iterate over one way > or another, so why bother overloading for-loops w/r to dictionaries ? Actually, I was going to propose to play dangerously here: the for k:v in dict: ... syntax I proposed in my previous message should indeed expose PyDict_Next(). It should be a big speed-up, and I'm expecting (though don't have much proof) that most loops over dicts don't mutate the dict. Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jan 29 15:30:17 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:30:17 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: Your message of "Mon, 29 Jan 2001 16:04:47 +0200." <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: <200101291530.KAA12037@cj20424-a.reston1.va.home.com> > I'm starting to wonder what the tests really test: the language definition, > or accidents of the implementation? It's good to test conformance to the language definition, but this is also a regression test for the implementation. 
The "accidents of the implementation" definitely need to be tested. E.g. if we decide that repr(s) uses \n rather than \012 or \x0a, this should be tested too. The language definition gives the implementer a choice here; but once the implementer has made a choice, it's good to have a test that tests that this choice is implemented correctly. Perhaps there should be several parts to the regression test, e.g. language conformance, library conformance, platform-specific features, and implementation conformance? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jan 29 15:57:12 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:57:12 -0500 Subject: [Python-Dev] What happened to Setup.local's functionality? In-Reply-To: Your message of "Sun, 28 Jan 2001 15:40:28 CST." <14964.37324.642566.602319@beluga.mojam.com> References: <14964.37324.642566.602319@beluga.mojam.com> Message-ID: <200101291557.KAA12347@cj20424-a.reston1.va.home.com> > I just remembered Modules/Setup.local. I peeked at mine and noticed it had > been zeroed out. I then copied a version of it over from another machine > and reran make a couple times. Makesetup ran but nothing mentioned in > Setup.local got built. > > I don't think 2.1 can be released without providing a way for users to > recover from this change. I didn't see anything obvious in setup.py. Am I > missing something? Well, Module/Setup is still used, so it should be trivial to add Setup.local back too. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@arctrix.com Mon Jan 29 09:23:55 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Mon, 29 Jan 2001 01:23:55 -0800 Subject: [Python-Dev] What happened to Setup.local's functionality? 
In-Reply-To: <14964.37324.642566.602319@beluga.mojam.com>; from skip@mojam.com on Sun, Jan 28, 2001 at 03:40:28PM -0600 References: <14964.37324.642566.602319@beluga.mojam.com> Message-ID: <20010129012355.A14763@glacier.fnational.com> On Sun, Jan 28, 2001 at 03:40:28PM -0600, Skip Montanaro wrote: > Makesetup ran but nothing mentioned in Setup.local got built. I believe Setup.local should still work. One possibility is that the modules in Setup.local were marked as shared. Shared modules from Setup* don't get build by default. You have to do "make oldsharedmods". I'm not sure why oldsharedmods is not included in the all target. Andrew, can you think of any reason why it shouldn't be added. Neil From dgoodger@atsautomation.com Mon Jan 29 16:19:12 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 29 Jan 2001 11:19:12 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: Marc-Andre Lemburg's patch to posixpath.py clears up the path problem. Thanks! MACHDEP is qnxJ for QNX 4.25, qnxG for QNX 4.23. I don't know what it is for QNX 6 (Neutrino). Perhaps test for MACHDEP[:3]=='qnx'? I'm still stuck at 'python setup.py build': unable to execute ld: no such file or directory running build running build_ext building 'struct' extension skipping //5/tmp/py/Python-2.1a1/Modules/structmodule.c (build/temp.qnx-J-PCI-2.1/structmodule.o up-to-date) ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o build/lib.qnx-J-PCI-2.1/struct.so error: command 'ld' failed with exit status 1 make: *** [sharedmods] Error 1 Armin Steinhoff said "QNX4 doesn't support dynamic loading". Is this compatible with distutils? If not, is there a workaround? Neil Schemenauer asked, "what should LDSHARED say for QNX?". I don't know. Python 2.0 compiled OK, and its makefile says LDSHARED=ld. However, Modules/Setup has no uncommented "*shared*" line. 
Those of us who rely on Python to get our work done, and who don't have the
bandwidth for the implementation complexities, owe a lot to everyone who
makes it possible to compile Python out-of-the-box. Very much appreciated.
Thank you!

David Goodger
Systems Administrator & Programmer, Advanced Systems
Automation Tooling Systems Inc., Automation Systems Division
direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695
e-mail: dgoodger@atsautomation.com

From nas@arctrix.com Mon Jan 29 09:40:07 2001
From: nas@arctrix.com (Neil Schemenauer)
Date: Mon, 29 Jan 2001 01:40:07 -0800
Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25
In-Reply-To: ; from dgoodger@atsautomation.com on Mon, Jan 29, 2001 at 11:19:12AM -0500
References:
Message-ID: <20010129014007.C14763@glacier.fnational.com>

On Mon, Jan 29, 2001 at 11:19:12AM -0500, Goodger, David wrote:
> I'm still stuck at 'python setup.py build':
...
> Armin Steinhoff said "QNX4 doesn't support dynamic loading". Is this
> compatible with distutils? If not, is there a workaround?

The setup.py script only builds shared modules. You're going to
have to enable modules using the old Setup file. I think
Setup.dist should go back to including all the modules
(commented out of course). This would make it easier for people
who can't or don't want to build shared modules.

Neil

From akuchlin@mems-exchange.org Mon Jan 29 16:50:31 2001
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Mon, 29 Jan 2001 11:50:31 -0500
Subject: [Python-Dev] What happened to Setup.local's functionality?
In-Reply-To: <20010129012355.A14763@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 29, 2001 at 01:23:55AM -0800
References: <14964.37324.642566.602319@beluga.mojam.com> <20010129012355.A14763@glacier.fnational.com>
Message-ID: <20010129115031.B4018@amarok.cnri.reston.va.us>

On Mon, Jan 29, 2001 at 01:23:55AM -0800, Neil Schemenauer wrote:
>from Setup* don't get build by default. You have to do "make
>oldsharedmods".
I'm not sure why oldsharedmods is not included >in the all target. Andrew, can you think of any reason why it >shouldn't be added. That's an excellent idea, particularly if we add back Setup.dist, too, and comment out all but the required modules. I'll try to do that today. Note that I'm leaving on vacation tomorrow, and will be back next Monday. Everyone, feel free to check in changes to setup.py that are required. --amk From jeremy@alum.mit.edu Mon Jan 29 16:48:11 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 11:48:11 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A75677C.E4FA82A0@lemburg.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> Message-ID: <14965.40651.233438.311104@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: MAL> Yes... after a check of the Makefile I found that I had MAL> compiled Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this MAL> makes a difference w/r to inlining of code. I'll recompile and MAL> rerun the benchmark. When I was working in the CALL_FUNCTION revision, I compared 2.0 final with my development working using -O3. At that time, I saw no significant performance difference between the two. And I did notice a difference between -O2 and -O3. The strange thing is that I notice a difference between -O2 and -O3 with 2.1a1, but in the opposite direction. On pystone, python -O2 runs consistently faster than -O3; the difference is .05 sec on my machine. Jeremy From esr@thyrsus.com Mon Jan 29 17:12:05 2001 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Mon, 29 Jan 2001 12:12:05 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.40651.233438.311104@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 29, 2001 at 11:48:11AM -0500 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> Message-ID: <20010129121205.A8337@thyrsus.com> Jeremy Hylton : > The strange thing is that I notice a difference between -O2 and -O3 > with 2.1a1, but in the opposite direction. On pystone, python -O2 > runs consistently faster than -O3; the difference is .05 sec on my > machine. Bizarre. Make me wonder if we have a C compiler problem. -- Eric S. Raymond In every country and in every age, the priest has been hostile to liberty. He is always in alliance with the despot, abetting his abuses in return for protection to his own. -- Thomas Jefferson, 1814 From jeremy@alum.mit.edu Mon Jan 29 17:27:08 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 12:27:08 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <20010129121205.A8337@thyrsus.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> <20010129121205.A8337@thyrsus.com> Message-ID: <14965.42988.362288.154254@localhost.localdomain> >>>>> "ESR" == Eric S Raymond writes: ESR> Jeremy Hylton : >> The strange thing is that I notice a difference between -O2 and >> -O3 with 2.1a1, but in the opposite direction. On pystone, >> python -O2 runs consistently faster than -O3; the difference is >> .05 sec on my machine. ESR> Bizarre. Make me wonder if we have a C compiler problem. Depends on your defintion of "compiler problem" . 
If you mean, it compiles our code so it runs slower, then, yes, we've
got one :-).

One of the differences between -O2 and -O3, according to the man page,
is that -O3 will perform optimizations that involve a space-speed
tradeoff. It also includes -finline-functions. I can imagine that
some of these optimizations hurt memory performance enough to make a
difference.

not-really-understanding-but-not-really-expecting-too-ly y'rs,
Jeremy

From jeremy@alum.mit.edu Mon Jan 29 17:39:05 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 29 Jan 2001 12:39:05 -0500 (EST)
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
In-Reply-To: <14965.40651.233438.311104@localhost.localdomain>
References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain>
Message-ID: <14965.43705.367236.994786@localhost.localdomain>

The recursion test in pybench is testing the performance of the nested
scopes changes, which must do some extra bookkeeping to reference the
recursive function in a nested scope. To some extent, a performance
hit is a necessary consequence for nested functions with free
variables.

Nonetheless, there are two interesting things to say about this
situation.

First, there is a bug in the current implementation of nested scopes
that the benchmark tickles. The problem is with code like this:

    def outer():
        global f
        def f(x):
            if x > 0:
                return f(x - 1)

The compiler determines that f is free in f. (It's recursive.) If f
is free in f, in the absence of the global decl, the body of outer
must allocate fresh storage (a cell) for f each time outer is called
and add a reference to that cell to f's closure.

If f is declared global in outer, then it ought to be treated as a
global in nested scopes, too. In general terms, a free variable
should use the binding found in the nearest enclosing scope.
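
The two cases described above can be told apart by inspecting the function objects. This is a sketch under stated assumptions: it targets a present-day CPython 3, whose `__closure__` and `__code__.co_freevars` attributes postdate this thread, and the names `outer_global`, `outer_cell` and `g` are hypothetical. It shows the intended rule, not the 2.1a1 behaviour being reported as a bug.

```python
def outer_global():
    # f is declared global here, so the recursive reference inside f
    # should resolve to the module-level name, with no cell involved.
    global f

    def f(x):
        if x > 0:
            return f(x - 1)
        return x

    return f


def outer_cell():
    # No global declaration: g is free in g, so each call to outer_cell
    # allocates a fresh cell for g and puts it in g's closure.
    def g(x):
        if x > 0:
            return g(x - 1)
        return x

    return g


f_global = outer_global()
g_cell = outer_cell()

print(f_global.__closure__)          # no closure tuple for the global case
print(g_cell.__code__.co_freevars)   # names the cell-based case closes over
print(f_global(3), g_cell(3))
```

In the global case the recursive call costs an ordinary global lookup; only the second function carries a closure.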
If the nearest enclosing scope has a global binding, then the reference is global. If I fix this problem, the recursion benchmark shouldn't be any slower than a normal function call. The second interesting thing to say is that frame allocation and dealloc is probably more expensive than it needs to be in the current implementation. The frame object has a new f_closure slot that holds a tuple that is freshly allocated every time the frame is allocated. (Unless the closure is empty, then f_closure is just NULL.) The extra tuple allocation can probably be done away with by using the same allocation strategy as locals & stack. If the f_localsplus array holds cells + frees + locals + stack, then a new frame will never require more than a single malloc (and often not even that). Jeremy From akuchlin@mems-exchange.org Mon Jan 29 17:54:37 2001 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Mon, 29 Jan 2001 12:54:37 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.42988.362288.154254@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 29, 2001 at 12:27:08PM -0500 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> <20010129121205.A8337@thyrsus.com> <14965.42988.362288.154254@localhost.localdomain> Message-ID: <20010129125437.E4018@amarok.cnri.reston.va.us> On Mon, Jan 29, 2001 at 12:27:08PM -0500, Jeremy Hylton wrote: >Depends on your defintion of "compiler problem" . If you mean, >it compiles our code so it runs slower, then, yes, we've got one :-). 
Compiling with gcc and -g, with no optimization, 2.0 and 2.1cvs seem to be very close, with 2.1 slightly slower: 2.0: Pystone(1.1) time for 10000 passes = 1.04 This machine benchmarks at 9615.38 pystones/second This machine benchmarks at 9345.79 pystones/second This machine benchmarks at 9433.96 pystones/second This machine benchmarks at 9433.96 pystones/second This machine benchmarks at 9523.81 pystones/second 2.1cvs: Pystone(1.1) time for 10000 passes = 1.09 This machine benchmarks at 9174.31 pystones/second This machine benchmarks at 9090.91 pystones/second This machine benchmarks at 9259.26 pystones/second This machine benchmarks at 9174.31 pystones/second This machine benchmarks at 9090.91 pystones/second Would it be worth experimenting with platform-specific compiler options to try to squeeze out the last bit of performance (can wait for the betas, probably)? --amk From jeremy@alum.mit.edu Mon Jan 29 18:04:28 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 13:04:28 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A756DBC.8EAC42F5@lemburg.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <3A756DBC.8EAC42F5@lemburg.com> Message-ID: <14965.45228.197778.579989@localhost.localdomain> I hope another set of benchmarks isn't overkill for the list. I see different results comparing 2.1 with 2.0 (both -O3) using pybench 0.6. The interesting differences I see in this benchmark that I didn't see in MAL's are: DictCreation +15.87% SecondImport +20.29% Other curious differences, which show up in both benchmarks, include: SpecialClassAttribute +17.91% (private variables) SpecialInstanceAttribute +15.34% (__methods__) Jeremy PYBENCH 0.6 Benchmark: py21 (rounds=10, warp=20) Tests: per run per oper.
diff * ------------------------------------------------------------------------ BuiltinFunctionCalls: 305.05 ms 2.39 us +4.77% BuiltinMethodLookup: 319.65 ms 0.61 us +2.55% ConcatStrings: 383.70 ms 2.56 us +1.27% CreateInstances: 463.85 ms 11.04 us +1.96% CreateStringsWithConcat: 381.20 ms 1.91 us +2.39% DictCreation: 508.85 ms 3.39 us +15.87% ForLoops: 577.60 ms 57.76 us +5.65% IfThenElse: 443.70 ms 0.66 us +1.02% ListSlicing: 207.50 ms 59.29 us -4.18% NestedForLoops: 315.75 ms 0.90 us +3.54% NormalClassAttribute: 379.80 ms 0.63 us +7.39% NormalInstanceAttribute: 385.45 ms 0.64 us +8.04% PythonFunctionCalls: 400.00 ms 2.42 us +13.62% PythonMethodCalls: 306.25 ms 4.08 us +5.13% Recursion: 337.25 ms 26.98 us +19.00% SecondImport: 301.20 ms 12.05 us +20.29% SecondPackageImport: 298.20 ms 11.93 us +18.15% SecondSubmoduleImport: 339.15 ms 13.57 us +11.40% SimpleComplexArithmetic: 392.70 ms 1.79 us -10.52% SimpleDictManipulation: 350.40 ms 1.17 us +3.87% SimpleFloatArithmetic: 300.75 ms 0.55 us +2.04% SimpleIntFloatArithmetic: 347.95 ms 0.53 us +9.01% SimpleIntegerArithmetic: 356.40 ms 0.54 us +12.01% SimpleListManipulation: 351.85 ms 1.30 us +11.33% SimpleLongArithmetic: 309.00 ms 1.87 us -5.81% SmallLists: 584.25 ms 2.29 us +10.20% SmallTuples: 442.00 ms 1.84 us +10.33% SpecialClassAttribute: 406.50 ms 0.68 us +17.91% SpecialInstanceAttribute: 557.40 ms 0.93 us +15.34% StringSlicing: 336.45 ms 1.92 us +9.56% TryExcept: 650.60 ms 0.43 us +1.40% TryRaiseExcept: 345.95 ms 23.06 us +2.70% TupleSlicing: 266.35 ms 2.54 us +4.70% ------------------------------------------------------------------------ Average round time: 14413.00 ms +7.07% *) measured against: py20 (rounds=10, warp=20) From skip@mojam.com (Skip Montanaro) Mon Jan 29 18:07:26 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 29 Jan 2001 12:07:26 -0600 (CST) Subject: [Python-Dev] What happened to Setup.local's functionality? 
In-Reply-To: <20010129012355.A14763@glacier.fnational.com> References: <14964.37324.642566.602319@beluga.mojam.com> <20010129012355.A14763@glacier.fnational.com> Message-ID: <14965.45406.933528.53857@beluga.mojam.com> Neil> You have to do "make oldsharedmods". This did the trick. This should be emblazoned in big red letters somewhere if the decision is made to not include oldsharedmods as a dependency for the all target. Thx, Skip From gvwilson@ca.baltimore.com Mon Jan 29 18:19:21 2001 From: gvwilson@ca.baltimore.com (Greg Wilson) Date: Mon, 29 Jan 2001 13:19:21 -0500 Subject: [Python-Dev] Re: Re: Sets: elt in dict, lst.include In-Reply-To: <20010129162012.32158ED49@mail.python.org> Message-ID: <001501c08a20$00dca2a0$770a0a0a@nevex.com> > > > [Ping] > > > dict[key] = 1 > > > if key in dict: ... > > > for key in dict: ... > "Tim Peters" > "if (k, v) in dict" is clearly useless... > I can live with "x in list" checking the values and "x in dict" > checking the keys. But I can *not* live with "x in dict" equivalent > to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()". > I also think that defining "x in dict" but not "for x in dict" will be > confusing. [Greg] Quick poll (four people): if the expression "if a in b" works, then all four expected "for a in b" to work as well. This is also my intuition; are there any exceptions in really existing Python? > [Guido] > for key in dict: ... # ... over keys > for key:value in dict: ... # ... over items [Greg] I'm probably revealing my ignorance of Python's internals here, but can the iteration protocol be extended so that the object (in this case, the dict) is told the number and type(s) of the values the loop is expecting? With: for key in dict: ... the dict would be asked for one value; with: for (key, value) in dict: the dict would be told that a two-element tuple was expected, and so on. This would allow multi-dimensional structures (e.g. 
NumPy arrays) to do things like: for (i, j, k) in array: # please give me three indices and: for ((i, j, k), v) in array: # three indices and value > [Guido] > for index:value in list: ... # ... over zip(range(len(list)), list) How do you feel about: for i in seq.keys(): # strings, tuples, etc. "keys()" is kind of strange ("indices" or something would be more natural), *but* this allows uniform iteration over all built-in collections: def showem(c): for i in c.keys(): print i, c[i] Greg From bckfnn@worldonline.dk Mon Jan 29 18:31:48 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Mon, 29 Jan 2001 18:31:48 GMT Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: <3a75aba9.31537178@smtp.worldonline.dk> On Mon, 29 Jan 2001 16:04:47 +0200 (IST), you wrote: >On Mon, 29 Jan 2001 13:48:44 GMT, bckfnn@worldonline.dk (Finn Bock) wrote: > >> Thanks. With this change, Jython too can complete the test_opcodes. In >> Jython a code object can never compare equal to anything but itself. > >Great! I'm happy to have helped. >I'm starting to wonder what the tests really test: the language definition, >or accidents of the implementation? Based on the amount of code in test_opcodes dedicated to code comparison, I doubt this particular situation was an accident. The problems I have had with the test suite are better described as accidents of the tests themselves. From test_extcall: We expected (repr): "g() got multiple values for keyword argument 'b'" But instead we got: "g() got multiple values for keyword argument 'a'" This is caused by a difference in iteration over a dictionary.
Or from test_import: test test_import crashed -- java.lang.ClassFormatError: java.lang.ClassFormatError: @test$py (Illegal Class name "@test$py") where '@' isn't allowed in java classnames. These are failures that have very little to do with the thing the test are about and nothing at all to do with the language definition. regards, finn From cgw@alum.mit.edu Mon Jan 29 18:35:58 2001 From: cgw@alum.mit.edu (Charles G Waldman) Date: Mon, 29 Jan 2001 12:35:58 -0600 (CST) Subject: [Python-Dev] Re: Re: Sets: elt in dict, lst.include In-Reply-To: <001501c08a20$00dca2a0$770a0a0a@nevex.com> References: <20010129162012.32158ED49@mail.python.org> <001501c08a20$00dca2a0$770a0a0a@nevex.com> Message-ID: <14965.47118.135246.700571@sirius.net.home> Greg Wilson writes: > This would allow multi-dimensional structures > (e.g. NumPy arrays) to do things like: > > for (i, j, k) in array: # please give me three indices > > and: > > for ((i, j, k), v) in array: # three indices and value And what if I had, for example, a 3-dimensional array where the values are 3-tuples? Would "for (i,j,k) in array" refer to the indices or the values? From mal@lemburg.com Mon Jan 29 19:03:41 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 20:03:41 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: <3A75BE8D.1B7673EE@lemburg.com> With all this confusion about how to actually write the iteration on dictionary items, wouldn't it make more sense to implement an extension module which then provides a __getitem__ style iterator for dictionaries by interfacing to PyDict_Next() ? The module could have three different iterators: 1. iterate over items 2. ... over keys 3. ... over values The reasoning behind this is that the __getitem__ interface is well established and this doesn't introduce any new syntax while still providing speed and flexibility. 
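A minimal sketch of the interface such a module might expose, written in pure Python rather than as the C extension proposed above. The class names DictItems, DictKeys and DictValues are hypothetical, and the sketch assumes it is acceptable to snapshot the dictionary up front; a real version built on PyDict_Next() would walk the dict in place, so this shows only the __getitem__-style protocol, not the speed-up.

```python
class _DictView:
    """Sequence-style wrapper: a for-loop calls __getitem__(0), (1), ...
    until IndexError is raised (the classic __getitem__ iteration protocol)."""

    def __init__(self, d):
        # Assumption: copy the contents once, so that later mutation of d
        # cannot disturb a running loop. A C version would not copy.
        self._data = self._snapshot(d)

    def _snapshot(self, d):
        raise NotImplementedError

    def __getitem__(self, index):
        # Indexing past the end raises IndexError, which ends the for-loop.
        return self._data[index]


class DictItems(_DictView):      # 1. iterate over items
    def _snapshot(self, d):
        return list(d.items())


class DictKeys(_DictView):       # 2. ... over keys
    def _snapshot(self, d):
        return list(d.keys())


class DictValues(_DictView):     # 3. ... over values
    def _snapshot(self, d):
        return list(d.values())


phone_book = {"tim": 1234, "guido": 5678}
pairs = [(k, v) for k, v in DictItems(phone_book)]
names = [k for k in DictKeys(phone_book)]
numbers = [v for v in DictValues(phone_book)]
```

Because the wrappers only implement __getitem__, they need no new syntax: an ordinary "for k, v in DictItems(d):" loop works as-is.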
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jan 29 18:08:16 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 19:08:16 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: <3A75B190.3FD2A883@lemburg.com> Guido van Rossum wrote: > > > Dictionaries are not sequences. I wonder what order a user of > > for k,v in dict: (or whatever other of this proposal you choose) > > will expect... > > The same order that for k,v in dict.items() will yield, of course. And then people find out that the order has some sorting properties and start to use it... "how to sort a dictionary?" comes up again, every now and then. > > Please also take into account that dictionaries are *mutable* > > and their internal state is not defined to e.g. not change due to > > lookups (take the string optimization for example...), so exposing > > PyDict_Next() in any to Python will cause trouble. In the end, > > you will need to create a list or tuple to iterate over one way > > or another, so why bother overloading for-loops w/r to dictionaries ? > > Actually, I was going to propose to play dangerously here: the > > for k:v in dict: ... > > syntax I proposed in my previous message should indeed expose > PyDict_Next(). It should be a big speed-up, and I'm expecting (though > don't have much proof) that most loops over dicts don't mutate the > dict. > > Maybe we could add a flag to the dict that issues an error when a new > key is inserted during such a for loop? (I don't think the key order > can be affected when a key is *deleted*.) You mean: mark it read-only ? That would be a "nice to have" property for a lot of mutable types indeed -- sort of like low-level locks. 
This would be another candidate for an object flag (much like the one Fred wants to introduce for weak referenced objects). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@digicool.com Mon Jan 29 19:22:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 14:22:07 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 19:08:16 +0100." <3A75B190.3FD2A883@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> Message-ID: <200101291922.OAA13321@cj20424-a.reston1.va.home.com> > > > Dictionaries are not sequences. I wonder what order a user of > > > for k,v in dict: (or whatever other of this proposal you choose) > > > will expect... > > > > The same order that for k,v in dict.items() will yield, of course. > > And then people find out that the order has some sorting > properties and start to use it... "how to sort a dictionary?" > comes up again, every now and then. I don't understand why you bring this up. We're not revealing anything new here, the random order of dict items has always been part of the language. The answer to "how to sort a dict" should be "copy it into a list and sort that." Or am I missing something? > > > Please also take into account that dictionaries are *mutable* > > > and their internal state is not defined to e.g. not change due to > > > lookups (take the string optimization for example...), so exposing > > > PyDict_Next() in any to Python will cause trouble. In the end, > > > you will need to create a list or tuple to iterate over one way > > > or another, so why bother overloading for-loops w/r to dictionaries ? > > > > Actually, I was going to propose to play dangerously here: the > > > > for k:v in dict: ... 
> > > > syntax I proposed in my previous message should indeed expose > > PyDict_Next(). It should be a big speed-up, and I'm expecting (though > > don't have much proof) that most loops over dicts don't mutate the > > dict. > > > > Maybe we could add a flag to the dict that issues an error when a new > > key is inserted during such a for loop? (I don't think the key order > > can be affected when a key is *deleted*.) > > You mean: mark it read-only ? That would be a "nice to have" > property for a lot of mutable types indeed -- sort of like > low-level locks. This would be another candidate for an object flag > (much like the one Fred wants to introduce for weak referenced > objects). Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From gvwilson@ca.baltimore.com Mon Jan 29 19:38:50 2001 From: gvwilson@ca.baltimore.com (Greg Wilson) Date: Mon, 29 Jan 2001 14:38:50 -0500 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1124 - 13 msgs In-Reply-To: <20010129193101.7BF83EF62@mail.python.org> Message-ID: <001a01c08a2b$1ba5a040$770a0a0a@nevex.com> > Greg Wilson writes: > > This would allow multi-dimensional structures > > (e.g. NumPy arrays) to do things like: > > for (i, j, k) in array: > > and: > > for ((i, j, k), v) in array: # three indices and value > Charles Waldman asks: > And what if I had, for example, a 3-dimensional array where the values > are 3-tuples? Would "for (i,j,k) in array" refer to the > indices or the values? Greg Wilson writes: That would be up to the module's implementer --- my idea was to have the 'for' loop provide more information to the object being iterated over, so that it could "do the right thing" (just as objects do right now with "x[i]"). Greg From mal@lemburg.com Mon Jan 29 19:45:46 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 29 Jan 2001 20:45:46 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> Message-ID: <3A75C86A.3A4236E8@lemburg.com> Guido van Rossum wrote: > > > > > Dictionaries are not sequences. I wonder what order a user of > > > > for k,v in dict: (or whatever other of this proposal you choose) > > > > will expect... > > > > > > The same order that for k,v in dict.items() will yield, of course. > > > > And then people find out that the order has some sorting > > properties and start to use it... "how to sort a dictionary?" > > comes up again, every now and then. > > I don't understand why you bring this up. We're not revealing > anything new here, the random order of dict items has always been part > of the language. The answer to "how to sort a dict" should be "copy > it into a list and sort that." > > Or am I missing something? I just wanted to hint at a problem which iterating over items in an unordered set can cause. Especially new Python users will find it confusing that the order of the items in an iteration can change from one run to the next. Not much of an argument, but I like explicit programming more than magic under the cover. What we really want is iterators for dictionaries, so why not implement these instead of tweaking for-loops. If you are looking for speedups w/r to for-loops, applying a different indexing technique in for-loops would go a lot further and provide better performance not only to dictionary loops, but also to other sequences. I have made some good experience with a special counter object (sort of like a mutable integer) which is used instead of the iteration index integer in the current implementation. 
Using an iterator object instead of the integer + __getitem__ call machinery would allow more flexibility for all kinds of sequences or containers. There could be an iterator type for dictionaries, one for generic __getitem__ style sequences, one for lists and tuples, etc. All of these could include special logic to get the most out of the targeted datatype. Well, just a thought... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr@thyrsus.com Mon Jan 29 20:02:47 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 29 Jan 2001 15:02:47 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291922.OAA13321@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 02:22:07PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> Message-ID: <20010129150247.B10191@thyrsus.com> Guido van Rossum : > > > Maybe we could add a flag to the dict that issues an error when a new > > > key is inserted during such a for loop? (I don't think the key order > > > can be affected when a key is *deleted*.) > > > > You mean: mark it read-only ? That would be a "nice to have" > > property for a lot of mutable types indeed -- sort of like > > low-level locks. This would be another candidate for an object flag > > (much like the one Fred wants to introduce for weak referenced > > objects). > > Yes. For different reasons, I'd like to be able to set a constant flag on an object instance. Simple semantics: if you try to assign to a member or method, it throws an exception. Application? I have a large Python program that goes to a lot of effort to build elaborate context structures in core.
It would be nice to know they can't be even inadvertently trashed without throwing an exception I can watch for. -- Eric S. Raymond No one is bound to obey an unconstitutional law and no courts are bound to enforce it. -- 16 Am. Jur. Sec. 177 late 2d, Sec 256 From esr@thyrsus.com Mon Jan 29 20:09:14 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 29 Jan 2001 15:09:14 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A75C86A.3A4236E8@lemburg.com>; from mal@lemburg.com on Mon, Jan 29, 2001 at 08:45:46PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> Message-ID: <20010129150914.C10191@thyrsus.com> M.-A. Lemburg : > If you are looking for speedups w/r to for-loops, applying a > different indexing technique in for-loops would go a lot further > and provide better performance not only to dictionary loops, > but also to other sequences. Which reminds me... There's not much I miss from C these days, but one thing I wish Python had is a more general for-loop. The C semantics that let you have any initialization, any termination test, and any iteration you like are rather cool. Yes, I realize that for (; ; ) {} can be simulated with: while 1: if : break Still, having them spatially grouped the way a C for does it is nice. Makes it easier to see invariants, I think. -- Eric S. Raymond "Rightful liberty is unobstructed action, according to our will, within limits drawn around us by the equal rights of others." -- Thomas Jefferson From moshez@zadka.site.co.il Mon Jan 29 20:29:53 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 22:29:53 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? 
In-Reply-To: <200101291530.KAA12037@cj20424-a.reston1.va.home.com> References: <200101291530.KAA12037@cj20424-a.reston1.va.home.com>, <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: <20010129202953.D1498A840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 10:30:17 -0500, Guido van Rossum wrote: > It's good to test conformance to the language definition, but this is > also a regression test for the implementation. The "accidents of the > implementation" definitely need to be tested. E.g. if we decide that > repr(s) uses \n rather than \012 or \x0a, this should be tested too. > The language definition gives the implementer a choice here; but once > the implementer has made a choice, it's good to have a test that tests > that this choice is implemented correctly. I agree. > Perhaps there should be several parts to the regression test, > e.g. language conformance, library conformance, platform-specific > features, and implementation conformance? This sounds like a good idea...probably for the 2.2 timeline. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From tim.one@home.com Mon Jan 29 21:51:56 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 16:51:56 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: [Moshe Zadka] > ... > I'm starting to wonder what the tests really test: the language > definition, or accidents of the implementation? You'd be amazed (appalled?) at how hard it is to separate them. 
In two previous lives as a Big Iron compiler hacker, we routinely had to get our compilers validated by a govt agency before any US govt account would be allowed to buy our stuff; e.g., http://www.itl.nist.gov/div897/ctg/vpl/language.htm This usually *started* as a two-day process, flying the inspector to our headquarters, taking perhaps 2 minutes of machine time to run the test suite, then sitting around that day and into the next arguing about whether the "failures" were due to non-standard assumptions in the tests, or compiler bugs. It was almost always the former, but sometimes that didn't get fully resolved for months (if the inspector was being particularly troublesome, it could require getting an Official Interpretation from the relevant stds body -- not swift!). (BTW, this is one reason huge customers are often very reluctant to move to a new release: the validation process can be very expensive and drag on for months) >>> def f(): ... global g ... g += 1 ... return g ... >>> g = 0 >>> d = {f(): f()} >>> d {2: 1} >>> The Python Lang Ref doesn't really say whether {2: 1} or {1: 2} "should be" the result, nor does it say it's implementation-defined. If you *asked* Guido what he thought it should do, he'd probably say {1: 2} (not much of a guess: I asked him in the past, and that's what he did say ). Something "like that" can show up in the test suite, but buried under layers of obfuscating accidents. Nobody is likely to realize it in the absence of a failure motivating people to search for it. Which is a trap: sometimes ours was the only compiler (of dozens and dozens) that had *ever* "failed" a particular test. This was most often the case at Cray Research, which had bizarre (but exceedingly fast -- which is what Cray's customers valued most) floating-point arithmetic. I recall one test in particular that failed because Cray's was the only box on earth that set I to 1 in INTEGER I I = 6.0/3.0 Fortran doesn't define that the result must be 2. 
But-- you guessed it --neither does Python. Cute: at KSR, INT(6.0/3.0) did return 2 -- but INT(98./49.) did not . then-again-the-python-test-suite-is-still-shallow-ly y'rs - tim From hughett@mercur.uphs.upenn.edu Mon Jan 29 22:05:22 2001 From: hughett@mercur.uphs.upenn.edu (Paul Hughett) Date: Mon, 29 Jan 2001 17:05:22 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: References: Message-ID: <200101292205.RAA18790@mercur.uphs.upenn.edu> tim says: > Cray's was the only box on earth that set I to 1 in > INTEGER I > I = 6.0/3.0 > Fortran doesn't define that the result must be 2. But-- you guessed > it --neither does Python. I would _guess_ that the IEEE 754 floating point standard does require that, but I haven't actually gotten my hands on a copy of the standard yet. If it doesn't, I may have to stop writing code that depends on the assumption that floating point computation is exact for exactly representable integers. If so, then we're reasonably safe; there aren't many non-IEEE machines left these days. Un-lurking-ly yours, Paul Hughett From tim.one@home.com Mon Jan 29 22:53:43 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 17:53:43 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <200101292205.RAA18790@mercur.uphs.upenn.edu> Message-ID: [Paul Hughett] > I would _guess_ that the IEEE 754 floating point standard does require > that [6./3. == 2.], It does, but 754 is silent on how languages may or may not *bind* to its semantics. The C99 std finally addresses that (15 years after 754), and Java does too (albeit in a way Kahan despises), but that's about it for "name brand" languages. > ... > If it doesn't, I may have to stop writing code that depends on > the assumption that floating point computation is exact for exactly > representable integers. If so, then we're reasonably safe; there > aren't many non-IEEE machines left these days. 
I'm afraid you've got no guarantees even on a box with 100% conforming 754 hardware. One of the last "mystery bugs" I helped track down at my previous employer only showed up under Intel's C++ compiler. It turned out the compiler was looking for code of the form: double *a, *b, scale; for (i=0; i < n; ++i) { a[i] = b[i] / scale; } and rewriting it as: double __temp = 1./scale; for (i=0; i < n; ++i) { a[i] = b[i] * __temp; } for speed. As time goes on, PC compilers are becoming more and more like Cray's and KSR's in this respect: float division is much more expensive than float mult, and so variations of "so multiply by the reciprocal instead" are hard for vendors to resist. And, e.g., under 754 double rules, (17. * 123.) * (1./123.) must *not* yield exactly 17.0 if done wholly in 754 double (but then 754 says nothing about how any language maps that string to 754 operations). if-you-like-logic-chopping-you'll-love-arguing-stds-ly y'rs - tim From guido@digicool.com Mon Jan 29 23:59:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 18:59:34 -0500 Subject: [Python-Dev] Does autoconfig detect INSTALL incorrectly? In-Reply-To: Your message of "Tue, 23 Jan 2001 00:30:56 PST." <20010123003056.A28309@glacier.fnational.com> References: <20010123003056.A28309@glacier.fnational.com> Message-ID: <200101292359.SAA20364@cj20424-a.reston1.va.home.com> > Why is the configure.in file set to always use "install-sh"? > There is a comment that says: > > # Install just never works :-( > > I don't think that statement is accurate. /usr/bin/install works > quite well on my machine. The only comments I can find in the > changelog are: > > revision 1.16 > date: 1995/01/20 14:12:16; author: guido; state: Exp; lines: +27 -2 > add INSTALL_PROGRAM and INSTALL_DATA; check for getopt > > and: > > revision 1.5 > date: 1994/08/19 15:33:51; author: guido; state: Exp; lines: +14 -6 > Simplify value of INSTALL (always 'cp').
> > Is there any reason why the autoconf macro AC_PROG_INSTALL is not used? The > documentation seems to indicate that it does what we want. Neil, It's too long for me to remember, and I bet this was before AC_PROG_INSTALL. If there's a reason to prefer a working "install" over install-sh, feel free to do the right thing! (You're in charge of the Makefile anyway now, it seems. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Tue Jan 30 00:17:25 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 29 Jan 2001 18:17:25 -0600 (CST) Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP Message-ID: <14966.2069.950895.627663@beluga.mojam.com> After reading through this thread and noticing (but not paying close attention to) all the related posts on c.l.py (subject: "in for dicts"), it seems to me that the whole "if/for something in dict" thing needs to be hashed out in a PEP. There were a fair amount of "Python's changing too fast" rants when 2.0 was released. Adding a major feature such as this at the 2.1 stage is only going to generate that many more rants. The fact that it was easy for Thomas to implement "if key in dict" doesn't make the overall concept less controversial. There are apparently lots of varying opinions about what's reasonable. This topic seems related to PEP 212 (Loop Counter Iteration) and PEP 218 (Adding a Built-In Set Object Type), but may well warrant its own. That said, I have plenty enough on my plate trying to keep Mojam afloat these days, so I can't step into the crevasse, just observe that it looks to me like a very long ways to the bottom... ;-) Skip From guido@digicool.com Tue Jan 30 00:22:58 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 19:22:58 -0500 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: Your message of "Mon, 29 Jan 2001 18:17:25 CST."
<14966.2069.950895.627663@beluga.mojam.com> References: <14966.2069.950895.627663@beluga.mojam.com> Message-ID: <200101300022.TAA21244@cj20424-a.reston1.va.home.com> > After reading through this thread and noticing (but not paying close > attention to) all the related posts on c.l.py (subject: "in for dicts"), it > seems to me that the whole "if/for something in dict" thing needs to be > hashed out in a PEP. There were a fair amount of "Python's changing too > fast" rants when 2.0 was released. Adding a major feature such as this at > the 2.1 stage is only going to generate that many more rants. The fact that > it was easy for Thomas to implement "if key in dict" doesn't make the > overall concept less controversial. There are apparently lots of varying > opinions about what's reasonable. This topic seems related to PEP 212 (Loop > Counter Iteration) and PEP 218 (Adding a Built-In Set Object Type), but may > well warrant its own. Excellent. Good reminder also that this shouldn't go into 2.1 -- clearly the design space is too complicated for a quick decision. > That said, I have plenty enough on my plate trying to keep Mojam afloat > these days, so I can't step into the crevasse, just observe that it looks to > me like a very long ways to the bottom... ;-) I'm not able to lead such a PEP effort myself either, but I hope *someone* will be. This PEP has a good chance for 2.2 though (what with BDFL approval and all :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue Jan 30 01:39:17 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 20:39:17 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I did a less sophisticated count but come to the same conclusion: > iterations over items() are (somewhat) more common than over keys(), > and values() are 1-2 orders of magnitude less common.
My numbers: > > $ cd python/src/Lib > $ grep 'for .*items():' *.py | wc -l > 47 > $ grep 'for .*keys():' *.py | wc -l > 43 > $ grep 'for .*values():' *.py | wc -l > 2 I like my larger sample and anal methodology better . A closer look showed that it may have been unduly biased by the mass of files in Lib/encodings/, where encoding_map = {} for k,v in decoding_map.items(): encoding_map[v] = k is at the end of most files (btw, MAL, that's the answer to your question: people would expect "the same" ordering you expected there, i.e. none in particular). > ... > I don't much value to the readability argument: typically, one will > write "for key in dict" or "for name in dict" and then it's obvious > what is meant. Well, "fiddlesticks" comes to mind <0.9 wink>. If I've got a dict mapping phone numbers to names, "for name in dict" is dead backwards. for vevent in keydefs.keys(): for x in self.subdirs.keys(): for name in lsumdict.keys(): for locale in self.descriptions.keys(): for name in attrs.keys(): for func in other.top_level.keys(): for func in target.keys(): for i in u2.keys(): for s in d.keys(): for url in self.bad.keys(): are other cases in the CVS tree where I don't think the name makes it obvious in the absence of ".keys()". But I don't personally give any weight to whether people can guess what something does at first glance. My rule is that it doesn't matter, provided it's (a) easy to learn; and (especially), (b) hard to *forget* once you've learned it. A classic example is Python's "points between elements" treatment of slice indices: few people guess right what that does at first glance, but once they "get it" they're delighted and rarely mess up again. And I think this is "like that". > ... > But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody > has even asked me for a has_item() method). Yup. > I can live with "x in list" checking the values and "x in dict" > checking the keys. 
But I can *not* live with "x in dict" equivalent > to "dict.has_key(x)" if "for x in dict" would mean > "for x in dict.items()". That's why I brought it up -- it's not entirely clear what's to be done here. > I also think that defining "x in dict" but not "for x in dict" will > be confusing. > > So we need to think more. The hoped-for next step indeed. > How about: > > for key in dict: ... # ... over keys > > for key:value in dict: ... # ... over items > > This is syntactically unambiguous (a colon is currently illegal in > that position). Cool! Can we resist adding if key:value in dict for "parallelism"? (I know I can ...) 2/3rd of these are marginally more attractive: for key: in dict: # over dict.keys() for :value in dict: # over dict.values() for : in dict: # a delay loop > This also suggests: > > for index:value in list: ... # ... over zip(range(len(list)), list) > > which doesn't strike me as bad or ugly, and would fulfill my brother's > dearest wish. You mean besides the one that you fry in hell for not adding "for ... indexing"? Ya, probably. > (And why didn't we think of this before?) Best guess: we were focused exclusively on sequences, and a colon just didn't suggest itself in that context. Second-best guess: having finally approved one of these gimmicks, you finally got desperate enough to make it work . ponderingly y'rs - tim From tim.one@home.com Tue Jan 30 01:58:59 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 20:58:59 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > I'm expecting (though don't have much proof) that most loops over > dicts don't mutate the dict. Safe bet! I do recall writing one once: it del'ed keys for which the associated count was 1, because the rest of the algorithm was only interested in duplicates.
> Maybe we could add a flag to the dict that issues an error when a new > key is inserted during such a for loop? (I don't think the key order > can be affected when a key is *deleted*.) That latter is true but specific to this implementation. "Can't mutate the dict period" is easier to keep straight, and probably harmless in practice (if not, it could be relaxed later). Recall that a similar trick is played during list.sort(), replacing the list's type pointer for the duration (to point to an internal "immutable list" type, same as the list type except the "dangerous" slots point to a function that raises an "immutable list" TypeError). Then no runtime expense is incurred for regular lists to keep checking flags. I thought of this as an elegant use for switching types at runtime; you may still be appalled by it, though! From tim.one@home.com Tue Jan 30 02:07:36 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:07:36 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A75B190.3FD2A883@lemburg.com> Message-ID: [Guido] > The same order that for k,v in dict.items() will yield, of course. [MAL] > And then people find out that the order has some sorting > properties and start to use it... Except that it has none. dict insertion has never used any comparison outcome beyond "equal"/"not equal", so any ordering you think you see is-- and always was --an illusion. From guido@digicool.com Tue Jan 30 02:06:35 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:06:35 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 20:39:17 EST." References: Message-ID: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> This is all PEP material now. Tim, do you want to own the PEP? It seems just up your alley! > Cool! Can we resist adding > > if key:value in dict > > for "parallelism"? (I know I can ...) 
That's easy to resist because, unlike ``for key:value in dict'', it's not unambiguous: ``if key:value in dict'' is already legal syntax currently, with 'key' as the condition and 'value in dict' as the (not particularly useful) body of the if statement. > > (And why didn't we think of this before?) > > Best guess: we were focused exclusively on sequences, and a colon just > didn't suggest itself in that context. Second-best guess: having finally > approved one of these gimmicks, you finally got desperate enough to make it > work . I'm certainly more comfortable with just ``for key in dict'' than with the whole slew of extensions using colons. But, again, that's for the PEP to fight over. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Jan 30 02:15:04 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:15:04 -0500 Subject: [Python-Dev] C's for statement In-Reply-To: Your message of "Mon, 29 Jan 2001 15:09:14 EST." <20010129150914.C10191@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> Message-ID: <200101300215.VAA21955@cj20424-a.reston1.va.home.com> [ESR] > There's not much I miss from C these days, but one thing I wish Python > had is a more general for-loop. The C semantics that let you have > any initialization, any termination test, and any iteration you like > are rather cool. > > Yes, I realize that > > for (; ; ) {} > > can be simulated with: > > > while 1: > if : > break > > > Still, having them spatially grouped the way a C for does it is nice. > Makes it easier to see invariants, I think. Hm, I've seen too many ugly C for loops to have much appreciation for it.
I can recognize and appreciate the few common forms that clearly iterate over an array; most other forms look rather contorted to me. Check out the Python C sources; if you find anything more complicated than ``for (i = n; i > 0; i--)'' I probably didn't write it. :-) Common abominations include: - writing a while loop as for(;;) - putting arbitrary initialization code in - having an empty condition, so the becomes an arbitrary extension of the body that's written out-of-sequence --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue Jan 30 02:19:12 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:19:12 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A75C86A.3A4236E8@lemburg.com> Message-ID: [MAL] > I just wanted to hint at a problem which iterating over items > in an unordered set can cause. Especially new Python users will find > it confusing that the order of the items in an iteration can change > from one run to the next. Do they find "for k, v in dict.items()" confusing now? Would be the same. > ... > What we really want is iterators for dictionaries, so why not > implement these instead of tweaking for-loops. Seems an unrelated topic: would "iterators for dictionaries" solve the supposed problem with iteration order? > If you are looking for speedups w/r to for-loops, applying a > different indexing technique in for-loops would go a lot further > and provide better performance not only to dictionary loops, > but also to other sequences. > > I have made some good experience with a special counter object > (sort of like a mutable integer) which is used instead of the > iteration index integer in the current implementation. Please quantify, if possible. My belief (based on past experiments) is that in loops fancier than for i in range(n): pass the loop overhead quickly falls into the noise even now.
> Using an iterator object instead of the integer + __getitem__ > call machinery would allow more flexibility for all kinds of > sequences or containers. ... This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new iteration *protocol* could have major attractions. From guido@digicool.com Tue Jan 30 02:17:27 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:17:27 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: Your message of "Mon, 29 Jan 2001 15:02:47 EST." <20010129150247.B10191@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> Message-ID: <200101300217.VAA21978@cj20424-a.reston1.va.home.com> [ESR] > For different reasons, I'd like to be able to set a constant flag on an > object instance. Simple semantics: if you try to assign to a > member or method, it throws an exception. > > Application? I have a large Python program that goes to a lot of effort > to build elaborate context structures in core. It would be nice to know > they can't be even inadvertently trashed without throwing an exception I > can watch for. Yes, this is a good thing. Easy to do on lists and dicts. Questions: - How to spell it? x.freeze()? x.readonly()? - Should this be reversible? I.e. should there be an x.unfreeze()? - Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue Jan 30 02:29:25 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:29:25 -0500 Subject: [Python-Dev] C's for statement In-Reply-To: <200101300215.VAA21955@cj20424-a.reston1.va.home.com> Message-ID: Check out SETL's loop statement. I think Perl5 is a subset of it <0.9 wink>.
From esr@thyrsus.com Tue Jan 30 02:34:01 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 29 Jan 2001 21:34:01 -0500 Subject: [Python-Dev] Re: C's for statement In-Reply-To: <200101300215.VAA21955@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:15:04PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> Message-ID: <20010129213401.A17235@thyrsus.com> Guido van Rossum : > Common abominations include: > > - writing a while loop as for(;;) Agreed. Bletch. > - putting arbitrary initialization code in Not sure what's "arbitrary", unless you mean unrelated to the iteration variable. > - having an empty condition, so the becomes an arbitrary > extension of the body that's written out-of-sequence Again agreed. Double bletch. I guess my archetype of the cute C for-loop is the idiom for pointer-list traversal: struct foo {int data; struct foo *next;} *ptr, *head; for (ptr = head; *ptr; ptr = ptr->next) do_something_with(ptr->data) This is elegant. It separates the logic for list traversal from the operation on the list element. Not the highest on my list of wants -- I'd sooner have ?: back. I submitted a patch for that once, and the discussion sort of died. Were you dead set against it, or should I revive this proposal? -- Eric S. Raymond "The bearing of arms is the essential medium through which the individual asserts both his social power and his participation in politics as a responsible moral being..." -- J.G.A. Pocock, describing the beliefs of the founders of the U.S. From esr@thyrsus.com Tue Jan 30 02:49:59 2001 From: esr@thyrsus.com (Eric S.
Raymond) Date: Mon, 29 Jan 2001 21:49:59 -0500 Subject: [Python-Dev] Re: Making mutable objects readonly In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:17:27PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: <20010129214959.B17235@thyrsus.com> Guido van Rossum : > Yes, this is a good thing. Easy to do on lists and dicts. Questions: > > - How to spell it? x.freeze()? x.readonly()? I like "freeze", it's a clear imperative where "readonly()" sounds like a test (e.g. "is this readonly()?") > - Should we support something like this for instances too? Sometimes > it might be cool to be able to freeze changing attribute values... Moshe Zadka sent me a hack that handles instances: > class MarkableAsConstant: > > def __init__(self): > self.mark_writable() > > def __setattr__(self, name, value): > if self._writable: > self.__dict__[name] = value > else: > raise ValueError, "object is read only" > > def mark_writable(self): > self.__dict__['_writable'] = 1 > > def mark_readonly(self): > self.__dict__['_writable'] = 0 > - Should this be reversible? I.e. should there be an x.unfreeze()? I gave this some thought earlier today. There are advantages to either way. Making freeze a one-way operation would make it possible to use freezing to get certain kinds of security and integrity guarantees that you can't have if freezing is reversible. Fortunately, there's a semantics that captures both. If we allow freeze to take an optional key argument, and require that an unfreeze call must supply the same key or fail, we get both worlds. We can even one-way-hash the keys so they don't have to be stored in the bytecode. Want to lock a structure permanently? Pick a random long key.
Freeze with it. Then throw that key away... -- Eric S. Raymond Strict gun laws are about as effective as strict drug laws...It pains me to say this, but the NRA seems to be right: The cities and states that have the toughest gun laws have the most murder and mayhem. -- Mike Royko, Chicago Tribune From tim.one@home.com Tue Jan 30 02:57:59 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:57:59 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Yes, this is a good thing. Easy to do on lists and dicts. Questions: > > - How to spell it? x.freeze()? x.readonly()? See below. > - Should this reversible? Of course. Or x.freeze(solid=1) to default to permanent rigidity, but not require it. > I.e. should there be an x.unfreeze()? That conveniently answers the first question, since x.unreadonly() reads horribly . > - Should we support something like this for instances too? Sometimes > it might be cool to be able to freeze changing attribute values... "Should be" supported for every mutable object. Next step: as in endless C++ debates, endless Python debates about "representation freeze" vs "logical freeze" ("well, yes, I'm changing this member, but it's just an invisible cache so I *should* be able to tag the object as const anyway ..."; etc etc etc). keep-it-simple-ly y'rs - tim From guido@digicool.com Tue Jan 30 02:57:24 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:57:24 -0500 Subject: [Python-Dev] Re: C's for statement In-Reply-To: Your message of "Mon, 29 Jan 2001 21:34:01 EST." 
<20010129213401.A17235@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> <20010129213401.A17235@thyrsus.com> Message-ID: <200101300257.VAA22186@cj20424-a.reston1.va.home.com> > > - putting arbitrary initialization code in > > Not sure what's "arbitrary", unless you mean unrelated to the > iteration variable. Yes, that. > I guess my archetype of the cute C for-loop is the idiom for > pointer-list traversal: > > struct foo {int data; struct foo *next;} *ptr, *head; > > for (ptr = head; *ptr; ptr = ptr->next) > do_something_with(ptr->data) > > This is elegant. It separates the logic for list traversal from the > operation on the list element. And it rarely happens in Python, because sequences are rarely represented as linked lists. > Not the highest on my list of wants -- I'd sooner have ?: back. I submitted > a patch for that once, and the discussion sort of died. Were you dead > det against it, or should I revive this proposal? Not dead set against something like it, but dead set against the ?: syntax because then : becomes too overloaded for the human reader, e.g.: if foo ? bar : bletch : spam = eggs If you want to revive this, I strongly suggest writing a PEP first before posting here. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Jan 30 02:59:17 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:59:17 -0500 Subject: [Python-Dev] Re: Making mutable objects readonly In-Reply-To: Your message of "Mon, 29 Jan 2001 21:49:59 EST." 
<20010129214959.B17235@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <20010129214959.B17235@thyrsus.com> Message-ID: <200101300259.VAA22208@cj20424-a.reston1.va.home.com> > > - How to spell it? x.freeze()? x.readonly()? > > I like "freeze", it'a a clear imperative where "readonly()" sounds > like a test (e.g. "is this readonly()?") Agreed. > > - Should we support something like this for instances too? Sometimes > > it might be cool to be able to freeze changing attribute values... > > Moshe Zadka sent me a hack that handles instances: [...] OK, so no special support needed there. > > - Should this reversible? I.e. should there be an x.unfreeze()? > > I gave this some thought earlier today. There are advantages to either > way. Making freeze a one-way operation would make it possible to use > freezing to get certain kinds of security and integrity guarantees that > you can't have if freezing is reversible. > > Fortunately, there's a semantics that captures both. If we allow > freeze to take an optional key argument, and require that an unfreeze > call must supply the same key or fail, we get both worlds. We can > even one-way-hash the keys so they don't have to be stored in the > bytecode. > > Want to lock a structure permanently? Pick a random long key. Freeze > with it. Then throw that key away... Way too cute. My suggestion freeze(0) freezes forever, freeze(1) can be unfrozen. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Tue Jan 30 03:06:19 2001 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Mon, 29 Jan 2001 22:06:19 -0500 Subject: [Python-Dev] Re: C's for statement In-Reply-To: <200101300257.VAA22186@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:57:24PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> <20010129213401.A17235@thyrsus.com> <200101300257.VAA22186@cj20424-a.reston1.va.home.com> Message-ID: <20010129220619.A17713@thyrsus.com> Guido van Rossum : > Not dead set against something like it, but dead set against the ?: > syntax because then : becomes too overloaded for the human reader, e.g.: > > if foo ? bar : bletch : spam = eggs > > If you want to revive this, I strongly suggest writing a PEP first > before posting here. Noted. Will do. -- Eric S. Raymond Such are a well regulated militia, composed of the freeholders, citizen and husbandman, who take up arms to preserve their property, as individuals, and their rights as freemen. -- "M.T. Cicero", in a newspaper letter of 1788 touching the "militia" referred to in the Second Amendment to the Constitution. From tim.one@home.com Tue Jan 30 03:18:47 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 22:18:47 -0500 Subject: [Python-Dev] Re: Making mutable objects readonly In-Reply-To: <20010129214959.B17235@thyrsus.com> Message-ID: Note that even adding a "frozen" flag would add 4 bytes to every freezable object on most machines. That's why I'd rather .freeze() replace the type pointer and .unfreeze() restore it. No time or space overhead; no cluttering up the normal-case (i.e., unfrozen) type implementations with new tests. 
From tim.one@home.com Tue Jan 30 03:57:07 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 22:57:07 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.42988.362288.154254@localhost.localdomain> Message-ID: Note that optimizing compilers use a pile of linear-time heuristics to attempt to solve exponential-time optimization problems (from optimal register assignment to optimal instruction scheduling, they're all formally intractable even in isolation). When code gets non-trivial, not even a compiler's chief designer can reliably outguess what optimization may do. It's really not unusual for a higher optimization level to yield slower code, and especially not when the source code is pushing or exceeding machine limits (# of registers, # of instruction pipes, size of branch-prediction buffers; I-cache structure; dynamic restrictions on execution units; ...). [Jeremy] > ... > One of the differences between -O2 and -O3, according to the man page, > is that -O3 will perform optimizations that involve a space-speed > tradeoff. It also include -finline-functions. I can imagine that > some of these optimizations hurt memory performance enough to make a > difference. One of the time-consuming ongoing tasks at my last employer was running profiles and using them to override counterproductive compiler inlining decisions (in both directions). 
It's not just memory that excessive inlining can screw up, but also things like running out of registers and so inserting gobs of register spill/restore code, and inlining so much code that the instruction scheduler effectively gives up (under many compilers, a sure sign of this is when you look at the generated code for a function, and it looks beautiful "at the top" but terrible "at the bottom"; some clever optimizers tried to get around that by optimizing "bottom-up", and then it looks beautiful at the bottom but terrible at the top <0.5 wink>; others work middle-out or burn the candle at both ends, with visible consequences you should be able to recognize now!). optimization-is-easier-than-speech-recog-but-the-latter-doesn't-work- all-that-well-either-ly y'rs - tim From barry@digicool.com Tue Jan 30 04:13:24 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 29 Jan 2001 23:13:24 -0500 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP References: <14966.2069.950895.627663@beluga.mojam.com> Message-ID: <14966.16228.548177.112853@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> it seems to me that the whole "if/for something in dict" thing SM> needds to be hashed out in a PEP. SM> There are apparently lots of varying opinions about what's SM> reasonable. This topic seems related to PEP 212 (Loop Counter SM> Iteration) and PEP 218 (Adding a Built-In Set Object Type), SM> but may well warrant its own. As keeper of PEP0, I have to agree. I personally would vastly prefer a new iterator protocol than syntax such as "for key:value in dict". I'd really like to see a PEP on an iterator protocol for Python, but like Skip, I'm too busy at the moment to do it myself. If nobody takes it on before then, I might be willing to champion such a PEP for the 2.2 time frame. Until then, I'm decidedly -1 on "for/if in dict". -Barry From barry@digicool.com Tue Jan 30 04:25:09 2001 From: barry@digicool.com (Barry A. 
Warsaw) Date: Mon, 29 Jan 2001 23:25:09 -0500 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: <14966.16933.209494.214183@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Yes, this is a good thing. Easy to do on lists and dicts. GvR> Questions: GvR> - How to spell it? x.freeze()? x.readonly()? GvR> - Should this reversible? I.e. should there be an GvR> x.unfreeze()? GvR> - Should we support something like this for instances too? GvR> Sometimes it might be cool to be able to freeze changing GvR> attribute values... lock(x) ...? :) -Barry From barry@digicool.com Tue Jan 30 04:26:50 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 29 Jan 2001 23:26:50 -0500 Subject: [Python-Dev] Re: Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <20010129214959.B17235@thyrsus.com> Message-ID: <14966.17034.721204.305315@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Fortunately, there's a semantics that captures both. If we ESR> allow freeze to take an optional key argument, and require ESR> that an unfreeze call must supply the same key or fail, we ESR> get both worlds. We can even one-way-hash the keys so they ESR> don't have to be stored in the bytecode. ESR> Want to lock a structure permanently? Pick a random long ESR> key. Freeze with it. Then throw that key away... Clever! From esr@thyrsus.com Tue Jan 30 04:32:16 2001 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Mon, 29 Jan 2001 23:32:16 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <14966.16933.209494.214183@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 29, 2001 at 11:25:09PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <14966.16933.209494.214183@anthem.wooz.org> Message-ID: <20010129233215.A18533@thyrsus.com> Barry A. Warsaw : > lock(x) ...? :) I was thinking that myself, Barry. -- Eric S. Raymond "Boys who own legal firearms have much lower rates of delinquency and drug use and are even slightly less delinquent than nonowners of guns." -- U.S. Department of Justice, National Institute of Justice, Office of Juvenile Justice and Delinquency Prevention, NCJ-143454, "Urban Delinquency and Substance Abuse," August 1995. From tim.one@home.com Tue Jan 30 04:56:09 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 29 Jan 2001 23:56:09 -0500 Subject: [Python-Dev] SSL socket read at EOF; SourceForge problem Message-ID: I tried to open an SF bug for the following msg from c.l.py, but SF balked: ERROR ERROR getting bug_id Logged out, logged in, tried it again, same outcome. Intended bug report content: Good question from c.l.py, assigned to Guido cuz he's a Socket Guy: From: Clarence Gardner Subject: RE: Thread Safety Date: Mon, 29 Jan 2001 09:51:03 -0800 ... I'm going to repeat a question that I posted about a week ago that passed without comment on the newsgroup. The issue is the SSL support in the socket module, which raises an exception when the reading socket is at EOF, rather than returning an empty string. I'm hesitant to call it a "bug", but I wouldn't have implemented it this way. 
There are the names of two people mentioned at the top of socketmodule.c, but no contact information, so I'm suggesting here that it be changed to conform to normal file/socket practice. (SSL was actually added at 2.0, so I'm late to the party with this; mea culpa, mea culpa. I delayed trying Python2 because of the extension rebuilding.) From thomas@xs4all.net Tue Jan 30 06:14:20 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 07:14:20 +0100 Subject: [Python-Dev] Re: C's for statement In-Reply-To: <20010129213401.A17235@thyrsus.com>; from esr@thyrsus.com on Mon, Jan 29, 2001 at 09:34:01PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> <20010129213401.A17235@thyrsus.com> Message-ID: <20010130071420.U962@xs4all.nl> On Mon, Jan 29, 2001 at 09:34:01PM -0500, Eric S. Raymond wrote: > I guess my archetype of the cute C for-loop is the idiom for > pointer-list traversal: > struct foo {int data; struct foo *next;} *ptr, *head; > for (ptr = head; *ptr; ptr = ptr->next) > do_something_with(ptr->data) Note two things: in Python, you would use a list, so 'for x in list' does exactly what you want here ;) And if you really need it, you could use iterators for exactly this (once we have them, of course): you are inventing a new storage type. Quite common in C, since the only one it has is useless for anything other than strings, but not so common in Python. > Not the highest on my list of wants -- I'd sooner have ?: back. I submitted > a patch for that once, and the discussion sort of died. Were you dead > set against it, or should I revive this proposal? Triple blech. Guido will never go for it! (There, increased your chance of getting it approved!
:) Seriously though, I wouldn't like it much, it's too cryptic a syntax. I notice I use it less and less in C, too. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Tue Jan 30 06:18:25 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 07:18:25 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: ; from tim.one@home.com on Mon, Jan 29, 2001 at 08:39:17PM -0500 References: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> Message-ID: <20010130071825.V962@xs4all.nl> On Mon, Jan 29, 2001 at 08:39:17PM -0500, Tim Peters wrote: > for key: in dict: # over dict.keys() > for :value in dict: # over dict.values() > for : in dict: # a delay loop Wot's the last one supposed to do ? 'for unused_var in range(len(dict)):' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Tue Jan 30 06:25:51 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 30 Jan 2001 01:25:51 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010130071825.V962@xs4all.nl> Message-ID: >> for key: in dict: # over dict.keys() >> for :value in dict: # over dict.values() >> for : in dict: # a delay loop [Thomas Wouters] > Wot's the last one supposed to do ? 'for unused_var in > range(len(dict)):' ? Well, as the preceding line said in the original: >> 2/3rd of these are marginally more attractive [than >> "if key:value in dict"]: I think you've guessed which 2/3 those are . I don't see that the last line has any visible semantics whatsoever, so Python can do whatever it likes, provided it doesn't do anything visible. You still hang out on c.l.py! So you gotta know that if something of the form x:y is suggested, people will line up to suggest meanings for the 3 obvious variations, along with x::y and x:-:y and x lambda y too <0.9 wink>. 
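[Editorial sketch, not part of the original thread.] For readers following the colon proposals, the loops below spell out what each proposed form was described as iterating over, using only the dict and list methods the thread itself mentions. The sample data and variable names are made up for illustration, and the proposed syntax appears only in comments since it was never valid Python.

```python
# Hypothetical sample data: a dict mapping phone numbers to names.
phone_book = {"555-0100": "alice", "555-0199": "bob"}
names = ["alice", "bob"]

# Proposed: for key in dict  (or: for key: in dict)
numbers_seen = []
for number in phone_book.keys():
    numbers_seen.append(number)

# Proposed: for :value in dict
names_seen = []
for name in phone_book.values():
    names_seen.append(name)

# Proposed: for key:value in dict
pairs_seen = []
for number, name in phone_book.items():
    pairs_seen.append((number, name))

# Proposed: for index:value in list
indexed = []
for index, name in zip(range(len(names)), names):
    indexed.append((index, name))
```

Note that the first loop also shows the readability objection raised earlier in the thread: with a number-to-name mapping, the loop variable for the keys has to be called something like "number", not "name".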
From thomas@xs4all.net Tue Jan 30 06:26:48 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 07:26:48 +0100 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <14966.2069.950895.627663@beluga.mojam.com>; from skip@mojam.com on Mon, Jan 29, 2001 at 06:17:25PM -0600 References: <14966.2069.950895.627663@beluga.mojam.com> Message-ID: <20010130072648.W962@xs4all.nl> On Mon, Jan 29, 2001 at 06:17:25PM -0600, Skip Montanaro wrote: > The fact that it was easy for Thomas to implement "if key in dict" doesn't > make the overall concept less controversial. Note that the fact I implemented it doesn't mean I'm +1 on it (witness my posts on python-list.) In fact, *while implementing it*, I grew from +0 to -0 and maybe even to a weak -1 (all in 5 minutes :) The enthousiastic subject of the patch was a weak attempt at 5AM humour, not a venting of an ancient desire :) More-5AM-humour-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Tue Jan 30 06:55:16 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 07:55:16 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: ; from jhylton@users.sourceforge.net on Mon, Jan 29, 2001 at 05:27:30PM -0800 References: Message-ID: <20010130075515.X962@xs4all.nl> On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote: > add note about two kinds of illegal imports that are now checked > + - The compiler will report a SyntaxError if "from ... import *" occurs > + in a function or class scope or if a name bound by the import > + statement is declared global in the same scope. The language > + reference has also documented that these cases are illegal, but > + they were not enforced. Woah. Is this really a good idea ? I have seen 'from ... import *' in a function scope put to good (relatively -- we're talking 'import *' here) use. 
I also thought of 'import' as yet another assignment statement, so to me it's both logical and consistent if 'import' would listen to 'global'. Otherwise we have to re-invent 'import spam; eggs = spam' if we want eggs to be global. Is there really a reason to enforce this, or are we enforcing the wording of the language reference for the sake of enforcing the wording of the language reference ? When writing 'import as' for 2.0, I fixed some of the inconsistencies in import, making it adhere to 'global' statements in as many cases as possible (all except 'from ... import *') but I was apparently not aware of the wording of the language reference. I'd suggest updating the wording in the language reference, not the implementation, unless there is a good reason to disallow this. I also have another issue with your recent patches, Jeremy, also in the backwards-compatibility departement :) You gave new.code two new, non-optional arguments, in the middle of the long argument list. I sent a note about it to python-checkins instead of python-dev by accident, but Fred seemed to agree with me there. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh21@cam.ac.uk Tue Jan 30 08:30:15 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 30 Jan 2001 08:30:15 +0000 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: "Tim Peters"'s message of "Mon, 29 Jan 2001 22:57:07 -0500" References: Message-ID: In the interest of generating some numbers (and filling up my hard drive), last night I wrote a script to build lots & lots of versions of python (many of which turned out to be redundant - eg. -O6 didn't seem to do anything different to -O3 and pybench doesn't work with 1.5.2), and then run pybench with them. 
Summarised results below; first a key:

src-n: this morning's CVS (with Jeremy's f_localsplus optimisation) (only built this with -O3)
src: CVS from yesterday afternoon
src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc patch applied. More on this later...
Python-2.0: you can guess what this is.

All runs are compared against Python-2.0-O2:

Benchmark: src-n-O3 (rounds=10, warp=20)         Average round time: 49029.00 ms   -0.86%
Benchmark: src (rounds=10, warp=20)              Average round time: 67141.00 ms  +35.76%
Benchmark: src-O (rounds=10, warp=20)            Average round time: 50167.00 ms   +1.44%
Benchmark: src-O2 (rounds=10, warp=20)           Average round time: 49641.00 ms   +0.37%
Benchmark: src-O3 (rounds=10, warp=20)           Average round time: 49104.00 ms   -0.71%
Benchmark: src-O6 (rounds=10, warp=20)           Average round time: 49131.00 ms   -0.66%
Benchmark: src-obmalloc (rounds=10, warp=20)     Average round time: 63276.00 ms  +27.94%
Benchmark: src-obmalloc-O (rounds=10, warp=20)   Average round time: 46927.00 ms   -5.11%
Benchmark: src-obmalloc-O2 (rounds=10, warp=20)  Average round time: 46146.00 ms   -6.69%
Benchmark: src-obmalloc-O3 (rounds=10, warp=20)  Average round time: 46456.00 ms   -6.07%
Benchmark: src-obmalloc-O6 (rounds=10, warp=20)  Average round time: 46450.00 ms   -6.08%
Benchmark: Python-2.0 (rounds=10, warp=20)       Average round time: 68933.00 ms  +39.38%
Benchmark: Python-2.0-O (rounds=10, warp=20)     Average round time: 49542.00 ms   +0.17%
Benchmark: Python-2.0-O3 (rounds=10, warp=20)    Average round time: 48262.00 ms   -2.41%
Benchmark: Python-2.0-O6 (rounds=10, warp=20)    Average round time: 48273.00 ms   -2.39%

My conclusion? Python 2.1 is slower than Python 2.0, but not by enough to care about. Interestingly, adding obmalloc speeds things up. Let's take a closer look:

$ python pybench.py -c src-obmalloc-O3 -s src-O3
PYBENCH 0.7

Benchmark: src-O3 (rounds=10, warp=20)

Tests:                        per run   per oper.    diff *
------------------------------------------------------------------------
BuiltinFunctionCalls:       843.35 ms    6.61 us    +2.93%
BuiltinMethodLookup:        878.70 ms    1.67 us    +0.56%
ConcatStrings:             1068.80 ms    7.13 us    -1.22%
ConcatUnicode:             1373.70 ms    9.16 us    -1.24%
CreateInstances:           1433.55 ms   34.13 us    +9.06%
CreateStringsWithConcat:   1031.75 ms    5.16 us   +10.95%
CreateUnicodeWithConcat:   1277.85 ms    6.39 us    +3.14%
DictCreation:              1275.80 ms    8.51 us   +44.22%
ForLoops:                  1415.90 ms  141.59 us    -0.64%
IfThenElse:                1152.70 ms    1.71 us    -0.15%
ListSlicing:                397.40 ms  113.54 us    -0.53%
NestedForLoops:             789.75 ms    2.26 us    -0.37%
NormalClassAttribute:       935.15 ms    1.56 us    -0.41%
NormalInstanceAttribute:    961.15 ms    1.60 us    -0.60%
PythonFunctionCalls:       1079.65 ms    6.54 us    -1.00%
PythonMethodCalls:          908.05 ms   12.11 us    -0.88%
Recursion:                  838.50 ms   67.08 us    -0.00%
SecondImport:               741.20 ms   29.65 us   +25.57%
SecondPackageImport:        744.25 ms   29.77 us   +18.66%
SecondSubmoduleImport:      947.05 ms   37.88 us   +25.60%
SimpleComplexArithmetic:   1129.40 ms    5.13 us  +114.92%
SimpleDictManipulation:    1048.55 ms    3.50 us    -0.00%
SimpleFloatArithmetic:      746.05 ms    1.36 us    -2.75%
SimpleIntFloatArithmetic:   823.35 ms    1.25 us    -0.37%
SimpleIntegerArithmetic:    823.40 ms    1.25 us    -0.37%
SimpleListManipulation:    1004.70 ms    3.72 us    +0.01%
SimpleLongArithmetic:       865.30 ms    5.24 us  +100.65%
SmallLists:                1657.65 ms    6.50 us    +6.63%
SmallTuples:               1143.95 ms    4.77 us    +2.90%
SpecialClassAttribute:      949.00 ms    1.58 us    -0.22%
SpecialInstanceAttribute:  1353.05 ms    2.26 us    -0.73%
StringMappings:            1161.00 ms    9.21 us    +7.30%
StringPredicates:          1069.65 ms    3.82 us    -5.30%
StringSlicing:              846.30 ms    4.84 us    +8.61%
TryExcept:                 1590.40 ms    1.06 us    -0.49%
TryRaiseExcept:            1104.65 ms   73.64 us   +24.46%
TupleSlicing:               681.10 ms    6.49 us    -3.13%
UnicodeMappings:           1021.70 ms   56.76 us    +0.79%
UnicodePredicates:         1308.45 ms    5.82 us    -4.79%
UnicodeProperties:         1148.45 ms    5.74 us   +13.67%
UnicodeSlicing:             984.15 ms    5.62 us    -0.51%
------------------------------------------------------------------------
Average round time:       49104.00 ms               +5.70%

*) measured against: src-obmalloc-O3 (rounds=10, warp=20)

Words fail me slightly, but maybe some tuning of the memory allocation of longs & complex numbers would be in order? Time for lectures - I don't think algebraic geometry is going to make my head hurt as much as trying to explain benchmarks... Cheers, M. -- ARTHUR: But which is probably incapable of drinking the coffee. -- The Hitch-Hikers Guide to the Galaxy, Episode 6 From ping@lfw.org Tue Jan 30 08:38:12 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 30 Jan 2001 00:38:12 -0800 (PST) Subject: [Python-Dev] Read-only function attributes Message-ID: Hi there. I see that the function attribute feature specifically allows assignment to func_code and func_defaults, but no other special attributes. This seems really suspect to me. Why would we want to allow the reassignment of special attributes at all? Functions have always been immutable objects, and i can see some motivation for attaching mutable dictionaries to them, but it's a more serious move to make the functions mutable themselves. I don't recall any discussion about changing special attributes; i don't see a clear purpose to them; and i do see a danger in making it harder to be certain that a program is safe and predictable. (Yes, i did notice that function attributes can't be set in restricted mode, but the addition of extra features requiring extra security checks makes me uneasy.) -- ?!ng From ping@lfw.org Tue Jan 30 08:52:43 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 30 Jan 2001 00:52:43 -0800 (PST) Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: Eric S. Raymond wrote: > For different reasons, I'd like to be able to set a constant flag on a > object instance. Simple semantics: if you try to assign to a > member or method, it throws an exception. Guido van Rossum wrote: > Yes, this is a good thing. Easy to do on lists and dicts.
Questions: > > - How to spell it? x.freeze()? x.readonly()? I'm not so sure. There seem to be many issues here. More questions: What's the difference between a frozen list and a tuple? Is a frozen list hashable? > - Should this reversible? I.e. should there be an x.unfreeze()? What if two threads lock and then unlock the same structure? > - Should we support something like this for instances too? Sometimes > it might be cool to be able to freeze changing attribute values... If you do this, i bet people will immediately want to freeze individual attributes. Some might be confused by a.x = [1, 2, 3] lock(a.x) # intend to lock the attribute, not the list a.x = 3 # hey, why is this allowed? What does locking an extension object do? What happens when you lock an object that implements list or dict semantics? Do we care that locking a UserList accomplishes nothing? Should unfreeze/unlock() be disallowed in restricted mode? -- ?!ng No software is totally secure, but using [Microsoft] Outlook is like hanging a sign on your back that reads "PLEASE MESS WITH MY COMPUTER." -- Scott Rosenberg, Salon Magazine From fredrik@effbot.org Tue Jan 30 09:05:47 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Tue, 30 Jan 2001 10:05:47 +0100 Subject: [Python-Dev] Read-only function attributes References: Message-ID: <01d701c08a9b$d7a9fe60$e46940d5@hagrid> Ka-Ping Yee wrote: > I see that the function attribute feature specifically allows > assignment to func_code and func_defaults, but no other special > attributes. This seems really suspect to me. Why would we want > to allow the reassignment of special attributes at all? to allow an IDE to "patch" a running program? From gvwilson@ca.baltimore.com Tue Jan 30 13:08:42 2001 From: gvwilson@ca.baltimore.com (Greg Wilson) Date: Tue, 30 Jan 2001 08:08:42 -0500 (EST) Subject: [Python-Dev] re: Making mutable objects readonly In-Reply-To: <20010130085202.18E71EAC4@mail.python.org> Message-ID: > Barry Warsaw: > lock(x) ...? 
:) Greg Wilson: -1 --- everyone will assume it's mutual exclusion, rather than immutability. From guido@digicool.com Tue Jan 30 14:01:15 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 09:01:15 -0500 Subject: [Python-Dev] Read-only function attributes In-Reply-To: Your message of "Tue, 30 Jan 2001 00:38:12 PST." References: Message-ID: <200101301401.JAA25600@cj20424-a.reston1.va.home.com> > I see that the function attribute feature specifically allows > assignment to func_code and func_defaults, but no other special > attributes. This seems really suspect to me. Why would we want > to allow the reassignment of special attributes at all? As Effbot said, this is useful in certain circumstances where a development environment wants to implement a "better reload". For this same reason you can assign to a class's __bases__ and __dict__ and to an instance's __class__ and __dict__. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Jan 30 15:00:58 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 10:00:58 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: Your message of "Tue, 30 Jan 2001 00:52:43 PST." References: Message-ID: <200101301500.KAA25733@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > Yes, this is a good thing. Easy to do on lists and dicts. Questions: > > > > - How to spell it? x.freeze()? x.readonly()? Ping: > I'm not so sure. There seem to be many issues here. More questions: > > What's the difference between a frozen list and a tuple? A frozen list can be unfrozen (maybe)? > Is a frozen list hashable? Yes -- that's what started this thread (using dicts as dict keys, actually). > > - Should this reversible? I.e. should there be an x.unfreeze()? > > What if two threads lock and then unlock the same structure? That's up to the threads -- it's no different than other concurrent access.
> > - Should we support something like this for instances too? Sometimes > > it might be cool to be able to freeze changing attribute values... > > If you do this, i bet people will immediately want to freeze > individual attributes. Some might be confused by > > a.x = [1, 2, 3] > lock(a.x) # intend to lock the attribute, not the list > a.x = 3 # hey, why is this allowed? That's a matter of API. I wouldn't make this a built-in, but rather a method on freezable objects (please don't call it lock()!). > What does locking an extension object do? What does adding 1 to an extension object do? > What happens when you lock an object that implements list or dict > semantics? Do we care that locking a UserList accomplishes nothing? Who says it doesn't? > Should unfreeze/unlock() be disallowed in restricted mode? I don't see why not. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Jan 30 15:06:57 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 10:06:57 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Tue, 30 Jan 2001 07:55:16 +0100." <20010130075515.X962@xs4all.nl> References: <20010130075515.X962@xs4all.nl> Message-ID: <200101301506.KAA25763@cj20424-a.reston1.va.home.com> > On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote: > > > add note about two kinds of illegal imports that are now checked > > > + - The compiler will report a SyntaxError if "from ... import *" occurs > > + in a function or class scope or if a name bound by the import > > + statement is declared global in the same scope. The language > > + reference has also documented that these cases are illegal, but > > + they were not enforced. > Woah. Is this really a good idea ? I have seen 'from ... import *' > in a function scope put to good (relatively -- we're talking 'import > *' here) use. 
I also thought of 'import' as yet another assignment > statement, so to me it's both logical and consistent if 'import' > would listen to 'global'. Otherwise we have to re-invent 'import > spam; eggs = spam' if we want eggs to be global. Note that Jeremy is only raising errors for "from M import *". > Is there really a reason to enforce this, or are we enforcing the > wording of the language reference for the sake of enforcing the > wording of the language reference ? When writing 'import as' for > 2.0, I fixed some of the inconsistencies in import, making it adhere > to 'global' statements in as many cases as possible (all except > 'from ... import *') but I was apparently not aware of the wording > of the language reference. I'd suggest updating the wording in the > language reference, not the implementation, unless there is a good > reason to disallow this. I think Jeremy has an excellent reason. Compilers want to do analysis of name usage at compile time. The value of * cannot be determined at compile time (you cannot know what module will actually be imported at run time). Up till now, we were able to fudge this, but Jeremy's new compiler needs to know exactly which names are defined in all local scopes, in order to do nested scopes right. > I also have another issue with your recent patches, Jeremy, also in > the backwards-compatibility departement :) You gave new.code two > new, non-optional arguments, in the middle of the long argument > list. I sent a note about it to python-checkins instead of > python-dev by accident, but Fred seemed to agree with me there. (Tim will love this. :-) I don't know what those new arguments represent. If they can reasonably be assumed to be empty for code that doesn't use the new features, I'd say move them to the end and default them properly. If they must be specified, I'd say too bad, the new module is an accident of the implementation anyway, and its users should update their code. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue Jan 30 15:08:39 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 10:08:39 -0500 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: Your message of "Tue, 30 Jan 2001 07:26:48 +0100." <20010130072648.W962@xs4all.nl> References: <14966.2069.950895.627663@beluga.mojam.com> <20010130072648.W962@xs4all.nl> Message-ID: <200101301508.KAA25825@cj20424-a.reston1.va.home.com> > Note that the fact I implemented it doesn't mean I'm +1 on it (witness my > posts on python-list.) In fact, *while implementing it*, I grew from +0 to > -0 and maybe even to a weak -1 (all in 5 minutes :) The enthousiastic > subject of the patch was a weak attempt at 5AM humour, not a venting of an > ancient desire :) Can you say "PEP time"? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Tue Jan 30 15:29:43 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 30 Jan 2001 10:29:43 -0500 Subject: [Python-Dev] Read-only function attributes References: Message-ID: <14966.56807.288840.7850@anthem.wooz.org> >>>>> "KY" == Ka-Ping Yee writes: KY> I see that the function attribute feature specifically allows KY> assignment to func_code and func_defaults, but no other KY> special attributes. This seems really suspect to me. Why KY> would we want to allow the reassignment of special attributes KY> at all? ... and actually, none of that changed w/ the function attribute patch. You've been able to assign to func_code and func_defaults since Python 1.6! 
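[Editor's note, not part of the original thread: a minimal sketch of the kind of in-place patching discussed above, written for present-day Python 3, where func_code and func_defaults are spelled __code__ and __defaults__. The two functions are made up for illustration.]

```python
def greet(name, greeting="hello"):
    return greeting + ", " + name

def shout(name, greeting="hello"):
    return (greeting + ", " + name).upper() + "!"

# Before any patching, the original default applies.
original = greet("world")

# Replace the tuple of default argument values on the existing function.
greet.__defaults__ = ("hi",)
with_new_default = greet("world")

# Replace the code object; greet keeps its identity and its patched default.
greet.__code__ = shout.__code__
with_new_code = greet("world")
```

Every existing reference to greet sees the new behaviour, which is the property a "better reload" in a development environment would depend on.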
-Barry From thomas@xs4all.net Tue Jan 30 15:52:04 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 16:52:04 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <200101301506.KAA25763@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 30, 2001 at 10:06:57AM -0500 References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> Message-ID: <20010130165204.I962@xs4all.nl> On Tue, Jan 30, 2001 at 10:06:57AM -0500, Guido van Rossum wrote: > > On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote: > > > > > add note about two kinds of illegal imports that are now checked > > > > > + - The compiler will report a SyntaxError if "from ... import *" occurs > > > + in a function or class scope or if a name bound by the import > > > + statement is declared global in the same scope. The language > > > + reference has also documented that these cases are illegal, but > > > + they were not enforced. > > Woah. Is this really a good idea ? I have seen 'from ... import *' > > in a function scope put to good (relatively -- we're talking 'import > > *' here) use. I also thought of 'import' as yet another assignment > > statement, so to me it's both logical and consistent if 'import' > > would listen to 'global'. Otherwise we have to re-invent 'import > > spam; eggs = spam' if we want eggs to be global. > Note that Jeremy is only raising errors for "from M import *". No, he says he's also raising errors for 'import spam' if 'spam' is declared global, like so: def viking(): global spam import spam > > Is there really a reason to enforce this, or are we enforcing the > > wording of the language reference for the sake of enforcing the > > wording of the language reference ? When writing 'import as' for > > 2.0, I fixed some of the inconsistencies in import, making it adhere > > to 'global' statements in as many cases as possible (all except > > 'from ... 
import *') but I was apparently not aware of the wording > > of the language reference. I'd suggest updating the wording in the > > language reference, not the implementation, unless there is a good > > reason to disallow this. > I think Jeremy has an excellent reason. Compilers want to do analysis > of name usage at compile time. The value of * cannot be determined at > compile time (you cannot know what module will actually be imported at > run time). Up till now, we were able to fudge this, but Jeremy's new > compiler needs to know exactly which names are defined in all local > scopes, in order to do nested scopes right. Hrrmm.... I guess I have to agree with that. None the less, I wish we could have a "ack! this is stupid code! it uses 'from larch import *'! All bets are off, we do a lot of slow complicated runtime checking now!" mode. The thing I still enjoy most about Python is that it always does what I want, and though I'd never want to do 'from different import *' in a local scope, I do want other, less wise people to have the same experience, where possible :) And I also want to be able to do: def fill_me(with): global me if with == 1: import me elif with == 2: import me_too as me elif with == 3: from me.Tools import me_me as me elif with == 4: me = FakeModule() sys.modules['me'] = me else: raise ValueError And I can't quite argue that away with 'the compiler needs to know ...' -- it's all there! > > I also have another issue with your recent patches, Jeremy, also in > > the backwards-compatibility departement :) You gave new.code two > > new, non-optional arguments, in the middle of the long argument > > list. I sent a note about it to python-checkins instead of > > python-dev by accident, but Fred seemed to agree with me there. > (Tim will love this. :-) > I don't know what those new arguments represent. If they can > reasonably be assumed to be empty for code that doesn't use the new > features, I'd say move them to the end and default them properly. 
If > they must be specified, I'd say too bad, the new module is an accident > of the implementation anyway, and its users should update their code. Okay, I can live with that. It's sure to cause some gripes though. Then again, from looking at the code I'd say those arguments (freevars and cellvars) can easily default to empty tuples. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From bckfnn@worldonline.dk Tue Jan 30 17:34:10 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Tue, 30 Jan 2001 17:34:10 GMT Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291500.KAA11569@cj20424-a.reston1.va.home.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: <3a76df10.22007715@smtp.worldonline.dk> [Guido] >Maybe we could add a flag to the dict that issues an error when a new >key is inserted during such a for loop? FWIW, some of the java2 collections decided to throw a Concurrent- ModificationException in the iterator if the collection was modified during the iteration. Generally none of java2 collections can be modified while iterating over it (the exception is calling .remove() on the iterator object and not all collections support that). >(I don't think the key order can be affected when a key is *deleted*.) Probably also true for the Hashtables which is backing our PyDictionary, but I'll rather not depend too much on it being true. [Tim] >That latter is true but specific to this implementation. "Can't mutate the >dict period" is easier to keep straight, and probably harmless in practice >(if not, it could be relaxed later). Agree. >Recall that a similar trick is played >during list.sort(), replacing the list's type pointer for the duration (to >point to an internal "immutable list" type, same as the list type except the >"dangerous" slots point to a function that raises an "immutable list" >TypeError). 
Then no runtime expense is incurred for regular lists to keep >checking flags. I thought of this as an elegant use for switching types at >runtime; you may still be appalled by it, though! Changing the type of a type? Yuck! I might very likely be reading the CPython sources wrongly, but it seems this trick will cause an BadInternalCall if some other C extension are trying to modify a list while it is freezed by the type switching trick. I imagine this would happen if the extension called: PyList_SetItem(myList, 0, aValue); I guess Jython could support this from the python side, but it's hard to ensure from the java side without adding an additional PyList_Check(..) to all list methods. It just doesn't feel like the right thing to do since it would cause slower access to all mutable objects. regards, finn From guido@digicool.com Tue Jan 30 20:42:58 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 15:42:58 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Tue, 30 Jan 2001 16:52:04 +0100." <20010130165204.I962@xs4all.nl> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> Message-ID: <200101302042.PAA29301@cj20424-a.reston1.va.home.com> > > > Woah. Is this really a good idea ? I have seen 'from ... import *' > > > in a function scope put to good (relatively -- we're talking 'import > > > *' here) use. I also thought of 'import' as yet another assignment > > > statement, so to me it's both logical and consistent if 'import' > > > would listen to 'global'. Otherwise we have to re-invent 'import > > > spam; eggs = spam' if we want eggs to be global. > > > Note that Jeremy is only raising errors for "from M import *".
> > No, he says he's also raising errors for 'import spam' if 'spam' is declared > global, like so: > > def viking(): > global spam > import spam Yeah, this was just brought to my attention at our group meeting today. I'm with you on this one -- there really isn't a good reason why this shouldn't work. (I wonder why that constraint was ever added to the reference manual; maybe I was just upset that someone would *do* something as ugly as that, or maybe there was a J[P]ython reason???.) > > I think Jeremy has an excellent reason. Compilers want to do analysis > > of name usage at compile time. The value of * cannot be determined at > > compile time (you cannot know what module will actually be imported at > > run time). Up till now, we were able to fudge this, but Jeremy's new > > compiler needs to know exactly which names are defined in all local > > scopes, in order to do nested scopes right. > > Hrrmm.... I guess I have to agree with that. None the less, I wish we could > have a "ack! this is stupid code! it uses 'from larch import *'! All bets > are off, we do a lot of slow complicated runtime checking now!" mode. The > thing I still enjoy most about Python is that it always does what I want, > and though I'd never want to do 'from different import *' in a local scope, > I do want other, less wise people to have the same experience, where > possible :) Hm, maybe, just *maybe* Jeremy can do this if there are no nested scopes in sight. But I don't think it's a big deal as long as the error message is clear -- it's bad style. > And I also want to be able to do: > > def fill_me(with): > global me > if with == 1: > import me > elif with == 2: > import me_too as me > elif with == 3: > from me.Tools import me_me as me > elif with == 4: > me = FakeModule() > sys.modules['me'] = me > else: > raise ValueError > > And I can't quite argue that away with 'the compiler needs to know ...' -- > it's all there! 
Sort of, although I would prefer to do a two-stager here: first some variation of "import me as meohmy", and then "global me; me = meohmy" . > > > I also have another issue with your recent patches, Jeremy, also in > > > the backwards-compatibility departement :) You gave new.code two > > > new, non-optional arguments, in the middle of the long argument > > > list. I sent a note about it to python-checkins instead of > > > python-dev by accident, but Fred seemed to agree with me there. > > > (Tim will love this. :-) > > > I don't know what those new arguments represent. If they can > > reasonably be assumed to be empty for code that doesn't use the new > > features, I'd say move them to the end and default them properly. If > > they must be specified, I'd say too bad, the new module is an accident > > of the implementation anyway, and its users should update their code. > > Okay, I can live with that. It's sure to cause some gripes though. Then > again, from looking at the code I'd say those arguments (freevars and > cellvars) can easily default to empty tuples. OK. I hope Jeremy can fix this when he gets home. 
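[Editor's note, not part of the original thread: a minimal sketch of how the two cases argued over above behave in present-day Python 3, which postdates this discussion. The function name and the choice of the colorsys module are arbitrary.]

```python
def load_colorsys_globally():
    # 'import' binds a name like an assignment, so it honours 'global'.
    global colorsys
    import colorsys

load_colorsys_globally()
bound_globally = globals()["colorsys"].__name__ == "colorsys"

# 'from M import *' inside a function is rejected when the source is compiled,
# because the set of local names cannot be known ahead of time.
source = "def f():\n    from os import *\n"
try:
    compile(source, "<example>", "exec")
    star_import_rejected = False
except SyntaxError:
    star_import_rejected = True
```

The star-import check runs at compile time, before any module is actually imported, which matches the compile-time name analysis argument made in the thread.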
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Tue Jan 30 22:30:25 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 23:30:25 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3a76df10.22007715@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Tue, Jan 30, 2001 at 05:34:10PM +0000 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> Message-ID: <20010130233025.J962@xs4all.nl> On Tue, Jan 30, 2001 at 05:34:10PM +0000, Finn Bock wrote: > >Recall that a similar trick is played during list.sort(), replacing the > >list's type pointer for the duration (to point to an internal "immutable > >list" type, same as the list type except the "dangerous" slots point to a > >function that raises an "immutable list" TypeError). Then no runtime > >expense is incurred for regular lists to keep checking flags. I thought > >of this as an elegant use for switching types at runtime; you may still > >be appalled by it, though! > Changing the type of a type? Yuck! No, the typeobject itself isn't changed -- that would freeze *all* dicts/lists/whatever, not just the one we want. We'd be changing the type of an object (or 'type instance', if you want, but not "type 'instance'"), not the type of a type. > I might very likely be reading the CPython sources wrongly, but it seems > this trick will cause an BadInternalCall if some other C extension are > trying to modify a list while it is freezed by the type switching trick. > I imagine this would happen if the extension called: > PyList_SetItem(myList, 0, aValue); Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() (or whatever), and methods/operations that modify the listobject would have to check if the list is frozen, and raise an appropriate error if so. 
This might throw 'unexpected' errors, but only in situations that can't happen right now! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik@effbot.org Tue Jan 30 22:45:16 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Tue, 30 Jan 2001 23:45:16 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> <20010130233025.J962@xs4all.nl> Message-ID: <003501c08b0e$51f975c0$e46940d5@hagrid> > Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, > 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() > (or whatever), and methods/operations that modify the listobject would have > to check if the list is frozen, and raise an appropriate error if so. This > might throw 'unexpected' errors. did someone just subscribe me to the perl-porters list? -1 on "modal freeze" (it's madness) -0 on an "immutable dictionary" type in the core From tim.one@home.com Tue Jan 30 23:53:45 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 30 Jan 2001 18:53:45 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This is all PEP material now. Yup. > Tim, do you want to own the PEP? Not really. Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first). >> Cool! Can we resist adding >> >> if key:value in dict >> >> for "parallelism"? (I know I can ...) > That's easy to resist because, unlike ``for key:value in dict'', it's > not unambiguous: But if (key:value) in dict is. Just trying to help whoever *does* want the PEP . > ... 
> I'm certainly more comfortable with just ``for key in dict'' than with > the whole slow of extensions using colons. What about just the for key:value in dict for index:value in sequence extensions? The degenerate forms (omitting x or y or both in x:y) are mechanical variations so are likely to get raised. > But, again, that's for the PEP to fight over. PEPs are easier if you Pronounce on things you hate early so that those can get recorded in the "BDFL Pronouncements" section without further ado. whatever-this-may-look-like-it's-not-a-pep-discussion-ly y'rs - tim From nas@arctrix.com Tue Jan 30 17:12:15 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Tue, 30 Jan 2001 09:12:15 -0800 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <003501c08b0e$51f975c0$e46940d5@hagrid>; from fredrik@effbot.org on Tue, Jan 30, 2001 at 11:45:16PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> <20010130233025.J962@xs4all.nl> <003501c08b0e$51f975c0$e46940d5@hagrid> Message-ID: <20010130091215.C18319@glacier.fnational.com> On Tue, Jan 30, 2001 at 11:45:16PM +0100, Fredrik Lundh wrote: > did someone just subscribe me to the perl-porters list? > > -1 on "modal freeze" (it's madness) > -0 on an "immutable dictionary" type in the core I'm glad I'm not the only one who had that feeling. I agree with your votes too. 
Neil From nas@arctrix.com Tue Jan 30 17:24:54 2001 From: nas@arctrix.com (Neil Schemenauer) Date: Tue, 30 Jan 2001 09:24:54 -0800 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: ; from tim.one@home.com on Tue, Jan 30, 2001 at 06:53:45PM -0500 References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> Message-ID: <20010130092454.D18319@glacier.fnational.com> [Tim Peters on adding yet more syntatic sugar] > Available time is finite, and this isn't at the top of the list > of things I'd like to see (resuming the discussion of > generators + coroutines + iteration protocol comes to mind > first). What's the chances of getting generators into 2.2? The implementation should not be hard. Didn't Steven Majewski have something years ago? Why do we always get sidetracked on trying to figure out how to do coroutines and continuations? Generators would add real power to the language and are simple enough that most users could benefit from them. Also, it should be possible to design an interface that does not preclude the addition of coroutines or continuations later. I'm not volunteering to champion the cause just yet. I just want to know if there is some issue I'm missing. Neil From barry@digicool.com Wed Jan 31 00:24:05 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 30 Jan 2001 19:24:05 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> Message-ID: <14967.23333.57259.347222@anthem.wooz.org> >>>>> "NS" == Neil Schemenauer writes: NS> What's the chances of getting generators into 2.2? The NS> implementation should not be hard. Didn't Steven Majewski NS> have something years ago? Why do we always get sidetracked on NS> trying to figure out how to do coroutines and continuations? I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, renaming it just "Generators" and filling it out for the 2.2 time frame. 
If we want to address coroutines and continuations later, we can write separate PEPs for them. Send me a draft. -Barry From guido@digicool.com Wed Jan 31 00:28:44 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 19:28:44 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Tue, 30 Jan 2001 18:53:45 EST." References: Message-ID: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> > Not really. Available time is finite, and this isn't at the top of the list > of things I'd like to see (resuming the discussion of generators + > coroutines + iteration protocol comes to mind first). OK, get going on that one then! > >> Cool! Can we resist adding > >> > >> if key:value in dict > >> > >> for "parallelism"? (I know I can ...) > > > That's easy to resist because, unlike ``for key:value in dict'', it's > > not unambiguous: > > But > > if (key:value) in dict > > is. Just trying to help whoever *does* want the PEP . OK, I'll pronounce -1 on this one. It looks ugly to me -- too reminiscent of C's if (...) required parentheses. Also it suggests that (key:value) is a new tuple notation that might be useful in other contexts -- which it's not. > > ... > > I'm certainly more comfortable with just ``for key in dict'' than with > > the whole slow of extensions using colons. > > What about just the > > for key:value in dict > for index:value in sequence > > extensions? I'm not against these -- I'd say +0.5. > The degenerate forms (omitting x or y or both in x:y) are > mechanical variations so are likely to get raised. For those, +0.2. > > But, again, that's for the PEP to fight over. > > PEPs are easier if you Pronounce on things you hate early so that those can > get recorded in the "BDFL Pronouncements" section without further ado. At your service -- see above. 
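For readers following the votes above: `for key:value in dict` and `for index:value in sequence` are proposals under discussion here, not syntax Python accepts. A minimal sketch of what the two loops would compute, assuming present-day Python and using `dict.items()` and `enumerate()` as stand-ins (the sample data and variable names are illustrative only):

```python
# Illustrative data, not taken from the thread.
table = {"a": 1, "b": 2}
letters = ["x", "y"]

pairs = []
# Proposed spelling: for key:value in table
for key, value in table.items():
    pairs.append((key, value))

indexed = []
# Proposed spelling: for index:value in letters
for index, value in enumerate(letters):
    indexed.append((index, value))
```

Both loops bind two names per step, which is all the colon forms were meant to express.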
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Wed Jan 31 00:49:24 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 30 Jan 2001 19:49:24 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: Your message of "Tue, 30 Jan 2001 09:24:54 PST." <20010130092454.D18319@glacier.fnational.com>
References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com>
Message-ID: <200101310049.TAA30197@cj20424-a.reston1.va.home.com>

> [Tim Peters on adding yet more syntactic sugar]
> > Available time is finite, and this isn't at the top of the list
> > of things I'd like to see (resuming the discussion of
> > generators + coroutines + iteration protocol comes to mind
> > first).
>
> What's the chances of getting generators into 2.2? The
> implementation should not be hard. Didn't Steven Majewski have
> something years ago? Why do we always get sidetracked on trying
> to figure out how to do coroutines and continuations?

I think there's a very good chance of getting them into 2.2. But it *is* true that coroutines are a very attractive piece of land "just next door". On the other hand, continuations are a mirage, so don't try to go there. :-)

> Generators would add real power to the language and are simple
> enough that most users could benefit from them. Also, it should be
> possible to design an interface that does not preclude the
> addition of coroutines or continuations later.
>
> I'm not volunteering to champion the cause just yet. I just want
> to know if there is some issue I'm missing.

There are different ways to do iterators.

Here is a very "tame" proposal (and definitely in the realm of 2.2), that doesn't require any coroutine-like tricks. Let's propose that

    for var in expr:
        ...do something with var...

will henceforth be translated into

    __iter = iterator(expr)
    while __iter.more():
        var = __iter.next()
        ...do something with var...
-- or some variation that combines more() and next() (I don't care).

Then a new built-in function iterator() is needed that creates an iterator object. It should try two things:

  (1) If the object implements __iterator__() (or a C API equivalent),
      call that and be done; this way arbitrary iterators can be
      created.

  (2) If the object smells like a sequence (how to test???), use an
      iterator sort of like this:

      class Iterator:
          def __init__(self, sequence):
              self.sequence = sequence
              self.index = 0
          def more(self):
              # Store the item so that each index is tried exactly once
              try:
                  self.item = self.sequence[self.index]
              except IndexError:
                  return 0
              else:
                  self.index = self.index + 1
                  return 1
          def next(self):
              return self.item

      (I don't necessarily mean that all those instance variables should
      be publicly available.)

The built-in sequence types can use a very fast built-in iterator type that uses a C int for the index and doesn't store the item in the iterator. (This should be as fast as Marc-Andre's for loop optimization using a C counter.)

Dictionaries can define an appropriate iterator that uses PyDict_Next().

If the argument to iterator() is itself an iterator (how to test???), it returns the argument unchanged, so that one can also write

    for var in iterator(obj):
        ...do something with var...

Files of course should have iterators that return the next input line.

We could build filtering and mapping iterators that take an iterator argument and do certain manipulations with the elements; this would effectively introduce the notion of lazy evaluation on sequences.

Etc., etc.

This does not come close to Icon generators -- but it doesn't require any coroutine-like capabilities, unlike those.
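The pieces of the proposal above can be put together in a short runnable sketch. This is only an illustration under stated assumptions: it is written for present-day Python, `iterator()` is an ordinary function rather than a built-in, the "is it already an iterator?" test (the presence of `more` and `next` attributes) is just one possible answer to the "how to test???" question, and `SequenceIterator`, `Countdown`, and `for_each` are names invented here.

```python
class SequenceIterator:
    """more()/next() iterator over anything that supports obj[index]."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.index = 0

    def more(self):
        # Fetch and store the item so each index is tried exactly once.
        try:
            self.item = self.sequence[self.index]
        except IndexError:
            return False
        self.index += 1
        return True

    def next(self):
        return self.item


class Countdown:
    """Example object that supplies its own iterator via __iterator__()."""

    def __init__(self, start):
        self.start = start

    def __iterator__(self):
        # Reuse the sequence iterator over a precomputed list.
        return SequenceIterator(list(range(self.start, 0, -1)))


def iterator(obj):
    """Stand-in for the proposed built-in (assumed behaviour, see lead-in)."""
    if hasattr(obj, "more") and hasattr(obj, "next"):
        return obj                    # already an iterator: return unchanged
    if hasattr(obj, "__iterator__"):
        return obj.__iterator__()     # case (1): object supplies its own
    return SequenceIterator(obj)      # case (2): treat it as a sequence


def for_each(expr, body):
    """Spell out the proposed translation of 'for var in expr: body(var)'."""
    it = iterator(expr)
    while it.more():
        body(it.next())


collected = []
for_each("abc", collected.append)            # plain sequence
for_each(Countdown(3), collected.append)     # custom __iterator__()
for_each(iterator([7, 8]), collected.append) # already an iterator
```

Splitting the work between `more()` and `next()` means the item has to be cached on the iterator between the two calls, which is the cost a combined single-call variation would avoid.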
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com Wed Jan 31 00:55:10 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 30 Jan 2001 19:55:10 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <3a76df10.22007715@smtp.worldonline.dk>
Message-ID:

[Finn Bock]
> Changing the type of a type? Yuck!

No, it temporarily changes the type of the single list being sorted, like so, where "self" is a pointer to a PyListObject (which is a list, not a list *type* object):

    self->ob_type = &immutable_list_type;
    err = samplesortslice(self->ob_item,
                          self->ob_item + self->ob_size,
                          compare);
    self->ob_type = &PyList_Type;

immutable_list_type is "just like" PyList_Type, except that the slots for mutating methods point to a function that raises a TypeError. Before this drastic step came years of increasingly ugly hacks trying to stop core dumps when people mutated a list during the sort. Python's sort is very complex, and lots of pointers are tucked away -- having the size of the array, or its position in memory, or the set of objects it contains, change as a side effect of doing a compare, would be difficult and expensive to recover from -- and by "difficult" read "nobody ever managed to get it right before this" <0.5 wink>.

> I might very likely be reading the CPython sources wrongly, but it seems
> this trick will cause a BadInternalCall if some other C extension is
> trying to modify a list while it is frozen by the type switching trick.
> I imagine this would happen if the extension called:
>
>     PyList_SetItem(myList, 0, aValue);

Well, in CPython it's not "legal" for any other thread to use the C API while the sort is in progress, because the thread doing the sort holds the global interpreter lock for the duration. So this could happen "legally" only if a comparison function called by the sort called out to a C extension attempting to mutate the list.
In that case, fine, it *is* a bad call: mutation is not allowed during list sorting, so they deserve whatever they get -- and far better a "bad internal call" than a core dump. If the immutable_list_type were used more generally, it would require more general support (but I see Thomas already talked about that -- thanks). From guido@digicool.com Wed Jan 31 00:55:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 19:55:19 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Tue, 30 Jan 2001 19:24:05 EST." <14967.23333.57259.347222@anthem.wooz.org> References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> <14967.23333.57259.347222@anthem.wooz.org> Message-ID: <200101310055.TAA30250@cj20424-a.reston1.va.home.com> > I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, > renaming it just "Generators" and filling it out for the 2.2 time > frame. If we want to address coroutines and continuations later, we > can write separate PEPs for them. I think it's better not to re-use PEP 220 for that. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Wed Jan 31 00:58:32 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 01:58:32 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310028.TAA30090@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 30, 2001 at 07:28:44PM -0500 References: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> Message-ID: <20010131015832.K962@xs4all.nl> On Tue, Jan 30, 2001 at 07:28:44PM -0500, Guido van Rossum wrote: > > What about just the > > for key:value in dict > > for index:value in sequence > > extensions? > I'm not against these -- I'd say +0.5. What, fractions ? Isn't that against the whole idea of (+|-)(0|1) ? 
:) But since we are voting, I'm -0 on this right now, and might end up -1 or +0, depending on the implementation; I still can't *see* this, though I wouldn't be myself if I hadn't tried to implement it anyway :) And I ran into some fairly mind-boggling issues. The worst bit is 'how the f*ck does FOR_LOOP know if something's a dict or a list'. And the almost-as-bad bit is 'WTF to do for user classes, extension types and almost-list/almost-dict practically-builtin types (arrays, the *dbm's, etc.)'. After some sleep-deprived consideration I gave up and decided we need an iteration/generator protocol first. However, my life's been busy (or rather, my work has been) with all kinds of small and not so small details, and I haven't been getting much sleep in the last week or so, so I might be overlooking something very simple. That's why I can go either way based on implementation -- it might prove me wrong :) Until my boss is back and I stop being 'responsible' (end of this week, start of next week) and I get a chance to get rid of about 2 months of work backlog (the time he was away) I won't have time to champion or even contribute to such a PEP. Then again, by that time I might be preparing for IPC9 (_if_ my boss sends me there) or even my ApacheCon US presentation (which got accepted today, yay!) So, if that other message was an attempt to drop the PEP on me, Guido, the answer is the same as I tend to give to suits that show up next to my desk wanting to discuss something important (to them) right away: "b'gg'r 'ff" :) I'll-save-my-answer-to-PR-officers-doing-the-same-for-when-you-do-something- -*really*-offensive-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From guido@digicool.com Wed Jan 31 01:16:51 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 20:16:51 -0500 Subject: [Python-Dev] Let's release 2.1a2 Thursday night Message-ID: <200101310116.UAA30386@cj20424-a.reston1.va.home.com> Things look good for a release of 2.1a2 this week; we're aiming for Thursday night. I won't be in town (speaking to the press at LinuxWorld Expo in New York) but Jeremy will handle the release process and the other PythonLabs folks will assist him. Tomorrow Fred will check in his weak references after making some changes (mostly making it more Spartan :-) that I suggested in a code review. After that, I think we're good for the second (and last!) alpha release; and enough has changed (e.g. nested scopes, lots of setup.py changes, flat Makefile) to warrant going ahead now. Now is the time for those last-minute bugfixes that you're all so famous for! I propose a checkin freeze for non-PythonLabs folks Wednesday midnight US west coast time, to give Jeremy c.s. enough time to build the release and give it a good work-out. (An internal freeze is up to Jeremy to declare, but should probably take Tim's sleep cycle into account.) --Guido van Rossum (home page: http://www.python.org/~guido/) PS. I'll be out of reach from noon US east coast time tomorrow (Wednesday), traveling to New York by train. I probably won't check my email while out there; I'll be back Friday night. From guido@digicool.com Wed Jan 31 01:35:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 20:35:25 -0500 Subject: [Python-Dev] SSL socket read at EOF; SourceForge problem In-Reply-To: Your message of "Mon, 29 Jan 2001 23:56:09 EST." References: Message-ID: <200101310135.UAA30629@cj20424-a.reston1.va.home.com> > I'm going to repeat a question that I posted about a week ago that passed > without comment on the newsgroup. 
The issue is the SSL support in the socket > module, which raises an exception when the reading socket is at EOF, rather > than returning an empty string. I'm hesitant to call it a "bug", but I > wouldn't have implemented it this way. There are the names of two people > mentioned at the top of socketmodule.c, but no contact information, so I'm > suggesting here that it be changed to conform to normal file/socket > practice. (SSL was actually added at 2.0, so I'm late to the party with > this; mea culpa, mea culpa. I delayed trying Python2 because of the > extension rebuilding.) I agree that it makes more sense if a read at EOF returns an empty string, since that's what other file-like objects in Python do. I can't do much about this right now, but I'd love to see a patch. It could go into 2.1a2 if small enough. Note that input() and raw_input() are specifically excepted because they are intended for use in interactive mode by newbies mostly; and because "" as return value for EOF would be ambiguous for these. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Wed Jan 31 04:12:23 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jan 2001 17:12:23 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> Message-ID: <200101310412.RAA03140@s454.cosc.canterbury.ac.nz> : > for index:value in sequence -1, because we only construct dicts using that notation, not sequences. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@digicool.com Wed Jan 31 05:21:37 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 00:21:37 -0500 Subject: [Python-Dev] codecity.com Message-ID: <200101310521.AAA31653@cj20424-a.reston1.va.home.com> Should I spread this word, or is this a joke? The Python quiz category is laughable. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Sat, 27 Jan 2001 23:16:02 -0800 From: "Jeff Cordova" To: Subject: New, fun way to learn Python. Hi Guido, I wanted to let you know about www.codecity.com After several years of managing large software projects in Silicon Valley, I realized that I was spending a lot of time teaching jr. programmers how to write code. So, I created CodeCity to help me automate some of that. If you go to the site, you'll see that I've created a category for Python. There's not much depth to the Python content yet (the site is only a week old) but I'm expecting the Python community to add their wisdom over a period of time. If you could spread the word, it would be highly appreciated. Thankyou, Jeff C. ------- End of Forwarded Message From tim.one@home.com Wed Jan 31 06:16:48 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 31 Jan 2001 01:16:48 -0500 Subject: [Python-Dev] codecity.com In-Reply-To: <200101310521.AAA31653@cj20424-a.reston1.va.home.com> Message-ID: [Guido, on www.codecity.com] > Should I spread this word, or is this a joke? The Python quiz > category is laughable. While the Python section still seems to have only one question, the first day this was announced the third choice wasn't today's: Python is Open Source code, so it doesn't have a creator but: Martha Stewart I liked it better before <0.9 wink>. 
From moshez@zadka.site.co.il Wed Jan 31 06:30:07 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Wed, 31 Jan 2001 08:30:07 +0200 (IST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310049.TAA30197@cj20424-a.reston1.va.home.com> References: <200101310049.TAA30197@cj20424-a.reston1.va.home.com>, <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> Message-ID: <20010131063007.536ACA83E@darjeeling.zadka.site.co.il> On Tue, 30 Jan 2001 19:49:24 -0500, Guido van Rossum wrote: > There are different ways to do interators. > > Here is a very "tame" proposal (and definitely in the realm of 2.2), > that doesn't require any coroutine-like tricks. Let's propose that > > for var in expr: > ...do something with var... > > will henceforth be translated into > > __iter = iterator(expr) > while __iter.more(): > var = __iter.next() > ...do something with var... I'm +1 on that...but Tim's "try to use that to write something that will return the nodes of a binary tree" still haunts me. Personally, though, I'd thin down the interface to while 1: try: var = __iter.next() except NoMoreError: break # pseudo-break? With the usual caveat that this is a lie as far as "else" is concerned (IOW, pseudo-break gets into the else) > Then a new built-in function iterator() is needed that creates an > iterator object. It should try two things: > > (1) If the object implements __iterator__() (or a C API equivalent), > call that and be done; this way arbitrary iterators can be > created. > (2) If the object smells like a sequence (how to test???), use an > iterator sort of like this: Why not, "if the object doesn't have __iterator__, try this. If it won't work, we'll find out by the exception that will be thrown in our face". 
    class Iterator:
        def __init__(self, seq):
            self.seq = seq
            self.index = 0
        def next(self):
            try:
                try:
                    return self.seq[self.index] # <- smells like
                except IndexError:
                    raise NoMoreError(self.index)
            finally:
                self.index += 1

> (I don't necessarily mean that all those instance variables should
> be publicly available.)

But what about your poor brother? Er....I mean, this would make implementing "indexing" really about just getting the index from the iterator.

> If the argument to iterator() is itself an iterator (how to test???),

No idea, and this looks problematic. I see your point -- but it's still problematic.

--
Moshe Zadka
This is a signature anti-virus. Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From tim.one@home.com Wed Jan 31 06:57:26 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 31 Jan 2001 01:57:26 -0500
Subject: [Python-Dev] Can't enter new Python bugs on SourceForge?
Message-ID:

Reported this earlier. Still can't create a new bug. Guido either. Here's the SF Support request opened on this:

http://sourceforge.net/support/index.php?func=detailsupport&support_id=113100&group_id=1

The good(?) news is that Python isn't the only project to report this problem.

From tim.one@home.com Wed Jan 31 07:50:18 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 31 Jan 2001 02:50:18 -0500
Subject: [Python-Dev] FW: Python programmer needed (addition to urllib2 and HTTPS support)
Message-ID:

Get rich quick!

-----Original Message-----
From: python-list-admin@python.org [mailto:python-list-admin@python.org]On Behalf Of Albert Chin-A-Young
Sent: Wednesday, January 31, 2001 2:31 AM
To: python-list@python.org
Subject: Python programmer needed (addition to urllib2 and HTTPS support)

We're in need of a contract Python programmer for the following:

1. Allow connecting to a host with urlopen() which requires BASIC HTTP authentication with a proxy (via urllib2.py).
This should address bug #125217: http://sourceforge.net/bugs/?func=detailbug&bug_id=125217&group_id=5470 2. Allow connecting to a host with urlopen() which requires BASIC HTTP authentication with a proxy that requires BASIC HTTP authentication (via urllib2.py). 3. Support for non-authenticated clients to connect to a HTTPS server 4. Support for a client to authenticate the HTTPS host (to verify that it's certificate is valid) What we might consider adding (depends on cost): 1. Support for authenticated clients to connect to a HTTPS server. Please note that solutions to the four items above must be rolled back into the main Python distribution (implies the "community" and the Python developers need to agree on the adopted solution). -- albert chin (china at thewrittenword dot com) -- http://mail.python.org/mailman/listinfo/python-list From ping@lfw.org Wed Jan 31 09:47:10 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 31 Jan 2001 01:47:10 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: Message-ID: On Tue, 30 Jan 2001, Guido van Rossum wrote: > > Can you say "PEP time"? :-) Okay, i have written a draft PEP that tries to combine the "elt in dict", custom iterator, and "for k:v" issues into a coherent proposal. Have a look: http://www.lfw.org/python/pep-iterators.txt http://www.lfw.org/python/pep-iterators.html Could i get a number for this please? -- ?!ng "The only `intuitive' interface is the nipple. After that, it's all learned." 
-- Bruce Ediger, on user interfaces From moshez@zadka.site.co.il Wed Jan 31 10:14:49 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Wed, 31 Jan 2001 12:14:49 +0200 (IST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: References: Message-ID: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> On Wed, 31 Jan 2001 01:47:10 -0800 (PST), Ka-Ping Yee wrote: > Okay, i have written a draft PEP that tries to combine the > "elt in dict", custom iterator, and "for k:v" issues into a > coherent proposal. Have a look: > > http://www.lfw.org/python/pep-iterators.txt > http://www.lfw.org/python/pep-iterators.html Er....one problem with first reading: you forgot to mention in the while loop description that 'else:' would be executed if the exception is raised, so the 'break' is a pseudo-break'. Basic response: I *love* the iter(), sq_iter and __iter__ parts. I tremble at seeing the rest. Why not add a method to dictionaries .iteritems() and do for (k, v) in dict.iteritems(): pass (dict.iteritems() would return an an iterator to the items) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From MarkH@ActiveState.com Wed Jan 31 10:34:01 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Wed, 31 Jan 2001 21:34:01 +1100 Subject: [Python-Dev] WARNING: Changed build process for zlib on Windows Message-ID: Hi all, In an attempt to solve "[ Bug #129293 ] zlib library used for binary win32 distribution can crash" (https://sourceforge.net/bugs/?func=detailbug&group_id=5470&bug_id=129293), Tim and I have decided that we should fix the build process of zlib.pyd on windows. The current process requires that the builder download _2_ zlib archives - a binary distribution for zlib.lib, and the source archive for the headers. We believe that slight differences between the 2 are causing the above bug. 
A particular warning-light is that the current process defines ZLIB_DLL even though we are _not_ currently using the DLL but the static lib. Removing this #define generates linker errors.

The new process is very simple, but may break some people's build. In theory it _should_ still work for everyone, but if it fails to build, please check your directory structure.

From the comments I just added to zlib.c:

/* *** Notes for Windows Users ***
   * Download the source distribution as referenced above.
   * Unpack the distribution such that a "..\..\zlib-1.1.3" directory
     is created relative to the "pcbuild" directory.
   * Build this "zlib" project. Via MSVC magic, the correct zlib
     makefile will be run, and "..\..\zlib-1.1.3\zlib.lib" will be
     built before zlib.pyd.
   *** End of notes for Windows users ***
*/

Specifically, MSVC has a "pre-link step" setup that runs the zlib makefile from the "..\..\zlib-1.1.3" directory.

The reason this _should_ not break your build is that you _probably_ already have a "..\..\zlib-1.1.3" directory installed in the right place so the header files can be located. Once you have a successful build, you can delete the old "zlib113" directory, which was the binary-only distribution.

Please let me know if this causes too much pain, or if it is in some way broken for you. The relevant checkins are Rev 1.15 of PCbuild/zlib.dsp and Rev 2.37 of Modules/zlibmodules.c.

Thanks,

Mark.

From ping@lfw.org Wed Jan 31 11:00:48 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Wed, 31 Jan 2001 03:00:48 -0800 (PST)
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <20010131015832.K962@xs4all.nl>
Message-ID:

On Wed, 31 Jan 2001, Thomas Wouters wrote:
> I still can't *see* this, though I
> wouldn't be myself if I hadn't tried to implement it anyway :) And I ran
> into some fairly mind-boggling issues. The worst bit is 'how the f*ck
> does FOR_LOOP know if something's a dict or a list'.
I believe the Pythonic answer to that is "see if the appropriate method is available". The best definition of "sequence-like" or "mapping-like" i can come up with is: x is sequence-like if it provides __getitem__() but not keys() x is mapping-like if it provides __getitem__() and keys() But in our case, since we need iteration, we can look for specific methods that have to do with just what we need for iteration and nothing else. Thus, e.g. a mapping-like class without a values() method is no problem if we never ask to iterate over values. > And the > almost-as-bad bit is 'WTF to do for user classes, extension types and > almost-list/almost-dict practically-builtin types I think it can be done; the draft PEP at http://www.lfw.org/python/pep-iterators.html is a best-attempt at supporting everything just as you would expect. Let me know if you think there are important cases it doesn't cover. I know, the table mp_iteritems __iteritems__, __iter__, items, __getitem__ mp_iterkeys __iterkeys__, __iter__, keys, __getitem__ mp_itervalues __itervalues__, __iter__, values, __getitem__ sq_iter __iter__, __getitem__ might look a little frightening, but it's not so bad, and i think it's about as simple as you can make it while continuing to support existing pseudo-lists and pseudo-dictionaries. No instance should ever provide __iter__ at the same time as any of the other __iter*__ methods anyway. -- ?!ng "The only `intuitive' interface is the nipple. After that, it's all learned." -- Bruce Ediger, on user interfaces From mal@lemburg.com Wed Jan 31 11:56:12 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 12:56:12 +0100 Subject: [Python-Dev] Re: from ... 
import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154) References: Message-ID: <3A77FD5C.DE8729DC@lemburg.com> > Update of /cvsroot/python/python/dist/src/Python > In directory usw-pr-cvs1:/tmp/cvs-serv17061/Python > > Modified Files: > compile.c > Log Message: > Enforce two illegal import statements that were outlawed in the > reference manual but not checked: Names bound by import statemants may > not occur in global statements in the same scope. The from ... import * > form may only occur in a module scope. > > I guess these changes could break code, but the reference manual > warned about them. Jeremy, your code breaks all uses of "from package import submodule" inside packages. Try distutils for example or setup.py.... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jan 31 12:01:24 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 13:01:24 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: <3A77FE94.E5082136@lemburg.com> Guido van Rossum wrote: > > [ESR] > > For different reasons, I'd like to be able to set a constant flag on a > > object instance. Simple semantics: if you try to assign to a > > member or method, it throws an exception. > > > > Application? I have a large Python program that goes to a lot of effort > > to build elaborate context structures in core. It would be nice to know > > they can't be even inadvertently trashed without throwing an exception I > > can watch for. > > Yes, this is a good thing. Easy to do on lists and dicts. 
Questions: > > - How to spell it? x.freeze()? x.readonly()? How about .lock() and .unlock() ? > - Should this reversible? I.e. should there be an x.unfreeze()? Yes. These low-level locks could be used in thread programming since the above calls are C level functions and thus thread safe w/r to the global interpreter lock. > - Should we support something like this for instances too? Sometimes > it might be cool to be able to freeze changing attribute values... Sure :) Eric, could you write a PEP for this ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jan 31 12:08:15 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 13:08:15 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: Message-ID: <3A78002F.DC8F0582@lemburg.com> Tim Peters wrote: > > [MAL] > > ... > > What we really want is iterators for dictionaries, so why not > > implement these instead of tweaking for-loops. > > Seems an unrelated topic: would "iterators for dictionaries" solve the > supposed problem with iteration order? No, but it would solve the problem in a more elegant and generalized way. Besides, it also allows writing code which is thread safe, since the iterator can take special actions to assure that the dictionary doesn't change during the iteration phase (see the other thread about "making mutable objects readonly"). > > If you are looking for speedups w/r to for-loops, applying a > > different indexing technique in for-loops would go a lot further > > and provide better performance not only to dictionary loops, > > but also to other sequences. > > > > I have made some good experience with a special counter object > > (sort of like a mutable integer) which is used instead of the > > iteration index integer in the current implementation. > > Please quantify, if possible. 
My belief (based on past experiments) is that > in loops fancier than > > for i in range(n): > pass > > the loop overhead quickly falls into the noise even now. I don't remember the figures, but these micro-optimizations do speed up loops by a noticeable amount. Just compare the performance of stock Python 1.5 against my patched version. > > Using an iterator object instead of the integer + __getitem__ > > call machinery would allow more flexibility for all kinds of > > sequences or containers. ... > > This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new > iteration *protocol* could have major attractions. Not really... the counter object is just a special case of an iterator -- in this case iteration is over the IN. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jan 31 12:10:43 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 13:10:43 +0100 Subject: [Python-Dev] Re: Making mutable objects readonly References: Message-ID: <3A7800C3.B5D3203F@lemburg.com> Tim Peters wrote: > > Note that even adding a "frozen" flag would add 4 bytes to every freezable > object on most machines. That's why I'd rather .freeze() replace the type > pointer and .unfreeze() restore it. No time or space overhead; no > cluttering up the normal-case (i.e., unfrozen) type implementations with new > tests. Note that Fred's weak ref implementation also needs a flag on every weakly referenceable object (at least last time I looked at his patches).
Why not add a flag byte or word to these objects -- then we'd have 8 or 16 choices of what to do with them ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From MarkH@ActiveState.com Wed Jan 31 12:18:12 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Wed, 31 Jan 2001 23:18:12 +1100 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com> Message-ID: MAL writes: > > - How to spell it? x.freeze()? x.readonly()? > > How about .lock() and .unlock() ? I'm with Greg here - lock() and unlock() imply an operation similar to threading.Lock() - ie, exclusivity rather than immutability. I don't have a strong opinion on the other names, but definitely prefer any of the others over lock() for this operation. Mark. From mal@lemburg.com Wed Jan 31 12:26:07 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 13:26:07 +0100 Subject: [Python-Dev] Making mutable objects readonly References: Message-ID: <3A78045F.7DB50871@lemburg.com> Mark Hammond wrote: > > MAL writes: > > > > - How to spell it? x.freeze()? x.readonly()? > > > > How about .lock() and .unlock() ? > > I'm with Greg here - lock() and unlock() imply an operation similar to > threading.Lock() - ie, exclusivity rather than immutability. > > I don't have a strong opinion on the other names, but definitely prefer any > of the others over lock() for this operation. Funny, I thought that .lock() and .unlock() could be used to implement exactly what threading.Lock() does... Anyway, names really don't matter much, so how about: .mutable([flag]) -> integer If called without argument, returns 1/0 depending on whether the object is mutable or not. When called with a flag argument, sets the mutable state of the object to the value indicated by flag and returns the previous flag state.
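The .mutable([flag]) call described above could behave roughly as in the following sketch. It is only an illustration: the FreezableList class, the _guarded helper and the choice of TypeError are all assumptions made here, not part of any proposal or of the Python core.

```python
class FreezableList(list):
    """Hypothetical list illustrating the proposed .mutable([flag]) call."""

    _mutable = 1  # objects start out mutable

    def mutable(self, flag=None):
        """Return the state (1/0) as it was before the call; if flag is
        given, also switch the object to that state."""
        previous = self._mutable
        if flag is not None:
            self._mutable = 1 if flag else 0
        return previous


def _guarded(name):
    """Wrap a list mutator so that it refuses to run on a read-only object."""
    original = getattr(list, name)

    def method(self, *args, **kwargs):
        if not self._mutable:
            raise TypeError("object is read-only")
        return original(self, *args, **kwargs)

    method.__name__ = name
    return method


# Install a guarded version of every in-place list operation.
for _name in ("append", "extend", "insert", "pop", "remove", "clear", "sort",
              "reverse", "__setitem__", "__delitem__", "__iadd__", "__imul__"):
    setattr(FreezableList, _name, _guarded(_name))

x = FreezableList([1, 2])
print(x.mutable())    # 1: mutable by default
print(x.mutable(0))   # 1: the previous state; x is now read-only
try:
    x.append(3)
except TypeError as exc:
    print("refused:", exc)
print(x.mutable(1))   # 0: the previous state; x is mutable again
x.append(3)
```

Returning the previous state lets a caller save it, change it, and restore it afterwards without knowing what it was.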
The semantics of this interface would be in sync with many other state APIs in Python and C (e.g. setlocale()). The advantage of making this a method should be clear: it allows writing polymorphic code. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Samuele Pedroni Wed Jan 31 12:34:32 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Wed, 31 Jan 2001 13:34:32 +0100 (MET) Subject: [Python-Dev] weak refs and jython Message-ID: <200101311234.NAA24584@core.inf.ethz.ch> Hi. I have read weak ref PEP, maybe too late. I don't know if portability of code using weak refs between python and jython was a goal or could be one, and up to which extent actual impl. will correspond to the PEP. But about The callbacks registered with weak references must accept a single parameter, which will be the weak-ly referenced object itself. The object can be resurrected by creating some other reference to the object in the callback, in which case the weak reference generating the callback will still be cleared but no remaining weak references to the object will be cleared. AFAIK using java weak refs (which I think is a natural choice) I see no way (at least no worth-the-effort way) to implement this in jython. Java weak refs cannot be resurrected. regards, Samuele Pedroni. PS: Mr. X is a jython developer. 
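For comparison, here is a small sketch of how the standard weakref module behaves in present-day CPython, which differs from the draft wording quoted above: the callback is handed the weak reference rather than the object, and by then the referent can no longer be reached through it. The names Node, events and on_death are made up for the illustration.

```python
import gc
import weakref


class Node:
    """Plain class; instances of user-defined classes can be weakly referenced."""


events = []


def on_death(ref):
    # The callback receives the weak reference, not the referent. By this
    # point the referent is gone, so calling ref() returns None.
    events.append(ref() is None)


obj = Node()
ref = weakref.ref(obj, on_death)
del obj          # drop the only strong reference
gc.collect()     # not needed under reference counting; may help elsewhere
print(events)    # [True] under CPython
```

Because the callback never sees the object itself, it has no way to create a new reference to it, so no resurrection is involved.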
From bckfnn@worldonline.dk Wed Jan 31 12:49:22 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Wed, 31 Jan 2001 12:49:22 GMT Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <200101302042.PAA29301@cj20424-a.reston1.va.home.com> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> Message-ID: <3a7809c0.14839067@smtp.worldonline.dk> >> > Note that Jeremy is only raising errors for "from M import *". >> >> No, he says he's also raising errors for 'import spam' if 'spam' is declared >> global, like so: >> >> def viking(): >> global spam >> import spam > >Yeah, this was just brought to my attention at our group meeting >today. I'm with you on this one -- there really isn't a good reason >why this shouldn't work. (I wonder why that constraint was ever added >to the reference manual; maybe I was just upset that someone would >*do* something as ugly as that, or maybe there was a J[P]ython >reason???.) Previously Jython have had problems with "from .. import *" in function scope, and still have problems when used with the python -> java compiler: http://sourceforge.net/bugs/?func=detailbug&bug_id=122834&group_id=12867 Using global on an import name is currently ignored by Jython because the name assignment is done by the runtime, not the compiler. 
regards, finn From thomas@xs4all.net Wed Jan 31 12:59:14 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 13:59:14 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <3a7809c0.14839067@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Wed, Jan 31, 2001 at 12:49:22PM +0000 References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> Message-ID: <20010131135914.N962@xs4all.nl> On Wed, Jan 31, 2001 at 12:49:22PM +0000, Finn Bock wrote: > Using global on an import name is currently ignored by Jython because > the name assignment is done by the runtime, not the compiler. So it's impossible to do, in Jython, something like: def fillme(): global me import me but it is possible to do: def fillme(): global me import me as _me me = _me ? I have to say I don't like that; we're always claiming 'import' (and 'def' and 'class' for that matter) are 'just another way of writing assignment'. All these special cases break that. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
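The two spellings discussed above can be tried side by side. This sketch shows what present-day CPython does, where both forms end up binding a module-level name; it says nothing about Jython. The standard modules textwrap and shlex stand in for the made-up module "me".

```python
def fill_direct():
    global textwrap
    import textwrap            # binds the module-level name directly


def fill_via_alias():
    global shlex
    import shlex as _shlex     # bind a local name first ...
    shlex = _shlex             # ... then assign the global explicitly


fill_direct()
fill_via_alias()
print(textwrap.__name__, shlex.__name__)   # textwrap shlex
```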
From bckfnn@worldonline.dk Wed Jan 31 13:35:36 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Wed, 31 Jan 2001 13:35:36 GMT Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <20010131135914.N962@xs4all.nl> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> <20010131135914.N962@xs4all.nl> Message-ID: <3a780eda.16144995@smtp.worldonline.dk> On Wed, 31 Jan 2001 13:59:14 +0100, you wrote: >On Wed, Jan 31, 2001 at 12:49:22PM +0000, Finn Bock wrote: > >> Using global on an import name is currently ignored by Jython because >> the name assignment is done by the runtime, not the compiler. > >So it's impossible to do, in Jython, something like: > >def fillme(): > global me > import me > >but it is possible to do: > >def fillme(): > global me > import me as _me > me = _me > >? Yes, only the second example will make a global variable. > I have to say I don't like that; we're always claiming 'import' (and >'def' and 'class' for that matter) are 'just another way of writing >assignment'. All these special cases break that. I don't like it either; I was only reporting what Jython currently does. The current design used by Jython does not lend itself directly towards a solution, but I don't see anything that makes it impossible to solve. regards, finn From mal@lemburg.com Wed Jan 31 14:34:19 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 15:34:19 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) References: Message-ID: <3A78226B.2E177EFE@lemburg.com> Michael Hudson wrote: > > In the interest of generating some numbers (and filling up my hard > drive), last night I wrote a script to build lots & lots of versions > of python (many of which turned out to be redundant - eg.
-O6 didn't > seem to do anything different to -O3 and pybench doesn't work with > 1.5.2), and then run pybench with them. Summarised results below; > first a key: > > src-n: this morning's CVS (with Jeremy's f_localsplus optimisation) > (only built this with -O3) > src: CVS from yesterday afternoon > src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc > patch applied. More on this later... > Python-2.0: you can guess what this is. > > All runs are compared against Python-2.0-O2: > > Benchmark: src-n-O3 (rounds=10, warp=20) > Average round time: 49029.00 ms -0.86% > Benchmark: src (rounds=10, warp=20) > Average round time: 67141.00 ms +35.76% > Benchmark: src-O (rounds=10, warp=20) > Average round time: 50167.00 ms +1.44% > Benchmark: src-O2 (rounds=10, warp=20) > Average round time: 49641.00 ms +0.37% > Benchmark: src-O3 (rounds=10, warp=20) > Average round time: 49104.00 ms -0.71% > Benchmark: src-O6 (rounds=10, warp=20) > Average round time: 49131.00 ms -0.66% > Benchmark: src-obmalloc (rounds=10, warp=20) > Average round time: 63276.00 ms +27.94% > Benchmark: src-obmalloc-O (rounds=10, warp=20) > Average round time: 46927.00 ms -5.11% > Benchmark: src-obmalloc-O2 (rounds=10, warp=20) > Average round time: 46146.00 ms -6.69% > Benchmark: src-obmalloc-O3 (rounds=10, warp=20) > Average round time: 46456.00 ms -6.07% > Benchmark: src-obmalloc-O6 (rounds=10, warp=20) > Average round time: 46450.00 ms -6.08% > Benchmark: Python-2.0 (rounds=10, warp=20) > Average round time: 68933.00 ms +39.38% > Benchmark: Python-2.0-O (rounds=10, warp=20) > Average round time: 49542.00 ms +0.17% > Benchmark: Python-2.0-O3 (rounds=10, warp=20) > Average round time: 48262.00 ms -2.41% > Benchmark: Python-2.0-O6 (rounds=10, warp=20) > Average round time: 48273.00 ms -2.39% > > My conclusion? Python 2.1 is slower than Python 2.0, but not by > enough to care about. What compiler did you use and on which platform ? 
I have made similar experience with -On with n>3 compared to -O2 using pgcc (gcc optimized for PC processors). BTW, the Linux kernel uses "-Wall -Wstrict-prototypes -O3 -fomit-frame-pointer" as CFLAGS -- perhaps Python should too on Linux ?! Does anybody know about the effect of -fomit-frame-pointer ? Would it cause problems or produce code which is not compatible with code compiled without this flag ? > Interestingly, adding obmalloc speeds things up. Let's take a closer > look: > > $ python pybench.py -c src-obmalloc-O3 -s src-O3 > PYBENCH 0.7 > > Benchmark: src-O3 (rounds=10, warp=20) > > Tests: per run per oper. diff * > ------------------------------------------------------------------------ > BuiltinFunctionCalls: 843.35 ms 6.61 us +2.93% > BuiltinMethodLookup: 878.70 ms 1.67 us +0.56% > ConcatStrings: 1068.80 ms 7.13 us -1.22% > ConcatUnicode: 1373.70 ms 9.16 us -1.24% > CreateInstances: 1433.55 ms 34.13 us +9.06% > CreateStringsWithConcat: 1031.75 ms 5.16 us +10.95% > CreateUnicodeWithConcat: 1277.85 ms 6.39 us +3.14% > DictCreation: 1275.80 ms 8.51 us +44.22% > ForLoops: 1415.90 ms 141.59 us -0.64% > IfThenElse: 1152.70 ms 1.71 us -0.15% > ListSlicing: 397.40 ms 113.54 us -0.53% > NestedForLoops: 789.75 ms 2.26 us -0.37% > NormalClassAttribute: 935.15 ms 1.56 us -0.41% > NormalInstanceAttribute: 961.15 ms 1.60 us -0.60% > PythonFunctionCalls: 1079.65 ms 6.54 us -1.00% > PythonMethodCalls: 908.05 ms 12.11 us -0.88% > Recursion: 838.50 ms 67.08 us -0.00% > SecondImport: 741.20 ms 29.65 us +25.57% > SecondPackageImport: 744.25 ms 29.77 us +18.66% > SecondSubmoduleImport: 947.05 ms 37.88 us +25.60% > SimpleComplexArithmetic: 1129.40 ms 5.13 us +114.92% > SimpleDictManipulation: 1048.55 ms 3.50 us -0.00% > SimpleFloatArithmetic: 746.05 ms 1.36 us -2.75% > SimpleIntFloatArithmetic: 823.35 ms 1.25 us -0.37% > SimpleIntegerArithmetic: 823.40 ms 1.25 us -0.37% > SimpleListManipulation: 1004.70 ms 3.72 us +0.01% > SimpleLongArithmetic: 865.30 ms 5.24 us +100.65% > 
SmallLists: 1657.65 ms 6.50 us +6.63% > SmallTuples: 1143.95 ms 4.77 us +2.90% > SpecialClassAttribute: 949.00 ms 1.58 us -0.22% > SpecialInstanceAttribute: 1353.05 ms 2.26 us -0.73% > StringMappings: 1161.00 ms 9.21 us +7.30% > StringPredicates: 1069.65 ms 3.82 us -5.30% > StringSlicing: 846.30 ms 4.84 us +8.61% > TryExcept: 1590.40 ms 1.06 us -0.49% > TryRaiseExcept: 1104.65 ms 73.64 us +24.46% > TupleSlicing: 681.10 ms 6.49 us -3.13% > UnicodeMappings: 1021.70 ms 56.76 us +0.79% > UnicodePredicates: 1308.45 ms 5.82 us -4.79% > UnicodeProperties: 1148.45 ms 5.74 us +13.67% > UnicodeSlicing: 984.15 ms 5.62 us -0.51% > ------------------------------------------------------------------------ > Average round time: 49104.00 ms +5.70% > > *) measured against: src-obmalloc-O3 (rounds=10, warp=20) > > Words fail me slightly, but maybe some tuning of the memory allocation > of longs & complex numbers would be in order? AFAIR, Vladimir's malloc implementation favours small objects. All number objects (except longs) fall into this category. Perhaps we should think about adding his lib to the core ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jan 31 14:39:01 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 15:39:01 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) References: Message-ID: <3A782385.5B544CD5@lemburg.com> > In the interest of generating some numbers (and filling up my hard > drive), last night I wrote a script to build lots & lots of versions > of python (many of which turned out to be redundant - eg. -O6 didn't > seem to do anything different to -O3 and pybench doesn't work with > 1.5.2), and then run pybench with them. 
FYI, I've just updated the archive to also work under Python 1.5.x: http://www.lemburg.com/python/pybench-0.7.zip -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mwh21@cam.ac.uk Wed Jan 31 15:52:23 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 31 Jan 2001 15:52:23 +0000 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 31 Jan 2001 15:34:19 +0100" References: <3A78226B.2E177EFE@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > > My conclusion? Python 2.1 is slower than Python 2.0, but not by > > enough to care about. > > What compiler did you use and on which platform ? Argh, sorry; I meant to put this in! $ uname -a Linux atrus.jesus.cam.ac.uk 2.2.14-1.1.0 #1 Thu Jan 6 05:12:58 EST 2000 i686 unknown $ gcc --version 2.95.1 It's a Dell Dimension XPS D233 (a 233MHz PII) with a reasonably fast hard drive (two year old 10G IBM 7200rpm thingy) and quite a lot of RAM (192Mb). [snip] > AFAIR, Vladimir's malloc implementation favours small objects. > All number objects (except longs) fall into this category. Well, longs & complex numbers don't do any free list handling (like floats and ints do), so I see two conclusions: 1) Don't add obmalloc to the core, but do simple free list stuff for longs (might be tricky) and complex numbers (this should be a no-brainer). 2) Integrate obmalloc - then maybe we can ditch all of that icky freelist stuff. > Perhaps we should think about adding his lib to the core ?! Strikes me as the better solution. Can anyone try this on Windows? Seeing as Windows malloc reputedly sucks, maybe the differences would be bigger. Cheers, M. -- Our lecture theatre has just crashed. It will currently only silently display an unexplained line-drawing of a large dog accompanied by spookily flickering lights.
-- Dan Sheppard, ucam.chat (from Owen Dunn's summary of the year) From barry@digicool.com Wed Jan 31 16:42:28 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 11:42:28 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: Message-ID: <14968.16500.594486.613828@anthem.wooz.org> >>>>> "KY" == Ka-Ping Yee writes: KY> Could i get a number for this please? Looks like you beat Eric to PEP 234. :) I'll update PEP 0 and let you check in your txt file. I may want to do an editorial pass over it. -Barry From barry@digicool.com Wed Jan 31 16:50:10 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 11:50:10 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> Message-ID: <14968.16962.830739.920771@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> Basic response: I *love* the iter(), sq_iter and __iter__ MZ> parts. I tremble at seeing the rest. Why not add a method to MZ> dictionaries .iteritems() and do | for (k, v) in dict.iteritems(): | pass MZ> (dict.iteritems() would return an iterator to the items) Moshe, I had exactly the same reaction and exactly the same idea. I'm a strong -1 on introducing new syntax for this when new methods can handle it in a much more readable way (IMO). Another idea would be to allow the iterator() method to take an argument: for key in dict.iterator() a.k.a. for key in dict.iterator(KEYS) and also for value in dict.iterator(VALUES) for key, value in dict.iterator(ITEMS) One problem is that the constants KEYS, VALUES, and ITEMS would either have to be defined some place, or you'd just use values like 0, 1, 2, which is less readable perhaps than just having iteratoritems(), iteratorkeys(), and iteratorvalues() methods.
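To make the iterator(KEYS) idea above concrete, here is a rough sketch written as a free function, since dictionaries have no such method. The function iterator() and the constants KEYS, VALUES and ITEMS are hypothetical names taken from the message; the sketch is written for present-day Python.

```python
KEYS, VALUES, ITEMS = range(3)   # hypothetical selector constants


def iterator(mapping, what=KEYS):
    """Hypothetical free-function stand-in for the proposed dict.iterator()."""
    if what == KEYS:
        return iter(mapping.keys())
    if what == VALUES:
        return iter(mapping.values())
    if what == ITEMS:
        return iter(mapping.items())
    raise ValueError("unknown selector: %r" % (what,))


d = {"spam": 14, "eggs": 42}
for key in iterator(d):                  # same as iterator(d, KEYS)
    print(key)
for value in iterator(d, VALUES):
    print(value)
for key, value in iterator(d, ITEMS):
    print(key, value)
```

The sketch also shows the cost mentioned above: the selector constants have to live somewhere and be imported by every caller, which separate methods would avoid.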
Alternative spellings: itemsiter(), keysiter(), valsiter() itemsiterator(), keysiterator(), valuesiterator() iiterator(), kiterator(), viterator() ad-nauseum-ly y'rs, -Barry From skip@mojam.com (Skip Montanaro) Wed Jan 31 16:11:19 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 31 Jan 2001 10:11:19 -0600 (CST) Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> Message-ID: <14968.14631.419491.440774@beluga.mojam.com> What stimulated this thread about making mutable objects (temporarily) immutable? Can someone give me an example where this is actually useful and can't be handled through some existing mechanism? I'm definitely with Fredrik on this one. Sounds like madness to me. I'm just guessing here, but since the most common need for immutable objects is as dictionary keys, I can envision having to test the lock state of a list or dict that someone wants to use as a key everywhere you would normally call has_key: if l.islocked() and d.has_key(l): ... If you want immutable dicts or lists in order to use them as dictionary keys, just serialize them first: survey_says = {"spam": 14, "eggs": 42} sl = marshal.dumps(survey_says) dict[sl] = "spam" Here's another pitfall I can envision. survey_says = {"spam": 14, "eggs": 42} survey_says.lock() dict[survey_says] = "Richard Dawson" survey_says.unlock() At this point can I safely iterate over the keys in the dictionary or not?
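The pitfall above can be imitated in present-day Python without any lock()/unlock() machinery, by using a key whose hash follows its contents. This is a sketch only; the SummedList class and its sum-based hash are assumptions chosen to keep the example small, and the lookup results in the test are those of CPython's dict implementation.

```python
class SummedList(list):
    """Hypothetical list that hashes by content, so the hash follows mutation."""

    def __hash__(self):
        return hash(sum(self))


survey_says = SummedList([14, 42])
table = {survey_says: "Richard Dawson"}   # hash taken while the key is "locked"
survey_says.append(1)                     # mutated after the "unlock"

print(len(table))                         # 1: the entry is still stored
print(survey_says in table)               # this lookup uses the new hash
print(SummedList([14, 42]) in table)      # old hash matches, equality does not
```

Iterating over the keys still works, but the entry can no longer be found by lookup: it was filed under a hash that its key no longer has.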
Skip From skip@mojam.com (Skip Montanaro) Wed Jan 31 15:57:30 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 31 Jan 2001 09:57:30 -0600 (CST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: References: <20010131015832.K962@xs4all.nl> Message-ID: <14968.13802.22823.702114@beluga.mojam.com> Ping> x is sequence-like if it provides __getitem__() but not keys() So why does this barf? >>> [].__getitem__ Traceback (most recent call last): File "", line 1, in ? AttributeError: __getitem__ (Obviously, lists *do* understand __getitem__ at some level. Why isn't it exposed in the method table?) Skip From fredrik@pythonware.com Wed Jan 31 17:19:44 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 31 Jan 2001 18:19:44 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <007301c08baa$02908220$e46940d5@hagrid> barry wrote: > Alternative spellings: > > itemsiter(), keysiter(), valsiter() > itemsiterator(), keysiterator(), valuesiterator() > iiterator(), kiterator(), viterator() shouldn't that be xitems, xkeys, xvalues? From mal@lemburg.com Wed Jan 31 17:21:02 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 18:21:02 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> Message-ID: <3A78497E.8BCF197E@lemburg.com> Skip Montanaro wrote: > > What stimulated this thread about making mutable objects (temporarily) > immutable? 
Can someone give me an example where this is actually useful and > can't be handled through some existing mechanism? I'm definitely with > Fredrik on this one. Sounds like madness to me. This thread is an offspring of the "for something in dict:" thread. The problem we face when iterating over mutable objects is that the underlying objects can change. By marking them read-only we can safely iterate over their contents. Another advantage of being able to mark mutable objects as read-only is that they may become usable as dictionary keys. Optimizations such as self-reorganizing read-only dictionaries would also become possible (e.g. attribute dictionaries which are read-only could calculate a second hash value to make the hashing perfect). > I'm just guessing here, but since the most common need for immutable objects > is as dictionary keys, I can envision having to test the lock state of a list > or dict that someone wants to use as a key everywhere you would normally > call has_key: > > if l.islocked() and d.has_key(l): > ... > > If you want immutable dicts or lists in order to use them as dictionary > keys, just serialize them first: > > survey_says = {"spam": 14, "eggs": 42} > sl = marshal.dumps(survey_says) > dict[sl] = "spam" Sure, and that's what .items(), .keys() and .values() do. The idea was to avoid the extra step of creating lists or tuples first. > Here's another pitfall I can envision. > > survey_says = {"spam": 14, "eggs": 42} > survey_says.lock() > dict[survey_says] = "Richard Dawson" > survey_says.unlock() > > At this point can I safely iterate over the keys in the dictionary or not? Tim already pointed out that we will need two different read-only states: a) temporary b) permanent For dictionaries to become usable as keys in another dictionary, they'd have to be marked permanently read-only.
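As a loose analogy to the two read-only states, and not what the thread proposes, present-day Python offers a read-only view (types.MappingProxyType) and an immutable snapshot (a frozenset of the items). Only the snapshot is hashable, which fits the point that a key would have to be permanently read-only. The variable names are invented for this sketch.

```python
from types import MappingProxyType

survey_says = {"spam": 14, "eggs": 42}

# Read-only view: writes through the view fail, but the underlying dict
# can still change, so the view is not usable as a dictionary key.
view = MappingProxyType(survey_says)
try:
    view["spam"] = 15
except TypeError as exc:
    print("refused:", exc)

# Immutable snapshot: hashable, so it can serve as a key in another dict.
snapshot = frozenset(survey_says.items())
answers = {snapshot: "Richard Dawson"}

survey_says["spam"] = 15          # later changes do not disturb the key
print(answers[frozenset({"spam": 14, "eggs": 42}.items())])   # Richard Dawson
```

The snapshot costs a copy, which is exactly the extra step the proposal wanted to avoid; the sketch only illustrates why hashability needs the permanent state.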
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy@alum.mit.edu Wed Jan 31 04:35:58 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Tue, 30 Jan 2001 23:35:58 -0500 (EST) Subject: [Python-Dev] Re: from ... import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154) In-Reply-To: <3A77FD5C.DE8729DC@lemburg.com> References: <3A77FD5C.DE8729DC@lemburg.com> Message-ID: <14967.38446.700271.122029@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: >> Modified Files: compile.c Log Message: Enforce two illegal import >> statements that were outlawed in the reference manual but not >> checked: Names bound by import statemants may not occur in global >> statements in the same scope. The from ... import * form may only >> occur in a module scope. >> >> I guess these changes could break code, but the reference manual >> warned about them. MAL> Jeremy, your code breaks all uses of "from package import MAL> submodule" inside packages. MAL> Try distutils for example or setup.py.... Quite aside from whether the changes should be preserved, I don't see how "from package import submodule" is affected. I ran setup.py without any problem; I wouldn't have been able to build Python otherwise. I wrote some simple test cases and didn't have any trouble with the form you describe. Can you provide a concrete example? It may be that something other than the changes mentioned above that is causing you problems. Jeremy
From barry@digicool.com Wed Jan 31 17:20:24 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 12:20:24 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> <007301c08baa$02908220$e46940d5@hagrid> Message-ID: <14968.18776.644453.903217@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> shouldn't that be xitems, xkeys, xvalues? Or iitems(), ikeys(), ivalues()? Personally, I don't much care. If we get consensus on the more important issue of going with methods instead of new syntax, I'm sure Guido will pick whatever method names appeal to him most.
-Barry From ping@lfw.org Wed Jan 31 17:14:15 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 31 Jan 2001 09:14:15 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: On Wed, 31 Jan 2001, Skip Montanaro wrote: > Ping> x is sequence-like if it provides __getitem__() but not keys() > > So why does this barf? > > >>> [].__getitem__ I was describing how to tell if instances are sequence-like. Before we get to make that judgement, first we have to look at the C method table. So: x is sequence-like if it has tp_as_sequence; all instances have tp_as_sequence; an instance is sequence-like if it has __getitem__() but not keys() x is mapping-like if it has tp_as_mapping; all instances have tp_as_mapping; an instance is mapping-like if it has both __getitem__() and keys() The "in" operator is implemented this way. x customizes "in" if it has sq_contains; all instances have sq_contains; an instance customizes "in" if it has __contains__() If sq_contains is missing, or if an instance has no __contains__ method, we supply the default behaviour by comparing the operand to each member of x in turn. This default behaviour is implemented twice: once in PyObject_Contains, and once in instance_contains. So i proposed this same structure for sq_iter and __iter__. x customizes "for ... in x" if it has sq_iter; all instances have sq_iter; an instance customizes "in" if it has __iter__() If sq_iter is missing, or if an instance has no __iter__ method, we supply the default behaviour by calling PyObject_GetItem on x and incrementing the index until IndexError. -- ?!ng "The only `intuitive' interface is the nipple. After that, it's all learned." -- Bruce Ediger, on user interfaces From mal@lemburg.com Wed Jan 31 17:57:20 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 18:57:20 +0100 Subject: [Python-Dev] Re: from ... 
import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154)
References: <3A77FD5C.DE8729DC@lemburg.com> <14967.38446.700271.122029@localhost.localdomain>
Message-ID: <3A785200.FFB37CAD@lemburg.com>

Jeremy Hylton wrote:
>
> >>>>> "MAL" == M -A Lemburg writes:
>
> >> Modified Files: compile.c Log Message: Enforce two illegal import
> >> statements that were outlawed in the reference manual but not
> >> checked: Names bound by import statemants may not occur in global
> >> statements in the same scope. The from ... import * form may only
> >> occur in a module scope.
> >>
> >> I guess these changes could break code, but the reference manual
> >> warned about them.
>
> MAL> Jeremy, your code breaks all uses of "from package import
> MAL> submodule" inside packages.
>
> MAL> Try distutils for example or setup.py....
>
> Quite aside from whether the changes should be preserved, I don't see
> how "from package import submodule" is affected. I ran setup.py
> without any problem; I wouldn't have been able to build Python
> otherwise. I wrote some simple test cases and didn't have any trouble
> with the form you describe.

Perhaps you still had old .pyc files in your installation dir ?

> Can you provide a concrete example? It may be that something other
> than the changes mentioned above that is causing you problems.

The distutils code is full of imports like these (and other code
I'm running is too):

distutils/cmd.py:

    def __init__ (self, dist):
        """Create and initialize a new Command object. Most importantly,
        invokes the 'initialize_options()' method, which is the real
        initializer and depends on the actual command being
        instantiated.
        """
        # late import because of mutual dependence between these classes
        from distutils.dist import Distribution

This is the report I got from Benjamin Collar:

> I've gotten the newest CVS tarball, but setup.py is still not
> working; this time with a different error. I will resubmit a bug on
> sourceforge if that's the proper way to handle this. Here's the error:
>
> ./python ./setup.py build
> Traceback (most recent call last):
>   File "./setup.py", line 12, in ?
>     from distutils.core import Extension, setup
>   File "/usr/src/python/dist/src/Lib/distutils/core.py", line 20, in ?
>     from distutils.cmd import Command
>   File "/usr/src/python/dist/src/Lib/distutils/cmd.py", line 15, in ?
>     from distutils import util, dir_util, file_util, archive_util,
> dep_util
> SyntaxError: 'from ... import *' may only occur in a module scope
> make: *** [sharedmods] Error 1

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From skip@mojam.com (Skip Montanaro) Wed Jan 31 18:33:56 2001
From: skip@mojam.com (Skip Montanaro) (Skip Montanaro)
Date: Wed, 31 Jan 2001 12:33:56 -0600 (CST)
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: <3A78497E.8BCF197E@lemburg.com>
References: <3A756FF8.B7185FA2@lemburg.com>
        <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
        <3A75B190.3FD2A883@lemburg.com>
        <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
        <20010129150247.B10191@thyrsus.com>
        <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
        <3A77FE94.E5082136@lemburg.com>
        <14968.14631.419491.440774@beluga.mojam.com>
        <3A78497E.8BCF197E@lemburg.com>
Message-ID: <14968.23188.573257.392841@beluga.mojam.com>

MAL> This thread is an offspring of the "for something in dict:" thread.
MAL> The problem we face when iterating over mutable objects is that the
MAL> underlying objects can change. By marking them read-only we can
MAL> safely iterate over their contents.

I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only.
(And what about Andy Dustman's cool sqldict stuff?)
If you can't extend this concept in a reasonable fashion to cover (most of) the other objects that smell like dictionaries, I think you'll just be adding needless complications for a feature than can't be used where it's really needed. I see no problem asking for the items() of an in-memory dictionary in order to get a predictable list to iterate over, but doing that for disk-based mappings would be next to impossible. So, I'm stuck iterating over something can can change out from under me. In the end, the programmer will still have to handle border cases specially. Besides, even if you *could* lock your disk-based mapping, are you really going to do that in situations where its sharable (that's what databases they are there for, after all)? I suspect you're going to keep the database mutable and work around any resulting problems. If you want to implement "for key in dict:", why not just have the VM call keys() under the covers and use that list? It would be no worse than the situation today where you call "for key in dict.keys():", and with the same caveats. If you're dumb enough to do that for an on-disk mapping object, well, you get what you asked for. Skip From esr@thyrsus.com Wed Jan 31 17:55:00 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 31 Jan 2001 12:55:00 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A78045F.7DB50871@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 01:26:07PM +0100 References: <3A78045F.7DB50871@lemburg.com> Message-ID: <20010131125500.C5151@thyrsus.com> M.-A. Lemburg : > Anyway, names really don't matter much, so how about: > > .mutable([flag]) -> integer > > If called without argument, returns 1/0 depending on whether > the object is mutable or not. When called with a flag argument, > sets the mutable state of the object to the value indicated > by flag and returns the previous flag state. I'll bear this in mind if things progress to the point where a PEP is indicated. -- Eric S. 
Raymond From tim.one@home.com Wed Jan 31 19:49:34 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 31 Jan 2001 14:49:34 -0500 Subject: [Python-Dev] WARNING: Changed build process for zlib on Windows In-Reply-To: Message-ID: [Mark Hammond] > ... > The new process is very simple, but may break some peoples build. > ... > The reason this _should_ not break your build is that your > _probably_ already have a "..\..\zlib-1.1.3" directory installed > in the right place so the header files can be located. Actually, it's certain to break the build for anyone who read PCbuild\readme.txt. But I *want* it to break: changing the directory name is a strong hint that they should download the zlib source code from the same place you did (and which is now explained in PCbuild\readme.txt, and mentioned in the 2.1a2 NEWS file). Other than that, worked first time, and-- even better --the second time too . From esr@thyrsus.com Wed Jan 31 17:53:16 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 31 Jan 2001 12:53:16 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 01:01:24PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> Message-ID: <20010131125316.B5151@thyrsus.com> M.-A. Lemburg : > Eric, could you write a PEP for this ? Not yet. I'm about (at Guido's suggestion) to submit a revised ternary-select proposal. Let's process that first. -- Eric S. Raymond "Today, we need a nation of Minutemen, citizens who are not only prepared to take arms, but citizens who regard the preservation of freedom as the basic purpose of their daily life and who are willing to consciously work and sacrifice for that freedom." -- John F. 
Kennedy From tim.one@home.com Wed Jan 31 20:28:00 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 31 Jan 2001 15:28:00 -0500 Subject: [Python-Dev] weak refs and jython In-Reply-To: <200101311234.NAA24584@core.inf.ethz.ch> Message-ID: [Samuele Pedroni] > I have read weak ref PEP, maybe too late. > I don't know if portability of code using weak refs between > python and jython was a goal or could be one, CPython generally doesn't want to do anything impossible for Jython, if it can help it. > and up to which extent actual impl. will correspond to the PEP. Don't care about that. > ... > AFAIK using java weak refs (which I think is a natural choice) I > see no way (at least no worth-the-effort way) to implement this > in jython. Java weak refs cannot be resurrected. Thanks for bringing this up! Fred is looking into it. From fdrake@acm.org Wed Jan 31 20:25:51 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 31 Jan 2001 15:25:51 -0500 (EST) Subject: [Python-Dev] weak refs and jython In-Reply-To: <200101311234.NAA24584@core.inf.ethz.ch> References: <200101311234.NAA24584@core.inf.ethz.ch> Message-ID: <14968.29903.183882.41485@cj42289-a.reston1.va.home.com> Samuele Pedroni writes: > AFAIK using java weak refs (which I think is a natural choice) I see > no way (at least no worth-the-effort way) to implement this in jython. > Java weak refs cannot be resurrected. This is certainly annoying. How about this: the callback receives the weak reference object or proxy which it was registered on as a parameter. Since the reference has already been cleared, there's no way to get the object back, so we don't need to get it from Java either. Would that be workable? (I'm adjusting my patch now.) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations

From tim.one@home.com Wed Jan 31 20:56:52 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 31 Jan 2001 15:56:52 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <14968.13802.22823.702114@beluga.mojam.com>
Message-ID:

[Ping]
> x is sequence-like if it provides __getitem__() but not keys()

[Skip]
> So why does this barf?
>
> >>> [].__getitem__
> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: __getitem__
>
> (Obviously, lists *do* understand __getitem__ at some level. Why
> isn't it exposed in the method table?)

The old type/class split: list is a type, and types spell their "method
tables" in ways that have little in common with how classes do it. See
PyObject_GetItem in abstract.c for gory details (e.g., dicts spell their
version of getitem via ->tp_as_mapping->mp_subscript(...), while lists
spell it ->tp_as_sequence->sq_item(...); neither has any binding to the
attr "__getitem__"; instance objects fill in both the tp_as_mapping and
tp_as_sequence slots, then map both the mp_subscript and sq_item slots to
classobject.c's instance_item, which in turn looks up "__getitem__").

bet-you're-sorry-you-asked-ly y'rs - tim

From tim.one@home.com Wed Jan 31 21:24:53 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 31 Jan 2001 16:24:53 -0500
Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0)
In-Reply-To: <3A78226B.2E177EFE@lemburg.com>
Message-ID:

[M.-A. Lemburg]
> AFAIR, Vladimir's malloc implementation favours small objects.

It favors the memory alloc/dealloc patterns Vlad recorded while running an
instrumented Python. Which is mostly good news. The flip side is that it
favors the specific programs he ran, and who knows whether those are
"typical". OTOH, vendor mallocs favor the programs *they* ran, which
probably didn't include Python at all .

> ...
> Perhaps we should think about adding his lib to the core ?!

It's patch 101104 on SF.
I pushed Vlad to push this for 2.0, but he wisely decided it was too big a change at the time. It's certainly too much a change to slam into 2.1 at this late stage too. There are many reasons to want this (e.g., list.append() calls realloc every time today, because, despite over-allocating, it has no idea how much storage *has* already been allocated; any malloc has to know this info under the covers, but there's no way for us to know that too unless we add another N bytes to every list object to record it, or use our own malloc which *can* tell us that info). list.append()-behavior-varies-wildly-across-platforms-today- when-the-list-gets-large-because-of-that-ly y'rs - tim From tim.one@home.com Wed Jan 31 21:49:31 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 31 Jan 2001 16:49:31 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A78002F.DC8F0582@lemburg.com> Message-ID: [Tim] >> Seems an unrelated topic: would "iterators for dictionaries" solve the >> supposed problem with iteration order? [MAL] > No, but it would solve the problem in a more elegant and > generalized way. I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the problem], but it would solve the problem ...". Can only assume we're switching topics within single sentences now . > Besides, it also allows writing code which is thread safe, since > the iterator can take special actions to assure that the dictionary > doesn't change during the iteration phase (see the other thread > about "making mutable objects readonly"). Sorry, but immutability has nothing to do with thread safety (the latter has to do with "doing a right thing" in the presence of multiple threads, to keep data structures internally consistent; raising an exception is never "a right thing" unless the user is violating the advertised semantics, and if mutation during iteration is such a violation, the presence or absence of multiple threads has nothing to do with that). 
IOW, perhaps, a critical section is an area of non-exceptional serialization, not a landmine that makes other threads *blow up* if they touch it. > ... > I don't remember the figures, but these micor optimizations That's plural, but I thought you were talking specifically about the mutable counter object. I don't know which, but the two statements don't jibe. > do speedup loops by a noticable amount. Just compare the performance > of stock Python 1.5 against my patched version. No time now, but after 2.1 is out, sure, wrt it (not 1.5). From tim.one@home.com Wed Jan 31 22:10:12 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 31 Jan 2001 17:10:12 -0500 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: Message-ID: [Michael Hudson] > ... > Can anyone try this on Windows? Seeing as windows malloc > reputedly sucks, maybe the differences would be bigger. No time now (pymalloc is a non-starter for 2.1). Was tried in the past on Windows. Helped significantly. Unclear how much was simply due to exploiting the global interpreter lock, though. "Windows" is also a multiheaded beast (e.g., NT has very different memory performance characteristics than 95). From tim.one@home.com Wed Jan 31 22:43:59 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 31 Jan 2001 17:43:59 -0500 Subject: generators (was RE: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: <20010130092454.D18319@glacier.fnational.com> Message-ID: [Neil Schemenauer] > What's the chances of getting generators into 2.2? Unknown. IMO it has more to do with generalizing the iteration protocol than with generators per se (a generator object that doesn't play nice with "for" is unpleasant to use; otoh, a generator object that can't be used divorced from "for" is frustrating too (like when comparing the fringes of two trees efficiently, which requires interleaving two distinct traversals, each naturally recursive on its own)). > The implementation should not be hard. 
Didn't Steven Majewski have > something years ago? Yes, but Guido also sketched out a nearly complete implementation within the last year or so. > Why do we always get sidetracked on trying to figure out how to > do coroutines and continuations? Sorry, I've been failing to find a good answer to that question for a decade <0.4 wink>. I should note, though, that Guido's current notion of "generator" is stronger than Icon/CLU/Sather's (which are "strictly stack-like"), and requires machinery more elaborate than StevenM (or Guido) sketched before. > Generators would add real power to the language and are simple > enough that most users could benefit from them. Also, it should be > possible to design an interface that does not preclude the > addition of coroutines or continuations later. Agreed. > I'm not volunteering to champion the cause just yet. I just want > to know if there is some issue I'm missing. microthreads have an enthusiastic and possibly growing audience. That gets into (C) stacklessness, though, as do coroutines. I'm afraid that once you go beyond "simple" (Icon) generators, a whole world of other stuff gets pulled in. The key trick to implementing simple generators in current Python is simply to decline decrementing the frame's refcount upon a "suspend" (of course the full details are more involved than *just* that, but they mostly follow *from* just that). everything-is-the-enemy-of-something-ly y'rs - tim From skip@mojam.com (Skip Montanaro) Wed Jan 31 22:27:38 2001 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 31 Jan 2001 16:27:38 -0600 (CST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: References: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: <14968.37210.886842.820413@beluga.mojam.com> >>>>> "Tim" == Tim Peters writes: >> (Obviously, lists *do* understand __getitem__ at some level. Why >> isn't it exposed in the method table?) 
Tim> The old type/class split: list is a type, and types spell their
Tim> "method tables" in ways that have little in common with how classes
Tim> do it.

The problem that rolls around in the back of my mind from time to time is
that since Python doesn't currently support interfaces, checking for
specific methods seems to be the only reasonable way to determine whether
an object does what you want or not.

What would break if we decided to simply add __getitem__ (and other
sequence methods) to the list object's method table? Would they foul
something up, or would they simply sit around quietly waiting for hasattr
to notice them?

Skip

From pedroni@inf.ethz.ch Wed Jan 31 22:29:37 2001
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Wed, 31 Jan 2001 23:29:37 +0100
Subject: [Python-Dev] weak refs and jython
References: <200101311234.NAA24584@core.inf.ethz.ch> <14968.29903.183882.41485@cj42289-a.reston1.va.home.com>
Message-ID: <001f01c08bd5$4c9c9900$7c5821c0@newmexico>

Hi.

[Fred L. Drake, Jr.]
> > Java weak refs cannot be resurrected.
>
> This is certainly annoying.
> How about this: the callback receives the weak reference object or
> proxy which it was registered on as a parameter. Since the reference
> has already been cleared, there's no way to get the object back, so we
> don't need to get it from Java either.
> Would that be workable? (I'm adjusting my patch now.)

Yes, it is workable: clearly we can implement weak refs only under java2,
but this is not (really) an issue. We can register the refs in a java
reference queue, and poll it lazily or through a low-priority thread in
order to invoke the callbacks.

-- Some remarks

I have used java weak/soft refs to implement some of the internal tables
of jython in order to avoid memory leaks, at least under java2.

I imagine that the idea behind callbacks plus resurrection was to enable
the construction of sophisticated caches.
My intuition is that these features are not present under java because
they would interfere too much with gc and carry a performance penalty. On
the other hand java offers reference queues and soft references; the
latter cover the common case of caches that should be cleared when there
is little memory left. (Never tried them seriously, so I don't know if the
actual impl is fair, or will just wait too long before starting to discard
things => behavior like primitive gc.)

The main difference I see between the callback and queue approaches is
that with queues it is left to the user when to do the actual cleanup of
his tables/caches, and handling queues internally has a "low" overhead.
With callbacks, what happens really depends on the collection
times/patterns, and the overhead is related to call overhead and to how
non-trivial the code the user puts in the callbacks is. Clearly general
performance will not be easily predictable. (From a theoretical viewpoint
one can more or less simulate queues with callbacks and the other way
around.)

Resurrection makes little sense with queues, but I can easily see that
the lack of both resurrection and soft refs limits what can be done with
weak-like refs.

Last thing: one of the things that is really missing in java's ref
features is that one cannot put conditions of the form "as long as A is
not collected, B should not be collected either". Clearly I'm referring to
situations where one cannot modify the class of A in order to add a field,
which is quite typical in java. This should not be a problem with python
and its open/dynamic way-of-life.

regards, Samuele Pedroni.

From mal@lemburg.com Wed Jan 31 19:03:12 2001
From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 31 Jan 2001 20:03:12 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> <3A78497E.8BCF197E@lemburg.com> <14968.23188.573257.392841@beluga.mojam.com> Message-ID: <3A786170.CD65B8A4@lemburg.com> Skip Montanaro wrote: > > MAL> This thread is an offspring of the "for something in dict:" thread. > MAL> The problem we face when iterating over mutable objects is that the > MAL> underlying objects can change. By marking them read-only we can > MAL> safely iterate over their contents. > > I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only. > (And what about Andy Dustman's cool sqldict stuff?) If you can't extend > this concept in a reasonable fashion to cover (most of) the other objects > that smell like dictionaries, I think you'll just be adding needless > complications for a feature than can't be used where it's really needed. We are currently only talking about Python dictionaries here, even though other objects could also benefit from this. > I see no problem asking for the items() of an in-memory dictionary in order > to get a predictable list to iterate over, but doing that for disk-based > mappings would be next to impossible. So, I'm stuck iterating over > something can can change out from under me. In the end, the programmer will > still have to handle border cases specially. Besides, even if you *could* > lock your disk-based mapping, are you really going to do that in situations > where its sharable (that's what databases they are there for, after all)? I > suspect you're going to keep the database mutable and work around any > resulting problems. 
> > If you want to implement "for key in dict:", why not just have the VM call > keys() under the covers and use that list? It would be no worse than the > situation today where you call "for key in dict.keys():", and with the same > caveats. If you're dumb enough to do that for an on-disk mapping object, > well, you get what you asked for. That's why iterators do a much better task here. In DB design these are usually called cursors which the allow moving inside large result sets. But this really is a different topic... Readonlyness could be put to some good use in optimizing data structure for which you know that they won't change anymore. Temporary readonlyness has the nice sideeffect of allowing low-level lock implementations and makes writing thread safe code easier to handle, because you can make assertions w/r to the immutability of an object during a certain period of time explicit in your code. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jan 31 20:36:54 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 21:36:54 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A78045F.7DB50871@lemburg.com> <20010131125500.C5151@thyrsus.com> Message-ID: <3A787766.35453597@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > Anyway, names really don't matter much, so how about: > > > > .mutable([flag]) -> integer > > > > If called without argument, returns 1/0 depending on whether > > the object is mutable or not. When called with a flag argument, > > sets the mutable state of the object to the value indicated > > by flag and returns the previous flag state. > > I'll bear this in mind if things progress to the point where a PEP is > indicated. 
Great :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From greg@cosc.canterbury.ac.nz Wed Jan 31 23:21:04 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Feb 2001 12:21:04 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <200101312321.MAA03263@s454.cosc.canterbury.ac.nz> barry@digicool.com (Barry A. Warsaw): > for key in dict.iterator(KEYS) > for value in dict.iterator(VALUES) > for key, value in dict.iterator(ITEMS) Yuck. I don't like any of this "for x in y.iterator_something()" stuff. The things you're after aren't "in" the iterator, they're "in" the dict. I don't want to know that there are iterators involved. We seem to be coming up with more and more convoluted ways to say things that should be very straightforward. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@digicool.com Wed Jan 31 16:23:37 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 11:23:37 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Wed, 31 Jan 2001 13:35:36 GMT." 
        <3a780eda.16144995@smtp.worldonline.dk>
References: <20010130075515.X962@xs4all.nl>
        <200101301506.KAA25763@cj20424-a.reston1.va.home.com>
        <20010130165204.I962@xs4all.nl>
        <200101302042.PAA29301@cj20424-a.reston1.va.home.com>
        <3a7809c0.14839067@smtp.worldonline.dk>
        <20010131135914.N962@xs4all.nl>
        <3a780eda.16144995@smtp.worldonline.dk>
Message-ID: <200101311623.LAA01774@cj20424-a.reston1.va.home.com>

[Finn]
> >> Using global on an import name is currently ignored by Jython because
> >> the name assignment is done by the runtime, not the compiler.

[Thomas]
> >So it's impossible to do, in Jython, something like:
> >
> >def fillme():
> >    global me
> >    import me
> >
> >but it is possible to do:
> >
> >def fillme():
> >    global me
> >    import me as _me
> >    me = _me
> >
> >?

[Finn again]
> Yes, only the second example will make a global variable.
>
> > I have to say I don't like that; we're always claiming 'import' (and
> >'def' and 'class' for that matter) are 'just another way of writing
> >assignment'. All these special cases break that.
>
> I don't like it either, I was only reported what jython currently does.
> The current design used by Jython does lend itself directly towards a
> solution, but I don't see anything that makes it impossible to solve.

Tentatively, I'd say that this should be documented as a Jython
difference and Jython should strive to fix this. So I see no good
reason to rule it out in CPython.

That doesn't mean I like Thomas's example! It should probably be
redesigned along the lines of

    def fillme():
        import me
        return me

    me = fillme()

to avoid needing side effects on globals.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Wed Jan 31 16:26:11 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 31 Jan 2001 11:26:11 -0500
Subject: [Python-Dev] The 2nd Korea Python Users Seminar
Message-ID: <200101311626.LAA01799@cj20424-a.reston1.va.home.com>

Wow...! Way to go, Christian!
--Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 31 Jan 2001 22:46:06 +0900 From: "Changjune Kim" To: Subject: The 2nd Korea Python Users Seminar Dear Mr. Guido van Rossum, First of all, I can't thank you more for your great contribution to the presence of Python. It is not a mere computer programming language but a whole culture, I think. I am proud to tell you that we are having the 2nd Korea Python Users Seminar which is wide open to the public. There are already more than 400 people who registered ahead, and we expect a few more at the site. The seminar will be held in Seoul, South Korea on Feb 2. With the effort of Korea Python Users Group, there has been quite a boom or phenomenon for Python among developers in Korea. Several magazines are _competitively_ carrying regular articles about Python -- I'm one of the authors -- and there was an article even on a _normal_ newspaper, one of the major four big newspapers in Korea, which described the sprouting of Python in Korea and pointed its extreme easiness to learn. (moreover, it's the year of the snake in the 12 zodiac animals) The seminar is mainly about: Python 2.0, intro for newbies, Python coding style, ZOPE, internationalization of Zope for Korean, GUIs such as wxPython, PyQt, Internet programming in Python, Python with UML, Python C/API, XML with Python, and Stackless Python. Christian Tismer is coming for SPC presentation with me, and Hostway CEO Lucas Roh will give a talk about how they are using Python, and one of the Python evangelists, Brian Lee, CTO of Linuxkorea will give a brief intro to Python and Python C/API. I'm so excited and happy to tell you this great news. If there is any message you want to give to Korea Python Users Group and the audience, it'd be great -- I could translate it and post it at the site for all the audience. Thank you again for your wonderful snake. Best regards, June from Korea. 
------- End of Forwarded Message

From tim.one@home.com Wed Jan 31 23:25:54 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 31 Jan 2001 18:25:54 -0500
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: <200101301500.KAA25733@cj20424-a.reston1.va.home.com>
Message-ID:

[Ping]
> Is a frozen list hashable?

[Guido]
> Yes -- that's what started this thread (using dicts as dict keys,
> actually).

Except this doesn't actually work unless list.freeze() recursively ensures
that all elements in the list are frozen too:

>>> hash((1, 2))
219750523
>>> hash((1, [2]))
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: unhashable type
>>>

That bothered me in Eric's original suggestion: unless x.freeze() does a
traversal of all objects reachable from x, it doesn't actually make x safe
against modification (except at the very topmost level). But doing such a
traversal isn't what *everyone* would want either (as with "const" in C, I
expect the primary benefit would be the chance to spend countless hours
worming around it in both directions ).

[Skip]
> If you want immutable dicts or lists in order to use them as
> dictionary keys, just serialize them first:
>
> survey_says = {"spam": 14, "eggs": 42}
> sl = marshal.dumps(survey_says)
> dict[sl] = "spam"

marshal.dumps(dict) isn't canonical, though. That is, it may well be that
d1 == d2 but dumps(d1) != dumps(d2). Even materializing dict.values(), then
sorting it, then marshaling *that* isn't enough; e.g., consider {1: 1} and
{1: 1L}. The latter example applies to marshaling lists too.
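
The two points in the message above (a shallow freeze does not make a
container hashable, and marshal.dumps() is not a canonical key) can be
sketched roughly as follows. This is only an illustration in present-day
Python, not the 2.1-era interpreter under discussion: deep_freeze is a
hypothetical helper, not a proposed or existing API; frozenset did not
exist in 2001; and 1.0 stands in for 1L, since modern Python has no
separate long type.

```python
import marshal


def deep_freeze(obj):
    """Return a hashable copy of obj, freezing every reachable container.

    Hypothetical helper for illustration only; it handles just dicts,
    lists, tuples and sets.
    """
    if isinstance(obj, dict):
        return frozenset((key, deep_freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(deep_freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(deep_freeze(item) for item in obj)
    return obj  # assumed to be hashable already (int, str, ...)


# Freezing only the top level is not enough: a tuple that holds a list
# is still unhashable.
try:
    hash((1, [2]))
    shallow_is_hashable = True
except TypeError:
    shallow_is_hashable = False

# A recursive freeze yields something that can be used as a dict key.
table = {deep_freeze({"spam": 14, "eggs": [42]}): "spam"}

# Equal dicts need not serialize to equal bytes, so marshal.dumps()
# does not give a canonical key.
d1, d2 = {1: 1}, {1: 1.0}
same_value = d1 == d2
same_bytes = marshal.dumps(d1) == marshal.dumps(d2)
```

Note that the recursive version copies everything it visits, which is
exactly the traversal cost the message says not everyone would want.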
From greg@cosc.canterbury.ac.nz Wed Jan 31 23:34:50 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Feb 2001 12:34:50 +1300 (NZDT) Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <14968.14631.419491.440774@beluga.mojam.com> Message-ID: <200101312334.MAA03267@s454.cosc.canterbury.ac.nz> Skip Montanaro : > Can someone give me an example where this is actually useful and > can't be handled through some existing mechanism? I can envisage cases where you want to build a data structure incrementally, and then treat it as immutable so you can use it as a dict key, etc. There's currently no way to do that to a list without copying it. So, it could be handy to have a way of turning a list into a tuple in-place. It would have to be a one-way transformation, otherwise you could start using it as a dict key, make it mutable again, and cause havoc. Suggested implementation: When you allocate the space for the values of a list, leave enough room for the PyObject_HEAD of a tuple at the beginning. Then you can turn that memory block into a real tuple later, and flag the original list object as immutable so you can't change it later via that route. Hmmm, would waste a bit of space for each list object. Maybe this should be a special list-about-to-become-tuple type. (Tist? Luple?) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Wed Jan 31 23:36:48 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 31 Jan 2001 18:36:48 -0500 Subject: [Python-Dev] RE: [Patch #103203] PEP 205: weak references implementation In-Reply-To: Message-ID: > Patch #103203 has been updated. 
> > Project: python > Category: core (C code) > Status: Open > Submitted by: fdrake > Assigned to : tim_one > Summary: PEP 205: weak references implementation Fred, just noticed the new "assigned to". If you don't think it's a disaster(*), check it in! That will force more eyeballs on it quickly, and the quicker the better. I'm simply not going to do a decent review quickly on something this large starting cold. More urgently, I've been working long hours every day for several weeks, and need a break so I don't screw up last-second crises tomorrow. has-12-hours-of-taped-professional-wrestling-to-catch-up-on-ly y'rs - tim (*) otoh, if you do think it's a disaster, withdraw it for 2.1. From moshez@zadka.site.co.il Wed Jan 31 20:32:45 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Wed, 31 Jan 2001 22:32:45 +0200 (IST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <007301c08baa$02908220$e46940d5@hagrid> References: <007301c08baa$02908220$e46940d5@hagrid>, <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <20010131203245.E813BA83E@darjeeling.zadka.site.co.il> [Barry] > itemsiter(), keysiter(), valsiter() > itemsiterator(), keysiterator(), valuesiterator() > iiterator(), kiterator(), viterator() [/F] > shouldn't that be xitems, xkeys, xvalues? I'm so hoping I missed a there somewhere. Please, no more of the dreaded 'x'. thinking-of-ripping-x-from-my-keyboard-ly y'rs, Z. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From greg@cosc.canterbury.ac.nz Wed Jan 31 23:54:45 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Feb 2001 12:54:45 +1300 (NZDT) Subject: [Python-Dev] Generator protocol? 
(Re: Sets: elt in dict, lst.include) In-Reply-To: <20010131063007.536ACA83E@darjeeling.zadka.site.co.il> Message-ID: <200101312354.MAA03272@s454.cosc.canterbury.ac.nz> Moshe Zadka : > Tim's "try to use that to write something that > will return the nodes of a binary tree" still haunts me. Instead of an iterator protocol, how about a generator protocol? Now that we're getting nested scopes, it should be possible to arrange it so that for x in thing: ...stuff... gets compiled as something like def _body(x): ...stuff... thing.__generate__(_body) (Actually it would be more complicated than that - for backward compatibility you'd want a new bytecode that would look for a __generator__ attribute and emulate the old iteration protocol otherwise.) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Jan 31 23:57:39 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Feb 2001 12:57:39 +1300 (NZDT) Subject: [Python-Dev] codecity.com In-Reply-To: <200101310521.AAA31653@cj20424-a.reston1.va.home.com> Message-ID: <200101312357.MAA03275@s454.cosc.canterbury.ac.nz> > Should I spread this word, or is this a joke? I'm not sure what answering trivia questions has to do with the stated intention of "teaching jr. programmers how to write code". Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Jan 31 23:59:33 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Feb 2001 12:59:33 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310049.TAA30197@cj20424-a.reston1.va.home.com> Message-ID: <200101312359.MAA03278@s454.cosc.canterbury.ac.nz> Guido van Rossum : > But it *is* true that coroutines are a very attractice piece of land > "just nextdoor". Unfortunately there's a big high fence in between topped with barbed wire and patrolled by vicious guard dogs. :-( Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From thomas@xs4all.net Wed Jan 31 21:00:33 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 22:00:33 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: <3A78226B.2E177EFE@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 03:34:19PM +0100 References: <3A78226B.2E177EFE@lemburg.com> Message-ID: <20010131220033.O962@xs4all.nl> On Wed, Jan 31, 2001 at 03:34:19PM +0100, M.-A. Lemburg wrote: > I have made similar experience with -On with n>3 compared to -O2 > using pgcc (gcc optimized for PC processors). BTW, the Linux > kernel uses "-Wall -Wstrict-prototypes -O3 -fomit-frame-pointer" > as CFLAGS -- perhaps Python should too on Linux ?! Maybe, but the Linux kernel can be quite specific in what version of gcc you need, and knows in advance on what platform you are using it :) The stability and actual speedup of gcc's optimization options can and does vary across platforms. In the above example, -Wall and -Wstrict-prototypes are just warnings, and -O3 is the same as "-O2 -finline-functions". 
As for -fomit-frame-pointer.... > Does anybody know about the effect of -fomit-frame-pointer ? > Would it cause problems or produce code which is not compatible > with code compiled without this flag ? The effect of -fomit-frame-pointer is that the compilation of frame-pointer handling code is avoided. It doesn't have any effect on compatibility, since it doesn't matter that other parts/functions/libraries do have such code, but it does make debugging impossible (on most machines, in any case.) From GCC's info docs: `-fomit-frame-pointer' Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. *It also makes debugging impossible on some machines.* On some machines, such as the Vax, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro `FRAME_POINTER_REQUIRED' controls whether a target machine supports this flag. *Note Registers::. Obviously, for the Linux kernel this is a very good thing, you don't debug the Linux kernel like a normal program anyway (contrary to some other UNIX kernels, I might add.) I believe -g turns off -fomit-frame-pointer itself, but the docs for -g or -fomit-frame-pointer don't mention it. One other thing I noted in the gcc docs is that gcc doesn't do loop unrolling even with -O3, though I thought it would at -O2. You need to add -funroll-loops to enable loop unrolling, and that might squeeze out some more performance. This only works for loops with a fixed repetition, though, so I'm not sure if it matters. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
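[Editorial sketch, not part of the original message] When comparing builds as in this thread, it can help to see which compiler flags a given interpreter was actually built with. A minimal way to do that in a current Python, assuming the standard sysconfig module, is shown below; build_flags() and has_flag() are hypothetical helper names, and on platforms that record no CFLAGS (Windows, for example) the value is simply None.

```python
import sysconfig


def build_flags(names=("CC", "CFLAGS", "OPT")):
    """Return the recorded build-time value of each config variable (None if unknown)."""
    return {name: sysconfig.get_config_var(name) for name in names}


def has_flag(flags, flag):
    """True if `flag` appears as a whole word in a flags string (None counts as empty)."""
    return flag in (flags or "").split()


flags = build_flags()
# Whether this interpreter's recorded CFLAGS mention the flag discussed above.
omits_fp = has_flag(flags["CFLAGS"], "-fomit-frame-pointer")
```

This only reports what the build recorded; it says nothing about whether a flag such as -fomit-frame-pointer or -funroll-loops would make a particular benchmark faster.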
From thomas@xs4all.net Wed Jan 31 19:14:58 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 20:14:58 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <14968.16962.830739.920771@anthem.wooz.org>; from barry@digicool.com on Wed, Jan 31, 2001 at 11:50:10AM -0500 References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <20010131201457.I922@xs4all.nl> [ Trimming CC: line ] On Wed, Jan 31, 2001 at 11:50:10AM -0500, Barry A. Warsaw wrote: > Moshe, I had exactly the same reaction and exactly the same idea. I'm > a strong -1 on introducing new syntax for this when new methods can > handle it in a much more readable way (IMO). Same here. I *might* like it if iterators were given a format string (or tuple object, or whatever) so they knew what the iterating code expected (so something like this: for x,y,z in obj would translate into iterator(obj)("(x,y,z)") or maybe just iterator(obj)((None,None,None)) or maybe even just iterator(obj)(3) # that is, number of elements or so) but I suspect it might be too cute (and obfuscated) for Python, especially if it was put to use to distingish between 'for x:y in obj' and 'for x,y in obj'. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From sjoerd@oratrix.nl Wed Jan 31 20:05:06 2001 From: sjoerd@oratrix.nl (Sjoerd Mullender) Date: Wed, 31 Jan 2001 21:05:06 +0100 Subject: [Python-Dev] python setup.py fails with illegal import (+ fix) Message-ID: <20010131200507.A106931E1AD@bireme.oratrix.nl> With the current CVS version, running python setup.py as part of the build process fails with a syntax error: Traceback (most recent call last): File "../setup.py", line 12, in ? from distutils.core import Extension, setup File "/usr/people/sjoerd/src/python/Lib/distutils/core.py", line 20, in ? 
from distutils.cmd import Command File "/usr/people/sjoerd/src/python/Lib/distutils/cmd.py", line 15, in ? from distutils import util, dir_util, file_util, archive_util, dep_util SyntaxError: 'from ... import *' may only occur in a module scope The fix is to change the from ... import * that the compiler complains about: Index: file_util.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/distutils/file_util.py,v retrieving revision 1.7 diff -u -c -r1.7 file_util.py *** file_util.py 2000/09/30 17:29:35 1.7 --- file_util.py 2001/01/31 20:01:56 *************** *** 106,112 **** # changing it (ie. it's not already a hard/soft link to src OR # (not update) and (src newer than dst). ! from stat import * from distutils.dep_util import newer if not os.path.isfile(src): --- 106,112 ---- # changing it (ie. it's not already a hard/soft link to src OR # (not update) and (src newer than dst). ! from stat import ST_ATIME, ST_MTIME, ST_MODE, S_IMODE from distutils.dep_util import newer if not os.path.isfile(src): I didn't check this in because distutils is Greg Ward's baby. -- Sjoerd Mullender From mal@lemburg.com Wed Jan 31 22:24:43 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 23:24:43 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) References: Message-ID: <3A7890AB.69B893F9@lemburg.com> Tim Peters wrote: > > [Michael Hudson] > > ... > > Can anyone try this on Windows? Seeing as windows malloc > > reputedly sucks, maybe the differences would be bigger. > > No time now (pymalloc is a non-starter for 2.1). Was tried in the past on > Windows. Helped significantly. Unclear how much was simply due to > exploiting the global interpreter lock, though. "Windows" is also a > multiheaded beast (e.g., NT has very different memory performance > characteristics than 95). We're still in alpha, no ? 
Adding pymalloc is not much of a deal since it fits nicely with the Python malloc macros and giving the package a nice spin by putting it into a Python alpha release would sure create more confidence in this nice piece of work. We can always take it out again before going into the beta phase. Or do we have a 2.1 feature freeze already ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jan 31 22:15:50 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 23:15:50 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: Message-ID: <3A788E96.AB823FAE@lemburg.com> Tim Peters wrote: > > [Tim] > >> Seems an unrelated topic: would "iterators for dictionaries" solve the > >> supposed problem with iteration order? > > [MAL] > > No, but it would solve the problem in a more elegant and > > generalized way. > > I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the > problem], but it would solve the problem ...". Can only assume we're > switching topics within single sentences now . Sorry, not my brightest day today... what I wanted to say is that iterators would solve the problem of defining "something" in "for something in dict" nicely. Since iterators can define the order in which a data structure is traversed, this would also do away with the second (supposed) problem. > > Besides, it also allows writing code which is thread safe, since > > the iterator can take special actions to assure that the dictionary > > doesn't change during the iteration phase (see the other thread > > about "making mutable objects readonly"). 
> > Sorry, but immutability has nothing to do with thread safety (the latter has > to do with "doing a right thing" in the presence of multiple threads, to > keep data structures internally consistent; raising an exception is never "a > right thing" unless the user is violating the advertised semantics, and if > mutation during iteration is such a violation, the presence or absence of > multiple threads has nothing to do with that). IOW, perhaps, a critical > section is an area of non-exceptional serialization, not a landmine that > makes other threads *blow up* if they touch it. Who said that an exception is raised ? The method I posted on the mutability thread allows querying the current state just like you would query the availability of a resource. > > ... > > I don't remember the figures, but these micor optimizations > > That's plural, but I thought you were talking specifically about the mutable > counter object. I don't know which, but the two statements don't jibe. The counter object patch is a micro-optimization and as such will only give you a gain of a few percent. What makes the difference is the sum of these micro optimizations. Here's the patch for Python 1.5 which includes the optimizations: http://www.lemburg.com/python/mxPython-1.5.patch.gz > > do speedup loops by a noticable amount. Just compare the performance > > of stock Python 1.5 against my patched version. > > No time now, but after 2.1 is out, sure, wrt it (not 1.5). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 1 01:13:12 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 31 Dec 2000 19:13:12 -0500 Subject: [Python-Dev] Re: Most everything is busted In-Reply-To: <14926.34447.60988.553140@anthem.concentric.net> Message-ID: [Barry A. 
Warsaw] > There's a stupid, stupid bug in Mailman 2.0, which I've just fixed > and (hopefully) unjammed things on the Mailman end[1]. We're still > probably subject to the Postfix delays unfortunately; I think those > are DNS related, and I've gotten a few other reports of DNS oddities, > which I've forwarded off to the DC sysadmins. I don't think that > particular problem will be fixed until after the New Year. > > relax-and-enjoy-the-quiet-ly y'rs, I would have, except you appear to have ruined it: hundreds of msgs disgorged overnight and into the afternoon. And echoes of email to c.l.py now routinely come back in minutes instead of days. Overall, ya, I liked it better when it was broken -- jerk . typical-user-ly y'rs - tim From tim.one at home.com Mon Jan 1 02:31:18 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 31 Dec 2000 20:31:18 -0500 Subject: [Python-Dev] Copyrights and licensing (was ... something irrelevant) In-Reply-To: <200012291652.RAA20251@pandora.informatik.hu-berlin.de> Message-ID: [Martin von Loewis] > I'd like to get an "official" clarification on this question. Is it > the case that patches containing copyright notices are only accepted > if they are accompanied with license information? It's nigh unto impossible to get Guido to pay attention to these kinds of issues until after it's too late -- guess who's still trying to get an FSF approved license for Python 1.6 . What I intend to push for is that nothing be accepted except under the understanding that copyright is assigned to the Python Software Foundation; but, since that doesn't exist yet, we're in limbo. > I agree that the changes are minor, I also believe that I hold the > copyright to the changes whether I attach a notice or not (at least > according to our local copyright law). Under U.S. law too. The difference is that, without an explicit copyright notice, it's a lot easier to get lawyers to ignore that reality <0.3 wink>. 
When the PSF does come into being, the lawyers will doubtless make us hassle everyone with an explicit copyright notice into signing reams of paperwork. It's a drain on time and money for all concerned, IMO, with no real payback. > What concerns me that without such a notice, gencodec.py looks as if > CNRI holds the copyright to it. I'm not willing to assign the > copyright of my changes to CNRI, and I'd like to avoid the impression > of doing so. Understood, and with sympathy. Since the status of JPython/Jython is still muddy, I urged Finn Bock to put his own copyright notice on his Jython work for exactly the same reason (i.e., to prevent CNRI claiming it later). Seems to me, though, that it may simplify life down the road if, whenever an author felt a similar need to assert copyright explicitly, they list Guido as the copyright holder. He's not going to screw Python! And it's inevitable that all Python copyrights will eventually be owned by him and/or the PSF anyway. But, for God's sake, whatever you do, *please* (anyone) don't make us look at a unique license! We're not lawyers, but we've been paying lawyers out of our own pockets to do this crap, and it's expensive and time-consuming. If you can't trust Guido to do a Right Thing with your code, Python is better off without it over the long haul. > What is even more concerning is that CNRI also holds the copyright to > the generated files, even though they are derived from information > made available by the Unicode consortium! It's no concern to me -- but then I'm not paranoid . 
cnri-and-the-uc-can-fight-it-out-if-it-comes-to-that-ly y'rs - tim From moshez at zadka.site.co.il Mon Jan 1 11:01:02 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 1 Jan 2001 12:01:02 +0200 (IST) Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20001231105812.A12168@newcnri.cnri.reston.va.us> References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> Message-ID: <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> On Sun, 31 Dec 2000, Andrew Kuchling wrote: > It also leads to one section of the FAQ (#3, I think) having something > like 60 questions jumbled together. IMHO the FAQ should be a text > file, perhaps in the PEP format so it can be converted to HTML, and it > should have an editor who'll arrange it into smaller sections. Any > volunteers? (Must ... resist ... urge to volunteer myself... help > me, Spock...) Well, Andrew, I know if I leave you any more time, you won't be able to resist the urge. OK, I'll volunteer. Can't do anything right now, but expect to see an updated version posted on my site soon. If people will think it's a good idea, I'll move it to Misc/. Fred, if the some-xml-format-to-HTML you're working on is in any sort of readiness, I'll use that to format the FAQ. Having used Perl in the last couple of weeks, I learned to appreciate the fact that the FAQ is a standard part of the documentation. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From loewis at informatik.hu-berlin.de Mon Jan 1 12:43:34 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 1 Jan 2001 12:43:34 +0100 (MET) Subject: [Python-Dev] Re: Copyrights and licensing (was ... 
something irrelevant) In-Reply-To: References: Message-ID: <200101011143.MAA11550@pandora.informatik.hu-berlin.de> > Seems to me, though, that it may simplify life down the road if, whenever an > author felt a similar need to assert copyright explicitly, they list Guido > as the copyright holder. He's not going to screw Python! That's a good solution, which I'll implement in a revised patch. Thanks for the advice, and Happy New Year, Martin From mal at lemburg.com Mon Jan 1 18:56:20 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 01 Jan 2001 18:56:20 +0100 Subject: [Python-Dev] Re: Copyright statements ([Patch #103002] Fix for #116285: Properly raise UnicodeErrors) References: <200012290957.KAA17936@pandora.informatik.hu-berlin.de> <3A4C757D.F64E9CEF@lemburg.com> Message-ID: <3A50C4C4.76A1C5B6@lemburg.com> Martin von Loewis wrote: > > > My only problem with it is your copyright notice. AFAIK, patches to > > the Python core cannot contain copyright notices without proper > > license information. OTOH, I don't think that these minor changes > > really warrant adding a complete license paragraph. > > I'd like to get an "official" clarification on this question. Is it > the case that patches containing copyright notices are only accepted > if they are accompanied with license information? > > I agree that the changes are minor, I also believe that I hold the > copyright to the changes whether I attach a notice or not (at least > according to our local copyright law). True. > What concerns me that without such a notice, gencodec.py looks as if > CNRI holds the copyright to it. I'm not willing to assign the > copyright of my changes to CNRI, and I'd like to avoid the impression > of doing so. > > What is even more concerning is that CNRI also holds the copyright to > the generated files, even though they are derived from information > made available by the Unicode consortium! 
The copyright for the files and changes needed for the Unicode support was indeed transferred to CNRI earlier this year. This was part of the contract I had with CNRI. I don't know why the copyright notice wasn't subsequently removed from the files after final checkin of the changes, though, because, as I remember, the copyright line was only added as "search&replace" token to the files in question in the sign over period. The codec files were part of the Unicode support patch, even though they were created by the gencodec.py tool I wrote to create them from the Unicode mapping files. That's why they also carry the copyright token. Note that with strict reading of the CNRI license, there's no problem with removing the notice from the files in question: """ ...provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) 1995-2000 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6 alone or in any derivative version prepared by Licensee... """ The copyright line in the Unicode files is "(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.", so this does not match the definition they gave in their license text. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Mon Jan 1 19:58:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 13:58:36 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: Your message of "Fri, 29 Dec 2000 21:59:16 +0100." 
<20001229215915.L1281@xs4all.nl> References: <20001229215915.L1281@xs4all.nl> Message-ID: <200101011858.NAA09263@cj20424-a.reston1.va.home.com> Thomas just checked this in, using Tim's words: > *** ref7.tex 2000/07/16 19:05:38 1.20 > --- ref7.tex 2000/12/31 22:52:59 1.21 > *************** > *** 243,249 **** > \ttindex{exc_value}\ttindex{exc_traceback}} > > ! The optional \keyword{else} clause is executed when no exception occurs > ! in the \keyword{try} clause. Exceptions in the \keyword{else} clause are > ! not handled by the preceding \keyword{except} clauses. > \kwindex{else} > > --- 243,251 ---- > \ttindex{exc_value}\ttindex{exc_traceback}} > > ! The optional \keyword{else} clause is executed when the \keyword{try} clause > ! terminates by any means other than an exception or executing a > ! \keyword{return}, \keyword{continue} or \keyword{break} statement. > ! Exceptions in the \keyword{else} clause are not handled by the preceding > ! \keyword{except} clauses. > \kwindex{else} How is this different from "when control flow reaches the end of the try clause", which is what I really had in mind? Using the current wording, this paragraph would have to be changed each time a new control-flow keyword is added. Based upon the historical record that's not a grave concern ;-), but I think the new wording relies too much on accidentals such as the fact that these are the only control flow altering events. It may be that control flow is not rigidly defined -- but as it is what was really intended, maybe the fix should be to explain the right concept rather than the current ad-hoc solution. This also avoids concerns of readers who are trying to read too much into the words and might become worried that there are other ways of altering the control flow that *would* cause the else clause to be executed; and guides implementors of other Pyhon-like languages (like vyper) that might have more control-flow altering statements or events. 
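[Editorial sketch, not part of the original message] The two wordings discussed above ("terminates by any means other than an exception, return, continue or break" versus "control flow reaches the end of the try clause") describe the same behavior, which can be observed with a small experiment. The function names run() and else_raises() are made up for this illustration.

```python
def run(action):
    """Record which clauses of a try statement execute for a given action."""
    trace = []
    for _ in range(1):  # a loop, so that break and continue are legal
        try:
            trace.append("try")
            if action == "raise":
                raise ValueError("boom")
            if action == "return":
                return trace
            if action == "break":
                break
            if action == "continue":
                continue
            # Any other action: control flows off the end of the try clause.
        except ValueError:
            trace.append("except")
        else:
            trace.append("else")
    return trace


def else_raises():
    """Show that an exception raised in else is not caught by the preceding except."""
    try:
        pass
    except ValueError:
        return "caught by except"  # never reached
    else:
        raise ValueError("raised in else")
```

Only the fall-through case reaches the else clause; the exception, return, break and continue cases all skip it, and the ValueError raised inside else propagates to the caller instead of being handled by the except clause above it.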
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Mon Jan 1 20:00:38 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 1 Jan 2001 20:00:38 +0100 Subject: [Python-Dev] PSA (Was: FAQ Horribly Out Of Date) Message-ID: <200101011900.UAA01672@loewis.home.cs.tu-berlin.de> > It appears that CNRI can only think about one thing at a time <0.5 > wink>. For the last 6 months, that thing has been the license. If > they ever resolve the GPL compatibility issue, maybe they can be > persuaded to think about the PSA. In the meantime, I'd suggest you > not renew . I think we need to find a better answer than that, and soon. While everybody reading this list probably knows not to renew, the PSA is the first thing that you see when selecting "Python Community" on python.org. The first paragraph reads # The continued, free existence of Python is promoted by the # contributed efforts of many people. The Python Software Activity # (PSA) supports those efforts by helping to coordinate them. The PSA # operates web, ftp, and email services, organizes conferences, and # engages in other activities that benefit the Python user # community. In order to continue, the PSA needs the membership of # people who value Python. If you look at the current members list (http://www.python.org/psa/Members.html), it appears that many long-time members indeed have not renewed. This page was last updated Nov 14 - so it appears that CNRI is still processing applications when they come. It may well be that many of the newer members ask themselves by now what happened to their money; it might not be easy to get an answer to that question. However, there is clearly somebody to blame here: The Python Community. 
So I'd like to request that somebody with write permissions to these pages changes the text, to something along the lines of replacing the first paragraph with # The Python community organizes itself in different ways; people # interested in discussing development of and with Python usually # participate in mailing lists. # #

Organizations that wish to influence further directions of the # Python language may join the Python # Consortium. # #

The Corporation for # National Research Initiatives hosts the Python Software # Activity, which is described below. The PSA used to provide funding # for the Python development; that is no longer the case. If there is a factual error in this text, please let me know. Regards, Martin From tim.one at home.com Mon Jan 1 20:20:53 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 14:20:53 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: [gvanrossum, in an SF patch comment] > Bah. I don't like this one bit. More complexity for a little > bit of extra speed. > I'm keeping this open but expect to be closing it soon unless I > hear a really good argument why more speed is really needed in > this area. Down with code bloat and creeping featurism! Without judging "the solution" here, "the problem" is that everyone's first attempt to use line-at-a-time file input in Perl: while (<>) { ... $_ ...; } runs 2-5x faster than everyone's first attempt in Python: while 1: line = f.readline() if not line: break ... line ... It would be beneficial to address that *somehow*, cuz 2-5x isn't just "a little bit"; and by the time you walk a newbie thru while 1: lines = f.readlines(hintsize) if not lines: break for line in lines: ... line ... they feel like maybe Perl isn't so obscure after all . Does someone have an elegant way to address this? I believe Jeff's shot at elegance was the other part of the patch, using (his new) xreadlines under the covers to speed the fileinput module. reading-text-files-is-very-common-ly y'rs - tim From guido at digicool.com Mon Jan 1 20:25:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 14:25:07 -0500 Subject: [Python-Dev] PSA (Was: FAQ Horribly Out Of Date) In-Reply-To: Your message of "Mon, 01 Jan 2001 20:00:38 +0100."
<200101011900.UAA01672@loewis.home.cs.tu-berlin.de> References: <200101011900.UAA01672@loewis.home.cs.tu-berlin.de> Message-ID: <200101011925.OAA09669@cj20424-a.reston1.va.home.com> > > It appears that CNRI can only think about one thing at a time <0.5 > > wink>. For the last 6 months, that thing has been the license. If > > they ever resolve the GPL compatibility issue, maybe they can be > > persuaded to think about the PSA. In the meantime, I'd suggest you > > not renew . > > I think we need to find a better answer than that, and soon. While > everybody reading this list probably knows not to renew, the PSA is > the first thing that you see when selecting "Python Community" on > python.org. The first paragraph reads > > # The continued, free existence of Python is promoted by the > # contributed efforts of many people. The Python Software Activity > # (PSA) supports those efforts by helping to coordinate them. The PSA > # operates web, ftp, and email services, organizes conferences, and > # engages in other activities that benefit the Python user > # community. In order to continue, the PSA needs the membership of > # people who value Python. > > If you look at the current members list > (http://www.python.org/psa/Members.html), it appears that many > long-time members indeed have not renewed. This page was last updated > Nov 14 - so it appears that CNRI is still processing applications when > they come. It may well be that many of the newer members ask > themselves by now what happened to their money; it might not be easy > to get an answer to that question. However, there is clearly somebody > to blame here: The Python Community. I don't know how many memberships CNRI has received, but it can't be many, since we sent out no reminders. I'll see if I can get an answer. 
> So I'd like to request that somebody with write permissions to these
> pages changes the text, to something along the lines of replacing the
> first paragraph with
>
> # The Python community organizes itself in different ways; people
> # interested in discussing development of and with Python usually
> # participate in mailing lists.
> #
> # Organizations that wish to influence further directions of the
> # Python language may join the Python
> # Consortium.
> #
> # The Corporation for
> # National Research Initiatives hosts the Python Software
> # Activity, which is described below. The PSA used to provide funding
> # for the Python development; that is no longer the case.
>
> If there is a factual error in this text, please let me
> know.

I've done something slightly different -- see
http://www.python.org/psa/. I've kept only your first paragraph, and
inserted a boldface note before that about the obsolescence (or
deprecation :-) of the PSA membership.

I've removed the references to the consortium, since that's also about
to collapse under its own inactivity; instead, the PSF will be formed,
independent from CNRI, to hold the IP rights (insofar they can be
assigned to the PSF) and for not much else.

I'll see if I can get some more news about the creation of the PSF
(which is supposed to be an initiative of ActiveState and Digital
Creations).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Mon Jan 1 20:35:24 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 01 Jan 2001 14:35:24 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Mon, 01 Jan 2001 14:20:53 EST."
References:
Message-ID: <200101011935.OAA09728@cj20424-a.reston1.va.home.com>

> [gvanrossum, in an SF patch comment]
> > Bah. I don't like this one bit. More complexity for a little
> > bit of extra speed.
> > I'm keeping this open but expect to be closing it soon unless I
> > hear a really good argument why more speed is really needed in
> > this area. Down with code bloat and creeping featurism!
>
> Without judging "the solution" here, "the problem" is that everyone's first
> attempt to use line-at-a-time file input in Perl:
>
>     while (<>) {
>         ... $_ ...;
>     }
>
> runs 2-5x faster than everyone's first attempt in Python:
>
>     while 1:
>         line = f.readline()
>         if not line:
>             break
>         ... line ...

But is everyone's first thought to time the speed of Python vs. Perl? Why does it hurt so much that this is a bit slow? > It would be beneficial to address that *somehow*, cuz 2-5x isn't just "a > little bit"; and by the time you walk a newbie thru > > while 1: > lines = f.readlines(hintsize) > if not lines: > break > for line in lines: > ... line ... > > they feel like maybe Perl isn't so obscure after all . > > Does someone have an elegant way to address this? I believe Jeff's shot at > elegance was the other part of the patch, using (his new) xreadlines under > the covers to speed the fileinput module. But of course suggesting fileinput is also not a great solution -- it's relatively obscure (since it's not taught by most tutorials, certainly not by the standard tutorial). > reading-text-files-is-very-common-ly y'rs - tim So is worrying about performance without a good reason... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 1 20:49:24 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 14:49:24 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: Your message of "Mon, 01 Jan 2001 12:01:02 +0200." <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> Message-ID: <200101011949.OAA09804@cj20424-a.reston1.va.home.com> [Moshe] > Well, Andrew, I know if I leave you any more time, you won't be able > to resist the urge. OK, I'll volunteer. Can't do anything right now, > but expect to see an updated version posted on my site soon. If > people will think it's a good idea, I'll move it to Misc/. > Fred, if the some-xml-format-to-HTML you're working on is in any > sort of readiness, I'll use that to format the FAQ. 
Moshe, if your solution is to turn the FAQ into a document with a single editor again, I think you're not doing the community a favor. Granted, we could add some more sections (easy enough for me if someone tells me the new section headings and which existing questions go where) and there is a lot of obsolete information. But I would be very hesitant to drop the notion of maintaining the FAQ as a group collaboration project. There's nothing wrong with the FAQ wizard except that the password (Spam) should be made publicly known... I've also noticed that Bjorn Pettersen has made a whole slew of useful updates to various sections, mostly updates about new 2.0 features or syntax. > Having used Perl > in the last couple of weeks, I learned to appreciate the fact that > the FAQ is a standard part of the documentation. Does that mean more than that it should be linked to from http://www.python.org/doc/ ? It's already there in the side bar; does it need a more prominent position? I used to include the FAQ in Misc/ (Ping's Misc/faq2html.py script is a last remnant of that), but gave up after realizing that the on-line FAQ is much more useful than a single text file. In my eyes, the best thing you (and everyone else) could do, if you find the time, would be to use the FAQ wizard to fix or delete out-of-date entries. To delete an entry, change its subject to "Deleted" and remove its body; I'll figure out a way to delete them from the index. Because FAQ entries can refer to each other (and are referred to from elsewhere) by number, it's not safe to simply renumber entries. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Mon Jan 1 21:27:37 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 15:27:37 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: <200101011858.NAA09263@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Thomas just checked this in, using Tim's words: [ The optional \keyword{else} clause is executed when no exception occurs in the \keyword{try} clause. Exceptions in the \keyword{else} clause are not handled by the preceding \keyword{except} clauses. vs The optional \keyword{else} clause is executed when the \keyword{try} clause terminates by any means other than an exception or executing a \keyword{return}, \keyword{continue} or \keyword{break} statement. Exceptions in the \keyword{else} clause are not handled by the preceding \keyword{except} clauses. ] > How is this different from "when control flow reaches the end of the > try clause", which is what I really had in mind? Only in that it doesn't appeal to a new undefined phrase, and is (I think) unambiguous in the eyes of a non-specialist reader (like Robin's friend). Note that "reaching the end of the try clause" is at best ambiguous, because you *really* have in mind "falling off the end" of the try clause. It wouldn't be unreasonable to say that in: try: x = 1 y = 2 return 1 "x=1" is the beginning of the try clause and "return 1" is the end. So if the reader doesn't already know what you mean, saying "the end" doesn't nail it (or, if like me, the reader does already know what you mean, it doesn't matter one whit what it says ). > Using the current wording, this paragraph would have to be > changed each time a new control-flow keyword is added. Based > upon the historical record that's not a grave concern ;-), It was sure no concern of mine ... > but I think the new wording relies too much on accidentals such > as the fact that these are the only control flow altering events. 
>
> It may be that control flow is not rigidly defined -- but as it is
> what was really intended, maybe the fix should be to explain the
> right concept rather than the current ad-hoc solution.
> ...

OK, except I don't know how to do that succinctly. For example, if Java had
an "else" clause, the Java spec would say:

    If present, the "else block" is executed if and only if execution
    of the "try block" completes normally, and then there is a choice:

        If the "else block" completes normally, then the "try"
        statement completes normally.

        If the "else block" completes abruptly for reason S, then
        the "try" statement completes abruptly for reason S.

That is, they deal with control-flow issues via appeal to "complete
normally" and "complete abruptly" (which latter comes in several flavors
("reasons"), such as returns and exceptions), and there are pages and pages
and pages of stuff throughout the spec inductively defining when these
conditions obtain. It's clear, precise and readable; but it's also wordy,
and we don't have anything similar to build on.

As a compromise, given that we're not going to take the time to be precise
(well, I'm sure not ...):

    The optional \keyword{else} clause is executed if and
    when control flows off the end of the \keyword{try}
    clause.\footnote{In Python 2.0, control "flows off the
    end" except in case of exception, or executing a
    \keyword{return}, \keyword{continue} or \keyword{break}
    statement.}
    Exceptions in the \keyword{else} clause are not handled by
    the preceding \keyword{except} clauses.

Now it's all of imprecise, almost precise, specific to Python 2.0, and
robust against any future changes.

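The proposed wording can be checked against a small experiment. What follows is only a sketch, written for a present-day Python 3 interpreter rather than the 2.0 under discussion (an assumption), and the function names `run` and `else_raises` are made up for illustration. It records which clauses execute for each way of leaving the try clause, and shows that an exception raised in the else clause is not caught by the preceding except clause.

```python
def run(action):
    """Return the clauses that ran for one way of leaving the try clause."""
    trace = []
    for _ in range(1):  # a one-pass loop, so break and continue are legal
        try:
            trace.append("try")
            if action == "raise":
                raise ValueError("boom")
            if action == "break":
                break
            if action == "continue":
                continue
            if action == "return":
                return trace
            # any other action: control flows off the end of the try clause
        except ValueError:
            trace.append("except")
        else:
            trace.append("else")
    return trace


def else_raises():
    """An exception raised in else skips the except clause of the same try."""
    try:
        try:
            pass
        except ValueError:
            return "handled by inner except"
        else:
            raise ValueError("raised in else")
    except ValueError:
        return "escaped to outer handler"


if __name__ == "__main__":
    for action in ("fall", "raise", "break", "continue", "return"):
        print(action, run(action))
    print(else_raises())
```

Only the "fall" case reaches the else clause; the other four leave the try clause by one of the routes the footnote lists.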
From akuchlin at cnri.reston.va.us Mon Jan 1 21:35:27 2001 From: akuchlin at cnri.reston.va.us (Andrew Kuchling) Date: Mon, 1 Jan 2001 15:35:27 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <200101011949.OAA09804@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 01, 2001 at 02:49:24PM -0500 References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> <200101011949.OAA09804@cj20424-a.reston1.va.home.com> Message-ID: <20010101153527.A14116@newcnri.cnri.reston.va.us> On Mon, Jan 01, 2001 at 02:49:24PM -0500, Guido van Rossum wrote: >But I would be very hesitant to drop the notion of maintaining the FAQ >as a group collaboration project. There's nothing wrong with the FAQ >wizard except that the password (Spam) should be made publicly known... Why multiply the number of mechanisms required to maintain things? We already use CVS for other documentation; why not use it for the FAQ as well? --amk From tim.one at home.com Mon Jan 1 22:00:36 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 16:00:36 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20010101153527.A14116@newcnri.cnri.reston.va.us> Message-ID: [Andrew Kuchling] > Why multiply the number of mechanisms required to maintain things? > We already use CVS for other documentation; why not use it for the > FAQ as well? The search facilities of the FAQ wizard are invaluable, and so is the ability for "just users" to update the info from within their browsers. There are two problems with the FAQ in practice: 1. It doesn't get updated enough. We can't fix that by making it harder to update! 2. It's *only* available via the web interface. We should ship a text or HTML snapshot with releases; perhaps even do the usual Usenet periodic FAQ-posting thing. 
From tim.one at home.com Mon Jan 1 23:34:03 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 17:34:03 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > But is everyone's first thought to time the speed of Python vs. Perl? It's few peoples' first thought. It's impossible for bilingual programmers (or dabblers, or evaluators) not to notice *soon*, though, because: > Why does it hurt so much that this is a bit slow? Factors of 2 to 5 aren't "a bit" -- they're obvious when they happen, but the *cause* is not. To judge from a decade of c.l.py gripes, most people write it off to "huh -- guess Python is just slow"; the rest eventually figure out that their text input is the bottleneck (Tom Christiansen never got this far <0.5 wink>), but then don't know what to do about it. At this point I'm going to insert two anonymized pvt emails from last year: -----Original Message #1 ----- From: TTT Sent: Monday, March 13, 2000 2:29 AM To: GGG Subject: RE: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison GGG, note especially figure 4 in Lutz Prechelt's report: > http://wwwipd.ira.uka.de/~prechelt/Biblio/#jccpprtTR The submitted Python programs had by far the largest variability in how long it took to load the dictionary. My input loop is probably typical of the "fast" Python programs, which indeed beat most (but not all) of the fastest Perl ones here: class Dictionary: ... def fill_from_file(self, f, BUFFERSIZE=500000): """f, BUFFERSIZE=500000 -> fill dictionary from file f. f must be an open file, or other object with a readlines() method. It must contain one word per line. Optional arg BUFFERSIZE is used to chunk up input for efficiency, and is roughly the # of bytes read at a time. 
""" addword = self.addword while 1: lines = f.readlines(BUFFERSIZE) if not lines: break for line in lines: addword(line[:-1]) # chop trailing newline Comparable Perl may have been the one-liner: grep(&addword, chomp(<>)); which may account for why Perl's memory use was uniformly higher than Python's. Whatever, you really need to be a Python expert to dream up "the fast way" to do Python input! Hire me, and I'll fix that . nothing-like-blackmail-before-going-to-bed-ly y'rs - TTT -----Original Message #2 ----- From: GGG Sent: Monday, March 13, 2000 7:08 AM To: TTT Subject: Re: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison Agreed. readlines(BUFFERSIZE) is a crock. In fact, ``for i in f.readlines()'' should use lazy evaluation -- but that will have to wait for Py3K unless we add hints so that readlines knows it is being called from a for loop. --GGG -----Back to 2001 ----- I took TTT's advice and read Lutz's report . I agree with GGG that hiding this in .readlines() would be maximally elegant. xreadlines supplies most of the lazy machinery GGG favored. I don't know how hard it would be to supply the rest of it, but it's such a frequent bitching point that I would prefer pointing people to an explicit .xreadlines() hack than either (a) try to convince them that they "shouldn't" care about the speed as much as they claim to; or, (b) try to explain the double-loop buffering method. I'd personally rather use an explicit .xreadlines() hack than code the double-loop buffering too, and don't see an obvious way to do better than that right now. >> reading-text-files-is-very-common-ly y'rs - tim > So is worrying about performance without a good reason... Indeed it is. I'm persuaded that many people making this specific complaint have a legitimate need for more speed, though, and that many don't persist with Python long enough to find out how to address this complaint (because the double-loop method is too obscure for a newbie to dream up). 
That makes this hack score extraordinarily high on my benefit/harm ratio scale (in P3K xreadlines can be deprecated in favor of readlines <0.9 wink>). heck-it-doesn't-even-require-a-new-keyword-ly y'rs - tim From thomas at xs4all.net Mon Jan 1 23:46:45 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 1 Jan 2001 23:46:45 +0100 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101011935.OAA09728@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 01, 2001 at 02:35:24PM -0500 References: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <20010101234645.B5435@xs4all.nl> On Mon, Jan 01, 2001 at 02:35:24PM -0500, Guido van Rossum wrote: [ Python lacks a One True Way of doing Perl's 'while(<>)' ] > > Does someone have an elegant way to address this? I believe Jeff's shot at > > elegance was the other part of the patch, using (his new) xreadlines under > > the covers to speed the fileinput module. > But of course suggesting fileinput is also not a great solution -- > it's relatively obscure (since it's not taught by most tutorials, > certainly not by the standard tutorial). Is fileinput really obscure ? I personally quite like it. It is enough like the perl idiom to be very useful for people thinking that way, and it doesn't require special syntax or considerations. If tutorialization is the only problem, I'd be happy to fix that, provided Fred or Moshe can TeX my fix up. As for speed (which stays a secondary or tertiary consideration at best) do we really need the xreadlines method to accomplish that ? Couldn't fileinput get almost the same performance using readlines() with a sizehint ? I personally don't like the xreadlines because it adds yet another function to do the same, with a slight, subtle and to the untrained programmer unclear distinction from the rest. 
(I don't really like the range/xrange difference either -- I think Python
code shouldn't care whether they're dealing with a real list or a generator,
and as much as possible should just be generators. And in the case of simple
(x)range()es, I have yet to see a case where a 'real' list had significantly
better performance than a generator.)

If we *do* start adding methods to (the public API of) filemethods, I think
we should consider more than just xreadlines() (I seem to recall other
proposals, but my memory is hazy at the moment -- I haven't slept since last
millennium) add whatever is necessary, and provide a UserFile in the std. lib
that 'emulates' all fileobject functionality using a single readline()
function.

Now, if you'll excuse me, I have a date with a soft bed I haven't seen in
about 40 hours, a pair of aspirin my head is killing for and probably a
hangover that I don't want to think about, right now ;)

Gelukkig-Nieuwjaar-iedereen-ly y'rs
--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From jepler at inetnebr.com Tue Jan 2 02:49:35 2001
From: jepler at inetnebr.com (Jeff Epler)
Date: Mon, 1 Jan 2001 19:49:35 -0600
Subject: [Python-Dev] Re: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: ; from Tim Peters on Mon, Jan 01, 2001 at 02:20:53PM -0500
Message-ID: <20010101194935.19672@falcon.inetnebr.com>

I'd like to speak up about this patch I've submitted on sourceforge.

I consider the xreadlines function/object to be the core of my proposal.
The addition of a method to file objects, as well as the modifications to
fileinput, are secondary in my opinion.

The desire is to iterate over file contents in a way that satisfies the
following criteria:

    * Uses the "for" syntax, because this clearly captures the underlying
      operation. (files can be viewed as sequences of lines when
      appropriate)
    * Consumes small amounts of memory even when the file contents are
      large.
    * Has the lowest overhead that can reasonably be attained.

I think that it is agreed that the ability to use the "for" syntax is
important, since it was the impetus for the xrange function/object. After
all, there's a "while" statement which will give the same effect, without
introducing xrange.

The point under debate, as I see it, is the utility of speeding up the
"benchmarks" of folks who compare the speed of Python and another language
doing a very simple loop over the lines in a file. Since this advantage
disappears once real work is being done on the file, maybe an XReadLines
class, written in Python, would be more suitable. In fact, I've written
such a class since I didn't know about fileinput and in any case I find it
less useful to me because of all the weird stuff it does. (parsing argv,
opening files by name, etc)

One shortcoming of my current patch, aside from the ones already named in
another person's response to it, is that it fails when working on a
file-like class which implements .readline but not .readlines.

In any case, I wrote xreadlines to learn how to write C extensions to
Python, and submitted it at the suggestion of a fellow Python user in a
private discussion. I'd like to extinguish one of these eternal
comp.lang.python threads with it too, but maybe it's not to be.

Happy new year, all.
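The three criteria, and the .readline-only shortcoming mentioned above, can be illustrated with a pure-Python wrapper. This is a hedged sketch and not the code from the patch or the class Jeff describes: the name `LazyLines` and the default `sizehint` are invented here, and it uses a present-day Python 3 generator where a 2.0-era class would have relied on the `__getitem__` sequence protocol.

```python
import io


class LazyLines:
    """Present a file-like object as a lazily read sequence of lines.

    Hypothetical illustration: reads in chunks via readlines(sizehint)
    when available, and falls back to readline() otherwise.
    """

    def __init__(self, f, sizehint=64 * 1024):
        self.f = f
        self.sizehint = sizehint

    def __iter__(self):
        readlines = getattr(self.f, "readlines", None)
        if readlines is None:
            # Fallback for objects that implement only readline().
            while True:
                line = self.f.readline()
                if not line:
                    return
                yield line
        else:
            # The double-loop buffering idiom, hidden behind "for".
            while True:
                chunk = readlines(self.sizehint)
                if not chunk:
                    return
                yield from chunk


if __name__ == "__main__":
    for line in LazyLines(io.StringIO("one\ntwo\nthree\n"), sizehint=4):
        print(line, end="")
```

Memory use is bounded by roughly one chunk of lines at a time, and the caller writes a plain "for" loop. The per-line cost of going through Python code remains, which is the overhead a C implementation is meant to remove.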
Jeff From gstein at lyra.org Tue Jan 2 04:34:31 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 1 Jan 2001 19:34:31 -0800 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20010101153527.A14116@newcnri.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Mon, Jan 01, 2001 at 03:35:27PM -0500 References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> <200101011949.OAA09804@cj20424-a.reston1.va.home.com> <20010101153527.A14116@newcnri.cnri.reston.va.us> Message-ID: <20010101193431.M10567@lyra.org> On Mon, Jan 01, 2001 at 03:35:27PM -0500, Andrew Kuchling wrote: > On Mon, Jan 01, 2001 at 02:49:24PM -0500, Guido van Rossum wrote: > >But I would be very hesitant to drop the notion of maintaining the FAQ > >as a group collaboration project. There's nothing wrong with the FAQ > >wizard except that the password (Spam) should be made publicly known... > > Why multiply the number of mechanisms required to maintain things? We > already use CVS for other documentation; why not use it for the FAQ as > well? That would limit the updaters to just those with CVS access. As Guido just pointed out, Bjorn made a bunch of updates. And he didn't need CVS to do that... Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Tue Jan 2 04:44:05 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 22:44:05 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <20010101194935.19672@falcon.inetnebr.com> Message-ID: [Jeff Epler] > I'd like to speak up about this patch I've submitted on sourceforge. I'm not sure that's allowed . > ... > The point under debate, as I see it, is the utility of speeding > up the "benchmarks" of folks who compare the speed of Python and > another language doing a very simple loop over the lines in a file. If that were true, I couldn't care less. 
> Since this advantage disappears once real work is being done on > the file, ... I agree that's true, but submit it's rarely relevant. *Most* file-crunching apps are dominated by I/O time, which is why this is so visible to so many; e.g., chewing over massive log files looking for patterns appears to be the growth industry of the 21st century . Even in Lutz's report (see reference from earlier mail), where the task to be solved was far from trivial, input time exceeded processing time across all languages (with some oddball exceptions, when the coder neglected to use a hash table to store info). That's thoroughly typical of real file-crunching applications, in my experience: Perl has a killer speed advantage in the single most time-consuming portion of the app, and due to one implementation trick. Take that advantage away, and Python holds its own in this domain. Coincidentally, I got pvt email from a newbie today, reading in part; > If Perl wasn't so gosh darn good and fast at text scrubbing, it > wouldn't really be a consideration, it's syntax is so clunky and > hard to learn by comparison to both Python and Ruby. This is just depressing, because I can predict every step of this dance. > ... > Happy new year, all. And to you! Just make sure it's a fast new year . From moshez at zadka.site.co.il Tue Jan 2 16:24:40 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 2 Jan 2001 17:24:40 +0200 (IST) Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <20010101234645.B5435@xs4all.nl> References: <20010101234645.B5435@xs4all.nl>, <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <20010102152440.9C26DA84F@darjeeling.zadka.site.co.il> On Mon, 1 Jan 2001, Thomas Wouters wrote: > As for speed (which stays a secondary or tertiary consideration at best) do > we really need the xreadlines method to accomplish that ? 
Couldn't fileinput
> get almost the same performance using readlines() with a sizehint ? I

me too

Adding xreadlines() to the interface would break half a dozen file-objects
all around the world (just the standard library has StringIO, cStringIO,
GzipFile and probably some others I can't remember)

Adding .readlines(sizehint) to fileinput, and adding a function
to create something similar to fileinput from a file object (as opposed
to a file name) would help everyone, and doesn't seem too hard.
Is there a gotcha I'm just not seeing?
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From tim.one at home.com Tue Jan 2 09:06:32 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 2 Jan 2001 03:06:32 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <20010101234645.B5435@xs4all.nl>
Message-ID:

[Thomas Wouters]
> ...
> As for speed (which stays a secondary or tertiary consideration
> at best) do we really need the xreadlines method to accomplish
> that ? Couldn't fileinput get almost the same performance using
> readlines() with a sizehint ?

There was a long email discussion among Jeff, Paul Prescod, Neel
Krishnaswami, and Alex Martelli about this. I started getting copied on it
somewhere midstream, but didn't have time to follow it then (like I do now).
About two weeks ago Neel summarized all the approaches then under
discussion:

"""
[Neel Krishnaswami]
...
Quick performance summary of the current solutions:

Slowest: for line in fileinput.input('foo'):     # Time 100
       : while 1: line = file.readline()         # Time 75
       : for line in LinesOf(open('foo')):       # Time 25
Fastest: for line in file.readlines():           # Time 10
         while 1: lines = file.readlines(hint)   # Time 10
         for line in xreadlines(file):           # Time 10

The difference in speed between the slowest and fastest is about a
factor of 10.

LinesOf is Alex's Python wrapper class that takes a file and uses
readlines() with a size-hint to present a sequence interface. It's
around half as fast as the fastest idioms, and 3-4 times faster than
while 1:. Jeff's xreadlines is essentially the same thing in C, and is
indistinguishable in performance from the other fast idioms.
...
"""

On his box, line-at-a-time is >7x slower than the fastest Python methods,
which latter are usually close (depending on the platform) to Perl
line-at-a-time speeds. A factor of 7 is too large for most working
programmers to ignore in the interest of somebody else's notion of
theoretical purity. Seriously, speed is not a secondary consideration
to me when the gap is this gross, and in an area so visible and common.

Alex's LinesOf appears a good predictor for how adding
fileinput.readlines(hint) would perform, since it appears to *be* that
(except off on its own). Then it buys a factor of 3 over line-at-a-time on
Neel's box but leaves a factor of 2.5 on the table. The cause of the latter
appears mostly to be the overhead of getting a Python method call into the
equation for each line returned.

Note that Jeff added .xreadlines() as a file object method at Neel's urging.
The way he started this is shown on the last line: a function. If we threw
out the fileinput and file method aspects, and just added a new module
xreadlines with a function xreadlines, then what? I bet it would become as
popular as the string module, and for good reason: it's a specific approach
that works, to a specific and common problem.

> ...
> And in the case of simple (x)range()es, I have yet to see a case
> where a 'real' list had significantly better performance than
> a generator.)

It varies by platform, but I don't think I've heard of variations larger
than 20% in either direction. 20% is nothing, though; in *this* case we're
talking order of magnitude. That's go/nogo territory.

> ...
> Gelukkig-Nieuwjaar-iedereen-ly y'rs I understand people are passionate when reality clashes with the dream of a wart-free language, but that's no reason to swear at me . wishing-you-a-happy-new-year-like-a-civilized-man-ly y'rs - tim From paulp at ActiveState.com Tue Jan 2 11:00:46 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 02:00:46 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <3A51A6CE.3B15371D@ActiveState.com> Guido van Rossum wrote: > > ... > > But is everyone's first thought to time the speed of Python vs. Perl? > Why does it hurt so much that this is a bit slow? I want to interject here that I asked Jeff to submit this patch because I don't see it as "a little bit slow." When someone transliterates a program from one scripting language to another and gets a program that is two to five times slower that is a big deal! > But of course suggesting fileinput is also not a great solution -- > it's relatively obscure (since it's not taught by most tutorials, > certainly not by the standard tutorial). Fileinput's primary problem is that IIRC, it is even slower than doing readline yourself! > > reading-text-files-is-very-common-ly y'rs - tim > > So is worrying about performance without a good reason... I don't understand what constitutes good reason. We're talking about a relatively minor change that will speed up thousands of programs, answer a frequently asked question from comp.lang.python, obliterate an obscure idiom and reduce the number of requests for a Python syntax change (assignment expression) all in one bold sweep. It seemed to me as if it was a "pure win." 
Paul Prescod From paulp at ActiveState.com Tue Jan 2 11:06:24 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 02:06:24 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: <20010101234645.B5435@xs4all.nl>, <200101011935.OAA09728@cj20424-a.reston1.va.home.com> <20010102152440.9C26DA84F@darjeeling.zadka.site.co.il> Message-ID: <3A51A820.50365F02@ActiveState.com> Moshe Zadka wrote: > > ... > > Adding .readlines(sizehint) to fileinput, and adding a function > to create something similar to fileinput from a file object (as opposed > to a file name) would help everyone, and doesn't seem to hard. > Is there a gotcha I'm just not seeing? Fileinput is inherently slow because there are too many layers of Python code. I started to consider ways of inverting the logic so that it only called into Python when it needed to switch files but it would have been a much larger patch than Jeff's and I thought that a conservative approach was important. Fileinput should someday be optimized but we can easily get a low-hanging fruit improvement with Jeff's patch. Paul Prescod From guido at digicool.com Tue Jan 2 15:56:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 09:56:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 03:06:32 EST." References: Message-ID: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Tim's almost as good at convincing me as he is at channeling me! The timings he showed almost convinced me that fileinput is hopeless and xreadlines should be added. But then I wrote a little timer of my own... I am including the timer program below my signature. The test input was the current access_log of dinsdale.python.org, which has about 119 Mbytes and 1M lines (as counted by the test program). 
I measure about a factor of 2 between readlines with a sizehint (of 1 MB)
and fileinput; a change to fileinput that uses readlines with a sizehint
and in-lines the common case in __getitem__ (as suggested by Moshe),
didn't make a difference.

Output (the first time is realtime seconds, the second CPU seconds):

    total 119808333 chars and 1009350 lines
    count_chars_lines     7.944  7.890
    readlines_sizehint    5.375  5.320
    using_fileinput      15.861 15.740
    while_readline        8.648  8.570

This was on a 600 MHz Pentium-III Linux box (RH 6.2).

Note that count_chars_lines and readlines_sizehint use the same
algorithm -- the difference is that readlines_sizehint uses 'pass' as
the inner loop body, while count_chars_lines adds two counters.

Given that very light per-line processing (counting lines and
characters) already increases the time considerably, I'm not sure I
buy the arguments that the I/O overhead is always considerable. The
fact that my change to fileinput.py didn't make a difference suggests
that its lack of speed is purely caused by the Python code.

Now what to do? I still don't like xreadlines very much, but I do see
that it can save some time. But my test doesn't confirm Neel's times
as posted by Tim:

> Slowest: for line in fileinput.input('foo'):     # Time 100
>        : while 1: line = file.readline()         # Time 75
>        : for line in LinesOf(open('foo')):       # Time 25
> Fastest: for line in file.readlines():           # Time 10
>          while 1: lines = file.readlines(hint)   # Time 10
>          for line in xreadlines(file):           # Time 10

I only see a factor of 3 between fastest and slowest, and readline is
only about 60% slower than readlines_sizehint.

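The ratios quoted in this message follow directly from the real-time column of the output. A small sketch of the arithmetic, in present-day Python 3; the dictionary and the `ratio` helper are illustrative names and not part of the timer program:

```python
# Real-time seconds as reported in the message.
timings = {
    "count_chars_lines": 7.944,
    "readlines_sizehint": 5.375,
    "using_fileinput": 15.861,
    "while_readline": 8.648,
}


def ratio(slower, faster):
    """How many times longer `slower` took than `faster`."""
    return timings[slower] / timings[faster]


if __name__ == "__main__":
    pairs = [
        ("using_fileinput", "count_chars_lines"),    # the "factor of 2"
        ("using_fileinput", "readlines_sizehint"),   # slowest vs. fastest
        ("while_readline", "readlines_sizehint"),    # readline vs. readlines
        ("count_chars_lines", "readlines_sizehint"), # cost of two counters
    ]
    for slower, faster in pairs:
        print("%-18s / %-18s = %.2f" % (slower, faster, ratio(slower, faster)))
```

The results are about 2.00, 2.95, 1.61 and 1.48 respectively, matching the "factor of 2", "factor of 3" and "about 60% slower" figures, and showing that two counter updates per line add nearly half again to the run time.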
--Guido van Rossum (home page: http://www.python.org/~guido/)

import time, fileinput, sys

def timer(func, *args):
    # Report wall-clock and CPU seconds spent in func(*args).
    t0 = time.time()
    c0 = time.clock()
    func(*args)
    t1 = time.time()
    c1 = time.clock()
    print "%-20s %6.3f %6.3f" % (func.__name__, t1-t0, c1-c0)

def count_chars_lines(fn, bs=1024*1024):
    # Same loop as readlines_sizehint, plus two counters per line.
    nl = 0
    nc = 0
    f = open(fn, "r")
    while 1:
        buf = f.readlines(bs)
        if not buf:
            break
        for line in buf:
            nl += 1
            nc += len(line)
    f.close()
    print "total", nc, "chars and", nl, "lines"

def readlines_sizehint(fn, bs=1024*1024):
    # Read blocks of lines of about bs bytes; do nothing per line.
    f = open(fn, "r")
    while 1:
        buf = f.readlines(bs)
        if not buf:
            break
        for line in buf:
            pass
    f.close()

def using_fileinput(fn):
    f = fileinput.FileInput(fn)
    for line in f:
        pass
    f.close()

def while_readline(fn):
    # One readline() call per line.
    f = open(fn, "r")
    while 1:
        line = f.readline()
        if not line: break
        pass
    f.close()

fn = "/home/guido/access_log"
if sys.argv[1:]:
    fn = sys.argv[1]
timer(count_chars_lines, fn)
timer(readlines_sizehint, fn, 1024*1024)
timer(using_fileinput, fn)
timer(while_readline, fn)

From guido at digicool.com Tue Jan 2 16:07:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 10:07:06 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: Your message of "Mon, 01 Jan 2001 15:27:37 EST." References: Message-ID: <200101021507.KAA12796@cj20424-a.reston1.va.home.com> > As a compromise, given that we're not going to take the time to be precise > (well, I'm sure not ...): > > The optional \keyword{else} clause is executed if and > when control flows off the end of the \keyword{try} > clause.\footnote{In Python 2.0, control "flows off the > end" except in case of exception, or executing a > \keyword{return}, \keyword{continue} or \keyword{break} > statement.} > Exceptions in the \keyword{else} clause are not handled by > the preceding \keyword{except} clauses. > > Now it's all of imprecise, almost precise, specific to Python 2.0, and > robust against any future changes . Sounds good to me. The reference to 2.0 could be changed to "Currently".
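The proposed wording for the else clause can be checked with a short script. This is a sketch in present-day Python 3 rather than anything posted to the list; run and else_raises are hypothetical helper names, and each returns a record of which clauses executed:

```python
def run(action):
    """Return the clauses that executed when the try body does `action`."""
    trace = []
    for _ in range(1):  # a loop, so that break and continue are legal
        try:
            trace.append("try")
            if action == "raise":
                raise ValueError("boom")
            if action == "break":
                break
            if action == "continue":
                continue
            if action == "return":
                return trace
        except ValueError:
            trace.append("except")
        else:
            # Reached only when control flows off the end of the try body.
            trace.append("else")
    return trace


def else_raises():
    """Show that an exception raised in else skips the sibling except."""
    try:
        try:
            pass
        except ValueError:
            return "handled by inner except"
        else:
            raise ValueError("raised in else")
    except ValueError:
        return "escaped to outer handler"


if __name__ == "__main__":
    for action in ("fall off end", "raise", "break", "continue", "return"):
        print(action, run(action))
    print(else_raises())
```

Only the "fall off end" case records "else"; the exception, break, continue and return cases all skip it, which matches the footnote in the quoted text.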
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 2 16:20:11 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 10:20:11 -0500 Subject: [Python-Dev] Re: curses in the core? In-Reply-To: Your message of "Thu, 28 Dec 2000 18:25:28 EST." <20001228182528.A10743@thyrsus.com> References: <200012282252.XAA18952@loewis.home.cs.tu-berlin.de> <20001228182528.A10743@thyrsus.com> Message-ID: <200101021520.KAA13222@cj20424-a.reston1.va.home.com> > What does being in the Python core mean? There are two potential definitions: > > 1. Documentation says it's available on all platforms. > > 2. Documentation restricts it to one of the three platform groups > (Unix/Windows/Mac) but implies that it will be available on any > OS in that group. > > I think the second one is closer to what application programmers > thinking about which batteries are included expect. But I could be > persuaded otherwise by a good argument. Actually, when *I* have used the term "core" I've typically thought of this as referring to anything that's in the standard source distribution, whether or not it is built on all platforms. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Tue Jan 2 09:42:30 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 00:42:30 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021456.JAA12633@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 02, 2001 at 09:56:40AM -0500 References: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Message-ID: <20010102004230.A29700@glacier.fnational.com> On Tue, Jan 02, 2001 at 09:56:40AM -0500, Guido van Rossum wrote: > Now what to do? I still don't like xreadlines very much, but I do see > that it can save some time. 
But my test doesn't confirm Neel's times > as posted by Tim: > > > Slowest: for line in fileinput.input('foo'): # Time 100 > > : while 1: line = file.readline() # Time 75 > > : for line in LinesOf(open('foo')): # Time 25 > > Fastest: for line in file.readlines(): # Time 10 > > while 1: lines = file.readlines(hint) # Time 10 > > for line in xreadlines(file): # Time 10 > > I only see a factor of 3 between fastest and slowest, and > readline is only about 60% slower than readlines_sizehint. Could it be that you're using the CVS version of Python which includes Andrew's cool glibc getline enhancement? Neil From guido at digicool.com Tue Jan 2 16:40:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 10:40:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 00:42:30 PST." <20010102004230.A29700@glacier.fnational.com> References: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> <20010102004230.A29700@glacier.fnational.com> Message-ID: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> [me] > > I only see a factor of 3 between fastest and slowest, and > > readline is only about 60% slower than readlines_sizehint. [Neil] > Could it be that you're using the CVS version of Python which > includes Andrew's cool glibc getline enhancement? Bingo!
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 2 17:34:31 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 11:34:31 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: <200101021507.KAA12796@cj20424-a.reston1.va.home.com> Message-ID: >> The optional \keyword{else} clause is executed if and >> when control flows off the end of the \keyword{try} >> clause.\footnote{In Python 2.0, control "flows off the >> end" except in case of exception, or executing a >> \keyword{return}, \keyword{continue} or \keyword{break} >> statement.} >> Exceptions in the \keyword{else} clause are not handled by >> the preceding \keyword{except} clauses. [Guido] > Sounds good to me. The reference to 2.0 could be changed to > "Currently". Cool. See http://sourceforge.net/bugs/?group_id=5470&func=detailbug&bug_id=127098 From tim.one at home.com Tue Jan 2 21:48:08 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 15:48:08 -0500 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom Message-ID: test_compare is broken because the expected-output file has bizarre stuff in it like: cmp(2, [1]) = -108 cmp(2, (2,)) = -116 cmp(2, None) = -78 What's up with that? I'll leave test_minidom to someone who thinks they know what it's doing. Both failures are very recent. From tim.one at home.com Tue Jan 2 21:48:09 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 15:48:09 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I only see a factor of 3 between fastest and slowest, and > readline is only about 60% slower than readlines_sizehint. [Neil] > Could it be that you're using the CVS version of Python which > includes Andrew's cool glibc getline enhancement? [Guido] > Bingo!
It's a good thing I haven't yet had time to try any speed tests myself, since I don't have a glibc-enabled platform so Guido and I may have been tempted to disagree about numbers in public . I checked out the source for glibc's getline. It's pulling the same trick Perl uses, copying directly from the stdio buffer when it can, instead of (like Python, and like almost all vendor fgets implementations) doing getc-in-a-loop. The difference is that Perl can't do that without breaking into the FILE* representation in platform-dependent ways. It's a shame that almost all vendors missed that fgets was defined as a primitive by the C committee precisely so that vendors *could* pull this speed trick under the covers. It's also a shame that Perl did it for them . From barry at digicool.com Tue Jan 2 22:56:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 2 Jan 2001 16:56:10 -0500 Subject: [Python-Dev] testing, please ignore Message-ID: <14930.20090.283107.799626@anthem.wooz.org> Sorry folks, just making sure things are working again. you-really-didn't-want-email-this-millennium-didja?-ly y'rs, -Barry From guido at python.org Tue Jan 2 21:59:22 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 15:59:22 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 14:59:24 EST." References: Message-ID: <200101022059.PAA14845@cj20424-a.reston1.va.home.com> > [Guido] > > I only see a factor of 3 between fastest and slowest, and > > readline is only about 60% slower than readlines_sizehint. > > [Neil] > > Could it be that your using the CVS version of Python which > > includes Andrew's cool glibc getline enhancement? > > [Guido] > > Bingo! > > It's a good thing I haven't yet had time to try any speed tests myself, > since I don't have a glibc-enabled platform so Guido and I may have been > tempted to disagree about numbers in public . 
> > I checked out the source for glibc's getline. It's pulling the same trick > Perl uses, copying directly from the stdio buffer when it can, instead of > (like Python, and like almost all vendor fgets implementations) doing > getc-in-a-loop. The difference is that Perl can't do that without breaking > into the FILE* representation in platform-dependent ways. It's a shame that > almost all vendors missed that fgets was defined as a primitive by the C > committee precisely so that vendors *could* pull this speed trick under the > covers. It's also a shame that Perl did it for them . Quite apart from whether we should enable xreadlines(), could you look into doing a similar thing for MSVC stdio? For most Unix platforms, a cop-out answer is "use glibc" -- but for Windows it may pay to do our own hack. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Tue Jan 2 22:06:05 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 2 Jan 2001 16:06:05 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 03:48:09PM -0500 References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> Message-ID: <20010102160605.A5211@kronos.cnri.reston.va.us> On Tue, Jan 02, 2001 at 03:48:09PM -0500, Tim Peters wrote: >into the FILE* representation in platform-dependent ways. It's a shame that >almost all vendors missed that fgets was defined as a primitive by the C >committee precisely so that vendors *could* pull this speed trick under the >covers. It's also a shame that Perl did it for them . So, should Python be changed to use fgets(), available on all ANSI C platforms, rather than the glibc-specific getline()? That would be more complicated than the brain-dead easy course of using getline(), which is obviously why I didn't do it; PyFile_GetLine() had annoyingly complicated logic. 
When this was discussed in comp.lang.python, someone also mentioned getc_unlocked(), which saves the overhead of locking the stream every time, but that didn't seem a fruitful avenue for exploration. --amk From tim.one at home.com Tue Jan 2 23:00:37 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 17:00:37 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101022059.PAA14845@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Quite apart from whether we should enable xreadlines(), could you look > into doing a similar thing for MSVC stdio? For most Unix platforms, a > cop-out answer is "use glibc" -- but for Windows it may pay to do our > own hack. There's no question about whether it would pay on Windows, because it pays big for Perl on Windows. The question is about cost. There's no way to *do* it short of the way Perl does it, which is to write a large pile of Windows-specific code (roughly the same size and complexity as the glibc getline implementation -- check it out, it's not trivial, and glibc exploits compiler inlining to make it bearable) relying on reverse-engineered accidents of how MS happens to use all the fields from this undocumented struct (from MS's stdio.h):

struct _iobuf {
        char *_ptr;
        int   _cnt;
        char *_base;
        int   _flag;
        int   _file;
        int   _charbuf;
        int   _bufsiz;
        char *_tmpfname;
        };
typedef struct _iobuf FILE;

in their stdio implementation. Else it won't play correctly with MS's stdio. That's A Project. Last year I tried extracting the relevant code from Perl, but, as is usual, gave up after unraveling the third (whatever) layer of mystery macros with no end in sight. I bet it would take me a week. Is it worth that much to you and DC? Since the real Windows experts are hanging out at ActiveState, I bet one of them will volunteer to do it tonight .
From tim.one at home.com Tue Jan 2 23:17:14 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 17:17:14 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010102160605.A5211@kronos.cnri.reston.va.us> Message-ID: [Tim] > It's a shame that almost all vendors missed that fgets was defined > as a primitive by the C committee precisely so that vendors *could* > pull this speed trick under the covers. It's also a shame that Perl > did it for them . [Andrew Kuchling] > So, should Python be changed to use fgets(), available on all ANSI C > platforms, rather than the glibc-specific getline()? That would be > more complicated than the brain-dead easy course of using getline(), > which is obviously why I didn't do it; PyFile_GetLine() had annoyingly > complicated logic. The thrust of my original comment above is that fgets is almost never faster than what Python is doing now, because vendors overwhelmingly do *not* exploit the opportunity the std gave them. So, no, switching to fgets() wouldn't help. > When this was discussed in comp.lang.python, someone also mentioned > getc_unlocked(), which saves the overhead of locking the stream every > time, but that didn't seem a fruitful avenue for exploration. Well, getc_unlocked isn't std (not even in C99). Mentioning it did inspire me to discover, however, that while the MS fgets() is the typical "getc in a loop" thing, at least it locks/unlocks the stream once each at function entry/exit, and uses a special MS flavor of getc ("_getc_lk") inside the loop. However, that this helps is an illusion, because the body of their _getc_lk macro is identical to the body of their getc macro. Smells like a bug, or an unfinished project.
From paulp at ActiveState.com Tue Jan 2 23:40:39 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 14:40:39 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: Message-ID: <3A5258E7.D52CA2C@ActiveState.com> Tim Peters wrote: > > There's no question about whether it would pay on Windows, because it pays > big for Perl on Windows. The question is about cost. There's no way to > *do* it short of the way Perl does it, which is to write a large pile of > Windows-specific code > ... Since the real Windows experts > are hanging out at ActiveState, I bet one of them will volunteer to do it > tonight . Mark is busy tonight and the Perl guys are still recovering from implementing it the first time. :) Paul From guido at python.org Tue Jan 2 23:46:00 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 17:46:00 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 16:06:05 EST." <20010102160605.A5211@kronos.cnri.reston.va.us> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> Message-ID: <200101022246.RAA16384@cj20424-a.reston1.va.home.com> > On Tue, Jan 02, 2001 at 03:48:09PM -0500, Tim Peters wrote: > >into the FILE* representation in platform-dependent ways. It's a shame that > >almost all vendors missed that fgets was defined as a primitive by the C > >committee precisely so that vendors *could* pull this speed trick under the > >covers. It's also a shame that Perl did it for them . > > So, should Python be changed to use fgets(), available on all ANSI C > platforms, rather than the glibc-specific getline()? That would be > more complicated than the brain-dead easy course of using getline(), > which is obviously why I didn't do it; PyFile_GetLine() had annoyingly > complicated logic. 
You mean get_line(), which indeed has a complicated API and corresponding logic: the argument may be a max length, or 0 to indicate arbitrary length, or negative to indicate raw_input() semantics. :-( Unfortunately we can't use fgets(), even if it were faster than getline(), because it doesn't tell how many characters it read. On files containing null bytes, readline() is supposed to treat these like any other character; if your input is "abc\0def\nxyz\n", the first readline() call should return "abc\0def\n". But with fgets(), you're left to look in the returned buffer for a null byte, and there's no way (in general) to distinguish this result from an input file that only consisted of the three characters "abc". getline() doesn't seem to have this problem, since its size is also an output parameter. > When this was discussed in comp.lang.python, someone also mentioned > getc_unlocked(), which saves the overhead of locking the stream every > time, but that didn't seem a fruitful avenue for exploration. I've never heard of getc_unlocked; it's not in the (old) C standard. If it's also a glibc thing, I doubt that using it would be faster than getline(). If it's a new C standard (C9x) thing, we'll have to wait. Fred reminded me that for e.g. Solaris, while everybody probably compiles with GCC, that doesn't mean they are using glibc, so in practice getline() will only help on Linux. I'm slowly warming up to xreadlines(), although we must be careful to consider the consequences (do other file-like objects need to support it too?). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 2 23:46:18 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 17:46:18 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <3A5258E7.D52CA2C@ActiveState.com> Message-ID: [Tim] > ...
Since the real Windows experts are hanging out at ActiveState, > I bet one of them will volunteer to do it tonight . [Paul Prescod] > Mark is busy tonight and the Perl guys are still recovering from > implementing it the first time. :) I'm delighted, then, that you have nothing better to do than tease the decent, hard-working folks on Python-Dev! I'll be up until about 4am -- feel free to submit your patch anytime before then. in-a-pinch-i'll-even-accept-it-tomorrow-ly y'rs - tim From guido at python.org Tue Jan 2 23:53:14 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 17:53:14 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 17:00:37 EST." References: Message-ID: <200101022253.RAA16482@cj20424-a.reston1.va.home.com> > [Guido] > > Quite apart from whether we should enable xreadlines(), could you look > > into doing a similar thing for MSVC stdio? For most Unix platforms, a > > cop-out answer is "use glibc" -- but for Windows it may pay to do our > > own hack. > > There's no question about whether it would pay on Windows, because it pays > big for Perl on Windows. The question is about cost. There's no way to > *do* it short of the way Perl does it, which is to write a large pile of > Windows-specific code (roughly the same size and complexity as the glibc > getline implementation -- check it out, it's not trivial, and glibc exploits > compiler inlining to make it bearable) relying on reverse-engineered > accidents of how MS happens to use all the fields from this undocumented > struct (from MS's stdio.h): > > struct _iobuf { > char *_ptr; > int _cnt; > char *_base; > int _flag; > int _file; > int _charbuf; > int _bufsiz; > char *_tmpfname; > }; > typedef struct _iobuf FILE; > > in their stdio implementation. Else it won't play correctly with MS's > stdio. That's A Project. 
Last year I tried extracting the relevant code > from Perl, but, as is usual, gave up after unraveling the third (whatever) > layer of mystery macros with no end in sight. I bet it would take me a > week. Is it worth that much to you and DC? Since the real Windows experts > are hanging out at ActiveState, I bet one of them will volunteer to do it > tonight . Yeah. That's too much. Too bad. I'm not holding my breath for ActiveState though. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Jan 2 23:52:58 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 2 Jan 2001 16:52:58 -0600 (CST) Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101022246.RAA16384@cj20424-a.reston1.va.home.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> Message-ID: <14930.23498.53540.401218@beluga.mojam.com> Guido> I'm slowly warming up to xreadlines(), ... I haven't followed this thread closely, and my brain is a bit frazzled at the moment, but is there some fundamental reason that the file object's readlines method can't be made lazy, perhaps only when given a sizehint? Skip From paulp at ActiveState.com Tue Jan 2 23:59:47 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 14:59:47 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> Message-ID: <3A525D63.17ABCC87@ActiveState.com> Skip Montanaro wrote: > > Guido> I'm slowly warming up to xreadlines(), ... 
> > I haven't followed this thread closely, and my brain is a bit frazzled at > the moment, but is there some fundamental reason that the file object's > readlines method can't be made lazy, perhaps only when given a sizehint? I suggested this at one point but it was pointed out that there is probably a lot of code that works with the resulting list *as a list* i.e. as a random-access, writable sequence object. I really wasn't thrilled with xreadlines at first either...it's the least of all possible evils (including the status quo). Paul From nas at arctrix.com Tue Jan 2 17:09:15 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 08:09:15 -0800 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 03:48:08PM -0500 References: Message-ID: <20010102080915.A30892@glacier.fnational.com> On Tue, Jan 02, 2001 at 03:48:08PM -0500, Tim Peters wrote: > test_compare is broken because the expected-output file has bizarre stuff in > it like: > > cmp(2, [1]) = -108 > cmp(2, (2,)) = -116 > cmp(2, None) = -78 > > What's up with that? My fault. I only ran regrtest.py and not "make test". I'm not sure why you say bizarre stuff though. Do you object to testing that 2 is less than None (something that is not part of the language spec) or do you think that the results from cmp() should be clamped between -1 and 1? Neil From guido at python.org Wed Jan 3 00:06:16 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 18:06:16 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 16:52:58 CST." 
<14930.23498.53540.401218@beluga.mojam.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> Message-ID: <200101022306.SAA16684@cj20424-a.reston1.va.home.com> > I haven't followed this thread closely, and my brain is a bit frazzled at > the moment, but is there some fundamental reason that the file object's > readlines method can't be made lazy, perhaps only when given a sizehint? Yes -- readlines() is documented to return a list, and some people do things to it that require it to be a real list (e.g. sort or reverse it or modify it in place or concatenate it with other lists). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jan 3 00:19:14 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 18:19:14 -0500 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: <20010102080915.A30892@glacier.fnational.com> Message-ID: [Tim] > test_compare is broken because the expected-output file has > bizarre stuff in it like: > > cmp(2, [1]) = -108 > cmp(2, (2,)) = -116 > cmp(2, None) = -78 > > What's up with that? [Neil Schemenauer] > My fault. I only ran regrtest.py and not "make test". Neil, my platform doesn't even *have* a "make": are you saying the test passes for you when you run regrtest.py? That's what I did. > I'm not sure why you say bizarre stuff though. Do you object to > testing that 2 is less than None (something that is not part of the > language spec) Only in part. Lang Ref 2.1.3 (Comparisons) says you can compare them, and guarantees they won't compare equal, but doesn't define it beyond that. 
If Python actually says "less", fine, we can test for that, although to minimize maintenance down the road it would be better to test for no more than we expect Python to guarantee across releases and implementations (suppose Jython says 2 is greater than None: that's fine too, and it would be better if the test suite didn't say Jython was broken). > or do you think that the results from cmp() should be clamped > between -1 and 1? Not that either ; cmp() isn't documented that way. They're "bizarre" simply because they're not what Python returns! C:\Code\python\dist\src\PCbuild>python Python 2.0 (#8, Dec 17 2000, 01:39:08) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> cmp(2, [1]) -1 >>> cmp(2, (2,)) -1 >>> cmp(2, None) -1 >>> The expected-output file is supposed to match what Python actually does. I have no idea where things like "-108" came from. So things like -108 look bizarre to me. So long as cmp(2, [1]) returns -1 in reality, an expected-output file that claims it returns -108 will never work no matter how you run the tests. One of us is missing something obvious here . From paulp at ActiveState.com Wed Jan 3 00:26:39 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 15:26:39 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> Message-ID: <3A5263AF.CE6C8C81@ActiveState.com> Guido van Rossum wrote: > > ... > > I'm slowly warming up to xreadlines(), although we must be careful to > consider the consequences (do other file-like objects need to support > it too?). The implementation is such that it is pretty easy to add the method to other file-like objects. It is also easy to use the xreadlines module to get the same behavior for objects that do not have the method. 
Essentially, file.xreadlines is implemented like this:

def xreadlines(self):
    import xreadlines
    return xreadlines.xreadlines(self)

Any object can add the method similarly. Paul Prescod From nas at arctrix.com Tue Jan 2 17:51:48 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 08:51:48 -0800 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 06:19:14PM -0500 References: <20010102080915.A30892@glacier.fnational.com> Message-ID: <20010102085148.A30986@glacier.fnational.com> On Tue, Jan 02, 2001 at 06:19:14PM -0500, Tim Peters wrote: > Neil, my platform doesn't even *have* a "make": are you saying the test > passes for you when you run regrtest.py? Yes. Isn't checking in code without running regrtest a capital offence? :) > Lang Ref 2.1.3 (Comparisons) says you can compare them, and > guarantees they won't compare equal, but doesn't define it beyond that. Okay, I'll use == rather than cmp(). When I was working on the coercion patch I found cmp() useful. I guess it shouldn't be in the standard test suite, especially since Jython may implement things differently. [Neil] > or, do you think that the results from cmp() should be clamped > between -1 and 1? [Tim] > Not that either ; cmp() isn't documented that way. > > They're "bizarre" simply because they're not what Python returns! They do on my box:

Python 2.0 (#19, Nov 21 2000, 18:13:04)
[GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> cmp(1, None)
-78

I guess MS uses a different strcmp than GNU. Do you mind trying the attached C code? I get "-78" as output. I should have thought a little more before checking in the patch. -78 is quite obviously a machine/library dependent thing. [Tim again] > One of us is missing something obvious here . I don't know about that. The implementation of coercion and comparison is not simple.
I've been studying it for some time now and I obviously still don't know what the hell is going on. AFAICT, the problem is that instances without a comparison method can compare larger or smaller than numbers depending on where in memory the objects are stored. Neil

#include <stdio.h>
#include <string.h>

int main()
{
        printf("%d\n", strcmp("", "None"));
}

From tim.one at home.com Wed Jan 3 01:30:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 19:30:26 -0500 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: <20010102085148.A30986@glacier.fnational.com> Message-ID: [Neil]

> They do on my box:
>
> Python 2.0 (#19, Nov 21 2000, 18:13:04)
> [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
> Type "copyright", "credits" or "license" for more information.
> >>> cmp(1, None)
> -78

Well, who cares about your silly box ? Messier than I thought! Yes, Windows strcmp is always in {-1, 0, 1}. Rather than run tests, here's the tail end of MS's strcmp.c:

        if ( ret < 0 )
                ret = -1 ;
        else if ( ret > 0 )
                ret = 1 ;

        return( ret );

Wasted cycles and stupid formatting . > ... > AFAICT, the problem is that instances without a comparison method can > compare larger or smaller than numbers depending on where in memory > the objects are stored. If so, that's a bug ... OK, it *is* a bug, at least in current CVS. Did you cause that, or was it always this way? I was able to provoke this badness:

>>> j < c < i
1
>>> j < i
0
>>>

i.e. it violates transitivity, and that's never supposed to happen in the absence of user-supplied __cmp__. Here c is an instance of "class C: pass", and i and j are ints.

>>> type(i), type(j), type(c)
(<type 'int'>, <type 'int'>, <type 'instance'>)
>>> i, j, c
(999999, 1000000, <__main__.C instance at 00791B7C>)
>>> id(i), id(j), id(c)
(7941572, 7744676, 7936892)
>>>

Guido thought he fixed this kind of stuff once (and I believed him ) by treating all numbers as if they had type name "" (i.e., yes, an empty string) when compared to non-numbers.
Then the usual "mixed-type comparisons in the absence of __cmp__ compare via type name string" rule ensured that numbers would always compare "less than" instances of any other type. That's the intent of the tail end:

        else if (vtp->tp_as_number != NULL)
                vname = "";
        else if (wtp->tp_as_number != NULL)
                wname = "";
        /* Numerical types compare smaller than all other types */
        return strcmp(vname, wname);

of PyObject_Compare. So, in the example above, we *should* have

    i < c == 1
    j < c == 1
    j < c < i == 0

Unfortunately, we actually have

    i < c == 0

in that example. We're apparently not getting to the "number hack" code because c is an instance, and I'll confess up front that my eyes always glazed over long before I got to PyInstance_HalfBinOp <0.half wink>. Whatever, there's at least one bug somewhere in that path! We should have

    n < i == 1

for any numeric type n and any non-numeric type i (in the absence of user-defined __cmp__). From skip at mojam.com Wed Jan 3 02:27:03 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 2 Jan 2001 19:27:03 -0600 (CST) Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <3A525D63.17ABCC87@ActiveState.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> <3A525D63.17ABCC87@ActiveState.com> Message-ID: <14930.32743.525564.69044@beluga.mojam.com> Paul> I suggested this at one point but it was pointed out that there is Paul> probably a lot of code that works with the resulting list *as a Paul> list* How about this idea? What if readlines() was allowed to return a lazy evaluator if a sizehint > 0 was given?
I only saw one example outside of test cases in the current CVS tree
where readlines(sizehint) was used (Tools/idle/GrepDialog.py), and it
used it as expected:

    while 1:
        block = f.readlines(sizehint)
        if not block:
            break
        for line in block:
            more stuff

My suspicion is that most uses of sizehint will be like this.  It hasn't
been around all that long in Python-years (since 1.5a2), so there's
probably not tons of code to break (I agree the semantics would change),
and the majority of code that uses it probably looks like the above,
which is almost safe (if it returned "" instead of an empty evaluator
when nothing was left to read it would be safe).

The advantage would be that the above could become the more obvious

    for line in f.readlines(sizehint):
        more stuff

and the change to file reading code that is "too slow" becomes much
simpler.  (Of course, xreadlines() has that advantage as well.)

I scanned my own code quickly.  I found about 10 uses with sizehint and
300 without.

I presume we are talking about 2.1 here.  In any case, it seems to me
that in Py3k readlines should be lazy.

Skip

P.S. Why did FileInput class never grow a readlines method?

From nas at arctrix.com Tue Jan 2 20:38:53 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Tue, 2 Jan 2001 11:38:53 -0800
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 07:30:26PM -0500
References: <20010102085148.A30986@glacier.fnational.com>
Message-ID: <20010102113853.A31341@glacier.fnational.com>

On Tue, Jan 02, 2001 at 07:30:26PM -0500, Tim Peters wrote:
> > AFAICT, the problem is that instances without a comparison method can
> > compare larger or smaller than numbers depending on where in memory
> > the objects are stored.
>
> If so, that's a bug ... OK, it *is* a bug, at least in current CVS.  Did you
> cause that, or was it always this way?

To quote Bart Simpson: I didn't do it.  I'm pretty sure the bug
is in PyInstance_DoBinOp.
I don't think it's worth fixing though.
I'm ready to check in my coercion overhaul patch, assuming no
vetoes from the list.  It should fix this bug (and introduce a
whole slew of new ones :).

Guido suggested that I remove the "number types compare smaller
than other types" behavior.  What's your take on that?  The
current patch on SF always uses the type names.  It should be
easy to implement the old behavior though.

  Neil

From nas at arctrix.com Tue Jan 2 20:48:09 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Tue, 2 Jan 2001 11:48:09 -0800
Subject: [Python-Dev] Applying the PEP 208 (coercion overhaul) patch
Message-ID: <20010102114809.B31341@glacier.fnational.com>

I'm almost ready to apply SF patch #102652.  Guido has given the
okay assuming there are no objections from the rest of
python-dev.  The patch is large and modifies some complicated
parts of the interpreter.  I expect there will be some bugs.  If
you would like me to wait, speak now.

Guido has sent me some comments on the patch today which I plan
to review and address tonight.  I will probably apply the patch
tomorrow evening.

  Neil

From tim.one at home.com Wed Jan 3 04:05:59 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 2 Jan 2001 22:05:59 -0500
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: <20010102113853.A31341@glacier.fnational.com>
Message-ID:

[Neil Schemenauer, on a violation of transitivity
     j < c < i
 but not
     j < i]

> To quote Bart Simpson: I didn't do it.  I'm pretty sure the bug
> is in PyInstance_DoBinOp.  I don't think it's worth fixing though.
> I'm ready to check in my coercion overhaul patch, assuming no
> vetoes from the list.  It should fix this bug (and introduce a
> whole slew of new ones :).

Sounds good to me!

> Guido suggested that I remove the "number types compare smaller
> than other types" behavior.  What's your take on that?  The
> current patch on SF always uses the type names.  It should be
> easy to implement the old behavior though.
It doesn't matter that they're specifically smaller, it matters that they can't violate transitivity. "numbers compare smaller" was introduced deliberately (by Guido) because, e.g., before that we had 99 < [99] < 99L despite that 99 == 99L, because "int" < "list" < "long int" Even stranger, we had 100 < [99] < 0L < 100 and 100 < [] < -101L < -100 Making numbers compare smaller than other types is one way to ensure stuff like that can't happen; I can't think of a simpler way (although making them compare larger than other types would be equally simple, as would making them compare as if their type name were "Neil" ). From paulp at ActiveState.com Wed Jan 3 04:34:59 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 19:34:59 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> <3A525D63.17ABCC87@ActiveState.com> <14930.32743.525564.69044@beluga.mojam.com> Message-ID: <3A529DE3.D93C3916@ActiveState.com> Skip Montanaro wrote: > >... > > I presume we are talking about 2.1 here. In any case, it seems to me that > in Py3k readlines should be lazy. I agree, but I'm ambivalent about your suggestion for polymorphic return values from readlines(). Yet another option is a "lazy=1" option. Paul Prescod From tim.one at home.com Wed Jan 3 05:33:29 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 23:33:29 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Message-ID: [Guido, writes a timing program] [Jeff, if you weren't copied on all this stuff, you can play catch-up by reading the archives, at http://mail.python.org/pipermail/python-dev/ ] > ... > I am including the timer program below my signature. 
The test input > was the current access_log of dinsdale.python.org, which has about 119 > Mbytes and 1M lines (as counted by the test program). For a contrast, I cobbled together a large test file out of various chunks of C source, .py source, HTML source, and email archives. I was shooting for the same size you used (~119Mb), but ended up with more than 3x as many lines. > I measure about a factor of 2 between readlines with a sizehint (of 1 > MB) and fileinput; Factor of 7 here (Jeff, NeilS eventually figured out that Guido was using a CVS version of Python that has AndrewK's glibc getline patch, a zippier line-input routine than Python 2.0 has; but it only applies to platforms using glibc). > ... > Output (the first time is realtime seconds, the second CPU seconds): > > total 119808333 chars and 1009350 lines > count_chars_lines 7.944 7.890 > readlines_sizehint 5.375 5.320 > using_fileinput 15.861 15.740 > while_readline 8.648 8.570 > > This was on a 600 MHz Pentium-III Linux box (RH 6.2). total 117615824 chars and 3237568 count_chars_lines 14.780 14.772 readlines_sizehint 9.390 9.375 using_fileinput 66.130 66.157 while_readline 30.380 30.337 866 MHz P3 Win98SE, current CVS Python. I have no handy explanation for why clock() and time() differ on my box (Win98 has no notions of "user time" or "CPU time" distinct from clock time). > Note that count_chars_lines and readlines_sizehint use the same > algorithm -- the difference is that readlines_sizehint uses 'pass' as > the inner loop body, while count_chars_lines adds two counters. > > Given that very light per-line processing (counting lines and > characters) already increases the time considerably, I'm not sure I > buy the arguments that the I/O overhead is always considerable. I disagree that this is "very light processing", although I agree it's hard to think of lighter processing : it's a few Python statements per line, which I'd say is pretty *typical* processing. 
Read a line, run a string find or regexp search on it, test the result,
sometimes fiddle the line accordingly and sometimes not.  File-crunching
apps generally aren't rocket science!  For example, I changed
count_chars_lines to tally the number of lines containing the string
"Guido" instead, and the runtime went up by just 0.8 seconds (BTW, it
found 13808 of them ):  if you're thinking in C terms, millions of
failing searches for "Guido" may seem like more work, but the number of
Python stmts executed usually counts more than what the stmts do at the
C level.

> ...
> Now what to do?  I still don't like xreadlines very much, but I do
> see that it can save some time.  But my test doesn't confirm Neel's
> times as posted by Tim:
>
>> Slowest: for line in fileinput.input('foo'):     # Time 100
>>        : while 1: line = file.readline()         # Time 75
>>        : for line in LinesOf(open('foo')):       # Time 25
>> Fastest: for line in file.readlines():           # Time 10
>>          while 1: lines = file.readlines(hint)   # Time 10
>>          for line in xreadlines(file):           # Time 10
>
> I only see a factor of 3 between fastest and slowest, and
> readline is only about 60% slower than readlines_sizehint.

I don't know what Neel used for an input file, or which platform he used
either.  And this is bound to vary a lot across platforms.  As above, I
saw a factor of 7 between fastest and slowest and a factor of 3 between
readline and readlines_sizehint.

BTW, on my platform the Perl script (using a recent ActiveState Windows
Perl)

open(FILE, "ga.txt");
while (<FILE>) {
    1;
}

ran in about 6 seconds (I never figured how to get Perl to compute usable
timings itself)-- substantially faster than even readlines_sizehint! --and
changing the body to

$nc = $nl = 0;
while (<FILE>) {
    ++$nl;
    $nc += length;
}
print "$nc $nl\n";

boosted that to about 8 seconds.  So Perl has gotten zippier too over the
years.
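
For anyone who wants to replay the comparison, here is a minimal sketch of the two loop shapes that appear in the timings as while_readline and readlines_sizehint. It is written for present-day Python 3, not the 2.0-era interpreters measured in this thread, and it is not Guido's timer program. The function names and the make_sample() helper are made up for illustration, and no timings are claimed for it.

```python
# Illustrative sketch only: modern Python 3, not the interpreters timed above.
import os
import tempfile


def count_with_readline(path):
    """Line-at-a-time loop (the 'while_readline' shape)."""
    nlines = nchars = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:              # empty string signals EOF
                break
            nlines += 1
            nchars += len(line)
    return nchars, nlines


def count_with_sizehint(path, sizehint=1 << 20):
    """Block-of-lines loop (the 'readlines_sizehint' shape)."""
    nlines = nchars = 0
    with open(path) as f:
        while True:
            block = f.readlines(sizehint)   # about sizehint worth of whole lines
            if not block:             # empty list signals EOF
                break
            for line in block:
                nlines += 1
                nchars += len(line)
    return nchars, nlines


def make_sample(nlines=1000):
    """Hypothetical helper: write a small throwaway test file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(nlines):
            f.write("line %d of the sample\n" % i)
    return path
```

Both functions should return the same (chars, lines) pair for the same file; only the number of calls into the file object differs, which is the cost the thread is arguing about.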
From tim.one at home.com Wed Jan 3 10:32:55 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 3 Jan 2001 04:32:55 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <200101022253.RAA16482@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido & Tim, wonder about faking getline-like functionality for Windows]

The attached is kinda baffling.  The std tests pass with it, and it
changes my test timings from:

count_chars_lines     14.780 14.772
readlines_sizehint     9.390  9.375
using_fileinput       66.130 66.157
while_readline        30.380 30.337

to:

count_chars_lines     14.880 14.854
readlines_sizehint     9.280  9.302
using_fileinput       48.610 48.589
while_readline        13.450 13.451

Big win?  You bet.  But ...

The baffling parts:

1. That Perl still takes only 6 seconds in line-at-a-time mode.

2. I originally wrote a getline workalike, instead of building
   directly into a PyString buffer.  That made my test run *slower*,
   and I'm talking factor of 2, not a yawn.  To judge from my usually
   silent disk (I've got 256Mb RAM on this box), I'm afraid the extra
   mallocs required may have triggered the horrid Win9x
   malloc-thrashing problem I wrote about while I was still at Dragon.
   Consider that another vote for Vlad's PyMalloc -- we've got no
   handle on x-platform dynamic memory behavior now.  Python's destiny
   is to replace both the platform OS and libc anyway <0.9 wink>.

The scary parts:

+ As the "XXX" comments indicate, this is full of little
  insecurities.

+ Another one I just thought of:  if the user's last operations on
  the fp were two or more consecutive ungetc calls, all bets are off.
  But then MS doesn't define what happens then either.

+ This is much less ambitious than I recall Perl's code being:  it
  doesn't try to guess anything about the file, and effectively
  captures only what would happen if you could unroll the guts of a
  getc-in-a-loop and optimize the snot out of it.
  The good news is that this means it's much easier to maintain (it
  touches only two of the MS FILE* fields, and in ways that are pretty
  obviously correct).  The bad news is that this seems also pretty
  clearly all there *is* to be gotten out of breaking into the FILE*
  abstraction for the particular test case I'm using; and increasing
  TUNEME doesn't save any time at all:  the sucker is flying at full
  speed already.

+ It (line-at-a-time) drops to a little under 13 seconds if I
  comment out the thread macros.

+ I haven't looked at Perl's implementation in a year, and they must
  have dreamt up another trick since then.  That's a "scary part"
  indeed to anyone who has ever looked at Perl's implementation.

retreating-into-a-fetal-position-ly y'rs - tim

Anyone wants to play, the sandbox is fileobject.c.  Do two things:
insert this new chunk somewhere above get_line:

#ifdef MS_WIN32
static PyObject*
win32_getline(FILE *fp)
{
    /* XXX ignores thread safety -- but so does MS's getc macro! */
    PyObject* v;
    char* pBuf;     /* next free slot in v's buffer */
    /* MS's internals are declared in terms of ints, but it's a sure bet
     * that won't last forever -- use size_t now & live w/ the casting;
     * ditto for Python's routines
     */
    size_t total_buf_size = 100;
    size_t free_buf_size = total_buf_size;
#define TUNEME 1000 /* how much to boost the string buffer when exhausted */

    v = PyString_FromStringAndSize((char *)NULL, (int)total_buf_size);
    if (v == NULL)
        return NULL;
    pBuf = BUF(v);
    Py_BEGIN_ALLOW_THREADS
    for (;;) {
        char ch;
        size_t ms_cnt;  /* FILE->_cnt shadow */
        char* ms_ptr;   /* FILE->_ptr shadow */
        size_t max_to_copy, i;
        /* stdio buffer empty or in unknown state; rather
         * than try to simulate every quirk of MS's internals,
         * let the MS macros deal with it.
         */
        /* XXX we also wind up here when we simply run out of string
         * XXX buffer space, but I'm not sure I care:  making this a
         * XXX double-nested loop doesn't seem worth it
         */
        ch = getc(fp);
        if (ch == EOF)
            break;
        /* make sure we've got some breathing room */
        if (free_buf_size < 100) {
            size_t currentoffset = pBuf - BUF(v);
            total_buf_size += TUNEME;  /* XXX check for overflow */
            Py_BLOCK_THREADS
            if (_PyString_Resize(&v, (int)total_buf_size) < 0)
                return NULL;
            Py_UNBLOCK_THREADS
            pBuf = BUF(v) + currentoffset;
            free_buf_size = TUNEME;
        }
        /* ch wasn't EOF, so store it */
        *pBuf++ = ch;
        --free_buf_size;
        if (ch == '\n') {
            break;
        }
        ms_cnt = (size_t)fp->_cnt;
        if (!ms_cnt) {
            /* XXX this is a slow way to read one character at
             * XXX a time if, e.g., the stream is unbuffered
             */
            continue;
        }
        /* payback!  now we don't have to check for buffer overflows or
         * EOF inside the loop, nor does the macro _filbuf() branch force
         * _ptr and _cnt in and out of memory on each iteration
         */
        ms_ptr = fp->_ptr;
        assert(ms_cnt > 0);
        i = max_to_copy = ms_cnt < free_buf_size ? ms_cnt : free_buf_size;
        do {
            /* XXX unclear to me why MS's getc macro does "& 0xff" */
            *pBuf++ = ch = *ms_ptr++ & 0xff;
        } while (--i && ch != '\n');
        /* update the shadows & counters */
        fp->_ptr = ms_ptr;
        free_buf_size -= max_to_copy - i;
        fp->_cnt = ms_cnt - (max_to_copy - i);
        if (ch == '\n')
            break;
    }
    Py_END_ALLOW_THREADS
    _PyString_Resize(&v, pBuf - BUF(v));
    return v;
}
#endif

2. Within get_line, add this before the #endif (this is the getline
   #if block):

#elif defined(MS_WIN32)
    if (n == 0) {
        return win32_getline(fp);
    }

From ping at lfw.org Wed Jan 3 12:40:47 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Wed, 3 Jan 2001 05:40:47 -0600 (CST)
Subject: [Python-Dev] inspect.py
In-Reply-To: <14840.19556.127151.457533@anthem.concentric.net>
Message-ID:

Uh... hi.

I know i've all but dropped out of existence for a long time, what
with my simultaneous first stints as a grad student, a teaching
assistant, and a house cook (!)
and all, but i didn't want to let this work go to waste. Now that the holidays are here i can *finally* try to get some work done! So, i've updated inspect.py in response to Barry's comments, and below is my reply to this old thread. I also wrote some regression tests. I tried to submit inspect.py to SourceForge, but i got: ERROR Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 Does anyone know what's going on with that? Anyway, the latest module and regression tests are available at: http://www.lfw.org/python/inspect.py http://www.lfw.org/python/test_inspect.py for your perusal. On Thu, 26 Oct 2000 barry at wooz.org wrote: > Some thoughts after an initial scan of inspect.py: > > - The doc strings for the is*() functions aren't accurate. > E.g. ismodule() says that it asks whether "the object is a module > with the __file__ special attribute", but that isn't really what it > tests! Guido points out that builtin modules don't currently have > __file__ and besides, you're really testing that the type of the > object is ModuleType. Perhaps a different wording would be better, but i should at least clarify the intention: i wrote them that way because it seemed that the current objects export an unofficial "interface" by means of the special attributes they provide. The purpose of the "is*()" functions is to determine whether an object meets one of these interfaces. A complete interface would provide (1) a type-checker, (2) a constructor, and (3) the methods. As for (2), we don't normally allow construction of these things (except for wizards using the newmodule). As for (3), i suppose that one could further encapsulate these interfaces by providing spelled-out methods like "def getcode(f): return f.func_code", but it didn't seem worth the trouble. So that left just (1), and i had the other parts in mind while trying to describe (1). 
The type-checkers aren't of much use unless they accurately reflect the availability of the special attributes. Do you see what i'm trying to do? Maybe you can suggest a better way of doing it... anyway, i've tried to compromise in the docstrings as submitted. > - Don't make the predicate in getmembers() default to "lambda x: 1" > Instead make the default None, and skip the predicate test if it is > None. Okay, fine. > - getdoc()'s docstring should describe the margin munging it does. Okay, done. > - findsource() seems off-by one, e.g. > > >>> x = inspect.findsource(inspect.findsource) > >>> x[1] > 138 > > but the function really stars on line 139. 138 was the intended result here. Indeed the function starts on line 139 if you start counting from 1. The reason it returns 138 is that it's the index you would use for the array of lines (thus x[0][x[1]] or file.readlines()[138] is the first line of the function). Which way makes more sense? Should it be changed? > - I notice that currentframe() still uses the try/except trick to get > the frame object. It's much more efficient to provide a C > trampoline for getting that information. Sure, if there's a faster way, that's fine. It just wasn't something i expected to be used really often, and i wanted to write the module in pure Python so it could be easily maintained. I added a line to clobber the pure-Python currentframe() with sys._getframe() if it exists. > - If this were included in the library, we might want to 2.0-ify it. It currently doesn't rely on any 2.0 features, and it would be kind of nice to have it still work with 1.5 (especially if it is part of a drop-in documentation tool, as it is now, since it goes with htmldoc). -- ?!ng "Computers are useless. They can only give you answers." 
-- Pablo Picasso From guido at python.org Wed Jan 3 13:06:33 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 07:06:33 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Apparently getc_unlocked() is in the Single Unix spec. Not sure how widespread that is -- do Linux developers pay attention to this standard at all? According to the webpage it's (c) 1997. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 03 Jan 2001 10:58:44 +0200 From: Erno Kuusela To: guido at python.org Subject: getc_unlocked note hello, i was reading the python-dev archives and saw that someone had noticed my getline/getc_unlocked post from the newsgroup. a correction to the python-dev thread: getc_unlocked and friends are infact standard (not c99 though since c99 doesn't specify threads); they are part of the single unix specification. link: http://www.opennc.org/onlinepubs/007908799/xsh/getc_unlocked.html -- erno ------- End of Forwarded Message From guido at python.org Wed Jan 3 13:37:11 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 07:37:11 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 03 Jan 2001 04:32:55 EST." References: Message-ID: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> > 1. That Perl still takes only 6 seconds in line-at-a-time mode. Are you sure Perl still uses stdio at all? If so, does it open the file in binary or in text mode? Based on the APIs in MS's libc, I presume that the crlf->lf translation is not done by stdio proper but by the Unix I/O emulation just underneath it (open() has an O_BINARY option flag, so read() probably does the translation). That comes down to copying most bytes an extra time. 
(To test this hypothesis, you could try to open the test file with
mode "rb" and see if it makes a difference.)

> 2. I originally wrote a getline workalike, instead of building
>    directly into a PyString buffer.  That made my test run *slower*,
>    and I'm talking factor of 2, not a yawn.  To judge from my usually
>    silent disk (I've got 256Mb RAM on this box), I'm afraid the extra
>    mallocs required may have triggered the horrid Win9x
>    malloc-thrashing problem I wrote about while I was still at Dragon.
>    Consider that another vote for Vlad's PyMalloc -- we've got no
>    handle on x-platform dynamic memory behavior now.  Python's destiny
>    is to replace both the platform OS and libc anyway <0.9 wink>.
>
> The scary parts:
>
> + As the "XXX" comments indicate, this is full of little
>   insecurities.

My biggest worry: thread-safety.  There must be a way to lock the file
(you indicated that fgets() uses it).

> + Another one I just thought of:  if the user's last operations on
>   the fp were two or more consecutive ungetc calls, all bets are off.
>   But then MS doesn't define what happens then either.

Python doesn't have an interface to ungetc(), and I believe the stdio
standard says you can only call ungetc() once consecutively.  Assuming
other C code linked with Python obeys this rule (a pretty safe
assumption), we should be fine.  And if the assumption is violated, I
presume it's really that C code's fault -- plus, code that only uses
getc() would be screwed just as badly.

> + This is much less ambitious than I recall Perl's code being:  it
>   doesn't try to guess anything about the file, and effectively
>   captures only what would happen if you could unroll the guts of a
>   getc-in-a-loop and optimize the snot out of it.  The good news is
>   that this means it's much easier to maintain (it touches only two of
>   the MS FILE* fields, and in ways that are pretty obviously correct).
>   The bad news is that this seems also pretty clearly all there *is*
>   to be gotten out of breaking into the FILE* abstraction for the
>   particular test case I'm using; and increasing TUNEME doesn't save
>   any time at all:  the sucker is flying at full speed already.

You probably don't have many lines longer than 1000 characters.

> + It (line-at-a-time) drops to a little under 13 seconds if I
>   comment out the thread macros.

If you mean the Py_BLOCK_THREADS around the resize, that can be safely
dropped.  (If/when we introduce Vladimir's malloc, we'll have to
decide whether it is threadsafe by itself or whether it requires the
global interpreter lock.  I vote to make it threadsafe by itself.)

> + I haven't looked at Perl's implementation in a year, and they must
>   have dreamt up another trick since then.  That's a "scary part"
>   indeed to anyone who has ever looked at Perl's implementation.
>
> retreating-into-a-fetal-position-ly y'rs - tim
>
>
> Anyone wants to play, the sandbox is fileobject.c.  Do two things:
> insert this new chunk somewhere above get_line:
>
> #ifdef MS_WIN32
> static PyObject*
> win32_getline(FILE *fp)
> {
>     /* XXX ignores thread safety -- but so does MS's getc macro! */
>     PyObject* v;
>     char* pBuf;     /* next free slot in v's buffer */
>     /* MS's internals are declared in terms of ints, but it's a sure bet
>      * that won't last forever -- use size_t now & live w/ the casting;
>      * ditto for Python's routines
>      */
>     size_t total_buf_size = 100;
>     size_t free_buf_size = total_buf_size;
> #define TUNEME 1000 /* how much to boost the string buffer when exhausted */
>
>     v = PyString_FromStringAndSize((char *)NULL, (int)total_buf_size);
>     if (v == NULL)
>         return NULL;
>     pBuf = BUF(v);
>     Py_BEGIN_ALLOW_THREADS
>     for (;;) {
>         char ch;
>         size_t ms_cnt;  /* FILE->_cnt shadow */
>         char* ms_ptr;   /* FILE->_ptr shadow */
>         size_t max_to_copy, i;
>         /* stdio buffer empty or in unknown state; rather
>          * than try to simulate every quirk of MS's internals,
>          * let the MS macros deal with it.
>          */
>         /* XXX we also wind up here when we simply run out of string
>          * XXX buffer space, but I'm not sure I care:  making this a
>          * XXX double-nested loop doesn't seem worth it
>          */
>         ch = getc(fp);
>         if (ch == EOF)
>             break;
>         /* make sure we've got some breathing room */
>         if (free_buf_size < 100) {
>             size_t currentoffset = pBuf - BUF(v);
>             total_buf_size += TUNEME;  /* XXX check for overflow */
>             Py_BLOCK_THREADS
>             if (_PyString_Resize(&v, (int)total_buf_size) < 0)
>                 return NULL;
>             Py_UNBLOCK_THREADS
>             pBuf = BUF(v) + currentoffset;
>             free_buf_size = TUNEME;
>         }
>         /* ch wasn't EOF, so store it */
>         *pBuf++ = ch;
>         --free_buf_size;
>         if (ch == '\n') {
>             break;
>         }
>         ms_cnt = (size_t)fp->_cnt;
>         if (!ms_cnt) {
>             /* XXX this is a slow way to read one character at
>              * XXX a time if, e.g., the stream is unbuffered
>              */
>             continue;
>         }
>         /* payback!  now we don't have to check for buffer overflows or
>          * EOF inside the loop, nor does the macro _filbuf() branch force
>          * _ptr and _cnt in and out of memory on each iteration
>          */
>         ms_ptr = fp->_ptr;
>         assert(ms_cnt > 0);
>         i = max_to_copy = ms_cnt < free_buf_size ? ms_cnt : free_buf_size;

Doesn't it make more sense to delay the resize until this point?  I
don't know how much the character copying accounts for, but I could
imagine a strategy based on memchr() and memcpy() that first searches
for a \n, and if found, allocates to the right size before copying.
Typically, the buffer contains many lines, so this could be optimized
into requiring a single exactly-sized malloc() call in the common case
(where the buffer doesn't wrap).  But possibly scanning the buffer for
\n and then copying the bytes separately, even with memchr() and
memcpy(), slows things down too much for this to be faster.

>         do {
>             /* XXX unclear to me why MS's getc macro does "& 0xff" */
>             *pBuf++ = ch = *ms_ptr++ & 0xff;

I know why.  getchar() returns an int in the range [-1, 255].  If
chars are signed the &0xff is needed else you would get a return in
the range [-128, 127] and -1 would be ambiguous (EOF==-1).  Not sure
if they *are* signed on any MS platform -- if they aren't, whoever
coded this wasn't thinking -- on the other hand the compiler probably
optimizes it out.  But here since you're copying to another character,
it's pointless.

>         } while (--i && ch != '\n');
>         /* update the shadows & counters */
>         fp->_ptr = ms_ptr;
>         free_buf_size -= max_to_copy - i;
>         fp->_cnt = ms_cnt - (max_to_copy - i);
>         if (ch == '\n')
>             break;
>     }
>     Py_END_ALLOW_THREADS
>     _PyString_Resize(&v, pBuf - BUF(v));
>     return v;
> }
> #endif
>
> 2. Within get_line, add this before the #endif (this is the getline
>    #if block):
>
> #elif defined(MS_WIN32)
>     if (n == 0) {
>         return win32_getline(fp);
>     }

Note that get_line() with negative n could be implemented as
get_line(0) with some post processing.  This should be done completely
separately, in PyFile_GetLine.  The negative n case is only used by
raw_input() -- it means strip the \n and raise EOFError for EOF, and I
expect that this is rarely if ever used in a speed-conscious
situation.
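
The post-processing described in the last paragraph can be sketched in a few lines. This is an illustration in present-day Python 3, not the C that would go into PyFile_GetLine; get_line_raw() and get_line_stripped() are hypothetical names standing in for get_line(fp, 0) and the negative-n case, and the EOFError message text is an assumption.

```python
# Illustrative sketch only; the names below are not CPython internals.
import io


def get_line_raw(fp):
    """Stand-in for get_line(fp, 0): next line with its newline,
    or an empty string at EOF."""
    return fp.readline()


def get_line_stripped(fp):
    """Stand-in for the negative-n case: do the plain read, then
    post-process (strip one trailing newline, raise EOFError at EOF)."""
    line = get_line_raw(fp)
    if not line:
        raise EOFError("EOF when reading a line")
    if line.endswith("\n"):
        line = line[:-1]
    return line
```

The point of the split is that the fast path (n == 0) never has to know about the raw_input() special case; the slow wrapper pays for the extra checks by itself.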
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 3 15:56:31 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 09:56:31 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 03 Jan 2001 07:06:33 EST." <200101031206.HAA19182@cj20424-a.reston1.va.home.com> References: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Message-ID: <200101031456.JAA19990@cj20424-a.reston1.va.home.com> > Apparently getc_unlocked() is in the Single Unix spec. Not sure how > widespread that is -- do Linux developers pay attention to this > standard at all? According to the webpage it's (c) 1997. Erno Kuusela gave me some more info about this; glibc supports it. I did a quick test which suggests that it is a lot faster than regular getc() -- on a small test file it's actually faster than GNU getline(), even with the proper flockfile() / funlockfile() calls. (The test file was 6Mb -- 10 copies of /etc/termcap, which has short lines -- avg 43 chars.) This together with Tim's Win32x specific hacks might be the best we can do for get_line(). However, raw xreadlines is still almost twice as fast, so it's still under consideration. Maybe MS supports a similar unlocked getc macro, and a separate primitive to lock/unlock a file? That would allow more unified code. (Quick research shows that it exists, but only in internal form. We could probably call _lock_file() and _unlock_file(), and define our own getc_lk(), protected by the proper set of macros. This could all be presented by config.h as flockfile(), funlockfile(), and getc_unlocked() macros.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Wed Jan 3 16:27:09 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 10:27:09 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101031206.HAA19182@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 07:06:33AM -0500 References: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Message-ID: <20010103102709.A19451@kronos.cnri.reston.va.us> On Wed, Jan 03, 2001 at 07:06:33AM -0500, Guido van Rossum wrote: >Apparently getc_unlocked() is in the Single Unix spec. Not sure how >widespread that is -- do Linux developers pay attention to this >standard at all? According to the webpage it's (c) 1997. It seems to be in glibc 2.1, but I don't know how much it would help, and the added complexity of having to lock the file separately worries me, perhaps due to a superstitious fear of angering the Thread Gods. --amk From akuchlin at mems-exchange.org Wed Jan 3 16:44:57 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 10:44:57 -0500 Subject: [Python-Dev] Help wanted with setup.py script In-Reply-To: <017201c0759a$c2b180c0$e000a8c0@thomasnotebook>; from thomas.heller@ion-tof.com on Wed, Jan 03, 2001 at 04:35:10PM +0100 References: <200012290046.TAA01346@207-172-57-128.s128.tnt2.ann.va.dialup.rcn.com> <017201c0759a$c2b180c0$e000a8c0@thomasnotebook> Message-ID: <20010103104457.A19493@kronos.cnri.reston.va.us> [Cc'ing to python-dev]. On Wed, Jan 03, 2001 at 04:35:10PM +0100, Thomas Heller wrote: >You didn't expect this script run under windows? >(It does not run) It shouldn't matter, I think, since the makesetup stuff doesn't run on Windows either; presumably the compiled-in modules are specified by an MSVC project file, or something similar. Can anyone confirm that I don't care if setup.py works on Windows? (Well, I *know* for a fact I don't care; but should I? 
:) ) --amk From guido at python.org Wed Jan 3 16:49:43 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 10:49:43 -0500 Subject: [Python-Dev] Help wanted with setup.py script In-Reply-To: Your message of "Wed, 03 Jan 2001 10:44:57 EST." <20010103104457.A19493@kronos.cnri.reston.va.us> References: <200012290046.TAA01346@207-172-57-128.s128.tnt2.ann.va.dialup.rcn.com> <017201c0759a$c2b180c0$e000a8c0@thomasnotebook> <20010103104457.A19493@kronos.cnri.reston.va.us> Message-ID: <200101031549.KAA20188@cj20424-a.reston1.va.home.com> > It shouldn't matter, I think, since the makesetup stuff doesn't run on > Windows either; presumably the compiled-in modules are specified by an > MSVC project file, or something similar. Can anyone confirm that I > don't care if setup.py works on Windows? (Well, I *know* for a fact I > don't care; but should I? :) ) Personally, I don't think it's worth to make setup.py work for Windows. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Wed Jan 3 21:04:07 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 15:04:07 -0500 Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked() In-Reply-To: ; from noreply@sourceforge.net on Wed, Jan 03, 2001 at 08:47:30AM -0800 References: Message-ID: <20010103150407.D20301@kronos.cnri.reston.va.us> On Wed, Jan 03, 2001 at 08:47:30AM -0800, GvR wrote: >Summary: speed up readline() using getc_unlocked() So what does the performance of this version look like? --amk From guido at python.org Wed Jan 3 21:25:53 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 15:25:53 -0500 Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked() In-Reply-To: Your message of "Wed, 03 Jan 2001 15:04:07 EST." 
<20010103150407.D20301@kronos.cnri.reston.va.us> References: <20010103150407.D20301@kronos.cnri.reston.va.us> Message-ID: <200101032025.PAA27457@cj20424-a.reston1.va.home.com> > >Summary: speed up readline() using getc_unlocked() > > So what does the performance of this version look like? Very slightly faster than the GNU getline() version. Without GNU getline, the old code was about 3.5 times slower. Here are the current times on a 6 Mb file (fileinput.py has my sourceforge speedup patch too): $ ./python ~/rltest.py ~/termcapx10 total 6252720 chars and 146250 lines; average line length 42.8 count_chars_lines 0.943 0.930 readlines_sizehint 0.544 0.540 using_fileinput 2.089 2.090 while_readline 0.956 0.960 For comparison, here's what Python 1.5.2 does with the same test (which should be pretty close to what the released Python 2.0 does; I don't have a copy of that handy). $ python1.5 ~/rltest.py ~/termcapx10 total 6252720 chars and 146250 lines; average line length 42.8 count_chars_lines 0.836 0.820 readlines_sizehint 0.523 0.520 using_fileinput 5.739 5.740 while_readline 3.670 3.670 I don't know why count_chars_lines got proportionally more slower than readlines_sizehint. (The += operator didn't make a difference either way.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 3 21:45:38 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 15:45:38 -0500 Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked() In-Reply-To: Your message of "Wed, 03 Jan 2001 15:25:53 EST." 
<200101032025.PAA27457@cj20424-a.reston1.va.home.com> References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com> Message-ID: <200101032045.PAA27595@cj20424-a.reston1.va.home.com> I should add that the patches are on SourceForge: fileinput.py: http://sourceforge.net/patch/?func=detailpatch&patch_id=103081&group_id=5470 fileobject.c: http://sourceforge.net/patch/?func=detailpatch&patch_id=103082&group_id=5470 I'm ready to check these in, but I'm waiting 24 hours in case there's something I've missed. (I haven't actually tested these on any other platform besides Linux.) Jeff Epler's xreadlines patch is here: http://sourceforge.net/patch/?func=detailpatch&patch_id=102915&group_id=5470 Note that Jeff's patch includes a patch to fileinput.py that does the same thing as mine but using his xreadlines module instead of directly using readlines(sizehint) as does mine. I like my approach better, mostly because it reduces dependencies. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Wed Jan 3 22:25:30 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 16:25:30 -0500 Subject: [Python-Dev] speed up readline() using getc_unlocked() In-Reply-To: <200101032045.PAA27595@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 03:45:38PM -0500 References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com> <200101032045.PAA27595@cj20424-a.reston1.va.home.com> Message-ID: <20010103162530.A20433@kronos.cnri.reston.va.us> On Wed, Jan 03, 2001 at 03:45:38PM -0500, Guido van Rossum wrote: >I'm ready to check these in, but I'm waiting 24 hours in case there's >something I've missed. (I haven't actually tested these on any other >platform besides Linux.) On Solaris 2.6, the configure script doesn't detect that getc_unlocked() & friends are supported; details available from the patch.
After editing config.h manually to enable them, the results are:

Before getc_unlocked patch:
total 1559913 chars and 32513 lines
count_chars_lines     0.892 0.730
readlines_sizehint    0.329 0.300
using_fileinput       4.612 4.470
while_readline        2.739 2.670

After patch:
total 1559913 chars and 32513 lines
count_chars_lines     0.698 0.680
readlines_sizehint    0.273 0.270
using_fileinput       2.707 2.700
while_readline        0.778 0.780
amarok src>

With a patched version of fileinput.py:
using_fileinput       1.675 1.680

--amk

From guido at python.org Wed Jan 3 22:36:07 2001
From: guido at python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 16:36:07 -0500
Subject: [Python-Dev] speed up readline() using getc_unlocked()
In-Reply-To: Your message of "Wed, 03 Jan 2001 16:25:30 EST." <20010103162530.A20433@kronos.cnri.reston.va.us>
References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com> <200101032045.PAA27595@cj20424-a.reston1.va.home.com> <20010103162530.A20433@kronos.cnri.reston.va.us>
Message-ID: <200101032136.QAA07752@cj20424-a.reston1.va.home.com>

> On Solaris 2.6, the configure script doesn't detect that
> getc_unlocked() & friends are supported; details available from the
> patch.

(Fixed now, see the new patch.)

> After editing config.h manually to enable them, the results are:
>
> Before getc_unlocked patch:
> total 1559913 chars and 32513 lines
> count_chars_lines     0.892 0.730
> readlines_sizehint    0.329 0.300
> using_fileinput       4.612 4.470
> while_readline        2.739 2.670
>
> After patch:
> total 1559913 chars and 32513 lines
> count_chars_lines     0.698 0.680
> readlines_sizehint    0.273 0.270
> using_fileinput       2.707 2.700
> while_readline        0.778 0.780
> amarok src>
>
> With a patched version of fileinput.py:
> using_fileinput       1.675 1.680

Thanks! The bottom line seems to be that your basic readline loop is
still 3x as slow as the fastest way -- so there's still a lot to say for
xreadlines...
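
The two loops being compared above can be sketched as follows. This is a guess at the shape of the test, not the actual rltest.py (which is not shown in the thread): the function names are borrowed from the timing output, the 64 KB size hint and the in-memory test data are assumptions, and it is written for modern Python 3 rather than the Python 2.0 under discussion, so absolute numbers will not resemble the ones quoted.

```python
import io
import time


def while_readline(stream):
    """Read one line per call: the plain loop that the patch speeds up."""
    chars = lines = 0
    while True:
        line = stream.readline()
        if not line:          # empty result means EOF
            break
        chars += len(line)
        lines += 1
    return chars, lines


def readlines_sizehint(stream, sizehint=64 * 1024):
    """Read lines in batches of roughly `sizehint` bytes (assumed value)."""
    chars = lines = 0
    while True:
        batch = stream.readlines(sizehint)
        if not batch:         # empty list means EOF
            break
        for line in batch:
            chars += len(line)
            lines += 1
    return chars, lines


def bench(func, data):
    """Return (result, CPU seconds) for one pass over `data`."""
    start = time.process_time()
    result = func(io.BytesIO(data))
    return result, time.process_time() - start


if __name__ == "__main__":
    # Hypothetical test data: 10000 lines of 42 characters plus newline.
    data = (b"x" * 42 + b"\n") * 10000
    for func in (while_readline, readlines_sizehint):
        (chars, lines), cpu = bench(func, data)
        print(f"{func.__name__:20s} {chars} chars, {lines} lines, {cpu:.3f}s cpu")
```

Both functions return the same counts; only the number of calls into the file object differs, which is the cost the timings in this thread are measuring.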
--Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Jan 3 22:42:48 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 03 Jan 2001 22:42:48 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib codecs.py,1.13,1.14 References: Message-ID: <3A539CD8.367361B8@lemburg.com> "M.-A. Lemburg" wrote: > > Update of /cvsroot/python/python/dist/src/Lib > In directory usw-pr-cvs1:/tmp/cvs-serv26608/Lib > > Modified Files: > codecs.py > Log Message: > ... > > This patch closes the bugs #116285 and #119960. I was too fast... the subject line of #119960 was misleading. It is still open. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Thu Jan 4 00:13:15 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 3 Jan 2001 18:13:15 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Are you sure Perl still uses stdio at all? Pretty sure, but there are so many layers of macros the code is undecipherable, and I can't step thru macros in the debugger either (that's assuming I wanted to devote N hours to building Perl from source too -- which I don't). Perl also makes heavy use of macroizing std library names, so e.g. when I see "fopen" (which I do!), that doesn't mean I'm getting the fopen I'm thinking of. But the MSVC config files define all sorts of macros to get at the MS stdio _cnt and _ptr (and most other) FILE* fields, and the version of fopen in the Win32 stuff appears to defer to the platform fopen (after doing Perlish stuff, like if someone passed "/dev/null" as the file name, Perl changes it to "NUL"). 
This is what it's like: the first line of Perl's win32_fopen is this: dTHXo; That's conditionally defined in perl.h, either as #define dTHXo dTHXoa(PERL_GET_THX) or, if pTHXo is not defined, as # define dTHXo dTHX dTHX in turn is #defined in 4 different places across 3 different files in 2 different directories. I'll skip those. OTOH, dTHXoa is easy! It's only defined once: #define dTHXoa(a) pTHXo = a Ah, *that* clears it up . Etc. 20 years ago I may have thought this was fun. I thought debugging large systems of m4 macros was fun then, and I'm not sure this is either better or worse than that -- well, it's worse, because I understood m4's implementation. > If so, does it open the file in binary or in text mode? Sorry, but I really don't know and it's a pit to pursue. If it's not native text mode, they do a good job of faking it (e.g., Ctrl-Z acts like an EOF when reading a text file from Perl on Windows -- not something even Larry would be likely to do on his own ). > Based on the APIs in MS's libc, I presume that the crlf->lf > translation is not done by stdio proper but by the Unix I/O > emulation just underneath it (open() has an O_BINARY option > flag, so read() probably does the translation). Yes; and late in the last release cycle, import.c's open_exclusive had a Windows bug related to this (fdopen() used "wb", but the earlier open() didn't use O_BINARY, and fdopen *acted* like it had used "w"). Also, the MS setmode() function works on file handles, not streams. > That comes down to copying most bytes an extra time. Understood. But the CRLF are stored physically on disk, so unless the disk controller is converting them, *someone's* software (whether MS's or Perl's) is doing it. By the time Perl is doing its fast line-input stuff, and doing what sure looks like a straight copy out of an IO buffer, it's clear from the code that CRLF has already been translated to LF. 
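
The text-mode versus binary-mode difference discussed above can be seen in a small sketch. It assumes modern Python 3, where the newline translation is done by Python's own io layer on every platform rather than by the C runtime as in this thread, so it illustrates only the visible effect, not who performs the translation in MS libc or Perl. The helper name read_both_ways is made up for the example.

```python
import os
import tempfile


def read_both_ways(payload: bytes):
    """Write payload to a temp file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            raw = f.read()            # bytes exactly as stored on disk
        # Default newline=None enables universal newlines on read.
        with open(path, "r", encoding="ascii") as f:
            text = f.read()           # CRLF already translated to LF
    finally:
        os.remove(path)
    return raw, text


if __name__ == "__main__":
    raw, text = read_both_ways(b"one\r\ntwo\r\n")
    print(repr(raw))
    print(repr(text))
```

The binary read keeps every CR, while the text read hands back fewer characters than are stored on disk, which is why the "rb" experiment in this thread ran faster but produced different counts.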
> (To test this hypothesis, you could try to open the test file
> with mode "rb" and see if it makes a difference.)

In Python, that saved about 10% (but got the wrong answers). In Perl,
about 15-20%. But I don't think that tells us who's doing the translation.
Assuming that the translation takes about the same total time for each, it
makes sense that the percentage would be higher for Perl (since its total
runtime is lower: same-sized slice of a smaller pie).

> My biggest worry: thread-safety. There must be a way to lock
> the file (you indicated that fgets() uses it).

Yes, via the unadvertised _lock_str and _unlock_str macros defined in MS
mtdll.h, which is not on the include path:

/*
 * This is an internal C runtime header file. It is used when building
 * the C runtimes only. It is not to be used as a public header file.
 */

The routines and macros it calls are also unadvertised. After an hour of
thrashing I wasn't able to successfully link any code trying to call these
routines. Doesn't mean it's impossible, does mean they're internal to MS
libc and aren't meant to be called by anything else. That's why it's called
"cheating". Perl appears to ignore the whole issue (but Perl's thread
story is muddy at best).

[... ungetc ...]

Not worried here either.

> ...
> You probably don't have many lines longer than 1000 characters.

None, in fact.

>> + It (line-at-a-time) drops to a little under 13 seconds if I
>> comment out the thread macros.

> If you mean the Py_BLOCK_THREADS around the resize, that can be safely
> dropped.

I meant *all* thread-related macros -- was just trying to get a feel for
how much that fiddling cost (it's an expense Perl doesn't seem to have --
yet). Was measurable but not substantial. WRT the resize, there's now a
"fast path" that avoids it.

> (If/when we introduce Vladimir's malloc, we'll have to decide whether
> it is threadsafe by itself or whether it requires the global
> interpreter lock. I vote to make it threadsafe by itself.)
As feared, this thread is going to consume my life <0.5 wink>.

> ...
> Doesn't it make more sense to delay the resize until this point? I
> don't know how much the character copying accounts for, but I could
> imagine a strategy based on memchr() and memcpy() that first searches
> for a \n, and if found, allocates to the right size before copying.
> Typically, the buffer contains many lines, so this could be optimized
> into requiring a single exactly-sized malloc() call in the common case
> (where the buffer doesn't wrap). But possibly scanning the buffer for
> \n and then copying the bytes separately, even with memcmp() and
> memcpy(), slows things down too much for this to be faster.

Turns out that Perl does very much what I was doing; the Perl code is
actually more burdensome, because its routine is trying to deal not only
with \n-termination, but also arbitrary-string termination (Perl's Awk-like
input record separator), and "paragraph mode", and fixed-size reads, and
some other stuff I can't figure out from the macro names. In all cases with
a terminator, though, it's doing the same business of both copying and
testing in a very tight inner loop. It doesn't appear to make any serious
attempts to avoid resizing the buffer. But, Perl has its own malloc
routines, and I'm guessing they're highly tuned for this stuff.

Since we're stuck with the MS malloc-- and Win9x's in particular seems
lame --adding this near the start of my stuff did yield a nice speedup:

    if (fp->_cnt > 0 &&
        (pBuf = (char *)memchr(fp->_ptr, '\n', fp->_cnt)) != NULL) {
        /* it's all in the buffer so don't bother releasing the
         * global lock */
        total_buf_size = pBuf - fp->_ptr + 1;
        v = PyString_FromStringAndSize(fp->_ptr, (int)total_buf_size);
        if (v != NULL) {
            pBuf = BUF(v) + total_buf_size;
            fp->_cnt -= total_buf_size;
            fp->_ptr += total_buf_size;
        }
        goto done;
    }

So that builds the result string directly from the stdio buffer when it can.
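
The fast-path idea can be sketched in Python as well. This is only an illustration of the technique, not the fileobject.c code: BufferedLineReader, its attributes, and the default buffer size are hypothetical names and values, with buf and pos standing in for the stdio _ptr/_cnt fields and bytes.find playing the role of memchr().

```python
import io


class BufferedLineReader:
    """Toy stand-in for a stdio FILE: `buf` is the buffer, `pos` the read position."""

    def __init__(self, raw, bufsize=8192):
        self.raw = raw            # any object with .read(n) returning bytes
        self.bufsize = bufsize    # assumed buffer size, not taken from the thread
        self.buf = b""
        self.pos = 0

    def readline(self):
        # Fast path: the newline is already in the buffer, so the result
        # is built with one exactly-sized slice and no growing or shrinking.
        end = self.buf.find(b"\n", self.pos)
        if end != -1:
            line = self.buf[self.pos:end + 1]
            self.pos = end + 1
            return line

        # Slow path: keep what is left and refill until a newline or EOF.
        pieces = [self.buf[self.pos:]]
        while True:
            self.buf = self.raw.read(self.bufsize)
            self.pos = 0
            if not self.buf:                  # EOF: return whatever was collected
                break
            end = self.buf.find(b"\n")
            if end != -1:
                pieces.append(self.buf[:end + 1])
                self.pos = end + 1
                break
            pieces.append(self.buf)           # whole buffer consumed, refill again
        return b"".join(pieces)


if __name__ == "__main__":
    # A tiny buffer forces both paths to run on this input.
    reader = BufferedLineReader(io.BytesIO(b"ab\ncdef\ng"), bufsize=4)
    print([reader.readline() for _ in range(4)])
```

With a realistic buffer size and short lines, nearly every call takes the fast path, which matches the observation that scanning the buffer a second time is cheap compared with a resize.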
Times dropped from (before this particular small hack) count_chars_lines 14.880 14.854 readlines_sizehint 9.280 9.302 using_fileinput 48.610 48.589 while_readline 13.450 13.451 to count_chars_lines 14.780 14.784 readlines_sizehint 9.550 9.514 using_fileinput 43.560 43.584 while_readline 10.600 10.578 Since I have no long lines in this test data, and the stdio buffer typically contains thousands of chars, most calls should be satisfied by the fast path. Compared to the previous code, the fast path (1) avoids global lock fiddling (but that didn't account for much in a distinct test); (2) crawls over the buffer twice instead of once; and, (3) avoids one (shrinking!) realloc. So crawling over the buffer an extra time costs nothing compared to the cost of a resize; and that's likely just more evidence that malloc/realloc suck on this platform. CAUTION: no file locking is going on now (because I haven't found a way to do it). My previous claim that the MS getc macro did no locking was wrong, as I discovered by stepping thru the generated machine code. stdio.h #defines getc without locking, but in _MT mode it later gets #undef'ed and turned into a function call. >> /* XXX unclear to me why MS's getc macro does "& 0xff" */ >> *pBuf++ = ch = *ms_ptr++ & 0xff; > I know why. getchar() returns an int in the range [-1, 255]. If > chars are signed the &0xff is needed else you would get a return in > the range [-128, 127] and -1 would be ambiguous (EOF==-1). Bingo -- MS chars are signed. > ... > But here since you're copying to another character, it's pointless. Yup! Gone. > .... > Note that get_line() with negative n could be implemented as > get_line(0) with some post processing. Andrew's glibc getline code appears to have wanted to do that, but looks to me like it's unreachable (unless I'm hallucinating, the "n < 0" test after return from glibc getline can't succeed, because the enclosing block is guarded by an "n==0" test). 
> This should be done completely separately, in PyFile_GetLine.

I assume you have an editor.

> The negative n case is only used by raw_input() -- it means strip
> the \n and raise EOFError for EOF, and I expect that this is rarely
> if ever used in a speed-conscious situation.

I've never seen raw_input used except when stdin and stdout were connected
to a tty. When I tried raw_input from a DOS box under the debugger, it never
called get_line. Something trickier is going on there; I suspect it's
actually calling fgets (eventually) instead in that case.

more-mysteries-than-i-really-need-ly y'rs - tim

From jeremy at alum.mit.edu Thu Jan 4 01:06:58 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 3 Jan 2001 19:06:58 -0500 (EST)
Subject: [Python-Dev] Mailman problems?
In-Reply-To:
References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com>
Message-ID: <14931.48802.273143.209933@localhost.localdomain>

Tim & Barry,

It looks like there is some problem with Mailman that is garbling
messages to python-dev. It may only affect lines that begin with a
tab; not sure.

Your most recent message came through with the following line

> dTHXo;

(This was not the only example.)

I think this was supposed to be a line of C code, but whatever
meaningful contents it had were rendered as gobbledygook.

Jeremy

From loewis at informatik.hu-berlin.de Thu Jan 4 01:13:16 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Thu, 4 Jan 2001 01:13:16 +0100 (MET)
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
Message-ID: <200101040013.BAA13436@pandora.informatik.hu-berlin.de>

> Apparently getc_unlocked() is in the Single Unix spec. Not sure how
> widespread that is -- do Linux developers pay attention to this
> standard at all?

Ulrich Drepper, who is in charge of glibc, is always interested in
following Single Unix to the letter; getc_unlocked is supported at least
since glibc 2.0.
http://www.sun.com/smcc/solaris-migration/docs/courses/threadsHTML/adv.html
claims that getc_unlocked is already in POSIX.1c; Solaris apparently
supports it at least since Solaris 2.4. Irix has it since 6.5, Tru64 at
least since 4.0d (probably much longer); HPUX since 11.0, AIX since at
least 4.3.

Of the BSDs, only OpenBSD appears to support it; it knows that it is
in ANSI 1003.1 since 1996-07-12.

SCO OpenServer doesn't support it.

Regards,
Martin

From fredrik at effbot.org Thu Jan 4 01:20:41 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 4 Jan 2001 01:20:41 +0100
Subject: [Python-Dev] Mailman problems?
References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> <14931.48802.273143.209933@localhost.localdomain>
Message-ID: <011901c075e4$2ce96360$e46940d5@hagrid>

> It looks like there is some problem with Mailman that is garbling
> messages to python-dev. It may only affect lines that begin with a
> tab; not sure.
>
> Your most recent message came through with the following line
>
> > dTHXo;
>
> (This was not the only example.)
>
> I think this was supposed to be a line of C code, but whatever
> meaningful contents it had were rendered as gobbledygook.

also looks like Mailman removed all smileys from Jeremy's post ;-)

From thomas at xs4all.net Thu Jan 4 01:27:54 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 4 Jan 2001 01:27:54 +0100
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: <200101040013.BAA13436@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Thu, Jan 04, 2001 at 01:13:16AM +0100
References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de>
Message-ID: <20010104012753.D2467@xs4all.nl>

On Thu, Jan 04, 2001 at 01:13:16AM +0100, Martin von Loewis wrote:

> Of the BSDs, only OpenBSD appears to support it; it knows that it is
> in ANSI 1003.1 since 1996-07-12.

BSDI supports getc_unlocked() at least since BSDI 3.1.
I don't have any older boxes to check, but the manpage for getc and all its friends carries the timestamp 'June 4, 1993', which implies it could have been available a lot longer. (Note that BSD was once known to *define* the standard ;-) I concur that FreeBSD does not currently support getc_unlocked, but since BSDI and FreeBSD are merging, I suspect it will, soonish. In other words: use it! :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry at wooz.org Thu Jan 4 03:59:01 2001 From: barry at wooz.org (Barry A. Warsaw) Date: Wed, 3 Jan 2001 21:59:01 -0500 Subject: [Python-Dev] Re: Mailman problems? References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> <14931.48802.273143.209933@localhost.localdomain> Message-ID: <14931.59125.391596.730296@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> It looks like the is some problem with Mailman that is JH> garbling messages to python-dev. It may only affect lines JH> that begin with a tab; not sure. JH> Your most recent message came through with the following line >> dTHXo; JH> (This was not the only example.) JH> I think this was supposed to be a line of C code, but whatever JH> meaningful contents it had were rendered as gobbledygook. Oh shoot, my bad. I dropped in an experimental Perl filter module in the delivery pipeline. It's been so long since I hacked Perl, I think I meant to write $%_-> when I really wrote %$_-> -Barry From tim.one at home.com Thu Jan 4 05:26:51 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 3 Jan 2001 23:26:51 -0500 Subject: [Python-Dev] RE: Mailman problems? In-Reply-To: <14931.48802.273143.209933@localhost.localdomain> Message-ID: [Jeremy] > It looks like the is some problem with Mailman that is garbling > messages to python-dev. It may only affect lines that begin with a > tab; not sure. > > Your most recent message came through with the following line > >> dTHXo; > > (This was not the only example.) 
> > I think this was supposed to be a line of C code, but whatever > meaningful contents it had were rendered as gobbledygook. I have no idea where that "o" came from! It was supposed to be "o". Barry, fix it! BTW, the second line of Perl implementation functions is usually a lot less mysterious than the first. If anyone wants the joy of reverse-engineering Perl's supernaturally fast input, it's function Perl_sv_gets in file sv.c. sv.c? Yes! The destination of a one-line input is a Scalar Value, hence, sc. I expect there's similar method behind all of this stuff, but I never stumbled into the key. To get you started, here's the first line of Perl_sv_gets: dTHR; The line you're looking for is 119 lines down from that: if ((*bp++ = *ptr++) == rslast) /* really | dust */ the-comment-makes-more-sense-in-context-ly y'rs - tim From thomas at xs4all.net Thu Jan 4 07:51:17 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 07:51:17 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101040037.TAA08699@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 07:37:22PM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> Message-ID: <20010104075116.J402@xs4all.nl> On Wed, Jan 03, 2001 at 07:37:22PM -0500, Guido van Rossum wrote: > > In other words: use it! :) > > Mind doing a few platform tests on the (new version of the) patch? Well, only a bit :) It's annoying that BSDI doesn't come with autoconf, but I managed to use all my early-morning wit (it's 6:30AM ) to work around it. I've tested it on BSDI 4.1 and FreeBSD 4.2-RELEASE. > I already know that it works on Red Hat Linux 6.2 (my box) and Solaris > 2.6 (Andrew's box). I would be delighted to know that it works on at > least one other platform that has getc_unlocked() and one platform > that doesn't have it! Sorry, I have to disappoint you. 
FreeBSD does have getc_unlocked, they just didn't document it. Hurrah for autoconf ;P Anyway, it worked like a charm on BSDI: (Python 2.0) total 1794310 chars and 37660 lines count_chars_lines 0.310 0.300 readlines_sizehint 0.150 0.150 using_fileinput 2.013 2.017 while_readline 1.006 1.000 (CVS Python + getc_unlocked) daemon2:~/python/python/dist/src > ./python test.py termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.354 0.350 readlines_sizehint 0.182 0.183 using_fileinput 1.594 1.583 while_readline 0.363 0.367 But something weird is going on on FreeBSD: (Standard CVS Python) > ./python ~thomas/test.py ~thomas/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.265 0.266 readlines_sizehint 0.148 0.148 using_fileinput 0.943 0.938 while_readline 0.214 0.219 (CVS+getc_unlocked) > ./python-getc-unlocked ~thomas/test.py ~thomas/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.266 0.266 readlines_sizehint 0.151 0.141 using_fileinput 1.066 1.078 while_readline 0.283 0.281 This was sufficiently unexpected that I looked a bit further. The FreeBSD Python was compiled without editing Modules/Setup, so it was statically linked, no readline etc, but *with* threads (which are on by default, and functional on both FreeBSD and BSDI 4.1.) Here's the timings after I enabled just '*shared*': (CVS + *shared*) > ./python ~thomas/test.py ~thomas/termcapx10 total 1794310 chars and 37660 lines count_chars_lines 0.276 0.273 readlines_sizehint 0.150 0.156 using_fileinput 0.902 0.898 while_readline 0.206 0.203 (This was not a fluke, I repeated it several times, getting hardly any variation.) Enabling readline and cursesmodule had no additional effect. Adding *shared* to the getc_unlocked tree saw roughly the same improvement, but was still slower than without getc_unlocked. 
(CVS + *shared* + getc_unlocked)
> ./python ~thomas/test.py ~thomas/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.272 0.273
readlines_sizehint    0.149 0.148
using_fileinput       1.031 1.031
while_readline        0.267 0.266

Increasing the size of the testfile didn't change anything, other than the
absolute numbers.

I browsed stdio.h, where both getc() and getc_unlocked() are defined as
macros. getc_unlocked is defined as:

    #define __sgetc(p) (--(p)->_r < 0 ? __srget(p) : (int)(*(p)->_p++))
    #define getc_unlocked(fp) __sgetc(fp)

and getc either as

    #define getc(fp) getc_unlocked(fp)

(without threads) or

    static __inline int \
    __getc_locked(FILE *_fp) \
    { \
        extern int __isthreaded; \
        int _ret; \
        if (__isthreaded) \
            _FLOCKFILE(_fp); \
        _ret = getc_unlocked(_fp); \
        if (__isthreaded) \
            funlockfile(_fp); \
        return (_ret); \
    }
    #define getc(fp) __getc_locked(fp)

_FLOCKFILE(x) is defined as flockfile(x), so that isn't the difference. The
speed difference has to be in the quick-and-easy test for whether the
locking is even necessary. Starting a thread on 'time.sleep(900)' in test.py
shows these numbers:

(standard CVS python)
> ./python-shared-std ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.433 0.445
readlines_sizehint    0.204 0.188
using_fileinput       1.595 1.594
while_readline        0.456 0.453

(getc_unlocked)
> ./python-getc-unlocked-shared ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.441 0.453
readlines_sizehint    0.206 0.195
using_fileinput       1.677 1.688
while_readline        0.509 0.508

So... using getc_unlocked manually for performance reasons isn't a cardinal
sin on FreeBSD only if you are really using threads :-)

Lets-outsmart-the-OS-scheduler-next!-ly y'rs
--
Thomas Wouters
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
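
The "start a second thread that just sleeps" experiment described above could be set up roughly like this. It is a sketch of the harness shape only, under several assumptions: the real test.py is not shown in the thread, the function names here are invented, and modern CPython no longer reads files through stdio getc(), so the FreeBSD __isthreaded effect measured above would not be reproduced by this code.

```python
import io
import threading
import time


def while_readline(stream):
    """Count lines with the plain readline() loop being timed in the thread."""
    lines = 0
    while True:
        if not stream.readline():
            break
        lines += 1
    return lines


def timed(func, *args):
    """Return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run(data, with_thread, sleep_seconds=900):
    """Time the loop, optionally with one idle background thread alive."""
    if with_thread:
        # The extra thread does nothing useful; it only makes the process
        # multi-threaded. daemon=True lets the interpreter exit without
        # waiting for the sleep to finish.
        threading.Thread(
            target=time.sleep, args=(sleep_seconds,), daemon=True
        ).start()
    return timed(while_readline, io.BytesIO(data))


if __name__ == "__main__":
    data = b"some line of text\n" * 50000   # hypothetical test data
    for with_thread in (False, True):
        lines, elapsed = run(data, with_thread)
        print(f"with_thread={with_thread}: {lines} lines in {elapsed:.3f}s")
```

Running the single-threaded case first matters, since once the sleeping thread exists the process stays multi-threaded for the rest of the run.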
From thomas at xs4all.net Thu Jan 4 08:57:26 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 08:57:26 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test/output test_coercion,1.2,1.3 In-Reply-To: ; from nascheme@users.sourceforge.net on Wed, Jan 03, 2001 at 05:36:27PM -0800 References: Message-ID: <20010104085726.E2467@xs4all.nl> On Wed, Jan 03, 2001 at 05:36:27PM -0800, Neil Schemenauer wrote: > Update of /cvsroot/python/python/dist/src/Lib/test/output > In directory usw-pr-cvs1:/tmp/cvs-serv21710/Lib/test/output > > Modified Files: > test_coercion > Log Message: > Sequence repeat works now for in-place multiply with an integer type > as the left operand. I don't know if this is a feature or a bug. > ! 2 *= [1] => [1, 1] It's a feature. x = 2 * [1] works, so x = 2 x *= [1] does, too. Obviously, '2 *= [1]' shouldn't, but I'm assuming you don't actually execute that (it should give a SyntaxError) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik at effbot.org Thu Jan 4 10:32:55 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 4 Jan 2001 10:32:55 +0100 Subject: [Python-Dev] RE: Mailman problems? References: Message-ID: <00a701c07631$531983b0$e46940d5@hagrid> tim wrote: > I have no idea where that "o" came from! It was supposed to be "o". > Barry, fix it! no need. from the perlguts man page: "You can ignore [pad]THX[xo] when browsing the Perl headers/sources." in-my-dictionary-perl's-an-american-physicist-ly yrs /F From mal at lemburg.com Thu Jan 4 11:02:35 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 04 Jan 2001 11:02:35 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 References: Message-ID: <3A544A3B.32B86792@lemburg.com> Neil Schemenauer wrote: > > Update of /cvsroot/python/python/dist/src/Include > In directory usw-pr-cvs1:/tmp/cvs-serv21006/Include > > Modified Files: > classobject.h > Log Message: > Remove PyInstance_*BinOp functions. > > Index: classobject.h > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Include/classobject.h,v > retrieving revision 2.33 > retrieving revision 2.34 > diff -C2 -r2.33 -r2.34 > *** classobject.h 2000/09/01 23:29:26 2.33 > --- classobject.h 2001/01/04 01:30:34 2.34 > *************** > *** 60,71 **** > extern DL_IMPORT(int) PyClass_IsSubclass(PyObject *, PyObject *); > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > - char *, char *, > - PyObject * (*)(PyObject *, > - PyObject *)); > - > - extern DL_IMPORT(int) > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > - PyObject * (*)(PyObject *, PyObject *), int); Wouldn't it be safer to provide emulation APIs for these ? There might be code out there using these APIs. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Thu Jan 4 15:06:53 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 09:06:53 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 In-Reply-To: Your message of "Thu, 04 Jan 2001 11:02:35 +0100." 
<3A544A3B.32B86792@lemburg.com> References: <3A544A3B.32B86792@lemburg.com> Message-ID: <200101041406.JAA11926@cj20424-a.reston1.va.home.com> > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > > - char *, char *, > > - PyObject * (*)(PyObject *, > > - PyObject *)); > > - > > - extern DL_IMPORT(int) > > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > > - PyObject * (*)(PyObject *, PyObject *), int); > > Wouldn't it be safer to provide emulation APIs for these ? There > might be code out there using these APIs. No. These were never intended to be part of the API (and it was a mistake that they used DL_IMPORT()). They had to be extern because they were defined in one file and used in another. I'm glad they're gone. They are so obscure that I'd be *very* surprised if anybody was using them, and even more if they even *wanted* emulation under the new scheme -- I'd expect them to eagerly convert their code to using new-style numbers right away. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jan 4 15:16:39 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 09:16:39 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 04 Jan 2001 07:51:17 +0100." <20010104075116.J402@xs4all.nl> References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> Message-ID: <200101041416.JAA11983@cj20424-a.reston1.va.home.com> [Thomas finds that on FreeBSD, getc() is faster than getc_unlocked().] Thomas, I really don't understand it. The getc() source code you showed calls getc_unlocked(). So how can it be faster? The answer must be somewhere else... Cache line conflicts, the rewriting of the loop that I did, a compiler bug, the inlining, who knows. Can you compare the generated assembly code? 
On other platforms, getc_unlocked() typically speeds the readline() test case up by a significant factor (as in your BSDI numbers, where it's almost 3x faster). Could it be that you're mistaken and that somehow getc_unlocked() is *not* chosen on FreeBSD? Then I could believe it, the rewritten loop is so different that the optimizer might have done something different to it. (Check config.h. When all else fails, I put an #error in the #ifdef branch that I expect not to be taken.) Could it be that somehow getc_unlocked() is later defined to be the same as getc(), so choosing it just adds the overhead of calling f[un]lockfile() for each line? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Thu Jan 4 15:59:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 15:59:05 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041416.JAA11983@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 04, 2001 at 09:16:39AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> Message-ID: <20010104155904.L402@xs4all.nl> On Thu, Jan 04, 2001 at 09:16:39AM -0500, Guido van Rossum wrote: > [Thomas finds that on FreeBSD, getc() is faster than getc_unlocked().] > Thomas, I really don't understand it. The getc() source code you > showed calls getc_unlocked(). So how can it be faster? The answer > must be somewhere else... Cache line conflicts, the rewriting of the > loop that I did, a compiler bug, the inlining, who knows. Can you > compare the generated assembly code? On other platforms, > getc_unlocked() typically speeds the readline() test case up by a > significant factor (as in your BSDI numbers, where it's almost 3x > faster). Nono, reread my message, and your code. 
getc() isn't faster than getc_unlocked(). getc() is faster than
flockfile(f) + getc_unlocked(f) (+ the rearranging of the function, use of
PyTHREAD_ALLOW inside the outer loop, etc.) Significantly so when there is
only one thread running (which is still the common case, for most systems,
and FreeBSD's libc has easy inside knowledge about) and marginally so when
there is at least one other thread. The small advantage in the
multi-threaded case can be explained by the rest of the changes.

You see, I was comparing a patched tree versus a non-patched tree, not a
getc_unlocked() enabled one versus a disabled one, so I was measuring the
speed difference of the *patch*, not of the use of getc_unlocked() vs getc().

Here is the speed difference of just the use of getc() vs getc_unlocked()
(same tree, hand-edited config.h) in a non-threaded environment:

> ./python-getc-disabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.271  0.273
readlines_sizehint    0.149  0.148
using_fileinput       0.898  0.898
while_readline        0.214  0.211

> ./python-getc-enabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.271  0.273
readlines_sizehint    0.148  0.148
using_fileinput       0.898  0.898
while_readline        0.214  0.211

As you see, no significant difference. Here is the difference in a threaded
environment (a second thread that does just 'time.sleep(900)'):

> ./python-getc-disabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.429  0.422
readlines_sizehint    0.200  0.211
using_fileinput       1.604  1.594
while_readline        0.465  0.461

> ./python-getc-enabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.429  0.430
readlines_sizehint    0.201  0.203
using_fileinput       1.600  1.602
while_readline        0.463  0.461

... where I have to note that the getc-disabled version's 'using_fileinput'
time fluctuates a lot more, mostly upwards, in the threaded environment. (I
see it jump to 1.609, 1.617 cputime, every few runs.)
Still not a terribly significant difference, but a hint that we, too, can
use inside knowledge ;)

> Could it be that you're mistaken and that somehow getc_unlocked() is
> *not* chosen on FreeBSD? Then I could believe it, the rewritten loop
> is so different that the optimizer might have done something different
> to it. (Check config.h. When all else fails, I put an #error in the
> #ifdef branch that I expect not to be taken.)

Yah, #error is great for debugging, I use it a lot ;) But I'm sure of this.
FreeBSD's getc() is just craftily optimized. Note that if we can get
get_line using getc_unlocked() to run as fast as get_line using getc() on
FreeBSD, it should also benefit other platforms, because the only speed to
be had is in our own code :) Not that I'm saying it can be improved, just
that it apparently got slower, because of this patch.

I can't be much help doing any performance tuning, though, I've about used
up my lunch hour and I'm working late tonight ;P

Good-thing-my-boss-can't-tell-the-difference-between-Apache-and-Python-src-ly y'rs,
-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido at python.org Thu Jan 4 16:27:28 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 04 Jan 2001 10:27:28 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Thu, 04 Jan 2001 15:59:05 +0100." <20010104155904.L402@xs4all.nl>
References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl>
Message-ID: <200101041527.KAA12181@cj20424-a.reston1.va.home.com>

[Me & Thomas in violent agreement that there's something weird about
the speed of getc_unlocked() vs. getc() on FreeBSD.]

I just realized what's the probable cause.
Read your timing post again:

# BSDI:
#
# (Python 2.0)
# while_readline 1.006 1.000
#
# (CVS Python + getc_unlocked)
# while_readline 0.363 0.367
# FreeBSD:
#
# (Standard CVS Python)
# while_readline 0.214 0.219
#
# (CVS+getc_unlocked)
# while_readline 0.283 0.281

Standard CVS Python, as opposed to Python 2.0 as released, uses GNU
getline()! So on FreeBSD, for this test case, GNU getline() is faster than
getc_unlocked().

So the question is, should I leave the GNU getline() code in? I'm inclined
against it -- it's not that much faster, and on other platforms
getc_unlocked() is faster. Given that getc_unlocked() is a standard (of
some sort) and GNU getline() is, well, just that, I'd say let's stick with
getc_unlocked().

(Unfortunately, from a phone conversation I had last night with Tim,
there's not much hope of doing something there -- and that platform sorely
needs it! The hacks that Tim reported earlier are definitely not
thread-safe. While it's easy to come up with getc_unlocked() for Windows,
the locking operations used internally there by the /MT code are not
exported from MSVCRT.DLL, and that's crucial.)
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Thu Jan 4 16:31:39 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 16:31:39 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041527.KAA12181@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 04, 2001 at 10:27:28AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <200101041527.KAA12181@cj20424-a.reston1.va.home.com> Message-ID: <20010104163139.M402@xs4all.nl> On Thu, Jan 04, 2001 at 10:27:28AM -0500, Guido van Rossum wrote: > [Me & Thomas in violent agreement that there's something weird about > the speed of getc_unlocked() vs. getc() on FreeBSD.] > I just realized what's the probable cause. Read your timing post > again: > Standard CVS Python, as opposed to Python 2.0 as released, uses GNU > getline()! Sorry, no go. You need two things to use getline(): getline() itself, and a GNU libc. FreeBSD has neither. (And autoconf agrees with me.) If you *really really* want me to, I can compile 2.0-standard on FreeBSD and show you. But I'd rather not :) Now go back and read my other mail about why FreeBSD is faster :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
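
The locking trade-off Thomas describes (a lock round-trip on every character with getc(), versus one per line with flockfile() + getc_unlocked()) can be mimicked in Python. This is only an analogy, not the fileobject.c code: the module-level threading.Lock stands in for the stdio stream lock, and all function names are made up for illustration.

```python
import io
import threading

# Stand-in for the per-stream lock that a thread-safe stdio takes internally.
_stream_lock = threading.Lock()


def getc_locked(stream):
    """Analogue of getc(): lock and unlock around every single character."""
    with _stream_lock:
        return stream.read(1)


def get_line_per_char_lock(stream):
    """Read one line, paying for a lock round-trip on each character."""
    chars = []
    while True:
        c = getc_locked(stream)
        if not c:          # empty string means end of file
            break
        chars.append(c)
        if c == "\n":
            break
    return "".join(chars)


def get_line_single_lock(stream):
    """Analogue of flockfile() + getc_unlocked() + funlockfile()."""
    chars = []
    with _stream_lock:     # one lock round-trip for the whole line
        while True:
            c = stream.read(1)
            if not c:
                break
            chars.append(c)
            if c == "\n":
                break
    return "".join(chars)
```

Both functions return the same lines; they differ only in how often the lock is taken, which is the cost being argued about in this thread.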
From akuchlin at mems-exchange.org Thu Jan 4 16:43:15 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 4 Jan 2001 10:43:15 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104155904.L402@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 04, 2001 at 03:59:05PM +0100 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> Message-ID: <20010104104315.C23803@kronos.cnri.reston.va.us> On Thu, Jan 04, 2001 at 03:59:05PM +0100, Thomas Wouters wrote: >getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ >the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, >etc.) Significantly so when there is only one thread running (which is still So it looks like the ALLOW_THREADS should be moved out of the for loop. This produced no measureable performance difference on Solaris; I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some unusually slow thread operation? 
--amk From thomas at xs4all.net Thu Jan 4 16:59:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 16:59:25 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104104315.C23803@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 04, 2001 at 10:43:15AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> Message-ID: <20010104165925.G2467@xs4all.nl> On Thu, Jan 04, 2001 at 10:43:15AM -0500, Andrew Kuchling wrote: > On Thu, Jan 04, 2001 at 03:59:05PM +0100, Thomas Wouters wrote: > >getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ > >the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, > >etc.) Significantly so when there is only one thread running (which is still > So it looks like the ALLOW_THREADS should be moved out of the for > loop. This produced no measureable performance difference on Solaris; > I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some > unusually slow thread operation? Note that I was just guessing there. I did a quick scan of the function, and noticed that the ALLOW_THREADS statements had moved into the outer loop. I didn't even contemplate whether that made a difference, so don't trust that judgement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
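
The test.py script behind the timings quoted in this thread is not shown, so the following is a guess at its shape: a minimal harness, assuming the three strategies named in the output (while_readline, readlines_sizehint, using_fileinput) do what their names suggest. The helpers time_it and make_sample and the 64 KB size hint are assumptions, and the code targets present-day Python rather than 2.0.

```python
import fileinput
import os
import tempfile
import time


def while_readline(path):
    """Count characters and lines, one readline() call per line."""
    chars = lines = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            chars += len(line)
            lines += 1
    return chars, lines


def readlines_sizehint(path, hint=64 * 1024):
    """Count characters and lines, fetching batches via readlines(hint)."""
    chars = lines = 0
    with open(path) as f:
        while True:
            batch = f.readlines(hint)
            if not batch:
                break
            for line in batch:
                chars += len(line)
                lines += 1
    return chars, lines


def using_fileinput(path):
    """Count characters and lines by iterating over fileinput.input()."""
    chars = lines = 0
    with fileinput.input(files=[path]) as stream:
        for line in stream:
            chars += len(line)
            lines += 1
    return chars, lines


def time_it(func, path, repeat=2):
    """Return CPU seconds for each of `repeat` runs of func(path)."""
    timings = []
    for _ in range(repeat):
        start = time.process_time()
        func(path)
        timings.append(time.process_time() - start)
    return timings


def make_sample(n_lines):
    """Write a throwaway text file with n_lines lines and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write("line %d of the sample file\n" % i)
    return path
```

A run would call time_it() once per strategy on the same file and print the pairs side by side, as in the tables quoted above.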
From akuchlin at mems-exchange.org Thu Jan 4 17:10:29 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Thu, 4 Jan 2001 11:10:29 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: <20010104165925.G2467@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 04, 2001 at 04:59:25PM +0100
References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> <20010104165925.G2467@xs4all.nl>
Message-ID: <20010104111029.A28510@kronos.cnri.reston.va.us>

On Thu, Jan 04, 2001 at 04:59:25PM +0100, Thomas Wouters wrote:
>Note that I was just guessing there. I did a quick scan of the function, and
>noticed that the ALLOW_THREADS statements had moved into the outer loop. I
>didn't even contemplate whether that made a difference, so don't trust that
>judgement.

According to your benchmark, the performance of the threaded version was
the same whether or not getc_unlocked() was used, so it's not that
flockfile() is really slow. I can't believe the compiler optimized the old,
ungainly loop better than the newer, tighter loop. That leaves the
ALLOW_THREADS as the most reasonable culprit.

--amk

From akuchlin at mems-exchange.org Thu Jan 4 18:10:11 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Thu, 04 Jan 2001 12:10:11 -0500
Subject: [Python-Dev] SGI's Digital Media SDK
Message-ID:

SGI just made a source release of their digital media SDK for IRIX and
Linux at http://oss.sgi.com/projects/dmsdk/ . According to the FAQ, this is
derived from previous SGI libraries, "including the Video Library (VL), the
Audio Library (AL), Digital Media Image Convertor (DMIC), Digital Media
Audio Convertor (DMAC), and the Compression Library (CL)."
Interested parties may want to look into this, because Python still has the al, cd, cl, and sv modules; maybe they'd work with the new software with a reasonable amount of fixing, and at least now there's a reasonable chance that non-IRIX platforms will be supported. --amk From guido at python.org Thu Jan 4 20:07:13 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 14:07:13 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 04 Jan 2001 10:43:15 EST." <20010104104315.C23803@kronos.cnri.reston.va.us> References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> Message-ID: <200101041907.OAA12573@cj20424-a.reston1.va.home.com> > So it looks like the ALLOW_THREADS should be moved out of the for > loop. This produced no measureable performance difference on Solaris; > I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some > unusually slow thread operation? I kind of doubt that it's Py_ALLOW_THREADS -- it's in the outer loop, which typically only gets executed once. It only goes around a second time when the line is longer than the initial buffer. We could tweak the initial buffer size (currently 100, with increments of 1000). --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Jan 4 20:32:15 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 04 Jan 2001 20:32:15 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 References: <3A544A3B.32B86792@lemburg.com> <200101041406.JAA11926@cj20424-a.reston1.va.home.com> Message-ID: <3A54CFBF.CDD2138B@lemburg.com> Guido van Rossum wrote: > > > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > > > - char *, char *, > > > - PyObject * (*)(PyObject *, > > > - PyObject *)); > > > - > > > - extern DL_IMPORT(int) > > > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > > > - PyObject * (*)(PyObject *, PyObject *), int); > > > > Wouldn't it be safer to provide emulation APIs for these ? There > > might be code out there using these APIs. > > No. These were never intended to be part of the API (and it was a > mistake that they used DL_IMPORT()). They had to be extern because > they were defined in one file and used in another. I'm glad they're > gone. They are so obscure that I'd be *very* surprised if anybody was > using them, and even more if they even *wanted* emulation under the > new scheme -- I'd expect them to eagerly convert their code to using > new-style numbers right away. I'll see whether I can get mxDateTime working with the new scheme later this year -- it would be really great to do away with the coercion hack I was using until now :-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Fri Jan 5 07:04:56 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 5 Jan 2001 01:04:56 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041527.KAA12181@cj20424-a.reston1.va.home.com> Message-ID: [Guido van Rossum] > ... 
> (Unfortunately, from a phone conversation I had last night with
> Tim, there's not much hope of doing something there -- and that
> platform [Win32] sorely needs it! The hacks that Tim reported
> earlier are definitely not thread-safe. While it's easy to come
> up with getc_unlocked() for Windows, the locking operations used
> internally there by the /MT code are not exported from MSVCRT.DLL,
> and that's crucial.)

The short course is that I still haven't found a workable way to lock
streams on Windows: they do have a complete set of stream-locking functions
and macros, but there's no way short of deep magic I can find to get at
them ("deep magic" == resort to assembler and patch in function addresses).
The only file-locking functions advertised in the C and platform SDK
libraries are trivial variants of Python's msvcrt.locking, but that has to
do with locking specific file byte-position ranges across processes, not
ensuring the integrity of runtime stream structures across threads. Perl
appears to ignore the issue of thread safety here (on Windows and
everywhere else).

Revealing experiment!

1. I threw away my changes and rebuilt from current CVS.

2. I made one change, expanding the getc() call in get_line to what MSVC
   *would* expand it to if we weren't building in thread mode:

       if ((c = (--fp->_cnt >= 0 ? 0xff & *fp->_ptr++ : _filbuf(fp))) == EOF) {

That alone reduced the runtime of my "while 1: readline" test case from
over 30 seconds to 12.8. What I did before went beyond that, by also (in
effect) unrolling the loop and optimizing it. That bought an additional ~2
seconds. So compared to Perl's 6 seconds, it looks like we're paying (on
Win98SE) approximately:

    17 seconds for compiling with _MT (threadsafe libc)
     6 seconds to do the work
     5 seconds for "other stuff", best guess mostly a poor platform
       malloc/realloc
     2 seconds for not optimizing the loop
    --
    30 total

Unfortunately, the smoking gun is the only one whose firing pin we can't
file down on this platform.
so-the-good-news-is-that-it's-impossible-for-perl-not-to-be-at-least-twice-as-fast-ly
y'rs - tim

From guido at python.org Fri Jan 5 16:29:05 2001
From: guido at python.org (Guido van Rossum)
Date: Fri, 05 Jan 2001 10:29:05 -0500
Subject: [Python-Dev] Python 2.1 release schedule (PEP 226)
Message-ID: <200101051529.KAA19100@cj20424-a.reston1.va.home.com>

We had our first PythonLabs meeting of the year yesterday, and we went over
the 2.1 release schedule. The release schedule is posted in PEP 226:
http://python.sourceforge.net/peps/pep-0226.html

We found that the schedule previously posted there was a bit too
aggressive, given our goals for this release, so we have adjusted the dates
somewhat. We have also decided on a date for the first alpha release
(previously unmentioned in the PEP). So, here are the relevant dates:

    19-Jan-2001: First 2.1 alpha release
    23-Feb-2001: First 2.1 beta release
    01-Apr-2001: 2.1 final release

We're already in PEP freeze mode -- no more PEPs will be considered for
inclusion in 2.1. Below is a list of the PEPs that we are currently
considering, with some comments. But first some general remarks:

- The alpha release cycle is for testing of tentative features. Alpha
  releases contain working code that we want to see widely tested; however,
  it's possible that a feature present in an alpha release is changed or
  even retracted in a later release.

- Beta releases represent a feature freeze -- after the first beta release,
  we will resign ourselves to fixing bugs. Once beta 1 is released, no new
  features will be introduced, and no features will be withdrawn.

The alpha cycle is especially important for features (such as nested
scopes) that (may) introduce backwards incompatibilities.

There may be more than one alpha release depending on feedback on the alpha
1 release. (But having too many alpha releases is not good -- people won't
bother downloading.)
Thus, we can only introduce a new feature in beta 1 if we're very sure that
it is mature enough to stay without interface changes. The final decision
on all PEPs under consideration has to be made before the beta 1 release.
The beta cycle is important to ensure stability of the final release.

Specific PEPs under consideration:

I    42  pep-0042.txt  Small Feature Requests             Hylton

    Actually, most of these won't be fulfilled in 2.1.

SD  205  pep-0205.txt  Weak References                    Drake

    Fred is still working on this. I hope Tim can assist. But we may
    have to postpone this.

S   207  pep-0207.txt  Rich Comparisons                   Lemburg, van Rossum

    I'm pretty sure that this is a piece of cake now that the coercion
    patches are checked in.

S   208  pep-0208.txt  Reworking the Coercion Model       Schemenauer

    All checked in. Great work, Neil!

S   217  pep-0217.txt  Display Hook for Interactive Use   Zadka

    Moshe, this was accepted ages ago. Would you mind submitting a patch
    to SourceForge? If you don't champion this (and nobody else does), we
    may have to postpone it still.

S   222  pep-0222.txt  Web Library Enhancements           Kuchling

    This is really up to Andrew. It seems he plans to create new modules,
    so he won't be introducing incompatibilities in existing APIs.

S   227  pep-0227.txt  Statically Nested Scopes           Hylton

    Jeremy is still working on a proper implementation, which he hopes to
    have ready in time for the first alpha release date.

S   229  pep-0229.txt  Using Distutils to Build Python    Kuchling

    I just moved this from pie-in-the-sky to active. Andrew has a working
    prototype, it just doesn't work 100% yet, so I'm very hopeful.

S   230  pep-0230.txt  Warning Framework                  van Rossum

    All done.

S   232  pep-0232.txt  Function Attributes                Warsaw

    Still waiting for Barry to implement this, but it's pretty
    straightforward.

S   233  pep-0233.txt  Python Online Help                 Prescod

    Paul, what's up with this? Tim & I recommended to do something simple
    and working, and then you disappeared from the face of the earth.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake at acm.org Fri Jan 5 16:28:16 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 5 Jan 2001 10:28:16 -0500 (EST)
Subject: [Python-Dev] new "theme" on SourceForge!
Message-ID: <14933.59408.512734.105160@cj42289-a.reston1.va.home.com>

While "theme-ability" is becoming very popular for desktop software (think
about the latest Gnome and KDE systems for Unix, and some of the multimedia
applications for Windows, and the newest MacOS desktops), it can be a huge
drain on Web sites; too many graphics is a pain, and too many tables just
makes it worse. SourceForge had definitely fallen prey to the overly-fancy
themes, and all of us developers paid the price with slow rendering.

But they've fixed that! The SF crew has announced a new "theme" called
"Ultra Light" which is optimized for slow connections. What that really
means is less embedded graphics and fewer nested tables, so rendering is
*much* faster.

To try the new theme, go to the "Change My Theme" link near the top of the
left-hand navigation area. Use the form to select "Ultra Light"; you can
preview the theme first if you want.

Guido also thinks it's cool that the bug & patch report pages are printable
with this theme. (Sheesh... managers! ;)

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim.one at home.com Fri Jan 5 18:46:16 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 5 Jan 2001 12:46:16 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Lib fileinput.py,1.5,1.6
In-Reply-To:
Message-ID:

[Guido]
> Modified Files:
> fileinput.py
> Log Message:
> Speed it up by using readlines(sizehint). It's still slower than
> other ways of reading input.
:-(

On my box, it's now head-to-head with (maybe even a little quicker than)
the while 1: line-at-a-time way:

total 117615824 chars and 3237568 lines
readlines_sizehint     9.450   9.459
using_fileinput       29.880  29.884
while_readline        30.480  30.506

(stock CVS Python under Win98SE)

So that's a huge improvement!

the-two-people-using-fileinput-should-be-delighted-ly y'rs - tim

From skip at mojam.com Fri Jan 5 20:05:14 2001
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 5 Jan 2001 13:05:14 -0600 (CST)
Subject: [Python-Dev] fileinput.py
In-Reply-To:
References:
Message-ID: <14934.6890.160122.384692@beluga.mojam.com>

    Tim> the-two-people-using-fileinput-should-be-delighted-ly

What do you think contributes to fileinput's relative disfavor? This whole
thread on Python's file reading performance was started by the eternal
whine "why is Python so much slower than Perl?" which really means why is

    line = f.readline()
    while line:
        process(line)

so much slower than whatever that thing is in Perl that everybody uses as
the be-all-end-all performance benchmark (something with <> in it). Given
that fileinput is supposed to make the I/O loop in Python more familiar to
those people wandering over from Perl (at least in part), you'd think that
people would naturally gravitate to it. Would it benefit from some exposure
in the Python tutorial? Is it fast enough now to warrant the extra
exposure?

just-whining-out-loud-ly y'rs

Skip

From tim.one at home.com Fri Jan 5 20:11:00 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 5 Jan 2001 14:11:00 -0500
Subject: [Python-Dev] new "theme" on SourceForge!
In-Reply-To: <14933.59408.512734.105160@cj42289-a.reston1.va.home.com>
Message-ID:

[Fred L. Drake, Jr.]

Who would have guessed that the "L." stands for Light?

> ...
> The SF crew has announced a new "theme" called "Ultra Light" which
> is optimized for slow connections.

Indeed, I think I can cancel my cable modem now and go back to a 28.8 phone
modem.
liking-it!-ly y'rs - tim From jeremy at alum.mit.edu Fri Jan 5 20:14:49 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 5 Jan 2001 14:14:49 -0500 (EST) Subject: [Python-Dev] unit testing bake-off Message-ID: <14934.7465.360749.199433@localhost.localdomain> There was a brief discussion of unit testing last millennium, which did not reach any conclusions. I'd like to restart the discussion and set some specific goals. The action item is a unit testing bake-off, held next week, to choose a tool. The primary goal is to choose a unit testing framework for the regression test suite. Tests written with this framework would eventually replace the current regrtest.py framework, based on comparing test output to expected output. For the 2.1 release, the goal would be to choose a test framework to include in the standard distribution and use it to write some or all of the new tests. We would need to integrate it in some way with regrtest.py, so that a single command can be used to run all the tests. In the long run, we can migrate existing tests to use the new system. The new system can help us address some other goals: - running an entire test suite to completion instead of stopping on the first failure - clearer reporting of what went wrong - better support for conditional tests, e.g. write a test for httplib that only runs if the network is up. This is tied into better error reporting, since the current test suite could only report that httplib succeeded or failed. Does anyone disagree with the goal? Three tools have been proposed: PyUnit, Quixote unittest, and doctest. doctest has been championed by Peter Funk, who wants a few new features, but Tim, its author, isn't pushing it as a tool for writing stand alone tests. I think the best way to use doctest is for module writers to consider it when writing a new module. If doctest is used from the start for a module, we could integrate it with the regression test. 
It seems quite useful for what it is intended for, but is not a general solution. That leaves PyUnit and Quixote's unittest. The two tools are fairly similar, but differ on a number of non-trivial details. Quixote also integrates code coverage, which is quite handy. If we don't adopt its unittest, we should add code coverage to PyUnit. Is anyone else interested in the choice between the two? If so, I suggest you try writing some tests with each tool and reporting back with your feedback. I propose leaving one week for such a bake-off and making a decision next Friday. Jeremy From fredrik at effbot.org Fri Jan 5 20:55:18 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 5 Jan 2001 20:55:18 +0100 Subject: [Python-Dev] unit testing bake-off References: <14934.7465.360749.199433@localhost.localdomain> Message-ID: <004c01c07751$6eed84d0$e46940d5@hagrid> Jeremy Hylton wrote: > Is anyone else interested in the choice between the two? yes. I suggest adding doctest.py plus one unit test implementation. > If so, I suggest you try writing some tests with each tool and > reporting back with your feedback. we've recently migrated from a 30-minute reimplementation of Kent Beck's original framework to one of the frameworks you mention. with that background, the choice was easy. let me know when it's time to vote... From guido at python.org Fri Jan 5 20:55:33 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 14:55:33 -0500 Subject: [Python-Dev] fileinput.py In-Reply-To: Your message of "Fri, 05 Jan 2001 13:05:14 CST." <14934.6890.160122.384692@beluga.mojam.com> References: <14934.6890.160122.384692@beluga.mojam.com> Message-ID: <200101051955.OAA20190@cj20424-a.reston1.va.home.com> > What do you think contributes to fileinput's relative disfavor? In my view, fileinput is one of those unfortunate features that exist solely to shut up a particular kind of criticism. 
Without fileinput, Perl zealots would have an easy argument for a "trivial reject" of even considering Python. Now, when somebody claims the superiority of Perl's "loop involving a <> thingie", you can point to fileinput to prevent them from scoring a point. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 5 21:01:13 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 15:01:13 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: Your message of "Fri, 05 Jan 2001 20:55:18 +0100." <004c01c07751$6eed84d0$e46940d5@hagrid> References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> Message-ID: <200101052001.PAA20238@cj20424-a.reston1.va.home.com> > yes. I suggest adding doctest.py plus one unit test implementation. I second this vote for doctest (in addition to a unittest thing). I propose that Tim checks in his latest version of doctest. It should go under Lib, not under Lib/test, I think. (Certainly that's how Tim has been proposing its use.) It requires LaTeX docs, but since it's got a great docstring, that should be easy. > > If so, I suggest you try writing some tests with each tool and > > reporting back with your feedback. > > we've recently migrated from a 30-minute reimplementation of Kent > Beck's original framework to one of the frameworks you mention. with > that background, the choice was easy. let me know when it's time to > vote... Which framework are you now using? 
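
For anyone who has not tried either style, here is a small side-by-side sketch of one check written as a doctest and as an xUnit-style test case. It uses the doctest and unittest modules as they exist in today's standard library (unittest being the descendant of PyUnit), which may differ from the versions under discussion here; count_chars_lines and the two runner helpers are hypothetical names.

```python
import doctest
import unittest


def count_chars_lines(text):
    r"""Return (characters, lines) for a string.

    >>> count_chars_lines("spam\neggs\n")
    (10, 2)
    >>> count_chars_lines("")
    (0, 0)
    """
    return len(text), len(text.splitlines())


class CountCharsLinesTest(unittest.TestCase):
    """The same checks, written as an xUnit-style test case."""

    def test_two_lines(self):
        self.assertEqual(count_chars_lines("spam\neggs\n"), (10, 2))

    def test_empty(self):
        self.assertEqual(count_chars_lines(""), (0, 0))


def run_doctests(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, module=False, globs=globs):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


def run_unittests(case):
    """Run every test method of a TestCase class; return True on success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The doctest keeps the examples next to the code they describe, while the test case keeps them in a separate class that can grow setup code and conditional tests; that difference is roughly the trade-off being weighed in this thread.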
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 5 21:14:41 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 15:14:41 -0500 Subject: [Python-Dev] Add __exports__ to modules Message-ID: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Please have a look at this SF patch: http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 This implements control over which names defined in a module are externally visible: if there's a variable __exports__ in the module, it is a list of identifiers, and any access from outside the module to names not in the list is disallowed. This affects access using the getattr and setattr protocols (which raise AttributeError for disallowed names), as well as "from M import v" (which raises ImportError). I like it. This has been asked for many times. Does anybody see a reason why this should *not* be added? Tim remarked that introducing this will prompt demands for a similar feature on classes and instances, where it will be hard to implement without causing a bit of a slowdown. It causes a slight slowdown (an extra dictionary lookup for each use of "M.v") even when it is not used, but for accessing module variables that's acceptable. I'm not so sure about instance variable references. 
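
To make the proposed semantics concrete, here is a rough emulation in present-day Python. It is not the SF patch: it assumes a types.ModuleType subclass is an acceptable stand-in for the changed module object, and it assumes double-underscore names are always reachable, which the description above does not say. ExportsModule, make_module and DEMO_SOURCE are hypothetical names.

```python
import types


def _check_export(module, name):
    """Raise AttributeError if `name` is hidden by the module's __exports__."""
    # Go through the base class so we do not recurse into our own hook.
    namespace = types.ModuleType.__getattribute__(module, "__dict__")
    exports = namespace.get("__exports__")
    if exports is None:
        return  # no __exports__ defined: every name stays visible
    if name.startswith("__") and name.endswith("__"):
        return  # assumption: dunder names are never hidden
    if name not in exports:
        raise AttributeError(
            "module %r does not export %r" % (namespace.get("__name__"), name)
        )


class ExportsModule(types.ModuleType):
    """Module whose outside attribute access is limited by __exports__."""

    def __getattribute__(self, name):
        _check_export(self, name)
        return super().__getattribute__(name)

    def __setattr__(self, name, value):
        _check_export(self, name)
        super().__setattr__(name, value)


def make_module(name, source):
    """Build an ExportsModule by executing `source` in its namespace."""
    module = ExportsModule(name)
    exec(source, module.__dict__)
    return module


DEMO_SOURCE = '''
__exports__ = ["public"]

def helper():
    return 41

def public():
    # Code inside the module uses its globals directly, so it still
    # sees helper() even though outsiders cannot.
    return helper() + 1
'''
```

With this sketch, make_module("demo", DEMO_SOURCE).public() works, while reading or assigning the module's helper attribute from outside raises AttributeError. The extra dictionary lookup in _check_export on every access is the per-use cost mentioned above.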
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu Fri Jan 5 21:19:55 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 5 Jan 2001 15:19:55 -0500 (EST)
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <200101052001.PAA20238@cj20424-a.reston1.va.home.com>
References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com>
Message-ID: <14934.11371.879059.610988@localhost.localdomain>

If anyone is interested in experimenting with a test suite, here is a
summary of the code coverage for the current regression test suite as
run on my Linux box. Pick a module with low code coverage and your
experiment can also improve the regression test suite.

Jeremy

 67.42%    798  Modules/arraymodule.c
 74.39%    773  Modules/audioop.c
 81.84%    380  Modules/binascii.c
 62.36%    449  Modules/bsddbmodule.c
 78.29%    152  Modules/cmathmodule.c
 67.89%    246  Modules/_codecsmodule.c
 47.41%   2647  Modules/cPickle.c
 87.50%      8  Modules/cryptmodule.c
 64.34%    272  Modules/cStringIO.c
  0.00%   1351  Modules/_cursesmodule.c
  0.00%    202  Modules/_curses_panel.c
 99.28%    139  Modules/errnomodule.c
 30.71%    127  Modules/fcntlmodule.c
 81.90%    315  Modules/gcmodule.c
  0.00%      4  Modules/getbuildinfo.c
 47.29%    277  Modules/getpath.c
 72.22%     54  Modules/grpmodule.c
 79.95%    419  Modules/imageop.c
  0.00%     11  Modules/../Include/cStringIO.h
 13.25%    234  Modules/linuxaudiodev.c
 14.80%    223  Modules/_localemodule.c
 30.66%    137  Modules/main.c
 73.20%     97  Modules/mathmodule.c
 98.39%    124  Modules/md5c.c
 69.70%     66  Modules/md5module.c
 48.62%    362  Modules/mmapmodule.c
 66.22%     74  Modules/newmodule.c
 84.91%     53  Modules/operator.c
 50.57%   1236  Modules/parsermodule.c
  0.00%    350  Modules/pcremodule.c
 28.88%   1077  Modules/posixmodule.c
 82.05%     39  Modules/pwdmodule.c
 77.96%    431  Modules/pyexpat.c
  0.00%   1876  Modules/pypcre.c
 50.00%      2  Modules/python.c
  0.00%    189  Modules/readline.c
 78.35%    425  Modules/regexmodule.c
 72.93%    931  Modules/regexpr.c
  0.00%     81  Modules/resource.c
 76.98%    443  Modules/rgbimgmodule.c
 82.70%    289  Modules/rotormodule.c
 82.47%    291  Modules/selectmodule.c
 85.10%    208  Modules/shamodule.c
 81.52%    276  Modules/signalmodule.c
 51.18%    678  Modules/socketmodule.c
 78.64%   1105  Modules/_sre.c
 69.67%    689  Modules/stropmodule.c
 80.49%    656  Modules/structmodule.c
  4.88%    123  Modules/termios.c
 60.71%    140  Modules/threadmodule.c
 68.78%    205  Modules/timemodule.c
 76.92%     65  Modules/ucnhash.c
 87.50%     16  Modules/unicodedatabase.c
 65.83%    120  Modules/unicodedata.c
 68.81%    420  Modules/zlibmodule.c
 64.68%   1005  Objects/abstract.c
 18.77%    261  Objects/bufferobject.c
 68.77%   1204  Objects/classobject.c
 27.59%     58  Objects/cobject.c
 59.41%    271  Objects/complexobject.c
 78.32%    678  Objects/dictobject.c
 52.14%    723  Objects/fileobject.c
 80.43%    368  Objects/floatobject.c
 84.86%    185  Objects/frameobject.c
 60.40%    149  Objects/funcobject.c
 78.68%    455  Objects/intobject.c
 77.66%    779  Objects/listobject.c
 81.17%   1142  Objects/longobject.c
 50.68%    148  Objects/methodobject.c
 58.82%    136  Objects/moduleobject.c
 76.50%    549  Objects/object.c
 15.24%    105  Objects/rangeobject.c
 41.03%     78  Objects/sliceobject.c
 76.63%   1797  Objects/stringobject.c
 77.00%    287  Objects/tupleobject.c
 22.22%     18  Objects/typeobject.c
 84.26%    108  Objects/unicodectype.c
 66.61%   2743  Objects/unicodeobject.c
 90.79%     76  Parser/acceler.c
  0.00%     28  Parser/bitset.c
  0.00%     67  Parser/firstsets.c
 18.18%     22  Parser/grammar1.c
  0.00%    139  Parser/grammar.c
  0.00%     30  Parser/intrcheck.c
  0.00%     38  Parser/listnode.c
  0.00%      2  Parser/metagrammar.c
  0.00%     63  Parser/myreadline.c
 90.70%     43  Parser/node.c
 82.26%    124  Parser/parser.c
 79.38%     97  Parser/parsetok.c
  0.00%    366  Parser/pgen.c
  0.00%     85  Parser/pgenmain.c
  0.00%     60  Parser/printgrammar.c
 76.70%    588  Parser/tokenizer.c
 62.31%   1231  Python/bltinmodule.c
 76.55%   2021  Python/ceval.c
 64.78%    230  Python/codecs.c
 73.85%   2367  Python/compile.c
 76.67%     30  Python/dynload_shlib.c
 75.75%    301  Python/errors.c
 65.59%    401  Python/exceptions.c
  0.00%     31  Python/frozenmain.c
 56.83%    776  Python/getargs.c
100.00%      2  Python/getcompiler.c
100.00%      2  Python/getcopyright.c
 80.00%      5  Python/getmtime.c
 15.62%     32  Python/getopt.c
100.00%      2  Python/getplatform.c
100.00%      4  Python/getversion.c
 61.78%   1167  Python/import.c
 66.67%     42  Python/importdl.c
 51.35%    483  Python/marshal.c
 60.58%    274  Python/modsupport.c
 88.73%     71  Python/mystrtoul.c
  0.00%      2  Python/pyfpe.c
 91.15%    113  Python/pystate.c
 37.80%    635  Python/pythonrun.c
  0.00%      5  Python/sigcheck.c
 12.67%    150  Python/structmember.c
 53.87%    323  Python/sysmodule.c
100.00%      5  Python/thread.c
 53.47%    144  Python/thread_pthread.h
 21.74%    138  Python/traceback.c
 58.65%  48417  TOTAL

From tim.one at home.com Fri Jan 5 21:46:10 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 5 Jan 2001 15:46:10 -0500
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: <14934.6890.160122.384692@beluga.mojam.com>
Message-ID:

[Skip Montanaro]
> What do you think contributes to fileinput's relative disfavor?

Only half jokingly, because I never use it , and I don't think
Fredrik or Alex Martelli do either. That means it rarely gets
mentioned by the c.l.py reply bots. Plus it's not *used* anywhere in
the Python distribution, so nobody stumbles into it that way either.
Plus the docs require more than one line to explain what it does, and
get bogged down describing the Awk-like (Perl took this from Awk)
convolutions before the simplest (one explicitly named file) case.
It *is* regularly mentioned in the eternal "while 1:" debate, but
that's it.

> This whole thread on Python's file reading performance was started
> by the eternal whine "why is Python so much slower than Perl?"

No, it started with Guido's objections to Jeff's xreadlines patch. I
dragged Perl into it -- because, like it or not, that was the right
thing to do .

> which really means why is
>
>     line = f.readline()
>     while line:
>         process(line)
>
> so much slower than whatever that thing is in Perl that everybody
> uses as the be-all-end-all performance benchmark (something with
> <> in it).

"<FILE>" is simply Perl's way of spelling Python's FILE.readline()
(and FILE.readlines(), when <FILE> appears in an array context; and
FILE.read() when Perl's Awkish "record separator" is disabled; and
...). "<>" without an explicit filehandle does all the
inherited-from-Awk magic with argv, else that stuff doesn't come into
play. "<>" (without a filehandle) seems rarely used in Perl practice,
though, *except* in support of

    your_shell_prompt> some_perl_script < some_file

That is, "<>" is usually used simply as an abbreviation for <STDIN>,
and I bet *most* Perl programmers don't even know "<>" is more general
than that.

> Given that fileinput is supposed to make the I/O loop in Python more
> familiar to those people wandering over from Perl (at least in part),
> you'd think that people would naturally gravitate to it.

I guess you didn't actually read the timing results . Really, it's
been an outrageously slow way to do input. That's better now, and I'm
much more likely now than I used to be to use

    for line in fileinput.input('file'):

instead of

    f = open('file')
    while 1:
        line = f.readline()
        if not line:
            break

The relative attraction of the former is obvious if it's reasonably
quick. I don't really have any use for the Awk complications (note
that I'm running on Windows, though, and the shells here don't expand
wildcards -- the Awk gimmicks are much more useful on Unix systems).

> Would it benefit from some exposure in the Python tutorial?

Heh -- that's a tough one. The *simplest* case is the only one
deserving of promotion. But in that case, Jeff's xreadlines is about
as convenient and much quicker. I bet we'll all be afraid to change
the tutorial to mention either <0.9 wink>.

> Is it fast enough now to warrant the extra exposure?

Don't know. It's the same speed as "while 1:" on *my* box now, but
still 3x slower than the double-loop method.

> just-whining-out-loud-ly y'rs

so-do-*you*-want-to-use-it-now?-ly y'rs - tim

From thomas at xs4all.net Fri Jan 5 22:19:42 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 5 Jan 2001 22:19:42 +0100
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: ; from tim.one@home.com on Fri, Jan 05, 2001 at 03:46:10PM -0500
References: <14934.6890.160122.384692@beluga.mojam.com>
Message-ID: <20010105221942.J2467@xs4all.nl>

On Fri, Jan 05, 2001 at 03:46:10PM -0500, Tim Peters wrote:

> "<>" (without a filehandle) seems
> rarely used in Perl practice, though, *except* in support of
>
>     your_shell_prompt> some_perl_script < some_file
>
> That is, "<>" is usually used simply as an abbreviation for <STDIN>, and I
> bet *most* Perl programmers don't even know "<>" is more general than that.

Well, I can't say anything about *most* Perl programmers, but all Perl
programmers I know (including me) know damned well what <> does, and
use it frequently. And in all the ways: no arguments meaning <STDIN>,
a list of files meaning open those files one at a time, using - to
include stdin in that list, accessing the filename and linenumber,
etc. None of them can be called newbies, though.

But then, I like using Python's fileinput, too, so maybe I'm just
weird :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From ping at lfw.org Fri Jan 5 23:01:53 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 5 Jan 2001 16:01:53 -0600 (CST)
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: <20010105221942.J2467@xs4all.nl>
Message-ID:

On Fri, 5 Jan 2001, Thomas Wouters wrote:
> On Fri, Jan 05, 2001 at 03:46:10PM -0500, Tim Peters wrote:
> > That is, "<>" is usually used simply as an abbreviation for <STDIN>, and I
> > bet *most* Perl programmers don't even know "<>" is more general than that.
>
> Well, I can't say anything about *most* Perl programmers, but all Perl
> programmers I know (including me) know damned well what <> does, and use it
> frequently.
And in all the ways: no arguments meaning <STDIN>, a list of
> files meaning open those files one at a time, using - to include stdin in
> that list, accessing the filename and linenumber, etc.

I was just about to chime in and say the same thing. I don't even
program in Perl any more, and I still remember all the ways that <>
works. For text-processing scripts, it's unbeatable. It does pretty
much exactly everything you want, and the idiom

    while (<>) {
        ...
    }

is simple, quickly learned, frequently used, and instantly recognizable.

    import sys
    if len(sys.argv) > 1:
        file = open(sys.argv[1])
    else:
        file = sys.stdin
    while 1:
        line = file.readline()
        if not line:
            break
        ...

is much more complex, harder to explain, harder to learn, and runs slower.

I have two separate suggestions:

1. Include 'sys' in builtins. It's silly to have to 'import sys'
   just to be able to see sys.argv and sys.stdin.

2. Put fileinput.input() in sys.

With both, the while (<>) idiom becomes:

    for line in sys.input():
        ...

-- ?!ng

"This code is better than any code that doesn't work has any right to be."
    -- Roger Gregory, on Xanadu

From skip at mojam.com Fri Jan 5 23:19:36 2001
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 5 Jan 2001 16:19:36 -0600 (CST)
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <14934.11371.879059.610988@localhost.localdomain>
References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com> <14934.11371.879059.610988@localhost.localdomain>
Message-ID: <14934.18552.749081.871226@beluga.mojam.com>

    Jeremy> If anyone is interested in experimenting with a test suite, here
    Jeremy> is a summary of the code coverage for the current regression
    Jeremy> test suite as run on my Linux box.

Speaking of which, I am still running my nightly code coverage thing
(still with warts) whose results are available at

    http://musi-cal.mojam.com/~skip/python/Python/dist/src/

Does anyone care? Should I turn it off?

Skip

From thomas at xs4all.net Sat Jan 6 00:18:58 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sat, 6 Jan 2001 00:18:58 +0100
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: ; from ping@lfw.org on Fri, Jan 05, 2001 at 04:01:53PM -0600
References: <20010105221942.J2467@xs4all.nl>
Message-ID: <20010106001858.B402@xs4all.nl>

On Fri, Jan 05, 2001 at 04:01:53PM -0600, Ka-Ping Yee wrote:

>     while (<>) {
>         ...
>     }

> is simple, quickly learned, frequently used, and instantly recognizable.

>     import sys
>     if len(sys.argv) > 1:
>         file = open(sys.argv[1])
>     else:
>         file = sys.stdin
>     while 1:
>         line = file.readline()
>         if not line:
>             break
>         ...

... Except that it can take more than one filename, and will process
them one after another, and that it takes "-" as a filename for stdin.
Doing it in a script is not dead simple, unless you open up all files
at once (which can be harmful, and Perl, for one, doesn't do) or you
do most of the work fileinput does.

That is why I use fileinput (and while-diamond) -- I might not need it
now, but when I do need it, it already works :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
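The behaviour discussed above (several files read one after another, with the current filename and line number available) can be sketched with fileinput as it exists in present-day Python. The helper name `number_lines` and the temporary files are made up for this example; the context-manager form of `fileinput.input()` is a later addition that was not available in the Python of this thread.

```python
import fileinput
import os
import tempfile


def number_lines(paths):
    """Yield (filename, line number within that file, text) for every line."""
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield stream.filename(), stream.filelineno(), line.rstrip("\n")


# Two small scratch files so the example does not depend on sys.argv.
tmpdir = tempfile.mkdtemp()
paths = []
for name, body in [("a.txt", "alpha\nbeta\n"), ("b.txt", "gamma\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as handle:
        handle.write(body)
    paths.append(path)

rows = list(number_lines(paths))
for filename, lineno, text in rows:
    print(os.path.basename(filename), lineno, text)
```

Called without a `files` argument, `fileinput.input()` falls back to `sys.argv[1:]` and then to standard input, and a "-" entry in the list also means standard input, which is the part that takes real work to reproduce by hand.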

From moshez at zadka.site.co.il Sat Jan 6 12:00:33 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Sat, 6 Jan 2001 13:00:33 +0200 (IST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <200101052014.PAA20328@cj20424-a.reston1.va.home.com>
References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com>
Message-ID: <20010106110033.52127A84F@darjeeling.zadka.site.co.il>

On Fri, 05 Jan 2001 15:14:41 -0500, Guido van Rossum wrote:

> Please have a look at this SF patch:
>
> http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470
>
> This implements control over which names defined in a module are
> externally visible: if there's a variable __exports__ in the module,
> it is a list of identifiers, and any access from outside the module to
> names not in the list is disallowed. This affects access using the
> getattr and setattr protocols (which raise AttributeError for
> disallowed names), as well as "from M import v" (which raises
> ImportError).

Ummmmm.....why do we want this? What's wrong with the current
suggestion of using "_"? __exports__ feels somehow wrong to me. None
of the rest of Python has any access control, and I really like that.

A big -1 from me, for what it's worth.

> I like it.

I'm surprised. Why do you like that?

> This has been asked for many times.

So has adding curly-braces as control structure, with all due respect.

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From billtut at microsoft.com Sat Jan 6 04:43:06 2001
From: billtut at microsoft.com (Bill Tutt)
Date: Fri, 5 Jan 2001 19:43:06 -0800
Subject: [Python-Dev] Add __exports__ to modules
Message-ID: <58C671173DB6174A93E9ED88DCB0883DB8637E@red-msg-07.redmond.corp.microsoft.com>

I think I'm with Moshe on this one: what's wrong with just using
underscores (__) to play the hiding game?

Here's my silly language suggestion for this week:

    with self:
        .bar = foo
        bar.blah = .fubar
        .bar = .bar + 1
        # etc....
Bill

From skip at mojam.com Sat Jan 6 05:15:12 2001
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 5 Jan 2001 22:15:12 -0600 (CST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <20010106110033.52127A84F@darjeeling.zadka.site.co.il>
References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il>
Message-ID: <14934.39888.908416.983794@beluga.mojam.com>

> On Fri, 05 Jan 2001 15:14:41 -0500, Guido van Rossum wrote:
> Please have a look at this SF patch:
>
> http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470
>
> This implements control over which names defined in a module are
> externally visible: if there's a variable __exports__ in the module,
> it is a list of identifiers, and any access from outside the module to
> names not in the list is disallowed. This affects access using the
> getattr and setattr protocols (which raise AttributeError for
> disallowed names), as well as "from M import v" (which raises
> ImportError).

I have to agree with Moshe. If __exports__ is implemented for modules
we'll have multiple, different access control mechanisms for different
things, some of which thoughtful programmers would be able to get
around, some of which they wouldn't. Here are the ways I'm aware of to
control attribute visibility (there may be others - I don't usually
delve too deeply into this stuff):

  * preface module globals with "_": This just prevents those globals
    from being added to the current namespace when a programmer
    executes "from module import *". Programmers can work around this
    by attribute access through the module object or by explicitly
    importing it: "from module import _foo" works, yes?

  * preface class or instance attributes with "__": This just mangles
    the name by prefacing the visible name with _<classname>. The
    programmer can still access it by knowing the simple name mangling
    rule.
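Both existing mechanisms in the list above can be checked with a short sketch in present-day Python. The module name `demo_mod` and the class `Parrot` are invented for the example, and the module is built in memory only so the snippet is self-contained.

```python
import sys
import types

# Build a throwaway module with one public and one underscore-prefixed global.
mod = types.ModuleType("demo_mod")
exec("visible = 1\n_hidden = 2\n", mod.__dict__)
sys.modules["demo_mod"] = mod

# "from module import *" skips names that start with an underscore...
star = {}
exec("from demo_mod import *", star)

# ...but importing the name explicitly still works.
explicit = {}
exec("from demo_mod import _hidden", explicit)


class Parrot:
    def __init__(self):
        # Inside the class body this is stored as _Parrot__state.
        self.__state = "resting"


p = Parrot()
print("visible" in star, "_hidden" in star)
print(explicit["_hidden"])
print(p._Parrot__state)
```

In both cases the name stays reachable for anyone who knows the convention, which is the contrast with the stricter `__exports__` proposal that the message goes on to draw.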

In both cases the programmer can still get at the attribute value when
necessary. If you were to add some sort of access control to module
globals, I would have thought it would have been along the same lines
as the existing mechanisms in place to "hide" class/instance
attributes.

Would it be possible (or desirable) to add the name mangling
restriction to module globals as an alternative to this more
restrictive implementation? What about the chances that class/instance
attribute hiding will get more restrictive in the future? Finally, are
the motivations for wanting to restrict access to module globals and
class/instance attributes that much different from one another that
they call for fundamentally different mechanisms?

Skip

From barry at digicool.com Sat Jan 6 06:15:20 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Sat, 6 Jan 2001 00:15:20 -0500
Subject: [Python-Dev] Add __exports__ to modules
References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il>
Message-ID: <14934.43496.322436.612746@anthem.wooz.org>

I'm -0 on this, largely for the reasons already brought up: if modules
grow __exports__ then there will be pressure to add it to classes, and
modules already have a limited version of access control through
leading underscore names.

I might be more positive on the addition if __exports__ were added to
classes, because at least there'd be a consistently stronger fence
added to name access rules that prevented even consenting adults from
fiddling with the naughty bits.
-Barry

From nas at arctrix.com Sat Jan 6 00:20:58 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Fri, 5 Jan 2001 15:20:58 -0800
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <14934.43496.322436.612746@anthem.wooz.org>; from barry@digicool.com on Sat, Jan 06, 2001 at 12:15:20AM -0500
References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org>
Message-ID: <20010105152058.A6016@glacier.fnational.com>

On Sat, Jan 06, 2001 at 12:15:20AM -0500, Barry A. Warsaw wrote:
> I might be more positive on the addition if __exports__ were added to
> classes, because at least there'd be a consistently stronger fence
> added to name access rules that prevented even consenting adults from
> fiddling with the naughty bits.

I think you, Skip and Moshe are missing a big advantage of having the
__exports__ mechanism. It should allow some attribute access inside of
modules to become faster (like LOAD_FAST for locals). I think that
optimization could be implemented without too much difficulty. I've
never channeled Guido before so I could be off the mark. If the only
advantage is encapsulation then I'm -0.

  Neil

From barry at digicool.com Sat Jan 6 08:09:31 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Sat, 6 Jan 2001 02:09:31 -0500
Subject: [Python-Dev] PEP 232 update and patch
Message-ID: <14934.50347.851118.581484@anthem.wooz.org>

I've updated PEP 232, function attributes, and uploaded a patch to SF.
I couldn't coax cvs diff into including the new files
Lib/test/test_funcattrs.py and Lib/test/output/test_funcattrs so I'll
attach them below.

PEP 232:
    http://python.sourceforge.net/peps/pep-0232.html

SF patch #103123:
    http://sourceforge.net/patch/?func=detailpatch&patch_id=103123&group_id=5470

Enjoy,
-Barry

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_funcattrs.py
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_funcattrs
URL:

From martin at loewis.home.cs.tu-berlin.de Sat Jan 6 11:06:49 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 6 Jan 2001 11:06:49 +0100
Subject: [Python-Dev] PEP 208 comment
Message-ID: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de>

I just studied PEP 208 for the first time. Overall, it seems all
natural and nice, but there is one aspect I'd like to see changed:
the naming of the type flag.

Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a
program should be called "new". The flag will still be there five
years from now, but it won't be new anymore. Also, while the flag
indicates that the style of the numbers is new, it does not say what
it does. So I propose to rename it; if nobody finds a better name, I
propose to call it Py_TPFLAGS_UNCOERCED.

Regards,
Martin

From thomas at xs4all.net Sat Jan 6 13:52:19 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sat, 6 Jan 2001 13:52:19 +0100
Subject: [Python-Dev] PEP 208 comment
In-Reply-To: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sat, Jan 06, 2001 at 11:06:49AM +0100
References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de>
Message-ID: <20010106135219.L2467@xs4all.nl>

On Sat, Jan 06, 2001 at 11:06:49AM +0100, Martin v. Loewis wrote:

> Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a
> program should be called "new". The flag will still be there five
> years from now, but it won't be new anymore. Also, while the flag
> indicates that style of the numbers is new, it does not say what it
> does. So I propose to rename it; if nobody finds a better name, I
> propose to call it Py_TPFLAGS_UNCOERCED.

Wrong name.
The TPFLAGs only indicate whether a struct is large enough to
contain a particular member, not whether that member is going to
contain or do anything. 'Py_TPFLAGS_HASCOERCE' or some such would seem
more appropriate to me.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From martin at loewis.home.cs.tu-berlin.de Sat Jan 6 14:36:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 6 Jan 2001 14:36:39 +0100
Subject: [Python-Dev] PEP 208 comment
In-Reply-To: <20010106135219.L2467@xs4all.nl> (message from Thomas Wouters on Sat, 6 Jan 2001 13:52:19 +0100)
References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <20010106135219.L2467@xs4all.nl>
Message-ID: <200101061336.f06DadP02895@mira.informatik.hu-berlin.de>

> Wrong name. The TPFLAGs only indicate whether a struct is large enough to
> contain a particular member, not whether that member is going to contain or
> do anything.

That may have been the original intention; *this* specific flag is not
of that kind. Please look at abstract.c:binary_op1, which has

    if (v->ob_type->tp_as_number != NULL && NEW_STYLE_NUMBER(v)) {
        slot = NB_BINOP(v->ob_type->tp_as_number, op_slot);
        if (*slot) {
            x = (*slot)(v, w);
            if (x != Py_NotImplemented) {
                return x;
            }
            Py_DECREF(x); /* can't do it */
        }
        if (v->ob_type == w->ob_type) {
            goto binop_error;
        }
    }

Here, no additional member was added: there always was tp_as_number,
and that also supported all possible op_slot values. What is new here
is that the slot may be called even if v and w have different types;
that was not allowed before the PEP 208 changes.

Yet it tests for NEW_STYLE_NUMBER(v), which is

    PyType_HasFeature((o)->ob_type, Py_TPFLAGS_NEWSTYLENUMBER)

So the presence of this flag is indeed a promise that a specific
member will do something that it normally wouldn't do.

> 'Py_TPFLAGS_HASCOERCE' or some such would seem more appropriate to
> me.

Well, all numbers still have coercion - it just may not be used if the
flag is present. It's not a matter of having or not having something
(well, only the "new style" numbers may have nb_cmp, but calling it
Py_TPFLAGS_HAS_NB_CMP would be beside the point, IMO).

Anyway, I don't want to defend my version too much - I just want to
request that the current name is changed to *something* more
descriptive.

Regards,
Martin

From skip at mojam.com Sat Jan 6 15:40:30 2001
From: skip at mojam.com (Skip Montanaro)
Date: Sat, 6 Jan 2001 08:40:30 -0600 (CST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <20010105152058.A6016@glacier.fnational.com>
References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> <20010105152058.A6016@glacier.fnational.com>
Message-ID: <14935.11870.360839.235102@beluga.mojam.com>

    Neil> I think you, Skip and Moshe are missing a big advantage of having
    Neil> the __exports__ mechanism. It should allow some attribute access
    Neil> inside of modules to become faster (like LOAD_FAST for locals). I
    Neil> think that optimization could be implemented without too much
    Neil> difficulty.

True enough, that hadn't occurred to me. Knowing that now, I still
don't think consistency of the interface should suffer as a result of
under-the-covers performance gains.

Skip

From skip at mojam.com Sat Jan 6 15:42:25 2001
From: skip at mojam.com (Skip Montanaro)
Date: Sat, 6 Jan 2001 08:42:25 -0600 (CST)
Subject: [Python-Dev] Re: [Patches] [Patch #103123] PEP 232 implementation (function attributes)
In-Reply-To:
References:
Message-ID: <14935.11985.972526.108391@beluga.mojam.com>

Oooo...
I tried went to check out Barry's function attribute patch at http://sourceforge.net/patch/?func=detailpatch&patch_id=103123&group_id=5470 and got Fatal error: Call to a member function on a non-object in /usr/local/htdocs/alexandria/www/patch/index.php on line 55 in response. Any idea whazzup? Skip From akuchlin at cnri.reston.va.us Sat Jan 6 15:47:59 2001 From: akuchlin at cnri.reston.va.us (Andrew Kuchling) Date: Sat, 6 Jan 2001 09:47:59 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14934.18552.749081.871226@beluga.mojam.com>; from skip@mojam.com on Fri, Jan 05, 2001 at 04:19:36PM -0600 References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com> <14934.11371.879059.610988@localhost.localdomain> <14934.18552.749081.871226@beluga.mojam.com> Message-ID: <20010106094759.A13723@newcnri.cnri.reston.va.us> On Fri, Jan 05, 2001 at 04:19:36PM -0600, Skip Montanaro wrote: >Speaking of which, I am still running my nightly code coverage thing (still >with warts) whose results are available at > http://musi-cal.mojam.com/~skip/python/Python/dist/src/ Add a link to it from the Python development pages on SourceForge; I suspect much of the problem is that people don't remember the URL for it, and don't want to dig through the archives to find it. --amk From mal at lemburg.com Sat Jan 6 16:15:27 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 06 Jan 2001 16:15:27 +0100 Subject: [Python-Dev] PEP 208 comment References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> Message-ID: <3A57368F.FC01F78@lemburg.com> "Martin v. Loewis" wrote: > > I just studied PEP 208 for the first time. Overall, it seems all > natural and nice, but there is one one aspect I'd like to see changed: > the naming of the type flag. > > Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a > program should be called "new". 
The flag will still be there five > years from now, but it won't be new anymore. Also, while the flag > indicates that style of the numbers is new, it does not say what it > does. So I propose to rename it; if nobody finds a better name, I > propose to call it Py_TPFLAGS_UNCOERCED. Given that the design could well be applied to other slots as well, I think you've got a point there. The idea behind the flag was to signal that slots will no longer make object type assumptions which they could previously. Right now, only numeric types support this feature. In the future I could imaging strings and other types involving coercion would also want to use the feature. Given this design idea, how about calling the flag Py_TPFLAGS_CHECKTYPES ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Sat Jan 6 16:35:20 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 6 Jan 2001 09:35:20 -0600 (CST) Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error Message-ID: <14935.15160.130742.390323@beluga.mojam.com> You know, I thought of something (which was probably already obvious to the rest of you) while perusing Barry's patch. Attaching function attributes to unbound methods could really function like C++ static data members. You'd have to write accessor functions to make setting the attributes look clean, but that wouldn't be all bad. Precisely because you couldn't modify them through the bound method, there's be no chance you could make the mistake of modifying them that way and having them transmogrify into instance attributes. 
Here's a quick example: class C: def __init__(self): self.just_resting() __init__.howmany = 0 def __del__(self): self.hes_dead() def hes_dead(self): C.__init__.howmany -= 1 def just_resting(self): C.__init__.howmany += 1 def howmany(self): return C.__init__.howmany def howmany(): return C.__init__.howmany c = C() print c.howmany() d = C() print d.howmany() del c print d.howmany() After applying Barry's patch, if I execute this script from the command line it displays 1 2 1 as one would expect, but then catches an attribute error during cleanup: Exception exceptions.AttributeError: "'None' object has no attribute '__init__'" in ignored If I add "del d" to the end of the script the exception disappears. I suspect there is a cleanup order problem of some sort. It seems like C is getting reclaimed before d (not possible), or that d's __class__ attribute is set to None before its __del__ method is called. Is this a known problem or something introduced by Barry's patch? Skip From barry at digicool.com Sat Jan 6 17:09:47 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sat, 6 Jan 2001 11:09:47 -0500 Subject: [Python-Dev] Re: [Patches] [Patch #103123] PEP 232 implementation (function attributes) References: <14935.11985.972526.108391@beluga.mojam.com> Message-ID: <14935.17227.634808.132783@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> and got | Fatal error: Call to a member function on a non-object in | /usr/local/htdocs/alexandria/www/patch/index.php on line 55 SM> in response. Any idea whazzup? I got a similar error on SF when I tried to find my patch on the patches page. I still think the patch manager just gives you no way to see all the patches when there's more than what fits on one page. The error dropped a cookie in my lap that logged me out too. After I logged in again, it all seemed to work. -Barry From martin at loewis.home.cs.tu-berlin.de Sat Jan 6 16:20:51 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sat, 6 Jan 2001 16:20:51 +0100 Subject: [Python-Dev] PEP 208 comment In-Reply-To: <3A57368F.FC01F78@lemburg.com> (mal@lemburg.com) References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <3A57368F.FC01F78@lemburg.com> Message-ID: <200101061520.f06FKpu03218@mira.informatik.hu-berlin.de> > Given this design idea, how about calling the flag > Py_TPFLAGS_CHECKTYPES ?! Sounds good to me. Martin From thomas at xs4all.net Sat Jan 6 17:47:24 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 6 Jan 2001 17:47:24 +0100 Subject: [Python-Dev] PEP 208 comment In-Reply-To: <200101061336.f06DadP02895@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sat, Jan 06, 2001 at 02:36:39PM +0100 References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <20010106135219.L2467@xs4all.nl> <200101061336.f06DadP02895@mira.informatik.hu-berlin.de> Message-ID: <20010106174724.M2467@xs4all.nl> On Sat, Jan 06, 2001 at 02:36:39PM +0100, Martin v. Loewis wrote: > That may have been the original intention; *this* specific flag is not > of that kind. Please look at abstract.c:binary_op1, which has You're right, I stand corrected, I retract my proposal :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Sat Jan 6 23:05:23 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 06 Jan 2001 17:05:23 -0500 Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error In-Reply-To: Your message of "Sat, 06 Jan 2001 09:35:20 CST." <14935.15160.130742.390323@beluga.mojam.com> References: <14935.15160.130742.390323@beluga.mojam.com> Message-ID: <200101062205.RAA23603@cj20424-a.reston1.va.home.com> > You know, I thought of something (which was probably already obvious to the > rest of you) while perusing Barry's patch. Attaching function attributes to > unbound methods could really function like C++ static data members. 
You'd
> have to write accessor functions to make setting the attributes look clean,
> but that wouldn't be all bad. Precisely because you couldn't modify them
> through the bound method, there'd be no chance you could make the mistake of
> modifying them that way and having them transmogrify into instance
> attributes.
>
> Here's a quick example:
>
>     class C:
>         def __init__(self):
>             self.just_resting()
>         __init__.howmany = 0
>
>         def __del__(self):
>             self.hes_dead()
>
>         def hes_dead(self):
>             C.__init__.howmany -= 1
>
>         def just_resting(self):
>             C.__init__.howmany += 1
>
>         def howmany(self):
>             return C.__init__.howmany
>
>     def howmany():
>         return C.__init__.howmany
>
>     c = C()
>     print c.howmany()
>     d = C()
>     print d.howmany()
>     del c
>     print d.howmany()

Skip, I don't find this better than the existing solution, which uses
C._howmany instead of C.__init__.howmany. True, you can access it as
self._howmany and if you assign to self._howmany you'd transform it
into an instance attribute -- but that falls in the "then don't do
that" category.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim_one at email.msn.com Sat Jan 6 23:14:44 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 6 Jan 2001 17:14:44 -0500
Subject: [Python-Dev] Rehabilitating fgets
Message-ID:

[Guido]
> ...
> Unfortunately we can't use fgets(), even if it were faster than
> getline(), because it doesn't tell how many characters it read.

Let's think about that a little harder, because it appears to be our only
hope on Windows (the MS fgets isn't optimized like the Perl inner loop, but
it does lock/unlock the stream only at routine entry/exit, and uses a hidden
non-locking (== much faster) variant of getc in the guts -- we've seen that
the "locking" part of MS getc accounts for 17 of 30 seconds in my test
case).
> On files containing null bytes, readline() is supposed to treat > these like any other character; fgets does too (at least it does on Windows, and I believe that's std behavior). The problem is that it also makes up a null byte on its own. > If your input is "abc\0def\nxyz\n", the first readline() call > should return "abc\0def\n". Yes. > But with fgets(), you're left to look in the returned buffer for > a null byte, Also yes. But suppose I search "from the right", and ensure the buffer is free of null bytes before the fgets. For your input file above, fgets overwrites the initial 9 bytes of the buffer (assuming the buffer is at least 9 bytes long ...) with "abc\0def\n\0" and there's no problem if I search from the right. > and there's no way (in general) to distinguish this result from > an input file that only consisted of the three characters "abc". As above, I'm not convinced of that. The input file "abc" would overwrite the first four bytes of the buffer with "abc\0" and leave the tail end alone (well, the MS fgets leaves the tail alone, although I'm not sure ANSI C guarantees that). Of course I've *read* any number of Unix(tm) FAQs that also claim it's impossible, but I never believed them either . This extra buffer fiddling is surely an expense I don't want to pay, but the timing evidence on Windows so far says that I can probably search and/or copy the whole buffer 100 times and still be faster than enduring the threadsafe getc. Am I missing something obvious? From guido at python.org Sat Jan 6 23:33:00 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 06 Jan 2001 17:33:00 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Your message of "Sat, 06 Jan 2001 17:14:44 EST." References: Message-ID: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> [Tim suggests to use fgets(), preparing the buffer with non-null bytes, and searching for a null byte from the right.] If this is really sufficiently fast, I'd say, go for it. 
Looks bullet-proof as long as the source code to MSVCRT doesn't change. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Sat Jan 6 23:34:42 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 17:34:42 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Message-ID: [Tim, pondering] > ... But suppose I search "from the right", and ensure the buffer is > free of null bytes before the fgets. Even better, suppose I ensure the buffer is free of both null bytes and newlines before the fgets; then if I search from the *left* for a newline and find one, it must be that fgets found a line and it ends right there, and this should usually obtain. There's no need to search from the right unless I don't find a newline ... From skip at mojam.com Sun Jan 7 02:15:08 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 6 Jan 2001 19:15:08 -0600 (CST) Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error In-Reply-To: <200101062205.RAA23603@cj20424-a.reston1.va.home.com> References: <14935.15160.130742.390323@beluga.mojam.com> <200101062205.RAA23603@cj20424-a.reston1.va.home.com> Message-ID: <14935.49948.574427.668588@beluga.mojam.com> Skip> Attaching function attributes to unbound methods could really Skip> function like C++ static data members.... Guido> Skip, I don't find this better than the existing solution, which Guido> uses C._howmany instead of C.__init__.howmany. It was more a "hey, I never thought of it quite that way" than a "hey, I think this would be a great new idiom". In fact, I believe the more important part of my note was the bit about the attribute error on exit. I'm sure function attributes will attract their fair share of abuse. 
;-)

Skip

From tim_one at email.msn.com Sun Jan 7 04:16:31 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 6 Jan 2001 22:16:31 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
Message-ID:

I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing.

test_builtin fails because raw_input() isn't stripping a trailing newline.
I've got my own code in this area that *may* be to blame, but I don't see
how it could be. I note that fileobject.c's new function get_line_raw has
the comment

    /* Internal routine to get a line for raw_input():
       strip trailing '\n', raise EOFError if EOF reached immediately
    */

but the code doesn't look for a trailing newline (let alone strip one).

From tim_one at email.msn.com Sun Jan 7 04:33:02 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 6 Jan 2001 22:33:02 -0500
Subject: [Python-Dev] Rehabilitating fgets
In-Reply-To: <200101062233.RAA23942@cj20424-a.reston1.va.home.com>
Message-ID:

> [Tim suggests to use fgets(), preparing the buffer with non-null
> bytes, and searching for a null byte from the right.]

[Guido]
> If this is really sufficiently fast, I'd say, go for it. Looks
> bullet-proof as long as the source code to MSVCRT doesn't change. :-)

Surprise? Despite all the memsets, memchrs (looking for a newline), and
one-at-a-time backward searches (looking for a null byte), it's a huge win
on Windows:

    total 117615824 chars and 3237568 lines
    readlines_sizehint    9.550    9.578
    using_fileinput      28.790   28.781
    while_readline       13.120   13.134

The last one was 30.5 seconds before the fgets hackery.
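
For readers who want to see the buffer trick from this thread in one place, here is a rough model of it in modern Python rather than C. It is only a sketch of the idea as described in the messages, not the code that went into fileobject.c: fgets() below is a hand-written stand-in for the C function, and readline_via_fgets, FILL and bufsize are made-up names. The buffer is pre-filled with a byte that is neither null nor newline; a newline found from the left marks the end of the line, and otherwise the rightmost null byte is the terminator fgets wrote.

```python
import io

FILL = b"\x01"  # assumption: any byte that is neither NUL nor newline


def fgets(buf, stream):
    """Stand-in for C fgets(): store up to len(buf)-1 bytes, stopping
    after a newline, then append a NUL. Returns False on immediate EOF
    and leaves buf untouched in that case."""
    i = 0
    while i < len(buf) - 1:
        c = stream.read(1)
        if not c:
            break
        buf[i] = c[0]
        i += 1
        if c == b"\n":
            break
    if i == 0:
        return False
    buf[i] = 0
    return True


def readline_via_fgets(stream, bufsize=16):
    """Recover a whole line, embedded NULs included, from fgets() calls."""
    pieces = []
    while True:
        buf = bytearray(FILL * bufsize)
        if not fgets(buf, stream):
            break                       # EOF before any byte was read
        nl = buf.find(b"\n")            # search from the left for a newline
        if nl >= 0:
            pieces.append(bytes(buf[:nl + 1]))
            break
        end = buf.rfind(b"\x00")        # rightmost NUL is the terminator
        pieces.append(bytes(buf[:end]))
        if end < bufsize - 1:
            break                       # short chunk, no newline: hit EOF
    return b"".join(pieces)


stream = io.BytesIO(b"abc\x00def\nxyz\n")
first = readline_via_fgets(stream)
second = readline_via_fgets(stream)
third = readline_via_fgets(stream)
```

With the input from Guido's example, first comes back as the full "abc\0def\n" line and third is empty at EOF. The model relies on the same assumption Tim mentions: fgets must leave the buffer alone beyond the null byte it writes.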
I'll check it in tomorrow after sleeping on it (there's a large pile of messy endcases (not only does fgets() invent a null byte, it can't tell you whether it stopped reading due to EOF, so maybe the last line in the file ends with 10000 null bytes + no newline + exactly lines up with a buffer boundary -- etc); test_builtin is failing in a closely related area but nobody would have checked in code that failed a std test ; and it's been a frustrating day all around). i-want-my-cable-modem-back-now-ly y'rs - tim From esr at thyrsus.com Sun Jan 7 05:01:25 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sat, 6 Jan 2001 23:01:25 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: ; from tim_one@email.msn.com on Sat, Jan 06, 2001 at 10:33:02PM -0500 References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> Message-ID: <20010106230125.A29058@thyrsus.com> Tim Peters : > > [Tim suggests to use fgets(), preparing the buffer with non-null > > bytes, and searching for a null byte from the right.] No, I haven't forgotten about the curses autoconfig stuff. But... This mess reminds me. For some work I'm doing right now, it would be very useful if there were a way to query the end-of-file status of a file descriptor without actually doing a read. I don't see this ability anywhere in the 2.0 API. Questions: 1. Am I missing something obvious? 2. If the answer to 1 is that I am not, in fact, being a dumbass, what is the right way to support this? The obvious alternatives are an eof member (analogous to the existing `closed' member, or an eof() method. I favor the latter. 3. If we agree on a design, I'm willing to implement this at least for Unix. Should be a small project. -- Eric S. Raymond The direct use of physical force is so poor a solution to the problem of limited resources that it is commonly employed only by small children and great nations. 
-- David Friedman

From skip at mojam.com Sun Jan 7 05:05:22 2001
From: skip at mojam.com (Skip Montanaro)
Date: Sat, 6 Jan 2001 22:05:22 -0600 (CST)
Subject: [Python-Dev] readline module seems crippled - am I missing something?
Message-ID: <14935.60162.726131.593211@beluga.mojam.com>

For a more-or-less throwaway script I'm working on I need a little input
function similar to Emacs's read-from-minibuffer, which accepts both a
prompt and an initial string for the input buffer. Seems like I ought to be
able to whip something up using readline, but it's not happening. GNU
readline's docs aren't the greatest, but I thought this simple script would
work:

    import readline
    readline.insert_text("default")
    x = raw_input("?")
    print x

I expected to see an editable "default" displayed after the prompt and have
x default to "default" if I just hit the return key. I see nothing
displayed after the question mark, and x is the empty string if I just hit
return. This does print "default":

    readline.insert_text("default")
    x = readline.get_line_buffer()
    print x

so I know that insert_text and get_line_buffer seem to be working as
intended. Looking at call_readline in Modules/readline.c I see nothing that
would disrupt the line buffer before the call to readline().

Am I missing something totally obvious about how GNU readline works or the
conditions under which readline is used (only at the interactive prompt?) or
is some required bit of GNU readline not exposed through Python's readline
module?

Skip

From tim.one at home.com Sun Jan 7 11:09:02 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 7 Jan 2001 05:09:02 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: <20010106230125.A29058@thyrsus.com>
Message-ID:

[Eric S. Raymond]
> ...
> For some work I'm doing right now, it would be very useful if
> there were a way to query the end-of-file status of a file
> descriptor without actually doing a read.
> > I don't see this ability anywhere in the 2.0 API. When someone says "API", I think "C API". In that case you can use feof(stream) directly, or whatever the heck your platform supports for handles (_eof(handle) on Windows, which I know is an OS you're secretly longing to master ). I don't believe there's a way to find out from Python short of trying to read, though. Well, I suppose you could try to compare f.tell() to the size, if you knew that f.tell() and "the size" made sense for f ... > 1. Am I missing something obvious? I don't know! I never asked Guido about this, and given that he's not on vacation now I'm not allowed to channel him. I would hazard a guess, though, that he thinks "you do or don't get something back when you read" is clearer than "you may or may not get something back when you read, regardless of which answer I give you in response to .eof() -- depending". The latter is particularly muddy in a threaded environment, even for plain old disk files. > 2. If the answer to 1 is that I am not, in fact, being a dumbass, > what is the right way to support this? The obvious alternatives > are an eof member (analogous to the existing `closed' member, or > an eof() method. I favor the latter. > > 3. If we agree on a design, I'm willing to implement this at least > for Unix. Should be a small project. I agree an .eof() method would be better than a data member. Note that whenever Python internals hit stream EOF today, they call clearerr(), so simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to make sure that feof() would never be useful <0.8 wink>. 
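
The aside above about comparing f.tell() to the size can be sketched as follows, in modern Python. This is only an illustration under stated assumptions: at_eof is a made-up helper name, not a Python API, and it is meaningful only for seekable regular files opened in binary mode. The answer can also go stale the moment another process or thread appends to the file, which is the muddiness mentioned above.

```python
import os
import tempfile


def at_eof(f):
    """Hypothetical helper: True if the read position of a regular,
    seekable binary file is at or beyond the file's current size."""
    return f.tell() >= os.fstat(f.fileno()).st_size


with tempfile.TemporaryFile() as tmp:
    tmp.write(b"one\ntwo\n")
    tmp.seek(0)                 # seek flushes the pending write
    before = at_eof(tmp)        # nothing consumed yet
    tmp.readline()
    tmp.readline()
    after = at_eof(tmp)         # both lines consumed
```

For pipes, sockets and terminals there is no size to compare against, so this approach does not carry over to them.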
one-of-life's-little-mysteries-ly y'rs - tim From gstein at lyra.org Sun Jan 7 11:46:54 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 02:46:54 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.96,2.97 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Fri, Jan 05, 2001 at 06:43:07AM -0800 References: Message-ID: <20010107024654.W17220@lyra.org> On Fri, Jan 05, 2001 at 06:43:07AM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv3183 > > Modified Files: > fileobject.c > Log Message: > Restructured get_line() for clarity and speed. > > - The raw_input() functionality is moved to a separate function. > > - Drop GNU getline() in favor of getc_unlocked(), which exists on more > platforms (and is even a tad faster on my system). The "configure" tests for getline() can be punted if we won't use it any more... Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Jan 7 13:27:57 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 04:27:57 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101052014.PAA20328@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 05, 2001 at 03:14:41PM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Message-ID: <20010107042757.X17220@lyra.org> It feels wrong. Whatever happened to the "we're all adults here" mantra. Besides people asking for it, what is a good reason *for* it to be added? 
Cheers, -g On Fri, Jan 05, 2001 at 03:14:41PM -0500, Guido van Rossum wrote: > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). > > I like it. This has been asked for many times. Does anybody see a > reason why this should *not* be added? > > Tim remarked that introducing this will prompt demands for a similar > feature on classes and instances, where it will be hard to implement > without causing a bit of a slowdown. It causes a slight slowdown (an > extra dictionary lookup for each use of "M.v") even when it is not > used, but for accessing module variables that's acceptable. I'm not > so sure about instance variable references. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Greg Stein, http://www.lyra.org/ From guido at python.org Sun Jan 7 17:52:11 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 07 Jan 2001 11:52:11 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Sat, 06 Jan 2001 23:01:25 EST." <20010106230125.A29058@thyrsus.com> References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> <20010106230125.A29058@thyrsus.com> Message-ID: <200101071652.LAA31411@cj20424-a.reston1.va.home.com> > This mess reminds me. 
For some work I'm doing right now, it would be > very useful if there were a way to query the end-of-file status of a > file descriptor without actually doing a read. I hope you really mean file object (== wrapper around stdio FILE object). A file descriptor (small little integer in Unix) doesn't have a way to find this out. Even for file objects, it is typically only known that there's an EOF condition after a lowest-level read operation returned 0 bytes. So in effect you must still do a read in order to determine EOF status. I just ran a small test program, and fread() appears to set the eof status when it returns a short count. Normally, Python's read() uses fread() so this might be useful. However after a readline(), you can't know the eof status (unless the last line of the file doesn't end in a newline). > I don't see this ability anywhere in the 2.0 API. Questions: > > 1. Am I missing something obvious? > > 2. If the answer to 1 is that I am not, in fact, being a dumbass, what > is the right way to support this? The obvious alternatives are an > eof member (analogous to the existing `closed' member, or an eof() > method. I favor the latter. > > 3. If we agree on a design, I'm willing to implement this at least for > Unix. Should be a small project. Before adding an eof() method, can you explain what your program is trying to do? Is it reading from a pipe or socket? Then select() or poll() might be useful. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Sun Jan 7 19:30:32 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 13:30:32 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: ; from tim.one@home.com on Sun, Jan 07, 2001 at 05:09:02AM -0500 References: <20010106230125.A29058@thyrsus.com> Message-ID: <20010107133032.F4586@thyrsus.com> Tim Peters : > I agree an .eof() method would be better than a data member. 
Note that > whenever Python internals hit stream EOF today, they call clearerr(), so > simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to > make sure that feof() would never be useful <0.8 wink>. That's inconvenient, but only means the internal Python state flag that feof() would inspect would have to be checked after each read. -- Eric S. Raymond "...The Bill of Rights is a literal and absolute document. The First Amendment doesn't say you have a right to speak out unless the government has a 'compelling interest' in censoring the Internet. The Second Amendment doesn't say you have the right to keep and bear arms until some madman plants a bomb. The Fourth Amendment doesn't say you have the right to be secure from search and seizure unless some FBI agent thinks you fit the profile of a terrorist. The government has no right to interfere with any of these freedoms under any circumstances." -- Harry Browne, 1996 USA presidential candidate, Libertarian Party From esr at thyrsus.com Sun Jan 7 19:45:41 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 13:45:41 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101071652.LAA31411@cj20424-a.reston1.va.home.com>; from guido@python.org on Sun, Jan 07, 2001 at 11:52:11AM -0500 References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> <20010106230125.A29058@thyrsus.com> <200101071652.LAA31411@cj20424-a.reston1.va.home.com> Message-ID: <20010107134541.G4586@thyrsus.com> Guido van Rossum : > > This mess reminds me. For some work I'm doing right now, it would be > > very useful if there were a way to query the end-of-file status of a > > file descriptor without actually doing a read. > > I hope you really mean file object (== wrapper around stdio FILE > object). A file descriptor (small little integer in Unix) doesn't > have a way to find this out. You're right, my bad. 
> Even for file objects, it is typically only known that there's an EOF
> condition after a lowest-level read operation returned 0 bytes. So in
> effect you must still do a read in order to determine EOF status.
>
> I just ran a small test program, and fread() appears to set the eof
> status when it returns a short count. Normally, Python's read() uses
> fread() so this might be useful. However after a readline(), you
> can't know the eof status (unless the last line of the file doesn't
> end in a newline).

I considered trying a zero-length read() in Python, but this strikes me
as inelegant even if it would work.

> Before adding an eof() method, can you explain what your program is
> trying to do? Is it reading from a pipe or socket? Then select() or
> poll() might be useful.

Sadly, it's exactly the wrong case. Hmmm...omitting irrelevant details,
it's a situation where a markup file can contain sections in two different
languages. The design requires the first interpreter to exit on seeing
either EOF or a marker that says "switching to second language". For
reasons too complicated to explain, it would be best if the parser for
the first language didn't simply call the second parser.

The logic I wanted to write amounts to:

    while 1:
        line = fp.readline()
        if not line or line == "history":
            break
        interpret_in_language_1(line)

    if not fp.feof():
        while 1:
            line = fp.readline()
            if not line:
                break
            interpret_in_language_2(line)

I just tested the zero-length-read method. That worked. I guess I'll
use it.
--
Eric S. Raymond

"Today, we need a nation of Minutemen, citizens who are not only prepared to
take arms, but citizens who regard the preservation of freedom as the basic
purpose of their daily life and who are willing to consciously work and
sacrifice for that freedom."
	-- John F. Kennedy

From martin at loewis.home.cs.tu-berlin.de Sun Jan 7 19:45:15 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sun, 7 Jan 2001 19:45:15 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? Message-ID: <200101071845.f07IjFi01249@mira.informatik.hu-berlin.de> Authors of extension packages often find the need to auto-import some of their modules. This is often needed for registration, e.g. a codec author (like Tamito KAJIYAMA, who wrote the JapaneseCodecs package) may need to register a search function with codecs.register. This is currently only possible by writing into sitecustomize.py, which must be done by the system administrator manually. To enhance the service of site.py, I've written the patch http://sourceforge.net/patch/?func=detailpatch&patch_id=103134&group_id=5470 which treats lines in PTH files which start with "import" as statements and executes them, instead of appending these lines to sys.path. The patch is relatively small, but since it is an extension: Do I need to write a PEP for it? Regards, Martin From tismer at tismer.com Sun Jan 7 19:05:21 2001 From: tismer at tismer.com (Christian Tismer) Date: Sun, 07 Jan 2001 20:05:21 +0200 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> <20010105152058.A6016@glacier.fnational.com> <14935.11870.360839.235102@beluga.mojam.com> Message-ID: <3A58AFE1.3AB619BD@tismer.com> Skip Montanaro wrote: > > Neil> I think you, Skip and Moshe are missing a big advantage of having > Neil> the __exports__ mechanism. It should allow some attribute access > Neil> inside of modules to become faster (like LOAD_FAST for locals). I > Neil> think that optimization could be implemented without too much > Neil> difficultly. > > True enough, that hadn't occurred to me. Knowing that now, I still don't > think consistency of the interface should suffer as a result of > under-the-covers performance gains. 
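
For readers following the __exports__ proposal in this thread, its getattr/setattr behaviour can be approximated in modern Python with a module subclass. This is a rough sketch of the described semantics only, not the SF patch: ExportsModule and _is_exported are made-up names, letting double-underscore names through is an assumption made so the module machinery keeps working, and the "from M import v" raising ImportError part is not modelled.

```python
import types


def _is_exported(mod, name):
    """Hypothetical check: a name is visible if the module has no
    __exports__ list, if it is a dunder name (assumption), or if it
    appears in __exports__."""
    namespace = object.__getattribute__(mod, "__dict__")
    exports = namespace.get("__exports__")
    return exports is None or name.startswith("__") or name in exports


class ExportsModule(types.ModuleType):
    """Module type that refuses outside access to unexported names."""

    def __getattribute__(self, name):
        if not _is_exported(self, name):
            raise AttributeError(name)
        return super().__getattribute__(name)

    def __setattr__(self, name, value):
        if not _is_exported(self, name):
            raise AttributeError(name)
        super().__setattr__(name, value)


demo = ExportsModule("demo")
# Code running "inside" the module writes to its namespace directly.
vars(demo).update(__exports__=["public"], public=1, _private=2)

visible = demo.public
hidden = getattr(demo, "_private", "hidden")
```

Note that every attribute access pays for the extra check, which is the slowdown concern raised in the thread.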
Ok, vice versa:
Given that we can support access control via __exports__ for
modules, classes and instances as well, *and* if we can think
up a scheme that allows a LOAD_FAST like speedup for all of these
cases at the same time, then I would say +1, otherwise -0,
half-hearted solution.

ciao - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

From guido at python.org Sun Jan 7 22:13:01 2001
From: guido at python.org (Guido van Rossum)
Date: Sun, 07 Jan 2001 16:13:01 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: Your message of "Sun, 07 Jan 2001 13:30:32 EST." <20010107133032.F4586@thyrsus.com>
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com>
Message-ID: <200101072113.QAA32467@cj20424-a.reston1.va.home.com>

> Tim Peters :
> > I agree an .eof() method would be better than a data member. Note that
> > whenever Python internals hit stream EOF today, they call clearerr(), so
> > simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to
> > make sure that feof() would never be useful <0.8 wink>.
>
[ESR]
> That's inconvenient, but only means the internal Python state flag
> that feof() would inspect would have to be checked after each read.

This was done because some platforms set feof() when there's still a
possibility to read more (e.g. after an interactive user typed ^D),
while others don't. It's inconvenient to get an endless stream of
EOFs from stdin when a user typed ^D to one particular prompt, so I
decided to clear the EOF status.

[ESR in a later message]
> I considered trying a zero-length read() in Python, but this strikes me
> as inelegant even if it would work.
I doubt that a zero-length read conveys any information. It should
return "" whether or not there is more to read! Plus, look at the
implementation of readline() (file_readline() in
Objects/fileobject.c): it shortcuts the n == 0 case and returns an
empty string without touching the file.

[me]
> > Before adding an eof() method, can you explain what your program is
> > trying to do? Is it reading from a pipe or socket? Then select() or
> > poll() might be useful.

[ESR again]
> Sadly, it's exactly the wrong case. Hmmm...omitting irrelevant details,
> it's a situation where a markup file can contain sections in two different
> languages. The design requires the first interpreter to exit on seeing
> either EOF or a marker that says "switching to second language". For
> reasons too complicated to explain, it would be best if the parser for
> the first language didn't simply call the second parser.
>
> The logic I wanted to write amounts to:
>
>     while 1:
>         line = fp.readline()
>         if not line or line == "history":
>             break
>         interpret_in_language_1(line)
>
>     if not fp.feof():
>         while 1:
>             line = fp.readline()
>             if not line:
>                 break
>             interpret_in_language_2(line)
>
> I just tested the zero-length-read method. That worked. I guess I'll
> use it.

Bizarre (given what I know about zero-length read). But in the above
code, you can replace "if not fp.feof()" with "if line". In other
words, you just have to carry the state over within your program.

So, I see no reason why the logic in your program couldn't take care
of this, which in general is a preferred way to solve a problem than
to change the language.

Also note that in Python it's no sin to attempt to read a line even
when the file is already at EOF -- you will simply get an empty line
again.
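
The "carry the state over" suggestion above can be sketched like this, in modern Python. It is only an illustration: split_sections is a made-up name, the two interpreters are replaced by plain lists, and the marker is assumed to be a line reading "history" followed by a newline, since readline() keeps the line terminator.

```python
import io


def split_sections(fp, marker="history\n"):
    """Read language-1 lines up to the marker or EOF, then language-2
    lines to EOF, using the last line read as the only EOF state."""
    first, second = [], []
    while True:
        line = fp.readline()
        if not line or line == marker:
            break
        first.append(line)      # stands in for interpret_in_language_1
    if line:                    # non-empty: stopped on the marker, not EOF
        while True:
            line = fp.readline()
            if not line:
                break
            second.append(line)  # stands in for interpret_in_language_2
    return first, second


both = split_sections(io.StringIO("a\nb\nhistory\nc\n"))
only_first = split_sections(io.StringIO("a\n"))
```

No eof() query is needed: the empty string from readline() already says the stream ran out, and a non-empty last line says the loop stopped on the marker.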
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik at effbot.org Sun Jan 7 22:29:46 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Sun, 7 Jan 2001 22:29:46 +0100
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com>
Message-ID: <035901c078f0$f6180f70$e46940d5@hagrid>

Guido van Rossum wrote:
> Bizarre (given what I know about zero-length read). But in the above
> code, you can replace "if not fp.feof()" with "if line". In other
> words, you just have to carry the state over within your program.

and if that's too hard, just hide the state in a class:

    class FileWrapper:

        def __init__(self, file):
            self.__file = file
            self.__line = None

        def __more(self):
            # try reading another line
            if not self.__line:
                self.__line = self.__file.readline()

        def eof(self):
            self.__more()
            return not self.__line

        def readline(self):
            self.__more()
            line = self.__line
            self.__line = None
            return line

    file = open("myfile.txt")
    file = FileWrapper(file)

    while not file.eof():
        print repr(file.readline())

From guido at python.org Sun Jan 7 22:32:26 2001
From: guido at python.org (Guido van Rossum)
Date: Sun, 07 Jan 2001 16:32:26 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: Your message of "Sat, 06 Jan 2001 22:16:31 EST."
References:
Message-ID: <200101072132.QAA32627@cj20424-a.reston1.va.home.com>

> I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing.
>
> test_builtin fails because raw_input() isn't stripping a trailing newline.
> I've got my own code in this area that *may* be to blame, but I don't see
> how it could be.
I note that fileobject.c's new function get_line_raw has > the comment > > /* Internal routine to get a line for raw_input(): > strip trailing '\n', raise EOFError if EOF reached immediately > */ > > but the code doesn't look for a trailing newline (let alone strip one). My bad. Try the latest CVS now. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Sun Jan 7 23:15:27 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 17:15:27 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101072113.QAA32467@cj20424-a.reston1.va.home.com>; from guido@python.org on Sun, Jan 07, 2001 at 04:13:01PM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> Message-ID: <20010107171527.A5093@thyrsus.com> Guido van Rossum : > [ESR in a later message] > > I considered trying a zero-length read() in Python, but this strikes me > > as inelegant even if it would work. > > I doubt that a zero-length read conveys any information. It should > return "" whether or not there is more to read! Duh. Of course it would. You know, I've always been half-consciously dissatisfied with Python's use of "" as an EOF marker, and now I know why. It's precisely because there's no way to distinguish these cases. I think a zero-length read ought to return "" and a read on EOF ought to return None. > Bizarre (given what I know about zero-length read). But in the above > code, you can replace "if not fp.feof()" with "if line". In other > words, you just have to carry the state over within your program. > > So, I see no reason why the logic in your program couldn't take care > of this, which in general is a preferred way to solve a problem than > to change the language. OK, two objections, one practical and one (more important) esthetic: Practical: I guess I oversimplified the code for expository purposes. 
What's actually going on is that I have two parser classes both based on
shlex -- they do character-at-a-time input and don't actually *have*
accessible line buffers.

Esthetic: Yes, I can have the first parser set a flag, or return some
EOF token. But this seems deeply wrong to me, because EOFness is not
a property of the parser but of the underlying stream object. It
seems to me that my program ought to be able to ask the stream object
whether it's at EOF rather than carrying its own flag for that state.

In Python as it is, there's no clean way to do this. I'd have to do a
nonzero-length read to test it (I failed to check the right alternate
case before when I tried zero-length). That's really broken. What if
neither the underlying stream nor the parser supports pushback?

Do you see now why I think this is a more general issue?

Now, another and more general way to handle this would be to make an
equivalent of the old FIONCLEX ioctl part of Python's standard set of
file object methods -- a way to ask "how many bytes are ready to be
read in this stream?"

Trivial to make it work for plain files, of course. Harder to make it
work usefully for pipes/fifos/sockets/terminals. Having it pass up
the results of the fstat.size field (corrected for the current seek
address if you're reading a plain file) would be a good start.
--
Eric S. Raymond

Live free or die; death is not the worst of evils.
	-- General George Stark.

From tismer at tismer.com Sun Jan 7 23:37:55 2001
From: tismer at tismer.com (Christian Tismer)
Date: Mon, 08 Jan 2001 00:37:55 +0200
Subject: [Python-Dev] ANN: Stackless Python 2.0
Message-ID: <3A58EFC3.5A722FF0@tismer.com>

Dear community,

I'm happy to announce that

    Stackless Python 2.0

is finally ready and available for download.

Stackless Python for Python 1.5.2+ also got some minor enhancements.
Both versions are available as Win32 installer files here: http://www.stackless.com/spc20-win32.exe http://www.stackless.com/spc15-win32.exe Speed: Stackless Python for Python 2.0 is again a bit faster than the original. This time even better: About 9-10 percent. I have to say that optimization was much harder this time. My speed patches are now done by a Python script, which will make maintenance and diff reading much easier in the future. There is now also a bit of example code available, like the uthread9.py Microthreads module from Will Ware, Just van Rossum, and Mike Fletcher. Source code and an update to the website will become available in the next days. enjoy - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mal at lemburg.com Mon Jan 8 01:26:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 01:26:00 +0100 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow References: Message-ID: <3A590918.E90031AA@lemburg.com> Tim Peters wrote: > > I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing. test_charmapcodec is my fault... I should run the tests in a clean room environment before checkin: my PYTHONPATH picked up some other file which it was not supposed to do. I'll fix it next week. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 8 05:13:26 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 7 Jan 2001 23:13:26 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Message-ID: The "Win32" readline() hack is now checked in, but there's really nothing Win32-specific about it anymore. It makes one mild assumption about what the C std doesn't clearly address but may have intended: that in case of a non-NULL return, fgets doesn't overwrite any of the buffer positions beyond the terminating null byte (the std is clear that it doesn't overwrite anything at all in case of a NULL-because-EOF return, but I can't say whether they're pointing that out as a consequence, or pointing that out as an exception). I'm curious about how it performs (relative to the getc_unlocked hack) on other platforms. If you'd like to try that, just recompile fileobject.c with USE_MS_GETLINE_HACK #define'd. It should *work* on any platform with fgets() meeting the assumption. The new test_bufio.py std test gives it a pretty good correctness workout, if you're worried about that. From esr at snark.thyrsus.com Mon Jan 8 05:16:53 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 23:16:53 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge Message-ID: <200101080416.f084GrM10912@snark.thyrsus.com> Setting things up so curses is autoconfigured into the default build if your system has it in the expected places turned out to be dead easy. Some clever person (the BDFL himself?) wrote the build process so that there is *already* a Setup.config.in that gets configure expansions done on it, with the generated Setup.config used when makesetup does its magic. As a bonus, I've also added autoconfiguration for readline. 
A small detail, but one which I suspect many people building their own Pythons frequently trip over. The technique generalizes easily. The archetype for a facility for autoconfiguring libfoo with a Python extension foo.c if it's present has just two steps: Add this to Modules/Setup.config.in: @USE_FOO_MODULE@foo foo.c -lfoo Add this to configure.in: # This is used to generate Setup.config AC_SUBST(USE_FOO_MODULE) AC_CHECK_LIB(foo, random_foo_function, [USE_FOO_MODULE=""], [USE_FOO_MODULE="#"]) (Apologies for the lack of description with the patch. I tripped over a SourceForge interface bug.) -- Eric S. Raymond The possession of arms by the people is the ultimate warrant that government governs only with the consent of the governed. -- Jeff Snyder From tim.one at home.com Mon Jan 8 06:34:20 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 00:34:20 -0500 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <3A590918.E90031AA@lemburg.com> Message-ID: An update: test_builtin works again (thanks, Guido!), and test_charmapcodec will "next week" (thanks, MAL!). Still unknown (to me): is the test_pow failure unique to Windows? One response from a Unix(tm) geek would settle that. From nas at arctrix.com Sun Jan 7 23:59:49 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 7 Jan 2001 14:59:49 -0800 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 12:34:20AM -0500 References: <3A590918.E90031AA@lemburg.com> Message-ID: <20010107145949.A14166@glacier.fnational.com> On Mon, Jan 08, 2001 at 12:34:20AM -0500, Tim Peters wrote: > Still unknown (to me): is the test_pow failure unique to Windows? One > response from a Unix(tm) geek would settle that. It works fine for me on Linux. I thought I tested on Windows before checking in the coerce patch. I'll try again.
Neil From nas at arctrix.com Mon Jan 8 00:29:14 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 7 Jan 2001 15:29:14 -0800 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <20010107145949.A14166@glacier.fnational.com>; from nas@arctrix.com on Sun, Jan 07, 2001 at 02:59:49PM -0800 References: <3A590918.E90031AA@lemburg.com> <20010107145949.A14166@glacier.fnational.com> Message-ID: <20010107152914.A14228@glacier.fnational.com> On Sun, Jan 07, 2001 at 02:59:49PM -0800, Neil Schemenauer wrote: > It works fine for me on Linux. I thought I tested on Windows > before checking in the coerce patch. I'll try again. Weird. rt.bat does not run the test_pow script. If I run "regrtest test_pow" then the test fails. It could be a problem with line endings (I copied the source from a Unix CVS checkout). Anyhow, I found the bug. I don't know how test_pow was passing under Linux. Time to reboot again. Neil From tim.one at home.com Mon Jan 8 07:39:20 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 01:39:20 -0500 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <20010107152914.A14228@glacier.fnational.com> Message-ID: [NeilS] > Weird. rt.bat does not run the test_pow script. Works for me, else I never would have noticed . Also works for me in single-test mode: C:\Code\python\dist\src\PCbuild>rt test_pow C:\Code\python\dist\src\PCbuild>python ../lib/test/regrtest.py test_pow test_pow The actual stdout doesn't match the expected stdout. This much did match (between asterisk lines): ********************************************************************** test_pow Testing integer mode... Testing 2-argument pow() function... Testing 3-argument pow() function... Testing long integer mode... Testing 2-argument pow() function... Testing 3-argument pow() function... Testing floating point mode... Testing 3-argument pow() function...
The number in both columns should match. 3 3 -5 -5 -1 -1 5 5 -3 -3 -7 -7 3L 3L -5L -5L -1L -1L 5L 5L -3L -3L -7L -7L 3.0 3.0 -5.0 -5.0 -1.0 -1.0 -7.0 -7.0 ********************************************************************** Then ... We expected (repr): '' But instead we got: 'Float mismatch:' test test_pow failed -- Writing: 'Float mismatch:', expected: '' 1 test failed: test_pow C:\Code\python\dist\src\PCbuild> That may point to the problem, too: the canned output file is truncated? > If I run "regrtest test_pow" then the test fails. It could be a > problem with line endings (I copied the source from a Unix CVS > checkout). Don't understand; e.g., "copied" what, from where to where? I'm not sure I gave you write access to my box, and hacking into Windows machines is uncool because it's not challenging . > Anyhow, I found the bug. I don't know how test_pow was passing > under Linux. Time to reboot again. Cool! BTW, Windows solves the "don't reboot enough" problem for you via automation, sometimes on an hourly basis. Thanks for sharing the brain cells, Neil! From thomas at xs4all.net Mon Jan 8 07:44:11 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 07:44:11 +0100 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <200101080416.f084GrM10912@snark.thyrsus.com>; from esr@snark.thyrsus.com on Sun, Jan 07, 2001 at 11:16:53PM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> Message-ID: <20010108074411.N2467@xs4all.nl> On Sun, Jan 07, 2001 at 11:16:53PM -0500, Eric S. Raymond wrote: > Setting things up so curses is autoconfigured into the default build > if your system has it in the expected places turned out to be dead > easy. Some clever person (the BDFL himself?) wrote the build process > so that there is *already* a Setup.config.in that gets configure > expansions done on it, with the generated Setup.config used when > makesetup does its magic. Skip, actually, IIRC.
It was added in the last stages of 2.0 development, to auto-detect bsddb. However, I still think it should be a separate 'configure', in the Modules directory. Especially now that Andrew is practically checking in the distutils setup ;) The main configure can make an educated guess whether Python and distutils are available, and call configure with some passed-through options if not. It does depend on what the distutils setup does, though, and I'll shamefully admit that I haven't looked at that ;P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From nas at arctrix.com Mon Jan 8 00:51:16 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 7 Jan 2001 15:51:16 -0800 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 01:39:20AM -0500 References: <20010107152914.A14228@glacier.fnational.com> Message-ID: <20010107155116.A14312@glacier.fnational.com> On Mon, Jan 08, 2001 at 01:39:20AM -0500, Tim Peters wrote: > [NeilS] > > If I run "regrtest test_pow" then the test fails. It could be a > > problem with line endings (I copied the source from a Unix CVS > > checkout). > > Don't understand; e.g., "copied" what, from where to where? I should have been clearer. I mean the problem with rt.bat not running test_pow. I copied the CVS source from my Linux ext2 filesystem to a VFAT filesystem. I was too lazy to fix the line endings.
Neil From nas at arctrix.com Mon Jan 8 00:52:38 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 7 Jan 2001 15:52:38 -0800 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <20010107152914.A14228@glacier.fnational.com>; from nas@arctrix.com on Sun, Jan 07, 2001 at 03:29:14PM -0800 References: <3A590918.E90031AA@lemburg.com> <20010107145949.A14166@glacier.fnational.com> <20010107152914.A14228@glacier.fnational.com> Message-ID: <20010107155238.A14291@glacier.fnational.com> On Sun, Jan 07, 2001 at 03:29:14PM -0800, Neil Schemenauer wrote: > I don't know how test_pow was passing under Linux. Under Linux with the buggy float_pow: >>> pow(10.0, 0, 10) nan >>> pow(10.0, 0, 10) == 1 1 >>> pow(10.0, 0, 10) == 0 1 Under Windows NAN obviously behaves differently. floating-point-is-fun-ly y'rs Neil From esr at thyrsus.com Mon Jan 8 07:49:45 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 8 Jan 2001 01:49:45 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108074411.N2467@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 08, 2001 at 07:44:11AM +0100 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> Message-ID: <20010108014945.A19516@thyrsus.com> Thomas Wouters : > On Sun, Jan 07, 2001 at 11:16:53PM -0500, Eric S. Raymond wrote: > > Setting things up so curses is autoconfigured into the default build > > if your system has it in the expected places turned out to be dead > > easy. Some clever person (the BDFL himself?) wrote the build process > > so that there is *already* a Setup.config.in that gets configure > > expansions done on it, with the generated Setup.config used when > > makesetup does its magic. > > Skip, actually, IIRC. It was added in the last stages of 2.0 development, to > auto-detect bsddb. However, I still think it should be a separate > 'configure', in the Modules directory. You may be right. 
Still, this patch solves the immediate problem in a reasonably clean way, and I urge that it should go in. We can do a more complete reorganization of the build process later. (I'll help with that; I'm pretty expert with autoconf and friends.) -- Eric S. Raymond "As to the species of exercise, I advise the gun. While this gives [only] moderate exercise to the body, it gives boldness, enterprise, and independence to the mind. Games played with the ball and others of that nature, are too violent for the body and stamp no character on the mind. Let your gun, therefore, be the constant companion to your walks." -- Thomas Jefferson, writing to his teenaged nephew. From tim.one at home.com Mon Jan 8 08:05:46 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 02:05:46 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: Well, I like __exports__ (but not some details of the patch, for which see my SF comments). Guido is aware of the optimization possibilities, but that's not what's driving it. I don't know why he likes it; I like it because the only normal use for a module is to do module.attr, or "from module import attr", and dir(module) very often exposes stuff today that the module author had no intention of exporting. For example, if I do import os dir(os) under CVS Python today, on my box I see that os exports "i". It's bound to _exit. That's baffling, and is purely an accident of how module os.py initialization works when you're running on Windows. Couple that with that I've hardly ever seen (or bothered to write) a module docstring spelling out everything a module *intends* to export, and an __exports__ line near the top (when present) would also automagically give a solid answer to that question. modules aren't classes or instances, and in normal practice modules accumulate all sorts of accidental attrs (due to careless (== normal) imports, and module init code). 
It doesn't make any *sense* that os exports "sys" either, or that random exports "cos", or that cgi exports "string", or ... this inelegance is ubiquitous. In a world with an __exports__ that gets used, though, I do wonder whether people will or won't export their test() functions. I really like that they do now. or-maybe-it's-just-that-i-like-modules-that-*have*-a- test-function-ly y'rs - tim From gstein at lyra.org Mon Jan 8 08:25:32 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 23:25:32 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 02:05:46AM -0500 References: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <20010107232532.V17220@lyra.org> On Mon, Jan 08, 2001 at 02:05:46AM -0500, Tim Peters wrote: >... > modules aren't classes or instances, and in normal practice modules > accumulate all sorts of accidental attrs (due to careless (== normal) > imports, and module init code). It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. Simple question: so what? "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Mon Jan 8 08:29:39 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 02:29:39 -0500 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <20010107155238.A14291@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Under Linux with the buggy float_pow: > > >>> pow(10.0, 0, 10) > nan > >>> pow(10.0, 0, 10) == 1 > 1 > >>> pow(10.0, 0, 10) == 0 > 1 > > Under Windows NAN obviously behaves differently. Comparisons with NaN are a platform-dependent accident, partly because some C compilers generate nonsense code, partly because Python isn't coded to cater to NaN's peculiarities either. 
The behavior under Windows is (accidentally) better in these cases today (NaN should never compare equal to anything -- not even to itself -- and, curiously, MSVC's codegen mistakes cancel out Python's mistakes in this case!). Thank you for fixing the bug. Only test_charmapcodec is failing for me now, and MAL knows the cause and cure. nothing-can-stop-the-alpha-now-ly y'rs - tim From thomas at xs4all.net Mon Jan 8 08:42:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 08:42:30 +0100 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 02:29:39AM -0500 References: <20010107155238.A14291@glacier.fnational.com> Message-ID: <20010108084230.O2467@xs4all.nl> On Mon, Jan 08, 2001 at 02:29:39AM -0500, Tim Peters wrote: > (NaN should never compare equal to anything -- not even to itself You know that's impossible, in Python, right ? (Due to the shortcut taken by '==', based on object identity.) Is that going to be 'fixed', too ? :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From ping at lfw.org Mon Jan 8 08:51:11 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 7 Jan 2001 23:51:11 -0800 (PST) Subject: [Python-Dev] inspect.py In-Reply-To: Message-ID: Hi again. Sorry to bother you if you're busy -- i haven't seen any responses about inspect.py for a few days and wanted to know what your reactions were. The module and test suite are still at: http://www.lfw.org/python/inspect.py http://www.lfw.org/python/test_inspect.py The only change since my announcement last Wednesday is that getframe() has been renamed to getframeinfo(). Thanks, -- ?!ng "Old code doesn't die -- it just smells that way." 
-- Bill Frantz From tim.one at home.com Mon Jan 8 09:17:57 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 03:17:57 -0500 Subject: NaN nonsense (was RE: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow) In-Reply-To: <20010108084230.O2467@xs4all.nl> Message-ID: >> (NaN should never compare equal to anything -- not even to itself [Thomas Wouters] > You know that's impossible, in Python, right ? (Due to the > shortcut taken by '==', based on object identity.) Surely you jest: I probably knew that while you were still nursing . OTOH, Python on WinTel comes remarkably close (by accident): C:\Code\python\dist\src\PCbuild>python Python 2.0 (#8, Jan 5 2001, 00:33:19) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> inf = 1e300**2 >>> inf 1.#INF >>> nan = inf - inf >>> nan -1.#IND >>> nan2 = nan * 1.0 >>> nan2 -1.#IND >>> nan == nan2 0 >>> > Is that going to be 'fixed', too ? :) Not if I can help it. I'd be in favor of adding an fcmp function that needs to be called explicitly when you want the full complexity of 754 comparisons. Count them all up, and there are 32 distinct 754 binary float comparison operators! The 754 std says 26 (from memory, may be 2 more or less) of those have to be supplied, but-- since 754 is not a language std --says nothing about how they're to be spelled. OTOH, C99 resolutely tries to map that into C, and 754 True Believers will use that as a club. On the third hand, as Tom MacDonald posted here earlier (he was X3J11 chair), he's not sure anyone will ever implement C99 in whole. The complexities of full 754 support are a large part of why he worries about that. 
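A minimal sketch of the NaN behavior discussed above, written against present-day CPython 3 rather than the Python 2.0 session shown (that is an assumption; the thread predates these semantics being settled, and the variable names are only illustrative). It builds a NaN the same way, as inf - inf, then contrasts `==` on the floats themselves with the identity shortcut that container comparisons still take.

```python
import math

inf = float("inf")        # 1e300**2 raises OverflowError today, so ask for inf directly
nan = inf - inf           # inf - inf is a NaN under IEEE 754
nan2 = nan * 1.0          # a second, distinct NaN object

# Comparing the floats directly follows IEEE 754: a NaN equals nothing.
equal_to_other = (nan == nan2)
equal_to_self = (nan == nan)
differs_from_self = (nan != nan)

# Containers check object identity before calling ==, so the same NaN
# object "matches itself" inside a list even though nan == nan is false.
list_equal = ([nan] == [nan])
found_in_list = (nan in [nan])

print(equal_to_other, equal_to_self, differs_from_self)   # False False True
print(list_equal, found_in_list)                          # True True
print(math.isnan(nan), math.isnan(nan2))                  # True True
```

When a NaN test is what is actually wanted, `math.isnan()` avoids depending on either comparison rule.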
too-much-too-late-ly y'rs - tim From tim.one at home.com Mon Jan 8 09:17:59 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 03:17:59 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010107232532.V17220@lyra.org> Message-ID: [Greg Stein] > Simple question: so what? > > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Couldn't care less about the module author. It's the module user who has to sort this stuff out. "Don't use 'import *'" is good advice but not followed either, and after I do from MyPackage import sys # intentionally exports its own sys from GregSnort import * # accidentally exports some other sys madness ensues. Like I said, it's inelegant, and at best. Simple question for you: what would __exports__ hurt? "Oh, no! Tim's module explicitly lists what it intended to export! Oh, woe is me!". Gimme a break. From gstein at lyra.org Mon Jan 8 09:26:03 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 00:26:03 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:17:59AM -0500 References: <20010107232532.V17220@lyra.org> Message-ID: <20010108002603.X17220@lyra.org> On Mon, Jan 08, 2001 at 03:17:59AM -0500, Tim Peters wrote: > [Greg Stein] > > Simple question: so what? > > > > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* > > Couldn't care less about the module author. It's the module user who has to > sort this stuff out. "Don't use 'import *'" is good advice but not followed > either, and after I do > > from MyPackage import sys # intentionally exports its own sys > from GregSnort import * # accidentally exports some other sys > > madness ensues. Like I said, it's inelegant, and at best. > > Simple question for you: what would __exports__ hurt? "Oh, no! Tim's > module explicitly lists what it intended to export! Oh, woe is me!". Gimme > a break. hehe... adding __exports__ to your module is fine. 
Adding more crud to Python, in opposition to the "we're all adults" motto, doesn't seem Right. Somebody wants to use "from foo import *" on a module not designed for it? Too bad for them. If you're suggesting __exports__ is to patch over problems caused by "from foo import *", then I think you're barking up the wrong tree :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at zadka.site.co.il Mon Jan 8 17:50:57 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 8 Jan 2001 18:50:57 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010107232532.V17220@lyra.org> References: <20010107232532.V17220@lyra.org>, <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> [Tim Peters] > modules aren't classes or instances, and in normal practice modules > accumulate all sorts of accidental attrs (due to careless (== normal) > imports, and module init code). It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. [Greg Stein] > Simple question: so what? > > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Let me "me too" here: Put another way, what Greg said is just a rephrase of "don't use from foo import * unless foo's docos say it's OK". Add to that the simple access control of a leading underscore, and I don't see any place which needs it. Something better to do would be to use import foo as _foo in some standard library modules, and minimize using from foo import bar in them. Since everyone knows that a leading underscore means "implementation detail - ignore at your convenience, use at your peril", this would keep the "we're all adults" philosophy of Python, with all the advantages *I* see in __exports__.
One more point against __exports__, which I hoped I would not have to make (but when I'm up against the timbot *and* Guido, I need to pull out the heavy artillery): it would *totally* stop any hope in the future of module level __getattr__ (or at least complicate the semantics). I think Alex M. is thinking of a PEP, but he's taking his time, since no PEPs can be considered until 2.1 is out. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From tim.one at home.com Mon Jan 8 09:49:58 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 03:49:58 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010108002603.X17220@lyra.org> Message-ID: [Greg Stein] > hehe... adding __exports__ to your module is fine. Adding more > crud to Python, in opposition to the "we're all adults" motto, > doesn't seem Right. My idea of what's Right is copied from my boss . > Somebody wants to use "from foo import *" on a module not designed > for it? Too bad for them. How is someone supposed to know whether a module "was designed" for import*? Even Tkinter (which just about everyone does "import *" on) also exports sys, and everything from the "types" module, by accident too. > If you're suggesting __exports__ is to patch over problems > caused by "from foo import *", then I think you're barking up the > wrong tree > :-) Indeed. But I'm suggesting that the problems that *can* arise from "import*" illustrate the fundamental silliness of exporting things by accident. It's come up much more often for me when I'm looking over someone's shoulder, teaching them how to use dir() in an interactive shell to answer their own damn questions <0.5 wink>. It's usually the case that dir(M) shows them something that isn't documented, and over time I am *not* pleased that "oh, I guess the 'string' in there is just crap" is how they learn to view it. 
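A minimal sketch of the accidental-export problem described above, assuming present-day Python, where a module-level `__all__` list (not the `__exports__` patch under discussion) limits what `from module import *` copies. The module name `demo_mod` and its contents are hypothetical, built in memory purely for illustration.

```python
import sys
import types

# Source for a throwaway module; "demo_mod" is a hypothetical name.
source = '''
import string             # accidental attribute: the module never meant to export it

__all__ = ["greet"]       # the names the author intends to export

def greet():
    return "hello"

def helper():
    return "internal"
'''

demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod      # make it importable by name

# Run "from demo_mod import *" in a fresh namespace and see what arrives.
namespace = {}
exec("from demo_mod import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))

print(imported)                         # ['greet']
print("string" in dir(demo_mod))        # True: dir() still shows the accident
```

Note that `__all__` only governs `import *`; `dir(demo_mod)` and `demo_mod.string` still expose everything, which is the part of the complaint it leaves unaddressed.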
I can live without __exports__; but I'd prefer not to, because I would always use it if it were there. if-i'd-both-use-it-and-heartily-recommend-it-it's-hard-to-oppose-it-ly y'rs - tim From m.favas at per.dem.csiro.au Mon Jan 8 12:48:40 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Mon, 08 Jan 2001 19:48:40 +0800 Subject: [Python-Dev] _cursesmodule.c clobbered since Christmas Message-ID: <3A59A918.E0D02E0D@per.dem.csiro.au> I last successfully downloaded from CVS, compiled, linked and tested on Dec. 22 last year. For the last week or so, the current CVS _cursesmodule.c gives a bunch of compiler warning messages of the form: cc: Warning: ./_cursesmodule.c, line 619: In this statement, "derwin(...)" of type "int", is being converted to "pointer to struct _win_st". (cvtdiftypes) win = derwin(self->win,nlines,ncols,begin_y,begin_x); --^ cc: Warning: ./_cursesmodule.c, line 1259: In this statement, "subpad(...)" of type "int", is being converted to "pointer to struct _win_st". (cvtdiftypes) win = subpad(self->win, nlines, ncols, begin_y, begin_x); ----^ cc: Warning: ./_cursesmodule.c, line 1488: In this statement, "termname(...)" of type "int", is being converted to "pointer to const char". (cvtdiftypes) NoArgReturnStringFunction(termname) ^ (more elided) and cc: Warning: ./_cursesmodule.c, line 305: The scalar variable "arg1" is fetched but not initialized. And there may be other such fetches of this variable that have not been reported in this compilation.
(uninit1) Window_NoArg2TupleReturnFunction(getparyx, int, "(ii)") ^ (more elided) and at link time, fails with: ld: Unresolved: getbegyx getmaxyx getparyx I've held off bothering anyone about this, but it begins to look as though no-one else has noticed... My platform? Tru64 Unix, V4.0F (aka OSF1). The recent pow() bug hit this platform, too. Happy to do any testing... -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From guido at python.org Mon Jan 8 15:27:50 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 09:27:50 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: Your message of "Mon, 08 Jan 2001 01:49:45 EST." <20010108014945.A19516@thyrsus.com> References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> Message-ID: <200101081427.JAA03146@cj20424-a.reston1.va.home.com> > You may be right. Still, this patch solves the immediate problem in a > reasonably clean way, and I urge that it should go in. We can do a > more complete reorganization of the build process later. (I'll help with > that; I'm pretty expert with autoconf and friends.) I expect Andrew's code to go in before 2.1 is released. So I don't see a reason why we should hurry and check in a stop-gap measure. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 8 15:33:09 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 09:33:09 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Mon, 08 Jan 2001 00:26:03 PST." <20010108002603.X17220@lyra.org> References: <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> Message-ID: <200101081433.JAA03185@cj20424-a.reston1.va.home.com> > hehe... adding __exports__ to your module is fine. 
Adding more crud to > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > Somebody wants to use "from foo import *" on a module not designed for it? > Too bad for them. If you're suggesting __exports__ is to patch over problems > caused by "from foo import *", then I think you're barking up the wrong tree > :-) You haven't been answering many newbie questions lately, have you? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 8 16:06:28 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 10:06:28 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Sun, 07 Jan 2001 17:15:27 EST." <20010107171527.A5093@thyrsus.com> References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> Message-ID: <200101081506.KAA03404@cj20424-a.reston1.va.home.com> > > So, I see no reason why the logic in your program couldn't take care > > of this, which in general is a preferred way to solve a problem than > > to change the language. > > OK, two objections, one practical and one (more important) esthetic: > > Practical: I guess I oversimplified the code for expository purposes. > What's actually going on is that I have two parser classes both based > on shlex -- they do character-at-a-time input and don't actually > *have* accessible line buffers. And what's wrong with always starting the second parser? If the stream was at EOF it will simply process zero lines. Or does your parser have a problem with empty input? > Esthetic: Yes, I can have the first parser set a flag, or return some > EOF token. But this seems deeply wrong to me, because EOFness is not > a property of the parser but of the underlying stream object. 
It > seems to me that my program ought to be able to ask the stream object > whether it's at EOF rather than carrying its own flag for that state. Eric, before we go further, can you give an exact definition of EOFness to me? > In Python as it is, there's no clean way to do this. I'd have to do a > nonzero-length read to test it (I failed to check the right alternate > case before when I tried zero-length). That's really broken. What if > neither the underlying stream nor the parser supports pushback? > > Do you see now why I think this is a more general issue? No. What's wrong with just setting the parser loose on the input and letting it deal with EOF? In your example, apparently a line containing the word "history" signals that the rest of the file must be parsed by the second parser. What if "history" is the last line of the file? The eof() test can't tell you *that*! > Now, another and more general way to handle this would be to make an > equivalent of the old FIONREAD ioctl part of Python's standard set of > file object methods -- a way to ask "how many bytes are ready to be > read in this stream?" There's no portable way to do that. > Trivial to make it work for plain files, of course. Harder to make it > work usefully for pipes/fifos/sockets/terminals. Having it pass up the > results of the fstat.size field (corrected for the current seek address > if you're reading a plain file) would be a good start. This seems totally the wrong level to solve your problem.
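A minimal sketch of the "carry the state in your program" approach from this exchange, assuming present-day Python and a text stream; the class name `PeekableReader` and its `at_eof()` method are hypothetical, not part of any file object API. The wrapper does the nonzero-length read once and keeps the character as one-slot pushback, so the underlying stream needs no pushback support of its own.

```python
import io

class PeekableReader:
    """Wrap a text stream and keep at most one character of lookahead."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = ""          # lookahead buffer: "" or a single character

    def at_eof(self):
        # A zero-length read says nothing, so read one character and hold it.
        if not self._pending:
            self._pending = self._stream.read(1)
        return self._pending == ""

    def read(self, size=-1):
        head, self._pending = self._pending, ""
        if size < 0:
            return head + self._stream.read()
        remaining = size - len(head)
        if remaining < 0:           # size == 0 while a character is held: put it back
            self._pending = head
            return ""
        return head + (self._stream.read(remaining) if remaining else "")

reader = PeekableReader(io.StringIO("ab"))
zero_read = reader.read(0)          # "" even though data remains
eof_before = reader.at_eof()        # False: "a" is now held as lookahead
first = reader.read(1)              # "a"
rest = reader.read()                # "b"
eof_after = reader.at_eof()         # True
print(repr(zero_read), eof_before, first, rest, eof_after)
```

On pipes, sockets, or terminals `at_eof()` can block until a character or end-of-file arrives, which is the portability limit raised in the message above.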
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Tue Jan 9 00:13:21 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 01:13:21 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101081433.JAA03185@cj20424-a.reston1.va.home.com> References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> Message-ID: <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 09:33:09 -0500, Guido van Rossum wrote: > > hehe... adding __exports__ to your module is fine. Adding more crud to > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > Too bad for them. If you're suggesting __exports__ is to patch over problems > > caused by "from foo import *", then I think you're barking up the wrong tree > > :-) > > You haven't been answering many newbie questions lately, have you? :-) Well, I have. And frankly, I think having "from foo import *" issue a warning at 2.1 a *much* better solution. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido at python.org Mon Jan 8 16:15:20 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 10:15:20 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Tue, 09 Jan 2001 01:13:21 +0200." <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> [Greg] > > > hehe... adding __exports__ to your module is fine. Adding more crud to > > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. 
> > > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > > Too bad for them. If you're suggesting __exports__ is to patch over problems > > > caused by "from foo import *", then I think you're barking up the wrong tree > > > :-) [Guido] > > You haven't been answering many newbie questions lately, have you? :-) [Moshe] > Well, I have. > And frankly, I think having "from foo import *" issue a warning at 2.1 > a *much* better solution. (1) For what problem? (2) Under exactly what circumstances do you want from foo import * issue a warning? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jan 8 16:26:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 16:26:21 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? References: <200101071845.f07IjFi01249@mira.informatik.hu-berlin.de> Message-ID: <3A59DC1D.29DE500B@lemburg.com> "Martin v. Loewis" wrote: > > Authors of extension packages often find the need to auto-import some > of their modules. This is often needed for registration, e.g. a codec > author (like Tamito KAJIYAMA, who wrote the JapaneseCodecs package) > may need to register a search function with codecs.register. This is > currently only possible by writing into sitecustomize.py, which must > be done by the system administrator manually. > > To enhance the service of site.py, I've written the patch > > http://sourceforge.net/patch/?func=detailpatch&patch_id=103134&group_id=5470 > > which treats lines in PTH files which start with "import" as > statements and executes them, instead of appending these lines to > sys.path. > > The patch is relatively small, but since it is an extension: Do I need > to write a PEP for it? Just curious: wouldn't this introduce a /tmp-style problem to Python ? The scenario is quite simple: a Python script runs under root. The script could pick up a lingering .pth file (e.g. 
from /tmp or one of its subdirs -- distutils does this !) and then executes arbitrary code as *root*. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at interet.com Mon Jan 8 16:43:05 2001 From: jim at interet.com (James C. Ahlstrom) Date: Mon, 08 Jan 2001 10:43:05 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? Message-ID: <3A59E009.96922CA5@interet.com> There are a number of problems which frequently recur on c.l.p that can serve as a source of Python improvement ideas. On December 30, 2000 gerson.kurz at t-online.de (Gerson Kurz) writes: If I embed Python in a Win32 console application (using Demo\embed.c), everything works fine. If I take the very same piece of code and put it in a Win32 Windows application (not MFC, just a plain WinMain()) I see no output (and more importantly so, no errors), because the application does not have a stdout/stderr set up. This is well known. Windows developers must replace sys.stdout and sys.stderr with alternative mechanisms. Unfortunately this solution does not completely work because errors can occur before sys.stdout is replaced. I propose patching pythonw.exe (WinMain.c) and adding a new module to fix this so it Just Works. The patch is completely Windows specific. I am not sure if this constitutes a PEP, but would like everyone's feedback anyway. Design Requirements 1) "pythonw.exe myfile.py" will give the usual error message if myfile.py does not exist. 2) "pythonw.exe myfile.py" will give the usual traceback for a syntax error in myfile.py. 3) python.exe will provide a useful C-language stdout/stderr so the user does not have to replace sys.stdout/err herself. 4) None of the above will interfere with the user's replacement of sys.stdout/err for her own purposes.
Description of Patch A new module winstdoutmodule.c (138 lines) is included in Windows builds. It contains a C entry point PyWin_StdoutReplace() which creates a valid C stdout/err, and code to display output in a popup dialog box. There is a Python entry point winstdout.print() to display output, but it is only used for special purposes, and the typical user will never import winstdout. The file WinMain.c calls PyWin_StdoutReplace() before it calls Py_Main(), and PyWin_StdoutPrint() afterwards. This is meant to display startup error messages. Normally, any available output is displayed when the system is idle. Technical Details Some experimentation (as opposed to documentation) shows that Win32 programs have a valid FILE * stdout, but fileno(stdout) gives INVALID_HANDLE_VALUE; the FILE * has an invalid OS file object. It is tempting to hack the FILE structure directly. But it is more prudent to use the only documented way to replace stdout, namely the standard call "freopen()" (also available on Unix). The design uses this call to open a temporary file to append stdout and stderr output. To display output, the file is checked when the system is idle, and MessageBox() is called with the file contents if any. Status After a few false starts, I now have working code. Is this a good idea? If so, is the implementation optimal (comments from MarkH especially welcome)? JimA From mal at lemburg.com Mon Jan 8 16:52:32 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 16:52:32 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <3A59E240.7F77790E@lemburg.com> Moshe Zadka wrote: > > On Mon, 08 Jan 2001 09:33:09 -0500, Guido van Rossum wrote: > > > hehe... adding __exports__ to your module is fine. 
Adding more crud to > > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > > Too bad for them. If you're suggesting __exports__ is to patch over problems > > > caused by "from foo import *", then I think you're barking up the wrong tree > > > :-) > > > > You haven't been answering many newbie questions lately, have you? :-) > > Well, I have. > And frankly, I think having "from foo import *" issue a warning at 2.1 > a *much* better solution. Why raise a warning ? "from xyz import *" is still very useful in interactive sessions and also has some merits when it comes to importing all subpackages of a package (well, at least those listed in __all__). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From barry at digicool.com Mon Jan 8 16:54:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 8 Jan 2001 10:54:10 -0500 Subject: [Python-Dev] Add __exports__ to modules References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> Message-ID: <14937.58018.792925.31985@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> it would *totally* stop any hope in the future of module level MZ> __getattr__ (or at least complicate the semantics). I think MZ> Alex M. is thinking of a PEP, but he's taking his time, since MZ> no PEPs can be considered until 2.1 is out. Given the current discussion, I'm now -1 on __exports__ unless a PEP is written. I think enough issues and interactions have been brought up that a PEP is warranted first.
-Barry From moshez at zadka.site.co.il Tue Jan 9 01:03:00 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 02:03:00 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com>, <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 10:15:20 -0500, Guido van Rossum wrote: > (1) For what problem? Users seeing things they didn't expect in their modules. > (2) Under exactly what circumstances do you want from foo import * > issue a warning? All. If you want to be less extreme, don't warn if the module defines a __from_star_ok__. But in any case, I'm done with this thread. We probably won't manage to convince each other. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido at python.org Mon Jan 8 17:04:58 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 11:04:58 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Mon, 08 Jan 2001 10:54:10 EST." <14937.58018.792925.31985@anthem.wooz.org> References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <14937.58018.792925.31985@anthem.wooz.org> Message-ID: <200101081604.LAA04464@cj20424-a.reston1.va.home.com> > Given the current discussion, I'm now -1 on __exports__ unless a PEP > is written. I think enough issues and interactions have been brought > up that a PEP is warranted first. I have to agree. I am no longer championing this patch.
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Mon Jan 8 17:27:17 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 8 Jan 2001 10:27:17 -0600 (CST) Subject: [Python-Dev] inspect.py In-Reply-To: References: Message-ID: <14937.60005.951163.80255@beluga.mojam.com> Ping> Sorry to bother you if you're busy -- i haven't seen any responses Ping> about inspect.py for a few days and wanted to know what your Ping> reactions were. Fiddling code bits is not the sort of stuff I do very often, but every time I do I wind up having to reacquaint myself with all sorts of object details that slip out of my brain shortly after the latest need is gone. Having a module that hides the details seems like a good idea to me. +1. I vote it go into 2.1 assuming a bit for the library reference can be written in time. Skip From akuchlin at mems-exchange.org Mon Jan 8 17:31:09 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:31:09 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <200101081427.JAA03146@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 09:27:50AM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> Message-ID: <20010108113109.C7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 09:27:50AM -0500, Guido van Rossum wrote: >I expect Andrew's code to go in before 2.1 is released. So I don't >see a reason why we should hurry and check in a stop-gap measure. But it might not; the final version might be unacceptable or run into some intractable problem. Assuming the patch is correct (I haven't looked at it), why not check it in? The work has already been done to write it, after all. 
--amk From akuchlin at mems-exchange.org Mon Jan 8 17:41:10 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:41:10 -0500 Subject: [Python-Dev] _cursesmodule.c clobbered since Christmas In-Reply-To: <3A59A918.E0D02E0D@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Mon, Jan 08, 2001 at 07:48:40PM +0800 References: <3A59A918.E0D02E0D@per.dem.csiro.au> Message-ID: <20010108114110.D7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 07:48:40PM +0800, Mark Favas wrote: >I last successfully downloaded from CVS, compiled, linked and tested on >Dec. 22 last year. For the last week or so, the current CVS >_cursesmodule.c gives a bunch of compiler warning messages of the form: Hmm... on Dec. 22 there was a sizable change to export a C API from the module; since then there's only been one minor change. Perhaps the last version you compiled successfully was from before I checked in those changes. In any case, I'll look into it as soon as my Compaq test drive account is usable and I have access to a Tru64 4.0 machine again. Thanks for the report! Once the PEP 229 changes go in, many more modules will be tried on many more platforms. It might be worth considering setting up a Tinderbox for Python, or at least doing a systematic test on several platforms before releases. --amk From paulp at ActiveState.com Mon Jan 8 17:46:47 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:46:47 -0800 Subject: [Python-Dev] Add __exports__ to modules References: Message-ID: <3A59EEF7.BB4118BD@ActiveState.com> Tim Peters wrote: > > ... It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. I agree strongly. I think that Python people are careless about what their module dictionaries look like. My two main annoyances are modules that export other modules randomly and modules that export huge wacks of constants. > Indeed. 
But I'm suggesting that the problems that *can* arise from > "import*" illustrate the fundamental silliness of exporting things by > accident. It's come up much more often for me when I'm looking over > someone's shoulder, teaching them how to use dir() in an interactive shell > to answer their own damn questions <0.5 wink>. It's usually the case that > dir(M) shows them something that isn't documented, and over time I am *not* > pleased that "oh, I guess the 'string' in there is just crap" is how they > learn to view it. Screw dir()! Let's talk about important stuff: Komodo. And Idle. And WingIDE. And PythonWorks and PythonWin. :) How are class browsers and "intellisense prompters" supposed to know that it "makes sense" to prompt the user with os.path but not CGIHTTPServer.os.path? Overall, I think Tim is right. We are all adults here and part of being adults is keeping your privates private and your nose clean. Paul Prescod From paulp at ActiveState.com Mon Jan 8 17:47:39 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:47:39 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <20010107232532.V17220@lyra.org>, <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> Message-ID: <3A59EF2B.792801E5@ActiveState.com> Moshe Zadka wrote: > > ... > Let me "me too" here: > Put another way, what Greg said is just a rephrase of "don't use from > foo import * unless foo's docos say it's OK". That's not the issue. It's not about keeping people out of your module. In fact I would propose that mod.__dict__ should be as loose as ever. It's a user interface issue. If we encourage people to learn about modules in interactive environments like the prompt using dir(), class browsers and IDEs then we need to create modules that are friendly for those users. I think that the current situation is pretty bad that way. Why does CGIHTTPServer export BaseHTTPServer?
And why is CGIHTTPServer.CGIHTTPServer a class but CGIHTTPServer.BaseHTTPServer is a module? We go to great lengths to make the syntax newbie friendly. I think that we should make similar efforts in a cleanly reflective class library. > Add to that the simple > access control of a leading underscore, and I don't see any place > which needs it. > > Something better to do would be to use > import foo as _foo It's pretty clear that nobody does this now and nobody is going to start doing it in the near future. It's too invasive and it makes the code too ugly. Why obfuscate thousands of lines of code when a simple feature can mitigate that? >... > One more point against __exports__, which I hoped I would not have to > make (but when I'm up against the timbot *and* Guido, I need to pull > out the heavy artillery): it would *totally* stop any hope in the > future of module level __getattr__ (or at least complicate the semantics). > I think Alex M. is thinking of a PEP, but he's taking his time, since > no PEPs can be considered until 2.1 is out. __exports__ would merely be considered an implementation detail of the "default __getattr__". Custom __getattr__'s could decide whether to respect it or not. It doesn't complicate anything much. Paul Prescod From nas at arctrix.com Mon Jan 8 10:54:55 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 8 Jan 2001 01:54:55 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A59E009.96922CA5@interet.com>; from jim@interet.com on Mon, Jan 08, 2001 at 10:43:05AM -0500 References: <3A59E009.96922CA5@interet.com> Message-ID: <20010108015455.A15138@glacier.fnational.com> On Mon, Jan 08, 2001 at 10:43:05AM -0500, James C. Ahlstrom wrote: > Is this a good idea? If so, is the implementation optimal > (comments from MarkH especially welcome)? The general idea sounds good to me. Having tracebacks go nowhere when running pythonw is un-Python-like. I don't know enough about MFC, etc. 
to comment on the specifics of your patch. Neil From akuchlin at mems-exchange.org Mon Jan 8 17:49:13 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:49:13 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59EEF7.BB4118BD@ActiveState.com>; from paulp@ActiveState.com on Mon, Jan 08, 2001 at 08:46:47AM -0800 References: <3A59EEF7.BB4118BD@ActiveState.com> Message-ID: <20010108114913.E7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 08:46:47AM -0800, Paul Prescod wrote: >How are class browsers and "intellisense prompters" supposed to know >that it "makes sense" to prompt the user with os.path but not >CGIHTTPServer.os.path. Could we then simply adopt __exports__ as a convention for such browsers, but with no changes to core Python to support it? Browsers would then follow the algorithm "Use __exports__ if present, dir() if not." --amk From paulp at ActiveState.com Mon Jan 8 17:51:26 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:51:26 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <3A59F00E.53A0A32A@ActiveState.com> Tim Peters wrote: > > .... > > Perl appears to ignore the issue of thread safety here (on Windows and > everywhere else). If you can create a sample program that demonstrates the unsafety I'll anonymously submit it as a bug on our internal system and ensure that the next version of Perl is as slow as Python. :) Seriously: If someone comes at me with Perl-IO-is-way-faster-than-Python-IO, I'd like to know what concretely they've given up in order to achieve that performance. And even just for my own interest I'd like to understand the cost/benefit of stream thread safety. For instance would it make sense to just write a thread-safe wrapper for streams used from multiple threads? 
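One possible reading of the "thread-safe wrapper" idea raised above, as a minimal sketch in present-day Python. LockedStream is a hypothetical name, not an existing class, and it only serialises individual readline() and write() calls with a lock; it is an illustration of the trade-off, not a claim about how Python or Perl implement their I/O.

```python
import threading

class LockedStream:
    """Hypothetical wrapper that serialises access to a file-like object.

    Streams used from a single thread can skip the wrapper and avoid
    the locking cost; only shared streams pay for it.
    """

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def readline(self):
        # One complete line per lock acquisition.
        with self._lock:
            return self._stream.readline()

    def write(self, data):
        # Each write() call is atomic with respect to other callers.
        with self._lock:
            return self._stream.write(data)
```

The design choice here is that thread safety becomes opt-in per stream rather than a cost built into every read.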
Paul Prescod From paulp at ActiveState.com Mon Jan 8 18:01:49 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 09:01:49 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <3A59EEF7.BB4118BD@ActiveState.com> <20010108114913.E7563@kronos.cnri.reston.va.us> Message-ID: <3A59F27D.C27B8CD0@ActiveState.com> Andrew Kuchling wrote: > > ... > > Could we then simply adopt __exports__ as a convention for such > browsers, but with no changes to core Python to support it? Browsers > would then follow the algorithm "Use __exports__ if present, dir() if > not." dir() is one of the "interactive tools" I'd like to work better in the presence of __exports__. On the other hand, dir() works pretty poorly for object instances today so maybe we need something new anyhow. Perhaps attrs()? If there were an "attrs()" and it basically returned __exports__ if it existed and dir() if it didn't, then I would buy it. Graphical apps would just build on attrs(). Paul From MarkH at ActiveState.com Mon Jan 8 18:04:31 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 8 Jan 2001 09:04:31 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A59E009.96922CA5@interet.com> Message-ID: > Is this a good idea? If so, is the implementation optimal I'm really on the fence here. Note however that your solution does not solve the original problem. Eg, your example is: > On December 30, 2000 gerson.kurz at t-online.de (Gerson Kurz) writes: > > If I embed Python in a Win32 console application (using > Demo\embed.c), everything works fine. If I take the very same piece But your solution involves: > The file WinMain.c calls PyWin_StdoutReplace() before it > calls Py_Main(), and PyWin_StdoutPrint() afterwards. This Note that the original problem was _embedding_ Python - thus, you need to patch _their_ WinMain to make it work for them - something you can't do.
Even if PyWin_StdoutReplace() was a public symbol so they _could_ call it, I am not convinced they would - it is almost certain they will still need to redirect output to somewhere useful, so why bother redirecting it temporarily just to redirect it for real immediately after? Finally, I am slightly concerned about the possibility of "hanging" certain programs. For example, I believe that DCOM will often invoke a COM server in a different "desktop" than the user (this is also true for Services, but Python services don't use pythonw.exe). Thus, a Python program may end up hanging with a dialog box, but in the context where no user is able to see it. However, this could be addressed by adding a command-line option to prevent this new behaviour kicking in. I would prefer to see a decent API for extracting error and traceback information from Python. On the other hand, I _do_ see the problem for "newbies" trying to use pythonw.exe. So - I guess I am saying that I don't see this as optimal, and it doesn't solve the original problem you pointed at - but in the interests of making pythonw.exe seem "less broken" for newbies, I could live with this as long as I could prevent it when necessary. Another option would be to use the Win32 Console APIs, and simply attempt to create a console for the error message. Eg, maybe PyErr_Print() could be changed to check for the existence of a console, and if not found, create it. However, the problem with this approach is that the error message will often be printed just as the process is terminating - meaning you will see a new console with the error message for about 0.025 of a second before it vanishes due to process termination. Any sort of "press any key to terminate" option then leaves us in the same position - if no user can see the message, the process appears hung. Mark.
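As an illustration of the "extract the error and traceback as data" idea mentioned above, here is a minimal Python-level sketch in present-day Python. It is not the C-level API the message asks for and not part of the proposed patch; run_and_capture is a hypothetical name, and the only library call assumed is the standard traceback.format_exc().

```python
import traceback

def run_and_capture(func):
    """Call func() and return (result, None) on success, or
    (None, traceback_text) if it raised.

    Hypothetical helper: the traceback comes back as a string, so a
    GUI or embedding application with no console can decide where to
    show it (log file, dialog box, and so on).
    """
    try:
        return func(), None
    except Exception:
        return None, traceback.format_exc()
```

Because nothing is written to stdout or stderr, this approach sidesteps the missing-console problem, though it does not help with errors raised before any Python code runs.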
From andreas at andreas-jung.com Mon Jan 8 18:06:16 2001 From: andreas at andreas-jung.com (Andreas Jung) Date: Mon, 8 Jan 2001 18:06:16 +0100 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 In-Reply-To: <3A58EFC3.5A722FF0@tismer.com>; from tismer@tismer.com on Mon, Jan 08, 2001 at 12:37:55AM +0200 References: <3A58EFC3.5A722FF0@tismer.com> Message-ID: <20010108180616.A18993@yetix.sz-sb.de> On Mon, Jan 08, 2001 at 12:37:55AM +0200, Christian Tismer wrote: > Dear community, > > I'm happy to announce that > > Stackless Python 2.0 > > is finally ready and available for download. > > Stackless Python for Python 1.5.2+ also got some minor > enhancements. Both versions are available as Win32 > installer files here: Are there patches available against the standard Python 2.0 source code tree ? Andreas From tismer at tismer.com Mon Jan 8 17:15:55 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 08 Jan 2001 18:15:55 +0200 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 References: <3A58EFC3.5A722FF0@tismer.com> <20010108180616.A18993@yetix.sz-sb.de> Message-ID: <3A59E7BB.6908B7E2@tismer.com> Andreas Jung wrote: > > On Mon, Jan 08, 2001 at 12:37:55AM +0200, Christian Tismer wrote: > > Dear community, > > > > I'm happy to announce that > > > > Stackless Python 2.0 > > > > is finally ready and available for download. > > > > Stackless Python for Python 1.5.2+ also got some minor > > enhancements. Both versions are available as Win32 > > installer files here: > > Are there patches available against the standard Python 2.0 > source code tree ? I had no time yet to put the source trees on the web. Should happen in one or two days. Then I will probably not provide patches, hoping that some other Unix people will catch up and provide that part. This worked the same for the 1.5.2 version. The 2.0 port consists of 10 or so files, which can be used as direct replacements for the same files in the 2.0 distro. I think on Unix this is the right way to go.
For me it is simpler to have my own little tree, since I'm working with Windows, and I just have to modify my VC++ project file. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From moshez at zadka.site.co.il Tue Jan 9 02:30:09 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 03:30:09 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59F27D.C27B8CD0@ActiveState.com> References: <3A59F27D.C27B8CD0@ActiveState.com>, <3A59EEF7.BB4118BD@ActiveState.com> <20010108114913.E7563@kronos.cnri.reston.va.us> Message-ID: <20010109013009.37D6DA82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 09:01:49 -0800, Paul Prescod wrote: > dir() is one of the "interactive tools" I'd like to work better in the > presence of __exports__. On the other hand, dir() works pretty poorly > for object instances today so maybe we need something new anyhow. > Perhaps attrs()? > > If there were an "attrs()" and it basically returned __exports__ if it > existed and dir() if it didn't, then I would buy it. Graphical apps > would just build on attrs(). Even better, __exports__ could be what was imported in from foo import *. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses!
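The attrs() idea quoted above ("return __exports__ if it exists, dir() if it doesn't") can be sketched in a few lines of present-day Python. Both attrs() and the __exports__ attribute are proposals from this thread, not existing Python features, so everything below is hypothetical; the demo module is built with the standard types.ModuleType.

```python
import types

def attrs(obj):
    """Return the names obj chooses to advertise.

    Hypothetical helper from the thread: use obj.__exports__ when the
    object defines it, otherwise fall back to the builtin dir().
    """
    exports = getattr(obj, "__exports__", None)
    if exports is not None:
        return sorted(exports)
    return dir(obj)

# A throwaway module that "leaks" an imported name but advertises one.
demo = types.ModuleType("demo")
demo.os = object()           # stands in for an incidentally imported module
demo.spam = 1
demo.__exports__ = ["spam"]
advertised = attrs(demo)     # only the names listed in __exports__
```

A class browser built on such a function would show spam for this module and hide the incidental os, while modules without __exports__ keep today's dir() behaviour.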
From andreas at andreas-jung.com Mon Jan 8 18:25:36 2001 From: andreas at andreas-jung.com (Andreas Jung) Date: Mon, 8 Jan 2001 18:25:36 +0100 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 In-Reply-To: <3A59E7BB.6908B7E2@tismer.com>; from tismer@tismer.com on Mon, Jan 08, 2001 at 06:15:55PM +0200 References: <3A58EFC3.5A722FF0@tismer.com> <20010108180616.A18993@yetix.sz-sb.de> <3A59E7BB.6908B7E2@tismer.com> Message-ID: <20010108182536.A20361@yetix.sz-sb.de> On Mon, Jan 08, 2001 at 06:15:55PM +0200, Christian Tismer wrote: > > The 2.0 port consists of 10 or so files, which can be used > as direct replacements for the same files in the 2.0 distro. > I think on Unix this is the right way to go. > For me it is simpler to have my own little tree, since I'm > working with Windows, and I just have to modify my VC++ > project file. I would prefer a tar.gz archive that contains just the modified files. With this approach it is easily possible to extract the archive inside the Python source tree. Andreas From loewis at informatik.hu-berlin.de Mon Jan 8 18:51:28 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 8 Jan 2001 18:51:28 +0100 (MET) Subject: [Python-Dev] Extending startup code: PEP needed? Message-ID: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> > Just curious: wouldn't this introduce a /tmp-style problem to > Python ? I tried, but I could not produce such a problem. > The scenario is quite simple: a Python script runs under root. > The script could pick up a lingering .pth file (e.g. from /tmp > or one of its subdirs -- distutils does this !) and then executes > arbitrary code as *root*. No, Python looks only in a few places for pth files: {,}{,/lib/python/site-packages,/lib/site-python} so it won't pick up pth files in /tmp. Regards, Martin From esr at thyrsus.com Mon Jan 8 19:01:37 2001 From: esr at thyrsus.com (Eric S.
Raymond) Date: Mon, 8 Jan 2001 13:01:37 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101081506.KAA03404@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 10:06:28AM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> Message-ID: <20010108130137.E22834@thyrsus.com> Guido van Rossum : > Eric, before we go further, can you give an exact definition of > EOFness to me? A file is at EOF when attempts to read more data from it will fail returning no data. > What's wrong with just setting the parser loose on the input and > letting it deal with EOF? Nothing wrong in theory, but it's a problem in practice. I don't want to import the second parser unless it's actually needed, because it's much larger than the first one. > In your example, apparently a line > containing the word "history" signals that the rest of the file must > be parsed by the second parser. What if "history" is the last line of > the file? The eof() test can't tell you *that*! Right. That case never happens. I mean it *really* never happens :-). What we're talking about is a game system. The first parser recognizes a spec language for describing games of a particular class (variants of Diplomacy, if that's meaningful to you). The system keeps logfiles which consist of a section in the game description language, optionally followed by the token "history" and an order log. The parser for the order log language is a *lot* larger than the one for the description language. This is why I said I don't want the first parser to just call the second. I want to test for EOF to know whether I have to import the second parser at all! Here's the beginning of my problem: the first parser can't export a line buffer, because it doesn't *have* a line buffer.
It's a subclass of shlex and does single-character reads. There are two ways I can cope with this. One is to do a (nonzero) length read after the first parser exits; the other is to have the first parser set a state flag controlling whether the second parser loads. This is where it bites that I can't test for EOF with a read(0). The second shlex parser only has token-level pushback! If do a nonzero-length read and I get data, I'm screwed. On the other hand (as I said before) setting a lexer state flag seems wrong, because EOFness is a property of the underlying stream rather than the parser. I'd be duplicating state that exists in the stdio stream structure anyway; it ought to be accessible. > > Now, another and more general way to handle this would be to make an > > equivalent of the old FIONCLEX ioctl part of Python's standard set of > > file object methods -- a way to ask "how many bytes are ready to be > > read in this stream? > > There's no portable way to do that. Actually, fstat(2) is portable enough to support a very useful approximation of FIONCLEX. I know, because I tried it. Last night I coded up a "waiting" method for file objects that calls fstat(2) on the associated file descriptor. For a plain file, it then subtracts the result of ftell() from the fstat size field and returns that -- for other files, it simply returns the size field. I then tested this on plain files, FIFOs, and sockets under Linux. It turns out fstat(2) gives useful information in all three cases (a count of characters waiting in the buffer in the latter two). I expected this; it should be true under all current Unixes. fstat(2) does not give useful size-field results for Linux block devices. I didn't test the character (terminal) devices. (I documented my results in Python's Doc/lib/stat.tex, in a patch I have already submitted to SourceForge.) I would be quite surprised if the plain-file case didn't work on Mac and Windows. 
I would be a little surprised if the socket case failed, because all three probably inherited fstat(2) from the ancestral BSD TCP/IP stack. Just having the plain-file case work would, IMHO, be justification enough for this method. If it turns out to be portable across Mac and Windows sockets as well, *huge* win. Could this be tested by someone with access to Windows and Mac systems? -- Eric S. Raymond An armed society is a polite society. Manners are good when one may have to back up his acts with his life. -- Robert A. Heinlein, "Beyond This Horizon", 1942 From mal at lemburg.com Mon Jan 8 19:10:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 19:10:50 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> Message-ID: <3A5A02AA.675A35D1@lemburg.com> Martin von Loewis wrote: > > > Just curious: wouldn't this introduce a /tmp-style problem to > > Python ? > > I tried, but I could not produce such a problem. > > > The scenario is quite simple: a Python script runs under root. > > The script could pick up a lingering .pth file (e.g. from /tmp > > or one of its subdirs -- distutils does this !) and then executes > > arbitrary code as *root*. > > No, Python looks only in a few places for pth file: > {,}{,/lib/python/site-packages,/lib/site-python} > > so it won't pick up pth files in /tmp. Hmm, but what if the Python script picks up a site.py which is different from the standard one distributed with Python ? The code adding (and with the patch: executing) the .pth files is defined in site.py and it is rather easy to override this file by adding a modified site.py file to the current working dir... 
a potential security hole in its own right, I guess :(

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From guido at python.org Mon Jan 8 19:30:34 2001
From: guido at python.org (Guido van Rossum)
Date: Mon, 08 Jan 2001 13:30:34 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: Your message of "Mon, 08 Jan 2001 13:01:37 EST." <20010108130137.E22834@thyrsus.com>
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> <20010108130137.E22834@thyrsus.com>
Message-ID: <200101081830.NAA05301@cj20424-a.reston1.va.home.com>

Eric, take a hint. You're not going to get your eof() method no matter what arguments you bring up. But I'll explain it to you again anyway... :-)

> Guido van Rossum :
> > Eric, before we go further, can you give an exact definition of
> > EOFness to me?

[Eric]
> A file is at EOF when attempts to read more data from it will fail
> returning no data.

I was afraid you would say this. That's not a condition that's easy to calculate without doing I/O, *and* that's not the condition that you are interested in for your problem. According to your definition, f.eof() should be true in this example:

    f = open("/etc/passwd")
    f.seek(0, 2)             # Seek to end of file
    print f.eof()            # What will this print???
    print `f.readline()`     # Will print ''

But getting the right result here requires a lot of knowledge about how the file is implemented! While you've explained how this can be implemented on Unix, it can't be implemented with just the tools that stdio gives us. Going beyond stdio in order to implement a feature is a grave decision.
After all, Python is portable to many less-than-mainstream operating systems (VxWorks, OS/9, VMS...). Now, if this was just a speed hack (like xreadlines) I could accept having some platform-dependent code, if at least there was a portable way to do it that was just a bit slower. But here you can't convince me that this can be done in a portable way, and I don't want to force porters to figure out how to do this for their platform before their port can work. I also don't want to make f.eof() a non-portable feature: *if* it is provided, it's too important for that. Note that stdio's feof() doesn't have this definition! It is set when the last *read* (or getc(), etc.) stumbled upon an EOF condition. That's also of limited value; it's mostly defined so you can distinguish between errors and EOF when you get a short read. The stdio feof() flag would be false in the above example. > > What's wrong with just setting the parser loose on the input and > > letting it deal with EOF? > > Nothing wrong in theory, but it's a problem in practice. I don't want > to import the second parser unless it's actually needed, because it's much > larger than the first one. So be practical and let the first parser set a global flag that tells you whether it's necessary to load the second one. > > In your example, apparently a line > > containing the word "history" signals that the rest of the file must > > be parsed by the second parser. What if "history" is the last line of > > the file? The eof() test can't tell you *that*! > > Right. That case never happens. I mean it *really* never happens :-). > > What we're talking about is a game system. The first parser recognizes > a spec language for describing games of a particular class (variants of > Diplomacy, if that's meaningful to you). The system keeps logfiles which > consist of a a section in the game description language, optionally > followed by the token "history" and an order log. 
> > The parser for the order log language is a *lot* larger than the one > for the description language. This is why I said I don't want the > first parser to just call the second. I want to test for EOF to > know whether I have to import the second parser at all! > > Here's the beginning of my problem: the first parser can't export a line > buffer, because it doesn't *have* a line buffer. It's a subclass of > shlex and does single-character reads. > > There are two ways I can cope with this. One is to do a (nonzero) > length read after the first parser exits; the other is to have the > first parser set a state flag controlling whether the second parser > loads. Do the latter. Nothing wrong with it that I can see. > This is where it bites that I can't test for EOF with a read(0). And can you tell me a system where you *can* test for EOF with a read(0)? I've never heard of such a thing. The Unix read() system call has the same properties as Python's f.read(). I'm pretty sure that fread() with a zero count also doesn't give you the information you're after. > The > second shlex parser only has token-level pushback! If do a > nonzero-length read and I get data, I'm screwed. On the other hand > (as I said before) setting a lexer state flag seems wrong, because > EOFness is a property of the underlying stream rather than the parser. > I'd be duplicating state that exists in the stdio stream structure > anyway; it ought to be accessible. Bullshit. The EOFness that you're after (according to your own definition) is not the same as the EOFness of the stdio stream. The EOFness in the stdio stream could help you, but Python resets it -- so that making it available wouldn't be as easy as you claim. Anyway, you seem to have a sufficiently vague idea of what "EOFness" means that I don't think providing access to whatever low-level EOFness condition might exist would do you much good. 
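
A minimal sketch of the flag approach recommended above, written in present-day Python rather than the 2.0 of this thread. The names DescriptionParser, load_order_parser and load_log are hypothetical, the sample log format is invented for illustration, and the "large" second parser is replaced by a small stand-in so the example runs on its own. It relies on shlex (in its default non-POSIX mode) reading one character at a time, so the stream is left just past the whitespace that ended the "history" token.

```python
import io
import shlex


class DescriptionParser(shlex.shlex):
    """Hypothetical first-stage parser: collects tokens until 'history' or EOF."""

    def __init__(self, stream):
        super().__init__(stream)
        self.saw_history = False  # the state flag the caller inspects afterwards
        self.description = []

    def parse(self):
        while True:
            tok = self.get_token()
            if tok == self.eof:  # shlex signals end of input with self.eof
                break
            if tok == "history":
                self.saw_history = True
                break
            self.description.append(tok)
        return self.description


def load_order_parser():
    # In a real program the expensive import would happen here, for example
    # `import orderlog` (a hypothetical module name). A stand-in is returned
    # instead so that this sketch is self-contained.
    def parse_orders(stream):
        return [line.strip() for line in stream if line.strip()]

    return parse_orders


def load_log(stream):
    first = DescriptionParser(stream)
    description = first.parse()
    orders = None
    if first.saw_history:
        # Only pay for the second parser when the first one asked for it.
        orders = load_order_parser()(stream)
    return description, orders
```

The flag lives on the parser object rather than in a global, which keeps the decision next to the code that saw the "history" token and needs no EOF test on the stream at all.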
> > > Now, another and more general way to handle this would be to make an > > > equivalent of the old FIONCLEX ioctl part of Python's standard set of > > > file object methods -- a way to ask "how many bytes are ready to be > > > read in this stream? > > > > There's no portable way to do that. > > Actually, fstat(2) is portable enough to support a very useful > approximation of FIONCLEX. I know, because I tried it. > > Last night I coded up a "waiting" method for file objects that calls > fstat(2) on the associated file descriptor. For a plain file, it > then subtracts the result of ftell() from the fstat size field and > returns that -- for other files, it simply returns the size field. > > I then tested this on plain files, FIFOs, and sockets under Linux. It > turns out fstat(2) gives useful information in all three cases (a > count of characters waiting in the buffer in the latter two). I expected > this; it should be true under all current Unixes. > > fstat(2) does not give useful size-field results for Linux block > devices. I didn't test the character (terminal) devices. (I > documented my results in Python's Doc/lib/stat.tex, in a patch I have > already submitted to SourceForge.) > > I would be quite surprised if the plain-file case didn't work on Mac > and Windows. I would be a little surprised if the socket case failed, > because all three probably inherited fstat(2) from the ancestral BSD > TCP/IP stack. > > Just having the plain-file case work would, IMHO, be justification > enough for this method. If it turns out to be portable across Mac and > Windows sockets as well, *huge* win. Could this be tested by someone > with access to Windows and Mac systems? I don't see the huge win. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 8 19:33:26 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 13:33:26 -0500 Subject: [Python-Dev] Extending startup code: PEP needed? 
In-Reply-To: Your message of "Mon, 08 Jan 2001 19:10:50 +0100." <3A5A02AA.675A35D1@lemburg.com>
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com>
Message-ID: <200101081833.NAA05325@cj20424-a.reston1.va.home.com>

Discussions based on Python running as root and picking up untrusted code from $PYTHONPATH are pointless. Of course this is a security hole. If root runs *any* Python script in a way that could pick up even a single untrusted module, there's a security hole. site.py or *.pth files are just a special case of this, so I don't see why this is used as an example.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Mon Jan 8 19:48:40 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 8 Jan 2001 13:48:40 -0500
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <3A59EF2B.792801E5@ActiveState.com>
Message-ID:

[Moshe]
> Something better to do would be to use
> import foo as _foo

[Paul]
> It's pretty clear that nobody does this now and nobody is going
> to start doing it in the near future. It's too invasive and it
> makes the code too ugly.

Actually, this function is one of my std utilities:

    def _pvt_import(globs, modname, *items):
        """globs, modname, *items -> import into globs with leading "_".

        If *items is empty, set globs["_" + modname] to module modname.
        If *items is not empty, import each item similarly but don't
        import the module into globs.
        Leave names that already begin with an underscore as-is.

        # import math as _math
        >>> _pvt_import(globals(), "math")
        >>> round(_math.pi, 0)
        3.0

        # import math.sin as _sin and math.floor as _floor
        >>> _pvt_import(globals(), "math", "sin", "floor")
        >>> _floor(3.14)
        3.0
        """

        mod = __import__(modname, globals())
        if items:
            for name in items:
                xname = name
                if xname[0] != "_":
                    xname = "_" + xname
                globs[xname] = getattr(mod, name)
        else:
            xname = modname
            if xname[0] != "_":
                xname = "_" + xname
            globs[xname] = mod

Note that it begins with an underscore because it's *meant* to be exported <0.5 wink>. That is, the module importing this does

    from utils import _pvt_import

because they don't already have _pvt_import to automate adding the underscore, and without the underscore almost everyone would accidentally export "pvt_import" in turn. IOW,

    import M
    from N import M

not only import M, by default they usually export it too, but the latter is rarely *intended*. So, over the years, I've gone thru several phases of naming objects I *intend* to export with a leading underscore. That's the only way to prevent later imports from exporting by accident.

I don't believe I've distributed any code using _pvt_import, though, because it fights against the language and expectations. Metaprogramming against the grain should be a private sin <0.9 wink>.

_metaprogramming-ly y'rs - tim

From mal at lemburg.com Mon Jan 8 19:40:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 08 Jan 2001 19:40:37 +0100
Subject: [Python-Dev] Extending startup code: PEP needed?
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com>
Message-ID: <3A5A09A5.D0DC33A1@lemburg.com>

Guido van Rossum wrote:
>
> Discussions based on Python running as root and picking up untrusted
> code from $PYTHONPATH are pointless. Of course this is a security
> hole. If root runs *any* Python script in a way that could pick up
> even a single untrusted module, there's a security hole.
site.py or > *.pth files are just a special case of this, so I don't see why this > is used as an example. Agreed; see my reply to Martin. Still, wouldn't it be wise to add some logic to Python to prevent importing untrusted modules, e.g. by making sys.path read-only and disabling the import hook usage using a command line ? This would at least prevent the most obvious attacks. I wonder how RedHat works around these problems. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at interet.com Mon Jan 8 20:16:45 2001 From: jim at interet.com (James C. Ahlstrom) Date: Mon, 08 Jan 2001 14:16:45 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? References: Message-ID: <3A5A121D.FDD8C2C1@interet.com> Mark Hammond wrote: > Note that the original problem was _embedding_ Python - thus, you need to > patch _their_ WinMain to make it work for them - something you can't do. Correct, if they don't use pythonw.exe, but use a different main program, the new stdout will not be installed. But then they must have their own main.c, and they can add the C call. > Even if PyWin_StdoutReplace() was a public symbol so they _could_ call it, I Yes, the symbol PyWin_StdoutReplace() is public, and they can call it. > am not convinced they would - it is almost certain they will still need to > redirect output to somewhere useful, so why bother redirecting it > temporarily just to redirect it for real immediately after? Redirecting it temporarily is valuable, because if the sys.stdout replacement occurs in (for example) myprog.py, then "pythonw.exe myprog.py" will fail to produce any error messages for a syntax error in myprog.py. Also, I was hoping further sys.stdout redirection would be unnecessary. > Finally, I am slightly concerned about the possibility of "hanging" certain > programs. 
For example, I believe that DCOM will often invoke a COM server in > a different "desktop" than the user (this is also true for Services, but > Python services don't use pythonw.exe). Thus, a Python program may end up > hanging with a dialog box, but in the context where no user is able to see > it. However, this could be addressed by adding a command-line option to > prevent this new behaviour kicking in. Limiting the code to pythonw.exe instead of trying to install it in python20.dll was supposed to prevent damage to the use of Python in servers. Since pythonw.exe is a Windows (GUI) program, I am assuming there is a screen. The dialog box is started with MessageBox() and a window handle of GetForegroundWindow(). So there doesn't need to be an application window. I have tested it with GUI programs, and it also works when run from a console. Having said that, you may be right that there is some way to hang on a dialog box which can not be seen. It depends on what MessageBox() and GetForegroundWindow() actually do. If it seems that this patch has merit, I would be grateful if you would review the code to look for issues of this type. > I would prefer to see a decent API for extracting error and traceback > information from Python. On the other hand, I _do_ see the problem for > "newbies" trying to use pythonw.exe. There could be an API added to the winstdout module such as msg = winstdout.GetMessageText() which would return saved text, control its display etc. But then the problem remains of actually displaying the messages especially in the context of tracebacks and errors. And it is probably easier to redirect sys.stdout so it does what you want rather than use the API. I do not view winstdout as a "newbie" feature, but rather a generally useful C-language addition to Python. 
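
To illustrate the "redirect sys.stdout yourself" alternative mentioned above, here is a minimal sketch in present-day Python. It is not the winstdout patch and makes no claim about its C implementation; MessageCollector, get_message_text and run_collecting are hypothetical names. The collector stores each write() call, which may be only part of a line, and hands the joined text back on request so that a GUI program could show it in one dialog once it is idle.

```python
import sys


class MessageCollector:
    """Hypothetical stand-in for stdout that saves text instead of printing it."""

    def __init__(self):
        self._parts = []

    def write(self, text):
        # print() may call write() several times per line; keep every piece.
        self._parts.append(text)
        return len(text)

    def flush(self):
        # Nothing to flush; text is only released by get_message_text().
        pass

    def get_message_text(self):
        text = "".join(self._parts)
        self._parts.clear()
        return text


def run_collecting(func):
    """Run func() with sys.stdout redirected and return what it printed."""
    collector = MessageCollector()
    saved = sys.stdout
    sys.stdout = collector
    try:
        func()
    finally:
        sys.stdout = saved  # always restore the real stdout
    return collector.get_message_text()
```

A caller would pass the returned string to whatever display mechanism it has; in a windowed program that would typically happen from the idle loop rather than after every write.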
> So - I guess I am saying that I don't see this as optimal, and it doesnt > solve the original problem you pointed at - but in the interests of making > pythonw.exe seem "less broken" for newbies, I could live with this as long > as I could prevent it when necessary. I guess I am saying, perhaps incorrectly, that the mechanism provided will make further redirection of sys.stdout unnecessary 99% of the time. Experimentation shows that Python composes tracebacks and error messages a line or partial line at a time. That is, you can not display each call to printf(), but must wait until the system is idle to be sure that multiple calls to printf() are complete. So this forces you to use the idle processing loop, not rocket science but at least inconvenient. And the only source of stdout/err is tracebacks, error messages and the "print" statement. What would you do with these in a Windows program except display an "OK" dialog box? If someone out there knows of a different example of sys.stdout redirection in use in the real world, it would be helpful if they would describe it. Maybe it could be incorporated. > Another option would be to use the Win32 Console APIs, and simply attempt to > create a console for the error message. Eg, maybe PyErr_Print() could be > changed to check for the existance of a console, and if not found, create > it. However, the problem with this approach is that the error message will > often be printed just as the process is terminating - meaning you will see a > new console with the error message for about 0.025 of a second before it > vanishes due to process termination. Any sort of "press any key to > terminate" option then leaves us in the same position - if no user can see > the message, the process appears hung. Yes, this a problem with the console API approach. Another is that popping up a black console for output instead of the usual "OK" dialog box is unnatural, and will force the user to replace sys.stdout. 
I was hoping this C stdout will make this unnecessary. JimA From esr at thyrsus.com Mon Jan 8 20:17:50 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 8 Jan 2001 14:17:50 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101081830.NAA05301@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 01:30:34PM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> <20010108130137.E22834@thyrsus.com> <200101081830.NAA05301@cj20424-a.reston1.va.home.com> Message-ID: <20010108141750.C23214@thyrsus.com> Guido van Rossum : > [Eric] > > A file is at EOF when attempts to read more data from it will fail > > returning no data. > > I was afraid you would say this. That's not a condition that's easy > to calculate without doing I/O, *and* that's not the condition that > you are interested in for your problem. According to your definition, > f.eof() should be true in this example: > > f = open("/etc/passwd") > f.seek(0, 2) # Seek to end of file > print f.eof() # What will this print??? > print `f.readline()` # Will print '' I agree that after f.seek(0, 2) f is in an end-of-file condition. But I think it's precisely the definition that would be useful for my problem. Contrary to what you say, I think my definition of EOF is quite sharp -- a sequential read would return no data. Better to think of what I need as an "is there data waiting?" query. I should have framed it that way, rather than about EOFness, from the beginning. > But getting the right result here requires a lot of knowledge about > how the file is implemented! While you've explained how this can be > implemented on Unix, it can't be implemented with just the tools that > stdio gives us. Granted. 
However, it looks possible that "is there data waiting" *can* be portably implemented with the help of fstat(2), which by precedent is also part of Python's toolkit. > I also don't want to make f.eof() a non-portable feature: *if* > it is provided, it's too important for that. Agreed. > Note that stdio's feof() doesn't have this definition! It is set when > the last *read* (or getc(), etc.) stumbled upon an EOF condition. > That's also of limited value; it's mostly defined so you can > distinguish between errors and EOF when you get a short read. The > stdio feof() flag would be false in the above example. OK. You're right about that. I should have thought more clearly about the difference between the state of stdio and the state of the underlying file or device. Access to stdio state won't do by itself. > > This is where it bites that I can't test for EOF with a read(0). > > And can you tell me a system where you *can* test for EOF with a > read(0)? I've never heard of such a thing. The Unix read() system > call has the same properties as Python's f.read(). I'm pretty sure > that fread() with a zero count also doesn't give you the information > you're after. I'd have to test -- but what Unix read(2) does in this case isn't really my point. My real point is that I can't probe for whether there's data waiting to be read in what seems like the obvious way. I expect Python to compensate for the deficiencies of the underlying C, not reflect them. > > Just having the plain-file case work would, IMHO, be justification > > enough for this method. If it turns out to be portable across Mac and > > Windows sockets as well, *huge* win. Could this be tested by someone > > with access to Windows and Mac systems? > > I don't see the huge win. Try "polling after a non-blocking open". A lower-overhead and more natural way to do it than with a poller object. (This is on my mind because I used a poller object to query FIFOs just last week.) 
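
A minimal sketch of the "is there data waiting?" query for the plain-file case, in present-day Python. This is not the patch submitted to SourceForge; the function name waiting follows the message, while demo is a hypothetical helper. It assumes a file opened in binary mode so that tell() is a byte offset, and it assumes nobody else changes the file's size in the meantime. What the size field means for FIFOs, sockets and devices is platform-dependent and is not verified here.

```python
import os
import stat
import tempfile


def waiting(f):
    """Approximate number of bytes not yet read from the open file f."""
    st = os.fstat(f.fileno())
    if stat.S_ISREG(st.st_mode):
        # Plain file: size on disk minus the current read position.
        return st.st_size - f.tell()
    # Other file types: report the size field as-is (meaning varies by platform).
    return st.st_size


def demo():
    results = []
    with tempfile.TemporaryFile() as f:  # opened in binary read/write mode
        f.write(b"abcdef")
        f.flush()                        # make sure fstat sees all six bytes
        f.seek(0)
        results.append(waiting(f))       # nothing read yet
        f.read(4)
        results.append(waiting(f))       # four bytes consumed
        f.read()
        results.append(waiting(f))       # at end of file
    return results
```

A result of zero on a plain file corresponds to the "a sequential read would return no data" condition discussed in this thread, under the assumptions stated above.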
The game system I'm working on, BTW, has another point of interest for this list. It is a rather large and complex suite of C programs that makes heavy use of dynamic-memory allocation; I am translating to Python partly in order to avoid chronic misallocation problems (leaks and wild pointers) and partly because the thing needed to be rewritten anyway to eliminate global state so I can embed it in a multithreaded server.

Side-by-side comparison of the original C and its translation should be quite an interesting educational experience once it's done. That just might be my next year's paper.

--
Eric S. Raymond

It is the assumption of this book that a work of art is a gift, not a commodity. Or, to state the modern case with more precision, that works of art exist simultaneously in two "economies," a market economy and a gift economy. Only one of these is essential, however: a work of art can survive without the market, but where there is no gift there is no art.
-- Lewis Hyde, The Gift: Imagination and the Erotic Life of Property

From guido at python.org Mon Jan 8 20:36:02 2001
From: guido at python.org (Guido van Rossum)
Date: Mon, 08 Jan 2001 14:36:02 -0500
Subject: [Python-Dev] Extending startup code: PEP needed?
In-Reply-To: Your message of "Mon, 08 Jan 2001 19:40:37 +0100." <3A5A09A5.D0DC33A1@lemburg.com>
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com>
Message-ID: <200101081936.OAA05440@cj20424-a.reston1.va.home.com>

> Still, wouldn't it be wise to add some logic to Python to prevent
> importing untrusted modules, e.g. by making sys.path read-only and
> disabling the import hook usage using a command line ?
>
> This would at least prevent the most obvious attacks. I wonder how
> RedHat works around these problems.

I don't understand what kind of attacks you are thinking of. What would making sys.path read-only prevent? You seem to be thinking that some malicious piece of code could try to subvert you by setting sys.path. But what you forget is that if this piece of code cannot be trusted with sys.path, it should not be trusted to run at all!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From loewis at informatik.hu-berlin.de Mon Jan 8 20:45:44 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 8 Jan 2001 20:45:44 +0100 (MET)
Subject: [Python-Dev] Extending startup code: PEP needed?
In-Reply-To: <3A5A02AA.675A35D1@lemburg.com> (mal@lemburg.com)
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com>
Message-ID: <200101081945.UAA12178@pandora.informatik.hu-berlin.de>

> The code adding (and with the patch: executing) the .pth files
> is defined in site.py and it is rather easy to override this
> file by adding a modified site.py file to the current working dir...
> a potential security hole in its own right, I guess :(

Indeed - independent of my patch changing the other site.py :-)

Regards,
Martin

From skip at mojam.com Mon Jan 8 20:49:22 2001
From: skip at mojam.com (Skip Montanaro)
Date: Mon, 8 Jan 2001 13:49:22 -0600 (CST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <3A59EF2B.792801E5@ActiveState.com>
References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <3A59EF2B.792801E5@ActiveState.com>
Message-ID: <14938.6594.44596.509259@beluga.mojam.com>

    Paul> It's not about keeping people out of your module. In fact I would
    Paul> propose that mod.__dict__ should be as loose as ever.

Okay, how about this as a compromise first step? Allow programmers to put
__exports__ lists in their modules but don't do anything with them *except*
modify dir() to respect that if it exists?
That would pretty up dir() output for newbies, almost certainly not break anything, improve the internal documentation of the modules that use __exports__, and still allow us to move in a more restrictive direction at a later time if we so choose.

Skip

From moshez at zadka.site.co.il Tue Jan 9 05:04:23 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Tue, 9 Jan 2001 06:04:23 +0200 (IST)
Subject: [Python-Dev] Extending startup code: PEP needed?
In-Reply-To: <3A5A02AA.675A35D1@lemburg.com>
References: <3A5A02AA.675A35D1@lemburg.com>, <200101081751.SAA08918@pandora.informatik.hu-berlin.de>
Message-ID: <20010109040423.68AA4A82D@darjeeling.zadka.site.co.il>

On Mon, 08 Jan 2001 19:10:50 +0100, "M.-A. Lemburg" wrote:
> Hmm, but what if the Python script picks up a site.py which is
> different from the standard one distributed with Python ?

Then the site.py can do whatever it wants. No need to go through PTHs

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From tim.one at home.com Mon Jan 8 20:59:48 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 8 Jan 2001 14:59:48 -0500
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
In-Reply-To: <20010108130137.E22834@thyrsus.com>
Message-ID:

Quickie:

[Guido]
> Eric, before we go further, can you give an exact definition of
> EOFness to me?

[Eric]
> A file is at EOF when attempts to read more data from it will fail
> returning no data.

To be very clear about this, that's not what C's feof() means: in general, the end-of-file indicator in std C stream input is set only *after* you've attempted a read that "didn't work". For example,

    #include <stdio.h>

    void main()
    {
        FILE* fp = fopen("guts", "wb");
        fputs("abc", fp);
        fclose(fp);
        fp = fopen("guts", "rb");
        for (;;) {
            int c;
            c = getc(fp);
            printf("getc returned %c (%d)\n", c, c);
            printf("At EOF after getc? %d\n", feof(fp));
            if (c == EOF)
                break;
        }
    }

Unless your C is broken, feof() will return 0 after getc() returns 'a', and again after 'b', and again after 'c'. It's not until getc() returns EOF that feof() first returns a non-zero result. Then add these two lines after the "for":

    fseek(fp, 0L, SEEK_END);
    printf("after seeking to the end, feof() says %d\n", feof(fp));

Unless your fseek() is non-std, that clears the end-of-file indicator, and regardless of to where you seek.

So the std behavior throughout libc is much like Python's behavior: there's nothing that can tell you whether you're at the end of the file, in general, short of trying to read and failing to get something back.

In your case you seem to *know* that you have a "plain old file", meaning that its size is well-defined and that ftell() makes sense for it. You also seem to know that you don't have to worry about anyone else, e.g., appending to it (or in any other way changing its size, or changing your stream's file position), while you're mucking with it. So why not just do f.tell() and compare that to the size yourself? This sounds easy for you to do, but in this particular case you enjoy the benefits of a world of assumptions that aren't true in general.

> ...
> This is where it bites that I can't test for EOF with a read(0).

You can't in std C using an fread of 0 bytes either -- that has no effect on the end-of-file indicator. Add

    if (c == 'c') {
        char buf[100];
        size_t i = fread(buf, 1, 0, fp);
        printf("after fread of 0 bytes, feof() says %d\n", feof(fp));
    }

before the "(c == EOF)" test above to try that on your platform.

> ...
> I would be quite surprised if the plain-file case didn't work on Mac
> and Windows.

Don't know about Mac. On Windows everything is grossly complicated because of line-end translations in text mode. Like the C std says, the only *portable* thing you can do with an ftell() result for a text file is feed it back unaltered to fseek().
It so happens that on Windows, using MS's libc, if f.readline() returns "abc\n" for the first line of a native text file, f.tell() returns 5, reflecting the actual byte offset in the file (including the \r that .readline() doesn't show you). So you *can* get away with comparing f.tell() to the file's size on Windows too (using the MS C compiler; don't know about others). the-operational-defn-of-eof-is-the-only-portable-defn- there-is-ly y'rs - tim From moshez at zadka.site.co.il Tue Jan 9 05:08:29 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 06:08:29 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <14938.6594.44596.509259@beluga.mojam.com> References: <14938.6594.44596.509259@beluga.mojam.com>, <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <3A59EF2B.792801E5@ActiveState.com> Message-ID: <20010109040829.BDB66A82D@darjeeling.zadka.site.co.il> [Paul Prescod] > It's not about keeping people out of your module. In fact I would > propose that mod.__dict__ should be as loose as ever. [Skip Montanaro] > Okay, how about this as a compromise first step? Allow programmers to put > __exports__ lists in their modules but don't do anything with them *except* > modify dir() to respect that if it exists? That would pretty up dir() > output for newbies, almost certainly not break anything, improve the > internal documentation of the modules that use __exports__, and still allow > us to move in a more restrictive direction at a later time if we so choose. I'm +1 on that personally. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal at lemburg.com Mon Jan 8 21:38:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 21:38:00 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? 
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> Message-ID: <3A5A2528.C289BE1D@lemburg.com> Guido van Rossum wrote: > > > Still, wouldn't it be wise to add some logic to Python to prevent > > importing untrusted modules, e.g. by making sys.path read-only and > > disabling the import hook usage using a command line ? > > > > This would at least prevent the most obvious attacks. I wonder how > > RedHat works around these problems. > > I don't understand what kind of attacks you are thinking of. What > would making sys.path read-only prevent? You seem to be thinking that > some malicious piece of code could try to subvert you by setting > sys.path. But what you forget is that if this piece of code cannot be > trusted with sys.path, it should not be trusted to run at all! I was thinking of an attack where knowledge of common temporary execution locations is used to trick Python into executing untrusted code -- the untrusted code would only have to be copied to the known temporary execution directory and then gets executed by Python next time the program using the temporary location is invoked. But you're right: this is possible whether or not sys.path is writeable. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Mon Jan 8 21:45:57 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 21:45:57 +0100 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <200101081427.JAA03146@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 09:27:50AM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> Message-ID: <20010108214557.H402@xs4all.nl> On Mon, Jan 08, 2001 at 09:27:50AM -0500, Guido van Rossum wrote: > > You may be right. Still, this patch solves the immediate problem in a > > reasonably clean way, and I urge that it should go in. We can do a > > more complete reorganization of the build process later. (I'll help with > > that; I'm pretty expert with autoconf and friends.) > I expect Andrew's code to go in before 2.1 is released. So I don't > see a reason why we should hurry and check in a stop-gap measure. Oh, we're gonna distribute binaries of Python 2.0/1.5.2-with-distutils for every known platform that can run configure ? :) I still think there are more than enough platforms without Python to warrant using autoconf for configuring modules. The module list and their demands are stable enough to make maintenance a fair breeze, IMHO. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From akuchlin at mems-exchange.org Mon Jan 8 22:57:58 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 16:57:58 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108214557.H402@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 08, 2001 at 09:45:57PM +0100 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108214557.H402@xs4all.nl> Message-ID: <20010108165758.B9260@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 09:45:57PM +0100, Thomas Wouters wrote: >every known platform that can run configure ? :) I still think there are >more than enough platforms without Python to warrant using autoconf for >configuring modules. The module list and their demands are stable enough to >make maintenance a fair breeze, IMHO. Umm... the proposed PEP 229 patch would compile a Python binary with sre, posix, and strop statically linked; this minimal Python is then used to run the setup.py script. You shouldn't require a preinstalled Python, though the current version of the patch doesn't meet this requirement yet. --amk From tim.one at home.com Mon Jan 8 21:59:40 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 15:59:40 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: [Tim] > Perl appears to ignore the issue of thread safety here (on Windows and > everywhere else). [Paul Prescod] > If you can create a sample program that demonstrates the unsafety > I'll anonymously submit it as a bug on our internal system I don't want to spend time on that, as I *assume* it's already well-known within the Perl thread community. Besides, the last version of Perl I got from ActiveState complains: No threads in this perl at temp.pl line 14 if I try to use Perl threads. 
That's: > \perl\bin\perl -v This is perl, v5.6.0 built for MSWin32-x86-multi-thread (with 1 registered patch, see perl -V for more detail) Copyright 1987-2000, Larry Wall Binary build 620 provided by ActiveState Tool Corp. http://www.ActiveState.com Built 18:31:05 Oct 31 2000 ... If I can repair that by downloading a more recent release, let me know. > and ensure that the next version of Perl is as slow as Python. :) I don't want to slow them down! To the contrary, now I've got a solid reason for why I keep using Perl for simple high-volume text-crunching jobs . > Seriously: If someone comes at me with Perl-IO-is-way-faster-than- > Python-IO, I'd like to know what concretely they've given up in order > to achieve that performance. My line-at-a-time test case used (rounding to nearest whole integers) 30 seconds in Python and 6 in Perl. The result of testing many changes to Python's implementation was that the excess 24 seconds broke down like so: 17 spent inside internal MS threadsafe getc() lock/unlock routines 5 uncertain, but evidence suggests much of it due to MS malloc/realloc (Perl does its own memory mgmt) 2 for not copying directly out of the platform FILE* implementation struct in a highly optimized loop (like Perl does) My last checkin to fileobject.c reclaimed 17 seconds on Win98SE while remaining threadsafe, via a combination of locking per line instead of per character, and invoking realloc much less often (only for lines exceeding 200 chars). (BTW, I'm still curious to know how that compares to the getc_unlocked hack on a platform other than Windows!) > And even just for my own interest I'd like to understand the cost/ > benefit of stream thread safety. If you're not *using* threads, or not using them to muck with the same stream at the same time, the ratio is infinite. And that's usually the case. > For instance would it make sense to just write a thread-safe > wrapper for streams used from multiple threads? 
Alas, on Windows you can't pick and choose: you get the threadsafe libc, or you don't. So long as anyone may want to use threads for any reason whatsoever, we must link with threadsafe libraries. But, as above, on Windows we're not paying much for that anymore in this case (unless maybe the threadsafe MS malloc family is also outrageously slower than its careless counterpart ...). It does prevent me from pursuing the "optimized inner loop" business, because MS doesn't expose its locking primitives (so I can't do in C everything I would need to do to optimize the inner loop while remaining threadsafe). there-are-damn-few-pieces-of-libc-we-wouldn't-be-better-off- writing-ourselves-but-then-we'd-have-a-much-harder-time- playing-with-others'-code-ly y'rs - tim From akuchlin at mems-exchange.org Mon Jan 8 22:15:34 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 16:15:34 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:59:40PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: <20010108161534.A2392@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: >200 chars). (BTW, I'm still curious to know how that compares to the >getc_unlocked hack on a platform other than Windows!) On Solaris and Linux, the results seemed to be lost in the noise. Repeated runs of filetest.py were sometimes faster than without USE_MS_GETLINE_HACK, so the variation is probably large enough to swamp any difference between the two. (Assuming I enabled the getline hack correctly of course; someone please replicate...) 
--amk

Linux: w/o USE_MS_GETLINE_HACK
kronos Python-2.0>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.186  0.190
readlines_sizehint    0.108  0.110
using_fileinput       0.447  0.450
while_readline        0.184  0.180

Linux w/ USE_MS_GETLINE_HACK:
kronos Python-2.0>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.178  0.180
readlines_sizehint    0.108  0.110
using_fileinput       0.434  0.430
while_readline        0.183  0.190

Solaris w/o USE_MS_GETLINE_HACK:
amarok src>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.640  0.630
readlines_sizehint    0.278  0.280
using_fileinput       1.874  1.820
while_readline        0.839  0.840

Solaris w/ USE_MS_GETLINE_HACK:
amarok src>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.569  0.570
readlines_sizehint    0.275  0.280
using_fileinput       1.902  1.900
while_readline        0.769  0.770

From gstein at lyra.org Mon Jan 8 22:29:40 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 13:29:40 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108161534.A2392@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Jan 08, 2001 at 04:15:34PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> <20010108161534.A2392@kronos.cnri.reston.va.us> Message-ID: <20010108132940.G4141@lyra.org> On Mon, Jan 08, 2001 at 04:15:34PM -0500, Andrew Kuchling wrote: > On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: > >200 chars). (BTW, I'm still curious to know how that compares to the > >getc_unlocked hack on a platform other than Windows!) > > On Solaris and Linux, the results seemed to be lost in the noise. Your times are so small... I'd suggest doing a few iterations within filetest.py so your margin of error isn't so noticeable. Cheers, -g >... 
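The "few iterations" idea can be sketched with the standard timeit module. This is only an illustration in present-day Python 3, not the filetest.py whose numbers are quoted in this thread; the names while_readline, best_of and make_sample, and the throwaway sample file, are made up for the example.

```python
import os
import tempfile
import timeit


def while_readline(path):
    """Count lines by calling readline() until it returns an empty string."""
    count = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            count += 1
    return count


def best_of(func, *args, repeat=5, number=10):
    """Run func(*args) `number` times per trial and return the best
    per-call time over `repeat` trials (hypothetical helper)."""
    trials = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(trials) / number


def make_sample(lines=2000):
    """Write a small throwaway file and return its path (illustrative data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(lines):
            f.write("line %d of the sample file\n" % i)
    return path


if __name__ == "__main__":
    sample = make_sample()
    try:
        print("while_readline %.6f s per pass" % best_of(while_readline, sample))
    finally:
        os.remove(sample)
```

Taking the minimum over several trials, each made of several passes, is one common way to keep scheduler and cache noise from swamping a small difference; whether it would separate the two builds measured here is an open question.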
> Linux: w/o USE_MS_GETLINE_HACK > kronos Python-2.0>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.186 0.190 > readlines_sizehint 0.108 0.110 > using_fileinput 0.447 0.450 > while_readline 0.184 0.180 > > Linux w/ USE_MS_GETLINE_HACK: > kronos Python-2.0>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.178 0.180 > readlines_sizehint 0.108 0.110 > using_fileinput 0.434 0.430 > while_readline 0.183 0.190 > Solaris w/o USE_MS_GETLINE_HACK: > amarok src>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.640 0.630 > readlines_sizehint 0.278 0.280 > using_fileinput 1.874 1.820 > while_readline 0.839 0.840 > > Solaris w/ USE_MS_GETLINE_HACK: > amarok src>./python ~/filetest.py > total 1559913 chars and 32513 lines > count_chars_lines 0.569 0.570 > readlines_sizehint 0.275 0.280 > using_fileinput 1.902 1.900 > while_readline 0.769 0.770 -- Greg Stein, http://www.lyra.org/ From thomas at xs4all.net Mon Jan 8 22:59:17 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 22:59:17 +0100 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108165758.B9260@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Jan 08, 2001 at 04:57:58PM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108214557.H402@xs4all.nl> <20010108165758.B9260@kronos.cnri.reston.va.us> Message-ID: <20010108225916.P2467@xs4all.nl> On Mon, Jan 08, 2001 at 04:57:58PM -0500, Andrew Kuchling wrote: > Umm... the proposed PEP 229 patch would compile a Python binary with > sre, posix, and strop statically linked; this minimal Python is then > used to run the setup.py script. You shouldn't require a preinstalled > Python, though the current version of the patch doesn't meet this > requirement yet. Apologies. 
I should've bothered to read the PEP first, but I haven't found the time yet :P I retract all my comments on the subject until I do. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Mon Jan 8 23:08:50 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 23:08:50 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Tue, Jan 09, 2001 at 02:03:00AM +0200 References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com>, <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> <200101081515.KAA03474@cj20424-a.reston1.va.home.com> <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> Message-ID: <20010108230850.Q2467@xs4all.nl> On Tue, Jan 09, 2001 at 02:03:00AM +0200, Moshe Zadka wrote: > > (2) Under exactly what circumstances do you want from foo import * > > issue a warning? > All. > If you want to be less extreme, don't warn if the module defines > a __from_star_ok__ We already have a perfectly acceptable way of turning off warnings in particular circumstances. I'm +1 on warning against using 'from spam import *' by the way, though it would be even better (+2!) if there was a 'import * considered harmful' page/chapter in the documentation somewhere, so we could point to it. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Mon Jan 8 23:23:02 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 17:23:02 -0500 Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: Your message of "Mon, 08 Jan 2001 21:38:00 +0100." 
<3A5A2528.C289BE1D@lemburg.com> References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> <3A5A2528.C289BE1D@lemburg.com> Message-ID: <200101082223.RAA05858@cj20424-a.reston1.va.home.com> > I was thinking an attack where knowledge of common temporary > execution locations is used to trick Python into executing > untrusted code -- the untrusted code would only have to be > copied to the known temporary execution directory and then > gets executed by Python next time the program using the temporary > location is invoked. When does Python execute code from a predictable common temporary location? When is that likely to be used from a Python script running as root? Note that if you use tempfile.TemporaryFile(), you can create a temporary file that's not subvertible. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Jan 8 23:35:17 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 8 Jan 2001 17:35:17 -0500 (EST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010108230850.Q2467@xs4all.nl> References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> <200101081433.JAA03185@cj20424-a.reston1.va.home.com> <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> <20010108230850.Q2467@xs4all.nl> Message-ID: <14938.16549.944123.917467@cj42289-a.reston1.va.home.com> Thomas Wouters writes: > *' by the way, though it would be even better (+2!) if there was a 'import * > considered harmful' page/chapter in the documentation somewhere, so we could > point to it. Care to write it? -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From MarkH at ActiveState.com Tue Jan 9 00:00:01 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 8 Jan 2001 15:00:01 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A5A05DA.86B3EB86@interet.com> Message-ID: > Limiting the code to pythonw.exe instead of trying to install > it in python20.dll was supposed to prevent damage to the use > of Python in servers. Since pythonw.exe is a Windows (GUI) program, > I am assuming there is a screen. Sometimes _no_ screen at all is wanted - ie, no main GUI window, and no console window. pythonw is used in this case. COM uses pythonw.exe in just this way, and when executed by DCOM, it will be executed in a context where the user can not see any such dialog. However, I would be happy to ensure the correct command-line is used to prevent this behaviour in this case. Indeed, in _every_ case I use pythonw.exe I would disable this - but I accept that other users have simpler requirements. > Having said that, you may be right that there is some way to > hang on a dialog box which can not be seen. It depends on what > MessageBox() and GetForegroundWindow() actually do. If it seems > that this patch has merit, I would be grateful if you would review > the code to look for issues of this type. There will be no issues in the code - it is just that Win2k will execute in a different "workspace" (I think that is the term). This is identical to the problem of a service attempting to display a messagebox - the code is perfect and works perfectly - just in a context where noone can see it, or dismiss it. > > I would prefer to see a decent API for extracting error and traceback > > information from Python. On the other hand, I _do_ see the problem for > > "newbies" trying to use pythonw.exe. > > There could be an API added to the winstdout module such as > msg = winstdout.GetMessageText() > which would return saved text, control its display etc. 
I was thinking more of a "Py_GetTraceback()", which would return a complete exception string. Thus, embedders could write code similar to:

    whatever = Py_BuildValue(...);
    ret = PyObject_Call(foo, whatever);
    ...
    if (!ok) {
        char *text = Py_GetTraceback();
        MsgBox(text);
    }

Thus, with only a small amount of work, they have _complete_ control over the output. However, I agree this doesn't really solve pythonw.exe's problems. > I do not view winstdout as a "newbie" feature, but rather a > generally useful C-language addition to Python. Hrm. I don't believe a commercial app, for example, would find this suitable - they would roll their own solution. Hence I see this purely for newbie users. Advanced users have complete control now - a simple try/except block around their main code, and you are pretty good. A builtin module for displaying a messagebox is as robust as an experienced user needs to emulate this, IMO. > I guess I am saying, perhaps incorrectly, that the mechanism provided > will make further redirection of sys.stdout unnecessary 99% of the > time. Yes, I disagree here. IMO it is no good for a commercial, real app. As I said, I see this as a feature so the newbie will not believe pythonw.exe is broken. Advanced users can already do similar things themselves. > Experimentation shows that Python composes tracebacks and > error messages a line or partial line at a time. That is, you can > not display each call to printf(), but must wait until the system is > idle to be sure that multiple calls to printf() are complete. So this > forces you to use the idle processing loop, not rocket science but > at least inconvenient. What "idle processing loop"? > And the only source of stdout/err is tracebacks, > error messages and the "print" statement. What would you do with > these in a Windows program except display an "OK" dialog box? Log the error to a file, and display a "friendly" dialog - possibly offering to automatically submit a support request/bug report. 
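The "simple try/except block around their main code" approach, combined with logging to a file, might look roughly like this. It is a sketch in modern Python 3 using only the standard traceback module; run_guarded, broken_main, the log file name and the wording of the friendly message are all invented for illustration, and the dialog is reduced to a returned string because how it gets displayed depends on the GUI toolkit in use.

```python
import os
import tempfile
import traceback


def run_guarded(main, log_path):
    """Call main(); on any exception, append the full traceback to log_path
    and return a short user-facing message instead of the raw traceback."""
    try:
        main()
    except Exception:
        with open(log_path, "a") as log:
            log.write(traceback.format_exc())
            log.write("\n")
        return "Sorry, something went wrong. Details were saved to %s" % log_path
    return None


def broken_main():
    """Stand-in for an application entry point that fails."""
    raise ValueError("demo failure")


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    log_file = os.path.join(log_dir, "app_errors.log")
    message = run_guarded(broken_main, log_file)
    if message is not None:
        # A GUI program would show this in a dialog rather than print it.
        print(message)
```

The traceback goes to the log where a developer can read it, and the end user only ever sees the short message.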
The casual user is going to be _very_ scared by a Python traceback. This is a sin of a similar magnitude to those crappy applications with unhandled VB exceptions. IMO, nothing looks more unprofessional than an app that displays an internal VB error message. Python is no different IMO. For real applications, there is a good chance that the majority of your users have never heard of Python. Thus, I don't believe your solution is suitable for the real, professional, commercial user. However, I agree that your solution does not prevent this user doing the "right thing"... But all this does keep me believing this is a "newbie" helper. > > If someone out there knows of a different example of sys.stdout > redirection in use in the real world, it would be helpful if > they would describe it. Maybe it could be incorporated. Sure. Komodo to a file with a friendly dialog (sometimes ;-). Pythonwin actually attempts a few things first - eg, not every exception Pythonwin causes at startup should be logged. Python services write unhandled errors to the event log. I don't believe I have worked on 2 projects with the same requirement here!!! Mark. From nas at arctrix.com Mon Jan 8 17:22:10 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 8 Jan 2001 08:22:10 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:59:40PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: <20010108082210.A16149@glacier.fnational.com> On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: > My line-at-a-time test case used (rounding to nearest whole integers) 30 > seconds in Python and 6 in Perl. 
The result of testing many changes to > Python's implementation was that the excess 24 seconds broke down like so: > > 17 spent inside internal MS threadsafe getc() lock/unlock > routines > 5 uncertain, but evidence suggests much of it due to MS > malloc/realloc (Perl does its own memory mgmt) > 2 for not copying directly out of the platform FILE* > implementation struct in a highly optimized loop (like > Perl does) Have you tried pymalloc? Neil From billtut at microsoft.com Tue Jan 9 01:38:14 2001 From: billtut at microsoft.com (Bill Tutt) Date: Mon, 8 Jan 2001 16:38:14 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? Message-ID: <58C671173DB6174A93E9ED88DCB0883D0A6202@red-msg-07.redmond.corp.microsoft.com> > From: Mark Hammond [mailto:MarkH at ActiveState.com] > There will be no issues in the code - it is just that Win2k will execute in > a different "workspace" (I think that is the term). This is identical to > the problem of a service attempting to display a messagebox - the code is > perfect and works perfectly - just in a context where noone can see it, or > dismiss it. The term Mark is looking for here is Windowstation, and it's an NT thing, not just a Win2k thing. Windowstations have been around for ages. Bill From ping at lfw.org Tue Jan 9 02:51:15 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 8 Jan 2001 17:51:15 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <14938.6594.44596.509259@beluga.mojam.com> Message-ID: On Mon, 8 Jan 2001, Skip Montanaro wrote: > Okay, how about this as a compromise first step? Allow programmers to put > __exports__ lists in their modules but don't do anything with them *except* > modify dir() to respect that if it exists? I'd say: Just have dir() and import * pay attention to __exports__. Don't mess with getattr or __dict__. -- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is is only our vanity bleeding. 
To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E. Buchrose From ping at lfw.org Tue Jan 9 03:00:08 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 8 Jan 2001 18:00:08 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59F27D.C27B8CD0@ActiveState.com> Message-ID: On Mon, 8 Jan 2001, Paul Prescod wrote: > dir() is one of the "interactive tools" I'd like to work better in the > presence of __exports__. On the other hand, dir() works pretty poorly > for object instances today so maybe we need something new anyhow. I suggest a built-in function "methods()" that works like this:

    from types import InstanceType, MethodType, BuiltinMethodType

    def methods(obj):
        if type(obj) is InstanceType:
            return methods(obj.__class__)
        results = []
        if hasattr(obj, '__bases__'):
            for base in obj.__bases__:
                results.extend(methods(base))
        results.extend(
            filter(lambda k, o=obj: type(getattr(o, k)) in
                   [MethodType, BuiltinMethodType], dir(obj)))
        return unique(results)

    def unique(seq):
        dict = {}
        for item in seq:
            dict[item] = 1
        results = dict.keys()
        results.sort()
        return results

    >>> import sys
    >>>
    >>> methods(sys.stdin)
    ['close', 'fileno', 'flush', 'isatty', 'read', 'readinto', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
    >>>
    >>> import SocketServer
    >>>
    >>> methods(SocketServer.ForkingTCPServer)
    ['__init__', 'collect_children', 'fileno', 'finish_request', 'get_request', 'handle_error', 'handle_request', 'process_request', 'serve_forever', 'server_activate', 'server_bind', 'verify_request']
    >>>

-- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is is only our vanity bleeding. To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E. 
Buchrose From gstein at lyra.org Tue Jan 9 03:20:56 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 18:20:56 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.102,2.103 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, Jan 08, 2001 at 06:00:13PM -0800 References: Message-ID: <20010108182056.C4640@lyra.org> On Mon, Jan 08, 2001 at 06:00:13PM -0800, Guido van Rossum wrote: >... > Modified Files: > fileobject.c > Log Message: > Tsk, tsk, tsk. Treat FreeBSD the same as the other BSDs when defining > a fallback for TELL64. Fixes SF Bug #128119. >... > *** fileobject.c 2001/01/08 04:02:07 2.102 > --- fileobject.c 2001/01/09 02:00:11 2.103 > *************** > *** 59,63 **** > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) > /* NOTE: this is only used on older > NetBSD prior to f*o() funcions */ > --- 59,63 ---- > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) > /* NOTE: this is only used on older > NetBSD prior to f*o() funcions */ All of those #ifdefs could be tossed and it would be more robust (long term) if an autoconf macro were used to specify when TELL64 should be defined. [ I've looked thru fileobject.c and am a bit confused: the conditions for defining TELL64 do not match the conditions for *using* it. that would seem to imply a semantic error somewhere and/or a potential gotcha when they get skewed (like I assume what happened to FreeBSD). simplifying with an autoconf macro may help to rationalize it. 
] Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Tue Jan 9 05:29:02 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:29:02 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108161534.A2392@kronos.cnri.reston.va.us> Message-ID: [Andrew Kuchling] I'll chop everything except while_readline (which is most affected by this stuff): > Linux: w/o USE_MS_GETLINE_HACK > while_readline 0.184 0.180 > > Linux w/ USE_MS_GETLINE_HACK: > while_readline 0.183 0.190 > > Solaris w/o USE_MS_GETLINE_HACK: > while_readline 0.839 0.840 > > Solaris w/ USE_MS_GETLINE_HACK: > while_readline 0.769 0.770 So it's probably a wash. In that case, do we want to maintain two hacks for this? I can't use the FLOCKFILE/etc approach on Windows, while "the Windows" approach probably works everywhere (although its speed relies on the platform factoring out at least the locking/unlocking in fgets). Both methods lack a refinement I would like to see, but can't achieve in "the Windows way": ensure that consistency is on no worse than a per-line basis. Right now, both methods lock/unlock the file only for the extent of the current buffer size, so that two threads *can* get back different interleaved pieces of a single long line. 
Like so:

    import thread

    def read(f):
        x = f.readline()
        print "thread saw " + `len(x)` + " chars"
        m.release()

    f = open("ga", "w")  # a file with one long line
    f.write("x" * 100000 + "\n")
    f.close()

    m = thread.allocate_lock()
    for i in range(10):
        print i
        f = open("ga", "r")
        m.acquire()
        thread.start_new_thread(read, (f,))
        x = f.readline()
        print "main saw " + `len(x)` + " chars"
        m.acquire(); m.release()
        f.close()

Here's a typical run on Windows (current CVS Python):

    0
    main saw 95439 chars
    thread saw 4562 chars
    1
    main saw 97941 chars
    thread saw 2060 chars
    2
    thread saw 43801 chars
    main saw 56200 chars
    3
    thread saw 8011 chars
    main saw 91990 chars
    4
    main saw 46546 chars
    thread saw 53455 chars
    5
    thread saw 53125 chars
    main saw 46876 chars
    6
    main saw 98638 chars
    thread saw 1363 chars
    7
    main saw 72121 chars
    thread saw 27880 chars
    8
    thread saw 70031 chars
    main saw 29970 chars
    9
    thread saw 27555 chars
    main saw 72446 chars

So, yes, it's threadsafe now: between them, the threads always see a grand total of 100001 characters. But what friggin' good is that ? If, e.g., Guido wants multiple threads to chew over his giant logfile, there's no guarantee that .readline() ever returns an actual line from the file. Not that Python 2.0 was any better in this respect ... From tim.one at home.com Tue Jan 9 05:48:25 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:48:25 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108082210.A16149@glacier.fnational.com> Message-ID: [Tim] > 5 uncertain, but evidence suggests much of it due to MS > malloc/realloc (Perl does its own memory mgmt) [NeilS] > Have you tried pymalloc? Not recently, and don't expect to find time for it this week. IIRC, Vladimir did get significant speedups-- lo those many years ago! --when he tried it on Windows, though. 
Maybe (or maybe not) that was due to exploiting the global lock (i.e., exploiting that pymalloc didn't need to do its own serialization, when called from the Python core). From tim.one at home.com Tue Jan 9 05:52:25 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:52:25 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: [Tim] > ... > Here's a typical run on Windows (current CVS Python): > > 0 > main saw 95439 chars > thread saw 4562 chars > 1 > main saw 97941 chars > thread saw 2060 chars > 2 > thread saw 43801 chars > main saw 56200 chars > 3 > thread saw 8011 chars > main saw 91990 chars > 4 > main saw 46546 chars > thread saw 53455 chars > 5 > thread saw 53125 chars > main saw 46876 chars > 6 > main saw 98638 chars > thread saw 1363 chars > 7 > main saw 72121 chars > thread saw 27880 chars > 8 > thread saw 70031 chars > main saw 29970 chars > 9 > thread saw 27555 chars > main saw 72446 chars Oops! I lied. That was the released 2.0. Current CVS is either better or worse, depending on whether you think "working" by accident more often is a good thing or leads to false confidence : 0 main saw 100001 chars thread saw 0 chars 1 main saw 100001 chars thread saw 0 chars 2 main saw 100001 chars thread saw 0 chars 3 main saw 100001 chars thread saw 0 chars 4 main saw 100001 chars thread saw 0 chars 5 thread saw 25802 chars main saw 74199 chars 6 thread saw 802 chars main saw 99199 chars 7 main saw 100001 chars thread saw 0 chars 8 main saw 100001 chars thread saw 0 chars 9 main saw 100001 chars thread saw 0 chars From mal at lemburg.com Tue Jan 9 08:23:42 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 09 Jan 2001 08:23:42 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? 
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> <3A5A2528.C289BE1D@lemburg.com> <200101082223.RAA05858@cj20424-a.reston1.va.home.com> Message-ID: <3A5ABC7E.E953962B@lemburg.com> Guido van Rossum wrote: > > > I was thinking an attack where knowledge of common temporary > > execution locations is used to trick Python into executing > > untrusted code -- the untrusted code would only have to be > > copied to the known temporary execution directory and then > > gets executed by Python next time the program using the temporary > > location is invoked. > > When does Python execute code from a predictable common temporary > location? When is that likely to be used from a Python script running > as root? > > Note that if you use tempfile.TemporaryFile(), you can create a > temporary file that's not subvertible. It's not Python itself that's running temporary files. Tools like distutils, RPM, etc. tend to run Python code in temporary locations during build stages. That's what I was thinking about. OTOH, root should know where these tools run their code, so I guess it's moot to discuss whose fault this really is; e.g., distutils-style distributions should never be unzipped to /tmp for subsequent installation, but nobody will prevent root from doing so.
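A minimal sketch of the tempfile.TemporaryFile() point quoted above, written in present-day Python 3 rather than the 2.0 of this thread. The helper name roundtrip_via_tempfile is made up for illustration; the idea is only that the library, not the script, chooses where the file lives, so there is no predictable path for someone else to plant code at.

```python
import tempfile


def roundtrip_via_tempfile(payload):
    """Write bytes to an anonymous temporary file and read them back.

    Hypothetical helper for illustration only. TemporaryFile() picks the
    location itself; on Unix-like systems the file has no lasting
    directory entry, so there is no well-known name to hijack.
    """
    with tempfile.TemporaryFile() as tmp:
        tmp.write(payload)
        tmp.seek(0)          # rewind before reading what was just written
        return tmp.read()    # the file vanishes when the block exits


if __name__ == "__main__":
    print(roundtrip_via_tempfile(b"build step output\n"))
```

This says nothing about build tools that unpack and run code under /tmp; that part remains a matter of where the tool is told to work.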
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From tim.one at home.com Tue Jan 9 08:35:09 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 9 Jan 2001 02:35:09 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <200101031237.HAA19244@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> Are you sure Perl still uses stdio at all?

I've got solid answers now, but I'll paraphrase them anonymously to save the bother of untangling multi-person email etiquette snarls:

+ Yes, Perl uses platform stdio. Usually. Yes on Windows anyway.

+ But Perl "cheats" on Windows (well, everywhere it can ...), as I've explained in great detail half a dozen times over the years. No reason to retract any of that.

+ The cheating is not thread-safe.

+ The last stab at threads accessible from Perl was an experiment that got dropped. There are no user-muckable threads in std Perl builds.

+ But there is a notion of threads available at the C level.

+ This latter notion of threads is used to implement Perl's fork() on Windows, so can be exploited to test Windows Perl thread safety without writing a Perl extension module in C.

+ This Perl program (very much like the 2-threaded one I just posted for Python) uses that trick:

-------------------------------------------------------------------
sub counter {
    my $nc = 0;
    while (<FILE>) {
        $nc += length;
    }
    print "num bytes seen = $nc\n";
}

open(FILE, "ga");
binmode FILE;
fork();
&counter();
-------------------------------------------------------------------

Under the covers, that really shares the FILE filehandle on Windows via threads. Running it multiple times yields multiple wild results; the number of bytes seen by parent and child rarely sum to the number of bytes actually in the input file ("ga").
The most common output for me is that one thread sees the entire file, while the other sees "a lot" of it (since the Perl inner loop registerizes its FILE* struct member shadows for as long as possible, that's actually what I expected). So the code is exactly as thread-unsafe as it looked. bosses-demand-answers-but-they-forget-their-questions-ly y'rs - tim From guido at python.org Tue Jan 9 14:41:24 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 09 Jan 2001 08:41:24 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Mon, 08 Jan 2001 23:29:02 EST." References: Message-ID: <200101091341.IAA09132@cj20424-a.reston1.va.home.com> > So it's probably a wash. In that case, do we want to maintain two hacks for > this? I can't use the FLOCKFILE/etc approach on Windows, while "the > Windows" approach probably works everywhere (although its speed relies on > the platform factoring out at least the locking/unlocking in fgets). I'm much more confident about the getc_unlocked() approach than about fgets() -- with the latter we need much more faith in the C library implementers. (E.g. that fgets() never writes beyond the null bytes it promises, and that it locks/unlocks only once.) Also, you're relying on blindingly fast memchr() and memset() implementations. > Both methods lack a refinement I would like to see, but can't achieve in > "the Windows way": ensure that consistency is on no worse than a per-line > basis. [Example omitted] The only portable way to ensure this that I can see, is to have a separate mutex in the Python file object. Since this is hardly a common thing to do, I think it's better to let the application manage that lock if they need it. (Then why are we bothering with flockfile(), you may ask? Because otherwise, accidental multithreaded reading from the same file could cause core dumps.) 
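A minimal sketch of the application-managed lock suggested above, in present-day Python 3 (the thread itself is about 2.0/2.1). The names LockedLineReader and read_all_lines are hypothetical, and this is one way an application could do it, not anything in the interpreter: every readline() call is serialized by a lock owned by the application, so each call returns a whole line no matter which thread makes it.

```python
import io
import threading


class LockedLineReader:
    """Hypothetical wrapper: one application-level lock per file object."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._lock = threading.Lock()

    def readline(self):
        # Hold the lock for the whole call so no other thread can
        # interleave its own read in the middle of this line.
        with self._lock:
            return self._file.readline()


def read_all_lines(fileobj, nthreads=4):
    """Drain fileobj from several threads; return every line seen."""
    reader = LockedLineReader(fileobj)
    seen = []

    def worker():
        while True:
            line = reader.readline()
            if not line:
                break
            seen.append(line)  # list.append is safe across threads in CPython

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    data = "".join("x" * n + "\n" for n in range(1, 50))
    lines = read_all_lines(io.StringIO(data))
    print(len(lines), "complete lines read")
```

The order in which threads receive lines is still arbitrary; the lock only guarantees that no line is split between two readers.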
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Tue Jan 9 16:48:13 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 9 Jan 2001 10:48:13 -0500 Subject: [Python-Dev] Python 2.1 release schedule (PEP 226) In-Reply-To: <200101051529.KAA19100@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 05, 2001 at 10:29:05AM -0500 References: <200101051529.KAA19100@cj20424-a.reston1.va.home.com> Message-ID: <20010109104813.D6203@kronos.cnri.reston.va.us> On Fri, Jan 05, 2001 at 10:29:05AM -0500, Guido van Rossum wrote: > S 222 pep-0222.txt Web Library Enhancements Kuchling > > This is really up to Andrew. It seems he plans to create > new modules, so he won't be introducing incompatibilities in > existing APIs. I don't think PEP 222 will be worked on for 2.1; there have only been a few reactions, and none at all on the python-web-modules mailing list, so I don't think anyone really cares very much at this point. Maybe for 2.2, or maybe I'll just write new classes for Quixote. That leaves PEP 229 as the only PEP I need to work on for 2.1. --amk From tim.one at home.com Tue Jan 9 22:12:42 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 9 Jan 2001 16:12:42 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101091341.IAA09132@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I'm much more confident about the getc_unlocked() approach than about > fgets() -- with the latter we need much more faith in the C library > implementers. (E.g. that fgets() never writes beyond the null bytes > it promises, and that it locks/unlocks only once.) Also, you're > relying on blindingly fast memchr() and memset() implementations. Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a bit quicker on Solaris, despite that it's paying an extra layer of function call per line, to keep it out of get_line proper). That tells me the assumptions are indeed mild. 
The business about not writing beyond the null byte is a concern only I would have raised: the possibility is an aggressively paranoid reading of the std (I do *lots* of things with libc I'm paranoid about <0.9 wink>). If even *Microsoft* didn't blow these things, it's hard to imagine any other vendor exploding ... Still, I'd rather get rid of ms_getline_hack if I could, because the code is so much more complicated. >> Both methods lack a refinement I would like to see, but can't >> achieve in "the Windows way": ensure that consistency is on no >> worse than a per-line basis. [Example omitted] > The only portable way to ensure this that I can see, is to have a > separate mutex in the Python file object. Since this is hardly a > common thing to do, I think it's better to let the application manage > that lock if they need it. Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the file locked until the line was complete, and I wouldn't be opposed to making life saner on platforms that allow it. But there's another problem here: part of the reason we release Python threads around the fgets is in case some other thread is trying to write the data we're trying to read, yes? But since FLOCKFILE is in effect, other threads *trying* to write to the stream we're reading will get blocked anyway. Seems to give us potential for deadlocks. > (Then why are we bothering with flockfile(), you may ask? I wouldn't ask that, no . > Because otherwise, accidental multithreaded reading from the same > file could cause core dumps.) Ugh ... turns out that on my box I can provoke core dumps anyway, with this program. 
Blows up under released 2.0 and CVS Pythons (so it's not due to anything new): import thread def read(f): import time time.sleep(.01) n = 0 while n < 1000000: x = f.readline() n += len(x) print "r", print "read " + `n` m.release() m = thread.allocate_lock() f = open("ga", "w+") print "opened" m.acquire() thread.start_new_thread(read, (f,)) n = 0 x = "x" * 113 + "\n" while n < 1000000: f.write(x) print "w", n += len(x) m.acquire() print "done" Typical run: C:\Python20>\code\python\dist\src\pcbuild\python temp.py opened w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r r w r w r w r w r w r and then it dies in msvcrt.dll with a bad pointer. Also dies under the debugger (yay!) ... always dies like so: + We (Python) call the MS fwrite, from fileobject.c file_write. + MS fwrite succeeds with its _lock_str(stream) call. + MS fwrite then calls MS _fwrite_lk. + MS _fwrite_lk calls memcpy, which blows up for a non-obvious reason. Looks like the stream's _cnt member has gone mildly negative, which _fwrite_lk casts to unsigned and so treats like a giant positive count, and so memcpy eventually runs off the end of the process address space. Only thing I can conclude from this is that MS's internal stream-locking implementation is buggy. At least on W98SE. Other flavors of Windows? Other platforms? Note that I don't claim the program above is *sensible*, just that it shouldn't blow up. 
Alas, short of indeed adding a separate mutex in Python file objects-- or writing our own stdio --I don't believe I can fix this. the-best-thing-to-do-with-threads-is-don't-ly y'rs - tim From fdrake at acm.org Tue Jan 9 23:58:49 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 9 Jan 2001 17:58:49 -0500 (EST) Subject: [Python-Dev] Updated development documentation Message-ID: <14939.38825.218757.535010@cj42289-a.reston1.va.home.com> I've just updated the development version of the documentation, but am not sure the automated notice got sent. This version contains a wide variety of smaller updates, plus added documentation on the fpectl and xreadlines modules. http://python.sourceforge.net/devel-docs/ -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From MarkH at ActiveState.com Wed Jan 10 01:00:03 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 9 Jan 2001 16:00:03 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: > Only thing I can conclude from this is that MS's internal stream-locking > implementation is buggy. At least on W98SE. Other flavors of Windows? > Other platforms? Same behaviour on Win2k for me. Mark. From tim.one at home.com Wed Jan 10 01:55:11 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 9 Jan 2001 19:55:11 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: Final report (I've spent way more time on this than I can afford already, so it's "final" by defn <0.3 wink>). 
We started here (on my Win98SE box, using Guido's test program):

total 117615824 chars and 3237568 lines
count_chars_lines     14.780 14.772
readlines_sizehint     9.390  9.375
using_fileinput       66.130 66.157
while_readline        30.380 30.337

Here's where we are today:

total 117615824 chars and 3237568 lines
count_chars_lines     14.670 14.667
readlines_sizehint     9.500  9.506
using_fileinput       28.670 28.708
while_readline        13.680 13.676
for_xreadlines         7.630  7.635

Same box, same input file, same test program except for this addition:

def for_xreadlines(fn):
    f = open(fn, MODE)
    for line in xreadlines.xreadlines(f):
        pass
    f.close()

This last is within 25% of Perl "while (<>)" speed, but-- unlike Perl --is thread-safe. Good show! The other speedups are nothing to snort at either.

The strangest thing left to my eye is why xreadlines enjoys a significant advantage over the double-loop buffering method (readlines_sizehint) on my box; reducing the very large (1Mb) buffer in Guido's test program made no material difference to that.

nothing's-ever-finished-but-everything-ends-ly y'rs - tim

From tim.one at home.com Wed Jan 10 06:46:24 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 10 Jan 2001 00:46:24 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To:
Message-ID:

[Tim]
> Only thing I can conclude from this is that MS's internal stream-
> locking implementation is buggy. At least on W98SE. Other flavors
> of Windows? Other platforms?

[Mark Hammond]
> Same behaviour on Win2k for me.

Thanks, Mark! I opened a bug on SF to record more clues:

http://sourceforge.net/bugs/?func=detailbug&bug_id=128210&group_id=5470

I didn't assign it to anyone because-- best I can tell --there's nothing realistic we can do about it. Probably won't happen in practice anyway .
there's-a-reason-thread-problems-pop-up-on-windows-first-but- ms-isn't-it-ly y'rs - tim From billtut at microsoft.com Wed Jan 10 10:10:51 2001 From: billtut at microsoft.com (Bill Tutt) Date: Wed, 10 Jan 2001 01:10:51 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> With a nice simple C test case from Tim, I've submitted this one to internal support. I'll let everybody know what happens when I know more. Bill -----Original Message----- From: Tim Peters [mailto:tim.one at home.com] Sent: Tuesday, January 09, 2001 9:46 PM To: python-dev at python.org Subject: RE: [Python-Dev] xreadlines : readlines :: xrange : range [Tim] > Only thing I can conclude from this is that MS's internal stream- > locking implementation is buggy. At least on W98SE. Other flavors > of Windows? Other platforms? [Mark Hammond] > Same behaviour on Win2k for me. Thanks, Mark! I opened a bug on SF to record more clues: http://sourceforge.net/bugs/?func=detailbug&bug_id=128210&group_id=5470 I didn't assign it to anyone because-- best I can tell --there's nothing realistic we can do about it. Probably won't happen in practice anyway . 
there's-a-reason-thread-problems-pop-up-on-windows-first-but- ms-isn't-it-ly y'rs - tim _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://www.python.org/mailman/listinfo/python-dev From m.favas at per.dem.csiro.au Wed Jan 10 12:57:56 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Wed, 10 Jan 2001 19:57:56 +0800 Subject: [Python-Dev] xreadline speed vs readlines_sizehint Message-ID: <3A5C4E44.23B593E9@per.dem.csiro.au> Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same behaviour as Tim's WinBox wrt the new xreadline and the double-loop readlines (so it's not just something funny with MS (not that there's not anything funny with MS...)): total 131426612 chars and 514216 lines count_chars_lines 5.450 5.066 readlines_sizehint 4.112 4.083 using_fileinput 10.928 10.916 while_readline 11.766 11.733 for_xreadlines 3.569 3.533 -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tismer at tismer.com Wed Jan 10 12:06:42 2001 From: tismer at tismer.com (Christian Tismer) Date: Wed, 10 Jan 2001 13:06:42 +0200 Subject: [Python-Dev] Add __exports__ to modules References: Message-ID: <3A5C4242.E445C3A1@tismer.com> Ka-Ping Yee wrote: > > On Mon, 8 Jan 2001, Skip Montanaro wrote: > > Okay, how about this as a compromise first step? Allow programmers to put > > __exports__ lists in their modules but don't do anything with them *except* > > modify dir() to respect that if it exists? > > I'd say: Just have dir() and import * pay attention to __exports__. > Don't mess with getattr or __dict__. quadruple-nodd - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From mal at lemburg.com Wed Jan 10 14:21:28 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 10 Jan 2001 14:21:28 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Message-ID: <3A5C61D8.2E5D098C@lemburg.com> Guido van Rossum wrote: > > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). Can't we use the existing attribute __all__ (this is currently only used for packages) for this kind of thing? As others have already remarked: I would rather like to see this attribute being used as basis for 'from M import *' rather than enforce the access restrictions like the patch suggests. Access control mechanisms should be treated in different ways such as wrapping objects using access-control proxies (see mx.Proxy for an example of such an implementation) and on-demand only. I wouldn't want to pay the performance hit for each and every lookup in all my Python applications just because someone out there feels that "from M import *" has a meaning in life apart from being useful in interactive sessions to ease typing ;-) > I like it. This has been asked for many times. Does anybody see a > reason why this should *not* be added? > > Tim remarked that introducing this will prompt demands for a similar > feature on classes and instances, where it will be hard to implement > without causing a bit of a slowdown.
It causes a slight slowdown (an > extra dictionary lookup for each use of "M.v") even when it is not > used, but for accessing module variables that's acceptable. I'm not > so sure about instance variable references. Again, I'd rather see these implemented using different techniques which are under programmer control and made explicit and visible in the program flow. Proxies are ideal for these things, since they allow great flexibility while still providing reasonable security at Python level. I have been using the proxy approach for years now and so far with great success. What's even better is that weak references and garbage finalization aids come along with it for free. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Jan 10 16:12:56 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 10:12:56 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 09 Jan 2001 19:55:11 EST." References: Message-ID: <200101101512.KAA26193@cj20424-a.reston1.va.home.com> > The strangest thing left to my eye is why xreadlines enjoys a significant > advantage over the double-loop buffering method (readlines_sizehint) on my > box; reducing the very large (1Mb) buffer in Guido's test program made no > material difference to that. I was baffled at this too (same difference on my box), until I discovered that the buffer size is specified *twice*: once as a default in the arg list of readlines_sizehint(), then *again* in the call to timer() near the bottom of the file. Take the latter one out and the times are comparable, in fact readlines_sizehint() is a few percent quicker. 
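For reference, the "double-loop buffering method" being timed here has roughly the following shape. This is a sketch in present-day Python 3, not Guido's actual test program; the function name and the 64 KB default are assumptions chosen for illustration. The outer loop pulls a batch of lines with readlines(sizehint), the inner loop walks the batch.

```python
import io


def count_chars_lines_buffered(f, sizehint=64 * 1024):
    """Count characters and lines using the readlines(sizehint) idiom.

    sizehint is only a hint: readlines() stops collecting lines once
    roughly that much data has been gathered, so each batch stays small.
    """
    nchars = nlines = 0
    while True:
        lines = f.readlines(sizehint)   # outer loop: one batch of lines
        if not lines:
            break                       # empty batch means end of file
        for line in lines:              # inner loop: lines in the batch
            nchars += len(line)
            nlines += 1
    return nchars, nlines


if __name__ == "__main__":
    sample = io.StringIO("spam\neggs\n" * 1000)
    print(count_chars_lines_buffered(sample, sizehint=100))
```

A value passed explicitly at the call site overrides the default in the signature, which is the kind of double specification that skewed the timings discussed above.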
--Guido van Rossum (home page: http://www.python.org/~guido/) From jim at interet.com Wed Jan 10 16:19:01 2001 From: jim at interet.com (James C. Ahlstrom) Date: Wed, 10 Jan 2001 10:19:01 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? References: Message-ID: <3A5C7D65.780065C6@interet.com> Mark Hammond wrote: > Sometimes _no_ screen at all is wanted - ie, no main GUI window, and no > console window. pythonw is used in this case. COM uses pythonw.exe in just > this way, and when executed by DCOM, it will be executed in a context where > the user can not see any such dialog. > > However, I would be happy to ensure the correct command-line is used to > prevent this behaviour in this case. > > Indeed, in _every_ case I use pythonw.exe I would disable this - but I > accept that other users have simpler requirements. It would be easier to have a pythonw2.exe where this feature is built in, rather than a command line option. But see below. > > I do not view winstdout as a "newbie" feature, but rather a > > generally useful C-language addition to Python. > > Hrm. I dont believe a commercial app, for example, would find this > suitable - they would roll their own solution. ... > > I guess I am saying, perhaps incorrectly, that the mechanism provided > > will make further redirection of sys.stdout unnecessary 99% of the > > time. > > Yes, I disagree here. IMO it is no good for a commercial, real app. As I ... > > If someone out there knows of a different example of sys.stdout > > redirection in use in the real world, it would be helpful if > > they would describe it. Maybe it could be incorporated. > > Sure. Komodo to a file with a friendly dialog (sometimes ;-). ... > I don't believe I have worked on 2 projects with the same requirement > here!!! Well, that is the problem. Is this feature "generally useful"? I am writing Windows programs in which Python is the "main" and provides the GUI, so I find this useful. And I do show my users tracebacks. 
But perhaps this is unique to me. I don't see users of wxPython nor tkinter replying "great idea" so maybe they don't use pythonw. Absent more support, I don't think this idea has enough merit to justify a patch. JimA From guido at python.org Wed Jan 10 17:39:34 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:39:34 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 01:10:51 PST." <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> References: <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> Message-ID: <200101101639.LAA26776@cj20424-a.reston1.va.home.com> > With a nice simple C test case from Tim, I've submitted this one to internal > support. > I'll let everybody know what happens when I know more. I bet you it's rejected on the basis of "the docs tell you not to mix reading and writing on the same stream without intervening seek or flush." If I were on the support line I would do that. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 10 17:38:16 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:38:16 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 09 Jan 2001 16:12:42 EST." References: Message-ID: <200101101638.LAA26759@cj20424-a.reston1.va.home.com> > [Guido] > > I'm much more confident about the getc_unlocked() approach than about > > fgets() -- with the latter we need much more faith in the C library > > implementers. (E.g. that fgets() never writes beyond the null bytes > > it promises, and that it locks/unlocks only once.) Also, you're > > relying on blindingly fast memchr() and memset() implementations. 
[Tim] > Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a > bit quicker on Solaris, despite that it's paying an extra layer of function > call per line, to keep it out of get_line proper). That tells me the > assumptions are indeed mild. The business about not writing beyond the null > byte is a concern only I would have raised: the possibility is an > aggressively paranoid reading of the std (I do *lots* of things with libc > I'm paranoid about <0.9 wink>). If even *Microsoft* didn't blow these > things, it's hard to imagine any other vendor exploding ... > > Still, I'd rather get rid of ms_getline_hack if I could, because the code is > so much more complicated. Which is another argument to prefer the getc_unlocked() code when it works -- it's obviously correct. :-) > >> Both methods lack a refinement I would like to see, but can't > >> achieve in "the Windows way": ensure that consistency is on no > >> worse than a per-line basis. [Example omitted] > > > The only portable way to ensure this that I can see, is to have a > > separate mutex in the Python file object. Since this is hardly a > > common thing to do, I think it's better to let the application manage > > that lock if they need it. > > Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the > file locked until the line was complete, and I wouldn't be opposed to making > life saner on platforms that allow it. Hm... That would be possible, except for one unfortunate detail: _PyString_Resize() may call PyErr_BadInternalCall() which touches thread state. > But there's another problem here: > part of the reason we release Python threads around the fgets is in case > some other thread is trying to write the data we're trying to read, yes? NO, NO NO! Mixing reads and writes on the same stream wasn't what we are locking against at all. (As you've found out, it doesn't even work.) We're only trying to protect against concurrent *reads*. 
> But since FLOCKFILE is in effect, other threads *trying* to write to the > stream we're reading will get blocked anyway. Seems to give us potential > for deadlocks. Only if they are holding other locks at the same time. I haven't done a thorough survey of fileobject.c, but I've skimmed it, and I believe it's religious about releasing the Global Interpreter Lock around I/O calls. But, of course, 3rd party C code might not be. > > (Then why are we bothering with flockfile(), you may ask? > > I wouldn't ask that, no . > > > Because otherwise, accidental multithreaded reading from the same > > file could cause core dumps.) > > Ugh ... turns out that on my box I can provoke core dumps anyway, with this > program. Blows up under released 2.0 and CVS Pythons (so it's not due to > anything new): Yeah. But this is insane use -- see my comments on SF. It's only worth fixing because it could be used to intentionally crash Python -- but there are easier ways... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Wed Jan 10 17:41:47 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 10 Jan 2001 10:41:47 -0600 (CST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? Message-ID: <14940.37067.893679.750918@beluga.mojam.com> I just noticed that the "Environment" options for Python on the SF site are listed as Console (Text Based), Win32 (MS Windows), X11 Applications Shouldn't something Macintosh-related be in that list as well? Skip From guido at python.org Wed Jan 10 17:53:16 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:53:16 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Wed, 10 Jan 2001 14:21:28 +0100."
<3A5C61D8.2E5D098C@lemburg.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> Message-ID: <200101101653.LAA28986@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > > > Please have a look at this SF patch: > > > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > > > This implements control over which names defined in a module are > > externally visible: if there's a variable __exports__ in the module, > > it is a list of identifiers, and any access from outside the module to > > names not in the list is disallowed. This affects access using the > > getattr and setattr protocols (which raise AttributeError for > > disallowed names), as well as "from M import v" (which raises > > ImportError). [Marc-Andre] > Can't we use the existing attribute __all__ (this is currently > only used for packages) for this kind of thing. As other have already > remarked: I would rather like to see this attribute being used > as basis for 'from M import *' rather than enforce the access > restrictions like the patch suggests. Yes -- I came up with the same thought. So here's a plan: somebody please submit a patch that does only one thing: from...import * looks for __all__ and if it exists, imports exactly those names. No changes to dir(), or anything. > Access control mechanisms should be treated in different ways > such as wrapping objects using access-control proxies (see mx.Proxy > for an example of such an implementation) and on-demand only. > I wouldn't wan't to pay the performance hit for each and every > lookup in all my Python applications just because someone out > there feels that "from M import *" has a meaning in life > apart from being useful in interactive sessions to ease typing ;-) In the process of looking into Zope internals I've noticed that proxies are indeed very useful! 
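The __all__ plan above (from...import * imports exactly the listed names, with no access restrictions otherwise) can be sketched as follows. This uses present-day Python 3, where that behaviour exists; the module name demo_mod and its contents are made up for illustration and built in memory so the example is self-contained.

```python
import sys
import types

# Build a throwaway module in memory; "demo_mod" is a hypothetical name.
demo = types.ModuleType("demo_mod")
exec(
    "__all__ = ['public']\n"
    "def public():\n"
    "    return 'visible'\n"
    "def helper():\n"
    "    return 'not exported'\n",
    demo.__dict__,
)
sys.modules["demo_mod"] = demo

# Run a star-import into a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_mod import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))

if __name__ == "__main__":
    print(exported)        # only the names listed in __all__
    print(demo.helper())   # ordinary attribute access is unaffected
```

Note that helper stays reachable as demo_mod.helper; __all__ here only shapes what the star-import copies, which is the difference from the __exports__ patch under discussion.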
I note that the IMPORT opcodes in ceval.c require that the imported module (as found in sys.modules[name] or returned by __import__()) is a real module object. I think this is unnecessary -- at least IMPORT_FROM should work even if the module is a proxy or some other thing (I've been known to smuggle class instances into sys.modules :-) and IMPORT_STAR should work with a non-module at least if it has an __all__ attribute. > > I like it. This has been asked for many times. Does anybody see a > > reason why this should *not* be added? > > > > Tim remarked that introducing this will prompt demands for a similar > > feature on classes and instances, where it will be hard to implement > > without causing a bit of a slowdown. It causes a slight slowdown (an > > extra dictionary lookup for each use of "M.v") even when it is not > > used, but for accessing module variables that's acceptable. I'm not > > so sure about instance variable references. > > Again, I'd rather see these implemented using different > techniques which are under programmer control and made > explicit and visible in the program flow. Proxies are ideal > for these things, since they allow great flexibility while > still providing reasonable security at Python level. > > I have been using the proxy approach for years now and > so far with great success. What's even better is that > weak references and garbage finalization aids come along with > it for free. Agreed. Which reminds me -- would you mind reviewing Fred's new version of PEP 205 (weak refs)? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Jan 10 18:12:20 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 10 Jan 2001 18:12:20 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <3A5C97F4.945D0C1@lemburg.com> Guido van Rossum wrote: > > > Guido van Rossum wrote: > > > > > > Please have a look at this SF patch: > > > > > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > > > > > This implements control over which names defined in a module are > > > externally visible: if there's a variable __exports__ in the module, > > > it is a list of identifiers, and any access from outside the module to > > > names not in the list is disallowed. This affects access using the > > > getattr and setattr protocols (which raise AttributeError for > > > disallowed names), as well as "from M import v" (which raises > > > ImportError). > > [Marc-Andre] > > Can't we use the existing attribute __all__ (this is currently > > only used for packages) for this kind of thing. As other have already > > remarked: I would rather like to see this attribute being used > > as basis for 'from M import *' rather than enforce the access > > restrictions like the patch suggests. > > Yes -- I came up with the same thought. Sorry, I didn't read the whole thread on the topic. Rereading the above paragraph I guess I should have had some more coffee at the time of writing ;-) > So here's a plan: somebody please submit a patch that does only one > thing: from...import * looks for __all__ and if it exists, imports > exactly those names. No changes to dir(), or anything. +1 -- this won't be me though (at least not this week). > > Access control mechanisms should be treated in different ways > > such as wrapping objects using access-control proxies (see mx.Proxy > > for an example of such an implementation) and on-demand only. 
> > I wouldn't want to pay the performance hit for each and every > > lookup in all my Python applications just because someone out > > there feels that "from M import *" has a meaning in life > > apart from being useful in interactive sessions to ease typing ;-) > > In the process of looking into Zope internals I've noticed that > proxies are indeed very useful! > > I note that the IMPORT opcodes in ceval.c require that the imported > module (as found in sys.modules[name] or returned by __import__()) is > a real module object. I think this is unnecessary -- at least > IMPORT_FROM should work even if the module is a proxy or some other > thing (I've been known to smuggle class instances into sys.modules :-) > and IMPORT_STAR should work with a non-module at least if it has an > __all__ attribute. Cool. This could make Python instances usable as "modules" -- with full getattr() hook support! For IMPORT_STAR I'd suggest first looking for __all__ and then falling back to __dict__.items() in case this fails. BTW, is __dict__ needed by the import mechanism or would the getattr/setattr slots suffice? And if so, must it be a real Python dictionary? > > > I like it. This has been asked for many times. Does anybody see a > > > reason why this should *not* be added? > > > > > > Tim remarked that introducing this will prompt demands for a similar > > > feature on classes and instances, where it will be hard to implement > > > without causing a bit of a slowdown. It causes a slight slowdown (an > > > extra dictionary lookup for each use of "M.v") even when it is not > > > used, but for accessing module variables that's acceptable. I'm not > > > so sure about instance variable references. > > > > Again, I'd rather see these implemented using different > > techniques which are under programmer control and made > > explicit and visible in the program flow.
Proxies are ideal > > for these things, since they allow great flexibility while > > still providing reasonable security at Python level. > > > > I have been using the proxy approach for years now and > > so far with great success. What's even better is that > > weak references and garbage finalization aids come along with > > it for free. > > Agreed. Which reminds me -- would you mind reviewing Fred's new > version of PEP 205 (weak refs)? I'll have a look at it next week. Is that OK ? > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Wed Jan 10 18:37:58 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 10 Jan 2001 12:37:58 -0500 (EST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: <14940.37067.893679.750918@beluga.mojam.com> References: <14940.37067.893679.750918@beluga.mojam.com> Message-ID: <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > I just noticed that the "Environment" options for Python on the SF site are > listed as > > Console (Text Based), Win32 (MS Windows), X11 Applications > > Shouldn't something Macintosh-related be in that list as well? Are the maintainers of the MacOS port using the SF bug tracker or something else? If they're using it, then by all means we should add it. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From thomas at xs4all.net Wed Jan 10 19:06:06 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 10 Jan 2001 19:06:06 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,NONE,1.1 Setup.dist,1.3,1.4 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Tue, Jan 09, 2001 at 01:46:53PM -0800 References: Message-ID: <20010110190606.T2467@xs4all.nl> On Tue, Jan 09, 2001 at 01:46:53PM -0800, Guido van Rossum wrote: > static void > xreadlines_dealloc(PyXReadlinesObject *op) { > Py_XDECREF(op->file); > Py_XDECREF(op->lines); > PyObject_DEL(op); > } I'm confuzzled. Is this breach of the style guidelines intentional, accidental, or just not cared enough about ? The style isn't even consistent in that single module! > void > initxreadlines(void) > { > PyObject *m; > > m = Py_InitModule("xreadlines", xreadlines_methods); > } -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at mojam.com Wed Jan 10 19:11:52 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 10 Jan 2001 12:11:52 -0600 (CST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> References: <14940.37067.893679.750918@beluga.mojam.com> <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> Message-ID: <14940.42472.174920.866172@beluga.mojam.com> Fred> Are the maintainers of the MacOS port using the SF bug tracker or Fred> something else? If they're using it, then by all means we should Fred> add it. Even if they aren't, I think it would be valuable to list. There aren't all that many tools (open source or otherwise) that run on Unix, Windows and Mac and can be used as either a console app or a GUI. I assume the reason Fred asks is that the Environment: list is generated on-the-fly and somehow ties into use of the SF bug tracker. 
Skip From thomas at xs4all.net Wed Jan 10 19:45:44 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 10 Jan 2001 19:45:44 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101101653.LAA28986@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 10, 2001 at 11:53:16AM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <20010110194544.V2467@xs4all.nl> On Wed, Jan 10, 2001 at 11:53:16AM -0500, Guido van Rossum wrote: > I note that the IMPORT opcodes in ceval.c require that the imported > module (as found in sys.modules[name] or returned by __import__()) is > a real module object. I think this is unnecessary -- at least > IMPORT_FROM should work even if the module is a proxy or some other > thing (I've been known to smuggle class instances into sys.modules :-) > and IMPORT_STAR should work with a non-module at least if it has an > __all__ attribute. Hmm.... Have you been sneaking looks at python-list again, Guido ? :-) I'm certain the expanding of IMPORT would make a lot of people very happy. Alex Martelli only just discovered the fact you can populate sys.modules yourself, with non-module objects, and was wondering about its legality and compatibility. I, for one, am very +1 on the idea, also on MAL's idea to do our best in the IMPORT_STAR case (try dict.items(), etc.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
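The idea discussed in this thread (a non-module object sitting in sys.modules, with __all__ steering "from M import *") can be sketched as follows. This is only an illustration under stated assumptions: it relies on the behaviour of current CPython, not on the 2001 ceval.c being debated here, and the names FakeModule and "fakemod" are made up for the example.

```python
import sys


class FakeModule:
    """A plain class instance standing in for a module (hypothetical name)."""

    # Names that 'from fakemod import *' should pick up.
    __all__ = ["greeting"]

    def __init__(self):
        self.greeting = "hello"
        self.hidden = "not exported"


# Register the instance under a made-up module name.
sys.modules["fakemod"] = FakeModule()

# 'from M import name' only needs attribute lookup to work on the object.
from fakemod import greeting

# 'from M import *' consults __all__ when the object provides one.
star_namespace = {}
exec("from fakemod import *", star_namespace)
exported = sorted(k for k in star_namespace if not k.startswith("__"))
```

Here "hidden" stays reachable through normal attribute access on the object; __all__ only limits what the star-import copies, which matches the "no access restrictions" direction Guido and MAL settle on above.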
From tim.one at home.com Wed Jan 10 19:49:40 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 13:49:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101101512.KAA26193@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > The strangest thing left to my eye is why xreadlines enjoys a > significant advantage over the double-loop buffering method > (readlines_sizehint) on my box; reducing the very large > (1Mb) buffer in Guido's test program made no material difference > to that. [Guido] > I was baffled at this too (same difference on my box), until I > discovered that the buffer size is specified *twice*: once as a > default in the arg list of readlines_sizehint(), then *again* in > the call to timer() near the bottom of the file. Bingo! > Take the latter one out and the times are comparable, in fact > readlines_sizehint() is a few percent quicker. They're indistinguishable then on my box (on one run xreadlines is .1 seconds (out of around 7.6 total) quicker, on another readlines_sizehint), *provided* that I specify the same buffer size (8192) that xreadlines uses internally. However, if I even double that, readlines_sizehint is uniformly about 10% slower. It's also a tiny bit slower if I cut the sizehint buffer size to 4096. I'm afraid Mysteries will remain no matter how many person-decades we spend staring at this <0.5 wink> ... From guido at python.org Wed Jan 10 19:50:10 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 13:50:10 -0500 Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: Your message of "Wed, 10 Jan 2001 10:41:47 CST." 
<14940.37067.893679.750918@beluga.mojam.com> References: <14940.37067.893679.750918@beluga.mojam.com> Message-ID: <200101101850.NAA29744@cj20424-a.reston1.va.home.com> > I just noticed that the "Environment" options for Python on the SF site are > listed as > > Console (Text Based), Win32 (MS Windows), X11 Applications > > Shouldn't something Macintosh-related be in that list as well? Yeah, except for two problems: :-) (1) This is a selection from a drop-down menu that doesn't have a Mac option; (2) There are only three slots allowed. So this is the best we can do. --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Wed Jan 10 19:53:32 2001 From: gstein at lyra.org (Greg Stein) Date: Wed, 10 Jan 2001 10:53:32 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010110194544.V2467@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 10, 2001 at 07:45:44PM +0100 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> <20010110194544.V2467@xs4all.nl> Message-ID: <20010110105332.T4640@lyra.org> On Wed, Jan 10, 2001 at 07:45:44PM +0100, Thomas Wouters wrote: > On Wed, Jan 10, 2001 at 11:53:16AM -0500, Guido van Rossum wrote: > > > I note that the IMPORT opcodes in ceval.c require that the imported > > module (as found in sys.modules[name] or returned by __import__()) is > > a real module object. I think this is unnecessary -- at least > > IMPORT_FROM should work even if the module is a proxy or some other > > thing (I've been known to smuggle class instances into sys.modules :-) > > and IMPORT_STAR should work with a non-module at least if it has an > > __all__ attribute. > > Hmm.... Have you been sneaking looks at python-list again, Guido ? :-) I'm > certain the expanding of IMPORT would make a lot of people very happy. 
Alex > Martelli only just discovered the fact you can populate sys.modules > yourself, with non-module objects, and was wondering about its legality and > compatibility. > > I, for one, am very +1 on the idea, also on MAL's idea to do our best in the > IMPORT_STAR case (try dict.items(), etc.) +1 ... I'm always up for removing type restrictions. Did that with the bytecodes in function objects a while back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From MarkH at ActiveState.com Wed Jan 10 19:54:34 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Wed, 10 Jan 2001 10:54:34 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,NONE,1.1 Setup.dist,1.3,1.4 In-Reply-To: <20010110190606.T2467@xs4all.nl> Message-ID: > I'm confuzzled. Is this breach of the style guidelines intentional, > accidental, or just not cared enough about ? I vote the latter! Who-really-cares ly, Mark. From guido at python.org Wed Jan 10 20:00:24 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 14:00:24 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: Your message of "Mon, 08 Jan 2001 11:31:09 EST." <20010108113109.C7563@kronos.cnri.reston.va.us> References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108113109.C7563@kronos.cnri.reston.va.us> Message-ID: <200101101900.OAA30486@cj20424-a.reston1.va.home.com> [me] > >I expect Andrew's code to go in before 2.1 is released. So I don't > >see a reason why we should hurry and check in a stop-gap measure. [Andrew] > But it might not; the final version might be unacceptable or run into > some intractable problem. Assuming the patch is correct (I haven't > looked at it), why not check it in? The work has already been done to > write it, after all. OK, done. 
It was more work than I had hoped for, because Eric apparently (despite having developer privileges!) doesn't use the CVS tree -- he sent in a diff relative to the 2.0 release. I munged it into place, adding the feature that readline, _curses and bsddb are built as shared libraries by default. You'd have to edit Setup.config.in to change this. Hope this doesn't break anybody's setup. (Skip???) Question for Eric: do you still want developer privileges? They come with responsibilities too. Please check out the @#$%& CVS tree! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 10 20:03:07 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 14:03:07 -0500 Subject: [Python-Dev] Re: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Mon, 01 Jan 2001 19:49:35 CST." <20010101194935.19672@falcon.inetnebr.com> References: <20010101194935.19672@falcon.inetnebr.com> Message-ID: <200101101903.OAA30522@cj20424-a.reston1.va.home.com> Hi Jeff, I'm glad to tell you that I've accepted your xreadlines patches. It's all checked into the CVS tree now, except for your patch to fileinput.py, where I had already checked in a similar change using readlines(sizehint) directly. Thanks again for your contribution! --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Wed Jan 10 21:08:31 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 10 Jan 2001 12:08:31 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <3A5CC13F.DFB26A0B@ActiveState.com> Guido van Rossum wrote: > > ... > > Yes -- I came up with the same thought.
> > So here's a plan: somebody please submit a patch that does only one > thing: from...import * looks for __all__ and if it exists, imports > exactly those names. No changes to dir(), or anything. Why? From my point of view, the changes to dir() are much more important. I seldom tell newbies about import * but I always tell them how they can browse objects (especially modules) with dir. If dir() is changed then IDEs and so forth would use that and inherit the right behavior. If the module exporting behavior gets more sophisticated in a future version of Python they will continue to inherit the behavior. Also, dir() could look for an __all__ on all objects including "module proxies", classes and "plain old instances". In other words we can extend the convention to other objects "for free". Paul From tim.one at home.com Wed Jan 10 21:25:24 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 15:25:24 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101101638.LAA26759@cj20424-a.reston1.va.home.com> Message-ID: [Tim] >> Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method >> to keep the file locked until the line was complete, and I >> wouldn't be opposed to making life saner on platforms that allow it. [Guido] > Hm... That would be possible, except for one unfortunate detail: > _PyString_Resize() may call PyErr_BadInternalCall() which touches > thread state. FLOCKFILE/FUNLOCKFILE are independent of Python's notion of thread state. IOW, do FLOCKFILE once before the for(;;), and FUNLOCKFILE once on every *exit* path thereafter. We can block/unblock Python threads as often as desired between those *file*-locking brackets. The only thing the repeated FLOCKFILE/FUNLOCKFILE calls do to my eyes now is to create the *possibility* for multiple readers to get partial lines of the file. > ... > NO, NO NO! Mixing reads and writes on the same stream wasn't what we > are locking against at all. 
(As you've found out, it doesn't even > work.) On Windows, yes, but that still seems to me to be a bug in MS's code. If anyone had reported a core dump on any other platform, I'd be more tractable on this point. > We're only trying to protect against concurrent *reads*. As above, I believe that we could do a better job of that, then, on platforms that HAVE_GETC_UNLOCKED, by protecting not only against core dumps but also against .readline() not delivering an intact line from the file. >> But since FLOCKFILE is in effect, other threads *trying* to write >> to the stream we're reading will get blocked anyway. Seems to give us >> potential for deadlocks. > Only if they are holding other locks at the same time. I'm not being clear, then. Thread X does f.readline(), on a HAVE_GETC_UNLOCKED platform. get_line allows other threads to run and invokes FLOCKFILE on f->f_fp. get_line's GETC in thread X eventually hits the end of the stdio buffer, and does its platform's version of _filbuf. _filbuf may wait (depending on the nature of the stream) for more input to show up. Simultaneously, thread Y attempts to write some data to f. But the *FLOCKFILE* lock prevents it from doing anything with f. So X is waiting for Y to write data inside platform _filbuf, but Y is waiting for X to release the platform stream lock inside some platform stream-output routine (if I'm being clear now, Python locks have nothing to do with this scenario: it's the platform stream lock). I think this is purely the user's fault if it happens. Just pointing it out as another insecurity we're probably not able to protect users from. > ... > Yeah. But this is insane use -- see my comments on SF. It's only > worth fixing because it could be used to intentionally crash Python -- > but there are easier ways... If it's unique to MS (as I suspect), I see no reason to even consider trying to fix it in Python. Unless the Perl Mongers use it to crash Zope .
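The "lock once before the loop, unlock on every exit path" suggestion in the message above can be illustrated with a Python-level analogy. This is a sketch only, not the C code in fileobject.c: a threading.Lock stands in for the platform stream lock taken by FLOCKFILE, read(1) stands in for getc_unlocked(), and LineLockedReader is a hypothetical name.

```python
import io
import threading


class LineLockedReader:
    """Hypothetical wrapper: one lock acquisition per line, not per character."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()  # stands in for the stdio stream lock

    def readline(self):
        # Acquire once before the loop; the 'with' block releases the lock
        # on every exit path (normal return or exception).
        with self._lock:
            chars = []
            while True:
                c = self._stream.read(1)  # stands in for getc_unlocked()
                if not c:
                    break
                chars.append(c)
                if c == "\n":
                    break
            return "".join(chars)


lines = ["line %d\n" % i for i in range(200)]
reader = LineLockedReader(io.StringIO("".join(lines)))
seen = []  # list.append is thread-safe in CPython


def worker():
    while True:
        line = reader.readline()
        if not line:
            return
        seen.append(line)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the lock spans the whole line, every reader thread gets intact lines; locking and unlocking around each character instead would leave room for two readers to split one line between them, which is the possibility Tim points at.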
From cgw at fnal.gov Wed Jan 10 22:57:41 2001 From: cgw at fnal.gov (Charles G Waldman) Date: Wed, 10 Jan 2001 15:57:41 -0600 (CST) Subject: [Python-Dev] Interning filenames of imported modules Message-ID: <14940.56021.646147.770080@buffalo.fnal.gov> I have a question about the following code in compile.c:jcompile (line 3678) filename = PyString_InternFromString(sc.c_filename); name = PyString_InternFromString(sc.c_name); In the case of a long-running server which constantly imports modules, this causes the interned string dict to grow without bound. Is there a strong reason that the filename needs to be interned? How about the module name? How about some way to enforce a limit on the size of the interned strings dictionary? From mwh21 at cam.ac.uk Wed Jan 10 23:02:49 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: Wed, 10 Jan 2001 22:02:49 +0000 (GMT) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5CC13F.DFB26A0B@ActiveState.com> Message-ID: On Wed, 10 Jan 2001, Paul Prescod wrote: > Guido van Rossum wrote: > > > > ... > > > > Yes -- I came up with the same thought. > > > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > Why? From my point of view, the changes to dir() are much more > important. I seldom tell newbies about import * but I always tell them > how they can browse objects (especially modules) with dir. If dir() is > changed then IDEs and so forth would use that and inherit the right > behavior. If the module exporting behavior gets more sophisticated in a > future version of Python they will continue to inherit the behavior. Changing dir would also make rlcompleter nicer - it's something of a pain to use with a module that has, eg, "from TERMIOS import *"-ed. This might also make "from ... import *" less of a pariah... Sounds good to me, IOW. Cheers, M. 
From tim.one at home.com Wed Jan 10 23:23:14 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 17:23:14 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101101639.LAA26776@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I bet you it's rejected on the basis of "the docs tell you not to mix > reading and writing on the same stream without intervening seek or > flush." If I were on the support line I would do that. So would I if I were a typical first-line support idiot . But the *implementers*-- if they ever see it --should be very keen to figure out how they managed to let the _iobuf get corrupted. *I'm* not mucking with their internals, nor doing wild pointer stores, nor anything else sneaky to subvert their locking protection. I wasn't even trying to break it. The only code reading from or storing into the _iobuf is theirs. They're ordinary stdio calls with ordinary arguments, and if *any* sequence of those can cause internal corruption, they've almost certainly got a problem that will manifest in other situations too. Think like an implementer here <0.5 wink>: they've lost track of how many characters are in the buffer despite a locking scheme whose purpose is to prevent that. If it were my implementation, that would be a top-priority bug no matter how silly the first program I saw that triggered it. 
but-willing-to-let-them-decide-whether-they-care-ly y'rs - tim From skip at mojam.com Wed Jan 10 23:52:55 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 10 Jan 2001 16:52:55 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5CC13F.DFB26A0B@ActiveState.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> <3A5CC13F.DFB26A0B@ActiveState.com> Message-ID: <14940.59335.723701.574821@beluga.mojam.com> Paul> Also, dir() could look for an __all__ on all objects including Paul> "module proxies", classes and "plain old instances". In other Paul> words we can extend the convention to other objects "for free". The __exports__/dir() patch I submitted will do this if you remove the PyModule_Check that guards it. Skip From tim.one at home.com Thu Jan 11 00:06:05 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 18:06:05 -0500 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: <3A5C4E44.23B593E9@per.dem.csiro.au> Message-ID: [Mark Favas] > Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same > behaviour as Tim's WinBox wrt the new xreadline and the double-loop > readlines (so it's not just something funny with MS (not that there's > not anything funny with MS...)): > > total 131426612 chars and 514216 lines You average over 255 chars/line? Really? What kind of file are you reading? I don't really want to measure the speed of line-at-a-time input on binary files where "line" doesn't actually make sense <0.6 wink>. > count_chars_lines 5.450 5.066 > readlines_sizehint 4.112 4.083 > using_fileinput 10.928 10.916 > while_readline 11.766 11.733 > for_xreadlines 3.569 3.533 Guido pointed out that his readlines_sizehint test forced use of a 1Mb buffer (in the call, not only the default value). For whatever reason, that was significantly slower than using an 8Kb sizehint on my box. 
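For readers following the timings, the "double-loop" readlines(sizehint) method named in these benchmarks can be sketched roughly as below. The function name comes from the thread; the body is an assumption about its shape, not Guido's actual test program, and it is written as a generator, which the Python of this period did not have.

```python
import io


def readlines_sizehint(f, sizehint=8192):
    """Yield lines via the double loop: the outer loop fetches a batch of
    whole lines totalling roughly sizehint characters, the inner loop
    walks that batch."""
    while True:
        batch = f.readlines(sizehint)
        if not batch:  # empty list signals end of file
            break
        for line in batch:
            yield line


# Small in-memory stand-in for a log file.
sample = io.StringIO("".join("row %d\n" % i for i in range(5000)))
count = 0
chars = 0
for line in readlines_sizehint(sample):
    count += 1
    chars += len(line)
```

The sizehint argument is the buffer size whose value (8Kb versus 1Mb) is being compared in the surrounding messages.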
Another oddity is that while_readline is slower than using_fileinput for you. From that I take it Python config does *not* #define HAVE_GETC_UNLOCKED on your platform. If that's true (or esp. if it's not!), would you do me a favor? Recompile fileobject.c with USE_MS_GETLINE_HACK #define'd, try the timing test again (while_readline is the most interesting test for this), and run the test_bufio.py std test to make sure you're actually getting the right answers. At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack whenever HAVE_GETC_UNLOCKED isn't available. I'd be surprised if ms_getline_hack failed to work correctly on any platform; a bigger unknown (to me) is whether it will yield a speedup. So far it yields a large speedup on Windows, and looks like a speedup equal to getc_unlocked() yields on Linux and Solaris. Info on a platform from Mars (like Tru64 Unix ) would be valuable in deciding whether to boost +0.5. don't-want-your-python-to-run-slower-than-possible-if-possible-ly y'rs - tim From tismer at tismer.com Wed Jan 10 23:38:57 2001 From: tismer at tismer.com (Christian Tismer) Date: Thu, 11 Jan 2001 00:38:57 +0200 Subject: [Python-Dev] [Stackless] ANN: Sourcecode for Stackless Python 2.0 Message-ID: <3A5CE481.24A7656@tismer.com> On Monday, Jan 8th, I spake """ Source code and an update to the website will become available in the next days. """ Now, here it is, together with a slightly updated website, which tries to mention all the people who are helping or sponsoring me (yes, there are sponsors!). If somebody feels ignored by me, let me know. I'm good at making mistakes. Let me also know if there are problems building the code, or if there are *no* problems understanding the code. I don't expect either :-) There is nearly no support for Unix, but Stackless *should* build on Unix as it did before without problems. enjoy - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From nas at arctrix.com Wed Jan 10 19:15:45 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 10 Jan 2001 10:15:45 -0800 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: ; from tim.one@home.com on Wed, Jan 10, 2001 at 06:06:05PM -0500 References: <3A5C4E44.23B593E9@per.dem.csiro.au> Message-ID: <20010110101545.A21305@glacier.fnational.com> On Wed, Jan 10, 2001 at 06:06:05PM -0500, Tim Peters wrote: > At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack > whenever HAVE_GETC_UNLOCKED isn't available. Leave it to the timbot to use floating point votes. :) Compare ms_getline_hack to what Perl does in order to speed up IO. I think it's worth maintaining that piece of relatively portable code given the benefit. If the code has to be maintained then it might as well be used. If we find a platform where it breaks we can always disable it before the final release. Neil From m.favas at per.dem.csiro.au Thu Jan 11 02:28:59 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Thu, 11 Jan 2001 09:28:59 +0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <3A5D0C5B.162F624A@per.dem.csiro.au> [Tim produces a warped threader that crashes on MS OS's] >> ... >> NO, NO NO! Mixing reads and writes on the same stream wasn't what >> we are locking against at all. (As you've found out, it doesn't >> even work.) >On Windows, yes, but that still seems to me to be a bug in MS's code. >If anyone had reported a core dump on any other platform, I'd be more >tractable on this point. On Tru64 Unix, I get an infinite generator of 'r's (after an initial few 'w's) to the screen (but no crashes).
If I reduce the size of the loop counters from 1000000 to 3000, I get the following output: opened w w w w w w w w w w w w w w w w w w w w w w w w w w w r read 5114 done -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From m.favas at per.dem.csiro.au Thu Jan 11 04:40:18 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Thu, 11 Jan 2001 11:40:18 +0800 Subject: [Python-Dev] xreadline speed vs readlines_sizehint Message-ID: <3A5D2B22.B8028AC@per.dem.csiro.au> [Tim responded] >> >> total 131426612 chars and 514216 lines >You average over 255 chars/line? Really? What kind of file are you >reading? I don't really want to measure the speed of line-at-a-time >input on binary files where "line" doesn't actually make sense <0.6 wink>. Real-life input, my boy! It's actually a syslog from my mailserver, consisting mainly of sendmail log messages, and I have a current need to process these things (MS Exchange, corrupted database, clobbered backup tapes), so this thread came along at the right time... >Guido pointed out that his readlines_sizehint test forced use of a 1Mb >buffer (in the call, not only the default value). For whatever >reason, that was significantly slower than using an 8Kb sizehint on my >box. Removing the buffer size arg in the call to readlines_sizehint results in this (using up-to-the-minute CVS): total 131426612 chars and 514216 lines count_chars_lines 4.922 4.916 readlines_sizehint 3.881 3.850 using_fileinput 10.371 10.366 while_readline 10.943 10.916 for_xreadlines 2.990 2.967 and with an 8Kb sizehint: total 131426612 chars and 514216 lines count_chars_lines 5.241 5.216 readlines_sizehint 2.917 2.900 using_fileinput 10.351 10.333 while_readline 10.990 10.983 for_xreadlines 2.877 2.867 >Another oddity is that while_readline is slower than using_fileinput >for you. From that I take it Python config does *not* #define > > HAVE_GETC_UNLOCKED > >on your platform. 
If that's true Nope, HAVE_GETC_UNLOCKED is indeed #define'd >(or esp. if it's not!), would you do me a >favor? Recompile fileobject.c with > > USE_MS_GETLINE_HACK > >#define'd, try the timing test again (while_readline is the most >interesting test for this), and run the test_bufio.py std test to make >sure you're actually getting the right answers. Sure: With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd (although defining the former makes the latter def irrelevant): (test_bufio also OK) total 131426612 chars and 514216 lines count_chars_lines 5.056 5.050 readlines_sizehint 3.771 3.667 using_fileinput 11.128 11.116 while_readline 8.287 8.233 for_xreadlines 3.090 3.083 With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just for completeness): total 131426612 chars and 514216 lines count_chars_lines 4.916 4.900 readlines_sizehint 3.875 3.867 using_fileinput 14.404 14.383 while_readline 322.728 321.837 for_xreadlines 7.113 7.100 So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From nas at arctrix.com Wed Jan 10 22:55:23 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 10 Jan 2001 13:55:23 -0800 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: <3A5D2B22.B8028AC@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Thu, Jan 11, 2001 at 11:40:18AM +0800 References: <3A5D2B22.B8028AC@per.dem.csiro.au> Message-ID: <20010110135523.A21894@glacier.fnational.com> On Thu, Jan 11, 2001 at 11:40:18AM +0800, Mark Favas wrote: [with getc_unlocked] > while_readline 10.943 10.916 [without] > while_readline 322.728 321.837 Holy crap. Great work team. 
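For readers who want to repeat this kind of comparison today, here is a minimal sketch of three of the strategies timed in this thread. It assumes a modern Python 3, where iterating over the file object is the successor to xreadlines. The function names, the `write_sample` helper and the made-up sample data are illustrative only and are not the benchmark script used in the thread.

```python
import os
import tempfile
import time


def count_while_readline(path):
    """One readline() call per line, as in the while_readline test."""
    lines = chars = 0
    with open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                break
            lines += 1
            chars += len(line)
    return lines, chars


def count_readlines_sizehint(path, sizehint=8192):
    """Read batches of lines totalling roughly `sizehint` bytes each."""
    lines = chars = 0
    with open(path, "rb") as f:
        while True:
            batch = f.readlines(sizehint)
            if not batch:
                break
            for line in batch:
                lines += 1
                chars += len(line)
    return lines, chars


def count_for_line(path):
    """Iterate over the file object directly (what xreadlines became)."""
    lines = chars = 0
    with open(path, "rb") as f:
        for line in f:
            lines += 1
            chars += len(line)
    return lines, chars


def write_sample(n_lines=20000, width=250):
    """Create a throwaway file of fixed-width lines (made-up data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write((b"x" * (width - 1) + b"\n") * n_lines)
    return path


def time_all(path):
    """Return {function name: ((lines, chars), elapsed seconds)}."""
    results = {}
    for func in (count_while_readline, count_readlines_sizehint, count_for_line):
        start = time.perf_counter()
        counts = func(path)
        results[func.__name__] = (counts, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    sample = write_sample()
    try:
        for name, (counts, elapsed) in time_all(sample).items():
            print(f"{name:28s} {counts} {elapsed:.3f}s")
    finally:
        os.remove(sample)
```

Timings depend heavily on platform, buffer sizes and line lengths, as the numbers in this thread show, so treat any single run as indicative at best.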
Neil From tim.one at home.com Thu Jan 11 06:03:51 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 00:03:51 -0500 Subject: [Python-Dev] Baffled on Windows Message-ID: In version 2.26 of mmapmodule.c, Guido replaced (as part of a contributed Cygwin patch): #ifdef MS_WIN32 __declspec(dllexport) void #endif /* MS_WIN32 */ #ifdef UNIX extern void #endif by: DL_EXPORT(void) before initmmap. 1. Windows Python can no longer import mmap: >>> import mmap Traceback (most recent call last): File "<stdin>", line 1, in ? ImportError: dynamic module does not define init function (initmmap) >>> This is because GetProcAddress returns NULL. 2. Everything's fine if I revert Guido's change (although I assume that breaks Cygwin then). 3. DL_EXPORT(void) expands to "void". 4. The way mmapmodule.c is coded and built after Guido's change appears to me to be the same as how every other non-builtin module is coded and built on Windows. For example, winsound.c, which uses DL_EXPORT(void) before its initwinsound and where that macro also expands to "void". But importing winsound works fine. Since what I'm seeing makes no consistent sense, I'm at a loss how to fix it. But then I'm punch-drunk too <0.7 wink>. Any Windows geek got a clue? From tim.one at home.com Thu Jan 11 07:10:40 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 01:10:40 -0500 Subject: [Python-Dev] RE: xreadline speed vs readlines_sizehint In-Reply-To: <3A5D2B22.B8028AC@per.dem.csiro.au> Message-ID: [Tim, to MarkF] >> You average over 255 chars/line? [nag, nag, nag] [Mark Favas] > Real-life input, my boy! It's actually a syslog from my > mailserver, consisting mainly of sendmail log messages, and I > have a current need to process these things (MS Exchange, > corrupted database, clobbered backup tapes), so this thread > came along at the right time... Hmm. I tuned ms_getline_hack for Guido's logfiles, which he said don't often exceed 160 chars/line.
I guess if you're on a 64-bit platform, though, it must take about twice as many chars per line to record a log msg . > ... > Removing the buffer size arg in the call to readlines_sizehint results > in this (using up-to-the-minute CVS): > total 131426612 chars and 514216 lines > count_chars_lines 4.922 4.916 > readlines_sizehint 3.881 3.850 > using_fileinput 10.371 10.366 > while_readline 10.943 10.916 > for_xreadlines 2.990 2.967 > > and with an 8Kb sizehint: > total 131426612 chars and 514216 lines > count_chars_lines 5.241 5.216 > readlines_sizehint 2.917 2.900 > using_fileinput 10.351 10.333 > while_readline 10.990 10.983 > for_xreadlines 2.877 2.867 That's sure consistent across platforms, then. I guess we'll write it off to "cache effects" (a catch-all explanation for any timing mystery -- go ahead, just *try* to prove it's wrong <0.5 wink>). [and Mark has HAVE_GETC_UNLOCKED on his Tru64 Unix box, yet using_fileinput is quicker than while_readline] > With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd > (although defining the former makes the latter def irrelevant): > (test_bufio also OK) > total 131426612 chars and 514216 lines > count_chars_lines 5.056 5.050 > readlines_sizehint 3.771 3.667 > using_fileinput 11.128 11.116 > while_readline 8.287 8.233 > for_xreadlines 3.090 3.083 So ms_getline_hack is significantly faster on your box (I'm only looking at while_readline: 11 using getc_unlocked, 8.3 using ms_getline_hack). There are only two reasons I can imagine for that: 1. Your vendor optimizes the inner loop in fgets (as all vendors should, but few do). and/or 2. Despite the long average length of your lines, many of them are nevertheless shorter than 200 chars, and so all the pain ms_getline_hack endures to avoid a realloc pays off. Unfortunately, there's not enough info to figure out if either, both, or none of those are on-target. 
It's such a large percentage speedup, though, that my bet goes primarily to #1 -- unless realloc is really pig slow on your box. Which some things *are*: > With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just > for completeness): > total 131426612 chars and 514216 lines > count_chars_lines 4.916 4.900 > readlines_sizehint 3.875 3.867 > using_fileinput 14.404 14.383 > while_readline 322.728 321.837 > for_xreadlines 7.113 7.100 > > So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement > Yes, that's the "platform from Mars" evidence I was seeking: if ms_getline_hack survives test_bufio on *your* crazy box, it's as close to provably correct as any algorithm in all of Computer Science . a-factor-of-39-is-almost-big-enough-to-notice!-ly y'rs - tim From m.favas at per.dem.csiro.au Thu Jan 11 08:26:37 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Thu, 11 Jan 2001 15:26:37 +0800 Subject: [Python-Dev] Re: xreadline speed vs readlines_sizehint References: Message-ID: <3A5D602D.9DC991CB@per.dem.csiro.au> [Tim speculates on getc_unlocked and his ms_getline_hack]: > > So ms_getline_hack is significantly faster on your box (I'm only > looking at while_readline: 11 using getc_unlocked, 8.3 using > ms_getline_hack). There are only two reasons I can imagine for that: > > 1. Your vendor optimizes the inner loop in fgets (as all vendors > should, but few do). Digital engineering, Compaq management/marketing <0.6 wink> > > and/or > > 2. Despite the long average length of your lines, many of them are > nevertheless shorter than 200 chars, and so all the pain > ms_getline_hack endures to avoid a realloc pays off. > > Unfortunately, there's not enough info to figure out if either, both, > or none of those are on-target. It's such a large percentage > speedup, though, that my bet goes primarily to #1 -- unless realloc > is really pig slow on your box. 
The lines range in length from 96 to 747 characters, with 11% @ 233, 17% @ 252 and 52% @ 254 characters, so #1 looks promising - most lines are long enough to trigger a realloc. Cranking up INITBUFSIZE in ms_getline_hack to 260 from 200 improves things again, by another 25%: total 131426612 chars and 514216 lines count_chars_lines 5.081 5.066 readlines_sizehint 3.743 3.717 using_fileinput 11.113 11.100 while_readline 6.100 6.083 for_xreadlines 3.027 3.033 Apart from the name , I like ms_getline_hack... tho'-a-factor-of-100-makes-xreadlines-a-welcome-addition!-ly y'rs -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From m.favas at per.dem.csiro.au Thu Jan 11 10:08:29 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Thu, 11 Jan 2001 17:08:29 +0800 Subject: [Python-Dev] Current CVS version of sysmodule.c fails to compile Message-ID: <3A5D780D.62D0F473@per.dem.csiro.au> On Tru64 Unix, with Compaq's C/CXX compilers, the current CVS version of sysmodule.c produces the following errors: cc -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c -o sysmodule.o sysmodule.c cc: Error: sysmodule.c, line 73: Invalid declarator. (declarator) PyObject *o, *stdout; ----------------------^ cc: Error: sysmodule.c, line 79: In this statement, "o" is not declared. (undeclared) if (!PyArg_ParseTuple(args, "O:displayhook", &o)) ------------------------------------------------------^ cc: Error: sysmodule.c, line 93: In this statement, "(&_iob[1])" is not an lvalue, but occurs in a context that requires one. (needlvalue) stdout = PySys_GetObject("stdout"); --------^ cc: Warning: sysmodule.c, line 98: In this statement, the referenced type of the pointer value "(&_iob[1])" is "struct declared without a tag", which is not compatible with "struct _object".
(ptrmismatch) if (PyFile_WriteObject(o, stdout, 0) != 0) ----------------------------------^ cc: Warning: sysmodule.c, line 100: In this statement, the referenced type of the pointer value "(&_iob[1])" is "struct declared without a tag", which is not compatible with "struct _object". (ptrmismatch) PyFile_SoftSpace(stdout, 1); -------------------------^ The problem is that stdout is a macro #define'd in stdio.h as (&_iob[1]) (stdin and stderr also are similarly #define'd). -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From gstein at lyra.org Thu Jan 11 10:18:44 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 11 Jan 2001 01:18:44 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.216,2.217 sysmodule.c,2.80,2.81 In-Reply-To: ; from moshez@users.sourceforge.net on Wed, Jan 10, 2001 at 09:41:29PM -0800 References: Message-ID: <20010111011843.W4640@lyra.org> On Wed, Jan 10, 2001 at 09:41:29PM -0800, Moshe Zadka wrote: > Update of /cvsroot/python/python/dist/src/Python > In directory usw-pr-cvs1:/tmp/cvs-serv21213/Python > > Modified Files: > ceval.c sysmodule.c >... > --- 1246,1269 ---- > case PRINT_EXPR: > v = POP(); > ! w = PySys_GetObject("displayhook"); > ! if (w == NULL) { > ! PyErr_SetString(PyExc_RuntimeError, > ! "lost sys.displayhook"); > ! err = -1; > } > + if (err == 0) { > + x = Py_BuildValue("(O)", v); > + if (x == NULL) > + err = -1; > + } > + if (err == 0) { > + w = PyEval_CallObject(w, x); > + if (w == NULL) > + err = -1; > + } > Py_DECREF(v); > + Py_XDECREF(x); x was never initialized to NULL. In fact, the loop sets it to Py_None. If you get an error in the initial "w" setup case, then you could erroneously decref None. Further, there is no DECREF for the CallObject result ("w"). But watch out: you don't want to DECREF the PySys_GetObject result (that is a borrowed reference). 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Jan 11 10:28:16 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 11 Jan 2001 01:28:16 -0800 Subject: [Python-Dev] Current CVS version of sysmodule.c fails to compile In-Reply-To: <3A5D780D.62D0F473@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Thu, Jan 11, 2001 at 05:08:29PM +0800 References: <3A5D780D.62D0F473@per.dem.csiro.au> Message-ID: <20010111012815.X4640@lyra.org> You're quite right! I've checked in a change, renaming it to "outf". Cheers, -g On Thu, Jan 11, 2001 at 05:08:29PM +0800, Mark Favas wrote: > On Tru64 Unix, with Compaq's C/CXX compilers, the current CVS version of > sysmodule.c produces the following errors: > > cc -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c -o > sysmodule.o sysmodule.c > cc: Error: sysmodule.c, line 73: Invalid declarator. (declarator) > PyObject *o, *stdout; > ----------------------^ > cc: Error: sysmodule.c, line 79: In this statement, "o" is not declared. > (undeclared) > if (!PyArg_ParseTuple(args, "O:displayhook", &o)) > ------------------------------------------------------^ > cc: Error: sysmodule.c, line 93: In this statement, "(&_iob[1])" is not > an lvalue, but occurs in a context that requires one. (needlvalue) > stdout = PySys_GetObject("stdout"); > --------^ > cc: Warning: sysmodule.c, line 98: In this statement, the referenced > type of the pointer value "(&_iob[1])" is "struct declared without a > tag", which is not compatible with "struct _object". (ptrmismatch) > if (PyFile_WriteObject(o, stdout, 0) != 0) > ----------------------------------^ > cc: Warning: sysmodule.c, line 100: In this statement, the referenced > type of the pointer value "(&_iob[1])" is "struct declared without a > tag", which is not compatible with "struct _object". 
(ptrmismatch) > PyFile_SoftSpace(stdout, 1); > -------------------------^ > > The problem is that stdout is a macro #define'd in stdio.h as (&_iob[1]) > (stdin and stderr also are similarly #define'd). > > -- > Mark Favas - m.favas at per.dem.csiro.au > CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Greg Stein, http://www.lyra.org/ From skip at mojam.com Thu Jan 11 15:13:55 2001 From: skip at mojam.com (Skip Montanaro) Date: Thu, 11 Jan 2001 08:13:55 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: References: Message-ID: <14941.49059.26189.733094@beluga.mojam.com> Moshe> * Did not DECREF result from displayhook function ... Moshe> w = PyEval_CallObject(w, x); Moshe> + Py_XDECREF(w); Moshe> if (w == NULL) ... While it works, is it really kosher to test w's value after the DECREF? Just seems like an odd construct to me. I'm used to seeing the test immediately after it's been set. Skip From guido at python.org Thu Jan 11 15:44:58 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 09:44:58 -0500 Subject: [Python-Dev] Interning filenames of imported modules In-Reply-To: Your message of "Wed, 10 Jan 2001 15:57:41 CST." <14940.56021.646147.770080@buffalo.fnal.gov> References: <14940.56021.646147.770080@buffalo.fnal.gov> Message-ID: <200101111444.JAA14597@cj20424-a.reston1.va.home.com> > I have a question about the following code in compile.c:jcompile (line 3678) > > filename = PyString_InternFromString(sc.c_filename); > name = PyString_InternFromString(sc.c_name); > > In the case of a long-running server which constantly imports modules, > this causes the interned string dict to grow without bound. Is there > a strong reason that the filename needs to be interned? How about the > module name? 
It's probably not *necessary* for the filename, but I know why I am interning it: since a module typically contains a bunch of functions, and each function has its own code object with a reference to the filename, I'm trying to save memory (the filename is a C string pointer in the "sc" structure, so it has to be turned into a Python string when creating the code object). The module name is used as an identifier elsewhere so will become interned anyway. > How about some way to enforce a limit on the size of the interned > strings dictionary? I've never thought of this -- but I suppose that a weak dictionary could be used. Fred's working on a PEP for weak references, so there's a chance that we might use this eventually. In the mean time, a possibility would be to provide a service function that goes through the "interned" dictionary and looks for values with a reference count of 1, and deletes them. You could then explicitly call this service function occasionally in your program. I would let it return a tuple: (number of values kept, number of values deleted). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jan 11 16:08:48 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:08:48 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 13:49:40 EST." References: Message-ID: <200101111508.KAA14870@cj20424-a.reston1.va.home.com> > They're indistinguishable then on my box (on one run xreadlines is .1 > seconds (out of around 7.6 total) quicker, on another readlines_sizehint), > *provided* that I specify the same buffer size (8192) that xreadlines uses > internally. However, if I even double that, readlines_sizehint is uniformly > about 10% slower. It's also a tiny bit slower if I cut the sizehint buffer > size to 4096. 
> > I'm afraid Mysteries will remain no matter how many person-decades we spend > staring at this <0.5 wink> ... 8192 happens to be the size of the stack-allocated buffer readlines() uses, and also the stdio BUFSIZ parameter, on many systems. Look for SMALLCHUNK in fileobject.c. Would it make sense to tie the two constants together more to tune this optimally even when BUFSIZ is different? --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Thu Jan 11 16:09:54 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Thu, 11 Jan 2001 10:09:54 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108113109.C7563@kronos.cnri.reston.va.us> <200101101900.OAA30486@cj20424-a.reston1.va.home.com> Message-ID: <14941.52418.18484.898061@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> It was more work than I had hoped for, because Eric GvR> apparently (despite having developer privileges!) doesn't use GvR> the CVS tree -- he sent in a diff relative to the 2.0 GvR> release. I munged it into place, adding the feature that GvR> readline, _curses and bsdddb are built as shared libraries by GvR> default. You'd have to edit Setup.config.in to change this. GvR> Hope this doesn't break anybody's setup. (Skip???) We may need to move dbm module to Setup.config from Setup and build it shared too. The problem I ran into when building the pybsddb3 module was that even though I'd built the standard bsddb shared, I was also building in dbm statically. This pulled in a dependency to the old db.so module (under RH6.1) and core dumped me during the test suite for pybsddb. Commenting out dbm did the trick, so building it shared should work too. 
Couple of things: dbm isn't enabled by default I believe so moving it to Setup.config may not be the right thing after all (would that imply an autoconf test and auto-enabling if it's detected?) Also, Andrew's distutils-based build procedure may obviate the need for this change. -Barry From ping at lfw.org Thu Jan 11 16:14:17 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 07:14:17 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: On Wed, 10 Jan 2001, Guido van Rossum wrote: > Yes -- I came up with the same thought. > > So here's a plan: somebody please submit a patch that does only one > thing: from...import * looks for __all__ and if it exists, imports > exactly those names. No changes to dir(), or anything. Please don't use __all__. At the moment, __all__ is the only way to easily tell whether a particular module object really represents a package, and the only way to get the list of submodule names. If __all__ is overloaded to also represent exportable symbols in modules, these two pieces of information will be impossible (or require much ugly hackery) to obtain. -- ?!ng From guido at python.org Thu Jan 11 16:23:26 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:23:26 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 15:25:24 EST." References: Message-ID: <200101111523.KAA14982@cj20424-a.reston1.va.home.com> > [Tim] > >> Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method > >> to keep the file locked until the line was complete, and I > >> wouldn't be opposed to making life saner on platforms that allow it. > > [Guido] > > Hm... That would be possible, except for one unfortunate detail: > > _PyString_Resize() may call PyErr_BadInternalCall() which touches > > thread state. [Tim] > FLOCKFILE/FUNLOCKFILE are independent of Python's notion of thread state. 
> IOW, do FLOCKFILE once before the for(;;), and FUNLOCKFILE once on every > *exit* path thereafter. We can block/unblock Python threads as often as > desired between those *file*-locking brackets. The only thing the repeated > FLOCKFILE/FUNLOCKFILE calls do to my eyes now is to create the *possibility* > for multiple readers to get partial lines of the file. I don't want to call FLOCKFILE while holding the Python lock, as this means that *if* we're blocked in FLOCKFILE (e.g. we're reading from a pipe or socket), no other Python thread can run! > > ... > > NO, NO NO! Mixing reads and writes on the same stream wasn't what we > > are locking against at all. (As you've found out, it doesn't even > > work.) > > On Windows, yes, but that still seems to me to be a bug in MS's code. If > anyone had reported a core dump on any other platform, I'd be more tractable > on this point. Yes, it's a Windows bug. > > We're only trying to protect against concurrent *reads*. > > As above, I believe that we could do a better job of that, then, on > platforms that HAVE_GETC_UNLOCKED, by protecting not only against core dumps > but also against .readline() not delivering an intact line from the file. See above for a reason why I think that's not safe. I think that applications that want to do this can do their own locking. (They'll find out soon enough that readline() isn't atomic. :-) > >> But since FLOCKFILE is in effect, other threads *trying* to write > >> to the stream we're reading will get blocked anyway. Seems to give us > >> potential for deadlocks. > > > Only if tyeh are holding other locks at the same time. > > I'm not being clear, then. Thread X does f.readline(), on a > HAVE_GETC_UNLOCKED platform. get_line allows other threads to run and > invokes FLOCKFILE on f->f_fp. get_line's GETC in thread X eventually hits > the end of the stdio buffer, and does its platform's version of _filbuf. > _filbuf may wait (depending on the nature of the stream) for more input to > show up. 
Simultaneously, thread Y attempts to write some data to f. But > the *FLOCKFILE* lock prevents it from doing anything with f. So X is > waiting for Y to write data inside platform _filbuf, but Y is waiting for X > to release the platform stream lock inside some platform stream-output > routine (if I'm being clear now, Python locks have nothing to do with this > scenario: it's the platform stream lock). I don't think that _filbuf can possibly wait for another thread to write data to the same stream object. A single stream object doesn't act like a pipe, even if it is open for simultaneous reading and writing. So if there's no more data in the file, _filbuf will simply return with an EOF status, not wait for the data that the other thread would write. > I think this is purely the user's fault if it happens. Just pointing it out > as another insecurity we're probably not able to protect users from. I don't think this can happen. > > ... > > Yeah. But this is insane use -- see my comments on SF. It's only > > worth fixing because it could be used to intentionally crash Python -- > > but there are easier ways... > > If it's unique to MS (as I suspect), I see no reason to even consider trying > to fix it in Python. Unless the Perl Mongers use it to crash Zope . OK. It's unique to MS. So close the bug report with a "won't fix" resolution. There's no point in having bug reports remain open that we know we can't fix. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jan 11 16:27:05 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:27:05 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 17:23:14 EST." References: Message-ID: <200101111527.KAA15005@cj20424-a.reston1.va.home.com> > Think like an implementer here <0.5 wink>: they've lost track of how many > characters are in the buffer despite a locking scheme whose purpose is to > prevent that.
If it were my implementation, that would be a top-priority > bug no matter how silly the first program I saw that triggered it. The locking prevents concurrent threads accessing the stream. But mixing reads and writes (without intervening fseek etc.) is illegal use of the stream, and the C standard allows them to be lax here, even if the program was single-threaded. In other words: the locking is so good that it serializes the sequence of reads and writes; but if the sequence of reads and writes is illegal, they don't guarantee anything. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jan 11 16:28:23 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:28:23 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 11 Jan 2001 09:28:59 +0800." <3A5D0C5B.162F624A@per.dem.csiro.au> References: <3A5D0C5B.162F624A@per.dem.csiro.au> Message-ID: <200101111528.KAA15021@cj20424-a.reston1.va.home.com> > On Tru64 Unix, I get an infinite generator of 'r's (after an initial few > 'w's) to the screen (but no crashes). Same here on Linux. > If I reduce the size of the loop > counters from 1000000 to 3000, I get the following output: > opened > w w w w w w w w w w w w w w w w w w w w w w w w w w w r read 5114 > done I still get an infinite amount of 'r's. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Thu Jan 11 16:28:21 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 16:28:21 +0100 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: ; from tim.one@home.com on Sun, Jan 07, 2001 at 11:13:26PM -0500 References: Message-ID: <20010111162820.W2467@xs4all.nl> On Sun, Jan 07, 2001 at 11:13:26PM -0500, Tim Peters wrote: > I'm curious about how it performs (relative to the getc_unlocked hack) on > other platforms. If you'd like to try that, just recompile fileobject.c > with > USE_MS_GETLINE_HACK > #define'd. 
It should *work* on any platform with fgets() meeting the > assumption. The new test_bufio.py std test gives it a pretty good > correctness workout, if you're worried about that. FreeBSD seems to work fine. Speed is practically the same as without USE_MS_GETLINE_HACK (but with HAVE_GETC_UNLOCKED), though still not quite the same as before all this hackery :-) Not by much though. For most tests it's smaller than the margin of error, though the difference is still as much as 20, 30% for the while_readline test. When using a second thread somewhere in the test, the difference vanishes further. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Thu Jan 11 16:33:28 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 11 Jan 2001 16:33:28 +0100 Subject: [Python-Dev] Add __exports__ to modules References: Message-ID: <3A5DD248.8EE0DF63@lemburg.com> Ka-Ping Yee wrote: > > On Wed, 10 Jan 2001, Guido van Rossum wrote: > > Yes -- I came up with the same thought. > > > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > Please don't use __all__. At the moment, __all__ is the only way > to easily tell whether a particular module object really represents > a package, and the only way to get the list of submodule names. But __all__ has to be user-defined, so I don't buy that argument. Note that the only true way to recognize a package is by looking for an attribute "__path__" since Python adds this for packages only. > If __all__ is overloaded to also represent exportable symbols in > modules, these two pieces of information will be impossible (or > require much ugly hackery) to obtain. Again, __all__ is not automatically generated, so trusting it doesn't get you very far. 
To be able to find subpackages you will always have to apply some hackery (based on __path__) in order to be sure. It would be better to add a helper function to packages to query this kind of information -- the package usually knows best where to look and what to look for. Note that __all__ was explicitly invented to be used by from package import * so I think it is the right choice here. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Thu Jan 11 16:37:19 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 11 Jan 2001 10:37:19 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <14941.52418.18484.898061@anthem.wooz.org>; from barry@digicool.com on Thu, Jan 11, 2001 at 10:09:54AM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108113109.C7563@kronos.cnri.reston.va.us> <200101101900.OAA30486@cj20424-a.reston1.va.home.com> <14941.52418.18484.898061@anthem.wooz.org> Message-ID: <20010111103719.A7191@thyrsus.com> GvR> It was more work than I had hoped for, because Eric GvR> apparently (despite having developer privileges!) doesn't use GvR> the CVS tree -- he sent in a diff relative to the 2.0 GvR> release. I'm using the CVS tree now. I did that patch relative to 2.0 for boring reasons having to do with the state of my laptop. -- Eric S. Raymond The IRS has become morally corrupted by the enormous power which we in Congress have unwisely entrusted to it. Too often it acts like a Gestapo preying upon defenseless citizens. -- Senator Edward V. 
Long From thomas at xs4all.net Thu Jan 11 16:48:32 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 16:48:32 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5DD248.8EE0DF63@lemburg.com>; from mal@lemburg.com on Thu, Jan 11, 2001 at 04:33:28PM +0100 References: <3A5DD248.8EE0DF63@lemburg.com> Message-ID: <20010111164831.X2467@xs4all.nl> On Thu, Jan 11, 2001 at 04:33:28PM +0100, M.-A. Lemburg wrote: > > Please don't use __all__. At the moment, __all__ is the only way > > to easily tell whether a particular module object really represents > > a package, and the only way to get the list of submodule names. > > But __all__ has to be user-defined, so I don't buy that argument. > Note that the only true way to recognize a package is by looking > for an attribute "__path__" since Python adds this for packages > only. Ehm.... What, exactly, prevents usercode from doing __path__ = "neener, neener" ? In other words, even *that* isn't a true way to recognize a package. You can see what isn't a package, but not what is. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Thu Jan 11 16:58:55 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 10:58:55 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Thu, 11 Jan 2001 07:14:17 PST." References: Message-ID: <200101111558.KAA15447@cj20424-a.reston1.va.home.com> > Please don't use __all__. At the moment, __all__ is the only way > to easily tell whether a particular module object really represents > a package, and the only way to get the list of submodule names. > > If __all__ is overloaded to also represent exportable symbols in > modules, these two pieces of information will be impossible (or > require much ugly hackery) to obtain. Marc-Andre already explained that __all__ is not to be trusted. 
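The behaviour under discussion in this thread, where `from module import *` honours `__all__` when it is present and `__path__` marks a package, can be illustrated with a small sketch. It assumes a modern Python 3. The module name `demo_star_module` and the helper `star_import` are hypothetical, made up for this illustration.

```python
import email
import string
import sys
import types


def star_import(module):
    """Return the sorted names that `from <module> import *` would bind."""
    sys.modules[module.__name__] = module  # make it importable by name
    try:
        namespace = {}
        exec(f"from {module.__name__} import *", namespace)
    finally:
        del sys.modules[module.__name__]
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return sorted(namespace)


# A throwaway in-memory module (hypothetical, for illustration only).
demo = types.ModuleType("demo_star_module")
demo.public = 1
demo.helper = 2
demo._private = 3

# Without __all__, every name not starting with an underscore is imported.
without_all = star_import(demo)

# With __all__, exactly the listed names are imported.
demo.__all__ = ["public"]
with_all = star_import(demo)

# Packages carry a __path__ attribute; plain modules normally do not,
# although nothing stops user code from setting one, as Thomas notes.
email_is_package = hasattr(email, "__path__")
string_is_package = hasattr(string, "__path__")

print(without_all, with_all, email_is_package, string_is_package)
```

Because `__all__` is written by hand, it tells you what the author chose to export, not what the module or package actually contains.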
If you want a reasonably good test for package-ness, use the presence of __path__. For a really good test, check whether __file__ ends in __init__.py[c]. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Jan 11 17:14:00 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 11 Jan 2001 11:14:00 -0500 Subject: [Python-Dev] PEP 229: setup.py revised Message-ID: I've put a new version of the setup.py script at http://www.mems-exchange.org/software/files/python/setup.py (I'm at work and can't remember the password to get into www.amk.ca. :) ) This version improves the detection of Tcl/Tk, handles the _curses_panel module, and doesn't do a chdir(). Same drill as before: just grab the script, drop it in the root of your Python source tree (2.0 or current CVS), run "./python setup.py build", and look at the modules it compiles. I can try it on Linux, so I'm most interested in hearing reports for other Unix versions (*BSD, HP-UX, etc.) --amk From ping at lfw.org Thu Jan 11 17:36:36 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 08:36:36 -0800 (PST) Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) Message-ID: I'm pleased to announce a reasonable first pass at a documentation utility for interactive use. "pydoc" is usable in three ways:
1. At the shell prompt, "pydoc <name>" displays documentation on <name>, very much like "man".
2. At the shell prompt, "pydoc -k <keyword>" lists modules whose one-line descriptions mention the keyword, like "man -k".
3. Within Python, "from pydoc import help" provides a "help" function to display documentation at the interpreter prompt.
All of them use sys.path in order to guarantee that the documentation you see matches the modules you get.
To try "pydoc", download: http://www.lfw.org/python/pydoc.py http://www.lfw.org/python/htmldoc.py http://www.lfw.org/python/textdoc.py http://www.lfw.org/python/inspect.py I would very much appreciate your feedback, especially from testing on non-Unix platforms. Thank you! I've pasted some examples from my shell below (when you actually run pydoc, the output is piped through "less", "more", or a pager implemented in Python, depending on what is available). -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. Eric Drexler skuld[1268]% pydoc -k mail mailbox - Classes to handle Unix style, MMDF style, and MH style mailboxes. mailcap - Mailcap file handling. See RFC 1524. mimify - Mimification and unmimification of mail messages. test.test_mailbox - (no description) skuld[1269]% pydoc -k text textdoc - Generate text documentation from live Python objects. collab - Routines for collaboration, especially group editing of text documents. gettext - Internationalization and localization support. test.test_gettext - (no description) curses.textpad - Simple textbox editing widget with Emacs-like keybindings. distutils.text_file - text_file ScrolledText - (no description) skuld[1270]% pydoc -k html htmldoc - Generate HTML documentation from live Python objects. htmlentitydefs - HTML character entity references. htmllib - HTML 2.0 parser. skuld[1271]% pydoc md5 Python Library Documentation: built-in module md5 NAME md5 FILE (built-in) DESCRIPTION This module implements the interface to RSA's MD5 message digest algorithm (see also Internet RFC 1321). Its use is quite straightforward: use the new() to create an md5 object. You can now feed this object with arbitrary strings using the update() method, and at any point you can ask it for the digest (a strong kind of 128-bit checksum, a.k.a. ``fingerprint'') of the contatenation of the strings fed to it so far using the digest() method. 
Functions: new([arg]) -- return a new md5 object, initialized with arg if provided md5([arg]) -- DEPRECATED, same as new, but for compatibility Special Objects: MD5Type -- type object for md5 objects FUNCTIONS md5(no arg info) new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. new(no arg info) new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. skuld[1272]% pydoc types Python Library Documentation: module types NAME types FILE /home/ping/sw/Python-1.5.2/Lib/types.py DESCRIPTION # Define names for all type symbols known in the standard interpreter. # Types that are part of optional modules (e.g. array) are not listed. skuld[1273]% pydoc abs Python Library Documentation: built-in function abs abs (no arg info) abs(number) -> number Return the absolute value of the argument. skuld[1274]% pydoc repr Python Library Documentation: built-in function repr repr (no arg info) repr(object) -> string Return the canonical string representation of the object. For most object types, eval(repr(object)) == object. Python Library Documentation: module repr NAME repr - # Redo the `...` (representation) but with limits on most sizes. 
FILE /home/ping/sw/Python-1.5.2/Lib/repr.py CLASSES Repr class Repr __init__(self) repr(self, x) repr1(self, x, level) repr_dictionary(self, x, level) repr_instance(self, x, level) repr_list(self, x, level) repr_long_int(self, x, level) repr_string(self, x, level) repr_tuple(self, x, level) FUNCTIONS repr(no arg info) skuld[1275]% pydoc re.MatchObject Python Library Documentation: class MatchObject in re class MatchObject __init__(self, re, string, pos, endpos, regs) end(self, g=0) Return the end of the substring matched by group g group(self, *groups) Return one or more groups of the match groupdict(self, default=None) Return a dictionary containing all named subgroups of the match groups(self, default=None) Return a tuple containing all subgroups of the match object span(self, g=0) Return (start, end) of the substring matched by group g start(self, g=0) Return the start of the substring matched by group g skuld[1276]% pydoc xml Python Library Documentation: package xml NAME xml - Core XML support for Python. FILE /home/ping/dev/python/dist/src/Lib/xml/__init__.py DESCRIPTION This package contains three sub-packages: dom -- The W3C Document Object Model. This supports DOM Level 1 + Namespaces. parsers -- Python wrappers for XML parsers (currently only supports Expat). sax -- The Simple API for XML, developed by XML-Dev, led by David Megginson and ported to Python by Lars Marius Garshol. This supports the SAX 2 API. VERSION 1.8 skuld[1277]% pydoc lovelyspam no Python documentation found for lovelyspam skuld[1278]% python Python 1.5.2 (#1, Dec 12 2000, 02:25:44) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> >>> from pydoc import help >>> help(int) Help on built-in function int: int (no arg info) int(x) -> integer Convert a string or number to an integer, if possible. A floating point argument will be truncated towards zero. 
>>> help("urlparse.urljoin") Help on function urljoin in module urlparse: urljoin(base, url, allow_fragments=1) # Join a base URL and a possibly relative URL to form an absolute # interpretation of the latter. >>> import random >>> help(random.generator) Help on class generator in module random: class generator(whrandom.whrandom) Random generator class. __init__(self, a=None) Constructor. Seed from current time or hashable value. seed(self, a=None) Seed the generator from current time or hashable value. >>> From moshez at zadka.site.co.il Fri Jan 12 01:48:30 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 02:48:30 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5C97F4.945D0C1@lemburg.com> References: <3A5C97F4.945D0C1@lemburg.com>, <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il> On Wed, 10 Jan 2001 18:12:20 +0100, "M.-A. Lemburg" wrote: > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > +1 -- this won't be me though (at least not this week). I'm working on it -- I'll have a patch ready as soon as my slow modem will manage to finish the "cvs diff". Guido, I'll assign it to you, OK? > Cool. This could make Python instances usable as "modules" > -- with full getattr() hook support ! My Patch already does that -- if the instance supports __all__ > For IMPORT_STAR I'd suggest first looking for __all__ and > then reverting to __dict__.items() in case this fails. That's what my patch is doing. > BTW, is __dict__ needed by the import mechanism or would > the getattr/setattr slots suffice ? And if yes, must it > be a real Python dictionary ? 
My patch works with getattr (no setattr) as long as there is an __all__ attribute. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From ping at lfw.org Thu Jan 11 17:42:44 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 08:42:44 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101111558.KAA15447@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Guido van Rossum wrote: > > Marc-Andre already explained that __all__ is not to be trusted. > > If you want a reasonably good test for package-ness, use the presence > of __path__. Sorry, you're right. I retract my comment about __all__. -- ?!ng From skip at mojam.com Thu Jan 11 17:47:13 2001 From: skip at mojam.com (Skip Montanaro) Date: Thu, 11 Jan 2001 10:47:13 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010111164831.X2467@xs4all.nl> References: <3A5DD248.8EE0DF63@lemburg.com> <20010111164831.X2467@xs4all.nl> Message-ID: <14941.58257.304339.437443@beluga.mojam.com> Thomas> __path__ = "neener, neener" I believe correct English usage here is "neener, neener, neener", with a little extra emphasis on the first syllable of the third "neener"... does-that-help?-ly y'rs, Skip From MarkH at ActiveState.com Fri Jan 12 17:55:29 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Fri, 12 Jan 2001 08:55:29 -0800 Subject: [Python-Dev] RE: Baffled on Windows In-Reply-To: Message-ID: > 4. The way mmapmodule.c is coded and built after Guido's change appears to > me to be the same as how every other non-builtin module is coded and built > on Windows. For example, winsound.c, which uses DL_EXPORT(void) > before its > initwinsound and where that macro also expands to "void". But importing > winsound works fine. winsound adds "/export:initwinsound" to the link line. This is an alternative to __declspec in the sources.
This all gets back to a discussion we had here nearly a year or so ago - that "DL_EXPORT" isnt capturing our semantics, and that we should probably create #defines that match the _intent_ of the definition, rather than the implementation details - ie, replace DL_EXPORT with (say) PY_API_DECL and PY_MODULEINIT_DECL or some such. I'm happy to think about this and help implement it if the time is now right... > Any Windows geek got a clue? Isn't that question a paradox? ;-) Mark. From skip at mojam.com Thu Jan 11 18:11:23 2001 From: skip at mojam.com (Skip Montanaro) Date: Thu, 11 Jan 2001 11:11:23 -0600 (CST) Subject: [Python-Dev] dir()/__all__/etc Message-ID: <14941.59707.632995.224116@beluga.mojam.com> I know Guido has said he doesn't want to fiddle with dir(), but my sense of things from the overall discussion of the __exports__ concept tells me that when used interactively dir() often presents confusing output for new Python users. I twiddled CGIHTTPServer to have __all__ and added the following dir() function to my PYTHONSTARTUP file:

    def dir(o,showall=0):
        if not showall and hasattr(o, "__all__"):
            x = list(o.__all__)
            x.sort()
            return x
        from __builtin__ import dir as d
        return d(o)

Compare its output with and without showall set:

    >>> dir(CGIHTTPServer)
    ['CGIHTTPRequestHandler', 'test']
    >>> dir(CGIHTTPServer,1)
    ['BaseHTTPServer', 'CGIHTTPRequestHandler', 'SimpleHTTPServer', '__all__',
    '__builtins__', '__doc__', '__file__', '__name__', '__version__',
    'executable', 'nobody', 'nobody_uid', 'os', 'string', 'sys', 'test',
    'urllib']

I haven't demonstrated any great programming prowess with this little function, but I rather suspect it may be beyond most brand new users. If Guido can't be convinced to allow dir() to change, how about adding a sample PYTHONSTARTUP file to the distribution that contains little bits like this and Ping's pydoc.help stuff (assuming it gets into the distro, which I hope it does)?
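
For readers on a current interpreter, a dir() wrapper that honours __all__ might be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: it assumes Python 3, where the `__builtin__` module is spelled `builtins`, and the names `public_dir` and `demo` are made up for the example.

```python
import builtins
import types


def public_dir(obj, showall=False):
    """Return the sorted names in obj.__all__ if present, else the built-in dir(obj)."""
    if not showall and hasattr(obj, "__all__"):
        return sorted(obj.__all__)
    return builtins.dir(obj)


# A throwaway module object standing in for something like CGIHTTPServer.
demo = types.ModuleType("demo")
demo.spam = 1
demo.eggs = 2
demo._private = 3
demo.__all__ = ["spam", "eggs"]

print(public_dir(demo))                # only the names listed in __all__
print(public_dir(demo, showall=True))  # everything the built-in dir() reports
```

Giving the wrapper its own name, rather than rebinding dir, keeps the built-in reachable for anyone who wants the full listing.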
Skip From mal at lemburg.com Thu Jan 11 18:25:20 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 11 Jan 2001 18:25:20 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <3A5DD248.8EE0DF63@lemburg.com> <20010111164831.X2467@xs4all.nl> Message-ID: <3A5DEC80.596F0818@lemburg.com> Thomas Wouters wrote: > > On Thu, Jan 11, 2001 at 04:33:28PM +0100, M.-A. Lemburg wrote: > > > > Please don't use __all__. At the moment, __all__ is the only way > > > to easily tell whether a particular module object really represents > > > a package, and the only way to get the list of submodule names. > > > > But __all__ has to be user-defined, so I don't buy that argument. > > Note that the only true way to recognize a package is by looking > > for an attribute "__path__" since Python adds this for packages > > only. > > Ehm.... What, exactly, prevents usercode from doing > > __path__ = "neener, neener" > > ? In other words, even *that* isn't a true way to recognize a package. You > can see what isn't a package, but not what is. Purists.... ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at zadka.site.co.il Fri Jan 12 03:06:37 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 04:06:37 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: <14941.49059.26189.733094@beluga.mojam.com> References: <14941.49059.26189.733094@beluga.mojam.com>, Message-ID: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001 08:13:55 -0600 (CST), Skip Montanaro wrote: > While it works, is it really kosher to test w's value after the DECREF? Yes. It may not point to anything valid, but it won't be NULL. > Just seems like an odd construct to me. I'm used to seeing the test > immediately after it's been set. 
It was more convenient that way. And I'm pretty certain the _DECREF macros do not change their arguments. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From moshez at zadka.site.co.il Fri Jan 12 03:09:13 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 04:09:13 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: References: Message-ID: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001 07:14:17 -0800 (PST), Ka-Ping Yee wrote: > On Wed, 10 Jan 2001, Guido van Rossum wrote: > > Yes -- I came up with the same thought. > > > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > Please don't use __all__. At the moment, __all__ is the only way > to easily tell whether a particular module object really represents > a package Why not __init__? It has to be there, and is in no other module object. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From moshez at zadka.site.co.il Fri Jan 12 03:23:16 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 04:23:16 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il> References: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il>, <3A5C97F4.945D0C1@lemburg.com>, <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <20010112022316.BE682A82D@darjeeling.zadka.site.co.il> On Fri, 12 Jan 2001, Moshe Zadka wrote: > I'm working on it -- I'll have a patch ready as soon as my slow > modem will manage to finish the "cvs diff". Guido, I'll > assign it to you, OK? OK, it's 103200. 
Unfortunately, I couldn't assign it to Guido, since I couldn't upload it at all (yeah, still those lynx problems). This time I managed to get one specific person to upload for me, but someone else will have to assign to Guido. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From nas at arctrix.com Thu Jan 11 12:42:51 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 11 Jan 2001 03:42:51 -0800 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: ; from akuchlin@mems-exchange.org on Thu, Jan 11, 2001 at 11:14:00AM -0500 References: Message-ID: <20010111034251.A23512@glacier.fnational.com> Here is what I get on my Debian Linux machine: _codecs.so cPickle.so imageop.so pwd.so termios.so _curses.so cStringIO.so linuxaudiodev.so regex.so time.so _curses_panel.so cmath.so math.so resource.so timing.so _locale.so crypt.so md5.so rgbimg.so ucnhash.so _socket.so dbm.so mmap.so rotor.so unicodedata.so _tkinter.so errno.so new.so select.so zlib.so array.so fcntl.so nis.so sha.so audioop.so fpectl.so operator.so signal.so binascii.so gdbm.so parser.so strop.so bsddb.so grp.so pcre.so syslog.so I think that is every module which can be compiled on my machine. Great work Andrew (and the distutil developers). Neil From nas at arctrix.com Thu Jan 11 12:47:09 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 11 Jan 2001 03:47:09 -0800 Subject: [Python-Dev] dir()/__all__/etc In-Reply-To: <14941.59707.632995.224116@beluga.mojam.com>; from skip@mojam.com on Thu, Jan 11, 2001 at 11:11:23AM -0600 References: <14941.59707.632995.224116@beluga.mojam.com> Message-ID: <20010111034709.C23512@glacier.fnational.com> I'm -1 on making dir() pay attention to __all__. I'm +1 on adding a help() function which pays attention to __all__ and (optionally?) prints doc strings. 
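
A help()-style listing that pays attention to __all__ and prints the first line of each doc string could look something like the sketch below. It is a guess at the idea only, assuming Python 3; `summarize` and `demo` are hypothetical names, and the "(no description)" fallback merely mimics the wording of the pydoc output shown earlier in the thread.

```python
import types


def summarize(module):
    """Return 'name - first doc string line' for each exported name."""
    names = getattr(module, "__all__", None)
    if names is None:
        # No __all__: fall back to every name without a leading underscore.
        names = [name for name in dir(module) if not name.startswith("_")]
    lines = []
    for name in sorted(names):
        doc = (getattr(getattr(module, name), "__doc__", None) or "").strip()
        first_line = doc.splitlines()[0] if doc else "(no description)"
        lines.append("%s - %s" % (name, first_line))
    return lines


def spam():
    """Serve spam.

    The longer description is not shown in the summary.
    """


def eggs():
    pass


def _helper():
    """Internal helper, deliberately left out of __all__."""


# A throwaway module object that exports only spam and eggs.
demo = types.ModuleType("demo")
demo.spam, demo.eggs, demo._helper = spam, eggs, _helper
demo.__all__ = ["spam", "eggs"]

for line in summarize(demo):
    print(line)
```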
Neil From gstein at lyra.org Thu Jan 11 20:38:50 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 11 Jan 2001 11:38:50 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101111558.KAA15447@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 11, 2001 at 10:58:55AM -0500 References: <200101111558.KAA15447@cj20424-a.reston1.va.home.com> Message-ID: <20010111113850.F4640@lyra.org> On Thu, Jan 11, 2001 at 10:58:55AM -0500, Guido van Rossum wrote: > > Please don't use __all__. At the moment, __all__ is the only way > > to easily tell whether a particular module object really represents > > a package, and the only way to get the list of submodule names. > > > > If __all__ is overloaded to also represent exportable symbols in > > modules, these two pieces of information will be impossible (or > > require much ugly hackery) to obtain. > > Marc-Andre already explained that __all__ is not to be trusted. > > If you want a reasonably good test for package-ness, use the presence > of __path__. > > For a really good test, check whether __file__ ends in __init__.py[c]. Even that isn't safe: if the module was pulled from an archive, __file__ might not get set. Determining whether something is a package is highly dependent upon how it was brought into the system. It is entirely possible that you *can't* know something represents a package. You can get close by looking in sys.modules to look for modules "below" the given module. But if none have been imported yet, then you're out of luck. If you're using imputil, then you can look for __ispkg__ in the module.
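
The package-ness tests mentioned in this thread (the presence of __path__, a __file__ that ends in __init__.py[c], and modules "below" the given one in sys.modules) can be put side by side in a rough sketch. It assumes Python 3 and is a heuristic only, with all the caveats raised above; `looks_like_package`, `loaded_submodules` and `fake` are hypothetical names chosen for the example.

```python
import os
import string
import sys
import types
import xml.dom  # binds the name "xml" and loads the xml.dom subpackage


def looks_like_package(module):
    """Heuristic only: True if the module carries the usual package markers."""
    if hasattr(module, "__path__"):
        return True
    filename = getattr(module, "__file__", None)
    if filename:
        # Matches __init__.py, __init__.pyc and similar spellings.
        stem = os.path.splitext(os.path.basename(filename))[0]
        return stem == "__init__"
    return False


def loaded_submodules(module):
    """Names already in sys.modules that sit below the given module."""
    prefix = module.__name__ + "."
    return sorted(name for name in list(sys.modules) if name.startswith(prefix))


# User code can set __path__ to anything, so the first test can be fooled.
fake = types.ModuleType("fake")
fake.__path__ = "neener, neener"

print(looks_like_package(xml))     # a real package
print(looks_like_package(string))  # a plain module
print(looks_like_package(fake))    # not a package, yet it passes the test
print(loaded_submodules(xml))
```

As the `fake` module shows, a positive answer is a hint and nothing more; only a negative answer is reasonably trustworthy.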
Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas at xs4all.net Thu Jan 11 20:50:24 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 20:50:24 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Fri, Jan 12, 2001 at 04:09:13AM +0200 References: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il> Message-ID: <20010111205024.Z2467@xs4all.nl> On Fri, Jan 12, 2001 at 04:09:13AM +0200, Moshe Zadka wrote: > Why not __init__? It has to be there, and is in no other module object. Wrong association... __init__ would be a method that gets executed. (At least that's what I'd expect :) 'sides,-everyone-was-in-agreement-on-__all__-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From MarkH at ActiveState.com Thu Jan 11 21:25:30 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Thu, 11 Jan 2001 12:25:30 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> Message-ID: > It was more convenient that way. And I'm pretty certain the _DECREF > macros do not change their arguments. Pretty certain??? That doesn't inspire confidence . How certain are you that this will be true in the future? I think it bad style indeed - for example, I could see benefit in having DECREF (or _Py_Dealloc, called by decref) set the object to NULL in debug builds. What if that decision is taken in the future? I thought rules were pretty clear with reference counting - dont assume _anything_ about the object unless you hold a reference (or are damn sure someone else does!) Mark. 
From thomas at xs4all.net Thu Jan 11 22:41:57 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 11 Jan 2001 22:41:57 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: ; from MarkH@ActiveState.com on Thu, Jan 11, 2001 at 12:25:30PM -0800 References: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> Message-ID: <20010111224157.A2467@xs4all.nl> On Thu, Jan 11, 2001 at 12:25:30PM -0800, Mark Hammond wrote: > I thought rules were pretty clear with reference counting - dont assume > _anything_ about the object unless you hold a reference (or are damn sure > someone else does!) Moshe isn't breaking that rule. He isn't assuming anything about the object, just about the value of the pointer to that object. I agree, though, that it's bad practice to rely on it having the old value, after DECREFing it. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Thu Jan 11 22:48:46 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 16:48:46 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Thu, 11 Jan 2001 08:42:44 PST." References: Message-ID: <200101112148.QAA16227@cj20424-a.reston1.va.home.com> > Sorry, you're right. I retract my comment about __all__. Can you explain *why* you wanted to test for package-ness? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jan 11 22:55:24 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Jan 2001 16:55:24 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Thu, 11 Jan 2001 11:14:00 EST." References: Message-ID: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> > I've put a new version of the setup.py script at > http://www.mems-exchange.org/software/files/python/setup.py > > (I'm at work and can't remember the password to get into > www.amk.ca. 
:) ) > > This version improves the detection of Tcl/Tk, handles the > _curses_panel module, and doesn't do a chdir(). Same drill as before: > just grab the script, drop it in the root of your Python source tree > (2.0 or current CVS), run "./python setup.py build", and look at the > modules it compiles. I can try it on Linux, so I'm most interested in > hearing reports for other Unix versions (*BSD, HP-UX, etc.) Good work -- but I still can't run this inside a platform-specific subdirectory. Are you planning on supporting this? --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Thu Jan 11 22:20:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 11 Jan 2001 22:20:45 +0100 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) Message-ID: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de> > I would very much appreciate your feedback At first glance, it looks *very* promising. I really look forward to seeing it in 2.1. However, robustness probably needs to be improved: >>> help() Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: not enough arguments to help(); expected 1, got 0 Wasn't there even a proposal that >>> help should do something meaningful (by implementing __repr__)? >>> import string >>> help(string) Traceback (most recent call last): File "<stdin>", line 1, in ?
File "pydoc.py", line 183, in help pager('Help on %s:\n\n' % desc + textdoc.document(thing)) File "./textdoc.py", line 171, in document if inspect.ismodule(object): results = document_module(object) File "./textdoc.py", line 87, in document_module if (inspect.getmodule(value) or object) is object: File "./inspect.py", line 190, in getmodule file = getsourcefile(object) File "./inspect.py", line 204, in getsourcefile filename = getfile(object) File "./inspect.py", line 172, in getfile raise TypeError, 'arg is a built-in class' TypeError: arg is a built-in class Also, the tools could use some command line options: martin at mira:~/pydoc > ./pydoc.py --help Traceback (most recent call last): File "./pydoc.py", line 190, in ? opts[args[i][1:]] = args[i+1] IndexError: list index out of range At a minimum, I propose -h, --help, -v, -V. Regards, Martin From fdrake at acm.org Thu Jan 11 23:11:24 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 11 Jan 2001 17:11:24 -0500 (EST) Subject: [Python-Dev] [PEP 205] Weak References PEP updated, patch available! Message-ID: <14942.12172.129547.770776@cj42289-a.reston1.va.home.com> I've updated the Weak References PEP a little: http://python.sourceforge.net/peps/pep-0205.html A preliminary version of the implementation and documentation is available as well: http://sourceforge.net/patch/?func=detailpatch&patch_id=103203&group_id=5470 Please send feedback on the PEP or implementation to me. Thanks! -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From akuchlin at mems-exchange.org Thu Jan 11 23:26:33 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 11 Jan 2001 17:26:33 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: <200101112155.QAA16678@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 11, 2001 at 04:55:24PM -0500 References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> Message-ID: <20010111172633.A26249@kronos.cnri.reston.va.us> On Thu, Jan 11, 2001 at 04:55:24PM -0500, Guido van Rossum wrote: >Good work -- but I still can't run this inside a platform-specific >subdirectory. Are you planning on supporting this? I didn't really understand this when you pointed it out, but forgot to ask for clarification. What does your directory layout look like? --amk From ping at lfw.org Thu Jan 11 23:26:53 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 14:26:53 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de> Message-ID: On Thu, 11 Jan 2001, Martin v. Loewis wrote: > > However, robustness probably needs to be improved: Agreed. > Wasn't there even a proposal that > > >>> help > > should do something meaningful (by implementing __repr__)? There was. I am planning to incorporate Paul Prescod's mechanism for doing this; i just didn't have time to throw in that feature yet, and wanted feedback on the man-like stuff first. My next two targets are: 1. Generating text from the HTML documentation files using Paul Prescod's stuff in onlinehelp.py. 2. Running a background HTTP server that produces its pages using htmldoc.py. Both are pieces we already have and only need to integrate; i just wanted to get at least a working candidate done first. Did using pydoc like "man" work okay for you? > >>> import string > >>> help(string) > Traceback (most recent call last): ... 
> TypeError: arg is a built-in class Mine doesn't do this for me. I think i may have left up an older version of inspect.py by mistake. Try downloading http://www.lfw.org/python/inspect.py again -- apologies for the hassle. > Also, the tools could use some command line options: > > martin at mira:~/pydoc > ./pydoc.py --help > Traceback (most recent call last): > File "./pydoc.py", line 190, in ? > opts[args[i][1:]] = args[i+1] > IndexError: list index out of range > > At a minimum, I propose -h, --help, -v, -V. Okay. There is usage help already; i just failed to make it sufficiently robust about deciding when to show it. skuld[1010]% pydoc /home/ping/bin/pydoc ... Show documentation on something. may be the name of a Python function, module, package, or a dotted reference to a class or function within a module or module in a package. /home/ping/bin/pydoc -k Search for a keyword in the short descriptions of modules. -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. Eric Drexler From ping at lfw.org Thu Jan 11 23:28:44 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 14:28:44 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101112148.QAA16227@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Guido van Rossum wrote: > > Sorry, you're right. I retract my comment about __all__. > > Can you explain *why* you wanted to test for package-ness? Auto-generating documentation. pydoc.py currently tests for __path__, and looks for the presence of __init__.py in a subdirectory to mean that the subdirectory name is a package name. Is it safe on all platforms to just list all .py files in the subdirectory to get all submodules? -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. 
Eric Drexler From tim.one at home.com Fri Jan 12 00:17:06 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 18:17:06 -0500 Subject: [Python-Dev] RE: Baffled on Windows In-Reply-To: Message-ID: [Mark Hammond] > winsound adds "/export:initwinsound" to the link line. This is an > alternative to __declspec in the sources. Yup/arghghghgh. It's fixed now. Thanks! > This all gets back to a discussion we had here nearly a year > or so ago - Yup/arghghghgh. . > that "DL_EXPORT" isnt capturing our semantics, and that we should > probably create #defines that match the _intent_ of the > definition, rather than the implementation details - ie, replace > DL_EXPORT with (say) PY_API_DECL and PY_MODULEINIT_DECL or some > such. Yup/noarghghghgh. > I'm happy to think about this and help implement it if the time > is now right... Same here. Now how can we tell whether the time is right? I must say, it hasn't gotten better by leaving it alone for a year. I think we need a Unix dweeb to play along, though -- if only to confirm that their compilers are no help. >> Any Windows geek got a clue? > Isn't that question a paradox? ;-) Well, nobody else will understand this, but *we* know that Windows geeks need more clues than everyone else put together just to get the box booted each day (or hour <0.9 wink>). From michel at digicool.com Fri Jan 12 02:15:52 2001 From: michel at digicool.com (Michel Pelletier) Date: Thu, 11 Jan 2001 20:15:52 -0500 Subject: [Python-Dev] New Draft PEP: Python Interfaces Message-ID: Hello, I have roughed out a draft PEP that proposes the extension of Python to include an interface framework. It is posted online here: http://www.zope.org/Members/michel/InterfacesPEP/PEP.txt This is my first revision and stab at a PEP. I'd like to find out what you think about the PEP and maybe discuss it some more offline on a different list. Thanks! 
-Michel From martin at loewis.home.cs.tu-berlin.de Fri Jan 12 02:15:25 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 12 Jan 2001 02:15:25 +0100 Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: (message from Ka-Ping Yee on Thu, 11 Jan 2001 14:26:53 -0800 (PST)) References: Message-ID: <200101120115.f0C1FPx03702@mira.informatik.hu-berlin.de> > Did using pydoc like "man" work okay for you? Yes, that is very impressive. > Mine doesn't do this for me. I think i may have left up an older version > of inspect.py by mistake. Try downloading > > http://www.lfw.org/python/inspect.py > > again -- apologies for the hassle. No need to apologize. It works fine now. Thanks, Martin From moshez at zadka.site.co.il Fri Jan 12 10:53:35 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 11:53:35 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: References: Message-ID: <20010112095335.E8A15A82D@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001, "Mark Hammond" wrote: > I think it bad style indeed - for example, I could see benefit in having > DECREF (or _Py_Dealloc, called by decref) set the object to NULL in debug > builds. What if that decision is taken in the future? > > I thought rules were pretty clear with reference counting - dont assume > _anything_ about the object unless you hold a reference (or are damn sure > someone else does!) I'm not assuming anything about the object -- I'm assuming something about the pointer. And macros should not change their arguments -- DECREF is basically a wrapper around _Py_Dealloc((PyObject *)(op)). Just like free(pointer); if (pointer == NULL) do_something(); is perfectly legal C. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
From moshez at zadka.site.co.il Fri Jan 12 10:57:32 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 11:57:32 +0200 (IST)
Subject: [Python-Dev] dir()/__all__/etc
In-Reply-To: <14941.59707.632995.224116@beluga.mojam.com>
References: <14941.59707.632995.224116@beluga.mojam.com>
Message-ID: <20010112095732.1F65BA82D@darjeeling.zadka.site.co.il>

On Thu, 11 Jan 2001 11:11:23 -0600 (CST), Skip Montanaro wrote:
>
> I know Guido has said he doesn't want to fiddle with dir(), but my sense of
> things from the overall discussion of the __exports__ concept tells me that
> when used interactively dir() often presents confusing output for new Python
> users.
>
> I twiddled CGIHTTPServer to have __all__ and added the following dir()
> function to my PYTHONSTARTUP file:
>
>     def dir(o,showall=0):
>         if not showall and hasattr(o, "__all__"):
>             x = list(o.__all__)
>             x.sort()
>             return x
>         from __builtin__ import dir as d
>         return d(o)
>
> Compare its output with and without showall set:
>
>     >>> dir(CGIHTTPServer)
>     ['CGIHTTPRequestHandler', 'test']
>     >>> dir(CGIHTTPServer,1)
>     ['BaseHTTPServer', 'CGIHTTPRequestHandler', 'SimpleHTTPServer', '__all__',
>      '__builtins__', '__doc__', '__file__', '__name__', '__version__',
>      'executable', 'nobody', 'nobody_uid', 'os', 'string', 'sys', 'test',
>      'urllib']
>
> I haven't demonstrated any great programming prowess with this little
> function, but I rather suspect it may be beyond most brand new users. If
> Guido can't be convinced to allow dir() to change, how about adding a sample
> PYTHONSTARTUP file to the distribution that contains little bits like this
> and Ping's pydoc.help stuff (assuming it gets into the distro, which I hope
> it does)?

And, while we're at it, the following bit too can be in the PYTHONSTARTUP:

    def display(x):
        import __builtin__
        __builtin__._ = None
        if type(x) == type(''):
            print `x`
        else:
            print x
        __builtin__._ = x

    import sys
    sys.displayhook = display

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses! From tim.one at home.com Fri Jan 12 03:33:59 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 21:33:59 -0500 Subject: [Python-Dev] dir()/__all__/etc In-Reply-To: <20010111034709.C23512@glacier.fnational.com> Message-ID: [Neil Schemenauer] > I'm -1 on making dir() pay attention to __all__. Me too. The original __exports__ idea was an ironclad guarantee about which names were externally visible for *any* purpose. Then it made sense to restrict dir() accordingly. But if __all__ is just "a hint" (to be ignored or honored at whim, by whoever chooses), the introspective uses of dir() must be served too. > I'm +1 on adding a help() function which pays attention to > __all__ and (optionally?) prints doc strings. I can't be +1 on anything that vague -- although I'm +1 on each part of it if done in exactly the way I envision . From ping at lfw.org Fri Jan 12 03:51:54 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 18:51:54 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101120115.f0C1FPx03702@mira.informatik.hu-berlin.de> Message-ID: On Fri, 12 Jan 2001, Martin v. Loewis wrote: > > Did using pydoc like "man" work okay for you? > > Yes, that is very impressive. Good. What platform did you try it on? I have updated the scripts now to provide a very rudimentary HTTP server feature: skuld[1316]% pydoc -p 8080 starting server on port 8080 This starts a server on port 8080 that generates HTML documentation for modules on the fly. The root page (http://localhost:8080/) shows an index of modules -- it badly needs some cleaning up, but at least it provides access to all the documentation. http://www.lfw.org/python/pydoc.py http://www.lfw.org/python/htmldoc.py Also, as you requested: skuld[1324]% pydoc -h /home/ping/bin/pydoc ... Show documentation on something. 
<name> may be the name of a Python function, module, package, or a
    dotted reference to a class or function within a module or module
    in a package.

/home/ping/bin/pydoc -k <keyword>
    Search for a keyword in the short descriptions of modules.

/home/ping/bin/pydoc -p <port>
    Start an HTTP server on the given port on the local machine.

More to come.

-- ?!ng

From fdrake at acm.org Fri Jan 12 04:02:00 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 11 Jan 2001 22:02:00 -0500 (EST)
Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python)
In-Reply-To:
References: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de>
Message-ID: <14942.29609.19618.534613@cj42289-a.reston1.va.home.com>

Ka-Ping Yee writes:
> My next two targets are:
> 1. Generating text from the HTML documentation files
> using Paul Prescod's stuff in onlinehelp.py.

You mean the ones I publish as the standard documentation? Relying on
the structure of that HTML is pure folly! I don't think I can make any
guarantees that the HTML structures won't change as the processing
evolves.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim.one at home.com Fri Jan 12 04:49:47 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 11 Jan 2001 22:49:47 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: <200101111523.KAA14982@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> I don't want to call FLOCKFILE while holding the Python lock, as
> this means that *if* we're blocked in FLOCKFILE (e.g. we're reading
> from a pipe or socket), no other Python thread can run!

Ah, good point! Doesn't appear an essential point, though: the
HAVE_GETC_UNLOCKED code could still be fiddled easily enough to call
FLOCKFILE and FUNLOCKFILE exactly once per line, but with the first
thread release before the (dynamically only) FLOCKFILE and the last
thread grab after the (dynamically only) FUNLOCKFILE. It's just a
question of will, but since that's lacking I'll drop it.

> ...
> I don't think that _filbuf can possibly wait for another thread to > write data to the same stream object. OK, I'll buy that. Dropped too. > ... > OK. It's unique to MS. So close the bug report with a "won't fix" > resolution. There's no point in having bug reports remain open that > we know we can't fix. We don't really have a policy about that. Perhaps you're articulating one here, though! I've always left bugs open if they're (a) bugs, and (b) open . For example, I left the Norton Blue-Screen crash bug open (although I see now you eventually closed that). Ditto the "Rare hangs in w9xpopen.exe" bug (which is still open, but will never be fixed by *us*). Just other examples of things we'll almost certainly never fix ourselves (we have no handle on them, and all evidence says the OS is screwing up). My view has been that if a user comes to the bug site, it's most helpful for them if active (== "still happens") crashes and hangs appear among the open problems. Now that your view of it is clearer, I'll switch to yours. too-easy-ly y'rs - tim From tim.one at home.com Fri Jan 12 05:22:40 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 23:22:40 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101111527.KAA15005@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > The locking prevents concurrent threads accessing the stream. > > But mixing reads and writes (without intervening fseek etc.) is > illegal use of the stream, and the C standard allows them to be lax > here, even if the program was single-threaded. > > In other words: the locking is so good that it serializes the > sequence of reads and writes; but if the sequence of reads and > writes is illegal, they don't guarantee anything. We're never going to agree on this one, you know. My definition of "bug" here has nothing to do with the std: something's "a bug" if it's not functioning as designed. That's all. So if the implementers would say "oops! 
that should not have happened!", then to me it's "a bug". It so happens I believe the MS implementers would consider this to be a bug under that defn. Multi-threaded libraries have to be written to a much higher level than the C std guarantees (been there, done that, and so have you), and this is specifically corruption in a crucial area vulnerable to races. They have a timing hole! That's clear. If the MS implementers don't believe that's "a bug", then I'd say they're too unprofessional to be allowed in the same country as a multithreaded library <0.1 wink>. Your definition of "bug" seems to be more "I don't want it in Python's open bug list, so I'll do what Tim usually does and appeal to the std in a transparent effort to convince someone that it's not really 'a bug' -- then maybe I'll get it off of Python's bug list". I'm sure you'll agree that's a fair summary of both sides . it's-a-bug-and-it's-no-longer-on-python's-open-bug-list-ly y'rs - tim From tim.one at home.com Fri Jan 12 07:54:47 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 12 Jan 2001 01:54:47 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101111508.KAA14870@cj20424-a.reston1.va.home.com> Message-ID: [Tim, on for_xreadlines vs readlines_sizehint, after disabling the default 1Mb buffer size in the latter] > They're indistinguishable then on my box (on one run xreadlines > is .1 seconds (out of around 7.6 total) quicker, on another > readlines_sizehint), *provided* that I specify the same buffer > size (8192) that xreadlines uses internally. However, if I even > double that, readlines_sizehint is uniformly about 10% slower. It's > also a tiny bit slower if I cut the sizehint buffer size to 4096. [Guido] > 8192 happens to be the size of the stack-allocated buffer readlines() > uses, and also the stdio BUFSIZ parameter, on many systems. Look for > SMALLCHUNK in fileobject.c. 
>
> Would it make sense to tie the two constants together more to tune
> this optimally even when BUFSIZ is different?

Have to repeat what I first said:

> I'm afraid Mysteries will remain no matter how many
> person-decades we spend staring at this <0.5 wink> ...

I'm repeating that because BUFSIZ is 4096 on WinTel, but SMALLCHUNK
(8192) worked best for me. Now we're in some complex balancing act
among how often the outer loop needs to refill the readlines_sizehint
buffer; how out of whack the latter is with the platform stdio buffer;
whether platform malloc takes only twice as long to allocate space for
2*N strings as for N; and, if the readlines buffer is too large, at
exactly which point the known Win9x eventually-quadratic-time behavior
of PyList_Append starts to kick in. I can't out-think all that. Indeed,
I can't out-think any of it .

After staring at the code, I expect my "only a tiny bit slower" was an
illusion: if 0 < sizehint <= SMALLCHUNK, sizehint appears to have no
effect on the operation on file_readline.

BTW, changing fileobject.c's SMALLCHUNK to a copy of BUFSIZ didn't make
any difference on Windows.

From moshez at zadka.site.co.il Fri Jan 12 17:03:58 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 18:03:58 +0200 (IST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,1.2,1.3
In-Reply-To:
References:
Message-ID: <20010112160358.B0AC0A82D@darjeeling.zadka.site.co.il>

On Thu, 11 Jan 2001, Thomas Wouters wrote:
> Noone but me cares, but Guido said to go ahead and fix it if it bothered me.

I think you meant no one. Noone is an archaic spelling of noon.

quid-pro-quo-ly y'rs, Z.

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
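
[Editorial sketch, not part of the original thread.] For readers following the readlines_sizehint timings in Tim's message above: the loop being timed is assumed here to be the usual "read a batch of lines, then iterate over the batch" idiom. The benchmark script itself is not shown in this thread, so the function name `iter_lines` and the default of 8192 (taken from the buffer size Tim mentions) are illustrative assumptions, and the code is written for a current Python rather than the 2.0-era interpreter under discussion.

```python
import io


def iter_lines(f, sizehint=8192):
    """Yield lines from f, refilling a batch of roughly sizehint bytes at a time.

    Hypothetical helper: it relies only on the standard readlines(hint)
    method of file objects, which stops collecting lines once the total
    size read so far reaches the hint.
    """
    while True:
        batch = f.readlines(sizehint)
        if not batch:
            # An empty batch means end of file.
            break
        for line in batch:
            yield line


if __name__ == "__main__":
    sample = io.StringIO("one\ntwo\nthree\n")
    for line in iter_lines(sample, sizehint=4):
        print(line, end="")
```

The outer loop runs once per refill, which is why the choice of sizehint interacts with the stdio buffer size and allocator behavior in the way Tim describes.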
From fredrik at effbot.org Fri Jan 12 09:17:11 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 12 Jan 2001 09:17:11 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,1.2,1.3 References: <20010112160358.B0AC0A82D@darjeeling.zadka.site.co.il> Message-ID: <012a01c07c70$11aac700$e46940d5@hagrid> > > Noone but me cares, but Guido said to go ahead and fix it if it bothered me. > > I think you meant no one. Noone is an archaic spelling of noon. no, he meant me. I care. From martin at loewis.home.cs.tu-berlin.de Fri Jan 12 09:09:00 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 12 Jan 2001 09:09:00 +0100 Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: (message from Ka-Ping Yee on Thu, 11 Jan 2001 18:51:54 -0800 (PST)) References: Message-ID: <200101120809.f0C890B00802@mira.informatik.hu-berlin.de> > Good. What platform did you try it on? Linux, in a Konsole. I guess that is an environment you'd been using as well :-) Martin From jack at oratrix.nl Fri Jan 12 10:57:27 2001 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 12 Jan 2001 10:57:27 +0100 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message by Ka-Ping Yee , Thu, 11 Jan 2001 08:36:36 -0800 (PST) , Message-ID: <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> > I'm pleased to announce a reasonable first pass at a documentation > utility for interactive use. "pydoc" is usable in three ways: [...] > I would very much appreciate your feedback, especially from testing > on non-Unix platforms. Thank you! Wow, I'm impressed! To make it run on the mac I had to add tests for the existence of os.system only. (So all statements "if os.system(...) > 0:" got to be "if hasattr(os, "system") and os.system(...) > 0:"). There are however various other niceties that could be added to make it more useful, can this be put into the repository or something? 
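
[Editorial sketch, not part of the original thread.] The guard Jack describes, testing for os.system before calling it, might look like the following. The helper name `command_failed` and the `os_module` parameter are hypothetical and exist only to make the pattern easy to try out; pydoc's actual code is not reproduced here.

```python
import os


def command_failed(command, os_module=os):
    """Return True only if os_module has system() and the command reported failure.

    On a platform whose os module lacks system() (as Jack reports for the
    Mac), the hasattr test short-circuits and nothing is executed.
    """
    return hasattr(os_module, "system") and os_module.system(command) > 0
```

Passing the module in as a parameter is purely a convenience for experimenting with a stand-in object that lacks system(); in real code the guard would sit inline, exactly as in Jack's quoted condition.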
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein at lyra.org Fri Jan 12 11:31:53 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 12 Jan 2001 02:31:53 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: <20010111224157.A2467@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 11, 2001 at 10:41:57PM +0100 References: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> <20010111224157.A2467@xs4all.nl> Message-ID: <20010112023153.Q4640@lyra.org> On Thu, Jan 11, 2001 at 10:41:57PM +0100, Thomas Wouters wrote: > On Thu, Jan 11, 2001 at 12:25:30PM -0800, Mark Hammond wrote: > > > I thought rules were pretty clear with reference counting - dont assume > > _anything_ about the object unless you hold a reference (or are damn sure > > someone else does!) > > Moshe isn't breaking that rule. He isn't assuming anything about the object, > just about the value of the pointer to that object. I agree, though, that > it's bad practice to rely on it having the old value, after DECREFing it. Oh, that is just so much baloney. If I said Py_DECREF(&ptr), *then* I'd be worried. But if I ever call Py_DECREF(foo) and it modifies foo, then I'd be quite upset. "functions" just aren't supposed to do that. -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Jan 12 14:51:51 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 08:51:51 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Thu, 11 Jan 2001 17:26:33 EST." 
<20010111172633.A26249@kronos.cnri.reston.va.us> References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> Message-ID: <200101121351.IAA19676@cj20424-a.reston1.va.home.com> > >Good work -- but I still can't run this inside a platform-specific > >subdirectory. Are you planning on supporting this? > > I didn't really understand this when you pointed it out, but forgot to > ask for clarification. What does your directory layout look like? Ah. It's very simple. I create a directory "linux" as a subdirectory of the Python source tree (i.e. at the same level as Lib, Objects, etc.). Then I chdir into that directory, and I say "../configure". The configure script creates subdirectories to hold the object files for me: Grammar, Parser, Objects, Python, Modules, and sticks Makefiles in them. The "srcdir" variable in the Makefiles is set to "..". Then I say "make" and it builds Python. The source directories are used but no files are created or modified there: all files are created in the "linux" directory. This lets me have several separate configurations: the feature used to be intended for sharing a source tree between multiple platforms, but now I use it to have threaded, nonthreaded, debugging, and regular builds under a single source tree. This also works where the build directory is completely outside the source tree (some people apparently mount the source tree read-only). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 12 14:54:12 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 08:54:12 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Thu, 11 Jan 2001 14:28:44 PST." References: Message-ID: <200101121354.IAA19700@cj20424-a.reston1.va.home.com> > > Can you explain *why* you wanted to test for package-ness? > > Auto-generating documentation. 
pydoc.py currently tests for __path__,
> and looks for the presence of __init__.py in a subdirectory to mean
> that the subdirectory name is a package name. Is it safe on all platforms
> to just list all .py files in the subdirectory to get all submodules?

Yes, that should work. Of course there could also be extension modules
or .pyc-only files there -- you could use imp.get_suffixes() to find
out all modules (even if that means you don't always have the source
code available).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Jan 12 15:07:30 2001
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Jan 2001 09:07:30 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Thu, 11 Jan 2001 22:49:47 EST."
References:
Message-ID: <200101121407.JAA19781@cj20424-a.reston1.va.home.com>

> [Guido]
> > I don't want to call FLOCKFILE while holding the Python lock, as
> > this means that *if* we're blocked in FLOCKFILE (e.g. we're reading
> > from a pipe or socket), no other Python thread can run!

[Tim]
> Ah, good point! Doesn't appear an essential point, though: the
> HAVE_GETC_UNLOCKED code could still be fiddled easily enough to call
> FLOCKFILE and FUNLOCKFILE exactly once per line, but with the first thread
> release before the (dynamically only) FLOCKFILE and the last thread grab
> after the (dynamically only) FUNLOCKFILE. It's just a question of will, but
> since that's lacking I'll drop it.

Yes, but if the line is very long, you'd have to use malloc() -- you
can't use _PyString_Resize() since that can access the thread state.
You're right that I don't want to do this.

> > OK. It's unique to MS. So close the bug report with a "won't fix"
> > resolution. There's no point in having bug reports remain open that
> > we know we can't fix.
>
> We don't really have a policy about that. Perhaps you're articulating one
> here, though!
I've always left bugs open if they're (a) bugs, and (b) open > . For example, I left the Norton Blue-Screen crash bug open (although > I see now you eventually closed that). Ditto the "Rare hangs in > w9xpopen.exe" bug (which is still open, but will never be fixed by *us*). > Just other examples of things we'll almost certainly never fix ourselves (we > have no handle on them, and all evidence says the OS is screwing up). Yes, as I was thinking about this I realized that that was the policy I wanted. So, yes, the w9xpopen popen bug can be closed as WontFix too. > My view has been that if a user comes to the bug site, it's most helpful for > them if active (== "still happens") crashes and hangs appear among the open > problems. Now that your view of it is clearer, I'll switch to yours. I find it more important that the bug list gives us developers an overview of tasks to be tackled. The problems that won't go away can be listed in the Python 2.0 MoinMoin web! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 12 15:27:43 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 09:27:43 -0500 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Your message of "Fri, 12 Jan 2001 10:57:27 +0100." <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> References: <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> Message-ID: <200101121427.JAA20034@cj20424-a.reston1.va.home.com> > There are however various other niceties that could be added to make it more > useful, can this be put into the repository or something? Ping, do you think you could check this in into the nondist tree? nondist/sandbox/help would seem a good name (next to Paul's nondist/sandbox/doctools). 
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Fri Jan 12 17:37:57 2001 From: skip at mojam.com (Skip Montanaro) Date: Fri, 12 Jan 2001 10:37:57 -0600 (CST) Subject: [Python-Dev] [Patch #103154] Cygwin Check Import Case Patch In-Reply-To: References: Message-ID: <14943.13029.103771.261362@beluga.mojam.com> Guido> Summary: Cygwin Check Import Case Patch ... Guido> But I believe the solution is that the TERMIOS module should be Guido> renamed. Isn't this a general problem? As I recall, the convention when generating Python modules from C header files is to simply convert the base name to upper case and replace ".h" with ".py" (errno.h -> ERRNO.py). From h2py.py: # Without filename arguments, acts as a filter. # If one or more filenames are given, output is written to corresponding # filenames in the local directory, translated to all uppercase, with # the extension replaced by ".py". Perhaps the convention should be instead to append "d" or "data" to the base name (errno.h -> errnodata.py). Skip From guido at python.org Fri Jan 12 18:47:46 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 12:47:46 -0500 Subject: [Python-Dev] [Patch #103154] Cygwin Check Import Case Patch In-Reply-To: Your message of "Fri, 12 Jan 2001 10:37:57 CST." <14943.13029.103771.261362@beluga.mojam.com> References: <14943.13029.103771.261362@beluga.mojam.com> Message-ID: <200101121747.MAA27504@cj20424-a.reston1.va.home.com> > Guido> Summary: Cygwin Check Import Case Patch > ... > Guido> But I believe the solution is that the TERMIOS module should be > Guido> renamed. > > Isn't this a general problem? As I recall, the convention when generating > Python modules from C header files is to simply convert the base name to > upper case and replace ".h" with ".py" (errno.h -> ERRNO.py). From h2py.py: > > # Without filename arguments, acts as a filter. 
> # If one or more filenames are given, output is written to corresponding > # filenames in the local directory, translated to all uppercase, with > # the extension replaced by ".py". > > Perhaps the convention should be instead to append "d" or "data" to the base > name (errno.h -> errnodata.py). An even better solution is to get rid of those generated headers and incorporate the desired symbols directly in the C extension modules. That's happened for errno and socket, for example; maybe it's time to do that for termios, too! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jan 12 19:54:47 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 12 Jan 2001 13:54:47 -0500 Subject: [Python-Dev] Patch 103216 - dbmmodule Setup changes Message-ID: <14943.21239.382891.661026@anthem.wooz.org> I've just uploaded patch 103216 to the Python project at SF. This does a couple of things. First, it auto-detects (in configure) whether dbmmodule can be built, and if so whether the -lndbm library needs to be specified. Second, it moves the entry for dbmmodule to Setup.conf, after the *shared* key so that it'll be built as a dynamic library by default. This should fix the problem where compiling in dbmmodule sets up a dependency to libdb which later hoses pybsddb3. I'd have just checked it in, but I'd like someone else to just proof it first. I've only tested this with the current CVS tree on a fairly stock RH6.1. BTW, I didn't include the changes to configure in the patch, because it's large and made SF's patch manager cough. Besides it can be generated from configure.in and config.h.in which are included in the patch. Cheers, -Barry From martin at loewis.home.cs.tu-berlin.de Fri Jan 12 23:19:57 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Fri, 12 Jan 2001 23:19:57 +0100
Subject: [Python-Dev] PEP 205 comments
Message-ID: <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de>

Before commenting on the patch itself, I'd like to comment on the PEP
describing it.

I'm missing a discussion as to why weak references don't act as proxies
(or why they do now). A weak proxy would provide the same attributes as
the object which it encapsulates, so it could be used transparently in
place of the original object. I can think of a number of reasons why it
is not done this way (e.g. complete transparency is impossible to
achieve); now that a revision of the patch provides proxies, the
documentation should state which features are forwarded to the proxy
and which aren't (it lists the type() as a difference, but I doubt that
is the only difference - repr is also different).

Next, I wonder whether weakref.new is allowed to return an existing
weak reference to the same object. If that is not acceptable, I'd like
to know why - if it was acceptable, then weakref.new(instance) (i.e.
without callback) could return the same weak reference all the time. A
smart implementation might choose to put the weak reference with no
callback at the start of the list, so creation of additional weak
references to the same object would be inexpensive.

Likewise, I'd like to know the rationale for the clear method. Why is
it desirable to drop the object, yet keep the weak reference? Isn't it
easier for the application to either ignore clearing altogether, or to
drop the reference to the weak reference? So I'd propose to kill the
clear method.

Again on proxies, there is no discussion or documentation of the
ReferenceError. Why is it a RuntimeError? LookupError, ValueError, and
AttributeError seem to be just as fine or better.

On to the type extensions: Should there be a type flag indicating
presence of tp_weaklistoffset?
It appears that the type structure had tp_xxx7 for a long time, so
likely all in-use binary modules have that field set to zero. Is that
sufficient?

Thanks for reading all of this message,
Martin

From skip at mojam.com Sat Jan 13 16:37:55 2001
From: skip at mojam.com (Skip Montanaro)
Date: Sat, 13 Jan 2001 09:37:55 -0600 (CST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tempfile.py,1.23,1.24
In-Reply-To:
References:
Message-ID: <14944.30291.658931.489979@beluga.mojam.com>

Tim> On Linux, someone please run that standalone with more files and/or
Tim> more threads; e.g.,
Tim> python lib/test/test_threadedtempfile.py -f 1000 -t 10
Tim> to run with 10 threads each creating (and deleting) 1000 temp files.

After capitalizing "Lib", it worked fine for me:

    % ./python Lib/test/test_threadedtempfile.py -f 1000 -t 10
    Creating
    Starting
    Reaping
    Done: errors 0 ok 10000

Skip

From dkwolfe at pacbell.net Sat Jan 13 19:48:21 2001
From: dkwolfe at pacbell.net (Dan Wolfe)
Date: Sat, 13 Jan 2001 10:48:21 -0800
Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore
Message-ID: <0G740027Q6Q1KL@mta6.snfc21.pbi.net>

Howdy Folks,

I need some help here. I'd like to see Python build out of the box with
a ./configure, make, make test, and make install on Darwin and Mac OS
X. Having it build out of the box will make it easier to be
incorporated into both Darwin and the base Mac OS X distribution -
although not for the initial release of the latter but definitely
doable for subsequent releases.

In order to do this, I need to have it build cleanly on HFS and UFS
filesystems. Under an HFS system, I've got a name conflict due to case
insensitivity between the build target and the "Python" directory that
forces me to build with a --with-suffix command on HFS and manually
change the name after install - which is an automatic knockout factor
when it comes to incorporating it in an automatic build system.
Not to mention a problem with unix newbies trying to build from
source...

Last night, I did some quick investigation to determine the best way to
fix this problem as documented in PEP-42 in the build section and
Sourceforge bug 122215 and determined that the easiest and least error
prone way was to change the directory name Python to PyCore.

It's apparent from the comments that I'm missing something here as the
reaction has been negative so far - to the point where Guido has
rejected the patch. Can someone explain what I'm missing that's causing
such strong feelings?

My second question is how do I resolve the name conflict in an approved
way? It's been suggested that a build directory be created (/src/build
?) and that the target be placed here. The problem that I had with this
suggestion is that it would require an additional layer to execute the
target and I wasn't sure what impact it would have on running python
from a new directory... which is the reason I took the more known path.
:-)

Bottom line, come March 24th, Mac OS X 1.0 will be released and as of
July 2001 all Macintoshes will come with Mac OS X. I'd like to see
Python be easily built "out of the box" on these machines - rather than
come with a haphazard list of instructions or commands as currently
needed for 1.5.2 and 2.0 releases. And hopefully, at some point be
incorporated into the base Mac OS X installation...

- Dan Wolfe

From esr at thyrsus.com Sat Jan 13 21:23:50 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Sat, 13 Jan 2001 15:23:50 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
Message-ID: <20010113152350.A17338@thyrsus.com>

I have a new goodie for the 2.1 standard library, a module called
"simil" that supports computation of similarity indices between strings
such as one might use for recovery-matching of misspellings against a
dictionary.
The three methods supported are stemming, normalized Hamming similarity, and (the star of the show) Ratcliff-Obershelp gestalt subpattern matching. The latter is spookily effective for detecting not just substitution typos but insertions and deletions. The module is a C extension (my first!) for speed and because the Ratcliff-Obershelp implementation uses pointer arithmetic heavily. It's documented, tested, and ready to go. But having written it, I now have a question: why is soundex marked obsolete? Is there something wrong with the algorithm or implementation? If not, then it would be natural for simil to absorb the existing soundex implementation as a fourth entry point. -- Eric S. Raymond Whether the authorities be invaders or merely local tyrants, the effect of such [gun control] laws is to place the individual at the mercy of the state, unable to resist. -- Robert Anson Heinlein, 1949 -- Eric S. Raymond Americans have the right and advantage of being armed - unlike the citizens of other countries whose governments are afraid to trust the people with arms. -- James Madison, The Federalist Papers From tim.one at home.com Sat Jan 13 22:34:10 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 16:34:10 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113152350.A17338@thyrsus.com> Message-ID: [Eric S. Raymond] > I have a new goodie for the 2.1 standard library, a module called > "simil" that supports computation of similarity indices between > strings such as one might use for recovery-matching of misspellings > against a dictionary. My guess is that Guido won't accept it. > The three methods supported are stemming, normalized Hamming > similarity, and (the star of the show) Ratcliff-Obershelp gestalt > subpattern matching. The latter is spookily effective for detecting > not just substitution typos but insertions and deletions. The module is > a C extension (my first!)
for speed and because the Ratcliff-Obershelp > implementation uses pointer arithmetic heavily. Never heard of R-O, so tracked down some C code via google. It appears I invented the same algorithm at Cray Research in the early 80's for a diff generator, which later got reincarnated in my ndiff.py (in the Tools/scripts/ directory). ndiff generates "human-friendly" diffs between text files, at both the "file is a sequence of lines" and "line is a sequence of characters" levels. I didn't have the hyperbolic marketing genius to call it "gestalt subpattern matching", though -- I thought of it as what Unix diff *would* do if it constrained itself to matching *contiguous* subsequences, and under the theory people would find that more natural because contiguity is something the human visual system naturally latches on to. ndiff can be spookily natural in practice too. > It's documented, tested, and ready to go. But having written it, I > now have a question: why is soundex marked obsolete? Is there > something wrong with the algorithm or implementation? What is the soundex algorithm? Not joking. Skip Montanaro and I were unable to find the algorithm implemented by soundex.c anywhere in the literature, and I never found *any* two definitions that were the same. Even Knuth changed his description of Soundex between editions 2 and 3 of volume 3. Skip eventually merged my and Fred Drake's Python implementations of Knuth Vol 3 Ed 3 Soundex (see the Vaults of Parnassus). > If not, then it would be natural for simil to absorb the existing > soundex implementation as a fourth entry point. Well, soundex.c doesn't match any other Soundex on earth, so it's not worth reproducing in new code. Guido doesn't want to be in the middle of fighting over ill-defined algorithms, so booted Soundex entirely. Another candidate for inclusion is the NYSIIS algorithm, which is probably in more "serious" use than Soundex anyway. 
Same thing with NYSIIS, though (i.e., what-- exactly --is "the NYSIIS algorithm"?), except that Knuth didn't do us the favor of making up his own variation that will *become* "the std" via force of reputation. Sean True implemented *a* NYSIIS in Python (and again see the Vaults for a link to that). So that's why the module is unlikely to make it into the core: + There are any number of algorithms people may want to see (I don't know what "normalized Hamming similarity" means, but if it's not the same as Levenshtein edit distance then add the latter to the pot too). + Each algorithm on its own is likely controversial. + Computing string similarity is something few apps need anyway. Lots of hassle + little demand == not a natural for the core. ndiff is in the core only because many people found the *app* useful; its SequenceMatcher class isn't even advertised. may-never-understand-how-bigints-got-into-python-ly y'rs - tim From fdrake at acm.org Sat Jan 13 22:45:12 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 13 Jan 2001 16:45:12 -0500 (EST) Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: References: <20010113152350.A17338@thyrsus.com> Message-ID: <14944.52328.558763.46161@cj42289-a.reston1.va.home.com> Tim Peters writes: > + Computing string similarity is something few apps need anyway. And this is a biggie. > Lots of hassle + little demand == not a natural for the core. ndiff is in But it *is* an excellent type of thing to have around -- Eric: just post it on your Web site and register it with the Vaults. > the core only because many people found the *app* useful; its > SequenceMatcher class isn't even advertised. Did you ever write documentation for it? ;-) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From nas at arctrix.com Sat Jan 13 16:17:58 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sat, 13 Jan 2001 07:17:58 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Sat, Jan 13, 2001 at 02:06:07PM -0800 References: Message-ID: <20010113071758.C28643@glacier.fnational.com> [Guido van Rossum on Demo/embed/loop] > (Except it still leaks, but that's probably a separate issue.) Could this be caused by modules adding things to their dict and then forgetting to decref them? I know I've been guilty of that. Neil From esr at thyrsus.com Sat Jan 13 23:15:28 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sat, 13 Jan 2001 17:15:28 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 04:34:10PM -0500 References: <20010113152350.A17338@thyrsus.com> Message-ID: <20010113171528.A17480@thyrsus.com> OK, now I understand why soundex isn't in the core -- there's no canonical version. Tim Peters : > + There are any number of algorithms people may want to see (I don't know > what "normalized Hamming similarity" means, but if it's not the same as > Levenshtein edit distance then add the latter to the pot too). Normalized Hamming similarity: it's an inversion of Hamming distance -- number of pairwise matches in two strings of the same length, divided by the common string length. Gives a measure in [0.0, 1.0]. I've looked up "Levenshtein edit distance" and you're right. I'll add it as a fourth entry point as soon as I can find C source to crib. (Would you happen to have a pointer?) > + Each algorithm on its own is likely controversial. Not these. There *are* canonical versions of all these, and exact equivalents are all heavily used in commercial OCR software. > + Computing string similarity is something few apps need anyway. Tim, this isn't true.
Any time you need to validate user input against a controlled vocabulary and give feedback on probable right choices, R/O similarity is *very* useful. I've had it in my personal toolkit for a decade and used it heavily for this -- you take your unknown input, check it against a dictionary and kick "maybe you meant foo?" to the user for every foo with an R/O similarity above 0.6 or so. The effects look like black magic. Users love it. -- Eric S. Raymond "I hold it, that a little rebellion, now and then, is a good thing, and as necessary in the political world as storms in the physical." -- Thomas Jefferson, Letter to James Madison, January 30, 1787 From guido at python.org Sat Jan 13 23:25:12 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Jan 2001 17:25:12 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: Your message of "Sat, 13 Jan 2001 07:17:58 PST." <20010113071758.C28643@glacier.fnational.com> References: <20010113071758.C28643@glacier.fnational.com> Message-ID: <200101132225.RAA03197@cj20424-a.reston1.va.home.com> > [Guido van Rossum on Demo/embed/loop] > > (Except it still leaks, but that's probably a separate issue.) > > Could this be caused by modules adding things to their dict and > then forgetting to decref them? I know I've been guilty of that. Do you have a tool that detects leaks? Barry has one: Insure++. It's expensive and we don't have a site license, so I'll ask Barry to investigate this. (Barry: go to Demo/embed and do "make looptest". Then in another shell window use "top" to watch the "loop" process grow slowly. I'd love to find out what's the problem here. It's not dependent on what you ask it to loop over; "./loop pass" also grows. Of course it could be one of the modules loaded during initialization...) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Jan 13 23:33:34 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Jan 2001 17:33:34 -0500 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Your message of "Sat, 13 Jan 2001 10:48:21 PST." <0G740027Q6Q1KL@mta6.snfc21.pbi.net> References: <0G740027Q6Q1KL@mta6.snfc21.pbi.net> Message-ID: <200101132233.RAA03229@cj20424-a.reston1.va.home.com> > Howdy Folks, > > I need some help here. I'd like to see Python build out of the box with a > ./configure, make, make test, and make install on Darwin and Mac OS X. > Having it build out of the box will make it easier to be incorporated > into both Darwin and the base Mac OS X distribution - although not for > the initial release of the latter but definitely doable for subsequent > releases. In order to do this, I need to have it build cleanly on HFS and > UFS filesystems. > > Under HFS system, I've got a name conflict due to case insenstivity > between the build target and the "Python" directory that forces me to > build with a -with-suffix command on HFS and manually change the name > after install - which is an automatic knockout factor when it comes to > incorporating it in an automatic build system. Not to mention a problem > with unix newbies trying to build from source... > > Last night, I did some quick investigation to determine the best way to > fix this problem as documented in PEP-42 in the build section and > Sourceforge bug 122215 and determined that the easiest and least error > prone way was to change the directory name Python to PyCore. > > It's apparent from the comments that I'm missing something here as the > reaction has been negative so far - to the point where Guido has rejected > the patch. Can someone explain what I'd missing that's causing such > strong feelings? We use CVS to manage the sources. 
CVS makes it very hard to move a directory; it doesn't have a command for this, so you have to do the move directly in the repository, which will then break checkouts for everyone who has a work directory linked to the CVS repository. Using SourceForge makes it a bit harder still: we have to ask the SF sysadmins to do the move for us. And if we did the move, it would be much harder to reproduce old versions of the source tree with a single CVS command. A way around that would be to do a copy instead of a move, but that would cause the directory "PyCore" to pop up in all old versions, too. I just don't want to go through this hassle in order to make building easier for one relatively little-used platform. > My second question is how do I resolve the name conflict in an approved > way? It's been suggested that a build directory be created (/src/build > ?) and that the target be placed here. The problem that I had with this > suggestion is that it would require an additional layer to execute the > target and I wasn't sure what impact it would have on running python > from a new directory... which is the reason I took the more known path. > :-) I don't understand what you are proposing here; I can't imagine that an extra directory level could cause a slowdown. A suggestion I would be open to: change the executable name during build (currently a .exe suffix is added), but change it back (removing the .exe suffix) during the install. That should be a small change to the Makefile. > Bottom line, come March 24th, Mac OS X 1.0 will be released and as of > July 2001 all Macintoshes will come with Mac OS X. I'd like to see > Python be easily built "out of the box" on these machines - rather than come > with a haphazard list of instructions or commands as currently needed > for 1.5.2 and 2.0 releases. And hopefully, at some point be incorporated > into the base Mac OS X installation...
Just get Apple to include Python with their standard distribution and nobody will *have* to build Python on Mac OS X. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sun Jan 14 00:59:44 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 18:59:44 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113171528.A17480@thyrsus.com> Message-ID: [Eric] > OK, now I understand why soundex isn't in the core -- there's no > canonical version. Actually, I think Knuth Vol 3 Ed 3 is canonical *now* -- nobody would dare to oppose him <0.5 wink>. > Normalized Hamming similarity: it's an inversion of Hamming distance > -- number of pairwise matches in two strings of the same length, > divided by the common string length. Gives a measure in [0.0, 1.0]. > > I've looked up "Levenshtein edit distance" and you're right. I'll add > it as a fourth entry point as soon as I can find C source to crib. > (Would you happen to have a pointer?) If you throw almost everything out of Unix diff, that's what you'll be left with. Offhand I don't know of unencumbered, industrial-strength C source; a problem is that writing a program to compute this is a std homework exercise (it's a common first "dynamic programming" example), so you can find tons of bad C source. Caution: many people want small variations of "edit distance", usually via assigning different weights to insertions, replacements and deletions. A less common but still popular variant is to say that a transposition ("xy" vs "yx") is less costly than a delete plus an insert. Etc. "edit distance" is really a family of algorithms. >> + Each algorithm on its own is likely controversial. > Not these. There *are* canonical versions of all these, See the "edit distance" gloss above. > and exact equivalents are all heavily used in commercial OCR > software. God forbid that core Python may lose the commercial OCR developer market .
It's not accepted that for every field F, core Python needs to supply the algorithms F uses heavily. Heck, core Python doesn't even ship with an FFT! Doesn't bother the folks working in signal processing. >> + Computing string similarity is something few apps need anyway. > Tim, this isn't true. Any time you need to validate user input > against a controlled vocabulary and give feedback on probable right > choices, Which is something few apps need anyway -- in my experience, but more so in my *primary* role here of trying to channel for you (& Guido) what Guido will say. It should be clear that I've got some familiarity with these schemes, so it should also be clear that Guido is likely to ask me about them whenever they pop up. But Guido has hardly ever asked me about them over the past decade, with the exception of the short-lived Soundex brouhaha. From that I guess hardly anyone ever asks *him* about them, and that's how channeling works: if this were an area where Guido felt core Python needed beefier libraries, I'm pretty sure I would have heard about it by now. But now Guido can speak for himself. There's no conceivable argument that could change what I *predict* he'll say. > R/O similarity is *very* useful. I've had it in my personal > toolkit for a decade and used it heavily for this -- you take your > unknown input, check it against a dictionary and kick "maybe you meant > foo?" to the user for every foo with an R/O similarity above 0.6 or so. > > The effects look like black magic. Users love it. I believe that. And I'd guess we all have things in our personal toolkits our users love. That isn't enough to get into the core, as I expect Guido will belabor on the next iteration of this . 
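For readers who want to see the "maybe you meant foo?" idea from the quoted paragraph in concrete form, here is a minimal sketch. It is not Eric's simil module and not anything posted in this thread: it assumes a present-day Python whose standard library includes difflib, and it uses difflib.SequenceMatcher.ratio() as a stand-in for R/O similarity (the two are closely related, but they may not give identical scores). The function name suggest, the sample vocabulary, and the default cutoff (borrowed from the "0.6 or so" quoted above) are illustrative assumptions.

```python
from difflib import SequenceMatcher


def suggest(word, vocabulary, cutoff=0.6):
    """Return vocabulary entries similar to word, best match first.

    Hypothetical helper for illustration only; SequenceMatcher.ratio()
    stands in for the R/O similarity measure discussed in the thread.
    """
    scored = []
    for candidate in vocabulary:
        # ratio() is 2*M/T: M = matched characters, T = total characters.
        score = SequenceMatcher(None, word, candidate).ratio()
        if score >= cutoff:
            scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored]


if __name__ == "__main__":
    commands = ["commit", "checkout", "update", "status", "diff"]
    for typo in ("comit", "stauts", "xyzzy"):
        matches = suggest(typo, commands)
        if matches:
            print("%r: maybe you meant %s?" % (typo, " or ".join(matches)))
        else:
            print("%r: no close match" % typo)
```

The cutoff is the only knob, which fits the controlled-vocabulary use case Eric describes; whether 0.6 is a good value for a given vocabulary would have to be checked against real user input.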
doesn't-mean-the-code-isn't-mondo-cool-ly y'rs - tim From dkwolfe at pacbell.net Sun Jan 14 01:19:56 2001 From: dkwolfe at pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 16:19:56 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore Message-ID: <0G7400EZQM2TXD@mta5.snfc21.pbi.net> >CVS makes it very hard to move a directory... >which will then break checkouts for everyone... with the potential to cause development code to be lost >Using SourceForge...have to ask the SF sysadmins I understand... we also use CVS and periodically (usually pre alpha) reorganize the source... going thru SF sysadmin makes it doubly hard... yuck! However, since you have "released" tarball archives, it seems to me that the loss of the diffs and log notes is more troubling than the need to create an old version.... at least that's been my experience when building software. ;-) >I just don't want to go through this hassle in order to make building >easier for one relatively little-used platform. humph. Ok, I'll accept that for now as we've only sold 100,000 Beta copies of Mac OS X... but if we're not over 1 million users by this time next year... I'll eat my words. ;-) >> It's been suggested that a build directory be created (/src/build ?) >> and that the target be placed here. >I don't understand what you are proposing here; I can't imagine that >an extra directory level could cause a slowdown. moshez suggested this in his comment on the patch - moving the target to a separate directory. I'm not sure of the implications of doing this however, and wondered if it might affect the running of the regression suite and the executable before it was installed. >A suggestion I would be open to: change the executable name during >build (currently a .exe suffix is added), but change it back (removing >the .exe suffix) during the install. That should be a small change to >the Makefile. You mean without using the -with-suffix command? That can probably be done...
but based on my readings, I'd thought you'd reject it as not being "clean" and complicating the build process more than it should - not to mention renaming the executable behind the builder's back... Lesser of two evils I guess - I'll investigate this however... >> I'd like to see Python be easily built on "out of the box"... >> [and] incorporated into the base Mac OS X installation... > >Just get Apple to include Python with their standard distribution and >nobody will *have* to build Python on Mac OSX. :-) Easier said than done as they already have the other P language installed. ;-) But then on the other hand, there are quite a few Pythonatics including me who use it in daily work at Apple. As I mentioned, the road to getting it in Mac OS X begins with getting it to build cleanly with the automated build system... so I've got to get this problem fixed before I start working on getting it in the build. - Dan (yes, I work for Apple, but this is something that I'm doing on my own!) From mwh21 at cam.ac.uk Sun Jan 14 01:41:35 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 14 Jan 2001 00:41:35 +0000 Subject: [Python-Dev] a readline replacement? In-Reply-To: Michael Hudson's message of "17 Dec 2000 18:18:24 +0000" References: <200012142319.MAA02140@s454.cosc.canterbury.ac.nz> <3A39ED07.6B3EE68E@lemburg.com> <14906.17412.221040.895357@anthem.concentric.net> <20001215040304.A22056@glacier.fnational.com> <20001215235425.A29681@xs4all.nl> Message-ID: Michael Hudson writes: > It wouldn't be particularly hard to rewrite editline in Python (we > have termios & the terminal handling functions in curses - and even > ioctl if we get really keen).
> > I've been hacking on my own Python line reader on and off for a while; > it's still pretty buggy, but if you're feeling brave you could look at: > > http://www-jcsu.jesus.cam.ac.uk/~mwh21/hacks/pyrl-0.0.0.tar.gz As I secretly planned , the embarrassment of having code that full of holes publicly accessible spurred me to writing a much better version, to be found at: http://www-jcsu.jesus.cam.ac.uk/~mwh21/hacks/pyrl-0.2.0.tar.gz (or, now rsync works there again, in the equivalent place on the starship...). If you unpack it and execute $ python python_reader.py you should get something that closely mimics the current interpreter top level. It supports a wide range of cursor motion commands, built-in support for multiple line input and history (including incremental search). It doesn't do completion, basically because I haven't got round to it yet, and it will get into severe trouble if you enter an input that is taller than your terminal (I think this should be surmountable, but I haven't gotten round to this either). Another thing that I haven't gotten round to yet is documentation. After I've tackled these points I'll probably stick it up on parnassus. I've been using it as my standard python shell for a week or so, and quite like it, though the lack of completion is a drag. It is probably staggeringly unportable, so I'd appreciate finding out how it breaks on systems other that Linux with terminals other than xterms... Have the changes to enable use of editline been checked in yet? I worry that the licensing situation around the readline module is grey at best... Cheers, M. -- That's why the smartest companies use Common Lisp, but lie about it so all their competitors think Lisp is slow and C++ is fast. (This rumor has, however, gotten a little out of hand. :) -- Erik Naggum, comp.lang.lisp From esr at thyrsus.com Sun Jan 14 01:58:08 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Sat, 13 Jan 2001 19:58:08 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 06:59:44PM -0500 References: <20010113171528.A17480@thyrsus.com> Message-ID: <20010113195808.B17712@thyrsus.com> Tim Peters : > If you throw almost everything out of Unix diff, that's what you'll be left > with. Offhand I don't know of unencumbered, industrial-strength C source; a > problem is that writing a program to compute this is a std homework exercise > (it's a common first "dynamic programming" example), so you can find tons of > bad C source. I found some formal descriptions of the algorithm and some unencumbered Oberon source. I'm coding up C now. It's not complicated if you're willing to hold the cost matrix in memory, which is reasonable for a string comparator in a way it wouldn't be for a file diff. > Caution: many people want small variations of "edit distance", usually via > assigning different weights to insertions, replacements and deletions. A > less common but still popular variant is to say that a transposition ("xy" > vs "yx") is less costly than a delete plus an insert. Etc. "edit distance" > is really a family of algorithms. Which about collapse into one if your function has three weight arguments for insert/replace/delete weights, as mine does. It don't get more general than that -- I can see that by looking at the formal description. OK, so I'll give you that I don't weight transpositions separately, but neither does any other variant I found on the web nor the formal descriptions. A fourth optional weight argument someday, maybe :-). > God forbid that core Python may lose the commercial OCR developer market > . It's not accepted that for every field F, core Python needs to > supply the algorithms F uses heavily. That's not my point -- I don't see OCR as a big Python market either.
My point in observing that OCR uses Ratcliff/Obershelp heavily was simply to show that it's a well-established algorithm, not `controversial'. > Heck, core Python doesn't even ship > with an FFT! Doesn't bother the folks working in signal processing. It probably won't surprise you that I considered writing an FFT extension module at one point :-). > > Tim, this isn't true. Any time you need to validate user input > > against a controlled vocabulary and give feedback on probable right > > choices, > > Which is something few apps need anyway I fundamentally disagree. Few application designers *know* they need it, but user interfaces would get a hell of a lot better if the technique were more commonly applied -- and that's why I want it in the Python library, so doing the right thing in Python will be a minimum-effort proposition. -- Eric S. Raymond What if you were an idiot, and what if you were a member of Congress? But I repeat myself. -- Mark Twain From tim.one at home.com Sun Jan 14 04:17:34 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 22:17:34 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <14944.52328.558763.46161@cj42289-a.reston1.va.home.com> Message-ID: [Fred] > Did you ever write documentation for it? ;-) A lot more than you did . just-show-me-"write-docs"-in-my-job-description-ly y'rs - tim From tim.one at home.com Sun Jan 14 05:39:59 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 23:39:59 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113195808.B17712@thyrsus.com> Message-ID: [Eric, on "edit distance"] > I found some formal descriptions of the algorithm and some > unencumbered Oberon source. I'm coding up C now. It's not > complicated if you're willing to hold the cost matrix in memory, > which is reasonable for a string comparator in a way it wouldn't > be for a file diff. All agreed, and it should be a straightforward task then.
I'm assuming it will work with Unicode strings too . [on differing weights] > Which about collapse into one if your function has three weight > arguments for insert/replace/delete weights, as mine does. It don't > get more general than that -- I can see that by looking at the formal > description. > > OK, so I'll give you that I don't weight transpositions separately, > but neither does any other variant I found on the web nor the formal > descriptions. A fourth optional weight agument someday, maybe :-). > ... > and that's why I want it in the Python library, so doing the right > thing in Python will be a minimum-effort proposition. Guido will depart from you at a different point. I depart here: it's not "the right thing". It's a bunch of hacks that appeal not because they solve a problem, but because they're cute algorithms that are pretty easy to implement and kinda solve part of a problem. "The right thing"-- which you can buy --at least involves capturing a large base of knowledge about phonetics and spelling. In high school, one of my buddies was Dan Pryzbylski. If anyone who knew him (other than me ) were to type his name into the class reunion guy's web page, they'd probably spell it the way they remember him pronouncing it: sha-bill-skey (and that's how he pronounced "Dan" ). If that hit on the text string "Pryzbylski", *then* it would be "the right thing" in a way that makes sense to real people, not just to implementers. Working six years in commercial speech recog really hammered that home to me: 95% solutions are on the margin of unsellable, because an error one try in 20 is intolerable for real people. Developers writing for developers get "whoa! cool!" where my sisters walk away going "what good is that?". Edit distance doesn't get within screaming range of 95% in real life. 
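The edit distance with three weight arguments, as described in the quoted text above, can be sketched as follows. This is an illustration only, not Eric's C implementation: it is the textbook dynamic-programming formulation that holds the full cost matrix in memory, and the function and parameter names are hypothetical. It does not treat transpositions specially, so "xy" vs "yx" costs two edits under unit weights, which is the limitation both posters mention.

```python
def edit_distance(a, b, insert_cost=1, replace_cost=1, delete_cost=1):
    """Weighted edit distance from string a to string b.

    Illustrative sketch: cost[i][j] is the cheapest way to turn the
    first i characters of a into the first j characters of b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]

    # Turning a prefix of a into the empty string: delete everything.
    for i in range(1, rows):
        cost[i][0] = i * delete_cost
    # Turning the empty string into a prefix of b: insert everything.
    for j in range(1, cols):
        cost[0][j] = j * insert_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + delete_cost,       # drop a[i-1]
                cost[i][j - 1] + insert_cost,       # add b[j-1]
                cost[i - 1][j - 1] + (0 if same else replace_cost),
            )
    return cost[-1][-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("xy", "yx"))
    # Making replacement expensive forces delete + insert instead.
    print(edit_distance("cat", "cut", replace_cost=5))
```

Memory use is proportional to len(a) * len(b), which is the trade-off described in the thread as reasonable for comparing short strings but not for diffing whole files.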
Even for most developers, it would be better to package up the single best approach you've got (f(list, word) -> list of possible matches sorted in confidence order), instead of a module with 6 (or so) functions they don't understand and a pile of equally mysterious knobs. Then it may actually get used! Developers of the breed who would actually take the time to understand what you've done are, I suggest, similar to us: they'd skim the docs, ignore the code, and write their own variations. Or, IOW: > so doing the right thing in Python will be a minimum-effort > proposition. Make someone think first, and 95% of developers will just skip over it too. BTW, the theoretical literature ignored transposition at first, because it didn't fit well in the machinery. IIRC, I first read about it in an issue of SP&E (Software Practice & Experience), where the authors were forced into it because the "traditional" edit sequence measure sucked in their practice. They were much happier after taking transposition into account. The theoreticians have more than caught up since, and research is still active; e.g., 1997's PATTERN RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS, DELETIONS AND GENERALIZED TRANSPOSITIONS B. J. Oommen and R. K. S. Loke http://www.scs.carleton.ca/~oommen/papers/GnTrnsJ2.PDF is a good read. As they say there, If one views the elements of the confusion matrices as probabilities, this [treating each character independent of all others, as "edit distance" does] is equivalent to assuming that the transformation probabilities at each position in the string are statistically independent and possess first-order Markovian characteristics. This model is usually assumed for simplicity rather it [sic] having any statistical significance. 
IOW, because it's easy to analyze, not because it solves a real problem -- and they're complaining about an earlier generalization of edit distance that makes the weights depend on the individual symbols involved as well as on the edit/delete/insert distinction (another variation trying to make this approach genuinely useful in real life). The Oommen-Loke algorithm appears much more realistic, taking into account the observed probabilities of mistyping specific letter pairs (although it still ignores phonetics), and they report accuracies approaching 98% in correctly identifying mangled words. 98% (more than twice as good as 95% -- the error rate is actually more useful to think about, 2% vs 5%) is truly useful for non-geek end users, and the state of the art here is far beyond what's easy to find and dead easy to implement. > ... > It probably won't surprise you that I considered writing an FFT > extension module at one point :-). Nope! More power to you, Eric. At least FFTs *are* state of the art, although *coding* them optimally is likely beyond human ability on modern machines: http://www.fftw.org/ (short course: they've generally got the fastest FFTs available, and their code is generated by program, systematically *trying* every trick in the book, timing it on a given box, and synthesizing a complete strategy out of the quickest pieces). sooner-or-later-the-only-code-real-people-will-use-won't-be-written- by-people-at-all-ly y'rs - tim From tim.one at home.com Sun Jan 14 06:38:52 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 00:38:52 -0500 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: <0G7400EZQM2TXD@mta5.snfc21.pbi.net> Message-ID: [Dan Wolfe] > ... > As I mentioned, the road to getting it in Mac OS X begins with > getting it to build cleanly with the automated build system... so > I've got to get this problem fixed before I start working on > getting it in the build. 
> > - Dan > (yes, I work for Apple, but this is something that I'm doing > on my own!) Hang in there, Dan! I did the first Python port to the KSR-1 on my own time too, despite working for the visionless bastards at the time. The rest is history: the glory, the fame, the riches, the groupies, the adulation of my peers. We won't mention the financial scandal and subsequent bankruptcy lest it discourage you for no good reason . BTW, "do the simplest thing that can possibly work"! It's OK if it's a little ugly. Better that than force hundreds of Python-builders to get divorced from a decade-old directory naming scheme. From esr at thyrsus.com Sun Jan 14 08:08:57 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 02:08:57 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 11:39:59PM -0500 References: <20010113195808.B17712@thyrsus.com> Message-ID: <20010114020857.E19782@thyrsus.com> Tim Peters : > All agreed, and it should be a straightforward task then. I'm assuming it > will work with Unicode strings too . Thought about that. Want to get it working for 8 bits first. > Guido will depart from you at a different point. I depart here: it's not > "the right thing". It's a bunch of hacks that appeal not because they solve > a problem, but because they're cute algorithms that are pretty easy to > implement and kinda solve part of a problem. Again, my experience says differently. I have actually *used* Ratcliff-Obershelp to implement Do What I Mean (actually, Tell Me What I Mean) -- and had it work very well for non-geek users. That's why I want other Python programmers to have easy access to the capability. > Working six years in commercial speech recog really hammered that home to > me: 95% solutions are on the margin of unsellable, because an error one try > in 20 is intolerable for real people. Developers writing for developers get > "whoa! cool!" 
where my sisters walk away going "what good is that?". Edit
> distance doesn't get within screaming range of 95% in real life.

I suspect your speech recognition experience has given you an unhelpful bias. For English, what you say is certainly true -- but that's a gross worst-case application of R/O and Levenshtein that I'm not interested in pursuing. Nor do I expect Python hackers to use my module for that.

Where techniques like Ratcliff-Obershelp really shine (and what I expect the module to be used for) is with controlled vocabularies such as command interfaces. These tend to have better orthogonality than NL, so antinoise filtering by R/O or Levenshtein distance (a kindred technique I somehow didn't learn until today -- there are disadvantages to being an autodidact) can really go to town on them.

(Actually, my gut after thinking about both algorithms hard is that R/O is still a better technique than Levenshtein for the kind of application I have in mind. But I also suspect the difference is marginal.)

(Other good uses for algorithms in this class include cladistics and genomic analysis.)

> Even for most developers, it would be better to package up the single best
> approach you've got (f(list, word) -> list of possible matches sorted in
> confidence order), instead of a module with 6 (or so) functions they don't
> understand and a pile of equally mysterious knobs.

That's why good documentation, with motivating usage hints, is important. I write good documentation, Tim.

> PATTERN RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS,
> DELETIONS AND GENERALIZED TRANSPOSITIONS
> B. J. Oommen and R. K. S. Loke
> http://www.scs.carleton.ca/~oommen/papers/GnTrnsJ2.PDF

Thanks for the pointer; I've downloaded it and will read it. If the description of Oommen's algorithm is good enough, I'll implement it and add it to the module.

-- Eric S. Raymond

Power concedes nothing without a demand. It never did, and it never will.
Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From dkwolfe at pacbell.net Sun Jan 14 08:48:51 2001 From: dkwolfe at pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 23:48:51 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Message-ID: <0G75009ZD6UYYE@mta5.snfc21.pbi.net> On Saturday, January 13, 2001, at 09:38 PM, Tim Peters wrote: > [Dan Wolfe] >> ... >> As I mentioned, the road to getting it in Mac OS X begins with >> getting it to build cleanly with the automated build system... so >> I've got to get this problem fixed before I start working on >> getting it in the build. >> >> - Dan >> (yes, I work for Apple, but this is something that I'm doing >> on my own!) > > Hang in there, Dan! I did the first Python port to the KSR-1 on my own > time > too, despite working for the visionless bastards at the time. Well, I won't go that far..... some of them are quite visionaries (I can't stop drooling over a Ti portable....). > The rest is > history: the glory, the fame, the riches, the groupies, the adulation > of my > peers. We won't mention the financial scandal and subsequent bankruptcy > lest it discourage you for no good reason . You left out the part where they turn ya into a timbot... > BTW, "do the simplest thing that can possibly work"! It's OK if it's a > little ugly. Better that than force hundreds of Python-builders to get > divorced from a decade-old directory naming scheme. Well the mv Python to PyCore was the simplest... but obviously the most painful.... 
The longer ugly fix is working but it's such a hack that I'd rather not show it off...I need to fix it so that it allows nice things such as allowing the -with-suffix to be used...and then test all the edge cases such as clobber, etc., so that I don't break anything. :-)

appreciating-your-note-after-attempting-to-understand-makefiles-on-Saturday-night'ly yours,

- Dan

From tim.one at home.com Sun Jan 14 11:45:53 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 14 Jan 2001 05:45:53 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
In-Reply-To: <20010114020857.E19782@thyrsus.com>
Message-ID:

[Tim]
>> ...It's a bunch of hacks that appeal not because they solve
>> a problem, but because they're cute algorithms that are pretty
>> easy to implement and kinda solve part of a problem.

[Eric]
> Again, my experience says differently. I have actually *used*
> Ratcliff-Obershelp to implement Do What I Mean (actually, Tell Me What
> I Mean) -- and had it work very well for non-geek users. That's why I
> want other Python programmers to have easy access to the capability.
> ...
> Where techniques like Ratcliff-Obershelp really shine (and what I
> expect the module to be used for) is with controlled vocabularies
> such as command interfaces.

Yet the narrower the domain, the less call for a library with multiple approaches. If R-O really shone for you, why bother with anything else? Seriously. You haven't used some (most?) of these. The core isn't a place for research modules either (note that I have no objection whatsoever to writing any module you like -- the only question here is what belongs in the core, and any algorithm *nobody* here has experience with in your target domain is plainly a poor *core* candidate for that reason alone -- we have to maintain, justify and explain it for years to come).
> I suspect your speech recognition experience has given you an > unhelpful bias. Try to think of it as a helpfully different perspective <0.5 wink>. It's in favor of measuring error rate by controlled experiments, skeptical of intuition, and dismissive of anecdotal evidence. I may well agree you don't need all that heavy machinery if I had a clear definition of what problem it is you're trying to solve (I've learned it's not the kinds of problems *I* had in mind when I first read your description!). BTW, telephone speech recog requires controlled vocabularies because phone acoustics are too poor for the customary close-talking microphone approaches to work well enough. A std technique there is to build a "confusability matrix" of the words *in* the vocabulary, to spot trouble before it happens: if two words are acoustically confusable, it flags them and bounces that info back to the vocabulary designer. A similar approach should work well in your domain: if you get to define the cmd interface, run all the words in it pairwise through your similarity measure of choice, and dream up new words whenever a pair is "too close". That all but ensures that even a naive similarity algorithm will perform well (in telephone speech recog, the unconstrained error rate is up to 70% on cell phones; by constraining the vocabulary with the aid of confusability measures, we cut that to under 1%). > ... > (Actually, my gut after thinking about both algorithms hard is that > R/O is still a better technique than Levenshtein for the kind of > application I have in mind. But I also suspect the difference is > marginal.) So drop Levenshtein -- go with your best shot. Do note that they both (usually) consider a single transposition to be as much a mutation as two replacements (or an insert plus a delete -- "pure" Levenshtein treats those the same). What happens when the user doesn't enter an exact match? Does the kind of app you have in mind then just present them with a list of choices? 
If that's all (as opposed to, e.g., substituting its best guess for what the user actually typed and proceeding as if the user had given that from the start), then the evidence from studies says users are almost as pleased when the correct choice appears somewhere in the first three choices as when it appears as *the* top choice. A well-designed vocabulary can almost guarantee that happy result (note that most of the current research is aimed at the much harder job of getting the intended word into the #1 slot on the choice list). > (Other good uses for algorithms in this class include cladistics and > genomic analysis.) I believe you'll find current work in those fields has moved far beyond these simplest algorithms too, although they remain inspirational (for example, see "Protein Sequence Alignment and Database Scanning" at http://barton.ebi.ac.uk/papers/rev93_1/rev93_1.html Much as in typing, some mutations are more likely than others for *physical* reasons, so treating all pairs of symbols in the alphabet alike is too gross a simplification.). >> Even for most developers, it would be better to package up the >> single best approach you've got (f(list, word) -> list of possible >> matches sorted in confidence order), instead of a module with 6 >> (or so) functions they don't understand and a pile of equally >> mysterious knobs. > That's why good documentation, with motivating usage hints, is > important. I write good documentation, Tim. You're not going to find offense here even if you look for it, Eric : while only a small percentage of developers don't read docs at all, everyone else spaces out at least in linear proportion to the length of the docs. Most people will be looking for "a solution", not for "a toolkit". If the docs read like a toolkit, it doesn't matter how good they are, the bulk of the people you're trying to reach will pass on it. 
If you really want this to be *used*, supply one class that does *all* the work, including making the expert-level choices of which algorithm is used under the covers and how it's tuned. That's good advice. I still expect Guido won't want it in the core before wide use is a demonstrated fact, though (and no, that's not a chicken-vs-egg thing: "wide use" for a thing outside the core is narrower than "wide use" for a thing in the core). An exception would likely get made if he tried it and liked it a lot. But to get it under his radar, it's again much easier if the usage docs are no longer than a couple paragraphs. I'll attach a tiny program that uses ndiff's SequenceMatcher to guess which of the 147 std 2.0 top-level library modules a user may be thinking of (and best I can tell, these are the same results case-folding R/O would yield): Module name? random Hmm. My best guesses are random, whrandom, anydbm (BTW, the first choice was an exact match) Module name? disect Hmm. My best guesses are bisect, dis, UserDict Module name? password Hmm. My best guesses are keyword, getpass, asyncore Module name? chitchat Hmm. My best guesses are whichdb, stat, asynchat Module name? xml Hmm. My best guesses are xmllib, mhlib, xdrlib [So far so good] Module name? http Hmm. My best guesses are httplib, tty, stat [I was thinking of httplib, but note that it missed SimpleHTTPServer: a name that long just isn't going to score high when the input is that short] Module name? dictionary Hmm. My best guesses are Bastion, ConfigParser, tabnanny [darn, I *think* I was thinking of UserDict there] Module name? uuencode Hmm. My best guesses are code, codeop, codecs [Missed uu] Module name? parse Hmm. My best guesses are tzparse, urlparse, pre Module name? browser Hmm. My best guesses are webbrowser, robotparser, user Module name? brower Hmm. My best guesses are webbrowser, repr, reconvert Module name? Thread Hmm. My best guesses are threading, whrandom, sched Module name? pickle Hmm. 
My best guesses are pickle, profile, tempfile (BTW, the first choice was an exact match) Module name? shelf Hmm. My best guesses are shelve, shlex, sched Module name? katmandu Hmm. My best guesses are commands, random, anydbm [I really was thinking of "commands"!] Module name? temporary Hmm. My best guesses are tzparse, tempfile, fpformat So it gets what I was thinking of into the top 3 very often, and despite some wildly poor guesses at the correct spelling -- you'd *almost* think it was doing a keyword search, except the *unintended* choices on the list are so often insane . Something like that may be a nice addition to Paul/Ping's help facility someday too. Hard question: is that "good enough" for what you want? Checking against 147 things took no perceptible time, because SequenceMatcher is already optimized for "compare one thing against N", doing preprocessing work on the "one thing" that greatly speeds the N similarity computations (I suspect you're not -- yet). It's been tuned and tested in practice for years; it works for any sequence type with hashable elements (so Unicode strings are already covered); it works for long sequences too. And if R-O is the best trick we've got, I believe it already does it. Do we need more? Of course *I'm* not convinced we even need *it* in the core, but packaging a match-1-against-N class is just a few minutes' editing of what follows. 
something-to-play-with-anyway-ly y'rs - tim

NDIFFPATH = "/Python20/Tools/Scripts"
LIBPATH = "/Python20/Lib"

import sys, os
sys.path.append(NDIFFPATH)
from ndiff import SequenceMatcher

modules = {}  # map lowercase module stem to module name
for f in os.listdir(LIBPATH):
    if f.endswith(".py"):
        f = f[:-3]
        modules[f.lower()] = f

def match(fname, numchoices=3):
    lower = fname.lower()
    s = SequenceMatcher()
    s.set_seq2(lower)
    scores = []
    for lowermod, mod in modules.items():
        s.set_seq1(lowermod)
        scores.append((s.ratio(), mod))
    scores.sort()
    scores.reverse()
    return modules.has_key(lower), [x[1] for x in scores[:numchoices]]

while 1:
    name = raw_input("Module name? ")
    is_exact, choices = match(name)
    print "Hmm. My best guesses are", ", ".join(choices)
    if is_exact:
        print "(BTW, the first choice was an exact match)"

From esr at thyrsus.com Sun Jan 14 13:15:33 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Sun, 14 Jan 2001 07:15:33 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 05:45:53AM -0500
References: <20010114020857.E19782@thyrsus.com>
Message-ID: <20010114071533.A5812@thyrsus.com>

Tim Peters :
> Yet the narrower the domain, the less call for a library with multiple
> approaches. If R-O really shone for you, why bother with anything else?

Well, I was bothering with Levenshtein because *you* suggested it. :-)

I put in Hamming similarity and stemming because they're O(n) where R/O is quadratic, and both widely used in situations where a fast sloppy job is preferable to a good but slow one. My documentation page is explicit about the tradeoff.

> Seriously. You haven't used some (most?) of these.

I've used stemming and R-O. Haven't used Hamming or Levenshtein.
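
A minimal sketch of the two linear-time measures mentioned above, assuming the semantics given in the draft simil documentation later in this message (common-prefix length over the longer length for stem; pairwise matches over the common length, or None on unequal lengths, for hamming). The real simil module is not shown in this thread, so the bodies below are illustrative Python 3 only, and the treatment of empty strings is an assumption made here:

```python
def stem(a, b):
    """Longest common prefix length divided by the length of the longer string."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0  # assumption: two empty strings count as identical
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / longer


def hamming(a, b):
    """Fraction of positions that agree, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    if not a:
        return 1.0  # assumption: two empty strings count as identical
    return sum(x == y for x, y in zip(a, b)) / len(a)


print(stem("thread", "threading"))   # 6 matching leading characters out of 9
print(hamming("shelf", "shelv"))     # 4 of 5 positions agree
print(hamming("shelf", "shelve"))    # None, because the lengths differ
```

Both functions make a single pass over the shorter string, which is where the O(n) claim comes from; neither can recover from an insertion or deletion near the front of a word, which is the "fast sloppy job" side of the tradeoff.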
> The core isn't a place > for research modules either (note that I have no objection whatsoever to > writing any module you like -- the only question here is what belongs in the > core, and any algorithm *nobody* here has experience with in your target > domain is plainly a poor *core* candidate for that reason alone -- we have > to maintain, justify and explain it for years to come). Fair point. I read it, in this context, as good advice to drop the Hamming entry point and forget about the Levenshtein implementation -- stick to what I've used and know is useful as opposed to what I think might be useful. > I may well agree you don't > need all that heavy machinery if I had a clear definition of what problem it > is you're trying to solve (I've learned it's not the kinds of problems *I* > had in mind when I first read your description!). I think you have it by now, judging by the following... > What happens when the user doesn't enter an exact match? Does the kind of > app you have in mind then just present them with a list of choices? Yes. I've used this technique a lot. It gives users not just guidance but warm fuzzy feelings -- they react as though there's a friendly homunculus inside the software looking out for them. Actually, in my experience, the less techie they are the more they like this. > If that's all (as opposed to, e.g., substituting its best guess for what the > user actually typed and proceeding as if the user had given that from the > start), then the evidence from studies says users are almost as pleased when > the correct choice appears somewhere in the first three choices as when it > appears as *the* top choice. Interesting. That does fit what I've seen. > A well-designed vocabulary can almost > guarantee that happy result (note that most of the current research is aimed > at the much harder job of getting the intended word into the #1 slot on the > choice list). Yes. 
One of my other tricks is to design command vocabularies so the first three characters are close to unique. This means R/O will almost always nail the right thing.

> Much as in typing, some mutations are more likely than others for *physical*
> reasons, so treating all pairs of symbols in the alphabet alike is too gross
> a simplification.).

Indeed. Couple weeks ago I was a speaker at a conference called "After the Genome 6" at which one of the most interesting papers was given by a lady mathematician who designs algorithms for DNA sequence matching. She made exactly this point.

> > That's why good documentation, with motivating usage hints, is
> > important. I write good documentation, Tim.
>
> You're not going to find offense here even if you look for it, Eric :

No worries, I wasn't looking. :-)

> Most people will be looking for "a solution", not for "a toolkit". If the
> docs read like a toolkit, it doesn't matter how good they are, the bulk of
> the people you're trying to reach will pass on it. If you really want this
> to be *used*, supply one class that does *all* the work, including making
> the expert-level choices of which algorithm is used under the covers and how
> it's tuned. That's good advice.

I don't think that's possible in this case -- the proper domains for stemming and R-O are too different. But maybe this is another nudge to drop the Hamming code.

> But to get it under his radar, it's again much easier if the usage
> docs are no longer than a couple paragraphs.

How's this?

\section{\module{simil} -- String similarity metrics}

\declaremodule{standard}{simil}
\moduleauthor{Eric S. Raymond}{esr at thyrsus.com}
\modulesynopsis{String similarity metrics.}
\sectionauthor{Eric S. Raymond}

The \module{simil} module provides similarity functions for approximate word or string matching. One important application is for checking input words against a dictionary to match possible misspellings with the right terms in a controlled vocabulary.
The entry points provide different tradeoffs ranging from crude and fast (stemming) to effective but slow (Ratcliff-Obershelp gestalt subpattern matching). The latter is one of the standard techniques used in commercial OCR software.

The \module{simil} module defines the following functions:

\begin{funcdesc}{stem}{}
Returns the length of the longest common prefix of two strings divided by the length of the longer. Similarity scores range from 0.0 (no common prefix) to 1.0 (identity). Running time is linear in string length.
\end{funcdesc}

\begin{funcdesc}{hamming}{}
Computes a normalized Hamming similarity between two strings of equal length -- the number of pairwise matches in the strings, divided by their common length. It returns None if the strings are of unequal length. Similarity scores range from 0.0 (no positions equal) to 1.0 (identity). Running time is linear in string length.
\end{funcdesc}

\begin{funcdesc}{ratcliff}{}
Returns a Ratcliff/Obershelp gestalt similarity score based on co-occurrence of subpatterns. Similarity scores range from 0.0 (no common subpatterns) to 1.0 (identity). Running time is best-case linear, worst-case quadratic in string length.
\end{funcdesc}

> Module name? http
> Hmm. My best guesses are httplib, tty, stat
>
> [I was thinking of httplib, but note that it missed
> SimpleHTTPServer: a name that long just isn't going to score
> high when the input is that short]

>>> simil.ratcliff("http", "httplib")
0.72727274894714355
>>> simil.ratcliff("http", "tty")
0.57142859697341919
>>> simil.ratcliff("http", "stat")
0.5
>>> simil.ratcliff("http", "simplehttpserver")
0.40000000596046448

So with the 0.6 threshold I normally use R-O does better at eliminating the false matches but doesn't catch SimpleHTTPServer (case is, I'm sure you'll agree, an irrelevant detail here).

> Module name? dictionary
> Hmm.
My best guesses are Bastion, ConfigParser, tabnanny
>
> [darn, I *think* I was thinking of UserDict there]

>>> simil.ratcliff("dictionary", "bastion")
0.47058823704719543
>>> simil.ratcliff("dictionary", "configparser")
0.45454546809196472
>>> simil.ratcliff("dictionary", "tabnanny")
0.4444444477558136
>>> simil.ratcliff("dictionary", "userdict")
0.4444444477558136

R-O would have booted all of these. Highest score to bastion. Interesting -- I'm beginning to think R-O overweights lots of small subpattern matches relative to a few big ones, something I didn't notice before because the statistics of my vocabularies masked it.

> Module name? uuencode
> Hmm. My best guesses are code, codeop, codecs

>>> simil.ratcliff("uuencode", "code")
0.66666668653488159
>>> simil.ratcliff("uuencode", "codeops")
0.53333336114883423
>>> simil.ratcliff("uuencode", "codecs")
0.57142859697341919
>>> simil.ratcliff("uuencode", "uu")
0.40000000596046448

R-O would pick "code" and boot the rest.

> [Missed uu]
>
> Module name? parse
> Hmm. My best guesses are tzparse, urlparse, pre

>>> simil.ratcliff("parse", "tzparse")
0.83333331346511841
>>> simil.ratcliff("parse", "urlparse")
0.76923078298568726
>>> simil.ratcliff("parse", "pre")
0.75

Same result.

> Module name? browser
> Hmm. My best guesses are webbrowser, robotparser, user

>>> simil.ratcliff("browser", "webbrowser")
0.82352942228317261
>>> simil.ratcliff("browser", "robotparser")
0.55555558204650879
>>> simil.ratcliff("browser", "user")
0.54545456171035767

Big win for R-O. Picks the right one, boots the wrong two.

> Module name? brower
> Hmm. My best guesses are webbrowser, repr, reconvert

>>> simil.ratcliff("brower", "webbrowser")
0.75
>>> simil.ratcliff("brower", "repr")
0.60000002384185791
>>> simil.ratcliff("brower", "reconvert")
0.53333336114883423

Small win for R/O -- boots reconvert, and repr squeaks in under the wire.

> Module name? Thread
> Hmm.
My best guesses are threading, whrandom, sched

>>> simil.ratcliff("thread", "threading")
0.80000001192092896
>>> simil.ratcliff("thread", "whrandom")
0.57142859697341919
>>> simil.ratcliff("thread", "sched")
0.54545456171035767

Big win for R-O.

> Module name? pickle
> Hmm. My best guesses are pickle, profile, tempfile

>>> simil.ratcliff("pickle", "pickle")
1.0
>>> simil.ratcliff("pickle", "profile")
0.61538463830947876
>>> simil.ratcliff("pickle", "tempfile")
0.57142859697341919

R-O wins again.

> (BTW, the first choice was an exact match)
> Module name? shelf
> Hmm. My best guesses are shelve, shlex, sched

>>> simil.ratcliff("shelf", "shelve")
0.72727274894714355
>>> simil.ratcliff("shelf", "shlex")
0.60000002384185791
>>> simil.ratcliff("shelf", "sched")
0.60000002384185791

Interesting. Shelve scores highest; both the others squeak in.

> Module name? katmandu
> Hmm. My best guesses are commands, random, anydbm
>
> [I really was thinking of "commands"!]

>>> simil.ratcliff("commands", "commands")
1.0
>>> simil.ratcliff("commands", "random")
0.4285714328289032
>>> simil.ratcliff("commands", "anydbm")
0.4285714328289032

R-O wins big.

> Module name? temporary
> Hmm. My best guesses are tzparse, tempfile, fpformat

>>> simil.ratcliff("temporary", "tzparse")
0.5
>>> simil.ratcliff("temporary", "tempfile")
0.47058823704719543
>>> simil.ratcliff("temporary", "fpformat")
0.47058823704719543

R-O boots all of these.

> Hard question: is that "good enough" for what you want?

Um...notice that R-O filtering, even though it seems to be underweighting large matches, did a rather better job on your examples! With an 0.66 threshold it would have done *much* better.

I think you've just made an argument for replacing your SequenceMatcher with simil.ratcliff. Mine's even documented. :-).

-- Eric S. Raymond

Militias, when properly formed, are in fact the people themselves and include all men capable of bearing arms. [...]
To preserve liberty it is essential that the whole body of the people always possess arms and be taught alike, especially when young, how to use them.
-- Senator Richard Henry Lee, 1788, on "militia" in the 2nd Amendment

From ping at lfw.org Sun Jan 14 13:38:42 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Sun, 14 Jan 2001 04:38:42 -0800 (PST)
Subject: [Python-Dev] Why both r'' and R'', u'' and U''?
Message-ID:

Sorry i'm being forgetful -- could someone please refresh my memory:

Was there a good reason for allowing both lowercase and capital 'r' as a prefix for raw-strings? I assume that the availability of both r'' and R'' is what led to having both u'' and U''. Is there any good reason for that either?

This just seems to lead to ambiguity and unneeded complexity: more cases in tokenize.py, more cases in tokenize.c, more work for IDLE, more annoying when searching for u' in your editor. (I was about to fix the lack of u'' support in tokenize.py and that made me think about this.)

What happened to TOOWTDI?

Would you believe we now have 36 different ways of starting a string:

    '      "      '''      """
    r'     r"     r'''     r"""
    u'     u"     u'''     u"""
    ur'    ur"    ur'''    ur"""
    R'     R"     R'''     R"""
    U'     U"     U'''     U"""
    uR'    uR"    uR'''    uR"""
    Ur'    Ur"    Ur'''    Ur"""
    UR'    UR"    UR'''    UR"""

Would it be outrageous to suggest deprecating the last five rows?

-- ?!ng

[1] We started with 4. Perl has (by my count) 381 ways of starting a string literal, so we're halfway there, logarithmically speaking. Perl has 757 if you count the fancier operators qx, qw, s, and tr.

From mal at lemburg.com Sun Jan 14 14:33:29 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 14 Jan 2001 14:33:29 +0100
Subject: [Python-Dev] Why is soundex marked obsolete?
References:
Message-ID: <3A61AAA9.F6F1EA9F@lemburg.com>

[Lots of talk about interesting algorithms for "human" pattern matching]

I just want to add my 2 cents to the discussion:

* Eric's package seems very useful for pattern matching, but that is a very specific domain -- not mainstream

* I would opt to create a neat distutils style package for it for people to install at their own liking (I would certainly like it :)

* If wrapped up as a separate package, I'd suggest adding all known algorithms to the package and also making it Unicode aware. There are similar packages for e.g. RNGs on Parnassus.

BTW, are there less English-centric "sounds alike" matchers around? The NIST soundex algorithm as published on the internet:

http://physics.nist.gov/cuu/Reference/soundex.html

works fine for English texts, but other languages of course have different letter coding requirements (or even different alphabets).

-- Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Sun Jan 14 14:53:03 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 14 Jan 2001 14:53:03 +0100
Subject: [Python-Dev] Why both r'' and R'', u'' and U''?
References:
Message-ID: <3A61AF3F.EE6DAB88@lemburg.com>

Ka-Ping Yee wrote:
>
> Sorry i'm being forgetful -- could someone please refresh my memory:
>
> Was there a good reason for allowing both lowercase and capital 'r'
> as a prefix for raw-strings? I assume that the availability of both
> r'' and R'' is what led to having both u'' and U''.

Right.

> Is there any
> good reason for that either?

No idea... I have never used anything other than the lowercase versions.

> This just seems to lead to ambiguity and unneeded complexity:
> more cases in tokenize.py, more cases in tokenize.c, more work
> for IDLE, more annoying when searching for u' in your editor.
> (I was about to fix the lack of u'' support in tokenize.py and > that made me think about this.) > > What happened to TOOWTDI? > > Would you believe we now have 36 different ways of starting a string: > > ' " ''' """ > r' r" r''' r""" > u' u" u''' u""" > ur' ur" ur''' ur""" > R' R" R''' R""" > U' U" U''' U""" > uR' uR" uR''' uR""" > Ur' Ur" Ur''' Ur""" > UR' UR" UR''' UR""" > > Would it be outrageous to suggest deprecating the last five rows? No. + 1 on the idea. > -- ?!ng > > [1] We started with 4. Perl has (by my count) 381 ways of starting > a string literal, so we're halfway there, logarithmically speaking. > Perl has 757 if you count the fancier operators qx, qw, s, and tr. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Sun Jan 14 15:24:08 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 14 Jan 2001 15:24:08 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: ; from ping@lfw.org on Sun, Jan 14, 2001 at 04:38:42AM -0800 References: Message-ID: <20010114152408.G1005@xs4all.nl> On Sun, Jan 14, 2001 at 04:38:42AM -0800, Ka-Ping Yee wrote: > [1] We started with 4. Perl has (by my count) 381 ways of starting > a string literal, so we're halfway there, logarithmically speaking. > Perl has 757 if you count the fancier operators qx, qw, s, and tr. Don't forget 'qr//', which is quite like a raw string, except that Perl uses it to 'precompile' regular expressions as a side effect. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
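
The counts in Ping's message can be reproduced mechanically. A small sketch in Python 3, which only builds the opener strings from the nine prefix rows and four quote styles in his table (it does not evaluate them, since several of these prefixes are no longer legal); the 381 figure for Perl is taken from his footnote as given, not checked here:

```python
import math
from itertools import product

# Prefix rows as listed in the table (the first row is "no prefix").
prefixes = ["", "r", "u", "ur", "R", "U", "uR", "Ur", "UR"]
quotes = ["'", '"', "'''", '"""']

openers = [p + q for p, q in product(prefixes, quotes)]
print(len(openers))  # 9 prefix rows times 4 quote styles

# What would remain if the five rows containing an uppercase letter went away.
lowercase_only = [o for o in openers if o == o.lower()]
print(len(lowercase_only))

# "Halfway there, logarithmically speaking": progress from the original
# 4 openers toward the footnote's 381, measured on a log scale.
progress = (math.log(len(openers)) - math.log(4)) / (math.log(381) - math.log(4))
print(round(progress, 2))
```

The log-scale fraction comes out a little under one half, which matches the footnote's joke.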
From guido at python.org Sun Jan 14 18:08:28 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 12:08:28 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Your message of "Sun, 14 Jan 2001 14:53:03 +0100." <3A61AF3F.EE6DAB88@lemburg.com> References: <3A61AF3F.EE6DAB88@lemburg.com> Message-ID: <200101141708.MAA11161@cj20424-a.reston1.va.home.com> > Ka-Ping Yee wrote: > > > > Sorry i'm being forgetful -- could someone please refresh my memory: > > > > Was there a good reason for allowing both lowercase and capital 'r' > > as a prefix for raw-strings? I assume that the availability of both > > r'' and R'' is what led to having both u'' and U''. > > Right. > > > Is there any > > good reason for that either? > > No idea... I have never used anything other than the lowercase > versions. It comes from the numeric literals. C allows 0x0 and 0X0, and 0L as well as 0l. So does Python (and also 0j == 0J). > > This just seems to lead to ambiguity and unneeded complexity: > > more cases in tokenize.py, more cases in tokenize.c, more work > > for IDLE, more annoying when searching for u' in your editor. > > (I was about to fix the lack of u'' support in tokenize.py and > > that made me think about this.) > > > > What happened to TOOWTDI? > > > > Would you believe we now have 36 different ways of starting a string: > > > > ' " ''' """ > > r' r" r''' r""" > > u' u" u''' u""" > > ur' ur" ur''' ur""" > > R' R" R''' R""" > > U' U" U''' U""" > > uR' uR" uR''' uR""" > > Ur' Ur" Ur''' Ur""" > > UR' UR" UR''' UR""" > > > > Would it be outrageous to suggest deprecating the last five rows? > > No. + 1 on the idea. Why bother? All that does is outdate a bunch of documentation. I don't see the extra effort in various parsers as a big deal. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Sun Jan 14 18:53:32 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sun, 14 Jan 2001 18:53:32 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? Message-ID: <010f01c07e52$e9801fc0$e46940d5@hagrid> The name database portions of SF task 17335 ("add compressed unicode database") were postponed to 2.1. My current patch replaces the ~450k large ucnhash module with a new ~160k large module. (See earlier posts for more info on how the new database works). Should I check it in? From skip at mojam.com Sun Jan 14 18:51:52 2001 From: skip at mojam.com (Skip Montanaro) Date: Sun, 14 Jan 2001 11:51:52 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core Message-ID: <14945.59192.400783.403810@beluga.mojam.com> Ping's pydoc is awesome! Move it out of the sandbox and put it in the standard distribution. Biggest hook for me: 1. execute "pydoc -p 3200" 2. visit "http://localhost:3200/" 3. knock yourself out Skip From martin at mira.cs.tu-berlin.de Sun Jan 14 18:57:57 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 14 Jan 2001 18:57:57 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? Message-ID: <200101141757.f0EHvvt01407@mira.informatik.hu-berlin.de> > > Would it be outrageous to suggest deprecating the last five rows? > Why bother? All that does is outdate a bunch of documentation. He suggested to deprecate it, not to remove it. By the time it is removed, the documentation still mentioning it should be outdated for other reasons (e.g. the string module might have disappeared). In general, the rationale for deprecating things would be that the simplification will make everybody's life easier in the long run. In the case of a small change (such as this one), that advantage would be small. 
OTOH, the hassle for users that rely on the then-removed feature will be also small; I see it as quite unlikely that anybody uses that feature actively (although I do think that people use 0X10 and 100L; the latter is common since 100l is oft confused with 1001). Regards, Martin From tim.one at home.com Sun Jan 14 20:00:21 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 14:00:21 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010114071533.A5812@thyrsus.com> Message-ID: Very quick (swamped): > I think you've just made an argument for replacing your > SequenceMatcher with simil.ratcliff. Actually, I'm certain they're the same algorithm now, except the C is showing through in ratcliff to the floating-point eye . For demonstration, I *always* printed the top three scorers (that's logic in the little driver I posted, not in SequenceMatcher), without any notion of cutoff (ndiff does use a cutoff). Add this line before the return (in the posted driver) to see the actual scores: print scores[:numchoices] For example: Module name? browser [(0.82352941176470584, 'webbrowser'), (0.55555555555555558, 'robotparser'), (0.54545454545454541, 'user')] Hmm. My best guesses are webbrowser, robotparser, user Module name? On this example you reported: >>> simil.ratcliff("browser", "webbrowser") 0.82352942228317261 >>> simil.ratcliff("browser", "robotparser") 0.55555558204650879 >>> simil.ratcliff("browser", "user") 0.54545456171035767 which strongly suggests you're using C floats instead of Python floats to compute the final score. I didn't try every example in your email, but it's the same story on the three I did try (scores identical modulo simil.ratcliff dropping about 30 of the low-order result bits -- which is about the difference between a C double and a C float on most boxes). > Mine's even documented. :-). Which I appreciate! 
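The "C float versus Python float" diagnosis above can be reproduced without the simil module. A minimal sketch in present-day Python 3, assuming that `struct`'s "f" format is a fair stand-in for the C float return value; `as_c_float` is a hypothetical helper, and `difflib.SequenceMatcher` is the form in which this matcher later shipped in the standard library:

```python
import struct
from difflib import SequenceMatcher

def as_c_float(value):
    """Round a Python float (a C double) to single precision and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

# Double-precision similarity score for the "browser" example.
score = SequenceMatcher(None, "browser", "webbrowser").ratio()

# The same score after a trip through a 32-bit C float.
narrowed = as_c_float(score)

print(repr(score))
print(repr(narrowed))
print(abs(score - narrowed) < 2.0 ** -24)  # they agree to single precision only
```

The two printed scores should agree in roughly the first seven or eight significant digits and differ after that, which is the pattern in the numbers quoted above.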
I dreamt up the SequenceMatcher algorithm going on 20 years ago for a friendly diff generator, and never even considered using it for other purposes. But then I may have mentioned that these other purposes never come up in my apps . or-at-least-they-haven't-in-contexts-where-r/o-would-have-been- strong-enough-ly y'rs - tim From bckfnn at worldonline.dk Sun Jan 14 20:00:33 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Sun, 14 Jan 2001 19:00:33 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <010f01c07e52$e9801fc0$e46940d5@hagrid> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: <3a61f12a.36601630@smtp.worldonline.dk> On Sun, 14 Jan 2001 18:53:32 +0100, you wrote: >The name database portions of SF task 17335 ("add >compressed unicode database") were postponed to >2.1. > >My current patch replaces the ~450k large ucnhash >module with a new ~160k large module. (See earlier >posts for more info on how the new database works). Do you have a link or an approx date of this earlier posts? I must have missed it. The patch on sourceforge seems a bit empty: https://sourceforge.net/patch/index.php?func=detailpatch&patch_id=100899&group_id=5470 As a result I invented my own compression format for the ucnhash for jython. I managed to achive ~100k but that probably have different performance properties. regards, finn From esr at thyrsus.com Sun Jan 14 20:09:01 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 14:09:01 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 02:00:21PM -0500 References: <20010114071533.A5812@thyrsus.com> Message-ID: <20010114140901.A6431@thyrsus.com> Tim Peters : > > I think you've just made an argument for replacing your > > SequenceMatcher with simil.ratcliff. > > Actually, I'm certain they're the same algorithm now, except the C is > showing through in ratcliff to the floating-point eye . 
Take a look:

/*****************************************************************************
 *
 * Ratcliff-Obershelp common-subpattern similarity.
 *
 * This code first appeared in a letter to the editor in Doctor
 * Dobbs's Journal, 11/1988. The original article on the algorithm,
 * "Pattern Matching by Gestalt" by John Ratcliff, had appeared in the
 * July 1988 issue (#181) but the algorithm was presented in assembly.
 * The main drawback of the Ratcliff-Obershelp algorithm is the cost
 * of the pairwise comparisons. It is significantly more expensive
 * than stemming, Hamming distance, soundex, and the like.
 *
 * Running time quadratic in the data size, memory usage constant.
 *
 *****************************************************************************/

static int RatcliffObershelp(char *st1, char *end1, char *st2, char *end2)
{
    register char *a1, *a2;
    char *b1, *b2;
    char *s1 = st1, *s2 = st2; /* initializations are just to pacify GCC */
    short max, i;

    if (end1 <= st1 || end2 <= st2)
        return(0);
    if (end1 == st1 + 1 && end2 == st2 + 1)
        return(0);

    max = 0;
    b1 = end1;
    b2 = end2;

    for (a1 = st1; a1 < b1; a1++)
    {
        for (a2 = st2; a2 < b2; a2++)
        {
            if (*a1 == *a2)
            {
                /* determine length of common substring */
                for (i = 1; a1[i] && (a1[i] == a2[i]); i++)
                    continue;
                if (i > max)
                {
                    max = i;
                    s1 = a1;
                    s2 = a2;
                    b1 = end1 - max;
                    b2 = end2 - max;
                }
            }
        }
    }
    if (!max)
        return(0);
    max += RatcliffObershelp(s1 + max, end1, s2 + max, end2); /* rhs */
    max += RatcliffObershelp(st1, s1, st2, s2); /* lhs */
    return max;
}

static float ratcliff(char *s1, char *s2)
/* compute Ratcliff-Obershelp similarity of two strings */
{
    short l1, l2;

    l1 = strlen(s1);
    l2 = strlen(s2);

    /* exact match end-case */
    if (l1 == 1 && l2 == 1 && *s1 == *s2)
        return(1.0);

    return 2.0 * RatcliffObershelp(s1, s1 + l1, s2, s2 + l2) / (l1 + l2);
}

static PyObject *
simil_ratcliff(PyObject *self, PyObject *args)
{
    char *str1, *str2;

    if(!PyArg_ParseTuple(args, "ss:ratcliff", &str1, &str2))
        return NULL;

    return Py_BuildValue("f", ratcliff(str1, str2));
}

-- Eric S. Raymond "Taking my gun away because I might shoot someone is like cutting my tongue out because I might yell `Fire!' in a crowded theater." -- Peter Venetoklis From fredrik at effbot.org Sun Jan 14 20:31:06 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sun, 14 Jan 2001 20:31:06 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3a61f12a.36601630@smtp.worldonline.dk> Message-ID: <040e01c07e60$8c74d100$e46940d5@hagrid> finn wrote: > As a result I invented my own compression format for the ucnhash for > jython. I managed to achive ~100k but that probably have different > performance properties. here's the description: --- From: "Fredrik Lundh" Date: Sun, 16 Jul 2000 20:40:46 +0200 /.../ The unicodenames database consists of two parts: a name database which maps character codes to names, and a code database, mapping names to codes. * The Name Database (getname) First, the 10538 text strings are split into 42193 words, and combined into a 4949-word lexicon (a 29k array). Each word is given a unique index number (common words get lower numbers), and there's a "lexicon offset" table mapping from numbers to words (10k). To get back to the original text strings, I use a "phrase book". For each original string, the phrase book stores a list of word numbers. Numbers 0-127 are stored in one byte, higher numbers (less common words) use two bytes. At this time, about 65% of the words can be represented by a single byte. The result is a 56k array. The final data structure is an offset table, which maps code points to phrase book offsets. Instead of using one big table, I split each code point into a "page number" and a "line number" on that page.
offset = line[ (page[code>>SHIFT]<<SHIFT) + (code&MASK) ] From tim.one at home.com Sun Jan 14 20:46:44 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 14:46:44 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <3A61AAA9.F6F1EA9F@lemburg.com> Message-ID: [M.-A. Lemburg] > BTW, are there less English centric "sounds alike" matchers > around ? Yes, but if anything there are far too many of them: like Soundex, they're just heuristics, and *everybody* who cares adds their own unique twists, while proper studies are almost non-existent. Few variants appear to be in use much beyond their inventor's friends; one notable exception in the Jewish community is the Daitch-Mokotoff variation, originally tailored to their unique needs but later generalized; a brief description here: http://www.avotaynu.com/soundex.html The similarly involved NYSIIS algorithm (New York State Identification Intelligence System -- look for NYSIIS on Parnassus) was the winner from a field of about two dozen competing algorithms, after measuring their effectiveness on assorted databases maintained by the state of New York. Since New York has a large immigrant population, NYSIIS isn't as Anglocentric as Soundex either. But state-of-the-art has given up on purely computational algorithms for these purposes: proper names are simply too much a mess. For example, if I search for "Richard", it *ought* to match on "Dick"; if my Arab buddy searches on "Mohammed", it *ought* to match on "Mhd"; "the rules" people actually use just aren't reducible to pure computation -- it takes a large knowledge base to capture what people "just know". You may enjoy visiting this commercial site (AFAIK, nobody is giving away state-of-the-art for free): http://www.las-inc.com/ > ... > http://physics.nist.gov/cuu/Reference/soundex.html > > works fine for English texts, If that were true, the English-speaking researchers would have declared victory 120 years ago .
But English pronunciation is *notoriously* difficult to predict from spelling, partly because English is the Perl of human languages. or-maybe-the-borg-assuming-there's-a-difference-ly y'rs - tim From esr at thyrsus.com Sun Jan 14 21:17:53 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 15:17:53 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 02:46:44PM -0500 References: <3A61AAA9.F6F1EA9F@lemburg.com> Message-ID: <20010114151753.A6671@thyrsus.com> Tim Peters : > If that were true, the English-speaking researchers would have declared > victory 120 years ago . But English pronunciation is *notoriously* > difficult to predict from spelling, partly because English is the Perl of > human languages. Actually, according to the Oxford Encyclopedia of Linguistics, this is an urban myth. The orthography of English is, in fact, quite consistent; it looks much more wacked out than it is because the maddening irregularities are concentrated in the 400 most commonly used words. The situation is much like that with French verb forms -- most French verbs have a very regular inflection pattern, but the twenty or so exceptions are the most commonly used ones. In fact it's a general rule in language evolution that irregularities are preserved in common forms and not rare ones -- in the rare ones they get forgotten. American personal names are are problem precisely because they sometimes do *not* have English orthography. -- Eric S. Raymond "...quemadmodum gladius neminem occidit, occidentis telum est." [...a sword never kills anybody; it's a tool in the killer's hand.] -- (Lucius Annaeus) Seneca "the Younger" (ca. 4 BC-65 AD), From tim.one at home.com Sun Jan 14 21:31:06 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 15:31:06 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? 
In-Reply-To: <20010114140901.A6431@thyrsus.com> Message-ID: [Tim] > Actually, I'm certain they're the same algorithm now, except the C is > showing through in ratcliff to the floating-point eye . [Eric] > Take a look: Yup, same thing, except: > static float ratcliff(char *s1, char *s2) accounts for the numeric differences (change "float"->"double" and they'd be the same; Python has to convert it to a double anyway, lacking any internal support for C's floats; and the C code is *computing* in double regardless, cutting it back to a float upon return just because of the "float" decl). The code in SequenceMatcher doesn't *look* anything like it, though, due to years of dreaming up faster ways to do this (in its original role as a diff generator, it routinely had to deal with sequences containing 10s of thousands of elements, and code very much like the code you posted was just too slow for that). One simple trick that can enormously speed the worst cases: the "find the longest match starting here" innermost loop is guarded by > if (*a1 == *a2) However, it can't possibly find a *bigger* max unless it's also the case that a1[max] == a2[max] That's usually false in real life, so by adding that test to the guard you usually get to skip the innermost loop entirely. Probably more important in a diff-generator role, though. SequenceMatcher's prime trick is to preprocess one of the strings, in linear time building up a hash table mapping each character in the string to a list of the indices at which it appears. Then the second-innermost loop is saved from needing to do any search: when we get to, e.g., 'x' in the other string, the precomputed hash table tells us directly where to find all the x's in the original string. And in the match-1-against-N case, this hash table can be computed once & reused N times. That's a monster win. However, I never had the patience to code that in C, so I never *did* that before I reimplemented my stuff in Python.
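The preprocessing idea described above can be sketched in a few lines of present-day Python 3. This illustrates the technique only and is not the SequenceMatcher code itself; the names `build_index` and `longest_match` are hypothetical, and the matching loop is a simple "extend the run that ended one position earlier" variant chosen for brevity:

```python
from collections import defaultdict

def build_index(b):
    """Map each element of b to the list of positions where it occurs."""
    index = defaultdict(list)
    for j, ch in enumerate(b):
        index[ch].append(j)
    return index

def longest_match(a, b, index=None):
    """Return (i, j, size) with a[i:i+size] == b[j:j+size] and size maximal."""
    if index is None:
        index = build_index(b)
    best_i = best_j = best_size = 0
    run_ending_at = {}  # j -> length of the common run ending at a[i-1], b[j]
    for i, ch in enumerate(a):
        new_runs = {}
        # No scan of b here: the index lists every position that can match.
        for j in index.get(ch, ()):
            size = run_ending_at.get(j - 1, 0) + 1
            new_runs[j] = size
            if size > best_size:
                best_i, best_j, best_size = i - size + 1, j - size + 1, size
        run_ending_at = new_runs
    return best_i, best_j, best_size

# Build the index once; it can be reused when matching many strings against b.
index = build_index("webbrowser")
print(longest_match("browser", "webbrowser", index))  # (0, 3, 7)
```

In the match-1-against-N case mentioned above, the same `index` would be passed to every call, so the linear-time preprocessing is paid only once.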
Now the Python ndiff runs circles around the old Pascal and C versions. I'm sure that has nothing to do with machines having gotten 100x faster in the meantime > for-short-1-against-1-matches-yours-will-certainly-be-quicker-ly y'rs - tim From guido at python.org Sun Jan 14 21:55:21 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 15:55:21 -0500 Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: Your message of "Sun, 14 Jan 2001 11:51:52 CST." <14945.59192.400783.403810@beluga.mojam.com> References: <14945.59192.400783.403810@beluga.mojam.com> Message-ID: <200101142055.PAA13041@cj20424-a.reston1.va.home.com> > Ping's pydoc is awesome! Move it out of the sandbox and put it in the > standard distribution. > > Biggest hook for me: > > 1. execute "pydoc -p 3200" > 2. visit "http://localhost:3200/" > 3. knock yourself out Yes, wow! Now, if we could somehow get this to show both the docs that Fred maintains and the stuff that Ping extracts from the source code, that would be even better! (I think that Ping's stuff should also run on the python.org site, by the way.) --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Sun Jan 14 21:59:28 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 15:59:28 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 03:31:06PM -0500 References: <20010114140901.A6431@thyrsus.com> Message-ID: <20010114155928.A6793@thyrsus.com> Tim Peters : > [Tim] > > Actually, I'm certain they're the same algorithm now, except the C is > > showing through in ratcliff to the floating-point eye . 
> > [Eric] > > Take a look: > > Yup, same thing, except: > > > static float ratcliff(char *s1, char *s2) > > accounts for the numeric differences (change "float"->"double" and they'd be > the same; Python has to convert it to a double anyway, lacking any internal > support for C's floats; and the C code is *computing* in double regardless, > cutting it back to a float upon return just because of the "float" decl). OK, so the right answer is to make your version visible and documented in the library. -- Eric S. Raymond No one is bound to obey an unconstitutional law and no courts are bound to enforce it. -- 16 Am. Jur. Sec. 177 late 2d, Sec 256 From tim.one at home.com Sun Jan 14 22:01:19 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 16:01:19 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Message-ID: [?!ng] > [1] We started with 4. Na, *we* started with two, just ' and ". And at the time, I thought that was arguably one too many already . Allowing the modifiers to be case-insensitive seems to me much more Pythonic than the original sin of making ' and " mean the same thing. OTOH, if only " had been allowed at the start, we'd probably spell raw strings with ' today, and that doesn't really scream that they're so very different from " strings. leaving-this-one-be-ly y'rs - tim From barry at digicool.com Sun Jan 14 22:02:07 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sun, 14 Jan 2001 16:02:07 -0500 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> Message-ID: <14946.5071.92879.789400@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> Ping's pydoc is awesome! Move it out of the sandbox and put SM> it in the standard distribution. SM> Biggest hook for me: | 1. execute "pydoc -p 3200" | 2. visit "http://localhost:3200/" | 3. knock yourself out Whoa. Awesome. 
From ping at lfw.org Sun Jan 14 22:01:45 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:01:45 -0800 (PST) Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: <200101141708.MAA11161@cj20424-a.reston1.va.home.com> Message-ID: On Sun, 14 Jan 2001, Guido van Rossum wrote: > > It comes from the numeric literals. C allows 0x0 and 0X0, and 0L as > well as 0l. So does Python (and also 0j == 0J). I just did a little test. Neither Python, Perl, nor Tcl support "\X66", only "\x66". Perl doesn't support 0X1234, only 0x1234. Tcl's "expr" routine does support 0X1234. Javascript supports 0X1234, but not "\X66". I'd bet that no one really relies on or expects the uppercase forms except L. -- ?!ng From ping at lfw.org Sun Jan 14 22:14:34 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:14:34 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <14942.29609.19618.534613@cj42289-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Fred L. Drake, Jr. wrote: > Ka-Ping Yee writes: > > My next two targets are: > > 1. Generating text from the HTML documentation files > > using Paul Prescod's stuff in onlinehelp.py. > > You mean the ones I publish as the standard documentation? Relying > on the structure of that HTML is pure folly! Paul's onlinehelp.py is using the HTMLParser and AbstractFormatter to turn HTML into text. It also contains paths to specific files, e.g. help('assert') looks for "ref/assert.html". Are you okay with this technique? Have you tried onlinehelp.py? I was planning to do the same to provide help on the language in pydoc. 
-- ?!ng From skip at mojam.com Sun Jan 14 22:26:48 2001 From: skip at mojam.com (Skip Montanaro) Date: Sun, 14 Jan 2001 15:26:48 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <200101142055.PAA13041@cj20424-a.reston1.va.home.com> References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> Message-ID: <14946.6552.542015.620760@beluga.mojam.com> Guido> Now, if we could somehow get this to show both the docs that Fred Guido> maintains and the stuff that Ping extracts from the source code, Guido> that would be even better! I had exactly the same thought. I suspect that if the install target were modified to install the html-ized sections of the lib reference manual pydoc could grovel around in sys and find the root of the library reference manual pretty easily. If not, it could simply redirect to the relevant section of http://www.python.org/doc/current/lib/. Skip From tim.one at home.com Sun Jan 14 22:45:48 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 16:45:48 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Message-ID: [?!ng] > ... > I'd bet that no one really relies on or expects the uppercase > forms except L. And 0X. I don't think it's in the std library, but I've certainly seen Python code do stuff like magic = 0XFEEDFACE Plus it's always good for a language to be able parse the stuff it prints, and "0X..." is generated by Python's %#X format code. Don't believe I've ever seen the "u" or "r" string modifiers in uppercase, though, but really don't see the harm in allowing that. From ping at lfw.org Sun Jan 14 22:50:43 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:50:43 -0800 (PST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <14946.5071.92879.789400@anthem.wooz.org> Message-ID: On Sun, 14 Jan 2001, Barry A. Warsaw wrote: > Whoa. Awesome. Thanks! 
Two things added recently: constants (any numbers, lists, tuples, strings, or types) in modules are shown; and packages are listed in the index as they should be. -- ?!ng From bckfnn at worldonline.dk Sun Jan 14 23:20:51 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Sun, 14 Jan 2001 22:20:51 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <040e01c07e60$8c74d100$e46940d5@hagrid> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3a61f12a.36601630@smtp.worldonline.dk> <040e01c07e60$8c74d100$e46940d5@hagrid> Message-ID: <3a622615.50148579@smtp.worldonline.dk> [/F] >here's the description: Thanks. >From: "Fredrik Lundh" >Date: Sun, 16 Jul 2000 20:40:46 +0200 > >/.../ > > The unicodenames database consists of two parts: a name > database which maps character codes to names, and a code > database, mapping names to codes. > >* The Name Database (getname) > > First, the 10538 text strings are split into 42193 words, > and combined into a 4949-word lexicon (a 29k array). I only added a word to the lexicon if it was used more than once and if the length was larger then the lexicon index. I ended up with 1385 entries in the lexicon. (a 7k array) > Each word is given a unique index number (common words get > lower numbers), and there's a "lexicon offset" table mapping > from numbers to words (10k). My lexicon offset table is 3k and I also use 4k on a perfect hash of the words. > To get back to the original text strings, I use a "phrase > book". For each original string, the phrase book stores a a > list of word numbers. Numbers 0-127 are stored in one byte, > higher numbers (less common words) use two bytes. At this > time, about 65% of the words can be represented by a single > byte. The result is a 56k array. Because not all words are looked up in the lexicon, I used the values 0-38 for the letters and number, 39-250 are used for one byte lexicon index, and 251-255 are combined with following byte to form a two byte. 
This also results in a 57k array. So far these are only minor variations. > The final data structure is an offset table, which maps code > points to phrase book offsets. Instead of using one big > table, I split each code point into a "page number" and a > "line number" on that page. > > offset = line[ (page[code>>SHIFT]<<SHIFT) + (code&MASK) ] > > Since the unicode space is sparsely populated, it's possible > to split the code so that lots of pages gets no contents. I > use a brute force search to find the optimal SHIFT value. > > In the current database, the page table has 1024 entries > (SHIFT is 6), and there are 199 unique pages in the line > table. The total size of the offset table is 26k. > >* The code database (getcode) > > For the code table, I use a straight-forward hash table to store > name to code mappings. It's basically the same implementation > as in Python's dictionary type, but a different hash algorithm. > The table lookup loop simply uses the name database to check > for hits. > > In the current database, the hash table is 32k. I chose to split a unicode name into words even when looking up a unicode name. Each word is hashed to a lexicon index and a "phrase book string" is created. The sorted phrase book is then searched with a binary search among 858 entries that can be addressed directly, followed by a sequential search among 12 entries. The phrase book search index is 8k and a table that maps phrase book indexes to codepoints is another 20k. The searching I do makes jython slower than the direct calculation you do. I'll take another look at this after jython 2.0 to see if I can improve performance with your page/line number scheme and a total hashing of all the unicode names. regards, finn From ping at lfw.org Sun Jan 14 23:44:47 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 14:44:47 -0800 (PST) Subject: [Python-Dev] SourceForge and long patches Message-ID: Okay, this is getting really annoying. SourceForge won't accept any patches > 16k. Why not?
Is there a way around this? SourceForge: Exiting with Error ERROR Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 I'm trying to submit the update to tokenize.py, but it's too long because i've changed test/output/test_tokenize and that's a big file. -- ?!ng From guido at python.org Sun Jan 14 23:58:03 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 17:58:03 -0500 Subject: [Python-Dev] SourceForge and long patches In-Reply-To: Your message of "Sun, 14 Jan 2001 14:44:47 PST." References: Message-ID: <200101142258.RAA13606@cj20424-a.reston1.va.home.com> > Okay, this is getting really annoying. SourceForge won't accept > any patches > 16k. Why not? Is there a way around this? I have no idea why; can only assume it's a limitation in the database package they use. The standard workaround is to upload a URL pointing to the patch. :-( > SourceForge: Exiting with Error > > ERROR > > Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jan 15 00:35:51 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 00:35:51 +0100 Subject: [Python-Dev] Where's Greg Ward ? Message-ID: <3A6237D7.673BBB30@lemburg.com> He seems to be offline and the people on the distutils list have some patches and other things which would be nice to have in distutils for 2.1. I suppose we could simply check in the patches, but we still want to get his OK on things before applying patches to the distutils tree. 
Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 15 00:57:45 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 18:57:45 -0500 Subject: [Python-Dev] Where's Greg Ward ? In-Reply-To: <3A6237D7.673BBB30@lemburg.com> Message-ID: [MAL] > He seems to be offline and the people on the distutils list have > some patches and other things which would be nice to have in > distutils for 2.1. Greg's somewhere near the end of the process of moving from Virginia to Canada; I expect he'll become visible again Real Soon. > I suppose we could simply check in the patches, but we still want > to get his OK on things before applying patches to the distutils > tree. The distutils SIG could elect a Shadow Dictator in his place; if everyone agrees to vote for Andrew, you save the effort of counting votes . From tismer at tismer.com Mon Jan 15 02:35:57 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 15 Jan 2001 02:35:57 +0100 Subject: [Python-Dev] Minor Bug-fix release for Stackless Python 2.0 Message-ID: <3A6253FD.E9B30462@tismer.com> Wolfgang Lipp reported that Microthreads were executing sequentially with SLP 2.0 . The bug fix is available on the website. Please use this new version, or microthreads will not give you much fun. http://www.stackless.com/spc20-win32.exe http://www.stackless.com/spc-src-010115.zip enjoy - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From tommy at ilm.com Mon Jan 15 03:18:20 2001 From: tommy at ilm.com (Captain Senorita) Date: Sun, 14 Jan 2001 18:18:20 -0800 (PST) Subject: [Python-Dev] chomp()? In-Reply-To: <14923.31238.65155.496546@buffalo.fnal.gov> References: <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> Message-ID: <14946.23981.694472.406438@mace.lucasdigital.com> Charles G Waldman writes: | | P=NP (Python is not Perl) Is it too late to suggest this for the SPAM9 t-shirt? :) From guido at python.org Mon Jan 15 03:24:36 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 21:24:36 -0500 Subject: [Python-Dev] chomp()? In-Reply-To: Your message of "Sun, 14 Jan 2001 18:18:20 PST." <14946.23981.694472.406438@mace.lucasdigital.com> References: <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> <14946.23981.694472.406438@mace.lucasdigital.com> Message-ID: <200101150224.VAA15254@cj20424-a.reston1.va.home.com> > Charles G Waldman writes: > | > | P=NP (Python is not Perl) > > Is it too late to suggest this for the SPAM9 t-shirt? :) By just about a day -- I haven't seen the new design yet, but Just & Eric were supposed to design it today and hand in the final proofs tomorrow. I believe the slogan will be "it fits your brain" (or "it fits my brain"). But if you print a bunch of P=NP shirts, I'm sure you can sell them with a profit, both in Long Beach and in San Diego (at the O'Reilly Open Source conference)... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Mon Jan 15 07:35:05 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 01:35:05 -0500 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: <20010110101545.A21305@glacier.fnational.com> Message-ID: [Timmy] > At this point I'm +0.5 on the idea of fileobject.c using > ms_getline_hack whenever HAVE_GETC_UNLOCKED isn't available. 
[NeilS, from Wednesday] > Compare ms_getline_hack to what Perl does in order speed up IO. Believe me, I have . > I think its worth maintaining that piece of relatively portable > code given the benefit. If the code has to be maintained then it > might was well be used. If we find a platform the breaks we can > always disable it before the final release. Given that hearty encouragement, and the utterly non-scary results so far, I just checked in a new scheme: On a platform with getc_unlocked(): By default, use getc_unlocked(). If you want to use fgets() instead, #define USE_FGETS_IN_GETLINE. [so motivated people can use fgets() instead if it's faster on their platform] On a platform without getc_unlocked(): By default, use fgets(). If you don't want to use fgets(), #define DONT_USE_FGETS_IN_GETLINE. [so if we stumble into a platform it fails on between releases, the user will have an easy time turning it off themself] From gstein at lyra.org Mon Jan 15 08:18:20 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 14 Jan 2001 23:18:20 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Sat, Jan 13, 2001 at 08:55:35AM -0800 References: Message-ID: <20010114231820.C6081@lyra.org> On Sat, Jan 13, 2001 at 08:55:35AM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Lib > In directory usw-pr-cvs1:/tmp/cvs-serv14586 > > Modified Files: > httplib.py > Log Message: > SF Patch #103225 by Ping: httplib: smallest Python patch ever >... Not so small: >... > *** 333,337 **** > i = host.find(':') > if i >= 0: > ! port = int(host[i+1:]) > host = host[:i] > else: > --- 333,340 ---- > i = host.find(':') > if i >= 0: > ! try: > ! port = int(host[i+1:]) > ! except ValueError, msg: > ! raise socket.error, str(msg) > host = host[:i] > else: Did you intend to commit this? 
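The behaviour added by the quoted hunk can be illustrated outside httplib. A minimal sketch in present-day Python 3 syntax: `split_host_port` is a hypothetical helper, not a function in httplib, the default port of 80 is an assumption (the `else:` branch is cut off in the quoted diff), and `socket.error` is today an alias of `OSError`:

```python
import socket

def split_host_port(host, default_port=80):
    """Split 'host:port'; a non-numeric port is reported as socket.error."""
    i = host.find(":")
    if i >= 0:
        try:
            port = int(host[i + 1:])
        except ValueError as msg:
            # Same translation as the patch: ValueError becomes socket.error.
            raise socket.error(str(msg))
        host = host[:i]
    else:
        port = default_port  # assumed; not shown in the quoted hunk
    return host, port

print(split_host_port("www.python.org:8080"))
try:
    split_host_port("www.python.org:http")
except socket.error as exc:
    print("rejected:", exc)
```

The design question raised in the message is whether callers are better served by seeing the original ValueError or by having every connection-setup failure arrive as a single exception type.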
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at zadka.site.co.il Mon Jan 15 16:53:58 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 17:53:58 +0200 (IST) Subject: [Python-Dev] chomp()? In-Reply-To: <200101150224.VAA15254@cj20424-a.reston1.va.home.com> References: <200101150224.VAA15254@cj20424-a.reston1.va.home.com>, <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> <14946.23981.694472.406438@mace.lucasdigital.com> Message-ID: <20010115155358.86E5AA828@darjeeling.zadka.site.co.il> On Sun, 14 Jan 2001 21:24:36 -0500, Guido van Rossum wrote: > But if you print a bunch of P=NP shirts, I'm sure you can sell them > with a profit, both in Long Beach and in San Diego (at the O'Reilly > Open Source conference)... And the Libre Software Meeting (http://lsm.abul.org), which has a Python subtopic too. (Since it's in France, no one is calling it "free", so it's probable you can sell those T-shirts there...) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal at lemburg.com Mon Jan 15 10:44:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 10:44:14 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: <3A62C66E.2BB69E61@lemburg.com> Fredrik Lundh wrote: > > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? Since the Unicode character names are probably not used for performance sensitive tasks, I suggest to checkin the smallest version possible. 
If it is too much work to get Finn's version recoded in C (presuming it's written in Java), then I'd suggest checking in your version until someone comes up with a yet smaller edition. Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 15 10:48:49 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 10:48:49 +0100 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> <14946.6552.542015.620760@beluga.mojam.com> Message-ID: <3A62C781.22240D3C@lemburg.com> Skip Montanaro wrote: > > Guido> Now, if we could somehow get this to show both the docs that Fred > Guido> maintains and the stuff that Ping extracts from the source code, > Guido> that would be even better! > > I had exactly the same thought. I suspect that if the install target were > modified to install the html-ized sections of the lib reference manual pydoc > could grovel around in sys and find the root of the library reference manual > pretty easily. If not, it could simply redirect to the relevant section of > http://www.python.org/doc/current/lib/. Since Fred remarked that the URLs for the different docs are not fixed, how about adding a __onlinedocs__ attribute to the standard Python modules providing the correct URL ? Or, alternatively, pass the module's name through some Google like "I feel lucky" documentation search engine... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 15 10:51:40 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Mon, 15 Jan 2001 10:51:40 +0100 Subject: [Python-Dev] Where's Greg Ward ? References: Message-ID: <3A62C82C.EA25AAF5@lemburg.com> [CCed to distutils, since it matters there] Tim Peters wrote: > > [MAL] > > He seems to be offline and the people on the distutils list have > > some patches and other things which would be nice to have in > > distutils for 2.1. > > Greg's somewhere near the end of the process of moving from Virginia to > Canada; I expect he'll become visible again Real Soon. Great :) > > I suppose we could simply check in the patches, but we still want > > to get his OK on things before applying patches to the distutils > > tree. > > The distutils SIG could elect a Shadow Dictator in his place; if everyone > agrees to vote for Andrew, you save the effort of counting votes . Ok, let's agree to vote for Andrew :) Andrew, is that OK with you ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 15 11:52:09 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 05:52:09 -0500 Subject: [Python-Dev] RE: xreadline speed vs readlines_sizehint In-Reply-To: <3A5D602D.9DC991CB@per.dem.csiro.au> Message-ID: [Mark Favas] > ... > The lines range in length from 96 to 747 characters, with > 11% @ 233, 17% @ 252 and 52% @ 254 characters, so #1 [a vendor > who actually optimized fgets()] looks promising - most lines are > long enough to trigger a realloc. Plus as soon as you spill over the stack buffer, I make you pay for filling 1024 new bytes with newlines before the next fgets() call, and almost all of those are irrelevant to you. It doesn't degrade gracefully. 
Alas, I tried several "adaptive" schemes (adjusting how much of the initial segment of a larger stack buffer they would use, based on the actual line lengths seen in the past), but the costs always exceeded the savings on my box. > Cranking up INITBUFSIZE in ms_getline_hack to 260 from 200 > improves things again, by another 25%: > total 131426612 chars and 514216 lines > count_chars_lines 5.081 5.066 > readlines_sizehint 3.743 3.717 > using_fileinput 11.113 11.100 > while_readline 6.100 6.083 > for_xreadlines 3.027 3.033 Well, I couldn't let you forego *all* of 25%. The current fileobject.c has a stack buffer of 300 bytes, but only uses 100 of them on the first fgets() call. On a very quiet machine, that saved 3-4% of the runtime on *my* test case, whose line lengths are typical of the text files I crunch over, so I'm happy for me. If 100 bytes aren't enough, it must call fgets() again, but just appends the next call into the full 300-byte buffer. So it saves the realloc for lines under 300 chars. > Apart from the name , I like ms_getline_hack... Ya, it's now the non-pejorative getline_via_fgets(). I hate that I became a grown-up <0.9 wink>. time-to-pick-wings-off-of-flies-ly y'rs - tim From ping at lfw.org Mon Jan 15 12:11:16 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 03:11:16 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: <20010114231820.C6081@lyra.org> Message-ID: On Sun, 14 Jan 2001, Greg Stein wrote: > Not so small: > > >... > > *** 333,337 **** > > i = host.find(':') > > if i >= 0: > > ! port = int(host[i+1:]) > > host = host[:i] > > else: > > --- 333,340 ---- > > i = host.find(':') > > if i >= 0: > > ! try: > > ! port = int(host[i+1:]) > > ! except ValueError, msg: > > ! raise socket.error, str(msg) > > host = host[:i] > > else: The above changes were not part of the patch i submitted; the patch i submitted was exactly a one-character change.
Guido has already edited the file, so there's no need to commit anything further here. -- ?!ng From mal at lemburg.com Mon Jan 15 12:56:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 12:56:37 +0100 Subject: [Python-Dev] Why is soundex marked obsolete? References: Message-ID: <3A62E575.9A584108@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > BTW, are there less English centric "sounds alike" matchers > > around ? > > Yes, but if anything there are far too many of them: like Soundex, they're > just heuristics, and *everybody* who cares adds their own unique twists, > while proper studies are almost non-existent. Few variants appear to be in > use much beyond their inventor's friends; one notable exception in the > Jewish community is the Daitch-Mokotoff variation, originally tailored to > their unique needs but later generalized; a brief description here: > > http://www.avotaynu.com/soundex.html > > The similarly involved NYSIIS algorithm (New York State Identification > Intelligence System -- look for NYSIIS on Parnassus) was the winner from a > field of about two dozen competing algorithms, after measuring their > effectiveness on assorted databases maintained by the state of New York. > Since New York has a large immigrant population, NYSIIS isn't as > Anglocentric as Soundex either. Thanks for the pointer. I'll add that module to my lib :) http://metagram.webreply.com/downloads/nysiis.py Perhaps Eric ought to add this one to his package as well ?! BTW, where can I find your package on the web, Eric ? I'd like to give it a ride under German language conditions ;) > But state-of-the-art has given up on purely computational algorithms for > these purposes: proper names are simply too much a mess. 
For example, if I > search for "Richard", it *ought* to match on "Dick"; if my Arab buddy > searches on "Mohammed", it *ought* to match on "Mhd"; "the rules" people > actually use just aren't reducible to pure computation -- it takes a large > knowledge base to capture what people "just know". You may enjoy visiting > this commercial site (AFAIK, nobody is giving away state-of-the-art for > free): > > http://www.las-inc.com/ Sad -- "patent pending" algorithms don't help anyone on this planet :( > > ... > > http://physics.nist.gov/cuu/Reference/soundex.html > > > > works fine for English texts, > > If that were true, the English-speaking researchers would have declared > victory 120 years ago . But English pronunciation is *notoriously* > difficult to predict from spelling, partly because English is the Perl of > human languages. Then Dutch must be the Python of human languages... ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at zadka.site.co.il Mon Jan 15 21:13:18 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:13:18 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 In-Reply-To: References: Message-ID: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> On Sun, 14 Jan 2001 19:26:38 -0800, Tim Peters wrote: > Modified Files: > tabnanny.py > Log Message: > Whitespace normalization. hmmmmmm....... -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal at lemburg.com Mon Jan 15 13:10:30 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Mon, 15 Jan 2001 13:10:30 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 References: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: <3A62E8B6.3DFC1FA2@lemburg.com> Moshe Zadka wrote: > > On Sun, 14 Jan 2001 19:26:38 -0800, Tim Peters wrote: > > Modified Files: > > tabnanny.py > > Log Message: > > Whitespace normalization. > > hmmmmmm....... Perhaps you ought to make this a CRON job ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at zadka.site.co.il Mon Jan 15 21:24:48 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:24:48 +0200 (IST) Subject: [Python-Dev] Someone should be shot In-Reply-To: <3A62E8B6.3DFC1FA2@lemburg.com> References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: <20010115202448.38F60A828@darjeeling.zadka.site.co.il> I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! Of course, the real culprit is the person who fixed up the reply-to in the checkin messages to point to python-dev. Why was it done, and isn't there a better way? This makes it painful to personally comment on people's checkin messages. I suggest instead to add a mail-followup-to header (Didn't anyone read "Reply-To Munging Considered Harmful"?) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From esr at thyrsus.com Mon Jan 15 13:23:25 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 07:23:25 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? 
In-Reply-To: <3A62E575.9A584108@lemburg.com>; from mal@lemburg.com on Mon, Jan 15, 2001 at 12:56:37PM +0100 References: <3A62E575.9A584108@lemburg.com> Message-ID: <20010115072325.A10377@thyrsus.com> M.-A. Lemburg : > Perhaps Eric ought to add this one to his package as well ?! Actually, at this point, my plan is to give Tim a decent interval to refactor ndiff so his SequenceMatcher class is exposed and documented -- otherwise *I'll* go in and do it (har! waving a bloody knife!). His turns out to be the same as the Ratcliff-Obershelp technique I was using, except Tim had his bullshit threshold set too low (:-)) and let through matches I wouldn't have. -- Eric S. Raymond The only purpose for which power can be rightfully exercised over any member of a civilized community, against his will, is to prevent harm to others. His own good, either physical or moral, is not a sufficient warrant -- John Stuart Mill, "On Liberty", 1859 From mal at lemburg.com Mon Jan 15 13:26:59 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 13:26:59 +0100 Subject: [Python-Dev] Re: Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <3A62EC93.9AA60ABA@lemburg.com> Moshe Zadka wrote: > > I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! > Of course, the real culprit is the person who fixed up the reply-to in > the checkin messages to point to python-dev. Why was it done, and > isn't there a better way? This makes it painful to personally comment > on people's checkin messages. I suggest instead to add a mail-followup-to > header > > (Didn't anyone read "Reply-To Munging Considered Harmful"?) Naa, noone needs to be shot in the foot ;) In fact I like it, that replies go to python-dev ... after all, that's where these things should be discussed. 
BTW, in case you misunderstood my reply: it would indeed make sense to automate these kinds of check (tabnanny et al). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at zadka.site.co.il Mon Jan 15 21:42:15 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:42:15 +0200 (IST) Subject: [Python-Dev] Re: Someone should be shot In-Reply-To: <3A62EC93.9AA60ABA@lemburg.com> References: <3A62EC93.9AA60ABA@lemburg.com>, <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <20010115204215.84F0CA828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001 13:26:59 +0100, "M.-A. Lemburg" wrote: > In fact I like it, that replies go to python-dev ... after all, > that's where these things should be discussed. Well, that's the mailing list where things should be discussed. But when I press the "Reply" button (as opposed to "Reply to List" button) I expect my e-mail to go to the person originating the e-mail. Reply-To: means "I'd like to get replies to some other address". What if, say, a checkin message relates to some private topic I'd discussed with someone: I'd like to reply to him personally. I agree that responses to Python-Checkins should be handled on Python-Dev: that's what the mail-followup-to header is for. > BTW, in case you misunderstood my reply: it would indeed make > sense to automate these kinds of check (tabnanny et al). Oh, ok. The "cron" part threw me off (why cron?) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From barry at digicool.com Mon Jan 15 14:15:28 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 08:15:28 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
References: <3A62C82C.EA25AAF5@lemburg.com> Message-ID: <14946.63472.282750.828218@anthem.wooz.org> >>>>> "M" == M writes: >> The distutils SIG could elect a Shadow Dictator in his place; >> if everyone agrees to vote for Andrew, you save the effort of >> counting votes . M> Ok, let's agree to vote for Andrew :) M> Andrew, is that OK with you ? He's got my vote. I've been experiencing some weird problems with the distutils installation of pybsddb3 out of the current Python cvs tree. It'd be nice if the outstanding distutils patches are integrated before I dive in. I don't see anything relevant in patches or bugs, but I don't know if there are other repositories of distutils fixes (like the archives?). -Barry From barry at digicool.com Mon Jan 15 14:27:02 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 08:27:02 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <14946.64166.348139.425223@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> I'm sorry! I meant to reply to tim alone, and ended up MZ> spamming python-dev! Of course, the real culprit is the MZ> person who fixed up the reply-to in the checkin messages to MZ> point to python-dev. Why was it done, and isn't there a better MZ> way? This makes it painful to personally comment on people's MZ> checkin messages. I suggest instead to add a mail-followup-to MZ> header MZ> (Didn't anyone read "Reply-To Munging Considered Harmful"?) Or how about http://www.metasystema.org/essays/reply-to-useful.mhtml for a dissenting view. Of course Mail-Followup-To is completely non-standard, but even if it were, having the mailing list munge it in isn't recommended: http://cr.yp.to/proto/replyto.html Bottom line (IMHO), this is just something about email that is and will forever remain broken. 
Given that, it was voted a long while back to make Reply-To for checkins point to python-dev so until there's a hue and cry to change it back, I'll leave it as is. And yeah, it bites me sometimes too! -Barry From tony at lsl.co.uk Mon Jan 15 15:18:36 2001 From: tony at lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 15 Jan 2001 14:18:36 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message-ID: <002801c07efe$0c728a80$f05aa8c0@lslp7o.int.lsl.co.uk> Neat stuff. Ka-Ping Yee strikes again. And it works with Python 1.5.2. Running on NT (4.00.1381) in an "MS-DOS" window, using Python 1.5.2 installed in the effbot manner, it works, with the slight strangeness that if I do: python pydoc.py I get the documentation for OK, but it is preceded with a line claiming that: The system cannot find the path specified. I don't have the time to pursue this at the moment - it's possibly an artefact of our system? (one minor "prettiness" hack - those of us who have been tainted by Emacs Lisp programming tend to start module documentation off with a line of the form: .py -- information about the module which, when pydoc'ed, results in a NAME line which starts with twice... Of course, if I'm the only person doing this, I'll just have to, well, stop...) A request - a "-f" switch to allow the user to specify a particular Python file (i.e., something not on the PYTHONPATH). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
From jack at oratrix.nl Mon Jan 15 15:32:02 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 15 Jan 2001 15:32:02 +0100 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Message by Guido van Rossum , Sat, 13 Jan 2001 17:33:34 -0500 , <200101132233.RAA03229@cj20424-a.reston1.va.home.com> Message-ID: <20010115143203.A44B63C2031@snelboot.oratrix.nl> Also note that the problem only occurs when trying to build a unix-Python out-of-the-box on MacOSX. If you're building a Carbon Python from the MacPython sources (something very few people can do right now:-) the executable isn't called "python". And when a real MacOSX-Python will be done it'll have all the nifty packaging stuff that will also make sure that there's nothing called "python" in the toplevel folder. And the two workarounds (1-Use a UFS filesystem, 2-Put a ".exe" extension in the Makefile) work fine for the mean time. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Mon Jan 15 15:33:23 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 09:33:23 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: Your message of "Sun, 14 Jan 2001 23:18:20 PST." <20010114231820.C6081@lyra.org> References: <20010114231820.C6081@lyra.org> Message-ID: <200101151433.JAA17944@cj20424-a.reston1.va.home.com> > >... > > *** 333,337 **** > > i = host.find(':') > > if i >= 0: > > ! port = int(host[i+1:]) > > host = host[:i] > > else: > > --- 333,340 ---- > > i = host.find(':') > > if i >= 0: > > ! try: > > ! port = int(host[i+1:]) > > ! except ValueError, msg: > > ! raise socket.error, str(msg) > > host = host[:i] > > else: > > Did you intend to commit this? Oops. 
That was a patch submitted a while ago that I applied as an experiment but then decided I didn't like (argument: why bother). I've reverted it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 15 15:40:30 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 09:40:30 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 22:24:48 +0200." <20010115202448.38F60A828@darjeeling.zadka.site.co.il> References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <200101151440.JAA18045@cj20424-a.reston1.va.home.com> > I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! > Of course, the real culprit is the person who fixed up the reply-to in > the checkin messages to point to python-dev. Why was it done, and > isn't there a better way? This makes it painful to personally comment > on people's checkin messages. I suggest instead to add a mail-followup-to > header > > (Didn't anyone read "Reply-To Munging Considered Harmful"?) I agree with you, but Barry (who set this up) seems to believe that there's a good reason to do it this way. Barry, do you still feel that way? The auto-reply-all has probably tripped me up more than anyone. Anyone else have a strong reason why this should be set? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Tue Jan 16 00:03:25 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 16 Jan 2001 01:03:25 +0200 (IST) Subject: [Python-Dev] Someone should be shot In-Reply-To: <14946.64166.348139.425223@anthem.wooz.org> References: <14946.64166.348139.425223@anthem.wooz.org>, <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <20010115230325.1C7F5A828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001 08:27:02 -0500, barry at digicool.com (Barry A. Warsaw) wrote: > > Or how about > > http://www.metasystema.org/essays/reply-to-useful.mhtml If your mailer doesn't have this option, you should request it from its development team. Any mailer, whose development team refuses this simple request due to some ideological position, cannot be said to be reasonable. As some people here know, I'm my mailer's "development team". I refuse to add it due to an ideological position. Anyone who knows me knows I'm quite unreasonable. Hmmm....I'm not making much headway, am I ;-) > for a dissenting view. Of course Mail-Followup-To is completely > non-standard, but even if it were, having the mailing list munge it in > isn't recommended: > > http://cr.yp.to/proto/replyto.html This has no relevance to the current case, since python-checkin messages are machine-generated -- so this is closer to doing this in the script generating the checkin message, and only differs in implementation. > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! I won't continue this thread, but remember that my vote is "no".
I simply shudder at the thought that I might send someone e-mail with something like "nice bugfix. Didn't know you were back from the sex-change operation", and it would be broadcast out to all Python-Dev *and* the archives, for posterity. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From thomas at xs4all.net Mon Jan 15 16:31:22 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 16:31:22 +0100 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14946.64166.348139.425223@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 08:27:02AM -0500 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> Message-ID: <20010115163122.I1005@xs4all.nl> On Mon, Jan 15, 2001 at 08:27:02AM -0500, Barry A. Warsaw wrote: > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! I've said this before, on the Mailman-devel list, but I'll repeat it here for the record (in case this issue ever comes up for vote again :) The main bite (for me) is that to reply to a person in private, you have to cut&paste the 'From' header from the original mail, and edit your new mail's headers, in order to reply to a specific person. My mailer is mature enough to have a 'reply', 'reply-group' and 'reply-list' keybinding, so the 'Reply-To' only interferes. 
There probably is a 'reply-to-from-ignoring-replyto' keybinding in there, too, somewhere, or it could be added, but remembering to type that different key is almost as much trouble as typing the email address by hand ;P So, my vote, like Moshe's, is just back from a sex change, and reads 'no'. Recount-recount-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Mon Jan 15 16:38:01 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 10:38:01 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 08:27:02 EST." <14946.64166.348139.425223@anthem.wooz.org> References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> Message-ID: <200101151538.KAA21937@cj20424-a.reston1.va.home.com> > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! It sounds like a hue and cry to change it to me! It looks like it's time for a BDFL Pronouncement. I pronounce: Given that: - we all know how to mail to python-dev; - replying to the sender is by far the most common kind of reply; - the mistake of replying to the sender when a reply-all was intended does much less potential harm than the mistake of replying to all when reply-to-sender was intended, the reply-to header shall be removed. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Mon Jan 15 17:57:19 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 11:57:19 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
In-Reply-To: <14946.63472.282750.828218@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 08:15:28AM -0500 References: <3A62C82C.EA25AAF5@lemburg.com> <14946.63472.282750.828218@anthem.wooz.org> Message-ID: <20010115115719.B919@kronos.cnri.reston.va.us> On Mon, Jan 15, 2001 at 08:15:28AM -0500, Barry A. Warsaw wrote: >tree. It'd be nice if the outstanding distutils patches are >integrated before I dive in. I don't see anything relevant in patches >or bugs, but I don't know if there are other repositories of distutils >fixes (like the archives?). There are a few patches buried in the back archives, but I don't know of any outstanding bugfixes, so please report whatever problem you're seeing. Oh, and Barry, did the issue holding up your patch for adding shar support (#102313) ever get resolved? --amk From guido at python.org Mon Jan 15 17:02:39 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 11:02:39 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Mon, 08 Jan 2001 18:20:56 PST." <20010108182056.C4640@lyra.org> References: <20010108182056.C4640@lyra.org> Message-ID: <200101151602.LAA22272@cj20424-a.reston1.va.home.com> Greg Stein noticed me checking in *yet* another system that needs the fallback TELL64() definition in fileobjects.c, and wrote: > All of those #ifdefs could be tossed and it would be more robust (long term) > if an autoconf macro were used to specify when TELL64 should be defined. > > [ I've looked thru fileobject.c and am a bit confused: the conditions for > defining TELL64 do not match the conditions for *using* it. that would > seem to imply a semantic error somewhere and/or a potential gotcha when > they get skewed (like I assume what happened to FreeBSD). simplifying with > an autoconf macro may help to rationalize it. ] I have a better idea. 
Since "lseek((fd),0,SEEK_CUR)" seems to be the universal fallback, why not just define TELL64 to be that if it's not previously defined (currently only MS_WIN64 has a different definition)? It isn't always *used* (the conditions under which _portable_fseek() uses it are quite complex), but *when* it is used, this seems to be the most common definition... Patch: *** fileobject.c 2001/01/15 10:36:56 2.106 --- fileobject.c 2001/01/15 16:02:06 *************** *** 58,66 **** /* define the appropriate 64-bit capable tell() function */ #if defined(MS_WIN64) #define TELL64 _telli64 ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) ! /* NOTE: this is only used on older ! NetBSD prior to f*o() funcions */ #define TELL64(fd) lseek((fd),0,SEEK_CUR) #endif --- 58,65 ---- /* define the appropriate 64-bit capable tell() function */ #if defined(MS_WIN64) #define TELL64 _telli64 ! #else ! /* Fallback for older systems that don't have the f*o() funcions */ #define TELL64(fd) lseek((fd),0,SEEK_CUR) #endif I'll check this in after 24 hours unless a better idea comes up. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 15 17:17:07 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 11:17:07 -0500 Subject: [Python-Dev] PEP 205 comments In-Reply-To: Your message of "Fri, 12 Jan 2001 23:19:57 +0100." <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de> References: <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de> Message-ID: <200101151617.LAA22359@cj20424-a.reston1.va.home.com> I'll leave most of this to Fred, but I'll reply to two items (Fred can add these replies to the PEP): > Again on proxies, there is no discussion or documentation of the > ReferenceError. Why is it a RuntimeError? LookupError, ValueError, and > AttributeError seem to be just as fine or better. RuntimeError was my suggestion. 
The error doesn't really qualify as a LookupError in my view (there's no key that could be valid or invalid) and ValueError seems too general (that's typically used for out-of-range arguments and unparseable strings and the like). Do you have a reason why RuntimeError is inappropriate? > On to the type type extensions: Should there be a type flag indicating > presence of tp_weaklistoffset? It appears that the type structure had > tp_xxx7 for a long time, so likely all in-use binary modules have > that field set to zero. Is that sufficient? Yes, that should be sufficient. (I'm also going to claim tp_xxx7 for the rich comparison function slot, but either patch can be modified to use tp_xxx8 instead.) Maybe it's time to add a bunch of new spares? > Thanks for reading all of this message, You're welcome. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Mon Jan 15 17:39:03 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 11:39:03 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> Message-ID: <14947.10151.575008.869188@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> the reply-to header shall be removed. I'm more than happy to do this (I remember adding the reply-to munging reluctantly). Understand one thing: anybody who naively replies to the whole list will send those replies to python-checkins, not python-dev. Still want it? -Barry From barry at digicool.com Mon Jan 15 17:46:28 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 11:46:28 -0500 Subject: [Python-Dev] Where's Greg Ward ?
References: <3A62C82C.EA25AAF5@lemburg.com> <14946.63472.282750.828218@anthem.wooz.org> <20010115115719.B919@kronos.cnri.reston.va.us> Message-ID: <14947.10596.733726.995351@anthem.wooz.org> >>>>> "AK" == Andrew Kuchling writes: AK> There are a few patches buried in the back archives, but I AK> don't know of any outstanding bugfixes, so please report AK> whatever problem you're seeing. Okay, will do. AK> Oh, and Barry, did the issue holding up your patch for adding AK> shar support (#102313) ever get resolved? No, but I'll try to take another poke at it. -Barry From moshez at zadka.site.co.il Tue Jan 16 02:07:48 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 16 Jan 2001 03:07:48 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: References: Message-ID: <20010116010748.41869A828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001, Guido van Rossum wrote: > Modified Files: > Meta.py > Log Message: > Geoffrey Gerrietts discovered that a KeyError was caught that probably > should have been a NameError. I'm checking in a change that catches > both, just to be sure -- I can't be bothered trying to understand this > code any more. :-) ... > ! except (KeyError, AttributeError): Ummmm....can you be bothered to make sure you really meant AttributeError when you said NameError? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido at python.org Mon Jan 15 18:06:07 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 12:06:07 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 11:39:03 EST." 
<14947.10151.575008.869188@anthem.wooz.org> References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> Message-ID: <200101151706.MAA22884@cj20424-a.reston1.va.home.com> > I'm more than happy to do this (I remember adding the reply-to munging > reluctantly). Understand one thing: anybody who naively replies to > the whole list will send those replies to python-checkins, not > python-dev. > > Still want it? Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Mon Jan 15 18:11:29 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 12:11:29 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <200101151706.MAA22884@cj20424-a.reston1.va.home.com> Message-ID: <14947.12097.613433.580928@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> I'm more than happy to do this (I remember adding the reply-to >> munging reluctantly). Understand one thing: anybody who >> naively replies to the whole list will send those replies to >> python-checkins, not python-dev. Still want it? GvR> Yes. Done. 
From thomas at xs4all.net Mon Jan 15 18:34:37 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 18:34:37 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ftplib.py,1.47,1.48 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, Jan 15, 2001 at 08:32:52AM -0800 References: Message-ID: <20010115183437.J1005@xs4all.nl> On Mon, Jan 15, 2001 at 08:32:52AM -0800, Guido van Rossum wrote: > This is slightly controversial, but after reading the argumentation in > the bug tracker for and against, I believe this is the right solution. It's really only slightly controversial. 'mfisk' convinced me too, and I used to use ftp to a server behind a firewall :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Mon Jan 15 19:21:54 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 19:21:54 +0100 Subject: [Python-Dev] Re: Someone should be shot References: <3A62EC93.9AA60ABA@lemburg.com>, <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <20010115204215.84F0CA828@darjeeling.zadka.site.co.il> Message-ID: <3A633FC2.11F90E94@lemburg.com> Moshe Zadka wrote: > > On Mon, 15 Jan 2001 13:26:59 +0100, "M.-A. Lemburg" wrote: > > > In fact I like it, that replies go to python-dev ... after all, > > that's where these things should be discussed. > > Well, that's the mailing list where things should be discussed. > But when I press the "Reply" button (as opposed to "Reply to List" button) > I expect my e-mail to go to the person originating the e-mail. > Reply-To: means "I'd like to get replies to some other address". > What if, say, a checkin message relates to some private topic > I'd discussed with someone: I'd like to reply to him personally.
> > I agree that responses to Python-Checkins should be handled on Python-Dev: > that's what the mail-followup-to header is for. Ah, ok. I thought you pressed Reply-All and then wondered why your message got copied to python-dev... > > BTW, in case you misunderstood my reply: it would indeed make > > sense to automate these kinds of check (tabnanny et al). > > Oh, ok. The "cron" part threw me off (why cron?) CRON is what's used on Unix to implement jobs which run on a regular basis... perhaps we just need to seup the CRON job in timbot though ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Jan 15 19:35:54 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 13:35:54 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: Your message of "Tue, 16 Jan 2001 03:07:48 +0200." <20010116010748.41869A828@darjeeling.zadka.site.co.il> References: <20010116010748.41869A828@darjeeling.zadka.site.co.il> Message-ID: <200101151835.NAA26712@cj20424-a.reston1.va.home.com> > > Modified Files: > > Meta.py > > Log Message: > > Geoffrey Gerrietts discovered that a KeyError was caught that probably > > should have been a NameError. I'm checking in a change that catches > > both, just to be sure -- I can't be bothered trying to understand this > > code any more. :-) > ... > > ! except (KeyError, AttributeError): > > Ummmm....can you be bothered to make sure you really meant AttributeError > when you said NameError? The code is correct. Ignore the comment. 
:-( --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Mon Jan 15 12:55:51 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 15 Jan 2001 03:55:51 -0800 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14947.10151.575008.869188@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 11:39:03AM -0500 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> Message-ID: <20010115035550.B4336@glacier.fnational.com> [Barry on removing the reply-to header on python-checkins messages] > I'm more than happy to do this (I remember adding the reply-to munging > reluctantly). Understand one thing: anybody who naively replies to > the whole list will send those replies to python-checkins, not > python-dev. Could you make the script generate mail-followup-to instead of reply-to? I know its not a standard header but some MUA understand it and it is exactly what is needed to solve this problem. I think promoting it is a good thing. 
Neil From thomas at xs4all.net Mon Jan 15 19:59:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 19:59:12 +0100 Subject: [Python-Dev] Someone should be shot In-Reply-To: <20010115035550.B4336@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 15, 2001 at 03:55:51AM -0800 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <20010115035550.B4336@glacier.fnational.com> Message-ID: <20010115195912.K1005@xs4all.nl> On Mon, Jan 15, 2001 at 03:55:51AM -0800, Neil Schemenauer wrote: > [Barry on removing the reply-to header on python-checkins messages] > > I'm more than happy to do this (I remember adding the reply-to munging > > reluctantly). Understand one thing: anybody who naively replies to > > the whole list will send those replies to python-checkins, not > > python-dev. > Could you make the script generate mail-followup-to instead of > reply-to? I know its not a standard header but some MUA > understand it and it is exactly what is needed to solve this > problem. I think promoting it is a good thing. The script just calls '/bin/mail'. The Reply-To munging is done by Mailman, which is slightly more than 'a script'. syncmail could do it, but that would mean using sendmail instead of mail, and writing all headers itself. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Mon Jan 15 20:17:27 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 14:17:27 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: Your message of "Fri, 05 Jan 2001 14:14:49 EST." 
<14934.7465.360749.199433@localhost.localdomain> References: <14934.7465.360749.199433@localhost.localdomain> Message-ID: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> There doesn't seem to be a lot of enthousiasm for a Unittest bakeoff... Certainly I don't think I'll get to this myself before the conference. How about the following though: talking of low-hanging fruit, Tim's doctest module is an excellent thing even if it isn't a unit testing framework! (I found this out when I played with it -- it's real easy to get used to...) Would anyone object against Tim checking this in? Since it isn't a contender in the unit test bake-off, it shouldn't affect the outcome there at all. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Mon Jan 15 20:40:03 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 14:40:03 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <20010115035550.B4336@glacier.fnational.com> <20010115195912.K1005@xs4all.nl> Message-ID: <14947.21011.310090.686632@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> Could you make the script generate mail-followup-to instead of >> reply-to? I know its not a standard header but some MUA >> understand it and it is exactly what is needed to solve this >> problem. I think promoting it is a good thing. TW> The script just calls '/bin/mail'. The Reply-To munging is TW> done by Mailman, which is slightly more than 'a TW> script'. syncmail could do it, but that would mean using TW> sendmail instead of mail, and writing all headers itself. I'm sure Fred or I would be happy to review such a patch to syncmail . 
-Barry From jeremy at alum.mit.edu Mon Jan 15 20:31:44 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 15 Jan 2001 14:31:44 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> Message-ID: <14947.20512.140859.119597@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: GvR> There doesn't seem to be a lot of enthousiasm for a Unittest GvR> bakeoff... Certainly I don't think I'll get to this myself GvR> before the conference. Let's have all the interested parties vote now, then. It would certainly be helpful to have the new unittest module in the alpha release of 2.1. I'd like to write some new tests and I'd rather use the new stuff than the old stuff. Jeremy From tim.one at home.com Mon Jan 15 21:01:52 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:01:52 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14947.10151.575008.869188@anthem.wooz.org> Message-ID: [Barry] > ... > Understand one thing: anybody who naively replies to the whole > list will send those replies to python-checkins, not python-dev. IIRC, that's why the redirect to python-dev was added to begin with: of course people will reply to python-checkins, and then the next guy x-posts to python-dev too, and the next three in turn variously remove one or the other groups, or keep both or add c.l.py too. In the end, no single archive contains a coherent record on its own, and the random mix of "[Python-Dev]" and "[Python-checkins]" Subject tags even make it impossible to sort by (true) subject easily in your own mail client. > Still want it? Don't care . 
From tim.one at home.com Mon Jan 15 21:08:15 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:08:15 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 In-Reply-To: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: [] > Modified Files: > tabnanny.py > Log Message: > Whitespace normalization. [Moshe] > hmmmmmm....... LOL! I was hoping nobody would notice that <0.7 wink>. The appalling truth is that late in tabnanny's development I deliberately indented a large block of code by one column, and actually thought it was a good idea at the time. I'm as delighted to see that finally fixed as I am embarrassed by the necessity. although-perhaps-more-appalled-that-was-there-was-followup-debate-about-followups-containing-more-msgs-than-there-were-characters-in-moshe's-followup-ly y'rs - tim From ping at lfw.org Mon Jan 15 21:10:10 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 12:10:10 -0800 (PST) Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: <002801c07efe$0c728a80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: On Mon, 15 Jan 2001, Tony J Ibbs (Tibs) wrote: > I get the documentation for OK, but it is preceded with a line > claiming that: > > The system cannot find the path specified. Thanks for the NT testing. That's funny -- i put in a special case for Windows to avoid messages like the above a couple of days ago. How recently did you download pydoc.py? Does your copy contain:

    if hasattr(sys, 'winver'):
        return lambda text: tempfilepager(text, 'more')

? > .py -- information about the module > > which, when pydoc'ed, results in a NAME line which starts with > twice... > Of course, if I'm the only person doing this, I'll just have to, well, > stop...)
I think i'm going to ask you to stop, unless Guido prefers otherwise. Guido, do you have a style pronouncement for module docstrings? > A request - a "-f" switch to allow the user to specify a particular > Python file (i.e., something not on the PYTHONPATH). Yes, it's on my to-do list. So you can see what i'm up to, here's my current to-do list:

      make boldness optional (only if using more/less? only Unix?)
      document a .py file given on the command line
    + webserver in background
      help should have a repr
      write a better htmlrepr (\n should look special, max length limit, etc.)
      generate docs from lib HTML
      generate HTML index from precis and __path__ and package contents list
      have help(...) produce a directory of available things to ask for help on
      curses.wrapper is broken: both function and package
      respect package __all__
      coherent answer to .py vs .pyc: do we show .pyc?
      fix getcomments() bug: last two lines stuck together
    + grey out shadowed modules/packages
      refactor .py/.pyc/.module.so/.module.so.1 listers in htmldoc, textdoc
      skip __main__ module
    + index built-in modules too
      Windows and Mac testing
      default to HTTP mode on GUI platforms? (win, mac)

The ones marked with + i consider done. Feel free to comment on or suggest priorities for the others; in particular, what do you think of the last one? The idea is that double-clicking on pydoc.py in Windows or MacOS could launch the server and then open the localhost URL using webbrowser.py to display the documentation index. Should it do this by default? -- ?!ng From guido at python.org Mon Jan 15 21:41:25 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 15:41:25 -0500 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Your message of "Mon, 15 Jan 2001 12:10:10 PST."
References: Message-ID: <200101152041.PAA32298@cj20424-a.reston1.va.home.com> > > .py -- information about the module > > > > which, when pydoc'ed, results in a NAME line which starts with > > twice... > > Of course, if I'm the only person doing this, I'll just have to, well, > > stop...) > > I think i'm going to ask you to stop, unless Guido prefers > otherwise. Guido, do you have a style pronouncement for module > docstrings? I'm with Ping. None of the examples in the style guide start the docstring with the function name. Almost none of the standard library modules start their module docstring with the module name (codecs is an exception, but I didn't write it :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn at worldonline.dk Mon Jan 15 21:45:02 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Mon, 15 Jan 2001 20:45:02 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <3A62C66E.2BB69E61@lemburg.com> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3A62C66E.2BB69E61@lemburg.com> Message-ID: <3a636122.45847835@smtp.worldonline.dk> [Fredrik Lundh] > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? [M.-A. Lemburg] >Since the Unicode character names are probably >not used for performance sensitive tasks, I suggest to >checkin the smallest version possible. > >If it is too much work to get Finn's version recoded in C >(presuming it's written in Java), then I'd suggest checking >in your version until someone comes up with a yet smaller >edition. FWIW, I agree that the 160k module should be used. Please, nobody should use the jython compression as an argument to delay any improvements in CPython.
I certainly didn't post because I wanted to complicate your processes. I just wanted to show off . regards, finn From fredrik at effbot.org Mon Jan 15 21:58:11 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Mon, 15 Jan 2001 21:58:11 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3A62C66E.2BB69E61@lemburg.com> <3a636122.45847835@smtp.worldonline.dk> Message-ID: <001f01c07f35$e2c09500$e46940d5@hagrid> mal, finn: > >If it is too much work to get Finn's version recoded in C > >(presuming it's written in Java), then I'd suggest checking > >in your version until someone comes up with a yet smaller > >edition. > > FWIW, I agree the that 160k module should be used. Please, nobody should > use the jython compression as an argument to delay any improvements in > CPython. okay, unless someone throws in a -1 vote, I'll check this in tomorrow. Cheers /F From tim.one at home.com Mon Jan 15 21:57:26 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:57:26 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: [Fredrik Lundh] > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? Absolutely! But not like as for 2.0: check it in *now*, so we have a few days to deal with surprises before the alpha release. With 300K sitting on the table waiting to be taken, it's not worth delaying one hour to worry about 60K additional that may or may not be achievable later. 
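For anyone reading the name-database thread above without the earlier posts: what the ucnhash module and its smaller replacement provide is, roughly, a mapping between code points and Unicode character names. The sketch below illustrates that mapping with the `unicodedata` functions found in present-day Python. That choice is an assumption for illustration only; the thread does not show this API, and the interface in the 2.1 time frame may well have differed. The helper name `describe` is hypothetical.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name) for a single character.

    The name comes from the interpreter's Unicode name database; characters
    without a name (control characters, for instance) get a placeholder.
    """
    return (ord(ch), unicodedata.name(ch, "<unnamed>"))

# Character -> name
print(describe("A"))

# Name -> character, two ways: an explicit lookup and the \N{...} escape.
print(unicodedata.lookup("LATIN SMALL LETTER A"))
print("\N{LATIN SMALL LETTER A}")
```

Since every lookup in either direction goes through the same table, the size of that table is what the 450k versus 160k discussion is about; as noted in the thread, lookup speed matters much less.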
From ping at lfw.org Mon Jan 15 22:02:38 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 13:02:38 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: <20010116010748.41869A828@darjeeling.zadka.site.co.il> Message-ID: On Tue, 16 Jan 2001, Moshe Zadka wrote: > Ummmm....can you be bothered to make sure you really meant AttributeError > when you said NameError? Nice bugfix. Didn't know you were back from the sex-change operation. -- ?!ng From tim.one at home.com Mon Jan 15 22:15:54 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 16:15:54 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > There doesn't seem to be a lot of enthousiasm for a Unittest > bakeoff... I'm enthusiastic, but ... > Certainly I don't think I'll get to this myself before the > conference. Ditto. Takes time that's not there. > ... > Would anyone object against Tim checking [doctest] in? You suggested that before, and so it was already on my 2.1a1 todo list. Hoped to get to it over the weekend but didn't. Hope to get to it today, but won't . On the chance that I do, anyone inclined to object should do so before the sun sets in Reston. 
or-if-it-never-sets-the-world-ends-anyway-ly y'rs - tim From akuchlin at mems-exchange.org Mon Jan 15 22:26:19 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 16:26:19 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14947.20512.140859.119597@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 15, 2001 at 02:31:44PM -0500 References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> <14947.20512.140859.119597@localhost.localdomain> Message-ID: <20010115162619.A19484@kronos.cnri.reston.va.us> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: >Let's have all the interested parties vote now, then. It would >certainly be helpful to have the new unittest module in the alpha >release of 2.1. I'd like to write some new tests and I'd rather use >the new stuff than the old stuff. Huh? If no one has tried the different modules, what's the point of having a vote? (Given that doctest is going to be added, though, it should be checked in ASAP.) --amk From trentm at ActiveState.com Mon Jan 15 23:10:26 2001 From: trentm at ActiveState.com (Trent Mick) Date: Mon, 15 Jan 2001 14:10:26 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <200101151602.LAA22272@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 11:02:39AM -0500 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> Message-ID: <20010115141026.I29870@ActiveState.com> On Mon, Jan 15, 2001 at 11:02:39AM -0500, Guido van Rossum wrote: > Greg Stein noticed me checking in *yet* another system that needs > the fallback TELL64() definition in fileobjects.c, and wrote: > > > All of those #ifdefs could be tossed and it would be more robust (long term) > > if an autoconf macro were used to specify when TELL64 should be defined. 
> > > > [ I've looked thru fileobject.c and am a bit confused: the conditions for > > defining TELL64 do not match the conditions for *using* it. that would > > seem to imply a semantic error somewhere and/or a potential gotcha when > > they get skewed (like I assume what happened to FreeBSD). simplifying with > > an autoconf macro may help to rationalize it. ] The problem is that these systems lie when they "say" (according to Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have largefile support. This seems to have happened for a particular release of BSD (which has since been fixed). I think that the Right(tm) (meaning the cleanest solution where the tests and definitions in the code actually represent the truth) answer is a proper configure test (sort of as Greg suggests). I don't really feel comfortable writing that patch (because (1) lack of time and (2) inability to test, I don't have any access to any of these BSD machines). [Guido] > > I have a better idea. Since "lseek((fd),0,SEEK_CUR)" seems to be the > universal fallback, why not just define TELL64 to be that if it's not > previously defined (currently only MS_WIN64 has a different > definition)? It isn't always *used* (the conditions under which > _portable_fseek() uses it are quite complex), but *when* it is used, > this seems to be the most common definition... While I agree that it is annoying that the build breaks for these platforms I think that it is appropriate that the build breaks. Having to put these: #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) definitions here gives a nice list of those platforms that *do* lie. I would prefer that to having an "#else" block that just captures all other cases, but that is just my opinion. Options (in order of preference): (1) Update the configure test for HAVE_LARGEFILE_SUPPORT such that the proper versions of these OSes do *not* #define it. (2) Guido's suggestion. 
(2) Keep extending the "#elif" list.
 ^---- using (2) twice was intentional

Trent

>
> *** fileobject.c 2001/01/15 10:36:56 2.106
> --- fileobject.c 2001/01/15 16:02:06
> ***************
> *** 58,66 ****
>   /* define the appropriate 64-bit capable tell() function */
>   #if defined(MS_WIN64)
>   #define TELL64 _telli64
> ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__)
> ! /* NOTE: this is only used on older
> !    NetBSD prior to f*o() funcions */
>   #define TELL64(fd) lseek((fd),0,SEEK_CUR)
>   #endif
>
> --- 58,65 ----
>   /* define the appropriate 64-bit capable tell() function */
>   #if defined(MS_WIN64)
>   #define TELL64 _telli64
> ! #else
> ! /* Fallback for older systems that don't have the f*o() funcions */
>   #define TELL64(fd) lseek((fd),0,SEEK_CUR)
>   #endif
>
>
> I'll check this in after 24 hours unless a better idea comes up.
>

Better idea but no patch. :( Trent -- Trent Mick TrentM at ActiveState.com From skip at mojam.com Mon Jan 15 23:10:36 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 15 Jan 2001 16:10:36 -0600 (CST) Subject: [Python-Dev] should we start instrumenting modules with __all__? Message-ID: <14947.30044.934204.951564@beluga.mojam.com> I see the from-import-* patch for __all__ has been checked in. Should we make an effort to add __all__ to at least some modules before 2.1a1?
Skip From akuchlin at mems-exchange.org Mon Jan 15 23:13:03 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 17:13:03 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: <200101121351.IAA19676@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 12, 2001 at 08:51:51AM -0500 References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> <200101121351.IAA19676@cj20424-a.reston1.va.home.com> Message-ID: <20010115171303.A23626@kronos.cnri.reston.va.us> On Fri, Jan 12, 2001 at 08:51:51AM -0500, Guido van Rossum wrote: >Ah. It's very simple. I create a directory "linux" as a subdirectory >of the Python source tree (i.e. at the same level as Lib, Objects, >etc.). Then I chdir into that directory, and I say "../configure". >The configure script creates subdirectories to hold the object files ... >Then I say "make" and it builds Python. This doesn't work at all for me in my copy of the CVS tree. Are there other steps or requirements to make this work? (Transcript available upon request, but I suspect I'm missing something simple.) --amk From tim.one at home.com Mon Jan 15 23:32:51 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 17:32:51 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: [Jeremy] > Let's have all the interested parties vote now, then. It would > certainly be helpful to have the new unittest module in the alpha > release of 2.1. I'd like to write some new tests and I'd rather use > the new stuff than the old stuff. [Andrew] > Huh? If no one has tried the different modules, what's the point of > having a vote? Presumably so that *something* gets into 2.1a1. At least you, Jeremy and Fredrik have tried them, and if that's all there can't be a tie . I would agree this is not an ideal decision procedure.
the-question-is-whether-it's-better-than-paralysis-ly y'rs - tim From ping at lfw.org Mon Jan 15 23:35:47 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 14:35:47 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' Message-ID: I don't know whether this is going to be obvious or controversial, but here goes. Most of the time we're used to seeing a newline as '\n', not as '\012', and newlines are typed in as '\n'. A newcomer to Python is likely to do >>> 'hello\n' 'hello\012' and ask "what's \012?" -- whereupon one has to explain that it's an octal escape, that 012 in octal equals 10, and that chr(10) is newline, which is the same as '\n'. You're bound to run into this, and you'll see \012 a lot, because \n is such a common character. Aside from being slightly more frightening, '\012' also takes up twice as many characters as necessary. So... i'm submitting a patch that causes the three most common special whitespace characters, '\n', '\r', and '\t', to appear in their natural form rather than as octal escapes when strings are printed and repr()ed. Mm? -- ?!ng From esr at thyrsus.com Tue Jan 16 00:15:50 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 18:15:50 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from ping@lfw.org on Mon, Jan 15, 2001 at 02:35:47PM -0800 References: Message-ID: <20010115181550.A11566@thyrsus.com> Ka-Ping Yee : > I don't know whether this is going to be obvious or controversial, > but here goes. Most of the time we're used to seeing a newline as > '\n', not as '\012', and newlines are typed in as '\n'. > > A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > > and ask "what's \012?" -- whereupon one has to explain that it's an > octal escape, that 012 in octal equals 10, and that chr(10) is > newline, which is the same as '\n'. You're bound to run into this, > and you'll see \012 a lot, because \n is such a common character. 
> Aside from being slightly more frightening, '\012' also takes up > twice as many characters as necessary. > > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. Works for me. I'd add \v, \b and \a to cover the whole ANSI C standard escape set (hmmm...am I missing any?) -- Eric S. Raymond Live free or die; death is not the worst of evils. -- General George Stark. From thomas at xs4all.net Tue Jan 16 00:49:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 00:49:30 +0100 Subject: [Python-Dev] time functions Message-ID: <20010116004930.L1005@xs4all.nl> Maybe this is a dead and buried subject, but I'm going to try anyway, since everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood lately :) Why do we need the following atrocity : timestr = time.strftime("", time.localtime(time.time())) To do the simple task of 'date +' ? I never really understood why there isn't a way to get a timetuple directly from C, rather than converting a float that we got from C a bytecode before, even though the higher level almost always deals with timetuples. How about making the float-to-tuple functions (time.localtime, time.gmtime) accept 0 arguments as well, and defaulting to time.time() in that case ? Even better, how about doing the same for the other functions, too ? (where it makes sense, of course :) Actually, I'll split it up in three proposals: - Making the time in time.strftime default to 'now', so that the above becomes the ever so slightly confusing: timestr = time.strftime("") (confusing because it looks a bit like a regexp constructor...) 
- Making the time in time.asctime and time.ctime optional, defaulting to 'now', so you can just call 'time.ctime()' without having to pass time.time() (which are about half the calls in my own code :) - Making the time in time.localtime and time.gmtime default to 'now'. I'm 0/+1/+1 myself :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue Jan 16 00:55:36 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 00:55:36 +0100 Subject: [Python-Dev] TELL64 In-Reply-To: <20010115141026.I29870@ActiveState.com>; from trentm@ActiveState.com on Mon, Jan 15, 2001 at 02:10:26PM -0800 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> Message-ID: <20010116005536.M1005@xs4all.nl> On Mon, Jan 15, 2001 at 02:10:26PM -0800, Trent Mick wrote: > > > [ I've looked thru fileobject.c and am a bit confused: the conditions > > > for defining TELL64 do not match the conditions for *using* it. that > > > would seem to imply a semantic error somewhere and/or a potential > > > gotcha when they get skewed (like I assume what happened to > > > FreeBSD). simplifying with an autoconf macro may help to rationalize > > > it. ] > The problem is that these systems lie when they "say" (according to > Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have > largefile support. This seems to have happened for a particular release of > BSD (which has since been fixed). I think that the Right(tm) (meaning the > cleanest solution where the tests and definitions in the code actually > represent the truth) answer is a proper configure test (sort of as Greg > suggests). I don't really feel comfortable writing that patch (because (1) > lack of time and (2) inability to test, I don't have any access to any of > these BSD machines). 
There is no (longer any) 'single BSD release', so I doubt it has 'since been fixed' :) We should consider the different BSD derived OSes as separate, if slightly related, systems (much like SunOS <-> BSD.) The problem in the BSDI case is really simple: the autoconf test doesn't test whether the fs really supports large files, but rather whether the system has an off_t type that is 64 bits. BSDI has that type, but does not actually use it in any of the seek/tell functions. This has not been 'fixed' as far as I know, precisely because it isn't 'broken' :) I tried to fix the test, but I have been completely unable to find a proper test. There doesn't seem to be a 'standard' one, and I wasn't able to figure out what, say, 'zsh' uses -- black autoconf magic, for sure. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From trentm at ActiveState.com Tue Jan 16 01:24:54 2001 From: trentm at ActiveState.com (Trent Mick) Date: Mon, 15 Jan 2001 16:24:54 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <20010116005536.M1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 12:55:36AM +0100 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> Message-ID: <20010115162454.D3864@ActiveState.com> On Tue, Jan 16, 2001 at 12:55:36AM +0100, Thomas Wouters wrote: > On Mon, Jan 15, 2001 at 02:10:26PM -0800, Trent Mick wrote: > > > The problem is that these systems lie when they "say" (according to > > Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have > > largefile support. This seems to have happened for a particular release of > > BSD (which has since been fixed). I think that the Right(tm) (meaning the > > cleanest solution where the tests and definitions in the code actually > > represent the truth) answer is a proper configure test (sort of as Greg > > suggests). 
I don't really feel comfortable writing that patch (because (1)
> > lack of time and (2) inability to test, I don't have any access to any of
> > these BSD machines).
>
> There is no (longer any) 'single BSD release', so I doubt it has 'since been
> fixed' :)

Okay sure (showing my ignorance). My only understanding was that this "lying" was the case for some unspecified BSDs a while ago but that the latest releases of any of them *did* have largefile support.

>
> I tried to fix the test, but I have been completely unable to find a proper
> test. There doesn't seem to be a 'standard' one, and I wasn't able to figure
> out what, say, 'zsh' uses -- black autoconf magic, for sure.

Hmmm... if one could encode whether or not a 64-bit fseek could be implemented (either using fseek, fseeko, fseek64, _fseek, fsetpos/fgetpos, etc.) in a short C program then that would be the test (or at least most of the test, might have to see if ftell could be implemented as well). Or are there other requirements?

Trent

--
Trent Mick
TrentM at ActiveState.com

From esr at thyrsus.com Tue Jan 16 02:26:14 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 15 Jan 2001 20:26:14 -0500
Subject: [Python-Dev] time functions
In-Reply-To: <20010116004930.L1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 12:49:30AM +0100
References: <20010116004930.L1005@xs4all.nl>
Message-ID: <20010115202614.A11732@thyrsus.com>

Thomas Wouters :
> Actually, I'll split it up in three proposals:
>
> - Making the time in time.strftime default to 'now', so that the above
>   becomes the ever so slightly confusing:
>
>       timestr = time.strftime("")
>   (confusing because it looks a bit like a regexp constructor...)
>
> - Making the time in time.asctime and time.ctime optional, defaulting to
>   'now', so you can just call 'time.ctime()' without having to pass
>   time.time() (which are about half the calls in my own code :)
>
> - Making the time in time.localtime and time.gmtime default to 'now'.
> > I'm 0/+1/+1 myself :) Likewise. -- Eric S. Raymond Never trust a man who praises compassion while pointing a gun at you. From barry at digicool.com Tue Jan 16 03:14:33 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 21:14:33 -0500 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <14947.44681.254332.976234@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> I'm 0/+1/+1 myself :) Maybe I'm an inch on the +0/+1/+1 side. :) From jeremy at alum.mit.edu Tue Jan 16 01:11:59 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 15 Jan 2001 19:11:59 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115162619.A19484@kronos.cnri.reston.va.us> References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> <14947.20512.140859.119597@localhost.localdomain> <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: <14947.37327.395622.66435@localhost.localdomain> >>>>> "AMK" == Andrew Kuchling writes: AMK> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: >> Let's have all the interested parties vote now, then. It would >> certainly be helpful to have the new unittest module in the alpha >> release of 2.1. I'd like to write some new tests and I'd rather >> use the new stuff than the old stuff. AMK> Huh? If no one has tried the different modules, what's the AMK> point of having a vote? (Given that doctest is going to be AMK> added, though, it should be checked in ASAP.) Guido is the only person that said he hadn't tried anything. If others have given it a whirl, they ought to chime in now. If very few people have given them a try, we should decide whether we wait for them or proceed without them. We can't wait indefinitely. I'm not sure when we need to decide. 
Jeremy

From nas at arctrix.com Mon Jan 15 20:40:55 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon, 15 Jan 2001 11:40:55 -0800
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83
In-Reply-To: <200101132225.RAA03197@cj20424-a.reston1.va.home.com>; from guido@python.org on Sat, Jan 13, 2001 at 05:25:12PM -0500
References: <20010113071758.C28643@glacier.fnational.com> <200101132225.RAA03197@cj20424-a.reston1.va.home.com>
Message-ID: <20010115114055.A5879@glacier.fnational.com>

On Sat, Jan 13, 2001 at 05:25:12PM -0500, Guido van Rossum wrote:
> Do you have a tool that detects leaks?

debauch is showing promise although it is still pretty rough around the edges. memprof is another option. It looks like init_exceptions may be leaking memory. Some debauch output:

    1 Leaked Memory 0x0849cf98, size 44 (from 0x0)
        AllocTime: 79269
        FreeTime: 43436
        return stack:
            ???:?? (0x40016005)
            classobject.c:84 (0x805c16d)
            exceptions.c:337 (0x8088594)
            exceptions.c:1061 (0x80898dc)
            pythonrun.c:151 (0x8053581)
            loop.c:23 (0x8053305)

I haven't figured out if this is a real leak yet.

Neil

From michel at digicool.com Tue Jan 16 07:33:00 2001
From: michel at digicool.com (Michel Pelletier)
Date: Mon, 15 Jan 2001 22:33:00 -0800 (PST)
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <14947.37327.395622.66435@localhost.localdomain>
Message-ID:

On Mon, 15 Jan 2001, Jeremy Hylton wrote:
> >>>>> "AMK" == Andrew Kuchling writes:
>
> AMK> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote:
> >> Let's have all the interested parties vote now, then. It would
> >> certainly be helpful to have the new unittest module in the alpha
> >> release of 2.1. I'd like to write some new tests and I'd rather
> >> use the new stuff than the old stuff.
>
> AMK> Huh? If no one has tried the different modules, what's the
> AMK> point of having a vote? (Given that doctest is going to be
> AMK> added, though, it should be checked in ASAP.)
> > Guido is the only person that said he hadn't tried anything. If > others have given it a whirl, they ought to chime in now. I have used pyunit to create a simple set of tests. It seemed to do the job well and it was very easy. I'd never done it before and the docs were fat and A+. I can only give a one-sided opinion. I know of AMK's work but I have not used it, are there others? -Michel From akuchlin at mems-exchange.org Tue Jan 16 04:03:31 2001 From: akuchlin at mems-exchange.org (A.M. Kuchling) Date: Mon, 15 Jan 2001 22:03:31 -0500 Subject: [Python-Dev] Detecting install time Message-ID: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> For PEP 229, the setup.py script needs to figure out if it's running from the build directory, because then distutils.sysconfig needs to look at different config files; ./Modules/Makefile instead of /usr/lib/python2.0/config/Makefile, and so forth. Is there a simple/clean way to do this? --amk From guido at python.org Tue Jan 16 04:21:43 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:21:43 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Mon, 15 Jan 2001 17:13:03 EST." <20010115171303.A23626@kronos.cnri.reston.va.us> References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> <200101121351.IAA19676@cj20424-a.reston1.va.home.com> <20010115171303.A23626@kronos.cnri.reston.va.us> Message-ID: <200101160321.WAA00648@cj20424-a.reston1.va.home.com> > On Fri, Jan 12, 2001 at 08:51:51AM -0500, Guido van Rossum wrote: > >Ah. It's very simple. I create a directory "linux" as a subdirectory > >of the Python source tree (i.e. at the same level as Lib, Objects, > >etc.). Then I chdir into that directory, and I say "../configure". > >The configure script creates subdirectories to hold the object files ... > >Then I say "make" and it builds Python. 
>
> This doesn't work at all for me in my copy of the CVS tree. Are there
> other steps or requirements to make this work? (Transcript available
> upon request, but I suspect I'm missing something simple.)

You can't start doing this in a tree where you have already built Python using the default way -- you have to use a pristine tree. The reason is the funny way Make's VPATH feature works: it sees the .o files in the source directory and then thinks it doesn't have to create the .o file in the build directory. I think a "make clobber" at the top level would probably eradicate everything that confuses Make.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue Jan 16 04:24:04 2001
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Jan 2001 22:24:04 -0500
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: Your message of "Mon, 15 Jan 2001 14:35:47 PST."
References:
Message-ID: <200101160324.WAA00677@cj20424-a.reston1.va.home.com>

> I don't know whether this is going to be obvious or controversial,
> but here goes. Most of the time we're used to seeing a newline as
> '\n', not as '\012', and newlines are typed in as '\n'.
>
> A newcomer to Python is likely to do
>
>     >>> 'hello\n'
>     'hello\012'
>
> and ask "what's \012?" -- whereupon one has to explain that it's an
> octal escape, that 012 in octal equals 10, and that chr(10) is
> newline, which is the same as '\n'. You're bound to run into this,
> and you'll see \012 a lot, because \n is such a common character.
> Aside from being slightly more frightening, '\012' also takes up
> twice as many characters as necessary.
>
> So... i'm submitting a patch that causes the three most common
> special whitespace characters, '\n', '\r', and '\t', to appear in
> their natural form rather than as octal escapes when strings are
> printed and repr()ed.

+1 on the idea; no time to study the patch tonight.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 04:28:38 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:28:38 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 18:15:50 EST." <20010115181550.A11566@thyrsus.com> References: <20010115181550.A11566@thyrsus.com> Message-ID: <200101160328.WAA00723@cj20424-a.reston1.va.home.com> > > So... i'm submitting a patch that causes the three most common > > special whitespace characters, '\n', '\r', and '\t', to appear in > > their natural form rather than as octal escapes when strings are > > printed and repr()ed. > > Works for me. I'd add \v, \b and \a to cover the whole ANSI C > standard escape set (hmmm...am I missing any?) You missed \f [*]. Unclear to me whether it's a good idea to add the lesser-known ones; they are just as likely binary gobbledegook rather than what their escapes stand for. [*] http://www.python.org/doc/current/ref/strings.html --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 04:31:19 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:31:19 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 00:49:30 +0100." <20010116004930.L1005@xs4all.nl> References: <20010116004930.L1005@xs4all.nl> Message-ID: <200101160331.WAA00780@cj20424-a.reston1.va.home.com> > Maybe this is a dead and buried subject, but I'm going to try anyway, since > everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood > lately :) > > Why do we need the following atrocity : > > timestr = time.strftime("", time.localtime(time.time())) > > To do the simple task of 'date +' ? 
I never really understood why > there isn't a way to get a timetuple directly from C, rather than converting > a float that we got from C a bytecode before, even though the higher level > almost always deals with timetuples. How about making the float-to-tuple > functions (time.localtime, time.gmtime) accept 0 arguments as well, and > defaulting to time.time() in that case ? Even better, how about doing the > same for the other functions, too ? (where it makes sense, of course :) > > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) I don't see the confusion. > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. > > I'm 0/+1/+1 myself :) Yes, I've wondered this myself too. I guess the current API is based too much on the C API... +1/+1/+1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 04:47:32 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:47:32 -0500 Subject: [Python-Dev] Detecting install time In-Reply-To: Your message of "Mon, 15 Jan 2001 22:03:31 EST." <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> References: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> Message-ID: <200101160347.WAA01132@cj20424-a.reston1.va.home.com> > For PEP 229, the setup.py script needs to figure out if it's running > from the build directory, because then distutils.sysconfig needs to > look at different config files; ./Modules/Makefile instead of > /usr/lib/python2.0/config/Makefile, and so forth. 
Is there a > simple/clean way to do this? You could check for the presence of config.status -- that file is not installed. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 16 04:53:16 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 22:53:16 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Message-ID: [?!ng] > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. -1 on doing that when they're printed (although I probably misunderstand what you mean there). +1 for changing repr() as suggested. -0 on generalizing to \a \b \f \v too (I've never used one of those in a string literal in my life, so would be more baffled by seeing one come back than I would the octal equivalent). I would also be +1 on using hex escapes instead of octal (I grew up on 36- and 60-bit machines, but that was the last time octal looked *natural*!). Octal and hex escapes both consume 4 characters, so I can't imagine what octal has going for it in the 21st century . 377-is-an-irritating-way-to-spell-ff-ly y'rs - tim PS: Note that C doesn't define what numerical values \a etc have, just that: Each of these escape sequences shall produce a unique implementation-defined value which can be stored in a single char object. The external representations in a text file need not be identical to the internal representations, and are outside the scope of this International Standard. The current method does have the advantage of extreme clarity. From guido at python.org Tue Jan 16 05:08:46 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 23:08:46 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Mon, 15 Jan 2001 16:24:54 PST." 
<20010115162454.D3864@ActiveState.com>
References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com>
Message-ID: <200101160408.XAA01368@cj20424-a.reston1.va.home.com>

Looking at the code (in _portable_fseek()) that uses TELL64, I don't understand why it can't use fgetpos(). That code is used only when fpos_t -- the type used by fgetpos() and fsetpos() -- is 64-bit.

Trent, you wrote that code. Why wouldn't this work just as well?

(your code):

    if ((pos = TELL64(fileno(fp))) == -1L)
        return -1;

(my suggestion):

    if (fgetpos(fp, &pos) != 0)
        return -1;

It can't be because fgetpos() doesn't exist or is otherwise unusable, because the SEEK_CUR case uses it. We also know that offset is 8-byte capable (the #if around the declaration of _portable_fseek() ensures that).

I would even go as far as to collapse the entire switch as follows:

    fpos_t pos;
    switch (whence) {
    case SEEK_END:
        /* do a "no-op" seek first to sync the buffering so that
           the low-level tell() can be used correctly */
        if (fseek(fp, 0, SEEK_END) != 0)
            return -1;
        /* fall through */
    case SEEK_CUR:
        if (fgetpos(fp, &pos) != 0)
            return -1;
        offset += pos;
        break;
    /* case SEEK_SET: break; */
    }
    return fsetpos(fp, &offset);

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue Jan 16 05:13:40 2001
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Jan 2001 23:13:40 -0500
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: Your message of "Mon, 15 Jan 2001 22:53:16 EST."
References:
Message-ID: <200101160413.XAA01404@cj20424-a.reston1.va.home.com>

> [?!ng]
> > So... i'm submitting a patch that causes the three most common
> > special whitespace characters, '\n', '\r', and '\t', to appear in
> > their natural form rather than as octal escapes when strings are
> > printed and repr()ed.
> > -1 on doing that when they're printed (although I probably misunderstand > what you mean there). Ping was using imprecise language here -- he meant repr() and "printed at the command line prompt." > +1 for changing repr() as suggested. > > -0 on generalizing to \a \b \f \v too (I've never used one of those in a > string literal in my life, so would be more baffled by seeing one come back > than I would the octal equivalent). > > I would also be +1 on using hex escapes instead of octal (I grew up on 36- > and 60-bit machines, but that was the last time octal looked *natural*!). Me too. One summer vacation while in college I had nothing better to do than decode the Pascal runtime system for the University's CDC-6600 from an octal dump into assembly. Learned lots! > Octal and hex escapes both consume 4 characters, so I can't imagine what > octal has going for it in the 21st century . Originally, using \x for these was impractical (at least) because of the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics of the \x escape. Now we've fixed this, I agree. > 377-is-an-irritating-way-to-spell-ff-ly y'rs - tim > > > PS: Note that C doesn't define what numerical values \a etc have, just > that: > > Each of these escape sequences shall produce a unique > implementation-defined value which can be stored in a single > char object. The external representations in a text file need > not be identical to the internal representations, and are > outside the scope of this International Standard. > > The current method does have the advantage of extreme clarity. Python doesn't support non-ASCII machines, like the C standard (pretends to). --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Tue Jan 16 05:26:13 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Mon, 15 Jan 2001 23:26:13 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101160328.WAA00723@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 10:28:38PM -0500 References: <20010115181550.A11566@thyrsus.com> <200101160328.WAA00723@cj20424-a.reston1.va.home.com> Message-ID: <20010115232613.B12166@thyrsus.com> Guido van Rossum : > > > So... i'm submitting a patch that causes the three most common > > > special whitespace characters, '\n', '\r', and '\t', to appear in > > > their natural form rather than as octal escapes when strings are > > > printed and repr()ed. > > > > Works for me. I'd add \v, \b and \a to cover the whole ANSI C > > standard escape set (hmmm...am I missing any?) > > You missed \f [*]. Unclear to me whether it's a good idea to add the > lesser-known ones; they are just as likely binary gobbledegook rather > than what their escapes stand for. > > [*] http://www.python.org/doc/current/ref/strings.html Truth is, Guido, I'm kind of iffy about whether there'd be a gain in clarity myself. But I find I'm rather attached to the idea of maintaining strictest possible symmetry between what Python handles on input and what it emits on output. So unless we think adding \f, \v, \b, and \a to the special set would actually produce a *loss* of clarity relative to octal gibberish (!), I say do 'em all. Aesthetically, that feels to me like the right thing, and the *Pythonic* thing, to do here. Have I erred in my intuition, O BDFL? -- Eric S. Raymond A man who has nothing which he is willing to fight for, nothing which he cares about more than he does about his personal safety, is a miserable creature who has no chance of being free, unless made and kept so by the exertions of better men than himself. -- John Stuart Mill, writing on the U.S. 
Civil War in 1862

From nas at arctrix.com Mon Jan 15 22:45:28 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon, 15 Jan 2001 13:45:28 -0800
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: <20010115232613.B12166@thyrsus.com>; from esr@thyrsus.com on Mon, Jan 15, 2001 at 11:26:13PM -0500
References: <20010115181550.A11566@thyrsus.com> <200101160328.WAA00723@cj20424-a.reston1.va.home.com> <20010115232613.B12166@thyrsus.com>
Message-ID: <20010115134528.B6193@glacier.fnational.com>

On Mon, Jan 15, 2001 at 11:26:13PM -0500, Eric S. Raymond wrote:
> [...] I find I'm rather attached to the idea of maintaining
> strictest possible symmetry between what Python handles on
> input and what it emits on output.
>
> So unless we think adding \f, \v, \b, and \a to the special set would
> actually produce a *loss* of clarity relative to octal gibberish (!),
> I say do 'em all.

Symmetry is good but I bet most people who would see \f, \v, \b, \a wouldn't have entered those characters using escapes. Most likely those characters would have been read from a binary file. That said, I don't really mind either way.

Neil

From tim.one at home.com Tue Jan 16 05:43:06 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 15 Jan 2001 23:43:06 -0500
Subject: [Python-Dev] Whitesapce normalization
Message-ID:

You may have noticed that I checked in changes to most of the modules in the top level of Lib yesterday (Sunday). This is part of a Crusade that was supposed to happen before 2.0a1, but got dropped on the floor then due to misunderstandings: make the Python code we distribute adhere to Guido's style guide (4-space indents, no hard tabs), + clean up minor whitespace nits (no stray blank lines at the ends of files, no trailing whitespace on lines, last line of the file should end with a newline).

It would be nice if people cleaned up their code this way too; I'm not going to go thru the entire distribution doing this.
So, if you give a rip, pick a directory or some modules you're fond of, and clean 'em up. The program Tools/scripts/reindent.py does all of the above for you, so it's not hard. But it takes some care in two areas, which is why I did the top level of Lib one file at a time by hand, and studied diffs by eyeball before checking in any changes: + It's unlikely but possible that some program file *depends* on trailing whitespace. That plain sucks (it's *going* to break sooner or later), but reindent.py can't help you there. + While reindent should never otherwise damage program logic, very strange commenting or docstring styles may get mangled by it, making code and/or docs hard to read. reindent works very hard to do a good job on that, and indeed I found no need to make manual changes to anything it did in the top level of Lib. But check anyway. Especially some of the very oldest modules are littered with ugly stuff like # all over the place, from back when nobody had an editor smart enough to skip over preceding blank lines when suggesting indentation for the current line. Then again, maybe we should just drop the Irix5 directory . voice-in-the-wilderness-ly y'rs - tim From esr at thyrsus.com Tue Jan 16 05:43:24 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 23:43:24 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from tim.one@home.com on Mon, Jan 15, 2001 at 10:53:16PM -0500 References: Message-ID: <20010115234324.C12166@thyrsus.com> Tim Peters : > I would also be +1 on using hex escapes instead of octal (I grew up on 36- > and 60-bit machines, but that was the last time octal looked *natural*!). > Octal and hex escapes both consume 4 characters, so I can't imagine what > octal has going for it in the 21st century . Tim, on the level of aesthetic preference I'm totally with you. I've always found octal really ugly myself. Hex fits my brain better; somehow I find it easier to visualize the bit patterns from. 
Sadly, there are so many other related ways in which Python intelligently follows C/Unix conventions that I think changing to a default of hex escapes rather than octal would violate the Rule of Least Surprise. One of the things I like about Python is precisely its conservatism in areas like string escapes, that Guido refrained from inventing new OS APIs or new conventions for things like string escapes in places where Unix and C did them in a well-established and reasonable way. He didn't make the mistake, all too typical in academic languages, of confusing novelty with value... This conservatism is valuable because it frees the C-experienced programmer's mind from having to think about where the language is trivially different, so he can concentrate on where it's importantly different. It's worth maintaining. On the other hand, the change would mesh well with the Unicode support. Hmm. Tough call. I could go either way, I guess. -- Eric S. Raymond The politician attempts to remedy the evil by increasing the very thing that caused the evil in the first place: legal plunder. -- Frederick Bastiat From tim.one at home.com Tue Jan 16 06:07:16 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 16 Jan 2001 00:07:16 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <20010115234324.C12166@thyrsus.com> Message-ID: [Eric] > Tim, on the level of aesthetic preference I'm totally with you. > I've always found octal really ugly myself. Hex fits my brain > better; somehow I find it easier to visualize the bit patterns from. > > Sadly, there are so many other related ways in which Python > intelligently follows C/Unix conventions that I think changing to > a default of hex escapes rather than octal would violate the Rule > of Least Surprise. > > ... [and skipping nice stuff I *do* agree with ] ... The saving grace here is that repr() is a form of ASCII dump. 
C has nothing to say about that, while last time I used Unix it was real easy to get dumps in hex (and indeed that's what everyone I knew routinely did). I expect that od retains both its name and its octal defaults on most systems simply due to inertia. An octal dump would be infinitely surprising on Windows (I'm not sure I can even get one without writing it myself). Do people actually use octal dumps on Unices anymore? I'd be surprised, if they're running on power-of-2 boxes. Defaults aren't conventions when *everyone* overrides them, they're just old and in the way. takes-one-to-know-one-ly y'rs - tim From ping at lfw.org Tue Jan 16 06:27:33 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 21:27:33 -0800 (PST) Subject: [Python-Dev] time functions In-Reply-To: <20010116004930.L1005@xs4all.nl> Message-ID: On Tue, 16 Jan 2001, Thomas Wouters wrote: > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. I like all of these suggestions. Go for it! -- ?!ng From esr at thyrsus.com Tue Jan 16 06:31:14 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 16 Jan 2001 00:31:14 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from tim.one@home.com on Tue, Jan 16, 2001 at 12:07:16AM -0500 References: <20010115234324.C12166@thyrsus.com> Message-ID: <20010116003114.A12365@thyrsus.com> Tim Peters : > Do people actually use octal dumps on Unices anymore? 
Well, we do when we momentarily forget to give od(1) the -x escape :-) This so annoyed me that back around 1983 I wrote my own hex dumper specifically to emulate the 16-hex-bytes-with-midpage-gutter-and-ASCII- over-on-the-right-side format that CP/M used and DOS inherited. It's still available at . Do you know the history on this? C speaks octal because a bunch of mode fields in the PDP-11 instruction word were three bits wide. Time was it was actually useful to have the output from (say) core files chunk that way. But I haven't seen an octal code dump in over a decade, probably pushing fifteen years now. -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From ping at lfw.org Tue Jan 16 06:33:42 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 21:33:42 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> Message-ID: On Mon, 15 Jan 2001, Guido van Rossum wrote: > > > special whitespace characters, '\n', '\r', and '\t', to appear in > > > their natural form rather than as octal escapes when strings are > > > printed and repr()ed. > > > > -1 on doing that when they're printed (although I probably misunderstand > > what you mean there). > > Ping was using imprecise language here -- he meant repr() and "printed > at the command line prompt." Yes, i referred to "when strings are printed and repr()ed" as two cases because both string_print() and string_repr() have to be changed. 
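The difference between the repr() form and the printed form can be sketched as follows. This assumes a present-day Python 3 interpreter, whose output is not necessarily what the patch in this thread produced, and the variable names are illustrative only.

```python
s = 'hello\n\tworld'

# repr() gives an escaped, literal-style form: the common whitespace
# characters come out as \n and \t rather than as numeric escapes.
escaped = repr(s)

# str() (which is what print uses) leaves the characters untouched.
plain = str(s)

# A less common control character (bell) still gets a numeric escape,
# in hex rather than octal.
bell_repr = repr('\a')
```

Only the repr() form is affected by the choice of escapes; printing the string writes the newline and tab characters themselves.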
(Side question: when are *_print() and *_repr() ever different, and why?) > Originally, using \x for these was impractical (at least) because of > the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics > of the \x escape. Now we've fixed this, I agree. Oh, now i understand. Good point. I'll update the patch to do hex. 0xdeadbeef-ly yours, -- ?!ng From fredrik at effbot.org Tue Jan 16 08:11:38 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 16 Jan 2001 08:11:38 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <00b201c07f8b$93996820$e46940d5@hagrid> thomas wrote: > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) where "now" is local time, I assume? since you're assuming a time zone, you could make it accept an integer as well... > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) same here. From thomas at xs4all.net Tue Jan 16 08:18:38 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 08:18:38 +0100 Subject: [Python-Dev] time functions In-Reply-To: <00b201c07f8b$93996820$e46940d5@hagrid>; from fredrik@effbot.org on Tue, Jan 16, 2001 at 08:11:38AM +0100 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> Message-ID: <20010116081838.N1005@xs4all.nl> On Tue, Jan 16, 2001 at 08:11:38AM +0100, Fredrik Lundh wrote: > thomas wrote: > > - Making the time in time.strftime default to 'now', so that the above > > becomes the ever so slightly confusing: > > > > timestr = time.strftime("") > > (confusing because it looks a bit like a regexp constructor...) > where "now" is local time, I assume? Yes. 
See the patch I'll upload later today (meetings first, grrr)

> since you're assuming a time zone, you could make it accept
> an integer as well...

Could, yes... I'll include it in the 2nd revision of the patch, it can be rejected (or accepted) separately.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Tue Jan 16 09:22:11 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 16 Jan 2001 09:22:11 +0100
Subject: [Python-Dev] time functions
In-Reply-To: <20010116081838.N1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 08:18:38AM +0100
References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl>
Message-ID: <20010116092211.O1005@xs4all.nl>

On Tue, Jan 16, 2001 at 08:18:38AM +0100, Thomas Wouters wrote:
> On Tue, Jan 16, 2001 at 08:11:38AM +0100, Fredrik Lundh wrote:
> > > timestr = time.strftime("")
> > since you're assuming a time zone, you could make it accept
> > an integer as well...
> Could, yes...

Actually, on second thought, let's not, not just yet anyway. Doing that for all functions in the time module would continue to pollute the already toxic waters of a C API translated into Python :P Who knows what 'ctime' stands for, anyway ? And 'asctime' ? How can we expect Python programmers who think 'C' is a high note or average grade, to understand how the time module is supposed to be used ?
:)

We now have:

time() -- return current time in seconds since the Epoch as a float
gmtime() -- convert seconds since Epoch to UTC tuple
localtime() -- convert seconds since Epoch to local time tuple
asctime() -- convert time tuple to string
ctime() -- convert time in seconds to string
mktime() -- convert local time tuple to seconds since Epoch
strftime() -- convert time tuple to string according to format specification

where asctime and ctime are basically wrappers around strftime, and would do the exact same thing if they both accepted tuples and floats. I think we should have something like:

time() -- current time in float
timetuple() -- current (local) time in timetuple
tuple2time(tuple) -- tuple -> float
time2tuple(float, tz=local) -- float -> tuple using timezone tz
stringtime(time=now, format="ctimeformat") -- convert time value to string

Those are just working names, to make the point, I don't have time to think up better ones :) I'm not sure if the timezone support in the above list is extensive enough, mostly because I hardly use timezones myself. Also, tuple2time() could be merged with time(), and likewise for time2tuple() and timetuple().

I think keeping strftime() and maybe ctime() for ease-of-use is a good idea, but the rest could eventually be deprecated.

Off-to-important-meetings-*cough*-ly y'rs
--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From fredrik at effbot.org Tue Jan 16 09:30:28 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Tue, 16 Jan 2001 09:30:28 +0100
Subject: [Python-Dev] unit testing bake-off
References:
Message-ID: <01ba01c07f96$967b7870$e46940d5@hagrid>

Tim Peters wrote:
> At least you, Jeremy and Fredrik have tried them, and
> if that's all there can't be a tie .

let me guess:

Jeremy: PyUnit
Andrew: unittest
Fredrik: unittest

(I find pyunit a bit unpythonic, and both overengineered and underengineered at the same time...
hard to explain, but I strongly prefer unittest)

> I would agree this is not an ideal decision procedure.

well, any decision procedure that comes up with what I want just has to be ideal ;-)

From andy at reportlab.com Tue Jan 16 10:20:45 2001
From: andy at reportlab.com (Andy Robinson)
Date: Tue, 16 Jan 2001 09:20:45 -0000
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <20010115204701.11972EA6B@mail.python.org>
Message-ID:

> Subject: Re: [Python-Dev] unit testing bake-off
> From: Guido van Rossum
> Date: Mon, 15 Jan 2001 14:17:27 -0500
>
> There doesn't seem to be a lot of enthusiasm for a Unittest
> bakeoff... Certainly I don't think I'll get to this myself before the
> conference.
>
> How about the following though: talking of low-hanging fruit, Tim's
> doctest module is an excellent thing even if it isn't a unit testing
> framework! (I found this out when I played with it -- it's real easy
> to get used to...)
>
> Would anyone object against Tim checking this in? Since it isn't a
> contender in the unit test bake-off, it shouldn't affect the outcome
> there at all.
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)

I think it should definitely go in. Ditto with whatever testing framework and documentation tools (pydoc etc.) shortly emerge as "best of breed". I spend my time on corporate consulting projects, and saying things like "Python has standard tools for unit testing and documentation" is even better than saying "We have standard tools for unit testing and documentation".

BTW, ReportLab has recently adopted PyUnit's unittest.py. It feels a bit Java-like to me - a few more lines of code than needed - but it certainly works. One key feature is aggregating test suites; a big app we installed on a customer site can run the test suite for itself, the ReportLab library (whose test suite we are just getting to work on) and four or five dependent utilities; another is that people have heard of JUnit.
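The suite aggregation mentioned above can be sketched roughly as follows. This is a minimal illustration using the unittest module as it ships with present-day Python 3 (the PyUnit descendant), not ReportLab's actual setup; the two test classes and the helper names build_suite and run_all are hypothetical stand-ins.

```python
import io
import unittest


class LibraryTests(unittest.TestCase):
    """Hypothetical stand-in for a library's own tests."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


class AppTests(unittest.TestCase):
    """Hypothetical stand-in for an application's tests."""

    def test_upper(self):
        self.assertEqual('spam'.upper(), 'SPAM')


def build_suite():
    """Collect the tests from both classes into one combined suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(LibraryTests))
    suite.addTests(loader.loadTestsFromTestCase(AppTests))
    return suite


def run_all():
    """Run the combined suite quietly and return the TestResult."""
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(build_suite())
```

Calling run_all() runs the application's and the library's tests in one pass; the returned result object reports how many tests ran and whether all of them passed.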
Just my 2p worth, Andy Robinson From tony at lsl.co.uk Tue Jan 16 10:47:01 2001 From: tony at lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 16 Jan 2001 09:47:01 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101152041.PAA32298@cj20424-a.reston1.va.home.com> Message-ID: <003901c07fa1$46e10c70$f05aa8c0@lslp7o.int.lsl.co.uk> In the context of my starting doc strings in an Emacs Lisp manner, Ka-Ping Yee said: > I think i'm going to ask you to stop, unless Guido prefers > otherwise. Guido, do you have a style pronouncement for module > docstrings? and since Guido replied > I'm with Ping. None of the examples in the style guide start the > docstring with the function name. Almost none of the standard library > modules start their module docstring with the module name (codecs is > an exception, but I didn't write it :-). I shall indeed stop (of course, my habit started before we HAD documentation tools, and if we're going to browse things with pydoc, et al, then there's no need for it. To be honest, it's the answer I expected. Oh dear, another item for my TO DO list (i.e., remove the offending nits). Still, if it's only me it's hardly high impact! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Which is safer, driving or cycling? Cycling - it's harder to kill people with a bike... My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony at lsl.co.uk Tue Jan 16 11:13:31 2001 From: tony at lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 16 Jan 2001 10:13:31 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message-ID: <003a01c07fa4$fa0883c0$f05aa8c0@lslp7o.int.lsl.co.uk> I mentioned a "spurious" > The system cannot find the path specified. on NT, and Ka-Ping Yee said: > Thanks for the NT testing. That's funny -- i put in a special case > for Windows to avoid messages like the above a couple of days ago. 
> How recently did you download pydoc.py? Does your copy contain: > > if hasattr(sys, 'winver'): > return lambda text: tempfilepager(text, 'more') Hmm. I downloaded it when I read the email message announcing it, which was yesterday some time. But it doesn't look like the lines you mention are there - I'll try re-downloading... ...I've redownloaded the files from http://www.lfw.org/python/pydoc.py, etc., and done a grep for hasattr within them. There's no check such as the one you mention, so I guess it's "download impedance". > So you can see what i'm up to, here's my current to-do list: > > make boldness optional (only if using more/less? only Unix?) probably sensible. By the way, I don't get boldness on the NT box - any chance (he says, not intending to help *at all* in doing it!) of it happening there as well? (or would that depend on what curses support is built into the Python?) > document a .py file given on the command line also allow for a directory module (i.e., something with __init__.py in it) given on the command line? > write a better htmlrepr (\n should look special, max > length limit, etc.) yes, but these things can always get better - the fact it's working allows for improoooovement down the line. > generate HTML index from precis and __path__ and package a neat idea - definitely Good Stuff! > contents list well, I always do these, so I'm for this one as well > have help(...) produce a directory of available things to > ask for help on bouncy fun! > Windows and Mac testing I'm running Windows 98 with Python 1.5.2 at home, and will willingly try it out on that (after all, it's not a very big download) - although it might sometimes take a day or two to get round to it (for instance, I haven't yet done so!). But I suspect I shan't be a very demanding user... > default to HTTP mode on GUI platforms? (win, mac) > > The ones marked with + i consider done. 
Feel free to comment on > or suggest priorities for the others; in particular, what do you > think of the last one? The idea is that double-clicking on > pydoc.py in Windows or MacOS could launch the server and then open > the localhost URL using webbrowser.py to display the documentation > index. Should it do this by default? I'll leave that to better designers than myself (although if one is to *have* a double click action, that seems sensible to me). (looks up webbrowser.py - ah, a 2.0 module). Personally, I'd also like to have the option of having a "mini-browser" supported directly, perhaps in Tkinter, so I don't need to start up a whole web browser. But again I may be odd in that wish (I can't remember what IDLE does). Oh - that also means "integrate into IDLE" presumably goes on at least a WishList as well... Other ideas: * command line switch to *output* HTML to a file (i.e., documentation generation) (presumably something like "-o .html", where the "html" indicates the output format - an alternative being "txt" * if I ever finish the docutils effort (I should be getting back to it soon) then use that to format the texts (this would mean I need not worry about the "frontend" to docutils too much, since pydoc is already doing so much). Or maybe the docutils tool should be importing pydoc... Tibs (must do some (paid) work now!) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From mal at lemburg.com Tue Jan 16 11:18:44 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 16 Jan 2001 11:18:44 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <3A642004.F6197E86@lemburg.com> Thomas Wouters wrote: > > Maybe this is a dead and buried subject, but I'm going to try anyway, since > everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood > lately :) > > Why do we need the following atrocity : > > timestr = time.strftime("", time.localtime(time.time())) > > To do the simple task of 'date +' ? I never really understood why > there isn't a way to get a timetuple directly from C, rather than converting > a float that we got from C a bytecode before, even though the higher level > almost always deals with timetuples. How about making the float-to-tuple > functions (time.localtime, time.gmtime) accept 0 arguments as well, and > defaulting to time.time() in that case ? Even better, how about doing the > same for the other functions, too ? (where it makes sense, of course :) > > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. > > I'm 0/+1/+1 myself :) +1 all the way -- though these days I tend not to use the time module anymore. mxDateTime already does everything I want and there date/time values are objects rather than Python integers or tuples... 
ok, I'm just showing off a little :)

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Tue Jan 16 11:32:21 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 16 Jan 2001 11:32:21 +0100
Subject: [Python-Dev] Strings: '\012' -> '\n'
References: <200101160413.XAA01404@cj20424-a.reston1.va.home.com>
Message-ID: <3A642335.82358B02@lemburg.com>

Minor nit about this idea: it makes decoding repr() style strings harder for external tools and it could cause breakage (e.g. if "\n" is used by the encoding for some other purpose).

BTW, since there are a gazillion ways to encode strings into 7-bit ASCII, why not use the new codec design to add additional output schemes for 8-bit strings ?!

Strings have an .encode() method as well...

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From ping at lfw.org Tue Jan 16 11:37:42 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 16 Jan 2001 02:37:42 -0800 (PST)
Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python)
In-Reply-To: <003a01c07fa4$fa0883c0$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID:

Before somebody decides to shoot us for spamming both lists, i'm taking this thread off of python-dev and solely to doc-sig. Please continue further discussion there...
-- ?!ng

From ping at lfw.org Tue Jan 16 11:47:02 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 16 Jan 2001 02:47:02 -0800 (PST)
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To:
Message-ID:

On Mon, 15 Jan 2001, Ka-Ping Yee wrote:
> On Mon, 15 Jan 2001, Guido van Rossum wrote:
> > Originally, using \x for these was impractical (at least) because of
> > the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics
> > of the \x escape. Now we've fixed this, I agree.
>
> Oh, now i understand. Good point. I'll update the patch to do hex.

I assume you would like Unicode strings to do the same (\n, \t, \r, and \xff rather than \377).

Guido, do you have a Pronouncement on \v, \f, \b, \a?

By the way, why do Unicode escapes appear in capitals?

    >>> u'\uface'
    u'\uFACE'

(If someone tells me that there happens to be a picture of a face at that code point, i'll laugh. Is there a cow at \uBEEF?)

Does anyone care that \x will be followed by lowercase and \u by uppercase?

I noticed that the tutorial claims Unicode strings can be str()-ified and will encode themselves using UTF-8 as default. But this doesn't actually work for me:

    >>> us = u'\uface'
    >>> us
    u'\uFACE'
    >>> str(us)
    Traceback (most recent call last):
      File "", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    >>> us.encode()
    Traceback (most recent call last):
      File "", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    >>> us.encode('UTF-8')
    '\xef\xab\x8e'

Assuming i have understood this correctly, i have submitted a patch to correct tut.tex.

-- ?!ng

From bckfnn at worldonline.dk Tue Jan 16 11:52:10 2001
From: bckfnn at worldonline.dk (Finn Bock)
Date: Tue, 16 Jan 2001 10:52:10 GMT
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To:
References:
Message-ID: <3a642768.6426631@smtp.worldonline.dk>

[Ping]
>I don't know whether this is going to be obvious or controversial,
>but here goes.
Most of the time we're used to seeing a newline as >'\n', not as '\012', and newlines are typed in as '\n'. > >A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > >and ask "what's \012?" -- whereupon one has to explain that it's an >octal escape, that 012 in octal equals 10, and that chr(10) is >newline, which is the same as '\n'. You're bound to run into this, >and you'll see \012 a lot, because \n is such a common character. >Aside from being slightly more frightening, '\012' also takes up >twice as many characters as necessary. > >So... i'm submitting a patch that causes the three most common >special whitespace characters, '\n', '\r', and '\t', to appear in >their natural form rather than as octal escapes when strings are >printed and repr()ed. I like it, because it removes yet another difference between Python and Jython. Jython happens to handle these chars specially: \n, \t, \b, \f and \r. regards, finn From esr at thyrsus.com Tue Jan 16 11:53:00 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 16 Jan 2001 05:53:00 -0500 Subject: [Python-Dev] time functions In-Reply-To: <3A642004.F6197E86@lemburg.com>; from mal@lemburg.com on Tue, Jan 16, 2001 at 11:18:44AM +0100 References: <20010116004930.L1005@xs4all.nl> <3A642004.F6197E86@lemburg.com> Message-ID: <20010116055300.C12847@thyrsus.com> M.-A. Lemburg : > +1 all the way -- though these days I tend not to use the > time module anymore. mxDateTime already does everything I want > and there date/time values are objects rather than Python integers > or tuples... ok, I'm just showing opff a little :) mxDateTime is on my short list of "why isn't this in the Python library already?" Has it ever been discussed? -- Eric S. Raymond You need only reflect that one of the best ways to get yourself a reputation as a dangerous citizen these days is to go about repeating the very phrases which our founding fathers used in the great struggle for independence. 
-- Attributed to Charles Austin Beard (1874-1948) From mal at lemburg.com Tue Jan 16 12:18:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 16 Jan 2001 12:18:24 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> <3A642004.F6197E86@lemburg.com> <20010116055300.C12847@thyrsus.com> Message-ID: <3A642E00.BD330647@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > +1 all the way -- though these days I tend not to use the > > time module anymore. mxDateTime already does everything I want > > and there date/time values are objects rather than Python integers > > or tuples... ok, I'm just showing opff a little :) > > mxDateTime is on my short list of "why isn't this in the Python library > already?" Has it ever been discussed? Yes. I'd rather keep it separate from the standard dist for various reasons. One of these reasons is that I will be moving the mx tools into a new packaging scheme built on distutils -- installing it should then boil down to a simple RPM install or maybe a "python setup.py install" thanks to distutils. The package will then become a subpackage of the mx package. BTW, I see distutils as strong argument for *not* including more exotic packages in Python's stdlib. If this catches on, I expect that together with the Vaults we are not far away from having our own CPAN style archive of add-on packages. I also expect the commercial vendors like ActiveState et al. to take care of wrapping SUMO distributions of Python and the existing add-ons. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Tue Jan 16 12:20:18 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Tue, 16 Jan 2001 06:20:18 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <3a642768.6426631@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Tue, Jan 16, 2001 at 10:52:10AM +0000 References: <3a642768.6426631@smtp.worldonline.dk> Message-ID: <20010116062018.A12935@thyrsus.com> Finn Bock : > I like it, because it removes yet another difference between Python and > Jython. Jython happens to handle these chars specially: \n, \t, \b, \f > and \r. This is an argument for adding \b and \f to the special set in CPython. If the BDFL looks benignly on adding \v and \a, those should go into Jython's special set too. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? -- Thomas Jefferson, in his 1801 inaugural address From fredrik at pythonware.com Tue Jan 16 12:37:10 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 16 Jan 2001 12:37:10 +0100 Subject: [Python-Dev] Strings: '\012' -> '\n' References: Message-ID: <03eb01c07fb0$aaaa19e0$0900a8c0@SPIFF> ping wrote: > By the way, why do Unicode escapes appear in capitals? > > >>> u'\uface' > u'\uFACE' > > (If someone tells me that there happens to be a picture of a face at > that code point, i'll laugh. Is there a cow at \uBEEF?) iirc, 0xFACE and 0xBEEF are part of the CJK and Hangul spaces. not sure 0xFACE is assigned, but 0xBEEF glyph looks like a ribcage with four legs... you'll find faces at 0x263A etc. From skip at mojam.com Tue Jan 16 14:09:51 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 16 Jan 2001 07:09:51 -0600 (CST) Subject: [Python-Dev] bummer - regsub/regex no longer in module index Message-ID: <14948.18463.971334.401426@beluga.mojam.com> I am now getting deprecation warnings about regsub so I decided to start replacing it with more zeal than I had previously. First thing I wanted to replace were some regsub.split calls. 
I went to the module index to look up the description but regsub was nowhere to be found. (I know, I know. I can use pydoc.) Still... how about continuing to include deprecated modules in the library reference manual but in a separate Deprecated Modules section and annotate them as such in the module index? Skip From guido at python.org Tue Jan 16 14:44:01 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 08:44:01 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 08:11:38 +0100." <00b201c07f8b$93996820$e46940d5@hagrid> References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> Message-ID: <200101161344.IAA04513@cj20424-a.reston1.va.home.com> > thomas wrote: > > - Making the time in time.strftime default to 'now', so that the above > > becomes the ever so slightly confusing: > > > > timestr = time.strftime("") > > (confusing because it looks a bit like a regexp constructor...) > > where "now" is local time, I assume? > > since you're assuming a time zone, you could make it accept > an integer as well... What would the integer mean? > > - Making the time in time.asctime and time.ctime optional, defaulting to > > 'now', so you can just call 'time.ctime()' without having to pass > > time.time() (which are about half the calls in my own code :) > > same here. Same what here? "now" == local time, sure. But accept an integer? It already accepts an integer! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 14:55:01 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 08:55:01 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 09:22:11 +0100." 
<20010116092211.O1005@xs4all.nl> References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> <20010116092211.O1005@xs4all.nl> Message-ID: <200101161355.IAA04802@cj20424-a.reston1.va.home.com> Let's not redesign the time module API too much. I'm all for adding the default argument values that Thomas proposes. Then, instead of changing the API, we should look into a higher-level Python module. That's how those things typically go. Digital Creations has its own time extension type somewhere in Zope, a bit similar to mxDateTime. I looked into making this a standard Python extension but quickly gave up. The problems with these things seems to be that it's hard to come up with a design that makes everyone happy: some people want small objects (because they have a lot of them around, e.g. a timestamp on almost every other object); others want timezone support; yet others want microsecond resolution; leap-second support; pre-Christian era support; support for nonstandard calendars; interval arithmetic; support for dates without times or times without dates... Python could use a better time type, but we'll have to look into which requirements make sense for a generalized type, and which don't. I fear that a committee could easily pee away years designing an interface to satisfy absolutely every wish. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 15:02:29 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:02:29 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 21:33:42 PST." References: Message-ID: <200101161402.JAA05045@cj20424-a.reston1.va.home.com> > Yes, i referred to "when strings are printed and repr()ed" as two cases > because both string_print() and string_repr() have to be changed. > > (Side question: when are *_print() and *_repr() ever different, and why?) 
You mean the tp_print and tp_str function slots in type objects, right? tp_print *should* always render exactly the same as tp_str. tp_print is used by the print statement, not by value display at the interactive prompt. tp_print and tp_str have differed historically for 3rd party extension types by accident. So, string_print most definitely should *not* be changed -- only string_repr! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 15:06:23 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:06:23 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Tue, 16 Jan 2001 02:47:02 PST." References: Message-ID: <200101161406.JAA05153@cj20424-a.reston1.va.home.com> > I assume you would like Unicode strings to do the same (\n, \t, \r, > and \xff rather than \377). Yeah. > Guido, do you have a Pronouncement on \v, \f, \b, \a? Practicality beats purity: these will remain octal. > By the way, why do Unicode escapes appear in capitals? > > >>> u'\uface' > u'\uFACE' Could it be just that that's what Unicode folks are expecting? > (If someone tells me that there happens to be a picture of a face at > that code point, i'll laugh. Is there a cow at \uBEEF?) I'm laughing even though I don't see pictures. :-) > Does anyone care that \x will be followed by lowercase and \u by uppercase? It's mildly weird, and I think hex escapes in lowercase are more Pythonic than in upper case. > I noticed that the tutorial claims Unicode strings can be str()-ified > and will encode themselves using UTF-8 as default. But this doesn't > actually work for me: > > >>> us = u'\uface' > >>> us > u'\uFACE' > >>> str(us) > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > >>> us.encode() > Traceback (most recent call last): > File "", line 1, in ? 
> UnicodeError: ASCII encoding error: ordinal not in range(128) > >>> us.encode('UTF-8') > '\xef\xab\x8e' > > Assuming i have understood this correctly, i have submitted a patch > to correct tut.tex. Yeah, I guess that part of the tutorial was written before we changed our minds about this. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 15:09:56 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:09:56 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Tue, 16 Jan 2001 11:32:21 +0100." <3A642335.82358B02@lemburg.com> References: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> <3A642335.82358B02@lemburg.com> Message-ID: <200101161409.JAA05268@cj20424-a.reston1.va.home.com> > Minor nit about this idea: it makes decoding repr() style > strings harder for external tools and it could cause breakage > (e.g. if "\n" is usedby the encoding for some other purpose). Such a tool would be broken. If it accepts string literals it should accept all forms of escapes. > BTW, since there are a gazillion ways to encode strings into > 7-bit ASCII, why not use the new codec design to add additional > output schemes for 8-bit strings ?! > > Strings have an .encode() method as well... Good idea! This could also be used to "hexify" a string, for which currently one of the quickest ways is still the hack "%02x"*len(s) % tuple(s) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 15:11:53 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:11:53 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Tue, 16 Jan 2001 06:20:18 EST." 
<20010116062018.A12935@thyrsus.com> References: <3a642768.6426631@smtp.worldonline.dk> <20010116062018.A12935@thyrsus.com> Message-ID: <200101161411.JAA05336@cj20424-a.reston1.va.home.com> > Finn Bock : > > I like it, because it removes yet another difference between Python and > > Jython. Jython happens to handle these chars specially: \n, \t, \b, \f > > and \r. [ESR] > This is an argument for adding \b and \f to the special set in > CPython. If the BDFL looks benignly on adding \v and \a, those > should go into Jython's special set too. No, I think Jython should remove \b and \f. Or the language standard could allow implementations some freedom here (as long as the output is a string literal). --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Jan 16 16:06:34 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 16 Jan 2001 10:06:34 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: References: <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: <14948.25466.698063.240902@cj42289-a.reston1.va.home.com> Tim Peters writes: > Presumably so that *something* gets into 2.1a1. At least you, Jeremy and > Fredrik have tried them, and if that's all there can't be a tie . I > would agree this is not an ideal decision procedure. I've been using PyUNIT some, but haven't tried the Quixote unittest module, which tells me I can't make a particularly informed recommendation (vote, whatever). -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From thomas at xs4all.net Tue Jan 16 16:23:52 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 16:23:52 +0100 Subject: [Python-Dev] time functions In-Reply-To: <200101161355.IAA04802@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 08:55:01AM -0500 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> <20010116092211.O1005@xs4all.nl> <200101161355.IAA04802@cj20424-a.reston1.va.home.com> Message-ID: <20010116162350.A21010@xs4all.nl> On Tue, Jan 16, 2001 at 08:55:01AM -0500, Guido van Rossum wrote: > Let's not redesign the time module API too much. [snip] Agreed. > I fear that a committee could easily pee away years designing an > interface to satisfy absolutely every wish. A committee is a life form with six or more legs and no brain. Lazarus Long in "Time Enough For Love", by R. A. Heinlein. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at mojam.com Tue Jan 16 18:23:56 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 16 Jan 2001 11:23:56 -0600 (CST) Subject: [Python-Dev] Re: [Patches] [Patch #102891] Alternative readline module In-Reply-To: References: Message-ID: <14948.33708.332464.107009@beluga.mojam.com> Michael> ... (or I'll just call it pyttyinput) Which, like "Guido", when properly pronounced should leave your monitor slightly moist... 
;-) Skip From thomas at xs4all.net Tue Jan 16 18:36:03 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 18:36:03 +0100 Subject: [Python-Dev] Re: [Patches] [Patch #102891] Alternative readline module In-Reply-To: <14948.33708.332464.107009@beluga.mojam.com>; from skip@mojam.com on Tue, Jan 16, 2001 at 11:23:56AM -0600 References: <14948.33708.332464.107009@beluga.mojam.com> Message-ID: <20010116183603.B2776@xs4all.nl> On Tue, Jan 16, 2001 at 11:23:56AM -0600, Skip Montanaro wrote: > Which, like "Guido", when properly pronounced should leave your monitor > slightly moist... ;-) Nono, 'Guido' should be pronounced using a hard, back-of-your-throat 'G', more like a growl than a hiss. The less moisture the better :) You-were-thinking-of-Centraal-Wiskunde-Instituut-(cwi.nl)-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From trentm at ActiveState.com Tue Jan 16 19:36:29 2001 From: trentm at ActiveState.com (Trent Mick) Date: Tue, 16 Jan 2001 10:36:29 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <200101160408.XAA01368@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 11:08:46PM -0500 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> Message-ID: <20010116103626.D30209@ActiveState.com> On Mon, Jan 15, 2001 at 11:08:46PM -0500, Guido van Rossum wrote: > > Trent, you wrote that code. Why wouldn't this work just as well? > > (your code): > if ((pos = TELL64(fileno(fp))) == -1L) > return -1; > (my suggestion): > if (fgetpos(fp, &pos) != 0) > return -1; I agree, that looks to me like it would. I guess I just missed that when I wrote it. 
> > I would even go as far as to collapse the entire switch as follows: > > fpos_t pos; > switch (whence) { > case SEEK_END: > /* do a "no-op" seek first to sync the buffering so that > the low-level tell() can be used correctly */ > if (fseek(fp, 0, SEEK_END) != 0) > return -1; > /* fall through */ > case SEEK_CUR: > if (fgetpos(fp, &pos) != 0) > return -1; > offset += pos; > break; > /* case SEEK_SET: break; */ > } > return fsetpos(fp, &offset); Sure. Just get rid of the """do a "no-op" seek...""" comment because it is no longer applicable. I am not setup to test this on Win64 right and I don't suppose there are a lot of you out there with your own Win64 setups. I will be able to test this before the scheduled 2.1 beta (late Feb), though. Trent -- Trent Mick TrentM at ActiveState.com From trentm at ActiveState.com Tue Jan 16 20:34:17 2001 From: trentm at ActiveState.com (Trent Mick) Date: Tue, 16 Jan 2001 11:34:17 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <20010116103626.D30209@ActiveState.com>; from trentm@ActiveState.com on Tue, Jan 16, 2001 at 10:36:29AM -0800 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> <20010116103626.D30209@ActiveState.com> Message-ID: <20010116113417.I30209@ActiveState.com> On Tue, Jan 16, 2001 at 10:36:29AM -0800, Trent Mick wrote: > Sure. Just get rid of the """do a "no-op" seek...""" comment because it is no > longer applicable. 
I am not setup to test this on Win64 right and I don't s/right/right now/ Trent -- Trent Mick TrentM at ActiveState.com From cgw at fnal.gov Tue Jan 16 21:19:09 2001 From: cgw at fnal.gov (Charles G Waldman) Date: Tue, 16 Jan 2001 14:19:09 -0600 (CST) Subject: [Python-Dev] Re: [Patch #103248] Fix a memory leak in _sre.c Message-ID: <14948.44221.876681.838046@buffalo.fnal.gov> Frederik - I noticed that you chose to check in a slightly different patch than the one I submitted. I wonder why you chose to do this? In particular at line 1238 I had: if (PyErr_Occurred()) { Py_DECREF(self); return NULL; } and you changed this to if (PyErr_Occurred()) { PyObject_DEL(self); return NULL; } Can you explain why you made this (seemingly arbitrary) change? I think that since "self" was created via: self = PyObject_NEW_VAR(PatternObject, &Pattern_Type, n); which calls PyObjectINIT, which in turn calls _Py_NewReference, which increments _Py_RefTotal, it is incorrect to simply do a PyObject_DEL to de-allocate it -- won't this screw up the value of _Py_RefTotal? Admittedly this is a minor nit and only matters if Py_TRACE_REFS is defined - I just wanted to check to make sure my understanding of reference counting w.r.t. memory allocation and deallocation is correct - if the above is in error, I'd apprecate any corrections... From guido at python.org Tue Jan 16 21:53:41 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 15:53:41 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Tue, 16 Jan 2001 10:36:29 PST." 
<20010116103626.D30209@ActiveState.com> References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> <20010116103626.D30209@ActiveState.com> Message-ID: <200101162053.PAA13099@cj20424-a.reston1.va.home.com> > I agree, that looks to me like it would. I guess I just missed that when I > wrote it. Excellent! I've checked this in now -- we'll hear if it breaks anywhere soon enough. >I am not setup to test this on Win64 right [now] and I don't > suppose there are a lot of you out there with your own Win64 setups. What happened to ActiveState's Itanium boxes? --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Tue Jan 16 22:53:22 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 16 Jan 2001 16:53:22 -0500 Subject: [Python-Dev] Re: Detecting install time In-Reply-To: <200101160347.WAA01132@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 10:47:32PM -0500 References: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> <200101160347.WAA01132@cj20424-a.reston1.va.home.com> Message-ID: <20010116165322.B29674@kronos.cnri.reston.va.us> [CC'ing to the distutils-sig] On Mon, Jan 15, 2001 at 10:47:32PM -0500, Guido van Rossum wrote: >> For PEP 229, the setup.py script needs to figure out if it's running >> from the build directory, because then distutils.sysconfig needs to > >You could check for the presence of config.status -- that file is not >installed. This isn't a check suitable for inclusion in distutils.sysconfig, though, because it's so liable to being fooled (consider a Distutils-packaged module that comes with a configure script to build some library). 
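A minimal sketch of the config.status check suggested in the quoted text, with the caveat just raised that it is easily fooled. The function name and its directory argument are hypothetical; nothing like this is claimed to exist in distutils.sysconfig.

```python
import os


def looks_like_build_dir(directory):
    """Guess whether `directory` is a source/build tree.

    Hypothetical helper: config.status is written by ./configure and is
    not installed, so its presence hints at a build tree. Any package
    that ships its own configure script would match as well.
    """
    return os.path.isfile(os.path.join(directory, "config.status"))
```

A setup script could call this on the directory that holds the interpreter it is running under, but the result is only a hint, for the reason given above.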
Right now I'm using a hacked version of sysconfig with several patches like this: @@ -120,12 +121,16 @@ def get_config_h_filename(): """Return full pathname of installed config.h file.""" inc_dir = get_python_inc(plat_specific=1) + # XXX + if 1: inc_dir = '.' return os.path.join(inc_dir, "config.h") One hackish approach would be to add a assume_build_directories() to distutils.sysconfig, a little back door to be used by the setup.py script that comes with Python, so the above would become 'if build_time_flag: ...'. Anyone have a cleaner idea? --amk From akuchlin at mems-exchange.org Wed Jan 17 02:46:47 2001 From: akuchlin at mems-exchange.org (A.M. Kuchling) Date: Tue, 16 Jan 2001 20:46:47 -0500 Subject: [Python-Dev] PEP 229 issues Message-ID: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> I'm in a quandry about the patch implementing PEP 229. The patch is quite close to being ready, with only a few minor issues remaining, but to fix those issues, I need to make some changes to the Distutils, such as the sysconfig modification I recently suggested. Problem: I believe the patch *must* go in at the alpha stage, because there are bound to be lots of platform-specific problems that will show up; it should not be added in the beta stage, because it'll need time to get tested and debugged, and I wouldn't be surprised if it has to be reverted later because of some insurmountable problem. Problem: Greg Ward, the Distutils maintainer, is away at the moment. I can check in changes to the Distutils without his say-so, but when Greg gets back he might shriek in horror and rip all of the changes out again. (Or he's stuck with maintaining them until 2.2.) Problem: 2.1alpha1 is due on Friday. So, what to do? If I know there's going to be an alpha2, that's probably fine; Greg should have resurfaced by then, and the patch can go in for alpha2. 
Or, I can check in the changes before Friday, and if they're unacceptable, they can be fixed for alpha2/beta1, or simply backed out. Or, I can leave Distutils alone and make setup.py a tissue of hacks and workarounds. For example, it might insert new versions of various functions into the distutils.sysconf module. Icky and fragile, but cleaning it up for beta1 would then be a priority. Suggestions? Pronouncements? --amk From guido at python.org Wed Jan 17 02:39:35 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 20:39:35 -0500 Subject: [Python-Dev] PEP 229 issues In-Reply-To: Your message of "Tue, 16 Jan 2001 20:46:47 EST." <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> References: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> Message-ID: <200101170139.UAA17954@cj20424-a.reston1.va.home.com> I expect that there will be an alpha2, but I still recommend that you check in *something* that works for alpha1, to get maximal testing coverage. Alpha1 may slip a day or so (Jeremy and I are both late with our big patches, respectively nested scopes and rich comparisons, that we really want to have in alpha1). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jan 17 03:04:53 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 16 Jan 2001 21:04:53 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101161409.JAA05268@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Good idea [using string.encode()]! This could also be used to > "hexify" a string, for which currently one of the quickest ways > is still the hack > > "%02x"*len(s) % tuple(s) Note that as of 2.0, a far quicker way is to use binascii.b2a_hex(), or its absurdist (read "Barry" ) synonym binascii.hexlify(). 
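The two approaches mentioned above can be sketched side by side. This is an illustration only, assuming a Python 3 interpreter where binary data is a bytes object and iterating it yields integers; the helper names are made up for the example.

```python
import binascii


def hexify_with_format(data):
    # The "%02x" hack: build one format field per byte, then fill them
    # all in at once from a tuple of the byte values.
    return "%02x" * len(data) % tuple(data)


def hexify_with_binascii(data):
    # binascii.hexlify() returns bytes; decode to get a text string.
    return binascii.hexlify(data).decode("ascii")


sample = b"\x00\n\xff"
```

Both helpers produce the same hex digits for the same input; the binascii call does the work in C rather than building a long format string first.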
I'm wary of using string.encode() for this, because one normally hexlifies binary data (e.g., like sha checksums), and 4 days of 7 we're more than not in favor of moving away from strings to carry binary data. Of course we can change our minds about this across releases, and have even-numbered releases deprecate the function forms while odd-numbered ones abjure methods. Works for me . From nas at arctrix.com Tue Jan 16 22:08:23 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 16 Jan 2001 13:08:23 -0800 Subject: [Python-Dev] [droux@tuks.co.za: Our application doesn't work with Debian packaged Python] Message-ID: <20010116130823.C9640@glacier.fnational.com> This message was on the debian-python list. Does anyone know why the patch is needed? Neil ----- Forwarded message from Danie Roux ----- Date: Tue, 16 Jan 2001 11:44:48 +0200 From: Danie Roux Subject: Our application doesn't work with Debian packaged Python To: Debian Python Good they all, Our program is an archiver for gnome that uses gnome-python with one widget written in C. I converted our program to autoconf and automake so anyone can (and please do!) compile it and see what I mean. Everything compiles fine. But when it runs it just throws a weird exception. The funny thing is, if I alien RedHat 6.2's python package, and install that, it works! I need to change nothing else. Only the python package. I then went and look at the source rpm. They have this patch in there: --- Python-1.5.2/Python/importdl.c.global Sat Jul 17 16:52:26 1999 +++ Python-1.5.2/Python/importdl.c Sat Jul 17 16:53:19 1999 @@ -441,13 +441,13 @@ #ifdef RTLD_NOW /* RTLD_NOW: resolve externals now (i.e. 
core dump now if some are missing) */ - void *handle = dlopen(pathname, RTLD_NOW); + void *handle = dlopen(pathname, RTLD_NOW | RTLD_GLOBAL); #else void *handle; if (Py_VerboseFlag) printf("dlopen(\"%s\", %d);\n", pathname, - RTLD_LAZY); - handle = dlopen(pathname, RTLD_LAZY); + RTLD_LAZY | RTLD_GLOBAL); + handle = dlopen(pathname, RTLD_LAZY | RTLD_GLOBAL); #endif /* RTLD_NOW */ if (handle == NULL) { PyErr_SetString(PyExc_ImportError, dlerror()); Sure enough this fixes my problem. The thing is that this means our program only works on Redhat (and who ever patched python 1.5.2 with this). So what can I do now? How can I get this patch into debian-python? How can I change my program to not need the patch? btw the program is garchiver, it will be hosted at sourceforge as soon as they get back to me, in the mean time I will mail anyone a copy of the sources. -- Danie Roux *shuffle* Adore Unix -- To UNSUBSCRIBE, email to debian-python-request at lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster at lists.debian.org ----- End forwarded message ----- From guido at python.org Wed Jan 17 05:16:48 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 23:16:48 -0500 Subject: [Python-Dev] [droux@tuks.co.za: Our application doesn't work with Debian packaged Python] In-Reply-To: Your message of "Tue, 16 Jan 2001 13:08:23 PST." <20010116130823.C9640@glacier.fnational.com> References: <20010116130823.C9640@glacier.fnational.com> Message-ID: <200101170416.XAA20515@cj20424-a.reston1.va.home.com> > This message was on the debian-python list. Does anyone know why > the patch is needed? > - handle = dlopen(pathname, RTLD_LAZY); > + handle = dlopen(pathname, RTLD_LAZY | RTLD_GLOBAL); This comes back every once in a while. It means that they have an module whose shared library implementation exports symbols that are needed by another shared library (probably another module). 
IMO this approach is evil, because RTLD_GLOBAL means that *all* external symbols defined by any module are exported to all other shared libraries, and this will cause conflicts if the same symbol is exported by two different modules -- which can happen quite easily. (I don't know what happens on conflicts -- maybe you get an error, maybe it links to the wrong symbol.) The proper solution would be to put the needed entry points beside the init entry point in a separate shared library. But that's often not how quick-and-dirty extension modules are designed... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 17 05:22:54 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 23:22:54 -0500 Subject: [Python-Dev] Rich Comparisons technical prerelease Message-ID: <200101170422.XAA20626@cj20424-a.reston1.va.home.com> I've got a working version of the rich comparisons ready for preview. The patch is here: http://www.python.org/~guido/richdiff.txt It's also referenced at sourceforge: http://sourceforge.net/patch/?func=detailpatch&patch_id=103283&group_id=5470 Here's a summary: - The comparison operators support "rich comparison overloading" (PEP 207). C extension types can provide a rich comparison function in the new tp_richcompare slot in the type object. The cmp() function and the C function PyObject_Compare() first try the new rich comparison operators before trying the old 3-way comparison. There is also a new C API PyObject_RichCompare() (which also falls back on the old 3-way comparison, but does not constrain the outcome of the rich comparison to a Boolean result). 
The rich comparison function takes two objects (at least one of which is guaranteed to have the type that provided the function) and an integer indicating the opcode, which can be Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE (for <, <=, ==, !=, >, >=), and returns a Python object, which may be NotImplemented (in which case the tp_compare slot function is used as a fallback, if defined). Classes can overload individual comparison operators by defining one or more of the methods __lt__, __le__, __eq__, __ne__, __gt__, __ge__. There are no explicit "reversed argument" versions of these; instead, __lt__ and __gt__ are each other's reverse, likewise for __le__ and __ge__; __eq__ and __ne__ are their own reverse (similar at the C level). No other implications are made; in particular, Python does not assume that == is the inverse of !=, or that < is the inverse of >=. This makes it possible to define types with partial orderings. Classes or types that want to implement (in)equality tests but not the ordering operators (i.e. unordered types) should implement == and !=, and raise an error for the ordering operators. It is possible to define types whose comparison results are not Boolean; e.g. a matrix type might want to return a matrix of bits for A < B, giving elementwise comparisons. Such types should ensure that any interpretation of their value in a Boolean context raises an exception, e.g. by defining __nonzero__ (or the tp_nonzero slot at the C level) to always raise an exception.
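The two use cases described above, partial orderings and non-Boolean results, can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the patch: it targets Python 3, where the Boolean hook is spelled __bool__ rather than __nonzero__, and the Subset and Vector classes are invented for the example.

```python
import operator


class Subset:
    """Sets ordered by inclusion, which is only a partial ordering."""

    def __init__(self, *items):
        self.items = frozenset(items)

    def _compare(self, other, op):
        # Returning NotImplemented lets Python try the other operand.
        if not isinstance(other, Subset):
            return NotImplemented
        return op(self.items, other.items)

    def __lt__(self, other): return self._compare(other, operator.lt)
    def __le__(self, other): return self._compare(other, operator.le)
    def __eq__(self, other): return self._compare(other, operator.eq)
    def __ne__(self, other): return self._compare(other, operator.ne)
    def __gt__(self, other): return self._compare(other, operator.gt)
    def __ge__(self, other): return self._compare(other, operator.ge)


class Vector:
    """Elementwise '<' whose result is another Vector, not a Boolean."""

    def __init__(self, *values):
        self.values = list(values)

    def __lt__(self, other):
        return Vector(*(x < y for x, y in zip(self.values, other.values)))

    def __bool__(self):
        # Refuse any Boolean interpretation, as the text recommends.
        raise TypeError("truth value of a Vector is ambiguous")


a, b = Subset(1, 2), Subset(2, 3)
# Neither set contains the other, so every one of these is False.
unordered = [a < b, a <= b, a > b, a >= b, a == b]
mask = Vector(1, 5, 3) < Vector(2, 4, 6)
```

For the two overlapping but unordered sets, only a != b holds, which is why Python cannot derive one operator from the negation of another.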
XXX TO DO for this feature: - the test "test_compare" fails, because of the changed semantics for complex number comparisons (1j<2j raises an error now) - tuple, dict should implement EQ/NE so containers containing complex numbers can be compared for equality (list is already done) -- or complex numbers should be reverted to old behavior - list.sort() should use rich comparison - check for memory leaks - int, long, float contain new-style-cmp functions that aren't used to their full potential any more (the new-style-cmp functions introduced by Neil's coercion work are gone again) - decide on unresolved issues from PEP 207 - documentation - more testing - compare performance to 2.0 (microbench?) Please give this a good spin -- I'm hoping to check this in and make it part of the alpha 1 release Friday... --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jan 17 05:50:25 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 16 Jan 2001 23:50:25 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' References: <200101161409.JAA05268@cj20424-a.reston1.va.home.com> Message-ID: <14949.9361.591610.684695@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Note that as of 2.0, a far quicker way is to use TP> binascii.b2a_hex(), or its absurdist (read "Barry" ) TP> synonym binascii.hexlify(). Thanks for the compliment Tim, but I can't take credit for that name. If it was me I'd have called it wudduptify() (and its inverse, notmuchlify()). I stole the name from Emacs's hexlify-buffer function which kind of does the same thing.
would-converting-to-octal-digits-be-called-octopuslify-ly y'rs, -Barry From fredrik at effbot.org Wed Jan 17 09:12:32 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 09:12:32 +0100 Subject: [Python-Dev] Re: [Patch #103248] Fix a memory leak in _sre.c References: <14948.44221.876681.838046@buffalo.fnal.gov> Message-ID: <00fe01c0805d$432d4cd0$e46940d5@hagrid> Charles G Waldman wrote: > Can you explain why you made this (seemingly arbitrary) change? > > I think that since "self" was created via: > > self = PyObject_NEW_VAR(PatternObject, &Pattern_Type, n); > > which calls PyObjectINIT, which in turn calls _Py_NewReference, which > increments _Py_RefTotal, it is incorrect to simply do a PyObject_DEL > to de-allocate it -- won't this screw up the value of _Py_RefTotal? and what do you think will happen if you call the destructor before you've initialized all pointer fields in the object? (according to the docs, the NEW/New functions return uninitialized memory. in this case, we're bailing out before the object has been fully initialized. pattern_dealloc definitely isn't prepared to deal with random pointer values...) > Admittedly this is a minor nit and only matters if Py_TRACE_REFS is > defined - I just wanted to check to make sure my understanding of > reference counting w.r.t. memory allocation and deallocation is > correct - if the above is in error, I'd apprecate any corrections... same here. I don't doubt it's working as you say it does, but I find it strange that you shouldn't be able to DEL an object you just created with NEW... maybe DEL should be fixed? 
Cheers /F From thomas at xs4all.net Wed Jan 17 10:48:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 17 Jan 2001 10:48:12 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules Setup.config.in,1.7,1.8 Setup.dist,1.7,1.8 In-Reply-To: ; from esr@users.sourceforge.net on Wed, Jan 17, 2001 at 12:25:13AM -0800 References: Message-ID: <20010117104812.F2776@xs4all.nl> On Wed, Jan 17, 2001 at 12:25:13AM -0800, Eric S. Raymond wrote: > + # ndbm(3) may require -lndbm or similar > + @USE_NDBM_MODULE at ndbm ndbmmodule.c @HAVE_LIBNDBM@ This is an interesting module... It's not in the Modules/ directory :-) Did you mean 'dbmmodule.c' with a different library argument ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at mojam.com Wed Jan 17 16:17:39 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 17 Jan 2001 09:17:39 -0600 (CST) Subject: [Python-Dev] Rich comparison confusion Message-ID: <14949.46995.259157.871323@beluga.mojam.com> I'm a bit confused about Guido's rich comparison stuff. In the description he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. From akuchlin at mems-exchange.org Wed Jan 17 16:42:13 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 10:42:13 -0500 Subject: [Python-Dev] PEP 229 issues In-Reply-To: <200101170139.UAA17954@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 08:39:35PM -0500 References: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> <200101170139.UAA17954@cj20424-a.reston1.va.home.com> Message-ID: <20010117104213.B490@kronos.cnri.reston.va.us> On Tue, Jan 16, 2001 at 08:39:35PM -0500, Guido van Rossum wrote: >I expect that there will be an alpha2, but I still recommend that you >check in *something* that works for alpha1, to get maximal testing >coverage. 
Alpha1 may slip a day or so (Jeremy and I are both late >with our big patches, respectively nested scopes and rich comparisons, >that we really want to have in alpha1). OK; thanks for the pronouncement! I've checked in all the smaller changes that shouldn't break anything. All that's left now is to actually enable the new feature, which requires the nasty changes: * In the top-level Makefile.in, the "sharedmods" target simply runs "./python setup.py build", and "sharedinstall" runs "./python setup.py install". The "clobber" target also deletes the build/ subdirectory where Distutils puts its output. * Rip stuff out of the Setup files. Modules/Setup.config.in only contains entries for the gc and thread modules; the readline, curses, and db modules are removed because it's now setup.py's job to handle them. * Modules/Setup.dist now contains entries for only 3 modules -- _sre, posix, and strop. Guido and Jeremy are rushing to finish their patches in time for the alpha release, though Guido seems to be checking in the rich comparison stuff now. I don't want to impede them by making them stop to debug build problems, so I can either wait until they've landed their changes (at which point there's nothing major left, I think), or they can simply not do a 'cvs update' after the serious changes go in. Thoughts? --amk From barry at digicool.com Wed Jan 17 16:54:06 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 17 Jan 2001 10:54:06 -0500 Subject: [Python-Dev] Breakage in latest CVS Message-ID: <14949.49182.636526.292265@anthem.wooz.org> Looks like the latest CVS (updated just minutes ago) is broken. I'm trying to fix some of these complaints, but thought I'd at least report what I've found... -Barry ... gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -I./../Include -I.. 
-DHAVE_CONFIG_H -c floatobject.c -o floatobject.o floatobject.c:675: warning: excess elements in struct initializer after `float_as_number' floatobject.c:700: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) floatobject.c:700: initializer element for `PyFloat_Type.tp_flags' is not constant ... intobject.c:800: warning: excess elements in struct initializer after `int_as_number' intobject.c:825: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) intobject.c:825: initializer element for `PyInt_Type.tp_flags' is not constant make[1]: *** [intobject.o] Error 1 ... gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -I./../Include -I.. -DHAVE_CONFIG_H -c longobject.c -o longobject.o longobject.c:1865: warning: excess elements in struct initializer after `long_as_number' longobject.c:1890: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) longobject.c:1890: initializer element for `PyLong_Type.tp_flags' is not constant make[1]: *** [longobject.o] Error 1 From guido at python.org Wed Jan 17 17:09:27 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Jan 2001 11:09:27 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Wed, 17 Jan 2001 09:17:39 CST." <14949.46995.259157.871323@beluga.mojam.com> References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: <200101171609.LAA04102@cj20424-a.reston1.va.home.com> > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. Yes. By this I mean that A<B and B>A are interchangeable, ditto for A<=B and B>=A. Also A==B interchanges for B==A, and A!=B for B!=A. > From a boolean standpoint this just can't be so. Guido mentions partial > orderings, but I'm still confused. Consider this example: Objects of type A > implement rich comparisons. Objects of type B don't. If my code looks like > > a = A() > b = B() > ... > if b < a: > ... 
> > My interpretation of the rich comparison stuff is that either > > 1. Since b doesn't implement rich comparisons, the interpreter falls > back to old fashioned comparisons which may or may not allow the > comparison of B objects and A objects. > > or > > 2. The sense of the inequality is switched (a > b) and the rich > comparison code in A's implementation is called. It's case 2. > That's my reading of it. It has to be wrong. The inverse comparison should > be a >= b, not a > b, but the described pairing of comparison functions > would imply otherwise. We're trying very hard *not* to make any connections between a<b and a>=b. You've learned in grade school that these are each other's Boolean inverse (a<b is true iff a>=b is false). However, for partial orderings this may not be true: for unordered a and b, none of a<b, a<=b, a>b, a>=b, a==b may be true. On the other hand, even for partially ordered types, a<b and b>a (note: swapped arguments *and* swapped sense of comparison) always give the same outcome! > I'm sure I'm missing something obvious or revealing some fundamental failure > of my grade school education. Please explain... I think what threw you off was the ambiguity of "inverse". This means Boolean negation. I'm not relying on Boolean negation here -- I'm relying on the more fundamental property that a<b and b>a have the same outcome. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh21 at cam.ac.uk Wed Jan 17 17:13:32 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 17 Jan 2001 16:13:32 +0000 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Skip Montanaro's message of "Wed, 17 Jan 2001 09:17:39 -0600 (CST)" References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: Skip Montanaro writes: > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. > >From a boolean standpoint this just can't be so. Guido mentions partial > orderings, but I'm still confused.
> Consider this example: Objects of type A
> implement rich comparisons. Objects of type B don't. If my code looks like
>
>     a = A()
>     b = B()
>     ...
>     if b < a:
>         ...
>
> My interpretation of the rich comparison stuff is that either
>
>   1. Since b doesn't implement rich comparisons, the interpreter falls
>      back to old fashioned comparisons which may or may not allow the
>      comparison of B objects and A objects.
>
>   or
>
>   2. The sense of the inequality is switched (a > b) and the rich
>      comparison code in A's implementation is called.
>
> That's my reading of it. It has to be wrong. The inverse comparison should
> be a >= b, not a > b, but the described pairing of comparison functions
> would imply otherwise.
>
> I'm sure I'm missing something obvious or revealing some fundamental failure
> of my grade school education. Please explain...

For a total order:

a < b if and only if b > a.

This is what the rich comparison code does.

a < b if and only if not (a >= b).

This is what the rich comparison code doesn't do.

Does this make sense?

Cheers,
M.

--
  Presumably pronging in the wrong place zogs it.
                                        -- Aldabra Stoddart, ucam.chat

From moshez at zadka.site.co.il Thu Jan 18 01:08:06 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Thu, 18 Jan 2001 02:08:06 +0200 (IST)
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To: <14949.46995.259157.871323@beluga.mojam.com>
References: <14949.46995.259157.871323@beluga.mojam.com>
Message-ID: <20010118000806.D1C04A828@darjeeling.zadka.site.co.il>

On Wed, 17 Jan 2001 09:17:39 -0600 (CST), Skip Montanaro wrote:

> I'm a bit confused about Guido's rich comparison stuff. In the description
> he states that __le__ and __ge__ are inverses as are __lt__ and __gt__.

I think that you're confused between two meanings of inverses.

You think:
op is an inverse of op' if for every a,b (a op b) = not (a op' b)

Guido meant (and I hope, implemented):
op is an inverse of op' if for every a,b (a op b) = (b op' a)

And
a<b iff b>a
a<=b iff b>=a

Sounds sane.
Unless I'm the one confused....
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From fredrik at effbot.org Wed Jan 17 17:47:29 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 17 Jan 2001 17:47:29 +0100
Subject: [Python-Dev] 2.1 alpha: what about the unicode name database?
References:
Message-ID: <012901c080a5$306023a0$e46940d5@hagrid>

tim wrote:
> > Should I check it in?
>
> Absolutely! But not like as for 2.0: check it in *now*, so we have a few
> days to deal with surprises before the alpha release.

as it turned out, the source I had didn't build, and the
table-building python script generated something that wasn't
quite compatible with the C code. bit rot.

I've almost sorted it all out. will check it in later tonight
(local time).

From tim.one at home.com Wed Jan 17 19:27:11 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 17 Jan 2001 13:27:11 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Tools/idle CallTipWindow.py,1.2,1.3 CallTips.py,1.7,1.8 ClassBrowser.py,1.11,1.12 Debugger.py,1.14,1.15 Delegator.py,1.2,1.3 FileList.py,1.7,1.8 FormatParagraph.py,1.8,1.9 IdleConf.py,1.5,1.6 IdleHistory.py,1.3,1
In-Reply-To: <200101171358.IAA27661@cj20424-a.reston1.va.home.com>
Message-ID:

[an anonymous developer panics, after Tim "reindent"s the IDLE dir]
> Oh no!
>
> I have a whole slew of changes to IDLE sitting in my work directory.
> If I do an update half of these will turn into merge conflicts. :-(
>
> Don't worry, I'll get over it.

I imagine this will pop up from time to time until everything is
normalized. If it's about to burn you, run reindent.py on the affected
directory *before* you update ("python reindent.py -v ."). That will make
all the same changes to your local versions as were checked in, modulo the
rare hand-edit (of which there were none in the IDLE directory).
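
[Editor's sketch for the "Rich comparison confusion" thread above.]
The reflection rule that Guido and Moshe describe can be illustrated
with a minimal sketch in present-day Python syntax. The class names A
and B and the `calls` list are hypothetical, chosen only to mirror
Skip's example; built-in sets are used for the partial-ordering part
because their comparison operators test subset relations. This is an
illustration of the behaviour discussed, not code from the patch.

```python
# Hypothetical classes mirroring Skip's example: A implements a rich
# comparison method, B implements none.
calls = []

class A:
    def __gt__(self, other):
        # Record that the reflected method was used, then answer.
        calls.append("A.__gt__")
        return True

class B:
    pass

a, b = A(), B()

# B has no usable __lt__, so the interpreter swaps the operands *and*
# the operator: b < a is answered by a.__gt__(b) (case 2 in Skip's mail).
result = b < a

# Sets are partially ordered by inclusion. For unordered operands every
# ordering comparison is false, so < is not the negation of >=.
x, y = {1}, {2}
unordered = [x < y, x <= y, x > y, x >= y, x == y]

# Reflection still holds: p < q and q > p always agree.
reflected_agree = (x < y) == (y > x) and ({1} < {1, 2}) == ({1, 2} > {1})
```

Note that nothing here relies on Boolean negation: the only property
used is that swapping both the arguments and the sense of the
comparison gives the same outcome.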

From akuchlin at mems-exchange.org Wed Jan 17 20:04:04 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Wed, 17 Jan 2001 14:04:04 -0500
Subject: [Python-Dev] PEP 229 checked in
Message-ID:

I've checked in the last bit of the PEP 229 changes. Be sure to
rename your Modules/Setup file (or do a 'make distclean') before
rebuilding.

Squeal if you run into trouble, or file bugs on SF.

--am"Aieee!"k

From jeremy at alum.mit.edu Wed Jan 17 20:12:47 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 17 Jan 2001 14:12:47 -0500 (EST)
Subject: [Python-Dev] unexpected consequence of function attributes
Message-ID: <14949.61103.258714.325465@localhost.localdomain>

I have found one place in the library that depended on
hasattr(func, '__dict__') to return false -- dis.dis. You might want
to check and see if there is any other code that doesn't expect
functions to have extra attributes. I expect that only introspective
code would be affected.

Jeremy

From barry at wooz.org Wed Jan 17 20:46:36 2001
From: barry at wooz.org (Barry A. Warsaw)
Date: Wed, 17 Jan 2001 14:46:36 -0500
Subject: [Python-Dev] Re: unexpected consequence of function attributes
References: <14949.61103.258714.325465@localhost.localdomain>
Message-ID: <14949.63132.583025.303677@anthem.wooz.org>

>>>>> "JH" == Jeremy Hylton writes:

    JH> I have found one place in the library that depended on
    JH> hasattr(func, '__dict__') to return false -- dis.dis. You
    JH> might want to check and see if there is any other code
    JH> that doesn't expect functions to have extra attributes. I
    JH> expect that only introspective code would be affected.

I guess we need a test_dis.py in the regression test suite, eh? :)

Here's an extremely quick and dirty fix to dis.py.

-Barry

-------------------- snip snip --------------------
Index: dis.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/dis.py,v
retrieving revision 1.28
diff -u -r1.28 dis.py
--- dis.py	2001/01/14 23:36:05	1.28
+++ dis.py	2001/01/17 19:45:40
@@ -15,6 +15,10 @@
         return
     if type(x) is types.InstanceType:
         x = x.__class__
+    if hasattr(x, 'func_code'):
+        x = x.func_code
+    if hasattr(x, 'im_func'):
+        x = x.im_func
     if hasattr(x, '__dict__'):
         items = x.__dict__.items()
         items.sort()
@@ -28,17 +32,12 @@
                 except TypeError, msg:
                     print "Sorry:", msg
                 print
+    elif hasattr(x, 'co_code'):
+        disassemble(x)
     else:
-        if hasattr(x, 'im_func'):
-            x = x.im_func
-        if hasattr(x, 'func_code'):
-            x = x.func_code
-        if hasattr(x, 'co_code'):
-            disassemble(x)
-        else:
-            raise TypeError, \
-                  "don't know how to disassemble %s objects" % \
-                  type(x).__name__
+        raise TypeError, \
+              "don't know how to disassemble %s objects" % \
+              type(x).__name__
 
 def distb(tb=None):
     """Disassemble a traceback (default: last traceback)."""

From barry at wooz.org Wed Jan 17 20:49:51 2001
From: barry at wooz.org (Barry A. Warsaw)
Date: Wed, 17 Jan 2001 14:49:51 -0500
Subject: [Python-Dev] Re: unexpected consequence of function attributes
References: <14949.61103.258714.325465@localhost.localdomain>
Message-ID: <14949.63327.22745.359978@anthem.wooz.org>

>>>>> "JH" == Jeremy Hylton writes:

    JH> I have found one place in the library that depended on
    JH> hasattr(func, '__dict__') to return false -- dis.dis. You
    JH> might want to check and see if there is any other code
    JH> that doesn't expect functions to have extra attributes. I
    JH> expect that only introspective code would be affected.

Patch #103303
http://sourceforge.net/patch/?func=detailpatch&patch_id=103303&group_id=5470

From tim.one at home.com Wed Jan 17 21:51:57 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 17 Jan 2001 15:51:57 -0500
Subject: [Python-Dev] Windows Python totally hosed
Message-ID:

Failures range from

test test_winsound skipped -- Module use of python20.dll conflicts with this version of Python.

to

test test_tokenize crashed -- exceptions.AttributeError: 're' module has no attribute 'compile'

I suspect the latter is really a disguised version of

C:\Code\python\dist\src\PCbuild>python
Python 2.1a1 (#8, Jan 17 2001, 13:15:23) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import re
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\code\python\dist\src\lib\re.py", line 28, in ?
    from sre import *
  File "c:\code\python\dist\src\lib\sre.py", line 17, in ?
    import sre_compile
  File "c:\code\python\dist\src\lib\sre_compile.py", line 11, in ?
    import _sre
ImportError: Module use of python20.dll conflicts with this version of Python.
>>>

Suspect all of this has to do with patchlevel.h changing. I'll try to dope
it out, but if anyone knows the cure off the top of their head, don't be
shy!

From akuchlin at mems-exchange.org Wed Jan 17 22:00:56 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Wed, 17 Jan 2001 16:00:56 -0500
Subject: [Python-Dev] Re: 'Setup' buglet
In-Reply-To: <200101171928.OAA21460@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 17, 2001 at 02:28:36PM -0500
References: <200101171928.OAA21460@cj20424-a.reston1.va.home.com>
Message-ID: <20010117160056.A20603@kronos.cnri.reston.va.us>

[Taking this bug public]

On Wed, Jan 17, 2001 at 02:28:36PM -0500, Guido van Rossum wrote:
>One problem seems to be that the creation
>of the (minimal) Modules/Setup file doesn't seem to be doing the right
>thing.
When I delete Modules/Setup, the next "make" doesn't create >it; it used to be copied from Setup.dist if it doesn't exist. This seems to have been removed from Modules/Makefile.pre.in in revision 1.69 by Fred; instead the configure script now copies Setup.dist to Setup, so you have to rerun configure in order to create Modules/Setup after deleting it. --amk From mal at lemburg.com Wed Jan 17 22:04:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 17 Jan 2001 22:04:29 +0100 Subject: [Python-Dev] Usage of "assert" in regression tests Message-ID: <3A6608DD.E12A2422@lemburg.com> I've just checked in a patch which removes all uses of the assert statement in the regression tests. This makes the tests compatible with the -O mode of Python and also allows centralizing error reporting (many tests already provide their own little test function for this purpose). I urge you to only check in tests which use the new API verify() to verify a certain condition. The API is defined in the regression tools module test_support. Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at effbot.org Wed Jan 17 22:21:56 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 22:21:56 +0100 Subject: [Python-Dev] Windows Python totally hosed References: Message-ID: <028801c080cb$86658350$e46940d5@hagrid> tim wrote: > Suspect all of this has to do with patchlevel.h changing. I'll try to dope > it out, but if anyone knows the cure off the top of their head, don't be > shy! 
text.replace("python20", "python21") for all files in the PCBuild directory, plus PC/config.h From tim.one at home.com Wed Jan 17 22:42:13 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 17 Jan 2001 16:42:13 -0500 Subject: [Python-Dev] Windows Python totally hosed In-Reply-To: <028801c080cb$86658350$e46940d5@hagrid> Message-ID: [/F] > text.replace("python20", "python21") for all files in > the PCBuild directory, plus PC/config.h Brrrr. It strikes me as insane to have the core Python files in an MS project file *named* after the release number (python20.dsp). So I'm going to change that to core.dsp so that at least that much never needs to be changed again. gratefully y'rs - tim From fredrik at effbot.org Wed Jan 17 22:47:28 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 22:47:28 +0100 Subject: [Python-Dev] Usage of "assert" in regression tests References: <3A6608DD.E12A2422@lemburg.com> Message-ID: <02b401c080cf$1a3a5530$e46940d5@hagrid> mal wrote: > I urge you to only check in tests which use the new API > verify() to verify a certain condition. The API is defined > in the regression tools module test_support. did you run the test yourself after applying that patch? (a patch to the patch is on the way in. please check that the test suite still runs on non-Windows boxes...) From gstein at lyra.org Wed Jan 17 22:45:44 2001 From: gstein at lyra.org (Greg Stein) Date: Wed, 17 Jan 2001 13:45:44 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.106,2.107 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jan 17, 2001 at 01:27:04PM -0800 References: Message-ID: <20010117134544.H7731@lyra.org> On Wed, Jan 17, 2001 at 01:27:04PM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv14991 > > Modified Files: > object.c > Log Message: > Deal properly (?) with comparing recursive datastructures. >... 
> - Change the in-progress code to use static variables instead of > globals (both the nesting level and the key for the thread dict were > globals but have no reason to be globals; the key can even be a > function-static variable in get_inprogress_dict()). The "compare_nesting" variable is a bit troublesome long-term -- it will cause threading issues in a free-threaded implementation. The solution is to put the value into the thread-state. [ not sure if it matters right now, but just bringing it up ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Wed Jan 17 22:55:02 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 17 Jan 2001 16:55:02 -0500 (EST) Subject: [Python-Dev] [PEP 205] weak references patch Message-ID: <14950.5302.356566.778486@cj42289-a.reston1.va.home.com> I've updated the patch that implements PEP 205: http://sourceforge.net/patch/?func=detailpatch&patch_id=103203&group_id=5470 The actual patch is too big for SF: http://starship.python.net/crew/fdrake/patches/weakref.patch-5 One thing about this is that it changes some of the low-level object creation macros, so you'll need to do a "make clean" before "make" when testing it. Have fun! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at lemburg.com Wed Jan 17 23:16:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 17 Jan 2001 23:16:29 +0100 Subject: [Python-Dev] Usage of "assert" in regression tests References: <3A6608DD.E12A2422@lemburg.com> <02b401c080cf$1a3a5530$e46940d5@hagrid> Message-ID: <3A6619BD.2AC8F6D3@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > I urge you to only check in tests which use the new API > > verify() to verify a certain condition. The API is defined > > in the regression tools module test_support. > > did you run the test yourself after applying that patch? Yes, but as I wrote in the SF patch message: I can only test it on Linux and there not all tests are run due to missing extensions. 
The alpha testing will hopefully catch all possible bugs this patch introduced. > (a patch to the patch is on the way in. please check > that the test suite still runs on non-Windows boxes...) I'll have to leave that to the Windows wizards, sorry. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Wed Jan 17 23:49:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 17 Jan 2001 23:49:25 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: ; from akuchlin@mems-exchange.org on Wed, Jan 17, 2001 at 02:04:04PM -0500 References: Message-ID: <20010117234925.A17392@xs4all.nl> On Wed, Jan 17, 2001 at 02:04:04PM -0500, Andrew Kuchling wrote: > I've checked in the last bit of the PEP 229 changes. Be sure to > rename your Modules/Setup file (or do a 'make distclean' before > rebuilding. make distclean doesn't remove Modules/Setup anymore :) Also, I couldn't get it to work with an old tree, even after several make distclean/reconfigures. I got tired looking for it, so I just grabbed a new tree. > Squeal if you run into trouble, or file bugs on SF. I have a couple of questions: what to do when setup.py doesn't work ? Is there a way to make it bypass a module ? What about specifying include dirs manually, for some modules (for instance, when you have readline source in a separate directory, and want to link it statically.) Here are are some specific squeals. See at the bottom for the most important one :) On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by setup.py. Also, SSL support for the socket module was not enabled, though OpenSSL is installed, in the default path. On Debian GNU/Linux' 'woody', the 'testing' (soon 'stable') branch, I can't compile dbmmodule: building 'dbm' extension gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -fpic -I. 
-I/home/thomas/python/python/dist/src/./Include -IInclude/ -c /home/thomas/python/python/dist/src/Modules/dbmmodule.c -o build/temp.linux-i686-2.1/dbmmodule.o /home/thomas/python/python/dist/src/Modules/dbmmodule.c:24: #error "No ndbm.h available!" error: command 'gcc' failed with exit status 1 make: *** [sharedmods] Error 1 (ndbm.h does exist, as /usr/include/db1/ndbm.h. There is also /usr/include/gdbm-ndbm.h, but I'm not sure if that's the same.) Nor can I build the _tkinter module there: building '_tkinter' extension gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -fpic -DWITH_APPINIT=1 -I/usr/X11R6/include -I. -I/home/thomas/python/python/dist/src/./Include -IInclude/ -c /home/thomas/python/python/dist/src/Modules/_tkinter.c -o build/temp.linux-i686-2.1/_tkinter.o /home/thomas/python/python/dist/src/Modules/_tkinter.c:44: tcl.h: No such file or directory In file included from /home/thomas/python/python/dist/src/Modules/_tkinter.c:45:/usr/include/tk.h:66: tcl.h: No such file or directory error: command 'gcc' failed with exit status 1 make: *** [sharedmods] Error 1 The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian, which I personally like a lot, though it's probably a bitch to autodetect. 
(I tried, using autoconf ;-P) On Debian GNU/Linux 'sid', the current unstable branch, I can't compile Python at all, now: c++ -Xlinker -export-dynamic python.o \ ../libpython2.1.a -lpthread -ldl -lutil -lm -o python ../libpython2.1.a(posixmodule.o): In function `posix_tmpnam': /home/thomas/python/python-write/dist/src/Modules/./posixmodule.c:4115: the use of `tmpnam_r' is dangerous, better use `mkstemp' ../libpython2.1.a(posixmodule.o): In function `posix_tempnam': /home/thomas/python/python-write/dist/src/Modules/./posixmodule.c:4071: the use of `tempnam' is dangerous, better use `mkstemp' mv python ../python make[1]: Leaving directory `/home/thomas/python/python-write/dist/src/Modules' ./python ./setup.py build running build running build_ext Traceback (most recent call last): File "./setup.py", line 460, in ? main() File "./setup.py", line 455, in main ext_modules=[Extension('struct', ['structmodule.c'])] File "/home/thomas/python/python-write/dist/src/Lib/distutils/core.py", line 138, in setup dist.run_commands() File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 871, in run_commands self.run_command(cmd) File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 891, in run_command cmd_obj.run() File "/home/thomas/python/python-write/dist/src/Lib/distutils/command/build.py", line 106, in run self.run_command(cmd_name) File "/home/thomas/python/python-write/dist/src/Lib/distutils/cmd.py", line 328, in run_command self.distribution.run_command(command) File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 891, in run_command cmd_obj.run() File "/home/thomas/python/python-write/dist/src/Lib/distutils/command/build_ext.py", line 202, in run customize_compiler(self.compiler) File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", line 121, in customize_compiler (cc, opt, ccshared, ldshared, so_ext) = \ File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", 
line 389, in get_config_vars func() File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", line 302, in _init_posix raise DistutilsPlatformError, my_msg distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory) make: *** [sharedmods] Error 1 For the record, I don't have a /usr/lib/python2.1 directory on the other machines either. I haven't been able to test FreeBSD yet, will get to that later tonight. And most importantly(!), on all these machines, 'make test' stops functioning. In fact, after setup.py started building, you can't run 'make' without 'make clean' anymore. You get a lot of undefined-symbol warnings (see below.) If you run 'make clean;make test' it also doesn't work, because the build directory is not in the Python library path, and regrtest.py requires (at least) the time module. c++ -Xlinker -export-dynamic python.o \ ../libpython2.1.a -lpthread -ldl -lutil -lm -o python ../libpython2.1.a(posixmodule.o): In function `posix_tmpnam': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:4115: the use of `tmpnam_r' is dangerous, better use `mkstemp' ../libpython2.1.a(posixmodule.o): In function `posix_tempnam': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:4071: the use of `tempnam' is dangerous, better use `mkstemp' ../libpython2.1.a(myreadline.o): In function `my_fgets': /home/thomas/python/python/dist/src/Parser/myreadline.c:41: undefined reference to `PyOS_InterruptOccurred' /home/thomas/python/python/dist/src/Parser/myreadline.c:35: undefined reference to `PyOS_InterruptOccurred' ../libpython2.1.a(errors.o): In function `PyErr_SetFromErrnoWithFilename': /home/thomas/python/python/dist/src/Python/errors.c:260: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(pythonrun.o): In function `Py_Finalize': /home/thomas/python/python/dist/src/Python/pythonrun.c:193: undefined reference to `PyOS_FiniInterrupts' 
../libpython2.1.a(pythonrun.o): In function `initsigs': /home/thomas/python/python/dist/src/Python/pythonrun.c:1161: undefined reference to `PyOS_InitInterrupts' ../libpython2.1.a(traceback.o): In function `tb_printinternal': /home/thomas/python/python/dist/src/Python/traceback.c:213: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(fileobject.o): In function `get_line': /home/thomas/python/python/dist/src/Objects/fileobject.c:883: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(longobject.o): In function `long_format': /home/thomas/python/python/dist/src/Objects/longobject.c:644: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(longobject.o): In function `x_divrem': /home/thomas/python/python/dist/src/Objects/longobject.c:855: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(longobject.o): In function `long_mul': /home/thomas/python/python/dist/src/Objects/longobject.c:1193: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(object.o):/home/thomas/python/python/dist/src/Objects/object.c:174: more undefined references to `PyErr_CheckSignals' follow ../libpython2.1.a(posixmodule.o): In function `posix_fork': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:1666: undefined reference to `PyOS_AfterFork' ../libpython2.1.a(posixmodule.o): In function `posix_forkpty': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:1733: undefined reference to `PyOS_AfterFork' collect2: ld returned 1 exit status make[1]: *** [link] Error 1 make[1]: Leaving directory `/home/thomas/python/python/dist/src/Modules' make: *** [python] Error 2 -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Wed Jan 17 23:56:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 17 Jan 2001 23:56:58 +0100 Subject: [Python-Dev] Standard install locations for Python ? 
Message-ID: <3A66233A.A6AE07BD@lemburg.com> I'm currently busy building new version of my mx packages. While trying to convert all of them to distutils I found that there seems to be no standard for installing documentation or other data files of Python extensions. I also noted, that for Windows the standard extension installation defaults to \Python instead of some \Python\Site-Packages. So the general question is: Where should Python extensions install themselves and their docs ? (On Linux the typical place for docs is /usr/doc/packages, for Python code it is /usr/local/lib/pythonX.X/site-packages, BTW) Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Thu Jan 18 00:04:09 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 17 Jan 2001 18:04:09 -0500 Subject: [Python-Dev] Rich Comparisons technical prerelease In-Reply-To: <200101170422.XAA20626@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 11:22:54PM -0500 References: <200101170422.XAA20626@cj20424-a.reston1.va.home.com> Message-ID: <20010117180409.A17897@thyrsus.com> Guido van Rossum : > This makes it possible to define types with partial orderings. Guido's time machine is working again, and seems now to have been augmented by telepathy. I was just thinking about bugging him about this... I will definitely check this out with my set() class -- it was waiting on rich comparisons so I could do partial-orderings properly. If it works, we'll have set algebra for the standard library. Coolness. -- Eric S. Raymond Under democracy one party always devotes its chief energies to trying to prove that the other party is unfit to rule--and both commonly succeed, and are right... The United States has never developed an aristocracy really disinterested or an intelligentsia really intelligent. 
Its history is simply a record of vacillations between two gangs of frauds. --- H. L. Mencken From akuchlin at mems-exchange.org Thu Jan 18 00:09:47 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 18:09:47 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010117234925.A17392@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 17, 2001 at 11:49:25PM +0100 References: <20010117234925.A17392@xs4all.nl> Message-ID: <20010117180947.E9384@kronos.cnri.reston.va.us> On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: >I have a couple of questions: what to do when setup.py doesn't work ? Is >there a way to make it bypass a module ? What about specifying include dirs There's a 'disabled_module_list' global in the code, but no way to set it from the command-line yet, since I couldn't figure out how to do that in time. >On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by >setup.py. Also, SSL support for the socket module was not enabled, though >OpenSSL is installed, in the default path. Can you take a look at the detection code in setup.py and see what's going wrong. I believe it should be found if OpenSSL is in /usr/local/, but /usr/contrib isn't checked currently. >The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian, >which I personally like a lot, though it's probably a bitch to autodetect. >(I tried, using autoconf ;-P) There's code to handle Debian, though I have no way of testing it, and it worked on Neil's Debian box for some reason. Search for debian_tcl_include in setup.py, and see if you can fix it. >distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory) Are you sure setup.py is up to date; do a 'cvs update setup.py' to check. You might get a "setup.py is in the way; remove it' message if you downloaded the first setup.py script manually. >without 'make clean' anymore. 
You get a lot of undefined-symbol warnings >(see below.) If you run 'make clean;make test' it also doesn't work, because >the build directory is not in the Python library path, and regrtest.py >requires (at least) the time module. Again, be sure the tree is up to date; I think this stems from attempting to compile the signal module as shared, which doesn't work. I know that "make test" doesn't work, but am not sure how to fix it yet. --amk From tim.one at home.com Thu Jan 18 00:42:24 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 17 Jan 2001 18:42:24 -0500 Subject: [Python-Dev] Windows Python totally rad Message-ID: Windows Python runs normally again, modulo four test failures I figure are due to the "get rid of assert" patch. Note that the python20 DevStudio subproject is gone. It's been replaced by a new subproject named pythoncore. From thomas at xs4all.net Thu Jan 18 00:44:00 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 18 Jan 2001 00:44:00 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010117234925.A17392@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 17, 2001 at 11:49:25PM +0100 References: <20010117234925.A17392@xs4all.nl> Message-ID: <20010118004400.B17392@xs4all.nl> On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: I got around to testing on FreeBSD now, and it actually went pretty smooth! However, some small points: > On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by > setup.py. Also, SSL support for the socket module was not enabled, though > OpenSSL is installed, in the default path. Curiously enough, FreeBSD, with OpenSSL installed in /usr/include/openssl, *did* get the socketmodule compiled with SSL support, but without the necessary -I directive, so the compile failed. > And most importantly(!), on all these machines, 'make test' stops > functioning. In fact, after setup.py started building, you can't run 'make' > without 'make clean' anymore. 
You get a lot of undefined-symbol warnings Strangely enough, this problem does not exist on FreeBSD. I can run 'make' or 'make test' after 'make' just fine. 'make test' still doesn't work because of the incorrect library path, but it doesn't barf like the other systems (BSDI and Debian Linux) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From esr at thyrsus.com Thu Jan 18 01:32:53 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 17 Jan 2001 19:32:53 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <20010118000806.D1C04A828@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 18, 2001 at 02:08:06AM +0200 References: <14949.46995.259157.871323@beluga.mojam.com> <20010118000806.D1C04A828@darjeeling.zadka.site.co.il> Message-ID: <20010117193253.A18565@thyrsus.com> Moshe Zadka : > I think that you're confused between two meanings of inverses. > > You think: > op is an inverse of op' if for every a,b (a op b) = not (a op' b) > > Guido meant (and I hope, implemented): > op is an inverse of op' if for every a,b (a op b) = (b op' a) I thought the same. if (a op1 b) <=> (b op2 a), op2 is properly described as the "reflection" of op1, and vice-versa. -- Eric S. Raymond Sometimes the law defends plunder and participates in it. Sometimes the law places the whole apparatus of judges, police, prisons and gendarmes at the service of the plunderers, and treats the victim -- when he defends himself -- as a criminal. -- Frederic Bastiat, "The Law" From greg at cosc.canterbury.ac.nz Thu Jan 18 01:22:11 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 18 Jan 2001 13:22:11 +1300 (NZDT) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: <200101180022.NAA00898@s454.cosc.canterbury.ac.nz> Michael Hudson : > a < b if and only if b > a. > This is what the rich comparison code does. 
Someone is bound to come up with a use for comparison operator overloading in which this isn't true, just to be difficult! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at python.org Thu Jan 18 04:40:31 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Jan 2001 22:40:31 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.106,2.107 In-Reply-To: Your message of "Wed, 17 Jan 2001 13:45:44 PST." <20010117134544.H7731@lyra.org> References: <20010117134544.H7731@lyra.org> Message-ID: <200101180340.WAA00655@cj20424-a.reston1.va.home.com> > > - Change the in-progress code to use static variables instead of > > globals (both the nesting level and the key for the thread dict were > > globals but have no reason to be globals; the key can even be a > > function-static variable in get_inprogress_dict()). > > The "compare_nesting" variable is a bit troublesome long-term -- it will > cause threading issues in a free-threaded implementation. The solution is to > put the value into the thread-state. > > [ not sure if it matters right now, but just bringing it up ] Good point -- especially since the in-progress-dict is already part of the thread state. Jeremy explained to me that the compare_nesting variable is mostly an optimization (avoiding the work with the in-progress-dict when we don't know for sure that it's worth it) but yes, mixing nesting levels (even if the dicts are separate) could cause coupling or interference between threads... 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at mojam.com Thu Jan 18 05:20:30 2001
From: skip at mojam.com (Skip Montanaro)
Date: Wed, 17 Jan 2001 22:20:30 -0600 (CST)
Subject: [Python-Dev] urllib.urlencode & repeated values
Message-ID: <14950.28430.572215.10643@beluga.mojam.com>

I'm pretty sure this has come up before, but urllib.urlencode doesn't handle repeated parameters properly. If I call

    urllib.urlencode({"performers": ("U2","Lawrence Martin")})

instead of getting

    performers=U2&performers=Lawrence+Martin

I get a quoted stringified tuple:

    performers=%28%27U2%27%2c+%27Lawrence+Martin%27%29

Obviously, fixing this will change the function's current semantics, but I think it's worth treating lists and tuples (actually, any sequence) as repeated values. If the existing semantics are deemed valuable enough, a third default parameter could be added to switch on the new behavior when desired.

If others agree I'd be happy to whip up a patch. I think it's a bug.

Skip

From jeremy at alum.mit.edu Thu Jan 18 03:58:19 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 17 Jan 2001 21:58:19 -0500 (EST)
Subject: [Python-Dev] bug in grammar
Message-ID: <14950.23499.275398.963621@localhost.localdomain>

As part of the implementation of PEP 227 (and in an attempt to reach some low-hanging fruit Guido mentioned on the types-sig long ago), I have been working on a compiler pass that generates a module-level symbol table. I recently discovered a bug in the handling of list comprehensions that was giving me headaches. I realize now that the problem is with the current grammar and/or compiler.

Here's a simple demonstration; try it in your friendly python 2.0 interpreter.

>>> [i for i in range(10)] = (1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: unpack list of wrong size

The generated bytecode is:

          0 SET_LINENO               0
          3 SET_LINENO               1
          6 LOAD_CONST               0 (1)
          9 LOAD_CONST               1 (2)
         12 LOAD_CONST               2 (3)
         15 BUILD_TUPLE              3
         18 UNPACK_SEQUENCE          1
         21 STORE_NAME               0 (i)
         24 LOAD_CONST               3 (None)
         27 RETURN_VALUE

I assume this isn't intended :-). The compiler is ignoring everything after the initial atom in the list comprehension. It's basically compiling the code as if it were:

    [i] = (1, 2, 3)

I'm not sure how to try and fix this. Should the grammar allow one to construct the example statement above? If not, I'm not sure how to fix the grammar. If not, I suppose the compiler should detect that the list comp is misplaced. This seems fairly messy, since there are about 10 nodes between the expr_stmt and the list_for.

Or is this a cool way to use list comprehensions to generate ValueErrors?

Jeremy

From akuchlin at mems-exchange.org Thu Jan 18 06:19:31 2001
From: akuchlin at mems-exchange.org (A.M. Kuchling)
Date: Thu, 18 Jan 2001 00:19:31 -0500
Subject: [Python-Dev] Embedded language discussion
Message-ID: <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com>

http://www.kuro5hin.org/?op=displaystory;sid=2001/1/16/11334/2280

The poster is on a project that's trying to use Python, but they're encountering unspecified problems (perhaps because of the global interpreter lock).

--amk

From mal at lemburg.com Thu Jan 18 10:32:54 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 18 Jan 2001 10:32:54 +0100
Subject: [Python-Dev] Windows Python totally rad
References:
Message-ID: <3A66B846.3D24B959@lemburg.com>

Tim Peters wrote:
>
> Windows Python runs normally again, modulo four test failures I figure are
> due to the "get rid of assert" patch.

Could you tell me which these are ? The tests tested all passed just fine, so I guess these must be Windows-related problems.
-- Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fredrik at effbot.org Thu Jan 18 07:48:41 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 18 Jan 2001 07:48:41 +0100
Subject: [Python-Dev] 2.1 alpha: what about the unicode name database?
References: <012901c080a5$306023a0$e46940d5@hagrid>
Message-ID: <008701c0811a$b3371c00$e46940d5@hagrid>

I wrote:
> I've almost sorted it all out. will check it in later tonight (local
> time).

python build problems and real life got in the way.

will 2.1a1 be released according to plan? will there be a 2.1a2 release? maybe I should postpone this?

From esr at thyrsus.com Thu Jan 18 08:23:21 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 18 Jan 2001 02:23:21 -0500
Subject: [Python-Dev] Weird use of hash() -- will this work?
Message-ID: <20010118022321.A9021@thyrsus.com>

So I'm writing a module that needs to generate unique cookies. The module will run inside one of two environments: (1) a trivial test wrapper, not threaded, and (2) a long-running multithreaded server.

Because Python garbage-collects, hash() of a just-created object isn't good enough. Because we may be threading, millisecond time isn't good enough. Because we may *not* be threading, thread ID isn't good either.

On the other hand, I'm on Linux getting millisecond time resolution. And it's not hard to notice that an object hash is a memory address.

So, how about `time.time()` + hex(hash([]))?

It looks to me like this will remain unique forever, because another thread would have to create an object at the same memory address during the same millisecond to collide.

Furthermore, it looks to me like this hack might be portable to any OS with a clock tick shorter than its timeslice.

Comments?
--
Eric S. Raymond

Good intentions will always be pleaded for every assumption of authority.
It is hardly too strong to say that the Constitution was made to guard the people against the dangers of good intentions. There are men in all ages who mean to govern well, but they mean to govern. They promise to be good masters, but they mean to be masters.
	-- Daniel Webster

From ping at lfw.org Thu Jan 18 10:29:13 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Thu, 18 Jan 2001 01:29:13 -0800 (PST)
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: <200101161402.JAA05045@cj20424-a.reston1.va.home.com>
Message-ID:

On Tue, 16 Jan 2001, Guido van Rossum wrote:
> You mean the tp_print and tp_str function slots in type objects,
> right? tp_print *should* always render exactly the same as tp_str.
> tp_print is used by the print statement, not by value display at the
> interactive prompt.

Uh, i hate to disagree with you about your own interpreter, but:

    com_expr_stmt in Python/compile.c
        inserts a PRINT_EXPR opcode if c_interactive is true;
    eval_code2 in Python/ceval.c
        handles PRINT_EXPR by calling displayhook;
    sys_displayhook in Python/sysmodule.c
        prints the object by calling PyFile_WriteObject on sys.stdout;
    PyFile_WriteObject in Objects/fileobject.c
        calls PyObject_Print if the file is really a PyFileObject;
    PyObject_Print in Objects/object.c
        calls op->ob_type->tp_print if it's not NULL.

The print statement produces a PRINT_ITEM opcode, which invokes PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW flag is propagated down to PyObject_Print and into string_print, where it causes the string to fwrite itself directly without quoting.

> So, string_print most definitely should *not* be changed -- only
> string_repr!

I had to change them both before i actually saw the change in the interactive interpreter. Actually, your statement above (that the two should always render the same) seems to imply that if i change one, i must also change the other.
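
The two renderings at issue (raw output as written by print, quoted output as echoed at the interactive prompt) can be seen from Python itself. This is a small sketch using only str() and repr(); the helper name `display_forms` is hypothetical, and the result shown reflects a current interpreter, where repr() spells a newline as \n rather than \012.

```python
def display_forms(value):
    """Return (raw, quoted): the text print would write and the prompt echo."""
    raw = str(value)      # what print writes: no quotes, no escaping
    quoted = repr(value)  # what the prompt shows: quoted, control chars escaped
    return raw, quoted


raw, quoted = display_forms("a\nb")
```

Here `raw` still contains a real newline character, while `quoted` is the six-character text 'a\nb' including its quotes.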
-- ?!ng From sjoerd at oratrix.nl Thu Jan 18 11:11:09 2001 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Thu, 18 Jan 2001 11:11:09 +0100 Subject: [Python-Dev] distutils in Python 2.1 not ready for prime time Message-ID: <20010118101110.6D29C31E1B8@bireme.oratrix.nl> I just updated my copy of python with the current CVS version and I am not happy. The current version uses distutils for configuring and compiling most modules that are written in C. That is a nice idea in theory, but in practice it's not ready for prime time yet. The major advantage of using a Setup file is that you can add your own -I and -L compiler flags on a module-by-module basis. I *need* those flags since not all libraries and include files are in standard places (e.g. I need -I/usr/local/include and -L/usr/local/lib for some modules which my compiler doesn't provide by itself). There seems to be no way to tell distutils to supply those flags. The documentation (only on the web site, also not great, but I assume more documentation (at least an up-to-date README) will be provided in the final release) says that that has not yet been implemented. -- Sjoerd Mullender From ping at lfw.org Thu Jan 18 11:14:19 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 18 Jan 2001 02:14:19 -0800 (PST) Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: <3A66BCCC.14997FE3@lemburg.com> Message-ID: I hope you don't mind that i'm taking this over to python-dev, because it led me to discover a more general issue (see below). For the others on python-dev, here's the background: MAL was about to check in the unistr() function, described as follows: > This patch adds a utility function unistr() which works just like > the standard builtin str() -- only that the return value will > always be a Unicode object. > > The patch also adds a new object level C API PyObject_Unicode() > which complements PyObject_Str(). 
I responded:
> Why are unistr() and unicode() two separate functions?
>
> str() performs one task: convert to string. It can convert anything,
> including strings or Unicode strings, numbers, instances, etc.
>
> The other type-named functions e.g. int(), long(), float(), list(),
> tuple() are similar in intent.
>
> Why have unicode() just for converting strings to Unicode strings,
> and unistr() for converting everything else to a Unicode string?
> What does unistr(x) do differently from unicode(x) if x is a string?

MAL responded:
> unistr() is meant to complement str() very closely. unicode()
> works as constructor for Unicode objects which can also take
> care of decoding encoded data. str() and unistr() don't provide
> this capability but instead always assume the default encoding.
>
> There's also a subtle difference in that str() and unistr()
> try the tp_str slot which unicode() doesn't. unicode()
> supports any character buffer which str() and unistr() don't.

Okay, given this explanation, i still feel fairly confident that unicode() should subsume unistr(). Many of the other type-named functions try various slots:

    int() looks for __int__
    float() looks for __float__
    long() looks for __long__
    str() looks for __str__

In testing this i also discovered the following:

    >>> class Foo:
    ...     def __int__(self):
    ...         return 3
    ...
    >>> f = Foo()
    >>> int(f)
    3
    >>> long(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: Foo instance has no attribute '__long__'
    >>> float(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: Foo instance has no attribute '__float__'

This is kind of surprising. How about:

    int() looks for __int__
    float() looks for __float__, then tries __int__
    long() looks for __long__, then tries __int__
    str() looks for __str__
    unicode() looks for __unicode__, then tries __str__

The extra parameter to unicode() is very similar to the extra parameter to int(), so i think there is a natural parallel here.

Hmm...
what about the other types? Wow!! __complex__ can produce a segfault!

    >>> complex
    <built-in function complex>
    >>> class Foo:
    ...     def __complex__(self): return 3
    ...
    >>> Foo()
    <__main__.Foo instance at 0x81e8684>
    >>> f = _
    >>> complex(f)
    Segmentation fault (core dumped)

This happens because builtin_complex first retrieves and saves the PyNumberMethods of the argument (in this case, from the instance), then tries to call __complex__ (in this case, returning 3), and THEN coerces the result using nbr->nb_float if the result is not complex! (This calls the instance's nb_float method on the integer object 3!!)

I think complex() should probably look for __complex__, then __float__, then __int__.

One could argue for __list__, __tuple__, or __dict__, but that seems much weaker; the Pythonic way has always been to implement __getitem__ instead.

There is no built-in dict(); if it existed i suppose it would do the opposite of x.items(); again a weak argument, though i might have found such a function useful once or twice.

And that about covers the built-in types for data.

-- ?!ng

From ping at lfw.org Thu Jan 18 11:16:42 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Thu, 18 Jan 2001 02:16:42 -0800 (PST)
Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr()
In-Reply-To:
Message-ID:

On Thu, 18 Jan 2001, Ka-Ping Yee wrote:
>     str() looks for __str__

Oops. I forgot that

    str() looks for __str__, then tries __repr__

So, presumably,

    unicode() should look for __unicode__, then __str__, then __repr__

-- ?!ng

From mal at lemburg.com Thu Jan 18 11:51:46 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 18 Jan 2001 11:51:46 +0100
Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr()
References:
Message-ID: <3A66CAC2.74FC894@lemburg.com>

Ka-Ping Yee wrote:
>
> On Thu, 18 Jan 2001, Ka-Ping Yee wrote:
> >     str() looks for __str__
>
> Oops.
I forgot that
>
>     str() looks for __str__, then tries __repr__
>
> So, presumably,
>
>     unicode() should look for __unicode__, then __str__, then __repr__

Not quite... str() does this:

1. strings are passed back as-is
2. the type slot tp_str is tried
3. the method __str__ is tried
4. Unicode returns are converted to strings
5. anything other than a string return value is rejected

unistr() does the same, but makes sure that the return value is a Unicode object.

unicode() does the following:

1. for instances, __str__ is called
2. Unicode objects are returned as-is
3. string objects or character buffers are used as basis for decoding
4. decoding is applied to the character buffer and the results are returned

I think we should perhaps merge the two approaches into one which then applies all of the above in unicode() (and then forget about unistr()). This might hide some type errors, but since all other generic constructors behave more or less in the same way, I think unicode() should too.

Thoughts ?
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From martin at mira.cs.tu-berlin.de Thu Jan 18 11:48:30 2001
From: martin at mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 18 Jan 2001 11:48:30 +0100
Subject: [Python-Dev] Having extensions builtin
Message-ID: <200101181048.f0IAmU210251@mira.informatik.hu-berlin.de>

With the new distutils configuration scheme, it appears to be difficult to build modules in a non-shared way. Building modules non-shared is desirable when freezing is attempted, and also to reduce the startup time and memory consumption.

It is still possible to add modules to Setup or Setup.local, so that they will be built into the interpreter. However, setup.py will still build them in a shared way afterwards.

I propose that setup.py builds only those modules that are not builtin.
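
One way the proposal above could be approached is to filter the candidate extension names against sys.builtin_module_names, which lists the modules compiled into the running interpreter. This is only a sketch under that assumption: the helper name `modules_to_build` is hypothetical and is not taken from the actual setup.py.

```python
import sys


def modules_to_build(candidates, builtin=None):
    """Return the candidate module names that are not already built in.

    `builtin` defaults to the running interpreter's built-in modules; it is
    a parameter only so that the filter can be exercised with a fixed list.
    """
    if builtin is None:
        builtin = sys.builtin_module_names
    builtin = set(builtin)
    return [name for name in candidates if name not in builtin]
```

For example, `modules_to_build(["_socket", "zlib"], builtin=["zlib"])` keeps only `_socket`, so a module already linked statically via Setup would not be built a second time as a shared library.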
Regards,
Martin

From martin at mira.cs.tu-berlin.de Thu Jan 18 13:20:06 2001
From: martin at mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 18 Jan 2001 13:20:06 +0100
Subject: [Python-Dev] Standard install locations for Python ?
Message-ID: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de>

> Where should Python extensions install themselves and their docs?

I feel that extensions should not need to care. For extensions, distutils will pick a location, and the system administrator configuring the package can choose a different location. Unfortunately, distutils does not support the installation of documentation, which I think it should.

Now switching sides, as an administrator, I'd wish distutils to follow the system conventions by default. That means on Linux, documentation should go into the system's directory, which is /usr/share/doc according to latest standards. Distributions vary, so distutils should find out - e.g. by querying the location from rpm. In addition, when building RPMs, distutils should declare these files as %doc in the spec file, so RPM will install it following the system conventions.

On Windows, the convention apparently is to put the documentation "nearby" the software, so it should probably go into Doc or a subdirectory thereof.

On Unix, there appears to be no standard location, unless the documentation consists of man pages or perhaps info files. So /share/doc is probably a place as good as any other.

Regards,
Martin

From martin at mira.cs.tu-berlin.de Thu Jan 18 11:39:30 2001
From: martin at mira.cs.tu-berlin.de (Martin v.
Loewis)
Date: Thu, 18 Jan 2001 11:39:30 +0100
Subject: [Python-Dev] SSL detection problem
Message-ID: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de>

The distutils-based configuration fails to build on my system (SuSE 7.0) with the error

/usr/src/python/Modules/socketmodule.c:159: rsa.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:160: crypto.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:161: x509.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:162: pem.h: No such file or directory
/usr/src/python/Modules/socketmodule.c:163: ssl.h: No such file or directory

The problem is that these header files are in /usr/include/openssl, which is not in the standard include search path.

So the obvious request is: could this be fixed? I guess when setup.py finds the openssl library, it should also try to find ssl.h, in some obvious locations.

The not-so-obvious question: How can one work-around such a problem with the new setup scheme? In the old scheme, I could have chosen to either provide the right -I option in Modules/Setup, to disable SSL support, or to disable the _socket module altogether. How can I achieve either configuration with the new scheme?

Regards,
Martin

P.S. As a quick hack, I added a custom include_dirs parameter to the SSL extension.

From martin at mira.cs.tu-berlin.de Thu Jan 18 13:39:54 2001
From: martin at mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 18 Jan 2001 13:39:54 +0100
Subject: [Python-Dev] bug in grammar
Message-ID: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de>

> Should the grammar allow one to construct the example statement
> above?

It should not. Please note that the grammar allows a number of other things, e.g.

    a+b = c

(pass this to parser.suite to see details)

> If not, I'm not sure how to fix the grammar.
The central problem is that it allows testlist on the LHS of an augassign or '=', whereas the language only allows a small subset in that position. It is not possible to restrict the grammar in itself, as that will necessarily produce a conflict - you only know that the '+' was incorrect when you see the '='.

> I suppose the compiler should detect that the list comp is misplaced

I think there should be a well-formedness pass in-between. I.e. after the AST has been built, a single pass should descend through the tree, looking for an expr_statement with more than a single testlist. Once it finds one, it should confirm that this really is a well-formed lvalue (in C speak). In this case, the test should be that each term is an atom without factors.

If the parser itself performs such checks, the compiler could be simplified in many places, I guess.

Regards,
Martin

From thomas at xs4all.net Thu Jan 18 10:53:14 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 18 Jan 2001 10:53:14 +0100
Subject: [Python-Dev] PEP 229 checked in
In-Reply-To: <20010117180947.E9384@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Wed, Jan 17, 2001 at 06:09:47PM -0500
References: <20010117234925.A17392@xs4all.nl> <20010117180947.E9384@kronos.cnri.reston.va.us>
Message-ID: <20010118105314.D17392@xs4all.nl>

On Wed, Jan 17, 2001 at 06:09:47PM -0500, Andrew Kuchling wrote:

> >On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by
> >setup.py. Also, SSL support for the socket module was not enabled, though
> >OpenSSL is installed, in the default path.
>
> Can you take a look at the detection code in setup.py and see what's
> going wrong. I believe it should be found if OpenSSL is in
> /usr/local/, but /usr/contrib isn't checked currently.

Well, OpenSSL rests in the default location, which is /usr/local/ssl/include/openssl. Haven't the time to look into it right now, sorry.
> >The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian, > >which I personally like a lot, though it's probably a bitch to autodetect. > >(I tried, using autoconf ;-P) > There's code to handle Debian, though I have no way of testing it, and > it worked on Neil's Debian box for some reason. Search for > debian_tcl_include in setup.py, and see if you can fix it. Ah, yes. The problem in my case is that the *library* files are just in /usr/lib, but the include files are not. I re-indented the code to pull the debian-specific code out of the 'if prefix + os.sep + 'lib' not in lib_dirs' block, and it works now. Haven't tested it on other code yet, but I think it should work regardless. > >distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory) > Are you sure setup.py is up to date; do a 'cvs update setup.py' to check. > You might get a "setup.py is in the way; remove it' message if you > downloaded the first setup.py script manually. D'oh, I guess not. I thought I did (I did on all other platforms :) but I guess I didn't, 'cause it works now. Thanx. > >without 'make clean' anymore. You get a lot of undefined-symbol warnings > >(see below.) If you run 'make clean;make test' it also doesn't work, because > >the build directory is not in the Python library path, and regrtest.py > >requires (at least) the time module. > Again, be sure the tree is up to date; I think this stems from > attempting to compile the signal module as shared, which doesn't work. This happened even with completely fresh, newly checked out trees, on all but FreeBSD (three different trees: Debian woody, BSDI 4.0 and BSDI 4.1) so I'm pretty sure that's not it. 
It works now, though, so I guess the move from a dynamic signalmodule to a static one does the trick ;)

I got 'make test' working by applying the following patch to Makefile{,.in}, and running 'make PYTHONPATH=.: test' (determining builddir by hand, for now.):

***************
*** 216,223 ****
  TESTPYTHON=	./python$(EXE) -tt
  test:		all
  		-rm -f $(srcdir)/Lib/test/*.py[co]
! 		-PYTHONPATH= $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)
! 		PYTHONPATH= $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)

  # Install everything
  install:	altinstall bininstall maninstall
--- 216,223 ----
  TESTPYTHON=	./python$(EXE) -tt
  test:		all
  		-rm -f $(srcdir)/Lib/test/*.py[co]
! 		-PYTHONPATH=$(PYTHONPATH) $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)
! 		PYTHONPATH=$(PYTHONPATH) $(TESTPYTHON) $(TESTPROG) $(TESTOPTS)

  # Install everything
  install:	altinstall bininstall maninstall

And because of that, I also noticed something funny: BSDI calls itself 'BSD/OS ', so distutils actually makes a directory called 'lib.bsd' and 'temp.bsd', with inside those a directory 'os--i386-2.1'. Is that a distutils bug, a setup.py bug, or intentional behaviour of one of the two ?

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From nas at arctrix.com Thu Jan 18 08:59:22 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 17 Jan 2001 23:59:22 -0800
Subject: [Python-Dev] new Makefile.in
Message-ID: <20010117235922.A12356@glacier.fnational.com>

Spurred on by comments made by Andrew, I spent some time last night overhauling the Python Makefiles. I now have a toplevel non-recursive Makefile.in that seems to work fairly well. I'm pretty sure it still should be portable. It doesn't use includes or any special GNU make features. It is half the size of the old Makefiles. The build is faster and it's now easier to follow if something goes wrong.

A question: is it possible to break the Python static library up? For example, instead of having libpython.a have Parser/parser.a, Objects/objects.a, etc?
There would still only be one shared library. This would speed up incremental builds and also help Andrew with PEP 229. I'm thinking that the Makefile would do something like this:

    all: python$(EXE)

    PYLIBS= Parser/parser.a Objects/objects.a ... Modules/modules.a

    python$(EXE): $(PYLIBS)
            $(LINKCC) -o python$(EXE) $(PYLIBS) ...

    Modules/modules.a: minpython$(EXE)
            ./minpython$(EXE) setup.py

AFAICT, the only thing affected by splitting up the static library is Misc/Makefile.pre.in. Is this correct?

  Neil

From guido at digicool.com Thu Jan 18 15:52:23 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 18 Jan 2001 09:52:23 -0500
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To: Your message of "Thu, 18 Jan 2001 13:22:11 +1300." <200101180022.NAA00898@s454.cosc.canterbury.ac.nz>
References: <200101180022.NAA00898@s454.cosc.canterbury.ac.nz>
Message-ID: <200101181452.JAA06899@cj20424-a.reston1.va.home.com>

> > a < b if and only if b > a.
> > This is what the rich comparison code does.
>
> Someone is bound to come up with a use for comparison
> operator overloading in which this isn't true, just
> to be difficult!

They'll get what they deserve -- this will be clearly documented!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu Thu Jan 18 16:15:25 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 18 Jan 2001 10:15:25 -0500 (EST)
Subject: [Python-Dev] Re: bug in grammar
In-Reply-To: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de>
References: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de>
Message-ID: <14951.2189.14393.52725@localhost.localdomain>

If I summarize your suggestion, I think you've said that ideally the grammar should not allow assignment to list comprehensions (or a variety of other constructs) -- but it doesn't so the compiler has to deal with it.

This morning it seemed a lot easier to fix the bug than it did last night :-).
com_assign() already has a number of checks for syntax errors in assignments. A test for list comprehensions belongs at the same place as tests for assignment to [] and augmented assignments applied to lists. I'll include a fix for assignment to list comprehensions in my big compiler patch. Jeremy From akuchlin at mems-exchange.org Thu Jan 18 16:28:19 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 10:28:19 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <20010118022321.A9021@thyrsus.com>; from esr@thyrsus.com on Thu, Jan 18, 2001 at 02:23:21AM -0500 References: <20010118022321.A9021@thyrsus.com> Message-ID: <20010118102819.A21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 02:23:21AM -0500, Eric S. Raymond wrote: >And it's not hard to notice that an object hash is a memory address. Unless the object defines __hash__()! If you want the memory address, use id() instead. --amk From akuchlin at mems-exchange.org Thu Jan 18 16:30:36 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 10:30:36 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010118004400.B17392@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 18, 2001 at 12:44:00AM +0100 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> Message-ID: <20010118103036.B21503@kronos.cnri.reston.va.us> >On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: >> On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by >> setup.py. Also, SSL support for the socket module was not enabled, though >> OpenSSL is installed, in the default path. What does the layout of /usr/contrib look like? Is it /usr/contrib/openssl/include/, /usr/contrib/include/, or something else? >Strangely enough, this problem does not exist on FreeBSD. I can run 'make' >or 'make test' after 'make' just fine. 
'make test' still doesn't work >because of the incorrect library path, but it doesn't barf like the other >systems (BSDI and Debian Linux) Have you already run "make install"? Perhaps it's picking up the already-installed modules when running "make test", because it really shouldn't be working. --amk From gward at cnri.reston.va.us Thu Jan 18 16:42:51 2001 From: gward at cnri.reston.va.us (Greg Ward) Date: Thu, 18 Jan 2001 10:42:51 -0500 Subject: [Python-Dev] Where's Greg Ward ? In-Reply-To: <3A6237D7.673BBB30@lemburg.com>; from mal@lemburg.com on Mon, Jan 15, 2001 at 12:35:51AM +0100 References: <3A6237D7.673BBB30@lemburg.com> Message-ID: <20010118104250.A27049@thrak.cnri.reston.va.us> On 15 January 2001, M.-A. Lemburg said: > He seems to be offline and the people on the distutils list have some > patches and other things which would be nice to have in distutils > for 2.1. Tim was right -- I'm *really* close to being back online. Just have to figure out why qmail's not answering port 25 and why LILO doesn't like my newly repartitioned hard drive, and all will be well. Oh yeah, and getting insurance, and a credit card, and unpacking all these cardboard boxes, and getting some furniture, ... (If anyone is considering it, I do *not* recommend buying a new computer, moving internationally, and getting a high speed home Internet connection all at the same time.) BTW I quite approve of Andrew being temporary Distutils dictator. Should have done it in December, but I didn't think I'd be out of commission for so long. Sigh. 
Greg From moshez at zadka.site.co.il Fri Jan 19 01:19:45 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 19 Jan 2001 02:19:45 +0200 (IST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: References: Message-ID: <20010119001945.80DC8A83E@darjeeling.zadka.site.co.il> On Thu, 18 Jan 2001 01:29:13 -0800 (PST), Ka-Ping Yee wrote: > On Tue, 16 Jan 2001, Guido van Rossum wrote: > > You mean the tp_print and tp_str function slots in type objects, > > right? tp_print *should* always render exactly the same as tp_str. > > tp_print is used by the print statement, not by value display at the > > interactive prompt. > > Uh, i hate to disagree with you about your own interpreter, but: > > com_expr_stmt in Python/compile.c > inserts a PRINT_EXPR opcode if c_interactive is true; > eval_code2 in Python/ceval.c > handles PRINT_EXPR by calling displayhook; > sys_displayhook in Python/sysmodule.c > prints the object by calling PyFile_WriteObject on sys.stdout; > PyFile_WriteObject in Objects/fileobject.c > calls PyObject_Print if the file is really a PyFileObject; > PyObject_Print in Objects/object.c > calls op->ob_type->tp_print if it's not NULL. > > The print statement produces a PRINT_ITEM opcode, which invokes > PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW > flag is propagated down to PyObject_Print and into string_print, > where it causes the string to fwrite itself directly without quoting. > > > So, string_print most definitely should *not* be changed -- only > > string_repr! > > I had to change them both before i actually saw the change in the > interactive interpreter. Actually, your statement above (that the > two should always render the same) seems to imply that if i change > one, i must also change the other. > > > -- ?!ng > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > > -- Moshe Zadka This is a signature anti-virus. 
Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From guido at digicool.com Thu Jan 18 17:23:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:23:19 -0500 Subject: [Python-Dev] unistr() vs. unicode() Message-ID: <200101181623.LAA07389@cj20424-a.reston1.va.home.com> Ping wrote in response to a SourceForge mail about MAL's unistr() checking: ------- Forwarded Message Date: Wed, 17 Jan 2001 23:51:48 -0800 From: Ka-Ping Yee To: noreply at sourceforge.net cc: mal at lemburg.com, guido at python.org, patches at python.org Subject: Re: [Patches] [Patch #101664] Add new unistr() builtin + PyObject_Unic ode() C API On Wed, 17 Jan 2001 noreply at sourceforge.net wrote: > Comment: > This patch adds a utility function unistr() which works just like > the standard builtin str() -- only that the return value will > always be a Unicode object. Sorry for barging in, but i have an issue/question: Why are unistr() and unicode() two separate functions? str() performs one task: convert to string. It can convert anything, including strings or Unicode strings, numbers, instances, etc. The other type-named functions e.g. int(), long(), float(), list(), tuple() are similar in intent. Why have unicode() just for converting strings to Unicode strings, and unistr() for converting everything else to a Unicode string? What does unistr(x) do differently from unicode(x) if x is a string? - -- ?!ng ------- End of Forwarded Message (And no, Tim, this did *not* end up in the patches list because I made Barry remove the reply-to. SourceForge mails never had reply-to to begin with.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jan 18 17:28:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:28:12 -0500 Subject: [Python-Dev] urllib.urlencode & repeated values In-Reply-To: Your message of "Wed, 17 Jan 2001 22:20:30 CST." 
<14950.28430.572215.10643@beluga.mojam.com> References: <14950.28430.572215.10643@beluga.mojam.com> Message-ID: <200101181628.LAA07406@cj20424-a.reston1.va.home.com> > I'm pretty sure this has come up before, but urllib.urlencode doesn't handle > repeated parameters properly. If I call > > urllib.urlencode({"performers": ("U2","Lawrence Martin")}) > > instead of getting > > performers=U2&performers=Lawrence+Martin > > I get a quoted stringified tuple: > > performers=%28%27U2%27%2c+%27Lawrence+Martin%27%29 > > Obviously, fixing this will change the function's current semantics, but I > think it's worth treating lists and tuples (actually, any sequence) as > repeated values. If the existing semantics are deemed valuable enough, a > third default parameter could be added to switch on the new behavior when > desired. > > If others agree I'd be happy to whip up a patch. I think it's a bug. Agreed. If you can come up with something that supports all sequence types, and treats singleton sequences the same as their one and only item, it would even be the inverse of cgi.parse_qs()! --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Thu Jan 18 17:43:49 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 18 Jan 2001 17:43:49 +0100 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: ; from ping@lfw.org on Thu, Jan 18, 2001 at 02:14:19AM -0800 References: <3A66BCCC.14997FE3@lemburg.com> Message-ID: <20010118174349.E17392@xs4all.nl> On Thu, Jan 18, 2001 at 02:14:19AM -0800, Ka-Ping Yee wrote: > Wow!! __complex__ can produce a segfault! > >>> complex > > >>> class Foo: > ... def __complex__(self): return 3 > ... 
> >>> Foo() > <__main__.Foo instance at 0x81e8684> > >>> f = _ > >>> complex(f) > Segmentation fault (core dumped) > This happens because builtin_complex first retrieves and saves > the PyNumberMethods of the argument (in this case, from the > instance), then tries to call __complex__ (in this case, returning 3), > and THEN coerces the result using nbr->nb_float if the result is > not complex! (This calls the instance's nb_float method on the > integer object 3!!) I've noticed that lurking bug in the coercion code when I added augmented assignment, though I don't recall whether I fixed it then, nor do I know if that part's been "touched" by the recent coercion changes. If none of the coercion champions speak up, I'll look at this sometime this weekend. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From akuchlin at mems-exchange.org Thu Jan 18 17:50:28 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 11:50:28 -0500 Subject: [Python-Dev] SSL detection problem In-Reply-To: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 11:39:30AM +0100 References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> Message-ID: <20010118115028.D21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 11:39:30AM +0100, Martin v. Loewis wrote: >The problem is that these header files are in /usr/include/openssl, >which is not in the standard include search path. I have an improved version of setup.py (not checked in yet) that tries to do better, checking for both header and library files. One point: the OpenSSL docs imply that the headers should be loaded as , not as ; the header files themselves use the openssl/*.h form, which means you'd need two -I directives.. I'll patch the socket module accordingly. >The not-so-obvious question: How can one work-around such a problem >with the new setup scheme? 
In the old scheme, I could have chosen to >either provide the right -I option in Modules/Setup, to disable SSL >support, or to disable the _socket module altogether. How can I >achieve either configuration with the new scheme? I still need to implement command-line options to specify such overrides, but that couldn't possibly get done in time for alpha1. I was thinking of something like ---libs="foo bar", ---includes="/usr/include/blah/", and so forth. Suggestions for a better interface welcomed... --amk From guido at digicool.com Thu Jan 18 17:55:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:55:39 -0500 Subject: [Python-Dev] bug in grammar In-Reply-To: Your message of "Wed, 17 Jan 2001 21:58:19 EST." <14950.23499.275398.963621@localhost.localdomain> References: <14950.23499.275398.963621@localhost.localdomain> Message-ID: <200101181655.LAA08001@cj20424-a.reston1.va.home.com> > As part of the implementation of PEP 227 (and in an attempt to reach > some low-hanging fruit Guido mentioned on the types-sig long ago), I > have been working on a compiler pass that generates a module-level > symbol table. I recently discovered a bug in the handling of list > comprehensions that was giving me headaches. > > I realize now that the problem is with the current grammar and/or > compiler. Here's a simple demonstration; try it in your friendly > python 2.0 interpreter. > > >>> [i for i in range(10)] = (1, 2, 3) > Traceback (most recent call last): > File "", line 1, in ? > ValueError: unpack list of wrong size > > The generated bytecode is: > > 0 SET_LINENO 0 > > 3 SET_LINENO 1 > 6 LOAD_CONST 0 (1) > 9 LOAD_CONST 1 (2) > 12 LOAD_CONST 2 (3) > 15 BUILD_TUPLE 3 > 18 UNPACK_SEQUENCE 1 > 21 STORE_NAME 0 (i) > 24 LOAD_CONST 3 (None) > 27 RETURN_VALUE > > I assume this isn't intended :-). The compiler is ignoring everything > after the initial atom in the list comprehension. 
It's basically > compiling the code as if it were: > > [i] = (1, 2, 3) > > I'm not sure how to try and fix this. Should the grammar allow one to > construct the example statement above? If not, I'm not sure how to > fix the grammar. If not, I suppose the compiler should detect that > the list comp is misplaced. This seems fairly messy, since there are > about 10 nodes between the expr_stmt and the list_for. > > Or is this a cool way to use list comprehensions to generate > ValueErrors? Good catch! Not everything cool deserves to be preserved. It looks like this happens because the code that traverses lists on the left-hand side of an assignment was never told about list comprehensions. You're right that the grammar can't be fixed; it's for the same reason that it can't be fixed to disallow "f() = 1". The solution is to add a test for this to the compiler that flags this as an error. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jan 18 18:01:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 12:01:02 -0500 Subject: [Python-Dev] Embedded language discussion In-Reply-To: Your message of "Thu, 18 Jan 2001 00:19:31 EST." <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com> References: <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com> Message-ID: <200101181701.MAA08046@cj20424-a.reston1.va.home.com> > http://www.kuro5hin.org/?op=displaystory;sid=2001/1/16/11334/2280 > > The poster is on a project that's trying to use Python, but they're > encountering unspecified problems (perhaps because of the global > interpreter lock). I've sent the poster an email asking to be more specific about his questions; probably doing the right dance when calling Python from a thread created in C++ should do the trick. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jan 18 18:04:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 12:04:43 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Thu, 18 Jan 2001 01:29:13 PST." References: Message-ID: <200101181704.MAA08074@cj20424-a.reston1.va.home.com> > On Tue, 16 Jan 2001, Guido van Rossum wrote: > > You mean the tp_print and tp_str function slots in type objects, > > right? tp_print *should* always render exactly the same as tp_str. > > tp_print is used by the print statement, not by value display at the > > interactive prompt. > > Uh, i hate to disagree with you about your own interpreter, but: > > com_expr_stmt in Python/compile.c > inserts a PRINT_EXPR opcode if c_interactive is true; > eval_code2 in Python/ceval.c > handles PRINT_EXPR by calling displayhook; > sys_displayhook in Python/sysmodule.c > prints the object by calling PyFile_WriteObject on sys.stdout; > PyFile_WriteObject in Objects/fileobject.c > calls PyObject_Print if the file is really a PyFileObject; > PyObject_Print in Objects/object.c > calls op->ob_type->tp_print if it's not NULL. > > The print statement produces a PRINT_ITEM opcode, which invokes > PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW > flag is propagated down to PyObject_Print and into string_print, > where it causes the string to fwrite itself directly without quoting. > > > So, string_print most definitely should *not* be changed -- only > > string_repr! > > I had to change them both before i actually saw the change in the > interactive interpreter. Actually, your statement above (that the > two should always render the same) seems to imply that if i change > one, i must also change the other. Oops. I'm so grateful that we have a collective memory! :-) You're right: tp_print() can be invoked in two modes: with or without Py_PRINT_RAW flag. 
In raw mode, it should behave exactly like str(); in cooked mode exactly like repr(). --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at mira.cs.tu-berlin.de Thu Jan 18 20:31:29 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 18 Jan 2001 20:31:29 +0100 Subject: [Python-Dev] Weird use of hash() -- will this work? Message-ID: <200101181931.f0IJVTc00932@mira.informatik.hu-berlin.de> > Comments? Yes, three of them:

1. To guarantee uniqueness at least within the process, the easiest solution would be

    import time

    if using_threads:  # flag assumed to be set elsewhere by the caller
        import thread
        lock = thread.allocate_lock()
        _acquire = lock.acquire_lock
        _release = lock.release_lock
    else:
        # Without threads, acquiring and releasing are no-ops.
        _acquire = _release = lambda: None

    _cookie = time.time()

    def getCookie():
        # Bump the shared value under the lock so no two callers get the same cookie.
        global _cookie
        _acquire()
        _cookie += 1
        result = _cookie
        _release()
        return result

2. Invoking [] repeatedly likely returns an object with the same id() when called twice in a row (i.e. with no intermediate objects allocated in-between).

3. Why did you send this question to python-dev? python-list is more appropriate.

Regards, Martin From tim.one at home.com Thu Jan 18 20:49:12 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 14:49:12 -0500 Subject: [Python-Dev] Windows Python totally rad In-Reply-To: <3A66B846.3D24B959@lemburg.com> Message-ID: [MAL] > Could you tell me which these are [new test failures on Windows]? > The tests tested all passed just fine, so I guess these must be > Windows-related problems. Not to worry, all the tests pass now. Don't want to spend time backtracking, as I'm not the one who fixed them and don't know who did. FWIW, they "smelled like" shallow failures (== easy to diagnose & fix). onward!-ly y'rs - tim From martin at mira.cs.tu-berlin.de Thu Jan 18 20:37:04 2001 From: martin at mira.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 18 Jan 2001 20:37:04 +0100 Subject: [Python-Dev] new Makefile.in Message-ID: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de> > A question: is it possible to break the Python static library up? > For example, instead of having libpython.a have > Parser/parser.a, Objects/objects.a, etc? Please, no. It was that way in Python 1.4 (libModules, libObjects, and I forgot which the others were :-). We had that all documented in our book, then Guido tried to build an extension module for the first time, saw that these many libraries were terrible, and combined them into a single one. That was a good thing, and we have it documented in our book. I'm not at all looking forward to answering all the questions why the build infrastructure of Python changed yet again... Regards, Martin From fdrake at acm.org Thu Jan 18 21:22:30 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 18 Jan 2001 15:22:30 -0500 (EST) Subject: [Python-Dev] weak references in 2.1alpha Message-ID: <14951.20614.176140.672447@cj42289-a.reston1.va.home.com> I'd like to put the weak references patch into the alpha, but haven't received any feedback on the latest patch. I have some comments from Martin von Löwis on the PEP that need to be addressed, and that could change the implementation a bit, but the basic machinery seems to be pretty reasonable and works for me. Does anyone have any objections to it going into the alpha? I'd like to enable more wide-spread testing. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at lemburg.com Thu Jan 18 18:10:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jan 2001 18:10:14 +0100 Subject: [Python-Dev] Weird use of hash() -- will this work? References: <20010118022321.A9021@thyrsus.com> Message-ID: <3A672376.4B951848@lemburg.com> "Eric S. Raymond" wrote: > > So I'm writing a module that needs to generate unique cookies.
The > module will run inside one of two environments: (1) a trivial test wrapper, > not threaded, and (2) a long-running multithreaded server. > > Because Python garbage-collects, hash() of a just-created object isn't > good enough. Because we may be threading, millisecond time isn't > good enough. Because we may *not* be threading, thread ID isn't good > either. > > On the other hand, I'm on Linux getting millisecond time resolution. > And it's not hard to notice that an object hash is a memory address. > > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because another thread > would have to create an object at the same memory address during the same > millisecond to collide. > > Furthermore, it looks to me like this hack might be portable to any OS > with a clock tick shorter than its timeslice. > > Comments? A combination of time.time(), process id and counter should work in all cases. Make sure you use a lock around the counter, though. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Jan 18 18:30:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jan 2001 18:30:52 +0100 Subject: [Python-Dev] Standard install locations for Python ? References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> Message-ID: <3A67284C.B6C617A@lemburg.com> "Martin v. Loewis" wrote: > > > Where should Python extensions install themselves and their docs? > > I feel that extensions should not need to care. For extensions, > distutils will pick a location, and the system administrator > configuring the package can choose a different location. > > Unfortunately, distutils does not support the installation of > documentation, which I think it should. Right.
> Now switching sides, as an administrator, I'd wish distutils to follow > the system conventions by default. > > That means on Linux, documentation should go into the system's > directory, which is /usr/share/doc according to latest > standards. Distributions vary, so distutils should find out - e.g. by > querying the location from rpm. In addition, when building RPMs, > distutils should declare these files as %doc in the spec file, so RPM > will install it following the system conventions. You currently have to do this by hand (e.g. in setup.cfg or using the doc_files option). It should be fairly easy to add a command similar to install_data, though, which then applies all the necessary magic to the paths. Is there a common landmark to look for on Unix (e.g. in case the system does not use RPM)? Which paths should distutils check? (/usr/share/doc/packages, /usr/share/doc, /usr/doc/packages, /usr/doc in that order?) > On Windows, the convention apparently is to put the documentation > "nearby" the software, so it should probably go into Doc or a > subdirectory thereof. Na, I'd rather have \Python\Site-Packages and \Python\Site-Docs for that purpose. > On Unix, there appears to be no standard location, unless the > documentation consists of man pages or perhaps info files. So > /share/doc is probably a place as good as any other.
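The lookup order floated above could be probed with a few lines of Python. This is only a sketch, not distutils code: find_doc_dir and CANDIDATE_DOC_DIRS are hypothetical names, and the candidate list simply repeats the order proposed in the question.

```python
import os

# Candidate directories, in the order suggested in the message above.
# The list is an assumption for illustration, not an agreed convention.
CANDIDATE_DOC_DIRS = [
    "/usr/share/doc/packages",
    "/usr/share/doc",
    "/usr/doc/packages",
    "/usr/doc",
]


def find_doc_dir(candidates=CANDIDATE_DOC_DIRS, isdir=os.path.isdir):
    """Return the first candidate that exists as a directory, else None.

    The isdir argument is injectable so the function can be exercised
    without touching the real filesystem.
    """
    for path in candidates:
        if isdir(path):
            return path
    return None
```

Calling find_doc_dir() with no arguments checks the real filesystem; a None result would mean falling back to whatever default the installer picks.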
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Thu Jan 18 18:45:29 2001 From: skip at mojam.com (Skip Montanaro) Date: Thu, 18 Jan 2001 11:45:29 -0600 (CST) Subject: [Python-Dev] urllib.urlencode & repeated values In-Reply-To: <200101181628.LAA07406@cj20424-a.reston1.va.home.com> References: <14950.28430.572215.10643@beluga.mojam.com> <200101181628.LAA07406@cj20424-a.reston1.va.home.com> Message-ID: <14951.11193.150232.564700@beluga.mojam.com> >> If others agree I'd be happy to whip up a patch. I think it's a bug. Guido> Agreed. Patch #103314: http://sourceforge.net/patch/?func=detailpatch&patch_id=103314&group_id=5470 I assigned it to Fred for doc review. Skip From akuchlin at mems-exchange.org Thu Jan 18 19:56:40 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 13:56:40 -0500 Subject: [Python-Dev] Standard install locations for Python ? In-Reply-To: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 01:20:06PM +0100 References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> Message-ID: <20010118135640.G21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 01:20:06PM +0100, Martin v. Loewis wrote: >On Unix, there appears to be no standard location, unless the >documentation consists of man pages or perhaps info files. So >/share/doc is probably a place as good as any other. This seems like a good suggestion. Should docs go in /share/doc/python/, then? Perhaps with subdirectories for different extensions? 
--amk From tismer at tismer.com Thu Jan 18 22:39:18 2001 From: tismer at tismer.com (Christian Tismer) Date: Thu, 18 Jan 2001 22:39:18 +0100 Subject: [Python-Dev] Rich comparison confusion References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> Message-ID: <3A676286.C33823B4@tismer.com> Guido van Rossum wrote: > > > I'm a bit confused about Guido's rich comparison stuff. In the description > > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. > > Yes. By this I mean that A<B and B>A are interchangeable, ditto for > A<=B and B>=A. Also A==B interchanges for B==A, and A!=B for B!=A. ... > I think what threw you off was the ambiguity of "inverse". This means > Boolean negation. I'm not relying on Boolean negation here -- I'm > relying on the more fundamental property that a<b and b>a have the > same outcome. Yes, the "inverse" is confusing. Is what you mean the "reverse"? Like the other right-side operators such as __radd__, is it correct to think of __ge__ == __rle__ if __rle__ was written in the same fashion as __radd__? It looks semantically the same, although the reason for a call might be different. And if my above view is right, would it perhaps be less confusing to use in fact __rle__ and __rlt__, or would it be more confusing, since __rlt__ would also be invoked left-to-right, implementing ">". Not sure if I added even more confusion. -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tim.one at home.com Thu Jan 18 22:53:44 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 16:53:44 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <20010118022321.A9021@thyrsus.com> Message-ID: [Eric S.
Raymond, in search of uniqueness] > ... > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because > another thread would have to create an object at the same memory > address during the same millisecond to collide. I'm afraid it's much more vulnerable than that: Python's thread granularity is at the bytecode level, not the statement level. It's very easy for thread A and B to see the same `time.time()` value, and after that arbitrarily long amounts of time may pass before they get around to doing the hash([]) business. When hash() completes, the storage for [] is immediately reclaimed under CPython, and it's again very easy for another thread to reuse the storage. I'm attaching an executable test case. It uses time.clock() because that has much higher resolution than time.time() on Windows (better than microsecond), but rounds it back to three decimal places to simulate millisecond resolution. The first three runs: saw 14600 unique in 30000 total saw 14597 unique in 30000 total saw 14645 unique in 30000 total So it sucks bigtime on my box. Better idea: borrow the _ThreadSafeCounter class from the tail end of the current CVS tempfile.py. The code works whether or not threads are available. Then `time.time()` + str(_counter.get_next()) is thread-safe. For that matter, plain old str(_counter.get_next()) will always be unique within a single run. However, in either case you're still not safe against concurrent *processes* generating the same cookies. tempfile.py has to worry about that too, of course, so the *best* idea is to call tempfile.mktemp() and leave it at that. It wastes some time checking the filesystem for a file of the same name (which, btw, goes much quicker on Linux than on Windows). From tismer at tismer.com Thu Jan 18 22:56:08 2001 From: tismer at tismer.com (Christian Tismer) Date: Thu, 18 Jan 2001 22:56:08 +0100 Subject: [Python-Dev] Weird use of hash() -- will this work? 
References: <20010118022321.A9021@thyrsus.com> Message-ID: <3A676678.7E4AF278@tismer.com> "Eric S. Raymond" wrote: > > So I'm writing a module that needs to generate unique cookies. The > module will run inside one of two environments: (1) a trivial test wrapper, > not threaded, and (2) a long-running multithreaded server. What do you mean by "unique"? Unique regarding your long-running server? If so, then I wonder why one should do > > So, how about `time.time()` + hex(hash([]))? > instead of using a single, simple counter for all sessions? > It looks to me like this will remain unique forever, because another thread > would have to create an object at the same memory address during the same > millisecond to collide. > > Furthermore, it looks to me like this hack might be portable to any OS > with a clock tick shorter than its timeslice. > > Comments? If I'm not overlooking something fundamental, the counter approach seems to be simpler and most portable. :-) but-sometimes-my-brain-malfunctions-badly-ly y'rs - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From nas at arctrix.com Thu Jan 18 16:07:13 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 07:07:13 -0800 Subject: [Python-Dev] Re: new Makefile.in In-Reply-To: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 08:37:04PM +0100 References: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de> Message-ID: <20010118070713.A13581@glacier.fnational.com> On Thu, Jan 18, 2001 at 08:37:04PM +0100, Martin v. Loewis wrote: > > A question: is it possible to break the Python static library up?
> > For example, instead of having libpython.a have > > Parser/parser.a, Objects/objects.a, etc? > > Please, no. Okay. > I'm not at all looking forward to answering all the questions > why the build infrastructure of Python changed yet again... My Makefile patch shouldn't change the way you build extensions. Neil From tim.one at home.com Fri Jan 19 02:45:42 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 20:45:42 -0500 Subject: [Python-Dev] unistr() vs. unicode() In-Reply-To: <200101181623.LAA07389@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > (And no, Tim, this did *not* end up in the patches list because I made > Barry remove the reply-to. SourceForge mails never had reply-to to > begin with.) Aha! Another thing to blame Barry for . From tim.one at home.com Thu Jan 18 23:11:23 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 17:11:23 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <008701c0811a$b3371c00$e46940d5@hagrid> Message-ID: [/F] > python build problems and real life got in the way. > > will 2.1a1 be released according to plan? will there > be a 2.1a2 release? maybe I should postpone this? Depends on how confident you are. Since this is purely an optimization, I don't think it *needs* to get into a1 in order to make the final release; postponing a few days would be better than pushing too hard on something that's proved hairier than anticipated. do-the-right-thing-whatever-that-is-ly y'rs - tim From guido at digicool.com Fri Jan 19 03:17:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 21:17:36 -0500 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Your message of "Thu, 18 Jan 2001 02:14:19 PST." References: Message-ID: <200101190217.VAA01497@cj20424-a.reston1.va.home.com> > I hope you don't mind that i'm taking this over to python-dev, > because it led me to discover a more general issue (see below). 
No -- in fact I wanted to see this here! (My mail backlog seems to be clearing -- or maybe it was only a temporary unclogging... :-) > For the others on python-dev, here's the background: MAL was > about to check in the unistr() function, described as follows: > > > This patch adds a utility function unistr() which works just like > > the standard builtin str() -- only that the return value will > > always be a Unicode object. > > > > The patch also adds a new object level C API PyObject_Unicode() > > which complements PyObject_Str(). > > I responded: > > Why are unistr() and unicode() two separate functions? > > > > str() performs one task: convert to string. It can convert anything, > > including strings or Unicode strings, numbers, instances, etc. > > > > The other type-named functions e.g. int(), long(), float(), list(), > > tuple() are similar in intent. > > > > Why have unicode() just for converting strings to Unicode strings, > > and unistr() for converting everything else to a Unicode string? > > What does unistr(x) do differently from unicode(x) if x is a string? > > MAL responded: > > unistr() is meant to complement str() very closely. unicode() > > works as constructor for Unicode objects which can also take > > care of decoding encoded data. str() and unistr() don't provide > > this capability but instead always assume the default encoding. > > > > There's also a subtle difference in that str() and unistr() > > try the tp_str slot which unicode() doesn't. unicode() > > supports any character buffer which str() and unistr() don't. > > Okay, given this explanation, i still feel fairly confident > that unicode() should subsume unistr(). Many of the other > type-named functions try various slots: > > int() looks for __int__ > float() looks for __float__ > long() looks for __long__ > str() looks for __str__ > > In testing this i also discovered the following: > > >>> class Foo: > ... def __int__(self): > ... return 3 > ... 
> >>> f = Foo() > >>> int(f) > 3 > >>> long(f) > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: Foo instance has no attribute '__long__' > >>> float(f) > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: Foo instance has no attribute '__float__' > > This is kind of surprising. How about: > > int() looks for __int__ > float() looks for __float__, then tries __int__ > long() looks for __long__, then tries __int__ > str() looks for __str__ > unicode() looks for __unicode__, then tries __str__ For the numeric types this could perhaps be done by calling PyNumber_Long() from PyNumber_Float(), calling PyNumber_Int() from PyNumber_Long(). Complex is a bit of an exception -- there's no PyNumber_Complex(), just because I felt that nobody would need it. :-) > The extra parameter to unicode() is very similar to the extra > parameter to int(), so i think there is a natural parallel here. Makes sense. > Hmm... what about the other types? > > Wow!! __complex__ can produce a segfault! > > >>> complex > > >>> class Foo: > ... def __complex__(self): return 3 > ... > >>> Foo() > <__main__.Foo instance at 0x81e8684> > >>> f = _ > >>> complex(f) > Segmentation fault (core dumped) > > This happens because builtin_complex first retrieves and saves > the PyNumberMethods of the argument (in this case, from the > instance), then tries to call __complex__ (in this case, returning 3), > and THEN coerces the result using nbr->nb_float if the result is > not complex! (This calls the instance's nb_float method on the > integer object 3!!) Thanks! Fixed now in CVS. > I think __complex__ should probably look for __complex__, then > __float__, then __int__. I make it call PyNumber_Float(), which could be made smarter as explained above. > One could argue for __list__, __tuple__, or __dict__, but that > seems much weaker; the Pythonic way has always been to implement > __getitem__ instead. Yes -- since __list__ etc. 
aren't used, let's not add them. > There is no built-in dict(); if it existed > i suppose it would do the opposite of x.items(); again a weak > argument, though i might have found such a function useful once > or twice. Yeah, it's not very common. Dict comprehensions anyone? d = {k:v for k,v in zip(range(10), range(10))} # :-) > And that about covers the built-in types for data. Thanks! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Thu Jan 18 23:13:14 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 17:13:14 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <20010118022321.A9021@thyrsus.com> Message-ID: BTW, why doesn't hash([]) blow up in 2.1a1? In 2.0 it raised TypeError: unhashable type Did someone change this deliberately? From tim.one at home.com Thu Jan 18 23:58:22 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 17:58:22 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Message-ID: [Tim whined] > BTW, why doesn't hash([]) blow up in 2.1a1? In 2.0 it raised > > TypeError: unhashable type > > Did someone change this deliberately? Answer: it's an unintended consequence of the rich-comparison changes. Guido knows how to fix it and probably will. The list type grew a tp_richcompare slot but lost its non-NULL tp_compare pointer. PyObject_Hash wasn't changed accordingly (it now believes lists support neither direct hashing nor comparison, so does them a favor and hashes their memory addresses). Something trickier is probably going wrong elsewhere too, but I won't try to remember what that is unless Guido gets hit by a bus tonight. 
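Once lists are unhashable again, the hash([]) half of the cookie recipe from the start of this thread stops working altogether, which leaves the lock-protected counter suggested in the earlier replies. What follows is a minimal sketch for a current Python 3, not the tempfile.py code mentioned earlier; next_cookie, is_hashable and the cookie layout are assumptions made for illustration.

```python
import itertools
import os
import threading
import time

_lock = threading.Lock()
_counter = itertools.count(1)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def next_cookie():
    """Return a cookie string that is unique within this process.

    Hypothetical helper: the counter alone guarantees uniqueness within
    one run; the timestamp and pid only make clashes with other processes
    or earlier runs less likely.
    """
    # The lock makes fetching the next counter value atomic across threads.
    with _lock:
        n = next(_counter)
    return "%d.%d.%d" % (int(time.time()), os.getpid(), n)
```

Here is_hashable([]) is expected to report that a list cannot be hashed, and repeated calls to next_cookie() never repeat a value within one process. Uniqueness across concurrent processes is only as good as pid plus timestamp, so the tempfile route mentioned earlier in the thread remains the safer choice when that matters.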
in-which-case-we-can-push-off-the-funeral-until-after-the-release-ly y'rs - tim From thomas at xs4all.net Fri Jan 19 00:02:09 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 00:02:09 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010118103036.B21503@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 18, 2001 at 10:30:36AM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> Message-ID: <20010119000209.F17392@xs4all.nl> On Thu, Jan 18, 2001 at 10:30:36AM -0500, Andrew Kuchling wrote: > >On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: > >> On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by > >> setup.py. Also, SSL support for the socket module was not enabled, though > >> OpenSSL is installed, in the default path. > What does the layout of /usr/contrib look like? Is it > /usr/contrib/openssl/include/, /usr/contrib/include/, or something > else? Actually, it's /usr/local, not /usr/contrib. I've never installed OpenSSL in /usr/contrib, though I could, and maybe BSDI will, in the future. (BSDI installs its own software in /usr, and optional free, pre-compiled software in /usr/contrib.) OpenSSL installs into /usr/local/ssl/include/openssl by default, and installing into /usr/contrib would make it /usr/contrib/ssl/include/openssl. > >Strangely enough, this problem does not exist on FreeBSD. I can run 'make' > >or 'make test' after 'make' just fine. 'make test' still doesn't work > >because of the incorrect library path, but it doesn't barf like the other > >systems (BSDI and Debian Linux) > Have you already run "make install"? Perhaps it's picking up the > already-installed modules when running "make test", because it really > shouldn't be working. Hm, I think you misread my statement. 'make test' *doesn't* work. But it doesn't barf on the signal module being built dynamically either. 
You fixed that for every platform now, I was just pointing out that this was not a problem for FreeBSD for some reason. 'make test' still doesn't work, but I can make it work by specifying a hand-tweaked PYTHONPATH that includes the OS/arch-dependant build directory. This brings me to another point: how can 'make test' work at all ? Does python always check for './Lib' (and './Modules') for modules ? If that's specific for 'make test' and running python in the source distribution, that sounds like a bit of a weird hack. I can't find any such hackery in the source, but I also can't figure out how else it's working :) More-later--Meteor-((c)-1979)-is-on-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at mira.cs.tu-berlin.de Fri Jan 19 00:14:05 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 19 Jan 2001 00:14:05 +0100 Subject: [Python-Dev] weak references in 2.1alpha Message-ID: <200101182314.f0INE5B00338@mira.informatik.hu-berlin.de> > Does anyone have any objections to it going into the alpha? I'd like to request that the .clear() method is removed from the patch for this alpha, and also that the weak dictionaries are removed until their semantics is clarified. It's always easier to add stuff later than to remove it. Regards, Martin From nas at arctrix.com Thu Jan 18 17:31:09 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 08:31:09 -0800 Subject: [Python-Dev] SSL detection problem In-Reply-To: <20010118115028.D21503@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 18, 2001 at 11:50:28AM -0500 References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> <20010118115028.D21503@kronos.cnri.reston.va.us> Message-ID: <20010118083109.A13972@glacier.fnational.com> On Thu, Jan 18, 2001 at 11:50:28AM -0500, Andrew Kuchling wrote: > On Thu, Jan 18, 2001 at 11:39:30AM +0100, Martin v. 
Loewis wrote: > >The not-so-obvious question: How can one work-around such a problem > >with the new setup scheme? > > I still need to implement command-line options to specify such > overrides, but that couldn't possibly get done in time for alpha1. My non-recursive makefile patch allows you to use both Setup and setup.py. It's not quite ready for prime time but it's getting close. I would be interested if someone could point me to the source for some crappy makes. I've tried GNU make, BSD 4.4 pmake and whatever comes with SunOS 5.6. Searching for "make" doesn't work too well. :-( Neil From thomas at xs4all.net Fri Jan 19 00:45:32 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 00:45:32 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Thu, Jan 18, 2001 at 08:46:54AM -0800 References: Message-ID: <20010119004532.G17392@xs4all.nl> On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote: > filename = '/tmp/delete_me' This reminds me: we need a portable way to handle test-files :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri Jan 19 00:56:04 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 18:56:04 -0500 Subject: [Python-Dev] new Makefile.in In-Reply-To: Your message of "Wed, 17 Jan 2001 23:59:22 PST." <20010117235922.A12356@glacier.fnational.com> References: <20010117235922.A12356@glacier.fnational.com> Message-ID: <200101182356.SAA19616@cj20424-a.reston1.va.home.com> Hi Neil, My mail suffers delays of 12-24 hours while mail.python.org is working on some enormous backlog. So I just saw your message about a new Makefile... > Spurred on by comments made by Andrew, I spent some time last > night overhauling the Python Makefiles. I now have a toplevel > non-recursive Makefile.in that seems to work fairly well.
I'm > pretty sure it still should be portable. It doesn't use includes > or any special GNU make features. It is half the size of the old > Makefiles. The build is faster and it's now easier to follow if > something goes wrong. I'd like to see this! > A question: is it possible to break the Python static library up? > For example, instead of having libpython.a have > Parser/parser.a, Objects/objects.a, etc? There > would still only be one shared library. This would speed up > incremental builds and also help Andrew with PEP 229. I'm > thinking that the Makefile do something like this: > > all: python$(EXE) > > PYLIBS= Parser/parser.a Objects/objects.a ... Modules/modules.a > > python$(EXE): $(PYLIBS) > $(LINKCC) -o python$(EXE) $(PYLIBS) ... > > Modules/modules.a: minpython$(EXE) > ./minpython$(EXE) setup.py Sounds cool to me. (Where's the patch for a shared libpython???) > AFAICT, the only thing affected by splitting up the static library > is Misc/Makefile.pre.in. Is this correct? Yeah, and that should be phased out in favor of distutils anyway. Now would be a great time! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 01:34:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:34:02 -0500 Subject: [Python-Dev] Mail delays and SourceForge bugs Message-ID: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Through no fault of my own, email to guido at python.org (which includes the python-dev list) is currently suffering delays of 12-24 hours. I have a feeling this is probably true for all mail going through python.org, so checkin messages and python-dev discussion have been greatly frustrated, with about 1 day to go until the planned 2.1a1 release date! On top of that, the SourceForge bug manager has developed a problem: all references to http://sourceforge.net/bugs/?group_id=5470/ come back with this error: An error occured in the logger.
ERROR: pg_atoi: error in "5470/": can't parse "/" I'm still hoping to release Python 2.1a1 tomorrow, unless Jeremy tells me that he needs more time for his nested scopes patch. In the mean time, please everybody, do check out the latest CVS version and give it a good workout! Andrew's setup.py still has some rough edges, I believe that in order to run it from the build directory you still have to point PYTHONPATH to the build/lib* directory, where he hides the shared libraries for all modules. Andrew, are you planning to fix this? If there's anything that you need me to know about, please mail to guido at digicool.com -- that address suffers no delays. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jan 19 01:51:19 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 19:51:19 -0500 Subject: [Python-Dev] RE: [Pycabal] Mail delays and SourceForge bugs In-Reply-To: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Message-ID: [Guido. notes current woes w/ python.org email, and SourceForge] Note too that, over the past two days, it's not possible to follow Python-Dev email via http://mail.python.org/pipermail/python-dev/2001-January/date.html either, as (unlike during previous occurrences of python.org email delays) msgs aren't showing up there in a timely fashion either (for example, the msg of Guido's to which I'm replying isn't there). good-thing-guido's-so-easy-to-channel-ly y'rs - tim From guido at digicool.com Fri Jan 19 01:52:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:52:02 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Your message of "Thu, 18 Jan 2001 02:23:21 EST." <20010118022321.A9021@thyrsus.com> References: <20010118022321.A9021@thyrsus.com> Message-ID: <200101190052.TAA26849@cj20424-a.reston1.va.home.com> > So I'm writing a module to that needs to generate unique cookies. 
The > module will run inside one of two environments: (1) a trivial test wrapper, > not threaded, and (2) a long-running multithreaded server. > > Because Python garbage-collects, hash() of a just-created object isn't > good enough. Because we may be threading, millisecond time isn't > good enough. Because we may *not* be threading, thread ID isn't good > either. > > On the other hand, I'm on Linux getting millisecond time resolution. > And it's not hard to notice that an object hash is a memory address. > > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because another thread > would have to create an object at the same memory address during the same > millisecond to collide. > > Furthermore, it looks to me like this hack might be portable to any OS > with a clock tick shorter than its timeslice. Argh! hash([]) should raise TypeError, since lists are not hashable objects -- mutable objects can't be allowed as dictionary keys. (hash([]) accidentally returned a value for a brief period after I checked in the rich comparisons -- I've fixed that now.) But not to worry: instead of using hash([]), you can use hex(id([])). Same thing. On the other hand, remember how much you can do in a millisecond! (E.g. I can call tempfile.mktemp() 5 times in that time.) And when you create an object and immediately delete it, the next object created is very likely to have the same address.
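A minimal sketch of the two points above, assuming a modern Python 3 interpreter rather than the 2.1 alpha under discussion: hash([]) raises TypeError, and id() of a short-lived object is only unique while that object is alive, so it makes a poor cookie. The make_cookie helper is hypothetical and is not something proposed in this thread; it swaps the time-plus-address trick for a lock-protected counter.

```python
import itertools
import threading

# Lists are mutable, so hash() refuses them.
try:
    hash([])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# id() is only guaranteed unique among objects alive at the same time.
# Each temporary list below is freed before the next one is created, so
# the two ids may well be equal (this is implementation dependent).
ids_may_collide = id([]) == id([])

# Hypothetical alternative: a process-wide counter guarded by a lock.
_counter = itertools.count(1)
_counter_lock = threading.Lock()


def make_cookie(prefix="cookie"):
    """Return a string that is unique within this process."""
    with _counter_lock:
        serial = next(_counter)
    return "%s-%d" % (prefix, serial)
```

Cookies built this way are unique only within one running process; a server that restarts would presumably need to persist the counter or mix in a start-up timestamp.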
But what's wrong with this:

    try:
        from thread import get_ident as unique_id
    except ImportError:
        def unique_id(): return id([])

--Guido van Rossum (home page: http://www.python.org/~guido/) From billtut at microsoft.com Fri Jan 19 01:53:15 2001 From: billtut at microsoft.com (Bill Tutt) Date: Thu, 18 Jan 2001 16:53:15 -0800 Subject: [Python-Dev] MS CRT crashing: Message-ID: <58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> From guido at digicool.com Fri Jan 19 01:53:13 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:53:13 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: Your message of "Thu, 18 Jan 2001 07:48:41 +0100." <008701c0811a$b3371c00$e46940d5@hagrid> References: <012901c080a5$306023a0$e46940d5@hagrid> <008701c0811a$b3371c00$e46940d5@hagrid> Message-ID: <200101190053.TAA26862@cj20424-a.reston1.va.home.com> > I wrote: > > I've almost sorted it all out. will check it in later tonight (local > > time). > > python build problems and real life got in the way. What? You've got a real life? Can't be allowed, not when we're working on a release! > will 2.1a1 be released according to plan? will there > be a 2.1a2 release? maybe I should postpone this? Please check it in, there's still time (2.1a1 won't go out before Friday night, possibly it'll be delayed until Monday). And yes, there will be a 2.1a2. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 01:55:15 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:55:15 -0500 Subject: [Python-Dev] SSL detection problem In-Reply-To: Your message of "Thu, 18 Jan 2001 11:39:30 +0100."
<200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> Message-ID: <200101190055.TAA26905@cj20424-a.reston1.va.home.com> > The distutils-based configuration fails to build on my system (SuSE > 7.0) with the error > > /usr/src/python/Modules/socketmodule.c:159: rsa.h: Datei oder Verzeichnis nicht > gefunden > /usr/src/python/Modules/socketmodule.c:160: crypto.h: Datei oder Verzeichnis nicht gefunden > /usr/src/python/Modules/socketmodule.c:161: x509.h: Datei oder Verzeichnis nicht gefunden > /usr/src/python/Modules/socketmodule.c:162: pem.h: Datei oder Verzeichnis nicht > gefunden > /usr/src/python/Modules/socketmodule.c:163: ssl.h: Datei oder Verzeichnis nicht > gefunden The same happened to Fred on Mandrake 7.0 (except for the German messages :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 01:58:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:58:16 -0500 Subject: [Python-Dev] Re: unistr() vs. unicode() Message-ID: <200101190058.TAA26931@cj20424-a.reston1.va.home.com> MAL's reply to Ping in this thread. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Thu, 18 Jan 2001 10:52:12 +0100 From: "M.-A. Lemburg" To: Ka-Ping Yee cc: guido at python.org, patches at python.org Subject: Re: [Patches] [Patch #101664] Add new unistr() builtin + PyObject_Unic ode()C API Ka-Ping Yee wrote: > > On Wed, 17 Jan 2001 noreply at sourceforge.net wrote: > > Comment: > > This patch adds a utility function unistr() which works just like > > the standard builtin str() -- only that the return value will > > always be a Unicode object. > > Sorry for barging in, but i have an issue/question: > > Why are unistr() and unicode() two separate functions? > > str() performs one task: convert to string. It can convert anything, > including strings or Unicode strings, numbers, instances, etc. 
> > The other type-named functions e.g. int(), long(), float(), list(), > tuple() are similar in intent. > > Why have unicode() just for converting strings to Unicode strings, > and unistr() for converting everything else to a Unicode string? > What does unistr(x) do differently from unicode(x) if x is a string? unistr() is meant to complement str() very closely. unicode() works as constructor for Unicode objects which can also take care of decoding encoded data. str() and unistr() don't provide this capability but instead always assume the default encoding. There's also a subtle difference in that str() and unistr() try the tp_str slot which unicode() doesn't. unicode() supports any character buffer which str() and unistr() don't. Perhaps you are right though in that we should make all three APIs behave in the same way with respect to coercing their arguments. This could hide some errors... still in the long run, I agree that the existing setup probably causes more confusion than good. Guido ? - -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ _______________________________________________ Patches mailing list Patches at python.org http://mail.python.org/mailman/listinfo/patches ------- End of Forwarded Message From guido at digicool.com Fri Jan 19 02:04:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 20:04:22 -0500 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Your message of "Thu, 18 Jan 2001 11:51:46 +0100." <3A66CAC2.74FC894@lemburg.com> References: <3A66CAC2.74FC894@lemburg.com> Message-ID: <200101190104.UAA27056@cj20424-a.reston1.va.home.com> > Ka-Ping Yee wrote: > > > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > > str() looks for __str__ > > > > Oops. 
I forgot that > > > > str() looks for __str__, then tries __repr__ > > > > So, presumably, > > > > unicode() should look for __unicode__, then __str__, then __repr__ > > Not quite... str() does this: > > 1. strings are passed back as-is > 2. the type slot tp_str is tried > 3. the method __str__ is tried > 4. Unicode returns are converted to strings > 5. anything other than a string return value is rejected > > unistr() does the same, but makes sure that the return > value is an Unicode object. > > unicode() does the following: > > 1. for instances, __str__ is called > 2. Unicode objects are returned as-is > 3. string objects or character buffers are used as basis for decoding > 4. decoding is applied to the character buffer and the results > are returned > > I think we should perhaps merge the two approaches into one > which then applies all of the above in unicode() (and then > forget about unistr()). This might lose hide some type errors, > but since all other generic constructors behave more or less > in the same way, I think unicode() should too. Yes, I would like to see these merged. I noticed that e.g. there is special code to compare Unicode strings in the comparison code (I think I *could* get rid of this now we have rich comparisons, but I decided to put that off), and when I looked at it it uses the same set of conversions as unicode(). Some of these seem questionable to me -- why do you try so many ways to get a string out of an object? (On the other hand the merge of unicode() and unistr() might have this effect anyway...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 02:06:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 20:06:23 -0500 Subject: [Python-Dev] bug in grammar In-Reply-To: Your message of "Thu, 18 Jan 2001 13:39:54 +0100." 
<200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> References: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> Message-ID: <200101190106.UAA27073@cj20424-a.reston1.va.home.com> > I think there should be a well-formedness pass in-between. I.e. after > the AST has been built, a single pass should descend through the tree, > looking for an expr_statement with more than a single testlist. Once > it finds one, it should confirm that this really is a well-formed > lvalue (in C speak). In this case, the test should be that each term > is an atom without factors. Good idea. > If the parser itself performs such checks, the compiler could be > simplified in many places, I guess. Not sure that in practice it makes much of a difference: there aren't that many of these kinds of checks, and writing a separate pass is expensive. On the other hand, Jeremy is just writing a separate pass anyway, to collect name usage information for the nested scopes. Maybe it could be folded into that pass... --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Fri Jan 19 04:20:08 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jan 2001 22:20:08 -0500 (EST) Subject: [Python-Dev] deprecated regex used by un-deprecated modules Message-ID: <14951.45672.806978.600944@localhost.localdomain> There are several modules in the standard library that use the regex module. When they are imported, they print a warning about using a deprecated module. I think this is bad form. Either the modules that depend on regex should be updated to use re or they should be deprecated themselves. I discovered the following offenders: asynchat, knee, poplib, reconvert. I would suggest fixing asynchat and poplib and deprecating knee. The reconvert module may be a special case.
Jeremy From jeremy at alum.mit.edu Fri Jan 19 04:31:02 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jan 2001 22:31:02 -0500 (EST) Subject: [Python-Dev] setup.py and build subdirectories Message-ID: <14951.46326.743921.988828@localhost.localdomain> I have a bunch of build directories under the source tree, e.g. src/python/dist/src/build src/python/dist/src/build-pg src/python/dist/src/build-O3 ... The new setup.py did not successfully build in these directories. I hacked distutils a tiny bit and had some success. Patch below. I'm not sure if the approach is kosher, but it allows me to build successfully. I also have a problem running 'make test' from these build directories. The reference to the distutils build directory has '..' prepended to it that shouldn't exist. Jeremy Index: setup.py =================================================================== RCS file: /cvsroot/python/python/dist/src/setup.py,v retrieving revision 1.8 diff -c -r1.8 setup.py *** setup.py 2001/01/18 20:39:34 1.8 --- setup.py 2001/01/19 03:26:55 *************** *** 536,540 **** # --install-platlib if __name__ == '__main__': ! sysconfig.set_python_build() main() --- 536,541 ---- # --install-platlib if __name__ == '__main__': ! path, file = os.path.split(sys.argv[0]) ! sysconfig.set_python_build(path) main() Index: Lib/distutils/sysconfig.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/distutils/sysconfig.py,v retrieving revision 1.31 diff -c -r1.31 sysconfig.py *** Lib/distutils/sysconfig.py 2001/01/17 15:16:52 1.31 --- Lib/distutils/sysconfig.py 2001/01/19 03:27:01 *************** *** 24,37 **** python_build = 0 ! def set_python_build(): """Set the python_build flag to true; this means that we're building Python itself. Only called from the setup.py script shipped with Python. """ global python_build ! 
python_build = 1 def get_python_inc(plat_specific=0, prefix=None): """Return the directory containing installed Python header files. --- 24,37 ---- python_build = 0 ! def set_python_build(loc): """Set the python_build flag to true; this means that we're building Python itself. Only called from the setup.py script shipped with Python. """ global python_build ! python_build = loc + "/" def get_python_inc(plat_specific=0, prefix=None): """Return the directory containing installed Python header files. *************** *** 48,54 **** prefix = (plat_specific and EXEC_PREFIX or PREFIX) if os.name == "posix": if python_build: ! return "Include/" return os.path.join(prefix, "include", "python" + sys.version[:3]) elif os.name == "nt": return os.path.join(prefix, "Include") # include or Include? --- 48,54 ---- prefix = (plat_specific and EXEC_PREFIX or PREFIX) if os.name == "posix": if python_build: ! return python_build + "Include/" return os.path.join(prefix, "include", "python" + sys.version[:3]) elif os.name == "nt": return os.path.join(prefix, "Include") # include or Include? From tim.one at home.com Fri Jan 19 04:46:16 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 22:46:16 -0500 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() Message-ID: [attribution lost] > There is no built-in dict(); if it existed i suppose it would do > the opposite of x.items(); again a weak argument, though i might > have found such a function useful once or twice. [Guido] > Yeah, it's not very common. Dict comprehensions anyone? > > d = {k:v for k,v in zip(range(10), range(10))} # :-) It's very common in Perl code, but is in no sense the inverse of .items() there: when you build a dict from a list L in Perl, it acts like Python {L[0]: L[1], L[2]: L[3], L[4]: L[5], ... 
} That's what seems most practical most often; e.g., when crunching over text files with records of the form key value (e.g., mail headers are of this form; simple contact databases; to-do lists segregated by date; etc), whatever fancy re.split() is used to break things apart naturally returns a flat list. A list of two-tuples is natural only if it was obtained from another dict's .items() <0.9 wink>. pushing-the-limits-of-"practicality-beats-purity"?-ly y'rs - tim From tim.one at home.com Fri Jan 19 07:00:27 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 01:00:27 -0500 Subject: [Python-Dev] test_urllib failing on Windows Message-ID: test test_urllib crashed -- exceptions.AssertionError: urllib.quote problem From tim.one at home.com Fri Jan 19 07:39:30 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 01:39:30 -0500 Subject: [Python-Dev] (no subject) Message-ID: [some MS internal support group] > Turns out the C standard explicitly says you can't have an input > follow output on a stream without doing fflush or fseek in-between, > to make sure the stdio buffer is cleared. So this program is illegal. It's undefined (there are no "illegal" programs -- that word doesn't appear in the std; "undefined" does and has a precise technical meaning). In the presence of threads -- which the C std doesn't mention -- you have to address issues the std doesn't touch. To date, MS's is the only C runtime we've seen that corrupts itself in this situation. It can do anything it likes short of blowing up and still be considered a good threaded implementation. As is, it has to be considered sub-standard, in the ordinary sense of displaying worse behavior than other threaded C stdio implementations. It falls short there on other counts too (like the lack of getc_unlocked() & friends), but internal corruption is a particularly egregious failing.
and-that's-the-end-of-it-for-me-ly y'rs - tim From mwh21 at cam.ac.uk Fri Jan 19 09:31:18 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 19 Jan 2001 08:31:18 +0000 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Thomas Wouters's message of "Fri, 19 Jan 2001 00:02:09 +0100" References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: Thomas Wouters writes: > This brings me to another point: how can 'make test' work at all ? Does > python always check for './Lib' (and './Modules') for modules ? If that's > specific for 'make test' and running python in the source distribution, that > sounds like a bit of a weird hack. I can't find any such hackery in the > source, but I also can't figure out how else it's working :) It's in Modules/getpath.c Cheers, M. -- I really hope there's a catastrophic bug insome future e-mail program where if you try and send an attachment it cancels your ISP account, deletes your harddrive, and pisses in your coffee -- Adam Rixey From gstein at lyra.org Fri Jan 19 09:38:54 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 19 Jan 2001 00:38:54 -0800 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) In-Reply-To: ; from gvanrossum@users.sourceforge.net on Thu, Jan 18, 2001 at 04:28:10PM -0800 References: Message-ID: <20010119003854.F7731@lyra.org> On Thu, Jan 18, 2001 at 04:28:10PM -0800, Guido van Rossum wrote: >... > PyTypeObject PyCursesWindow_Type = { > ! PyObject_HEAD_INIT(NULL) > 0, /*ob_size*/ > "curses window", /*tp_name*/ >... > --- 2432,2443 ---- > /* Initialization function for the module */ > > ! 
DL_EXPORT(void) > init_curses(void) > { > PyObject *m, *d, *v, *c_api_object; > static void *PyCurses_API[PyCurses_API_pointers]; > + > + /* Initialize object type */ > + PyCursesWindow_Type.ob_type = &PyType_Type; > > /* Initialize the C API pointer array */ I've never truly understood this. Is it because Windows cannot initialize (at load-time) a pointer to a data structure that is located in a different DLL? It is a bit painful to keep moving inits from load-time to run-time. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Fri Jan 19 10:01:22 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 04:01:22 -0500 Subject: [Python-Dev] test_urllib failing on Windows Message-ID: Bet it was failing everywhere; it's fixed now. From moshez at zadka.site.co.il Fri Jan 19 18:53:36 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 19 Jan 2001 19:53:36 +0200 (IST) Subject: [Python-Dev] Dbm failure Message-ID: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il> test test_dbm skipped -- /home/moshez/prog/src/python/python/dist/src/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey Did it happen to anyone else? Anything else you need to know? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From mal at lemburg.com Fri Jan 19 10:58:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 10:58:08 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> Message-ID: <3A680FB0.AED2DB55@lemburg.com> Guido van Rossum wrote: > > > Ka-Ping Yee wrote: > > > > > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > > > str() looks for __str__ > > > > > > Oops. 
I forgot that > > > > > > str() looks for __str__, then tries __repr__ > > > > > > So, presumably, > > > > > > unicode() should look for __unicode__, then __str__, then __repr__ > > > > Not quite... str() does this: > > > > 1. strings are passed back as-is > > 2. the type slot tp_str is tried > > 3. the method __str__ is tried > > 4. Unicode returns are converted to strings > > 5. anything other than a string return value is rejected > > > > unistr() does the same, but makes sure that the return > > value is an Unicode object. > > > > unicode() does the following: > > > > 1. for instances, __str__ is called > > 2. Unicode objects are returned as-is > > 3. string objects or character buffers are used as basis for decoding > > 4. decoding is applied to the character buffer and the results > > are returned > > > > I think we should perhaps merge the two approaches into one > > which then applies all of the above in unicode() (and then > > forget about unistr()). This might lose hide some type errors, > > but since all other generic constructors behave more or less > > in the same way, I think unicode() should too. > > Yes, I would like to see these merged. I noticed that e.g. there is > special code to compare Unicode strings in the comparison code (I > think I *could* get rid of this now we have rich comparisons, but I > decided to put that off), and when I looked at it it uses the same set > of conversions as unicode(). Some of these seem questionable to me -- > why do you try so many ways to get a string out of an object? (On the > other hand the merge of unicode() and unistr() might have this effect > anyway...) ... because there are so many ways to get at string representations of objects in Python at C level. If we agree to merge the semantics of the two APIs, then str() would have to change too: is this desirable ? 
(IMHO, yes) Here's what we could do:

a) merge the semantics of unistr() into unicode()
b) apply the same semantics in str()
c) remove unistr() -- how's that for a short-living builtin ;)

About the semantics: These should be backward compatible to str() in that everything that worked before should continue to work after the merge. A strawman for processing str() and unicode():

1. strings/Unicode is passed back as-is
2. tp_str is tried
3. the method __str__ is tried
4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer)
5. for str(): Unicode return values are converted to strings using the default encoding
   for unicode(): Unicode return values are passed back as-is; string return values are decoded according to the encoding parameter
6. the return object is type-checked: str() will always return a string object, unicode() always a Unicode object

Note that passing back Unicode is only allowed in case no encoding was given. Otherwise an exception is raised: you can't decode Unicode. As extension we could add encoding and error parameters to str() as well. The result would be either an encoding of Unicode objects passed back by tp_str or __str__ or a recoding of string objects returned by checks 2, 3 or 4. If we agree to take this approach, then we should remove the unistr() Python API before the alpha ships. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at effbot.org Fri Jan 19 11:19:06 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 11:19:06 +0100 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) References: <20010119003854.F7731@lyra.org> Message-ID: <010c01c08201$4b0ec050$e46940d5@hagrid> greg wrote: > I've never truly understood this.
Is it because Windows cannot initialize > (at load-time) a pointer to a data structure that is located in a different > DLL? Windows can do it (via DLL initialization code), but the compiler doesn't generate initialization code for C programs. you can compile the module as C++, but that's also a bit painful... From jack at oratrix.nl Fri Jan 19 12:02:00 2001 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 19 Jan 2001 12:02:00 +0100 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments Message-ID: <20010119110200.9E455373C95@snelboot.oratrix.nl> I get the impression that I'm currently seeing a non-NULL third argument in my (C) methods even though the method is called without keyword arguments. Is this new semantics that I missed the discussion about, or is this a bug? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From thomas at xs4all.net Fri Jan 19 13:22:06 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:22:06 +0100 Subject: [Python-Dev] deprecated regex used by un-deprecated modules In-Reply-To: <14951.45672.806978.600944@localhost.localdomain>; from jeremy@alum.mit.edu on Thu, Jan 18, 2001 at 10:20:08PM -0500 References: <14951.45672.806978.600944@localhost.localdomain> Message-ID: <20010119132206.H17392@xs4all.nl> On Thu, Jan 18, 2001 at 10:20:08PM -0500, Jeremy Hylton wrote: > I would suggest fixing asynchat and poplib and deprecating knee. The > reconvert module may be a special case. Can't reconvert just disable the warning before importing regex ? That would seem the sane thing to do, at least to me. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From thomas at xs4all.net Fri Jan 19 13:26:31 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:26:31 +0100 Subject: [Python-Dev] Mail delays and SourceForge bugs In-Reply-To: <200101190034.TAA26664@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jan 18, 2001 at 07:34:02PM -0500 References: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Message-ID: <20010119132631.I17392@xs4all.nl> On Thu, Jan 18, 2001 at 07:34:02PM -0500, Guido van Rossum wrote: > Through no fault of my own, email to guido at python.org (which includes > the python-dev list) is currently suffering delays of 12-24 hours. I > have a feeling this is probably true for all mail going through > python.org, so checkin messages ans python-dev discussion have been > greatly frustrated, with about 1 day to go until the planned 2.1a1 > release date! I doubt it's (just) you, Guido. I'm seeing similar delays, and I already talked with Barry about it, too. It looks like it's clearing up a bit, now, but it's confusing as hell, for sure ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Fri Jan 19 13:33:47 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:33:47 +0100 Subject: [Python-Dev] Dbm failure In-Reply-To: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Fri, Jan 19, 2001 at 07:53:36PM +0200 References: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il> Message-ID: <20010119133347.J17392@xs4all.nl> On Fri, Jan 19, 2001 at 07:53:36PM +0200, Moshe Zadka wrote: > test test_dbm skipped -- /home/moshez/prog/src/python/python/dist/src/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey > Did it happen to anyone else? Yes, to me. You're suffering from the same thing I did: GNU sucks. Okay, okay, not as much as MS products or most other UNIX software, but still ;) The problem is a conflict between gdbm and glibc. 
gdbm (1.7.3, which is what woody currently carries, not sure why it isn't updated) offers a dbm interface/replacement, which includes a libdbm.(so|a) and /usr/include/gdbm-ndbm.h. Glibc (or at least the debian package) *also* offers a dbm interface/replacement, which consists of libdb1.(so|a) and /usr/include/db1/ndbm.h (which needs /usr/include/db1/*.h). If you add /usr/include/db1 to your include path, and -ldbm to the dbmmodule, you end up with the wrong versions. You need either to include /usr/include/db1 in your includepath and use -ldb1, or fix up dbmmodule.c so it includes gdbm-ndbm.h and uses -ldbm. I only figured this out yesterday, and sent Andrew a mail about that... I'm not sure what the Right(tm) way to fix this is :( I've always loathed these library/version mismatches :P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Fri Jan 19 14:07:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 14:07:00 +0100 Subject: [Python-Dev] Standard install locations for Python ? References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> <20010118135640.G21503@kronos.cnri.reston.va.us> Message-ID: <3A683BF4.BD74A979@lemburg.com> Andrew Kuchling wrote: > > On Thu, Jan 18, 2001 at 01:20:06PM +0100, Martin v. Loewis wrote: > >On Unix, there appears to be no standard location, unless the > >documentation consists of man pages or perhaps info files. So > >/share/doc is probably a place as good as any other. > > This seems like a good suggestion. Should docs go in > /share/doc/python/, then? Perhaps with > subdirectories for different extensions? Hmm, I guess it's better to follow bdist_rpm here: put the docs into a subdir under .../doc/ using the package name and version. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at alum.mit.edu Fri Jan 19 15:39:13 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 19 Jan 2001 09:39:13 -0500 (EST) Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: <20010119110200.9E455373C95@snelboot.oratrix.nl> References: <20010119110200.9E455373C95@snelboot.oratrix.nl> Message-ID: <14952.20881.848489.869512@localhost.localdomain> >>>>> "JJ" == Jack Jansen writes: JJ> I get the impression that I'm currently seeing a non-NULL third JJ> argument in my (C) methods even though the method is called JJ> without keyword arguments. JJ> Is this new semantics that I missed the discussion about, or is JJ> this a bug? This is a bug in the changes I made to the call function implementation. I wasn't sure what was supposed to happen to a function that expected a kw argument but was called without one. I thought I saw some crashes when I passed NULL, so I changed the implementation to pass an empty dictionary. (Is the correct behavior documented anywhere?) If a NULL value is correct, I'll update the implementation and see if I can rediscover those crashes. 
Jeremy From nas at arctrix.com Fri Jan 19 08:39:50 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 23:39:50 -0800 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010119000209.F17392@xs4all.nl>; from thomas@xs4all.net on Fri, Jan 19, 2001 at 12:02:09AM +0100 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: <20010118233950.A15636@glacier.fnational.com> On Fri, Jan 19, 2001 at 12:02:09AM +0100, Thomas Wouters wrote: > I can't find any such hackery in the source, but I also can't > figure out how else it's working :) I thank you want to look at getpath.c. Neil From jeremy at alum.mit.edu Fri Jan 19 15:44:50 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 19 Jan 2001 09:44:50 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.107,2.108 In-Reply-To: References: Message-ID: <14952.21218.416551.695660@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: GvR> Log Message: Changes to recursive-object comparisons, having to GvR> do with a test case I found where rich comparison of unequal GvR> recursive objects gave unintuituve results. In a discussion GvR> with Tim, where we discovered that our intuition on when a<=b GvR> should be true was failing, we decided to outlaw ordering GvR> comparisons on recursive objects. (Once we have fixed our GvR> intuition and designed a matching algorithm that's practical GvR> and reasonable to implement, we can allow such orderings GvR> again.) Sounds sensible to me! I was quite puzzled about what <= should return for recursive objects. 
GvR> - Changed the nesting limit to a more reasonable small 20; this GvR> only slows down comparisons of very deeply nested objects GvR> (unlikely to occur in practice), while speeding up GvR> comparisons of recursive objects (previously, this would GvR> first waste time and space on 500 nested comparisons before GvR> it would start detecting recursion). After we talked through this code yesterday, I was also thinking that the limit was too high :-). Jeremy From guido at digicool.com Fri Jan 19 16:49:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 10:49:54 -0500 Subject: [Python-Dev] new Makefile.in In-Reply-To: Your message of "Thu, 18 Jan 2001 18:56:04 EST." <200101182356.SAA19616@cj20424-a.reston1.va.home.com> References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> Message-ID: <200101191549.KAA28699@cj20424-a.reston1.va.home.com> [Neil] > > A question: is it possible to break the Python static library up? [me] > Sounds cool to me. Of course after Martin's response I agree with him -- let's keep it one library. (Although I expect that the combined effect of setup.py and Neil's flat Makefile will still affect the infrastructure to build extensions... :-( ) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 16:56:58 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 10:56:58 -0500 Subject: [Python-Dev] MS CRT crashing: In-Reply-To: Your message of "Thu, 18 Jan 2001 16:53:15 PST." 
<58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> References: <58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> Message-ID: <200101191556.KAA28761@cj20424-a.reston1.va.home.com> Bill Tutt writes: > From the internal support squad: > Turns out the C standard explicitly says you can't have an input follow > output on a stream without doing fflush or fseek in-between, to make sure > the stdio buffer is cleared. So this program is illegal. > > They've gone and resolved it by design. I'd just like to note for the record that this is exactly what I had predicted. I'd also like to note that I *agree*. Tim seems to think there's a race condition in the threading code, but it's really much simpler than that: the same bug can easily be provoked with a single-threaded program: just randomly read and write alternatingly. So obviously the people who wrote the threading code aren't interested in the bug, because it's not in their code -- and the people who wrote the code that doesn't behave well when abused are protected by the C standard... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 17:00:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:00:30 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Thu, 18 Jan 2001 22:39:18 +0100." <3A676286.C33823B4@tismer.com> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> Message-ID: <200101191600.LAA28788@cj20424-a.reston1.va.home.com> > Yes, the "inverse" is confusing. Is what you mean the "reverse" ? > Like the other right-side operators __radd__, is it correct to > think of > > __ge__ == __rle__ > > if __rle__ was written in the same fashion like __radd__ ? > It looks semantically the same, although the reason for a > call might be different. 
Yes, it's semantically the same, and the reason for the call is the same too ("the left argument doesn't support the operator so let's try if the right one knows"). > And if my above view is right, would it perhaps be less > confusing to use in fact __rle__ and __rlt__, > or would it be more confusing, since __rlt__ would also be > invoked left-to-right, implementing ">". I prefer 6 new operators over 12 any day. I can see no valid reason why someone would want to overload a>b different than b<a, while there are plenty of reasons why a+b and b+a should be different: e.g. string concatenation. From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Fri, 19 Jan 2001 11:14:55 -0500 Subject: [Python-Dev] new Makefile.in In-Reply-To: <200101191549.KAA28699@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 10:49:54AM -0500 References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> <200101191549.KAA28699@cj20424-a.reston1.va.home.com> Message-ID: <20010119111455.C25056@kronos.cnri.reston.va.us> On Fri, Jan 19, 2001 at 10:49:54AM -0500, Guido van Rossum wrote: >Of course after Martin's response I agree with him -- let's keep it >one library. (Although I expect that the combined effect of setup.py >and Neil's flat Makefile will still affect the infrastructure to build >extensions... :-( ) Which reminds me... there should really be a way to ignore the setup.py stuff and use the old method. How should that be done? A --use-makesetup flag to configure, maybe? --amk From guido at digicool.com Fri Jan 19 17:14:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:14:20 -0500 Subject: [Python-Dev] Re: test_support.py In-Reply-To: Your message of "Thu, 18 Jan 2001 21:59:23 PST." References: Message-ID: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> > if not condition: > ! raise AssertionError(reason) Wouldn't it be better if this raised TestFailed rather than AssertionError? Or is there code that catches the AssertionError? [...grep...] Yes, there's code that catches AssertionError: (1) in Marc-Andre's own test_unicode.py; (2) in test_re, which catches AssertionError and raises TestFailed instead.
Proposal: (1) change verify() to raise TestFailed; (2) change test_unicode.py to catch TestFailed instead. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Fri Jan 19 17:17:06 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 19 Jan 2001 17:17:06 +0100 Subject: [Python-Dev] Rich comparison confusion References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> Message-ID: <3A686882.F78C1268@tismer.com> Guido van Rossum wrote: > > > Yes, the "inverse" is confusing. Is what you mean the "reverse" ? > > Like the other right-side operators __radd__, is it correct to > > think of > > > > __ge__ == __rle__ > > > > if __rle__ was written in the same fashion like __radd__ ? > > It looks semantically the same, although the reason for a > > call might be different. > > Yes, it's semantically the same, and the reason for the call is the > same too ("the left argument doesn't support the operator so let's try > if the right one knows"). > > > And if my above view is right, would it perhaps be less > > confusing to use in fact __rle__ and __rlt__, > > or would it be more confusing, since __rlt__ would also be > > invoked left-to-right, implementing ">". > > I prefer 6 new operators over 12 any day. I can see no valid reason > why someone would want to overload a>b different than b<a, while > there are plenty of reasons why a+b and b+a should be different: > e.g. string concatenation. Sure, I didn't want to introduce new operators, but use the "r" versions for three of the six new operators. But I should have read your proposal before. The confusion is not due to you, but Skip had a read error, since you don't talk about inverses at all: Skip==""" In the description he states that __le__ and __ge__ are inverses as are __lt__ and __gt__.
""" Truth==""" There are no explicit "reversed argument" versions of these; instead, __lt__ and __gt__ are each other's reverse, likewise for__le__ and __ge__; __eq__ and __ne__ are their own reverse (similar at the C level). """ No reason for confusion at all > python-dev/null - ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From thomas at xs4all.net Fri Jan 19 17:20:56 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 17:20:56 +0100 Subject: [Python-Dev] test_ucn errors ? Message-ID: <20010119172056.K17392@xs4all.nl> I'm currently seeing a failure in test_ucn: test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding error: Illegal Unicode character It looks like one of the unicode literals in test_ucn is invalid, but it's damned hard to pin down which: Python 2.1a1 (#7, Jan 19 2001, 17:06:32) [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import test.test_ucn Traceback (most recent call last): File "", line 1, in ? UnicodeError: Unicode-Escape decoding error: Illegal Unicode character >>> I get the same crashes on FreeBSD and (Debian) Linux. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri Jan 19 17:26:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:26:34 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Your message of "Fri, 19 Jan 2001 00:02:09 +0100." 
<20010119000209.F17392@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: <200101191626.LAA29165@cj20424-a.reston1.va.home.com> > This brings me to another point: how can 'make test' work at all ? Does > python always check for './Lib' (and './Modules') for modules ? Look at the logic in Modules/getpath.c, which calculates the initial (default) sys.path. It detects that it's running from the build tree and then modifies the default path a bit to include Lib and Modules relative to where the python executable was found. > If that's > specific for 'make test' and running python in the source distribution, that > sounds like a bit of a weird hack. I can't find any such hackery in the > source, but I also can't figure out how else it's working :) It's not jut for 'make test' -- it's to make life easy for developers in general (and me in particular :-) who want to try out their hacks without going through 'make install'. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Jan 19 17:34:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 17:34:58 +0100 Subject: [Python-Dev] Re: test_support.py References: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> Message-ID: <3A686CB2.C75D184D@lemburg.com> Guido van Rossum wrote: > > > if not condition: > > ! raise AssertionError(reason) > > Wouldn't it be better if this raised TestFailed rather than > AssertionError? Or is there code that catches the AssertionError? > > [...grep...] > > Yes, there's code that catches AssertionError: > > (1) in Marc-Andre's own test_unicode.py; > > (2) in test_re, which catches AssertionError and raises TestFailed > instead. > > Proposal: > > (1) change verify() to raise TestFailed; > > (2) change test_unicode.py to catch TestFailed instead. 
+1 Why not simply make TestFailed a subclass of AssertionError ? Then we wouldn't have to fear about breaking test code... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Fri Jan 19 17:34:15 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 17:34:15 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <200101191626.LAA29165@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 11:26:34AM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> Message-ID: <20010119173415.M17295@xs4all.nl> On Fri, Jan 19, 2001 at 11:26:34AM -0500, Guido van Rossum wrote: > > This brings me to another point: how can 'make test' work at all ? Does > > python always check for './Lib' (and './Modules') for modules ? > Look at the logic in Modules/getpath.c, which calculates the initial > (default) sys.path. It detects that it's running from the build tree > and then modifies the default path a bit to include Lib and Modules > relative to where the python executable was found. Aye, I found it now. > > If that's > > specific for 'make test' and running python in the source distribution, that > > sounds like a bit of a weird hack. I can't find any such hackery in the > > source, but I also can't figure out how else it's working :) > It's not jut for 'make test' -- it's to make life easy for developers > in general (and me in particular :-) who want to try out their hacks > without going through 'make install'. 
Well, after some old SF movies & some sleep, I realized that :) But it is going to have to change: you now have to include the build tree as well, and that is quite a bit more difficult to figure out. I'd suggest a 'make run' that calls python with the appropriate PYTHONPATH environment variable, but that doesn't cover test-scripts (which I use a lot myself.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri Jan 19 17:34:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:34:45 -0500 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Your message of "Fri, 19 Jan 2001 12:02:00 +0100." <20010119110200.9E455373C95@snelboot.oratrix.nl> References: <20010119110200.9E455373C95@snelboot.oratrix.nl> Message-ID: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> > I get the impression that I'm currently seeing a non-NULL third > argument in my (C) methods even though the method is called without > keyword arguments. > Is this new semantics that I missed the discussion about, or is this a bug? Can't tell without spending more time looking at the code and experimenting than I can afford today; but Jeremy refactored the calling code, and it could be that you're seeing an empty dictionary instead of a NULL. Do you really need the NULL? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 17:41:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:41:02 -0500 Subject: [Python-Dev] Mail delays and SourceForge bugs In-Reply-To: Your message of "Fri, 19 Jan 2001 13:26:31 +0100." <20010119132631.I17392@xs4all.nl> References: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> <20010119132631.I17392@xs4all.nl> Message-ID: <200101191641.LAA29324@cj20424-a.reston1.va.home.com> > I doubt it's (just) you, Guido. 
I'm seeing similar delays, and I already > talked with Barry about it, too. It looks like it's clearing up a bit, now, > but it's confusing as hell, for sure ;) It's worse for me though than for most people: for others, only mail sent through mailman at mail.python.org is affected. For me, mail sent directly to guido at python.org is affected too (which is why I've changed my From address again to that old standby, guido at digicool.com). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 17:53:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:53:39 -0500 Subject: [Python-Dev] deprecated regex used by un-deprecated modules In-Reply-To: Your message of "Thu, 18 Jan 2001 22:20:08 EST." <14951.45672.806978.600944@localhost.localdomain> References: <14951.45672.806978.600944@localhost.localdomain> Message-ID: <200101191653.LAA29774@cj20424-a.reston1.va.home.com> > There are several modules in the standard library that use the regex > module. When they are imported, they print a warning about using a > deprecated module. I think this is bad form. Either the modules that > depend on regex should by updated to use re or they should be > deprecated themselves. > > I discovered the following offenders: > asynchat > knee > poplib > reconvert > > I would suggest fixing asynchat and poplib and deprecating knee. The > reconvert module may be a special case. Agreed. There's an idiom to disable the warning, which you can find in regsub.py: import warnings warnings.filterwarnings("ignore", "", DeprecationWarning, __name__) (The "" should be replaced by the specific warning message though.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 18:21:28 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 12:21:28 -0500 Subject: [Python-Dev] test_ucn errors ? In-Reply-To: Your message of "Fri, 19 Jan 2001 17:20:56 +0100." 
<20010119172056.K17392@xs4all.nl> References: <20010119172056.K17392@xs4all.nl> Message-ID: <200101191721.MAA31937@cj20424-a.reston1.va.home.com> > I'm currently seeing a failure in test_ucn: > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > error: Illegal Unicode character > > It looks like one of the unicode literals in test_ucn is invalid, but it's > damned hard to pin down which: Feels to me like there's a bug in the string literal processing that makes *any* string literal containing \N{...} fail during code generation. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Fri Jan 19 18:37:41 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 18:37:41 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> Message-ID: <023801c0823e$86fcedc0$e46940d5@hagrid> > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > error: Illegal Unicode character Make sure you rebuild Objects/unicodeobject.o and the ucnhash extension. If they build without warnings, run the following script.

    import ucnhash

    count = 0
    for code in range(65536):
        try:
            name = ucnhash.getname(code)
            if ucnhash.getcode(name) != code:
                print name
            count += 1
        except ValueError:
            pass

    print count

if it prints anything but "10538", let me know. > It looks like one of the unicode literals in test_ucn is invalid, but it's > damned hard to pin down which: If the ucnhash extension cannot be found, the script won't even compile... shouldn't be too hard to fix.
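The ucnhash module is not part of current Python releases, so the script above cannot be run as written today. What follows is a rough sketch of the same round-trip check using the standard unicodedata module in Python 3. It is a substitute under stated assumptions, not the script from the message: count_roundtrip() is a hypothetical helper, and the number of named characters depends on the Unicode version bundled with the interpreter, so no expected count is given.

```python
import unicodedata


def count_roundtrip(limit=0x10000):
    """Map each code point to its name and back again.

    Returns (named, mismatches): how many code points below `limit` have a
    name, and the names whose lookup did not return the same character.
    """
    named = 0
    mismatches = []
    for code in range(limit):
        ch = chr(code)
        try:
            name = unicodedata.name(ch)
        except ValueError:
            continue  # unnamed code point (controls, surrogates, unassigned)
        try:
            if unicodedata.lookup(name) != ch:
                mismatches.append(name)
        except KeyError:
            mismatches.append(name)  # name -> character lookup failed
        named += 1
    return named, mismatches


if __name__ == "__main__":
    named, mismatches = count_roundtrip()
    print(named, "named code points;", len(mismatches), "round-trip failures")
```

As in the original, a code point is counted once its name has been found, and any name that does not map back to the same character is reported separately.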
From Barrett at stsci.edu Fri Jan 19 18:32:26 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Fri, 19 Jan 2001 12:32:26 -0500 (EST) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <200101191600.LAA28788@cj20424-a.reston1.va.home.com> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> Message-ID: <14952.30800.112503.123675@nem-srvr.stsci.edu> Guido van Rossum writes: > > ... I can see no valid reason why someone would want to overload > a>b different than b<a > I agree. But this assumes that the result of A<B and B>A is a collection of Booleans. In the Interactive Data Language (IDL) these operators are essentially mapped to ceiling and floor functions which are not commutative. I personally find this silly, but IDL users coming to Python may be surprised when the comparison of two Numeric arrays returns a Boolean-like result. -- Dr. Paul Barrett Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From nas at arctrix.com Fri Jan 19 11:43:12 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 19 Jan 2001 02:43:12 -0800 Subject: [Python-Dev] new Makefile.in In-Reply-To: <20010119111455.C25056@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Fri, Jan 19, 2001 at 11:14:55AM -0500 References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> <200101191549.KAA28699@cj20424-a.reston1.va.home.com> <20010119111455.C25056@kronos.cnri.reston.va.us> Message-ID: <20010119024312.A16179@glacier.fnational.com> On Fri, Jan 19, 2001 at 11:14:55AM -0500, Andrew Kuchling wrote: > Which reminds me... there should really be a way to ignore the > setup.py stuff and use the old method. How should that be done. A > --use-makesetup flag to configure, maybe? A different target for make would be easy.
Neil From fredrik at effbot.org Fri Jan 19 19:13:15 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 19:13:15 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> Message-ID: <03a201c08243$7fa62af0$e46940d5@hagrid> thomas wrote: > > I'm currently seeing a failure in test_ucn: > > > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > > error: Illegal Unicode character > > > > It looks like one of the unicode literals in test_ucn is invalid, but it's > > damned hard to pin down which: > > Feels to me like there's a bug in the string literal processing that > makes *any* string literal containing \N{...} fail during code > generation. I took another look at the error message: the only explanation I can see here is that the lookup succeeds, but the call to ucn- hash returns a value larger than 0x10ffff. What is Py_UCS4 set to under gcc? Confusing /F From guido at digicool.com Fri Jan 19 19:11:21 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 13:11:21 -0500 Subject: [Python-Dev] Re: test_support.py In-Reply-To: Your message of "Fri, 19 Jan 2001 17:34:58 +0100." <3A686CB2.C75D184D@lemburg.com> References: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> <3A686CB2.C75D184D@lemburg.com> Message-ID: <200101191811.NAA32539@cj20424-a.reston1.va.home.com> > > Proposal: > > > > (1) change verify() to raise TestFailed; > > > > (2) change test_unicode.py to catch TestFailed instead. > > +1 > > Why not simply make TestFailed a subclass of AssertionError ? > Then we wouldn't have to fear about breaking test code... No, I'd rather see the two separated. There can be assert statements in the modules we're testing, and I'd prefer not to see those caught by test code that is trying to catch TestFailed. I'll check this in momentarily. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Fri Jan 19 19:19:37 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 19:19:37 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> Message-ID: <03b301c08244$627f22a0$e46940d5@hagrid> > Feels to me like there's a bug in the string literal processing that > makes *any* string literal containing \N{...} fail during code > generation. umm. can anyone explain how this can happen: python ../lib/test/regrtest.py test_ucn test_ucn 1 test OK. python ../lib/test/test_ucn.py UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name how can a test that works under regrtest.py fail when it's run separately? what am I missing here? From mal at lemburg.com Fri Jan 19 19:48:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 19:48:53 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> <03a201c08243$7fa62af0$e46940d5@hagrid> Message-ID: <3A688C15.8C9CFF46@lemburg.com> Fredrik Lundh wrote: > > thomas wrote: > > > I'm currently seeing a failure in test_ucn: > > > > > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > > > error: Illegal Unicode character > > > > > > It looks like one of the unicode literals in test_ucn is invalid, but it's > > > damned hard to pin down which: > > > > Feels to me like there's a bug in the string literal processing that > > makes *any* string literal containing \N{...} fail during code > > generation. > > I took another look at the error message: the only explanation > I can see here is that the lookup succeeds, but the call to ucn- > hash returns a value larger than 0x10ffff. > > What is Py_UCS4 set to under gcc? Should be "unsigned int" on all modern Intel platforms. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Fri Jan 19 19:48:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 13:48:45 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Fri, 19 Jan 2001 12:32:26 EST." <14952.30800.112503.123675@nem-srvr.stsci.edu> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> <14952.30800.112503.123675@nem-srvr.stsci.edu> Message-ID: <200101191848.NAA02765@cj20424-a.reston1.va.home.com> > > ... I can see no valid reason why someone would want to overload > > a>b different than b<a > > > > I agree. But this assumes that the result of A<B and B>A is a > collection of Booleans. In the Interactive Data Language (IDL) these > operators are essentially mapped to ceiling and floor functions which > are not commutative. I personally find this silly, but IDL users > coming to Python may be surprised when the comparison of two Numeric > arrays returns a Boolean-like result. This means that Python can't be used to emulate this part of IDL. I don't understand how these can be not commutative unless they have a side effect on the left argument, and that's not possible in Python anyway. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jan 19 20:18:04 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 14:18:04 -0500 Subject: [Python-Dev] test_ucn errors ? Message-ID: [/F] > umm. can anyone explain how this can happen: > > python ../lib/test/regrtest.py test_ucn > test_ucn > test OK.
> > python ../lib/test/test_ucn.py > UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name > > how can a test that works under regrtest.py fail when > it's run separately? what am I missing here? Dunno, but add to the pile of mysteries that you're unique. Here on Win98SE: python ../lib/test/regrtest.py test_ucn test_ucn test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name 1 test failed: test_ucn python ../lib/test/test_ucn.py UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name I suggest you reformat your hard drive, and reinstall Windows . From mwh21 at cam.ac.uk Fri Jan 19 20:25:03 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 19 Jan 2001 19:25:03 +0000 Subject: [Python-Dev] test_ucn errors ? In-Reply-To: "Fredrik Lundh"'s message of "Fri, 19 Jan 2001 19:19:37 +0100" References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> <03b301c08244$627f22a0$e46940d5@hagrid> Message-ID: "Fredrik Lundh" writes: > > Feels to me like there's a bug in the string literal processing that > > makes *any* string literal containing \N{...} fail during code > > generation. > > umm. can anyone explain how this can happen: > > python ../lib/test/regrtest.py test_ucn > test_ucn > 1 test OK. This will run the .pyc if present? > python ../lib/test/test_ucn.py > UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name This won't? Note: no traceback -> (in effect, if not design) compile time error. > how can a test that works under regrtest.py fail when > it's run separately? what am I missing here? Well, this is just my guess. Cheers, M. -- Well, you pretty much need Microsoft stuff to get misbehaviours bad enough to actually tear the time-space continuum. Luckily for you, MS Internet Explorer is available for Solaris. 
-- Calle Dybedahl, alt.sysadmin.recovery From skip at mojam.com Fri Jan 19 20:55:29 2001 From: skip at mojam.com (Skip Montanaro) Date: Fri, 19 Jan 2001 13:55:29 -0600 (CST) Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010119173415.M17295@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> Message-ID: <14952.39857.83065.24889@beluga.mojam.com> Thomas> But it is going to have to change: you now have to include the Thomas> build tree as well, and that is quite a bit more difficult to Thomas> figure out. I'd suggest a 'make run' that calls python with the Thomas> appropriate PYTHONPATH environment variable, but that doesn't Thomas> cover test-scripts (which I use a lot myself.) Doesn't Andrew's new "platform" target in the top-level Makefile do the right thing? It *should* generate a platform-specific path to the correct build subdirectory. Skip From MarkH at ActiveState.com Fri Jan 19 21:11:02 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Fri, 19 Jan 2001 12:11:02 -0800 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) In-Reply-To: <010c01c08201$4b0ec050$e46940d5@hagrid> Message-ID: > you can compile the module as C++, but that's also a bit painful... My understanding is that the C std doesn't guarantee the order of static object initialization, whereas C++ does provide these semantics. At least that is the excuse I found when digging into this some years ago. Can't-believe-I-mentioned-the-C-standard-while-Tim-is-listening ly, Mark. From guido at digicool.com Fri Jan 19 21:44:53 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 15:44:53 -0500 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. 
unistr() In-Reply-To: Your message of "Fri, 19 Jan 2001 10:58:08 +0100." <3A680FB0.AED2DB55@lemburg.com> References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> <3A680FB0.AED2DB55@lemburg.com> Message-ID: <200101192044.PAA04154@cj20424-a.reston1.va.home.com> > If we agree to merge the semantics of the two APIs, then str() > would have to change too: is this desirable ? (IMHO, yes) Not clear. Which is why I'm backing off from my initial support for merging the two. I believe unicode() (which is really just an interface to PyUnicode_FromEncodedObject()) currently already does too much. In particular this whole business with calling __str__ on instances seems to me to be unnecessary. I think it should *only* bother to look for something that supports the buffer interface (checking for regular strings only as a tiny optimization), or existing unicode objects. > Here's what we could do: > > a) merge the semantics of unistr() into unicode() > b) apply the same semantics in str() > c) remove unistr() -- how's that for a short-living builtin ;) > > About the semantics: > > These should be backward compatible to str() in that everything > that worked before should continue to work after the merge. > > A strawman for processing str() and unicode(): > > 1. strings/Unicode is passed back as-is I hope you mean str() passes 8-bit strings back as-is, unicode() passes Unicode strings back as-is, right? > 2. tp_str is tried > 3. the method __str__ is tried Shouldn't have to -- instances should define tp_str and all the magic for calling __str__ should be there. I don't understand why it's not done that way, probably just for historical reasons. I also don't think __str__ should be tried for non-instance types. But, more seriously, I believe tp_str or __str__ shouldn't be tried at all by unicode(). > 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer) > 5. 
for str(): Unicode return values are converted to strings using > the default encoding > for unicode(): Unicode return values are passed back as-is; > string return values are decoded according to the > encoding parameter > 6. the return object is type-checked: str() will always return > a string object, unicode() always a Unicode object > > Note that passing back Unicode is only allowed in case no encoding > was given. Otherwise an exception is raised: you can't decode > Unicode. > > As extension we could add encoding and error parameters to str() > as well. The result would be either an encoding of Unicode objects > passed back by tp_str or __str__ or a recoding of string objects > returned by checks 2, 3 or 4. Naaaah! > If we agree to take this approach, then we should remove the > unistr() Python API before the alpha ships. Frankly, I believe we need more time to sort this out, and therefore I propose to remove the unistr() built-in before the release. Marc, would you do the honors? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Fri Jan 19 21:55:53 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 21:55:53 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <14952.39857.83065.24889@beluga.mojam.com>; from skip@mojam.com on Fri, Jan 19, 2001 at 01:55:29PM -0600 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> <14952.39857.83065.24889@beluga.mojam.com> Message-ID: <20010119215552.O17295@xs4all.nl> On Fri, Jan 19, 2001 at 01:55:29PM -0600, Skip Montanaro wrote: > > Thomas> But it is going to have to change: you now have to include the > Thomas> build tree as well, and that is quite a bit more difficult to > Thomas> figure out.
I'd suggest a 'make run' that calls python with the > Thomas> appropriate PYTHONPATH environment variable, but that doesn't > Thomas> cover test-scripts (which I use a lot myself.) > Doesn't Andrew's new "platform" target in the top-level Makefile do the > right thing? It *should* generate a platform-specific path to the correct > build subdirectory. Yes, it does, that's what I meant with 'make run'. But that isn't quite as user-friendly as the current method. How would you run a script with the current python ? 'make SCRIPT=./spamtest.py runscript' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri Jan 19 23:06:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 17:06:03 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Your message of "Fri, 19 Jan 2001 17:34:15 +0100." <20010119173415.M17295@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> Message-ID: <200101192206.RAA12072@cj20424-a.reston1.va.home.com> I finally figured the best way to fix sys.path to find shared modules built by setup.py. At first I thought I had to add it to getpath.c, but the problem is that the name is calculated by calling distutils.util.get_platform(), and that requires a working Python interpreter, so we'd end up with a chicken-or-egg situation. So instead I added 5 lines to site.py, which tests for os.name=='posix', then for sys.path[-1] ending in '/Modules' -- this tests only succeeds when running from the build directory. Then it calls distutils.util.get_platform() and uses the result to calculate the correct directory name, which is then appended to sys.path. 
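The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the five lines that went into site.py: the helper name build_lib_dir, its os_name parameter and the 'build/lib.<platform>-<version>' directory layout are all hypothetical, and sysconfig.get_platform() stands in for distutils.util.get_platform(), since distutils is no longer shipped with recent Python releases.

```python
import os
import sys
import sysconfig


def build_lib_dir(path_entries, os_name=os.name):
    """Guess where freshly built shared modules live, or return None.

    Hypothetical helper: the name, the os_name parameter and the
    'build/lib.<platform>-<version>' layout are assumptions made for
    illustration only.
    """
    # Only POSIX builds are handled, as in the description above.
    if os_name != "posix" or not path_entries:
        return None
    # Running from the build directory is detected by the last path entry.
    last = path_entries[-1]
    if not last.endswith("/Modules"):
        return None
    source_root = last[: -len("/Modules")]
    # Stand-in for distutils.util.get_platform().
    platform_tag = sysconfig.get_platform()
    version = "%d.%d" % sys.version_info[:2]
    return "%s/build/lib.%s-%s" % (source_root, platform_tag, version)


# Usage: extend sys.path only when the interpreter runs from a build tree.
extra = build_lib_dir(sys.path)
if extra is not None:
    sys.path.append(extra)
```

Passing the path list and the OS name in as arguments keeps the check easy to exercise without actually sitting in a build tree.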
Yes, this slows down startup (it imports a large portion of the distutils package), but I don't care -- after all this is mostly for me so I can play with the interpreter right after I've built it, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Jan 19 22:32:34 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:32:34 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> <3A680FB0.AED2DB55@lemburg.com> <200101192044.PAA04154@cj20424-a.reston1.va.home.com> Message-ID: <3A68B272.BBBAECD1@lemburg.com> Guido van Rossum wrote: > > > If we agree to merge the semantics of the two APIs, then str() > > would have to change too: is this desirable ? (IMHO, yes) > > Not clear. Which is why I'm backing off from my initial support for > merging the two. > > I believe unicode() (which is really just an interface to > PyUnicode_FromEncodedObject()) currently already does too much. In > particular this whole business with calling __str__ on instances seems > to me to be unnecessary. I think it should *only* bother to look for > something that supports the buffer interface (checking for regular > strings only as a tiny optimization), or existing unicode objects. Hmm, unicode() should (just like str()) take an object and convert it to a Unicode string. Since many objects either don't support the tp_str slot (instances don't for some reason -- just like they don't tp_call), I had to add some special cases to make Python instances compatible to Unicode in the same way str() does. What I think is really needed is a concept for "stringification" in Python. We currently have these schemes: 1. tp_str 2. method __str__ (not only of Python instances, but any object) 3. character buffer interface These three could easily be unified into the tp_str slot: e.g. 
tp_str could do the necessary magic to call __str__ or the buffer interface. Note that the same is true for e.g. tp_call -- the special cases we have in ceval.c for the different builtin callable objects would not be necessary if they would implement tp_call. > > Here's what we could do: > > > > a) merge the semantics of unistr() into unicode() > > b) apply the same semantics in str() > > c) remove unistr() -- how's that for a short-living builtin ;) > > > > About the semantics: > > > > These should be backward compatible to str() in that everything > > that worked before should continue to work after the merge. > > > > A strawman for processing str() and unicode(): > > > > 1. strings/Unicode is passed back as-is > > I hope you mean str() passes 8-bit strings back as-is, unicode() > passes Unicode strings back as-is, right? Right. > > 2. tp_str is tried > > 3. the method __str__ is tried > > Shouldn't have to -- instances should define tp_str and all the magic > for calling __str__ should be there. I don't understand why it's not > done that way, probably just for historical reasons. I also don't > think __str__ should be tried for non-instance types. Ok. > But, more seriously, I believe tp_str or __str__ shouldn't be tried at > all by unicode(). Hmm, but how would you implement generic conversion to Unicode then ? We'll need some way for instances (and other types) to provide a conversion to Unicode. Some time ago we discussed this issue and came to the conclusion that tp_str should be allowed to return Unicode data instead of inventing a new tp_unicode slot for this purpose. > > 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer) > > 5. for str(): Unicode return values are converted to strings using > > the default encoding > > for unicode(): Unicode return values are passed back as-is; > > string return values are decoded according to the > > encoding parameter > > 6. 
the return object is type-checked: str() will always return > > a string object, unicode() always a Unicode object > > > > Note that passing back Unicode is only allowed in case no encoding > > was given. Otherwise an exception is raised: you can't decode > > Unicode. > > > > As extension we could add encoding and error parameters to str() > > as well. The result would be either an encoding of Unicode objects > > passed back by tp_str or __str__ or a recoding of string objects > > returned by checks 2, 3 or 4. > > Naaaah! Would be nice for symmetry and useful in the light of making Unicode the only string type in Py4k ;-) > > If we agree to take this approach, then we should remove the > > unistr() Python API before the alpha ships. > > Frankly, I believe we need more time to sort this out, and therefore I > propose to remove the unistr() built-in before the release. Marc, > would you do the honors? Ok. I'll remove the builtin and the docs, but will leave the PyObject_Unicode() API enabled. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From uche.ogbuji at fourthought.com Fri Jan 19 22:42:40 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Fri, 19 Jan 2001 14:42:40 -0700 Subject: [Python-Dev] Extension doc bugs Message-ID: <200101192142.OAA29168@localhost.localdomain> I'm using the bleeding-edge documentation at http://python.sourceforge.net/devel-docs/api/api.html I know that it's not complete until someone has the time to do so, but I've run into a few places where it's completely wrong. For instance, from the object protocol docs: """ int PyObject_Cmp (PyObject *o1, PyObject *o2, int *result) Compare the values of o1 and o2 using a routine provided by o1, if one exists, otherwise with a routine provided by o2. The result of the comparison is returned in result.
Returns -1 on failure. This is the equivalent of the Python statement "result = cmp(o1, o2)". """ After getting weird behavior implementing this, and then squinting at the relevant Python 2.0 code, it appears that in actuality the Cmp function is to return the direct comparison results (-1, 0, 1 based on ordering of the parameters) furthermore, there is no such "result" argument. 4Suite has a lot of C extension code developed by squinting at Python sources and long gdb sessions and I have a feeling that in many cases we're taking up hacks that would get us into trouble across versions, and all that; but the "official" interfaces and behaviors are not documented (or only poorly documented). In general, the C API docs are in a rather sorry state and though I doubt I could do a great deal about fixing it, I'd be interested in discussion of the matter, and perhaps making what contribution I can. Is the doc-sig the best place for this? My experience there wouldn't seem to encourage this conclusion (most of the discussion is of docstring syntax and neat-o automagic document generators). -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mal at lemburg.com Fri Jan 19 22:46:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:46:24 +0100 Subject: [Python-Dev] readline and setup.py Message-ID: <3A68B5B0.771412F7@lemburg.com> The new setup.py procedure for Python causes readline not to be built on my machine. Instead I get a linker error telling me that termcap is not found. Looking at my old Setup file, I have this line: readline readline.c \ -I/usr/include/readline -L/usr/lib/termcap \ -lreadline -lterm I guess, setup.py should be modified to include additional library search paths -- shouldn't hurt on platforms which don't need them. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Jan 19 22:50:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:50:53 +0100 Subject: [Python-Dev] _tkinter and setup.py Message-ID: <3A68B6BD.BAD038D6@lemburg.com> Why does setup.py stop with an error in case _tkinter cannot be built (due to an old Tk/Tcl version in my case) ? I think the policy in setup.py should be to output warnings, but continue building the rest of the Python modules. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Fri Jan 19 23:38:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 17:38:22 -0500 Subject: [Python-Dev] 2.1 alpha 1 release schedule Message-ID: <200101192238.RAA12413@cj20424-a.reston1.va.home.com> Practicality beats purity: we're very close to a release, but I've decided to hold off to give Jeremy a chance to finish the nested scopes, to give Fred a chance to revise the weak references according to Martin's wishes, and in general for things to settle. Most likely we'll be able to release Monday night (Jan 22). Unfortunately email through python.org seems to be wedged again (I swear, it seems like it starts getting wedged every afternoon between 3 and 4!) 
so I don't have a clear view of what the latest checkins were; but from cvs update it seems that the following things happened this afternoon: - Barry fixed a core dump in function attribute assignments - Marc-Andre withdrew unistr(), pending more discussion - Fredrik fixed the ucnhash problem - I fixed two path problems in the new build process that only occurred when you were building in a subdirectory of the source tree Good work, crew! I'm taking the weekend off. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Sat Jan 20 00:23:18 2001 From: jack at oratrix.nl (Jack Jansen) Date: Sat, 20 Jan 2001 00:23:18 +0100 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Message by Guido van Rossum , Fri, 19 Jan 2001 11:34:45 -0500 , <200101191634.LAA29239@cj20424-a.reston1.va.home.com> Message-ID: <20010119232323.70B03116392@oratrix.oratrix.nl> Recently, Guido van Rossum said: > > I get the impression that I'm currently seeing a non-NULL third > > argument in my (C) methods even though the method is called without > > keyword arguments. > > > Is this new semantics that I missed the discussion about, or is this a bug? > > [...] > Do you really need the NULL? The places that I know I was counting on the NULL now have "if ( kw && PyObject_IsTrue(kw))", so I'll just have to hope there aren't any more lingering in there. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim.one at home.com Sat Jan 20 01:04:10 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 19:04:10 -0500 Subject: [Python-Dev] MS CRT crashing: In-Reply-To: <200101191556.KAA28761@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I'd just like to note for the record that this is exactly what I had > predicted.
I would have hoped you'd be content to let the record speak for itself . > I'd also like to note that I *agree*. With what? That the program is undefined by the C std was never in dispute. > Tim seems to think there's a race condition in the threading code, > but it's really much simpler than that: the same bug can easily be > provoked with a single-threaded program: just randomly read and > write alternatingly. And this is a point in their favor?! "It's OK that the MT library corrupts itself, because even the single-threaded library does"? > So obviously the people who wrote the threading code aren't interested > in the bug, I don't know that it ever got as far as the people who wrote the threading code, but I sure doubt it: when the reply starts "Turns out the C standard explicitly says ...", it strongly suggests it was written by someone who didn't already know what the C std says, and went looking for an excuse to get it off their plate without further effort. Par for the course, if so. > because it's not in their code -- and the people who wrote the code > that doesn't behave well when abused are protected by the C standard... The behavior of things designated "undefined" and "implementation-defined" by the std fall under "quality of implementation". In the real world, the latter is what vendors compete on; meeting the letter of the std is a bare minimum for playing the game at all. The plain fact is that their library is less robust than others in this case. I worked on a multithreaded stdio implementation at KSR, and that sure couldn't corrupt itself. Looks like no flavor of Linux does either. It's not *reasonable* for a library to corrupt itself in this case, although it's certainly reasonable for its behavior to vary from run to run. There's nothing in the C std that says a conforming implementation can't *crash* on the program void main() {int i = 1;} either . 
a-std-is-a-floor-on-acceptable-behavior-not-a-ceiling-ly y'rs - tim From gstein at lyra.org Sat Jan 20 02:21:56 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 19 Jan 2001 17:21:56 -0800 Subject: [Python-Dev] initializing ob_type In-Reply-To: ; from MarkH@ActiveState.com on Fri, Jan 19, 2001 at 12:11:02PM -0800 References: <010c01c08201$4b0ec050$e46940d5@hagrid> Message-ID: <20010119172156.Y7731@lyra.org> On Fri, Jan 19, 2001 at 12:11:02PM -0800, Mark Hammond wrote: > > you can compile the module as C++, but that's also a bit painful... > > My understanding is that the C std doesn't guarantee the order of static > object initialization, whereas C++ does provide these semantics. At least > that is the excuse I found when digging into this some years ago. True, but when PyWhatever_Type is initialized, &PyType_Type ought to be ready (even if it isn't initialized). Heck, &PyType_Type points into the Python core which is *definitely* loaded by that point. Now, if "initialization" also means "relocation to a specific address" then I can understand. Hrm... I've just spent some time with the Windows SDK docs, and I can't find anything that really discusses the problem and resolution. There certainly isn't any warning about "don't do this." It all talks about how fixups are stored with the DLL, how you can optionally use BIND to pre-bind the values, blah blah blah. But nothing saying "it doesn't work." It would be interesting to know more about the actual symptoms that appears when the ob_type init is performed by the structure (rather than at runtime). What happens? Bad address? NULL value? Failure to resolve and load? Is PyType_Type not exported correctly or something? Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at digicool.com Sat Jan 20 03:05:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 21:05:39 -0500 Subject: [Python-Dev] How to get setup.py to build expat? 
Message-ID: <200101200205.VAA13299@cj20424-a.reston1.va.home.com> The setup.py script does not build the expat module for me. I have expat installed in /usr/local, at least I believe so: I have /usr/local/include/xmlparse.h and /usr/local/lib/libexpat.a -- do I need more? How can I get setup.py to spit out what it tries, and why it fails? setup.py -v build doesn't give any extra output. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Sat Jan 20 03:41:43 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 20 Jan 2001 03:41:43 +0100 Subject: [Python-Dev] initializing ob_type References: <010c01c08201$4b0ec050$e46940d5@hagrid> <20010119172156.Y7731@lyra.org> Message-ID: <00f001c0828a$bc903900$e46940d5@hagrid> greg wrote: > It would be interesting to know more about the actual symptoms that appears > when the ob_type init is performed by the structure (rather than at runtime). > What happens? http://www.python.org/doc/FAQ.html#3.24 "3.24. "Initializer not a constant" while building DLL on MS-Windows "Static type object initializers in extension modules may cause compiles to fail with an error message like "initializer not a constant" Cheers /F From uche.ogbuji at fourthought.com Sat Jan 20 06:29:23 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Fri, 19 Jan 2001 22:29:23 -0700 Subject: [Python-Dev] Extension doc bugs In-Reply-To: Message from uche.ogbuji@fourthought.com of "Fri, 19 Jan 2001 14:42:40 MST." <200101192142.OAA29168@localhost.localdomain> Message-ID: <200101200529.WAA30349@localhost.localdomain> > For instance, from the object protocol docs: > > """ > int PyObject_Cmp (PyObject *o1, PyObject *o2, int *result) > Compare the values of o1 and o2 using a routine provided by o1, if one > exists, otherwise with a routine provided by o2. The result of the > comparison is returned in result. Returns -1 on failure. 
This is the > equivalent of the Python statement "result = cmp(o1, o2)". > """ > > After getting weird behavior implementing this, and then squinting at the > relevant Python 2.0 code, it appears that in actuality the Cmp function is to > return the direct comparison results (-1, 0, 1 based on ordering of the > parameters) furthermore, there is no such "result" argument. Bother. I didn't squint hard enough. I mistook the tp_compare slot for the PyObject_Cmp equivalent. I have indeed run into what I'm sure are nits in the Python/C API but given that my greatest alarm was false, I'll be more careful before bringing up the others. I'm still curious as to the best forum for this. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one at home.com Sat Jan 20 06:36:12 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 00:36:12 -0500 Subject: [Python-Dev] Extension doc bugs In-Reply-To: <200101192142.OAA29168@localhost.localdomain> Message-ID: [uche.ogbuji at fourthought.com] > ... > In general, the C API docs are in a rather sorry state and though > I doubt I could do a great deal about fixing it, I'd be interested in > discussion of the matter, and perhaps making what contribution I can. > > Is the doc-sig the best place for this? Nope! Discussing it won't do any good, there or anywhere else. What it needs is for people to send better docs to python-docs at python.org or upload LaTeX patches to SourceForge, and to report doc bugs on SourceForge (which is where the start of this msg should have gone!). Most days we just work on whatever is backed up at SourceForge; if doc bugs don't show up there, they won't get repaired. 
the-docs-are-only-10x-better-than-the-sum-of-the-individual-contributions-ly y'rs - tim From tim.one at home.com Sat Jan 20 07:17:04 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 01:17:04 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Objects object.c,2.109,2.110 In-Reply-To: Message-ID: [Barry] > Modified Files: > object.c > Log Message: > default_3way_compare(): When comparing the pointers, they must be cast > to integer types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t). > ANSI specifies that pointer compares other than == and != to > non-related structures are undefined. This quiets an Insure > portability warning. Barry, that comment belongs in the code, not in the checkin msg. The code *used* to do this correctly (as you well know, since you & I went thru considerable pain to fix this the first time). However, because the *reason* for the convolution wasn't recorded in the code as a comment, somebody threw it all away the first time it got reworked. c-code-isn't-often-self-explanatory-ly y'rs - tim From tim.one at home.com Sat Jan 20 07:30:42 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 01:30:42 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 Message-ID: I had a huge string and wanted to put a double-quote on each end. The boring: '"' + huge + '"' does the job, but is inefficient . Then this transparent variation sprang unbidden from my hoary brow: huge.join('""') *That* should put to rest the argument over whether .join() is more properly a method of the separator or the sequence -- '""'.join(huge) instead would look plain silly .
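For anyone puzzling over why the trick works, here is a small sketch; the function names quote_concat and quote_join are made up for illustration. The two-character string '""' is iterated as two one-character items, and the huge string is used as the "separator" placed between them.

```python
def quote_concat(text):
    """The boring spelling: concatenate a quote on each side."""
    return '"' + text + '"'


def quote_join(text):
    """The trick: '""' is a sequence of two '"' items, and text is
    inserted between them as the separator."""
    return text.join('""')


huge = "spam" * 1000
# Both spellings build the same string.
same = quote_concat(huge) == quote_join(huge)

# The reversed spelling does something quite different: it puts '""'
# between every pair of characters of the sequence.
silly = '""'.join("abc")
```

No claim is made here about which spelling is actually faster; that would need measuring on the interpreter at hand.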
not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim From tim.one at home.com Sat Jan 20 10:28:18 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 04:28:18 -0500 Subject: [Python-Dev] Comparison of recursive objects In-Reply-To: <14952.21218.416551.695660@localhost.localdomain> Message-ID: [Guido's checkin msg] > ... > In a discussion with Tim, where we discovered that our intuition > on when a<=b should be true was failing, we decided to outlaw > ordering comparisons on recursive objects. (Once we have fixed our > intuition and designed a matching algorithm that's practical and > reasonable to implement, we can allow such orderings again.) [Jeremy] > Sounds sensible to me! I was quite puzzled about what <= should > return for recursive objects. That's easy: x <= y for recursive objects should return true if and only if x < y or x == y return true <0.9 wink>. x == y isn't a problem, although Python gives a remarkable answer: recursive objects in Python are instances of rooted, ordered, directed, finite, node-labeled graphs, and "x == y" in Python answers whether their graphs are isomorphic. Viewed that way (which is the correct way <0.5 wink>), the *natural* meaning for "x <= y" is "y contains a subgraph isomorphic to x". And that has *almost* all the nice properties we like: x <= x is true (x <= y and y <= z) implies x <= z (x <= y and y <= x) if and only if x == y However, 1. That's much harder to compute. 2. It implies, e.g., [2] <= [1, 2], and that's not what we *want* non-recursive sequence comparison to mean. 3. It's a partial ordering: given arbitrary x and y, it may be that neither contains an isomorphic image of the other. 4. 
We've again given up on avoiding surprises in *simple* comparisons among builtin types, like (under current CVS): >>> 1 < [1] < 0L < 1 1 >>> 1 < 1 0 >>> so it's hard to see why we should do any work at all to avoid violating "intuition" when comparing recursive objects: we're already scrubbing the face of intuition with steel wool, setting it on fire, then putting it out with an axe . Now let's look at Guido's example (or one of them, anyway): >>> a = [] >>> a.append(a) >>> a.append("x") >>> b = [] >>> b.append(b) >>> b.append("y") >>> a [[...], 'x'] >>> b [[...], 'y'] >>> I think it's a trick of *typography* that caused my first thought to be "well, clearly, a < b". That is, the *display* shows me two 2-element lists, each with the same "blob" as the first element, and where a[1] is obviously less than b[1]. Since "the blobs" are the same, the second elements control the outcome. But those "blobs" aren't really the same: a[0] is a, and b[0] is b, so asking whether a < b by looking first at their first elements just leads back to the original question: asking whether a[0] < b[0] is again asking whether a < b, and that makes no progress. Saying that a is less than b by fiat is *consistent* with the rules for lexicographic ordering, but so is insisting that a is greater than b. There's no basis for picking one over the other, and so no clear hope of coming up with a generally consistent scheme. Well, one clear hope: if recursive comparison says "not equal", it could resolve the dilemma by comparing object id instead. That would be consistent (I mostly think at the moment ...), but if you run the program above multiple times it may say a < b on some runs and b < a on others. WRT "the right way", it should be clear from the attached picture that neither a nor b contains an isomorphic image of the other, so from that POV they're not comparable (a != b, but neither a <= b nor b <= a holds). 
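To make the "same graph shape" reading of equality concrete, here is a minimal sketch for self-referential lists. It is not how CPython compares recursive objects; the function name same_shape and the _assumed bookkeeping set are assumptions chosen for illustration. The idea is to treat any pair of lists already under comparison as equal, so that the question "is a[0] equal to b[0]?" does not loop back on itself forever.

```python
import copy


def same_shape(x, y, _assumed=None):
    """Equality for possibly self-referential lists (illustrative only).

    A pair of lists that is already being compared further up the call
    chain is assumed equal; any real difference elsewhere in the
    structure still makes the overall result False.
    """
    if _assumed is None:
        _assumed = set()
    if isinstance(x, list) and isinstance(y, list):
        key = (id(x), id(y))
        if key in _assumed:
            return True  # already comparing this pair: make no new claim
        _assumed.add(key)
        if len(x) != len(y):
            return False
        return all(same_shape(p, q, _assumed) for p, q in zip(x, y))
    # Non-list values (or a list against a non-list) use plain equality.
    return x == y


# The example from the message: two lists that each contain themselves.
a = []
a.append(a)
a.append("x")

b = []
b.append(b)
b.append("y")

c = copy.deepcopy(a)  # a separate object with the same shape as a
```

With these definitions, same_shape(a, b) is false because of the differing second elements, while same_shape(a, c) is true even though a and c are distinct objects. Note that a sketch like this only answers the equality question; it gives no help with ordering, which is exactly the difficulty described above.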
So this is what Guido made Python do:

>>> a == b  # still cool: they're not isomorphic and Python knows it
0
>>> a < b
Traceback (most recent call last):
  File "", line 1, in ?
ValueError: can't order recursive values
>>> a <= b
Traceback (most recent call last):
  File "", line 1, in ?
ValueError: can't order recursive values

In light of that, I still find these mildly surprising:

>>> a < a
0
>>> a <= a
1
>>>

I guess some recursive values are more orderable than others .

>>> import copy
>>> c = copy.deepcopy(a)
>>> c
[[...], 'x']
>>> a == c
1
>>> a <= c
1
>>> a < c
0
>>>

BTW, this kind of construction appears to give equality-testing that's at best(!) exponential-time in the size of the dicts:

def timeeq(x, y):
    from time import clock
    import sys
    s = clock()
    result = x == y
    f = clock()
    print x, result, round(f-s, 1), "seconds"
    sys.stdout.flush()

d = {}
e = {}
timeeq(d, e)
d[0] = d
e[0] = e
timeeq(d, e)
d[1] = d
e[1] = e
timeeq(d, e)
d[2] = d
e[2] = e
timeeq(d, e)

Output:

{} 1 0.0 seconds
{0: {...}} 1 0.0 seconds
{1: {...}, 0: {...}} 1 6.5 seconds

After more than 15 minutes, the 3-element dict comparison still hasn't completed (yikes!).

ackerman's-function-eat-your-heart-out-ly y'rs - tim

-------------- next part -------------- A non-text attachment was scrubbed...
Name: loopy.jpg Type: image/jpeg Size: 11363 bytes Desc: not available URL: From thomas at xs4all.net Sat Jan 20 15:30:26 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 15:30:26 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <200101192206.RAA12072@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 05:06:03PM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> <200101192206.RAA12072@cj20424-a.reston1.va.home.com> Message-ID: <20010120153026.L17392@xs4all.nl> On Fri, Jan 19, 2001 at 05:06:03PM -0500, Guido van Rossum wrote: > So instead I added 5 lines to site.py, which tests for > os.name=='posix', then for sys.path[-1] ending in '/Modules' -- this > tests only succeeds when running from the build directory. Then it > calls distutils.util.get_platform() and uses the result to calculate > the correct directory name, which is then appended to sys.path. > Yes, this slows down startup (it imports a large portion of the > distutils package), but I don't care -- after all this is mostly for > me so I can play with the interpreter right after I've built it, > right? Right. The only downside (as far as I can tell) is that 'python -S' no longer works, in the build tree. I don't think that's that big a deal, but it should be documented somewhere, so we don't end up being boggled by it once we forget about it :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Sat Jan 20 17:18:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 11:18:39 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: Your message of "Fri, 19 Jan 2001 00:45:32 +0100." 
<20010119004532.G17392@xs4all.nl> References: <20010119004532.G17392@xs4all.nl> Message-ID: <200101201618.LAA15675@cj20424-a.reston1.va.home.com> > On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote: > > > filename = '/tmp/delete_me' > > This reminds me: we need a portable way to handle test-files :) Yeah, I noticed that this test failed on Windows -- fixed now. The test_support module exports TESTFN; there's also tempfile.mktemp() which should generate temporary files on all platforms. Is that enough? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Sat Jan 20 17:36:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 17:36:05 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201618.LAA15675@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Jan 20, 2001 at 11:18:39AM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> Message-ID: <20010120173605.P17295@xs4all.nl> On Sat, Jan 20, 2001 at 11:18:39AM -0500, Guido van Rossum wrote: > > On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote: > > > > > filename = '/tmp/delete_me' > > > > This reminds me: we need a portable way to handle test-files :) > Yeah, I noticed that this test failed on Windows -- fixed now. > The test_support module exports TESTFN; there's also tempfile.mktemp() > which should generate temporary files on all platforms. > Is that enough? Well, there is one more issue, which we can't fix terribly easy: test_fcntl tries to flock() the file. 
flock() doesn't work on all filesystems (like NFS) :P If we cared a lot, we could try several alternatives (current dir, /tmp, /var/tmp) in the specific case of flock, but personally I don't want to bother, and real sysadmins (who should care about the test failure) are more likely to build Python on a local disk than in their NFS-mounted homedirectory. At least that's how we do it :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Sat Jan 20 17:43:49 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 11:43:49 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 In-Reply-To: Your message of "Sat, 20 Jan 2001 01:30:42 EST." References: Message-ID: <200101201643.LAA16269@cj20424-a.reston1.va.home.com> > I had a huge string and wanted to put a double-quote on each end. The > boring: > > '"' + huge + '"' > > does the job, but is inefficent . Then this transparent variation > sprang unbidden from my hoary brow: > > huge.join('""') Points off for obscurity though! My favorite for this is: '"%s"' % huge Worth a microbenchmark? > *That* should put to rest the argument over whether .join() is more properly > a method of the separator or the sequence -- '""'.join(huge) instead would > look plain silly . > > not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim Give up the channeling for a while -- there's too much interference in the air from the Microsoft threaded stdio debate still. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Sat Jan 20 17:47:44 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 20 Jan 2001 10:47:44 -0600 (CST) Subject: [Python-Dev] how to test my __all__ lists? Message-ID: <14953.49456.654121.987189@beluga.mojam.com> How do I test the __all__ lists I'm building? I'm worried about a couple things: 1. I may have typos 2. 
I may leave something out of a list that should be imported by from-module-import-*. Thoughts? Skip From guido at digicool.com Sat Jan 20 18:00:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 12:00:05 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: Your message of "Sat, 20 Jan 2001 17:36:05 +0100." <20010120173605.P17295@xs4all.nl> References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> Message-ID: <200101201700.MAA16491@cj20424-a.reston1.va.home.com> > > > > filename = '/tmp/delete_me' > > > > > > This reminds me: we need a portable way to handle test-files :) > > Yeah, I noticed that this test failed on Windows -- fixed now. > > > The test_support module exports TESTFN; there's also tempfile.mktemp() > > which should generate temporary files on all platforms. > > Is that enough? > > Well, there is one more issue, which we can't fix terribly easy: test_fcntl > tries to flock() the file. flock() doesn't work on all filesystems (like > NFS) :P If we cared a lot, we could try several alternatives (current dir, > /tmp, /var/tmp) in the specific case of flock, but personally I don't want to > bother, and real sysadmins (who should care about the test failure) are more > likely to build Python on a local disk than in their NFS-mounted > homedirectory. At least that's how we do it :-) These days, I would think that it's a pretty sure bet that the system's tmp directory is not on NFS. Then we could just use tempfile.mktemp() in that module, right? Or does the /tmp filesystem on Linux (which AFAIK is a RAM disk implemented in virtual memory so it uses swap space when it runs out of RAM) not support locking? I don't particularly care about fixing this -- I haven't seen bug reports about this. 
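A minimal sketch of the "portable test file" idea, for current Python 3. It uses tempfile.mkstemp() rather than the tempfile.mktemp() named above, since mktemp() has since been deprecated in favor of mkstemp(); the helper name and the "delete_me_" prefix are assumptions for illustration only. It does not address the flock()-on-NFS question, only the hard-coded '/tmp/delete_me' path.

```python
import os
import tempfile


def roundtrip_in_temp_file(payload):
    """Write payload to a fresh temp file, read it back, then clean up."""
    # mkstemp() creates the file and returns an open descriptor plus its
    # path, so no directory such as /tmp is hard-coded in the test.
    fd, path = tempfile.mkstemp(prefix="delete_me_")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(payload)
        with open(path) as handle:
            return path, handle.read()
    finally:
        # Runs after both files are closed, which also keeps Windows happy.
        os.remove(path)


path, text = roundtrip_in_temp_file("hello\n")
print(text == "hello\n", os.path.exists(path))
```

Where the file ends up is decided by the tempfile module (TMPDIR and friends), which is exactly the decision a test should not be making by hand.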
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jan 20 18:38:38 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 12:38:38 -0500 Subject: [Python-Dev] how to test my __all__ lists? In-Reply-To: Your message of "Sat, 20 Jan 2001 10:47:44 CST." <14953.49456.654121.987189@beluga.mojam.com> References: <14953.49456.654121.987189@beluga.mojam.com> Message-ID: <200101201738.MAA16636@cj20424-a.reston1.va.home.com> > How do I test the __all__ lists I'm building? I'm worried about a couple > things: > > 1. I may have typos Do "from M import *" -- this will raise an AttributeError if there's something in __all__ that's not defined in the module. > 2. I may leave something out of a list that should be imported by > from-module-import-*. That's what alpha-testing's for. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at netaxs.com Sat Jan 20 18:49:43 2001 From: esr at netaxs.com (Eric Raymond) Date: Sat, 20 Jan 2001 12:49:43 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <3A672376.4B951848@lemburg.com>; from M.-A. Lemburg on Thu, Jan 18, 2001 at 06:10:14PM +0100 References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> Message-ID: <20010120124943.C6073@unix3.netaxs.com> > A combination of time.time(), process id and counter should > work in all cases. Make sure you use a lock around the counter, > though. Yes, but...this hack has to work in a multithreaded environment, so process ID isn't good enough. And I don't want to keep a counter around if I don't have to. -- Eric S. Raymond From guido at digicool.com Sat Jan 20 19:01:04 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 13:01:04 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Your message of "Sat, 20 Jan 2001 12:49:43 EST." 
<20010120124943.C6073@unix3.netaxs.com> References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> <20010120124943.C6073@unix3.netaxs.com> Message-ID: <200101201801.NAA16880@cj20424-a.reston1.va.home.com> > > A combination of time.time(), process id and counter should > > work in all cases. Make sure you use a lock around the counter, > > though. > > Yes, but...this hack has to work in a multithreaded environment, > so process ID isn't good enough. And I don't want to keep a counter > around if I don't have to. Sorry Eric, this just doesn't make sense. Keeping a counter around in your module (protected by a semaphore) is obviously the right solution. Why are you fighting it? --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at netaxs.com Sat Jan 20 19:20:26 2001 From: esr at netaxs.com (Eric Raymond) Date: Sat, 20 Jan 2001 13:20:26 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <200101201801.NAA16880@cj20424-a.reston1.va.home.com>; from Guido van Rossum on Sat, Jan 20, 2001 at 01:01:04PM -0500 References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> <20010120124943.C6073@unix3.netaxs.com> <200101201801.NAA16880@cj20424-a.reston1.va.home.com> Message-ID: <20010120132026.E6073@unix3.netaxs.com> On Sat, Jan 20, 2001 at 01:01:04PM -0500, Guido van Rossum wrote: > > Yes, but...this hack has to work in a multithreaded environment, > > so process ID isn't good enough. And I don't want to keep a counter > > around if I don't have to. > > Sorry Eric, this just doesn't make sense. Keeping a counter around in > your module (protected by a semaphore) is obviously the right > solution. Why are you fighting it? Actually, I'm not fighting it any more. I changed my mind a few minutes after shipping that response. -- Eric S. 
Raymond From thomas at xs4all.net Sat Jan 20 19:37:10 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 19:37:10 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201700.MAA16491@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Jan 20, 2001 at 12:00:05PM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> Message-ID: <20010120193710.Q17295@xs4all.nl> On Sat, Jan 20, 2001 at 12:00:05PM -0500, Guido van Rossum wrote: > > Well, there is one more issue, which we can't fix terribly easy: test_fcntl > > tries to flock() the file. flock() doesn't work on all filesystems (like > > NFS) :P If we cared a lot, we could try several alternatives (current dir, > > /tmp, /var/tmp) in the specific case of flock, but personally I don't want to > > bother, and real sysadmins (who should care about the test failure) are > > more likely to build Python on a local disk than in their NFS-mounted > > homedirectory. At least that's how we do it :-) > These days, I would think that it's a pretty sure bet that the > system's tmp directory is not on NFS. Then we could just use > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > it uses swap space when it runs out of RAM) not support locking? Actually, most Linux distributions don't care enough about /tmp to make it a RAM-based filesystem. At least Debian and RedHat don't :) (There's a good reason for that: Linux's disk-data cache rocks if you have enough RAM, so there's no real gain in using a ramdisk) BSDI does (optionally) have such a /tmp, and probably the other BSD derived systems as well. But that doesn't mean it doesn't support locking, so that's not a real excuse. 
But like I said, I don't care enough to worry about it. I'll look at it before alpha2. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Sat Jan 20 21:10:51 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 15:10:51 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Message-ID: [Tim] > ... > 4. We've again given up on avoiding surprises in *simple* comparisons > among builtin types, like (under current CVS): > > >>> 1 < [1] < 0L < 1 > 1 > >>> 1 < 1 > 0 > >>> I really dislike that. Here's a consequence at a higher level: N = 5 x = [1 for i in range(N)] + \ [[1] for i in range(N)] + \ [0L for i in range(N)] x.sort() print x from random import shuffle tries = failures = 0 while failures < 5: tries += 1 y = x[:] shuffle(y) y.sort() if x != y: print "oops, on try number", tries print y failures += 1 and here's a typical run (2.1a1): [1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L, 0L] oops, on try number 3 [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]] oops, on try number 5 [[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]] oops, on try number 6 [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]] oops, on try number 7 [[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]] oops, on try number 8 [0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L] I've often used list.sort() on a heterogeneous list simply to bring the elements of the same type next to each other. But as "try number 5" shows, I can no longer rely on even getting all the lists together. Indeed, heterogenous list.sort() has become a very bad (biased and slow) implementation of random.shuffle() . 
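For comparison, here is a hedged Python 3 sketch of getting that "same types next to each other" grouping deterministically by sorting on an explicit key. It is not how any Python release implements default comparisons: the function name `type_group_key` is made up for this example, and putting every number into one shared group is a choice made here so that ints and floats sort among themselves by value. (Python 3 has no 0L, so 0.0 stands in for the second numeric type.)

```python
import numbers
import random


def type_group_key(value):
    # Numbers share one (empty) group name; everything else is grouped
    # by the name of its type. Within a group, values compare normally.
    group = "" if isinstance(value, numbers.Number) else type(value).__name__
    return (group, value)


N = 5
items = [1] * N + [[1] for _ in range(N)] + [0.0] * N
expected = sorted(items, key=type_group_key)

mismatches = 0
for _ in range(100):
    shuffled = items[:]
    random.shuffle(shuffled)
    # A consistent ordering must give the same answer for every shuffle.
    if sorted(shuffled, key=type_group_key) != expected:
        mismatches += 1

print(expected)
print("mismatches:", mismatches)
```

Because the key defines a consistent total order over these values, the shuffle-then-sort loop never disagrees with the first result, which is the property the program above was probing for.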
Under 2.0, the program never prints "oops", because the only violations of transitivity in 2.0's ordering of builtin types were bugs in the implementation (none of which show up in this simple test case); 2.0's .sort() *always* produces [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]] The base trick in 2.0 was sound: when falling back to the "compare by name of the type" last resort, treat all numeric types as if they had the same name. While Python can't enforce that any user-defined __cmp__ is consistent, I think it should continue to set a good example in the way it implements its own comparisons. grumblingly y'rs - tim From skip at mojam.com Sat Jan 20 21:42:27 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 20 Jan 2001 14:42:27 -0600 (CST) Subject: [Python-Dev] should a module's thread safety be documented? Message-ID: <14953.63539.629197.232848@beluga.mojam.com> A bit late for 2.1alpha1, but it just occurred to me that perhaps there should be an annotation in the documentation that indicates whether or not a module is thread-safe. For example, many functions in fileinput rely on a module global called _state. It strikes me that this module is not likely to be thread-safe, yet the documentation doesn't appear to mention this, certainly not in an obvious fashion. Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of LaTex macros in Fred's arsenal? This would make documenting these properties both easy and consistent across modules. Skip From tim.one at home.com Sat Jan 20 22:13:41 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 16:13:41 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 In-Reply-To: <200101201643.LAA16269@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > huge.join('""') [Guido] > Points off for obscurity though! The Subject line was "Stupid Python Tricks" for a reason . 
Those who don't know the language inside-out should be tickled by figuring out why it even *works* (hint for the baffled: you have to view '""' as a sequence rather than as an atomic string).

> My favorite for this is:
>
> '"%s"' % huge
>
> Worth a microbenchmark?

Absolutely! I get:

     obvious  15.574
     obscure   8.165
     sprintf   8.133

after running:

ITERS = 1000
indices = [0] * ITERS

def obvious(huge):
    for i in indices:
        '"' + huge + '"'

def obscure(huge):
    for i in indices:
        huge.join('""')

def sprintf(huge):
    for i in indices:
        '"%s"' % huge

def runtimes(huge):
    from time import clock
    for f in obvious, obscure, sprintf:
        start = clock()
        f(huge)
        finish = clock()
        print "%12s %7.3f" % (f.__name__, finish - start)

runtimes("x" * 1000000)

under current 2.1a1. Not a dead-quiet machine, but the difference is too small to care. Speed up huge.join attr lookup, and it would probably be faster . Hmm: if I boost ITERS high enough and cut back the size of huge, "obscure" eventually becomes *slower* than "obvious", and even if the "huge.join" lookup is floated out of the loop. I guess that points to the relative burden of calling a bound method. So, in real life, the huge.join approach may well be the slowest!

>> not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim

> Give up the channeling for a while -- there's too much interference in
> the air from the Microsoft threaded stdio debate still. :-)

What debate? You need two arguably valid points of view for a debate to even start .

gloating-in-victory-vicious-in-defeat-but-simply-unbearable-in- ambiguity-ly y'rs - tim

From fdrake at acm.org Sat Jan 20 22:23:58 2001 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Sat, 20 Jan 2001 16:23:58 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201700.MAA16491@cj20424-a.reston1.va.home.com> References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> Message-ID: <14954.494.223724.705495@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > it uses swap space when it runs out of RAM) not support locking? I thought it was Solaris that used available+virtual memory for /tmp; that was what we ran into at CNRI. (Which doesn't preclude Linux from doing the same, I just don't recall that we've encountered that.) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Sat Jan 20 23:05:27 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 20 Jan 2001 17:05:27 -0500 (EST) Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: <14953.63539.629197.232848@beluga.mojam.com> References: <14953.63539.629197.232848@beluga.mojam.com> Message-ID: <14954.2983.450755.761653@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > A bit late for 2.1alpha1, but it just occurred to me that perhaps there > should be an annotation in the documentation that indicates whether or not a > module is thread-safe. For example, many functions in fileinput rely on a If you can create a list of the known thread safe and known thread unsafe modules, I'll come up with appropriate annotations for the documentation. > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of > LaTex macros in Fred's arsenal? This would make documenting these > properties both easy and consistent across modules. 
Not sure that this is exactly the right approach to the markup; I'll think about this one. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip at mojam.com Sat Jan 20 23:31:52 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 20 Jan 2001 16:31:52 -0600 (CST) Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: <14954.2983.450755.761653@cj42289-a.reston1.va.home.com> References: <14953.63539.629197.232848@beluga.mojam.com> <14954.2983.450755.761653@cj42289-a.reston1.va.home.com> Message-ID: <14954.4568.460875.662560@beluga.mojam.com> Fred> If you can create a list of the known thread safe and known thread Fred> unsafe modules, I'll come up with appropriate annotations for the Fred> documentation. I think that's going to be a significant undertaking, requiring examination of a lot of Python and C code. I'd rather approach it incrementally, which was why I suggested the LaTeX macros. As modules are determined to be safe or unsafe, the appropriate safety macro could just be inserted into the correct lib*.tex file. It would (in my mind) expand to a stock bit of text inserted at a standard place in the file. Skip From tim.one at home.com Sat Jan 20 23:52:09 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 17:52:09 -0500 Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: <14953.63539.629197.232848@beluga.mojam.com> Message-ID: [Skip Montanaro] > ... > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to > the litany of LaTex macros in Fred's arsenal? This would make > documenting these properties both easy and consistent across > modules. When a module is *not* threadsafe, that's usually considered "a bug" in the module. So we should just point out modules that aren't threadsafe by design. Alas, that's A Project. 
From nas at arctrix.com Sat Jan 20 16:59:14 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sat, 20 Jan 2001 07:59:14 -0800 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: ; from tim.one@home.com on Sat, Jan 20, 2001 at 03:10:51PM -0500 References: Message-ID: <20010120075914.B18840@glacier.fnational.com> On Sat, Jan 20, 2001 at 03:10:51PM -0500, Tim Peters wrote: > While Python can't enforce that any user-defined __cmp__ is consistent, I > think it should continue to set a good example in the way it implements its > own comparisons. I think the 2.0 behavior should be fairly easy to restore. I'll leave it up to Guido though since he's "Mr. Comparison" now and I haven't looked at the code since I checked in the coercion patch. Neil From nas at arctrix.com Sat Jan 20 17:03:36 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sat, 20 Jan 2001 08:03:36 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <14954.494.223724.705495@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Sat, Jan 20, 2001 at 04:23:58PM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> <14954.494.223724.705495@cj42289-a.reston1.va.home.com> Message-ID: <20010120080336.C18840@glacier.fnational.com> On Sat, Jan 20, 2001 at 04:23:58PM -0500, Fred L. Drake, Jr. wrote: > > Guido van Rossum writes: > > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > > it uses swap space when it runs out of RAM) not support locking? > > I thought it was Solaris that used available+virtual memory for > /tmp; that was what we ran into at CNRI. (Which doesn't preclude > Linux from doing the same, I just don't recall that we've encountered > that.) 
I don't know of any Linux system that uses a RAM-based /tmp. The Linux implementation of ext2 is so fast it doesn't make any sense. If you have enough memory all the data is stored in the buffer, page, and inode caches anyhow. Neil

From trentm at ActiveState.com Sun Jan 21 00:35:56 2001 From: trentm at ActiveState.com (Trent Mick) Date: Sat, 20 Jan 2001 15:35:56 -0800 Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...? Message-ID: <20010120153556.C18375@ActiveState.com>

... or am I missing something?

With Python 2.0 on Windows 2000, when playing with sys.exit() and sys.argv I get some unexpected results. First here is a simple case that shows what I expect. I run "caller_good.py" which calls "callee_good.py" and prints its return value. "callee_good.py" returns 42 so "42" is printed:

----------------- caller_good.py --------------------
import os
retval = os.system("python callee_good.py")
print "caller: the retval is", retval
-----------------------------------------------------

----------------- callee_good.py --------------------
import sys
sys.exit(42)
-----------------------------------------------------

D:\trentm\tmp>python caller_good.py
caller: the retval is 42

Now here is what I didn't expect. I changed "caller_bad.py" to pass, as an argument, the value that "callee_bad.py" should return.

----------------- caller_bad.py ---------------------
import os
retval = os.system("python callee_bad.py 42")
print "caller: the retval is", retval
-----------------------------------------------------

----------------- callee_bad.py ---------------------
import sys
firstarg = sys.argv[1]
print "callee_bad: firstarg is", firstarg
sys.exit(firstarg)
-----------------------------------------------------

D:\trentm\tmp>python caller_bad.py
callee_bad: firstarg is 42
42                        # <---- where did *this* print come from?
caller: the retval is 1   # <---- and this retval is incorrect

Any ideas?
I have not tried to track this down yet nor have I tried the latest Python-CVS state. Trent -- Trent Mick TrentM at ActiveState.com From moshez at zadka.site.co.il Sun Jan 21 13:37:57 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Sun, 21 Jan 2001 14:37:57 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,NONE,1.1 In-Reply-To: References: Message-ID: <20010121123757.D897BA83E@darjeeling.zadka.site.co.il> Yay! I can change to python-dev manually! (hear sounds of the timbot's teeth grinding) On Sat, 20 Jan 2001, Skip Montanaro wrote: > def check_all(_modname): > exec "import %s" % _modname > verify(hasattr(sys.modules[_modname],"__all__"), > "%s has no __all__ attribute" % _modname) > exec "del %s" % _modname > exec "from %s import *" % _modname > > _keys = locals().keys() .... Wouldn't it be better to use the d = {} exec "foo", d And verify "d" instead? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From guido at digicool.com Sun Jan 21 17:51:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 21 Jan 2001 11:51:45 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Your message of "Sat, 20 Jan 2001 15:10:51 EST." References: Message-ID: <200101211651.LAA25346@cj20424-a.reston1.va.home.com> [Tim, complaining that numerical types are no longer lumped together in default comparisons:] > I've often used list.sort() on a heterogeneous list simply to bring the > elements of the same type next to each other. But as "try number 5" shows, > I can no longer rely on even getting all the lists together. Indeed, > heterogenous list.sort() has become a very bad (biased and slow) > implementation of random.shuffle() . 
> > Under 2.0, the program never prints "oops", because the only violations of > transitivity in 2.0's ordering of builtin types were bugs in the > implementation (none of which show up in this simple test case); 2.0's > .sort() *always* produces > > [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]] > > The base trick in 2.0 was sound: when falling back to the "compare by name > of the type" last resort, treat all numeric types as if they had the same > name. > > While Python can't enforce that any user-defined __cmp__ is consistent, I > think it should continue to set a good example in the way it implements its > own comparisons. I think I can put this behavior back. (I believe that before I reorganized the comparison code, it seemed really tricky to do this, but after refactoring the code, it's quite easy to do.) My only concern is that under the old schele, two different numeric extension types that somehow can't be compared will end up being *equal*. To fix this, I propose that if the names compare equal, as a last resort we compare the type pointers -- this should be consistent too. Here's a patch that stops your test program from reporting failures: *** object.c 2001/01/21 16:25:18 2.112 --- object.c 2001/01/21 16:50:16 *************** *** 522,527 **** --- 522,528 ---- default_3way_compare(PyObject *v, PyObject *w) { int c; + char *vname, *wname; if (v->ob_type == w->ob_type) { /* When comparing these pointers, they must be cast to *************** *** 550,557 **** } /* different type: compare type names */ ! c = strcmp(v->ob_type->tp_name, w->ob_type->tp_name); ! return (c < 0) ? -1 : (c > 0) ? 1 : 0; } #define CHECK_TYPES(o) PyType_HasFeature((o)->ob_type, Py_TPFLAGS_CHECKTYPES) --- 551,571 ---- } /* different type: compare type names */ ! if (v->ob_type->tp_as_number) ! vname = ""; ! else ! vname = v->ob_type->tp_name; ! if (w->ob_type->tp_as_number) ! wname = ""; ! else ! wname = w->ob_type->tp_name; ! c = strcmp(vname, wname); ! if (c < 0) ! 
return -1; ! if (c > 0) ! return 1; ! /* Same type name, or (more likely) incomparable numeric types */ ! return (v->ob_type < w->ob_type) ? -1 : 1; } #define CHECK_TYPES(o) PyType_HasFeature((o)->ob_type, Py_TPFLAGS_CHECKTYPES) Let me know if you agree with this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sun Jan 21 18:00:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 21 Jan 2001 12:00:02 -0500 Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: Your message of "Sat, 20 Jan 2001 14:42:27 CST." <14953.63539.629197.232848@beluga.mojam.com> References: <14953.63539.629197.232848@beluga.mojam.com> Message-ID: <200101211700.MAA25479@cj20424-a.reston1.va.home.com> > A bit late for 2.1alpha1, but it just occurred to me that perhaps there > should be an annotation in the documentation that indicates whether or not a > module is thread-safe. For example, many functions in fileinput rely on a > module global called _state. It strikes me that this module is not likely > to be thread-safe, yet the documentation doesn't appear to mention this, > certainly not in an obvious fashion. > > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of > LaTex macros in Fred's arsenal? This would make documenting these > properties both easy and consistent across modules. It's hard to say whether a *whole module* is threadsafe. E.g. in the fileinput example, there's the clear implication that if you use this in multiple threads, you should instantiate your own FileInput instances, and then you're totally thread-safe. Clearly the semantics of the module-global functions are thread-unsafe though. 
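A small sketch of the "instantiate your own FileInput" approach, written for current Python 3. The function names, the temp files, and the shared results dict are all assumptions made up for this illustration; the only fileinput API relied on is the FileInput class itself. Each thread builds a private instance, so nothing goes through the module-level functions and their shared global state.

```python
import fileinput
import os
import tempfile
import threading


def read_lines(path, results, key):
    # A private FileInput per thread: no use of fileinput.input() or the
    # other module-level helpers that share one global state object.
    with fileinput.FileInput(files=[path]) as stream:
        results[key] = [line.rstrip("\n") for line in stream]


def demo():
    results = {}
    paths = []
    try:
        # Create one small input file per thread.
        for name in ("a", "b"):
            fd, path = tempfile.mkstemp(suffix=".txt", text=True)
            with os.fdopen(fd, "w") as handle:
                handle.write("%s1\n%s2\n" % (name, name))
            paths.append((name, path))
        threads = [
            threading.Thread(target=read_lines, args=(path, results, name))
            for name, path in paths
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        for _, path in paths:
            os.remove(path)
    return results


results = demo()
print(sorted(results.items()))
```

This only shows the usage pattern; it makes no claim about which other standard modules are or are not safe to share between threads.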
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com  Sun Jan 21 19:45:07 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 13:45:07 -0500
Subject: [Python-Dev] test_sax failing (Windows)
Message-ID:

test test_sax crashed -- exceptions.SystemError: 'finally' pops bad exception

Sometimes it crashes (some flavor of memory fault) instead.

Elsewhere?

From nas at arctrix.com  Sun Jan 21 13:28:35 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 21 Jan 2001 04:28:35 -0800
Subject: [Python-Dev] autoconf --enable vs. --with
Message-ID: <20010121042835.A19774@glacier.fnational.com>

I've been working a bit on the build process lately.  I came
across this in the autoconf documentation:

    If a software package has optional compile-time features, the
    user can give `configure' command line options to specify
    whether to compile them.  The options have one of these forms:

        --enable-FEATURE[=ARG]
        --disable-FEATURE

    Some packages require, or can optionally use, other software
    packages which are already installed.  The user can give
    `configure' command line options to specify which such external
    software to use.  The options have one of these forms:

        --with-package[=ARG]
        --without-package

Is it worth fixing the Python configure script to comply with
these definitions?  It looks like with-cycle-gc and maybe
with-pydebug would have to be changed.

  Neil

AC_ARG_ENABLE

From tim.one at home.com  Sun Jan 21 20:44:38 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 14:44:38 -0500
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido, on again lumping numbers together]
> I think I can put this behavior back.  (I believe that before I
> reorganized the comparison code, it seemed really tricky to do this,
> but after refactoring the code, it's quite easy to do.)
I can believe that; and I believe the "bugs" in 2.0 ended up somewhere in or around the bowels of the xxxHalfBinOp-like routines (which were really tricky to my eyes -- the interactions among coercions and comparisons were hard to keep straight). > My only concern is that under the old schele, two different numeric > extension types that somehow can't be compared will end up being > *equal*. To fix this, I propose that if the names compare equal, as a > last resort we compare the type pointers -- this should be consistent > too. Agreed, and sounds fine! Save Barry a little work, though: > ! /* Same type name, or (more likely) incomparable numeric types */ > ! return (v->ob_type < w->ob_type) ? -1 : 1; That's non-std C in a way Insure complains about elsewhere; change to return ((Py_uintptr_t)v->ob_type < (Py_uintptr_t)w->ob_type) ? -1 : 1; if-vendors-stuck-to-the-letter-of-the-c-std-python-wouldn't- compile-at-all-ly y'rs - tim From trentm at ActiveState.com Sun Jan 21 21:01:44 2001 From: trentm at ActiveState.com (Trent Mick) Date: Sun, 21 Jan 2001 12:01:44 -0800 Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...? In-Reply-To: <20010120153556.C18375@ActiveState.com>; from trentm@ActiveState.com on Sat, Jan 20, 2001 at 03:35:56PM -0800 References: <20010120153556.C18375@ActiveState.com> Message-ID: <20010121120144.B28643@ActiveState.com> On Sat, Jan 20, 2001 at 03:35:56PM -0800, Trent Mick wrote: > > ... or am I missing something? Ignore me. RTFM (sys.exit), Trent. Sorry, Trent -- Trent Mick TrentM at ActiveState.com From tim.one at home.com Sun Jan 21 21:13:02 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 15:13:02 -0500 Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...? In-Reply-To: <20010121120144.B28643@ActiveState.com> Message-ID: [Trent, quoting Trent] >> >> ... or am I missing something? [and back to Trent] > Ignore me. RTFM (sys.exit), Trent. Nobody wants to ignore *you*, Trent! 
If it's not the case that you wanted to code sys.exit(int(firstarg)) instead, holler, cuz if that wasn't the problem I'm still baffled. or-if-it-was-it-caught-you-because-sys.exit's-tricks-aren't- really-pythonic-ly y'rs - tim From loewis at informatik.hu-berlin.de Sun Jan 21 22:21:24 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 22:21:24 +0100 (MET) Subject: [Python-Dev] test_sax failing (Windows) Message-ID: <200101212121.WAA16327@pandora.informatik.hu-berlin.de> > Elsewhere? Not for me, on neither Solaris nor Linux. What expat version? Regards, Martin From loewis at informatik.hu-berlin.de Sun Jan 21 22:22:44 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 22:22:44 +0100 (MET) Subject: [Python-Dev] autoconf --enable vs. --with Message-ID: <200101212122.WAA16371@pandora.informatik.hu-berlin.de> > It looks like with-cycle-gc and mybe with-pydebug would have to be > changed. I'm in favour of changing it. Regards, Martin From loewis at informatik.hu-berlin.de Sun Jan 21 22:34:08 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 22:34:08 +0100 (MET) Subject: [Python-Dev] test___all__ fails with no bsddb Message-ID: <200101212134.WAA16446@pandora.informatik.hu-berlin.de> On my Solaris 2.6 installation, with no bsddb module, I get test test___all__ failed -- dbhash has no __all__ attribute This is caused by anydbm importing dbhash first. After that fails, dbhash is still in sys.modules, and the next import of dbhash silently loads an incomplete module. Regards, Martin From tim.one at home.com Sun Jan 21 22:38:11 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 16:38:11 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: <200101212121.WAA16327@pandora.informatik.hu-berlin.de> Message-ID: [Martin von Loewis] > Not for me, on neither Solaris nor Linux. What expat version? 
Tell me how to answer the question, and I'll be happy to (I have no idea
what any of this stuff is or does).

My pyexpat.c (well, my *everything*) is current CVS, pyexpat.c in
particular is revision 2.33.

xmltok.dll and xmlparse.dll were obtained from

    ftp://ftp.jclark.com/pub/xml/expat.zip

for the 2.0 release.

Is any of that relevant?

The tests passed in the wee hours (EST; UTC -0500) this morning.
They began failing after I updated around 1pm EST today.

From thomas at xs4all.net  Sun Jan 21 22:54:05 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sun, 21 Jan 2001 22:54:05 +0100
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 02:44:38PM -0500
References: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>
Message-ID: <20010121225405.M17392@xs4all.nl>

On Sun, Jan 21, 2001 at 02:44:38PM -0500, Tim Peters wrote:

> > ! /* Same type name, or (more likely) incomparable numeric types */
> > ! return (v->ob_type < w->ob_type) ? -1 : 1;

> That's non-std C in a way Insure complains about elsewhere; change to

>     return ((Py_uintptr_t)v->ob_type <
>             (Py_uintptr_t)w->ob_type) ? -1 : 1;

Why is comparing v->ob_type with w->ob_type illegal? They're both pointers
to the same type, aren't they?

> if-vendors-stuck-to-the-letter-of-the-c-std-python-wouldn't-
> compile-at-all-ly y'rs - tim

That's easy to check, gcc has these nice (and from a user's point of view,
fairly useless) options: '-ansi', '-pedantic' and '-pedantic-errors'.
'-ansi' disables some GCC-specific features, -pedantic turns gcc into a
whiney pedantic I'm sure you'd get along with just fine, and
-pedantic-errors turns those whines into errors.

Doing a quick check I see one error I added myself (but haven't committed) in
the continue-inside-try patch (a trailing comma in an enumerator
definition), and one error in configure (it mis-detects the arguments to
setpgrp() in strict-ANSI mode, for some reason.)
I don't see any errors in the core Python. I see an error in the nis module (missing function prototype, and broken system-include file) and a *lot* of errors in linuxaudiodev, but nothing else in the set of modules I can compile. Not bad! Note that this was tested in a current tree. I couldn't find either Guido's 'broken' code or your proposed 'good' code, so I don't know if you checked in a fix yet. If you didn't, don't bother, it's not broken :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis at informatik.hu-berlin.de Sun Jan 21 23:00:47 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Sun, 21 Jan 2001 23:00:47 +0100 (MET) Subject: [Python-Dev] Re: test_sax failing (Windows) In-Reply-To: References: Message-ID: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> > [Martin von Loewis] > > Not for me, on neither Solaris nor Linux. What expat version? > > Tell me how to answer the question, and I'll be happy to (I have no idea > what any of this stuff is or does). > > My pyexpat.c (well, my *everything*) is current CVS, pyexpat.c in > particular is revision 2.33. That's good; mine too. > xmltok.dll and xmlparse.dll were obtained from > > ftp://ftp.jclark.com/pub/xml/expat.zip > > for the 2.0 release. > > Is any of that relevant? That gives some clue, yes. Unfortunately, that URL itself is a symlink that was expat1_1.zip (157936 bytes) at some point, and now is expat1_2.zip (153591 bytes). The files themselves are not self-identifying, it's hard to tell once unzipped... Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either works for me. I never tested 1.95.x (which is also not available from jclark.com). > The tests passed in the wee hours (EST; UTC -0500) this morning. > They began failing after I updated around 1pm EST today. I just merged pyexpat changes from PyXML into Python 2 so that could be the cause. 
However, this very code has been used for some time by
PyXML users; why it crashes for you is a mystery to me.

Any chance of producing a C backtrace?

Regards,
Martin

From tim.one at home.com  Sun Jan 21 23:09:30 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 17:09:30 -0500
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To:
Message-ID:

FYI, under the debug-build Python, running test_sax.py under the debugger
dies like so:

Passed test_attrs_empty
Passed test_attrs_wattr
Passed test_escape_all
Passed test_escape_basic
Passed test_escape_extra
Passed test_expat_attrs_empty
Passed test_expat_attrs_wattr
Passed test_expat_dtdhandler
Passed test_expat_entityresolver
Passed test_expat_file
Traceback (most recent call last):
  File "../lib/test/test_sax.py", line 603, in ?
    confirm(value(), name)
  File "../lib/test/test_sax.py", line 435, in test_expat_incomplete
    parser.parse(StringIO(""))
  File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 42, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\code\python\dist\src\lib\xml\sax\xmlreader.py", line 122, in parse
    self.close()
  File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 91, in close
    self.feed("", isFinal = 1)
  File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 82, in feed
    except expat.error:
SystemError: 'finally' pops bad exception

Running it from a command line instead produces the same output up to but
not including the traceback, and Python crashes with a memory fault then.
Attaching to the process with a debugger at that point shows it trying to do
_Py_Dealloc on an op whose op->ob_type member is NULL.
Here's the call stack at that point: _Py_Dealloc(_object * 0x007af100) line 1304 + 6 bytes insertdict(dictobject * 0x007637ec, _object * 0x007a8270, long -1601350627, _object * 0x1e1eff18 __Py_NoneStruct) line 364 + 48 bytes PyDict_SetItem(_object * 0x007637ec, _object * 0x007a8270, _object * 0x1e1eff18 __Py_NoneStruct) line 498 + 21 bytes PyDict_SetItemString(_object * 0x007637ec, char * 0x1e1d84fc, _object * 0x1e1eff18 __Py_NoneStruct) line 1272 + 17 bytes PySys_SetObject(char * 0x1e1d84fc, _object * 0x1e1eff18 __Py_NoneStruct) line 67 + 17 bytes reset_exc_info(_ts * 0x00760630) line 2207 + 17 bytes eval_code2(PyCodeObject * 0x00993df0, _object * 0x0098794c, _object * 0x00000000, _object * * 0x007a9d28, int 2, _object * * 0x007a9d30, int 1, _object * * 0x009a0b60, int 1) line 2125 + 9 bytes fast_function(_object * 0x009a4f6c, _object * * * 0x0063f5a0, int 4, int 2, int 1) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x00993910, _object * 0x0098794c, _object * 0x00000000, _object * * 0x007a05e8, int 1, _object * * 0x007a05ec, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes fast_function(_object * 0x009a549c, _object * * * 0x0063f738, int 1, int 1, int 0) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x007b35e0, _object * 0x0098110c, _object * 0x00000000, _object * * 0x009beb10, int 2, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes call_eval_code2(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2765 + 57 bytes call_object(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2594 + 17 bytes call_method(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2717 + 17 bytes call_object(_object * 0x007e125c, _object * 0x009beafc, _object * 0x00000000) line 2592 + 17 bytes do_call(_object * 0x007e125c, _object * * * 0x0063f96c, int 2, int 0) line 2915 + 17 bytes eval_code2(PyCodeObject * 0x00991560, _object * 0x0098794c, _object * 0x00000000, _object * * 
0x009bce98, int 2, _object * * 0x009bcea0, int 0, _object * * 0x00000000, int 0) line 1863 + 30 bytes fast_function(_object * 0x009a7dfc, _object * * * 0x0063fb04, int 2, int 2, int 0) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x009f7e00, _object * 0x0076f14c, _object * 0x00000000, _object * * 0x00775904, int 0, _object * * 0x00775904, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes fast_function(_object * 0x009bc8ac, _object * * * 0x0063fc9c, int 0, int 0, int 0) line 2817 + 61 bytes eval_code2(PyCodeObject * 0x009f86d0, _object * 0x0076f14c, _object * 0x0076f14c, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes PyEval_EvalCode(PyCodeObject * 0x009f86d0, _object * 0x0076f14c, _object * 0x0076f14c) line 338 + 29 bytes run_node(_node * 0x007aa740, char * 0x00760dd9, _object * 0x0076f14c, _object * 0x0076f14c) line 919 + 17 bytes run_err_node(_node * 0x007aa740, char * 0x00760dd9, _object * 0x0076f14c, _object * 0x0076f14c) line 907 + 21 bytes PyRun_FileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 257, _object * 0x0076f14c, _object * 0x0076f14c, int 1) line 899 + 21 bytes PyRun_SimpleFileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 1) line 612 + 30 bytes PyRun_AnyFileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 1) line 466 + 17 bytes Py_Main(int 2, char * * 0x00760da0) line 295 + 44 bytes main(int 2, char * * 0x00760da0) line 10 + 13 bytes insertdict is doing Py_DECREF(old_value); reset_exc_info is doing PySys_SetObject("exc_type", frame->f_exc_type); Bet that's as helpful to you as it was to me . 
From thomas at xs4all.net Sun Jan 21 23:13:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 21 Jan 2001 23:13:02 +0100 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <20010121225405.M17392@xs4all.nl>; from thomas@xs4all.net on Sun, Jan 21, 2001 at 10:54:05PM +0100 References: <200101211651.LAA25346@cj20424-a.reston1.va.home.com> <20010121225405.M17392@xs4all.nl> Message-ID: <20010121231302.N17392@xs4all.nl> On Sun, Jan 21, 2001 at 10:54:05PM +0100, Thomas Wouters wrote: > I see an error in the nis module (missing function prototype, and broken > system-include file) and a *lot* of errors in linuxaudiodev The errors in linuxaudiodev are only errors because for some reason, in -ansi -pedantic-errors mode, gcc doesn't define the 'linux' symbol. IMHO, not worth fixing. The nismodule is 'broken' because of this: static nismaplist * nis_maplist (void) { nisresp_maplist *list; char *dom; CLIENT *cl, *clnt_create(); clnt_create() should be declared by the system include files. Anyone have objections to me moving it to pyport.h, inside the '#if 0' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Sun Jan 21 23:28:45 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 17:28:45 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <20010121225405.M17392@xs4all.nl> Message-ID: [Thomas Wouters] > Why is comparing v->ob_type with w->ob_type illegal ? They're > both pointers to the same type, aren't they ? Non-equality comparison of pointers is defined if and only if the pointers are both addresses in the same contiguous structure (think struct or array); an exception is made for a pointer "one beyond the end" of an array, i.e. 
if sometype a[N]; then &a[0] < &a[N] == 1 is guaranteed despite that &a[N] is outside the bounds of a; but &a[0] < &a[N+1] is undefined (which *means* undefined! e.g., it's OK if they compare equal, or if the comparison causes a hardware fault, or ...). > That's easy to check, gcc has these nice (and from a users point of view, > fairly useless) options: '-ansi', '-pedantic' and '-pedantic-errors'. > '-ansi' disables some GCC-specific features, -pedantic turns gcc into a > whiney pedantic I'm sure you'd get along with just fine , and > -pedantic-errors turns those whines into errors. Your faith in gcc is as charming as it is naive : the most interesting cases of undefined behavior can't be checked no-way, no-how at compile-time. That's why Barry keeps talking employers into dumping thousands of dollars into a single Insure++ license. Insure++ actually tags every pointer at runtime with its source, and gripes if non-equality comparisons are done on a pair not derived from the same array or malloc etc. Since Python type objects are individually allocated (not taken from a preallocated contiguous vector), Insure++ should complain about that compare. > ... > Note that this was tested in a current tree. I couldn't find > either Guido's 'broken' code or your proposed 'good' code, so I > don't know if you checked in a fix yet. If you didn't, don't bother, > it's not broken :-) Guido hasn't checked it in yet, but gcc isn't smart enough to detect *this* breakage anyway. From fredrik at effbot.org Mon Jan 22 00:02:10 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Mon, 22 Jan 2001 00:02:10 +0100 Subject: [Python-Dev] more unicode database changes Message-ID: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Just checked in another unicode database patch, which saves another ~60k. On my Windows box, the Unicode tables are now about 200k (down from 600k in 2.0). After this change, Modules/unicodedatabase.[ch] are no longer used. 
Since I'm on a Windows box with MSVC 5.0, I don't really want to try removing them from the official build files. In- stead, I've checked in empty versions of the files. Can anyone help me get rid of all references to them from the build files (and CVS)? PS. btw, if my changes broke the build somewhere, let me know asap! From tim.one at home.com Mon Jan 22 00:07:14 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:07:14 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> Message-ID: [Martin, on ftp://ftp.jclark.com/pub/xml/expat.zip] > ... > That gives some clue, yes. Unfortunately, that URL itself is a symlink > that was expat1_1.zip (157936 bytes) at some point, That's the one I've been using. > and now is expat1_2.zip (153591 bytes). I'm assuming you're recommending that one! Based on that assumption, I've downloaded a new one and will put that in the 2.1a1 Windows release. Scream if that's not what you want. > ... > Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either > works for me. I never tested 1.95.x (which is also not available from > jclark.com). If you do and love it, let me know where to get it and I'll ship that instead. >> The tests passed in the wee hours (EST; UTC -0500) this morning. >> They began failing after I updated around 1pm EST today. > I just merged pyexpat changes from PyXML into Python 2 so that could > be the cause. However, this very code has been used for some time by > PyXML users, why it crashes for you is a mystery to me. Perhaps gc, perhaps uninitialized vars, ..., hard to say. Unfortunately, it's not unusual for flawed code to display different behavior across platforms; or, from the long-term QA perspective, it's *great* that flawed code doesn't always appear to work on all platforms . > Any chance of producing a C backtrace? 
Sent that before; doesn't look like much help; we're seeing a NULL type pointer, but at that stage there's no telling when or where or why it *became* NULL. I'm going to rebuild the world from scratch, and use the new DLLs. You should assume that didn't help unless I say otherwise within 15 minutes. From tim.one at home.com Mon Jan 22 00:09:51 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:09:51 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: [/F] > Just checked in another unicode database patch, which > saves another ~60k. On my Windows box, the Unicode > tables are now about 200k (down from 600k in 2.0). Yay! I take it CNRI wasn't paying you by the byte . > After this change, Modules/unicodedatabase.[ch] are no > longer used. > > Since I'm on a Windows box with MSVC 5.0, I don't really > want to try removing them from the official build files. In- > stead, I've checked in empty versions of the files. That's fine. > Can anyone help me get rid of all references to them from > the build files (and CVS)? > > > > PS. btw, if my changes broke the build somewhere, let me > know asap! I'll take care of the MS project files -- and I was just about to rebuild the world from scratch anyway. From tim.one at home.com Mon Jan 22 00:20:03 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:20:03 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: > After this change, Modules/unicodedatabase.[ch] are no > longer used. Not so: unicodedata.c still #includes unicodedatabase.h. From tim.one at home.com Mon Jan 22 00:53:13 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:53:13 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: [/F] > ... > PS. btw, if my changes broke the build somewhere, let me > know asap! 
The Windows build is fine now and changes checked-in.  You can remove
Modules/unicodedatabase.[ch] from the project without hurting it (although
I imagine the Unixish builds still need to learn about this!).

From tim.one at home.com  Mon Jan 22 01:12:21 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 19:12:21 -0500
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To: <200101212200.XAA16672@pandora.informatik.hu-berlin.de>
Message-ID:

More FYI:

With the new expat1_2.zip (153591 bytes) DLLs, all tests pass on Windows
except for test_sax.  No change in symptoms.

The failure modes for test_sax depend on all of:

+ Whether run in release or debug builds.

+ Whether test_sax.py is run directly or via regrtest.py.

+ Whether I delete all .pyc/.pyo files first, or use precompiled ones.

+ In debug builds, whether the test is started from within the debugger,
  or I start it via cmdline and attach to the process after it crashes
  (with a memory fault).

Here's a new failure mode:

test test_sax crashed -- XMLParserType: no element found: line 1, column 5

So this smells to high heaven of either a nasty gc problem or referencing
uninitialized memory.  Symptoms don't change if I stick

    import gc
    gc.disable()

at the start of test_sax.py.

Barry, can you try running test_sax under Insure?  I've got little chance of
making enough time tonight to figure this out the hard way ...

From nas at arctrix.com  Sun Jan 21 18:28:52 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 21 Jan 2001 09:28:52 -0800
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 07:12:21PM -0500
References: <200101212200.XAA16672@pandora.informatik.hu-berlin.de>
Message-ID: <20010121092852.A24605@glacier.fnational.com>

On Sun, Jan 21, 2001 at 07:12:21PM -0500, Tim Peters wrote:
> So this smells to high heaven of either a nasty gc problem or referencing
> uninitialized memory.
> Symptoms don't change if I stick
>
>     import gc
>     gc.disable()
>
> at the start of test_sax.py.

Can you try it with WITH_CYCLE_GC undefined?

  Neil

From greg at cosc.canterbury.ac.nz  Mon Jan 22 01:25:08 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jan 2001 13:25:08 +1300 (NZDT)
Subject: [Python-Dev] a>b == b
Message-ID: <200101220025.NAA01809@s454.cosc.canterbury.ac.nz>

Suppose I have a class which checks whether it knows
how to do a comparison, and if not, wants to pass it
on to the other operand in case it knows:

  class Foo:

    def __lt__(self, other):
      if I_know_about(other):
        # do the comparison
      else:
        return other.__gt__(self)

If the other operand has a __gt__ method which is
doing similar tricks, infinite recursion could result.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From greg at cosc.canterbury.ac.nz  Mon Jan 22 01:36:51 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jan 2001 13:36:51 +1300 (NZDT)
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To: <200101191848.NAA02765@cj20424-a.reston1.va.home.com>
Message-ID: <200101220036.NAA01813@s454.cosc.canterbury.ac.nz>

Guido:

> I don't understand how these can be not commutative unless they have a
> side effect on the left argument

I think he meant "not reflective". If a<b == floor(a,b) and a>b ==
ceil(a,b), then clearly a<b != b>a.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From mwh21 at cam.ac.uk  Mon Jan 22 01:48:16 2001
From: mwh21 at cam.ac.uk (Michael Hudson)
Date: 22 Jan 2001 00:48:16 +0000
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To: Greg Ewing's message of "Mon, 22 Jan 2001 13:36:51 +1300 (NZDT)"
References: <200101220036.NAA01813@s454.cosc.canterbury.ac.nz>
Message-ID:

Greg Ewing writes:

> Guido:
>
> > I don't understand how these can be not commutative unless they have a
> > side effect on the left argument
>
> I think he meant "not reflective". If a<b == floor(a,b) and a>b ==
> ceil(a,b), then clearly a<b != b>a.

What's floor of two arguments?  In Common Lisp, (floor a b) is the
largest integer n such that (<= n (/ a b)), in Python it's a type
error...

if you meant min(a,b), then I think the programmer who
thinks "min(a,b)" is spelt "a
Message-ID: <200101220052.NAA01817@s454.cosc.canterbury.ac.nz>

> Non-equality comparison of pointers is defined if and only if the pointers
> are both addresses in the same contiguous structure

I'm not sure that the proposed alternative (casting both
pointers to ints and comparing the ints) is any better.
Does the C std define the result of doing that to two
unrelated pointers?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From tim.one at home.com  Mon Jan 22 01:56:16 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 19:56:16 -0500
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To: <20010121092852.A24605@glacier.fnational.com>
Message-ID:

[Neil Schemenauer]
> Can you try it with WITH_CYCLE_GC undefined?

Good idea -- for someone with an infinite amount of free time.  But being
a good sport, I did as you asked with giddy cheer.
Alas, it didn't help (all the same bizarre context-dependent test_sax failure modes). I'm sure I disabled WITH_CYCLE_GC correctly, because "import gc" now fails with ImportError in both release and debug builds. BTW, a refcount-too-low problem is another good candidate. From greg at cosc.canterbury.ac.nz Mon Jan 22 02:00:46 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jan 2001 14:00:46 +1300 (NZDT) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: <200101220100.OAA01820@s454.cosc.canterbury.ac.nz> Michael Hudson : > if you meant min(a,b), Yes, sorry, that's what I meant. Or at least that's what I thought the original poster meant - if he didn't, then I'm confused, too! Anyway, I agree that it's a silly thing to want to make a>b mean, and I'm not all that disappointed that it won't be possible. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Mon Jan 22 02:11:52 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 20:11:52 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101220052.NAA01817@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I'm not sure that the proposed alternative (casting both > pointers to ints and comparing the ints) is any better. > Does the C std define the result of doing that to two > unrelated pointers? C99 guarantees that, if the type exists, casting a pointer to type uintptr_t won't blow up, and also guarantees that comparisons between (at least) ints of the same type won't blow up. Beyond that, we don't care what it returns. Mostly we're trying to eliminate warnings Barry has to wade thru from Insure++ -- same reason we have a "no compiler warnings!" build policy. 
Doing the cast is obviously "better" when viewed through Barry's 4AM eyes. You can find out *why* C has this rule (which was in C89, not new in C99) by reading the C FAQ. From tim.one at home.com Mon Jan 22 02:23:27 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 20:23:27 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: [Michael Hudson] > ... > if you meant min(a,b), then I then think the programmer who > thinks "min(a,b)" is spelt "a deal with (if min has a symbol it's /\, but never mind that). Curiously, in the Icon language, if a is less than b then a < b returns b while b > a returns a. In this way they get the same effect as Python's chained comparisons a < b < c < d via purely binary operators (if a is *not* less than b, a < b in Icon "fails", which is a silent event that causes the expression's context to backtrack -- but we won't go into that here ). Anyway, that accounts for this curious Icon idiom: a <:= b which is short for a := a < b and binds a to max(a, b) (if a is smaller, a < b returns b and the assignment proceeds; but if a is not smaller, a < b fails and that propagates into its context, which here has no other possibilities to backtrack into, so the stmt just ends leaving a alone). "<"-and-">"-are-just-bags-of-pixels-ly y'rs - tim From uche.ogbuji at fourthought.com Mon Jan 22 02:24:46 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Sun, 21 Jan 2001 18:24:46 -0700 Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: Message from Guido van Rossum of "Sun, 21 Jan 2001 12:00:02 EST." <200101211700.MAA25479@cj20424-a.reston1.va.home.com> Message-ID: <200101220124.SAA08868@localhost.localdomain> > > A bit late for 2.1alpha1, but it just occurred to me that perhaps there > > should be an annotation in the documentation that indicates whether or not a > > module is thread-safe. 
For example, many functions in fileinput rely on a > > module global called _state. It strikes me that this module is not likely > > to be thread-safe, yet the documentation doesn't appear to mention this, > > certainly not in an obvious fashion. > > > > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of > > LaTex macros in Fred's arsenal? This would make documenting these > > properties both easy and consistent across modules. > > It's hard to say whether a *whole module* is threadsafe. E.g. in the > fileinput example, there's the clear implication that if you use this > in multiple threads, you should instantiate your own FileInput > instances, and then you're totally thread-safe. Clearly the semantics > of the module-global functions are thread-unsafe though. Perhaps what is needed rather is a prose annotation for thread-safety issues. My TeX is rusty, but in Docbook, with the use of role attributes, one could have, taking your FileInput example The module-global functions are not safe, but if you instantiate your own FileInput instances, they will be totally thread-safe. That way the MT issues could be styled differently on rendering, gathered into separate documentation, stripped by those who don't care, etc. I imagine this is also possible in TeX. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python

From tim.one at home.com Mon Jan 22 02:32:30 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 20:32:30 -0500
Subject: [Python-Dev] a>b == b
Message-ID:

[Greg Ewing]
> Suppose I have a class which checks whether it knows
> how to do a comparison, and if not, wants to pass it
> on to the other operand in case it knows:
>
>     class Foo:
>
>         def __lt__(self, other):
>             if I_know_about(other):
>                 # do the comparison
>             else:
>                 return other.__gt__(self)
>
> If the other operand has a __gt__ method which is
> doing similar tricks, infinite recursion could result.

Does this have something to do with comparisons? That is, wouldn't the same
be true if you coded two methods named "spam" and "eggs" in this way?

    whatever = 0

    class Foo:
        def spam(self, other):
            if whatever:
                return 1
            else:
                return other.eggs(self)

    class Bar:
        def eggs(self, other):
            if whatever:
                return 1
            else:
                return other.spam(self)

    Foo().spam(Bar())  # RuntimeError: Maximum recursion depth exceeded

If that's all there is to it, you got what you asked for.

From greg at cosc.canterbury.ac.nz Mon Jan 22 04:31:41 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jan 2001 16:31:41 +1300 (NZDT)
Subject: [Python-Dev] a>b == b
Message-ID: <200101220331.QAA01833@s454.cosc.canterbury.ac.nz>

Tim Peters :

> Does this have something to do with comparisons? That is, wouldn't the same
> be true if you coded two methods named "spam" and "eggs" in this
> way?

Yes, but Guido hasn't decreed that a.spam(b) and b.eggs(a) are to have a
reflective relationship with each other.

But don't worry - I've belatedly realised that the correct way to do what
I was talking about is to return NotImplemented and let the interpreter
take care of calling the reflected method. So I withdraw my objection.
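
A minimal sketch of the NotImplemented approach mentioned above, written for a current Python 3 interpreter rather than the 2.1 alpha under discussion. The classes Metres and Feet, their fields, and the conversion factor are hypothetical and exist only for illustration. Each method declines operands it does not understand instead of calling the other operand's method directly, so the interpreter decides when to try the reflected method and the mutual recursion cannot occur.

```python
class Metres:
    """Hypothetical class that only knows how to compare with other Metres."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, Metres):
            return self.value < other.value
        # Unknown operand: decline, rather than calling other.__gt__(self).
        return NotImplemented


class Feet:
    """Hypothetical class that understands both Feet and Metres."""

    FEET_PER_METRE = 3.28084  # approximate conversion factor

    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        if isinstance(other, Feet):
            return self.value > other.value
        if isinstance(other, Metres):
            return self.value > other.value * self.FEET_PER_METRE
        return NotImplemented


# Metres.__lt__ declines, so the interpreter tries Feet.__gt__ with swapped operands.
print(Metres(1) < Feet(10))

# When both sides decline, the interpreter raises TypeError instead of recursing.
try:
    Metres(1) < "one metre"
except TypeError as exc:
    print("TypeError:", exc)
```

If both operands return NotImplemented, the comparison ends with a TypeError after at most two method calls.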
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Mon Jan 22 08:54:32 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 02:54:32 -0500 Subject: [Python-Dev] Worse news Message-ID: I still don't have a clue about test_sax, but have stumbled into more failure modes. Most of them seem related to the SystemError ("'finally' pops bad exception"). Around that part of ceval.c, sometimes the v popped off the stack has a NULL type pointer, other times it's a pointer to a damaged PyTuple_Type (for example, with a tp_dealloc field of 0x61, which leads to an illegal instruction exception). The MS debug heap routines fill all newly malloc'ed memory with 0Xcd ("clean landfill"), fill free'ed memory with 0Xdd ("dead landfill"), and *pad* malloc'ed memory with some number of 0xfd bytes on both sides ("no-man's land"). The clean landfill and no-man's land patterns are showing up more often they should "by chance", and especially in high-order bytes. Just more evidence of the obvious: something is really screwed up . I cannot get the subtest that test_sax is calling (test_expat_incomplete) to fail in isolation. Next headache: If I delete all .pyc files from Lib/ and Lib/test/, and then run: python ../lib/test/regrtest.py -x test_sax by hand, all the 98 tests that *should* run on Windows (excluding, of course, test_sax, which is no longer tried) pass. If I immediately run them again (without deleting .pyc) by hand: python ../lib/test/regrtest.py -x test_sax then they again pass. 
However, if I do rt -x test_sax which does exactly the steps (delete .pyc, run regrest excluding test_sax, run regrtest again) via the little MS batch file rt.bat, then on the second time thru regrtest, and 5 times out of 5, it died in test_extcall with an "illegal operation", while executing if (TYPE(c) == DOUBLESTAR) { near the end of symtable_params in compile.c. This is an optimized build, and the debugger has no idea what's in c at this point; to judge from the offending machine instruction and register contents, though, c is a bad pointer. Have not been able to get test_extcall to fail in isolation. Have also been unable to get test_extcall to fail in the debug build. So there's evidence of Deep Rot beyond test_sax, but test_sax remains the only test that fails every time and under both build types. Running regrtest with -r (randomize test order) is also "interesting": first time I tried that, test_cpickle failed (truncated output) as well as test_sax. I doubt anyone has run the tests more often than me over the last week, so I'm not surprised I'm seeing the most problems. However, since *nobody* is seeing anything on Linux, I'd at least like to get *someone* else to run the tests on Windows. While I'm not having any unusual problems with my box, it's certainly possible that I've got a corrupted file or a flaky memory chip etc, or that MSVC is generating bad code for some recent change (although that's unlikely since the debug build generates *really* straightforward code). Deleting my entire PCbuild subtree and refetching it from CVS didn't make any difference. From esr at thyrsus.com Mon Jan 22 09:01:27 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 03:01:27 -0500 Subject: [Python-Dev] autoconf --enable vs. 
--with In-Reply-To: <200101212122.WAA16371@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Sun, Jan 21, 2001 at 10:22:44PM +0100 References: <200101212122.WAA16371@pandora.informatik.hu-berlin.de> Message-ID: <20010122030127.C20804@thyrsus.com> Martin von Loewis : > > It looks like with-cycle-gc and mybe with-pydebug would have to be > > changed. > > I'm in favour of changing it. Likewise. Let's be good neighbors. -- Eric S. Raymond Where rights secured by the Constitution are involved, there can be no rule making or legislation which would abrogate them. -- Miranda vs. Arizona, 384 US 436 p. 491 From loewis at informatik.hu-berlin.de Mon Jan 22 09:26:15 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 22 Jan 2001 09:26:15 +0100 (MET) Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: References: Message-ID: <200101220826.JAA20819@pandora.informatik.hu-berlin.de> > Running it from a command line instead produces the same output up to but > not including the traceback, and Python crashes with a memory fault then. > Attaching to the process with a debugger at that point shows it trying to do > _Py_Dealloc on an op whose op->op_type member is NULL. [...] > Bet that's as helpful to you as it was to me . Well, it was atleast motivating enough to try it out on my Whistler installation. Purify would probably find this rather quickly; the code writes into the 257th element of a 256-elements array. I've committed a fix. Depending on the exact organization of globals, this could have easily gone unnoticed. MSVC packs variables more than gcc does, so the write would overwrite one byte in ErrorObject, which would then not point to a PyObject anymore. 
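
A rough illustration of the failure mode described above, using ctypes so that it can be run safely from Python. The structure layout is an assumption made for the demonstration (a 256-byte table directly followed by a pointer); it is not the actual layout of the pyexpat globals, and the field names are hypothetical. The write stays inside the structure's own memory, so nothing outside it is damaged.

```python
import ctypes


class Globals(ctypes.Structure):
    # Hypothetical layout standing in for two adjacent C globals:
    # a 256-element byte table followed immediately by an object pointer.
    _fields_ = [
        ("table", ctypes.c_ubyte * 256),
        ("error_object", ctypes.c_void_p),
    ]


def overrun_demo():
    g = Globals()
    g.error_object = 0x1000  # pretend this is a valid object address
    before = g.error_object
    # Fill 257 bytes starting at table[0]: one byte more than the table holds.
    ctypes.memset(ctypes.addressof(g), 0xFF, 257)
    return before, g.error_object


before, after = overrun_demo()
print("pointer before:", hex(before))
print("pointer after: ", hex(after))
```

The single extra byte lands in the neighbouring pointer, which afterwards holds a different address. Whether a real build shows the damage depends on what the compiler happens to place after the array, which is consistent with the bug going unnoticed on some platforms.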
Thanks for your patience, Martin From tim.one at home.com Mon Jan 22 10:18:04 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 04:18:04 -0500 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <200101220826.JAA20819@pandora.informatik.hu-berlin.de> Message-ID: [Martin] > Well, it was atleast motivating enough to try it out on my Whistler > installation. Purify would probably find this rather quickly; the code > writes into the 257th element of a 256-elements array. Ah! You shouldn't do that . > I've committed a fix. But you should do that. Thank you! Here's where I am now: ========================================================================= All test_sax failures have gone away (yay!). ========================================================================= Running rt -x test_sax on Windows still blows up in test_extcall on the 2nd pass. It does not blow up: using the debug build; or if test_sax is *not* excluded; or in the 1st pass; or when running text_extcall in isolation; or if the steps rt performs are done by hand ========================================================================= Running rt -r on Windows still sees test_cpickle fail in the first pass (with truncated output), but succeed in the second pass. First-pass failure is always like so (modulo line breaks I'm inserting by hand): test test_cpickle failed -- Tail of expected stdout unseen: 'dumps()\012 loads()\012 ok\012 loads() DATA\012 ok\012 dumps() binary\012 loads() binary\012 ok\012 loads() BINDATA\012 ok\012 dumps() RECURSIVE\012 ok\012' I've also seen it fail at least once when doing the same thing by hand: del ..\lib\*.pyc del ..\lib\test\*.pyc python ../lib/test/regrtest.py -r else-i-would-have-asked-martin-to-look-for-a digit-to-change-in- command.com-ly y'rs - tim From mal at lemburg.com Mon Jan 22 11:19:18 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Mon, 22 Jan 2001 11:19:18 +0100 Subject: [Python-Dev] more unicode database changes References: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: <3A6C0926.D0A004E4@lemburg.com> Fredrik Lundh wrote: > > Just checked in another unicode database patch, which > saves another ~60k. On my Windows box, the Unicode > tables are now about 200k (down from 600k in 2.0). Great work, Fredrik :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 22 11:42:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 11:42:52 +0100 Subject: [Python-Dev] readline and setup.py References: <3A68B5B0.771412F7@lemburg.com> Message-ID: <3A6C0EAC.7D322174@lemburg.com> "M.-A. Lemburg" wrote: > > The new setup.py procedure for Python causes readline not to > be built on my machine. Instead I get a linker error telling > me that termcap is not found. > > Looking at my old Setup file, I have this line: > > readline readline.c \ > -I/usr/include/readline -L/usr/lib/termcap \ > -lreadline -lterm > > I guess, setup.py should be modified to include additional > library search paths -- shouldn't hurt on platforms which > don't need them. Here's a patch which works for me: projects/Python> diff CVS-Python/setup.py Dev-Python/ --- CVS-Python/setup.py Mon Jan 22 11:36:56 2001 +++ Dev-Python/setup.py Mon Jan 22 11:40:15 2001 @@ -216,10 +216,11 @@ class PyBuildExt(build_ext): exts.append( Extension('rgbimg', ['rgbimgmodule.c']) ) # readline if (self.compiler.find_library_file(lib_dirs, 'readline')): exts.append( Extension('readline', ['readline.c'], + library_dirs=['/usr/lib/termcap'], libraries=['readline', 'termcap']) ) # The crypt module is now disabled by default because it breaks builds # on many systems (where -lcrypt is needed), e.g. Linux (I believe). 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 22 11:52:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 11:52:17 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> Message-ID: <3A6C10E1.EF890356@lemburg.com> "M.-A. Lemburg" wrote: > > Why does setup.py stop with an error in case _tkinter cannot > be built (due to an old Tk/Tcl version in my case) ? > > I think the policy in setup.py should be to output warnings, > but continue building the rest of the Python modules. I haven't heard anything from the powers to be... what should the policy be for auto-detected and -configured modules ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Mon Jan 22 13:37:04 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 13:37:04 +0100 Subject: [Python-Dev] _tkinter and setup.py In-Reply-To: <3A6C10E1.EF890356@lemburg.com>; from mal@lemburg.com on Mon, Jan 22, 2001 at 11:52:17AM +0100 References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> Message-ID: <20010122133704.O17392@xs4all.nl> On Mon, Jan 22, 2001 at 11:52:17AM +0100, M.-A. Lemburg wrote: > "M.-A. Lemburg" wrote: > > I think the policy in setup.py should be to output warnings, > > but continue building the rest of the Python modules. > I haven't heard anything from the powers to be... what should the > policy be for auto-detected and -configured modules ? I think Andrew is still working on a way to disable modules from the command line somehow. 
(I think moving setup.py to setup.py.in, and using autoconf --options would be easiest on both developer and user, but that's just me.) I also think everyone agrees with you that a module that can't be build shouldn't stop the entire process in the final release (and possibly the betas) but that it's definately a good way to debug setup.py in the alphas. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tismer at tismer.com Mon Jan 22 14:13:46 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 14:13:46 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: Message-ID: <3A6C320A.37CBB4E5@tismer.com> Maybe I can help. Tim Peters wrote: ... > Here's where I am now: > > ========================================================================= > All test_sax failures have gone away (yay!). > ========================================================================= > Running > > rt -x test_sax > > on Windows still blows up in test_extcall on the 2nd pass. It does not blow > up: > > using the debug build; or > if test_sax is *not* excluded; or > in the 1st pass; or > when running text_extcall in isolation; or > if the steps rt performs are done by hand ... I got problems with XML as well. I'm not using SAX, but plain expat for speed. The following error happens after parsing thousands of small XML files: from_my_log_window=""" \\bned-s1\tismer\pxml\sdf\mdl\DisplayRGB\1 \\bned-s1\tismer\pxml\sdf\mdl\DisplayVideo\1 Traceback (innermost last): File "", line 1, in ? 
File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 149, in getall res.append(p.parse()) File "D:\crml_doc\pxml\clean.py", line 81, in parse self.parsers[0].Parse(self.txt1, 1) File "D:\crml_doc\pxml\clean.py", line 53, in endElementMaster if self.txt2: self.parsers[1].Parse(self.txt2, 1) File "D:\crml_doc\pxml\clean.py", line 46, in startElementOther if name <> "MASTER": UnicodeError: UTF-8 decoding error: invalid data """ The good news: The error is reproducible, happens the same under PythonWin and DOS Python, and I can reduce it to a single XML file. That indicates to me that I am near the reason of the bug, not at late, indirect effects. It also *might* be related to Unicode. I will now try to create a minimized script and XML data that produces the above again. back in an hour - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From thomas at xs4all.net Mon Jan 22 14:52:44 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 14:52:44 +0100 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 05:28:45PM -0500 References: <20010121225405.M17392@xs4all.nl> Message-ID: <20010122145244.Y17295@xs4all.nl> On Sun, Jan 21, 2001 at 05:28:45PM -0500, Tim Peters wrote: > [Thomas Wouters] > > Why is comparing v->ob_type with w->ob_type illegal ? They're > > both pointers to the same type, aren't they ? 
> Non-equality comparison of pointers is defined if and only if the pointers > are both addresses in the same contiguous structure (think struct or array); > an exception is made for a pointer "one beyond the end" of an array, i.e. if > sometype a[N]; > then &a[0] < &a[N] == 1 is guaranteed despite that &a[N] is outside the > bounds of a; but &a[0] < &a[N+1] is undefined (which *means* undefined! > e.g., it's OK if they compare equal, or if the comparison causes a hardware > fault, or ...). Ok, I guess I stand corrected. I was confused by the name of Py_uintptr_t: I thought it was a pointer-to-int, not an int large enough to hold a pointer. I'm also positively appalled by the fact the standard refuses to define sane behaviour for out-of-bounds access on an array, but attaches some weird significance to what pointers are pointing *to*, when comparing the values of those pointers, regardless of what type of object they are stored in. But I guess I don't have to whine about that to you, Tim :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tismer at tismer.com Mon Jan 22 15:03:25 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 15:03:25 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> Message-ID: <3A6C3DAD.522CE623@tismer.com> Christian Tismer wrote: > > Maybe I can help. ... ... > I will now try to create a minimized script and XML data that > produces the above again. > > back in an hour - chris Here we go. The following session produces the mentioned UTF8 error: >>> txt = "" >>> def startelt(name, dic): ... print name, dic ... >>> p=expat.ParserCreate() >>> p.StartElementHandler = startelt >>> p.Parse(txt) Traceback (innermost last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data Behavior depends of the ASCII code. 
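
A small probe in the spirit of the session above, assuming a current Python 3 with its bundled pyexpat rather than the BeOpen Python 2.0 build used in the report. The test document is hypothetical (the original value of txt is not preserved in this archive): it places a single raw byte in the character data of a tiny element and records how the parser reacts.

```python
from xml.parsers import expat


def classify(byte_value):
    """Parse a tiny document containing one raw byte and report the outcome."""
    data = b"<doc>" + bytes([byte_value]) + b"</doc>"
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = lambda text: None
    try:
        parser.Parse(data, True)
    except expat.ExpatError:
        return "not well-formed"
    except UnicodeError:
        return "decoding error"
    return "ok"


# Summarise the outcome for every byte value from 128 to 255.
outcomes = {}
for value in range(128, 256):
    outcomes.setdefault(classify(value), []).append(value)

for outcome, values in sorted(outcomes.items()):
    print(outcome, len(values), "byte values")
```

A lone byte in this range followed by "<" is never valid UTF-8, so no value should be reported as "ok". Which of the two error categories appears depends on the expat version in use.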
From jeremy at alum.mit.edu Mon Jan 22 15:19:34 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 09:19:34 -0500 (EST) Subject: [Python-Dev] Worse news In-Reply-To: References: Message-ID: <14956.16758.68050.257212@localhost.localdomain> Tim, Funny (strange or haha?) that test_extcall is failing since the two pieces of code I've modified most recently are compile.c and the section of ceval.c that handles extended call syntax. I just got through my mail this morning and I'll see what I can reproduce on Linux. As for the test_sax failure, is any of the Python code being executed conditional on platform? The compiler may be generating bad bytecode for a code path that is only executed on Windows. Jeremy From mal at lemburg.com Mon Jan 22 15:27:38 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 15:27:38 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> Message-ID: <3A6C4359.BCB06252@lemburg.com> Christian Tismer wrote: > > Christian Tismer wrote: > > > > Maybe I can help. > > ... > > ... > > I will now try to create a minimized script and XML data that > > produces the above again. > > > > back in an hour - chris > > Here we go. > The following session produces the mentioned UTF8 error: > > >>> txt = "" > >>> def startelt(name, dic): > ... print name, dic > ... > >>> p=expat.ParserCreate() > >>> p.StartElementHandler = startelt > >>> p.Parse(txt) > Traceback (innermost last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > Behavior depends of the ASCII code. > >From code 128 (0200) to 191 (0277) the parser gives an > not well-formed exception, as it should be. > > The codes from 192 to 236, 238-243 produce > "UTF-8 decoding error: invalid data", > the rest gives "not well-formed". > > I would like to know if this happens with your (Tim) modified > version as well. 
I'm using plain vanilla BeOpen Python 2.0 .

This has nothing to do with Python. UTF-8 marks the codes
from 128-191 as illegal prefix. See Objects/unicodeobject.c:

static
char utf8_code_length[256] = {
    /* Map UTF-8 encoded prefix byte to sequence length. zero means
       illegal prefix. see RFC 2279 for details */
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 0, 0
};

Perhaps the parser should catch the UnicodeError and
instead return a not-wellformed exception ?!

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Mon Jan 22 15:38:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 22 Jan 2001 15:38:14 +0100
Subject: [Python-Dev] _tkinter and setup.py
References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl>
Message-ID: <3A6C45D5.9A6FA25C@lemburg.com>

Thomas Wouters wrote:
>
> On Mon, Jan 22, 2001 at 11:52:17AM +0100, M.-A. Lemburg wrote:
> > "M.-A. Lemburg" wrote:
>
> > > I think the policy in setup.py should be to output warnings,
> > > but continue building the rest of the Python modules.
> > > I haven't heard anything from the powers to be... what should the > > policy be for auto-detected and -configured modules ? > > I think Andrew is still working on a way to disable modules from the command > line somehow. (I think moving setup.py to setup.py.in, and using autoconf > --options would be easiest on both developer and user, but that's just me.) This is fairly simple to do: distutils allows great flexibility when it comes to adding user options, e.g. we could have python setup.py --enable-tkinter --disable-readline or more generic python setup.py --enable-package tkinter --disable-package readline The options could then be edited in setup.cfg. > I also think everyone agrees with you that a module that can't be build > shouldn't stop the entire process in the final release (and possibly the > betas) but that it's definately a good way to debug setup.py in the alphas. True... but currently the only way to get Python to compile is to hand-edit setup.py and this is not easy for people with no prior distutils experience. BTW, in my case, setup.py did find the TK-libs for 8.0, but for a beta version -- as a result, _tkinter.c's version #error line triggered and the build failed. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Mon Jan 22 15:38:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 09:38:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,NONE,1.1 In-Reply-To: Your message of "Sun, 21 Jan 2001 14:37:57 +0200." 
<20010121123757.D897BA83E@darjeeling.zadka.site.co.il> References: <20010121123757.D897BA83E@darjeeling.zadka.site.co.il> Message-ID: <200101221438.JAA29303@cj20424-a.reston1.va.home.com> > Wouldn't it be better to use the > > d = {} > exec "foo", d Surely you meant exec "foo" in d --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Mon Jan 22 15:43:42 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 15:43:42 +0100 Subject: [Python-Dev] _tkinter and setup.py In-Reply-To: <3A6C45D5.9A6FA25C@lemburg.com>; from mal@lemburg.com on Mon, Jan 22, 2001 at 03:38:14PM +0100 References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> <3A6C45D5.9A6FA25C@lemburg.com> Message-ID: <20010122154342.B17295@xs4all.nl> On Mon, Jan 22, 2001 at 03:38:14PM +0100, M.-A. Lemburg wrote: > > I think Andrew is still working on a way to disable modules from the command > > line somehow. (I think moving setup.py to setup.py.in, and using autoconf > > --options would be easiest on both developer and user, but that's just me.) > This is fairly simple to do: distutils allows great flexibility > when it comes to adding user options, e.g. we could have > > python setup.py --enable-tkinter --disable-readline > > or more generic > > python setup.py --enable-package tkinter --disable-package readline > > The options could then be edited in setup.cfg. Note that the 'user' only has 'configure' and 'make' to run, so optimally, the options would have to be given to one of those (preferably to 'configure', to keep it similar to 90% of the packages out there.) > but currently the only way to get Python to compile is > to hand-edit setup.py and this is not easy for people with no > prior distutils experience. You only have to edit the 'disabled_module_list' variable... not too hard even if you don't have distutils experience (though you do need some python experience.) 
I don't think its wrong to expect people who compile alpha versions to have at least that much knowledge (though it should be noted in the README somewhere.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis at informatik.hu-berlin.de Mon Jan 22 15:46:39 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 22 Jan 2001 15:46:39 +0100 (MET) Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <3A6C4359.BCB06252@lemburg.com> (mal@lemburg.com) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> <3A6C4359.BCB06252@lemburg.com> Message-ID: <200101221446.PAA05164@pandora.informatik.hu-berlin.de> > This has nothing to do with Python. UTF-8 marks the codes > from 128-191 as illegal prefix. [...] > Perhaps the parser should catch the UnicodeError and > instead return a not-wellformed exception ?! Right on both accounts. If no encoding is specified, and if the document appears not to be UTF-16 in any endianness, an XML processor shall assume it is UTF-8. As Marc-Andre explains, your document is not proper UTF-8, hence the error. The confusing thing is that expat itself does not care about it not being UTF-8; that is only detected when the callback is invoked in pyexpat, and therefore conversion to a Unicode object is attempted. The right solution probably would be to change expat so that it determines correctness of the encoding for each string it gets as part of the wellformedness analysis, and produces illformedness exceptions when an encoding error occurs. Patches are welcome, although they probable should go to sourceforge.net/projects/expat. 
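
The idea of checking the encoding before any callback runs can be sketched in Python. This is only an illustration of the proposal, not a change to expat itself: the function names and the NotWellFormed exception are hypothetical, and the check assumes the document carries no encoding declaration and is therefore taken to be UTF-8.

```python
class NotWellFormed(Exception):
    """Hypothetical stand-in for a parser's ill-formedness error."""


def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that breaks UTF-8, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def check_encoding(data):
    """Reject input that is not valid UTF-8 before it reaches the parser."""
    offset = first_invalid_utf8_offset(data)
    if offset is not None:
        raise NotWellFormed("invalid UTF-8 at byte offset %d" % offset)
    return data


good = b"<a>\xc3\xa4</a>"  # a two-byte UTF-8 sequence inside the element
bad = b"<a>\xe4</a>"       # a lone Latin-1 byte, not valid UTF-8

print(first_invalid_utf8_offset(good))
print(first_invalid_utf8_offset(bad))
```

With a check like this in front of the parser, a badly encoded document is reported as one kind of error at a known offset, instead of surfacing later as a UnicodeError from inside a handler.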
Regards, Martin From jack at oratrix.nl Mon Jan 22 15:57:33 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 22 Jan 2001 15:57:33 +0100 Subject: [Python-Dev] test_sax and site-python Message-ID: <20010122145733.85E51373C95@snelboot.oratrix.nl> I'm not sure whether this is really a bug, but I had the problem that there was something wrong with the xml package I had installed into my Lib/site-python, and this caused test_sax to complain. If the test stuff is expected to test only the core functionality maybe sys.path should be edited so that it only contains directories that are part of the core distribution? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From tismer at tismer.com Mon Jan 22 16:05:24 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 16:05:24 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> <3A6C4359.BCB06252@lemburg.com> Message-ID: <3A6C4C34.4D1252C9@tismer.com> "M.-A. Lemburg" wrote: ... > > The codes from 192 to 236, 238-243 produce > > "UTF-8 decoding error: invalid data", > > the rest gives "not well-formed". > > > > I would like to know if this happens with your (Tim) modified > > version as well. I'm using plain vanilla BeOpen Python 2.0 . > > This has nothing to do with Python. UTF-8 marks the codes > from 128-191 as illegal prefix. See Object/unicodeobject.c: ... Schade. > Perhaps the parser should catch the UnicodeError and > instead return a not-wellformed exception ?! I belive it would be better. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at digicool.com Mon Jan 22 16:06:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:06:06 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.24,2.25 In-Reply-To: Your message of "Sun, 21 Jan 2001 15:34:14 PST." References: Message-ID: <200101221506.KAA29773@cj20424-a.reston1.va.home.com> > Move declaration of 'clnt_create()' NIS function to pyport.h, as it's > supposed to be declared in system include files (with a proper prototype.) > Should be moved to a platform-specific block if anyone finds out which > broken platforms need it :-) [The following is inside #if 0] > + /* From Modules/nismodule.c */ > + CLIENT *clnt_create(); > + Thomas, I'm not sure if this particular declaration belongs in pyport.h, even inside #if 0. CLIENT is declared in a NIS-specific header file that's not included by pyport.h, but which *is* included by nismodule.c. I think you did the right thing to nismodule.c; the pyport.h patch is redundant in my eyes. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jan 22 16:12:49 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 16:12:49 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> <3A6C45D5.9A6FA25C@lemburg.com> <20010122154342.B17295@xs4all.nl> Message-ID: <3A6C4DF1.F71AA631@lemburg.com> Thomas Wouters wrote: > > On Mon, Jan 22, 2001 at 03:38:14PM +0100, M.-A. Lemburg wrote: > > > > I think Andrew is still working on a way to disable modules from the command > > > line somehow. 
(I think moving setup.py to setup.py.in, and using autoconf > > > --options would be easiest on both developer and user, but that's just me.) > > > This is fairly simple to do: distutils allows great flexibility > > when it comes to adding user options, e.g. we could have > > > > python setup.py --enable-tkinter --disable-readline > > > > or more generic > > > > python setup.py --enable-package tkinter --disable-package readline > > > > The options could then be edited in setup.cfg. > > Note that the 'user' only has 'configure' and 'make' to run, so optimally, > the options would have to be given to one of those (preferably to > 'configure', to keep it similar to 90% of the packages out there.) Hmm, but then you'll have to hack autoconf again... (even if only to pass the options to setup.py somehow, e.g. via your proposed setup.cfg.in trick). > > but currently the only way to get Python to compile is > > to hand-edit setup.py and this is not easy for people with no > > prior distutils experience. > > You only have to edit the 'disabled_module_list' variable... not too hard > even if you don't have distutils experience (though you do need some python > experience.) I don't think its wrong to expect people who compile alpha > versions to have at least that much knowledge (though it should be noted in > the README somewhere.) Oops, you're right; must have overlooked that one in setup.py. 
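For readers without distutils experience, here is a minimal sketch of what editing a 'disabled_module_list' amounts to. Only the variable name comes from the thread; the helper function and the candidate module names are hypothetical and this is not the actual setup.py code.

```python
# Hypothetical illustration: only the name 'disabled_module_list' is taken
# from the discussion above; everything else is made up for the sketch.
disabled_module_list = ["_tkinter", "readline"]


def select_modules(candidates, disabled):
    """Return the candidate module names that are not listed as disabled."""
    return [name for name in candidates if name not in disabled]


# Assumed candidate list, purely for demonstration.
candidates = ["_tkinter", "readline", "zlib", "pyexpat"]
to_build = select_modules(candidates, disabled_module_list)
print(to_build)
```

Adding a name to the list simply drops that module from the build, which is why a little Python knowledge is enough here.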
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Mon Jan 22 16:14:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 16:14:02 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.24,2.25 In-Reply-To: <200101221506.KAA29773@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 10:06:06AM -0500 References: <200101221506.KAA29773@cj20424-a.reston1.va.home.com> Message-ID: <20010122161402.D17295@xs4all.nl> On Mon, Jan 22, 2001 at 10:06:06AM -0500, Guido van Rossum wrote: > > Move declaration of 'clnt_create()' NIS function to pyport.h, as it's > > supposed to be declared in system include files (with a proper prototype.) > > Should be moved to a platform-specific block if anyone finds out which > > broken platforms need it :-) > > [The following is inside #if 0] > > + /* From Modules/nismodule.c */ > > + CLIENT *clnt_create(); > > + > > Thomas, I'm not sure if this particular declaration belongs in > pyport.h, even inside #if 0. > > CLIENT is declared in a NIS-specific header file that's not included by > pyport.h, but which *is* included by nismodule.c. > > I think you did the right thing to nismodule.c; the pyport.h patch is > redundant in my eyes. The same goes for most prototypes inside that '#if 0'. I see it more as an easy list to see what prototypes were removed than as proper examples of the prototype. You're right about CLIENT being defined in system-specific include files, I just wasn't worried about it because it was inside an '#if 0' that will never be turned into an '#if 1'. If a specific platform needs that prototype, we'll figure out how to arrange the prototype then :) But if you want me to remove it, that's fine. -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread! From guido at digicool.com Mon Jan 22 16:22:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:22:29 -0500 Subject: [Python-Dev] autoconf --enable vs. --with In-Reply-To: Your message of "Mon, 22 Jan 2001 03:01:27 EST." <20010122030127.C20804@thyrsus.com> References: <200101212122.WAA16371@pandora.informatik.hu-berlin.de> <20010122030127.C20804@thyrsus.com> Message-ID: <200101221522.KAA30287@cj20424-a.reston1.va.home.com> > I've been working a bit on the build process lately. I came > across this in the autoconf documentation: > > > If a software package has optional compile-time features, the > user can give `configure' command line options to specify > whether to compile them. The options have one of these forms: > > --enable-FEATURE[=ARG] > --disable-FEATURE > > Some packages require, or can optionally use, other software > packages which are already installed. The user can give > `configure' command line options to specify which such > external software to use. The options have one of these > forms: > > --with-package[=ARG] > --without-package > > > Is it worth fixing the Python configure script to comply with > these definitions? It looks like with-cycle-gc and maybe > with-pydebug would have to be changed. OK, but please add explicit checks for the old --with[out]-cycle-gc and --with[out]-pydebug flags that cause errors (not just warnings) when these forms are used. It's bad enough that configure doesn't flag typos in such options as errors; if we change the option names, we really owe users who were using the old forms a clear error. (Is this stupid autoconf behavior changeable? Does it also apply to enable/disable?) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Jan 22 16:19:49 2001 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 22 Jan 2001 10:19:49 -0500 (EST) Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: References: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> Message-ID: <14956.20373.104748.573294@cj42289-a.reston1.va.home.com> [Martin, on ftp://ftp.jclark.com/pub/xml/expat.zip] > Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either > works for me. I never tested 1.95.x (which is also not available from > jclark.com). Tim Peters writes: > If you do and love it, let me know where to get it and I'll ship that > instead. I'll recommend not updating to 1.95.1; let's await at least until 1.95.2 is out. These are really just pre-2.0 releases to shake things out. I have been using the current Expat CVS lightly, but need to do more testing before I can be confident in it and our bindings (not yet checked in anywhere; should be in PyXML soon). -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jeremy at alum.mit.edu Mon Jan 22 16:44:41 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 10:44:41 -0500 (EST) Subject: [Python-Dev] Worse news In-Reply-To: References: Message-ID: <14956.21865.943601.735426@localhost.localdomain> On Linux, I am also seeing test_cpickle failures. I have not been able to reproduce failures in test_extcall or test_sax. I ran 'regrtest.py -r -x test_thread test_unicodedata test_signal test_select test_poll' 10 times and test_cpickle failed five times. (I did the peculiar run because excluding those five tests shaves two minutes off the running time of the test suite.) No more time to look into this... Jeremy From jeremy at alum.mit.edu Mon Jan 22 16:26:27 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 10:26:27 -0500 (EST) Subject: [Python-Dev] getcode() function in pyexpat.c Message-ID: <14956.20771.447958.389724@localhost.localdomain> The pyexpat module uses functions named getcode() and call_with_frame() for handlers of some sort.
I can make this much out from the code, but the rest is a bit of a mystery. I was trying to read this code because of the errors Tim is seeing with test_sax on Windows. A few comments to explain this highly stylized and macro-laden code would be appreciated. The module appears to be creating empty code objects and calling them. I say they appear to be empty, because when they are created they don't appear to have anything initialized except name, filename, and firstlineno. getcode(EndNamespaceDecl, 419) (The freevars and cellvars entries are part of the support for nested scopes. They can be safely ignored for the moment.) I simply don't understand what's going on -- and I'm deeply suspicious that it is the source of whatever problems Tim is seeing with test_sax. Jeremy From thomas at xs4all.net Mon Jan 22 16:55:35 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 16:55:35 +0100 Subject: [Python-Dev] 'make distclean' broken. Message-ID: <20010122165535.P17392@xs4all.nl> 'make distclean' seems broken, at least on non-GNU make's: [snip] clobbering subdirectory Modules rm -f *.o python core *~ [@,#]* *.old *.orig *.rej rm -f add2lib hassignal rm -f *.a tags TAGS config.c Makefile.pre rm -f *.so *.sl so_locations make -f ./Makefile.in SUBDIRS="Include Lib Misc Demo" clobber "./Makefile.in", line 134: Need an operator make: fatal errors encountered -- cannot continue *** Error code 1 (ignored) rm -f config.status config.log config.cache config.h Makefile rm -f buildno platform rm -f Modules/Makefile [snip] (This is using FreeBSD's 'make'.) Looking at line 134, I'm not sure why it works with GNU make other than that it avoids complaining about syntax errors it doesn't run into (which could be both bad and good :) or that it avoids complaining about obvious GNU autoconf tricks. But I don't know enough about make to say for sure, nor to fix the above problem. -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread! From guido at digicool.com Mon Jan 22 16:55:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:55:42 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Your message of "Sun, 21 Jan 2001 17:28:45 EST." References: Message-ID: <200101221555.KAA30935@cj20424-a.reston1.va.home.com> > Your faith in gcc is as charming as it is naive : the most > interesting cases of undefined behavior can't be checked no-way, no-how at > compile-time. That's why Barry keeps talking employers into dumping > thousands of dollars into a single Insure++ license. Insure++ actually tags > every pointer at runtime with its source, and gripes if non-equality > comparisons are done on a pair not derived from the same array or malloc > etc. Since Python type objects are individually allocated (not taken from a > preallocated contiguous vector), Insure++ should complain about that > compare. IMHO, *this* *particular* gripe of Insure++ is just a pain in the butt, and I wish there was a way to turn it off in Insure++ without having to fix the code. IMHO, this was included in the standard to allow segmented-memory implementations of C. Think certain DOS or Windows 3.1 memory models where a pointer is a segment plus an offset. This is not current practice even on Palmpilots! The standard may say that such comparisons are undefined, but I don't care about this particular undefinedness, and I'm annoyed by the required patches. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 22 17:02:15 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 11:02:15 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Your message of "Sun, 21 Jan 2001 14:44:38 EST." 
References: Message-ID: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> > > My only concern is that under the old scheme, two different numeric > > extension types that somehow can't be compared will end up being > > *equal*. To fix this, I propose that if the names compare equal, as a > > last resort we compare the type pointers -- this should be consistent > > too. > > Agreed, and sounds fine! Checked in now. While fixing the test_b1 code again, which depends on this behavior, I thought of a refinement: it wouldn't be hard to make None compare smaller than *anything* (including numbers). Is this worth it?

diff -c -r2.113 object.c
*** object.c 2001/01/22 15:59:32 2.113
--- object.c 2001/01/22 16:03:38
***************
*** 550,555 ****
--- 550,561 ----
          PyErr_Clear();
      }
  
+     /* None is smaller than anything */
+     if (v == Py_None)
+         return -1;
+     if (w == Py_None)
+         return 1;
+ 
      /* different type: compare type names */
      if (v->ob_type->tp_as_number)
          vname = "";

--Guido van Rossum (home page: http://www.python.org/~guido/) From mwh21 at cam.ac.uk Mon Jan 22 17:12:47 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: Mon, 22 Jan 2001 16:12:47 +0000 (GMT) Subject: [Python-Dev] Worse news In-Reply-To: <14956.21865.943601.735426@localhost.localdomain> Message-ID: On Mon, 22 Jan 2001, Jeremy Hylton wrote: > On Linux, I am also seeing test_cpickle failures. I have not been > able to reproduce failures in test_extcall or test_sax. Hmm - my machine's done 28 exemplary "make clean; make test" runs this morning. I last updated yesterday afternoon my time (~1700 GMT). Of course, I don't build pyexpat... > No more time to look into this... Don't you just love memory corruption bugs? Cheers, M. From akuchlin at mems-exchange.org Mon Jan 22 17:28:59 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 22 Jan 2001 11:28:59 -0500 Subject: [Python-Dev] Python 2.1 article Message-ID: I've put together an almost-complete first draft of a "What's New in 2.1" article.
The only missing piece is a section on the Nested Scopes PEP, which obviously has to wait for the changes to get checked in. http://www.amk.ca/python/2.1/ ; as usual, nitpicking comments are welcomed. --amk From nas at arctrix.com Mon Jan 22 11:00:43 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 02:00:43 -0800 Subject: [Python-Dev] Worse news In-Reply-To: ; from mwh21@cam.ac.uk on Mon, Jan 22, 2001 at 04:12:47PM +0000 References: <14956.21865.943601.735426@localhost.localdomain> Message-ID: <20010122020043.A25687@glacier.fnational.com> On Mon, Jan 22, 2001 at 04:12:47PM +0000, Michael Hudson wrote: > Don't you just love memory corruption bugs? Great fun. I've played around with efence and debauch on the weekend. I even went as far as merging an updated fmalloc from the XFree source tree into debauch and writing a reporting script in Python. I probably would have caught the pyexpat overrun if I would have used efence with EF_ALIGNMENT=0 and compiled with -fpack-struct. I'll have to try it tonight. Maybe something else will turn up. Neil From guido at digicool.com Mon Jan 22 18:12:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 12:12:29 -0500 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: Your message of "Mon, 22 Jan 2001 16:55:35 +0100."
<20010122165535.P17392@xs4all.nl> References: <20010122165535.P17392@xs4all.nl> Message-ID: <200101221712.MAA00694@cj20424-a.reston1.va.home.com> > 'make distclean' seems broken, at least on non-GNU make's: > > [snip] > clobbering subdirectory Modules > rm -f *.o python core *~ [@,#]* *.old *.orig *.rej > rm -f add2lib hassignal > rm -f *.a tags TAGS config.c Makefile.pre > rm -f *.so *.sl so_locations > make -f ./Makefile.in SUBDIRS="Include Lib Misc Demo" clobber > "./Makefile.in", line 134: Need an operator > make: fatal errors encountered -- cannot continue > *** Error code 1 (ignored) > rm -f config.status config.log config.cache config.h Makefile > rm -f buildno platform > rm -f Modules/Makefile > [snip] > > (This is using FreeBSD's 'make'.) > > Looking at line 134, I'm not sure why it works with GNU make other than that > it avoids complaining about syntax errors it doesn't run into (which could > be both bad and good :) or that it avoids complaining about obvious GNU > autoconf tricks. But I don't know enough about make to say for sure, nor to > fix the above problem. There's one line in Makefile.in that trips over Make (mine also complains about it): @SET_DLLLIBRARY@ Looking at the code in configure.in that generates this macro: AC_SUBST(SET_DLLLIBRARY) LDLIBRARY='' SET_DLLLIBRARY='' . . (and later) . cygwin*) LDLIBRARY='libpython$(VERSION).dll.a' SET_DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))' ;; I don't see why we couldn't change this so that Makefile.in just contains DLLLIBRARY= @DLLLIBRARY@ and then configure.in could be changed to AC_SUBST(DLLLIBRARY) LDLIBRARY='' DLLLIBRARY='' . . (and later) . cygwin*) LDLIBRARY='libpython$(VERSION).dll.a' DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))' ;; Or am I missing something? Does this fix the problem? --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Mon Jan 22 18:21:09 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Mon, 22 Jan 2001 12:21:09 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221602.LAA31103@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 11:02:15AM -0500 References: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> Message-ID: <20010122122109.A14952@thyrsus.com> Guido van Rossum : > While fixing the test_b1 code again, which depends on this behavior, I > thought of a refinement: it wouldn't be hard to make None compare > smaller than *anything* (including numbers). > > Is this worth it? I think so, if only for the sake of well-definedness. -- Eric S. Raymond "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- Benjamin Franklin, Historical Review of Pennsylvania, 1759. From thomas at xs4all.net Mon Jan 22 18:25:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 18:25:30 +0100 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: <200101221712.MAA00694@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 12:12:29PM -0500 References: <20010122165535.P17392@xs4all.nl> <200101221712.MAA00694@cj20424-a.reston1.va.home.com> Message-ID: <20010122182530.E17295@xs4all.nl> On Mon, Jan 22, 2001 at 12:12:29PM -0500, Guido van Rossum wrote: > and then configure.in could be changed to > AC_SUBST(DLLLIBRARY) > LDLIBRARY='' > DLLLIBRARY='' > . > . (and later) > . > cygwin*) > LDLIBRARY='libpython$(VERSION).dll.a' > DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))' > ;; You mean DLLLIBRARY='$(basename $(LDLIBRARY))' But yes, that fixes it. > Or am I missing something? Well, on *that* I'm not sure, that's why I asked :P If things in the Python source boggle me, they are always there for a good reason. Well, maybe just 'almost always', but practically always :) -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread! From nas at arctrix.com Mon Jan 22 11:39:59 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 02:39:59 -0800 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: <200101221712.MAA00694@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 12:12:29PM -0500 References: <20010122165535.P17392@xs4all.nl> <200101221712.MAA00694@cj20424-a.reston1.va.home.com> Message-ID: <20010122023959.A25798@glacier.fnational.com> [Guido on change SET_DLLLIBRARY] > Or am I missing something? I don't think so. My new Makefile uses "FOO = @FOO@" everywhere. SET_CXX is the same way in the current Makefile. Neil From esr at thyrsus.com Mon Jan 22 18:41:59 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 12:41:59 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? Message-ID: <20010122124159.A14999@thyrsus.com> \section{\module{set} --- Basic set algebra for Python} \declaremodule{standard}{set} \modulesynopsis{Basic set algebra operations on sequences.} \moduleauthor{Eric S. Raymond}{esr at thyrsus.com} \sectionauthor{Eric S. Raymond}{esr at thyrsus.com} The \module{set} module defines functions for treating lists and other sequences as mathematical sets, and defines a set class that uses these operations natively and overloads Python's standard operator set. The \module{set} functions work on any sequence type and return lists. The set methods can take a set or any sequence type as an argument. Set or sequence elements may be of any type and may be mutable. Comparisons and membership tests of elements against sequence objects are done using \keyword{in}, and so can be customized by supplying a suitable \method{__getattr__} method for the sequence type. The running time of these functions is O(n**2) in the worst case unless otherwise noted. For cases that can be short-circuited by cardinality comparisons, this has been done. 
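As a rough illustration of why these operations are quadratic, here is a minimal sketch of how list-based set functions can be built purely on \keyword{in}. The function names follow the documentation; the bodies are an assumption for illustration and are not the module's actual source.

```python
# Illustrative sketch only: the bodies below are assumed, not taken from
# the set module under discussion.

def setify(seq):
    """Return a list of seq's elements with duplicates removed, order kept."""
    result = []
    for item in seq:
        if item not in result:  # linear scan per element, so O(n**2) overall
            result.append(item)
    return result


def union(seq1, seq2):
    """All elements of both sequences, without duplicates."""
    return setify(list(seq1) + list(seq2))


def intersection(seq1, seq2):
    """Elements of seq1 that also occur in seq2."""
    return [item for item in setify(seq1) if item in seq2]


def difference(seq1, seq2):
    """Elements of seq1 that do not occur in seq2."""
    return [item for item in setify(seq1) if item not in seq2]
```

Because only equality tests are used, elements may be mutable or unhashable (lists, for instance); a dictionary-based approach would be faster but would require hashable elements.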
\begin{funcdesc}{setify}{list1} Returns a list of the argument sequence's elements with duplicates removed. \end{funcdesc} \begin{funcdesc}{union}{list1, list2} Set union. All elements of both sets or sequences are returned. \end{funcdesc} \begin{funcdesc}{intersection}{list1, list2} Set intersection. All elements common to both sets or sequences are returned. \end{funcdesc} \begin{funcdesc}{difference}{list1, list2} Set difference. All elements of the first set or sequence not present in the second are returned. \end{funcdesc} \begin{funcdesc}{symmetric_difference}{list1, list2} Set symmetric difference. All elements present in one sequence or the other but not in both are returned. \end{funcdesc} \begin{funcdesc}{cartesian}{list1, list2} Returns a list of tuples consisting of all possible pairs of elements from the first and second sequences or sets. \end{funcdesc} \begin{funcdesc}{equality}{list1, list2} Set comparison. Return 1 if the two sets or sequences contain exactly the same elements, 0 otherwise. \end{funcdesc} \begin{funcdesc}{subset}{list1, list2} Set subset test. Return 1 if all elements of the first set or sequence are members of the second, 0 otherwise. \end{funcdesc} \begin{funcdesc}{proper_subset}{list1, list2} Set subset test, excluding equality. Return 1 if the arguments fail a set equality test, and all elements of the first set or sequence are members of the second, 0 otherwise. \end{funcdesc} \begin{funcdesc}{powerset}{list1} Return the set of all subsets of the argument set or sequence. Warning: this produces huge results from small arguments and is O(2**n) in both running time and space requirements; you can readily run yourself out of memory using it. \end{funcdesc} \subsection{set Objects \label{set-objects}} A \class{set} instance uses the \module{set} module functions to implement set semantics on the list it contains, and to support a full set of Python list methods and operators.
Thus, the set methods can take a set or any sequence type as an argument. A set object contains a single data member: \begin{memberdesc}{elements} List containing the elements of the set. \end{memberdesc} Set objects can be treated as mutable sequences; they support the special methods \method{__len__}, \method{__getattr__}, \method{__setattr__}, and \method{__delattr__}. Through \method{__getattr__}, they support the membership test via \keyword{in}. All the standard mutable-sequence methods \method{list}, \method{append}, \method{extend}, \method{count}, \method{index}, \method{insert} (the index argument is ignored), \method{pop}, \method{remove}, \method{reverse}, and \method{sort} are also supported. After method calls that add elements (\method{setattr}, \method{append}, \method{extend}, \method{insert}), the elements of the data member are re-setified, so it is not possible to introduce duplicates. Calling \function{repr()} on a set returns the result of calling \function{repr} on its element list. Calling \function{str()} returns a representation resembling mathematical notation for the set; an open set bracket, followed by a comma-separated list of \function{str()} representations of the elements, followed by a close set bracket. Set objects support the following Python operators: \begin {tableiii}{l|l|l}{code}{Operator}{Function}{Description} \lineiii{|,+}{union}{Union} \lineiii{&}{intersection}{Intersection} \lineiii{-}{difference}{Difference} \lineiii{^}{symmetric_difference}{Symmetric difference} \lineiii{*}{cartesian}{Cartesian product} \lineiii{==}{equality}{Equality test} \lineiii{!=,<>}{}{Inequality test} \lineiii{<}{proper_subset}{Proper-subset test} \lineiii{<=}{subset}{Subset test} \lineiii{>}{}{Proper superset test} \lineiii{>=}{}{Superset test} \end {tableiii} -- Eric S. Raymond Government is actually the worst failure of civilized man.
There has never been a really good one, and even those that are most tolerable are arbitrary, cruel, grasping and unintelligent. -- H. L. Mencken From esr at snark.thyrsus.com Mon Jan 22 19:28:57 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 13:28:57 -0500 Subject: [Python-Dev] I still can't build HTML in a current CVS tree. Message-ID: <200101221828.f0MISvH15121@snark.thyrsus.com> Fred, I still can't build HTML documentation in a current CVS tree -- same complaint about lib/modindex.html being absent. Can we get this fixed before 2.1 ships? -- Eric S. Raymond ...Virtually never are murderers the ordinary, law-abiding people against whom gun bans are aimed. Almost without exception, murderers are extreme aberrants with lifelong histories of crime, substance abuse, psychopathology, mental retardation and/or irrational violence against those around them, as well as other hazardous behavior, e.g., automobile and gun accidents." -- Don B. Kates, writing on statistical patterns in gun crime From fredrik at effbot.org Mon Jan 22 19:33:56 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Mon, 22 Jan 2001 19:33:56 +0100 Subject: [Python-Dev] Python 2.1 article References: Message-ID: <059b01c084a1$e431e490$e46940d5@hagrid> > I've put together an almost-complete first draft of a "What's New in > 2.1" article. The only missing piece is a section on the Nested > Scopes PEP, which obviously has to wait for the changes to get checked > in. what's the current 2.1a1 eta? (pep 226 still says last friday) today? wednesday? this week? this month? Curious /F From mal at lemburg.com Mon Jan 22 19:33:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 19:33:24 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
References: <20010122124159.A14999@thyrsus.com> Message-ID: <3A6C7CF4.F10AA77B@lemburg.com> [LaTeX file] Eric, we are all hackers, but plain LaTeX is not really the right format for a posting to a mailing list... at least not if you really expect feedback ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin at mira.cs.tu-berlin.de Mon Jan 22 19:36:16 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 22 Jan 2001 19:36:16 +0100 Subject: [Python-Dev] getcode() function in pyexpat.c Message-ID: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> > A few comments to explain this highly stylized and macro-laden code > would be appreciated. I probably can't do that before 2.1a1, but I promise to suggest something right afterwards. In general, the macro magic is designed to make the many expat callbacks available to Python. RC_HANDLER (for return code) is the most general template; VOID_HANDLER and INT_HANDLER are common specializations. In the core of RC_HANDLER, there a tuple is built and a Python function is called. The code used to do PyEval_CallObject right inside the macro; the call_with_frame feature is new compared to 2.0. It solves the specific problem of incomprehensible tracebacks. In a typical SAX application, the user code calls expatreader.ExpatParser.parse, which in turn calls self._parser.Parse(data, isFinal) Now, in 2.0, a common problem was a traceback self._parser.Parse(data, isFinal) TypeError: not enough arguments; expected 4, got 2 Everybody assumes a problem in the call to Parse; the real problem is in the call to the callback inside RC_HANDLER, which tried to call a user's function with two arguments that expected four. 
2.1 would improve this slightly on its own, writing self._parser.Parse(data, isFinal) TypeError: characters() takes exactly 4 arguments (2 given) With that code, you get File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 81, in feed self._parser.Parse(data, isFinal) File "pyexpat.c", line 379, in CharacterData TypeError: characters() takes exactly 4 arguments (2 given) So that tells you that it is the CharacterData handler that invokes characters(). You are right that the frame object is not used otherwise; it is just there to make a nice traceback. > I simply don't understand what's going on -- and I'm deeply > suspicious that it is the source of whatever problems Tim is seeing > with test_sax. I thought so, too, at first; it turned out that the problem was elsewhere. Regards, Martin From guido at digicool.com Mon Jan 22 20:04:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 14:04:02 -0500 Subject: [Python-Dev] Python 2.1 article In-Reply-To: Your message of "Mon, 22 Jan 2001 19:33:56 +0100." <059b01c084a1$e431e490$e46940d5@hagrid> References: <059b01c084a1$e431e490$e46940d5@hagrid> Message-ID: <200101221904.OAA01170@cj20424-a.reston1.va.home.com> > what's the current 2.1a1 eta? (pep 226 still > says last friday) You missed my email that I sent out Friday. Tentatively it's going out tonight. No point in updating the PEP each time there's slippage. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 22 20:10:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 14:10:54 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Mon, 22 Jan 2001 12:41:59 EST." 
<20010122124159.A14999@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> Message-ID: <200101221910.OAA01218@cj20424-a.reston1.va.home.com> Eric, There's already a PEP on a set object type, and everybody and their aunt has already implemented a set datatype. If *your* set module is ready for prime time, why not publish it in the Vaults of Parnassus? --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Mon Jan 22 20:29:18 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 14:29:18 -0500 (EST) Subject: [Python-Dev] Re: getcode() function in pyexpat.c In-Reply-To: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> References: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> Message-ID: <14956.35342.724657.865367@localhost.localdomain> >>>>> "MvL" == Martin v Loewis writes: >> I simply don't understand what's going on -- and I'm deeply >> suspicious that it is the source of whatever problems Tim is >> seeing with test_sax. MvL> I thought so, too, at first; it turned out that the problem was MvL> elsewhere. What was the cause of that problem? I didn't see any mail after Tim's middle-of-the-night message "Worse news." Jeremy From tim.one at home.com Mon Jan 22 21:01:59 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:01:59 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > While fixing the test_b1 code again, which depends on this behavior, I > thought of a refinement: it wouldn't be hard to make None compare > smaller than *anything* (including numbers). > > Is this worth it? 
First, an attempt to see what Python did in this morning's CVS turned up an internal error for Jeremy: >>> [None < x for x in (1, 1L, 1j, 1.0, [1], {}, (1,))] name: None, in ?, file '', line 1 locals: {'[1]': 0, 'x': 1} globals: {} Fatal Python error: compiler did not label name as local or global abnormal program termination A simpler way to provoke that: >>> [None < 2 for x in "x"] name: None, in ?, file '', line 1 locals: {'[1]': 0, 'x': 1} globals: {} Fatal Python error: compiler did not label name as local or global Anyway, I think forcing None to be "the smallest" is cute! Inexpensive to do, and while I don't see a compelling *use* for it, I bet it would be least surprising to newbies. +1. From fdrake at acm.org Mon Jan 22 21:08:54 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 22 Jan 2001 15:08:54 -0500 (EST) Subject: [Python-Dev] Re: I still can't build HTML in a current CVS tree. In-Reply-To: <200101221828.f0MISvH15121@snark.thyrsus.com> References: <200101221828.f0MISvH15121@snark.thyrsus.com> Message-ID: <14956.37718.968912.189834@cj42289-a.reston1.va.home.com> Eric S. Raymond writes: > Fred, I still can't build HTML documentation in a current CVS tree -- same > complaint about lib/modindex.html being absent. Can we get this fixed > before 2.1 ships? I'm guessing I've lost a previous email on the topic, or it's buried in my inbox. If this is still a problem after today's checkins, could you please file a bug report and assign it to me? Thanks! -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From tim.one at home.com Mon Jan 22 21:26:15 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:26:15 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221555.KAA30935@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > IMHO, *this* *particular* gripe of Insure++ is just a pain in the > butt, and I wish there was a way to turn it off in Insure++ without > having to fix the code. Maybe there is. Barry? > IMHO, this was included in the standard to allow segmented-memory > implementations of C. Think certain DOS or Windows 3.1 memory models > where a pointer is a segment plus an offset. This is not current > practice even on Palmpilots! I could ask Tom MacDonald (former X3J11 chair), but don't want to bother him. The way these things usually turn out: the committee debated it 100 times over 10 years, but some committee member steadfastly claimed it was important. Since ANSI/ISO committees work via consensus, one implacable objector is enough. WRT pointers, I know that while the C committee did worry about segmented architectures a lot in the past, tagged architectures gave them much thornier problems (the HW tags each "word" with some manner of metadata (such as a busy/free or empty/full bit, or read+write permission bits, or a data type identifier, or a "capability" tag tying into a HW-enforced security architecture, ...), and checks those on each access, and some of the metadata can propagate into a pointer, and the HW can raise faults on pointer comparisons if the metadata doesn't match). While such machines aren't in common use, the US Govt does all sorts of things they don't talk about -- if it's not IBM's representative protecting a 40-year old architecture, it's someone emphatically not from the NSA protecting something they're not at liberty to discuss. Of course Python wants to run there too, even if we never hear about it ... 
> The standard may say that such comparisons are undefined, but I don't > care about this particular undefinedness, and I'm annoyed by the > required patches. Ya, and I'm annoyed that MS stdio corrupts itself -- but they're just clinging to the letter of the std too, and I've learned to live with it gracefully . pointer-ordering-comparisons-should-be-very-rare-anyway-ly y'rs - tim From tim.one at home.com Mon Jan 22 21:55:30 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:55:30 -0500 Subject: [Python-Dev] Worse news In-Reply-To: Message-ID: [Michael Hudson] > Hmm - my machine's done 28 exemplary "make clean; make test" runs this > morning. I last updated yesterday afternoon my time (~1700 GMT). So does mine now. The remaining failures require *unusual* ways of running the test suite (with -r to get test_cpickle to fail, confirmed now by Jeremy under Linux; and in an extremely specialized and seemingly Windows-specific way to get test_extcall to blow up w/ a bad pointer). From tim.one at home.com Mon Jan 22 22:07:27 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 16:07:27 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <14956.16758.68050.257212@localhost.localdomain> Message-ID: [Jeremy Hylton] > Funny (strange or haha?) that test_extcall is failing since the two > pieces of code I've modified most recently are compile.c and the > section of ceval.c that handles extended call syntax. Ya, I knew that, but I avoided wagging a Finger of Shame in your direction because coincidence isn't proof . > ... > As for the test_sax failure, There is no test_sax failure anywhere anymore that I know of (Martin found a dead-wrong array decl in contributed pyexpat.c code and repaired it). 
And I believe my "rt -x test_sax" failure in test_extcall almost certainly
has nothing to do with test_sax -- far more likely the connection to
test_sax is an accident, and that if I spend umpteen hours trying other
things at random I'll provoke the same memory accident leading to a bad
pointer via excluding some other test. I just picked test_sax because that
*was* broken and I wanted to get thru the rest of the tests.

BTW, delighted(?) to hear that test_cpickle fails for you too! I'm sure
test_extcall is going to blow up for other people eventually too -- but it
is sooooo hard to provoke even for me. I've dropped the effort pending news
from someone running Insure++ or efence or whatever.

From guido at digicool.com Mon Jan 22 22:18:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 22 Jan 2001 16:18:26 -0500
Subject: [Python-Dev] Worse news
In-Reply-To: Your message of "Mon, 22 Jan 2001 16:07:27 EST."
References:
Message-ID: <200101222118.QAA28305@cj20424-a.reston1.va.home.com>

[Tim]
> So does mine now. The remaining failures require *unusual* ways of running
> the test suite (with -r to get test_cpickle to fail, confirmed now by Jeremy
> under Linux;

[and later]
> BTW, delighted(?) to hear that test_cpickle fails for you too!

This (test_cpickle) is a red herring -- it's a shallow failure in the
test suite. test_cpickle imports test_pickle, but test_pickle first
outputs the test output from testing pickle -- unless test_pickle has
been run before!

This succeeds:

  ./python Lib/test/regrtest.py test_cpickle test_pickle

and this fails:

  ./python Lib/test/regrtest.py test_pickle test_cpickle

Use regrtest.py -v to find out why. :-)

I'm not sure how to restructure this, but it's not of the same quality
as test_extcall or test_sax failing. Neither of those has failed for
me on Linux during hours of testing. However on Windows I get an
occasional appfail dialog box when using rt.bat.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nas at arctrix.com Mon Jan 22 15:44:00 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon, 22 Jan 2001 06:44:00 -0800
Subject: [Python-Dev] Worse news
In-Reply-To: ; from tim.one@home.com on Mon, Jan 22, 2001 at 04:07:27PM -0500
References: <14956.16758.68050.257212@localhost.localdomain>
Message-ID: <20010122064400.A26543@glacier.fnational.com>

On Mon, Jan 22, 2001 at 04:07:27PM -0500, Tim Peters wrote:
> I've dropped the effort pending news from someone running
> Insure++ or efence or whatever.

efence to the rescue! I compiled with -fstruct-pack and used
EF_ALIGNMENT=0 and now I can trigger a core dump by running
test_extcall. More news coming...

  Neil

From tim.one at home.com Mon Jan 22 22:41:08 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 22 Jan 2001 16:41:08 -0500
Subject: [Python-Dev] test_sax and site-python
In-Reply-To: <20010122145733.85E51373C95@snelboot.oratrix.nl>
Message-ID:

[Jack Jansen]
> I'm not sure whether this is really a bug, but I had the problem
> that there was something wrong with the xml package I had
> installed into my Lib/site-python, and this caused test_sax to
> complain.
>
> If the test stuff is expected to test only the core functionality
> maybe sys.path should be edited so that it only contains directories
> that are part of the core distribution?

AFAIK, xml *is* considered part of the core now, and has been since 2.0 was
released. The wisdom of that decision is debatable with hindsight, but
AFAICT xml is in the same boat as, say, zlib now: not builtin, and requires
3rd-party code to work, but part of the core all the same. The Windows
installer comes w/ the necessary xml (and zlib) pieces, and I suppose the
Mac Python package also should.
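
[Editorial sketch, not part of the thread.] Jack's suggestion above, editing
sys.path so that only core directories remain, might look roughly like the
following. The helper name core_only_path and the default list of excluded
directory names are assumptions made for illustration; nothing like this is
proposed as actual regrtest behaviour in the messages, and it is written for
a current Python rather than the 2.1 tree under discussion.

```python
import os


def core_only_path(path, excluded=("site-python", "site-packages")):
    """Return the entries of `path` that have no component named in `excluded`.

    Hypothetical helper: the names in `excluded` are an assumption about
    where third-party packages live, not something the thread specifies.
    """
    kept = []
    for entry in path:
        # Split the normalized entry into its directory components.
        parts = os.path.normpath(entry).split(os.sep)
        if not any(part in excluded for part in parts):
            kept.append(entry)
    return kept


if __name__ == "__main__":
    import sys
    # Show what would survive the filter, without modifying sys.path.
    for entry in core_only_path(sys.path):
        print(entry)
```

A test driver could assign the result back with sys.path[:] = core_only_path(sys.path)
before importing the tests; whether that is worth doing is exactly the
question raised later in the thread.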

From nas at arctrix.com Mon Jan 22 16:00:57 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon, 22 Jan 2001 07:00:57 -0800
Subject: [Python-Dev] Worse news
In-Reply-To: ; from tim.one@home.com on Mon, Jan 22, 2001 at 04:07:27PM -0500
References: <14956.16758.68050.257212@localhost.localdomain>
Message-ID: <20010122070057.A26575@glacier.fnational.com>

Perhaps this will help someone track down the bug:

[running test_extcall...]
unbound method method() must be called with instance as first argument
unbound method method() must be called with instance as first argument

Program received signal SIGSEGV, Segmentation fault.
symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330
4330            if (TYPE(c) == DOUBLESTAR) {
(gdb) l
4325                    symtable_add_def(st, STR(CHILD(n, i)),
4326                                     DEF_PARAM | DEF_STAR);
4327                    i += 2;
4328                    c = CHILD(n, i);
4329            }
4330            if (TYPE(c) == DOUBLESTAR) {
4331                    i++;
4332                    symtable_add_def(st, STR(CHILD(n, i)),
4333                                     DEF_PARAM | DEF_DOUBLESTAR);
4334            }
(gdb) p c
$3 = (node *) 0x42a43fff
(gdb) p *c
$4 = {n_type = 0, n_str = 0x0, n_lineno = 0, n_nchildren = 0, n_child = 0x0}
(gdb) p n
$5 = (node *) 0x42a3ffd7
(gdb) p *n
$6 = {n_type = 261, n_str = 0x0, n_lineno = 1, n_nchildren = 2,
  n_child = 0x42a43fc3}
(gdb) bt 10
#0  symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330
#1  0x8060126 in symtable_funcdef (st=0x429bafd0, n=0x42a23feb)
    at Python/compile.c:4245
#2  0x805fd29 in symtable_node (st=0x429bafd0, n=0x429b0fc3)
    at Python/compile.c:4128
#3  0x80600da in symtable_node (st=0x429bafd0, n=0x4290cfeb)
    at Python/compile.c:4232
#4  0x805f443 in symtable_build (c=0xbffff5c8, n=0x4290cfeb)
    at Python/compile.c:3816
#5  0x805f130 in jcompile (n=0x4290cfeb, filename=0x80a040f "", base=0x0)
    at Python/compile.c:3720
#6  0x805f0c2 in PyNode_Compile (n=0x4290cfeb, filename=0x80a040f "")
    at Python/compile.c:3699
#7  0x8069adf in run_node (n=0x4290cfeb, filename=0x80a040f "",
    globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:915
#8  0x8069ac0 in run_err_node (n=0x4290cfeb, filename=0x80a040f "",
    globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:907
#9  0x8069a30 in PyRun_String (
    str=0x429f9fd1 "def zv(*v): print \"ok zv\", a, b, d, e, v, k",
    start=257, globals=0x40644fe0, locals=0x40644fe0)
    at Python/pythonrun.c:881
(More stack frames follow...)

From thomas at xs4all.net Mon Jan 22 23:13:29 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 22 Jan 2001 23:13:29 +0100
Subject: [Python-Dev] Worse news
In-Reply-To: <20010122070057.A26575@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 22, 2001 at 07:00:57AM -0800
References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com>
Message-ID: <20010122231329.A27785@xs4all.nl>

On Mon, Jan 22, 2001 at 07:00:57AM -0800, Neil Schemenauer wrote:
> Perhaps this will help someone track down the bug:

> [running test_extcall...]
> unbound method method() must be called with instance as first argument
> unbound method method() must be called with instance as first argument
>
> Program received signal SIGSEGV, Segmentation fault.
> symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330
> 4330            if (TYPE(c) == DOUBLESTAR) {
> (gdb) l
> 4325                    symtable_add_def(st, STR(CHILD(n, i)),
> 4326                                     DEF_PARAM | DEF_STAR);
> 4327                    i += 2;
> 4328                    c = CHILD(n, i);
> 4329            }
> 4330            if (TYPE(c) == DOUBLESTAR) {
> 4331                    i++;
> 4332                    symtable_add_def(st, STR(CHILD(n, i)),
> 4333                                     DEF_PARAM | DEF_DOUBLESTAR);
> 4334            }
> (gdb) p c
> $3 = (node *) 0x42a43fff
> (gdb) p *c
> $4 = {n_type = 0, n_str = 0x0, n_lineno = 0, n_nchildren = 0, n_child = 0x0}
> (gdb) p n
> $5 = (node *) 0x42a3ffd7
> (gdb) p *n
> $6 = {n_type = 261, n_str = 0x0, n_lineno = 1, n_nchildren = 2,
>   n_child = 0x42a43fc3}

n_child is 0x42a43fc3. That's n_child[0]. 0x42a43fff is the child being
handled now. That would be n_child[3] (0x42a43fff - 0x42a43fc3 == 60, a
struct node is 20 bytes.)
But n_children is 2, so it's an off-by-two error somewhere -- and look,
there's an 'i += 2' right above it! It *looks* like this code will blow up
whenever you use '*eggs' without '**spam' in a function definition. That's a
fairly wild guess, but it's worth a try. Try this patch:

Index: Python/compile.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/compile.c,v
retrieving revision 2.148
diff -c -c -r2.148 compile.c
*** Python/compile.c    2001/01/22 04:35:57     2.148
--- Python/compile.c    2001/01/22 22:12:31
***************
*** 4324,4329 ****
--- 4324,4331 ----
                i++;
                symtable_add_def(st, STR(CHILD(n, i)),
                                 DEF_PARAM | DEF_STAR);
+               if (NCH(n) <= i+2)
+                       return;
                i += 2;
                c = CHILD(n, i);
        }

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From esr at thyrsus.com Mon Jan 22 21:13:09 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 22 Jan 2001 15:13:09 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <200101221910.OAA01218@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 02:10:54PM -0500
References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com>
Message-ID: <20010122151309.C15236@thyrsus.com>

Guido van Rossum :
> There's already a PEP on a set object type, and everybody and their
> aunt has already implemented a set datatype.

I've just read the PEP. Greg's proposal has a couple of problems. The
biggest one is that the interface design isn't very Pythonic -- it's
formally adequate, but doesn't exploit the extent to which sets
naturally have common semantics with existing Python sequence types.
This is bad; it means that a lot of code that could otherwise ignore
the difference between lists and sets would have to be specialized
one way or the other for no good reason.
The only other set module I can find in the Vaults or anywhere else is kjBuckets (which I knew about before). Looks like a good design, but complicated -- and requires installation of an extension. > If *your* set module is ready for prime time, why not publish it in > the Vaults of Parnassus? I suppose that's what I'll do if you don't bless it for the standard library. But here are the reasons I suggest you should do so: 1. It supports a set of operations that are both often useful and fiddly to get right, thus enhancing the "batteries are included" effect. (I used its ancestor for representing seen-message numbers in a specialized mailreader, for example.) 2. It's simple for application programmers to use. No extension module to integrate. 3. It's unsurprising. My set objects behave almost exactly like other mutable sequences, with all the same built-in methods working, except for the fact that you can't introduce duplicates with the mutators. 4. It's already completely documented in a form suitable for the library. 5. It's simple enough not to cause you maintainance hassles down the road, and even if it did the maintainer is unlikely to disappear :-). -- Eric S. Raymond The United States is in no way founded upon the Christian religion -- George Washington & John Adams, in a diplomatic message to Malta. From guido at digicool.com Mon Jan 22 23:29:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 17:29:26 -0500 Subject: [Python-Dev] test_sax and site-python In-Reply-To: Your message of "Mon, 22 Jan 2001 16:41:08 EST." References: Message-ID: <200101222229.RAA28667@cj20424-a.reston1.va.home.com> > [Jack Jansen] > > I'm not sure whether this is really a bug, but I had the problem > > that there was something wrong with the xml package I had > > installed into my Lib/site-python, and this caused test_sax to > > complain. 
> > > > If the test stuff is expected to test only the core functionality > > maybe sys.path should be edited so that it only contains directories > > that are part of the core distribution? > [Tim] > AFAIK, xml *is* considered part of the core now, and has been since 2.0 was > released. The wisdom of that decision is debatable with hindsight, but > AFAICT xml is in the same boat as, say, zlib now: not builtin, and requires > 3rd-party code to work, but part of the core all the same. The Windows > installer comes w/ the necessary xml (and zlib) pieces, and I suppose the > Mac Python package also should. Yes, but Jack was talking about a non-std xml package in site-python... I agree that this shouldn't be picked up. But is it worth taking draconian measures to avoid this? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Mon Jan 22 23:35:08 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 17:35:08 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <200101222118.QAA28305@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This (test_cpickle) is a red herring -- it's a shallow failure in the > test suite. Fixed now -- thanks! Please note that Neil got text_extcall to fail in exactly the same place (see his recent Python-Dev) mail. That's the only remaining failure I know of. > ... > However on Windows I get an occasional appfail dialog box when > using rt.bat. I don't believe I've ever seen one of those ("appfail" rings no bells), and rt has never acted strangely for me. Your DOS-box properties may be screwed up: use Start -> Find -> Files or Folders ...; set "Look in" to C:; enter *.pif in the "Named:" box; click Find. You'll probably get a dozen hits. One of them will correspond to the method you use to open a DOS box (which I don't know). Right-click on that one and select Properties. On the Memory tab of the dialog that pops up, the four dropdown lists should have "Auto" selected. 
"Uses HMA" should be checked. Hmm ... looks like "Protected" *should* be checked but mine isn't ... oh, this goes on and on. I don't even know which version of Windows you're using here! How about I look at it next time I'm at your house ... From greg at cosc.canterbury.ac.nz Mon Jan 22 23:50:07 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Jan 2001 11:50:07 +1300 (NZDT) Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl> Message-ID: <200101222250.LAA01929@s454.cosc.canterbury.ac.nz> > 4330 if (TYPE(c) == DOUBLESTAR) { > 4325 symtable_add_def(st, STR(CHILD(n, i)), > 4326 DEF_PARAM | DEF_STAR); Shouldn't line 4330 say if (TYPE(c) == STAR) ? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From thomas at xs4all.net Mon Jan 22 23:56:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 23:56:02 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <200101222250.LAA01929@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Tue, Jan 23, 2001 at 11:50:07AM +1300 References: <20010122231329.A27785@xs4all.nl> <200101222250.LAA01929@s454.cosc.canterbury.ac.nz> Message-ID: <20010122235602.B27785@xs4all.nl> On Tue, Jan 23, 2001 at 11:50:07AM +1300, Greg Ewing wrote: > > 4330 if (TYPE(c) == DOUBLESTAR) { > > 4325 symtable_add_def(st, STR(CHILD(n, i)), > > 4326 DEF_PARAM | DEF_STAR); > Shouldn't line 4330 say if (TYPE(c) == STAR) ? No, that's line 4323. You can't have doublestar without having star, and star should precede doublestar. (Grammar should enforce that.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From paulp at ActiveState.com Tue Jan 23 00:02:07 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 22 Jan 2001 15:02:07 -0800 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> Message-ID: <3A6CBBEF.4732BFF2@ActiveState.com> Guido van Rossum wrote: > > .... > > Yes, wow! > > .... I apologize but I'm not clear on my responsibilities here, if any. I wrote a PEP for online help. I submitted a partial implementation. Ping wrote a full implementation that basically supercedes mine. There are various ideas for improving it, but I think that we agree that the core is solid. Several people have said that it should be moved into the core library. Nobody has said that it shouldn't. Whose move is it? What's next? Paul Prescod From fredrik at effbot.org Tue Jan 23 00:08:40 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 23 Jan 2001 00:08:40 +0100 Subject: [Python-Dev] test___all__ fails if bsddb not available Message-ID: <079a01c084c8$43023e40$e46940d5@hagrid> test___all__ test test___all__ failed -- dbhash has no __all__ attribute maybe this test shouldn't depend on optional modules? From nas at arctrix.com Mon Jan 22 17:24:34 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 08:24:34 -0800 Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 22, 2001 at 11:13:29PM +0100 References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> <20010122231329.A27785@xs4all.nl> Message-ID: <20010122082433.B26765@glacier.fnational.com> On Mon, Jan 22, 2001 at 11:13:29PM +0100, Thomas Wouters wrote: > That's a fairly wild guess, but it's worth a try. Try this > patch: [...] Works for me. 
Neil From greg at cosc.canterbury.ac.nz Tue Jan 23 00:21:14 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Jan 2001 12:21:14 +1300 (NZDT) Subject: [Python-Dev] Worse news In-Reply-To: <20010122235602.B27785@xs4all.nl> Message-ID: <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> Thomas Wouters : > You can't have doublestar without having star What?!? You could in 1.5.2. Has that changed? Anyway, it just looked a bit odd that it seemed to be testing for DOUBLESTAR and then adding a DEF_STAR thing to the symtab. But I guess I should shut up until I've seen all of the code. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From thomas at xs4all.net Tue Jan 23 00:26:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 23 Jan 2001 00:26:02 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <200101222321.MAA01957@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Tue, Jan 23, 2001 at 12:21:14PM +1300 References: <20010122235602.B27785@xs4all.nl> <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> Message-ID: <20010123002602.C27785@xs4all.nl> On Tue, Jan 23, 2001 at 12:21:14PM +1300, Greg Ewing wrote: > Thomas Wouters : > > You can't have doublestar without having star > What?!? You could in 1.5.2. Has that changed? Sorry, my bad, I'm wrong. (I just tested this.) I could swear it was that way, but it's 0:25 right now, after a night with about 2 hours decent sleep, so ignore my delusions :) > Anyway, it just looked a bit odd that it seemed to be testing > for DOUBLESTAR and then adding a DEF_STAR thing to the symtab. > But I guess I should shut up until I've seen all of the code. No, it's not doing that. It's adding the symbol name to the symtab, with DEF_DOUBLESTAR as one of its flags. 
Not sure what the flag does, but I could guess. (But see the above mentioned delusions as to why I'm not doing that out loud anymore :-) The 'if' in front of it adds the symbol to the symtab with DEF_STAR as a flag, in the case of 'STAR' (rather than DOUBLESTAR). Really. go check :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue Jan 23 00:31:03 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 23 Jan 2001 00:31:03 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <20010123002602.C27785@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 23, 2001 at 12:26:02AM +0100 References: <20010122235602.B27785@xs4all.nl> <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> <20010123002602.C27785@xs4all.nl> Message-ID: <20010123003103.D27785@xs4all.nl> On Tue, Jan 23, 2001 at 12:26:02AM +0100, Thomas Wouters wrote: > On Tue, Jan 23, 2001 at 12:21:14PM +1300, Greg Ewing wrote: > > Thomas Wouters : > > > > You can't have doublestar without having star > > > What?!? You could in 1.5.2. Has that changed? > Sorry, my bad, I'm wrong. (I just tested this.) I could swear it was that > way, but it's 0:25 right now, after a night with about 2 hours decent sleep, > so ignore my delusions :) Ah, yeah, what I meant to *think* was: you can't have *spam *after* **eggs: >>> def foo(x, **kwarg, *arg) File "", line 1 def foo(x, **kwarg, *arg) ^ SyntaxError: invalid syntax So the logic of the latter part of the function seems okay (after the little patch I posted before.) Jeremy should give his expert opinion before it goes in, though, since it's his code :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
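
[Editorial sketch, not part of the thread.] The ordering rule Thomas settles
on above can be checked directly by asking the compiler. This is a minimal
sketch written for a current Python 3 interpreter (an assumption; the thread
concerns the 2.1 tree), and the helper name is_valid_def is made up for
illustration.

```python
def is_valid_def(source):
    """Return True if `source` compiles, False if it is a SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# '*arg' followed by '**kwarg' is the accepted order.
print(is_valid_def("def foo(x, *arg, **kwarg): pass"))
# '**kwarg' on its own is fine; no '*arg' is required.
print(is_valid_def("def foo(x, **kwarg): pass"))
# '*arg' after '**kwarg' is rejected, as in the session quoted above.
print(is_valid_def("def foo(x, **kwarg, *arg): pass"))
```

The first two definitions compile and the third does not, which matches the
corrected claim: doublestar may appear alone, but nothing may follow it.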
From guido at digicool.com Tue Jan 23 00:36:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 18:36:17 -0500 Subject: [Python-Dev] test___all__ fails if bsddb not available In-Reply-To: Your message of "Tue, 23 Jan 2001 00:08:40 +0100." <079a01c084c8$43023e40$e46940d5@hagrid> References: <079a01c084c8$43023e40$e46940d5@hagrid> Message-ID: <200101222336.SAA30480@cj20424-a.reston1.va.home.com> > test test___all__ failed -- dbhash has no __all__ attribute > > maybe this test shouldn't depend on optional modules? Fixed -- I just skip dbhash if bsddb can't be imported. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Tue Jan 23 01:38:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 19:38:28 -0500 (EST) Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl> References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> <20010122231329.A27785@xs4all.nl> Message-ID: <14956.53892.651549.493268@localhost.localdomain> Thomas, Your patch has the right diagnosis, although I would write it a tad differently. NCH(n) <= i + 2 should be NCH(n) < i + 2, because CHILD(n, NCH(i)) is not valid. I'll check it in. Jeremy From jeremy at alum.mit.edu Tue Jan 23 02:23:56 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 20:23:56 -0500 (EST) Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: <20010119232323.70B03116392@oratrix.oratrix.nl> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> Message-ID: <14956.56620.706531.647341@localhost.localdomain> >>>>> "JJ" == Jack Jansen writes: JJ> Recently, Guido van Rossum said: >> > I get the impression that I'm currently seeing a non-NULL third >> > argument in my (C) methods even though the method is called >> > without keyword arguments. 
>> >> > Is this new semantics that I missed the discussion about, or is >> > this a bug? >> >> [...] Do you really need the NULL? JJ> The places that I know I was counting on the NULL now have "if ( JJ> kw && PyObject_IsTrue(kw))", so I'll just have to hope there JJ> aren't any more lingering in there. Guido, Does your query ("Do you really need the NULL?") mean that you don't care whether the argument is NULL or an empty dictionary? I could change the code to do either for 2.1a2, if you have a preference. Jeremy From guido at digicool.com Tue Jan 23 02:33:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 20:33:20 -0500 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Your message of "Mon, 22 Jan 2001 20:23:56 EST." <14956.56620.706531.647341@localhost.localdomain> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> Message-ID: <200101230133.UAA04378@cj20424-a.reston1.va.home.com> > Guido, > > Does your query ("Do you really need the NULL?") mean that you don't > care whether the argument is NULL or an empty dictionary? I could > change the code to do either for 2.1a2, if you have a preference. > > Jeremy Robust code IMO should treat NULL and {} the same. But since traditionally we passed NULL, it's better to pass NULL rather than {}. I believe that's the status quo now, right? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Tue Jan 23 02:54:53 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 20:54:53 -0500 (EST) Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: <200101230133.UAA04378@cj20424-a.reston1.va.home.com> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> <200101230133.UAA04378@cj20424-a.reston1.va.home.com> Message-ID: <14956.58477.874472.190937@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: [Jeremy wrote:] >> Does your query ("Do you really need the NULL?") mean that you >> don't care whether the argument is NULL or an empty dictionary? >> I could change the code to do either for 2.1a2, if you have a >> preference. GvR> Robust code IMO should treat NULL and {} the same. But since GvR> traditionally we passed NULL, it's better to pass NULL rather GvR> than {}. I believe that's the status quo now, right? The current status in CVS is to pass {}, because there appeared to be some case where a PyCFunction was not expecting NULL. I assumed, without checking, that {} was required and change the implementation to always pass a dictionary to METH_KEYWORDS functions. I could change it back to NULL and see if I can reproduce the error I was seeing. Jeremy From guido at digicool.com Tue Jan 23 03:01:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 21:01:12 -0500 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Your message of "Mon, 22 Jan 2001 20:54:53 EST." 
<14956.58477.874472.190937@localhost.localdomain> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> <200101230133.UAA04378@cj20424-a.reston1.va.home.com> <14956.58477.874472.190937@localhost.localdomain> Message-ID: <200101230201.VAA15993@cj20424-a.reston1.va.home.com> > [Jeremy wrote:] > >> Does your query ("Do you really need the NULL?") mean that you > >> don't care whether the argument is NULL or an empty dictionary? > >> I could change the code to do either for 2.1a2, if you have a > >> preference. > > GvR> Robust code IMO should treat NULL and {} the same. But since > GvR> traditionally we passed NULL, it's better to pass NULL rather > GvR> than {}. I believe that's the status quo now, right? > > The current status in CVS is to pass {}, because there appeared to be > some case where a PyCFunction was not expecting NULL. I assumed, > without checking, that {} was required and change the implementation > to always pass a dictionary to METH_KEYWORDS functions. I could > change it back to NULL and see if I can reproduce the error I was > seeing. Yes, that's a good idea. I hope that the {} in alpha 1 won't make folks think that they will never see a NULL in the future and code accordingly... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 23 03:15:11 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 21:15:11 -0500 Subject: [Python-Dev] 2.1a1 release tonight -- but no nested scopes or weak refs Message-ID: <200101230215.VAA16577@cj20424-a.reston1.va.home.com> We've decided to release 2.1a1 without further ado, but without two big hopeful patches: Jeremy's nested scopes aren't finished and will take considerably more time, and Fred's weak references need more review (I haven't had the time to look at the code). 
Rather than wait longer, I've decided to try and release 2.1a1 tonight -- there's nothing I'm waiting for now before I can cut a tarball. There will be an alpha2 release around February 1. Please don't make any check-ins until I announce the 2.1a1 release here. (PythonLabs: please mail or phone me if you need to check in a last-minute thing -- I'm tagging the tree now.) More news as it happens, --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Jan 23 03:36:24 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 22 Jan 2001 20:36:24 -0600 (CST) Subject: [Python-Dev] test_grammar failing Message-ID: <14956.60968.363878.643640@beluga.mojam.com> At the end of this: make distclean ; ./configure ; make OPT='-g -pipe' ; make test I get this: rm -f ./Lib/test/*.py[co] PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l test_grammar name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 locals: {'x': 2, '[1]': 1, 'l': 0} globals: {} Fatal Python error: compiler did not label name as local or global make: *** [test] Aborted PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l test_grammar name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 locals: {'x': 2, '[1]': 1, 'l': 0} globals: {} Fatal Python error: compiler did not label name as local or global make: *** [test] Aborted Any ideas? I notice that Jeremy checked in some changes to test_grammar.py this evening. Skip From gvwilson at nevex.com Tue Jan 23 03:47:33 2001 From: gvwilson at nevex.com (Greg Wilson) Date: Mon, 22 Jan 2001 21:47:33 -0500 (EST) Subject: [Python-Dev] re: I think my set module is ready for prime time Message-ID: > > Guido van Rossum: > > There's already a PEP on a set object type, and everybody and their > > aunt has already implemented a set datatype. > Eric Raymond: > Greg's proposal has a couple of problems. 
> The biggest one is that the interface design isn't very Pythonic -- > ...doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. I agree with Eric's point; I put the interface design on hold while I went off to try to find an efficient implementation capable of handling mutable values (i.e. one that would allow things like sets of sets). I'm still looking :-(, but would appreciate comments from this list on Eric's interface. Thanks, Greg From guido at digicool.com Tue Jan 23 04:02:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 22:02:50 -0500 Subject: [Python-Dev] test_grammar failing In-Reply-To: Your message of "Mon, 22 Jan 2001 20:36:24 CST." <14956.60968.363878.643640@beluga.mojam.com> References: <14956.60968.363878.643640@beluga.mojam.com> Message-ID: <200101230302.WAA27104@cj20424-a.reston1.va.home.com> > At the end of this: > > make distclean ; ./configure ; make OPT='-g -pipe' ; make test > > I get this: > > rm -f ./Lib/test/*.py[co] > PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l > test_grammar > name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 > locals: {'x': 2, '[1]': 1, 'l': 0} > globals: {} > Fatal Python error: compiler did not label name as local or global > make: *** [test] Aborted > PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l > test_grammar > name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 > locals: {'x': 2, '[1]': 1, 'l': 0} > globals: {} > Fatal Python error: compiler did not label name as local or global > make: *** [test] Aborted > > Any ideas? I notice that Jeremy checked in some changes to test_grammar.py > this evening. Try another cvs update and rebuild. 
The test that Jeremy checked in is supposed to catch a bug in the compiler code that he checked in. The latest compile.c is 103277 bytes long (in Unix). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 23 04:33:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 22:33:02 -0500 Subject: [Python-Dev] Python 2.1 alpha 1 released! Message-ID: <200101230333.WAA28376@cj20424-a.reston1.va.home.com> Thanks to the PythonLabs developers and the many hard-working volunteers, I'm proud to release Python 2.1a1 -- the first alpha release of Python version 2.1. The release mechanics are different than for previous releases: we're only releasing through SourceForge for now. The official source tarball is already available from the download page: http://sourceforge.net/project/showfiles.php?group_id=5470 Additional files will be released soon: a Windows installer, Linux RPMs, and documentation. Please give it a good try! The only way Python 2.1 can become a rock-solid product is if people test the alpha releases. Especially if you are using Python for demanding applications or on extreme platforms we are interested in hearing your feedback. Are you embedding Python or using threads? Please test your application using Python 2.1a1! Please submit all bug reports through SourceForge: http://sourceforge.net/bugs/?group_id=5470 Here's the NEWS file: What's New in Python 2.1 alpha 1? ================================= Core language, builtins, and interpreter - There is a new Unicode companion to the PyObject_Str() API called PyObject_Unicode(). It behaves in the same way as the former, but assures that the returned value is a Unicode object (applying the usual coercion if necessary). - The comparison operators support "rich comparison overloading" (PEP 207). C extension types can provide a rich comparison function in the new tp_richcompare slot in the type object.
The cmp() function and the C function PyObject_Compare() first try the new rich comparison operators before trying the old 3-way comparison. There is also a new C API PyObject_RichCompare() (which also falls back on the old 3-way comparison, but does not constrain the outcome of the rich comparison to a Boolean result). The rich comparison function takes two objects (at least one of which is guaranteed to have the type that provided the function) and an integer indicating the opcode, which can be Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE (for <, <=, ==, !=, >, >=), and returns a Python object, which may be NotImplemented (in which case the tp_compare slot function is used as a fallback, if defined). Classes can overload individual comparison operators by defining one or more of the methods __lt__, __le__, __eq__, __ne__, __gt__, __ge__. There are no explicit "reflected argument" versions of these; instead, __lt__ and __gt__ are each other's reflection, likewise for __le__ and __ge__; __eq__ and __ne__ are their own reflection (similar at the C level). No other implications are made; in particular, Python does not assume that == is the Boolean inverse of !=, or that < is the Boolean inverse of >=. This makes it possible to define types with partial orderings. Classes or types that want to implement (in)equality tests but not the ordering operators (i.e. unordered types) should implement == and !=, and raise an error for the ordering operators. It is possible to define types whose rich comparison results are not Boolean; e.g. a matrix type might want to return a matrix of bits for A < B, giving elementwise comparisons. Such types should ensure that any interpretation of their value in a Boolean context raises an exception, e.g. by defining __nonzero__ (or the tp_nonzero slot at the C level) to always raise an exception. - Complex numbers use rich comparisons to define == and != but raise an exception for <, <=, > and >=.
Unfortunately, this also means that cmp() of two complex numbers raises an exception when the two numbers differ. Since it is not mathematically meaningful to compare complex numbers except for equality, I hope that this doesn't break too much code. - Functions and methods now support getting and setting arbitrarily named attributes (PEP 232). Functions have a new __dict__ (a.k.a. func_dict) which holds the function attributes. Methods get and set attributes on their underlying im_func. It is a TypeError to set an attribute on a bound method. - The xrange() object implementation has been improved so that xrange(sys.maxint) can be used on 64-bit platforms. There's still a limitation that in this case len(xrange(sys.maxint)) can't be calculated, but the common idiom "for i in xrange(sys.maxint)" will work fine as long as the index i doesn't actually reach 2**31. (Python uses regular ints for sequence and string indices; fixing that is much more work.) - Two changes to from...import: 1) "from M import X" now works even if M is not a real module; it's basically a getattr() operation with AttributeError exceptions changed into ImportError. 2) "from M import *" now looks for M.__all__ to decide which names to import; if M.__all__ doesn't exist, it uses M.__dict__.keys() but filters out names starting with '_' as before. Whether or not __all__ exists, there's no restriction on the type of M. - File objects have a new method, xreadlines(). This is the fastest way to iterate over all lines in a file: for line in file.xreadlines(): ...do something to line... See the xreadlines module (mentioned below) for how to do this for other file-like objects. - Even if you don't use file.xreadlines(), you may expect a speedup on line-by-line input. The file.readline() method has been optimized quite a bit in platform-specific ways: on systems (like Linux) that support flockfile(), getc_unlocked(), and funlockfile(), those are used by default.
On systems (like Windows) without getc_unlocked(), a complicated (but still thread-safe) method using fgets() is used by default. You can force use of the fgets() method by #define'ing USE_FGETS_IN_GETLINE at build time (it may be faster than getc_unlocked()). You can force fgets() not to be used by #define'ing DONT_USE_FGETS_IN_GETLINE (this is the first thing to try if std test test_bufio.py fails -- and let us know if it does!). - In addition, the fileinput module, while still slower than the other methods on most platforms, has been sped up too, by using file.readlines(sizehint). - Support for run-time warnings has been added, including a new command line option (-W) to specify the disposition of warnings. See the description of the warnings module below. - Extensive changes have been made to the coercion code. This mostly affects extension modules (which can now implement mixed-type numerical operators without having to use coercion), but occasionally, in boundary cases the coercion semantics have changed subtly. Since this was a terrible gray area of the language, this is considered an improvement. Also note that __rcmp__ is no longer supported -- instead of calling __rcmp__, __cmp__ is called with reflected arguments. - In connection with the coercion changes, a new built-in singleton object, NotImplemented is defined. This can be returned for operations that wish to indicate they are not implemented for a particular combination of arguments. From C, this is Py_NotImplemented. - The interpreter now accepts bytecode files on the command line even if they do not have a .pyc or .pyo extension. On Linux, after executing echo ':pyc:M::\x87\xc6\x0d\x0a::/usr/local/bin/python:' > /proc/sys/fs/binfmt_misc/register any byte code file can be used as an executable (i.e. as an argument to execve(2)). - %[xXo] formats of negative Python longs now produce a sign character.
In 1.6 and earlier, they never produced a sign, and raised an error if the value of the long was too large to fit in a Python int. In 2.0, they produced a sign if and only if too large to fit in an int. This was inconsistent across platforms (because the size of an int varies across platforms), and inconsistent with hex() and oct(). Example: >>> "%x" % -0x42L '-42' # in 2.1 'ffffffbe' # in 2.0 and before, on 32-bit machines >>> hex(-0x42L) '-0x42L' # in all versions of Python The behavior of %d formats for negative Python longs remains the same as in 2.0 (although in 1.6 and before, they raised an error if the long didn't fit in a Python int). %u formats don't make sense for Python longs, but are allowed and treated the same as %d in 2.1. In 2.0, a negative long formatted via %u produced a sign if and only if too large to fit in an int. In 1.6 and earlier, a negative long formatted via %u raised an error if it was too big to fit in an int. - Dictionary objects have an odd new method, popitem(). This removes an arbitrary item from the dictionary and returns it (in the form of a (key, value) pair). This can be useful for algorithms that use a dictionary as a bag of "to do" items and repeatedly need to pick one item. Such algorithms normally end up running in quadratic time; using popitem() they can usually be made to run in linear time. Standard library - In the time module, the time argument to the functions strftime, localtime, gmtime, asctime and ctime is now optional, defaulting to the current time (in the local timezone). - The ftplib module now defaults to passive mode, which is deemed a more useful default given that clients are often inside firewalls these days. Note that this could break if ftplib is used to connect to a *server* that is inside a firewall, from outside; this is expected to be a very rare situation. To fix that, you can call ftp.set_pasv(0). 
- The site module now uses .pth files not only for path configuration but also for extensions to the initialization code: lines starting with import are executed. - There's a new module, warnings, which implements a mechanism for issuing and filtering warnings. There are some new built-in exceptions that serve as warning categories, and a new command line option, -W, to control warnings (e.g. -Wi ignores all warnings, -We turns warnings into errors). warnings.warn(message[, category]) issues a warning message; this can also be called from C as PyErr_Warn(category, message). - A new module xreadlines was added. This exports a single factory function, xreadlines(). The intention is that this code is the absolutely fastest way to iterate over all lines in an open file(-like) object: import xreadlines for line in xreadlines.xreadlines(file): ...do something to line... This is equivalent to the previous speed record holder using file.readlines(sizehint). Note that if file is a real file object (as opposed to a file-like object), this is equivalent: for line in file.xreadlines(): ...do something to line... - The bisect module has new functions bisect_left, insort_left, bisect_right and insort_right. The old names bisect and insort are now aliases for bisect_right and insort_right. XXX_right and XXX_left methods differ in what happens when the new element compares equal to one or more elements already in the list: the XXX_left methods insert to the left, the XXX_right methods to the right. Code that doesn't care where equal elements end up should continue to use the old, short names ("bisect" and "insort"). - The new curses.panel module wraps the panel library that forms part of SYSV curses and ncurses. Contributed by Thomas Gellekum. - The SocketServer module now sets the allow_reuse_address flag by default in the TCPServer class. - A new function, sys._getframe(), returns the stack frame pointer of the caller.
This is intended only as a building block for higher-level mechanisms such as string interpolation. Build issues - For Unix (and Unix-compatible) builds, configuration and building of extension modules is now greatly automated. Rather than having to edit the Modules/Setup file to indicate which modules should be built and where their include files and libraries are, a distutils-based setup.py script now takes care of building most extension modules. All extension modules built this way are built as shared libraries. Only a few modules that must be linked statically are still listed in the Setup file; you won't need to edit their configuration. - Python should now build out of the box on Cygwin. If it doesn't, mail to Jason Tishler (jlt63 at users.sourceforge.net). - Python now always uses its own (renamed) implementation of getopt() -- there's too much variation among C library getopt() implementations. - C++ compilers are better supported; the CXX macro is always set to a C++ compiler if one is found. Windows changes - select module: By default under Windows, a select() call can specify no more than 64 sockets. Python now boosts this Microsoft default to 512. If you need even more than that, see the MS docs (you'll need to #define FD_SETSIZE and recompile Python from source). - Support for Windows 3.1, DOS and OS/2 is gone. The Lib/dos-8x3 subdirectory is no more! --Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Tue Jan 23 05:11:09 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 20:11:09 -0800 (PST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <3A6CBBEF.4732BFF2@ActiveState.com> Message-ID: Guido van Rossum wrote: > Yes, wow! Paul Prescod wrote: > I apologize but I'm not clear on my responsibilities here, if any. I > wrote a PEP for online help. I submitted a partial implementation. Hi, guys. Sorry i haven't been sending updates on what i'm doing. Here's the current picture as i see it. 
> Ping wrote a full implementation that basically supersedes mine. My implementation is "full" in that it deploys and seems to work on arbitrary modules as it stands, but it doesn't really supersede Paul's because it leaves out the big piece of Paul's work that did conversion from packaged HTML docs to plain text. It also has the deficiency that it imports modules live; for untrusted modules, this is a security risk. I know Paul has been working on stuff to compile a module into a kind of skeleton object that has all the same name bindings but no live contents, and if that works reliably, we should definitely try plugging that in. > There are various ideas for improving it, but I think that we agree > that the core is solid. Yes. I believe that as it stands, pydoc is useful enough to be a net positive addition to the core. inspect.py alone has been stable and alpha-ready for some time, i believe. Here is a summary of its status and work that remains. pydoc has: inspecting live objects generating text docs from live objects generating HTML docs from live objects serving HTML docs from a little web server showing docs from the command line showing docs from within the interactive interpreter apropos-style module listing It's missing the following, and Paul had stuff for this: inspecting unsafe modules generating text docs from packaged HTML (e.g. language reference) It also needs these: generating docs from a file given on the command line (easy) more Windows and Mac testing and decisions various small bugfixes This past week i've been messing around with Windows and Mac stuff, trying to see whether it's possible to reliably spawn a webserver and launch a web browser at the same time (this would seem to be a good default action to do on GUI platforms). In trying to do the latter i've found the webbrowser module pretty unreliable, by the way.
For example, it relies on a constant delay of 4 seconds to launch a new browser that can't be expected on all platforms, and fails to launch Netscape 3 because it supplies an illegal command-line option. When i've found good cross-platform ways to make this work i'll suggest some patches. I've so far considered this project blocked only on cross-platform testing -- do you agree? While i know that inspecting unsafe modules and processing packaged HTML are important features, i don't consider them essential. -- ?!ng From ping at lfw.org Tue Jan 23 05:14:50 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 20:14:50 -0800 (PST) Subject: [Python-Dev] webbrowser.py In-Reply-To: Message-ID: On Mon, 22 Jan 2001, Ka-Ping Yee wrote: > In trying to do the latter i've found the webbrowser module pretty > unreliable, by the way. For example, it relies on a constant delay > of 4 seconds to launch a new browser that can't be expected on all > platforms, and fails to launch Netscape 3 because it supplies an > illegal command-line option. When i've found good cross-platform > ways to make this work i'll suggest some patches. Oh, and i forgot to mention... i was pretty disappointed that: setenv BROWSER my_browser_program python -c 'import webbrowser; webbrowser.open("http://python.org/")' doesn't execute "my_browser_program http://python.org/" as i would have hoped. Even for a known browser type: setenv BROWSER lynx python -c 'import webbrowser; webbrowser.open("http://python.org/")' does not work as expected, either. (Red Hat Linux here.) -- ?!ng From ping at lfw.org Tue Jan 23 05:22:56 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 20:22:56 -0800 (PST) Subject: [Python-Dev] Is X a (sequence|mapping)? Message-ID: We can implement abstract interfaces (sequence, mapping, number) in Python with the appropriate __special__ methods, but i don't see an easy way to test if something supports one of these abstract interfaces in Python. 
At the moment, to see if something is a sequence i believe i have to say something like try: x[0] except: # not a sequence else: # okay, it's a sequence or if hasattr(x, '__getitem__') or type(x) in [type(()), type([])]: ... Is there, or should there be, a better way to do this? -- ?!ng From greg at cosc.canterbury.ac.nz Tue Jan 23 05:46:26 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Jan 2001 17:46:26 +1300 (NZDT) Subject: [Python-Dev] re: I think my set module is ready for prime time In-Reply-To: Message-ID: <200101230446.RAA01992@s454.cosc.canterbury.ac.nz> Greg Wilson : > an efficient implementation capable of > handling mutable values (i.e. one that would allow things like sets of > sets) I suspect that such a thing is impossible. To avoid a linear search you have to take advantage of some kind of hashing or ordering, which you can't do if your objects can change their values out from under you. Also, there's nothing to stop someone from mutating two previously unequal elements so that they're equal. Then you have a "set" with two identical elements, which isn't a set any more, it's just a collection. So, I submit that the very concept of a set only makes sense for immutable values. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Tue Jan 23 06:03:18 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 00:03:18 -0500 Subject: [Python-Dev] Is X a (sequence|mapping)? In-Reply-To: Message-ID: [?!ng] > ... > At the moment, to see if something is a sequence i believe i have to > say something like > > try: > x[0] > except: > # not a sequence > else: > # okay, it's a sequence > > or > > if hasattr(x, '__getitem__') or type(x) in [type(()), type([])]: > ... 
> > Is there, or should there be, a better way to do this? Dunno. What's a sequence? If you want to know whether x[0] will blow up, trying x[0] is the most obvious way. BTW, I expect trying x[:0] is a better idea: doesn't succeed for dicts, and doesn't blow up for an irrelevant reason if x is an empty sequence. BTW2, your second method suggests an uncomfortable truth: many contexts that want "a sequence" don't want strings to pass the test, despite that strings are as much sequences as lists in Python, no matter how "a sequence" is defined. afraid that-what-you-want-to-do-with-it-is-more-important-than-what- python-calls-it-ly y'rs - tim From ping at lfw.org Tue Jan 23 06:27:30 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 21:27:30 -0800 (PST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010122124159.A14999@thyrsus.com> Message-ID: On Mon, 22 Jan 2001, Eric S. Raymond wrote: > \section{\module{set} --- > Basic set algebra for Python} I'd like to look at the module. Did you actually show us the code for this, or am i a blind doofus? (Please, no answers to the unasked question of whether i am a doofus.) -- ?!ng From tim.one at home.com Tue Jan 23 07:05:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 01:05:26 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <20010122064400.A26543@glacier.fnational.com> Message-ID: In finding and repairing the test_extcall bug, Neil and Thomas have once again contributed beyond the call of duty. Thank you! It took some doing to convince Guido to release his Dutch Death Grip on the PythonLabs coffers, but in the end he was overcome by the moral necessity of rewarding you sterling fellows for your golden deeds: you're both entitled to free(*)-- yes, FREE(*)! --copies of all Python 2.1 alpha, *and* beta, releases(*)! you-wouldn't-believe-how-much-he-charges-us-ly y'rs - tim (*) Does not apply to Jython releases. 
All applicable taxes are the responsibility of the recipient. No warranty is expressed or implied. This offer has not been reviewed or approved by CWI, CNRI, BeOpen.com, or Digital Creations 2. Export restrictions may apply. By acceptance of this offer, recipient grants perpetual license to use their name, image and likeness in Python promotional materials without compensation. Packaging, handling, shipping and insurance costs to be borne by recipient, but in no case to exceed 1 (one) US$/byte. This offer may be withdrawn at any time, including but not limited to retroactively, at the sole discretion of Guido van Rossum, or such of his heirs and successors as he may designate from time to time. From martin at mira.cs.tu-berlin.de Tue Jan 23 09:14:32 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 23 Jan 2001 09:14:32 +0100 Subject: [Python-Dev] Is X a (sequence|mapping)? Message-ID: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> > i don't see an easy way to test if something supports one of these > abstract interfaces in Python. Why do you want to test for that? If you have an algorithm that only operates on integer-indexed things, what can you do if the test fails? So it is always better to just use the object in the algorithm, and let it break with an exception if somebody passes a bad object. Regards, Martin From mal at lemburg.com Tue Jan 23 10:08:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 10:08:24 +0100 Subject: [Python-Dev] webbrowser.py References: Message-ID: <3A6D4A08.B3806984@lemburg.com> Ka-Ping Yee wrote: > > On Mon, 22 Jan 2001, Ka-Ping Yee wrote: > > In trying to do the latter i've found the webbrowser module pretty > > unreliable, by the way. For example, it relies on a constant delay > > of 4 seconds to launch a new browser that can't be expected on all > > platforms, and fails to launch Netscape 3 because it supplies an > > illegal command-line option. 
When i've found good cross-platform > > ways to make this work i'll suggest some patches. > > Oh, and i forgot to mention... i was pretty disappointed that: > > setenv BROWSER my_browser_program > python -c 'import webbrowser; webbrowser.open("http://python.org/")' > > doesn't execute "my_browser_program http://python.org/" as i would > have hoped. Even for a known browser type: > > setenv BROWSER lynx > python -c 'import webbrowser; webbrowser.open("http://python.org/")' > > does not work as expected, either. (Red Hat Linux here.) Hmm, lynx should work (the module has explicit support for it) and yes, I agree, webbrowser should trust BROWSER and use a generic calling mechanism (program ) for opening the URL. Too late for 2.1a1, but maybe for a2 ?! BTW, I think that the second line here is causing the problem: class CommandLineBrowser: _browsers = [] # <- this overrides the global of the same name if os.environ.get("DISPLAY"): _browsers.extend([ ("netscape", "netscape %s >/dev/null &"), ("mosaic", "mosaic %s >/dev/null &"), ]) _browsers.extend([ ("lynx", "lynx %s"), ("w3m", "w3m %s"), ]) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Jan 23 10:15:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 10:15:11 +0100 Subject: [Python-Dev] Is X a (sequence|mapping)? References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> Message-ID: <3A6D4B9F.38B17046@lemburg.com> "Martin v. Loewis" wrote: > > > i don't see an easy way to test if something supports one of these > > abstract interfaces in Python. > > Why do you want to test for that? If you have an algorithm that only > operates on integer-indexed things, what can you do if the test fails? 
> > So it is always better to just use the object in the algorithm, and > let it break with an exception if somebody passes a bad object. Right. Polymorphic code will usually get you more out of an algorithm than type-safe or interface-safe code. BTW, there are Python interfaces to PySequence_Check() and PyMapping_Check() buried in the builtin operator module in case you really do care ;) ... operator.isSequenceType() operator.isMappingType() + some other C style _Check() APIs These only look at the type slots though, so Python instances will appear to support everything but fail with an exception when used if they don't provide the proper __xxx__ hooks. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Tue Jan 23 10:17:30 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 04:17:30 -0500 Subject: [Python-Dev] webbrowser.py Message-ID: <20010123041730.A25165@thyrsus.com> Ping's complaints are justified -- I've been looking at and testing webbrowser.py and it's a mess. Among other things: 1. The BROWSER variable is not interpreted properly. 2. The code is stupid about loading platform support it doesn't need. 3. It's not possible to specify lynx as a browser under Unix, because the computation of available browsers is split in two and partly done inside the CommandLineBrowser class. 4. The module code is excessively hard to read, obscuring these bugs. Our mistake was hurriedly merging the launcher code from IDLE with the browser-finder hack I wrote (the guts of CommandLineBrowser). The resulting code is a bad, overcomplicated architecture with a nasty seam in it. As co-designer/implementor I should have caught this sooner, but I was in a hurry to get a CML2 prototype out the door and didn't test anything but the case I needed. My apologies to all.
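To make complaint 1 concrete, here is a minimal sketch of the BROWSER handling Ping was hoping for: treat the variable as a program name and run it with the URL appended. The helper name browser_command and the fallback list are hypothetical and are not taken from webbrowser.py; the sketch only builds the command line and launches nothing.

```python
import os

# Hypothetical fallback list, for illustration only; it is not the
# list of browsers that webbrowser.py actually knows about.
FALLBACK_BROWSERS = ["netscape", "lynx", "w3m"]


def browser_command(url, environ=None):
    """Return the argument list that would be used to open url.

    Assumption: BROWSER names a program to be run as "program url".
    That is the behavior Ping expected, not a description of what
    the webbrowser module does.
    """
    if environ is None:
        environ = os.environ
    # An unset or empty BROWSER falls back to the first known browser.
    program = environ.get("BROWSER") or FALLBACK_BROWSERS[0]
    return [program, url]


print(browser_command("http://python.org/", {"BROWSER": "lynx"}))
```

Passing the environment as a parameter keeps the sketch easy to try with different BROWSER values without touching the real environment.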
I'm rewriting to fix these problems now. Documented semantics of entry points will be preserved. -- Eric S. Raymond The politician attempts to remedy the evil by increasing the very thing that caused the evil in the first place: legal plunder. -- Frederick Bastiat From mal at lemburg.com Tue Jan 23 11:26:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 11:26:16 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> Message-ID: <3A6D5C48.A076DA0@lemburg.com> "Eric S. Raymond" wrote: > > Guido van Rossum : > > There's already a PEP on a set object type, and everybody and their > > aunt has already implemented a set datatype. > > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. > > The only other set module I can find in the Vaults or anywhere else is > kjBuckets (which I knew about before). Looks like a good design, but > complicated -- and requires installation of an extension. There's also a kjSet.py available at Aaron's site: http://www.chordate.com/kwParsing/index.html which is a pure Python version of the C extension's kjSet type. > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. But here are the reasons I suggest you should do so: > > 1.
It supports a set of operations that are both often useful and > fiddly to get right, thus enhancing the "batteries are included" > effect. (I used its ancestor for representing seen-message numbers in > a specialized mailreader, for example.) > > 2. It's simple for application programmers to use. No extension module > to integrate. > > 3. It's unsurprising. My set objects behave almost exactly like other > mutable sequences, with all the same built-in methods working, except for > the fact that you can't introduce duplicates with the mutators. > > 4. It's already completely documented in a form suitable for the library. > > 5. It's simple enough not to cause you maintenance hassles down the > road, and even if it did the maintainer is unlikely to disappear :-). All very well, but are sets really that essential to everyday Python programming? If we include sets then we ought to also include graphs, tries, btrees and all those other goodies we have in computer science. All of these types are available out there, but I believe the audience who really cares for these types is also capable of downloading the extensions and installing them. It would be nice if all of these extensions could go into a SUMO edition of Python though... together with your set module. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Tue Jan 23 12:08:06 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 06:08:06 -0500 Subject: [Python-Dev] What does "batteries are included" mean?
In-Reply-To: <3A6D5C48.A076DA0@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 11:26:16AM +0100 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> Message-ID: <20010123060806.A25436@thyrsus.com> M.-A. Lemburg : > All very well, but are sets really that essential to every > day Python programming ? If we include sets then we ought to > also include graphs, tries, btrees and all those other goodies > we have in computer science. I use sets a lot. And there was enough demand to generate a PEP. But the wider question here is how seriously we take "batteries are included" as a design principle. Does a facility have to be useful *every day* to be worth being in the standard library? And if so, what are things like the POP3 and IMAP libraries (or, for that matter, my own shlex and netrc modules) doing there? I don't think so. I think there are at least four different possible reasons for something to be in the standard library: 1. It's useful every day. 2. It's useful less frequently than every day, but is a stable cross-platform implementation of a wheel that would otherwise have to be reinvented frequently. That is, you can solve it *once* and have a zero-maintenance increment to the power of the language. 3. It's a technique that's not often used, and not necessarily stable in the face of platform variations, but nothing else will do when you need it and it's notably difficult to get right. (popen2 and BaseHTTPServer would be good examples of this.) 4. It's a developer checklist feature that improves Python's competitive position against Perl, Tcl, and other contenders for the same ecological niche. IMO a lightweight set facility, like POP3 and IMAP, qualifies under 2 and 4 even if not under 1 and 3. This question keeps coming up in different guises.
I'm often the one to raise it, because I favor an aggressive interpretation of "batteries are included" that would pull in a lot of stuff. Yes, this makes more work for us -- but I think it's work we should be doing. While minimalism is an excellent design heuristic for the core language, I think it's a bad one for the libraries. Python is a high-level language and programmers using it both expect and deserve high-level libraries -- yes, including graphs/tries/btrees and all that computer science stuff. Just as much to the point, Python is competing against languages like Perl that frequently get design wins against it because of the richness of the environment *they* are willing to carry around. Guido and Tim and others are more conservative than I, which would be OK -- but it seems to me that the conservatives do not have consistent or well-thought-out criteria for what to include, which is *not* OK. We need to solve this problem. Some time back I initiated a library guidelines PEP, then dropped it due to press of overwork. But the general question is going to keep coming up and we ought to have policy guidelines that potential library developers can understand. Should I pick this up again? -- Eric S. Raymond I do not find in orthodox Christianity one redeeming feature. -- Thomas Jefferson From mal at lemburg.com Tue Jan 23 12:50:39 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 12:50:39 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> Message-ID: <3A6D700F.7A9E2509@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > All very well, but are sets really that essential to every > > day Python programming ?
If we include sets then we ought to > > also include graphs, tries, btrees and all those other goodies > > we have in computer science. > > I use sets a lot. And there was enough demand to generate a PEP. Sure, but sets are fairly easy to implement using Python dictionaries -- at least at the level normally needed by Python programs. Sets, queues and graphs are examples of data types which can have many different faces; it is hard to design APIs for these which meet everyone's needs. > But the wider question here is how seriously we take "batteries are > included" as a design principle. Does a facility have to be useful > *every day* to be worth being in the standard library? And if so, > what are things like the POP3 and IMAP libraries (or, for that matter, > my own shlex and netrc modules) doing there? You can argue the same way for all kinds of extensions and packages you find in the Vaults. That's why there's demand for a different packaging of Python and this is what Moshe's PEP 206 addresses: http://python.sourceforge.net/peps/pep-0206.html > I don't think so. I think there are at least four different > possible reasons for something to be in the standard library: > > 1. It's useful every day. > > 2. It's useful less frequently than every day, but is a stable > cross-platform implementation of a wheel that would otherwise have to > be reinvented frequently. That is, you can solve it *once* and have a > zero-maintenance increment to the power of the language. > > 3. It's a technique that's not often used, and not necessarily stable > in the face of platform variations, but nothing else will do > when you need it and it's notably difficult to get right. (popen2 and > BaseHTTPServer would be good examples of this.) > > 4. It's a developer checklist feature that improves Python's competitive > position against Perl, Tcl, and other contenders for the same ecological > niche.
> > IMO a lightweight set facility, like POP3 and IMAP, qualifies under 2 and 4 > even if not under 1 and 3. > > This question keeps coming up in different guises. I'm often the one to > raise it, because I favor an aggressive interpretation of "batteries > are included" that would pull in a lot of stuff. Yes, this makes more > work for us -- but I think it's work we should be doing. > > While minimalism is an excellent design heuristic for the core language, > I think it's a bad one for the libraries. Python is a high-level language > and programmers using it both expect and deserve high-level libraries -- > yes, including graphs/tries/btrees and all that computer science stuff. > > Just as much to the point, Python competing against languages like > Perl that frequently get design wins against it because of the > richness of the environment *they* are willing to carry around. > > Guido and Tim and others are more conservative than I, which would be > OK -- but it seems to me that the conservatives do not have consistent > or well-thought-out criteria for what to include, which is *not* OK. > We need to solve this problem. > > Some time back I initiated a library guidelines PEP, then dropped it > due to press of overwork. But the general question is going to keep > coming up and we ought to have policy guidelines that potential > library developers can understand. > > Should I pick this up again? Hmm, we already have the PEP 206 which focusses on the topic. Perhaps you could work with Moshe to sort out the "which batteries do we need" sub-topic ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Tue Jan 23 13:20:46 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 07:20:46 -0500 Subject: [Python-Dev] What does "batteries are included" mean? 
In-Reply-To: <3A6D700F.7A9E2509@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 12:50:39PM +0100 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> Message-ID: <20010123072046.A25593@thyrsus.com> M.-A. Lemburg : > > But the wider question here is how seriously we take "batteries are > > included" as a design principle. Does a facility have to be useful > > *every day* to be worth being in the standard library? And if so, > > what are things like the POP3 and IMAP libraries (or, for that matter, > > my own shlex and netrc modules) doing there? > > You can argue the same way for all kinds of extensions and > packages you find in the Vaults. That's why there's demand for > a different packaging of Python and this is what Moshe's > PEP 206 addresses: > > http://python.sourceforge.net/peps/pep-0206.html Muttering "PEP 206" evades the fundamental problem rather than solving it. Not that I'm saying Moshe hasn't made a valiant effort, within the political constraint that the BDFL and others seem unwilling to confront the deeper issue. But PEP 206 is not enough. Here is why: 1. If the "Sumo" packaging ever happens, the vanilla non-Sumo version that Guido issues will quickly become of mostly theoretical interest -- because Red Hat and everybody else will move to Sumo instantly, figuring they have nothing to lose by including more features. 2. If by some chance I'm wrong about 1, the outcome will be worse; we'll in effect have fragmented the language, because there won't be consistency in what library stuff is available between Sumo and non-Sumo builds on the same platform. 3. There are documentation issues as well. It's already a blot on Python that the standard documentation set doesn't cover Tkinter.
In the Sumo distribution, the gap between what's installed and what's documented is likely to widen further. Developers will see this as pointlessly irritating -- and they'll be right. The stock distribution should *be* the Sumo distribution. If we're really so terrified of the extra maintenance load, then the right fix is to mark some modules and documentation as "externally maintained" with prominent pointers back to the responsible people. -- Eric S. Raymond The day will come when the mystical generation of Jesus by the Supreme Being as his father, in the womb of a virgin, will be classed with the fable of the generation of Minerva in the brain of Jupiter. -- Thomas Jefferson, 1823 From mal at lemburg.com Tue Jan 23 13:48:09 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 13:48:09 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> Message-ID: <3A6D7D89.A6BE1B74@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > > But the wider question here is how seriously we take "batteries are > > > included" as a design principle. Does a facility have to be useful > > > *every day* to be worth being in the standard library? And if so, > > > what are things like the POP3 and IMAP libraries (or, for that matter, > > > my own shlex and netrc modules) doing there? > > > > You can argue the same way for all kinds of extensions and > > packages you find in the Vaults. That's why there's demand for > > a different packaging of Python and this is what Moshe's > > PEP 206 addresses: > > > > http://python.sourceforge.net/peps/pep-0206.html > > Muttering "PEP 206" evades the fundamental problem rather than solving it.
> > Not that I'm saying Moshe hasn't made a valiant effort, within the political > constraint that the BDFL and others seem unwilling to confront the deeper > issue. But PEP 206 is not enough. Here is why: > > 1. If the "Sumo" packaging ever happens, the vanilla non-Sumo version that > Guido issues will quickly become of mostly theoretical interest -- because > Red Hat and everybody else will move to Sumo instantly, figuring they have > nothing to lose by including more features. > > 2. If by some chance I'm wrong about 1, the outcome will be worse; > we'll in effect have fragmented the language, because there won't be > consistency in what library stuff is available between Sumo and > non-Sumo builds on the same platform. > > 3. There are documentation issues as well. It's already a blot on > Python that the standard documentation set doesn't cover Tkinter. In > the Sumo distribution, the gap between what's installed and what's > documented is likely to widen further. Developers will see this as > pointlessly irritating -- and they'll be right. > > The stock distribution should *be* the Sumo distribution. If we're really > so terrified of the extra maintenance load, then the right fix is to > mark some modules and documentation as "externally maintained" with > prominent pointers back to the responsible people. That's your POV, others think differently and since this is not a democracy, the Sumo distribution is a feasible way of satisfying both needs.
There are a few other issues to consider as well: * licensing is a problem (and this is also mentioned in the PEP 206) since some of the nicer additions are GPLed and thus not in the spirit of Python's closed-source friendliness which has provided it with a large user base in the commercial field * package authors are not all the same and some may not want to split their distribution due to the integration of their package in a Sumo-distribution * the packages mentioned in PEP 206 are very complex and usually largish; maintaining them will cause much more effort compared to the standard lib modules and extensions * the build process varies widely between packages; even though we have distutils, some of the packages extend it to fit their specific needs (which is OK, but causes extra efforts in getting the build process combined) I'm not objecting to the Sumo-distribution project; to the contrary -- I tried a similar project a few years ago: the Python PowerTools distribution which you can download from: http://www.lemburg.com/python/PowerTools-0.2.zip The project died quickly though, as I wasn't able to keep up with the maintenance effort. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin at cnri.reston.va.us Tue Jan 23 14:40:06 2001 From: akuchlin at cnri.reston.va.us (Andrew Kuchling) Date: Tue, 23 Jan 2001 08:40:06 -0500 Subject: [Python-Dev] What does "batteries are included" mean?
In-Reply-To: <3A6D7D89.A6BE1B74@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 01:48:09PM +0100 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> <3A6D7D89.A6BE1B74@lemburg.com> Message-ID: <20010123084006.A23485@newcnri.cnri.reston.va.us> On Tue, Jan 23, 2001 at 01:48:09PM +0100, M.-A. Lemburg wrote: >There are a few other issues to consider as well: > To add a few: * The larger the amount of code in the distribution, the more effort it is to maintain it all. * Minor fixes aren't available until the next Python release. For example, to drag out the XML code again: there have been two PyXML releases since Python 2.0 fixing various bugs, but someone who sticks to installing just Python will not be able to get at those bugfixes until April (when 2.1 is supposed to get finalized). If there were a core Python distribution and a sumo distribution, and the sumo distribution was the one that most people downloaded and used, that would be perfectly OK. Practically no one assembles their own Linux distribution, and that's not considered a problem. To some degree, if you're using a well-packaged Linux distribution such as Debian, you also have a Python distribution mechanism with intermodule dependencies; we just have to reinvent the wheel for people on other platforms. >The project died quickly though, as I wasn't able to keep >up with the maintenance effort. Interesting. Did you get much feedback indicating that people used it much? Perhaps when you were doing that effort the Python community was composed more of self-reliant early adopter types; there are probably more newbies around now. --amk From mal at lemburg.com Tue Jan 23 15:05:13 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 23 Jan 2001 15:05:13 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> <3A6D7D89.A6BE1B74@lemburg.com> <20010123084006.A23485@newcnri.cnri.reston.va.us> Message-ID: <3A6D8F99.53A0F411@lemburg.com> Andrew Kuchling wrote: > > On Tue, Jan 23, 2001 at 01:48:09PM +0100, M.-A. Lemburg wrote: > >There are a few other issues to consider as well: > > > > To add a few: > > * The larger the amount of code in the distribution, the more effort it is > maintain it all. > > * Minor fixes aren't available until the next Python release. For example, > to drag out the XML code again: there have been two PyXML releases since > Python 2.0 fixing various bugs, but someone who sticks to installing just > Python will not be able to get at those bugfixes until April (when 2.1 > is supposed to get finalized). > > If there were a core Python distribution and a sumo distribution, and the > sumo distribution was the one that most people downloaded and used, that > would be perfectly OK. Practically no one assembles their own Linux > distribution, and that's not considered a problem. To some degree, if > you're using a well-packaged Linux distribution such as Debian, you also > have Python distribution mechanism with intermodule dependencies; we just > have to reinvent the wheel for people on other platforms. > > >The project died quickly though, as I wasn't able to keep > >up with the maintenance effort. > > Interesting. Did you get much feedback indicating that people used it much? Not much -- the interested parties were mostly Python experts (the lib started out as a project called expert-lib). 
> Perhaps when you were doing that effort the Python community was composed > more of self-reliant early adopter types; there are probably more newbies > around now. True. The included packages are dated 1997-1998 -- at that time Starship was just starting to get off the ground (things are moving at a much faster pace now). The PowerTools package still uses the Makefile.pre.in mechanism (with much success though) as distutils wasn't even considered at the time. Perhaps Moshe could pick this up to have a head start for Sumo-Python ?! Some of the included packages are not available elsewhere, AFAIK, so it may well be worthwhile having a look (e.g. the LGPLed trie and btree implementations donated by John W. M. Stevens). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Tue Jan 23 15:06:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 09:06:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: Your message of "Tue, 23 Jan 2001 04:17:30 EST." <20010123041730.A25165@thyrsus.com> References: <20010123041730.A25165@thyrsus.com> Message-ID: <200101231406.JAA04765@cj20424-a.reston1.va.home.com> > Ping's complaints are justified -- I've been looking at and testing > webbrowser.py and it's a mess. Among other things: > > 1. The BROWSER variable is not interpreted properly. > > 2. The code is stupid about loading platform support it doesn't need. > > 3. It's not possible to specify lynx as a browser under Unix, because the > computation of available browsers is split in two and partly done inside > the CommandLineBrowser class. > > 4. The module code is excessively hard to read, obscuring these bugs. > > Our mistake was hurriedly merging the launcher code from IDLE with the > browser-finder hack I wrote (the guts of CommandLineBrowser).
The resulting > code is a bad, overcomplicated architecture with a nasty seam in it. > > As co-designer/implementor I should have caught this sooner, but I was > in a hurry to get a CML2 prototype out the door and didn't test > anything but the case I needed. My apologies to all. > > I'm rewriting to fix these problems now. Documented semantics of entry > points will be preserved. Excellent, Eric! That's the spirit. Can you point me to docs explaining the meaning of the BROWSER environment variable? I've never heard of it... The last new environment variables I learned were PAGER and EDITOR, probably 15 years ago when 4.1BSD was released... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Tue Jan 23 15:22:26 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 09:22:26 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101231406.JAA04765@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 09:06:47AM -0500 References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> Message-ID: <20010123092226.A25968@thyrsus.com> Guido van Rossum : > Can you point me to docs explaining the meaning of the BROWSER > environment variable? I've never heard of it... The last new > environment variables I learned were PAGER and EDITOR, probably 15 > years ago when 4.1BSD was released... :-) You've never heard of BROWSER because I invented it and have not widely popularized it yet :-). Ping knew about it either because he read the module code and saw that it was supposed to work, or because he remembered the design discussion when webbrowser.py was first implemented. I've had conversations with some key Perl and Tcl people (Larry Wall, Tom Christiansen, Clif Flynt) about the BROWSER convention, and they agree it's a good idea. I'll probably hack support for it into Perl's browser launcher next. 
It's documented in the version of libwebbrowser.tex now in the CVS tree. -- Eric S. Raymond Power concedes nothing without a demand. It never did, and it never will. Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From nas at arctrix.com Tue Jan 23 09:30:56 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 23 Jan 2001 00:30:56 -0800 Subject: [Python-Dev] Does autoconfig detect INSTALL incorrectly? Message-ID: <20010123003056.A28309@glacier.fnational.com> Why is the configure.in file set to always use "install-sh"? There is a comment that says: # Install just never works :-( I don't think that statement is accurate. /usr/bin/install works quite well on my machine. The only comments I can find in the changelog are: revision 1.16 date: 1995/01/20 14:12:16; author: guido; state: Exp; lines: +27 -2 add INSTALL_PROGRAM and INSTALL_DATA; check for getopt and: revision 1.5 date: 1994/08/19 15:33:51; author: guido; state: Exp; lines: +14 -6 Simplify value of INSTALL (always 'cp'). Is there any reason why the autoconf macro AC_PROG_INSTALL is not used? The documentation seems to indicate that it does what we want. Neil From guido at digicool.com Tue Jan 23 16:31:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 10:31:39 -0500 Subject: [Python-Dev] Is X a (sequence|mapping)? In-Reply-To: Your message of "Tue, 23 Jan 2001 10:15:11 +0100." <3A6D4B9F.38B17046@lemburg.com> References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> <3A6D4B9F.38B17046@lemburg.com> Message-ID: <200101231531.KAA05122@cj20424-a.reston1.va.home.com> > Polymorphic code will usually get you more out of an > algorithm, than type-safe or interface-safe code.
Right. But there are times when people want to write methods that take e.g. either a sequence or a mapping, and need to distinguish between the two. That's not easy in Python! Java and C++ support it very well though, and thus we'll always keep seeing this kind of complaint. Not sure what to do, except to recommend "find out which methods you expect in one case but not in the other (e.g. keys()) and do a hasattr() test for that." > BTW, there are Python interfaces to PySequence_Check() and > PyMapping_Check() burried in the builtin operator module in case > you really do care ;) ... > > operator.isSequenceType() > operator.isMappingType() > + some other C style _Check() APIs > > These only look at the type slots though, so Python instances > will appear to support everything but when used fail with > an exception if they don't provide the proper __xxx__ hooks. Yes, these should probably be deprecated. I certainly have never used them! (The operator module doesn't seem to get much use in general... Was it a bad idea?) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 23 16:49:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 10:49:23 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Mon, 22 Jan 2001 15:13:09 EST." <20010122151309.C15236@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> Message-ID: <200101231549.KAA05172@cj20424-a.reston1.va.home.com> > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. 
> This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. Actually, I thought that Greg's proposal has some charm: it seems to be using a natural extension of the existing dictionary syntax, where a set is a dictionary without the values. I haven't thought about this deeply enough, but I see a lot of potential here. I understand that you have probably given this more thought than I have recently, so I'd like to see your more detailed analysis of what you do and don't like about Greg's proposal! > The only other set module I can find in the Vaults or anywhere else is > kjBuckets (which I knew about before). Looks like a good design, but > complicated -- and requires installation of an extension. > > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. But here are the reasons I suggest you should do so: > > 1. It supports a set of operations that are both often useful and > fiddly to get right, thus enhancing the "batteries are included" > effect. (I used its ancestor for representing seen-message numbers in > a specialized mailreader, for example.) I haven't read your docs yet (and no time because Digital Creations is requiring my attention all of today), but I expect that designing a universal set type, one that is good enough to be used in all sorts of applications, is very difficult. > 2. It's simple for application programmers to use. No extension module > to integrate. This is a silly argument for wanting something to be added to the core. If it's part of the core, the need for an extension is immaterial because that extension will always be available. So I conclude that your module is set up perfectly for a popular module in the Vaults. :-) > 3. It's unsurprising. 
My set objects behave almost exactly like other > mutable sequences, with all the same built-in methods working, except for > the fact that you can't introduce duplicates with the mutators. Ah, so you see a set as an extension of a sequence. That may be the big rift between your version and Greg's PEP: are sets more like sequences or more like dictionaries? > 4. It's already completely documented in a form suitable for the library. Much appreciated. > 5. It's simple enough not to cause you maintainance hassles down the > road, and even if it did the maintainer is unlikely to disappear :-). I'll be the judge of that, and since you prefer not to show your source code (why is that?), I can't tell yet. [...time flows...] Having just skimmed your docs, I'm disappointed that you choose lists as your fundamental representation type -- this makes it slow to test for membership and hence makes intersection and union slow. I suppose that you have evidence from using this that those operations aren't used much, or not for large sets? This is one of the problems with coming up with a set type for the core: it has to work for (nearly) everybody. It's no big deal if the Vaults contain three or more set modules -- perfect even, people can choose the best one for their purpose. But in the core, there's only room for one set type or module. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Tue Jan 23 17:30:50 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 11:30:50 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <200101231549.KAA05172@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 10:49:23AM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> Message-ID: <20010123113050.A26162@thyrsus.com> Guido van Rossum : > I understand that you have probably given this more thought than I > have recently, so I'd like to see your more detailed analysis of what > you do and don't like about Greg's proposal! I've already covered my big objection, the fact that it doesn't support the degree of polymorphic crossover one might expect with sequence types (and Greg has agreed that I have a point there). Another problem is the lack of support for mutable elements (and yes, I'm quite aware of the problems with this.) One thing I do like is the proposal for an actual set input syntax. Of course this would require that the set type become one of the builtins, with compiler support. > I haven't read your docs yet (and no time because Digital Creations is > requiring my attention all of today), but I expect that designing a > universal set type, one that is good enough to be used in all sorts of > applications, is very difficult. For "difficult" read "can't be done". This is one of those cases where no matter what implementation you choose, some of the operations you want to be cheap will be worst-case quadratic. Life is like that. So I chose a dead-simple representation and accepted quadratic times for union/intersection. > > 2. It's simple for application programmers to use. No extension module > > to integrate. > > This is a silly argument for wanting something to be added to the > core. If it's part of the core, the need for an extension is > immaterial because that extension will always be available. So > I conclude that your module is set up perfectly for a popular module > in the Vaults. 
:-) Reasonable point. > > 3. It's unsurprising. My set objects behave almost exactly like other > > mutable sequences, with all the same built-in methods working, except for > > the fact that you can't introduce duplicates with the mutators. > > Ah, so you see a set as an extension of a sequence. That may be the > big rift between your version and Greg's PEP: are sets more like > sequences or more like dictionaries? Indeed it is. > > 5. It's simple enough not to cause you maintenance hassles down the > > road, and even if it did the maintainer is unlikely to disappear :-). > > I'll be the judge of that, and since you prefer not to show your > source code (why is that?), I can't tell yet. No nefarious concealment going on here :-), I've sent versions of the code to Greg and Ping already. I'll shoot you a copy too. > Having just skimmed your docs, I'm disappointed that you choose lists > as your fundamental representation type -- this makes it slow to test > for membership and hence makes intersection and union slow. Not quite. Membership test is still linear-time; so is adding and deleting elements. It's true that union and intersection are quadratic, but see below. > I suppose > that you have evidence from using this that those operations aren't > used much, or not for large sets? Exactly! In my experience the usage pattern of a class like this runs heavily to small sets (usually < 64 elements); membership tests dominate usage, with addition and deletion of elements running second and the "classical" boolean operations like union and intersection being uncommon. What you get by going with a dictionary representation is that membership test becomes close to constant-time, while insertion and deletion become sometimes cheap and sometimes quite expensive (depending of course on whether you have to allocate a new hash bucket). Given the usage pattern I described, the overall difference in performance is marginal.
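The tradeoff discussed above can be sketched roughly as follows. This is only an illustration, not code from the posted module: list_union and dict_union are hypothetical names, and the dictionary version assumes every element is hashable, which is exactly the restriction the list representation avoids.

```python
# Hypothetical helpers for illustration only; not part of the posted module.

def list_union(a, b):
    "Union on lists: every 'x not in res' is a linear scan, so roughly quadratic."
    res = list(a)
    for x in b:
        if x not in res:
            res.append(x)
    return res

def dict_union(a, b):
    "Union via dictionary keys: lookups are hash probes, but elements must be hashable."
    seen = {}
    for x in a:
        seen[x] = 1
    for x in b:
        seen[x] = 1
    return list(seen.keys())   # ordering is not guaranteed on older Pythons

# Mutable (unhashable) elements work with the list version only.
print(list_union([[1], [2]], [[2], [3]]))
try:
    dict_union([[1]], [[2]])
except TypeError:
    print("dict_union rejects unhashable elements")
```

For the small sets described above, either version should be fast enough; the difference only starts to matter as the sets grow.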
> This is one of the problems with > coming up with a set type for the core: it has to work for (nearly) > everybody. As I pointed out above (and someone else on the list had made the same point earlier), "works for everybody" isn't really possible here. So my solution does the next best thing -- pick a choice of tradeoffs that isn't obviously worse than the alternatives and keeps things bog-simple. -- Eric S. Raymond Alcohol still kills more people every year than all `illegal' drugs put together, and Prohibition only made it worse. Oppose the War On Some Drugs! -------------- next part --------------
"""
A set-algebra module for Python.

The functions work on any sequence type and return lists.
The set methods can take a set or any sequence type as an argument.
They are insensitive to the types of the elements.

Lists are used rather than dictionaries so the elements can be mutable.

"""
# Design and implementation by ESR, January 2001.

def setify(list1):  # Used by set constructor
    "Remove duplicates in sequence."
    res = []
    for i in range(len(list1)):
        duplicate = 0
        for j in range(i):
            if list1[i] == list1[j]:
                duplicate = 1
                break
        if not duplicate:
            res.append(list1[i])
    return res

def union(list1, list2):  # Used for |
    "Compute set union of sequences."
    res = list1[:]
    for x in list2:
        if not x in list1:
            res.append(x)
    return res

def intersection(list1, list2):  # Used for &
    "Compute set intersection of sequences."
    res = []
    for x in list1:
        if x in list2:
            res.append(x)
    return res

def difference(list1, list2):  # Used for -
    "Compute set difference of sequences."
    res = []
    for x in list1:
        if not x in list2:
            res.append(x)
    return res

def symmetric_difference(list1, list2):  # Used for ^
    "Compute set symmetric-difference of sequences."
    res = []
    for x in list1:
        if not x in list2:
            res.append(x)
    for x in list2:
        if not x in list1:
            res.append(x)
    return res

def cartesian(list1, list2):  # Used for *
    "Cartesian product of sequences considered as sets."
    res = []
    for x in list1:
        for y in list2:
            res.append((x,y))
    return res

def equality(list1, list2):  # Used for == and !=
    "Test sequences considered as sets for equality."
    if len(list1) != len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    for x in list2:
        if not x in list1:
            return 0
    return 1

def proper_subset(list1, list2):
    "Return 1 if first argument is a proper subset of second, 0 otherwise."
    if not len(list1) < len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    return 1

def subset(list1, list2):
    "Return 1 if first argument is a subset of second, 0 otherwise."
    if not len(list1) <= len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    return 1

def powerset(base):
    "Compute the set of all subsets of a set."
    powerset = []
    # Each n is a bitmask selecting which elements of base go into a subset.
    for n in xrange(2 ** len(base)):
        subset = []
        for e in xrange(len(base)):
            if n & 2 ** e:
                subset.append(base[e])
        powerset.append(subset)
    return powerset

class set:
    "Lists with set-theoretic operations."

    def __init__(self, value):
        self.elements = setify(value)

    def __len__(self):
        return len(self.elements)

    def __getitem__(self, ind):
        return self.elements[ind]

    def __setitem__(self, ind, val):
        if val not in self.elements:
            self.elements[ind] = val

    def __delitem__(self, ind):
        del self.elements[ind]

    def list(self):
        return self.elements

    def append(self, new):
        if new not in self.elements:
            self.elements.append(new)

    def extend(self, new):
        self.elements.extend(new)
        self.elements = setify(self.elements)

    def count(self, x):
        return self.elements.count(x)

    def index(self, x):
        return self.elements.index(x)

    def insert(self, i, x):
        if x not in self.elements:
            self.elements.insert(i, x)

    def pop(self, i=-1):
        # Defaults to the last element, as list.pop() does.
        return self.elements.pop(i)

    def remove(self, x):
        self.elements.remove(x)

    def reverse(self):
        self.elements.reverse()

    def sort(self, cmp=None):
        # Only pass a comparison function through when one was given.
        if cmp is None:
            self.elements.sort()
        else:
            self.elements.sort(cmp)

    def __or__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(union(self.elements, other))

    __add__ = __or__

    def __and__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(intersection(self.elements, other))

    def __sub__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(difference(self.elements, other))

    def __xor__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(symmetric_difference(self.elements, other))

    def __mul__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(cartesian(self.elements, other))

    def __eq__(self, other):
        # Compare as sets, so element order does not matter.
        if type(other) == type(self):
            other = other.elements
        return equality(self.elements, other)

    def __ne__(self, other):
        if type(other) == type(self):
            other = other.elements
        return not equality(self.elements, other)

    def __lt__(self, other):
        if type(other) == type(self):
            other = other.elements
        return proper_subset(self.elements, other)

    def __le__(self, other):
        if type(other) == type(self):
            other = other.elements
        return subset(self.elements, other)

    def __gt__(self, other):
        if type(other) == type(self):
            other = other.elements
        return proper_subset(other, self.elements)

    def __ge__(self, other):
        if type(other) == type(self):
            other = other.elements
        return subset(other, self.elements)

    def __str__(self):
        # Joining also handles the empty set, which prints as "{}".
        return "{" + ", ".join(map(str, self.elements)) + "}"

    def __repr__(self):
        return repr(self.elements)

if __name__ == '__main__':
    a = set([1, 2, 3, 4])
    b = set([1, 4])
    c = set([5, 6])
    d = [1, 1, 2, 1]
    print `d`, "setifies to", set(d)
    print `a`, "|", `b`, "is", `a | b`
    print `a`, "^", `b`, "is", `a ^ b`
    print `a`, "&", `b`, "is", `a & b`
    print `b`, "*", `c`, "is", `b * c`
    print `a`, '<', `b`, "is", `a < b`
    print `a`, '>', `b`, "is", `a > b`
    print `b`, '<', `c`, "is", `b < c`
    print `b`, '>', `c`, "is", `b > c`
    print "Power set of", `c`, "is", powerset(c)

# end
From sdm7g at virginia.edu Tue Jan 23 18:12:22 2001 From: sdm7g at virginia.edu (Steven D. Majewski) Date: Tue, 23 Jan 2001 12:12:22 -0500 (EST) Subject: [Python-Dev] libraries=['m'] in config.py [Re: Python 2.1 alpha 1 released!]
In-Reply-To: <200101230333.WAA28376@cj20424-a.reston1.va.home.com> Message-ID: Is there a simple way (other than editing config.py) to remove the effect of all of the "libraries=['m']" options from config.py? This breaks the MacOSX build as there's no libm -- that functionality is built into the System.framework. Shouldn't these types of flags be acquired from configure or the make environment somehow? -- Steve Majewski ( BTW: OSX build also needs a "-traditional-cpp" flag to get thru compiling classobject.c without error. ) From uche.ogbuji at fourthought.com Tue Jan 23 18:28:18 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 10:28:18 -0700 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: Message from Martin von Loewis of "Mon, 22 Jan 2001 15:46:39 +0100." <200101221446.PAA05164@pandora.informatik.hu-berlin.de> Message-ID: <200101231728.KAA03408@localhost.localdomain> > > This has nothing to do with Python. UTF-8 marks the codes > > from 128-191 as illegal prefix. > [...] > > Perhaps the parser should catch the UnicodeError and > > instead return a not-wellformed exception ?! > > Right on both counts. If no encoding is specified, and if the > document appears not to be UTF-16 in any endianness, an XML processor > shall assume it is UTF-8. As Marc-Andre explains, your document is not > proper UTF-8, hence the error. > > The confusing thing is that expat itself does not care about it not > being UTF-8; that is only detected when the callback is invoked in > pyexpat, and therefore conversion to a Unicode object is attempted. Pyexpat violates the XML spec here. XML parsers are not allowed to "recover" from well-formedness errors. And I would classify blithely reporting the character data as "recovery". However, I'm amazed that this wouldn't have come up before, considering the pedigree of expat. I'll poke around, and raise a bug on the expat site if need be.
-- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tismer at tismer.com Tue Jan 23 18:35:08 2001 From: tismer at tismer.com (Christian Tismer) Date: Tue, 23 Jan 2001 18:35:08 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <200101231728.KAA03408@localhost.localdomain> Message-ID: <3A6DC0CC.C4FF83DF@tismer.com> uche.ogbuji at fourthought.com wrote: > > > > This has nothing to do with Python. UTF-8 marks the codes > > > from 128-191 as illegal prefix. > > [...] > > > Perhaps the parser should catch the UnicodeError and > > > instead return a not-wellformed exception ?! > > > > Right on both counts. If no encoding is specified, and if the > > document appears not to be UTF-16 in any endianness, an XML processor > > shall assume it is UTF-8. As Marc-Andre explains, your document is not > > proper UTF-8, hence the error. > > > > The confusing thing is that expat itself does not care about it not > > being UTF-8; that is only detected when the callback is invoked in > > pyexpat, and therefore conversion to a Unicode object is attempted. > > Pyexpat violates the XML spec here. XML parsers are not allowed to "recover" > from well-formedness errors. And I would classify blithely reporting the > character data as "recovery". > > However, I'm amazed that this wouldn't have come up before, considering the > pedigree of expat. Well, I had to write a preprocessor which turns some "xml-like" but not well-formed stuff into something usable. This was a bulk of 100 MB of data, partially hand-written, partially machine-generated, but not really well-formed.
Some special characters appeared very late in the data set, raising an error in Python 2.0, but not in 1.5.2, so I perceived it as an error in the parser first, not the data. :-) ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From uche.ogbuji at fourthought.com Tue Jan 23 18:55:12 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 10:55:12 -0700 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: Message from Christian Tismer of "Mon, 22 Jan 2001 16:05:24 +0100." <3A6C4C34.4D1252C9@tismer.com> Message-ID: <200101231755.KAA03471@localhost.localdomain> > "M.-A. Lemburg" wrote: > ... > > > The codes from 192 to 236, 238-243 produce > > > "UTF-8 decoding error: invalid data", > > > the rest gives "not well-formed". > > > > > > I would like to know if this happens with your (Tim) modified > > > version as well. I'm using plain vanilla BeOpen Python 2.0 . > > > > This has nothing to do with Python. UTF-8 marks the codes > > from 128-191 as illegal prefix. See Object/unicodeobject.c: > ... > > Schade. > > > Perhaps the parser should catch the UnicodeError and > > instead return a not-wellformed exception ?! > > I believe it would be better. Yes, and given there is not much time before the 2.1 release, doing so is an acceptable stop-gap. However, I think the real fix has to lie in expat. I just had a *very* quick and dirty perusal of expat 1.2 and 1.95.1, and not only do the UTF-8 validity checks (at the top of xmltok.c) seem wrong, but it doesn't look as if they're ever invoked. I'll try to find some time to look into this more closely, or perhaps someone will straighten me out if I'm on the wrong trail.
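A minimal sketch of the stop-gap discussed above, in modern Python rather than the 2.x of this thread: bytes 128-191 are UTF-8 continuation bytes and cannot start a character, so decoding them fails, and a wrapper can turn that failure into a well-formedness error. `NotWellFormedError` and `decode_character_data` are hypothetical names used for illustration; this is not pyexpat's or expat's actual code.

```python
class NotWellFormedError(Exception):
    """Hypothetical stand-in for a parser's not-well-formed error."""


def decode_character_data(raw):
    """Decode UTF-8 bytes, reporting bad input as a well-formedness error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise NotWellFormedError("invalid UTF-8 at byte %d" % exc.start) from exc


# Every byte in 128..191 is a continuation byte, so none may start a character.
rejected = []
for code in range(128, 192):
    try:
        decode_character_data(bytes([code]))
    except NotWellFormedError:
        rejected.append(code)

print(len(rejected), "lead bytes rejected")
print(decode_character_data("caf\u00e9".encode("utf-8")))
```

Converting the error at the decoding boundary keeps the caller from ever seeing partially decoded character data, which is the "no recovery" behaviour the XML spec asks for.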
-- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fredrik at effbot.org Tue Jan 23 19:03:42 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 23 Jan 2001 19:03:42 +0100 Subject: [Python-Dev] getting rid of ucnhash Message-ID: <013901c08566$d2a8f360$e46940d5@hagrid> It's probably just me, but the names of the two unicode modules tend to irritate me: > ls u*.pyd ucnhash.pyd unicodedata.pyd (the former contains names, the latter data) I've been meaning to rename the former, but I just realized that it might be better to get rid of it completely, and move its functionality into the unicodedata module. The result is a single 200k unicodedata module, which contains the name database as well as two new functions: name(character [, default]) => map unicode character to name. if the name doesn't exist, return the default object, or raise ValueError. lookup(name) => unicode character (or raise KeyError if it doesn't exist) Should I check it in now, change the names/semantics and check it in, or post it to sourceforge? Cheers /F From uche.ogbuji at fourthought.com Tue Jan 23 19:00:19 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:00:19 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "Eric S. Raymond" of "Mon, 22 Jan 2001 12:41:59 EST." <20010122124159.A14999@thyrsus.com> Message-ID: <200101231800.LAA03515@localhost.localdomain> > \section{\module{set} --- > Basic set algebra for Python} Looks good. Are you making this available for download? I could put this to experimental use right away (experimental since, IIRC, you are using the new rich comparisons).
-- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji at fourthought.com Tue Jan 23 19:16:27 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:16:27 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "Eric S. Raymond" of "Mon, 22 Jan 2001 15:13:09 EST." <20010122151309.C15236@thyrsus.com> Message-ID: <200101231816.LAA03551@localhost.localdomain> > Guido van Rossum : > > There's already a PEP on a set object type, and everybody and their > > aunt has already implemented a set datatype. Tim mentioned that he had one, and he also claimed that every other dodder had a set class, but the only one listed in the vaults is kjBuckets, which I'm not sure is maintained any more. (Is Aaron Watters hereabouts?) > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. IMO, Eric's Set interface is close to perfect. PEP 218 is interesting, but I'm not sure it's worth slogging through the inevitable uproar over an entirely new syntactic construct (the "{}" notation) before getting something as useful as a set class into the standard library. > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. 
But here are the reasons I suggest you should do so: For what it's worth, I'm +1 on adding this to the standard library. I've seen so many set hacks with dictionaries (memory ouch) and list hacks (speed ouch) in Python code out there, that I'm convinced it would meet much more common usage than, say zlib, xdr, or even expat. On this hacker list everyone's aunt might whip up set extensions on boring weekends, but I doubt this describes the overall Python populace. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji at fourthought.com Tue Jan 23 19:29:36 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:29:36 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "M.-A. Lemburg" of "Tue, 23 Jan 2001 11:26:16 +0100." <3A6D5C48.A076DA0@lemburg.com> Message-ID: <200101231829.LAA03575@localhost.localdomain> > All very well, but are sets really that essential to every > day Python programming ? Not everyday, but as I said, the standard library has zlib, expat, tkinter, colorsys, and a whole lot of other stuff that is undoubtedly less useful than a set class. > If we include sets then we ought to > also include graphs, tries, btrees I see all of these as far less commonly useful than sets (at least in situations where implementations using existing data structures won't suffice). I run into needs for sets all the time. I don't have as much trouble with your other examples, though I've always considered tries as a possible performance boost in XPath. Oddly enough another data structure I often wish I had is a splay tree, and I hope to wrap my old C++ splay tree implementation for Python one of these days. 
> and all those other goodies > we have in computer science. All of these types are available > out there, but I believe the audience who really cares for these > types is also capable of downloading the extensions and installing > them. > > It would be nice if all of these extensions could go into a SUMO > edition of Python though... together with your set module. Considering "batteries included", it's worth considering these very important "batteries". -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From skip at mojam.com Tue Jan 23 19:35:04 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 23 Jan 2001 12:35:04 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: References: Message-ID: <14957.52952.48739.53360@beluga.mojam.com> Guido> - Use "exec ... in dict" to avoid having to walk on eggshells; Guido> locals now don't have to start with underscore. Thanks. I have just been incredibly short on time lately. Guido> - Only test dbhash if bsddb can be imported. (Wonder if there Guido> are more like this?) Alpha testing should pick those up, yes? ;-) Guido> ! try: Guido> ! import bsddb Guido> ! except ImportError: Guido> ! if verbose: Guido> ! print "can't import bsddb, so skipping dbhash" Guido> ! else: Guido> ! check_all("dbhash") Instead of having to know that dbhash includes bsddb, shouldn't dbhash be the module that's imported here? Skip From uche.ogbuji at fourthought.com Tue Jan 23 19:36:59 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:36:59 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "Eric S. Raymond" of "Tue, 23 Jan 2001 11:30:50 EST."
<20010123113050.A26162@thyrsus.com> Message-ID: <200101231836.LAA03655@localhost.localdomain> > """ > A set-algebra module for Python. > > The functions work on any sequence type and return lists. > The set methods can take a set or any sequence type as an argument. > They are insensitive to the types of the elements. > > Lists are used rather than dictionaries so the elements can be mutable. > > """ Hmm. I was hoping this was actually a C extension for the performance boost, esp. given the number of __foo__ methods in the set class. Implementation in Python makes my interest in adding it to the standard lib more tepid (not to cast the least bit of aspersion on your work). -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From skip at mojam.com Tue Jan 23 19:37:44 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 23 Jan 2001 12:37:44 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <3A6CBBEF.4732BFF2@ActiveState.com> References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> <3A6CBBEF.4732BFF2@ActiveState.com> Message-ID: <14957.53112.119272.797494@beluga.mojam.com> Paul> I apologize but I'm not clear on my responsibilities here, if Paul> any. I wrote a PEP for online help. I submitted a partial Paul> implementation. Perhaps I am the one who should apologize. I started the thread. I tried Ping's code and was simply amazed at how useful it was. I didn't bother checking the list of PEPs to see if it overlapped with something there, and I suspect any discussion of this stuff has taken place in the doc sig, where I don't hang out. Skip From esr at thyrsus.com Tue Jan 23 19:39:04 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Tue, 23 Jan 2001 13:39:04 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101231816.LAA03551@localhost.localdomain>; from uche.ogbuji@fourthought.com on Tue, Jan 23, 2001 at 11:16:27AM -0700 References: <200101231816.LAA03551@localhost.localdomain> Message-ID: <20010123133904.B26487@thyrsus.com> uche.ogbuji at fourthought.com : > I've seen so many set hacks with dictionaries (memory ouch) and list > hacks (speed ouch) in Python code out there, that I'm convinced it > would meet much more common usage than, say zlib, xdr, or even > expat. Uche brings up a point I meant to make in my reply to Guido. The dict- vs.-list choice in set representation is indeed a choice between memory ouch and speed ouch. I believe most uses of sets are small sets. That reduces the speed ouch of using a list representation and increases the proportional memory ouch of a dictionary implementation. -- Eric S. Raymond Question with boldness even the existence of a God; because, if there be one, he must more approve the homage of reason, than that of blindfolded fear.... Do not be frightened from this inquiry from any fear of its consequences. If it ends in the belief that there is no God, you will find incitements to virtue in the comfort and pleasantness you feel in its exercise... -- Thomas Jefferson, in a 1787 letter to his nephew From jeremy at alum.mit.edu Tue Jan 23 19:41:23 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 23 Jan 2001 13:41:23 -0500 (EST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123113050.A26162@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> Message-ID: <14957.53331.342827.462297@localhost.localdomain> >>>>> "ESR" == Eric S Raymond writes: ESR> Guido van Rossum : >> Having just skimmed your docs, I'm disappointed that you choose >> lists as your fundamental representation type -- this makes it >> slow to test for membership and hence makes intersection and >> union slow. ESR> Not quite. Membership test is still linear-time; so is adding ESR> and deleting elements. It's true that union and intersection ESR> are quadratic, but see below. >> I suppose that you have evidence from using this that those >> operations aren't used much, or not for large sets? ESR> Exactly! In my experience the usage pattern of a class like ESR> this runs heavily to small sets (usually < 64 elements); ESR> membership tests dominate usage, with addition and deletion of ESR> elements running second and the "classical" boolean operations ESR> like union and intersection being uncommon. I use a Set type in the compiler package (Tools/compiler/compiler) to collect the names for a code block. I implemented a trivial Set type using a dictionary, because it supported the operations I was most interested in: addition, membership tests, intersection, and get elements as sequence (in arbitrary order). Those are the only operations the compiler uses. I think I use sets for this purpose frequently, although I can't think of any other good examples at the moment. I usually just use a dictionary explicitly. In the compiler, I chose an explicit Set class with unique method names (add, has_elt, elements) to make it obvious for readers that I was using a set. 
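A minimal sketch of a dictionary-backed set along the lines described above, written in modern Python. Only the method names `add`, `has_elt`, and `elements` come from the message; the class body, the `elts` attribute layout, and the `intersection` method are assumptions for illustration, not the code in Tools/compiler.

```python
class Set:
    """Set of hashable elements stored as dictionary keys."""

    def __init__(self, items=()):
        self.elts = {}
        for item in items:
            self.add(item)

    def __len__(self):
        return len(self.elts)

    def add(self, elt):
        # The value is a dummy; only the key matters.
        self.elts[elt] = 1

    def has_elt(self, elt):
        return elt in self.elts

    def elements(self):
        return list(self.elts)

    def intersection(self, other):
        # Iterate over the smaller set and probe the larger one.
        small, large = sorted((self, other), key=len)
        return Set(elt for elt in small.elements() if large.has_elt(elt))


names = Set(["x", "y"])
names.add("x")   # adding a duplicate leaves the set unchanged
names.add("z")
common = names.intersection(Set(["y", "z", "w"]))
print(sorted(names.elements()), sorted(common.elements()))
```

The explicit method names make it clear at the call site that a set is intended, at the cost of requiring hashable (in practice, immutable) elements.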
ESR> What you get by going with a dictionary representation is that ESR> membership test becomes close to constant-time, while insertion ESR> and deletion become sometimes cheap and sometimes quite ESR> expensive (depending of course on whether you have to allocate ESR> a new hash bucket). Given the usage pattern I described, the ESR> overall difference in performance is marginal. The cost of insertion would presumably be dominated by the frequency of dictionary resizes. I don't know how often they occur, but I assume the dictionary type is designed to accommodate efficient insert. I did a quick and dirty performance comparison of dictionary-based and list-based sets. (I'll include the code below.) It uses sample data collected from running the compiler; so it is measuring actual usage. The tests showed that dictionary-based sets were always faster. For small tests (3 operations), the difference was about 10 percent. For larger tests (88 operations), the difference ranged from 180 to almost 700 percent. >> This is one of the problems with coming up with a set type for >> the core: it has to work for (nearly) everybody. ESR> As I pointed out above (and someone else on the list had made ESR> the same point earlier), "works for everybody" isn't really ESR> possible here. So my solution does the next best thing -- pick ESR> a choice of tradeoffs that isn't obviously worse than the ESR> alternatives and keeps things bog-simple. For my applications, the dictionary-based approach is faster and offers a natural interface. If a set implementation were included in the standard library, I would like to see either (1) the implementation that favors my needs or (2) multiple implementations tuned for different uses. I think it would be just as easy to make set implementations available separately, though. Jeremy -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: sets.tar URL: From loewis at informatik.hu-berlin.de Tue Jan 23 19:51:37 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 23 Jan 2001 19:51:37 +0100 (MET) Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <200101231755.KAA03471@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101231755.KAA03471@localhost.localdomain> Message-ID: <200101231851.TAA19488@pandora.informatik.hu-berlin.de> > I'll try to some time to look into this more closely, or perhaps > someone will straighten me out if I'm on the wrong trail. Spending only a little time myself, either, I'd agree with your conclusions. Regards, Martin From esr at thyrsus.com Tue Jan 23 19:55:30 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 13:55:30 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <14957.53331.342827.462297@localhost.localdomain>; from jeremy@alum.mit.edu on Tue, Jan 23, 2001 at 01:41:23PM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> <14957.53331.342827.462297@localhost.localdomain> Message-ID: <20010123135530.A26565@thyrsus.com> Jeremy Hylton : Content-Description: message body text > The tests showed that dictionary-based sets were always faster. For > small tests (3 operations), the difference was about 10 percent. For > larger tests (88 operations), the difference ranged from 180 to almost > 700 percent. Not surprising. 88 elements is getting pretty large. -- Eric S. Raymond Hoplophobia (n.): The irrational fear of weapons, correctly described by Freud as "a sign of emotional and sexual immaturity". Hoplophobia, like homophobia, is a displacement symptom; hoplophobes fear their own "forbidden" feelings and urges to commit violence. 
This would be harmless, except that they project these feelings onto others. The sequelae of this neurosis include irrational and dangerous behaviors such as passing "gun-control" laws and trashing the Constitution. From petrilli at amber.org Tue Jan 23 20:06:05 2001 From: petrilli at amber.org (Christopher Petrilli) Date: Tue, 23 Jan 2001 14:06:05 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123133904.B26487@thyrsus.com>; from esr@thyrsus.com on Tue, Jan 23, 2001 at 01:39:04PM -0500 References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> Message-ID: <20010123140604.E18796@trump.amber.org> Eric S. Raymond [esr at thyrsus.com] wrote: > I believe most uses of sets are small sets. That reduces the speed ouch > of using a list representation and increases the proportional memory > ouch of a dictionary implementation. The problem is that there are a lot of uses for large sets, especially when you begin to introduce intersections and unions. If an implementation is only useful for a few dozen (or a hundred) items in the set, that eliminates a lot of places where set types are really useful -- optimizing large scale manipulations. Zope, for example, manipulates sets with 10,000 items in them on a regular basis when doing text index manipulation. The data structures are heavily optimized for this kind of behaviour, without a major sacrifice in space. I think Jim perhaps can talk to this. Unfortunately, for me, a Python implementation of Sets is only interesting academically. Any time I've needed to work with them at a large scale, I've needed them *much* faster than Python could achieve without a C extension. Perhaps the difference is in problem domain. In the "scripting" problem domain, I would agree that Sets would rarely reach large sizes, and so an algorithm which performed in quadratic time might be fine, because the actual resultant time is small.
However, in more full-blown applications, this would be counterproductive, and the user would be forced to implement their own (or use Aaron's excellent kjBuckets). Just my opinion, of course. Chris -- | Christopher Petrilli | petrilli at amber.org From ping at lfw.org Tue Jan 23 20:27:38 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 23 Jan 2001 11:27:38 -0800 (PST) Subject: [Python-Dev] Sets: elt in dict, lst.include In-Reply-To: <14957.53331.342827.462297@localhost.localdomain> Message-ID: On Tue, 23 Jan 2001, Jeremy Hylton wrote: > For my applications, the dictionary-based approach is faster and > offers a natural interface. The only change that needs to be made to support sets of immutable elements is to provide "in" on dictionaries. The rest is then all quite natural: dict[key] = 1 if key in dict: ... for key in dict: ... (Then we can also get rid of the ugly has_key method.) For those that need mutable set elements badly enough to sacrifice a little speed, we can add two methods to lists: lst.include(elt) # same as - if elt not in lst: lst.append(elt) lst.exclude(elt) # same as - while elt in lst: lst.remove(elt) (These are generally useful methods to have anyway.) This proposal has the following advantages: 1. You still get to choose which implementation best suits your needs. 2. No new types are introduced; lists and dicts are well understood. 3. Both features are extremely simple to understand and explain. 4. Both features are useful in their own right, and could stand as independent proposals to improve lists and dicts respectively. (For instance, i spotted about 10 places in the std library where the 'include' method could be used, and i know i would use it myself -- certainly more often than pop or reverse!) 5. In all cases this is faster than a new Python class. (For instance, Jeremy's implementation even contained a commented-out optimization that stored self.elts.has_key as self.has_elt to speed things up a bit.
Using straight dicts would see this optimization and raise it one, with no effort at all.) 6. Either feature can be independently approved or rejected without affecting the other. -- ?!ng From loewis at informatik.hu-berlin.de Tue Jan 23 20:33:00 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 23 Jan 2001 20:33:00 +0100 (MET) Subject: [Python-Dev] getting rid of ucnhash Message-ID: <200101231933.UAA02223@pandora.informatik.hu-berlin.de> > Should I check it in now, change the names/semantics and check it > in, or post it to sourceforge? Is that two or three options? If three, what change in semantics did you propose? Anyway, I feel it could go in right now; the only breakage would be to applications that use ucnhash.ucnhashAPI, right? Regards, Martin From fredrik at effbot.org Tue Jan 23 20:49:09 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 23 Jan 2001 20:49:09 +0100 Subject: [Python-Dev] Re: getting rid of ucnhash References: <200101231933.UAA02223@pandora.informatik.hu-berlin.de> Message-ID: <01e801c08575$8f71c680$e46940d5@hagrid> martin wrote: > > Should I check it in now, change the names/semantics and check it > > in, or post it to sourceforge? > > Is that two or three options? three, I think. > If three, what change in semantics did you propose? none -- but maybe someone else has a better name for "lookup"? (the "name" function behaves like the existing property methods in 2.0's unicodedata) > Anyway, I feel it could go in right now; the only breakage would be to > applications that use ucnhash.ucnhashAPI, right? yup -- and those applications are already broken, since the CObject was renamed in 2.1a1. (well, any code using 2.1a1's new ucnhash.getcode/getname functions will of course also break. 
but I think we can live with that ;-) Cheers /F From ping at lfw.org Tue Jan 23 20:43:50 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 23 Jan 2001 11:43:50 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Message-ID: Christopher Petrilli wrote: > The problem is that there are a lot of uses for large sets, especially > when you begin to introduce intersections and unions. [...] > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. On Tue, 23 Jan 2001, Ka-Ping Yee wrote: > This proposal has the following advantages: [six nice things about 'in dict' and 'lst.include'] I forgot to mention an important seventh advantage: 7. The list and dictionary data structures are implemented in the C core, so we leave open the possibility of a wizard going and optimizing the snot out of them later. Just as there's e.g. a boundary on recursion levels before Python invokes the cycle detection algorithm during comparison, if we decide we need more speed for big sets, Python could notice when a list or dictionary gets very big and invoke more powerful optimizations. We don't have to do this now, but the important thing is that we will always have the option to make Christopher's dream come true. (A wizard can do this once, and every Python script on the planet benefits.) In general i support Python deciding on the Right Thing to do under the hood, performance-wise, so that the programmer doesn't have to think too hard about what data structure to choose. -- ?!ng From nas at arctrix.com Tue Jan 23 14:08:07 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 23 Jan 2001 05:08:07 -0800 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123140604.E18796@trump.amber.org>; from petrilli@amber.org on Tue, Jan 23, 2001 at 02:06:05PM -0500 References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> Message-ID: <20010123050807.A29115@glacier.fnational.com> On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. I think this argues that if sets are added to the core they should be implemented as an extension type with the speed of dictionaries and the memory usage of lists. Basicly, we would use the implementation of PyDict but drop the values. Neil From jeremy at alum.mit.edu Tue Jan 23 20:48:18 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 23 Jan 2001 14:48:18 -0500 (EST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <14957.53331.342827.462297@localhost.localdomain> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> <14957.53331.342827.462297@localhost.localdomain> Message-ID: <14957.57346.248852.656387@localhost.localdomain> Sorry about the garbled attachment on the previous message; I think I got the content-type wrong. Here's a second try. Jeremy -------------- next part -------------- A non-text attachment was scrubbed... Name: sets.tar Type: application/octet-stream Size: 20480 bytes Desc: not available URL: From petrilli at amber.org Tue Jan 23 21:06:16 2001 From: petrilli at amber.org (Christopher Petrilli) Date: Tue, 23 Jan 2001 15:06:16 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123050807.A29115@glacier.fnational.com>; from nas@arctrix.com on Tue, Jan 23, 2001 at 05:08:07AM -0800 References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> Message-ID: <20010123150616.F18796@trump.amber.org> Neil Schemenauer [nas at arctrix.com] wrote: > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > Unfortunately, for me, a Python implementation of Sets is only > > interesting academicaly. Any time I've needed to work with them at a > > large scale, I've needed them *much* faster than Python could achieve > > without a C extension. > > I think this argues that if sets are added to the core they > should be implemented as an extension type with the speed of > dictionaries and the memory usage of lists. Basicly, we would > use the implementation of PyDict but drop the values. This is effectively the implementation that Zope has for Sets. In addition we have "buckets" that have scores on them (which are implemented as a modified BTree). Unfortunately Jim Fulton (who wrote all the code for that level) is in a meeting, but I hope he'll comment on the implementation that was chosen for our software. Chris -- | Christopher Petrilli | petrilli at amber.org From jeremy at alum.mit.edu Tue Jan 23 20:56:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 23 Jan 2001 14:56:05 -0500 (EST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123135530.A26565@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> <14957.53331.342827.462297@localhost.localdomain> <20010123135530.A26565@thyrsus.com> Message-ID: <14957.57813.23072.723418@localhost.localdomain> >>>>> "ESR" == Eric S Raymond writes: ESR> Jeremy Hylton : Content-Description: ESR> message body text >> The tests showed that dictionary-based sets were always faster. >> For small tests (3 operations), the difference was about 10 >> percent. For larger tests (88 operations), the difference ranged >> from 180 to almost 700 percent. ESR> Not surprising. 88 elements is getting pretty large. Large for what? I've got directories with that many files and modules with the many names defined at the top-level :-). I'm just reporting the range of set sizes I've encountered for a real application. In general, I expect a few hundred elements should be handled without trouble by most Python containers. Jeremy From gvwilson at nevex.com Tue Jan 23 21:26:22 2001 From: gvwilson at nevex.com (Greg Wilson) Date: Tue, 23 Jan 2001 15:26:22 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123200601.87817EF68@mail.python.org> Message-ID: <001101c0857a$c0dce420$770a0a0a@nevex.com> Greg Wilson: Meta-question: do people want to continue to discuss sets on the general python-dev list, or take it out-of-line (e.g. to an egroups list)? I'm finding all of the discussion very useful, but I realize that many readers might prefer to concentrate on the 2.1 release... > Jeremy Hylton : > > The tests showed that dictionary-based sets were always faster. > > small tests (3 operations), the difference was about 10 percent. > > larger tests (88 operations), the difference ranged from > > 180 to almost 700 percent. 
> Eric Raymond : > Not surprising. 88 elements is getting pretty large. Greg Wilson: Really? I was testing my implementation with sets of email addresses grep'd out of old mail folders --- typical sizes were several thousand elements. > From: Christopher Petrilli > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. Greg Wilson: I had been expecting to implement this in C, not in pure Python, for performance. > From: Christopher Petrilli > In the "scripting" problem domain, I would agree that Sets would > rarely reach large sizes, > and so a algorithm which performed in quadratic time might be fine, Greg Wilson: I strongly disagree (see the email address example above --- it was the first thing that occurred to me to try). I am still hoping to find a sub-quadratic (preferably sub-linear) implementation. I can do it in C++ with observer/observable (contained items notify containers of changes in value, sets store all equivalent items in the same bucket), but that doesn't really help... > From: Ka-Ping Yee > The only change that needs to be made to support sets of immutable > elements is to provide "in" on dictionaries... and: > From: Neil Schemenauer > ...if sets are added to the core...we would > use the implementation of PyDict but drop the values. Unfortunately, if values are required to be immutable, then sets of sets aren't possible... :-( Thanks, everyone, Greg From esr at thyrsus.com Tue Jan 23 21:38:39 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Tue, 23 Jan 2001 15:38:39 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: ; from ping@lfw.org on Tue, Jan 23, 2001 at 11:27:38AM -0800 References: <14957.53331.342827.462297@localhost.localdomain> Message-ID: <20010123153839.B26676@thyrsus.com> Ka-Ping Yee : > The only change that needs to be made to support sets of immutable > elements is to provide "in" on dictionaries. The rest is then all > quite natural: > > dict[key] = 1 > if key in dict: ... > for key in dict: ... Independently of implementation issues about sets, I think this is a damn fine idea. +1. > (Then we can also get rid of the ugly has_key method.) > > For those that need mutable set elements badly enough to sacrifice > a little speed, we can add two methods to lists: > > lst.include(elt) # same as - if elt not in lst: lst.append(elt) > lst.exclude(elt) # same as - while elt in lst: lst.remove(elt) +1 on the concept, -0 on the names. -- Eric S. Raymond [The disarming of citizens] has a double effect, it palsies the hand and brutalizes the mind: a habitual disuse of physical forces totally destroys the moral [force]; and men lose at once the power of protecting themselves, and of discerning the cause of their oppression. -- Joel Barlow, "Advice to the Privileged Orders", 1792-93 From tim.one at home.com Tue Jan 23 23:02:41 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 17:02:41 -0500 Subject: [Python-Dev] Is X a (sequence|mapping)? In-Reply-To: <200101231531.KAA05122@cj20424-a.reston1.va.home.com> Message-ID: >> operator.isMappingType() >> + some other C style _Check() APIs [Guido] > Yes, these should probably be deprecated. I certainly have never > used them! (The operator module doesn't seem to get much use in > general... It's used heavily by test_operator.py . Outside of that, it's used maybe three times in the std distribution, nowhere essential; the return map(operator.__div__, rgbtuple, _maxtuple) in Pynche's ColorDB.py is typical. 
2.0's return [x / 256. for x in rgbtuple] does the same thing more clearly (_maxtuple is a module constant). It appeals to functional-language fans and extreme micro-optimizers, so they don't have to type "lambda" in the simplest cases. At least operator.truth(x) is *clearer* than "not not x". > Was it a bad idea?) Mixed, but I'd say more bad than good overall. From thomas at xs4all.net Wed Jan 24 00:38:14 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 24 Jan 2001 00:38:14 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010123153839.B26676@thyrsus.com>; from esr@thyrsus.com on Tue, Jan 23, 2001 at 03:38:39PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> Message-ID: <20010124003814.F27785@xs4all.nl> On Tue, Jan 23, 2001 at 03:38:39PM -0500, Eric S. Raymond wrote: > > The only change that needs to be made to support sets of immutable > > elements is to provide "in" on dictionaries. The rest is then all > > quite natural: > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > Independently of implementation issues about sets, I think this is a > damn fine idea. +1. It's come up before. The problem with it is that it's not quite obvious whether it is 'if key in dict' or 'if value in dict'. Sure, from the above example it's obvious what you *expect*, but I suspect that 'for x in dict' will result in a 40/60 split in expectations, and like American voters, the 20% middle section will change their vote each recount :-) Now, if only there was a terribly obvious way to spell it... so that it's immediately obvious which of the two you wanted.... something like, oh, I donno, this, maybe: if key in dict.keys: ... if value in dict.values: ... Ponder-ponder--Guido-should-use-the-time-machine-for-this-one!-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
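The key-versus-value question raised above can be made concrete with a small sketch. It is written in present-day Python, where "in" on a dictionary did end up testing keys; the dictionary name and its contents are made-up sample data, not anything from the thread.

```python
# Illustration only: `stock` is made-up sample data, not from the thread.
stock = {"spam": 3, "eggs": 12}

key_hit = "spam" in stock                   # membership on the dict tests keys
value_as_key = 3 in stock                   # 3 is a value, not a key
value_hit = 3 in stock.values()             # values have to be asked for explicitly
pair_hit = ("eggs", 12) in stock.items()    # likewise for (key, value) pairs

iterated = sorted(x for x in stock)         # plain iteration also walks the keys
```

Spelling out .values() or .items() keeps the less common questions unambiguous, which is roughly the effect the dict.keys / dict.values spelling in the message above was after.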
From fredrik at effbot.org Wed Jan 24 01:13:20 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 01:13:20 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> Message-ID: <02f401c0859a$765d07c0$e46940d5@hagrid> > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. you forgot "if (key, value) in dict" on the other hand, it's not quite obvious that "list.sort" doesn't return the sorted list, "print >>None" prints to standard output, "except KeyError, ValueError" doesn't catch a ValueError exception, etc, etc, etc. (nor that it's "has_key" and "hasattr", and not "has_key" and "has_attr" or "haskey" and "hasattr" ;-) let's just say that "in" is the same thing as "has_key", and be done with it. Cheers /F From tim.one at home.com Wed Jan 24 02:51:22 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 20:51:22 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123140604.E18796@trump.amber.org> Message-ID: [Christopher Petrilli] > .... > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. How do you know that? I've used large sets in Python happily without resorting to C or kjbuckets (which is really aiming at fast operations on *graphs*, in which area it has no equal). 
Everyone (except Eric ) uses dicts to implement sets in Python, and "most" set operations can work at full C speed then; e.g., assuming both sets have N elements:

    membership testing
        O(1) -- it's just dict.has_key()
    element insertion
        O(1) -- dict[element] = 1
    element removal
        O(1) -- del dict[element]
    union
        O(N), but at full C speed -- dict1.update(dict2)
    intersection
        O(N), but at Python speed (the only 2.1 dog in the bunch!)
    choose some element and remove it
        took O(N) time and additional space in 2.0, but is O(1) in both since dict.popitem() was introduced
    iteration
        O(N), with O(N) additional space using dict.keys(), or O(1) additional space using dict.popitem() repeatedly

What are you going to do in C that's faster than using a Python dict for this purpose? Most key set operations are straightforward Python dict 1-liners then, and Python dicts are very fast. kjbuckets sets were slower last time I timed them (several years ago, but Python dicts have gotten faster since then while kjbuckets has been stagnant). There's a long tradition in the Lisp world of using unordered lists to represent sets (when the only tool you have is a hammer ... <0.5 wink>), but it's been easy to do much better than that in Python almost since the start. Even in the Python list world, enormous improvements for large sets can be gotten by maintaining lists in sorted order (then most O(N) operations drop to O(log2(N)), and O(N**2) to O(N)). Curiously, though, in 2.1 we can still use a dict-set for complex numbers, but no longer a sorted-list-set! Requiring a total ordering can get in the way more than requiring hashability (and vice versa -- that's a tough one).
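The dict-as-set operations listed above can be sketched as follows. This is only an illustration in present-day Python, where "elt in d" stands in for d.has_key(elt). The helper names (make_set, union, intersection, drain) are invented for this sketch and do not come from any module mentioned in the thread.

```python
def make_set(items):
    """Build a dict-backed set: keys are the elements, values are dummies."""
    d = {}
    for item in items:
        d[item] = 1          # element insertion, O(1) each
    return d


def union(a, b):
    """O(N) union; the copying and merging run inside the dict implementation."""
    result = a.copy()
    result.update(b)
    return result


def intersection(a, b):
    """O(N) intersection, but with an explicit Python-level loop."""
    if len(a) > len(b):
        a, b = b, a          # loop over the smaller set
    result = {}
    for elt in a:
        if elt in b:         # membership test, O(1)
            result[elt] = 1
    return result


def drain(s):
    """Destructive iteration: popitem() removes an arbitrary element in O(1)."""
    out = []
    while s:
        elt, _dummy = s.popitem()
        out.append(elt)
    return out


evens = make_set([0, 2, 4, 6])
small = make_set([0, 1, 2, 3])
both = intersection(evens, small)
either = union(evens, small)
drained = drain(make_set(["a", "b", "c"]))
```

Only intersection needs a loop written in Python, which matches the one "dog" in the list; the other operations are single dict calls.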
measurement-is-the-measure-of-all-measurable-things-ly y'rs - tim From greg at cosc.canterbury.ac.nz Wed Jan 24 03:45:01 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:45:01 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010124003814.F27785@xs4all.nl> Message-ID: <200101240245.PAA02098@s454.cosc.canterbury.ac.nz> Thomas Wouters : > Now, if only there was a terribly obvious way to spell it... so that it's > immediately obvious which of the two you wanted... Well, in the case of for key in d: or for value in d: it's immediately obvious to a *human* reader what is meant, so all we need to do is make the compiler a bit smarter. This can easily be done by the use of a small table, containing the equivalents of the words 'key' and 'value' in all known natural languages, against which the target variable name is matched using some suitable fuzzy matching algorithm. Soundex could be used for this, if we can decide on which version to use... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed Jan 24 03:46:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 21:46:37 -0500 Subject: [Python-Dev] getting rid of ucnhash In-Reply-To: Your message of "Tue, 23 Jan 2001 19:03:42 +0100." 
<013901c08566$d2a8f360$e46940d5@hagrid> References: <013901c08566$d2a8f360$e46940d5@hagrid> Message-ID: <200101240246.VAA06336@cj20424-a.reston1.va.home.com> > It's probably just me, but the names of the two unicode > modules tend to irritate me: > > > ls u*.pyd > ucnhash.pyd unicodedata.pyd > > (the former contains names, the latter data) > > I've been meaning to rename the former, but I just realized > that it might be better to get rid of it completely, and move > its functionality into the unicodedata module. > > The result is a single 200k unicodedata module, which con- > tains the name database as well as two new functions: > > name(character [, default]) => map unicode > character to name. if the name doesn't exist, > return the default object, or raise ValueError. > > lookup(name) => unicode character > (or raise KeyError if it doesn't exist) > > Should I check it in now, change the names/semantics and check > it in, or post it to sourceforge? To me, both of these are irrelevant details of the Unicode implementation. :-) IOW, feel free to check it in. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed Jan 24 03:49:21 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:49:21 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message-ID: <200101240249.PAA02101@s454.cosc.canterbury.ac.nz> Tim Peters : > Requiring a total ordering can get in the way more than requiring > hashability Often it's useful to have *some* total ordering, and you don't really care what it is as long as its consistent. Maybe all types should be required to support cmp(x,y) even if doing x < y via the rich comparison route raises a NotOrderable exception. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Wed Jan 24 03:52:43 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:52:43 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123050807.A29115@glacier.fnational.com> Message-ID: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Neil Schemenauer : > Basicly, we would > use the implementation of PyDict but drop the values. This could be incorporated into PyDict. Instead of storing keys and values in the same array, keep them in separate arrays and only allocate the values array the first time someone stores a value other than 1. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed Jan 24 03:58:59 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 21:58:59 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Wed, 24 Jan 2001 01:13:20 +0100." <02f401c0859a$765d07c0$e46940d5@hagrid> References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> Message-ID: <200101240258.VAA06479@cj20424-a.reston1.va.home.com> > let's just say that "in" is the same thing as "has_key", > and be done with it. You know, I've long resisted this, but I agree now -- this is the right thing. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 04:11:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:11:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: Your message of "Tue, 23 Jan 2001 12:35:04 CST." <14957.52952.48739.53360@beluga.mojam.com> References: <14957.52952.48739.53360@beluga.mojam.com> Message-ID: <200101240311.WAA06582@cj20424-a.reston1.va.home.com> > Guido> - Use "exec ... in dict" to avoid having to walk on eggshells; > Guido> locals no don't have to start with underscore. > > Thanks. I have just been incredibly short on time lately. You're welcome. > Guido> - Only test dbhash if bsddb can be imported. (Wonder if there > Guido> are more like this?) > > Alpha testing should pick those up, yes? ;-) Yes. :-) > Guido> ! try: > Guido> ! import bsddb > Guido> ! except ImportError: > Guido> ! if verbose: > Guido> ! print "can't import bsddb, so skipping dbhash" > Guido> ! else: > Guido> ! check_all("dbhash") > > Instead of having to know that dbhash includes bsddb, shouldn't dbhash be > the module that's imported here? I think I saw a complaint about this that specifically said that when dbhash is imported when bsddb can't be imported, an incomplete dbhash is left behind in sys.modules, and then a second import of dbhash will succeed -- but of course it will define no objects. Since dbhash may be imported elsewhere, testing for bsddb is safer. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 04:22:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:22:14 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.114,2.115 In-Reply-To: Your message of "Tue, 23 Jan 2001 08:24:38 PST." References: Message-ID: <200101240322.WAA06671@cj20424-a.reston1.va.home.com> > A few miscellaneous helpers. 
> > PyObject_Dump(): New function that is useful when debugging Python's C > runtime. In something like gdb it can be a pain to get some useful > information out of PyObject*'s. This function prints the str() of the > object to stderr, along with the object's refcount and hex address. > > PyGC_Dump(): Similar to PyObject_Dump() but knows how to cast from the > garbage collector prefix back to the PyObject* structure. > > [See Misc/gdbinit for some useful gdb hooks] > > none_dealloc(): Rather than SEGV if we accidentally decref None out of > existance, we assign None's and NotImplemented's destructor slot to > this function, which just calls abort(). Barry, since these are only gdb helpers, would it perhaps be better if their names started with "_Py" to indicate that they aren't part of the regular API? They violate an important rule: you shouldn't write to stderr directly, but always to sys.stderr. (There's a helper routines to write to stderr: PySys_WriteStderr().) I understand that for the gdb helper it's important to use the real stderr, and I don't object to having these functions present at all times (they're so small), but I do think that we should make it clear (by a _Py name, and also by a comment) that they should not be called! --Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Wed Jan 24 04:29:24 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 23 Jan 2001 19:29:24 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010124003814.F27785@xs4all.nl> Message-ID: I wrote: > The only change that needs to be made to support sets of immutable > elements is to provide "in" on dictionaries. Thomas Wouters wrote: > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. Yes, and i've seen this objection before, and i think it's silly. 
> Sure, from the above > example it's obvious what you *expect*, but I suspect that 'for x in dict' > will result in a 40/60 split in expectations, No way... it's at least 90/10. How often do you write 'dict.has_key(x)'? (std lib says: 206) How often do you write 'for x in dict.keys()'? (std lib says: 49) How often do you write 'x in dict.values()'? (std lib says: 0) How often do you write 'for x in dict.values()'? (std lib says: 3) I rest my case. -- ?!ng From barry at digicool.com Wed Jan 24 04:44:31 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 23 Jan 2001 22:44:31 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.114,2.115 References: <200101240322.WAA06671@cj20424-a.reston1.va.home.com> Message-ID: <14958.20383.795064.832967@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Barry, since these are only gdb helpers, would it perhaps be GvR> better if their names started with "_Py" to indicate that GvR> they aren't part of the regular API? They violate an GvR> important rule: you shouldn't write to stderr directly, but GvR> always to sys.stderr. (There's a helper routines to write to GvR> stderr: PySys_WriteStderr().) I understand that for the gdb GvR> helper it's important to use the real stderr, and I don't GvR> object to having these functions present at all times GvR> (they're so small), but I do think that we should make it GvR> clear (by a _Py name, and also by a comment) that they should GvR> not be called! I thought about it, couldn't decide and figured I'd check it in anyway, knowing that you'd let me know. See how wise I was? :) I will rename them as _Py* and fix the gdbinit file accordingly. One note: these functions /ought/ to be useful for dbx or any other command line debugger. I just haven't used anything but gdb for years. If anybody's got a dbxinit equivalent I could add that to Misc too. 
nothing-an-adjacent-office-wouldn't-have-solved-much-more-quick-ly y'rs, -Barry From guido at digicool.com Wed Jan 24 04:46:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:46:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: Your message of "Tue, 23 Jan 2001 09:22:26 EST." <20010123092226.A25968@thyrsus.com> References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> Message-ID: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> > Guido van Rossum : > > Can you point me to docs explaining the meaning of the BROWSER > > environment variable? I've never heard of it... The last new > > environment variables I learned were PAGER and EDITOR, probably 15 > > years ago when 4.1BSD was released... :-) ESR replies: > You've never heard of BROWSER because I invented it and have not > widely popularized it yet :-). Ping knew about it either because he > read the module code and saw that it was supposed to work, or because > he remembered the design discussion when webbrowser.py was first > implemented. > > I've had conversations with some key Perl and Tcl people (Larry Wall, > Tom Christiansen, Clif Flynt) about the BROWSER convention, and they > agree it's a good idea. I'll probably hack support for it into Perl's > browser launcher next. > > It's documented in the version of libwebbrowser.tex now in the CVS > tree. Grumble. That wasn't the kind of answer I expected. I don't like it if Python is used as a wedge to get a particular thing introduced to the rest of the world, no matter how useful it may seem at the time. If something is already a popular convention, I'll happily adopt it, but I'm not comfortable being put in front of somebody else's cart. There just are too many carts that would like to be pulled by a horse as strong as Python, and I don't want to take sides if I can avoid it. 
BROWSER seems unlikely to take the world by storm and I don't feel I need to be involved in the effort to get it accepted. (And yes, I know there are enough cases where I *did* take sides. There are some cases where I *do* want to take a side, and there were some mistakes -- which is one of the reasons why I'm shy about taking sides now.) Anyway, shouldn't you also talk to the developers of packages like KDE and Gnome? Surely their users would like to be able to configure the default web browser. Talking just to the scripting language people seems like you're thinking too small. There must be lots of C apps with the desire to invoke a browser. Also Emacs, which has an extensive list of browse-url-* functions (you might even learn a few tricks from it about how to invoke various external browsers) but AFAIK no default browser selection. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 04:54:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:54:25 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Wed, 24 Jan 2001 15:52:43 +1300." <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Message-ID: <200101240354.WAA06903@cj20424-a.reston1.va.home.com> > Neil Schemenauer : > > > Basically, we would > > use the implementation of PyDict but drop the values. > > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. Not a bad idea! (But shouldn't the default value be something else, like None?)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 05:20:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 23:20:56 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Wed, 24 Jan 2001 00:38:14 +0100." <20010124003814.F27785@xs4all.nl> References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> Message-ID: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> > > > dict[key] = 1 > > > if key in dict: ... > > > for key in dict: ... > > > Independently of implementation issues about sets, I think this is a > > damn fine idea. +1. > > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. Sure, from the above > example it's obvious what you *expect*, but I suspect that 'for x in dict' > will result in a 40/60 split in expectations, and like American voters, the > 20% middle section will change their vote each recount :-) > > Now, if only there was a terribly obvious way to spell it... so that it's > immediately obvious which of the two you wanted.... something like, oh, I > donno, this, maybe: > > if key in dict.keys: ... > if value in dict.values: ... > > Ponder-ponder--Guido-should-use-the-time-machine-for-this-one!-ly y'rs, No chance of a time-machine escape, but I *can* say that I agree that Ping's proposal makes a lot of sense. This is a reversal of my previous opinion on this matter. (Take note -- those don't happen very often! 
:-) First to submit a working patch gets a free copy of 2.1a2 and subsequent releases, --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jan 24 05:50:49 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 23:50:49 -0500 Subject: [Python-Dev] getting rid of ucnhash In-Reply-To: <013901c08566$d2a8f360$e46940d5@hagrid> Message-ID: [/F] > It's probably just me, but the names of the two unicode > modules tend to irritate me: I don't care much about the names, but having two Unicode subprojects in the MS build seems overkill . > ls u*.pyd > ucnhash.pyd unicodedata.pyd > > (the former contains names, the latter data) Maybe that's the reason: the names don't get loaded at all unless you *use* one of the name APIs? Hard to say whether that's worth the bother; now that everything has been nicely compressed, it's sure not as compelling as it may have been earlier. > I've been meaning to rename the former, but I just realized > that it might be better to get rid of it completely, and move > its functionality into the unicodedata module. > > The result is a single 200k unicodedata module, which con- > tains the name database as well as two new functions: > > name(character [, default]) => map unicode > character to name. if the name doesn't exist, > return the default object, or raise ValueError. > > lookup(name) => unicode character > (or raise KeyError if it doesn't exist) > > Should I check it in now, change the names/semantics and check > it in, or post it to sourceforge? I have no opinion on what's best: you're working with it, you're the best judge of that. I only vote for checking in whatever you decide sooner rather than later; I'll fiddle the MS project files and readmes accordingly ASAP after that. 
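
For readers following along today: the two functions /F proposes correspond to `unicodedata.name()` and `unicodedata.lookup()` in current Python. The sketch below assumes a Python 3 interpreter (the thread itself is about 2.1), and the helper name `describe` is purely illustrative, not part of the proposal.

```python
import unicodedata


def describe(ch, default="<unnamed>"):
    """Return the Unicode name of ch, or default if it has none."""
    return unicodedata.name(ch, default)


# name() maps a character to its name ...
print(describe("A"))
# ... and returns the default object for characters without a name (e.g. NUL).
print(describe("\x00"))

# Without a default, an unnamed character raises ValueError.
try:
    unicodedata.name("\x00")
except ValueError:
    print("NUL has no name")

# lookup() goes the other way and raises KeyError for unknown names.
print(unicodedata.lookup("LATIN SMALL LETTER A"))
try:
    unicodedata.lookup("NO SUCH CHARACTER NAME")
except KeyError:
    print("unknown character name")
```

The asymmetry in exceptions (ValueError from `name`, KeyError from `lookup`) matches the semantics quoted above.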
From moshez at zadka.site.co.il Wed Jan 24 15:07:08 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 24 Jan 2001 16:07:08 +0200 (IST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Message-ID: <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001, Greg Ewing wrote: > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. Cool idea, but even cooler (would catch more idioms, that is) is "the first time someone stores something not 'is' something in the dict, allocate the values array". This would catch small numbers, None and identifier-looking strings, for the measly cost of one pointer/dict object. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From moshez at zadka.site.co.il Wed Jan 24 15:15:39 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 24 Jan 2001 16:15:39 +0200 (IST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>, <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> Message-ID: <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il> On Tue, 23 Jan 2001 22:46:47 -0500, Guido van Rossum wrote: [ESR] > You've never heard of BROWSER because I invented it and have not > widely popularized it yet :-). [Guido v. Rossum] > Grumble. That wasn't the kind of answer I expected. 
I don't like it > if Python is used as a wedge to get a particular thing introduced to > the rest of the world, no matter how useful it may seem at the time. Guido, I think you're being over-dramatic. BROWSER is right in the tradition of PAGER and EDITOR, and a lot of other programs need it. I know Eric uses RH and mutt, so probably RH's urlview program (which mutt uses to jump to URLs) uses BROWSER. I was just about to submit a bug report to Debian that their urlview doesn't respect it. And if you really don't want to be a horse in front of a cart... > Anyway, shouldn't you also talk to the developers of packages like KDE > and Gnome? Surely their users would like to be able to configure the > default webbrowser. Yes -- via GNOME/KDE specific mechanisms. I have 0 experience with KDE, but I'm guessing the GNOME guys would do it via the GNOME "registry". KDE probably has something similar. I'm sure you wouldn't want Python to depend on GNOME, though it would be nice to make the browser-choosing part pluggable so when "import gnome" is done, it automatically tries to choose the user's browser. On UNIX (as opposed to GNOME/KDE, which are pretty much operating systems themselves), these things are done via environment variable. And $BROWSER doesn't seem like that much of an innovation. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From skip at mojam.com Wed Jan 24 07:28:21 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 00:28:21 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: <200101240311.WAA06582@cj20424-a.reston1.va.home.com> References: <14957.52952.48739.53360@beluga.mojam.com> <200101240311.WAA06582@cj20424-a.reston1.va.home.com> Message-ID: <14958.30213.325584.373062@beluga.mojam.com> Guido> I think I saw a complaint about this that specifically said that Guido> when dbhash is imported when bsddb can't be imported, an Guido> incomplete dbhash is left behind in sys.modules, and then a Guido> second import of dbhash will succeed -- but of course it will Guido> define no objects. So it does: % ./python Python 2.1a1 (#2, Jan 23 2001, 23:30:41) [GCC 2.95.3 19991030 (prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import dbhash Traceback (most recent call last): File "", line 1, in ? File "/home/beluga/skip/src/python/dist/src/Lib/dbhash.py", line 3, in ? import bsddb ImportError: No module named bsddb >>> import dbhash >>> Can that be construed as a bug? If import fails, shouldn't the stub module that was inserted in sys.modules be removed? Skip From skip at mojam.com Wed Jan 24 07:31:08 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 00:31:08 -0600 (CST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <14958.30380.851599.764535@beluga.mojam.com> Guido> BROWSER seems unlikely to take the world by storm and I don't Guido> feel I need to be involved in the effort to get it accepted. 
Editors and web browsers are classes of tools which (one would hope) will always come in several varieties. Users have to have some way to specify what to launch. BROWSER seems analogous to the EDITOR environment variable which is commonly used in Unix environments for just that purpose. Skip From thomas at xs4all.net Wed Jan 24 08:03:09 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 24 Jan 2001 08:03:09 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101240420.XAA07153@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 11:20:56PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <200101240420.XAA07153@cj20424-a.reston1.va.home.com> Message-ID: <20010124080308.G27785@xs4all.nl> On Tue, Jan 23, 2001 at 11:20:56PM -0500, Guido van Rossum wrote: > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, Patch submitted. It only implements 'if key in dict', not 'for key in dict'. The latter is kind of hard until we have a separate iteration protocol. (PEP, anyone ?) Once we have it, we could consider 'for key, value in dict', which is now easily explained with 'dict.popitem()'. Does this mean I get a legally sound and thus empty legal statement with every Python release for the rest of your, its or my life, Guido, or will you just make me 'Free Python Release Receiver For Life' ? :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From pf at artcom-gmbh.de Wed Jan 24 08:31:30 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 24 Jan 2001 08:31:30 +0100 (MET) Subject: OT: contribution rewards (was Re: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> from Guido van Rossum at "Jan 23, 2001 11:20:56 pm" Message-ID: Hi, Guido van Rossum: [...] 
> Ping's proposal makes a lot of sense. This is a reversal of my > previous opinion on this matter. (Take note -- those don't happen > very often! :-) It gives a warm und fuzzy feeling to see that happen sometimes at all. ;-) > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, This repeated offer of free copies of Python becomes increasingly boring. For quite a while I myself have not contributed anything useful and I am nevertheless hoarding free copies of Python here. ;-) What about offering another immaterial reward to potential contributors instead? What about "fame points"? Anybody contributing something useful to Python receives a certain number of "fame points": These fame points will be added and placed in front of the name of the contributor into the ACKS file and the file will be sorted accordingly turning the ACKS file effectively into some kind of "Python contribution high score" ... ;-) Just kidding, Peter From tim.one at home.com Wed Jan 24 09:08:50 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 03:08:50 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123050807.A29115@glacier.fnational.com> Message-ID: [Neil Schemenauer] > I think this argues that if sets are added to the core they > should be implemented as an extension type with the speed of > dictionaries and the memory usage of lists. Basicly, we would > use the implementation of PyDict but drop the values. They'll be slower than dicts and take more memory than lists then. WRT memory, dicts cache the hash code with each entry for speed (so double the memory of a list even without the value field), and are never more than 2/3 full anyway. The dict implementation also gets low-level speed benefits out of using both the key and value fields to characterize the nature of a slot (the key field is NULL iff the slot is virgin; the value field is NULL iff the slot is available (virgin or dummy)). 
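
To make the virgin/dummy/active distinction concrete, here is a toy model of an open-addressing table. It is only a sketch under loud assumptions: it uses sentinel objects in a single key array rather than NULL key/value fields, it uses linear probing rather than the real dict's probe sequence, and the class name `ToySet` is made up for illustration. It does keep the two properties mentioned above: the table never gets more than 2/3 full, and a lookup may stop only at a virgin slot.

```python
class ToySet:
    """Open-addressing hash set with virgin/dummy/active slots (toy model)."""

    _VIRGIN = object()  # slot never used: a probe may stop here
    _DUMMY = object()   # slot whose key was deleted: a probe must continue

    def __init__(self, size=8):
        self._slots = [self._VIRGIN] * size
        self._used = 0  # number of non-virgin slots (active + dummy)

    def _probe(self, key):
        # Linear probing is a simplification of the real probe sequence.
        size = len(self._slots)
        start = hash(key) % size
        for step in range(size):
            yield (start + step) % size

    def __contains__(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is self._VIRGIN:
                return False
            if slot is not self._DUMMY and slot == key:
                return True
        return False

    def add(self, key):
        if key in self:
            return
        # Grow before the table would become more than 2/3 full.
        if (self._used + 1) * 3 > len(self._slots) * 2:
            self._resize()
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is self._VIRGIN:
                self._used += 1
                self._slots[i] = key
                return
            if slot is self._DUMMY:
                self._slots[i] = key  # reuse a dummy; _used is unchanged
                return

    def discard(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is self._VIRGIN:
                return
            if slot is not self._DUMMY and slot == key:
                self._slots[i] = self._DUMMY  # keep probe chains intact
                return

    def _resize(self):
        active = [s for s in self._slots
                  if s is not self._VIRGIN and s is not self._DUMMY]
        self._slots = [self._VIRGIN] * (len(self._slots) * 2)
        self._used = 0
        for key in active:
            self.add(key)
```

Deleting a key leaves a dummy behind so that later keys in the same probe chain stay reachable; that bookkeeping is the runtime cost the next paragraph talks about avoiding.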
Dummy slots can be avoided (and so also the need for runtime code to distinguish them from active slots) by using a hash table of pointers to linked lists -- or flex vectors, or linked lists of small vectors -- instead, and in most ways that leads to much simpler code (no more fiddling with dummies, no more probe-sequence hassles, no more boosting the size before the table is full). But without fine control over the internals of malloc, that takes even more memory in the end.

Interesting twist: "a dict" *is* "a set", but a set of (key, value) pairs further constrained so that no two elements have the same key. So any set implementation can be used as-is to implement a dict as a set of 2-tuples, customizing the hash and "is equal" functions to look at just the tuples' first elements. This was the view taken by SETL in 1969, although their "map" (dict) type was eventually optimized to get away from actually constructing 2-tuples. Indeed, SETL eventually grew an elaborate optional type declaration sublanguage, allowing the user to influence many details of its many internal set-storage schemes; e.g., from pg 399 of "Programming With Sets: An Introduction to SETL":

    For example, we can declare [I'm putting their keywords in UPPERCASE for, umm, clarity]

        successors: LOCAL MMAP(ELMT b) REMOTE SET(ELMT b);

    This declaration specifies that for each x in b the image set successors{x} is stored in the element block of x, and that this image set is always to be represented as a bit vector. Similarly, the declaration

        successors: LOCAL MMAP(ELMT b) SPARSE SET(ELMT b);

    specifies that for each x in b the image set successors{x} is to be stored as a hash table containing pointers to elements of b. Note that the attribute LOCAL cannot be used for image sets of multivalued maps. This follows from the remarks in section 10.4.3 on the awkwardness of making local objects into subparts of composite objects.

Clear? Snort.
Here are some citations lifted from the web for their experience in trying to make these kinds of decisions by magic:

@article{dewar:79,
    title="Programming by Refinement, as Exemplified by the {SETL} Representation Sublanguage",
    author="Robert B. K. Dewar and Arthur Grand and Ssu-Cheng Liu and Jacob T. Schwartz and Edmond Schonberg",
    journal=toplas,
    year=1979,
    month=jul,
    volume=1,
    number=1,
    pages="27--49"
}

@article{schonberg:81,
    title="An Automatic Technique for Selection of Data Structures in {SETL} Programs",
    author="Edmond Schonberg and Jacob T. Schwartz and Micha Sharir",
    journal=toplas,
    year=1981,
    month=apr,
    volume=3,
    number=2,
    pages="126--143"
}

@article{freudenberger:83,
    title="Experience with the {SETL} Optimizer",
    author="Stefan M. Freudenberger and Jacob T. Schwartz and Micha Sharir",
    pages="26--45",
    journal=toplas,
    year=1983,
    month=jan,
    volume=5,
    number=1
}

If someone wanted to take sets seriously today, a better approach would be to define a minimal "set interface" ("abstract base class" in C++ terms), then supply multiple implementations of that interface, letting the user choose directly which implementation strategy they want for each of their sets. And people are doing just that in the C++ and Java worlds; e.g.,

    http://developer.java.sun.com/developer/onlineTraining/collections/Collection.html#SetInterface

Curiously, the newer Java Collections Framework (covering multiple implementations of list, set, and dict interfaces) gave up on thread-safety by default, because it cost too much at runtime. Just another thing to argue about .
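
A rough idea of what "one minimal set interface, several implementations" could look like in Python is sketched below. Everything here is an assumption for illustration: the names `MinimalSet`, `HashSet` and `SortedListSet` are invented, the choice of methods is arbitrary, and the code uses the `abc` module from current Python 3, which did not exist when this was written.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left


class MinimalSet(ABC):
    """Hypothetical minimal set interface; the method list is an assumption."""

    @abstractmethod
    def add(self, item): ...

    @abstractmethod
    def discard(self, item): ...

    @abstractmethod
    def __contains__(self, item): ...

    @abstractmethod
    def __iter__(self): ...

    @abstractmethod
    def __len__(self): ...


class HashSet(MinimalSet):
    """Backed by a dict with throwaway values; items must be hashable."""

    def __init__(self, items=()):
        self._d = {}
        for item in items:
            self.add(item)

    def add(self, item):
        self._d[item] = 1

    def discard(self, item):
        self._d.pop(item, None)

    def __contains__(self, item):
        return item in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


class SortedListSet(MinimalSet):
    """Backed by a sorted list with binary search; items must be orderable."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def _find(self, item):
        # Return (insertion index, whether item is already present).
        i = bisect_left(self._items, item)
        return i, i < len(self._items) and self._items[i] == item

    def add(self, item):
        i, found = self._find(item)
        if not found:
            self._items.insert(i, item)

    def discard(self, item):
        i, found = self._find(item)
        if found:
            del self._items[i]

    def __contains__(self, item):
        return self._find(item)[1]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```

The caller picks the strategy per set, e.g. `HashSet(words)` for fast membership tests or `SortedListSet(ids)` when compact storage and ordered iteration matter more; code written against `MinimalSet` works with either.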
we're-not-exactly-pioneers-here-ly y'rs - tim From fredrik at effbot.org Wed Jan 24 09:29:30 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 09:29:30 +0100 Subject: [Python-Dev] getting rid of ucnhash References: <013901c08566$d2a8f360$e46940d5@hagrid> <200101240246.VAA06336@cj20424-a.reston1.va.home.com> Message-ID: <019801c085df$c7ee0540$e46940d5@hagrid> guido wrote: > > It's probably just me, but the names of the two unicode > > modules tend to irritate me: > > > > > ls u*.pyd > > ucnhash.pyd unicodedata.pyd > > To me, both of these are irrelevant details of the Unicode > implementation. :-) IOW, feel free to check it in. Done. Note that Include/ucnhash.h is still there; it declares the "ucnhash_CAPI" structure used to access names from the unicodeobject module. (and all name-related tests are still kept in test_ucn) I'll leave it to Tim to update the MSVC build files. Cheers /F From tim.one at home.com Wed Jan 24 09:28:34 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 03:28:34 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can you point me to docs explaining the meaning of the BROWSER > environment variable? I've never heard of it... The last new > environment variables I learned were PAGER and EDITOR, probably 15 > years ago when 4.1BSD was released... :-) I gotta say, politics aside, BROWSER is a screamingly natural answer to the question "what comes next in this sequence?": PAGER, EDITOR, ... Dear Lord, even *I* use a browser almost every week . explicit-is-better-than-implicit-ly y'rs - tim From esr at thyrsus.com Wed Jan 24 10:02:59 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Wed, 24 Jan 2001 04:02:59 -0500 Subject: OT: contribution rewards (was Re: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: ; from pf@artcom-gmbh.de on Wed, Jan 24, 2001 at 08:31:30AM +0100 References: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> Message-ID: <20010124040259.A28086@thyrsus.com> Peter Funk : > What about offering another immaterial reward to potential contributors > instead? What about "fame points"? Anybody contributing something > useful to Python receives a certain number of "fame points": These > fame points will be added and placed in front of the name of > the contributor into the ACKS file and the file will be sorted > accordingly turning the ACKS file effectively into some kind of > "Python contribution high score" ... ;-) > > Just kidding, Peter You may be joking, but as an observer of how gift cultures work I say this isn't a bad idea. -- Eric S. Raymond "One of the ordinary modes, by which tyrants accomplish their purposes without resistance, is, by disarming the people, and making it an offense to keep arms." -- Constitutional scholar and Supreme Court Justice Joseph Story, 1840 From esr at thyrsus.com Wed Jan 24 10:09:18 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:09:18 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101240258.VAA06479@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 09:58:59PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> <200101240258.VAA06479@cj20424-a.reston1.va.home.com> Message-ID: <20010124040918.B28086@thyrsus.com> Guido van Rossum : > > let's just say that "in" is the same thing as "has_key", > > and be done with it. > > You know, I've long resisted this, but I agree now -- this is the > right thing. 
I think we've just justified the time and energy that went into this discussion. -- Eric S. Raymond What is a magician but a practicing theorist? -- Obi-Wan Kenobi, 'Return of the Jedi' From esr at thyrsus.com Wed Jan 24 10:14:27 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:14:27 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 10:46:47PM -0500 References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010124041427.D28086@thyrsus.com> Guido van Rossum : > Grumble. That wasn't the kind of answer I expected. I don't like it > if Python is used as a wedge to get a particular thing introduced to > the rest of the world, no matter how useful it may seem at the time. Oh, stop! I'm not using Python as an argument for other people to adopt the BROWSER convention. The idea sells itself quite nicely by analogy to EDITOR and PAGER the second people hear it. > Anyway, shouldn't you also talk to the developers of packages like KDE > and Gnome? Surely their users would like to be able to configure the > default webbrowser. Talking just to the scripting language people > seems like you're thinking too small. There must be lots of C apps > with the desire to invoke a browser. Also Emacs, which has an > extensive list of browser-url-* functions (you might even learn a few > tricks from it about how to invoke various external browsers) but > AFAIK no default browser selection. All on my TO-DO list. -- Eric S. Raymond It is proper to take alarm at the first experiment on our liberties. We hold this prudent jealousy to be the first duty of citizens and one of the noblest characteristics of the late Revolution. 
The freemen of America did not wait till usurped power had strengthened itself by exercise and entangled the question in precedents. They saw all the consequences in the principle, and they avoided the consequences by denying the principle. We revere this lesson too much ... to forget it -- James Madison. From esr at thyrsus.com Wed Jan 24 10:16:12 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:16:12 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 03:28:34AM -0500 References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010124041612.E28086@thyrsus.com> Tim Peters : > I gotta say, politics aside, BROWSER is a screamingly natural answer to the > question "what comes next in this sequence?": > > PAGER, EDITOR, ... That's exactly what I thought when I was struck by the obvious. Everybody I spread this meme to seems to agree. -- Eric S. Raymond Government is actually the worst failure of civilized man. There has never been a really good one, and even those that are most tolerable are arbitrary, cruel, grasping and unintelligent. -- H. L. Mencken From esr at thyrsus.com Wed Jan 24 10:21:56 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:21:56 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Wed, Jan 24, 2001 at 04:15:39PM +0200 References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>, <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il> Message-ID: <20010124042156.F28086@thyrsus.com> Moshe Zadka : > I know Eric uses RH and mutt, so probably RH's urlview program (which > mutt uses to jump to URLs) uses BROWSER. 
I was just about to submit > a bug report to Debian that their urlview doesn't respect it. Oh, *do* that! Note: BROWSER may consist of a colon-separated series of parts, browser commands to be tried in order (this is useful so you can put an X browser first, then a console browser, and have the right thing happen). If a part contains %s, the URL is substituted there; otherwise, the URL is concatenated to the command after a space. -- Eric S. Raymond Gun Control: The theory that a woman found dead in an alley, raped and strangled with her panty hose, is somehow morally superior to a woman explaining to police how her attacker got that fatal bullet wound. -- L. Neil Smith From tim.one at home.com Wed Jan 24 10:24:26 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 04:24:26 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101240354.WAA06903@cj20424-a.reston1.va.home.com> Message-ID: [Greg Ewing] > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. [Guido] > Not a bad idea! In theory, but if Vladimir were here he'd bust a gut over the possibly bad cache effects on "real dicts" (by keeping everything together, simply accessing the cached hash code brings both the key and value pointers into L1 cache too). We would need to quantify the effect of breaking that connection. > (But shouldn't the default value be something else, > like none?) Bleech. I hate the idiom of using a false value to mean "present". d = {} for x in seq: d[x] = 1 runs faster too (None needs a LOAD_GLOBAL now). 
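
The `d[x] = 1` idiom combined with the proposed `key in dict` spelling can be sketched as follows. This assumes a current Python where `in` on a dict tests keys (the behaviour being agreed on in this thread); the function name `unique_in_order` is only illustrative.

```python
def unique_in_order(seq):
    """Return the distinct items of seq, keeping first-seen order."""
    seen = {}              # dict used as a set: only the keys matter
    result = []
    for x in seq:
        if x not in seen:  # key membership, i.e. what has_key() tested
            seen[x] = 1    # a true value marks "present"
            result.append(x)
    return result


print(unique_in_order("abracadabra"))
```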
From tim.one at home.com Wed Jan 24 11:01:36 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 05:01:36 -0500 Subject: [Python-Dev] test___all__ failing; Windows Message-ID: > python ../lib/test/regrtest.py test___all__ test___all__ test test___all__ crashed -- exceptions.AttributeError: 'locale' module has no attribute 'LC_MESSAGES' And indeed it does not: > python Python 2.1a1 (#9, Jan 24 2001, 04:40:55) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import locale >>> dir(locale) ['CHAR_MAX', 'Error', 'LC_ALL', 'LC_COLLATE', 'LC_CTYPE', 'LC_MONETARY', 'LC_NUMERIC', 'LC_TIME', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '_build_localename', '_group', '_parse_localename', '_print_locale', '_setlocale', '_test', 'atof', 'atoi', 'encoding_alias', 'format', 'getdefaultlocale', 'getlocale', 'locale_alias', 'localeconv', 'normalize', 'resetlocale', 'setlocale', 'str', 'strcoll', 'string', 'strxfrm', 'sys', 'windows_locale'] >>> Nor is LC_MESSAGES std C (the other LC_XXX guys are). I pin the blame on from _locale import * in locale.py -- who knows what that's supposed to export? Certainly not Skip . From tim.one at home.com Wed Jan 24 11:17:47 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 05:17:47 -0500 Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: Message-ID: Nevermind; checked in a hack to stop the error on Windows. From mal at lemburg.com Wed Jan 24 14:00:28 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 14:00:28 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> Message-ID: <3A6ED1EC.237B5B1D@lemburg.com> Fredrik Lundh wrote: > > > It's come up before. 
The problem with it is that it's not quite obvious > > whether it is 'if key in dict' or 'if value in dict'. > > you forgot "if (key, value) in dict" > > on the other hand, it's not quite obvious that "list.sort" > doesn't return the sorted list, "print >>None" prints to > standard output, "except KeyError, ValueError" doesn't > catch a ValueError exception, etc, etc, etc. > > (nor that it's "has_key" and "hasattr", and not "has_key" > and "has_attr" or "haskey" and "hasattr" ;-) > > let's just say that "in" is the same thing as "has_key", > and be done with it. +1 all the way :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 24 15:01:33 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 15:01:33 +0100 Subject: [Python-Dev] Interfaces (Is X a (sequence|mapping)?) References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> <3A6D4B9F.38B17046@lemburg.com> <200101231531.KAA05122@cj20424-a.reston1.va.home.com> Message-ID: <3A6EE03D.4D5DFD17@lemburg.com> Guido van Rossum wrote: > > > Polymorphic code will usually get you more out of an > > algorithm, than type-safe or interface-safe code. > > Right. > > But there are times when people want to write methods that take > e.g. either a sequence or a mapping, and need to distinguish between > the two. That's not easy in Python! Java and C++ support it very > well though, and thus we'll always keep seeing this kind of > complaint. Not sure what to do, except to recommend "find out which > methods you expect in one case but not in the other (e.g. keys()) and > do a hasattr() test for that." Perhaps we should provide simple means for testing a set of available methods and slots ?! E.g. 
hasinterface(obj, ('keys', 'items', '__len__')) Objects could provide an __interface__ special attribute for this purpose (since not all slots can be auto-detected and -verified without side-effects). > > BTW, there are Python interfaces to PySequence_Check() and > > PyMapping_Check() burried in the builtin operator module in case > > you really do care ;) ... > > > > operator.isSequenceType() > > operator.isMappingType() > > + some other C style _Check() APIs > > > > These only look at the type slots though, so Python instances > > will appear to support everything but when used fail with > > an exception if they don't provide the proper __xxx__ hooks. > > Yes, these should probably be deprecated. I certainly have never used > them! (The operator module doesn't seem to get much use in > general... Was it a bad idea?) Some of these are nice to have and provide some good performance boost (e.g. the numeric slot access APIs). The type slot checking APIs are not too useful though. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at digicool.com Wed Jan 24 10:05:44 2001 From: jim at digicool.com (Jim Fulton) Date: Wed, 24 Jan 2001 04:05:44 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> <20010123150616.F18796@trump.amber.org> Message-ID: <3A6E9AE8.6C2D3CF0@digicool.com> Christopher Petrilli wrote: > > Neil Schemenauer [nas at arctrix.com] wrote: > > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > > Unfortunately, for me, a Python implementation of Sets is only > > > interesting academicaly. 
Any time I've needed to work with them at a
> > > large scale, I've needed them *much* faster than Python could achieve
> > > without a C extension.
> >
> > I think this argues that if sets are added to the core they
> > should be implemented as an extension type with the speed of
> > dictionaries and the memory usage of lists. Basicly, we would
> > use the implementation of PyDict but drop the values.
>
> This is effectively the implementation that Zope has for Sets.

Except we use sorted collections with binary search for sets. I think that a simple hash-based set would make a lot of sense.

> In
> addition we have "buckets" that have scores on them (which are
> implemented as a modified BTree).
>
> Unfortunately Jim Fulton (who wrote all the code for that level) is in
> a meeting, but I hope he'll comment on the implementation that was
> chosen for our software.

We have a number of special needs:

- Scalability is critical. We make some special optimizations, like sets of integers and mapping objects with integer keys and values. In these cases, data are stored using C int arrays, allowing very efficient data storage and manipulation, especially when using integer keys.

- We need to spread data over multiple database records. Our data structures may be hundreds of megabytes in size. We have ZODB-aware structures that use multiple independently stored database objects.

- Range searches are very common, and under some circumstances, sorted collections and BTrees can have very little overhead compared to dictionaries. For this reason, our mapping objects and sets have been based on BTrees and sorted collections.

Unfortunately, our current BTree implementation has a flaw that causes an excessive number of objects to be updated when items are added and removed. (Each BTree internal node keeps track of the number of objects contained in it.) Also, our current sets are limited to integers and cannot be spread over multiple database records.
We are completing a new BTree implementation that overcomes these limitations. IN this implementation, we will provide sets as value-less BTrees. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org From gvwilson at nevex.com Wed Jan 24 15:10:41 2001 From: gvwilson at nevex.com (Greg Wilson) Date: Wed, 24 Jan 2001 09:10:41 -0500 Subject: [Python-Dev] re: sets In-Reply-To: <20010124032401.EB329F199@mail.python.org> Message-ID: <000301c0860f$6fa29010$770a0a0a@nevex.com> 1. I did a poll overnight by email of 22 friends and colleagues, none of whom are regular Python users (yet). My question was, "Would you expect the interface of a set class to be like the interface of a vector or list, or like the interface of a map or hash?" 15 people have replied; all 15 have said, "map or hash". Several respondents are Perl hackers, so I'm sure the answer is influenced by previous exposure to the set-as-valueless-hash idiom. Still, I think 15-0 is a pretty convincing score... Four, unprompted, said that they thought the STL's hierarchy of containers was as good as it gets, and that other languages should mirror it. (One of those added that this makes teaching much simpler --- students can transfer instincts from one language to another.) 2. Is there enough interest in sets for a BOF at IPC9? Please reply to me point-to-point if you're interested; I'll summarize and post the result. I volunteer to bring the donuts... > > Ka-Ping Yee: > > The only change that needs to be made to support sets of immutable > > elements is to provide "in" on dictionaries. The rest is then all > > quite natural: > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > > various: > > [but what about 'value in dict' or '(key, value) in dict'?] > Fredrik Lundh: > let's just say that "in" is the same thing as "has_key", > and be done with it. 
> Guido van Rossum: > You know, I've long resisted this, but I agree now -- this is the > right thing. Greg Wilson: Woo hoo! Now, on a related note, what is the status of the 'indices()' proposal, as in: for i in indices(someList): instead of: for i in range(len(someList)): Would 'indices(dict)' be the same as 'dict.keys()', to allow uniform iteration? Or would it be more economical to introduce a 'keys()' method on lists and tuples, so that: for i in collection.keys(): would work on dicts, lists, and tuples? I know that 'keys()' is the wrong name for lists and tuples, but dicts are already using it, and it's completely unambiguous... Thanks, Greg From mal at lemburg.com Wed Jan 24 15:46:10 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 15:46:10 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> <20010123150616.F18796@trump.amber.org> <3A6E9AE8.6C2D3CF0@digicool.com> Message-ID: <3A6EEAB2.5E6A4E83@lemburg.com> Jim Fulton wrote: > > Christopher Petrilli wrote: > > > > Neil Schemenauer [nas at arctrix.com] wrote: > > > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > > > Unfortunately, for me, a Python implementation of Sets is only > > > > interesting academicaly. Any time I've needed to work with them at a > > > > large scale, I've needed them *much* faster than Python could achieve > > > > without a C extension. > > > > > > I think this argues that if sets are added to the core they > > > should be implemented as an extension type with the speed of > > > dictionaries and the memory usage of lists. Basicly, we would > > > use the implementation of PyDict but drop the values. > > > > This is effectively the implementation that Zope has for Sets. > > Except we use sorted collections with binary search for sets. 
> > I think that a simple hash-based set would make alot of sense. > > > In > > addition we have "buckets" that have scores on them (which are > > implemented as a modified BTree). > > > > Unfortunately Jim Fulton (who wrote all the code for that level) is in > > a meeting, but I hope he'll comment on the implementation that was > > chosen for our software. > > We have a number of special needs: > > - Scalability is critical. We make some special opimizations, > like sets of integers and mapping objects with integer keys > and values. In these cases, data are stored using C int arrays, > allowing very efficient data storage and manipulation, especially > when using integer keys. > > - We need to spread data over multiple database records. Our data > structures may be hundreds of megabytes in size. We have ZODB-aware > structures that use multiple independently stored database objects. > > - Range searches are very common, and under some circomstances, > sorted collections and BTrees can have very little overhead > compared to dictionaries. For this reason, out mapping objects > and sets have been based on BTrees and sorted collections. > > Unfortunately, our current BTree implementation has a flaw that > causes excessive number of objects to be updated when items are > added and removed. (Each BTree internal node keeps track of the number > of objects contained in it.) Also, out current sets are limited > to integers and cannot be spread over multiple database records. > > We are completing a new BTree implementation that overcomes these > limitations. IN this implementation, we will provide sets as > value-less BTrees. You may want to check out a soon to be released new mx package: mxBeeBase. This is an on-disk b+tree implementation which supports data files up to 2GB on 32-bit platforms. Here's a preview: http://www.lemburg.com/python/mxBeeBase.html (The links on that page are not functional.) 
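The sorted-collection design Jim describes in the quoted text (a set kept in sorted order, membership by binary search, cheap range searches) can be sketched in pure Python. This is only an illustration under stated assumptions: the class name SortedSet and its methods are made up here and are not the Zope or mxBeeBase API, the items are assumed to be mutually comparable, and everything lives in one in-memory list rather than in multiple database records.

```python
import bisect


class SortedSet:
    """Illustrative set stored as a sorted list (hypothetical, in-memory only)."""

    def __init__(self, items=()):
        # Deduplicate, then keep the items in sorted order.
        self._items = sorted(set(items))

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        # Binary search for the leftmost position where item could sit.
        i = bisect.bisect_left(self._items, item)
        return i < len(self._items) and self._items[i] == item

    def add(self, item):
        # Insert at the sorted position unless the item is already present.
        i = bisect.bisect_left(self._items, item)
        if i == len(self._items) or self._items[i] != item:
            self._items.insert(i, item)

    def range(self, low, high):
        """Return the items x with low <= x < high, in sorted order."""
        lo = bisect.bisect_left(self._items, low)
        hi = bisect.bisect_left(self._items, high)
        return self._items[lo:hi]
```

The range search is the part a hash-based set cannot offer directly: two binary searches find the slice boundaries, and the result is already ordered. Insertion into a Python list is linear in the worst case, which is one reason a tree structure is attractive at larger sizes.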
--
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From skip at mojam.com Wed Jan 24 15:42:23 2001
From: skip at mojam.com (Skip Montanaro)
Date: Wed, 24 Jan 2001 08:42:23 -0600 (CST)
Subject: [Python-Dev] test___all__ failing; Windows
In-Reply-To:
References:
Message-ID: <14958.59855.4855.52638@beluga.mojam.com>

    Tim> Nor is LC_MESSAGES std C (the other LC_XXX guys are).

    Tim> I pin the blame on

    Tim>     from _locale import *

    Tim> in locale.py -- who knows what that's supposed to export?
    Tim> Certainly not Skip .

Was that a roundabout way of complimenting me for having found a bug? ;-)

Skip

From skip at mojam.com Wed Jan 24 15:50:02 2001
From: skip at mojam.com (Skip Montanaro)
Date: Wed, 24 Jan 2001 08:50:02 -0600 (CST)
Subject: [Python-Dev] test___all__ failing; Windows
In-Reply-To:
References:
Message-ID: <14958.60314.482226.825611@beluga.mojam.com>

    Tim> Nevermind; checked in a hack to stop the error on Windows.

Probably should file a bug report (if you haven't already) so the root
problem isn't forgotten because the hack obscures it. I see this code in
localemodule.c:

    #ifdef LC_MESSAGES
        x = PyInt_FromLong(LC_MESSAGES);
        PyDict_SetItemString(d, "LC_MESSAGES", x);
        Py_XDECREF(x);
    #endif /* LC_MESSAGES */

Martin, looks like this module is your baby. Care to hazard a guess about
whether LC_MESSAGES should always or never be there?

Skip

From fredrik at effbot.org Wed Jan 24 16:11:33 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 24 Jan 2001 16:11:33 +0100
Subject: [Python-Dev] test___all__ failing; Windows
References: <14958.60314.482226.825611@beluga.mojam.com>
Message-ID: <04de01c08617$f56216f0$e46940d5@hagrid>

Skip wrote:
> Probably should file a bug report (if you haven't already) so the root
> problem isn't forgotten because the hack obscures it.
I see this code in
> localemodule.c:
>
>     #ifdef LC_MESSAGES
>         x = PyInt_FromLong(LC_MESSAGES);
>         PyDict_SetItemString(d, "LC_MESSAGES", x);
>         Py_XDECREF(x);
>     #endif /* LC_MESSAGES */
>
> Martin, looks like this module is your baby.  Care to hazard a guess about
> whether LC_MESSAGES should always or never be there?

I think the correct answer is "sometimes":

    ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE,
    LC_MONETARY, LC_NUMERIC, and LC_TIME

    Unix mandates LC_ALL, LC_COLLATE, LC_CTYPE,
    LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and
    LC_TIME

in other words, if it's supported, it should be exposed by
the Python bindings.

Cheers /F

From tismer at tismer.com Wed Jan 24 15:40:04 2001
From: tismer at tismer.com (Christian Tismer)
Date: Wed, 24 Jan 2001 16:40:04 +0200
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz>
Message-ID: <3A6EE944.C8CC6EF7@tismer.com>

Greg Ewing wrote:
>
> Neil Schemenauer :
>
> > Basically, we would
> > use the implementation of PyDict but drop the values.
>
> This could be incorporated into PyDict. Instead of storing keys and
> values in the same array, keep them in separate arrays and only
> allocate the values array the first time someone stores a value other
> than 1.

Very good idea. It also fits my view of how dicts should be
implemented: keep keys and values apart, since this information
has different access patterns.
I think (or at least hope) that dictionaries would become faster
when hashes, keys and values are in separate areas, giving more
cache hits. Not sure if hashes and keys should be apart, but
sure for values.

ciao - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?
http://www.stackless.com

From guido at digicool.com Wed Jan 24 16:37:03 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 24 Jan 2001 10:37:03 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4
In-Reply-To: Your message of "Wed, 24 Jan 2001 00:28:21 CST." <14958.30213.325584.373062@beluga.mojam.com>
References: <14957.52952.48739.53360@beluga.mojam.com> <200101240311.WAA06582@cj20424-a.reston1.va.home.com> <14958.30213.325584.373062@beluga.mojam.com>
Message-ID: <200101241537.KAA27039@cj20424-a.reston1.va.home.com>

>     Guido> I think I saw a complaint about this that specifically said that
>     Guido> when dbhash is imported when bsddb can't be imported, an
>     Guido> incomplete dbhash is left behind in sys.modules, and then a
>     Guido> second import of dbhash will succeed -- but of course it will
>     Guido> define no objects.
>
> So it does:
>
>     % ./python
>     Python 2.1a1 (#2, Jan 23 2001, 23:30:41)
>     [GCC 2.95.3 19991030 (prerelease)] on linux2
>     Type "copyright", "credits" or "license" for more information.
>     >>> import dbhash
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>       File "/home/beluga/skip/src/python/dist/src/Lib/dbhash.py", line 3, in ?
>         import bsddb
>     ImportError: No module named bsddb
>     >>> import dbhash
>     >>>
>
> Can that be construed as a bug?  If import fails, shouldn't the stub module
> that was inserted in sys.modules be removed?

Yep, but not a very important bug -- typically this isn't caught.

Feel free to check in a change; I think you should be able to insert
something like

    import sys
    try:
        import bsddb
    except ImportError:
        del sys.modules[__name__]
        raise

into dbhash. If this works for you in testing, forget the patch
manager, just check it in. (I'm too busy to do much myself, the
company needs me.
:-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pf at artcom-gmbh.de Wed Jan 24 16:32:55 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 24 Jan 2001 16:32:55 +0100 (MET)
Subject: LC_MESSAGES (was Re: [Python-Dev] test___all__ failing; Windows)
In-Reply-To: <14958.60314.482226.825611@beluga.mojam.com> from Skip Montanaro at "Jan 24, 2001 8:50: 2 am"
Message-ID:

Hi,

Skip Montanaro:
>
>     Tim> Nevermind; checked in a hack to stop the error on Windows.
>
> Probably should file a bug report (if you haven't already) so the root
> problem isn't forgotten because the hack obscures it.  I see this code in
> localemodule.c:
>
>     #ifdef LC_MESSAGES
>         x = PyInt_FromLong(LC_MESSAGES);
>         PyDict_SetItemString(d, "LC_MESSAGES", x);
>         Py_XDECREF(x);
>     #endif /* LC_MESSAGES */
>
> Martin, looks like this module is your baby.  Care to hazard a guess about
> whether LC_MESSAGES should always or never be there?

As far as I could find out, LC_MESSAGES was added to the POSIX
"standard" in POSIX.2. Systems that are not POSIX.2 compatible probably
lack the proper functionality behind 'setlocale()'. So the best solution
would be to add a clever emulation/approximation of this feature if the
underlying platform (here Windows) doesn't provide it. This would
require wrapping 'setlocale()'.

But I'm not sure how to emulate, for example,
'setlocale(LC_MESSAGES, "DE_de")' on a Windows box. Maybe it is
impossible to achieve.

What I would love to see is that the typical query
'setlocale(LC_MESSAGES)' would return 'DE_de' on a box running, for
example, the German version of Windows or MacOS. This would eliminate
the need for ugly language selection menus on these platforms in a
portable fashion.

Regards, Peter

From guido at digicool.com Wed Jan 24 16:41:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 24 Jan 2001 10:41:07 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: Your message of "Wed, 24 Jan 2001 16:07:08 +0200."
<20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> Message-ID: <200101241541.KAA27082@cj20424-a.reston1.va.home.com> > > This could be incorporated into PyDict. Instead of storing keys and > > values in the same array, keep them in separate arrays and only > > allocate the values array the first time someone stores a value other > > than 1. > > Cool idea, but even cooler (would catch more idioms, that is) is > "the first time someone stores something not 'is' something in the ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > dict, allocate the values array". This would catch small numbers, > None and identifier-looking strings, for the measly cost of one > pointer/dict object. Sorry, but I don't understand what you mean by the ^^^ marked phrase. Can you please elaborate? Regarding storing one for "present", that's all well and fine, but it suggests to me that storing a false value could mean "not present". Do we really want that? --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Thu Jan 25 01:50:13 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 02:50:13 +0200 (IST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101241541.KAA27082@cj20424-a.reston1.va.home.com> References: <200101241541.KAA27082@cj20424-a.reston1.va.home.com>, <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> Message-ID: <20010125005013.58C12A840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 10:41:07 -0500, Guido van Rossum wrote: > > Cool idea, but even cooler (would catch more idioms, that is) is > > "the first time someone stores something not 'is' something in the > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > dict, allocate the values array". 
This would catch small numbers,
> > None and identifier-looking strings, for the measly cost of one
> > pointer/dict object.
>
> Sorry, but I don't understand what you mean by the ^^^ marked phrase.
> Can you please elaborate?

I should really stop writing incomprehensible bits like that.
Heck, I can't even understand it on second reading.

I meant that the dictionary would keep a slot for "the one and only
value". First time someone puts a value in the dict, it puts it
in the "one and only value" slot, and doesn't initialize the value
array. The second time someone puts a value, it checks for pointer
equality with that "one and only value". If it is the same, it
still doesn't initialize the value array. The only time when
the dictionary initializes the value array is when two pointer-different
values are put in.

This would let me code

a[key] = None

For my sets (but consistent in the same set!)

a[key] = 1

When the timbot codes (again, consistent in the same set)

and

a[key] = 'present'

If you're really weird.

(identifier-like strings get interned)

That's not *semantics*, that's *optimization* for a commonly
used (I think) idiom with dictionaries -- you can't predict
the value, but it will probably remain the same.

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From skip at mojam.com Wed Jan 24 17:44:17 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 10:44:17 -0600 (CST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <04de01c08617$f56216f0$e46940d5@hagrid> References: <14958.60314.482226.825611@beluga.mojam.com> <04de01c08617$f56216f0$e46940d5@hagrid> Message-ID: <14959.1633.163407.779930@beluga.mojam.com> Fredrik> I think the correct answer is "sometimes": Fredrik> ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, Fredrik> LC_MONETARY, LC_NUMERIC, and LC_TIME Fredrik> Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, Fredrik> LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and Fredrik> LC_TIME Fredrik> in other words, if it's supported, it should be exposed by Fredrik> the Python bindings. Then this suggests that either Tim's hack is the correct fix (leave it out because we can't rely on it always being there) or I should add it to __all__ at the bottom of the file if and only if it's present in the module's namespace. Skip From moshez at zadka.site.co.il Thu Jan 25 01:57:22 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 02:57:22 +0200 (IST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <04de01c08617$f56216f0$e46940d5@hagrid> References: <04de01c08617$f56216f0$e46940d5@hagrid>, <14958.60314.482226.825611@beluga.mojam.com> Message-ID: <20010125005722.D2229A840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 16:11:33 +0100, "Fredrik Lundh" wrote: > I think the correct answer is "sometimes": > > ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, > LC_MONETARY, LC_NUMERIC, and LC_TIME > > Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, > LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and > LC_TIME > > in other words, if it's supported, it should be exposed by > the Python bindings. In that case, the __all__ attribute in the module has to be calculated dynamically. 
Say, adding code like

    try:
        LC_MESSAGES
    except NameError:
        pass
    else:
        __all__.append('LC_MESSAGES')

Ditto for anything else.

Should I check in a patch?

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From trentm at ActiveState.com Wed Jan 24 17:49:17 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Wed, 24 Jan 2001 08:49:17 -0800
Subject: [Python-Dev] webbrowser.py
In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 03:28:34AM -0500
References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>
Message-ID: <20010124084917.C29977@ActiveState.com>

How will the expected adherence of apps to BROWSER jive with the current (and
poorly understood by me) Windows convention of specifying the "default"
browser somewhere in the registry?

Trent

--
Trent Mick
TrentM at ActiveState.com

From skip at mojam.com Wed Jan 24 17:49:23 2001
From: skip at mojam.com (Skip Montanaro)
Date: Wed, 24 Jan 2001 10:49:23 -0600 (CST)
Subject: [Python-Dev] test___all__ failing; Windows
In-Reply-To: <20010125005722.D2229A840@darjeeling.zadka.site.co.il>
References: <04de01c08617$f56216f0$e46940d5@hagrid> <14958.60314.482226.825611@beluga.mojam.com> <20010125005722.D2229A840@darjeeling.zadka.site.co.il>
Message-ID: <14959.1939.398029.896891@beluga.mojam.com>

    Moshe> In that case, the __all__ attribute in the module has to be
    Moshe> calculated dynamically. Say, adding code like

No need. I've already got this exact change in my local copy and I'll be
adding a few more __all__ lists later today.

Skip

From paulp at ActiveState.com Wed Jan 24 17:56:26 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 24 Jan 2001 08:56:26 -0800
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> <200101241541.KAA27082@cj20424-a.reston1.va.home.com>
Message-ID: <3A6F093A.A311C71E@ActiveState.com>

Guido van Rossum wrote:
>
>...
>
> > Cool idea, but even cooler (would catch more idioms, that is) is
> > "the first time someone stores something not 'is' something in the
>
> Sorry, but I don't understand what you mean by the ^^^ marked phrase.
> Can you please elaborate?

I wasn't clear about that either. The idea is:

    def add(new_value):
        if not values_array:
            if self.magic_value is NULL:
                self.magic_value = new_value
            elif new_value is not self.magic_value:
                self.values_array=[self.magic_value, new_value, ... ]
            else:
                # new_value is self.magic_value: do nothing

I am neutral on this proposal myself. I think that even if we optimize
any code where you pass the same thing over and over again, we should
document a convention for consistency. So I'm not sure there is much
advantage.

Paul Prescod

From esr at thyrsus.com Wed Jan 24 17:53:31 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 24 Jan 2001 11:53:31 -0500
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010124084917.C29977@ActiveState.com>; from trentm@ActiveState.com on Wed, Jan 24, 2001 at 08:49:17AM -0800
References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com>
Message-ID: <20010124115331.A15059@thyrsus.com>

Trent Mick :
> How will the expected adherence of apps to BROWSER jive with the current (and
> poorly understood by me) Windows convention of specifying the "default"
> browser somewhere in the registry?

BROWSER overrides the registry setting. Which is OK; under Windows,
only wizards are going to muck with it.

--
Eric S. Raymond

Ideology, politics and journalism, which luxuriate in failure, are
impotent in the face of hope and joy.
	-- P. J.
O'Rourke From guido at digicool.com Wed Jan 24 17:59:00 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 24 Jan 2001 11:59:00 -0500 Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: Your message of "Wed, 24 Jan 2001 10:44:17 CST." <14959.1633.163407.779930@beluga.mojam.com> References: <14958.60314.482226.825611@beluga.mojam.com> <04de01c08617$f56216f0$e46940d5@hagrid> <14959.1633.163407.779930@beluga.mojam.com> Message-ID: <200101241659.LAA27650@cj20424-a.reston1.va.home.com> > Fredrik> I think the correct answer is "sometimes": > > Fredrik> ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, > Fredrik> LC_MONETARY, LC_NUMERIC, and LC_TIME > > Fredrik> Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, > Fredrik> LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and > Fredrik> LC_TIME > > Fredrik> in other words, if it's supported, it should be exposed by > Fredrik> the Python bindings. > > Then this suggests that either Tim's hack is the correct fix (leave it out > because we can't rely on it always being there) or I should add it to > __all__ at the bottom of the file if and only if it's present in the > module's namespace. The latter. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Thu Jan 25 18:05:44 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 19:05:44 +0200 (IST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124084917.C29977@ActiveState.com> References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 08:49:17 -0800, Trent Mick wrote: > How will the expected adherence of apps to BROWSER jive with the current (and > poorly understood by me) Windows convention of specifying the "default" > browser somewhere in the registry? The "webbrowser" module should prefer to take the setting from the registry on windows. 
-- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From guido at digicool.com Wed Jan 24 18:17:09 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 24 Jan 2001 12:17:09 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Thu, 25 Jan 2001 02:50:13 +0200." <20010125005013.58C12A840@darjeeling.zadka.site.co.il> References: <200101241541.KAA27082@cj20424-a.reston1.va.home.com>, <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> <20010125005013.58C12A840@darjeeling.zadka.site.co.il> Message-ID: <200101241717.MAA27852@cj20424-a.reston1.va.home.com> > I meant that the dictionary would keep a slot for "the one and only > value". First time someone puts a value in the dict, it puts it > in the "one and only value" slot, and doesn't initalize the value > array. The second time someone puts a value, it checks for pointer > equality with that "one and only value". If it is the same, it > it still doesn't initalize the value array. The only time when > the dictionary initalizes the value array is when two pointer-different > values are put in. > > This would let me code > > a[key] = None > > For my sets (but consistent in the same set!) > > a[key] = 1 > > When the timbot codes (again, consistent in the same set) > > and > > a[key] = 'present' > > If you're really weird. > > (identifier-like strings get interned) > > That's not *semantics*, that's *optimization* for a commonly > used (I think) idiom with dictionaries -- you can't predict > the value, but it will probably remain the same. This I like! But note that a dict currently uses 12 bytes per slot in the hash table (on a 32-bit platform: long me_hash; PyObject *me_key, *me_value). The hash table's fill factor is typically between 50 and 67%. 
I think removing the hashes would slow down lookups too much, so optimizing identical values out would only save 6-8 bytes per existing key on average. Not clear if it's worth enough. I think I have to agree with Tim's expectation that two (or three) separate parallel arrays will reduce the cache locality and thus slow things down. Once you start probing, you jump through the hashtable at large random strides, causing bad cache performance (for largeish hash tables); but since often enough the first slot tried is right, you have the hash, key and value right next together, typically on the same cache line. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Wed Jan 24 18:31:55 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 12:31:55 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 25, 2001 at 07:05:44PM +0200 References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> Message-ID: <20010124123155.A15203@thyrsus.com> Moshe Zadka : > > How will the expected adherence of apps to BROWSER jive with the > > current (and poorly understood by me) Windows convention of > > specifying the "default" browser somewhere in the registry? > > The "webbrowser" module should prefer to take the setting from the > registry on windows. Um, that's not the way it works right now. The windows-default browser choice launches the registered default browser, but BROWSER may have something else in its search list first. -- Eric S. Raymond The real point of audits is to instill fear, not to extract revenue; the IRS aims at winning through intimidation and (thereby) getting maximum voluntary compliance -- Paul Strassel, former IRS Headquarters Agent Wall St. 
Journal 1980 From esr at thyrsus.com Wed Jan 24 18:52:11 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 12:52:11 -0500 Subject: [Python-Dev] BROWSER status Message-ID: <20010124125211.A15276@thyrsus.com> I spent the morning writing and testing patches to make urlview and GNU Emacs BROWSER-aware, and have sent them off to the relevant maintainers. I've also sent a patch to Andries Brouwer for the environ(5) man page. Those of you interested in my latest bit of social engineering can take a look at http://www.tuxedo.org/~esr/BROWSER/ A bow in Guido's direction -- if he hadn't been grouchy about this I probably wouldn't have gotten to shipping those patches for a while. -- Eric S. Raymond A right is not what someone gives you; it's what no one can take from you. -- Ramsey Clark From thomas at xs4all.net Wed Jan 24 19:33:27 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 24 Jan 2001 19:33:27 +0100 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 25, 2001 at 07:05:44PM +0200 References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> Message-ID: <20010124193326.B962@xs4all.nl> On Thu, Jan 25, 2001 at 07:05:44PM +0200, Moshe Zadka wrote: > On Wed, 24 Jan 2001 08:49:17 -0800, Trent Mick wrote: > > > How will the expected adherence of apps to BROWSER jive with the current (and > > poorly understood by me) Windows convention of specifying the "default" > > browser somewhere in the registry? > The "webbrowser" module should prefer to take the setting from the > registry on windows. Why ? That's a lot harder to change, and not settable per 'shell'/'thread'/'process'. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From tim.one at home.com Wed Jan 24 20:54:47 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 14:54:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124115331.A15059@thyrsus.com> Message-ID: Guys, while I like BROWSER, don't think it has anything to do with Windows! Windows is not Unix; doesn't have PAGER or EDITOR either; and, in general, use of envars is an abomination under Windows. The old webbrowser.py uses the Windows-specific os.startfile(url) because that's the *right* way to do it on Windows, wizard or not. And you would have to be a Windows wizard to succeed in launching a browser under Windows in any other way anyway. You may as well try to sell the notion that, on Unix, Python should maintain a dict mapping file extensions to the user's preferred ways of opening such files <0.9 wink>. From tim.one at home.com Wed Jan 24 20:56:32 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 14:56:32 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124193326.B962@xs4all.nl> Message-ID: >> The "webbrowser" module should prefer to take the setting from the >> registry on windows. > Why ? That's a lot harder to change, and not settable per > 'shell'/'thread'/'process'. A Windows user has a legitimate expectation that *every* time an .html file is opened, it will come up in their browser of choice. That choice is made via the registry, and this is how *all* apps work under Windows. Ditto for .htm files (and that may be a different browser than is used for .html files, but again the user has set up their registry to do what *they* want done with it). It's not supposed to be easy to change; it is supposed to be consistent. Using a different browser per shell/thread/process is a foreign concept; it's also a useless concept on Windows <0.5 wink>. 
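The lookup order argued over in this thread (entries in BROWSER tried first, with the registered Windows default as the fallback choice) can be sketched as a small pure function. This is a rough illustration, not the webbrowser module's actual code: the function name browser_candidates is made up, the use of os.pathsep as the list separator is an assumption, and "windows-default" is simply the token named earlier in the thread for "launch whatever the registry says", which on Windows would come down to os.startfile(url).

```python
import os


def browser_candidates(environ, platform, sep=os.pathsep):
    """Return browser names to try, in order (hypothetical helper).

    environ  -- a mapping such as os.environ
    platform -- a string such as sys.platform
    sep      -- separator for the BROWSER list (assumed: os.pathsep)
    """
    candidates = []
    for name in environ.get("BROWSER", "").split(sep):
        name = name.strip()
        # Skip empty entries and duplicates, keeping first-seen order.
        if name and name not in candidates:
            candidates.append(name)
    # Assumption: on Windows the registered default handler is always
    # available as a last resort, whatever BROWSER says.
    if platform.startswith("win") and "windows-default" not in candidates:
        candidates.append("windows-default")
    return candidates
```

With BROWSER unset on Windows this yields only the registry default, which matches the expectation Tim describes; the disagreement in the thread is about whether anything should be allowed ahead of it.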
From tim.one at home.com Wed Jan 24 21:32:35 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 15:32:35 -0500 Subject: LC_MESSAGES (was Re: [Python-Dev] test___all__ failing; Windows) In-Reply-To: Message-ID: [Peter Funk] > ... > AFAI found out, LC_MESSAGES was added to the POSIX "standard" in Posix.2. FYI, it appears that C99 declined to adopt this extension to C89, but don't know why (the C99 Rationale doesn't mention it). That means the vendors who don't already support it can (well, *will*) use the new C99 std as "a reason" to continue leaving it out. From tim.one at home.com Wed Jan 24 21:15:28 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 15:15:28 -0500 Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <14959.1633.163407.779930@beluga.mojam.com> Message-ID: [Skip] > Then this suggests that either Tim's hack is the correct fix (leave it out > because we can't rely on it always being there) or I should add it to > __all__ at the bottom of the file if and only if it's present in the > module's namespace. What you suggest at the end *is* the hack I checked in. That is, it's already done. The existence of LC_MESSAGES is clearly platform-specific; if anyone can say for sure a priori *which* platforms it's available on, tell Fred Drake so he can update the docs accordingly. From skip at mojam.com Wed Jan 24 22:25:45 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 15:25:45 -0600 (CST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124123155.A15203@thyrsus.com> References: <20010124084917.C29977@ActiveState.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> <20010124123155.A15203@thyrsus.com> Message-ID: <14959.18521.648454.488731@beluga.mojam.com> >>>>> "Eric" == Eric S Raymond writes: Moshe Zadka : >> The "webbrowser" module should prefer to take the setting from the >> registry on windows. 
Eric> Um, that's not the way it works right now. The windows-default Eric> browser choice launches the registered default browser, but Eric> BROWSER may have something else in its search list first. Why not have a special REGISTRY token you can place in the BROWSER path to tell it when to consult the registry? On non-Windows platforms it can simply be ignored: BROWSER=netscape:REGISTRY:explorer Skip From esr at thyrsus.com Wed Jan 24 22:30:44 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 16:30:44 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <14959.18521.648454.488731@beluga.mojam.com>; from skip@mojam.com on Wed, Jan 24, 2001 at 03:25:45PM -0600 References: <20010124084917.C29977@ActiveState.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> <20010124123155.A15203@thyrsus.com> <14959.18521.648454.488731@beluga.mojam.com> Message-ID: <20010124163044.A15877@thyrsus.com> Skip Montanaro : > Why not have a special REGISTRY token you can place in the BROWSER path to > tell it when to consult the registry? On non-Windows platforms it can > simply be ignored: In effect, windows-default is that special token. -- Eric S. Raymond The Bible is not my book, and Christianity is not my religion. I could never give assent to the long, complicated statements of Christian dogma. -- Abraham Lincoln From martin at mira.cs.tu-berlin.de Wed Jan 24 22:41:11 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 24 Jan 2001 22:41:11 +0100 Subject: [Python-Dev] Tkinter documentation (Was: What does "batteries are included" mean?) Message-ID: <200101242141.f0OLfBT01812@mira.informatik.hu-berlin.de> > It's already a blot on Python that the standard documentation set > doesn't cover Tkinter. Just point your friendly web browser to Ping's HTML generator and ask for Tkinter, or invoke "pydoc.py Tkinter". 
[I wouldn't have brought this up if it hadn't been the contribution of my friend Nils Fischbeck:-] Regards, Martin From nas at arctrix.com Wed Jan 24 16:31:55 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 24 Jan 2001 07:31:55 -0800 Subject: [Python-Dev] Makefile changes Message-ID: <20010124073155.B32266@glacier.fnational.com> I've checked in my new makefile. Hopefully everything goes well. The following files are no longer used so please don't patch them: Grammar/Makefile.in Include/Makefile Lib/Makefile Modules/Makefile.pre.in Objects/Makefile.in Parser/Makefile.in Python/Makefile.in Makefile.in They will be removed in a few days assuming all goes well. You should re-run configure to use the new makefile. I would appreciate it if people using platforms other than Linux and GNU make could give me some feedback on the build process. Does configure and make work okay? Does "make test" and "make install" work? Thanks. Neil From greg at cosc.canterbury.ac.nz Wed Jan 24 23:55:00 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Jan 2001 11:55:00 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101240354.WAA06903@cj20424-a.reston1.va.home.com> Message-ID: <200101242255.LAA02208@s454.cosc.canterbury.ac.nz> Guido: > But shouldn't the default value be something else, > like none? It should really be whatever is the first value that gets stored after the dict is created. That way people can use whatever they want for their dummy value and it will Just Work. And it will probably catch most existing uses of a dict as a set as well. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+

From ping at lfw.org Wed Jan 24 21:33:43 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Wed, 24 Jan 2001 12:33:43 -0800 (PST)
Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm!
Message-ID:

Hi -- after updating my CVS tree today with Python 2.1a1, i ran the tests and test_inspect failed. This revealed that the format of code.co_varnames has changed. At first i tried to update the inspect.py module to check the Python version number and track the change, but now i believe this is actually symptomatic of a real interpreter problem.

Consider the function:

    def f(a, (b, c), *d):
        x = 1
        print a, b, c, d, x

Whereas in Python 1.5.2:

    f.func_code.co_argcount = 2
    f.func_code.co_nlocals = 6
    f.func_code.co_names = ('x', 'a', 'b', 'c', 'd')
    f.func_code.co_varnames = ('a', '.2', 'd', 'b', 'c', 'x')

In Python 2.1a1:

    f.func_code.co_argcount = 2
    f.func_code.co_nlocals = 6
    f.func_code.co_names = ('b', 'c', 'x', 'a', 'd')
    f.func_code.co_varnames = ('a', '.2', 'b', 'c', 'd', 'x')

Notice how the ordering of the variable names has changed. I went and looked at the CO_VARARGS clause in eval_code2 to see if it put the varargs and kwdict arguments in different slots, but it appears unchanged! It still puts varargs at locals[co_argcount] and kwdict at locals[co_argcount + 1].

Please try:

    >>> def f(a, (b, c), *d):
    ...     x = 1
    ...     print a, b, c, d, x
    ...
    >>> f(1, (2, 3), 4)
    1 2 3
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 3, in f
    UnboundLocalError: local variable 'd' referenced before assignment
    >>>

In Python 1.5.2, this prints "1 2 3 (4,)" as expected.

I only have 1.5.2 and 2.1a1 to test. I hope this problem isn't present in 2.0...

Note that test_inspect was the only test to fail! It might be the only test that checks anonymous and *varargs at the same time. (Yet another reason to put inspect in the core...)
I did recently check in additions to test_extcall that made the test much beefier -- but that only tested combinations of regular, keyword, varargs, and kwdict arguments; it neglected to test anonymous (tuple) arguments as well. -- ?!ng From tim.one at home.com Thu Jan 25 00:56:25 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 18:56:25 -0500 Subject: [Python-Dev] Re: test___all__ failing; Windows Message-ID: > In that case, the __all__ attribute in the module has to be calculated > dynamically. Say, adding code like > > try: > LC_MESSAGES > except NameError: > pass > else: > __all__.append('LC_MESSAGES') > > Ditto for anything else. > > Should I check in a patch? SourceForge CVS doesn't appear to be broken, so I can only conclude everyone decided this was a bad to stop taking drugs <0.9 wink>. From tim.one at home.com Thu Jan 25 01:04:50 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 19:04:50 -0500 Subject: [Python-Dev] (no subject) Message-ID: [Skip] > Why not have a special REGISTRY token you can place in the BROWSER > path to tell it when to consult the registry? On non-Windows > platforms it can simply be ignored: > > BROWSER=netscape:REGISTRY:explorer Because non-Windows platforms shouldn't be bothered with Windows silliness any more than Windows users should be bothered with Unix silliness. BROWSER isn't of any use on Windows, and REGISTRY isn't of any use on Unix. Eric may still *think* BROWSER is of use on Windows, but if so that's not really a technical problem . 
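The conditional-export idea quoted earlier in this digest (add LC_MESSAGES to __all__ only when the platform defines it) can be sketched as a small helper. This is an illustration only: optional_exports is a hypothetical name, not anything in locale.py, and the dictionary passed in below stands in for a module's globals().

```python
def optional_exports(namespace, candidates):
    """Return the candidate names that are actually defined in namespace."""
    return [name for name in candidates if name in namespace]


# Stand-in for a module namespace on a platform without LC_MESSAGES.
fake_module_globals = {"LC_ALL": 0, "setlocale": None}

# Names assumed to exist everywhere, plus whichever optional ones are present.
exported = ["LC_ALL", "setlocale"] + optional_exports(
    fake_module_globals, ["LC_MESSAGES"]
)
```

Inside a real module the same call would take globals() as its first argument, which has the same effect as the try/except NameError form quoted in the thread.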
From thomas at xs4all.net Thu Jan 25 01:25:54 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 25 Jan 2001 01:25:54 +0100 Subject: [Python-Dev] Makefile changes In-Reply-To: <20010124073155.B32266@glacier.fnational.com>; from nas@arctrix.com on Wed, Jan 24, 2001 at 07:31:55AM -0800 References: <20010124073155.B32266@glacier.fnational.com> Message-ID: <20010125012554.F962@xs4all.nl> On Wed, Jan 24, 2001 at 07:31:55AM -0800, Neil Schemenauer wrote: > I would appreciate it if people using platforms other than Linux > and GNU make could give me some feedback on the build process. > Does configure and make work okay? Does "make test" and "make > install" work? Thanks. Only have time for a quick check now, and no time what so ever tomorrow, but at first glance, it looks okay (read: it compiles Python) on BSDI 4.0.1, BSDI 4.1 and FreeBSD 4.2. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From esr at thyrsus.com Thu Jan 25 01:15:10 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 19:15:10 -0500 Subject: [Python-Dev] (no subject) In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 07:04:50PM -0500 References: Message-ID: <20010124191510.A17782@thyrsus.com> Tim Peters : > Because non-Windows platforms shouldn't be bothered with Windows silliness > any more than Windows users should be bothered with Unix silliness. BROWSER > isn't of any use on Windows, and REGISTRY isn't of any use on Unix. Eric > may still *think* BROWSER is of use on Windows, but if so that's not really > a technical problem . Actually that's not something I have an opinion on. I addressed the original question because I know it would be technically possible to set a BROWSER variable under Windows. Yes, an unlikely move, but possible. -- Eric S. 
Raymond A man who has nothing which he is willing to fight for, nothing which he cares about more than he does about his personal safety, is a miserable creature who has no chance of being free, unless made and kept so by the exertions of better men than himself. -- John Stuart Mill, writing on the U.S. Civil War in 1862 From tim.one at home.com Thu Jan 25 05:38:54 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 23:38:54 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <3A6EE944.C8CC6EF7@tismer.com> Message-ID: [Christian Tismer] > ... > Not sure if hashes and keys should be apart, but > sure for values. How so? That is, under what assumptions? Any savings from separation would appear to require that I look up keys a lot more than I access the associated values; while trivially true for dicts used as sets, it seems dubious to me for use of dicts as mappings (count[word] += 1, etc). From Jason.Tishler at dothill.com Thu Jan 25 07:09:47 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Thu, 25 Jan 2001 01:09:47 -0500 Subject: [Python-Dev] Re: Python 2.1 alpha 1 released! In-Reply-To: <200101230333.WAA28376@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 10:33:02PM -0500 References: <200101230333.WAA28376@cj20424-a.reston1.va.home.com> Message-ID: <20010125010947.M1256@dothill.com> On Mon, Jan 22, 2001 at 10:33:02PM -0500, Guido van Rossum wrote: > - Python should now build out of the box on Cygwin. If it doesn't, > mail to Jason Tishler (jlt63 at users.sourceforge.net). Although Python CVS built OOTB under Cygwin until 2001/01/17 18:54:54, Python 2.1a1 needs a small patch in order to build cleanly under Cygwin. If interested, please see the following for details: http://www.cygwin.com/ml/cygwin-apps/2001-01/msg00019.html Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: +1 (732) 264-8770 x235 Dot Hill Systems Corp. 
Fax: +1 (732) 264-8798 82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com From tim.one at home.com Thu Jan 25 08:29:19 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 25 Jan 2001 02:29:19 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101231549.KAA05172@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > It's no big deal if the Vaults contain three or more set modules -- > perfect even, people can choose the best one for their purpose. They really can't, not realistically, unless all the modules in question conform to the same interface (which users can't control), and users restrict themselves to methods defined only in the interface (which users can control). The problem is that "their purpose" changes over time, and in some cases the effects of representation on performance simply can't be out-guessed in advance of actual measurement. If people need to change any more than just the import statement, *then* a single implementation has to be all things to all people. I hate to say this (bet ?), but I suspect the fact that Python's basic types are all builtin and not classes has kept us from fully appreciating the class-based "1 interface, N implementations" approach that C++ and Java hackers are having so much fun with. They're not all that easy to find, but people who have climbed the steep STL learning curve often end up in the same ecstatic trance I used to see only among fellow Pythoneers. > But in the core, there's only room for one set type or module. I don't like the conclusion: it implies there's no room in the core for more than one implementation of anything, yet one-size-fits-all doesn't. I have no problem with the idea that there's only room for one Set *interface* in the core. 
Then you only need Pronounce on a reasonable set of abstract operations, and leave the implementation tradeoffs to be made by different people in different ways (I've really got no use for Eric's list-based sets; he's really got no use for my sets-of-sets). That said, if there can be at most one, and must be at least one, a hashtable based set is the best compromise there is, and mutable objects as elements should not be supported (they add great implementation complexity for the benefit of relatively few applications). jeremy's-set-class-couldn't-be-accused-of-overkill-ly y'rs - tim From tim.one at home.com Thu Jan 25 08:57:18 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 25 Jan 2001 02:57:18 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123113050.A26162@thyrsus.com> Message-ID: [Eric S. Raymond] > ... > What you get by going with a dictionary representation is that > membership test becomes close to constant-time, while insertion and > deletion become sometimes cheap and sometimes quite expensive > (depending of course on whether you have to allocate a new > hash bucket). Note that Python's dicts aren't vulnerable to that: they use open addressing in a contiguous, preallocated vector. There are no mallocs() or free()s going on for lookups, deletes, or inserts, unless an insert happens to hit a "time to double the size of the vector" boundary. Deletes never cost more than a lookup; inserts never more unless the table-size boundary is hit (one in 2**N unique inserts, at which point N goes up too). > ... > "works for everbody" isn't really possible here. So my solution > does the next best thing -- pick a choice of tradeoffs that isn't > obviously worse than the alternatives and keeps things bog-simple. 
I agree that this shouldn't be an either/or choice, but if it's going to be forced into that mold I have to protest that the performance of unordered lists would kill most of the set applications I've ever had. I typically have a small number of very large sets (and I'm talking not 100s, but often 100s of 1000s of elements). The relatively large memory burden of a dict representation wouldn't bother me unless I instead had 100s of 1000s of very small sets.

which-we-may-happen-in-my-next-life-but-not-in-this-one-ly y'rs - tim

From tim.one at home.com Thu Jan 25 09:08:30 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 03:08:30 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <001101c0857a$c0dce420$770a0a0a@nevex.com>
Message-ID:

[Greg Wilson]
> ...
> Unfortunately, if values are required to be immutable, then sets of
> sets aren't possible... :-(

Sure they are. I wrote about how before, and Moshe put up a simple implementation as a SourceForge patch. Not bulletproof, though: "consenting adults". No matter *what* you implement, I'll find *some* way to trick it into believing my sets are immutable , so don't worry about that. Bulletproof is very hard, and is a minority distraction at best.

IIRC, SETL had "by value" semantics when inserting a set into another set as an element, and had some exceedingly hairy copy-on-write scheme under the covers to make that bearably quick. That may be wrong, though. Herman Venter's Slim (Sets, Lists and Maps) language does work that way (Guido, Herman was a friend of the departed Stoffel Erasmus, who you may recall fondly from Python's very early days -- if *that* doesn't make sets attractive to you, nothing will ).

Ah! Meant to post this before:

http://birch.eecs.lehigh.edu/~bacon/setlprog.ps.gz

That's a readable and very good intro to SETL Classic. People pondering computerized sets should at least catch up with what was common knowledge 30 years ago .
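A minimal sketch of the two ideas above, written in present-day Python: a set stored as the keys of a dict, and "by value" insertion of one set into another by storing an immutable snapshot. DictSet and frozen() are hypothetical names, and the use of frozenset (which did not exist in 2001) is an assumption made for brevity. This is not Moshe's patch or any of the set modules under discussion.

```python
class DictSet:
    """A small set kept as the keys of a dict (illustrative only)."""

    def __init__(self, items=()):
        self._d = {}
        for item in items:
            self.add(item)

    def add(self, item):
        # A nested set is stored as an immutable snapshot, so later
        # changes to the original do not disturb the outer set.
        if isinstance(item, DictSet):
            item = item.frozen()
        self._d[item] = None

    def discard(self, item):
        self._d.pop(item, None)

    def __contains__(self, item):
        if isinstance(item, DictSet):
            item = item.frozen()
        return item in self._d

    def __len__(self):
        return len(self._d)

    def frozen(self):
        """Return a hashable snapshot of the current elements."""
        return frozenset(self._d)


inner = DictSet([1, 2])
outer = DictSet()
outer.add(inner)   # stores a snapshot of {1, 2}
inner.add(3)       # does not affect what outer holds
```

Membership tests go through the dict's hash lookup, which is the near-constant-time behaviour the thread contrasts with unordered lists.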
From thomas at xs4all.net Thu Jan 25 10:24:24 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 25 Jan 2001 10:24:24 +0100 Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm! In-Reply-To: ; from ping@lfw.org on Wed, Jan 24, 2001 at 12:33:43PM -0800 References: Message-ID: <20010125102424.G962@xs4all.nl> On Wed, Jan 24, 2001 at 12:33:43PM -0800, Ka-Ping Yee wrote: > Please try: > >>> def f(a, (b, c), *d): > ... x = 1 > ... print a, b, c, d, x > ... > >>> f(1, (2, 3), 4) > 1 2 3 > Traceback (most recent call last): > File "", line 1, in ? > File "", line 3, in f > UnboundLocalError: local variable 'd' referenced before assignment > >>> > In Python 1.5.2, this prints "1 2 3 (4,)" as expected. > I only have 1.5.2 and 2.1a1 to test. I hope this problem > isn't present in 2.0... It isn't present in 2.0. This is probably related to Jeremy's changes in the call mechanism or the compiler track, though Jeremy himself is the best person to claim that for sure :) > Note that test_inspect was the only test to fail! It might be the > only test that checks anonymous and *varargs at the same time. > (Yet another reason to put inspect in the core...) Well, this is not an inspect-specific test, so it shouldn't *be* in test_inspect, it should be in test_extcall :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik at effbot.org Thu Jan 25 10:45:31 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 25 Jan 2001 10:45:31 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/lib libwinsound.tex,1.5,1.6 References: Message-ID: <003801c086b3$8ff41560$e46940d5@hagrid> tim accidentally wrote: > \versionadded{1.5.3} % XXX fix this version number when release is scheduled! 1.5.3? time for a 1.5.3 => 1.6 query replace? 
> fgrep 1.5.3 doc/*/*.tex doc/lib/libcmp.tex:\deprecated{1.5.3}{Use the \module{filecmp} module inste doc/lib/libcmpcache.tex:\deprecated{1.5.3}{Use the \module{filecmp} module ad.} doc/lib/libwinsound.tex: \versionadded{1.5.3} % XXX fix this version number or am I missing something? Cheers /F From tim.one at home.com Thu Jan 25 12:20:18 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 25 Jan 2001 06:20:18 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/lib libwinsound.tex,1.5,1.6 In-Reply-To: <003801c086b3$8ff41560$e46940d5@hagrid> Message-ID: Gotta ask Fred about this one! > or am I missing something? Yes, the Python 1.5.3 release. I use it all the time . From tismer at tismer.com Thu Jan 25 13:22:32 2001 From: tismer at tismer.com (Christian Tismer) Date: Thu, 25 Jan 2001 14:22:32 +0200 Subject: [Python-Dev] Intended to work? (lambda x,y:map(eval, ["x", "y"]))(2,3) Message-ID: <3A701A88.F2C68635@tismer.com> In a function like this: def f(x): return eval("x") , eval uses the local function namespace, and the above works. This is according to chapter 2.3 of the Python library ref. Now on my problem: When eval() is used with map, the same mechanism takes place: def f(x): return map(eval,["x"]) It works the same as the above, because map is a builtin function that does not modify the frame chain, so eval finds the local namespace. Not so with Stackless Python (at the moment), since Stackless map assigns an own frame to map without passing the correct namespaces to it. (Reported by Bernd Rinn) Question: Is this by chance, or is eval() *meant* to function with the local namespace, even if it is executed in the context of a function like map() ? The description of map() does not state whether it has to pass its surrounding namespace to the mapped function, and if one simulates map() by writing one's own python implementation, it will fail exactly like Stackless does today. The same applies to apply(). 
I think I should fix Stackless here, anyway? ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at digicool.com Thu Jan 25 14:35:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 25 Jan 2001 08:35:12 -0500 Subject: [Python-Dev] Re: Intended to work? (lambda x,y:map(eval, ["x", "y"]))(2,3) In-Reply-To: Your message of "Thu, 25 Jan 2001 14:22:32 +0200." <3A701A88.F2C68635@tismer.com> References: <3A701A88.F2C68635@tismer.com> Message-ID: <200101251335.IAA16713@cj20424-a.reston1.va.home.com> > In a function like this: > > def f(x): > return eval("x") > > , eval uses the local function namespace, and the above works. > This is according to chapter 2.3 of the Python library ref. > > Now on my problem: When eval() is used with map, the same > mechanism takes place: > > def f(x): > return map(eval,["x"]) > > It works the same as the above, because map is a builtin function > that does not modify the frame chain, so eval finds the local > namespace. > Not so with Stackless Python (at the moment), since Stackless map > assigns an own frame to map without passing the correct namespaces > to it. (Reported by Bernd Rinn) > > Question: Is this by chance, or is eval() *meant* to function with > the local namespace, even if it is executed in the context of > a function like map() ? Map, being a built-in, is transparent to namespaces. > The description of map() does not state whether it has to pass > its surrounding namespace to the mapped function, and if one > simulates map() by writing one's own python implementation, > it will fail exactly like Stackless does today. The same > applies to apply(). So you can't simulate a built-in. > I think I should fix Stackless here, anyway? 
Yes. Note: beware of Jeremy's nested scopes. That adds a whole slew of namespaces! (But eval() is more crippled there.) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Thu Jan 25 16:20:45 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 10:20:45 -0500 (EST) Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm! In-Reply-To: <20010125102424.G962@xs4all.nl> References: <20010125102424.G962@xs4all.nl> Message-ID: <14960.17485.549337.5476@localhost.localdomain> >>>>> "TW" == Thomas Wouters writes: TW> On Wed, Jan 24, 2001 at 12:33:43PM -0800, Ka-Ping Yee wrote: >> Please try: >> >>> def f(a, (b, c), *d): >> ... x = 1 ... print a, b, c, d, x ... >> >>> f(1, (2, 3), 4) >> 1 2 3 Traceback (most recent call last): File "", line 1, >> in ? File "", line 3, in f UnboundLocalError: local >> variable 'd' referenced before assignment >> >>> >> In Python 1.5.2, this prints "1 2 3 (4,)" as expected. >> I only have 1.5.2 and 2.1a1 to test. I hope this problem isn't >> present in 2.0... TW> It isn't present in 2.0. This is probably related to Jeremy's TW> changes in the call mechanism or the compiler track, though TW> Jeremy himself is the best person to claim that for sure :) The bug is in the compiler. It creates varnames while it is parsing the argument list. While I got the handling of the anonymous tuples right, I forgot to insert *varargs or **kwargs in varnames *before* the names defined in the tuple. I will fix it real soon now. >> Note that test_inspect was the only test to fail! It might be >> the only test that checks anonymous and *varargs at the same >> time. (Yet another reason to put inspect in the core...) TW> Well, this is not an inspect-specific test, so it shouldn't *be* TW> in test_inspect, it should be in test_extcall :) It should probably be in test_grammar. The ext call mechanism is only invoked when the caller uses a form like 'f(*arg)'. 
Perhaps the name "ext call" isn't very clear. Jeremy From esr at thyrsus.com Thu Jan 25 17:19:36 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 25 Jan 2001 11:19:36 -0500 Subject: [Python-Dev] Waiting method for file objects Message-ID: <20010125111936.A23512@thyrsus.com> I have been researching the question of how to ask a file descriptor how much data it has waiting for the next sequential read, with a view to discovering what cross-platform behavior we could count on for a hypothetical `waiting' method in Python's built-in file class. 1: Why bother? I have these main applications in mind: 1. Detecting EOF on a static plain file. 2. Non-blocking poll of a socket opened in non-blocking mode. 3. Non-blocking poll of a FIFO opened in non-blocking mode. 4. Non-blocking poll of a terminal device opened in non-blocking mode. These are all frequently requested capabilities on C newsgroups -- how often have *you* seen the "how do I detect an individual keypress" question from beginning programmers? I believe having these capabilities would substantially enhance Python's appeal. 2: What would be under the hood? Summary: We can do this portably, and we can do it with only one (1) new #ifdef. Our tools for this purpose will be the fstat(2) st_size field and the FIONREAD ioctl(2) call. They are complementary. In all supposedly POSIX-conformant environments I know of, the st_size field has a documented meaning for plain files (S_IFREG) and may or may not give a meaningful number for FIFOs, sockets, and tty devices. The Single Unix Specification is silent on the meaning of st_size for file types other than regular files (S_IFREG). I have filed a defect report about this with OpenGroup and am discussing appropriate language with them. (The last sentence of the Inferno operating system's language on stat(2) is interesting: "If the file resides on permanent storage and is not a directory, the length returned by stat is the number of bytes in the file. 
For directories, the length returned is zero. Some devices report a length that is the number of bytes that may be read from the device without blocking.") The FIONREAD ioctl(2) call, on the other hand, returns bytes waiting on character devices such as FIFOs, sockets, or ttys -- but does not return a useful value for files or directories or block devices. The FIONREAD ioctl was supported in both SVr4 and 4.2BSD. It's present in all the open-source Unixes, SunOS, Solaris, and AIX. Via Google search I have discovered that it's also supported in the Windows Sockets API and the GUSI POSIX libraries for the Macintosh. Thus, it can be considered portable for Python's purposes even though it's rather sparsely documented. I was able to obtain confirming information on Linux from Linus Torvalds himself. My information on Windows and the Mac is from Gavriel State, formerly a lead developer on Corel's WINE team and a programmer with extensive cross-platform experience. Gavriel reported on the MSCRT POSIX environment, on the Metrowerks Standard Library POSIX implementation for the Mac, and on the GUSI POSIX implementation for the Mac. 2.1: Plain files Torvalds and State confirm that for plain files (S_IFREG) the st_size field is reliable on all three platforms. On the Mac it gives the file's data fork size. One apparent difficulty with the plain-file case is that POSIX does not guarantee anything about seek_t quantities such as lseek(2) returns and the st_size field except that they can be compared for equality. Thus, under the strict letter of POSIX law, `waiting' can be used to detect EOF but not to get a reliable read-size return in any other file position. Fortunately, this is less an issue than it appears. The weakness of the POSIX language was a 1980s-era concession to a generation of mainframe operating systems with record-oriented file structures -- all of which are now either thoroughly obsolete or (in the case of IBM VM/CMS) have become Linux emulators :-). 
On modern operating systems under which files have character granularity, stat(2) emulations can be and are written to give the right result.

2.2: Block devices

The directory case (S_IFDIR) is a complete loss. Under Unixes, including Linux, the fstat(2) size field gives the allocated size of the directory as if it were a plain file. Under MSCRT POSIX the meaning is undocumented and unclear. Metrowerks returns garbage. GUSI POSIX returns the number of files in the directory! FIONREAD cannot be used on directories.

Block devices (S_IFBLK) are a mess again. Linus points out that a system with removable or unmountable volumes *cannot* return a useful st_size field -- what happens when the device is dismounted?

2.3: Character devices

Pipes and FIFOs (S_IFIFO) look better. On MSCRT the fstat(2) size field returns the number of bytes waiting to be read. This is also true under current Linuxes, though Torvalds says it is "an implementation detail" and recommends polling with the FIONREAD ioctl instead. Fortunately, FIONREAD is available under Unix, Windows, and the Mac.

Sockets (S_IFSOCK) look better too. Under Linux, the fstat(2) size field gives the number of bytes waiting. Torvalds again says this is "an implementation detail" and recommends polling with the FIONREAD ioctl. Neither MSCRT POSIX nor Metrowerks has direct support for sockets. GUSI POSIX returns 1 (!) in the st_size field. But FIONREAD is available under Unix, Windows, and the GUSI POSIX libraries on the Mac.

Character devices (S_IFCHR) can be polled with FIONREAD. This technique has a long history of use with tty devices under Unix. I don't know whether it will work with the equivalents of terminal devices for Windows and the Mac. Fortunately this is not a very important question, as those are GUI environments in which terminal devices are rarely if ever used.

3: How does this turn into Python?
The upshot of our portability analysis is that by using FIONREAD and fstat(2), we can get useful results for plain files, pipes, and sockets on all three platforms. Directories and block devices are a complete loss. Character devices (in particular, ttys) we can poll reliably under Unix. What we'll get polling the equivalents of tty or character devices under Windows and the Mac is presently unknown, but also unimportant. My proposed semantics for a Python `waiting' method is that it reports the amount of data that would be returned by a read() call at the time of the waiting-method invocation. The interpreter throws OSError if such a report is impossible or forbidden. I have enclosed a patch against the current CVS sources, including documentation. This patch is tested and working against plain files, sockets, and FIFOs under Linux. I have also attached the Python test program I used under Linux. I would appreciate it if those of you on Windows and Macintosh machines would test the waiting method. The test program will take some porting, because it needs to write to a FIFO in background. Under Linux I do it this way: (echo -n '%s' >testfifo; echo 'Data written to FIFO.') & I don't know how to do the equivalent under Windows or Mac. When you run this program, it will try to mail me your test results. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? 
-- Thomas Jefferson, in his 1801 inaugural address -------------- next part -------------- Index: fileobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/fileobject.c,v retrieving revision 2.108 diff -c -r2.108 fileobject.c *** fileobject.c 2001/01/18 03:03:16 2.108 --- fileobject.c 2001/01/25 16:16:10 *************** *** 35,40 **** --- 35,44 ---- #include #endif + #ifndef DONT_HAVE_IOCTL_H + #include + #endif + typedef struct { PyObject_HEAD *************** *** 423,428 **** --- 427,513 ---- } static PyObject * + file_waiting(PyFileObject *f, PyObject *args) + { + struct stat stbuf; + #ifdef HAVE_FSTAT + int ret; + #endif + + if (f->f_fp == NULL) + return err_closed(); + if (!PyArg_NoArgs(args)) + return NULL; + #ifndef HAVE_FSTAT + PyErr_SetString(PyExc_OSError, "fstat(2) is not available."); + clearerr(f->f_fp); + return NULL; + #else + Py_BEGIN_ALLOW_THREADS + errno = 0; + ret = fstat(fileno(f->f_fp), &stbuf); + Py_END_ALLOW_THREADS + if (ret == -1) { /* the fstat failed */ + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } else if (S_ISDIR(stbuf.st_mode) || S_ISBLK(stbuf.st_mode)) { + PyErr_SetString(PyExc_IOError, + "Can't poll a block device or directory."); + clearerr(f->f_fp); + return NULL; + } else if (S_ISREG(stbuf.st_mode)) { /* plain file */ + #if defined(HAVE_LARGEFILE_SUPPORT) && SIZEOF_OFF_T < 8 && SIZEOF_FPOS_T >= 8 + fpos_t pos; + #else + off_t pos; + #endif + Py_BEGIN_ALLOW_THREADS + errno = 0; + pos = _portable_ftell(f->f_fp); + Py_END_ALLOW_THREADS + if (pos == -1) { + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } + #if !defined(HAVE_LARGEFILE_SUPPORT) + return PyInt_FromLong(stbuf.st_size - pos); + #else + return PyLong_FromLongLong(stbuf.st_size - pos); + #endif + } else if (S_ISFIFO(stbuf.st_mode) + || S_ISSOCK(stbuf.st_mode) + || S_ISCHR(stbuf.st_mode)) { /* stream device */ + #ifndef FIONREAD + 
PyErr_SetString(PyExc_OSError, + "FIONREAD is not available."); + clearerr(f->f_fp); + return NULL; + #else + int waiting; + + Py_BEGIN_ALLOW_THREADS + errno = 0; + ret = ioctl(fileno(f->f_fp), FIONREAD, &waiting); + Py_END_ALLOW_THREADS + if (ret == -1) { + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } + + return Py_BuildValue("i", waiting); + #endif /* FIONREAD */ + } else { /* should never happen! */ + PyErr_SetString(PyExc_OSError, "Unknown file type."); + clearerr(f->f_fp); + return NULL; + } + #endif /* HAVE_FSTAT */ + } + + static PyObject * file_fileno(PyFileObject *f, PyObject *args) { if (f->f_fp == NULL) *************** *** 1263,1268 **** --- 1348,1354 ---- {"truncate", (PyCFunction)file_truncate, 1}, #endif {"tell", (PyCFunction)file_tell, 0}, + {"waiting", (PyCFunction)file_waiting, 0}, {"readinto", (PyCFunction)file_readinto, 0}, {"readlines", (PyCFunction)file_readlines, 1}, {"xreadlines", (PyCFunction)file_xreadlines, 1}, -------------- next part -------------- #!/usr/bin/env python import sys, os, random, string, time, socket, smtplib, readline print "This program tests the `waiting' method of file objects." fp = open("waiting_test.py") if hasattr(fp, "waiting"): print "Good, you're running a patched Python with `waiting' available." else: print "You haven't installed the `waiting' patch yet. This won't work." sys.exit(1) successes = "" failures = "" nogo = "" print "" print "First, plain files:" filesize = fp.waiting() print "There are %d bytes waiting to be read in this file." % filesize if os.name == 'posix': os.system("ls -l waiting_test.py") print "That should match the number in the ls listing above." else: print "Please check this with your OS's directory tools." get = random.randrange(fp.waiting()) print "I'll now read a random number (%d) of bytes." % get fp.read(get) print "The waiting method sees %d bytes left." % fp.waiting() if get + fp.waiting() == filesize: print "%d + %d = %d. That's consistent. 
Test passed." % \ (get, fp.waiting(), filesize) successes += "Plain file random-read test passed.\n" else: print "That's not consistent. Test failed." failures += "Plain file random-read test failed\n" print "Now let's see if we can detect EOF reliably." fp.read() left = fp.waiting() print "I'll do a read()...the waiting method now returns %d" % left if left == 0: print "That looks like EOF." successes += "Plain file EOF test passed.\n" else: print "%d bytes left. Test failed." % left failures += "Plain file EOF test failed\n" fp.close() print "" print "Now sockets:" print "Connecting to imap.netaxs.com's IMAP server now..." sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) file = sock.makefile('rb') sock.connect(("imap.netaxs.com", 143)) print "Waiting a few seconds to avoid a race condition..." time.sleep(3) greetsize = file.waiting() print "There appear to be %d bytes waiting..." % greetsize greeting = file.readline() print "I just read the greeting line..." sys.stdout.write(greeting) if len(greeting) == greetsize: print "...and the size matches. Test passed." successes += "Socket test passed.\n" else: print "That's not right. Test failed." failures += "Socket test failed.\n" sock.close() print "" if not hasattr(os, "mkfifo"): print "Your platform doesn't have FIFOs (mkfifo() is absent), so I can't test them." nogo = "FIFO test could not be performed." else: print "Now FIFOs:" print "I'm making a FIFO named testfifo."; os.mkfifo("testfifo") str = string.letters[:random.randrange(len(string.letters))] print "I'm going to send it the following string '%s' of random length %d:" \ % (str, len(str),) # Note: Unix dependency here! os.system("(echo -n '%s' >testfifo; echo 'Data written to FIFO.') &" % str) fp = open("testfifo", "r") print "Waiting a few seconds to avoid a race condition..." time.sleep(3) ready = fp.waiting() print "I see %d bytes waiting in the FIFO." % ready if ready == len(str): print "That's consistent. Test passed." 
        successes += "FIFO test passed.\n"
    else:
        print "That's not consistent. Test failed."
        failures += "FIFO test failed\n"
    os.remove("testfifo")

print "\nSummary:"
report = "Platform is: %s, version is %s\n" % (sys.platform, sys.version)
if successes:
    report += "The following tests succeeded:\n" + successes
if failures:
    report += "The following tests failed:\n" + failures
if nogo:
    report += "The following tests could not be performed:\n" + nogo
if not nogo:
    report += "No tests were skipped.\n"
if not failures:
    report += "All tests succeeded.\n"
print report

if os.name == 'posix':
    me = os.environ["USER"] + "@" + socket.getfqdn()
else:
    me = raw_input("Enter your email address, please?")
try:
    server = smtplib.SMTP('localhost')
    report = ("From: %s\nTo: esr at thyrsus.com\nSubject: waiting_test\n\n" % me) + report
    server.sendmail(me, ["esr at thyrsus.com"], report)
    server.quit()
except:
    print "The attempt to mail your test result failed.\n"

From esr at snark.thyrsus.com Thu Jan 25 17:46:20 2001
From: esr at snark.thyrsus.com (Eric S. Raymond)
Date: Thu, 25 Jan 2001 11:46:20 -0500
Subject: [Python-Dev] Documentation patch for waiting method.
Message-ID: <200101251646.f0PGkKM23567@snark.thyrsus.com>

Index: libstdtypes.tex
===================================================================
RCS file: /cvsroot/python/python/dist/src/Doc/lib/libstdtypes.tex,v
retrieving revision 1.50
diff -u -r1.50 libstdtypes.tex
--- libstdtypes.tex 2001/01/17 01:18:00 1.50
+++ libstdtypes.tex 2001/01/25 16:46:40
@@ -1142,6 +1142,24 @@
 \UNIX{} versions support this operation).
 \end{methoddesc}
 
+\begin{methoddesc}[file]{waiting}{}
+  Return the number of bytes waiting to be read from this file object.
+  For regular files, this returns the size of the file in bytes minus
+  the current seek address, as would be returned by \method{tell()}; a
+  zero return can be used to detect EOF.
For streams such as FIFOs, + sockets, Unix ttys, and other Unix character devices, this method + returns the number of bytes currently buffered up and waiting to be + read. Attempts to call this method on Unix block devices or + on directories will raise an error. + \footnote{The \method{waiting()} method uses + \cfunction{fstat(2)} and \cfunction{lseek(2)} on plain files; + these should be reliable on all of Unix, Windows, and MacOS. + It uses the FIONREAD ioctl(2) call to query FIFOs, sockets, + Unix ttys, and other POSIX character devices; FIFO and socket + behavior should be consistent across all three platforms, but + the results from querying other character devices may vary.} +\end{methoddesc} + \begin{methoddesc}[file]{write}{str} Write a string to the file. There is no return value. Note: Due to buffering, the string may not actually show up in the file until -- Eric S. Raymond "To disarm the people... was the best and most effectual way to enslave them." -- George Mason, speech of June 14, 1788 From fredrik at effbot.org Thu Jan 25 20:23:50 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 25 Jan 2001 20:23:50 +0100 Subject: [Python-Dev] Fw: random.py gives wrong results (+ a solution) Message-ID: <00f701c08704$59bde510$e46940d5@hagrid> I'm pretty sure Tim's seen this already, but just in case... ----- Original Message ----- From: "Ivan Frohne" Newsgroups: comp.lang.python Sent: Thursday, January 25, 2001 5:20 PM Subject: Re: random.py gives wrong results (+ a solution) > > "Janne Sinkkonen" wrote in message > news:m3u26oy1rw.fsf at kinos.nnets.fi... > > > > At least in Python 2.0 and earlier, the samples returned by the > > function betavariate() of random.py are not from a beta distribution > > although the function name misleadingly suggests so. 
> >
> > The following would give beta-distributed samples:
> >
> > def betavariate(alpha, beta):
> >     y = gammavariate(alpha,1)
> >     if y==0: return 0.0
> >     else: return y/(y+gammavariate(beta,1))
> >
> > This is from matlab. A comment in the original matlab code refers to
> > Devroye, L. (1986) Non-Uniform Random Variate Generation, theorem 4.1A
> > (p. 430). Another reference would be Gelman, A. et al. (1995) Bayesian
> > data analysis, p. 481, which I have checked and found to agree with
> > the code above.
> >
>
> I'm convinced that Janne Sinkkonen is right: The beta distribution
> generator in module random.py does not return Beta-distributed
> random numbers. Janne's suggested fix should work just fine.
>
> Here's my guess on how and why this bug bit -- it won't be of interest to
> most, but this subject is so obscure sometimes that there needs to be a
> detailed analysis.
>
> The probability density function of the gamma distribution with (positive)
> parameters A and B is usually written
>
>     g(x; A, B) = (x**(A-1) * exp(-x/B)) / (Gamma(A) * B**A), where x, A, and B > 0.
>
> Here Gamma(A) is the gamma function -- for A a positive integer, Gamma(A) is
> the factorial of A - 1, Gamma(A) = (A-1)!. In fact, this is the definition used
> by the authors of random.py in defining gammavariate(alpha, beta), the gamma
> distribution random number generator.
>
> Now it happens that a gamma-distributed random variable with parameters A = 1
> and B has the (much simpler) exponential distribution with density function
>
>     g(x; 1, B) = exp(-x/B) / B.
>
> Keep that in mind.
>
> The reference "Discrete Event Simulation in ," by Kevin Watkins
> (McGraw-Hill, 1993) was consulted by the random.py authors. But this
> reference defines the gamma probability distribution a little differently, as
>
>     g1(x; A, B) = (B**A * x**(A-1) * exp(-B*x)) / Gamma(A), where x, A, B > 0.
>
> (See p. 85).
> On page 87, Watkins states (incorrectly) that if grv(A, B) is a function
> which returns a gamma random variable with parameters A and B (using his
> definition on p. 85), then the function
>
>     brv(A, B) = grv(1, 1/B) / ( grv(1, 1/B) + grv(1, A) )    [ not true!]
>
> will return a random variable which has the beta distribution with
> parameters A and B.
>
> Believing Watkins to be correct, the random.py authors remembered that a
> gamma random variable with parameter A = 1 is just an exponential random
> variable and further simplified their beta generator to
>
>     brv(A, B) = erv(1/B) / (erv(1/B) + erv(A)),
>
> where erv(K) is a random variable having the exponential distribution with
> parameter K.
>
> The corrected equation for a beta random variable, using Watkins' definition
> of the gamma density, is
>
>     brv(A, B) = grv(A, 1) / ( grv(A, 1) + grv(1/B, 1) ),
>
> which translates to
>
>     brv(A, B) = grv(A, 1) / ( grv(A, 1) + grv(B, 1) )
>
> using the more common gamma density definition (the one used in random.py).
> Many standard statistical references give this equation -- two are
> "Non-Uniform Random Variate Generation," by Luc Devroye, Springer-Verlag,
> 1986, p. 432, and "Monte Carlo Concepts, Algorithms and Applications," by
> George S. Fishman, Springer, 1996, p. 200.
>
> --Ivan Frohne
>

From jeremy at alum.mit.edu Thu Jan 25 18:13:03 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 12:13:03 -0500 (EST)
Subject: [Python-Dev] Makefile changes
In-Reply-To: <20010124073155.B32266@glacier.fnational.com>
References: <20010124073155.B32266@glacier.fnational.com>
Message-ID: <14960.24223.599357.388059@localhost.localdomain>

Neil,

What would it take to add useful dependency information to the
Makefile? Or does it already exist?

When I was working the nested scopes, building was tedious at times
because a change to funcobject.h meant that, e.g., newmodule.c needed
to be recompiled.
The Makefiles didn't capture that information, so I
had been adding it to the individual Makefiles, e.g.

newmodule.o: newmodule.c ../Include/funcobject.h

(I think this worked.)

It would be great if the Makefile captured all the dependencies.
Could we just use makedepend?

Jeremy

From MarkH at ActiveState.com Thu Jan 25 20:43:35 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Thu, 25 Jan 2001 11:43:35 -0800
Subject: [Python-Dev] Waiting method for file objects
In-Reply-To: <20010125111936.A23512@thyrsus.com>
Message-ID:

> I would appreciate it if those of you on Windows and Macintosh
> machines would test the waiting method. The test program will take
> some porting, because it needs to write to a FIFO in background.

This didn't compile under Windows. I have a patch (against CVS) that
compiles, but doesn't appear to work (and will be forwarded to Eric under
separate cover)

[news flash :-) Changing the open call to add "rb" as the mode makes it
work - text v binary bites again]

I didn't try any sort of fifo test. The sockets test failed with a socket
error, but would certainly have failed had the socket connected, as my
patch includes:

#ifndef S_ISSOCK
# define S_ISSOCK(mode) (0)
#endif

I have no idea if it managed to mail the results, but I guess not, so the
output is below. The test file (after some small mods, including the "rb"
param) is indeed 4252 bytes long.

Hope this is useful!

Mark.

This program tests the `waiting' method of file objects.
Good, you're running a patched Python with `waiting' available.

First, plain files:
There are 4252 bytes waiting to be read in this file.
Please check this with your OS's directory tools.
I'll now read a random number (3091) of bytes.
The waiting method sees 1161 bytes left.
3091 + 1161 = 4252. That's consistent. Test passed.
Now let's see if we can detect EOF reliably.
I'll do a read()...the waiting method now returns 0
That looks like EOF.

Now sockets:
Connecting to imap.netaxs.com's IMAP server now...
Traceback (most recent call last):
  File "c:\temp\waiting_test.py", line 57, in ?
    sock.connect(("imap.netaxs.com", 143))
  File "", line 1, in connect
socket.error: (10060, 'Operation timed out')

From nas at arctrix.com Thu Jan 25 14:07:53 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 05:07:53 -0800
Subject: [Python-Dev] Makefile changes
In-Reply-To: <14960.24223.599357.388059@localhost.localdomain>; from jeremy@alum.mit.edu on Thu, Jan 25, 2001 at 12:13:03PM -0500
References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain>
Message-ID: <20010125050753.A1573@glacier.fnational.com>

On Thu, Jan 25, 2001 at 12:13:03PM -0500, Jeremy Hylton wrote:
> What would it take to add useful dependency information to the
> Makefile? Or does it already exist?

Some of it exists but I don't think it's complete.

> When I was working the nested scopes, building was tedious at times
> because a change to funcobject.h meant that, e.g., newmodule.c needed
> to be recompiled. The Makefiles didn't capture that information, so I
> had been adding it to the individual Makefiles, e.g.
>
> newmodule.o: newmodule.c ../Include/funcobject.h
>
> (I think this worked.)

Hmm, I don't think so. Which makefile did you add this to? Are you
using the new makefile? The Makefile.pre.in file contains a line like:

    $(LIBRARY_OBJS) $(MAINOBJ): $(PYTHON_HEADERS)

but newmodule.o is not in LIBRARY_OBJS. By default it's not compiled by
make but with distutils. If you add newmodule to Setup then a line like:

    Modules/newmodule.o: $(PYTHON_HEADERS)

would do the trick. I think I will add a line like:

    $(MODOBJS): $(PYTHON_HEADERS)

to fix the problem. I could easily restore the mkdep target but my
feeling right now is that explicitly including the header dependencies is
better. What do you think?
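For comparison with the explicit rules above, here is a rough sketch of what a makedepend-style script would compute for a single source file. It is purely illustrative: `dependency_rule` and its `include_dir` default are hypothetical names, it looks only at direct `#include "..."` lines (no nested headers, no preprocessor conditionals), and it is not how the real makedepend or the Python build works.

```python
import re

# Matches lines such as:  #include "funcobject.h"
# System headers written as <stdio.h> are deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def dependency_rule(obj, source_name, source_text, include_dir="../Include"):
    """Hypothetical helper: build one make rule from the direct includes."""
    headers = INCLUDE_RE.findall(source_text)
    deps = [source_name] + ["%s/%s" % (include_dir, h) for h in headers]
    return "%s: %s" % (obj, " ".join(deps))
```

Fed a newmodule.c that includes only "funcobject.h", this yields the same rule Jeremy wrote by hand; a blanket $(PYTHON_HEADERS) dependency avoids having to keep such per-file lists current.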
Neil

From jeremy at alum.mit.edu Thu Jan 25 21:02:46 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 15:02:46 -0500 (EST)
Subject: [Python-Dev] PEP 227 checkins to follow
Message-ID: <14960.34406.342961.834827@localhost.localdomain>

I am about to check in the changes that implement PEP 227. There are
many changes, which I will make via separate commits. You might want
to wait until the checkins are done to do an update. I'll send a note
when I'm done.

I also wanted to mention that the PEP has fallen a little out of date.
There are a few wrinkles that it doesn't deal with, e.g.

    def f(x):
        def g(y):
            return x + y
        del x
        return g

For now, this raises a SyntaxError. I'll flesh out the PEP to reflect
the current implementation and spec out some of the less obvious cases.

I'd welcome any comments on the code itself. I know there are a
number of rough edges and also, most likely, a bunch of memory leaks.
I'll be working to clean things up before 2.1a2, but wanted to get the
code into CVS ASAP.

Jeremy

From jeremy at alum.mit.edu Thu Jan 25 21:15:01 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 15:15:01 -0500 (EST)
Subject: [Python-Dev] checkins done for PEP 227
Message-ID: <14960.35141.237252.468467@localhost.localdomain>

It looks like python-dev is very slow, so you'll see my original
warning well after the checkins occurred. Oh, well. They're done.

Jeremy

From tim.one at home.com Thu Jan 25 21:58:03 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 15:58:03 -0500
Subject: [Python-Dev] Fw: random.py gives wrong results (+ a solution)
Message-ID:

[/F, fwds a c.l.py claim that random.betavariate is dead wrong]

Not to worry; I had already entered that into the SF bug database and
assigned it to me (hmm: why would you send it to Python-Dev instead of
putting it in the database?).

I suspect he's correct, and, more importantly, so does Ivan Frohne. We'll
settle it before 2.1a2, but perhaps not today.
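The gamma-ratio construction from the forwarded message is easy to sanity-check. The sketch below restates Janne Sinkkonen's suggested fix for modern Python 3, assuming that random.gammavariate(alpha, beta) treats beta as a scale parameter (the "more common" definition in Ivan Frohne's analysis); the optional rng argument is an addition for reproducible checks and is not part of the proposed fix.

```python
import random


def betavariate(alpha, beta, rng=random):
    """Beta(alpha, beta) sample as a ratio of two unit-scale gamma samples."""
    y = rng.gammavariate(alpha, 1.0)
    if y == 0:
        return 0.0
    return y / (y + rng.gammavariate(beta, 1.0))
```

A Beta(alpha, beta) variable has mean alpha / (alpha + beta), so averaging many samples gives a quick consistency check on the construction, though it is no substitute for a proper goodness-of-fit test.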
Alas, I have no idea where the original code came from ("Guido" isn't a
useful answer -- he was just converting somebody else's C++ code to
Python).

From fredrik at effbot.org Thu Jan 25 21:42:05 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 25 Jan 2001 21:42:05 +0100
Subject: [Python-Dev] Waiting method for file objects
References: <20010125111936.A23512@thyrsus.com>
Message-ID: <01fb01c0870f$48517110$e46940d5@hagrid>

eric wrote:

> Fortunately, this is less an issue than it appears.

only if you ignore Windows...

-1 on making this a file method
+0 on adding it as an optional support function to the os module.

From martin at mira.cs.tu-berlin.de Thu Jan 25 21:42:39 2001
From: martin at mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 25 Jan 2001 21:42:39 +0100
Subject: [Python-Dev] jeremy@alum.mit.edu
Message-ID: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de>

> It would be great if the Makefile captured all the dependencies.

That would be great, yes. However, setup.py should probably also
consider dependencies.

> Could we just use makedepend?

Not sure. Certainly not in the build process. I dislike distributions
which, as the first thing, perform dependency generation. Dependencies
change less often than the actual source, so it should be
sufficient to update them manually. Furthermore, generated files as
part of the CVS repository fail to work properly unless everybody uses
the exact same generator. For autoconf alone, that's a problem because
of multiple autoconf versions. I don't know how many different
makedepend versions are in use.

Regards,
Martin

From tim.one at home.com Thu Jan 25 22:02:11 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 16:02:11 -0500
Subject: [Python-Dev] Windows compile broken
Message-ID:

Linking...
Creating library ./python21.lib and object ./python21.exp ceval.obj : error LNK2001: unresolved external symbol _PyCell_Set ceval.obj : error LNK2001: unresolved external symbol _PyCell_Get frameobject.obj : error LNK2001: unresolved external symbol _PyCell_New ./python21.dll : fatal error LNK1120: 3 unresolved externals Error executing link.exe. Sorry if this has already been discussed. I don't see mention of it in the Python-Dev archive, and my email is almost worse than useless (random delays of minutes to days, due to what appears to be the simultaneous worldwide wedging of every email server servicing every email account I have). From esr at thyrsus.com Thu Jan 25 22:12:25 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 25 Jan 2001 16:12:25 -0500 Subject: [Python-Dev] Waiting method for file objects In-Reply-To: <01fb01c0870f$48517110$e46940d5@hagrid>; from fredrik@effbot.org on Thu, Jan 25, 2001 at 09:42:05PM +0100 References: <20010125111936.A23512@thyrsus.com> <01fb01c0870f$48517110$e46940d5@hagrid> Message-ID: <20010125161225.A24305@thyrsus.com> Fredrik Lundh : > > Fortunately, this is less an issue than it appears. > > only if you ignore Windows... I don't understand this. Explain? -- Eric S. Raymond Sometimes the law defends plunder and participates in it. Sometimes the law places the whole apparatus of judges, police, prisons and gendarmes at the service of the plunderers, and treats the victim -- when he defends himself -- as a criminal. -- Frederic Bastiat, "The Law" From esr at thyrsus.com Thu Jan 25 22:13:31 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 25 Jan 2001 16:13:31 -0500 Subject: [Python-Dev] jeremy@alum.mit.edu In-Reply-To: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 25, 2001 at 09:42:39PM +0100 References: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de> Message-ID: <20010125161331.B24305@thyrsus.com> Martin v. Loewis : > Not sure. 
Certainly not in the build process. I dislike distributions
> which, as the first thing, perform dependency generation. Dependencies
> change less often than the actual source, so it should be
> sufficient to update them manually. Furthermore, generated files as
> part of the CVS repository fail to work properly unless everybody uses
> the exact same generator. For autoconf alone, that's a problem because
> of multiple autoconf versions. I don't know how many different
> makedepend versions are in use.

Easily solved -- there are script versions of makedepend we can just ship
with the distribution.
--
Eric S. Raymond

Morality is always the product of terror; its chains and
strait-waistcoats are fashioned by those who dare not trust others,
because they dare not trust themselves, to walk in liberty.
        -- Aldous Huxley

From mal at lemburg.com Thu Jan 25 22:26:04 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 25 Jan 2001 22:26:04 +0100
Subject: [Python-Dev] Windows compile broken
References:
Message-ID: <3A7099EC.81689EA5@lemburg.com>

Tim Peters wrote:
>
> Linking...
> Creating library ./python21.lib and object ./python21.exp
> ceval.obj : error LNK2001: unresolved external symbol _PyCell_Set
> ceval.obj : error LNK2001: unresolved external symbol _PyCell_Get
> frameobject.obj : error LNK2001: unresolved external symbol _PyCell_New
> ./python21.dll : fatal error LNK1120: 3 unresolved externals
> Error executing link.exe.
>
> Sorry if this has already been discussed. I don't see mention of it in the
> Python-Dev archive, and my email is almost worse than useless (random delays
> of minutes to days, due to what appears to be the simultaneous worldwide
> wedging of every email server servicing every email account I have).

These must be related to checkins by Jeremy and his nested scopes...
(I knew these would get us into trouble ;-) I think Jeremy forgot to check in the needed change for Objects/Makefile.in and probably the Windows project file is missing the new object type too (Objects/cellobject.c). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at alum.mit.edu Thu Jan 25 22:14:52 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 16:14:52 -0500 (EST) Subject: [Python-Dev] Windows compile broken In-Reply-To: <3A7099EC.81689EA5@lemburg.com> References: <3A7099EC.81689EA5@lemburg.com> Message-ID: <14960.38732.773129.793360@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: MAL> Tim Peters wrote: >> >> Linking... Creating library ./python21.lib and object >> ./python21.exp ceval.obj : error LNK2001: unresolved external >> symbol _PyCell_Set ceval.obj : error LNK2001: unresolved external >> symbol _PyCell_Get frameobject.obj : error LNK2001: unresolved >> external symbol _PyCell_New ./python21.dll : fatal error LNK1120: >> 3 unresolved externals Error executing link.exe. >> >> Sorry if this has already been discussed. I don't see mention of >> it in the Python-Dev archive, and my email is almost worse than >> useless (random delays of minutes to days, due to what appears to >> be the simultaneous worldwide wedging of every email server >> servicing every email account I have). MAL> These must be related to checkins by Jeremy and his nested MAL> scopes... (I knew these would get us into trouble ;-) Just you wait and see! MAL> I think Jeremy forgot to check in the needed change for MAL> Objects/Makefile.in and probably the Windows project file is MAL> missing the new object type too (Objects/cellobject.c). That's right. I didn't change the Makefile in Objects or do anything with Windows. 
Don't know how to do the latter, but perhaps Tim will stop by my desk next week and show me. As for the Makefile, I thought I saw a message from Neil saying not to update those anymore. Jeremy From nas at arctrix.com Thu Jan 25 16:10:56 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 25 Jan 2001 07:10:56 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include cellobject.h,NONE,2.1 Python.h,2.30,2.31 In-Reply-To: ; from jhylton@users.sourceforge.net on Thu, Jan 25, 2001 at 12:04:16PM -0800 References: Message-ID: <20010125071056.A2390@glacier.fnational.com> On Thu, Jan 25, 2001 at 12:04:16PM -0800, Jeremy Hylton wrote: > A cell contains a reference to a single PyObject. It could be > implemented as a mutable, one-element sequence, but the separate type > has less overhead. Can this object be involved in reference cycles? If so, it should probably have the GC methods added to it. Neil From jeremy at alum.mit.edu Thu Jan 25 22:42:04 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 16:42:04 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include cellobject.h,NONE,2.1 Python.h,2.30,2.31 In-Reply-To: <20010125071056.A2390@glacier.fnational.com> References: <20010125071056.A2390@glacier.fnational.com> Message-ID: <14960.40364.594582.353511@localhost.localdomain> >>>>> "NS" == Neil Schemenauer writes: NS> On Thu, Jan 25, 2001 at 12:04:16PM -0800, Jeremy Hylton wrote: >> A cell contains a reference to a single PyObject. It could be >> implemented as a mutable, one-element sequence, but the separate >> type has less overhead. NS> Can this object be involved in reference cycles? If so, it NS> should probably have the GC methods added to it. It's already there. (Last five lines of cellobject.c quoted as proof.) 
> Py_TPFLAGS_DEFAULT | Py_TPFLAGS_GC, /* tp_flags */ > 0, /* tp_doc */ > (traverseproc)cell_traverse, /* tp_traverse */ > (inquiry)cell_clear, /* tp_clear */ >}; From nas at arctrix.com Thu Jan 25 16:19:22 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 25 Jan 2001 07:19:22 -0800 Subject: [Python-Dev] Windows compile broken In-Reply-To: <3A7099EC.81689EA5@lemburg.com>; from mal@lemburg.com on Thu, Jan 25, 2001 at 10:26:04PM +0100 References: <3A7099EC.81689EA5@lemburg.com> Message-ID: <20010125071922.B2390@glacier.fnational.com> On Thu, Jan 25, 2001 at 10:26:04PM +0100, M.-A. Lemburg wrote: > I think Jeremy forgot to check in the needed change for > Objects/Makefile.in That file is dead. Should I remove it now? I haven't heard any major complaints about Makefile.pre.in yet. Maybe the messages are all sitting in the python.org mail spool. Barry, what the hell is going on? You need to drop that Postfix crap and get qmail. :-) Neil From thomas at xs4all.net Thu Jan 25 23:19:37 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 25 Jan 2001 23:19:37 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include modsupport.h,2.35,2.36 In-Reply-To: ; from fdrake@users.sourceforge.net on Thu, Jan 25, 2001 at 02:13:36PM -0800 References: Message-ID: <20010125231937.I962@xs4all.nl> On Thu, Jan 25, 2001 at 02:13:36PM -0800, Fred L. Drake wrote: > The addition of new parameters to functions in the Python/C API requires > that PYTHON_API_VERSION be incremented. When we update the API version, isn't it time to clean up the TP_HASFEATURE stuff ? Since we updated the API, all the current slots should be there, right ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From guido at digicool.com Thu Jan 25 23:32:32 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 25 Jan 2001 17:32:32 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include modsupport.h,2.35,2.36
In-Reply-To: Your message of "Thu, 25 Jan 2001 23:19:37 +0100." <20010125231937.I962@xs4all.nl>
References: <20010125231937.I962@xs4all.nl>
Message-ID: <200101252232.RAA20013@cj20424-a.reston1.va.home.com>

> > The addition of new parameters to functions in the Python/C API requires
> > that PYTHON_API_VERSION be incremented.
>
> When we update the API version, isn't it time to clean up the TP_HASFEATURE
> stuff ? Since we updated the API, all the current slots should be there,
> right ?

No, we're issuing a warning about old API versions but still try to
work with them. After all, most extensions don't create frame or code
objects.

I added the flags for the tp_richcompare field when I tried 2.1a1 with
Zope's ExtensionClasses and Acquisition modules. Turns out I got a
core dump, while 2.1 ran flawlessly. The reason: they have their own
type struct which has the same lay-out as the Python 1.5.2 (or even
older) type struct, followed by fields of their own. They have the
tp_flags field set to 0, so up to 2.0, it was compatible. I expect
that 2.1a2 will work with the unchanged Zope code because of the flag
I added.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at lemburg.com Fri Jan 26 00:04:54 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 00:04:54 +0100
Subject: [Python-Dev] Windows compile broken
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com>
Message-ID: <3A70B116.12BF756B@lemburg.com>

Neil Schemenauer wrote:
>
> On Thu, Jan 25, 2001 at 10:26:04PM +0100, M.-A. Lemburg wrote:
> > I think Jeremy forgot to check in the needed change for
> > Objects/Makefile.in
>
> That file is dead. Should I remove it now?
I haven't heard any > major complaints about Makefile.pre.in yet. What about that file ? Are you saying that Makefile.pre.in will no longer work in 2.1 ??? Please don't remove that mechanism -- it has been in use for quite a while and is much more stable than distutils. We should at least wait a few more distutils releases for the dust to settle before removing the old fallback solution. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Fri Jan 26 00:06:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 25 Jan 2001 18:06:40 -0500 Subject: [Python-Dev] Windows compile broken In-Reply-To: Your message of "Fri, 26 Jan 2001 00:04:54 +0100." <3A70B116.12BF756B@lemburg.com> References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> Message-ID: <200101252306.SAA20173@cj20424-a.reston1.va.home.com> > > That file is dead. Should I remove it now? I haven't heard any > > major complaints about Makefile.pre.in yet. > > What about that file ? Are you saying that Makefile.pre.in > will no longer work in 2.1 ??? > > Please don't remove that mechanism -- it has been in use for > quite a while and is much more stable than distutils. We should > at least wait a few more distutils releases for the dust to > settle before removing the old fallback solution. Let's at least mark it clearly as obsolete though -- it's a pain to maintain. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Thu Jan 25 17:31:28 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 25 Jan 2001 08:31:28 -0800 Subject: [Python-Dev] Windows compile broken In-Reply-To: <3A70B116.12BF756B@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 12:04:54AM +0100 References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> Message-ID: <20010125083128.A2699@glacier.fnational.com> On Fri, Jan 26, 2001 at 12:04:54AM +0100, M.-A. Lemburg wrote: > What about that file ? Are you saying that Makefile.pre.in > will no longer work in 2.1 ??? I'm talking about Objects/Makefile.in. Which Makefile.pre.in are you talking about? Modules/Makefile.pre.in is dead too. There is a Makefile.pre.in in the toplevel directory which does the same thing. There is also Misc/Makefile.pre.in. That file gets installed into lib and still works as it always did. The toplevel Makefile.pre.in can use Modules/Setup* just like the old Modules/Makefile.pre.in could. Does this address your concerns? > Please don't remove that mechanism -- it has been in use for > quite a while and is much more stable than distutils. We should > at least wait a few more distutils releases for the dust to > settle before removing the old fallback solution. No doubt. 
Neil From nas at arctrix.com Thu Jan 25 17:33:48 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 25 Jan 2001 08:33:48 -0800 Subject: [Python-Dev] Windows compile broken In-Reply-To: <200101252306.SAA20173@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jan 25, 2001 at 06:06:40PM -0500 References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> <200101252306.SAA20173@cj20424-a.reston1.va.home.com> Message-ID: <20010125083348.B2699@glacier.fnational.com> On Thu, Jan 25, 2001 at 06:06:40PM -0500, Guido van Rossum wrote: > Let's at least mark it clearly as obsolete though -- it's a pain to > maintain. Are you talking about Misc/Makefile.pre.in? If so, how do you suggest we mark it? I don't think Modules/Setup should go away any time soon. I often like to build lots of modules statically into the interpreter. setup.py has no support for building static modules. Neil From tim.one at home.com Fri Jan 26 00:27:52 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 25 Jan 2001 18:27:52 -0500 Subject: [Python-Dev] Windows compile broken In-Reply-To: <14960.38732.773129.793360@localhost.localdomain> Message-ID: Thanks for the clues, everyone! I'll fix it for Windows. Note that I'm getting email in wild bursts, and most often delayed. So I'm generally not seeing any checkin msgs, or SF bug email, or Python-Dev email, ..., anywhere near the time (or, alas, sometimes even day) they're generated. So I simply didn't see the checkin msg introducing cellobject.c. all's-well-that-looks-like-it-may-end-ly y'rs - tim From mal at lemburg.com Fri Jan 26 10:32:14 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Fri, 26 Jan 2001 10:32:14 +0100 Subject: [Python-Dev] Makefile.pre.in (Windows compile broken) References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> <20010125083128.A2699@glacier.fnational.com> Message-ID: <3A71441E.4584A5C8@lemburg.com> Neil Schemenauer wrote: > > On Fri, Jan 26, 2001 at 12:04:54AM +0100, M.-A. Lemburg wrote: > > What about that file ? Are you saying that Makefile.pre.in > > will no longer work in 2.1 ??? > > I'm talking about Objects/Makefile.in. Which Makefile.pre.in are > you talking about? Modules/Makefile.pre.in is dead too. There > is a Makefile.pre.in in the toplevel directory which does the > same thing. There is also Misc/Makefile.pre.in. That file gets > installed into lib and still works as it always did. The toplevel > Makefile.pre.in can use Modules/Setup* just like the old > Modules/Makefile.pre.in could. Does this address your concerns? Yes. Thanks. I was talking about the Misc/Makefile.pre.in mechanism which was used in the past by many Python C extensions to provide a portable way of compiling the extension into a shared module or statically into the Python interpreter. I have been using that mechanism for years now and with much success. Even though I am currently moving to distutils I have no idea how stable distutils is on exotic platforms or ones which have special needs (like e.g. AIX). > > Please don't remove that mechanism -- it has been in use for > > quite a while and is much more stable than distutils. We should > > at least wait a few more distutils releases for the dust to > > settle before removing the old fallback solution. > > No doubt. Ok. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Jan 26 10:37:12 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Fri, 26 Jan 2001 10:37:12 +0100 Subject: [Python-Dev] setup.py Message-ID: <3A714548.C487DCC9@lemburg.com> I have posted two messages here regarding the new setup.py mechanism for building Modules/ but have received no comments on them so far. Here's another go: 1. I think that setup.py should output warnings about modules which cannot be built for some reason rather than having ot the build process completely. 2. I suggest adding -L/usr/lib/termcap to the readline extension. This doesn't hurt anywhere and will get this extension to compile on SuSE Linux too. Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Fri Jan 26 13:27:56 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 26 Jan 2001 07:27:56 -0500 Subject: [Python-Dev] setup.py In-Reply-To: <3A714548.C487DCC9@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 10:37:12AM +0100 References: <3A714548.C487DCC9@lemburg.com> Message-ID: <20010126072756.A5013@thyrsus.com> M.-A. Lemburg : > 1. I think that setup.py should output warnings about modules > which cannot be built for some reason rather than having > ot the build process completely. > > 2. I suggest adding -L/usr/lib/termcap to the readline extension. > This doesn't hurt anywhere and will get this extension to compile > on SuSE Linux too. Both good ideas. -- Eric S. Raymond Such are a well regulated militia, composed of the freeholders, citizen and husbandman, who take up arms to preserve their property, as individuals, and their rights as freemen. -- "M.T. Cicero", in a newspaper letter of 1788 touching the "militia" referred to in the Second Amendment to the Constitution. From mal at lemburg.com Fri Jan 26 15:13:45 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Fri, 26 Jan 2001 15:13:45 +0100 Subject: [Python-Dev] setup.py References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com> Message-ID: <3A718619.6278AF41@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > 1. I think that setup.py should output warnings about modules > > which cannot be built for some reason rather than having > > ot the build process completely. > > > > 2. I suggest adding -L/usr/lib/termcap to the readline extension. > > This doesn't hurt anywhere and will get this extension to compile > > on SuSE Linux too. > > Both good ideas. Should I implement the two and check these in ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Fri Jan 26 15:25:59 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 26 Jan 2001 09:25:59 -0500 Subject: [Python-Dev] setup.py In-Reply-To: <3A718619.6278AF41@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 03:13:45PM +0100 References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com> <3A718619.6278AF41@lemburg.com> Message-ID: <20010126092559.A5623@thyrsus.com> M.-A. Lemburg : > "Eric S. Raymond" wrote: > > > > M.-A. Lemburg : > > > 1. I think that setup.py should output warnings about modules > > > which cannot be built for some reason rather than having > > > ot the build process completely. > > > > > > 2. I suggest adding -L/usr/lib/termcap to the readline extension. > > > This doesn't hurt anywhere and will get this extension to compile > > > on SuSE Linux too. > > > > Both good ideas. > > Should I implement the two and check these in ? I may not channel Guido the way Tim does, but I suspect he gave you developer privileges because he trusts you to do routine stuff like this. -- Eric S. Raymond The saddest life is that of a political aspirant under democracy. 
His failure is ignominious and his success is disgraceful. -- H.L. Mencken From mal at lemburg.com Fri Jan 26 15:29:18 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jan 2001 15:29:18 +0100 Subject: [Python-Dev] setup.py References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com> <3A718619.6278AF41@lemburg.com> <20010126092559.A5623@thyrsus.com> Message-ID: <3A7189BE.C6C2806E@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > "Eric S. Raymond" wrote: > > > > > > M.-A. Lemburg : > > > > 1. I think that setup.py should output warnings about modules > > > > which cannot be built for some reason rather than having > > > > ot the build process completely. > > > > > > > > 2. I suggest adding -L/usr/lib/termcap to the readline extension. > > > > This doesn't hurt anywhere and will get this extension to compile > > > > on SuSE Linux too. > > > > > > Both good ideas. > > > > Should I implement the two and check these in ? > > I may not channel Guido the way Tim does, but I suspect he gave you > developer privileges because he trusts you to do routine stuff like this. Just asking because setup.py is Andrew's baby. I'll add the above two later today. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mwh21 at cam.ac.uk Fri Jan 26 17:40:47 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 26 Jan 2001 16:40:47 +0000 Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman. Message-ID: Following discussion on c.l.py I've just submitted: http://sourceforge.net/patch/?func=detailpatch&patch_id=103441&group_id=5470 which implements a syntax for adding function attributes inline: >>> def f(a) having (publish=1): ... print 1 ... 
>>> f.publish 1 It uses an "import-as" like strategy to avoid making "having" a keyword (which interacts a bit badly with error reporting, as it happens). Obviously, it would be easy to change "having" to a different word. Another idea I had was: >>> def f(a) having (.publish=1): ... print 1 ... >>> f.publish 1 to emphasize the attributeness of what's going on, but I didn't like this as much in practice (I always forgot the period!). Emile van Sebille also suggested >>> d = {'a':1} >>> def f(a) having (**d): ... print 1 ... >>> f.a 1 which I haven't implemented, because I didn't really like it, but I thought I'd mention. I'll do test suites and documentation in time, but I thought I'd call in here to check the idea wasn't DOA. What do you all think? Cheers, M. -- surely, somewhere, somehow, in the history of computing, at least one manual has been written that you could at least remotely attempt to consider possibly glancing at. -- Adam Rixey From nas at arctrix.com Fri Jan 26 10:55:57 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 01:55:57 -0800 Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman. In-Reply-To: ; from mwh21@cam.ac.uk on Fri, Jan 26, 2001 at 04:40:47PM +0000 References: Message-ID: <20010126015556.A4215@glacier.fnational.com> I don't see what's wrong with: def f(a): print 1 f.publish = 1 It's perfectly clear to me. As a bonus it works already. I'm -1 on inventing more syntax. Neil From evan at digicool.com Fri Jan 26 18:12:43 2001 From: evan at digicool.com (Evan Simpson) Date: Fri, 26 Jan 2001 12:12:43 -0500 Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman. References: Message-ID: <00c001c087bb$322a9720$3e48a4d8@digicool.com> From: Michael Hudson > >>> def f(a) having (publish=1): > ... print 1 This doesn't really need special syntax. I would much rather have this (or something like it) as a way of spelling initialized local variables. 
That is, when I want static local variables, instead of corrupting the function signature by writing: def f(x, marker=[], foo=foo) ...I could write: def f(x) having (marker=[], foo) Cheers, Evan @ digicool From jeremy at alum.mit.edu Fri Jan 26 18:58:24 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 26 Jan 2001 12:58:24 -0500 (EST) Subject: [Python-Dev] Makefile changes In-Reply-To: <20010125050753.A1573@glacier.fnational.com> References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com> Message-ID: <14961.47808.315324.734238@localhost.localdomain> >>>>> "NS" == Neil Schemenauer writes: >> When I was working the nested scopes, building was tedious at >> times because a change to funcobject.h meant that, e.g., >> newmodule.c needed to be recompiled. The Makefiles didn't >> capture that information, so I had been adding it to the >> individual Makefiles, e.g. >> >> newmodule.o: newmodule.c ../Include/funcobject.h >> >> (I think this worked.) NS> Hmm, I don't think so. Which makefile did you add this to? Just to clarify: I added this line to the old Makefile before you checked the new one in. NS> Hmm, I don't think so. Which makefile did you add this to? Are NS> you using the new makefile? The Makefile.pre.in file contains a NS> line like: NS> $(LIBRARY_OBJS) $(MAINOBJ): $(PYTHON_HEADERS) NS> but newmodule.o is not in LIBRARY_OBJS. By default it's not NS> compiled by make but with distutils. If you add newmodule to NS> Setup then a line like: NS> Modules/newmodule.o: $(PYTHON_HEADERS) NS> would do the trick. I think I will add a line like: NS> $(MODOBJS): $(PYTHON_HEADERS) NS> to fix the problem. NS> I could easily restore the mkdep target but my feeling right now NS> that explicitly including the header dependencies is better. NS> What do you think? Isn't it overkill to have every .o file depend on all the .h files? 
If I change cobject.h, there are very few .o files that depend on this change. I suppose, however, it's not worth the effort to get it right at a finer granularity, e.g. that the only files that depend on cobject.h are cobject, cStringIO, unicodedata, _cursesmodule, object, and unicodeobject. Jeremy From fdrake at acm.org Fri Jan 26 21:36:18 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 26 Jan 2001 15:36:18 -0500 (EST) Subject: [Python-Dev] Makefile changes In-Reply-To: <14961.47808.315324.734238@localhost.localdomain> References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com> <14961.47808.315324.734238@localhost.localdomain> Message-ID: <14961.57282.880552.358709@cj42289-a.reston1.va.home.com> Jeremy Hylton writes: > Isn't it overkill to have every .o file depend on all the .h files? > If I change cobject.h, there are very few .o files that depend on this > change. I suppose, however, it's not worth the effort to get it right Perhaps. It's definitely easier to maintain than tracking it more specifically and better than what we had, so I'll live with it. ;) > at a finer granularity, e.g. that the only files that depend on > cobject.h are cobject, cStringIO, unicodedata, _cursesmodule, object, > and unicodeobject. And py_curses.h, which is also used in _curses_panel.c. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From nas at arctrix.com Fri Jan 26 14:58:50 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 05:58:50 -0800 Subject: [Python-Dev] Makefile changes In-Reply-To: <14961.47808.315324.734238@localhost.localdomain>; from jeremy@alum.mit.edu on Fri, Jan 26, 2001 at 12:58:24PM -0500 References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com> <14961.47808.315324.734238@localhost.localdomain> Message-ID: <20010126055850.C4918@glacier.fnational.com> On Fri, Jan 26, 2001 at 12:58:24PM -0500, Jeremy Hylton wrote: > Isn't it overkill to have every .o file depend on all the .h files? Maybe, but Python compiles pretty fast anyhow. I'd rather err on the safe side (i.e. compiling too much). Trying to figure out which of the subheaders a .c file uses when it imports Python.h would be a lot of work and error prone. More power to you if you want to do it. ;-) Neil From dgoodger at atsautomation.com Fri Jan 26 22:46:13 2001 From: dgoodger at atsautomation.com (Goodger, David) Date: Fri, 26 Jan 2001 16:46:13 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very rusty (long live Python!), I don't know my way around configure, and am not familiar with Python's Makefile. Python 2.0 compiled fine (with a couple of tweaks), but I'm getting caught by the new way of building things. Please help if you can! Many thanks in advance. Here's an excerpt of my efforts: # cd /tmp/py # gunzip -c < python-2.1a1.tgz | tar -rf - # cd Python-2.1a1 # ./configure 2>&1 | tee ../configure.1 # make 2>&1 | tee ../make.1 ... ./python //5/tmp/py/Python-2.1a1/setup.py build 'import site' failed; use -v for traceback Traceback (most recent call last): File "//5/tmp/py/Python-2.1a1/setup.py", line 4, in ? 
import sys, os, string, getopt ImportError: No module named string Running ./python results in stack overflow. The old QNX instructions in README recommend editing Modules/Makefile: LDFLAGS= -N 64k # make 2>&1 | tee ../make.2 Same error as first make. But now the stack doesn't overflow. # python 'import site' failed; use -v for traceback Python 2.1a1 (#2, Jan 26 2001, 11:38:55) [C] on qnxJ Type "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path ['', '/usr/local/lib/python', '/home/dgoodger/lib/python', '/5/tmp/py/Python-2.1a1/Lib', '/5/tmp/py/Python-2.1a1/Lib/plat-qnxJ', '/tmp/py/Python-2.1a1/Modules'] >>> ^D # fullpath . . is //5/tmp/py/Python-2.1a1 The QNX node number prefix '//5' (machine or host number, equivalent to a 'hostname:' prefix for network paths) is being reduced somehow (path normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are required at the head of the path. Is this something that can be fixed? I added a prefix (QNX virtual-to-real path mapping on the filesystem tree) to correct this: # prefix -A /5=//5 Now /5 points to //5, similar to a link. # make 2>&1 | tee ../make.3 ... ./python //5/tmp/py/Python-2.1a1/setup.py build unable to execute ld: No such file or directory running build running build_ext building 'struct' extension creating build creating build/temp.qnx-J-PCI-2.1 cc -O -I. -I/5/tmp/py/Python-2.1a1/./Include -IInclude/ -I/usr/local/include -c /5/tmp/py/Python-2.1a1/Modules/structmodule.c -o build/temp.qnx-J-PCI-2.1/structmodule.o creating build/lib.qnx-J-PCI-2.1 ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o build/lib.qnx-J-PCI-2.1/struct.so error: command 'ld' failed with exit status 1 make: *** [sharedmods] Error 1 QNX doesn't have an 'ld' command. Is configure not getting its info to setup.py? (Is it supposed to?) What should I check? I have logs of each of the configure & make runs. Should I submit this as a bug on SourceForge? Hope to hear from somebody soon. 
David Goodger Systems Administrator & Programmer, Advanced Systems Automation Tooling Systems Inc., Automation Systems Division direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695 e-mail: dgoodger at atsautomation.com From guido at digicool.com Fri Jan 26 22:52:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 26 Jan 2001 16:52:47 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: Your message of "Fri, 26 Jan 2001 16:46:13 EST." References: Message-ID: <200101262152.QAA26624@cj20424-a.reston1.va.home.com> > [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] > > I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very > rusty (long live Python!), I don't know my way around configure, and am not > familiar with Python's Makefile. Python 2.0 compiled fine (with a couple of > tweaks), but I'm getting caught by the new way of building things. Please > help if you can! Many thanks in advance. > > Here's an excerpt of my efforts: > > # cd /tmp/py > # gunzip -c < python-2.1a1.tgz | tar -rf - > # cd Python-2.1a1 > # ./configure 2>&1 | tee ../configure.1 > # make 2>&1 | tee ../make.1 > ... > ./python //5/tmp/py/Python-2.1a1/setup.py build > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "//5/tmp/py/Python-2.1a1/setup.py", line 4, in ? > import sys, os, string, getopt > ImportError: No module named string > > Running ./python results in stack overflow. The old QNX instructions in > README recommend editing Modules/Makefile: > LDFLAGS= -N 64k > > # make 2>&1 | tee ../make.2 > > Same error as first make. But now the stack doesn't overflow. > > # python > 'import site' failed; use -v for traceback > Python 2.1a1 (#2, Jan 26 2001, 11:38:55) [C] on qnxJ > Type "copyright", "credits" or "license" for more information. 
> >>> import sys > >>> sys.path > ['', '/usr/local/lib/python', '/home/dgoodger/lib/python', > '/5/tmp/py/Python-2.1a1/Lib', '/5/tmp/py/Python-2.1a1/Lib/plat-qnxJ', > '/tmp/py/Python-2.1a1/Modules'] > >>> ^D > > # fullpath . > . is //5/tmp/py/Python-2.1a1 > > The QNX node number prefix '//5' (machine or host number, equivalent to a > 'hostname:' prefix for network paths) is being reduced somehow (path > normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are > required at the head of the path. Is this something that can be fixed? Aha -- you may need QNX-specific path manipulation functions. What's going on is that site.py normalizes the entries in sys.path, using this function: def makepath(*paths): dir = os.path.join(*paths) return os.path.normcase(os.path.abspath(dir)) I've got a feeling that os.path.abspath(dir) here is the culprit in posixpath.py: def abspath(path): """Return an absolute path.""" if not isabs(path): path = join(os.getcwd(), path) return normpath(path) And here I think that normpath(path) is the routine that actually gets rid of the double leading /. Feel free to submit a patch that leaves double leading slashes in if on QNX. > I added a prefix (QNX virtual-to-real path mapping on the filesystem tree) > to correct this: > > # prefix -A /5=//5 > > Now /5 points to //5, similar to a link. > > # make 2>&1 | tee ../make.3 > ... > ./python //5/tmp/py/Python-2.1a1/setup.py build > unable to execute ld: No such file or directory > running build > running build_ext > building 'struct' extension > creating build > creating build/temp.qnx-J-PCI-2.1 > cc -O -I. 
-I/5/tmp/py/Python-2.1a1/./Include -IInclude/ > -I/usr/local/include -c /5/tmp/py/Python-2.1a1/Modules/structmodule.c -o > build/temp.qnx-J-PCI-2.1/structmodule.o > creating build/lib.qnx-J-PCI-2.1 > ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o > build/lib.qnx-J-PCI-2.1/struct.so > error: command 'ld' failed with exit status 1 > make: *** [sharedmods] Error 1 > > QNX doesn't have an 'ld' command. Is configure not getting its info to > setup.py? (Is it supposed to?) > > What should I check? I have logs of each of the configure & make runs. > Should I submit this as a bug on SourceForge? > > Hope to hear from somebody soon. This is probably in the realm of the distutils. I have no idea how to teach it to build on QNX, sorry! --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at cnri.reston.va.us Fri Jan 26 23:01:01 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 26 Jan 2001 17:01:01 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 04:46:13PM -0500 References: Message-ID: <20010126170101.B2762@amarok.cnri.reston.va.us> On Fri, Jan 26, 2001 at 04:46:13PM -0500, Goodger, David wrote: > ImportError: No module named string The 'import string' in setup.py actually seems to be redundant now, since nothing seems to actually refer to the string module. I've removed it from CVS. >The QNX node number prefix '//5' (machine or host number, equivalent to a >'hostname:' prefix for network paths) is being reduced somehow (path >normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are >required at the head of the path. Is this something that can be fixed? Ooh, very likely: >>> os.path.normpath('//5/foo/bar') '/5/foo/bar' Isn't // at the root a Unix convention of some sort for some network filesystems? Probably normpath() should just leave it alone. >QNX doesn't have an 'ld' command. 
Is configure not getting its info to >setup.py? (Is it supposed to?) setup.py should be parsing the Makefile. The old QNX instructions say Modules/Makefile should be edited, but with Neil's non-recursive Makefile patch (committed after alpha1's release), editing Modules/Makefile will have no effect. Try editing just the top-level Makefile, which should affect setup.py. --amk From mal at lemburg.com Fri Jan 26 23:15:09 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jan 2001 23:15:09 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> Message-ID: <3A71F6ED.D6D642A7@lemburg.com> "Andrew M. Kuchling" wrote: > >The QNX node number prefix '//5' (machine or host number, equivalent to a > >'hostname:' prefix for network paths) is being reduced somehow (path > >normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are > >required at the head of the path. Is this something that can be fixed? > > Ooh, very likely: > >>> os.path.normpath('//5/foo/bar') > '/5/foo/bar' > > Isn't // at the root a Unix convention of some sort for some > network filesystems? Probably normpath() should just leave it alone. Samba uses ////. os.path.normpath() should probably leave the leading '//' untouched (having too many of those in the path doesn't do any harm, AFAIK). 
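To make the "leave the leading '//' untouched" suggestion concrete, here is a rough sketch, not the actual posixpath code, of a normpath variant that keeps a leading double slash while still collapsing slashes elsewhere. The name normpath_keep_double_slash is made up for illustration, and the choice to preserve exactly two leading slashes (and collapse three or more to one) is an assumption borrowed from the POSIX pathname-resolution wording; whether that is the right rule for QNX node prefixes is not confirmed in this thread.

```python
def normpath_keep_double_slash(path):
    """Normalize a POSIX-style path but keep a leading '//' intact.

    Hypothetical helper for illustration; not the stdlib implementation.
    """
    if path == '':
        return '.'
    # Count the leading slashes: exactly two are preserved, any other
    # non-zero count collapses to a single slash (assumed rule).
    stripped = path.lstrip('/')
    n_slashes = len(path) - len(stripped)
    if n_slashes == 2:
        prefix = '//'
    elif n_slashes:
        prefix = '/'
    else:
        prefix = ''
    comps = []
    for comp in stripped.split('/'):
        if comp in ('', '.'):
            continue  # drop empty components and '.'
        if (comp != '..' or (not prefix and not comps)
                or (comps and comps[-1] == '..')):
            comps.append(comp)
        elif comps:
            comps.pop()  # '..' cancels the previous component
    return (prefix + '/'.join(comps)) or '.'


print(normpath_keep_double_slash('//5/foo/bar'))    # //5/foo/bar
print(normpath_keep_double_slash('/5//foo/./bar'))  # /5/foo/bar
```

With something along these lines the '//5' node prefix from the QNX report would survive normalization, while ordinary paths behave as before.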
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nas at arctrix.com Fri Jan 26 16:26:12 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 07:26:12 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 04:46:13PM -0500 References: Message-ID: <20010126072611.A5345@glacier.fnational.com> On Fri, Jan 26, 2001 at 04:46:13PM -0500, Goodger, David wrote: > Running ./python results in stack overflow. The old QNX instructions in > README recommend editing Modules/Makefile: > LDFLAGS= -N 64k > > # make 2>&1 | tee ../make.2 The README should be changed to say edit the toplevel Makefile. Should those flags be the default? If you can give me the MACHDEP from your Makefile I can add it to configure.in. > QNX doesn't have an 'ld' command. Is configure not getting its info to > setup.py? (Is it supposed to?) I'm not sure how distutils figures out what to use for ld. It doesn't appear in the Makefile. I think this is probably some distutils thing. Andrew? Neil From fredrik at effbot.org Fri Jan 26 23:25:34 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 26 Jan 2001 23:25:34 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> Message-ID: <001a01c087e6$ec3b9710$e46940d5@hagrid> mal wrote: > > Ooh, very likely: > > >>> os.path.normpath('//5/foo/bar') > > '/5/foo/bar' > > > > Isn't // at the root a Unix convention of some sort for some > > network filesystems? Probably normpath() should just leave it alone. > > Samba uses ////. os.path.normpath() > should probably leave the leading '//' untouched (having too > many of those in the path doesn't do any harm, AFAIK). 
from 1.5.2's posixpath: def normpath(path): """Normalize path, eliminating double slashes, etc.""" import string # Treat initial slashes specially slashes = '' while path[:1] == '/': slashes = slashes + '/' path = path[1:] ... return slashes + string.joinfields(comps, '/') from 2.0's posixpath: def normpath(path): """Normalize path, eliminating double slashes, etc.""" if path == '': return '.' import string initial_slash = (path[0] == '/') ... if initial_slash: path = '/' + path return path or '.' interesting... Cheers /F From akuchlin at cnri.reston.va.us Fri Jan 26 23:28:03 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 26 Jan 2001 17:28:03 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <20010126072611.A5345@glacier.fnational.com>; from nas@arctrix.com on Fri, Jan 26, 2001 at 07:26:12AM -0800 References: <20010126072611.A5345@glacier.fnational.com> Message-ID: <20010126172803.A2817@amarok.cnri.reston.va.us> On Fri, Jan 26, 2001 at 07:26:12AM -0800, Neil Schemenauer wrote: >I'm not sure how distutils figures out what to use for ld. It >doesn't appear in the Makefile. It think this is probably some >distutils thing. Andrew? It looks at LDSHARED. See customize_compiler in Lib/distutils/sysconfig.py. Looking in Modules/Makefile, LDFLAGS is only used for the final link to produce a Python executable, so I think this is up to the Makefile, not setup.py. --amk From nas at arctrix.com Fri Jan 26 16:56:41 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 07:56:41 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <20010126172803.A2817@amarok.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Fri, Jan 26, 2001 at 05:28:03PM -0500 References: <20010126072611.A5345@glacier.fnational.com> <20010126172803.A2817@amarok.cnri.reston.va.us> Message-ID: <20010126075641.A5534@glacier.fnational.com> On Fri, Jan 26, 2001 at 05:28:03PM -0500, Andrew M. 
Kuchling wrote: > On Fri, Jan 26, 2001 at 07:26:12AM -0800, Neil Schemenauer wrote: > >I'm not sure how distutils figures out what to use for ld. > > It looks at LDSHARED. Okay. David, what should LDSHARED say for QNX? I can add the magic to configure.in. Neil From mal at lemburg.com Fri Jan 26 23:51:09 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jan 2001 23:51:09 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> Message-ID: <3A71FF5D.DC609775@lemburg.com> Fredrik Lundh wrote: > > mal wrote:> > Ooh, very likely: > > > >>> os.path.normpath('//5/foo/bar') > > > '/5/foo/bar' > > > > > > Isn't // at the root a Unix convention of some sort for some > > > network filesystems? Probably normpath() should just leave it alone. > > > > Samba uses ////. os.path.normpath() > > should probably leave the leading '//' untouched (having too > > many of those in the path doesn't do any harm, AFAIK). > > from 1.5.2's posixpath: > > def normpath(path): > """Normalize path, eliminating double slashes, etc.""" > import string > # Treat initial slashes specially > slashes = '' > while path[:1] == '/': > slashes = slashes + '/' > path = path[1:] > ... > return slashes + string.joinfields(comps, '/') > > from 2.0's posixpath: > > def normpath(path): > """Normalize path, eliminating double slashes, etc.""" > if path == '': > return '.' > import string > initial_slash = (path[0] == '/') > ... > if initial_slash: > path = '/' + path > return path or '.' > > interesting... Here's the log message: revision 1.34 date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23 added rewritten normpath from Moshe Zadka that does the right thing with paths containing .. and the diff: diff -r1.34 -r1.33 349,350d348 < if path == '': < return '.' 
352,367c350,372
<     initial_slash = (path[0] == '/')
<     comps = string.split(path, '/')
<     new_comps = []
<     for comp in comps:
<         if comp in ('', '.'):
<             continue
<         if (comp != '..' or (not initial_slash and not new_comps) or
<              (new_comps and new_comps[-1] == '..')):
<             new_comps.append(comp)
<         elif new_comps:
<             new_comps.pop()
<     comps = new_comps
<     path = string.join(comps, '/')
<     if initial_slash:
<         path = '/' + path
<     return path or '.'
---
>     # Treat initial slashes specially
>     slashes = ''
>     while path[:1] == '/':
>         slashes = slashes + '/'
>         path = path[1:]
>     comps = string.splitfields(path, '/')
>     i = 0
>     while i < len(comps):
>         if comps[i] == '.':
>             del comps[i]
>             while i < len(comps) and comps[i] == '':
>                 del comps[i]
>         elif comps[i] == '..' and i > 0 and comps[i-1] not in ('', '..'):
>             del comps[i-1:i+1]
>             i = i-1
>         elif comps[i] == '' and i > 0 and comps[i-1] <> '':
>             del comps[i]
>         else:
>             i = i+1
>     # If the path is now empty, substitute '.'
>     if not comps and not slashes:
>         comps.append('.')
>     return slashes + string.joinfields(comps, '/')

Revision 1.33 clearly leaves initial slashes untouched.
I guess we should restore this...

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From nas at arctrix.com Fri Jan 26 17:12:15 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Fri, 26 Jan 2001 08:12:15 -0800
Subject: [Python-Dev] LINKCC defaults to CXX
Message-ID: <20010126081215.B5534@glacier.fnational.com>

Dear lord why? So people can develop extensions using C++? It's
not worth the pain inflicted on everyone else. Let them
recompile with LINKCC=CXX.

Linking with CXX opens a huge can of stinky worms. First of all,
just because configure found a value for CXX doesn't mean it
works. Even if it does that doesn't mean that using it is a good
idea. Linking with CXX will bring in the C++ runtime.
There are a large number of platforms where the C++ ABI has not been
standardized; for example, anything that used g++.

Can we please leave LINKCC default to CXX? It's easy enough for
the crazies to override if they like. I'll even create a
configure option for them.

Neil

From barry at digicool.com Sat Jan 27 00:09:57 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Fri, 26 Jan 2001 18:09:57 -0500
Subject: [Python-Dev] LINKCC defaults to CXX
References: <20010126081215.B5534@glacier.fnational.com>
Message-ID: <14962.965.464326.794431@anthem.wooz.org>

>>>>> "NS" == Neil Schemenauer writes:

    NS> Can we please leave LINKCC default to CXX?

I think you mean default it to CC, eh?

+1

From mal at lemburg.com Sat Jan 27 01:16:01 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 27 Jan 2001 01:16:01 +0100
Subject: [Python-Dev] Nightly CVS tarballs
Message-ID: <3A721341.3F348E51@lemburg.com>

I just got a request from someone who wants to test the latest
CVS version but unfortunately can't because he's behind a
firewall.

Is there any chance of reactivating the nightly tarball generation
that was once in place ?

    http://www.python.org/download/cvs.html

Thanks,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From dgoodger at atsautomation.com Sat Jan 27 01:30:21 2001
From: dgoodger at atsautomation.com (Goodger, David)
Date: Fri, 26 Jan 2001 19:30:21 -0500
Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25
Message-ID:

Thank you all for your prompt replies. (Guido's was within seconds! Well,
minutes, certainly.) I'll give it another go on Monday. I've got renovations
to fill my weekend.
/David From thomas at xs4all.net Sat Jan 27 01:35:41 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 27 Jan 2001 01:35:41 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 07:30:21PM -0500 References: Message-ID: <20010127013541.N962@xs4all.nl> On Fri, Jan 26, 2001 at 07:30:21PM -0500, Goodger, David wrote: > Thank you all for your prompt replies. (Guido's was within seconds! Well, > minutes, certainly.) Oh, the wonderful things one can do with a time machine.... -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From jeremy at alum.mit.edu Fri Jan 26 23:14:26 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 26 Jan 2001 17:14:26 -0500 (EST) Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <3A721341.3F348E51@lemburg.com> References: <3A721341.3F348E51@lemburg.com> Message-ID: <14961.63170.394043.790610@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: MAL> I just got a request from someone who wants to test the latest MAL> CVS version but unfortunately can't because he's behind a MAL> firewall. MAL> Is there any chance of reactivating the nightly tarball MAL> generation that was once in place ? MAL> http://www.python.org/download/cvs.html I plan to set up nightly cvs snapshots soon. We should be moving into our new office next week; I hope to have a machine that is on the net 24x7 shortly after that. 
Jeremy

From bckfnn at worldonline.dk Sat Jan 27 08:58:38 2001
From: bckfnn at worldonline.dk (Finn Bock)
Date: Sat, 27 Jan 2001 07:58:38 GMT
Subject: [Python-Dev] Nightly CVS tarballs
In-Reply-To: <14961.63170.394043.790610@localhost.localdomain>
References: <3A721341.3F348E51@lemburg.com> <14961.63170.394043.790610@localhost.localdomain>
Message-ID: <3a727e79.835771@smtp.worldonline.dk>

>>>>>> "MAL" == M -A Lemburg writes:
>
> MAL> I just got a request from someone who wants to test the latest
> MAL> CVS version but unfortunately can't because he's behind a
> MAL> firewall.
>
> MAL> Is there any chance of reactivating the nightly tarball
> MAL> generation that was once in place ?
>
> MAL> http://www.python.org/download/cvs.html

[Jeremy]

>I plan to set up nightly cvs snapshots soon. We should be moving into
>our new office next week; I hope to have a machine that is on the net
>24x7 shortly after that.

FWIW, I have been using this cron and shell script running on
shell.sourceforge.net. This way I don't need 24x7 in order to make a cvs
tarball (and .zip) available.

22 2 * * * $HOME/bin/jython-snap

SHOTLABEL=`date +%Y%m%d`
LOGLABEL=log.`date +%Y%m%d`
cd /home/groups/jython/htdocs/cvssnaps
(cvs -Qd :pserver:anonymous@cvs1:/cvsroot/jython checkout -d jython-$SHOTLABEL jython && \
tar zcf jython-nightly.tar.gz jython-$SHOTLABEL && \
rm -fr jython-nightly.zip && \
zip -qr9 jython-nightly.zip jython-$SHOTLABEL && \
rm -fr jython-$SHOTLABEL) >$LOGLABEL 2>&1

regards,
finn

From tim.one at home.com Sat Jan 27 10:35:14 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 27 Jan 2001 04:35:14 -0500
Subject: [Python-Dev] setup.py
In-Reply-To: <20010126092559.A5623@thyrsus.com>
Message-ID:

[Eric S. Raymond]
> I may not channel Guido the way Tim does, but I suspect he gave you
> developer privileges because he trusts you to do routine stuff like this.

Excellent, Eric! You're batting 1%. Here's how to boost it to 93%:
whenever a new idea comes up, just grumble "no".
You'll be right 92% of the time . Reminds me of a friend who got sucked into working at a neural-net startup trying to build a black box to predict whether the daily close of the S&P 500 would be above or below the previous day's. He was greatly impressed by the research they had done, showing that the prototype got the right answer more than half the time when fed historical data, and at a very high significance level (i.e., it almost certainly did better than flipping a coin). What he didn't realize at the time is that if they had written the prototype in Python: # S&P close daily direction predictor print "higher" it would have been right about 2/3rds the time <0.33 wink>. never-ascribe-to-insight-what-can-be-explained-by-idiocy-ly y'rs - tim From martin at mira.cs.tu-berlin.de Sat Jan 27 10:38:41 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 27 Jan 2001 10:38:41 +0100 Subject: [Python-Dev] Nightly CVS tarballs Message-ID: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> > Is there any chance of reactivating the nightly tarball generation > that was once in place ? What's wrong with http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz ? Regards, Martin From fredrik at effbot.org Sat Jan 27 11:43:50 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 27 Jan 2001 11:43:50 +0100 Subject: [Python-Dev] setup.py References: Message-ID: <008c01c0884e$09bd2030$e46940d5@hagrid> tim wrote: > Reminds me of a friend who got sucked into working at a neural-net startup > trying to build a black box to predict whether the daily close of the S&P > 500 would be above or below the previous day's. /.../ > > # S&P close daily direction predictor > print "higher" replace "higher" with "same", and you have a pretty decent weather predictor. Cheers /F From mal at lemburg.com Sat Jan 27 13:01:30 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 27 Jan 2001 13:01:30 +0100 Subject: [Python-Dev] Nightly CVS tarballs References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> Message-ID: <3A72B89A.E03C1912@lemburg.com> "Martin v. Loewis" wrote: > > > Is there any chance of reactivating the nightly tarball generation > > that was once in place ? > > What's wrong with > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz > > ? I didn't realize that SF does this automagically. Could someone please redirect the link on the python.org cvs page to the above address (David Ascher's tarball generation stopped in February 2000 !). Thanks for the hint, Martin. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Sat Jan 27 14:16:01 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 27 Jan 2001 08:16:01 -0500 (EST) Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <3A72B89A.E03C1912@lemburg.com> References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> <3A72B89A.E03C1912@lemburg.com> Message-ID: <14962.51729.905084.154359@cj42289-a.reston1.va.home.com> "Martin v. Loewis" wrote: > What's wrong with > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz M.-A. Lemburg writes: > I didn't realize that SF does this automagically. Could someone > please redirect the link on the python.org cvs page to the > above address (David Ascher's tarball generation stopped in > February 2000 !). Did you want a "snapshot" or a copy of the repository? What SF produces is a tarball of the repository, not a snapshot. We still need to do something to create snapshots. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at lemburg.com Sat Jan 27 14:28:40 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 27 Jan 2001 14:28:40 +0100 Subject: [Python-Dev] Nightly CVS tarballs References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> <3A72B89A.E03C1912@lemburg.com> <14962.51729.905084.154359@cj42289-a.reston1.va.home.com> Message-ID: <3A72CD08.F47DAA69@lemburg.com> "Fred L. Drake, Jr." wrote: > > "Martin v. Loewis" wrote: > > What's wrong with > > > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz > > M.-A. Lemburg writes: > > I didn't realize that SF does this automagically. Could someone > > please redirect the link on the python.org cvs page to the > > above address (David Ascher's tarball generation stopped in > > February 2000 !). > > Did you want a "snapshot" or a copy of the repository? What SF > produces is a tarball of the repository, not a snapshot. I meant a copy of what you get when you check out the Python CVS tree wrapped into a .tar.gz file. The size of the above archive (16MB) suggests that a lot more is going into the .tar.gz file. A .tar.gz of the CVS checkout is around 4MB in size. Looks like we still need to do something after all ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From armin at steinhoff.de Sat Jan 27 17:24:57 2001 From: armin at steinhoff.de (Armin Steinhoff) Date: Sat, 27 Jan 2001 17:24:57 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: <4.3.2.7.2.20010127170125.00b2ee80@mail.secureweb.de> Hello Guido, nice to see the first 2.1 version :) At 16:52 26.01.01 -0500, you wrote: > > [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] > > > > I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very > > rusty (long live Python!), I don't know my way around configure, and am not > > familiar with Python's Makefile. 
Python 2.0 compiled fine (with a couple of > > tweaks), but I'm getting caught by the new way of building things. Please > > help if you can! Many thanks in advance. > > > > Here's an excerpt of my efforts: > > > > # cd /tmp/py > > # gunzip -c < python-2.1a1.tgz | tar -rf - > > # cd Python-2.1a1 > > # ./configure 2>&1 | tee ../configure.1 I did a fast hack with the new 2.1 version: CC=cc LINKCC=cc configure --without-gcc --shared=no --without-threads (Hope '--shared=no' works ... QNX4 doesn't support dynamic loading) Please replace all references to g++ by cc -> in the main Makefile and the Modules/Makefile. In the Modules/Makefile set LDFLAGS=250K ... the default stacksize of 32K seems to be too small. > > # make 2>&1 | tee ../make.1 > > ... > > ./python //5/tmp/py/Python-2.1a1/setup.py build > > 'import site' failed; use -v for traceback 'python -v' shows that the module 'distutils.util' isn't there .... it seems to be not included in the source distribution. 'import site' failed; traceback: Traceback (most recent call last): File "//1/Python-2.1a1/Lib/site.py", line 85, in ? from distutils.util import get_platform ImportError: No module named distutils.util ^^^^^^^^^^^^^^ [ clip ..] >This is probably in the realm of the distutils. I have no idea how to >teach it to build on QNX, sorry! IMHO ... it is not a path problem. In the moment there is no time left for me to go into these details. A clean port will happen in a few weeks. Please check out PyQNX for news regarding QNX4.25 and QNX6.0 (aka QNX Neutrino). Greetings Armin Steinhoff Life-Demo of PyDACHS http://www.dachs.net/PyDACHS_python-tilcon.htm in our booth at Embedded Systems 2001, Nuremberg, GER http://www.embedded-systems-messe.de Febr. 
14-16, 2000 Hall 11, Booth P 04 From guido at digicool.com Sat Jan 27 17:50:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:50:50 -0500 Subject: [Python-Dev] LINKCC defaults to CXX In-Reply-To: Your message of "Fri, 26 Jan 2001 08:12:15 PST." <20010126081215.B5534@glacier.fnational.com> References: <20010126081215.B5534@glacier.fnational.com> Message-ID: <200101271650.LAA30720@cj20424-a.reston1.va.home.com> > Dear lord why? So people can develop extensions using C++? Its > not worth the pain inflicted on everyone else. Let them > recompile with LINKCC=CXX. > > Linking with CXX opens a huge can of stinky worms. First of all, > just because configure found a value for CXX doesn't mean it > works. Even if it does that doesn't mean that using it is a good > idea. Linking with CXX will bring in the C++ runtime. There are > a large number of platforms where the C++ ABI has not been > standarized; for example, anything that used g++. > > Can we please leave LINKCC default to CXX? Its easy enough for > the crazies to override if they like. I'll even create a > configure option for them. Arg. My bad. I did this as an experiment; it didn't break on my machine, but I didn't intend this to become the standard! Thanks for changing it back. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jan 27 17:52:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:52:23 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: Your message of "Fri, 26 Jan 2001 23:51:09 +0100." 
<3A71FF5D.DC609775@lemburg.com> References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> Message-ID: <200101271652.LAA30750@cj20424-a.reston1.va.home.com> > revision 1.34 > date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23 > added rewritten normpath from Moshe Zadka that does the right thing with > paths containing .. [...] > Revision 1.33 clearly leaves initial slashes untouched. > I guess we should restore this... Yes, please! (Just the "leading extra slashes stay" behavior.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jan 27 17:57:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:57:40 -0500 Subject: [Python-Dev] New bug in function object hash() and comparisons In-Reply-To: Your message of "Fri, 26 Jan 2001 17:02:09 EST." References: Message-ID: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> Barry noticed: > Anyway, did you know that you can use functions as keys to a > dictionary, but that you can mutate them to "lose" the element? > > -------------------- snip snip -------------------- > Python 2.0 (#13, Jan 10 2001, 13:06:39) > [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> d = {} > >>> def foo(): pass > ... > >>> def bar(): pass > ... > >>> d[foo] = 1 > >>> d[foo] > 1 > >>> foocode = foo.func_code > >>> foo.func_code = bar.func_code > >>> d[foo] > Traceback (most recent call last): > File "", line 1, in ? > KeyError: > >>> d[bar] = 2 > >>> d[bar] > 2 > >>> d[foo] > 2 > >>> foo.func_code = foocode > >>> d[foo] > 1 > -------------------- snip snip -------------------- > > It's because a function's func_code attribute is used in its hash > calculation, but func_code is writable! Clearly, something changed. I'm pretty sure it's the function attributes. 
Either the function attributes shouldn't be used in comparing function objects, or hash() on functions should be unimplemented, or comparison on functions should use simple pointer compares. What's the right solution? Do people use functions as dict keys? If not, we can remove the hash() implementation. But I suspect they *are* used as dict keys. Not using the __dict__ on comparisons appears ugly, so probably the best solution is to change function comparisons to use simple pointer compares. That removes the possibility to see whether two different functions implement the same code -- but does anybody really use that? --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Sat Jan 27 18:17:50 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Sat, 27 Jan 2001 19:17:50 +0200 (IST) Subject: [Python-Dev] New bug in function object hash() and comparisons In-Reply-To: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com>, Message-ID: <20010127171750.91412A840@darjeeling.zadka.site.co.il> On Sat, 27 Jan 2001 11:57:40 -0500, Guido van Rossum wrote: (about function hash doing the wrong thing) > What's the right solution? I have no idea... > Do people use functions as dict keys? If > not, we can remove the hash() implementation. ...but this ain't it. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From gvwilson at ca.baltimore.com Sat Jan 27 18:23:42 2001 From: gvwilson at ca.baltimore.com (Greg Wilson) Date: Sat, 27 Jan 2001 12:23:42 -0500 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1119 - 17 msgs In-Reply-To: <20010127170103.DA6DEEA44@mail.python.org> Message-ID: <000001c08885$e5418c40$770a0a0a@nevex.com> > Guido wrote: > What's the right solution? Do people use functions as dict keys? 
Yup --- even use this as an example in the course (part of drumming home to students that functions are just a special kind of data). Greg From barry at digicool.com Sat Jan 27 18:43:43 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sat, 27 Jan 2001 12:43:43 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> Message-ID: <14963.2255.268933.615456@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Clearly, something changed. I'm pretty sure it's the GvR> function attributes. Actually no. func_code is used in func_hash() but somewhere in the Python 1.6 cycle, func_code was made assignable. GvR> Either the function attributes shouldn't be used in comparing GvR> function objects, or hash() on functions should be GvR> unimplemented, or comparison on functions should use simple GvR> pointer compares. GvR> What's the right solution? We should definitely continue to allow functions as keys to dictionaries, but probably just remove func_code as an input to the function's hash. -Barry From barry at digicool.com Sat Jan 27 18:48:33 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sat, 27 Jan 2001 12:48:33 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> <14963.2255.268933.615456@anthem.wooz.org> Message-ID: <14963.2545.14600.667505@anthem.wooz.org> Me> We should definitely continue to allow functions as keys to Me> dictionaries, but probably just remove func_code as an input Me> to the function's hash. But of course, func_globals won't be sufficient as a hash for functions. Probably changing the hash to a pointer compare is the best thing after all. 
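A minimal sketch of the pointer-compare behaviour being proposed, assuming a present-day Python 3 in which `func_code` is spelled `__code__` and functions hash and compare by identity. It reruns the shape of Barry's experiment rather than reproducing the 2.0 internals.

```python
def foo():
    return "foo"


def bar():
    return "bar"


# Use a function object as a dictionary key.
d = {foo: 1}
hash_before = hash(foo)

# Swap in another function's code object, as in the session quoted earlier.
foo.__code__ = bar.__code__

# With identity-based hashing the key is still found after the mutation.
print(d[foo], hash(foo) == hash_before, foo(), foo == bar)
```

Because neither the hash nor the comparison looks at the code object here, rebinding `__code__` changes what `foo` does when called but not where it lives in the dictionary.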
-Barry From guido at digicool.com Sat Jan 27 18:49:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 12:49:16 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons In-Reply-To: Your message of "Sat, 27 Jan 2001 12:43:43 EST." <14963.2255.268933.615456@anthem.wooz.org> References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> <14963.2255.268933.615456@anthem.wooz.org> Message-ID: <200101271749.MAA32025@cj20424-a.reston1.va.home.com> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Clearly, something changed. I'm pretty sure it's the > GvR> function attributes. > > Actually no. func_code is used in func_hash() but somewhere in the > Python 1.6 cycle, func_code was made assignable. Argh! You're right. > GvR> Either the function attributes shouldn't be used in comparing > GvR> function objects, or hash() on functions should be > GvR> unimplemented, or comparison on functions should use simple > GvR> pointer compares. > > GvR> What's the right solution? > > We should definitely continue to allow functions as keys to > dictionaries, but probably just remove func_code as an input to the > function's hash. OK, that settles it. There's not much point in having a function compare do anything besides a pointer comparison when the code objects aren't compared. (Two completely different functions could compare equal e.g. if they has the same attribute dict.) So we should just punt, and compare functions by object pointer. The proper way to do this is to *delete* func_hash and func_compare from funcobject.c -- the default comparison will take care of this. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Sat Jan 27 19:58:30 2001 From: akuchlin at mems-exchange.org (A.M. 
Kuchling)
Date: Sat, 27 Jan 2001 13:58:30 -0500
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References:
Message-ID: <200101271858.NAA04898@mira.erols.com>

On Sat, 27 Jan 2001 18:28:02 +0100, Andreas Jung wrote:
>Is there a reason why 2.1 runs significantly slower ?
>Both Python versions were compiled with -g -O2 only.

[CC'ing to python-dev] Confirmed:

[amk@mira Python-2.0]$ ./python Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 3.14
This machine benchmarks at 3184.71 pystones/second
[amk@mira Python-2.0]$ python2.1 Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 3.81
This machine benchmarks at 2624.67 pystones/second

The ceval.c changes seem a likely candidate to have caused this.
Anyone want to run Marc-Andre's microbenchmarks and see how the
numbers have changed?

--amk

From moshez at zadka.site.co.il Sat Jan 27 20:14:28 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Sat, 27 Jan 2001 21:14:28 +0200 (IST)
Subject: [Python-Dev] Function Hash: Check it in?
Message-ID: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il>

Attached is an example Python session after I patched the interpreter.
The test-suite passes all right. I want an OK to check this in.

Here is the patch:

Index: Objects/funcobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/funcobject.c,v
retrieving revision 2.33
diff -c -r2.33 funcobject.c
*** Objects/funcobject.c 2001/01/25 20:06:59 2.33
--- Objects/funcobject.c 2001/01/27 19:13:08
***************
*** 347,358 ****
      0,                          /*tp_print*/
      0,                          /*tp_getattr*/
      0,                          /*tp_setattr*/
!     (cmpfunc)func_compare,      /*tp_compare*/
      (reprfunc)func_repr,        /*tp_repr*/
      0,                          /*tp_as_number*/
      0,                          /*tp_as_sequence*/
      0,                          /*tp_as_mapping*/
!     (hashfunc)func_hash,        /*tp_hash*/
      0,                          /*tp_call*/
      0,                          /*tp_str*/
      (getattrofunc)func_getattro, /*tp_getattro*/
--- 347,358 ----
      0,                          /*tp_print*/
      0,                          /*tp_getattr*/
      0,                          /*tp_setattr*/
!     0,                          /*tp_compare*/
      (reprfunc)func_repr,        /*tp_repr*/
      0,                          /*tp_as_number*/
      0,                          /*tp_as_sequence*/
      0,                          /*tp_as_mapping*/
!     0,                          /*tp_hash*/
      0,                          /*tp_call*/
      0,                          /*tp_str*/
      (getattrofunc)func_getattro, /*tp_getattro*/

Python 2.1a1 (#1, Jan 27 2001, 21:01:24)
[GCC 2.95.3 20010111 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> def foo():
...     pass
...
>>> def bar():
...     pass
...
>>> hash(foo)
135484636
>>> hash(bar)
135481676
>>> foo == bar
0
>>> d = {}
>>> d[foo] =1
>>> def temp():
...     print "baz"
...
>>> foo.func_code = temp.func_code
>>> d[foo]
1

-- 
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From tim.one at home.com Sat Jan 27 21:06:20 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 27 Jan 2001 15:06:20 -0500
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
In-Reply-To: <200101271858.NAA04898@mira.erols.com>
Message-ID:

[A.M. Kuchling]
> [CC'ing to python-dev] Confirmed:
>
> [amk@mira Python-2.0]$ ./python Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 3.14
> This machine benchmarks at 3184.71 pystones/second
> [amk@mira Python-2.0]$ python2.1 Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 3.81
> This machine benchmarks at 2624.67 pystones/second
>
> The ceval.c changes seem a likely candidate to have caused this.
> Anyone want to run Marc-Andre's microbenchmarks and see how the
> numbers have changed?
Want to, yes, but it looks hopeless on my box:

**** 2.0
C:\Python20>python lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 0.851013
This machine benchmarks at 11750.7 pystones/second

C:\Python20>python lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.24279
This machine benchmarks at 8046.41 pystones/second

**** 2.1a1
C:\Python21a1>python lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 0.823313
This machine benchmarks at 12146 pystones/second

C:\Python21a1>python lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.27046
This machine benchmarks at 7871.15 pystones/second

**** CVS
C:\Code\python\dist\src\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 0.836391
This machine benchmarks at 11956.1 pystones/second

C:\Code\python\dist\src\PCbuild>python ..\lib\test\pystone.py
Pystone(1.1) time for 10000 passes = 1.3055
This machine benchmarks at 7659.9 pystones/second

That's after a reboot: no matter which Python I use, it gets about 12000 on
the first run with a given python.exe, and about 8000 on the second. Not
shown is that it *stays* at about 8000 until the next reboot.

So there's a Windows (W98SE) Mystery, but also no evidence that timings have
changed worth spit under the MS compiler. The eval loop is very touchy, and
I suspect you won't track this down on your box until staring at the code
gcc (I presume you're using gcc) generates. May be sensitive to which
release of gcc you're using too.
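Since single runs can differ this much on the same machine, one hedged way to compare interpreters is to repeat the measurement and look at the spread rather than at one number. The sketch below assumes a present-day Python 3 and its standard `timeit` module; `workload` is a hypothetical stand-in, not pystone, and none of this is the method used in the thread.

```python
import timeit


def workload():
    """Hypothetical stand-in for a benchmark body (not pystone)."""
    total = 0
    for i in range(1000):
        total += i * i
    return total


# Run the same measurement several times instead of trusting one figure.
timings = timeit.repeat(workload, repeat=5, number=200)
best, worst = min(timings), max(timings)

# The minimum is the least disturbed run; the spread hints at system noise.
print("best %.6f s, worst %.6f s" % (best, worst))
```

If the spread between the best and worst run is comparable to the difference between two builds, the comparison says little about either build.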
switch-to-windows-and-you'll-have-easier-things-to-worry-about-ly y'rs - tim From fredrik at pythonware.com Sun Jan 28 10:37:45 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 28 Jan 2001 10:37:45 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com> Message-ID: <00ed01c0890e$e3bf5ad0$e46940d5@hagrid> guido wrote: > > Revision 1.33 clearly leaves initial slashes untouched. > > I guess we should restore this... > > Yes, please! (Just the "leading extra slashes stay" behavior.) just looked this up in the specs, and POSIX seem to require that leading slashes are preserved only if there are exactly two of them: A pathname that begins with two successive slashes may be interpreted in an implementation-dependent manner, although more than two leading slashes are treated as a single slash. (from susv2) maybe we should add a if len(slashes) > 2: slashes = "/" test to the patch? Cheers /F From thomas at xs4all.net Sun Jan 28 18:39:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 28 Jan 2001 18:39:58 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <00ed01c0890e$e3bf5ad0$e46940d5@hagrid>; from fredrik@pythonware.com on Sun, Jan 28, 2001 at 10:37:45AM +0100 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com> <00ed01c0890e$e3bf5ad0$e46940d5@hagrid> Message-ID: <20010128183958.Q962@xs4all.nl> On Sun, Jan 28, 2001 at 10:37:45AM +0100, Fredrik Lundh wrote: > guido wrote: > > > Revision 1.33 clearly leaves initial slashes untouched. > > > I guess we should restore this... > > > > Yes, please! 
(Just the "leading extra slashes stay" behavior.) > just looked this up in the specs, and POSIX seem to > require that leading slashes are preserved only if there > are exactly two of them: > A pathname that begins with two successive slashes > may be interpreted in an implementation-dependent > manner, although more than two leading slashes are > treated as a single slash. > (from susv2) > maybe we should add a if len(slashes) > 2: slashes = "/" > test to the patch? How strictly do we need (or want, for that matter) to follow POSIX here ? I'm aware the module is called 'posixpath', but it's used in a bit more than just POSIX environments (or POSIX behaviours) so it might make sense to ignore this particular tidbit. What if there is a system that attaches a special meaning to ///, should we create a new path module for it ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at mira.cs.tu-berlin.de Sun Jan 28 21:50:35 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 21:50:35 +0100 Subject: [Python-Dev] XSLT parser interface Message-ID: <200101282050.f0SKoZr08809@mira.informatik.hu-berlin.de> Based on my previous IDL interface for XPath parsers, I've defined an API for a parser that parsers XSLT pattern expressions. It is an extension to the XPath API, so I attach only the additional functions. Any comments are appreciated. 
Martin

module XPath{
  // XSLT exprType values
  const unsigned short PATTERN = 17;
  const unsigned short LOCATION_PATTERN = 18;
  const unsigned short RELATIVE_PATH_PATTERN = 19;
  const unsigned short STEP_PATTERN = 20;

  interface Pattern;
  interface LocationPathPattern;
  interface RelativePathPattern;
  interface StepPattern;

  interface PatternFactory:ExprFactory{
    Pattern createPattern(in LocationPathPattern first);
    // idkey may be null, represents IdKeyPattern
    // if parent is true, it is '/', else '//'
    // rel may be null
    LocationPathPattern createLocationPathPattern(in FunctionCall idkey,
                                                  in boolean parent,
                                                  in RelativePathPattern rel);
    // if parent is true, it is '/', else '//'
    RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
                                                  in boolean parent,
                                                  in StepPattern step);
    StepPattern createStepPattern(in AxisSpecifier axis,
                                  in NodeTest test,
                                  in PredicateList predicates);
  };

  typedef sequence<LocationPathPattern> LocationPathPatterns;

  interface Pattern:Expr{
    readonly attribute LocationPathPatterns patterns;
    void append(in LocationPathPattern pattern);
  };

  interface LocationPathPattern:Expr{
    readonly attribute FunctionCall idkey;
    readonly attribute boolean parent;
    readonly attribute RelativePathPattern relative_pattern;
  };

  interface RelativePathPattern:Expr{
    readonly attribute RelativePathPattern relative;
    readonly attribute boolean parent;
    readonly attribute StepPattern step;
  };

  interface StepPattern:Expr{
    readonly attribute AxisSpecifier axis;
    readonly attribute NodeTest test;
    readonly attribute PredicateList predicates;
  };

  interface XSLTParser:Parser{
    Pattern parsePattern(in DOMString pattern);
  };
};

From skip at mojam.com Sun Jan 28 22:40:28 2001
From: skip at mojam.com (Skip Montanaro)
Date: Sun, 28 Jan 2001 15:40:28 -0600 (CST)
Subject: [Python-Dev] What happened to Setup.local's functionality?
Message-ID: <14964.37324.642566.602319@beluga.mojam.com>

I just remembered Modules/Setup.local. I peeked at mine and noticed it had been zeroed out.
I then copied a version of it over from another machine and reran make a couple times. Makesetup ran but nothing mentioned in Setup.local got built. I don't think 2.1 can be released without providing a way for users to recover from this change. I didn't see anything obvious in setup.py. Am I missing something? Skip From thomas at xs4all.net Mon Jan 29 01:39:04 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 29 Jan 2001 01:39:04 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.20,2.21 In-Reply-To: <20001104001415.A2093@53b.hoffleit.de>; from gregor@hoffleit.de on Sat, Nov 04, 2000 at 12:14:15AM +0100 References: <200010050142.SAA08326@slayer.i.sourceforge.net> <20001104001415.A2093@53b.hoffleit.de> Message-ID: <20010129013904.R962@xs4all.nl> On Sat, Nov 04, 2000 at 12:14:15AM +0100, Gregor Hoffleit wrote: > FYI: This misdefinition with LONG_BIT was due to a bug in glibc's limits.h. It > has been fixed in glibc 2.96. Do you mean gcc 2.96, or glibc 2.(1|2).96 ? Or is 2.96 some internal versioning for glibc that I was unaware of ? :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry at digicool.com Mon Jan 29 06:03:45 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 29 Jan 2001 00:03:45 -0500 Subject: [Python-Dev] Function Hash: Check it in? References: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <14964.63921.966960.445548@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> Attached is an example Python session after I patched the MZ> intepreter. The test-suite passes all right. MZ> I want an OK to check this in. Moshe, please remove the func_hash() and func_compare() functions, and if the patch passes the test suite, go ahead and check it all in. Please also check in a test case. Thanks, -Barry From barry at digicool.com Mon Jan 29 06:04:12 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Mon, 29 Jan 2001 00:04:12 -0500 Subject: [Python-Dev] Function Hash: Check it in? References: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <14964.63948.492662.775413@anthem.wooz.org> Oh yeah, please also add an entry to the NEWS file. Thanks, -Barry From moshez at zadka.site.co.il Mon Jan 29 07:26:25 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 08:26:25 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <14964.63948.492662.775413@anthem.wooz.org> References: <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 00:04:12 -0500, barry at digicool.com (Barry A. Warsaw) wrote: > Oh yeah, please also add an entry to the NEWS file. Done. The checkin to the NEWS file will be done in about a million years, when my antique of a modem finishes sending the data. I had to change test_opcodes since it tested that functions with the same code compare equal. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From gregor at mediasupervision.de Mon Jan 29 12:13:39 2001 From: gregor at mediasupervision.de (Gregor Hoffleit) Date: Mon, 29 Jan 2001 12:13:39 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.20,2.21 In-Reply-To: <20010129013904.R962@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 29, 2001 at 01:39:04AM +0100 References: <200010050142.SAA08326@slayer.i.sourceforge.net> <20001104001415.A2093@53b.hoffleit.de> <20010129013904.R962@xs4all.nl> Message-ID: <20010129121339.A1166@mediasupervision.de> On Mon, Jan 29, 2001 at 01:39:04AM +0100, Thomas Wouters wrote: > On Sat, Nov 04, 2000 at 12:14:15AM +0100, Gregor Hoffleit wrote: > > FYI: This misdefinition with LONG_BIT was due to a bug in glibc's limits.h. 
It
> > has been fixed in glibc 2.96.
>
> Do you mean gcc 2.96, or glibc 2.(1|2).96 ? Or is 2.96 some internal
> versioning for glibc that I was unaware of ? :)

Sorry, it was fixed in glibc 2.1.96.

Gregor

From mal at lemburg.com Mon Jan 29 12:31:11 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 12:31:11 +0100
Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25
References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com>
Message-ID: <3A75547F.A601E219@lemburg.com>

Guido van Rossum wrote:
>
> > revision 1.34
> > date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23
> > added rewritten normpath from Moshe Zadka that does the right thing with
> > paths containing ..
> [...]
> > Revision 1.33 clearly leaves initial slashes untouched.
> > I guess we should restore this...
>
> Yes, please! (Just the "leading extra slashes stay" behavior.)

Checked in a patch which preserves '/' and '//' but converts three or more initial slashes into one (see Fredrik's note about the POSIX standard on this).

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Mon Jan 29 13:24:15 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 13:24:15 +0100
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References: <200101271858.NAA04898@mira.erols.com>
Message-ID: <3A7560EF.39D6CF@lemburg.com>

Here are the results of my micro benchmark pybench 0.7:

PYBENCH 0.7

Benchmark: /home/lemburg/tmp/pybench-2.1a1.pyb (rounds=10, warp=20)

Tests:                        per run  per oper.   diff *
------------------------------------------------------------------------
BuiltinFunctionCalls:      1102.30 ms    8.65 us   +7.56%
BuiltinMethodLookup:        966.75 ms    1.84 us   +4.56%
ConcatStrings:             1198.55 ms    7.99 us  +11.63%
ConcatUnicode:             1835.60 ms   12.24 us  +19.29%
CreateInstances:           1556.40 ms   37.06 us   +2.49%
CreateStringsWithConcat:   1396.70 ms    6.98 us   +5.44%
CreateUnicodeWithConcat:   1895.80 ms    9.48 us  +31.61%
DictCreation:              1760.50 ms   11.74 us   +2.43%
ForLoops:                  1426.90 ms  142.69 us   -7.51%
IfThenElse:                1155.25 ms    1.71 us   -6.24%
ListSlicing:                555.40 ms  158.69 us   -4.14%
NestedForLoops:             784.55 ms    2.24 us   -6.33%
NormalClassAttribute:      1052.80 ms    1.75 us  -10.42%
NormalInstanceAttribute:   1053.80 ms    1.76 us   +0.89%
PythonFunctionCalls:       1127.50 ms    6.83 us  +12.56%
PythonMethodCalls:          909.10 ms   12.12 us   +9.70%
Recursion:                  942.40 ms   75.39 us  +23.74%
SecondImport:               924.20 ms   36.97 us   +3.98%
SecondPackageImport:        951.10 ms   38.04 us   +6.16%
SecondSubmoduleImport:     1211.30 ms   48.45 us   +7.69%
SimpleComplexArithmetic:   1635.30 ms    7.43 us   +5.58%
SimpleDictManipulation:     963.35 ms    3.21 us   -0.57%
SimpleFloatArithmetic:      877.00 ms    1.59 us   -2.92%
SimpleIntFloatArithmetic:   851.10 ms    1.29 us   -5.89%
SimpleIntegerArithmetic:    850.05 ms    1.29 us   -6.41%
SimpleListManipulation:    1168.50 ms    4.33 us   +8.14%
SimpleLongArithmetic:      1231.15 ms    7.46 us   +1.52%
SmallLists:                2153.35 ms    8.44 us  +10.77%
SmallTuples:               1314.65 ms    5.48 us   +3.80%
SpecialClassAttribute:     1050.80 ms    1.75 us   +1.48%
SpecialInstanceAttribute:  1248.75 ms    2.08 us   -2.32%
StringMappings:            1702.60 ms   13.51 us  +19.69%
StringPredicates:          1024.25 ms    3.66 us  -25.49%
StringSlicing:             1093.35 ms    6.25 us   +4.35%
TryExcept:                 1584.85 ms    1.06 us  -10.90%
TryRaiseExcept:            1239.50 ms   82.63 us   +4.64%
TupleSlicing:               983.00 ms    9.36 us   +3.36%
UnicodeMappings:           1631.65 ms   90.65 us  +42.76%
UnicodePredicates:         1762.10 ms    7.83 us  +15.99%
UnicodeProperties:         1410.80 ms    7.05 us  +19.57%
UnicodeSlicing:            1366.20 ms    7.81 us  +19.23%
------------------------------------------------------------------------
Average round time:       58001.00 ms              +3.30%

*) measured against: /home/lemburg/tmp/pybench-2.0.pyb (rounds=10, warp=20)

The benchmark is available here in case someone wants to verify the results on different platforms:

    http://www.lemburg.com/python/pybench-0.7.zip

The above tests were done on a Linux 2.2 system, AMD K6 233MHz. The figures shown compare CVS Python (2.1a1) against stock Python 2.0.

As you can see, Python function calls have suffered a lot for some reason. Unicode mappings and other Unicode database related methods show the effect of the compression of the Unicode database -- a clear space/speed tradeoff. I can't really explain why Unicode concatenation has had a slowdown -- perhaps the new coercion logic has something to do with this ?!

On the nice side: attribute lookups are faster; probably due to the string key optimizations in the dictionary implementation. Loops and exceptions are also a tad faster.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fredrik at pythonware.com Mon Jan 29 13:30:32 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 29 Jan 2001 13:30:32 +0100
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com>
Message-ID: <01fc01c089ef$48072230$0900a8c0@SPIFF>

mal wrote:
> UnicodeMappings: 1631.65 ms 90.65 us +42.76%
> UnicodePredicates: 1762.10 ms 7.83 us +15.99%
> UnicodeProperties: 1410.80 ms 7.05 us +19.57%
> UnicodeSlicing: 1366.20 ms 7.81 us +19.23%
>
> Unicode mappings and other Unicode database related methods
> show the effect of the compression of the Unicode database -- a
> clear space/speed tradeoff.

umm.
the tests don't seem to test the "\N{name}" escapes, so the only thing that has changed in 2.1 is the "decomposition" method (used in the UnicodeProperties test). are you sure you're comparing against 2.0 final? Cheers /F From mal at lemburg.com Mon Jan 29 13:52:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 13:52:12 +0100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> Message-ID: <3A75677C.E4FA82A0@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > UnicodeMappings: 1631.65 ms 90.65 us +42.76% > > UnicodePredicates: 1762.10 ms 7.83 us +15.99% > > UnicodeProperties: 1410.80 ms 7.05 us +19.57% > > UnicodeSlicing: 1366.20 ms 7.81 us +19.23% > > > > Unicode mappings and other Unicode database related methods > > show the effect of the compression of the Unicode database -- a > > clear space/speed tradeoff. > > umm. the tests don't seem to test the "\N{name}" escapes, so the > only thing that has changed in 2.1 is the "decomposition" method > (used in the UnicodeProperties test). The mappings figure surprised me too: the code has not changed, but the unicodetype_db.h look different. Don't know how this affects performance though. The differences could also be explained by a increase in Unicode object creation time (the concatenation is also a lot slower), so perhaps that's where we should look... > are you sure you're comparing against 2.0 final? Yes... after a check of the Makefile I found that I had compiled Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this makes a difference w/r to inlining of code. I'll recompile and rerun the benchmark. 
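
A minimal sketch of one way to catch this kind of -O2 versus -O3 mismatch without reading the Makefile: ask each interpreter which flags it was built with. This relies on the `sysconfig` module of a modern Python 3, which the 2.0 and 2.1a1 builds compared here did not have; the helper names build_flags and optimization_levels are invented for illustration, and the variables may be None on platforms (such as Windows builds) that do not record them.

```python
import sysconfig


def build_flags():
    """Return the compiler-related build variables this interpreter reports."""
    names = ("CC", "OPT", "CFLAGS")
    return {name: sysconfig.get_config_var(name) for name in names}


def optimization_levels(flags):
    """Collect the distinct -O<level> options found in the flag strings."""
    found = []
    for value in flags.values():
        if not value:
            continue  # variable not recorded for this build
        for token in value.split():
            if token.startswith("-O") and token not in found:
                found.append(token)
    return found


if __name__ == "__main__":
    flags = build_flags()
    for name, value in flags.items():
        print("%s = %r" % (name, value))
    print("optimization options:", optimization_levels(flags))
```

Running the same script under both interpreters being compared shows whether their optimization options differ before any benchmark numbers are read.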
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 29 13:56:49 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 07:56:49 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include Message-ID: [Ping] > dict[key] = 1 > if key in dict: ... > for key in dict: ... [Guido] > No chance of a time-machine escape, but I *can* say that I agree that > Ping's proposal makes a lot of sense. This is a reversal of my > previous opinion on this matter. (Take note -- those don't happen > very often! :-) > > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, Thomas since submitted a patch to do the "if key in dict" part (which I reviewed and accepted, pending resolution of doc issues). It does not do the "for key in dict" part. It's not entirely clear whether you intended to approve that part too (I've simplified away many layers of quoting in the above ). In any case, nobody is working on that part. WRT that part, Ping produced some stats in: http://mail.python.org/pipermail/python-dev/2001-January/012106.html > How often do you write 'dict.has_key(x)'? (std lib says: 206) > How often do you write 'for x in dict.keys()'? (std lib says: 49) > > How often do you write 'x in dict.values()'? (std lib says: 0) > How often do you write 'for x in dict.values()'? (std lib says: 3) However, he did not report on occurrences of for k, v in dict.items() I'm not clear exactly which files he examined in the above, or how the counts were obtained. So I don't know how this compares: I counted 188 instances of the string ".items(" in 122 .py files, under the dist/ portion of current CVS. A number of those were assignment and return stmts, others were dict.items() in an arglist, and at least one was in a comment. 
After weeding those out, I was left with 153 legit "for" loops iterating over x.items(). In all: 153 iterating over x.items() 118 " over x.keys() 17 " over x.values() So I conclude that iterating over .values() is significantly more common than iterating over .keys(). On c.l.py about an hour ago, Thomas complained that two (out of two) of his coworkers guessed wrong about what for x in dict: would do, but didn't say what they *did* think it would do. Since Thomas doesn't work with idiots, I'm guessing they *didn't* guess it would iterate over either values or the lines of a freshly-opened file named "dict" . So if you did intend to approve "for x in dict" iterating over dict.keys(), maybe you want to call me out on that "approval post" I forged under your name. falls-on-swords-so-often-there's-nothing-left-to-puncture-ly y'rs - tim From mal at lemburg.com Mon Jan 29 14:18:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 14:18:52 +0100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> Message-ID: <3A756DBC.8EAC42F5@lemburg.com> "M.-A. Lemburg" wrote: > > Fredrik Lundh wrote: > > > > mal wrote: > > > UnicodeMappings: 1631.65 ms 90.65 us +42.76% > > > UnicodePredicates: 1762.10 ms 7.83 us +15.99% > > > UnicodeProperties: 1410.80 ms 7.05 us +19.57% > > > UnicodeSlicing: 1366.20 ms 7.81 us +19.23% > > > > > > Unicode mappings and other Unicode database related methods > > > show the effect of the compression of the Unicode database -- a > > > clear space/speed tradeoff. > > > > umm. the tests don't seem to test the "\N{name}" escapes, so the > > only thing that has changed in 2.1 is the "decomposition" method > > (used in the UnicodeProperties test). > > The mappings figure surprised me too: the code has not changed, > but the unicodetype_db.h look different. 
Don't know how this
> affects performance though.
>
> The differences could also be explained by a increase in Unicode
> object creation time (the concatenation is also a lot slower),
> so perhaps that's where we should look...
>
> > are you sure you're comparing against 2.0 final?
>
> Yes... after a check of the Makefile I found that I had compiled
> Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this makes
> a difference w/r to inlining of code. I'll recompile and rerun
> the benchmark.

Looks like there is an effect of choosing -O3 over -O2 (even though not necessarily positive all the way); what results do you get on Windows ?

--

PYBENCH 0.7

Benchmark: /home/lemburg/tmp/pybench-2.1a1.pyb (rounds=10, warp=20)

Tests:                        per run  per oper.   diff *
------------------------------------------------------------------------
BuiltinFunctionCalls:      1065.10 ms    8.35 us   +3.93%
BuiltinMethodLookup:       1286.30 ms    2.45 us  +39.12%
ConcatStrings:             1243.30 ms    8.29 us  +15.80%
ConcatUnicode:             1449.10 ms    9.66 us   -5.83%
CreateInstances:           1639.25 ms   39.03 us   +7.95%
CreateStringsWithConcat:   1453.45 ms    7.27 us   +9.73%
CreateUnicodeWithConcat:   1558.45 ms    7.79 us   +8.19%
DictCreation:              1869.35 ms   12.46 us   +8.77%
ForLoops:                  1526.85 ms  152.69 us   -1.03%
IfThenElse:                1381.00 ms    2.05 us  +12.09%
ListSlicing:                547.40 ms  156.40 us   -5.52%
NestedForLoops:             824.50 ms    2.36 us   -1.56%
NormalClassAttribute:      1233.55 ms    2.06 us   +4.96%
NormalInstanceAttribute:   1215.50 ms    2.03 us  +16.37%
PythonFunctionCalls:       1107.30 ms    6.71 us  +10.55%
PythonMethodCalls:         1047.00 ms   13.96 us  +26.34%
Recursion:                  940.35 ms   75.23 us  +23.47%
SecondImport:               894.05 ms   35.76 us   +0.59%
SecondPackageImport:        915.05 ms   36.60 us   +2.14%
SecondSubmoduleImport:     1131.10 ms   45.24 us   +0.56%
SimpleComplexArithmetic:   1652.05 ms    7.51 us   +6.67%
SimpleDictManipulation:    1150.25 ms    3.83 us  +18.72%
SimpleFloatArithmetic:      889.65 ms    1.62 us   -1.52%
SimpleIntFloatArithmetic:   900.80 ms    1.36 us   -0.40%
SimpleIntegerArithmetic:    901.75 ms    1.37 us   -0.72%
SimpleListManipulation:    1125.40 ms    4.17 us   +4.15%
SimpleLongArithmetic:      1305.15 ms    7.91 us   +7.62%
SmallLists:                2102.85 ms    8.25 us   +8.18%
SmallTuples:               1329.55 ms    5.54 us   +4.98%
SpecialClassAttribute:     1234.60 ms    2.06 us  +19.23%
SpecialInstanceAttribute:  1422.55 ms    2.37 us  +11.28%
StringMappings:            1585.55 ms   12.58 us  +11.46%
StringPredicates:          1241.35 ms    4.43 us   -9.69%
StringSlicing:             1206.20 ms    6.89 us  +15.12%
TryExcept:                 1764.35 ms    1.18 us   -0.81%
TryRaiseExcept:            1217.40 ms   81.16 us   +2.77%
TupleSlicing:               933.00 ms    8.89 us   -1.90%
UnicodeMappings:           1137.35 ms   63.19 us   -0.49%
UnicodePredicates:         1632.05 ms    7.25 us   +7.43%
UnicodeProperties:         1244.05 ms    6.22 us   +5.44%
UnicodeSlicing:            1252.10 ms    7.15 us   +9.27%
------------------------------------------------------------------------
Average round time:       58804.00 ms              +4.73%

*) measured against: /home/lemburg/tmp/pybench-2.0.pyb (rounds=10, warp=20)

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Mon Jan 29 14:28:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 14:28:24 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
References:
Message-ID: <3A756FF8.B7185FA2@lemburg.com>

Tim Peters wrote:
>
> [Ping]
> > dict[key] = 1
> > if key in dict: ...
> > for key in dict: ...
>
> [Guido]
> > No chance of a time-machine escape, but I *can* say that I agree that
> > Ping's proposal makes a lot of sense. This is a reversal of my
> > previous opinion on this matter. (Take note -- those don't happen
> > very often! :-)
> >
> > First to submit a working patch gets a free copy of 2.1a2 and
> > subsequent releases,
>
> Thomas since submitted a patch to do the "if key in dict" part (which I
> reviewed and accepted, pending resolution of doc issues).
>
> It does not do the "for key in dict" part.
It's not entirely clear whether > you intended to approve that part too (I've simplified away many layers of > quoting in the above ). In any case, nobody is working on that part. > > WRT that part, Ping produced some stats in: > > http://mail.python.org/pipermail/python-dev/2001-January/012106.html > > > How often do you write 'dict.has_key(x)'? (std lib says: 206) > > How often do you write 'for x in dict.keys()'? (std lib says: 49) > > > > How often do you write 'x in dict.values()'? (std lib says: 0) > > How often do you write 'for x in dict.values()'? (std lib says: 3) > > However, he did not report on occurrences of > > for k, v in dict.items() > > I'm not clear exactly which files he examined in the above, or how the > counts were obtained. So I don't know how this compares: I counted 188 > instances of the string ".items(" in 122 .py files, under the dist/ portion > of current CVS. A number of those were assignment and return stmts, others > were dict.items() in an arglist, and at least one was in a comment. After > weeding those out, I was left with 153 legit "for" loops iterating over > x.items(). In all: > > 153 iterating over x.items() > 118 " over x.keys() > 17 " over x.values() > > So I conclude that iterating over .values() is significantly more common > than iterating over .keys(). > > On c.l.py about an hour ago, Thomas complained that two (out of two) of his > coworkers guessed wrong about what > > for x in dict: > > would do, but didn't say what they *did* think it would do. Since Thomas > doesn't work with idiots, I'm guessing they *didn't* guess it would iterate > over either values or the lines of a freshly-opened file named "dict" > . > > So if you did intend to approve "for x in dict" iterating over dict.keys(), > maybe you want to call me out on that "approval post" I forged under your > name. Dictionaries are not sequences. I wonder what order a user of for k,v in dict: (or whatever other of this proposal you choose) will expect... 
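
On the ordering question, the existing methods already give one fixed point: as long as the dictionary is not modified in between, keys(), values() and items() walk the entries in corresponding order. A minimal sketch of that property, assuming a current Python 3 interpreter rather than 2.1, with an invented sample dict and a hypothetical helper name:

```python
def views_agree(d):
    """True if pairing keys() with values() reproduces items() exactly."""
    return list(zip(d.keys(), d.values())) == list(d.items())


if __name__ == "__main__":
    sample = {"spam": 1, "eggs": 2, "ham": 3}  # invented data
    print(views_agree(sample))
```

This says nothing about which order a new loop syntax ought to use; it only shows that any choice consistent with items() would also be consistent with keys() and values().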
Please also take into account that dictionaries are *mutable* and their internal state is not defined to e.g. not change due to lookups (take the string optimization for example...), so exposing PyDict_Next() in any to Python will cause trouble. In the end, you will need to create a list or tuple to iterate over one way or another, so why bother overloading for-loops w/r to dictionaries ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bckfnn at worldonline.dk Mon Jan 29 14:48:44 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Mon, 29 Jan 2001 13:48:44 GMT Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> References: <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> Message-ID: <3a75747e.17414620@smtp.worldonline.dk> On Mon, 29 Jan 2001 08:26:25 +0200 (IST), you wrote: >I had to change test_opcodes since it tested that functions with the >same code compare equal. Thanks. With this change, Jython too can complete the test_opcodes. In Jython a code object can never compare equal to anything but itself. regards, finn From moshez at zadka.site.co.il Mon Jan 29 15:04:47 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 16:04:47 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <3a75747e.17414620@smtp.worldonline.dk> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> Message-ID: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 13:48:44 GMT, bckfnn at worldonline.dk (Finn Bock) wrote: > Thanks. 
With this change, Jython too can complete the test_opcodes. In > Jython a code object can never compare equal to anything but itself. Great! I'm happy to have helped. I'm starting to wonder what the tests really test: the language definition, or accidents of the implementation? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From MarkH at ActiveState.com Mon Jan 29 15:35:25 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 30 Jan 2001 01:35:25 +1100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A756DBC.8EAC42F5@lemburg.com> Message-ID: "M.-A. Lemburg" wrote: > what results do you get on Windows ? Win2k, dual 800, relatively quiet! Python 2.0 F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.847605 This machine benchmarks at 11798 pystones/second F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.845104 This machine benchmarks at 11832.9 pystones/second F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.846069 This machine benchmarks at 11819.4 pystones/second F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.849447 This machine benchmarks at 11772.4 pystones/second Python from CVS today: F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.885801 This machine benchmarks at 11289.2 pystones/second F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.889048 This machine benchmarks at 11248 pystones/second F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.892422 This machine benchmarks at 11205.5 pystones/second Although I deleted Tim's earlier mail, from memory this is pretty similar in terms of performance lost. 
I'm afraid I have no idea what your benchmarks are or how to build them , but did check that the optimizer is set for "mazimize for speed" (/O2). Other compiler options gave significantly smaller results (no optimizations around 8500, and "optimize for space" (/O1) at around 10000). Other fiddling with the optimizer couldn't get better results than the existing settings. Mark. From guido at digicool.com Mon Jan 29 15:48:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 09:48:22 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 07:56:49 EST." References: Message-ID: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> > [Ping] > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > > [Guido] > > No chance of a time-machine escape, but I *can* say that I agree that > > Ping's proposal makes a lot of sense. This is a reversal of my > > previous opinion on this matter. (Take note -- those don't happen > > very often! :-) > > > > First to submit a working patch gets a free copy of 2.1a2 and > > subsequent releases, > > Thomas since submitted a patch to do the "if key in dict" part (which I > reviewed and accepted, pending resolution of doc issues). > > It does not do the "for key in dict" part. It's not entirely clear whether > you intended to approve that part too (I've simplified away many layers of > quoting in the above ). In any case, nobody is working on that part. > > WRT that part, Ping produced some stats in: > > http://mail.python.org/pipermail/python-dev/2001-January/012106.html > > > How often do you write 'dict.has_key(x)'? (std lib says: 206) > > How often do you write 'for x in dict.keys()'? (std lib says: 49) > > > > How often do you write 'x in dict.values()'? (std lib says: 0) > > How often do you write 'for x in dict.values()'? 
(std lib says: 3) > > However, he did not report on occurrences of > > for k, v in dict.items() > > I'm not clear exactly which files he examined in the above, or how the > counts were obtained. So I don't know how this compares: I counted 188 > instances of the string ".items(" in 122 .py files, under the dist/ portion > of current CVS. A number of those were assignment and return stmts, others > were dict.items() in an arglist, and at least one was in a comment. After > weeding those out, I was left with 153 legit "for" loops iterating over > x.items(). In all: > > 153 iterating over x.items() > 118 " over x.keys() > 17 " over x.values() > > So I conclude that iterating over .values() is significantly more common > than iterating over .keys(). I did a less sophisticated count but come to the same conclusion: iterations over items() are (somewhat) more common than over keys(), and values() are 1-2 orders of magnitude less common. My numbers: $ cd python/src/Lib $ grep 'for .*items():' *.py | wc -l 47 $ grep 'for .*keys():' *.py | wc -l 43 $ grep 'for .*values():' *.py | wc -l 2 > On c.l.py about an hour ago, Thomas complained that two (out of two) of his > coworkers guessed wrong about what > > for x in dict: > > would do, but didn't say what they *did* think it would do. Since Thomas > doesn't work with idiots, I'm guessing they *didn't* guess it would iterate > over either values or the lines of a freshly-opened file named "dict" > . I don't much value to the readability argument: typically, one will write "for key in dict" or "for name in dict" and then it's obvious what is meant. > So if you did intend to approve "for x in dict" iterating over dict.keys(), > maybe you want to call me out on that "approval post" I forged under your > name. But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody has even asked me for a has_item() method). I can live with "x in list" checking the values and "x in dict" checking the keys. 
But I can *not* live with "x in dict" equivalent to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()". I also think that defining "x in dict" but not "for x in dict" will be confusing. So we need to think more. How about:
for key in dict: ... # ... over keys
for key:value in dict: ... # ... over items
This is syntactically unambiguous (a colon is currently illegal in that position). This also suggests:
for index:value in list: ... # ... over zip(range(len(list)), list)
which doesn't strike me as bad or ugly, and would fulfill my brother's dearest wish. (And why didn't we think of this before?) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Mon Jan 29 15:58:16 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 29 Jan 2001 15:58:16 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291448.JAA11473@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:48:22AM -0500 References: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> Message-ID: <20010129155816.T962@xs4all.nl> On Mon, Jan 29, 2001 at 09:48:22AM -0500, Guido van Rossum wrote: > How about: > for key in dict: ... # ... over keys > for key:value in dict: ... # ... over items > This is syntactically unambiguous (a colon is currently illegal in > that position). I won't comment on the syntax right now, I need to look at it for a while first :-) However, what about MAL's point about dict ordering, internally ? Wouldn't FOR_LOOP be forced to generate a list of keys anyway, to avoid skipping keys ? I know currently the dict implementation doesn't do any reordering except during adds/deletes, but there is nothing in the language ref that supports that -- it's an implementation detail. Would we make a future enhancement where (some form of) gc would 'clean up' large dictionaries impossible ? -- Thomas Wouters Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! From guido at digicool.com Mon Jan 29 16:00:38 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:00:38 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 14:28:24 +0100." <3A756FF8.B7185FA2@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> Message-ID: <200101291500.KAA11569@cj20424-a.reston1.va.home.com> > Dictionaries are not sequences. I wonder what order a user of > for k,v in dict: (or whatever other of this proposal you choose) > will expect... The same order that for k,v in dict.items() will yield, of course. > Please also take into account that dictionaries are *mutable* > and their internal state is not defined to e.g. not change due to > lookups (take the string optimization for example...), so exposing > PyDict_Next() in any to Python will cause trouble. In the end, > you will need to create a list or tuple to iterate over one way > or another, so why bother overloading for-loops w/r to dictionaries ? Actually, I was going to propose to play dangerously here: the for k:v in dict: ... syntax I proposed in my previous message should indeed expose PyDict_Next(). It should be a big speed-up, and I'm expecting (though don't have much proof) that most loops over dicts don't mutate the dict. Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 29 16:30:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:30:17 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: Your message of "Mon, 29 Jan 2001 16:04:47 +0200." 
<20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: <200101291530.KAA12037@cj20424-a.reston1.va.home.com> > I'm starting to wonder what the tests really test: the language definition, > or accidents of the implementation? It's good to test conformance to the language definition, but this is also a regression test for the implementation. The "accidents of the implementation" definitely need to be tested. E.g. if we decide that repr(s) uses \n rather than \012 or \x0a, this should be tested too. The language definition gives the implementer a choice here; but once the implementer has made a choice, it's good to have a test that tests that this choice is implemented correctly. Perhaps there should be several parts to the regression test, e.g. language conformance, library conformance, platform-specific features, and implementation conformance? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 29 16:57:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:57:12 -0500 Subject: [Python-Dev] What happened to Setup.local's functionality? In-Reply-To: Your message of "Sun, 28 Jan 2001 15:40:28 CST." <14964.37324.642566.602319@beluga.mojam.com> References: <14964.37324.642566.602319@beluga.mojam.com> Message-ID: <200101291557.KAA12347@cj20424-a.reston1.va.home.com> > I just remembered Modules/Setup.local. I peeked at mine and noticed it had > been zeroed out. I then copied a version of it over from another machine > and reran make a couple times. Makesetup ran but nothing mentioned in > Setup.local got built. > > I don't think 2.1 can be released without providing a way for users to > recover from this change. 
I didn't see anything obvious in setup.py. Am I > missing something? Well, Modules/Setup is still used, so it should be trivial to add Setup.local back too. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Mon Jan 29 10:23:55 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 29 Jan 2001 01:23:55 -0800 Subject: [Python-Dev] What happened to Setup.local's functionality? In-Reply-To: <14964.37324.642566.602319@beluga.mojam.com>; from skip@mojam.com on Sun, Jan 28, 2001 at 03:40:28PM -0600 References: <14964.37324.642566.602319@beluga.mojam.com> Message-ID: <20010129012355.A14763@glacier.fnational.com> On Sun, Jan 28, 2001 at 03:40:28PM -0600, Skip Montanaro wrote: > Makesetup ran but nothing mentioned in Setup.local got built. I believe Setup.local should still work. One possibility is that the modules in Setup.local were marked as shared. Shared modules from Setup* don't get built by default. You have to do "make oldsharedmods". I'm not sure why oldsharedmods is not included in the all target. Andrew, can you think of any reason why it shouldn't be added? Neil From dgoodger at atsautomation.com Mon Jan 29 17:19:12 2001 From: dgoodger at atsautomation.com (Goodger, David) Date: Mon, 29 Jan 2001 11:19:12 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: Marc-Andre Lemburg's patch to posixpath.py clears up the path problem. Thanks! MACHDEP is qnxJ for QNX 4.25, qnxG for QNX 4.23. I don't know what it is for QNX 6 (Neutrino). Perhaps test for MACHDEP[:3]=='qnx'?
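The prefix test suggested here can be sketched as follows. This is only an illustration: `is_qnx` is a hypothetical helper name, and the sample values are the two MACHDEP strings reported above plus one made-up non-QNX value.

```python
def is_qnx(machdep):
    """Return True for any MACHDEP string that starts with 'qnx'.

    Hypothetical helper: compares only the first three characters, so the
    trailing release letter ('J' for QNX 4.25, 'G' for QNX 4.23) is ignored.
    """
    return machdep[:3] == 'qnx'
```

With this check, `is_qnx('qnxJ')` and `is_qnx('qnxG')` are both true, so a new release letter would presumably not need a new special case.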
I'm still stuck at 'python setup.py build':
unable to execute ld: no such file or directory
running build
running build_ext
building 'struct' extension
skipping //5/tmp/py/Python-2.1a1/Modules/structmodule.c (build/temp.qnx-J-PCI-2.1/structmodule.o up-to-date)
ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o build/lib.qnx-J-PCI-2.1/struct.so
error: command 'ld' failed with exit status 1
make: *** [sharedmods] Error 1
Armin Steinhoff said "QNX4 doesn't support dynamic loading". Is this compatible with distutils? If not, is there a workaround? Neil Schemenauer asked, "what should LDSHARED say for QNX?". I don't know. Python 2.0 compiled OK, and its makefile says LDSHARED=ld. However, Modules/Setup has no uncommented "*shared*" line. Those of us who rely on Python to get our work done, and who don't have the bandwidth for the implementation complexities, owe a lot to everyone who makes it possible to compile Python out-of-the-box. Very much appreciated. Thank you! David Goodger Systems Administrator & Programmer, Advanced Systems Automation Tooling Systems Inc., Automation Systems Division direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695 e-mail: dgoodger at atsautomation.com From nas at arctrix.com Mon Jan 29 10:40:07 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 29 Jan 2001 01:40:07 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Mon, Jan 29, 2001 at 11:19:12AM -0500 References: Message-ID: <20010129014007.C14763@glacier.fnational.com> On Mon, Jan 29, 2001 at 11:19:12AM -0500, Goodger, David wrote: > I'm still stuck at 'python setup.py build': ... > Armin Steinhoff said "QNX4 doesn't support dynamic loading". Is this > compatible with distutils? If not, is there a workaround? The setup.py script only builds shared modules. You're going to have to enable modules using the old Setup file.
I think Setup.dist should go back to including all the modules (commented out of course). This would make it easier for people who can't or don't want to build shared modules. Neil From akuchlin at cnri.reston.va.us Mon Jan 29 17:50:31 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 29 Jan 2001 11:50:31 -0500 Subject: [Python-Dev] What happened to Setup.local's functionality? In-Reply-To: <20010129012355.A14763@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 29, 2001 at 01:23:55AM -0800 References: <14964.37324.642566.602319@beluga.mojam.com> <20010129012355.A14763@glacier.fnational.com> Message-ID: <20010129115031.B4018@amarok.cnri.reston.va.us> On Mon, Jan 29, 2001 at 01:23:55AM -0800, Neil Schemenauer wrote: >from Setup* don't get build by default. You have to do "make >oldsharedmods". I'm not sure why oldsharedmods is not included >in the all target. Andrew, can you think of any reason why it >shouldn't be added. That's an excellent idea, particularly if we add back Setup.dist, too, and comment out all but the required modules. I'll try to do that today. Note that I'm leaving on vacation tomorrow, and will be back next Monday. Everyone, feel free to check in changes to setup.py that are required. --amk From jeremy at alum.mit.edu Mon Jan 29 17:48:11 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 11:48:11 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A75677C.E4FA82A0@lemburg.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> Message-ID: <14965.40651.233438.311104@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: MAL> Yes... after a check of the Makefile I found that I had MAL> compiled Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this MAL> makes a difference w/r to inlining of code. I'll recompile and MAL> rerun the benchmark.
When I was working in the CALL_FUNCTION revision, I compared 2.0 final with my development working using -O3. At that time, I saw no significant performance difference between the two. And I did notice a difference between -O2 and -O3. The strange thing is that I notice a difference between -O2 and -O3 with 2.1a1, but in the opposite direction. On pystone, python -O2 runs consistently faster than -O3; the difference is .05 sec on my machine. Jeremy From esr at thyrsus.com Mon Jan 29 18:12:05 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 29 Jan 2001 12:12:05 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.40651.233438.311104@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 29, 2001 at 11:48:11AM -0500 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> Message-ID: <20010129121205.A8337@thyrsus.com> Jeremy Hylton : > The strange thing is that I notice a difference between -O2 and -O3 > with 2.1a1, but in the opposite direction. On pystone, python -O2 > runs consistently faster than -O3; the difference is .05 sec on my > machine. Bizarre. Make me wonder if we have a C compiler problem. -- Eric S. Raymond In every country and in every age, the priest has been hostile to liberty. He is always in alliance with the despot, abetting his abuses in return for protection to his own. 
-- Thomas Jefferson, 1814 From jeremy at alum.mit.edu Mon Jan 29 18:27:08 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 12:27:08 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <20010129121205.A8337@thyrsus.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> <20010129121205.A8337@thyrsus.com> Message-ID: <14965.42988.362288.154254@localhost.localdomain> >>>>> "ESR" == Eric S Raymond writes: ESR> Jeremy Hylton : >> The strange thing is that I notice a difference between -O2 and >> -O3 with 2.1a1, but in the opposite direction. On pystone, >> python -O2 runs consistently faster than -O3; the difference is >> .05 sec on my machine. ESR> Bizarre. Make me wonder if we have a C compiler problem. Depends on your definition of "compiler problem". If you mean, it compiles our code so it runs slower, then, yes, we've got one :-). One of the differences between -O2 and -O3, according to the man page, is that -O3 will perform optimizations that involve a space-speed tradeoff. It also includes -finline-functions. I can imagine that some of these optimizations hurt memory performance enough to make a difference.
not-really-understanding-but-not-really-expecting-too-ly y'rs, Jeremy From jeremy at alum.mit.edu Mon Jan 29 18:39:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 12:39:05 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.40651.233438.311104@localhost.localdomain> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> Message-ID: <14965.43705.367236.994786@localhost.localdomain> The recursion test in pybench is testing the performance of the nested scopes changes, which must do some extra bookkeeping to reference the recursive function in a nested scope. To some extent, a performance hit is a necessary consequence for nested functions with free variables. Nonetheless, there are two interesting things to say about this situation. First, there is a bug in the current implementation of nested scopes that the benchmark tickles. The problem is with code like this: def outer(): global f def f(x): if x > 0: return f(x - 1) The compiler determines that f is free in f. (It's recursive.) If f is free in f, in the absence of the global decl, the body of outer must allocate fresh storage (a cell) for f each time outer is called and add a reference to that cell to f's closure. If f is declared global in outer, then it ought to be treated as a global in nested scopes, too. In general terms, a free variable should use the binding found in the nearest enclosing scope. If the nearest enclosing scope has a global binding, then the reference is global. If I fix this problem, the recursion benchmark shouldn't be any slower than a normal function call. The second interesting thing to say is that frame allocation and dealloc is probably more expensive than it needs to be in the current implementation. 
The frame object has a new f_closure slot that holds a tuple that is freshly allocated every time the frame is allocated. (Unless the closure is empty, then f_closure is just NULL.) The extra tuple allocation can probably be done away with by using the same allocation strategy as locals & stack. If the f_localsplus array holds cells + frees + locals + stack, then a new frame will never require more than a single malloc (and often not even that). Jeremy From akuchlin at cnri.reston.va.us Mon Jan 29 18:54:37 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 29 Jan 2001 12:54:37 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.42988.362288.154254@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 29, 2001 at 12:27:08PM -0500 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> <20010129121205.A8337@thyrsus.com> <14965.42988.362288.154254@localhost.localdomain> Message-ID: <20010129125437.E4018@amarok.cnri.reston.va.us> On Mon, Jan 29, 2001 at 12:27:08PM -0500, Jeremy Hylton wrote: >Depends on your defintion of "compiler problem" . If you mean, >it compiles our code so it runs slower, then, yes, we've got one :-). 
Compiling with gcc and -g, with no optimization, 2.0 and 2.1cvs seem to be very close, with 2.1 slightly slower:

2.0:
Pystone(1.1) time for 10000 passes = 1.04
This machine benchmarks at 9615.38 pystones/second
This machine benchmarks at 9345.79 pystones/second
This machine benchmarks at 9433.96 pystones/second
This machine benchmarks at 9433.96 pystones/second
This machine benchmarks at 9523.81 pystones/second

2.1cvs:
Pystone(1.1) time for 10000 passes = 1.09
This machine benchmarks at 9174.31 pystones/second
This machine benchmarks at 9090.91 pystones/second
This machine benchmarks at 9259.26 pystones/second
This machine benchmarks at 9174.31 pystones/second
This machine benchmarks at 9090.91 pystones/second

Would it be worth experimenting with platform-specific compiler options to try to squeeze out the last bit of performance (can wait for the betas, probably)? --amk From jeremy at alum.mit.edu Mon Jan 29 19:04:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 13:04:28 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A756DBC.8EAC42F5@lemburg.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <3A756DBC.8EAC42F5@lemburg.com> Message-ID: <14965.45228.197778.579989@localhost.localdomain> I hope another set of benchmarks isn't overkill for the list. I see different results comparing 2.1 with 2.0 (both -O3) using pybench 0.6. The interesting differences I see in this benchmark that I didn't see in MAL's are:

DictCreation +15.87%
SecondImport +20.29%

Other curious differences, which show up in both benchmarks, include:

SpecialClassAttribute +17.91% (private variables)
SpecialInstanceAttribute +15.34% (__methods__)

Jeremy

PYBENCH 0.6
Benchmark: py21 (rounds=10, warp=20)

Tests:                              per run    per oper.    diff *
------------------------------------------------------------------------
     BuiltinFunctionCalls:        305.05 ms     2.39 us    +4.77%
      BuiltinMethodLookup:        319.65 ms     0.61 us    +2.55%
            ConcatStrings:        383.70 ms     2.56 us    +1.27%
          CreateInstances:        463.85 ms    11.04 us    +1.96%
  CreateStringsWithConcat:        381.20 ms     1.91 us    +2.39%
             DictCreation:        508.85 ms     3.39 us   +15.87%
                 ForLoops:        577.60 ms    57.76 us    +5.65%
               IfThenElse:        443.70 ms     0.66 us    +1.02%
              ListSlicing:        207.50 ms    59.29 us    -4.18%
           NestedForLoops:        315.75 ms     0.90 us    +3.54%
     NormalClassAttribute:        379.80 ms     0.63 us    +7.39%
  NormalInstanceAttribute:        385.45 ms     0.64 us    +8.04%
      PythonFunctionCalls:        400.00 ms     2.42 us   +13.62%
        PythonMethodCalls:        306.25 ms     4.08 us    +5.13%
                Recursion:        337.25 ms    26.98 us   +19.00%
             SecondImport:        301.20 ms    12.05 us   +20.29%
      SecondPackageImport:        298.20 ms    11.93 us   +18.15%
    SecondSubmoduleImport:        339.15 ms    13.57 us   +11.40%
  SimpleComplexArithmetic:        392.70 ms     1.79 us   -10.52%
   SimpleDictManipulation:        350.40 ms     1.17 us    +3.87%
    SimpleFloatArithmetic:        300.75 ms     0.55 us    +2.04%
 SimpleIntFloatArithmetic:        347.95 ms     0.53 us    +9.01%
  SimpleIntegerArithmetic:        356.40 ms     0.54 us   +12.01%
   SimpleListManipulation:        351.85 ms     1.30 us   +11.33%
     SimpleLongArithmetic:        309.00 ms     1.87 us    -5.81%
               SmallLists:        584.25 ms     2.29 us   +10.20%
              SmallTuples:        442.00 ms     1.84 us   +10.33%
    SpecialClassAttribute:        406.50 ms     0.68 us   +17.91%
 SpecialInstanceAttribute:        557.40 ms     0.93 us   +15.34%
            StringSlicing:        336.45 ms     1.92 us    +9.56%
                TryExcept:        650.60 ms     0.43 us    +1.40%
           TryRaiseExcept:        345.95 ms    23.06 us    +2.70%
             TupleSlicing:        266.35 ms     2.54 us    +4.70%
------------------------------------------------------------------------
       Average round time:      14413.00 ms                +7.07%

*) measured against: py20 (rounds=10, warp=20)

From skip at mojam.com Mon Jan 29 19:07:26 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 29 Jan 2001 12:07:26 -0600 (CST) Subject: [Python-Dev] What happened to Setup.local's functionality?
In-Reply-To: <20010129012355.A14763@glacier.fnational.com> References: <14964.37324.642566.602319@beluga.mojam.com> <20010129012355.A14763@glacier.fnational.com> Message-ID: <14965.45406.933528.53857@beluga.mojam.com> Neil> You have to do "make oldsharedmods". This did the trick. This should be emblazoned in big red letters somewhere if the decision is made to not include oldsharedmods as a dependency for the all target. Thx, Skip From gvwilson at ca.baltimore.com Mon Jan 29 19:19:21 2001 From: gvwilson at ca.baltimore.com (Greg Wilson) Date: Mon, 29 Jan 2001 13:19:21 -0500 Subject: [Python-Dev] Re: Re: Sets: elt in dict, lst.include In-Reply-To: <20010129162012.32158ED49@mail.python.org> Message-ID: <001501c08a20$00dca2a0$770a0a0a@nevex.com> > > > [Ping] > > > dict[key] = 1 > > > if key in dict: ... > > > for key in dict: ... > "Tim Peters" > "if (k, v) in dict" is clearly useless... > I can live with "x in list" checking the values and "x in dict" > checking the keys. But I can *not* live with "x in dict" equivalent > to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()". > I also think that defining "x in dict" but not "for x in dict" will be > confusing. [Greg] Quick poll (four people): if the expression "if a in b" works, then all four expected "for a in b" to work as well. This is also my intuition; are there any exceptions in really existing Python? > [Guido] > for key in dict: ... # ... over keys > for key:value in dict: ... # ... over items [Greg] I'm probably revealing my ignorance of Python's internals here, but can the iteration protocol be extended so that the object (in this case, the dict) is told the number and type(s) of the values the loop is expecting? With: for key in dict: ... the dict would be asked for one value; with: for (key, value) in dict: the dict would be told that a two-element tuple was expected, and so on. This would allow multi-dimensional structures (e.g. 
NumPy arrays) to do things like: for (i, j, k) in array: # please give me three indices and: for ((i, j, k), v) in array: # three indices and value > [Guido] > for index:value in list: ... # ... over zip(range(len(list)), list) How do you feel about: for i in seq.keys(): # strings, tuples, etc. "keys()" is kind of strange ("indices" or something would be more natural), *but* this allows uniform iteration over all built-in collections:
def showem(c):
    for i in c.keys():
        print i, c[i]
Greg From bckfnn at worldonline.dk Mon Jan 29 19:31:48 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Mon, 29 Jan 2001 18:31:48 GMT Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: <3a75aba9.31537178@smtp.worldonline.dk> On Mon, 29 Jan 2001 16:04:47 +0200 (IST), you wrote: >On Mon, 29 Jan 2001 13:48:44 GMT, bckfnn at worldonline.dk (Finn Bock) wrote: > >> Thanks. With this change, Jython too can complete the test_opcodes. In >> Jython a code object can never compare equal to anything but itself. > >Great! I'm happy to have helped. >I'm starting to wonder what the tests really test: the language definition, >or accidents of the implementation? Based on the amount of code in test_opcodes dedicated to code comparison, I doubt this particular situation was an accident. The problems I have had with the test suite are better described as accidents of the tests themselves. From test_extcall: We expected (repr): "g() got multiple values for keyword argument 'b'" But instead we got: "g() got multiple values for keyword argument 'a'" This is caused by a difference in iteration over a dictionary.
Or from test_import: test test_import crashed -- java.lang.ClassFormatError: java.lang.ClassFormatError: @test$py (Illegal Class name "@test$py") where '@' isn't allowed in java classnames. These are failures that have very little to do with the thing the test are about and nothing at all to do with the language definition. regards, finn From cgw at alum.mit.edu Mon Jan 29 19:35:58 2001 From: cgw at alum.mit.edu (Charles G Waldman) Date: Mon, 29 Jan 2001 12:35:58 -0600 (CST) Subject: [Python-Dev] Re: Re: Sets: elt in dict, lst.include In-Reply-To: <001501c08a20$00dca2a0$770a0a0a@nevex.com> References: <20010129162012.32158ED49@mail.python.org> <001501c08a20$00dca2a0$770a0a0a@nevex.com> Message-ID: <14965.47118.135246.700571@sirius.net.home> Greg Wilson writes: > This would allow multi-dimensional structures > (e.g. NumPy arrays) to do things like: > > for (i, j, k) in array: # please give me three indices > > and: > > for ((i, j, k), v) in array: # three indices and value And what if I had, for example, a 3-dimensional array where the values are 3-tuples? Would "for (i,j,k) in array" refer to the indices or the values? From mal at lemburg.com Mon Jan 29 20:03:41 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 20:03:41 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: <3A75BE8D.1B7673EE@lemburg.com> With all this confusion about how to actually write the iteration on dictionary items, wouldn't it make more sense to implement an extension module which then provides a __getitem__ style iterator for dictionaries by interfacing to PyDict_Next() ? The module could have three different iterators: 1. iterate over items 2. ... over keys 3. ... over values The reasoning behind this is that the __getitem__ interface is well established and this doesn't introduce any new syntax while still providing speed and flexibility. 
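A rough pure-Python sketch of what such a __getitem__ style iterator might look like. The class name `DictIter` and its `mode` argument are hypothetical, and this is not the proposed extension module: a C implementation would walk the table with PyDict_Next(), whereas this stand-in simply borrows the dict's own iterator to show the protocol.

```python
class DictIter(object):
    """Forward-only, __getitem__-style iterator over a dict (illustrative).

    mode selects what is produced: "items", "keys" or "values".
    A for-loop calls __getitem__ with 0, 1, 2, ... and stops at IndexError.
    """

    def __init__(self, d, mode="items"):
        if mode not in ("items", "keys", "values"):
            raise ValueError("unknown mode: %r" % (mode,))
        # Stand-in for PyDict_Next(): reuse the dict's own iteration.
        self._it = iter(getattr(d, mode)())
        self._next_index = 0

    def __getitem__(self, index):
        # Only sequential access is supported, as in a plain for-loop.
        if index != self._next_index:
            raise IndexError("DictIter only supports sequential access")
        self._next_index += 1
        try:
            return next(self._it)
        except StopIteration:
            # Signal the end of the loop in the __getitem__ protocol.
            raise IndexError(index)
```

Usage would be along the lines of `for k, v in DictIter(d): ...` or `for k in DictIter(d, "keys"): ...`, with no new syntax involved. Mutating the dict during the loop is not handled here.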
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 29 19:08:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 19:08:16 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: <3A75B190.3FD2A883@lemburg.com> Guido van Rossum wrote: > > > Dictionaries are not sequences. I wonder what order a user of > > for k,v in dict: (or whatever other of this proposal you choose) > > will expect... > > The same order that for k,v in dict.items() will yield, of course. And then people find out that the order has some sorting properties and start to use it... "how to sort a dictionary?" comes up again, every now and then. > > Please also take into account that dictionaries are *mutable* > > and their internal state is not defined to e.g. not change due to > > lookups (take the string optimization for example...), so exposing > > PyDict_Next() in any to Python will cause trouble. In the end, > > you will need to create a list or tuple to iterate over one way > > or another, so why bother overloading for-loops w/r to dictionaries ? > > Actually, I was going to propose to play dangerously here: the > > for k:v in dict: ... > > syntax I proposed in my previous message should indeed expose > PyDict_Next(). It should be a big speed-up, and I'm expecting (though > don't have much proof) that most loops over dicts don't mutate the > dict. > > Maybe we could add a flag to the dict that issues an error when a new > key is inserted during such a for loop? (I don't think the key order > can be affected when a key is *deleted*.) You mean: mark it read-only ? That would be a "nice to have" property for a lot of mutable types indeed -- sort of like low-level locks. 
This would be another candidate for an object flag (much like the one Fred wants to introduce for weak referenced objects). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Mon Jan 29 20:22:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 14:22:07 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 19:08:16 +0100." <3A75B190.3FD2A883@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> Message-ID: <200101291922.OAA13321@cj20424-a.reston1.va.home.com> > > > Dictionaries are not sequences. I wonder what order a user of > > > for k,v in dict: (or whatever other of this proposal you choose) > > > will expect... > > > > The same order that for k,v in dict.items() will yield, of course. > > And then people find out that the order has some sorting > properties and start to use it... "how to sort a dictionary?" > comes up again, every now and then. I don't understand why you bring this up. We're not revealing anything new here, the random order of dict items has always been part of the language. The answer to "how to sort a dict" should be "copy it into a list and sort that." Or am I missing something? > > > Please also take into account that dictionaries are *mutable* > > > and their internal state is not defined to e.g. not change due to > > > lookups (take the string optimization for example...), so exposing > > > PyDict_Next() in any to Python will cause trouble. In the end, > > > you will need to create a list or tuple to iterate over one way > > > or another, so why bother overloading for-loops w/r to dictionaries ? 
> > > > Actually, I was going to propose to play dangerously here: the > > > > for k:v in dict: ... > > > > syntax I proposed in my previous message should indeed expose > > PyDict_Next(). It should be a big speed-up, and I'm expecting (though > > don't have much proof) that most loops over dicts don't mutate the > > dict. > > > > Maybe we could add a flag to the dict that issues an error when a new > > key is inserted during such a for loop? (I don't think the key order > > can be affected when a key is *deleted*.) > > You mean: mark it read-only ? That would be a "nice to have" > property for a lot of mutable types indeed -- sort of like > low-level locks. This would be another candidate for an object flag > (much like the one Fred wants to introduce for weak referenced > objects). Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From gvwilson at ca.baltimore.com Mon Jan 29 20:38:50 2001 From: gvwilson at ca.baltimore.com (Greg Wilson) Date: Mon, 29 Jan 2001 14:38:50 -0500 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1124 - 13 msgs In-Reply-To: <20010129193101.7BF83EF62@mail.python.org> Message-ID: <001a01c08a2b$1ba5a040$770a0a0a@nevex.com> > Greg Wilson writes: > > This would allow multi-dimensional structures > > (e.g. NumPy arrays) to do things like: > > for (i, j, k) in array: > > and: > > for ((i, j, k), v) in array: # three indices and value > Charles Waldman asks: > And what if I had, for example, a 3-dimensional array where the values > are 3-tuples? Would "for (i,j,k) in array" refer to the > indices or the values? Greg Wilson writes: That would be up to the module's implementer --- my idea was to have the 'for' loop provide more information to the object being iterated over, so that it could "do the right thing" (just as objects do right now with "x[i]"). Greg From mal at lemburg.com Mon Jan 29 20:45:46 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Mon, 29 Jan 2001 20:45:46 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> Message-ID: <3A75C86A.3A4236E8@lemburg.com> Guido van Rossum wrote: > > > > > Dictionaries are not sequences. I wonder what order a user of > > > > for k,v in dict: (or whatever other of this proposal you choose) > > > > will expect... > > > > > > The same order that for k,v in dict.items() will yield, of course. > > > > And then people find out that the order has some sorting > > properties and start to use it... "how to sort a dictionary?" > > comes up again, every now and then. > > I don't understand why you bring this up. We're not revealing > anything new here, the random order of dict items has always been part > of the language. The answer to "how to sort a dict" should be "copy > it into a list and sort that." > > Or am I missing something? I just wanted to hint at a problem which iterating over items in an unordered set can cause. Especially new Python users will find it confusing that the order of the items in an iteration can change from one run to the next. Not much of an argument, but I like explicit programming more than magic under the cover. What we really want is iterators for dictionaries, so why not implement these instead of tweaking for-loops. If you are looking for speedups w/r to for-loops, applying a different indexing technique in for-loops would go a lot further and provide better performance not only to dictionary loops, but also to other sequences. I have made some good experience with a special counter object (sort of like a mutable integer) which is used instead of the iteration index integer in the current implementation. 
Using an iterator object instead of the integer + __getitem__ call machinery would allow more flexibility for all kinds of sequences or containers. There could be an iterator type for dictionaries, one for generic __getitem__ style sequences, one for lists and tuples, etc. All of these could include special logic to get the most out of the targetted datatype. Well, just a thought... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Mon Jan 29 21:02:47 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 29 Jan 2001 15:02:47 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291922.OAA13321@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 02:22:07PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> Message-ID: <20010129150247.B10191@thyrsus.com> Guido van Rossum : > > > Maybe we could add a flag to the dict that issues an error when a new > > > key is inserted during such a for loop? (I don't think the key order > > > can be affected when a key is *deleted*.) > > > > You mean: mark it read-only ? That would be a "nice to have" > > property for a lot of mutable types indeed -- sort of like > > low-level locks. This would be another candidate for an object flag > > (much like the one Fred wants to introduce for weak referenced > > objects). > > Yes. For different reasons, I'd like to be able to set a constant flag on a object instance. Simple semantics: if you try to assign to a member or method, it throws an exception. Application? I have a large Python program that goes to a lot of effort to build elaborate context structures in core. 
It would be nice to know they can't be even inadvertently trashed without throwing an exception I can watch for. -- Eric S. Raymond No one is bound to obey an unconstitutional law and no courts are bound to enforce it. -- 16 Am. Jur. Sec. 177 late 2d, Sec 256 From esr at thyrsus.com Mon Jan 29 21:09:14 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 29 Jan 2001 15:09:14 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A75C86A.3A4236E8@lemburg.com>; from mal@lemburg.com on Mon, Jan 29, 2001 at 08:45:46PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> Message-ID: <20010129150914.C10191@thyrsus.com> M.-A. Lemburg : > If you are looking for speedups w/r to for-loops, applying a > different indexing technique in for-loops would go a lot further > and provide better performance not only to dictionary loops, > but also to other sequences. Which reminds me... There's not much I miss from C these days, but one thing I wish Python had is a more general for-loop. The C semantics that let you have any initialization, any termination test, and any iteration you like are rather cool. Yes, I realize that for (; ; ) {} can be simulated with: while 1: if : break Still, having them spatially grouped the way a C for does it is nice. Makes it easier to see invariants, I think. -- Eric S. Raymond "Rightful liberty is unobstructed action, according to our will, within limits drawn around us by the equal rights of others." -- Thomas Jefferson From moshez at zadka.site.co.il Mon Jan 29 21:29:53 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 22:29:53 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? 
In-Reply-To: <200101291530.KAA12037@cj20424-a.reston1.va.home.com> References: <200101291530.KAA12037@cj20424-a.reston1.va.home.com>, <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: <20010129202953.D1498A840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 10:30:17 -0500, Guido van Rossum wrote: > It's good to test conformance to the language definition, but this is > also a regression test for the implementation. The "accidents of the > implementation" definitely need to be tested. E.g. if we decide that > repr(s) uses \n rather than \012 or \x0a, this should be tested too. > The language definition gives the implementer a choice here; but once > the implementer has made a choice, it's good to have a test that tests > that this choice is implemented correctly. I agree. > Perhaps there should be several parts to the regression test, > e.g. language conformance, library conformance, platform-specific > features, and implementation conformance? This sounds like a good idea...probably for the 2.2 timeline. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From tim.one at home.com Mon Jan 29 22:51:56 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 16:51:56 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: [Moshe Zadka] > ... > I'm starting to wonder what the tests really test: the language > definition, or accidents of the implementation? You'd be amazed (appalled?) at how hard it is to separate them. 
In two previous lives as a Big Iron compiler hacker, we routinely had to get
our compilers validated by a govt agency before any US govt account would be
allowed to buy our stuff; e.g.,

    http://www.itl.nist.gov/div897/ctg/vpl/language.htm

This usually *started* as a two-day process, flying the inspector to our
headquarters, taking perhaps 2 minutes of machine time to run the test suite,
then sitting around that day and into the next arguing about whether the
"failures" were due to non-standard assumptions in the tests, or compiler
bugs. It was almost always the former, but sometimes that didn't get fully
resolved for months (if the inspector was being particularly troublesome, it
could require getting an Official Interpretation from the relevant stds
body -- not swift!).

(BTW, this is one reason huge customers are often very reluctant to move to a
new release: the validation process can be very expensive and drag on for
months)

>>> def f():
...     global g
...     g += 1
...     return g
...
>>> g = 0
>>> d = {f(): f()}
>>> d
{2: 1}
>>>

The Python Lang Ref doesn't really say whether {2: 1} or {1: 2} "should be"
the result, nor does it say it's implementation-defined. If you *asked* Guido
what he thought it should do, he'd probably say {1: 2} (not much of a guess:
I asked him in the past, and that's what he did say).

Something "like that" can show up in the test suite, but buried under layers
of obfuscating accidents. Nobody is likely to realize it in the absence of a
failure motivating people to search for it. Which is a trap: sometimes ours
was the only compiler (of dozens and dozens) that had *ever* "failed" a
particular test. This was most often the case at Cray Research, which had
bizarre (but exceedingly fast -- which is what Cray's customers valued most)
floating-point arithmetic. I recall one test in particular that failed
because Cray's was the only box on earth that set I to 1 in

    INTEGER I
    I = 6.0/3.0

Fortran doesn't define that the result must be 2.
But-- you guessed it --neither does Python. Cute: at KSR, INT(6.0/3.0) did return 2 -- but INT(98./49.) did not . then-again-the-python-test-suite-is-still-shallow-ly y'rs - tim From hughett at mercur.uphs.upenn.edu Mon Jan 29 23:05:22 2001 From: hughett at mercur.uphs.upenn.edu (Paul Hughett) Date: Mon, 29 Jan 2001 17:05:22 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: References: Message-ID: <200101292205.RAA18790@mercur.uphs.upenn.edu> tim says: > Cray's was the only box on earth that set I to 1 in > INTEGER I > I = 6.0/3.0 > Fortran doesn't define that the result must be 2. But-- you guessed > it --neither does Python. I would _guess_ that the IEEE 754 floating point standard does require that, but I haven't actually gotten my hands on a copy of the standard yet. If it doesn't, I may have to stop writing code that depends on the assumption that floating point computation is exact for exactly representable integers. If so, then we're reasonably safe; there aren't many non-IEEE machines left these days. Un-lurking-ly yours, Paul Hughett From tim.one at home.com Mon Jan 29 23:53:43 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 17:53:43 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <200101292205.RAA18790@mercur.uphs.upenn.edu> Message-ID: [Paul Hughett] > I would _guess_ that the IEEE 754 floating point standard does require > that [6./3. == 2.], It does, but 754 is silent on how languages may or may not *bind* to its semantics. The C99 std finally addresses that (15 years after 754), and Java does too (albeit in a way Kahan despises), but that's about it for "name brand" languages. > ... > If it doesn't, I may have to stop writing code that depends on > the assumption that floating point computation is exact for exactly > representable integers. If so, then we're reasonably safe; there > aren't many non-IEEE machines left these days. 
I'm afraid you've got no guarantees even on a box with 100% conforming 754
hardware. One of the last "mystery bugs" I helped track down at my previous
employer only showed up under Intel's C++ compiler. It turned out the
compiler was looking for code of the form:

    double *a, *b, scale;
    for (i=0; i < n; ++i) {
        a[i] = b[i] / scale;
    }

and rewriting it as:

    double __temp = 1./scale;
    for (i=0; i < n; ++i) {
        a[i] = b[i] * __temp;
    }

for speed. As time goes on, PC compilers are becoming more and more like
Cray's and KSR's in this respect: float division is much more expensive than
float mult, and so variations of "so multiply by the reciprocal instead" are
hard for vendors to resist. And, e.g., under 754 double rules,

    (17. * 123.) * (1./123.)

must *not* yield exactly 17.0 if done wholly in 754 double (but then 754 says
nothing about how any language maps that string to 754 operations).

if-you-like-logic-chopping-you'll-love-arguing-stds-ly y'rs - tim

From guido at digicool.com Tue Jan 30 00:59:34 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 18:59:34 -0500
Subject: [Python-Dev] Does autoconfig detect INSTALL incorrectly?
In-Reply-To: Your message of "Tue, 23 Jan 2001 00:30:56 PST." <20010123003056.A28309@glacier.fnational.com>
References: <20010123003056.A28309@glacier.fnational.com>
Message-ID: <200101292359.SAA20364@cj20424-a.reston1.va.home.com>

> Why is the configure.in file set to always use "install-sh"?
> There is a comment that says:
>
>     # Install just never works :-(
>
> I don't think that statement is accurate. /usr/bin/install works
> quite well on my machine. The only comments I can find in the
> changelog are:
>
>     revision 1.16
>     date: 1995/01/20 14:12:16; author: guido; state: Exp; lines: +27 -2
>     add INSTALL_PROGRAM and INSTALL_DATA; check for getopt
>
> and:
>
>     revision 1.5
>     date: 1994/08/19 15:33:51; author: guido; state: Exp; lines: +14 -6
>     Simplify value of INSTALL (always 'cp').
> > Is there any reason why the autoconf macro AC_PROG_INSTALL is not used? The > documentation seems to indicate that is does what we want. Neil, It's too long for me to remember, and I bet this was before AC_PROG_INSTALL. If there's a reason to prefer a working "install" over install-sh, feel free to do the right thing! (You're in charge of the Makefile anyway now, it seems. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Jan 30 01:17:25 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 29 Jan 2001 18:17:25 -0600 (CST) Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP Message-ID: <14966.2069.950895.627663@beluga.mojam.com> After reading through this thread and noticing (but not paying close attention to) all the related posts on c.l.py (subject: "in for dicts"), it seems to me that the whole "if/for something in dict" thing needds to be hashed out in a PEP. There were a fair amount of "Python's changing too fast" rants when 2.0 was released. Adding a major feature such as this at the 2.1 stage is only going to generate that many more rants. The fact that it was easy for Thomas to implement "if key in dict" doesn't make the overall concept less controversial. There are apparently lots of varying opinions about what's reasonable. This topic seems related to PEP 212 (Loop Counter Iteration) and PEP 218 (Adding a Built-In Set Object Type), but may well warrant its own. That said, I have plenty enough on my plate trying to keep Mojam afloat these days, so I can't step into the crevass, just observe that it looks to me like a very long ways to the bottom... ;-) Skip From guido at digicool.com Tue Jan 30 01:22:58 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 19:22:58 -0500 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: Your message of "Mon, 29 Jan 2001 18:17:25 CST." 
<14966.2069.950895.627663@beluga.mojam.com> References: <14966.2069.950895.627663@beluga.mojam.com> Message-ID: <200101300022.TAA21244@cj20424-a.reston1.va.home.com> > After reading through this thread and noticing (but not paying close > attention to) all the related posts on c.l.py (subject: "in for dicts"), it > seems to me that the whole "if/for something in dict" thing needds to be > hashed out in a PEP. There were a fair amount of "Python's changing too > fast" rants when 2.0 was released. Adding a major feature such as this at > the 2.1 stage is only going to generate that many more rants. The fact that > it was easy for Thomas to implement "if key in dict" doesn't make the > overall concept less controversial. There are apparently lots of varying > opinions about what's reasonable. This topic seems related to PEP 212 (Loop > Counter Iteration) and PEP 218 (Adding a Built-In Set Object Type), but may > well warrant its own. Excellent. Good reminder also that this shouldn't go into 2.1 -- clearly the design space is too complicated for a quick decision. > That said, I have plenty enough on my plate trying to keep Mojam afloat > these days, so I can't step into the crevass, just observe that it looks to > me like a very long ways to the bottom... ;-) I'm not able to lead such a PEP effort myself either, but I hope *someone* will be. This PEP has a good chance for 2.2 though (what with BDFL approval and all :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 30 02:39:17 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 20:39:17 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I did a less sophisticated count but come to the same conclusion: > iterations over items() are (somewhat) more common than over keys(), > and values() are 1-2 orders of magnitude less common. 
My numbers:
>
> $ cd python/src/Lib
> $ grep 'for .*items():' *.py | wc -l
> 47
> $ grep 'for .*keys():' *.py | wc -l
> 43
> $ grep 'for .*values():' *.py | wc -l
> 2

I like my larger sample and anal methodology better. A closer look showed
that it may have been unduly biased by the mass of files in Lib/encodings/,
where

    encoding_map = {}
    for k,v in decoding_map.items():
        encoding_map[v] = k

is at the end of most files (btw, MAL, that's the answer to your question:
people would expect "the same" ordering you expected there, i.e. none in
particular).

> ...
> I don't attach much value to the readability argument: typically, one will
> write "for key in dict" or "for name in dict" and then it's obvious
> what is meant.

Well, "fiddlesticks" comes to mind <0.9 wink>. If I've got a dict mapping
phone numbers to names, "for name in dict" is dead backwards.

    for vevent in keydefs.keys():
    for x in self.subdirs.keys():
    for name in lsumdict.keys():
    for locale in self.descriptions.keys():
    for name in attrs.keys():
    for func in other.top_level.keys():
    for func in target.keys():
    for i in u2.keys():
    for s in d.keys():
    for url in self.bad.keys():

are other cases in the CVS tree where I don't think the name makes it obvious
in the absence of ".keys()".

But I don't personally give any weight to whether people can guess what
something does at first glance. My rule is that it doesn't matter, provided
it's (a) easy to learn; and (especially), (b) hard to *forget* once you've
learned it. A classic example is Python's "points between elements"
treatment of slice indices: few people guess right what that does at first
glance, but once they "get it" they're delighted and rarely mess up again.

And I think this is "like that".

> ...
> But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody
> has even asked me for a has_item() method).

Yup.

> I can live with "x in list" checking the values and "x in dict"
> checking the keys.
But I can *not* live with "x in dict" equivalent > to "dict.has_key(x)" if "for x in dict" would mean > "for x in dict.items()". That's why I brought it up -- it's not entirely clear what's to be done here. > I also think that defining "x in dict" but not "for x in dict" will > be confusing. > > So we need to think more. The hoped-for next step indeed. > How about: > > for key in dict: ... # ... over keys > > for key:value in dict: ... # ... over items > > This is syntactically unambiguous (a colon is currently illegal in > that position). Cool! Can we resist adding if key:value in dict for "parallelism"? (I know I can ...) 2/3rd of these are marginally more attractive: for key: in dict: # over dict.keys() for :value in dict: # over dict.values() for : in dict: # a delay loop > This also suggests: > > for index:value in list: ... # ... over zip(range(len(list), list) > > while doesn't strike me as bad or ugly, and would fulfill my brother's > dearest wish. You mean besides the one that you fry in hell for not adding "for ... indexing"? Ya, probably. > (And why didn't we think of this before?) Best guess: we were focused exclusively on sequences, and a colon just didn't suggest itself in that context. Second-best guess: having finally approved one of these gimmicks, you finally got desperate enough to make it work . ponderingly y'rs - tim From tim.one at home.com Tue Jan 30 02:58:59 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 20:58:59 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > I'm expecting (though don't have much proof) that most loops over > dicts don't mutate the dict. Safe bet! I do recall writing one once: it del'ed keys for which the associated count was 1, because the rest of the algorithm was only interested in duplicates. 
> Maybe we could add a flag to the dict that issues an error when a new > key is inserted during such a for loop? (I don't think the key order > can be affected when a key is *deleted*.) That latter is true but specific to this implementation. "Can't mutate the dict period" is easier to keep straight, and probably harmless in practice (if not, it could be relaxed later). Recall that a similar trick is played during list.sort(), replacing the list's type pointer for the duration (to point to an internal "immutable list" type, same as the list type except the "dangerous" slots point to a function that raises an "immutable list" TypeError). Then no runtime expense is incurred for regular lists to keep checking flags. I thought of this as an elegant use for switching types at runtime; you may still be appalled by it, though! From tim.one at home.com Tue Jan 30 03:07:36 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:07:36 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A75B190.3FD2A883@lemburg.com> Message-ID: [Guido] > The same order that for k,v in dict.items() will yield, of course. [MAL] > And then people find out that the order has some sorting > properties and start to use it... Except that it has none. dict insertion has never used any comparison outcome beyond "equal"/"not equal", so any ordering you think you see is-- and always was --an illusion. From guido at digicool.com Tue Jan 30 03:06:35 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:06:35 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 20:39:17 EST." References: Message-ID: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> This is all PEP material now. Tim, do you want to own the PEP? It seems just up your alley! > Cool! Can we resist adding > > if key:value in dict > > for "parallelism"? (I know I can ...) 
That's easy to resist because, unlike ``for key:value in dict'', it's not unambiguous: ``if key:value in dict'' is already legal syntax currently, with 'key' as the condition and 'value in dict' as the (not particularly useful) body of the if statement. > > (And why didn't we think of this before?) > > Best guess: we were focused exclusively on sequences, and a colon just > didn't suggest itself in that context. Second-best guess: having finally > approved one of these gimmicks, you finally got desperate enough to make it > work . I'm certainly more comfortable with just ``for key in dict'' than with the whole slow of extensions using colons. But, again, that's for the PEP to fight over. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 30 03:15:04 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:15:04 -0500 Subject: [Python-Dev] C's for statement In-Reply-To: Your message of "Mon, 29 Jan 2001 15:09:14 EST." <20010129150914.C10191@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> Message-ID: <200101300215.VAA21955@cj20424-a.reston1.va.home.com> [ESR] > There's not much I miss from C these days, but one thing I wish Python > had is a more general for-loop. The C semantics that let you have > any initialization, any termination test, and any iteration you like > are rather cool. > > Yes, I realize that > > for (; ; ) {} > > can be simulated with: > > > while 1: > if : > break > > > Still, having them spatially grouped the way a C for does it is nice. > Makes it easier to see invariants, I think. Hm, I've seen too many ugly C for loops to have much appreciation for it. 
I can recognize and appreciate the few common forms that clearly iterate over an array; most other forms look rather contorted to me. Check out the Python C sources; if you find anything more complicated than ``for (i = n; i > 0; i--)'' I probably didn't write it. :-) Common abominations include: - writing a while loop as for(;;) - putting arbitrary initialization code in - having an empty condition, so the becomes an arbitraty extension of the body that's written out-of-sequence --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 30 03:19:12 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:19:12 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A75C86A.3A4236E8@lemburg.com> Message-ID: [MAL] > I just wanted to hint at a problem which iterating over items > in an unordered set can cause. Especially new Python users will find > it confusing that the order of the items in an iteration can change > from one run to the next. Do they find "for k, v in dict.items()" confusing now? Would be the same. > ... > What we really want is iterators for dictionaries, so why not > implement these instead of tweaking for-loops. Seems an unrelated topic: would "iterators for dictionaries" solve the supposed problem with iteration order? > If you are looking for speedups w/r to for-loops, applying a > different indexing technique in for-loops would go a lot further > and provide better performance not only to dictionary loops, > but also to other sequences. > > I have made some good experience with a special counter object > (sort of like a mutable integer) which is used instead of the > iteration index integer in the current implementation. Please quantify, if possible. My belief (based on past experiments) is that in loops fancier than for i in range(n): pass the loop overhead quickly falls into the noise even now. 
> Using an iterator object instead of the integer + __getitem__ > call machinery would allow more flexibility for all kinds of > sequences or containers. ... This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new iteration *protocol* could have major attractions. From guido at digicool.com Tue Jan 30 03:17:27 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:17:27 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: Your message of "Mon, 29 Jan 2001 15:02:47 EST." <20010129150247.B10191@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> Message-ID: <200101300217.VAA21978@cj20424-a.reston1.va.home.com> [ESR] > For different reasons, I'd like to be able to set a constant flag on a > object instance. Simple semantics: if you try to assign to a > member or method, it throws an exception. > > Application? I have a large Python program that goes to a lot of effort > to build elaborate context structures in core. It would be nice to know > they can't be even inadvertently trashed without throwing an exception I > can watch for. Yes, this is a good thing. Easy to do on lists and dicts. Questions: - How to spell it? x.freeze()? x.readonly()? - Should this reversible? I.e. should there be an x.unfreeze()? - Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 30 03:29:25 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:29:25 -0500 Subject: [Python-Dev] C's for statement In-Reply-To: <200101300215.VAA21955@cj20424-a.reston1.va.home.com> Message-ID: Check out SETL's loop statement. I think Perl5 is a subset of it <0.9 wink>. 
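
The freeze()/unfreeze() questions raised in the "Making mutable objects
readonly" message above can be made concrete with a small sketch. This is
only an illustration in present-day Python, not anything proposed in the
thread: FreezableDict and the helper _guarded are hypothetical names, and the
sketch assumes a reversible flag and guards only the most common mutating
methods.

```python
def _guarded(name):
    """Wrap the dict method called `name` so it refuses to run while frozen."""
    original = getattr(dict, name)

    def method(self, *args, **kwargs):
        if self._frozen:
            raise TypeError("cannot modify a frozen dict")
        return original(self, *args, **kwargs)

    method.__name__ = name
    return method


class FreezableDict(dict):
    """Hypothetical dict subclass with a reversible read-only flag."""

    _frozen = False  # class-level default; freeze() sets it per instance

    def freeze(self):
        self._frozen = True

    def unfreeze(self):
        self._frozen = False

    # Assumption: these cover the usual ways a dict gets mutated. Other
    # mutators (for example the in-place |= operator) are not guarded here.
    __setitem__ = _guarded("__setitem__")
    __delitem__ = _guarded("__delitem__")
    clear = _guarded("clear")
    pop = _guarded("pop")
    popitem = _guarded("popitem")
    setdefault = _guarded("setdefault")
    update = _guarded("update")


context = FreezableDict(host="localhost")
context["port"] = 8080          # allowed: not frozen yet
context.freeze()
try:
    context["port"] = 9090      # rejected while frozen
except TypeError as exc:
    print("rejected:", exc)
context.unfreeze()
context["port"] = 9090          # allowed again
```

A one-way variant would simply omit unfreeze(); whether the flag should be
reversible is exactly the open question in the message.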
From esr at thyrsus.com Tue Jan 30 03:34:01 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 29 Jan 2001 21:34:01 -0500
Subject: [Python-Dev] Re: C's for statement
In-Reply-To: <200101300215.VAA21955@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:15:04PM -0500
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com>
Message-ID: <20010129213401.A17235@thyrsus.com>

Guido van Rossum:
> Common abominations include:
>
> - writing a while loop as for(;;)

Agreed. Bletch.

> - putting arbitrary initialization code in

Not sure what's "arbitrary", unless you mean unrelated to the
iteration variable.

> - having an empty condition, so the becomes an arbitrary
>   extension of the body that's written out-of-sequence

Again agreed. Double bletch.

I guess my archetype of the cute C for-loop is the idiom for
pointer-list traversal:

    struct foo {int data; struct foo *next;} *ptr, *head;

    for (ptr = head; ptr; ptr = ptr->next)
        do_something_with(ptr->data);

This is elegant. It separates the logic for list traversal from the
operation on the list element.

Not the highest on my list of wants -- I'd sooner have ?: back. I submitted
a patch for that once, and the discussion sort of died. Were you dead
set against it, or should I revive this proposal?
--
Eric S. Raymond

"The bearing of arms is the essential medium through which the
individual asserts both his social power and his participation in
politics as a responsible moral being..."
    -- J.G.A. Pocock, describing the beliefs of the founders of the U.S.

From esr at thyrsus.com Tue Jan 30 03:49:59 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 29 Jan 2001 21:49:59 -0500
Subject: [Python-Dev] Re: Making mutable objects readonly
In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:17:27PM -0500
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
Message-ID: <20010129214959.B17235@thyrsus.com>

Guido van Rossum:
> Yes, this is a good thing. Easy to do on lists and dicts. Questions:
>
> - How to spell it? x.freeze()? x.readonly()?

I like "freeze", it's a clear imperative where "readonly()" sounds
like a test (e.g. "is this readonly()?")

> - Should we support something like this for instances too? Sometimes
>   it might be cool to be able to freeze changing attribute values...

Moshe Zadka sent me a hack that handles instances:

> class MarkableAsConstant:
>
>     def __init__(self):
>         self.mark_writable()
>
>     def __setattr__(self, name, value):
>         if self._writable:
>             self.__dict__[name] = value
>         else:
>             raise ValueError("object is read only")
>
>     def mark_writable(self):
>         self.__dict__['_writable'] = 1
>
>     def mark_readonly(self):
>         self.__dict__['_writable'] = 0

> - Should this be reversible? I.e. should there be an x.unfreeze()?

I gave this some thought earlier today. There are advantages to either
way. Making freeze a one-way operation would make it possible to use
freezing to get certain kinds of security and integrity guarantees that
you can't have if freezing is reversible.

Fortunately, there's a semantics that captures both. If we allow
freeze to take an optional key argument, and require that an unfreeze
call must supply the same key or fail, we get both worlds. We can
even one-way-hash the keys so they don't have to be stored in the
bytecode.

Want to lock a structure permanently? Pick a random long key.
Freeze with it. Then throw that key away...
--
Eric S. Raymond

Strict gun laws are about as effective as strict drug laws...It pains
me to say this, but the NRA seems to be right: The cities and states
that have the toughest gun laws have the most murder and mayhem.
        -- Mike Royko, Chicago Tribune

From tim.one at home.com Tue Jan 30 03:57:59 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 21:57:59 -0500
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> Yes, this is a good thing. Easy to do on lists and dicts. Questions:
>
> - How to spell it? x.freeze()? x.readonly()?

See below.

> - Should this be reversible?

Of course. Or x.freeze(solid=1) to default to permanent rigidity, but
not require it.

> I.e. should there be an x.unfreeze()?

That conveniently answers the first question, since x.unreadonly() reads
horribly .

> - Should we support something like this for instances too? Sometimes
>   it might be cool to be able to freeze changing attribute values...

"Should be" supported for every mutable object. Next step: as in
endless C++ debates, endless Python debates about "representation
freeze" vs "logical freeze" ("well, yes, I'm changing this member, but
it's just an invisible cache so I *should* be able to tag the object as
const anyway ..."; etc etc etc).

keep-it-simple-ly y'rs - tim

From guido at digicool.com Tue Jan 30 03:57:24 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 21:57:24 -0500
Subject: [Python-Dev] Re: C's for statement
In-Reply-To: Your message of "Mon, 29 Jan 2001 21:34:01 EST."
<20010129213401.A17235@thyrsus.com>
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> <20010129213401.A17235@thyrsus.com>
Message-ID: <200101300257.VAA22186@cj20424-a.reston1.va.home.com>

> > - putting arbitrary initialization code in
>
> Not sure what's "arbitrary", unless you mean unrelated to the
> iteration variable.

Yes, that.

> I guess my archetype of the cute C for-loop is the idiom for
> pointer-list traversal:
>
>     struct foo {int data; struct foo *next;} *ptr, *head;
>
>     for (ptr = head; ptr; ptr = ptr->next)
>         do_something_with(ptr->data);
>
> This is elegant. It separates the logic for list traversal from the
> operation on the list element.

And it rarely happens in Python, because sequences are rarely
represented as linked lists.

> Not the highest on my list of wants -- I'd sooner have ?: back. I submitted
> a patch for that once, and the discussion sort of died. Were you dead
> set against it, or should I revive this proposal?

Not dead set against something like it, but dead set against the ?:
syntax because then : becomes too overloaded for the human reader, e.g.:

    if foo ? bar : bletch : spam = eggs

If you want to revive this, I strongly suggest writing a PEP first
before posting here.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Tue Jan 30 03:59:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 21:59:17 -0500
Subject: [Python-Dev] Re: Making mutable objects readonly
In-Reply-To: Your message of "Mon, 29 Jan 2001 21:49:59 EST."
<20010129214959.B17235@thyrsus.com>
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <20010129214959.B17235@thyrsus.com>
Message-ID: <200101300259.VAA22208@cj20424-a.reston1.va.home.com>

> > - How to spell it? x.freeze()? x.readonly()?
>
> I like "freeze", it's a clear imperative where "readonly()" sounds
> like a test (e.g. "is this readonly()?")

Agreed.

> > - Should we support something like this for instances too? Sometimes
> >   it might be cool to be able to freeze changing attribute values...
>
> Moshe Zadka sent me a hack that handles instances:
[...]

OK, so no special support needed there.

> > - Should this be reversible? I.e. should there be an x.unfreeze()?
>
> I gave this some thought earlier today. There are advantages to either
> way. Making freeze a one-way operation would make it possible to use
> freezing to get certain kinds of security and integrity guarantees that
> you can't have if freezing is reversible.
>
> Fortunately, there's a semantics that captures both. If we allow
> freeze to take an optional key argument, and require that an unfreeze
> call must supply the same key or fail, we get both worlds. We can
> even one-way-hash the keys so they don't have to be stored in the
> bytecode.
>
> Want to lock a structure permanently? Pick a random long key. Freeze
> with it. Then throw that key away...

Way too cute. My suggestion: freeze(0) freezes forever, freeze(1) can
be unfrozen.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From esr at thyrsus.com Tue Jan 30 04:06:19 2001
From: esr at thyrsus.com (Eric S.
Raymond)
Date: Mon, 29 Jan 2001 22:06:19 -0500
Subject: [Python-Dev] Re: C's for statement
In-Reply-To: <200101300257.VAA22186@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:57:24PM -0500
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> <20010129213401.A17235@thyrsus.com> <200101300257.VAA22186@cj20424-a.reston1.va.home.com>
Message-ID: <20010129220619.A17713@thyrsus.com>

Guido van Rossum :
> Not dead set against something like it, but dead set against the ?:
> syntax because then : becomes too overloaded for the human reader, e.g.:
>
>     if foo ? bar : bletch : spam = eggs
>
> If you want to revive this, I strongly suggest writing a PEP first
> before posting here.

Noted. Will do.
--
Eric S. Raymond

Such are a well regulated militia, composed of the freeholders,
citizen and husbandman, who take up arms to preserve their property,
as individuals, and their rights as freemen.
        -- "M.T. Cicero", in a newspaper letter of 1788 touching the "militia"
           referred to in the Second Amendment to the Constitution.

From tim.one at home.com Tue Jan 30 04:18:47 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 22:18:47 -0500
Subject: [Python-Dev] Re: Making mutable objects readonly
In-Reply-To: <20010129214959.B17235@thyrsus.com>
Message-ID:

Note that even adding a "frozen" flag would add 4 bytes to every
freezable object on most machines. That's why I'd rather .freeze()
replace the type pointer and .unfreeze() restore it. No time or space
overhead; no cluttering up the normal-case (i.e., unfrozen) type
implementations with new tests.
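
The type-pointer idea above can be approximated at the Python level by reassigning an object's `__class__`. What follows is only a minimal sketch in present-day Python 3, not the C-level change Tim has in mind; the class names `FreezableList` and `FrozenList` are hypothetical, and the list of blocked mutators is illustrative rather than exhaustive.

```python
class FrozenList(list):
    """Same storage as a list, but every mutating method raises."""

    def _readonly(self, *args, **kwargs):
        raise TypeError("list is frozen")

    # Route the usual mutators to the single failing method.
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _readonly
    append = extend = insert = pop = remove = _readonly
    clear = sort = reverse = _readonly

    def unfreeze(self):
        # Restore the original type; no per-object flag is involved.
        self.__class__ = FreezableList


class FreezableList(list):
    """An ordinary list that can swap its own type to become read-only."""

    def freeze(self):
        # Swap the type instead of storing and testing a "frozen" flag.
        self.__class__ = FrozenList


items = FreezableList([1, 2, 3])
items.freeze()
try:
    items.append(4)
except TypeError as exc:
    print("rejected:", exc)
items.unfreeze()
items.append(4)
```

The unfrozen class carries no extra tests in its methods, which is the property the message argues for; the cost is that "frozen" is only as strong as the set of methods the frozen class overrides.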
From tim.one at home.com Tue Jan 30 04:57:07 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 22:57:07 -0500
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
In-Reply-To: <14965.42988.362288.154254@localhost.localdomain>
Message-ID:

Note that optimizing compilers use a pile of linear-time heuristics to
attempt to solve exponential-time optimization problems (from optimal
register assignment to optimal instruction scheduling, they're all
formally intractable even in isolation). When code gets non-trivial, not
even a compiler's chief designer can reliably outguess what optimization
may do. It's really not unusual for a higher optimization level to yield
slower code, and especially not when the source code is pushing or
exceeding machine limits (# of registers, # of instruction pipes, size
of branch-prediction buffers; I-cache structure; dynamic restrictions on
execution units; ...).

[Jeremy]
> ...
> One of the differences between -O2 and -O3, according to the man page,
> is that -O3 will perform optimizations that involve a space-speed
> tradeoff. It also includes -finline-functions. I can imagine that
> some of these optimizations hurt memory performance enough to make a
> difference.

One of the time-consuming ongoing tasks at my last employer was running
profiles and using them to override counterproductive compiler inlining
decisions (in both directions).
It's not just memory that excessive inlining can screw up, but also
things like running out of registers and so inserting gobs of register
spill/restore code, and inlining so much code that the instruction
scheduler effectively gives up (under many compilers, a sure sign of
this is when you look at the generated code for a function, and it looks
beautiful "at the top" but terrible "at the bottom"; some clever
optimizers tried to get around that by optimizing "bottom-up", and then
it looks beautiful at the bottom but terrible at the top <0.5 wink>;
others work middle-out or burn the candle at both ends, with visible
consequences you should be able to recognize now!).

optimization-is-easier-than-speech-recog-but-the-latter-doesn't-work-
all-that-well-either-ly y'rs - tim

From barry at digicool.com Tue Jan 30 05:13:24 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 29 Jan 2001 23:13:24 -0500
Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP
References: <14966.2069.950895.627663@beluga.mojam.com>
Message-ID: <14966.16228.548177.112853@anthem.wooz.org>

>>>>> "SM" == Skip Montanaro writes:

    SM> it seems to me that the whole "if/for something in dict" thing
    SM> needs to be hashed out in a PEP.

    SM> There are apparently lots of varying opinions about what's
    SM> reasonable. This topic seems related to PEP 212 (Loop Counter
    SM> Iteration) and PEP 218 (Adding a Built-In Set Object Type),
    SM> but may well warrant its own.

As keeper of PEP0, I have to agree. I personally would vastly prefer
a new iterator protocol than syntax such as "for key:value in dict".
I'd really like to see a PEP on an iterator protocol for Python, but
like Skip, I'm too busy at the moment to do it myself. If nobody
takes it on before then, I might be willing to champion such a PEP for
the 2.2 time frame. Until then, I'm decidedly -1 on "for/if in dict".

-Barry

From barry at digicool.com Tue Jan 30 05:25:09 2001
From: barry at digicool.com (Barry A.
Warsaw)
Date: Mon, 29 Jan 2001 23:25:09 -0500
Subject: [Python-Dev] Making mutable objects readonly
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
Message-ID: <14966.16933.209494.214183@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    GvR> Yes, this is a good thing. Easy to do on lists and dicts.
    GvR> Questions:

    GvR> - How to spell it? x.freeze()? x.readonly()?

    GvR> - Should this be reversible? I.e. should there be an
    GvR> x.unfreeze()?

    GvR> - Should we support something like this for instances too?
    GvR> Sometimes it might be cool to be able to freeze changing
    GvR> attribute values...

lock(x) ...? :)

-Barry

From barry at digicool.com Tue Jan 30 05:26:50 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 29 Jan 2001 23:26:50 -0500
Subject: [Python-Dev] Re: Making mutable objects readonly
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <20010129214959.B17235@thyrsus.com>
Message-ID: <14966.17034.721204.305315@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond writes:

    ESR> Fortunately, there's a semantics that captures both. If we
    ESR> allow freeze to take an optional key argument, and require
    ESR> that an unfreeze call must supply the same key or fail, we
    ESR> get both worlds. We can even one-way-hash the keys so they
    ESR> don't have to be stored in the bytecode.

    ESR> Want to lock a structure permanently? Pick a random long
    ESR> key. Freeze with it. Then throw that key away...

Clever!

From esr at thyrsus.com Tue Jan 30 05:32:16 2001
From: esr at thyrsus.com (Eric S.
Raymond)
Date: Mon, 29 Jan 2001 23:32:16 -0500
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: <14966.16933.209494.214183@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 29, 2001 at 11:25:09PM -0500
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <14966.16933.209494.214183@anthem.wooz.org>
Message-ID: <20010129233215.A18533@thyrsus.com>

Barry A. Warsaw :
> lock(x) ...? :)

I was thinking that myself, Barry.
--
Eric S. Raymond

"Boys who own legal firearms have much lower rates of delinquency and
drug use and are even slightly less delinquent than nonowners of guns."
        -- U.S. Department of Justice, National Institute of Justice,
           Office of Juvenile Justice and Delinquency Prevention,
           NCJ-143454, "Urban Delinquency and Substance Abuse," August 1995.

From tim.one at home.com Tue Jan 30 05:56:09 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 23:56:09 -0500
Subject: [Python-Dev] SSL socket read at EOF; SourceForge problem
Message-ID:

I tried to open an SF bug for the following msg from c.l.py, but SF
balked:

    ERROR
    ERROR getting bug_id

Logged out, logged in, tried it again, same outcome.

Intended bug report content:

Good question from c.l.py, assigned to Guido cuz he's a Socket Guy:

From: Clarence Gardner
Subject: RE: Thread Safety
Date: Mon, 29 Jan 2001 09:51:03 -0800

...
I'm going to repeat a question that I posted about a week ago that
passed without comment on the newsgroup. The issue is the SSL support
in the socket module, which raises an exception when the reading socket
is at EOF, rather than returning an empty string. I'm hesitant to call
it a "bug", but I wouldn't have implemented it this way.
There are the names of two people mentioned at the top of
socketmodule.c, but no contact information, so I'm suggesting here that
it be changed to conform to normal file/socket practice. (SSL was
actually added at 2.0, so I'm late to the party with this; mea culpa,
mea culpa. I delayed trying Python2 because of the extension
rebuilding.)

From thomas at xs4all.net Tue Jan 30 07:14:20 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 30 Jan 2001 07:14:20 +0100
Subject: [Python-Dev] Re: C's for statement
In-Reply-To: <20010129213401.A17235@thyrsus.com>; from esr@thyrsus.com on Mon, Jan 29, 2001 at 09:34:01PM -0500
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> <20010129213401.A17235@thyrsus.com>
Message-ID: <20010130071420.U962@xs4all.nl>

On Mon, Jan 29, 2001 at 09:34:01PM -0500, Eric S. Raymond wrote:

> I guess my archetype of the cute C for-loop is the idiom for
> pointer-list traversal:

>     struct foo {int data; struct foo *next;} *ptr, *head;

>     for (ptr = head; ptr; ptr = ptr->next)
>         do_something_with(ptr->data);

Note two things: in Python, you would use a list, so 'for x in list' does
exactly what you want here ;) And if you really need it, you could use
iterators for exactly this (once we have them, of course): you are
inventing a new storage type. Quite common in C, since the only one it
has is useless for anything other than strings, but not so common in
Python.

> Not the highest on my list of wants -- I'd sooner have ?: back. I submitted
> a patch for that once, and the discussion sort of died. Were you dead
> set against it, or should I revive this proposal?

Triple blech. Guido will never go for it! (There, increased your chance of
getting it approved!
:) Seriously though, I wouldn't like it much, it's too cryptic a
syntax. I notice I use it less and less in C, too.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Tue Jan 30 07:18:25 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 30 Jan 2001 07:18:25 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: ; from tim.one@home.com on Mon, Jan 29, 2001 at 08:39:17PM -0500
References: <200101291448.JAA11473@cj20424-a.reston1.va.home.com>
Message-ID: <20010130071825.V962@xs4all.nl>

On Mon, Jan 29, 2001 at 08:39:17PM -0500, Tim Peters wrote:

>     for key: in dict:       # over dict.keys()
>     for :value in dict:     # over dict.values()
>     for : in dict:          # a delay loop

Wot's the last one supposed to do ? 'for unused_var in range(len(dict)):' ?

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.one at home.com Tue Jan 30 07:25:51 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 30 Jan 2001 01:25:51 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <20010130071825.V962@xs4all.nl>
Message-ID:

>>     for key: in dict:       # over dict.keys()
>>     for :value in dict:     # over dict.values()
>>     for : in dict:          # a delay loop

[Thomas Wouters]
> Wot's the last one supposed to do ? 'for unused_var in
> range(len(dict)):' ?

Well, as the preceding line said in the original:

>> 2/3rd of these are marginally more attractive [than
>> "if key:value in dict"]:

I think you've guessed which 2/3 those are . I don't see that the last
line has any visible semantics whatsoever, so Python can do whatever it
likes, provided it doesn't do anything visible.

You still hang out on c.l.py! So you gotta know that if something of
the form

    x:y

is suggested, people will line up to suggest meanings for the 3 obvious
variations, along with

    x::y

and

    x:-:y

and

    x lambda y

too <0.9 wink>.
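
For comparison, the loops and the "if key:value in dict" test debated above can be expressed without any new syntax. This is only a sketch in present-day Python 3 (direct iteration over a dict was not available when this thread was written); `menu` and the `seen_*` names are made up for illustration.

```python
menu = {"spam": 1, "eggs": 2}

seen_keys = []
for key in menu:                    # over the keys
    seen_keys.append(key)

seen_values = []
for value in menu.values():         # over the values
    seen_values.append(value)

seen_pairs = []
for key, value in menu.items():     # over (key, value) pairs
    seen_pairs.append((key, value))

# Membership tests say explicitly which part of the dict they look at.
has_key = "spam" in menu
has_pair = ("spam", 1) in menu.items()
```

Each spelling names the thing being iterated (keys, values, or items), which sidesteps the question of what a bare ":" on either side would mean.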
From thomas at xs4all.net Tue Jan 30 07:26:48 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 30 Jan 2001 07:26:48 +0100
Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP
In-Reply-To: <14966.2069.950895.627663@beluga.mojam.com>; from skip@mojam.com on Mon, Jan 29, 2001 at 06:17:25PM -0600
References: <14966.2069.950895.627663@beluga.mojam.com>
Message-ID: <20010130072648.W962@xs4all.nl>

On Mon, Jan 29, 2001 at 06:17:25PM -0600, Skip Montanaro wrote:

> The fact that it was easy for Thomas to implement "if key in dict" doesn't
> make the overall concept less controversial.

Note that the fact I implemented it doesn't mean I'm +1 on it (witness my
posts on python-list.) In fact, *while implementing it*, I grew from +0 to
-0 and maybe even to a weak -1 (all in 5 minutes :) The enthusiastic
subject of the patch was a weak attempt at 5AM humour, not a venting of an
ancient desire :)

More-5AM-humour-ly y'rs,

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Tue Jan 30 07:55:16 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 30 Jan 2001 07:55:16 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109
In-Reply-To: ; from jhylton@users.sourceforge.net on Mon, Jan 29, 2001 at 05:27:30PM -0800
References:
Message-ID: <20010130075515.X962@xs4all.nl>

On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote:

> add note about two kinds of illegal imports that are now checked

> + - The compiler will report a SyntaxError if "from ... import *" occurs
> +   in a function or class scope or if a name bound by the import
> +   statement is declared global in the same scope. The language
> +   reference has also documented that these cases are illegal, but
> +   they were not enforced.

Woah. Is this really a good idea ? I have seen 'from ...
import *' in a function scope put to good (relatively -- we're talking
'import *' here) use. I also thought of 'import' as yet another
assignment statement, so to me it's both logical and consistent if
'import' would listen to 'global'. Otherwise we have to re-invent
'import spam; eggs = spam' if we want eggs to be global.

Is there really a reason to enforce this, or are we enforcing the wording
of the language reference for the sake of enforcing the wording of the
language reference ? When writing 'import as' for 2.0, I fixed some of the
inconsistencies in import, making it adhere to 'global' statements in as
many cases as possible (all except 'from ... import *') but I was
apparently not aware of the wording of the language reference. I'd suggest
updating the wording in the language reference, not the implementation,
unless there is a good reason to disallow this.

I also have another issue with your recent patches, Jeremy, also in the
backwards-compatibility department :) You gave new.code two new,
non-optional arguments, in the middle of the long argument list. I sent a
note about it to python-checkins instead of python-dev by accident, but
Fred seemed to agree with me there.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mwh21 at cam.ac.uk Tue Jan 30 09:30:15 2001
From: mwh21 at cam.ac.uk (Michael Hudson)
Date: 30 Jan 2001 08:30:15 +0000
Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0)
In-Reply-To: "Tim Peters"'s message of "Mon, 29 Jan 2001 22:57:07 -0500"
References:
Message-ID:

In the interest of generating some numbers (and filling up my hard
drive), last night I wrote a script to build lots & lots of versions
of python (many of which turned out to be redundant - eg. -O6 didn't
seem to do anything different to -O3 and pybench doesn't work with
1.5.2), and then run pybench with them.
Summarised results below; first a key:

src-n: this morning's CVS (with Jeremy's f_localsplus optimisation)
       (only built this with -O3)
src: CVS from yesterday afternoon
src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc
       patch applied. More on this later...
Python-2.0: you can guess what this is.

All runs are compared against Python-2.0-O2:

Benchmark: src-n-O3 (rounds=10, warp=20)
            Average round time:   49029.00 ms     -0.86%
Benchmark: src (rounds=10, warp=20)
            Average round time:   67141.00 ms    +35.76%
Benchmark: src-O (rounds=10, warp=20)
            Average round time:   50167.00 ms     +1.44%
Benchmark: src-O2 (rounds=10, warp=20)
            Average round time:   49641.00 ms     +0.37%
Benchmark: src-O3 (rounds=10, warp=20)
            Average round time:   49104.00 ms     -0.71%
Benchmark: src-O6 (rounds=10, warp=20)
            Average round time:   49131.00 ms     -0.66%
Benchmark: src-obmalloc (rounds=10, warp=20)
            Average round time:   63276.00 ms    +27.94%
Benchmark: src-obmalloc-O (rounds=10, warp=20)
            Average round time:   46927.00 ms     -5.11%
Benchmark: src-obmalloc-O2 (rounds=10, warp=20)
            Average round time:   46146.00 ms     -6.69%
Benchmark: src-obmalloc-O3 (rounds=10, warp=20)
            Average round time:   46456.00 ms     -6.07%
Benchmark: src-obmalloc-O6 (rounds=10, warp=20)
            Average round time:   46450.00 ms     -6.08%
Benchmark: Python-2.0 (rounds=10, warp=20)
            Average round time:   68933.00 ms    +39.38%
Benchmark: Python-2.0-O (rounds=10, warp=20)
            Average round time:   49542.00 ms     +0.17%
Benchmark: Python-2.0-O3 (rounds=10, warp=20)
            Average round time:   48262.00 ms     -2.41%
Benchmark: Python-2.0-O6 (rounds=10, warp=20)
            Average round time:   48273.00 ms     -2.39%

My conclusion? Python 2.1 is slower than Python 2.0, but not by
enough to care about.

Interestingly, adding obmalloc speeds things up. Let's take a closer
look:

$ python pybench.py -c src-obmalloc-O3 -s src-O3
PYBENCH 0.7

Benchmark: src-O3 (rounds=10, warp=20)

Tests:                        per run  per oper.    diff *
------------------------------------------------------------------------
    BuiltinFunctionCalls:   843.35 ms    6.61 us    +2.93%
     BuiltinMethodLookup:   878.70 ms    1.67 us    +0.56%
           ConcatStrings:  1068.80 ms    7.13 us    -1.22%
           ConcatUnicode:  1373.70 ms    9.16 us    -1.24%
         CreateInstances:  1433.55 ms   34.13 us    +9.06%
 CreateStringsWithConcat:  1031.75 ms    5.16 us   +10.95%
 CreateUnicodeWithConcat:  1277.85 ms    6.39 us    +3.14%
            DictCreation:  1275.80 ms    8.51 us   +44.22%
                ForLoops:  1415.90 ms  141.59 us    -0.64%
              IfThenElse:  1152.70 ms    1.71 us    -0.15%
             ListSlicing:   397.40 ms  113.54 us    -0.53%
          NestedForLoops:   789.75 ms    2.26 us    -0.37%
    NormalClassAttribute:   935.15 ms    1.56 us    -0.41%
 NormalInstanceAttribute:   961.15 ms    1.60 us    -0.60%
     PythonFunctionCalls:  1079.65 ms    6.54 us    -1.00%
       PythonMethodCalls:   908.05 ms   12.11 us    -0.88%
               Recursion:   838.50 ms   67.08 us    -0.00%
            SecondImport:   741.20 ms   29.65 us   +25.57%
     SecondPackageImport:   744.25 ms   29.77 us   +18.66%
   SecondSubmoduleImport:   947.05 ms   37.88 us   +25.60%
 SimpleComplexArithmetic:  1129.40 ms    5.13 us  +114.92%
  SimpleDictManipulation:  1048.55 ms    3.50 us    -0.00%
   SimpleFloatArithmetic:   746.05 ms    1.36 us    -2.75%
SimpleIntFloatArithmetic:   823.35 ms    1.25 us    -0.37%
 SimpleIntegerArithmetic:   823.40 ms    1.25 us    -0.37%
  SimpleListManipulation:  1004.70 ms    3.72 us    +0.01%
    SimpleLongArithmetic:   865.30 ms    5.24 us  +100.65%
              SmallLists:  1657.65 ms    6.50 us    +6.63%
             SmallTuples:  1143.95 ms    4.77 us    +2.90%
   SpecialClassAttribute:   949.00 ms    1.58 us    -0.22%
SpecialInstanceAttribute:  1353.05 ms    2.26 us    -0.73%
          StringMappings:  1161.00 ms    9.21 us    +7.30%
        StringPredicates:  1069.65 ms    3.82 us    -5.30%
           StringSlicing:   846.30 ms    4.84 us    +8.61%
               TryExcept:  1590.40 ms    1.06 us    -0.49%
          TryRaiseExcept:  1104.65 ms   73.64 us   +24.46%
            TupleSlicing:   681.10 ms    6.49 us    -3.13%
         UnicodeMappings:  1021.70 ms   56.76 us    +0.79%
       UnicodePredicates:  1308.45 ms    5.82 us    -4.79%
       UnicodeProperties:  1148.45 ms    5.74 us   +13.67%
          UnicodeSlicing:   984.15 ms    5.62 us    -0.51%
------------------------------------------------------------------------
      Average round time: 49104.00 ms               +5.70%

*) measured against: src-obmalloc-O3 (rounds=10, warp=20)

Words fail me slightly, but maybe some tuning of the memory allocation
of longs & complex numbers would be in order?

Time for lectures - I don't think algebraic geometry is going to make
my head hurt as much as trying to explain benchmarks...

Cheers,
M.

--
ARTHUR: But which is probably incapable of drinking the coffee.
        -- The Hitch-Hikers Guide to the Galaxy, Episode 6

From ping at lfw.org Tue Jan 30 09:38:12 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 30 Jan 2001 00:38:12 -0800 (PST)
Subject: [Python-Dev] Read-only function attributes
Message-ID:

Hi there.

I see that the function attribute feature specifically allows
assignment to func_code and func_defaults, but no other special
attributes. This seems really suspect to me. Why would we want
to allow the reassignment of special attributes at all?

Functions have always been immutable objects, and i can see some
motivation for attaching mutable dictionaries to them, but it's
a more serious move to make the functions mutable themselves.

I don't recall any discussion about changing special attributes;
i don't see a clear purpose to them; and i do see a danger in
making it harder to be certain that a program is safe and
predictable. (Yes, i did notice that function attributes can't
be set in restricted mode, but the addition of extra features
requiring extra security checks makes me uneasy.)

-- ?!ng

From ping at lfw.org Tue Jan 30 09:52:43 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 30 Jan 2001 00:52:43 -0800 (PST)
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
Message-ID:

Eric S. Raymond wrote:
> For different reasons, I'd like to be able to set a constant flag on a
> object instance. Simple semantics: if you try to assign to a
> member or method, it throws an exception.

Guido van Rossum wrote:
> Yes, this is a good thing.
> Easy to do on lists and dicts. Questions:
>
> - How to spell it? x.freeze()? x.readonly()?

I'm not so sure. There seem to be many issues here. More questions:

What's the difference between a frozen list and a tuple?

Is a frozen list hashable?

> - Should this be reversible? I.e. should there be an x.unfreeze()?

What if two threads lock and then unlock the same structure?

> - Should we support something like this for instances too? Sometimes
>   it might be cool to be able to freeze changing attribute values...

If you do this, i bet people will immediately want to freeze
individual attributes. Some might be confused by

    a.x = [1, 2, 3]
    lock(a.x)            # intend to lock the attribute, not the list
    a.x = 3              # hey, why is this allowed?

What does locking an extension object do?

What happens when you lock an object that implements list or dict
semantics? Do we care that locking a UserList accomplishes nothing?

Should unfreeze/unlock() be disallowed in restricted mode?

-- ?!ng

No software is totally secure, but using [Microsoft] Outlook is like
hanging a sign on your back that reads "PLEASE MESS WITH MY COMPUTER."
        -- Scott Rosenberg, Salon Magazine

From fredrik at effbot.org Tue Jan 30 10:05:47 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Tue, 30 Jan 2001 10:05:47 +0100
Subject: [Python-Dev] Read-only function attributes
References:
Message-ID: <01d701c08a9b$d7a9fe60$e46940d5@hagrid>

Ka-Ping Yee wrote:
> I see that the function attribute feature specifically allows
> assignment to func_code and func_defaults, but no other special
> attributes. This seems really suspect to me. Why would we want
> to allow the reassignment of special attributes at all?

to allow an IDE to "patch" a running program?
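
What "patching a running program" by assigning to a function's code object might look like can be sketched as follows. This assumes present-day Python 3, where the attribute discussed in the thread as `func_code` is spelled `__code__`; the function names `greet` and `greet_v2` and the `callback` variable are hypothetical.

```python
def greet(name):
    return "Hello, " + name


def greet_v2(name):
    return "Hi there, " + name + "!"


# A long-running program may already hold references to the old function,
# for example in a callback table.
callback = greet

# Replace the body in place: the function object stays the same, so every
# existing reference picks up the new behaviour without being rebound.
greet.__code__ = greet_v2.__code__

print(callback("Ping"))  # prints "Hi there, Ping!"
```

Rebinding the name `greet` instead would leave `callback` pointing at the old body, which is why an in-place swap is attractive for a reload tool and also why it raises the predictability concern voiced earlier in the thread.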
From gvwilson at ca.baltimore.com Tue Jan 30 14:08:42 2001
From: gvwilson at ca.baltimore.com (Greg Wilson)
Date: Tue, 30 Jan 2001 08:08:42 -0500 (EST)
Subject: [Python-Dev] re: Making mutable objects readonly
In-Reply-To: <20010130085202.18E71EAC4@mail.python.org>
Message-ID:

> Barry Warsaw:
> lock(x) ...? :)

Greg Wilson:
-1 --- everyone will assume it's mutual exclusion, rather than
immutability.

From guido at digicool.com Tue Jan 30 15:01:15 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 30 Jan 2001 09:01:15 -0500
Subject: [Python-Dev] Read-only function attributes
In-Reply-To: Your message of "Tue, 30 Jan 2001 00:38:12 PST."
References:
Message-ID: <200101301401.JAA25600@cj20424-a.reston1.va.home.com>

> I see that the function attribute feature specifically allows
> assignment to func_code and func_defaults, but no other special
> attributes. This seems really suspect to me. Why would we want
> to allow the reassignment of special attributes at all?

As Effbot said, this is useful in certain circumstances where a
development environment wants to implement a "better reload". For
this same reason you can assign to a class's __bases__ and __dict__
and to an instance's __class__ and __dict__.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Tue Jan 30 16:00:58 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 30 Jan 2001 10:00:58 -0500
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: Your message of "Tue, 30 Jan 2001 00:52:43 PST."
References:
Message-ID: <200101301500.KAA25733@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> > Yes, this is a good thing. Easy to do on lists and dicts. Questions:
> >
> > - How to spell it? x.freeze()? x.readonly()?

Ping:
> I'm not so sure. There seem to be many issues here. More questions:
>
> What's the difference between a frozen list and a tuple?

A frozen list can be unfrozen (maybe)?

> Is a frozen list hashable?
Yes -- that's what started this thread (using dicts as dict keys,
actually).

> > - Should this be reversible? I.e. should there be an x.unfreeze()?
>
> What if two threads lock and then unlock the same structure?

That's up to the threads -- it's no different than other concurrent
access.

> > - Should we support something like this for instances too? Sometimes
> >   it might be cool to be able to freeze changing attribute values...
>
> If you do this, i bet people will immediately want to freeze
> individual attributes. Some might be confused by
>
>     a.x = [1, 2, 3]
>     lock(a.x)            # intend to lock the attribute, not the list
>     a.x = 3              # hey, why is this allowed?

That's a matter of API. I wouldn't make this a built-in, but rather a
method on freezable objects (please don't call it lock()!).

> What does locking an extension object do?

What does adding 1 to an extension object do?

> What happens when you lock an object that implements list or dict
> semantics? Do we care that locking a UserList accomplishes nothing?

Who says it doesn't?

> Should unfreeze/unlock() be disallowed in restricted mode?

I don't see why not.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Tue Jan 30 16:06:57 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 30 Jan 2001 10:06:57 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109
In-Reply-To: Your message of "Tue, 30 Jan 2001 07:55:16 +0100." <20010130075515.X962@xs4all.nl>
References: <20010130075515.X962@xs4all.nl>
Message-ID: <200101301506.KAA25763@cj20424-a.reston1.va.home.com>

> On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote:
>
> > add note about two kinds of illegal imports that are now checked
>
> > + - The compiler will report a SyntaxError if "from ... import *" occurs
> > +   in a function or class scope or if a name bound by the import
> > +   statement is declared global in the same scope.
The language > > + reference has also documented that these cases are illegal, but > > + they were not enforced. > Woah. Is this really a good idea ? I have seen 'from ... import *' > in a function scope put to good (relatively -- we're talking 'import > *' here) use. I also thought of 'import' as yet another assignment > statement, so to me it's both logical and consistent if 'import' > would listen to 'global'. Otherwise we have to re-invent 'import > spam; eggs = spam' if we want eggs to be global. Note that Jeremy is only raising errors for "from M import *". > Is there really a reason to enforce this, or are we enforcing the > wording of the language reference for the sake of enforcing the > wording of the language reference ? When writing 'import as' for > 2.0, I fixed some of the inconsistencies in import, making it adhere > to 'global' statements in as many cases as possible (all except > 'from ... import *') but I was apparently not aware of the wording > of the language reference. I'd suggest updating the wording in the > language reference, not the implementation, unless there is a good > reason to disallow this. I think Jeremy has an excellent reason. Compilers want to do analysis of name usage at compile time. The value of * cannot be determined at compile time (you cannot know what module will actually be imported at run time). Up till now, we were able to fudge this, but Jeremy's new compiler needs to know exactly which names are defined in all local scopes, in order to do nested scopes right. > I also have another issue with your recent patches, Jeremy, also in > the backwards-compatibility departement :) You gave new.code two > new, non-optional arguments, in the middle of the long argument > list. I sent a note about it to python-checkins instead of > python-dev by accident, but Fred seemed to agree with me there. (Tim will love this. :-) I don't know what those new arguments represent. 
If they can reasonably be assumed to be empty for code that doesn't use the new features, I'd say move them to the end and default them properly. If they must be specified, I'd say too bad, the new module is an accident of the implementation anyway, and its users should update their code. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 30 16:08:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 10:08:39 -0500 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: Your message of "Tue, 30 Jan 2001 07:26:48 +0100." <20010130072648.W962@xs4all.nl> References: <14966.2069.950895.627663@beluga.mojam.com> <20010130072648.W962@xs4all.nl> Message-ID: <200101301508.KAA25825@cj20424-a.reston1.va.home.com> > Note that the fact I implemented it doesn't mean I'm +1 on it (witness my > posts on python-list.) In fact, *while implementing it*, I grew from +0 to > -0 and maybe even to a weak -1 (all in 5 minutes :) The enthousiastic > subject of the patch was a weak attempt at 5AM humour, not a venting of an > ancient desire :) Can you say "PEP time"? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Tue Jan 30 16:29:43 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 30 Jan 2001 10:29:43 -0500 Subject: [Python-Dev] Read-only function attributes References: Message-ID: <14966.56807.288840.7850@anthem.wooz.org> >>>>> "KY" == Ka-Ping Yee writes: KY> I see that the function attribute feature specifically allows KY> assignment to func_code and func_defaults, but no other KY> special attributes. This seems really suspect to me. Why KY> would we want to allow the reassignment of special attributes KY> at all? ... and actually, none of that changed w/ the function attribute patch. You've been able to assign to func_code and func_defaults since Python 1.6! 
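A minimal sketch of the "better reload" use of assignable function attributes discussed above, written for present-day Python, where func_code and func_defaults are spelled __code__ and __defaults__. The function names (greet, greet_v2, handler) are made up for the illustration and do not come from the thread.

```python
def greet(name, punctuation="."):
    return "hello " + name + punctuation


def greet_v2(name, punctuation="."):
    return "HELLO " + name.upper() + punctuation


# Some other part of the program already holds a reference to greet.
handler = greet

# Swap the implementation in place instead of rebinding the name, so that
# every existing reference sees the new behaviour.
greet.__code__ = greet_v2.__code__
greet.__defaults__ = ("!",)

result = handler("world")
```

Because the function object itself is unchanged, handler still refers to the same object; only its code and default arguments differ afterwards.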
-Barry

From thomas at xs4all.net Tue Jan 30 16:52:04 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 30 Jan 2001 16:52:04 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109
In-Reply-To: <200101301506.KAA25763@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 30, 2001 at 10:06:57AM -0500
References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com>
Message-ID: <20010130165204.I962@xs4all.nl>

On Tue, Jan 30, 2001 at 10:06:57AM -0500, Guido van Rossum wrote:
> > On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote:
> >
> > > add note about two kinds of illegal imports that are now checked
> >
> > > + - The compiler will report a SyntaxError if "from ... import *" occurs
> > > +   in a function or class scope or if a name bound by the import
> > > +   statement is declared global in the same scope. The language
> > > +   reference has also documented that these cases are illegal, but
> > > +   they were not enforced.

> > Woah. Is this really a good idea ? I have seen 'from ... import *'
> > in a function scope put to good (relatively -- we're talking 'import
> > *' here) use. I also thought of 'import' as yet another assignment
> > statement, so to me it's both logical and consistent if 'import'
> > would listen to 'global'. Otherwise we have to re-invent 'import
> > spam; eggs = spam' if we want eggs to be global.

> Note that Jeremy is only raising errors for "from M import *".

No, he says he's also raising errors for 'import spam' if 'spam' is declared
global, like so:

    def viking():
        global spam
        import spam

> > Is there really a reason to enforce this, or are we enforcing the
> > wording of the language reference for the sake of enforcing the
> > wording of the language reference ? When writing 'import as' for
> > 2.0, I fixed some of the inconsistencies in import, making it adhere
> > to 'global' statements in as many cases as possible (all except
> > 'from ... import *') but I was apparently not aware of the wording
> > of the language reference. I'd suggest updating the wording in the
> > language reference, not the implementation, unless there is a good
> > reason to disallow this.

> I think Jeremy has an excellent reason. Compilers want to do analysis
> of name usage at compile time. The value of * cannot be determined at
> compile time (you cannot know what module will actually be imported at
> run time). Up till now, we were able to fudge this, but Jeremy's new
> compiler needs to know exactly which names are defined in all local
> scopes, in order to do nested scopes right.

Hrrmm.... I guess I have to agree with that. Nonetheless, I wish we could
have a "ack! this is stupid code! it uses 'from larch import *'! All bets
are off, we do a lot of slow complicated runtime checking now!" mode. The
thing I still enjoy most about Python is that it always does what I want,
and though I'd never want to do 'from different import *' in a local scope,
I do want other, less wise people to have the same experience, where
possible :)

And I also want to be able to do:

    def fill_me(with):
        global me
        if with == 1:
            import me
        elif with == 2:
            import me_too as me
        elif with == 3:
            from me.Tools import me_me as me
        elif with == 4:
            me = FakeModule()
            sys.modules['me'] = me
        else:
            raise ValueError

And I can't quite argue that away with 'the compiler needs to know ...' --
it's all there!

> > I also have another issue with your recent patches, Jeremy, also in
> > the backwards-compatibility department :) You gave new.code two
> > new, non-optional arguments, in the middle of the long argument
> > list. I sent a note about it to python-checkins instead of
> > python-dev by accident, but Fred seemed to agree with me there.

> (Tim will love this.
:-) > I don't know what those new arguments represent. If they can > reasonably be assumed to be empty for code that doesn't use the new > features, I'd say move them to the end and default them properly. If > they must be specified, I'd say too bad, the new module is an accident > of the implementation anyway, and its users should update their code. Okay, I can live with that. It's sure to cause some gripes though. Then again, from looking at the code I'd say those arguments (freevars and cellvars) can easily default to empty tuples. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From bckfnn at worldonline.dk Tue Jan 30 18:34:10 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Tue, 30 Jan 2001 17:34:10 GMT Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291500.KAA11569@cj20424-a.reston1.va.home.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: <3a76df10.22007715@smtp.worldonline.dk> [Guido] >Maybe we could add a flag to the dict that issues an error when a new >key is inserted during such a for loop? FWIW, some of the java2 collections decided to throw a Concurrent- ModificationException in the iterator if the collection was modified during the iteration. Generally none of java2 collections can be modified while iterating over it (the exception is calling .remove() on the iterator object and not all collections support that). >(I don't think the key order can be affected when a key is *deleted*.) Probably also true for the Hashtables which is backing our PyDictionary, but I'll rather not depend too much on it being true. [Tim] >That latter is true but specific to this implementation. "Can't mutate the >dict period" is easier to keep straight, and probably harmless in practice >(if not, it could be relaxed later). Agree. 
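For comparison only (this is present-day CPython behaviour, not something decided in this thread): inserting a new key while looping over a dict is reported as an error, much like the flag suggested above. A minimal sketch, with made-up keys:

```python
d = {"a": 1, "b": 2}
caught = None

try:
    for key in d:
        # Inserting a new key changes the dict's size mid-loop.
        d[key + "_copy"] = d[key]
except RuntimeError as exc:
    # Raised when the loop asks for the next key after the size changed.
    caught = exc
```

The error surfaces on the iteration step that follows the insertion, so the first insertion itself succeeds.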
>Recall that a similar trick is played >during list.sort(), replacing the list's type pointer for the duration (to >point to an internal "immutable list" type, same as the list type except the >"dangerous" slots point to a function that raises an "immutable list" >TypeError). Then no runtime expense is incurred for regular lists to keep >checking flags. I thought of this as an elegant use for switching types at >runtime; you may still be appalled by it, though! Changing the type of a type? Yuck! I might very likely be reading the CPython sources wrongly, but it seems this trick will cause an BadInternalCall if some other C extension are trying to modify a list while it is freezed by the type switching trick. I imagine this would happen if the extension called: PyList_SetItem(myList, 0, aValue); I guess Jython could support this from the python side, but its hard to ensure from the java side without adding an additional PyList_Check(..) to all list methods. It just doesn't feel like the right thing to go since it would cause slower access to all mutable objects. regards, finn From guido at digicool.com Tue Jan 30 21:42:58 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 15:42:58 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Tue, 30 Jan 2001 16:52:04 +0100." <20010130165204.I962@xs4all.nl> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> Message-ID: <200101302042.PAA29301@cj20424-a.reston1.va.home.com> > > > Woah. Is this really a good idea ? I have seen 'from ... import *' > > > in a function scope put to good (relatively -- we're talking 'import > > > *' here) use. I also thought of 'import' as yet another assignment > > > statement, so to me it's both logical and consistent if 'import' > > > would listen to 'global'. 
Otherwise we have to re-invent 'import > > > spam; eggs = spam' if we want eggs to be global. > > > Note that Jeremy is only raising errors for "from M import *". > > No, he says he's also raising errors for 'import spam' if 'spam' is declared > global, like so: > > def viking(): > global spam > import spam Yeah, this was just brought to my attention at our group meeting today. I'm with you on this one -- there really isn't a good reason why this shouldn't work. (I wonder why that constraint was ever added to the reference manual; maybe I was just upset that someone would *do* something as ugly as that, or maybe there was a J[P]ython reason???.) > > I think Jeremy has an excellent reason. Compilers want to do analysis > > of name usage at compile time. The value of * cannot be determined at > > compile time (you cannot know what module will actually be imported at > > run time). Up till now, we were able to fudge this, but Jeremy's new > > compiler needs to know exactly which names are defined in all local > > scopes, in order to do nested scopes right. > > Hrrmm.... I guess I have to agree with that. None the less, I wish we could > have a "ack! this is stupid code! it uses 'from larch import *'! All bets > are off, we do a lot of slow complicated runtime checking now!" mode. The > thing I still enjoy most about Python is that it always does what I want, > and though I'd never want to do 'from different import *' in a local scope, > I do want other, less wise people to have the same experience, where > possible :) Hm, maybe, just *maybe* Jeremy can do this if there are no nested scopes in sight. But I don't think it's a big deal as long as the error message is clear -- it's bad style. 
> And I also want to be able to do:
>
> def fill_me(with):
>     global me
>     if with == 1:
>         import me
>     elif with == 2:
>         import me_too as me
>     elif with == 3:
>         from me.Tools import me_me as me
>     elif with == 4:
>         me = FakeModule()
>         sys.modules['me'] = me
>     else:
>         raise ValueError
>
> And I can't quite argue that away with 'the compiler needs to know ...' --
> it's all there!

Sort of, although I would prefer to do a two-stager here: first some
variation of "import me as meohmy", and then "global me; me = meohmy".

> > > I also have another issue with your recent patches, Jeremy, also in
> > > the backwards-compatibility department :) You gave new.code two
> > > new, non-optional arguments, in the middle of the long argument
> > > list. I sent a note about it to python-checkins instead of
> > > python-dev by accident, but Fred seemed to agree with me there.
>
> > (Tim will love this. :-)
>
> > I don't know what those new arguments represent. If they can
> > reasonably be assumed to be empty for code that doesn't use the new
> > features, I'd say move them to the end and default them properly. If
> > they must be specified, I'd say too bad, the new module is an accident
> > of the implementation anyway, and its users should update their code.
>
> Okay, I can live with that. It's sure to cause some gripes though. Then
> again, from looking at the code I'd say those arguments (freevars and
> cellvars) can easily default to empty tuples.

OK. I hope Jeremy can fix this when he gets home.
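A minimal sketch of the "two-stager" suggested above, written for present-day Python. The parameter is called choice here because "with" has since become a reserved word, and json and pickle are only stand-ins for the me / me_too modules in the example; both choices are assumptions of this sketch.

```python
me = None  # module-level name that fill_me() rebinds


def fill_me(choice):
    global me
    # Stage 1: bind the chosen module to an ordinary local name.
    if choice == 1:
        import json as meohmy
    elif choice == 2:
        import pickle as meohmy
    else:
        raise ValueError(choice)
    # Stage 2: publish it under the global name with a plain assignment.
    me = meohmy
```

Written this way, no import statement binds a name that is declared global; the only global binding is the final assignment.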
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Tue Jan 30 23:30:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 23:30:25 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3a76df10.22007715@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Tue, Jan 30, 2001 at 05:34:10PM +0000 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> Message-ID: <20010130233025.J962@xs4all.nl> On Tue, Jan 30, 2001 at 05:34:10PM +0000, Finn Bock wrote: > >Recall that a similar trick is played during list.sort(), replacing the > >list's type pointer for the duration (to point to an internal "immutable > >list" type, same as the list type except the "dangerous" slots point to a > >function that raises an "immutable list" TypeError). Then no runtime > >expense is incurred for regular lists to keep checking flags. I thought > >of this as an elegant use for switching types at runtime; you may still > >be appalled by it, though! > Changing the type of a type? Yuck! No, the typeobject itself isn't changed -- that would freeze *all* dicts/lists/whatever, not just the one we want. We'd be changing the type of an object (or 'type instance', if you want, but not "type 'instance'"), not the type of a type. > I might very likely be reading the CPython sources wrongly, but it seems > this trick will cause an BadInternalCall if some other C extension are > trying to modify a list while it is freezed by the type switching trick. > I imagine this would happen if the extension called: > PyList_SetItem(myList, 0, aValue); Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() (or whatever), and methods/operations that modify the listobject would have to check if the list is frozen, and raise an appropriate error if so. 
This might throw 'unexpected' errors, but only in situations that can't happen right now! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik at effbot.org Tue Jan 30 23:45:16 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 30 Jan 2001 23:45:16 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> <20010130233025.J962@xs4all.nl> Message-ID: <003501c08b0e$51f975c0$e46940d5@hagrid> > Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, > 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() > (or whatever), and methods/operations that modify the listobject would have > to check if the list is frozen, and raise an appropriate error if so. This > might throw 'unexpected' errors. did someone just subscribe me to the perl-porters list? -1 on "modal freeze" (it's madness) -0 on an "immutable dictionary" type in the core From tim.one at home.com Wed Jan 31 00:53:45 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 30 Jan 2001 18:53:45 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This is all PEP material now. Yup. > Tim, do you want to own the PEP? Not really. Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first). >> Cool! Can we resist adding >> >> if key:value in dict >> >> for "parallelism"? (I know I can ...) > That's easy to resist because, unlike ``for key:value in dict'', it's > not unambiguous: But if (key:value) in dict is. Just trying to help whoever *does* want the PEP . > ... 
> I'm certainly more comfortable with just ``for key in dict'' than with > the whole slow of extensions using colons. What about just the for key:value in dict for index:value in sequence extensions? The degenerate forms (omitting x or y or both in x:y) are mechanical variations so are likely to get raised. > But, again, that's for the PEP to fight over. PEPs are easier if you Pronounce on things you hate early so that those can get recorded in the "BDFL Pronouncements" section without further ado. whatever-this-may-look-like-it's-not-a-pep-discussion-ly y'rs - tim From nas at arctrix.com Tue Jan 30 18:12:15 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 30 Jan 2001 09:12:15 -0800 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <003501c08b0e$51f975c0$e46940d5@hagrid>; from fredrik@effbot.org on Tue, Jan 30, 2001 at 11:45:16PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> <20010130233025.J962@xs4all.nl> <003501c08b0e$51f975c0$e46940d5@hagrid> Message-ID: <20010130091215.C18319@glacier.fnational.com> On Tue, Jan 30, 2001 at 11:45:16PM +0100, Fredrik Lundh wrote: > did someone just subscribe me to the perl-porters list? > > -1 on "modal freeze" (it's madness) > -0 on an "immutable dictionary" type in the core I'm glad I'm not the only one who had that feeling. I agree with your votes too. 
Neil From nas at arctrix.com Tue Jan 30 18:24:54 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 30 Jan 2001 09:24:54 -0800 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: ; from tim.one@home.com on Tue, Jan 30, 2001 at 06:53:45PM -0500 References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> Message-ID: <20010130092454.D18319@glacier.fnational.com> [Tim Peters on adding yet more syntatic sugar] > Available time is finite, and this isn't at the top of the list > of things I'd like to see (resuming the discussion of > generators + coroutines + iteration protocol comes to mind > first). What's the chances of getting generators into 2.2? The implementation should not be hard. Didn't Steven Majewski have something years ago? Why do we always get sidetracked on trying to figure out how to do coroutines and continuations? Generators would add real power to the language and are simple enough that most users could benefit from them. Also, it should be possible to design an interface that does not preclude the addition of coroutines or continuations later. I'm not volunteering to champion the cause just yet. I just want to know if there is some issue I'm missing. Neil From barry at digicool.com Wed Jan 31 01:24:05 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 30 Jan 2001 19:24:05 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> Message-ID: <14967.23333.57259.347222@anthem.wooz.org> >>>>> "NS" == Neil Schemenauer writes: NS> What's the chances of getting generators into 2.2? The NS> implementation should not be hard. Didn't Steven Majewski NS> have something years ago? Why do we always get sidetracked on NS> trying to figure out how to do coroutines and continuations? 
I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, renaming it just "Generators" and filling it out for the 2.2 time frame. If we want to address coroutines and continuations later, we can write separate PEPs for them. Send me a draft. -Barry From guido at digicool.com Wed Jan 31 01:28:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 19:28:44 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Tue, 30 Jan 2001 18:53:45 EST." References: Message-ID: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> > Not really. Available time is finite, and this isn't at the top of the list > of things I'd like to see (resuming the discussion of generators + > coroutines + iteration protocol comes to mind first). OK, get going on that one then! > >> Cool! Can we resist adding > >> > >> if key:value in dict > >> > >> for "parallelism"? (I know I can ...) > > > That's easy to resist because, unlike ``for key:value in dict'', it's > > not unambiguous: > > But > > if (key:value) in dict > > is. Just trying to help whoever *does* want the PEP . OK, I'll pronounce -1 on this one. It looks ugly to me -- too reminiscent of C's if (...) required parentheses. Also it suggests that (key:value) is a new tuple notation that might be useful in other contexts -- which it's not. > > ... > > I'm certainly more comfortable with just ``for key in dict'' than with > > the whole slow of extensions using colons. > > What about just the > > for key:value in dict > for index:value in sequence > > extensions? I'm not against these -- I'd say +0.5. > The degenerate forms (omitting x or y or both in x:y) are > mechanical variations so are likely to get raised. For those, +0.2. > > But, again, that's for the PEP to fight over. > > PEPs are easier if you Pronounce on things you hate early so that those can > get recorded in the "BDFL Pronouncements" section without further ado. At your service -- see above. 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Wed Jan 31 01:49:24 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 30 Jan 2001 19:49:24 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: Your message of "Tue, 30 Jan 2001 09:24:54 PST." <20010130092454.D18319@glacier.fnational.com>
References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com>
Message-ID: <200101310049.TAA30197@cj20424-a.reston1.va.home.com>

> [Tim Peters on adding yet more syntactic sugar]
> > Available time is finite, and this isn't at the top of the list
> > of things I'd like to see (resuming the discussion of
> > generators + coroutines + iteration protocol comes to mind
> > first).
>
> What's the chances of getting generators into 2.2? The
> implementation should not be hard. Didn't Steven Majewski have
> something years ago? Why do we always get sidetracked on trying
> to figure out how to do coroutines and continuations?

I think there's a very good chance of getting them into 2.2. But it
*is* true that coroutines are a very attractive piece of land "just
next door". On the other hand, continuations are a mirage, so don't try
to go there. :-)

> Generators would add real power to the language and are simple
> enough that most users could benefit from them. Also, it should be
> possible to design an interface that does not preclude the
> addition of coroutines or continuations later.
>
> I'm not volunteering to champion the cause just yet. I just want
> to know if there is some issue I'm missing.

There are different ways to do iterators. Here is a very "tame"
proposal (and definitely in the realm of 2.2), that doesn't require
any coroutine-like tricks. Let's propose that

    for var in expr:
        ...do something with var...

will henceforth be translated into

    __iter = iterator(expr)
    while __iter.more():
        var = __iter.next()
        ...do something with var...

-- or some variation that combines more() and next() (I don't care).

Then a new built-in function iterator() is needed that creates an
iterator object. It should try two things:

(1) If the object implements __iterator__() (or a C API equivalent),
    call that and be done; this way arbitrary iterators can be
    created.

(2) If the object smells like a sequence (how to test???), use an
    iterator sort of like this:

    class Iterator:
        def __init__(self, sequence):
            self.sequence = sequence
            self.index = 0
        def more(self):
            # Store the item so that each index is tried exactly once
            try:
                self.item = self.sequence[self.index]
            except IndexError:
                return 0
            else:
                self.index = self.index + 1
                return 1
        def next(self):
            return self.item

(I don't necessarily mean that all those instance variables should be
publicly available.)

The built-in sequence types can use a very fast built-in iterator type
that uses a C int for the index and doesn't store the item in the
iterator. (This should be as fast as Marc-Andre's for loop
optimization using a C counter.)

Dictionaries can define an appropriate iterator that uses
PyDict_Next().

If the argument to iterator() is itself an iterator (how to test???),
it returns the argument unchanged, so that one can also write

    for var in iterator(obj):
        ...do something with var...

Files of course should have iterators that return the next input line.

We could build filtering and mapping iterators that take an iterator
argument and do certain manipulations with the elements; this would
effectively introduce the notion of lazy evaluation on sequences.

Etc., etc.

This does not come close to Icon generators -- but it doesn't require
any coroutine-like capabilities, unlike those.
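A rough sketch of how the iterator() dispatch described above might look, in present-day Python. The names iterator, more(), next() and __iterator__ come from the proposal; SequenceIterator and collect are made up for this illustration, and the two hasattr() checks are only guesses at the "how to test???" questions, which the proposal leaves open.

```python
class SequenceIterator:
    """Fallback iterator using the proposal's more()/next() protocol."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.index = 0

    def more(self):
        # Fetch and store the item so each index is tried exactly once.
        try:
            self.item = self.sequence[self.index]
        except IndexError:
            return False
        self.index += 1
        return True

    def next(self):
        return self.item


def iterator(obj):
    # Assumption: anything with more() and next() already is an iterator
    # and is returned unchanged.
    if hasattr(obj, "more") and hasattr(obj, "next"):
        return obj
    # (1) Let the object supply its own iterator.
    hook = getattr(obj, "__iterator__", None)
    if hook is not None:
        return hook()
    # (2) Assumption: "smells like a sequence" means it has __getitem__.
    if hasattr(obj, "__getitem__"):
        return SequenceIterator(obj)
    raise TypeError("cannot iterate over %r" % (obj,))


def collect(expr):
    """Hand-expanded 'for var in expr', gathering each var into a list."""
    out = []
    it = iterator(expr)
    while it.more():
        var = it.next()
        out.append(var)
    return out
```

The __getitem__ test is deliberately naive: a dict would pass it and then fail with KeyError, which is one reason the proposal gives dictionaries their own iterator.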
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Wed Jan 31 01:55:10 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 30 Jan 2001 19:55:10 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <3a76df10.22007715@smtp.worldonline.dk>
Message-ID:

[Finn Bock]
> Changing the type of a type? Yuck!

No, it temporarily changes the type of the single list being sorted, like
so, where "self" is a pointer to a PyListObject (which is a list, not a list
*type* object):

    self->ob_type = &immutable_list_type;
    err = samplesortslice(self->ob_item,
                          self->ob_item + self->ob_size,
                          compare);
    self->ob_type = &PyList_Type;

immutable_list_type is "just like" PyList_Type, except that the slots for
mutating methods point to a function that raises a TypeError. Before this
drastic step came years of increasingly ugly hacks trying to stop core dumps
when people mutated a list during the sort. Python's sort is very complex,
and lots of pointers are tucked away -- having the size of the array, or its
position in memory, or the set of objects it contains, change as a side
effect of doing a compare, would be difficult and expensive to recover
from -- and by "difficult" read "nobody ever managed to get it right before
this" <0.5 wink>.

> I might very likely be reading the CPython sources wrongly, but it seems
> this trick will cause a BadInternalCall if some other C extension is
> trying to modify a list while it is frozen by the type switching trick.
> I imagine this would happen if the extension called:
>
> PyList_SetItem(myList, 0, aValue);

Well, in CPython it's not "legal" for any other thread to use the C API
while the sort is in progress, because the thread doing the sort holds the
global interpreter lock for the duration. So this could happen "legally"
only if a comparison function called by the sort called out to a C extension
attempting to mutate the list.
In that case, fine, it *is* a bad call: mutation is not allowed during list sorting, so they deserve whatever they get -- and far better a "bad internal call" than a core dump. If the immutable_list_type were used more generally, it would require more general support (but I see Thomas already talked about that -- thanks). From guido at digicool.com Wed Jan 31 01:55:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 19:55:19 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Tue, 30 Jan 2001 19:24:05 EST." <14967.23333.57259.347222@anthem.wooz.org> References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> <14967.23333.57259.347222@anthem.wooz.org> Message-ID: <200101310055.TAA30250@cj20424-a.reston1.va.home.com> > I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, > renaming it just "Generators" and filling it out for the 2.2 time > frame. If we want to address coroutines and continuations later, we > can write separate PEPs for them. I think it's better not to re-use PEP 220 for that. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Wed Jan 31 01:58:32 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 01:58:32 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310028.TAA30090@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 30, 2001 at 07:28:44PM -0500 References: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> Message-ID: <20010131015832.K962@xs4all.nl> On Tue, Jan 30, 2001 at 07:28:44PM -0500, Guido van Rossum wrote: > > What about just the > > for key:value in dict > > for index:value in sequence > > extensions? > I'm not against these -- I'd say +0.5. What, fractions ? Isn't that against the whole idea of (+|-)(0|1) ? 
:) But since we are voting, I'm -0 on this right now, and might end up -1 or +0, depending on the implementation; I still can't *see* this, though I wouldn't be myself if I hadn't tried to implement it anyway :) And I ran into some fairly mind-boggling issues. The worst bit is 'how the f*ck does FOR_LOOP know if something's a dict or a list'. And the almost-as-bad bit is 'WTF to do for user classes, extension types and almost-list/almost-dict practically-builtin types (arrays, the *dbm's, etc.)'. After some sleep-deprived consideration I gave up and decided we need an iteration/generator protocol first. However, my life's been busy (or rather, my work has been) with all kinds of small and not so small details, and I haven't been getting much sleep in the last week or so, so I might be overlooking something very simple. That's why I can go either way based on implementation -- it might prove me wrong :) Until my boss is back and I stop being 'responsible' (end of this week, start of next week) and I get a chance to get rid of about 2 months of work backlog (the time he was away) I won't have time to champion or even contribute to such a PEP. Then again, by that time I might be preparing for IPC9 (_if_ my boss sends me there) or even my ApacheCon US presentation (which got accepted today, yay!) So, if that other message was an attempt to drop the PEP on me, Guido, the answer is the same as I tend to give to suits that show up next to my desk wanting to discuss something important (to them) right away: "b'gg'r 'ff" :) I'll-save-my-answer-to-PR-officers-doing-the-same-for-when-you-do-something- -*really*-offensive-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From guido at digicool.com Wed Jan 31 02:16:51 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 20:16:51 -0500 Subject: [Python-Dev] Let's release 2.1a2 Thursday night Message-ID: <200101310116.UAA30386@cj20424-a.reston1.va.home.com> Things look good for a release of 2.1a2 this week; we're aiming for Thursday night. I won't be in town (speaking to the press at LinuxWorld Expo in New York) but Jeremy will handle the release process and the other PythonLabs folks will assist him. Tomorrow Fred will check in his weak references after making some changes (mostly making it more Spartan :-) that I suggested in a code review. After that, I think we're good for the second (and last!) alpha release; and enough has changed (e.g. nested scopes, lots of setup.py changes, flat Makefile) to warrant going ahead now. Now is the time for those last-minute bugfixes that you're all so famous for! I propose a checkin freeze for non-PythonLabs folks Wednesday midnight US west coast time, to give Jeremy c.s. enough time to build the release and give it a good work-out. (An internal freeze is up to Jeremy to declare, but should probably take Tim's sleep cycle into account.) --Guido van Rossum (home page: http://www.python.org/~guido/) PS. I'll be out of reach from noon US east coast time tomorrow (Wednesday), traveling to New York by train. I probably won't check my email while out there; I'll be back Friday night. From guido at digicool.com Wed Jan 31 02:35:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 20:35:25 -0500 Subject: [Python-Dev] SSL socket read at EOF; SourceForge problem In-Reply-To: Your message of "Mon, 29 Jan 2001 23:56:09 EST." References: Message-ID: <200101310135.UAA30629@cj20424-a.reston1.va.home.com> > I'm going to repeat a question that I posted about a week ago that passed > without comment on the newsgroup. 
The issue is the SSL support in the socket > module, which raises an exception when the reading socket is at EOF, rather > than returning an empty string. I'm hesitant to call it a "bug", but I > wouldn't have implemented it this way. There are the names of two people > mentioned at the top of socketmodule.c, but no contact information, so I'm > suggesting here that it be changed to conform to normal file/socket > practice. (SSL was actually added at 2.0, so I'm late to the party with > this; mea culpa, mea culpa. I delayed trying Python2 because of the > extension rebuilding.) I agree that it makes more sense if a read at EOF returns an empty string, since that's what other file-like objects in Python do. I can't do much about this right now, but I'd love to see a patch. It could go into 2.1a2 if small enough. Note that input() and raw_input() are specifically excepted because they are intended for use in interactive mode by newbies mostly; and because "" as return value for EOF would be ambiguous for these. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed Jan 31 05:12:23 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jan 2001 17:12:23 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> Message-ID: <200101310412.RAA03140@s454.cosc.canterbury.ac.nz> : > for index:value in sequence -1, because we only construct dicts using that notation, not sequences. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed Jan 31 06:21:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 00:21:37 -0500 Subject: [Python-Dev] codecity.com Message-ID: <200101310521.AAA31653@cj20424-a.reston1.va.home.com> Should I spread this word, or is this a joke? The Python quiz category is laughable. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Sat, 27 Jan 2001 23:16:02 -0800 From: "Jeff Cordova" To: Subject: New, fun way to learn Python. Hi Guido, I wanted to let you know about www.codecity.com After several years of managing large software projects in Silicon Valley, I realized that I was spending a lot of time teaching jr. programmers how to write code. So, I created CodeCity to help me automate some of that. If you go to the site, you'll see that I've created a category for Python. There's not much depth to the Python content yet (the site is only a week old) but I'm expecting the Python community to add their wisdom over a period of time. If you could spread the word, it would be highly appreciated. Thankyou, Jeff C. ------- End of Forwarded Message From tim.one at home.com Wed Jan 31 07:16:48 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 01:16:48 -0500 Subject: [Python-Dev] codecity.com In-Reply-To: <200101310521.AAA31653@cj20424-a.reston1.va.home.com> Message-ID: [Guido, on www.codecity.com] > Should I spread this word, or is this a joke? The Python quiz > category is laughable. While the Python section still seems to have only one question, the first day this was announced the third choice wasn't today's: Python is Open Source code, so it doesn't have a creator but: Martha Stewart I liked it better before <0.9 wink>. 

From moshez at zadka.site.co.il Wed Jan 31 07:30:07 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Wed, 31 Jan 2001 08:30:07 +0200 (IST)
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <200101310049.TAA30197@cj20424-a.reston1.va.home.com>
References: <200101310049.TAA30197@cj20424-a.reston1.va.home.com>, <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com>
Message-ID: <20010131063007.536ACA83E@darjeeling.zadka.site.co.il>

On Tue, 30 Jan 2001 19:49:24 -0500, Guido van Rossum wrote:
> There are different ways to do iterators.
>
> Here is a very "tame" proposal (and definitely in the realm of 2.2),
> that doesn't require any coroutine-like tricks.  Let's propose that
>
>     for var in expr:
>         ...do something with var...
>
> will henceforth be translated into
>
>     __iter = iterator(expr)
>     while __iter.more():
>         var = __iter.next()
>         ...do something with var...

I'm +1 on that...but Tim's "try to use that to write something that will
return the nodes of a binary tree" still haunts me.

Personally, though, I'd thin down the interface to

    while 1:
        try:
            var = __iter.next()
        except NoMoreError:
            break # pseudo-break?

With the usual caveat that this is a lie as far as "else" is concerned
(IOW, pseudo-break gets into the else)

> Then a new built-in function iterator() is needed that creates an
> iterator object.  It should try two things:
>
> (1) If the object implements __iterator__() (or a C API equivalent),
>     call that and be done; this way arbitrary iterators can be
>     created.
> (2) If the object smells like a sequence (how to test???), use an
>     iterator sort of like this:

Why not, "if the object doesn't have __iterator__, try this. If it won't
work, we'll find out by the exception that will be thrown in our face".

# NoMoreError is not defined in the original post; added so the class runs.
class NoMoreError(Exception):
    pass

class Iterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0
    def next(self):
        # Nested try blocks: the inner one maps IndexError to NoMoreError,
        # the outer one advances the index whether or not an item was found.
        try:
            try:
                return self.seq[self.index] # <- smells like
            except IndexError:
                raise NoMoreError(self.index)
        finally:
            self.index += 1

> (I don't necessarily mean that all those instance variables should
> be publicly available.)

But what about your poor brother? Er....I mean, this would make implementing
"indexing" really about just getting the index from the iterator.

> If the argument to iterator() is itself an iterator (how to test???),

No idea, and this looks problematic. I see your point -- but it's still
problematic.
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From tim.one at home.com Wed Jan 31 07:57:26 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 31 Jan 2001 01:57:26 -0500
Subject: [Python-Dev] Can't enter new Python bugs on SourceForge?
Message-ID:

Reported this earlier.  Still can't create a new bug.  Guido either.

Here's the SF Support request opened on this:

http://sourceforge.net/support/index.php?func=detailsupport&support_id=113100&group_id=1

The good(?) news is that Python isn't the only project to report this
problem.

From tim.one at home.com Wed Jan 31 08:50:18 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 31 Jan 2001 02:50:18 -0500
Subject: [Python-Dev] FW: Python programmer needed (addition to urllib2 and HTTPS support)
Message-ID:

Get rich quick!

-----Original Message-----
From: python-list-admin at python.org
[mailto:python-list-admin at python.org]On Behalf Of Albert Chin-A-Young
Sent: Wednesday, January 31, 2001 2:31 AM
To: python-list at python.org
Subject: Python programmer needed (addition to urllib2 and HTTPS support)

We're in need of a contract Python programmer for the following:

1. Allow connecting to a host with urlopen() which requires BASIC HTTP
   authentication with a proxy (via urllib2.py).
   This should address bug #125217:
   http://sourceforge.net/bugs/?func=detailbug&bug_id=125217&group_id=5470
2. Allow connecting to a host with urlopen() which requires BASIC HTTP
   authentication with a proxy that requires BASIC HTTP authentication
   (via urllib2.py).
3. Support for non-authenticated clients to connect to a HTTPS server
4. Support for a client to authenticate the HTTPS host (to verify that
   its certificate is valid)

What we might consider adding (depends on cost):

1. Support for authenticated clients to connect to a HTTPS server.

Please note that solutions to the four items above must be rolled back
into the main Python distribution (implies the "community" and the
Python developers need to agree on the adopted solution).

--
albert chin (china at thewrittenword dot com)
--
http://mail.python.org/mailman/listinfo/python-list

From ping at lfw.org Wed Jan 31 10:47:10 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Wed, 31 Jan 2001 01:47:10 -0800 (PST)
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP
In-Reply-To:
Message-ID:

On Tue, 30 Jan 2001, Guido van Rossum wrote:
>
> Can you say "PEP time"? :-)

Okay, i have written a draft PEP that tries to combine the
"elt in dict", custom iterator, and "for k:v" issues into a
coherent proposal.  Have a look:

    http://www.lfw.org/python/pep-iterators.txt
    http://www.lfw.org/python/pep-iterators.html

Could i get a number for this please?

-- ?!ng

"The only `intuitive' interface is the nipple.  After that, it's all learned."
    -- Bruce Ediger, on user interfaces

From moshez at zadka.site.co.il Wed Jan 31 11:14:49 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Wed, 31 Jan 2001 12:14:49 +0200 (IST)
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP
In-Reply-To:
References:
Message-ID: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il>

On Wed, 31 Jan 2001 01:47:10 -0800 (PST), Ka-Ping Yee wrote:
> Okay, i have written a draft PEP that tries to combine the
> "elt in dict", custom iterator, and "for k:v" issues into a
> coherent proposal.  Have a look:
>
>     http://www.lfw.org/python/pep-iterators.txt
>     http://www.lfw.org/python/pep-iterators.html

Er....one problem with first reading: you forgot to mention in the
while loop description that 'else:' would be executed if the exception
is raised, so the 'break' is a 'pseudo-break'.

Basic response: I *love* the iter(), sq_iter and __iter__ parts. I tremble
at seeing the rest. Why not add a method to dictionaries .iteritems() and do

    for (k, v) in dict.iteritems():
        pass

(dict.iteritems() would return an iterator to the items)
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From MarkH at ActiveState.com Wed Jan 31 11:34:01 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Wed, 31 Jan 2001 21:34:01 +1100
Subject: [Python-Dev] WARNING: Changed build process for zlib on Windows
Message-ID:

Hi all,

In an attempt to solve "[ Bug #129293 ] zlib library used for binary win32
distribution can crash"
(https://sourceforge.net/bugs/?func=detailbug&group_id=5470&bug_id=129293),
Tim and I have decided that we should fix the build process of zlib.pyd on
windows.

The current process requires that the builder download _2_ zlib archives - a
binary distribution for zlib.lib, and the source archive for the headers.  We
believe that slight differences between the 2 are causing the above bug.
A particular warning-light is that the current process defines ZLIB_DLL even
though we are _not_ currently using the DLL but the static lib.  Removing this
#define generates linker errors.

The new process is very simple, but may break some people's builds.  In theory
it _should_ still work for everyone, but if it fails to build, please check
your directory structure.

From ping at lfw.org Wed Jan 31 12:00:48 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Wed, 31 Jan 2001 03:00:48 -0800 (PST)
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <20010131015832.K962@xs4all.nl>
Message-ID:

On Wed, 31 Jan 2001, Thomas Wouters wrote:
> I still can't *see* this, though I
> wouldn't be myself if I hadn't tried to implement it anyway :) And I ran
> into some fairly mind-boggling issues. The worst bit is 'how the f*ck
> does FOR_LOOP know if something's a dict or a list'.

I believe the Pythonic answer to that is "see if the appropriate
method is available".

The best definition of "sequence-like" or "mapping-like" i can
come up with is:

    x is sequence-like if it provides __getitem__() but not keys()
    x is mapping-like if it provides __getitem__() and keys()

But in our case, since we need iteration, we can look for specific
methods that have to do with just what we need for iteration and
nothing else.

Thus, e.g. a mapping-like class without a values() method is no
problem if we never ask to iterate over values.

> And the
> almost-as-bad bit is 'WTF to do for user classes, extension types and
> almost-list/almost-dict practically-builtin types

I think it can be done; the draft PEP at

    http://www.lfw.org/python/pep-iterators.html

is a best-attempt at supporting everything just as you would expect.
Let me know if you think there are important cases it doesn't cover.
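
The two definitions of "sequence-like" and "mapping-like" given above translate almost directly into attribute tests. The following is a minimal sketch, assuming `hasattr` is an acceptable stand-in for "provides"; the function names and the `PseudoDict` class are made up for illustration and do not come from the draft PEP.

```python
def is_sequence_like(x):
    """x provides __getitem__() but not keys()."""
    return hasattr(x, "__getitem__") and not hasattr(x, "keys")


def is_mapping_like(x):
    """x provides __getitem__() and keys()."""
    return hasattr(x, "__getitem__") and hasattr(x, "keys")


class PseudoDict:
    """Hypothetical user class with just enough of the mapping interface."""

    def __init__(self):
        self._d = {"spam": 1}

    def __getitem__(self, key):
        return self._d[key]

    def keys(self):
        return list(self._d)
```

Note that `PseudoDict` has no values() method and still counts as mapping-like, which matches the remark that a missing values() is no problem as long as nobody asks to iterate over values.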

I know, the table

    mp_iteritems    __iteritems__, __iter__, items, __getitem__
    mp_iterkeys     __iterkeys__, __iter__, keys, __getitem__
    mp_itervalues   __itervalues__, __iter__, values, __getitem__
    sq_iter         __iter__, __getitem__

might look a little frightening, but it's not so bad, and i think
it's about as simple as you can make it while continuing to support
existing pseudo-lists and pseudo-dictionaries.

No instance should ever provide __iter__ at the same time as any
of the other __iter*__ methods anyway.

-- ?!ng

"The only `intuitive' interface is the nipple.  After that, it's all learned."
    -- Bruce Ediger, on user interfaces

From mal at lemburg.com Wed Jan 31 12:56:12 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 31 Jan 2001 12:56:12 +0100
Subject: [Python-Dev] Re: from ... import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154)
References:
Message-ID: <3A77FD5C.DE8729DC@lemburg.com>

> Update of /cvsroot/python/python/dist/src/Python
> In directory usw-pr-cvs1:/tmp/cvs-serv17061/Python
>
> Modified Files:
>         compile.c
> Log Message:
> Enforce two illegal import statements that were outlawed in the
> reference manual but not checked: Names bound by import statements may
> not occur in global statements in the same scope. The from ... import *
> form may only occur in a module scope.
>
> I guess these changes could break code, but the reference manual
> warned about them.

Jeremy, your code breaks all uses of "from package import submodule"
inside packages.

Try distutils for example or setup.py....

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Jan 31 13:01:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 31 Jan 2001 13:01:24 +0100
Subject: [Python-Dev] Making mutable objects readonly
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
Message-ID: <3A77FE94.E5082136@lemburg.com>

Guido van Rossum wrote:
>
> [ESR]
> > For different reasons, I'd like to be able to set a constant flag on an
> > object instance.  Simple semantics: if you try to assign to a
> > member or method, it throws an exception.
> >
> > Application?  I have a large Python program that goes to a lot of effort
> > to build elaborate context structures in core.  It would be nice to know
> > they can't be even inadvertently trashed without throwing an exception I
> > can watch for.
>
> Yes, this is a good thing.  Easy to do on lists and dicts.  Questions:
>
> - How to spell it?  x.freeze()?  x.readonly()?

How about .lock() and .unlock() ?

> - Should this be reversible?  I.e. should there be an x.unfreeze()?

Yes. These low-level locks could be used in thread programming
since the above calls are C level functions and thus thread
safe w/r to the global interpreter lock.

> - Should we support something like this for instances too?  Sometimes
>   it might be cool to be able to freeze changing attribute values...

Sure :)

Eric, could you write a PEP for this ?

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Jan 31 13:08:15 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 31 Jan 2001 13:08:15 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
References:
Message-ID: <3A78002F.DC8F0582@lemburg.com>

Tim Peters wrote:
>
> [MAL]
> > ...
> > What we really want is iterators for dictionaries, so why not > > implement these instead of tweaking for-loops. > > Seems an unrelated topic: would "iterators for dictionaries" solve the > supposed problem with iteration order? No, but it would solve the problem in a more elegant and generalized way. Besides, it also allows writing code which is thread safe, since the iterator can take special actions to assure that the dictionary doesn't change during the iteration phase (see the other thread about "making mutable objects readonly"). > > If you are looking for speedups w/r to for-loops, applying a > > different indexing technique in for-loops would go a lot further > > and provide better performance not only to dictionary loops, > > but also to other sequences. > > > > I have made some good experience with a special counter object > > (sort of like a mutable integer) which is used instead of the > > iteration index integer in the current implementation. > > Please quantify, if possible. My belief (based on past experiments) is that > in loops fancier than > > for i in range(n): > pass > > the loop overhead quickly falls into the noise even now. I don't remember the figures, but these micor optimizations do speedup loops by a noticable amount. Just compare the performance of stock Python 1.5 against my patched version. > > Using an iterator object instead of the integer + __getitem__ > > call machinery would allow more flexibility for all kinds of > > sequences or containers. ... > > This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new > iteration *protocol* could have major attractions. Not really... the counter object is just a special case of an iterator -- in this case iteration is over the IN. 

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Jan 31 13:10:43 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 31 Jan 2001 13:10:43 +0100
Subject: [Python-Dev] Re: Making mutable objects readonly
References:
Message-ID: <3A7800C3.B5D3203F@lemburg.com>

Tim Peters wrote:
>
> Note that even adding a "frozen" flag would add 4 bytes to every freezable
> object on most machines.  That's why I'd rather .freeze() replace the type
> pointer and .unfreeze() restore it.  No time or space overhead; no
> cluttering up the normal-case (i.e., unfrozen) type implementations with new
> tests.

Note that Fred's weak ref implementation also needs a flag on
every weak referenceable object (at least last time I looked at his
patches). Why not add a flag byte or word to these objects --
then we'd have 8 or 16 choices of what to do with them ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From MarkH at ActiveState.com Wed Jan 31 13:18:12 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Wed, 31 Jan 2001 23:18:12 +1100
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: <3A77FE94.E5082136@lemburg.com>
Message-ID:

MAL writes:
> > - How to spell it?  x.freeze()?  x.readonly()?
>
> How about .lock() and .unlock() ?

I'm with Greg here - lock() and unlock() imply an operation similar to
threading.Lock() - ie, exclusivity rather than immutability.

I don't have a strong opinion on the other names, but definitely prefer any
of the others over lock() for this operation.

Mark.

From mal at lemburg.com Wed Jan 31 13:26:07 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 31 Jan 2001 13:26:07 +0100
Subject: [Python-Dev] Making mutable objects readonly
References:
Message-ID: <3A78045F.7DB50871@lemburg.com>

Mark Hammond wrote:
>
> MAL writes:
> > > - How to spell it?  x.freeze()?  x.readonly()?
> >
> > How about .lock() and .unlock() ?
>
> I'm with Greg here - lock() and unlock() imply an operation similar to
> threading.Lock() - ie, exclusivity rather than immutability.
>
> I don't have a strong opinion on the other names, but definitely prefer any
> of the others over lock() for this operation.

Funny, I thought that .lock() and .unlock() could be used to
implement exactly what threading.Lock() does...

Anyway, names really don't matter much, so how about:

    .mutable([flag]) -> integer

    If called without argument, returns 1/0 depending on whether
    the object is mutable or not. When called with a flag argument,
    sets the mutable state of the object to the value indicated
    by flag and returns the previous flag state.

The semantics of this interface would be in sync with many
other state APIs in Python and C (e.g. setlocale()).

The advantage of making this a method should be clear: it allows
writing polymorphic code.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From pedroni at inf.ethz.ch Wed Jan 31 13:34:32 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Wed, 31 Jan 2001 13:34:32 +0100 (MET)
Subject: [Python-Dev] weak refs and jython
Message-ID: <200101311234.NAA24584@core.inf.ethz.ch>

Hi.

I have read the weak ref PEP, maybe too late.
I don't know if portability of code using weak refs between python and
jython was a goal or could be one, and to what extent the actual impl.
will correspond to the PEP.

But about

    The callbacks registered with weak references must accept a single
    parameter, which will be the weak-ly referenced object itself.
The object can be resurrected by creating some other reference to the object in the callback, in which case the weak reference generating the callback will still be cleared but no remaining weak references to the object will be cleared. AFAIK using java weak refs (which I think is a natural choice) I see no way (at least no worth-the-effort way) to implement this in jython. Java weak refs cannot be resurrected. regards, Samuele Pedroni. PS: Mr. X is a jython developer. From bckfnn at worldonline.dk Wed Jan 31 13:49:22 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Wed, 31 Jan 2001 12:49:22 GMT Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <200101302042.PAA29301@cj20424-a.reston1.va.home.com> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> Message-ID: <3a7809c0.14839067@smtp.worldonline.dk> >> > Note that Jeremy is only raising errors for "from M import *". >> >> No, he says he's also raising errors for 'import spam' if 'spam' is declared >> global, like so: >> >> def viking(): >> global spam >> import spam > >Yeah, this was just brought to my attention at our group meeting >today. I'm with you on this one -- there really isn't a good reason >why this shouldn't work. (I wonder why that constraint was ever added >to the reference manual; maybe I was just upset that someone would >*do* something as ugly as that, or maybe there was a J[P]ython >reason???.) Previously Jython have had problems with "from .. import *" in function scope, and still have problems when used with the python -> java compiler: http://sourceforge.net/bugs/?func=detailbug&bug_id=122834&group_id=12867 Using global on an import name is currently ignored by Jython because the name assignment is done by the runtime, not the compiler. 
regards, finn From thomas at xs4all.net Wed Jan 31 13:59:14 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 13:59:14 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <3a7809c0.14839067@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Wed, Jan 31, 2001 at 12:49:22PM +0000 References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> Message-ID: <20010131135914.N962@xs4all.nl> On Wed, Jan 31, 2001 at 12:49:22PM +0000, Finn Bock wrote: > Using global on an import name is currently ignored by Jython because > the name assignment is done by the runtime, not the compiler. So it's impossible to do, in Jython, something like: def fillme(): global me import me but it is possible to do: def fillme(): global me import me as _me me = _me ? I have to say I don't like that; we're always claiming 'import' (and 'def' and 'class' for that matter) are 'just another way of writing assignment'. All these special cases break that. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
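
For reference, the two spellings contrasted in this exchange look like this when run under a current CPython. This is a minimal sketch, not a statement about Jython: the stdlib modules `string` and `textwrap` merely stand in for the module called `me` in the thread, and the function names are made up for illustration.

```python
def fill_direct():
    # 'global' applies to the name bound by 'import', so the module
    # object ends up in the module-level namespace.
    global string
    import string


def fill_via_alias():
    # The workaround: import under a local alias, then assign the
    # global name explicitly.
    global textwrap
    import textwrap as _textwrap
    textwrap = _textwrap


fill_direct()
fill_via_alias()

# Both names are now module-level globals of the defining module.
print(fill_direct.__globals__["string"].__name__)
print(fill_via_alias.__globals__["textwrap"].__name__)
```

In CPython both functions leave a module-level binding behind, which is the "import is just another way of writing assignment" behaviour argued for above.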

From bckfnn at worldonline.dk Wed Jan 31 14:35:36 2001
From: bckfnn at worldonline.dk (Finn Bock)
Date: Wed, 31 Jan 2001 13:35:36 GMT
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109
In-Reply-To: <20010131135914.N962@xs4all.nl>
References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> <20010131135914.N962@xs4all.nl>
Message-ID: <3a780eda.16144995@smtp.worldonline.dk>

On Wed, 31 Jan 2001 13:59:14 +0100, you wrote:

>On Wed, Jan 31, 2001 at 12:49:22PM +0000, Finn Bock wrote:
>
>> Using global on an import name is currently ignored by Jython because
>> the name assignment is done by the runtime, not the compiler.
>
>So it's impossible to do, in Jython, something like:
>
>def fillme():
>    global me
>    import me
>
>but it is possible to do:
>
>def fillme():
>    global me
>    import me as _me
>    me = _me
>
>?

Yes, only the second example will make a global variable.

> I have to say I don't like that; we're always claiming 'import' (and
>'def' and 'class' for that matter) are 'just another way of writing
>assignment'. All these special cases break that.

I don't like it either, I was only reporting what jython currently does.
The current design used by Jython does not lend itself directly towards a
solution, but I don't see anything that makes it impossible to solve.

regards,
finn

From mal at lemburg.com Wed Jan 31 15:34:19 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 31 Jan 2001 15:34:19 +0100
Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0)
References:
Message-ID: <3A78226B.2E177EFE@lemburg.com>

Michael Hudson wrote:
>
> In the interest of generating some numbers (and filling up my hard
> drive), last night I wrote a script to build lots & lots of versions
> of python (many of which turned out to be redundant - eg.
-O6 didn't
> seem to do anything different to -O3 and pybench doesn't work with
> 1.5.2), and then run pybench with them.  Summarised results below;
> first a key:
>
> src-n: this morning's CVS (with Jeremy's f_localsplus optimisation)
>        (only built this with -O3)
> src: CVS from yesterday afternoon
> src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc
>               patch applied.  More on this later...
> Python-2.0: you can guess what this is.
>
> All runs are compared against Python-2.0-O2:
>
> Benchmark: src-n-O3 (rounds=10, warp=20)
>     Average round time:   49029.00 ms    -0.86%
> Benchmark: src (rounds=10, warp=20)
>     Average round time:   67141.00 ms   +35.76%
> Benchmark: src-O (rounds=10, warp=20)
>     Average round time:   50167.00 ms    +1.44%
> Benchmark: src-O2 (rounds=10, warp=20)
>     Average round time:   49641.00 ms    +0.37%
> Benchmark: src-O3 (rounds=10, warp=20)
>     Average round time:   49104.00 ms    -0.71%
> Benchmark: src-O6 (rounds=10, warp=20)
>     Average round time:   49131.00 ms    -0.66%
> Benchmark: src-obmalloc (rounds=10, warp=20)
>     Average round time:   63276.00 ms   +27.94%
> Benchmark: src-obmalloc-O (rounds=10, warp=20)
>     Average round time:   46927.00 ms    -5.11%
> Benchmark: src-obmalloc-O2 (rounds=10, warp=20)
>     Average round time:   46146.00 ms    -6.69%
> Benchmark: src-obmalloc-O3 (rounds=10, warp=20)
>     Average round time:   46456.00 ms    -6.07%
> Benchmark: src-obmalloc-O6 (rounds=10, warp=20)
>     Average round time:   46450.00 ms    -6.08%
> Benchmark: Python-2.0 (rounds=10, warp=20)
>     Average round time:   68933.00 ms   +39.38%
> Benchmark: Python-2.0-O (rounds=10, warp=20)
>     Average round time:   49542.00 ms    +0.17%
> Benchmark: Python-2.0-O3 (rounds=10, warp=20)
>     Average round time:   48262.00 ms    -2.41%
> Benchmark: Python-2.0-O6 (rounds=10, warp=20)
>     Average round time:   48273.00 ms    -2.39%
>
> My conclusion?  Python 2.1 is slower than Python 2.0, but not by
> enough to care about.

What compiler did you use and on which platform ?

I have had a similar experience with -On (n>3) compared to -O2
using pgcc (gcc optimized for PC processors). BTW, the Linux
kernel uses "-Wall -Wstrict-prototypes -O3 -fomit-frame-pointer"
as CFLAGS -- perhaps Python should too on Linux ?!

Does anybody know about the effect of -fomit-frame-pointer ?
Would it cause problems or produce code which is not compatible
with code compiled without this flag ?

> Interestingly, adding obmalloc speeds things up.  Let's take a closer
> look:
>
> $ python pybench.py -c src-obmalloc-O3 -s src-O3
> PYBENCH 0.7
>
> Benchmark: src-O3 (rounds=10, warp=20)
>
> Tests:                       per run  per oper.    diff *
> ------------------------------------------------------------------------
> BuiltinFunctionCalls:      843.35 ms    6.61 us    +2.93%
> BuiltinMethodLookup:       878.70 ms    1.67 us    +0.56%
> ConcatStrings:            1068.80 ms    7.13 us    -1.22%
> ConcatUnicode:            1373.70 ms    9.16 us    -1.24%
> CreateInstances:          1433.55 ms   34.13 us    +9.06%
> CreateStringsWithConcat:  1031.75 ms    5.16 us   +10.95%
> CreateUnicodeWithConcat:  1277.85 ms    6.39 us    +3.14%
> DictCreation:             1275.80 ms    8.51 us   +44.22%
> ForLoops:                 1415.90 ms  141.59 us    -0.64%
> IfThenElse:               1152.70 ms    1.71 us    -0.15%
> ListSlicing:               397.40 ms  113.54 us    -0.53%
> NestedForLoops:            789.75 ms    2.26 us    -0.37%
> NormalClassAttribute:      935.15 ms    1.56 us    -0.41%
> NormalInstanceAttribute:   961.15 ms    1.60 us    -0.60%
> PythonFunctionCalls:      1079.65 ms    6.54 us    -1.00%
> PythonMethodCalls:         908.05 ms   12.11 us    -0.88%
> Recursion:                 838.50 ms   67.08 us    -0.00%
> SecondImport:              741.20 ms   29.65 us   +25.57%
> SecondPackageImport:       744.25 ms   29.77 us   +18.66%
> SecondSubmoduleImport:     947.05 ms   37.88 us   +25.60%
> SimpleComplexArithmetic:  1129.40 ms    5.13 us  +114.92%
> SimpleDictManipulation:   1048.55 ms    3.50 us    -0.00%
> SimpleFloatArithmetic:     746.05 ms    1.36 us    -2.75%
> SimpleIntFloatArithmetic:  823.35 ms    1.25 us    -0.37%
> SimpleIntegerArithmetic:   823.40 ms    1.25 us    -0.37%
> SimpleListManipulation:   1004.70 ms    3.72 us    +0.01%
> SimpleLongArithmetic:      865.30 ms    5.24 us  +100.65%
> SmallLists:               1657.65 ms    6.50 us    +6.63%
> SmallTuples:              1143.95 ms    4.77 us    +2.90%
> SpecialClassAttribute:     949.00 ms    1.58 us    -0.22%
> SpecialInstanceAttribute: 1353.05 ms    2.26 us    -0.73%
> StringMappings:           1161.00 ms    9.21 us    +7.30%
> StringPredicates:         1069.65 ms    3.82 us    -5.30%
> StringSlicing:             846.30 ms    4.84 us    +8.61%
> TryExcept:                1590.40 ms    1.06 us    -0.49%
> TryRaiseExcept:           1104.65 ms   73.64 us   +24.46%
> TupleSlicing:              681.10 ms    6.49 us    -3.13%
> UnicodeMappings:          1021.70 ms   56.76 us    +0.79%
> UnicodePredicates:        1308.45 ms    5.82 us    -4.79%
> UnicodeProperties:        1148.45 ms    5.74 us   +13.67%
> UnicodeSlicing:            984.15 ms    5.62 us    -0.51%
> ------------------------------------------------------------------------
> Average round time:      49104.00 ms               +5.70%
>
> *) measured against: src-obmalloc-O3 (rounds=10, warp=20)
>
> Words fail me slightly, but maybe some tuning of the memory allocation
> of longs & complex numbers would be in order?

AFAIR, Vladimir's malloc implementation favours small objects.
All number objects (except longs) fall into this category.
Perhaps we should think about adding his lib to the core ?!

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Jan 31 15:39:01 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 31 Jan 2001 15:39:01 +0100
Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0)
References:
Message-ID: <3A782385.5B544CD5@lemburg.com>

> In the interest of generating some numbers (and filling up my hard
> drive), last night I wrote a script to build lots & lots of versions
> of python (many of which turned out to be redundant - eg. -O6 didn't
> seem to do anything different to -O3 and pybench doesn't work with
> 1.5.2), and then run pybench with them.
FYI, I've just updated the archive to also work under Python 1.5.x: http://www.lemburg.com/python/pybench-0.7.zip -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mwh21 at cam.ac.uk Wed Jan 31 16:52:23 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 31 Jan 2001 15:52:23 +0000 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 31 Jan 2001 15:34:19 +0100" References: <3A78226B.2E177EFE@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > > My conclusion? Python 2.1 is slower than Python 2.0, but not by > > enough to care about. > > What compiler did you use and on which platform ? Argh, sorry; I meant to put this in! $ uname -a Linux atrus.jesus.cam.ac.uk 2.2.14-1.1.0 #1 Thu Jan 6 05:12:58 EST 2000 i686 unknown $ gcc --version 2.95.1 It's a Dell Dimension XPS D233 (a 233MHz PII) with a reasonably fast hard drive (two year old 10G IBM 7200rpm thingy) and quite a lot of RAM (192Mb). [snip] > AFAIR, Vladimir's malloc implementation favours small objects. > All number objects (except longs) fall into this category. Well, longs & complex numbers don't do any free list handling (like floats and ints do), so I see two conclusions: 1) Don't add obmalloc to the core, but do simple free list stuff for longs (might be tricky) and complex numbers (this should be a no-brainer). 2) Integrate obmalloc - then maybe we can ditch all of that icky freelist stuff. > Perhaps we should think about adding his lib to the core ?! Strikes me as the better solution. Can anyone try this on Windows? Seeing as windows malloc reputedly sucks, maybe the differences would be bigger. Cheers, M. -- Our lecture theatre has just crashed. It will currently only silently display an unexplained line-drawing of a large dog accompanied by spookily flickering lights. 
-- Dan Sheppard, ucam.chat (from Owen Dunn's summary of the year) From barry at digicool.com Wed Jan 31 17:42:28 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 11:42:28 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: Message-ID: <14968.16500.594486.613828@anthem.wooz.org> >>>>> "KY" == Ka-Ping Yee writes: KY> Could i get a number for this please? Looks like you beat Eric to PEP 234. :) I'll update PEP 0 and let you check in your txt file. I may want to do an editorial pass over it. -Barry From barry at digicool.com Wed Jan 31 17:50:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 11:50:10 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> Message-ID: <14968.16962.830739.920771@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> Basic response: I *love* the iter(), sq_iter and __iter__ MZ> parts. I tremble at seeing the rest. Why not add a method to MZ> dictionaries .iteritems() and do | for (k, v) in dict.iteritems(): | pass MZ> (dict.iteritems() would return an an iterator to the items) Moshe, I had exactly the same reaction and exactly the same idea. I'm a strong -1 on introducing new syntax for this when new methods can handle it in a much more readable way (IMO). Another idea would be to allow the iterator() method to take an argument: for key in dict.iterator() a.k.a. for key in dict.iterator(KEYS) and also for value in dict.iterator(VALUES) for key, value in dict.iterator(ITEMS) One problem is that the constants KEYS, VALUES, and ITEMS would either have to be defined some place, or you'd just use values like 0, 1, 2, which is less readable perhaps than just having iteratoritems(), iteratorkeys(), and iteratorvalues() methods. 
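The two spellings weighed above can be sketched roughly as follows. This is only an illustration in present-day Python: `IterDict`, the `iterator()` method, the `KEYS`/`VALUES`/`ITEMS` constants and the `iter*()` method names are hypothetical, taken from the proposals in this thread rather than from any existing dict API.

```python
# Hypothetical sketch: neither spelling is an existing dict API.
KEYS, VALUES, ITEMS = range(3)  # constants that would have to live somewhere


class IterDict(dict):
    """dict subclass illustrating the two proposed spellings."""

    def iterator(self, what=KEYS):
        # Single-method spelling: a constant selects what is iterated.
        if what == KEYS:
            return iter(self.keys())
        if what == VALUES:
            return iter(self.values())
        if what == ITEMS:
            return iter(self.items())
        raise ValueError("what must be KEYS, VALUES or ITEMS")

    # Separate-method spelling: one method per kind of iteration.
    def iterkeys(self):
        return self.iterator(KEYS)

    def itervalues(self):
        return self.iterator(VALUES)

    def iteritems(self):
        return self.iterator(ITEMS)


d = IterDict(spam=14, eggs=42)
for key, value in d.iteritems():
    print(key, value)
```

The separate methods need no shared constants, which is the readability point made above; the single method keeps the dict's namespace smaller.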
Alternative spellings: itemsiter(), keysiter(), valsiter() itemsiterator(), keysiterator(), valuesiterator() iiterator(), kiterator(), viterator() ad-nauseum-ly y'rs, -Barry From skip at mojam.com Wed Jan 31 17:11:19 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 10:11:19 -0600 (CST) Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> Message-ID: <14968.14631.419491.440774@beluga.mojam.com> What stimulated this thread about making mutable objects (temporarily) immutable? Can someone give me an example where this is actually useful and can't be handled through some existing mechanism? I'm definitely with Fredrik on this one. Sounds like madness to me. I'm just guessing here, but since the most common need for immutable objects is as dictionary keys, I can envision having to test the lock state of a list or dict that someone wants to use as a key everywhere you would normally call has_key: if l.islocked() and d.has_key(l): ... If you want immutable dicts or lists in order to use them as dictionary keys, just serialize them first: survey_says = {"spam": 14, "eggs": 42} sl = marshal.dumps(survey_says) dict[sl] = "spam" Here's another pitfall I can envision. survey_says = {"spam": 14, "eggs": 42} survey_says.lock() dict[survey_says] = "Richard Dawson" survey_says.unlock() At this point can I safely iterate over the keys in the dictionary or not? 
Skip From skip at mojam.com Wed Jan 31 16:57:30 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 09:57:30 -0600 (CST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: References: <20010131015832.K962@xs4all.nl> Message-ID: <14968.13802.22823.702114@beluga.mojam.com> Ping> x is sequence-like if it provides __getitem__() but not keys() So why does this barf? >>> [].__getitem__ Traceback (most recent call last): File "", line 1, in ? AttributeError: __getitem__ (Obviously, lists *do* understand __getitem__ at some level. Why isn't it exposed in the method table?) Skip From fredrik at pythonware.com Wed Jan 31 18:19:44 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 31 Jan 2001 18:19:44 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <007301c08baa$02908220$e46940d5@hagrid> barry wrote: > Alternative spellings: > > itemsiter(), keysiter(), valsiter() > itemsiterator(), keysiterator(), valuesiterator() > iiterator(), kiterator(), viterator() shouldn't that be xitems, xkeys, xvalues? From mal at lemburg.com Wed Jan 31 18:21:02 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 18:21:02 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> Message-ID: <3A78497E.8BCF197E@lemburg.com> Skip Montanaro wrote: > > What stimulated this thread about making mutable objects (temporarily) > immutable? 
Can someone give me an example where this is actually useful and > can't be handled through some existing mechanism? I'm definitely with > Fredrik on this one. Sounds like madness to me. This thread is an offspring of the "for something in dict:" thread. The problem we face when iterating over mutable objects is that the underlying objects can change. By marking them read-only we can safely iterate over their contents. Another advantage of being able to mark mutable objects as read-only is that they may become usable as dictionary keys. Optimizations such as self-reorganizing read-only dictionaries would also become possible (e.g. attribute dictionaries which are read-only could calculate a second hash value to make the hashing perfect). > I'm just guessing here, but since the most common need for immutable objects > is a dictionary keys, I can envision having to test the lock state of a list > or dict that someone wants to use as a key everywhere you would normally > call has_key: > > if l.islocked() and d.has_key(l): > ... > > If you want immutable dicts or lists in order to use them as dictionary > keys, just serialize them first: > > survey_says = {"spam": 14, "eggs": 42} > sl = marshal.dumps(survey_says) > dict[sl] = "spam" Sure and that's what .items(), .keys() and .values() do. The idea was to avoid the extra step of creating lists or tuples first. > Here's another pitfall I can envision. > > survey_says = {"spam": 14, "eggs": 42} > survey_says.lock() > dict[survey_says] = "Richard Dawson" > survey_says.unlock() > > At this point can I safely iterate over the keys in the dictionary or not? Tim already pointed out that we will need two different read-only states: a) temporary b) permanent For dictionaries to become usable as keys in another dictionary, they'd have to be marked permanently read-only. 
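To make the temporary/permanent distinction concrete, here is a minimal sketch using stand-ins from present-day Python rather than the lock()/unlock() API floated in this thread: `types.MappingProxyType` plays the part of a read-only view over a dict that can still change underneath, and `frozenset(d.items())` plays the part of a permanently immutable (and therefore hashable) snapshot. The variable names are illustrative only.

```python
from types import MappingProxyType

survey_says = {"spam": 14, "eggs": 42}

# "Temporary" flavour: a view that rejects writes, although the
# underlying dict can still be changed by whoever holds it.
view = MappingProxyType(survey_says)
try:
    view["ham"] = 1
except TypeError:
    print("view is read-only")

# "Permanent" flavour: an immutable, hashable snapshot of the items,
# which is therefore safe to use as a dictionary key.
frozen = frozenset(survey_says.items())
answers = {frozen: "Richard Dawson"}

# Later changes to the original do not disturb the stored key ...
survey_says["spam"] = 15
print(answers[frozenset({"spam": 14, "eggs": 42}.items())])

# ... but they do show through the read-only view.
print(view["spam"])
```

Only the permanent form can serve as a key, because its hash can never change after it has been stored.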
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at alum.mit.edu Wed Jan 31 05:35:58 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 30 Jan 2001 23:35:58 -0500 (EST) Subject: [Python-Dev] Re: from ... import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154) In-Reply-To: <3A77FD5C.DE8729DC@lemburg.com> References: <3A77FD5C.DE8729DC@lemburg.com> Message-ID: <14967.38446.700271.122029@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: >> Modified Files: compile.c Log Message: Enforce two illegal import >> statements that were outlawed in the reference manual but not >> checked: Names bound by import statemants may not occur in global >> statements in the same scope. The from ... import * form may only >> occur in a module scope. >> >> I guess these changes could break code, but the reference manual >> warned about them. MAL> Jeremy, your code breaks all uses of "from package import MAL> submodule" inside packages. MAL> Try distutils for example or setup.py.... Quite aside from whether the changes should be preserved, I don't see how "from package import submodule" is affected. I ran setup.py without any problem; I wouldn't have been able to build Python otherwise. I wrote some simple test cases and didn't have any trouble with the form you describe. Can you provide a concrete example? It may be that something other than the changes mentioned above that is causing you problems. Jeremy 
From barry at digicool.com Wed Jan 31 18:20:24 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 12:20:24 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> <007301c08baa$02908220$e46940d5@hagrid> Message-ID: <14968.18776.644453.903217@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> shouldn't that be xitems, xkeys, xvalues? Or iitems(), ikeys(), ivalues()? Personally, I don't much care. If we get consensus on the more important issue of going with methods instead of new syntax, I'm sure Guido will pick whatever method names appeal to him most. 
-Barry From ping at lfw.org Wed Jan 31 18:14:15 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 31 Jan 2001 09:14:15 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: On Wed, 31 Jan 2001, Skip Montanaro wrote: > Ping> x is sequence-like if it provides __getitem__() but not keys() > > So why does this barf? > > >>> [].__getitem__ I was describing how to tell if instances are sequence-like. Before we get to make that judgement, first we have to look at the C method table. So: x is sequence-like if it has tp_as_sequence; all instances have tp_as_sequence; an instance is sequence-like if it has __getitem__() but not keys() x is mapping-like if it has tp_as_mapping; all instances have tp_as_mapping; an instance is mapping-like if it has both __getitem__() and keys() The "in" operator is implemented this way. x customizes "in" if it has sq_contains; all instances have sq_contains; an instance customizes "in" if it has __contains__() If sq_contains is missing, or if an instance has no __contains__ method, we supply the default behaviour by comparing the operand to each member of x in turn. This default behaviour is implemented twice: once in PyObject_Contains, and once in instance_contains. So i proposed this same structure for sq_iter and __iter__. x customizes "for ... in x" if it has sq_iter; all instances have sq_iter; an instance customizes "in" if it has __iter__() If sq_iter is missing, or if an instance has no __iter__ method, we supply the default behaviour by calling PyObject_GetItem on x and incrementing the index until IndexError. -- ?!ng "The only `intuitive' interface is the nipple. After that, it's all learned." -- Bruce Ediger, on user interfaces From mal at lemburg.com Wed Jan 31 18:57:20 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 18:57:20 +0100 Subject: [Python-Dev] Re: from ... 
import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154) References: <3A77FD5C.DE8729DC@lemburg.com> <14967.38446.700271.122029@localhost.localdomain> Message-ID: <3A785200.FFB37CAD@lemburg.com> Jeremy Hylton wrote: > > >>>>> "MAL" == M -A Lemburg writes: > > >> Modified Files: compile.c Log Message: Enforce two illegal import > >> statements that were outlawed in the reference manual but not > >> checked: Names bound by import statemants may not occur in global > >> statements in the same scope. The from ... import * form may only > >> occur in a module scope. > >> > >> I guess these changes could break code, but the reference manual > >> warned about them. > > MAL> Jeremy, your code breaks all uses of "from package import > MAL> submodule" inside packages. > > MAL> Try distutils for example or setup.py.... > > Quite aside from whether the changes should be preserved, I don't see > how "from package import submodule" is affected. I ran setup.py > without any problem; I wouldn't have been able to build Python > otherwise. I wrote some simple test cases and didn't have any trouble > with the form you describe. Perhaps you still had old .pyc files in your installation dir ? > Can you provide a concrete example? It may be that something other > than the changes mentioned above that is causing you problems. The distutils code is full of imports like these (and other code I'm running is too): distutils/cmd.py: def __init__ (self, dist): """Create and initialize a new Command object. Most importantly, invokes the 'initialize_options()' method, which is the real initializer and depends on the actual command being instantiated. """ # late import because of mutual dependence between these classes from distutils.dist import Distribution This is the report I got from Benjamin Collar: > I've gotten the newest CVS tarball, but setup.py is still not > working; this time with a different error. 
I will resubmit a bug on > sourceforge if that's the proper way to handle this. Here's the error: > > ./python ./setup.py build > Traceback (most recent call last): > File "./setup.py", line 12, in ? > from distutils.core import Extension, setup > File "/usr/src/python/dist/src/Lib/distutils/core.py", line 20, in ? > from distutils.cmd import Command > File "/usr/src/python/dist/src/Lib/distutils/cmd.py", line 15, in ? > from distutils import util, dir_util, file_util, archive_util, > dep_util > SyntaxError: 'from ... import *' may only occur in a module scope > make: *** [sharedmods] Error 1 -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Wed Jan 31 19:33:56 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 12:33:56 -0600 (CST) Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A78497E.8BCF197E@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> <3A78497E.8BCF197E@lemburg.com> Message-ID: <14968.23188.573257.392841@beluga.mojam.com> MAL> This thread is an offspring of the "for something in dict:" thread. MAL> The problem we face when iterating over mutable objects is that the MAL> underlying objects can change. By marking them read-only we can MAL> safely iterate over their contents. I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only. (And what about Andy Dustman's cool sqldict stuff?) 
If you can't extend this concept in a reasonable fashion to cover (most of) the other objects that smell like dictionaries, I think you'll just be adding needless complications for a feature that can't be used where it's really needed. I see no problem asking for the items() of an in-memory dictionary in order to get a predictable list to iterate over, but doing that for disk-based mappings would be next to impossible. So, I'm stuck iterating over something that can change out from under me. In the end, the programmer will still have to handle border cases specially. Besides, even if you *could* lock your disk-based mapping, are you really going to do that in situations where it's sharable (that's what databases are there for, after all)? I suspect you're going to keep the database mutable and work around any resulting problems. If you want to implement "for key in dict:", why not just have the VM call keys() under the covers and use that list? It would be no worse than the situation today where you call "for key in dict.keys():", and with the same caveats. If you're dumb enough to do that for an on-disk mapping object, well, you get what you asked for. Skip From esr at thyrsus.com Wed Jan 31 18:55:00 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 31 Jan 2001 12:55:00 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A78045F.7DB50871@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 01:26:07PM +0100 References: <3A78045F.7DB50871@lemburg.com> Message-ID: <20010131125500.C5151@thyrsus.com> M.-A. Lemburg : > Anyway, names really don't matter much, so how about: > > .mutable([flag]) -> integer > > If called without argument, returns 1/0 depending on whether > the object is mutable or not. When called with a flag argument, > sets the mutable state of the object to the value indicated > by flag and returns the previous flag state. I'll bear this in mind if things progress to the point where a PEP is indicated. -- Eric S. 
Raymond From tim.one at home.com Wed Jan 31 20:49:34 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 14:49:34 -0500 Subject: [Python-Dev] WARNING: Changed build process for zlib on Windows In-Reply-To: Message-ID: [Mark Hammond] > ... > The new process is very simple, but may break some peoples build. > ... > The reason this _should_ not break your build is that your > _probably_ already have a "..\..\zlib-1.1.3" directory installed > in the right place so the header files can be located. Actually, it's certain to break the build for anyone who read PCbuild\readme.txt. But I *want* it to break: changing the directory name is a strong hint that they should download the zlib source code from the same place you did (and which is now explained in PCbuild\readme.txt, and mentioned in the 2.1a2 NEWS file). Other than that, worked first time, and-- even better --the second time too . From esr at thyrsus.com Wed Jan 31 18:53:16 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 31 Jan 2001 12:53:16 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 01:01:24PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> Message-ID: <20010131125316.B5151@thyrsus.com> M.-A. Lemburg : > Eric, could you write a PEP for this ? Not yet. I'm about (at Guido's suggestion) to submit a revised ternary-select proposal. Let's process that first. -- Eric S. Raymond "Today, we need a nation of Minutemen, citizens who are not only prepared to take arms, but citizens who regard the preservation of freedom as the basic purpose of their daily life and who are willing to consciously work and sacrifice for that freedom." 
-- John F. Kennedy From tim.one at home.com Wed Jan 31 21:28:00 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 15:28:00 -0500 Subject: [Python-Dev] weak refs and jython In-Reply-To: <200101311234.NAA24584@core.inf.ethz.ch> Message-ID: [Samuele Pedroni] > I have read weak ref PEP, maybe too late. > I don't know if portability of code using weak refs between > python and jython was a goal or could be one, CPython generally doesn't want to do anything impossible for Jython, if it can help it. > and up to which extent actual impl. will correspond to the PEP. Don't care about that. > ... > AFAIK using java weak refs (which I think is a natural choice) I > see no way (at least no worth-the-effort way) to implement this > in jython. Java weak refs cannot be resurrected. Thanks for bringing this up! Fred is looking into it. From fdrake at acm.org Wed Jan 31 21:25:51 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 31 Jan 2001 15:25:51 -0500 (EST) Subject: [Python-Dev] weak refs and jython In-Reply-To: <200101311234.NAA24584@core.inf.ethz.ch> References: <200101311234.NAA24584@core.inf.ethz.ch> Message-ID: <14968.29903.183882.41485@cj42289-a.reston1.va.home.com> Samuele Pedroni writes: > AFAIK using java weak refs (which I think is a natural choice) I see > no way (at least no worth-the-effort way) to implement this in jython. > Java weak refs cannot be resurrected. This is certainly annoying. How about this: the callback receives the weak reference object or proxy which it was registered on as a parameter. Since the reference has already been cleared, there's no way to get the object back, so we don't need to get it from Java either. Would that be workable? (I'm adjusting my patch now.) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From tim.one at home.com Wed Jan 31 21:56:52 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 15:56:52 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: [Ping] > x is sequence-like if it provides __getitem__() but not keys() [Skip] > So why does this barf? > > >>> [].__getitem__ > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: __getitem__ > > (Obviously, lists *do* understand __getitem__ at some level. Why > isn't it exposed in the method table?) The old type/class split: list is a type, and types spell their "method tables" in ways that have little in common with how classes do it. See PyObject_GetItem in abstract.c for gory details (e.g., dicts spell their version of getitem via ->tp_as_mapping->mp_subscript(...), while lists spell it ->tp_as_sequence->sq_item(...); neither has any binding to the attr "__getitem__"; instance objects fill in both the tp_as_mapping and tp_as_sequence slots, then map both the mp_subscript and sq_item slots to classobject.c's instance_item, which in turn looks up "__getitem__"). bet-you're-sorry-you-asked-ly y'rs - tim From tim.one at home.com Wed Jan 31 22:24:53 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 16:24:53 -0500 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: <3A78226B.2E177EFE@lemburg.com> Message-ID: [M.-A. Lemburg] > AFAIR, Vladimir's malloc implementation favours small objects. It favors the memory alloc/dealloc patterns Vlad recorded while running an instrumented Python. Which is mostly good news. The flip side is that it favors the specific programs he ran, and who knows whether those are "typical". OTOH, vendor mallocs favor the programs *they* ran, which probably didn't include Python at all . > ... > Perhaps we should think about adding his lib to the core ?! It's patch 101104 on SF. 
I pushed Vlad to push this for 2.0, but he wisely decided it was too big a change at the time. It's certainly too much a change to slam into 2.1 at this late stage too. There are many reasons to want this (e.g., list.append() calls realloc every time today, because, despite over-allocating, it has no idea how much storage *has* already been allocated; any malloc has to know this info under the covers, but there's no way for us to know that too unless we add another N bytes to every list object to record it, or use our own malloc which *can* tell us that info). list.append()-behavior-varies-wildly-across-platforms-today- when-the-list-gets-large-because-of-that-ly y'rs - tim From tim.one at home.com Wed Jan 31 22:49:31 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 16:49:31 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A78002F.DC8F0582@lemburg.com> Message-ID: [Tim] >> Seems an unrelated topic: would "iterators for dictionaries" solve the >> supposed problem with iteration order? [MAL] > No, but it would solve the problem in a more elegant and > generalized way. I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the problem], but it would solve the problem ...". Can only assume we're switching topics within single sentences now . > Besides, it also allows writing code which is thread safe, since > the iterator can take special actions to assure that the dictionary > doesn't change during the iteration phase (see the other thread > about "making mutable objects readonly"). Sorry, but immutability has nothing to do with thread safety (the latter has to do with "doing a right thing" in the presence of multiple threads, to keep data structures internally consistent; raising an exception is never "a right thing" unless the user is violating the advertised semantics, and if mutation during iteration is such a violation, the presence or absence of multiple threads has nothing to do with that). 
IOW, perhaps, a critical section is an area of non-exceptional serialization, not a landmine that makes other threads *blow up* if they touch it. > ... > I don't remember the figures, but these micor optimizations That's plural, but I thought you were talking specifically about the mutable counter object. I don't know which, but the two statements don't jibe. > do speedup loops by a noticable amount. Just compare the performance > of stock Python 1.5 against my patched version. No time now, but after 2.1 is out, sure, wrt it (not 1.5). From tim.one at home.com Wed Jan 31 23:10:12 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 17:10:12 -0500 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: Message-ID: [Michael Hudson] > ... > Can anyone try this on Windows? Seeing as windows malloc > reputedly sucks, maybe the differences would be bigger. No time now (pymalloc is a non-starter for 2.1). Was tried in the past on Windows. Helped significantly. Unclear how much was simply due to exploiting the global interpreter lock, though. "Windows" is also a multiheaded beast (e.g., NT has very different memory performance characteristics than 95). From tim.one at home.com Wed Jan 31 23:43:59 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 17:43:59 -0500 Subject: generators (was RE: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: <20010130092454.D18319@glacier.fnational.com> Message-ID: [Neil Schemenauer] > What's the chances of getting generators into 2.2? Unknown. IMO it has more to do with generalizing the iteration protocol than with generators per se (a generator object that doesn't play nice with "for" is unpleasant to use; otoh, a generator object that can't be used divorced from "for" is frustrating too (like when comparing the fringes of two trees efficiently, which requires interleaving two distinct traversals, each naturally recursive on its own)). 
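The fringe comparison mentioned above is a handy illustration of why a generator needs to be usable outside a single "for" loop. The following is a minimal sketch in present-day Python syntax, not the design under discussion in this thread; trees are modelled as nested lists, and `fringe` and `same_fringe` are illustrative names.

```python
from itertools import zip_longest


def fringe(tree):
    """Yield the leaves of a nested-list tree, left to right."""
    for node in tree:
        if isinstance(node, list):
            # Recurse into a subtree, re-yielding its leaves.
            yield from fringe(node)
        else:
            yield node


def same_fringe(a, b):
    """Compare two trees leaf by leaf, stopping at the first mismatch."""
    missing = object()  # sentinel marking an exhausted traversal
    for x, y in zip_longest(fringe(a), fringe(b), fillvalue=missing):
        if x is missing or y is missing or x != y:
            return False
    return True


print(same_fringe([1, [2, 3]], [[1, 2], 3]))
print(same_fringe([1, [2, 3]], [1, [3, 2]]))
```

Each traversal is naturally recursive on its own, yet the two are advanced in lock step and neither tree is flattened in full before the comparison starts.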
> The implementation should not be hard. Didn't Steven Majewski have > something years ago? Yes, but Guido also sketched out a nearly complete implementation within the last year or so. > Why do we always get sidetracked on trying to figure out how to > do coroutines and continuations? Sorry, I've been failing to find a good answer to that question for a decade <0.4 wink>. I should note, though, that Guido's current notion of "generator" is stronger than Icon/CLU/Sather's (which are "strictly stack-like"), and requires machinery more elaborate than StevenM (or Guido) sketched before. > Generators would add real power to the language and are simple > enough that most users could benefit from them. Also, it should be > possible to design an interface that does not preclude the > addition of coroutines or continuations later. Agreed. > I'm not volunteering to champion the cause just yet. I just want > to know if there is some issue I'm missing. microthreads have an enthusiastic and possibly growing audience. That gets into (C) stacklessness, though, as do coroutines. I'm afraid that once you go beyond "simple" (Icon) generators, a whole world of other stuff gets pulled in. The key trick to implementing simple generators in current Python is simply to decline decrementing the frame's refcount upon a "suspend" (of course the full details are more involved than *just* that, but they mostly follow *from* just that). everything-is-the-enemy-of-something-ly y'rs - tim From skip at mojam.com Wed Jan 31 23:27:38 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 16:27:38 -0600 (CST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: References: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: <14968.37210.886842.820413@beluga.mojam.com> >>>>> "Tim" == Tim Peters writes: >> (Obviously, lists *do* understand __getitem__ at some level. Why >> isn't it exposed in the method table?) 
Tim> The old type/class split: list is a type, and types spell their Tim> "method tables" in ways that have little in common with how classes Tim> do it. The problem that rolls around in the back of my mind from time-to-time is that since Python doesn't currently support interfaces, checking for specific methods seems to be the only reasonable way to determine if a object does what you want or not. What would break if we decided to simply add __getitem__ (and other sequence methods) to list object's method table? Would they foul something up or would simply sit around quietly waiting for hasattr to notice them? Skip From pedroni at inf.ethz.ch Wed Jan 31 23:29:37 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Wed, 31 Jan 2001 23:29:37 +0100 Subject: [Python-Dev] weak refs and jython References: <200101311234.NAA24584@core.inf.ethz.ch> <14968.29903.183882.41485@cj42289-a.reston1.va.home.com> Message-ID: <001f01c08bd5$4c9c9900$7c5821c0@newmexico> Hi. [Fred L. Drake, Jr.] > > Java weak refs cannot be resurrected. > > This is certainly annoying. > How about this: the callback receives the weak reference object or > proxy which it was registered on as a parameter. Since the reference > has already been cleared, there's no way to get the object back, so we > don't need to get it from Java either. > Would that be workable? (I'm adjusting my patch now.) Yes, it is workable: clearly we can implement weak refs only under java2 but this is not (really) an issue. We can register the refs in a java reference queue, and poll it lazily or trough a low-priority thread in order to invoke the callbacks. -- Some remarks I have used java weak/soft refs to implement some of the internal tables of jython in order to avoid memory leaks, at least under java2. I imagine that the idea behind callbacks plus resurrection was to enable the construction of sofisticated caches. 
My intuition is that these features are not present under java because they will interfere too much with gc and have a performance penalty. On the other hand java offers reference queues and soft references, the latter cover the common case of caches that should be cleared when there is few memory left. (Never tried them seriously, so I don't know if the actual impl is fair, or will just wait too much starting to discard things => behavior like primitives gc). The main difference I see between callbacks and queues approach is that with queues is this left to the user when to do the actual cleanup of his tables/caches, and handling queues internally has a "low" overhead. With callbacks what happens depends really on the collection times/patterns and the overhead is related to call overhead and how much is non trivial, what the user put in the callbacks. Clearly general performance will not be easily predictable. (From a theoretical viewpoint one can simulate more or less queues with callbacks and the other way around). Resurrection makes few sense with queues, but I can easely see that lacking of both resurrection and soft refs limits what can be done with weak-like refs. Last thing: one of the things that is really missing in java refs features is that one cannot put conditions of the form as long A is not collected B should not be collected either. Clearly I'm referring to situation when one cannot modify the class of A in order to add a field, which is quite typical in java. This should not be a problem with python and its open/dynamic way-of-life. regards, Samuele Pedroni. From mal at lemburg.com Wed Jan 31 20:03:12 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 31 Jan 2001 20:03:12 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> <3A78497E.8BCF197E@lemburg.com> <14968.23188.573257.392841@beluga.mojam.com> Message-ID: <3A786170.CD65B8A4@lemburg.com> Skip Montanaro wrote: > > MAL> This thread is an offspring of the "for something in dict:" thread. > MAL> The problem we face when iterating over mutable objects is that the > MAL> underlying objects can change. By marking them read-only we can > MAL> safely iterate over their contents. > > I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only. > (And what about Andy Dustman's cool sqldict stuff?) If you can't extend > this concept in a reasonable fashion to cover (most of) the other objects > that smell like dictionaries, I think you'll just be adding needless > complications for a feature than can't be used where it's really needed. We are currently only talking about Python dictionaries here, even though other objects could also benefit from this. > I see no problem asking for the items() of an in-memory dictionary in order > to get a predictable list to iterate over, but doing that for disk-based > mappings would be next to impossible. So, I'm stuck iterating over > something can can change out from under me. In the end, the programmer will > still have to handle border cases specially. Besides, even if you *could* > lock your disk-based mapping, are you really going to do that in situations > where its sharable (that's what databases they are there for, after all)? I > suspect you're going to keep the database mutable and work around any > resulting problems. 
> > If you want to implement "for key in dict:", why not just have the VM call > keys() under the covers and use that list? It would be no worse than the > situation today where you call "for key in dict.keys():", and with the same > caveats. If you're dumb enough to do that for an on-disk mapping object, > well, you get what you asked for. That's why iterators do a much better task here. In DB design these are usually called cursors which the allow moving inside large result sets. But this really is a different topic... Readonlyness could be put to some good use in optimizing data structure for which you know that they won't change anymore. Temporary readonlyness has the nice sideeffect of allowing low-level lock implementations and makes writing thread safe code easier to handle, because you can make assertions w/r to the immutability of an object during a certain period of time explicit in your code. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 21:36:54 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 21:36:54 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A78045F.7DB50871@lemburg.com> <20010131125500.C5151@thyrsus.com> Message-ID: <3A787766.35453597@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > Anyway, names really don't matter much, so how about: > > > > .mutable([flag]) -> integer > > > > If called without argument, returns 1/0 depending on whether > > the object is mutable or not. When called with a flag argument, > > sets the mutable state of the object to the value indicated > > by flag and returns the previous flag state. > > I'll bear this in mind if things progress to the point where a PEP is > indicated. 
Great :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Wed Jan 31 17:23:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 11:23:37 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Wed, 31 Jan 2001 13:35:36 GMT." <3a780eda.16144995@smtp.worldonline.dk> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> <20010131135914.N962@xs4all.nl> <3a780eda.16144995@smtp.worldonline.dk> Message-ID: <200101311623.LAA01774@cj20424-a.reston1.va.home.com> [Finn] > >> Using global on an import name is currently ignored by Jython because > >> the name assignment is done by the runtime, not the compiler. [Thomas] > >So it's impossible to do, in Jython, something like: > > > >def fillme(): > > global me > > import me > > > >but it is possible to do: > > > >def fillme(): > > global me > > import me as _me > > me = _me > > > >? [Finn again] > Yes, only the second example will make a global variable. > > > I have to say I don't like that; we're always claiming 'import' (and > >'def' and 'class' for that matter) are 'just another way of writing > >assignment'. All these special cases break that. > > I don't like it either, I was only reported what jython currently does. > The current design used by Jython does lend itself directly towards a > solution, but I don't see anything that makes it impossible to solve. Tentatively, I'd say that this should be documented as a Jython difference and Jython should strive to fix this. So I see no good reason to rule it out in CPython. That doesn't mean I like Thomas's example! 
It should probably be redesigned along the lines of def fillme(): import me return me me = fillme() to avoid needing side effects on globals. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 31 17:26:11 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 11:26:11 -0500 Subject: [Python-Dev] The 2nd Korea Python Users Seminar Message-ID: <200101311626.LAA01799@cj20424-a.reston1.va.home.com> Wow...! Way to go, Christian! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 31 Jan 2001 22:46:06 +0900 From: "Changjune Kim" To: Subject: The 2nd Korea Python Users Seminar Dear Mr. Guido van Rossum, First of all, I can't thank you more for your great contribution to the presence of Python. It is not a mere computer programming language but a whole culture, I think. I am proud to tell you that we are having the 2nd Korea Python Users Seminar which is wide open to the public. There are already more than 400 people who registered ahead, and we expect a few more at the site. The seminar will be held in Seoul, South Korea on Feb 2. With the effort of Korea Python Users Group, there has been quite a boom or phenomenon for Python among developers in Korea. Several magazines are _competitively_ carrying regular articles about Python -- I'm one of the authors -- and there was an article even on a _normal_ newspaper, one of the major four big newspapers in Korea, which described the sprouting of Python in Korea and pointed its extreme easiness to learn. (moreover, it's the year of the snake in the 12 zodiac animals) The seminar is mainly about: Python 2.0, intro for newbies, Python coding style, ZOPE, internationalization of Zope for Korean, GUIs such as wxPython, PyQt, Internet programming in Python, Python with UML, Python C/API, XML with Python, and Stackless Python. 
Christian Tismer is coming for SPC presentation with me, and Hostway CEO Lucas Roh will give a talk about how they are using Python, and one of the Python evangelists, Brian Lee, CTO of Linuxkorea will give a brief intro to Python and Python C/API. I'm so excited and happy to tell you this great news. If there is any message you want to give to Korea Python Users Group and the audience, it'd be great -- I could translate it and post it at the site for all the audience. Thank you again for your wonderful snake. Best regards, June from Korea. _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com ------- End of Forwarded Message From moshez at zadka.site.co.il Wed Jan 31 21:32:45 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 31 Jan 2001 22:32:45 +0200 (IST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <007301c08baa$02908220$e46940d5@hagrid> References: <007301c08baa$02908220$e46940d5@hagrid>, <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <20010131203245.E813BA83E@darjeeling.zadka.site.co.il> [Barry] > itemsiter(), keysiter(), valsiter() > itemsiterator(), keysiterator(), valuesiterator() > iiterator(), kiterator(), viterator() [/F] > shouldn't that be xitems, xkeys, xvalues? I'm so hoping I missed a there somewhere. Please, no more of the dreaded 'x'. thinking-of-ripping-x-from-my-keyboard-ly y'rs, Z. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From thomas at xs4all.net Wed Jan 31 22:00:33 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 22:00:33 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: <3A78226B.2E177EFE@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 03:34:19PM +0100 References: <3A78226B.2E177EFE@lemburg.com> Message-ID: <20010131220033.O962@xs4all.nl> On Wed, Jan 31, 2001 at 03:34:19PM +0100, M.-A. Lemburg wrote: > I have made similar experience with -On with n>3 compared to -O2 > using pgcc (gcc optimized for PC processors). BTW, the Linux > kernel uses "-Wall -Wstrict-prototypes -O3 -fomit-frame-pointer" > as CFLAGS -- perhaps Python should too on Linux ?! Maybe, but the Linux kernel can be quite specific in what version of gcc you need, and knows in advance on what platform you are using it :) The stability and actual speedup of gcc's optimization options can and does vary across platforms. In the above example, -Wall and -Wstrict-prototypes are just warnings, and -O3 is the same as "-O2 -finline-functions". As for -fomit-frame-pointer.... > Does anybody know about the effect of -fomit-frame-pointer ? > Would it cause problems or produce code which is not compatible > with code compiled without this flag ? The effect of -fomit-frame-pointer is that the compilation of frame-pointer handling code is avoided. It doesn't have any effect on compatibility, since it doesn't matter that other parts/functions/libraries do have such code, but it does make debugging impossible (on most machines, in any case.) From GCC's info docs: -fomit-frame-pointer' Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. 
*It also makes debugging impossible on some machines.* On some machines, such as the Vax, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro `FRAME_POINTER_REQUIRED' controls whether a target machine supports this flag. *Note Registers::. Obviously, for the Linux kernel this is a very good thing, you don't debug the Linux kernel like a normal program anyway (contrary to some other UNIX kernels, I might add.) I believe -g turns off -fomit-frame-pointer itself, but the docs for -g or -fomit-frame-pointer don't mention it. One other thing I noted in the gcc docs is that gcc doesn't do loop unrolling even with -O3, though I thought it would at -O2. You need to add -funroll-loops to enable loop unrolling, and that might squeeze out some more performance. This only works for loops with a fixed repetition, though, so I'm not sure if it matters. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Wed Jan 31 20:14:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 20:14:58 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <14968.16962.830739.920771@anthem.wooz.org>; from barry@digicool.com on Wed, Jan 31, 2001 at 11:50:10AM -0500 References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <20010131201457.I922@xs4all.nl> [ Trimming CC: line ] On Wed, Jan 31, 2001 at 11:50:10AM -0500, Barry A. Warsaw wrote: > Moshe, I had exactly the same reaction and exactly the same idea. I'm > a strong -1 on introducing new syntax for this when new methods can > handle it in a much more readable way (IMO). Same here.
I *might* like it if iterators were given a format string (or tuple object, or whatever) so they knew what the iterating code expected (so something like this: for x,y,z in obj would translate into iterator(obj)("(x,y,z)") or maybe just iterator(obj)((None,None,None)) or maybe even just iterator(obj)(3) # that is, number of elements or so) but I suspect it might be too cute (and obfuscated) for Python, especially if it was put to use to distingish between 'for x:y in obj' and 'for x,y in obj'. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From sjoerd at oratrix.nl Wed Jan 31 21:05:06 2001 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Wed, 31 Jan 2001 21:05:06 +0100 Subject: [Python-Dev] python setup.py fails with illegal import (+ fix) Message-ID: <20010131200507.A106931E1AD@bireme.oratrix.nl> With the current CVS version, running python setup.py as part of the build process fails with a syntax error: Traceback (most recent call last): File "../setup.py", line 12, in ? from distutils.core import Extension, setup File "/usr/people/sjoerd/src/python/Lib/distutils/core.py", line 20, in ? from distutils.cmd import Command File "/usr/people/sjoerd/src/python/Lib/distutils/cmd.py", line 15, in ? from distutils import util, dir_util, file_util, archive_util, dep_util SyntaxError: 'from ... import *' may only occur in a module scope The fix is to change the from ... import * that the compiler complains about: Index: file_util.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/distutils/file_util.py,v retrieving revision 1.7 diff -u -c -r1.7 file_util.py *** file_util.py 2000/09/30 17:29:35 1.7 --- file_util.py 2001/01/31 20:01:56 *************** *** 106,112 **** # changing it (ie. it's not already a hard/soft link to src OR # (not update) and (src newer than dst). ! 
from stat import * from distutils.dep_util import newer if not os.path.isfile(src): --- 106,112 ---- # changing it (ie. it's not already a hard/soft link to src OR # (not update) and (src newer than dst). ! from stat import ST_ATIME, ST_MTIME, ST_MODE, S_IMODE from distutils.dep_util import newer if not os.path.isfile(src): I didn't check this in because distutils is Greg Ward's baby. -- Sjoerd Mullender From mal at lemburg.com Wed Jan 31 23:24:43 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 23:24:43 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) References: Message-ID: <3A7890AB.69B893F9@lemburg.com> Tim Peters wrote: > > [Michael Hudson] > > ... > > Can anyone try this on Windows? Seeing as windows malloc > > reputedly sucks, maybe the differences would be bigger. > > No time now (pymalloc is a non-starter for 2.1). Was tried in the past on > Windows. Helped significantly. Unclear how much was simply due to > exploiting the global interpreter lock, though. "Windows" is also a > multiheaded beast (e.g., NT has very different memory performance > characteristics than 95). We're still in alpha, no ? Adding pymalloc is not much of a deal since it fits nicely with the Python malloc macros and giving the package a nice spin by putting it into a Python alpha release would sure create more confidence in this nice piece of work. We can always take it out again before going into the beta phase. Or do we have a 2.1 feature freeze already ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 23:15:50 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 31 Jan 2001 23:15:50 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: Message-ID: <3A788E96.AB823FAE@lemburg.com> Tim Peters wrote: > > [Tim] > >> Seems an unrelated topic: would "iterators for dictionaries" solve the > >> supposed problem with iteration order? > > [MAL] > > No, but it would solve the problem in a more elegant and > > generalized way. > > I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the > problem], but it would solve the problem ...". Can only assume we're > switching topics within single sentences now . Sorry, not my brightest day today... what I wanted to say is that iterators would solve the problem of defining "something" in "for something in dict" nicely. Since iterators can define the order in which a data structure is traversed, this would also do away with the second (supposed) problem. > > Besides, it also allows writing code which is thread safe, since > > the iterator can take special actions to assure that the dictionary > > doesn't change during the iteration phase (see the other thread > > about "making mutable objects readonly"). > > Sorry, but immutability has nothing to do with thread safety (the latter has > to do with "doing a right thing" in the presence of multiple threads, to > keep data structures internally consistent; raising an exception is never "a > right thing" unless the user is violating the advertised semantics, and if > mutation during iteration is such a violation, the presence or absence of > multiple threads has nothing to do with that). IOW, perhaps, a critical > section is an area of non-exceptional serialization, not a landmine that > makes other threads *blow up* if they touch it. Who said that an exception is raised ? The method I posted on the mutability thread allows querying the current state just like you would query the availability of a resource. > > ... 
> > I don't remember the figures, but these micor optimizations > > That's plural, but I thought you were talking specifically about the mutable > counter object. I don't know which, but the two statements don't jibe. The counter object patch is a micro-optimization and as such will only give you a gain of a few percent. What makes the difference is the sum of these micro optimizations. Here's the patch for Python 1.5 which includes the optimizations: http://www.lemburg.com/python/mxPython-1.5.patch.gz > > do speedup loops by a noticable amount. Just compare the performance > > of stock Python 1.5 against my patched version. > > No time now, but after 2.1 is out, sure, wrt it (not 1.5). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 1 01:13:12 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 31 Dec 2000 19:13:12 -0500 Subject: [Python-Dev] Re: Most everything is busted In-Reply-To: <14926.34447.60988.553140@anthem.concentric.net> Message-ID: [Barry A. Warsaw] > There's a stupid, stupid bug in Mailman 2.0, which I've just fixed > and (hopefully) unjammed things on the Mailman end[1]. We're still > probably subject to the Postfix delays unfortunately; I think those > are DNS related, and I've gotten a few other reports of DNS oddities, > which I've forwarded off to the DC sysadmins. I don't think that > particular problem will be fixed until after the New Year. > > relax-and-enjoy-the-quiet-ly y'rs, I would have, except you appear to have ruined it: hundreds of msgs disgorged overnight and into the afternoon. And echoes of email to c.l.py now routinely come back in minutes instead of days. Overall, ya, I liked it better when it was broken -- jerk . 
typical-user-ly y'rs - tim From tim.one at home.com Mon Jan 1 02:31:18 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 31 Dec 2000 20:31:18 -0500 Subject: [Python-Dev] Copyrights and licensing (was ... something irrelevant) In-Reply-To: <200012291652.RAA20251@pandora.informatik.hu-berlin.de> Message-ID: [Martin von Loewis] > I'd like to get an "official" clarification on this question. Is it > the case that patches containing copyright notices are only accepted > if they are accompanied with license information? It's nigh unto impossible to get Guido to pay attention to these kinds of issues until after it's too late -- guess who's still trying to get an FSF approved license for Python 1.6 . What I intend to push for is that nothing be accepted except under the understanding that copyright is assigned to the Python Software Foundation; but, since that doesn't exist yet, we're in limbo. > I agree that the changes are minor, I also believe that I hold the > copyright to the changes whether I attach a notice or not (at least > according to our local copyright law). Under U.S. law too. The difference is that, without an explicit copyright notice, it's a lot easier to get lawyers to ignore that reality <0.3 wink>. When the PSF does come into being, the lawyers will doubtless make us hassle everyone with an explicit copyright notice into signing reams of paperwork. It's a drain on time and money for all concerned, IMO, with no real payback. > What concerns me that without such a notice, gencodec.py looks as if > CNRI holds the copyright to it. I'm not willing to assign the > copyright of my changes to CNRI, and I'd like to avoid the impression > of doing so. Understood, and with sympathy. Since the status of JPython/Jython is still muddy, I urged Finn Bock to put his own copyright notice on his Jython work for exactly the same reason (i.e., to prevent CNRI claiming it later). 
Seems to me, though, that it may simplify life down the road if, whenever an author felt a similar need to assert copyright explicitly, they list Guido as the copyright holder. He's not going to screw Python! And it's inevitable that all Python copyrights will eventually be owned by him and/or the PSF anyway. But, for God's sake, whatever you do, *please* (anyone) don't make us look at a unique license! We're not lawyers, but we've been paying lawyers out of our own pockets to do this crap, and it's expensive and time-consuming. If you can't trust Guido to do a Right Thing with your code, Python is better off without it over the long haul. > What is even more concerning is that CNRI also holds the copyright to > the generated files, even though they are derived from information > made available by the Unicode consortium! It's no concern to me -- but then I'm not paranoid . cnri-and-the-uc-can-fight-it-out-if-it-comes-to-that-ly y'rs - tim From moshez at zadka.site.co.il Mon Jan 1 11:01:02 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 1 Jan 2001 12:01:02 +0200 (IST) Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20001231105812.A12168@newcnri.cnri.reston.va.us> References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> Message-ID: <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> On Sun, 31 Dec 2000, Andrew Kuchling wrote: > It also leads to one section of the FAQ (#3, I think) having something > like 60 questions jumbled together. IMHO the FAQ should be a text > file, perhaps in the PEP format so it can be converted to HTML, and it > should have an editor who'll arrange it into smaller sections. Any > volunteers? (Must ... resist ... urge to volunteer myself... help > me, Spock...) Well, Andrew, I know if I leave you any more time, you won't be able to resist the urge. OK, I'll volunteer. 
Can't do anything right now, but expect to see an updated version posted on my site soon. If people will think it's a good idea, I'll move it to Misc/. Fred, if the some-xml-format-to-HTML you're working on is in any sort of readiness, I'll use that to format the FAQ. Having used Perl in the last couple of weeks, I learned to appreciate the fact that the FAQ is a standard part of the documentation. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From loewis at informatik.hu-berlin.de Mon Jan 1 12:43:34 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 1 Jan 2001 12:43:34 +0100 (MET) Subject: [Python-Dev] Re: Copyrights and licensing (was ... something irrelevant) In-Reply-To: References: Message-ID: <200101011143.MAA11550@pandora.informatik.hu-berlin.de> > Seems to me, though, that it may simplify life down the road if, whenever an > author felt a similar need to assert copyright explicitly, they list Guido > as the copyright holder. He's not going to screw Python! That's a good solution, which I'll implement in a revised patch. Thanks for the advice, and Happy New Year, Martin From mal at lemburg.com Mon Jan 1 18:56:20 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 01 Jan 2001 18:56:20 +0100 Subject: [Python-Dev] Re: Copyright statements ([Patch #103002] Fix for #116285: Properly raise UnicodeErrors) References: <200012290957.KAA17936@pandora.informatik.hu-berlin.de> <3A4C757D.F64E9CEF@lemburg.com> Message-ID: <3A50C4C4.76A1C5B6@lemburg.com> Martin von Loewis wrote: > > > My only problem with it is your copyright notice. AFAIK, patches to > > the Python core cannot contain copyright notices without proper > > license information. OTOH, I don't think that these minor changes > > really warrant adding a complete license paragraph. > > I'd like to get an "official" clarification on this question. 
Is it > the case that patches containing copyright notices are only accepted > if they are accompanied with license information? > > I agree that the changes are minor, I also believe that I hold the > copyright to the changes whether I attach a notice or not (at least > according to our local copyright law). True. > What concerns me that without such a notice, gencodec.py looks as if > CNRI holds the copyright to it. I'm not willing to assign the > copyright of my changes to CNRI, and I'd like to avoid the impression > of doing so. > > What is even more concerning is that CNRI also holds the copyright to > the generated files, even though they are derived from information > made available by the Unicode consortium! The copyright for the files and changes needed for the Unicode support was indeed transferred to CNRI earlier this year. This was part of the contract I had with CNRI. I don't know why the copyright notice wasn't subsequently removed from the files after final checkin of the changes, though, because, as I remember, the copyright line was only added as "search&replace" token to the files in question in the sign over period. The codec files were part of the Unicode support patch, even though they were created by the gencodec.py tool I wrote to create them from the Unicode mapping files. That's why they also carry the copyright token. Note that with strict reading of the CNRI license, there's no problem with removing the notice from the files in question: """ ...provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) 1995-2000 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6 alone or in any derivative version prepared by Licensee... """ The copyright line in the Unicode files is "(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.", so this does not match the definition they gave in their license text. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Mon Jan 1 19:58:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 13:58:36 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: Your message of "Fri, 29 Dec 2000 21:59:16 +0100." <20001229215915.L1281@xs4all.nl> References: <20001229215915.L1281@xs4all.nl> Message-ID: <200101011858.NAA09263@cj20424-a.reston1.va.home.com> Thomas just checked this in, using Tim's words: > *** ref7.tex 2000/07/16 19:05:38 1.20 > --- ref7.tex 2000/12/31 22:52:59 1.21 > *************** > *** 243,249 **** > \ttindex{exc_value}\ttindex{exc_traceback}} > > ! The optional \keyword{else} clause is executed when no exception occurs > ! in the \keyword{try} clause. Exceptions in the \keyword{else} clause are > ! not handled by the preceding \keyword{except} clauses. > \kwindex{else} > > --- 243,251 ---- > \ttindex{exc_value}\ttindex{exc_traceback}} > > ! The optional \keyword{else} clause is executed when the \keyword{try} clause > ! terminates by any means other than an exception or executing a > ! \keyword{return}, \keyword{continue} or \keyword{break} statement. > ! Exceptions in the \keyword{else} clause are not handled by the preceding > ! \keyword{except} clauses. > \kwindex{else} How is this different from "when control flow reaches the end of the try clause", which is what I really had in mind? Using the current wording, this paragraph would have to be changed each time a new control-flow keyword is added. Based upon the historical record that's not a grave concern ;-), but I think the new wording relies too much on accidentals such as the fact that these are the only control flow altering events. 
It may be that control flow is not rigidly defined -- but as it is what was really intended, maybe the fix should be to explain the right concept rather than the current ad-hoc solution. This also avoids concerns of readers who are trying to read too much into the words and might become worried that there are other ways of altering the control flow that *would* cause the else clause to be executed; and guides implementors of other Python-like languages (like vyper) that might have more control-flow altering statements or events. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Mon Jan 1 20:00:38 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 1 Jan 2001 20:00:38 +0100 Subject: [Python-Dev] PSA (Was: FAQ Horribly Out Of Date) Message-ID: <200101011900.UAA01672@loewis.home.cs.tu-berlin.de> > It appears that CNRI can only think about one thing at a time <0.5 > wink>. For the last 6 months, that thing has been the license. If > they ever resolve the GPL compatibility issue, maybe they can be > persuaded to think about the PSA. In the meantime, I'd suggest you > not renew . I think we need to find a better answer than that, and soon. While everybody reading this list probably knows not to renew, the PSA is the first thing that you see when selecting "Python Community" on python.org. The first paragraph reads # The continued, free existence of Python is promoted by the # contributed efforts of many people. The Python Software Activity # (PSA) supports those efforts by helping to coordinate them. The PSA # operates web, ftp, and email services, organizes conferences, and # engages in other activities that benefit the Python user # community. In order to continue, the PSA needs the membership of # people who value Python. If you look at the current members list (http://www.python.org/psa/Members.html), it appears that many long-time members indeed have not renewed.
This page was last updated Nov 14 - so it appears that CNRI is still processing applications when they come. It may well be that many of the newer members ask themselves by now what happened to their money; it might not be easy to get an answer to that question. However, there is clearly somebody to blame here: The Python Community. So I'd like to request that somebody with write permissions to these pages changes the text, to something along the lines of replacing the first paragraph with # The Python community organizes itself in different ways; people # interested in discussing development of and with Python usually # participate in mailing lists. # #

Organizations that wish to influence further directions of the # Python language may join the Python # Consortium. # #

The Corporation for # National Research Initiatives hosts the Python Software # Activity, which is described below. The PSA used to provide funding # for the Python development; that is no longer the case. If there is a factual error in this text, please let me know. Regards, Martin From tim.one at home.com Mon Jan 1 20:20:53 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 14:20:53 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: [gvanrossum, in an SF patch comment] > Bah. I don't like this one bit. More complexity for a little > bit of extra speed. > I'm keeping this open but expect to be closing it soon unless I > hear a really good argument why more speed is really needed in > this area. Down with code bloat and creeping featurism! Without judging "the solution" here, "the problem" is that everyone's first attempt to use line-at-a-time file input in Perl:

    while (<>) {
        ... $_ ...;
    }

runs 2-5x faster than everyone's first attempt in Python:

    while 1:
        line = f.readline()
        if not line:
            break
        ... line ...

It would be beneficial to address that *somehow*, cuz 2-5x isn't just "a little bit"; and by the time you walk a newbie thru

    while 1:
        lines = f.readlines(hintsize)
        if not lines:
            break
        for line in lines:
            ... line ...

they feel like maybe Perl isn't so obscure after all . Does someone have an elegant way to address this? I believe Jeff's shot at elegance was the other part of the patch, using (his new) xreadlines under the covers to speed the fileinput module. reading-text-files-is-very-common-ly y'rs - tim From guido at digicool.com Mon Jan 1 20:25:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 14:25:07 -0500 Subject: [Python-Dev] PSA (Was: FAQ Horribly Out Of Date) In-Reply-To: Your message of "Mon, 01 Jan 2001 20:00:38 +0100."
<200101011900.UAA01672@loewis.home.cs.tu-berlin.de> References: <200101011900.UAA01672@loewis.home.cs.tu-berlin.de> Message-ID: <200101011925.OAA09669@cj20424-a.reston1.va.home.com> > > It appears that CNRI can only think about one thing at a time <0.5 > > wink>. For the last 6 months, that thing has been the license. If > > they ever resolve the GPL compatibility issue, maybe they can be > > persuaded to think about the PSA. In the meantime, I'd suggest you > > not renew . > > I think we need to find a better answer than that, and soon. While > everybody reading this list probably knows not to renew, the PSA is > the first thing that you see when selecting "Python Community" on > python.org. The first paragraph reads > > # The continued, free existence of Python is promoted by the > # contributed efforts of many people. The Python Software Activity > # (PSA) supports those efforts by helping to coordinate them. The PSA > # operates web, ftp, and email services, organizes conferences, and > # engages in other activities that benefit the Python user > # community. In order to continue, the PSA needs the membership of > # people who value Python. > > If you look at the current members list > (http://www.python.org/psa/Members.html), it appears that many > long-time members indeed have not renewed. This page was last updated > Nov 14 - so it appears that CNRI is still processing applications when > they come. It may well be that many of the newer members ask > themselves by now what happened to their money; it might not be easy > to get an answer to that question. However, there is clearly somebody > to blame here: The Python Community. I don't know how many memberships CNRI has received, but it can't be many, since we sent out no reminders. I'll see if I can get an answer. 
> So I'd like to request that somebody with write permissions to these > pages changes the text, to something along the lines of replacing the > first paragraph with > > # The Python community organizes itself in different ways; people > # interested in discussing development of and with Python usually > # participate in mailing lists. > # > #

Organizations that wish to influence further directions of the > # Python language may join the Python > # Consortium. > # > #

The Corporation for > # National Research Initiatives hosts the Python Software > # Activity, which is described below. The PSA used to provide funding > # for the Python development; that is no longer the case. > > If there is a factual error in this text, please let me > know. I've done something slightly different -- see http://www.python.org/psa/. I've kept only your first paragraph, and inserted a boldface note before that about the obsolescence (or deprecation :-) of the PSA membership. I've removed the references to the consortium, since that's also about to collapse under its own inactivity; instead, the PSF will be formed, independent from CNRI, to hold the IP rights (insofar as they can be assigned to the PSF) and for not much else. I'll see if I can get some more news about the creation of the PSF (which is supposed to be an initiative of ActiveState and Digital Creations). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 1 20:35:24 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 14:35:24 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Mon, 01 Jan 2001 14:20:53 EST." References: Message-ID: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> > [gvanrossum, in an SF patch comment] > > Bah. I don't like this one bit. More complexity for a little > > bit of extra speed. > > I'm keeping this open but expect to be closing it soon unless I > > hear a really good argument why more speed is really needed in > > this area. Down with code bloat and creeping featurism! > > Without judging "the solution" here, "the problem" is that everyone's first > attempt to use line-at-a-time file input in Perl:
>
>     while (<>) {
>         ... $_ ...;
>     }
>
> runs 2-5x faster than everyone's first attempt in Python:
>
>     while 1:
>         line = f.readline()
>         if not line:
>             break
>         ... line ...
But is everyone's first thought to time the speed of Python vs. Perl? Why does it hurt so much that this is a bit slow?

> It would be beneficial to address that *somehow*, cuz 2-5x isn't just "a
> little bit"; and by the time you walk a newbie thru
>
>     while 1:
>         lines = f.readlines(hintsize)
>         if not lines:
>             break
>         for line in lines:
>             ... line ...
>
> they feel like maybe Perl isn't so obscure after all .
>
> Does someone have an elegant way to address this? I believe Jeff's shot at
> elegance was the other part of the patch, using (his new) xreadlines under
> the covers to speed the fileinput module.

But of course suggesting fileinput is also not a great solution -- it's relatively obscure (since it's not taught by most tutorials, certainly not by the standard tutorial). > reading-text-files-is-very-common-ly y'rs - tim So is worrying about performance without a good reason... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 1 20:49:24 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 01 Jan 2001 14:49:24 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: Your message of "Mon, 01 Jan 2001 12:01:02 +0200." <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> Message-ID: <200101011949.OAA09804@cj20424-a.reston1.va.home.com> [Moshe] > Well, Andrew, I know if I leave you any more time, you won't be able > to resist the urge. OK, I'll volunteer. Can't do anything right now, > but expect to see an updated version posted on my site soon. If > people will think it's a good idea, I'll move it to Misc/. > Fred, if the some-xml-format-to-HTML you're working on is in any > sort of readiness, I'll use that to format the FAQ.
Moshe, if your solution is to turn the FAQ into a document with a single editor again, I think you're not doing the community a favor. Granted, we could add some more sections (easy enough for me if someone tells me the new section headings and which existing questions go where) and there is a lot of obsolete information. But I would be very hesitant to drop the notion of maintaining the FAQ as a group collaboration project. There's nothing wrong with the FAQ wizard except that the password (Spam) should be made publicly known... I've also noticed that Bjorn Pettersen has made a whole slew of useful updates to various sections, mostly updates about new 2.0 features or syntax. > Having used Perl > in the last couple of weeks, I learned to appreciate the fact that > the FAQ is a standard part of the documentation. Does that mean more than that it should be linked to from http://www.python.org/doc/ ? It's already there in the side bar; does it need a more prominent position? I used to include the FAQ in Misc/ (Ping's Misc/faq2html.py script is a last remnant of that), but gave up after realizing that the on-line FAQ is much more useful than a single text file. In my eyes, the best thing you (and everyone else) could do, if you find the time, would be to use the FAQ wizard to fix or delete out-of-date entries. To delete an entry, change its subject to "Deleted" and remove its body; I'll figure out a way to delete them from the index. Because FAQ entries can refer to each other (and are referred to from elsewhere) by number, it's not safe to simply renumber entries. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Mon Jan 1 21:27:37 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 15:27:37 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: <200101011858.NAA09263@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Thomas just checked this in, using Tim's words: [ The optional \keyword{else} clause is executed when no exception occurs in the \keyword{try} clause. Exceptions in the \keyword{else} clause are not handled by the preceding \keyword{except} clauses. vs The optional \keyword{else} clause is executed when the \keyword{try} clause terminates by any means other than an exception or executing a \keyword{return}, \keyword{continue} or \keyword{break} statement. Exceptions in the \keyword{else} clause are not handled by the preceding \keyword{except} clauses. ] > How is this different from "when control flow reaches the end of the > try clause", which is what I really had in mind? Only in that it doesn't appeal to a new undefined phrase, and is (I think) unambiguous in the eyes of a non-specialist reader (like Robin's friend). Note that "reaching the end of the try clause" is at best ambiguous, because you *really* have in mind "falling off the end" of the try clause. It wouldn't be unreasonable to say that in:

    try:
        x = 1
        y = 2
        return 1

"x=1" is the beginning of the try clause and "return 1" is the end. So if the reader doesn't already know what you mean, saying "the end" doesn't nail it (or, if like me, the reader does already know what you mean, it doesn't matter one whit what it says ). > Using the current wording, this paragraph would have to be > changed each time a new control-flow keyword is added. Based > upon the historical record that's not a grave concern ;-), It was sure no concern of mine ... > but I think the new wording relies too much on accidentals such > as the fact that these are the only control flow altering events.
> > It may be that control flow is not rigidly defined -- but as it is > what was really intended, maybe the fix should be to explain the > right concept rather than the current ad-hoc solution. > ... OK, except I don't know how to do that succinctly. For example, if Java had an "else" clause, the Java spec would say: If present, the "else block" is executed if and only if execution of the "try block" completes normally, and then there is a choice: If the "else block" completes normally, then the "try" statement completes normally. If the "else block" completes abruptly for reason S, then the "try" statement completes abruptly for reason S. That is, they deal with control-flow issues via appeal to "complete normally" and "complete abruptly" (which latter comes in several flavors ("reasons"), such as returns and exceptions), and there are pages and pages and pages of stuff throughout the spec inductively defining when these conditions obtain. It's clear, precise and readable; but it's also wordy, and we don't have anything similar to build on. As a compromise, given that we're not going to take the time to be precise (well, I'm sure not ...): The optional \keyword{else} clause is executed if and when control flows off the end of the \keyword{try} clause.\footnote{In Python 2.0, control "flows off the end" except in case of exception, or executing a \keyword{return}, \keyword{continue} or \keyword{break} statement.} Exceptions in the \keyword{else} clause are not handled by the preceding \keyword{except} clauses. Now it's all of imprecise, almost precise, specific to Python 2.0, and robust against any future changes .
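An illustrative aside for readers of the archive (not part of Tim's message): the rule both wordings are trying to capture can be checked directly. This is a minimal sketch that assumes a current Python 3 interpreter rather than Python 2.0; the helper names leave_try and else_raises are invented for the example.

```python
def leave_try(how):
    """Record which clauses run when the try clause is left in the given way."""
    trace = []
    for _ in range(1):  # a one-pass loop so that break/continue are legal
        try:
            trace.append("try")
            if how == "raise":
                raise ValueError("boom")
            if how == "break":
                break
            if how == "continue":
                continue
            if how == "return":
                return trace
        except ValueError:
            trace.append("except")
        else:
            # Reached only when control flows off the end of the try clause.
            trace.append("else")
    return trace


def else_raises():
    """An exception raised in the else clause is not seen by the except clauses."""
    try:
        pass
    except ValueError:
        return "handled"
    else:
        raise ValueError("raised in else")


for how in ("fall off the end", "raise", "break", "continue", "return"):
    print(how, "->", leave_try(how))
```

Only the first case should list "else" in its trace, and calling else_raises() should propagate the ValueError to the caller instead of returning "handled".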
From akuchlin at cnri.reston.va.us Mon Jan 1 21:35:27 2001 From: akuchlin at cnri.reston.va.us (Andrew Kuchling) Date: Mon, 1 Jan 2001 15:35:27 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <200101011949.OAA09804@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 01, 2001 at 02:49:24PM -0500 References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> <200101011949.OAA09804@cj20424-a.reston1.va.home.com> Message-ID: <20010101153527.A14116@newcnri.cnri.reston.va.us> On Mon, Jan 01, 2001 at 02:49:24PM -0500, Guido van Rossum wrote: >But I would be very hesitant to drop the notion of maintaining the FAQ >as a group collaboration project. There's nothing wrong with the FAQ >wizard except that the password (Spam) should be made publicly known... Why multiply the number of mechanisms required to maintain things? We already use CVS for other documentation; why not use it for the FAQ as well? --amk From tim.one at home.com Mon Jan 1 22:00:36 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 16:00:36 -0500 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20010101153527.A14116@newcnri.cnri.reston.va.us> Message-ID: [Andrew Kuchling] > Why multiply the number of mechanisms required to maintain things? > We already use CVS for other documentation; why not use it for the > FAQ as well? The search facilities of the FAQ wizard are invaluable, and so is the ability for "just users" to update the info from within their browsers. There are two problems with the FAQ in practice: 1. It doesn't get updated enough. We can't fix that by making it harder to update! 2. It's *only* available via the web interface. We should ship a text or HTML snapshot with releases; perhaps even do the usual Usenet periodic FAQ-posting thing. 
From tim.one at home.com Mon Jan 1 23:34:03 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 17:34:03 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > But is everyone's first thought to time the speed of Python vs. Perl? It's few peoples' first thought. It's impossible for bilingual programmers (or dabblers, or evaluators) not to notice *soon*, though, because: > Why does it hurt so much that this is a bit slow? Factors of 2 to 5 aren't "a bit" -- they're obvious when they happen, but the *cause* is not. To judge from a decade of c.l.py gripes, most people write it off to "huh -- guess Python is just slow"; the rest eventually figure out that their text input is the bottleneck (Tom Christiansen never got this far <0.5 wink>), but then don't know what to do about it. At this point I'm going to insert two anonymized pvt emails from last year: -----Original Message #1 ----- From: TTT Sent: Monday, March 13, 2000 2:29 AM To: GGG Subject: RE: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison GGG, note especially figure 4 in Lutz Prechelt's report: > http://wwwipd.ira.uka.de/~prechelt/Biblio/#jccpprtTR The submitted Python programs had by far the largest variability in how long it took to load the dictionary. My input loop is probably typical of the "fast" Python programs, which indeed beat most (but not all) of the fastest Perl ones here:

    class Dictionary:
        ...
        def fill_from_file(self, f, BUFFERSIZE=500000):
            """f, BUFFERSIZE=500000 -> fill dictionary from file f.

            f must be an open file, or other object with a readlines()
            method.  It must contain one word per line.
            Optional arg BUFFERSIZE is used to chunk up input for efficiency,
            and is roughly the # of bytes read at a time.
            """
            addword = self.addword
            while 1:
                lines = f.readlines(BUFFERSIZE)
                if not lines:
                    break
                for line in lines:
                    addword(line[:-1])  # chop trailing newline

Comparable Perl may have been the one-liner:

    grep(&addword, chomp(<>));

which may account for why Perl's memory use was uniformly higher than Python's. Whatever, you really need to be a Python expert to dream up "the fast way" to do Python input! Hire me, and I'll fix that . nothing-like-blackmail-before-going-to-bed-ly y'rs - TTT -----Original Message #2 ----- From: GGG Sent: Monday, March 13, 2000 7:08 AM To: TTT Subject: Re: [Python-Help] C, C++, Java, Perl, Python, Rexx, Tcl comparison Agreed. readlines(BUFFERSIZE) is a crock. In fact, ``for i in f.readlines()'' should use lazy evaluation -- but that will have to wait for Py3K unless we add hints so that readlines knows it is being called from a for loop. --GGG -----Back to 2001 ----- I took TTT's advice and read Lutz's report . I agree with GGG that hiding this in .readlines() would be maximally elegant. xreadlines supplies most of the lazy machinery GGG favored. I don't know how hard it would be to supply the rest of it, but it's such a frequent bitching point that I would prefer pointing people to an explicit .xreadlines() hack than either (a) try to convince them that they "shouldn't" care about the speed as much as they claim to; or, (b) try to explain the double-loop buffering method. I'd personally rather use an explicit .xreadlines() hack than code the double-loop buffering too, and don't see an obvious way to do better than that right now. >> reading-text-files-is-very-common-ly y'rs - tim > So is worrying about performance without a good reason... Indeed it is. I'm persuaded that many people making this specific complaint have a legitimate need for more speed, though, and that many don't persist with Python long enough to find out how to address this complaint (because the double-loop method is too obscure for a newbie to dream up).
That makes this hack score extraordinarily high on my benefit/harm ratio scale (in P3K xreadlines can be deprecated in favor of readlines <0.9 wink>). heck-it-doesn't-even-require-a-new-keyword-ly y'rs - tim From thomas at xs4all.net Mon Jan 1 23:46:45 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 1 Jan 2001 23:46:45 +0100 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101011935.OAA09728@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 01, 2001 at 02:35:24PM -0500 References: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <20010101234645.B5435@xs4all.nl> On Mon, Jan 01, 2001 at 02:35:24PM -0500, Guido van Rossum wrote: [ Python lacks a One True Way of doing Perl's 'while(<>)' ] > > Does someone have an elegant way to address this? I believe Jeff's shot at > > elegance was the other part of the patch, using (his new) xreadlines under > > the covers to speed the fileinput module. > But of course suggesting fileinput is also not a great solution -- > it's relatively obscure (since it's not taught by most tutorials, > certainly not by the standard tutorial). Is fileinput really obscure ? I personally quite like it. It is enough like the perl idiom to be very useful for people thinking that way, and it doesn't require special syntax or considerations. If tutorialization is the only problem, I'd be happy to fix that, provided Fred or Moshe can TeX my fix up. As for speed (which stays a secondary or tertiary consideration at best) do we really need the xreadlines method to accomplish that ? Couldn't fileinput get almost the same performance using readlines() with a sizehint ? I personally don't like the xreadlines because it adds yet another function to do the same, with a slight, subtle and to the untrained programmer unclear distinction from the rest. 
(I don't really like the range/xrange difference either -- I think Python code shouldn't care whether they're dealing with a real list or a generator, and as much as possible should just be generators. And in the case of simple (x)range()es, I have yet to see a case where a 'real' list had significantly better performance than a generator.) If we *do* start adding methods to (the public API of) filemethods, I think we should consider more than just xreadlines() (I seem to recall other proposals, but my memory is hazy at the moment -- I haven't slept since last millennium) add whatever is necessary, and provide a UserFile in the std. lib that 'emulates' all fileobject functionality using a single readline() function. Now, if you'll excuse me, I have a date with a soft bed I haven't seen in about 40 hours, a pair of aspirin my head is killing for and probably a hangover that I don't want to think about, right now ;) Gelukkig-Nieuwjaar-iedereen-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From jepler at inetnebr.com Tue Jan 2 02:49:35 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Mon, 1 Jan 2001 19:49:35 -0600 Subject: [Python-Dev] Re: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: ; from Tim Peters on Mon, Jan 01, 2001 at 02:20:53PM -0500 Message-ID: <20010101194935.19672@falcon.inetnebr.com> I'd like to speak up about this patch I've submitted on sourceforge. I consider the xreadlines function/object to be the core of my proposal. The addition of a method to file objects, as well as the modifications to fileinput, are secondary in my opinion. The desire is to iterate over file contents in a way that satisfies the following criteria:

* Uses the "for" syntax, because this clearly captures the underlying operation. (files can be viewed as sequences of lines when appropriate)
* Consumes small amounts of memory even when the file contents are large.
* Has the lowest overhead that can reasonably be attained.

I think that it is agreed that the ability to use the "for" syntax is important, since it was the impetus for the xrange function/object. After all, there's a "while" statement which will give the same effect, without introducing xrange. The point under debate, as I see it, is the utility of speeding up the "benchmarks" of folks who compare the speed of Python and another language doing a very simple loop over the lines in a file. Since this advantage disappears once real work is being done on the file, maybe an XReadLines class, written in Python, would be more suitable. In fact, I've written such a class since I didn't know about fileinput and in any case I find it less useful to me because of all the weird stuff it does. (parsing argv, opening files by name, etc) One shortcoming of my current patch, aside from the ones already named in another person's response to it, is that it fails when working on a file-like class which implements .readline but not .readlines. In any case, I wrote xreadlines to learn how to write C extensions to Python, and submitted it at the suggestion of a fellow Python user in a private discussion. I'd like to extinguish one of these eternal comp.lang.python threads with it too, but maybe it's not to be. Happy new year, all.
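An illustrative aside for readers of the archive: this is not Jeff's XReadLines class or his C patch, neither of which appears in the thread. It is a minimal sketch, assuming a current Python 3 interpreter, of a pure-Python wrapper that meets the three criteria above by buffering with readlines(sizehint), and that falls back to readline() for file-like objects lacking readlines(). The names lines_of and OnlyReadline and the default size hint are made up for the example.

```python
import io


def lines_of(f, sizehint=64 * 1024):
    """Lazily yield the lines of f, reading roughly sizehint bytes per batch."""
    readlines = getattr(f, "readlines", None)
    if readlines is None:
        # File-like object with readline() only: go one line at a time.
        while True:
            line = f.readline()
            if not line:
                return
            yield line
    # The "double-loop" idiom from the thread, hidden behind a for-friendly API.
    while True:
        lines = readlines(sizehint)
        if not lines:
            return
        yield from lines


class OnlyReadline:
    """Hypothetical file-like object that implements readline() but not readlines()."""

    def __init__(self, text):
        self._f = io.StringIO(text)

    def readline(self):
        return self._f.readline()


for line in lines_of(io.StringIO("spam\neggs\n"), sizehint=1):
    print(line.rstrip("\n"))
```

In current Python 3 an open file is itself a lazy line iterator, so a plain "for line in f:" is normally enough; the wrapper is only meant to show what the double-loop buffering looks like once it is packaged for use with "for".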
Jeff From gstein at lyra.org Tue Jan 2 04:34:31 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 1 Jan 2001 19:34:31 -0800 Subject: [Python-Dev] FAQ Horribly Out Of Date In-Reply-To: <20010101153527.A14116@newcnri.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Mon, Jan 01, 2001 at 03:35:27PM -0500 References: <20001231105812.A12168@newcnri.cnri.reston.va.us>, <20001231003330.D2188A84F@darjeeling.zadka.site.co.il> <20010101100102.2360CA84F@darjeeling.zadka.site.co.il> <200101011949.OAA09804@cj20424-a.reston1.va.home.com> <20010101153527.A14116@newcnri.cnri.reston.va.us> Message-ID: <20010101193431.M10567@lyra.org> On Mon, Jan 01, 2001 at 03:35:27PM -0500, Andrew Kuchling wrote: > On Mon, Jan 01, 2001 at 02:49:24PM -0500, Guido van Rossum wrote: > >But I would be very hesitant to drop the notion of maintaining the FAQ > >as a group collaboration project. There's nothing wrong with the FAQ > >wizard except that the password (Spam) should be made publicly known... > > Why multiply the number of mechanisms required to maintain things? We > already use CVS for other documentation; why not use it for the FAQ as > well? That would limit the updaters to just those with CVS access. As Guido just pointed out, Bjorn made a bunch of updates. And he didn't need CVS to do that... Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Tue Jan 2 04:44:05 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 1 Jan 2001 22:44:05 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <20010101194935.19672@falcon.inetnebr.com> Message-ID: [Jeff Epler] > I'd like to speak up about this patch I've submitted on sourceforge. I'm not sure that's allowed . > ... > The point under debate, as I see it, is the utility of speeding > up the "benchmarks" of folks who compare the speed of Python and > another language doing a very simple loop over the lines in a file. If that were true, I couldn't care less. 
> Since this advantage disappears once real work is being done on > the file, ... I agree that's true, but submit it's rarely relevant. *Most* file-crunching apps are dominated by I/O time, which is why this is so visible to so many; e.g., chewing over massive log files looking for patterns appears to be the growth industry of the 21st century . Even in Lutz's report (see reference from earlier mail), where the task to be solved was far from trivial, input time exceeded processing time across all languages (with some oddball exceptions, when the coder neglected to use a hash table to store info). That's thoroughly typical of real file-crunching applications, in my experience: Perl has a killer speed advantage in the single most time-consuming portion of the app, and due to one implementation trick. Take that advantage away, and Python holds its own in this domain. Coincidentally, I got pvt email from a newbie today, reading in part; > If Perl wasn't so gosh darn good and fast at text scrubbing, it > wouldn't really be a consideration, it's syntax is so clunky and > hard to learn by comparison to both Python and Ruby. This is just depressing, because I can predict every step of this dance. > ... > Happy new year, all. And to you! Just make sure it's a fast new year . From moshez at zadka.site.co.il Tue Jan 2 16:24:40 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 2 Jan 2001 17:24:40 +0200 (IST) Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <20010101234645.B5435@xs4all.nl> References: <20010101234645.B5435@xs4all.nl>, <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <20010102152440.9C26DA84F@darjeeling.zadka.site.co.il> On Mon, 1 Jan 2001, Thomas Wouters wrote: > As for speed (which stays a secondary or tertiary consideration at best) do > we really need the xreadlines method to accomplish that ? 
Couldn't fileinput > get almost the same performance using readlines() with a sizehint ? I me too Adding xreadlines() to the interface would break half a dozen file-objects all around the world (just the standard library has StringIO, cStringIO, GzipFile and probably some others I can't remember) Adding .readlines(sizehint) to fileinput, and adding a function to create something similar to fileinput from a file object (as opposed to a file name) would help everyone, and doesn't seem to hard. Is there a gotcha I'm just not seeing? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From tim.one at home.com Tue Jan 2 09:06:32 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 03:06:32 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <20010101234645.B5435@xs4all.nl> Message-ID: [Thomas Wouters] > ... > As for speed (which stays a secondary or tertiary consideration > at best) do we really need the xreadlines method to accomplish > that ? Couldn't fileinput get almost the same performance using > readlines() with a sizehint ? There was a long email discussion among Jeff, Paul Prescod, Neel Krishnaswami, and Alex Martelli about this. I started getting copied on it somewhere midstream, but didn't have time to follow it then (like I do now ). About two weeks ago Neel summarized all the approaches then under discussion: """ [Neel Krishnaswami] ... Quick performance summary of the current solutions:

Slowest: for line in fileinput.input('foo'):     # Time 100
       : while 1: line = file.readline()         # Time 75
       : for line in LinesOf(open('foo')):       # Time 25
Fastest: for line in file.readlines():           # Time 10
         while 1: lines = file.readlines(hint)   # Time 10
         for line in xreadlines(file):           # Time 10

The difference in speed between the slowest and fastest is about a factor of 10.
LinesOf is Alex's Python wrapper class that takes a file and uses readlines() with a size-hint to present a sequence interface. It's around half as fast as the fastest idioms, and 3-4 times faster than while 1:. Jeff's xreadlines is essentially the same thing in C, and is indistinguishable in performance from the other fast idioms. ... """ On his box, line-at-a-time is >7x slower than the fastest Python methods, which latter are usually close (depending on the platform) to Perl line-at-a-time speeds. A factor of 7 is too large for most working programmers to ignore in the interest of somebody else's notion of theoretical purity . Seriously, speed is not a secondary consideration to me when the gap is this gross, and in an area so visible and common. Alex's LineOf appears a good predictor for how adding fileinput.readlines(hint) would perform, since it appears to *be* that (except off on its own). Then it buys a factor of 3 over line-at-a-time on Neel's box but leaves a factor of 2.5 on the table. The cause of the latter appears mostly to be the overhead of getting a Python method call into the equation for each line returned. Note that Jeff added .xreadlines() as a file object method at Neel's urging. The way he started this is shown on the last line: a function. If we threw out the fileinput and file method aspects, and just added a new module xreadlines with a function xreadlines, then what? I bet it would become as popular as the string module, and for good reason: it's a specific approach that works, to a specific and common problem. > ... > And in the case of simple (x)range()es, I have yet to see a case > where a 'real' list had significantly better performance than > a generator.) It varies by platform, but I don't think I've heard of variations larger than 20% in either direction. 20% is nothing, though; in *this* case we're talking order of magnitude. That's go/nogo territory. > ... 
> Gelukkig-Nieuwjaar-iedereen-ly y'rs I understand people are passionate when reality clashes with the dream of a wart-free language, but that's no reason to swear at me . wishing-you-a-happy-new-year-like-a-civilized-man-ly y'rs - tim From paulp at ActiveState.com Tue Jan 2 11:00:46 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 02:00:46 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: <200101011935.OAA09728@cj20424-a.reston1.va.home.com> Message-ID: <3A51A6CE.3B15371D@ActiveState.com> Guido van Rossum wrote: > > ... > > But is everyone's first thought to time the speed of Python vs. Perl? > Why does it hurt so much that this is a bit slow? I want to interject here that I asked Jeff to submit this patch because I don't see it as "a little bit slow." When someone transliterates a program from one scripting language to another and gets a program that is two to five times slower that is a big deal! > But of course suggesting fileinput is also not a great solution -- > it's relatively obscure (since it's not taught by most tutorials, > certainly not by the standard tutorial). Fileinput's primary problem is that IIRC, it is even slower than doing readline yourself! > > reading-text-files-is-very-common-ly y'rs - tim > > So is worrying about performance without a good reason... I don't understand what constitutes good reason. We're talking about a relatively minor change that will speed up thousands of programs, answer a frequently asked question from comp.lang.python, obliterate an obscure idiom and reduce the number of requests for a Python syntax change (assignment expression) all in one bold sweep. It seemed to me as if it was a "pure win." 
Paul Prescod From paulp at ActiveState.com Tue Jan 2 11:06:24 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 02:06:24 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: <20010101234645.B5435@xs4all.nl>, <200101011935.OAA09728@cj20424-a.reston1.va.home.com> <20010102152440.9C26DA84F@darjeeling.zadka.site.co.il> Message-ID: <3A51A820.50365F02@ActiveState.com> Moshe Zadka wrote: > > ... > > Adding .readlines(sizehint) to fileinput, and adding a function > to create something similar to fileinput from a file object (as opposed > to a file name) would help everyone, and doesn't seem to hard. > Is there a gotcha I'm just not seeing? Fileinput is inherently slow because there are too many layers of Python code. I started to consider ways of inverting the logic so that it only called into Python when it needed to switch files but it would have been a much larger patch than Jeff's and I thought that a conservative approach was important. Fileinput should someday be optimized but we can easily get a low-hanging fruit improvement with Jeff's patch. Paul Prescod From guido at digicool.com Tue Jan 2 15:56:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 09:56:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 03:06:32 EST." References: Message-ID: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Tim's almost as good at convincing me as he is at channeling me! The timings he showed almost convinced me that fileinput is hopeless and xreadlines should be added. But then I wrote a little timer of my own... I am including the timer program below my signature. The test input was the current access_log of dinsdale.python.org, which has about 119 Mbytes and 1M lines (as counted by the test program). 
I measure about a factor of 2 between readlines with a sizehint (of 1 MB) and fileinput; a change to fileinput that uses readline with a sizehint and in-lines the common case in __getitem__ (as suggested by Moshe), didn't make a difference. Output (the first time is realtime seconds, the second CPU seconds):

total 119808333 chars and 1009350 lines
count_chars_lines     7.944  7.890
readlines_sizehint    5.375  5.320
using_fileinput      15.861 15.740
while_readline        8.648  8.570

This was on a 600 MHz Pentium-III Linux box (RH 6.2). Note that count_chars_lines and readlines_sizehint use the same algorithm -- the difference is that readlines_sizehint uses 'pass' as the inner loop body, while count_chars_lines adds two counters. Given that very light per-line processing (counting lines and characters) already increases the time considerably, I'm not sure I buy the arguments that the I/O overhead is always considerable. The fact that my change to fileinput.py didn't make a difference suggests that its lack of speed is purely caused by the Python code. Now what to do? I still don't like xreadlines very much, but I do see that it can save some time. But my test doesn't confirm Neel's times as posted by Tim:

> Slowest: for line in fileinput.input('foo'):     # Time 100
>        : while 1: line = file.readline()         # Time 75
>        : for line in LinesOf(open('foo')):       # Time 25
> Fastest: for line in file.readlines():           # Time 10
>          while 1: lines = file.readlines(hint)   # Time 10
>          for line in xreadlines(file):           # Time 10

I only see a factor of 3 between fastest and slowest, and readline is only about 60% slower than readlines_sizehint.
--Guido van Rossum (home page: http://www.python.org/~guido/)

import time, fileinput, sys

def timer(func, *args):
    t0 = time.time()
    c0 = time.clock()
    func(*args)
    t1 = time.time()
    c1 = time.clock()
    print "%-20s %6.3f %6.3f" % (func.__name__, t1-t0, c1-c0)

def count_chars_lines(fn, bs=1024*1024):
    nl = 0
    nc = 0
    f = open(fn, "r")
    while 1:
        buf = f.readlines(bs)
        if not buf: break
        for line in buf:
            nl += 1
            nc += len(line)
    f.close()
    print "total", nc, "chars and", nl, "lines"

def readlines_sizehint(fn, bs=1024*1024):
    f = open(fn, "r")
    while 1:
        buf = f.readlines(bs)
        if not buf: break
        for line in buf:
            pass
    f.close()

def using_fileinput(fn):
    f = fileinput.FileInput(fn)
    for line in f:
        pass
    f.close()

def while_readline(fn):
    f = open(fn, "r")
    while 1:
        line = f.readline()
        if not line: break
        pass
    f.close()

fn = "/home/guido/access_log"
if sys.argv[1:]:
    fn = sys.argv[1]
timer(count_chars_lines, fn)
timer(readlines_sizehint, fn, 1024*1024)
timer(using_fileinput, fn)
timer(while_readline, fn)

From guido at digicool.com Tue Jan 2 16:07:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 10:07:06 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: Your message of "Mon, 01 Jan 2001 15:27:37 EST." References: Message-ID: <200101021507.KAA12796@cj20424-a.reston1.va.home.com> > As a compromise, given that we're not going to take the time to be precise > (well, I'm sure not ...): > > The optional \keyword{else} clause is executed if and > when control flows off the end of the \keyword{try} > clause.\foonote{In Python 2.0, control "flows off the > end" except in case of exception, or executing a > \keyword{return}, \keyword{continue} or \keyword{break} > statement.} > Exceptions in the \keyword{else} clause are not handled by > the preceding \keyword{except} clauses. > > Now it's all of imprecise, almost precise, specific to Python 2.0, and > robust against any future changes . Sounds good to me. The reference to 2.0 could be changed to "Currently".
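The proposed wording for the else clause can be checked with a small experiment. What follows is a minimal sketch in present-day Python 3 syntax, which is an assumption since it postdates this thread; the helper names run and else_raises are hypothetical and exist only for illustration. It records which clauses execute when the try body finishes normally, raises, or leaves via break, continue or return, and then shows that an exception raised inside else is not caught by the preceding except.

```python
def run(action):
    """Return the clauses that ran for a given way of leaving the try body."""
    trace = []
    for _ in range(1):              # a one-pass loop so break/continue are legal
        try:
            trace.append("try")
            if action == "raise":
                raise ValueError("boom")
            if action == "break":
                break
            if action == "continue":
                continue
            if action == "return":
                return trace
        except ValueError:
            trace.append("except")
        else:
            # Reached only when control flows off the end of the try body.
            trace.append("else")
    return trace


def else_raises():
    """Show that the inner except does not handle an exception from else."""
    try:
        try:
            pass
        except ValueError:
            return "handled by inner except"
        else:
            raise ValueError("raised in else")
    except ValueError:
        return "propagated to outer handler"
```

Under these assumptions, only the call where the try body simply finishes records "else"; every other exit path skips it.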
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 2 16:20:11 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 10:20:11 -0500 Subject: [Python-Dev] Re: curses in the core? In-Reply-To: Your message of "Thu, 28 Dec 2000 18:25:28 EST." <20001228182528.A10743@thyrsus.com> References: <200012282252.XAA18952@loewis.home.cs.tu-berlin.de> <20001228182528.A10743@thyrsus.com> Message-ID: <200101021520.KAA13222@cj20424-a.reston1.va.home.com> > What does being in the Python core mean? There are two potential definitions: > > 1. Documentation says it's available on all platforms. > > 2. Documentation restricts it to one of the three platform groups > (Unix/Windows/Mac) but implies that it will be available on any > OS in that group. > > I think the second one is closer to what application programmers > thinking about which batteries are included expect. But I could be > persuaded otherwise by a good argument. Actually, when *I* have used the term "core" I've typically thought of this as referring to anything that's in the standard source distribution, whether or not it is built on all platforms. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Tue Jan 2 09:42:30 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 00:42:30 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021456.JAA12633@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 02, 2001 at 09:56:40AM -0500 References: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Message-ID: <20010102004230.A29700@glacier.fnational.com> On Tue, Jan 02, 2001 at 09:56:40AM -0500, Guido van Rossum wrote: > Now what to do? I still don't like xreadlines very much, but I do see > that it can save some time. 
But my test doesn't confirm Neel's times > as posted by Tim: > >
> > Slowest: for line in fileinput.input('foo'):     # Time 100
> >        : while 1: line = file.readline()         # Time 75
> >        : for line in LinesOf(open('foo')):       # Time 25
> > Fastest: for line in file.readlines():           # Time 10
> >          while 1: lines = file.readlines(hint)   # Time 10
> >          for line in xreadlines(file):           # Time 10
> >
> I only see a factor of 3 between fastest and slowest, and > readline is only about 60% slower than readlines_sizehint. Could it be that your using the CVS version of Python which includes Andrew's cool glibc getline enhancement? Neil From guido at digicool.com Tue Jan 2 16:40:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 02 Jan 2001 10:40:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 00:42:30 PST." <20010102004230.A29700@glacier.fnational.com> References: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> <20010102004230.A29700@glacier.fnational.com> Message-ID: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> [me] > > I only see a factor of 3 between fastest and slowest, and > > readline is only about 60% slower than readlines_sizehint. [Neil] > Could it be that your using the CVS version of Python which > includes Andrew's cool glibc getline enhancement? Bingo!
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 2 17:34:31 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 11:34:31 -0500 Subject: [Python-Dev] Fwd: try...else In-Reply-To: <200101021507.KAA12796@cj20424-a.reston1.va.home.com> Message-ID: >> The optional \keyword{else} clause is executed if and >> when control flows off the end of the \keyword{try} >> clause.\foonote{In Python 2.0, control "flows off the >> end" except in case of exception, or executing a >> \keyword{return}, \keyword{continue} or \keyword{break} >> statement.} >> Exceptions in the \keyword{else} clause are not handled by >> the preceding \keyword{except} clauses. [Guido] > Sounds good to me. The reference to 2.0 could be changed to > "Currently". Cool. See http://sourceforge.net/bugs/?group_id=5470&func=detailbug&bug_id=127098 From tim.one at home.com Tue Jan 2 21:48:08 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 15:48:08 -0500 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom Message-ID: test_compare is broken because the expected-output file has bizarre stuff in it like: cmp(2, [1]) = -108 cmp(2, (2,)) = -116 cmp(2, None) = -78 What's up with that? I'll leave test_minidom to someone who thinks they know what it's doing. Both failures are very recent. From tim.one at home.com Tue Jan 2 21:48:09 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 15:48:09 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I only see a factor of 3 between fastest and slowest, and > readline is only about 60% slower than readlines_sizehint. [Neil] > Could it be that your using the CVS version of Python which > includes Andrew's cool glibc getline enhancement? [Guido] > Bingo! 
It's a good thing I haven't yet had time to try any speed tests myself, since I don't have a glibc-enabled platform so Guido and I may have been tempted to disagree about numbers in public . I checked out the source for glibc's getline. It's pulling the same trick Perl uses, copying directly from the stdio buffer when it can, instead of (like Python, and like almost all vendor fgets implementations) doing getc-in-a-loop. The difference is that Perl can't do that without breaking into the FILE* representation in platform-dependent ways. It's a shame that almost all vendors missed that fgets was defined as a primitive by the C committee precisely so that vendors *could* pull this speed trick under the covers. It's also a shame that Perl did it for them . From barry at digicool.com Tue Jan 2 22:56:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 2 Jan 2001 16:56:10 -0500 Subject: [Python-Dev] testing, please ignore Message-ID: <14930.20090.283107.799626@anthem.wooz.org> Sorry folks, just making sure things are working again. you-really-didn't-want-email-this-millennium-didja?-ly y'rs, -Barry From guido at python.org Tue Jan 2 21:59:22 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 15:59:22 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 14:59:24 EST." References: Message-ID: <200101022059.PAA14845@cj20424-a.reston1.va.home.com> > [Guido] > > I only see a factor of 3 between fastest and slowest, and > > readline is only about 60% slower than readlines_sizehint. > > [Neil] > > Could it be that your using the CVS version of Python which > > includes Andrew's cool glibc getline enhancement? > > [Guido] > > Bingo! > > It's a good thing I haven't yet had time to try any speed tests myself, > since I don't have a glibc-enabled platform so Guido and I may have been > tempted to disagree about numbers in public . 
> > I checked out the source for glibc's getline. It's pulling the same trick > Perl uses, copying directly from the stdio buffer when it can, instead of > (like Python, and like almost all vendor fgets implementations) doing > getc-in-a-loop. The difference is that Perl can't do that without breaking > into the FILE* representation in platform-dependent ways. It's a shame that > almost all vendors missed that fgets was defined as a primitive by the C > committee precisely so that vendors *could* pull this speed trick under the > covers. It's also a shame that Perl did it for them . Quite apart from whether we should enable xreadlines(), could you look into doing a similar thing for MSVC stdio? For most Unix platforms, a cop-out answer is "use glibc" -- but for Windows it may pay to do our own hack. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Tue Jan 2 22:06:05 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 2 Jan 2001 16:06:05 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 03:48:09PM -0500 References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> Message-ID: <20010102160605.A5211@kronos.cnri.reston.va.us> On Tue, Jan 02, 2001 at 03:48:09PM -0500, Tim Peters wrote: >into the FILE* representation in platform-dependent ways. It's a shame that >almost all vendors missed that fgets was defined as a primitive by the C >committee precisely so that vendors *could* pull this speed trick under the >covers. It's also a shame that Perl did it for them . So, should Python be changed to use fgets(), available on all ANSI C platforms, rather than the glibc-specific getline()? That would be more complicated than the brain-dead easy course of using getline(), which is obviously why I didn't do it; PyFile_GetLine() had annoyingly complicated logic. 
When this was discussed in comp.lang.python, someone also mentioned getc_unlocked(), which saves the overhead of locking the stream every time, but that didn't seem a fruitful avenue for exploration. --amk From tim.one at home.com Tue Jan 2 23:00:37 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 17:00:37 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101022059.PAA14845@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Quite apart from whether we should enable xreadlines(), could you look > into doing a similar thing for MSVC stdio? For most Unix platforms, a > cop-out answer is "use glibc" -- but for Windows it may pay to do our > own hack. There's no question about whether it would pay on Windows, because it pays big for Perl on Windows. The question is about cost. There's no way to *do* it short of the way Perl does it, which is to write a large pile of Windows-specific code (roughly the same size and complexity as the glibc getline implementation -- check it out, it's not trivial, and glibc exploits compiler inlining to make it bearable) relying on reverse-engineered accidents of how MS happens to use all the fields from this undocumented struct (from MS's stdio.h):

struct _iobuf {
        char *_ptr;
        int   _cnt;
        char *_base;
        int   _flag;
        int   _file;
        int   _charbuf;
        int   _bufsiz;
        char *_tmpfname;
        };
typedef struct _iobuf FILE;

in their stdio implementation. Else it won't play correctly with MS's stdio. That's A Project. Last year I tried extracting the relevant code from Perl, but, as is usual, gave up after unraveling the third (whatever) layer of mystery macros with no end in sight. I bet it would take me a week. Is it worth that much to you and DC? Since the real Windows experts are hanging out at ActiveState, I bet one of them will volunteer to do it tonight .
From tim.one at home.com Tue Jan 2 23:17:14 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 17:17:14 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010102160605.A5211@kronos.cnri.reston.va.us> Message-ID: [Tim] > It's a shame that almost all vendors missed that fgets was defined > as a primitive by the C committee precisely so that vendors *could* > pull this speed trick under the covers. It's also a shame that Perl > did it for them . [Andrew Kuchling] > So, should Python be changed to use fgets(), available on all ANSI C > platforms, rather than the glibc-specific getline()? That would be > more complicated than the brain-dead easy course of using getline(), > which is obviously why I didn't do it; PyFile_GetLine() had annoyingly > complicated logic. The thrust of my original comment above is that fgets is almost never faster than what Python is doing now, because vendors overwhelmingly do *not* exploit the opportunity the std gave them. So, no, switching to fgets() wouldn't help. > When this was discussed in comp.lang.python, someone also mentioned > getc_unlocked(), which saves the overhead of locking the stream every > time, but that didn't seem a fruitful avenue for exploration. Well, getc_unlocked isn't std (not even in C99). Mentioning it did inspire me to discover, however, that while the MS fgets() is the typical "getc in a loop" thing, at least it locks/unlocks the stream once each at function entry/exit, and uses a special MS flavor of getc ("_getc_lk") inside the loop. However, that this helps is an illusion, because the body of their _getc_lk macro is identical to the body of their getc macro. Smells like a bug, or an unfinished project.
From paulp at ActiveState.com Tue Jan 2 23:40:39 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 14:40:39 -0800 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range References: Message-ID: <3A5258E7.D52CA2C@ActiveState.com> Tim Peters wrote: > > There's no question about whether it would pay on Windows, because it pays > big for Perl on Windows. The question is about cost. There's no way to > *do* it short of the way Perl does it, which is to write a large pile of > Windows-specific code > ... Since the real Windows experts > are hanging out at ActiveState, I bet one of them will volunteer to do it > tonight . Mark is busy tonight and the Perl guys are still recovering from implementing it the first time. :) Paul From guido at python.org Tue Jan 2 23:46:00 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 17:46:00 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 16:06:05 EST." <20010102160605.A5211@kronos.cnri.reston.va.us> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> Message-ID: <200101022246.RAA16384@cj20424-a.reston1.va.home.com> > On Tue, Jan 02, 2001 at 03:48:09PM -0500, Tim Peters wrote: > >into the FILE* representation in platform-dependent ways. It's a shame that > >almost all vendors missed that fgets was defined as a primitive by the C > >committee precisely so that vendors *could* pull this speed trick under the > >covers. It's also a shame that Perl did it for them . > > So, should Python be changed to use fgets(), available on all ANSI C > platforms, rather than the glibc-specific getline()? That would be > more complicated than the brain-dead easy course of using getline(), > which is obviously why I didn't do it; PyFile_GetLine() had annoyingly > complicated logic. 
You mean get_line(), which indeed has a complicated API and corresponding logic: the argument may be a max length, or 0 to indicate arbitrary length, or negative to indicate raw_input() semantics. :-( Unfortunately we can't use fgets(), even if it were faster than getline(), because it doesn't tell how many characters it read. On files containing null bytes, readline() is supposed to treat these like any other character; if your input is "abc\0def\nxyz\n", the first readline() call should return "abc\0def\n". But with fgets(), you're left to look in the returned buffer for a null byte, and there's no way (in general) to distinguish this result from an input file that only consisted of the three characters "abc". getline() doesn't seem to have this problem, since its size is also an output parameter. > When this was discussed in comp.lang.python, someone also mentioned > getc_unlocked(), which saves the overhead of locking the stream every > time, but that didn't seem a fruitful avenue for exploration. I've never heard of getc_unlocked; it's not in the (old) C standard. If it's also a glibc thing, I doubt that using it would be faster than getline(). If it's a new C standard (C9x) thing, we'll have to wait. Fred reminded me that for e.g. Solaris, while everybody probably compiles with GCC, that doesn't mean they are using glibc, so in practice getline() will only help on Linux. I'm slowly warming up to xreadlines(), although we must be careful to consider the consequences (do other file-like objects need to support it too?). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 2 23:46:18 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 17:46:18 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <3A5258E7.D52CA2C@ActiveState.com> Message-ID: [Tim] > ...
Since the real Windows experts are hanging out at ActiveState, > I bet one of them will volunteer to do it tonight . [Paul Prescod] > Mark is busy tonight and the Perl guys are still recovering from > implementing it the first time. :) I'm delighted, then, that you have nothing better to do than tease the decent, hard-working folks on Python-Dev! I'll be up until about 4am -- feel free to submit your patch anytime before then. in-a-pinch-i'll-even-accept-it-tomorrow-ly y'rs - tim From guido at python.org Tue Jan 2 23:53:14 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 17:53:14 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 17:00:37 EST." References: Message-ID: <200101022253.RAA16482@cj20424-a.reston1.va.home.com> > [Guido] > > Quite apart from whether we should enable xreadlines(), could you look > > into doing a similar thing for MSVC stdio? For most Unix platforms, a > > cop-out answer is "use glibc" -- but for Windows it may pay to do our > > own hack. > > There's no question about whether it would pay on Windows, because it pays > big for Perl on Windows. The question is about cost. There's no way to > *do* it short of the way Perl does it, which is to write a large pile of > Windows-specific code (roughly the same size and complexity as the glibc > getline implementation -- check it out, it's not trivial, and glibc exploits > compiler inlining to make it bearable) relying on reverse-engineered > accidents of how MS happens to use all the fields from this undocumented > struct (from MS's stdio.h): > > struct _iobuf { > char *_ptr; > int _cnt; > char *_base; > int _flag; > int _file; > int _charbuf; > int _bufsiz; > char *_tmpfname; > }; > typedef struct _iobuf FILE; > > in their stdio implementation. Else it won't play correctly with MS's > stdio. That's A Project. 
Last year I tried extracting the relevant code > from Perl, but, as is usual, gave up after unraveling the third (whatever) > layer of mystery macros with no end in sight. I bet it would take me a > week. Is it worth that much to you and DC? Since the real Windows experts > are hanging out at ActiveState, I bet one of them will volunteer to do it > tonight . Yeah. That's too much. Too bad. I'm not holding my breath for ActiveState though. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Jan 2 23:52:58 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 2 Jan 2001 16:52:58 -0600 (CST) Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101022246.RAA16384@cj20424-a.reston1.va.home.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> Message-ID: <14930.23498.53540.401218@beluga.mojam.com> Guido> I'm slowly warming up to xreadlines(), ... I haven't followed this thread closely, and my brain is a bit frazzled at the moment, but is there some fundamental reason that the file object's readlines method can't be made lazy, perhaps only when given a sizehint? Skip From paulp at ActiveState.com Tue Jan 2 23:59:47 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 14:59:47 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> Message-ID: <3A525D63.17ABCC87@ActiveState.com> Skip Montanaro wrote: > > Guido> I'm slowly warming up to xreadlines(), ... 
> > I haven't followed this thread closely, and my brain is a bit frazzled at > the moment, but is there some fundamental reason that the file object's > readlines method can't be made lazy, perhaps only when given a sizehint? I suggested this at one point but it was pointed out that there is probably a lot of code that works with the resulting list *as a list* i.e. as a random-access, writable sequence object. I really wasn't thrilled with xreadlines at first either...it's the least of all possible evils (including the status quo). Paul From nas at arctrix.com Tue Jan 2 17:09:15 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 2 Jan 2001 08:09:15 -0800 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 03:48:08PM -0500 References: Message-ID: <20010102080915.A30892@glacier.fnational.com> On Tue, Jan 02, 2001 at 03:48:08PM -0500, Tim Peters wrote: > test_compare is broken because the expected-output file has bizarre stuff in > it like: > > cmp(2, [1]) = -108 > cmp(2, (2,)) = -116 > cmp(2, None) = -78 > > What's up with that? My fault. I only ran regrtest.py and not "make test". I'm not sure why you say bizarre stuff though. Do you object to testing that 2 is less than None (something that is not part of the language spec) or do you think that the results from cmp() should be clamped between -1 and 1? Neil From guido at python.org Wed Jan 3 00:06:16 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 02 Jan 2001 18:06:16 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 02 Jan 2001 16:52:58 CST." 
<14930.23498.53540.401218@beluga.mojam.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> Message-ID: <200101022306.SAA16684@cj20424-a.reston1.va.home.com> > I haven't followed this thread closely, and my brain is a bit frazzled at > the moment, but is there some fundamental reason that the file object's > readlines method can't be made lazy, perhaps only when given a sizehint? Yes -- readlines() is documented to return a list, and some people do things to it that require it to be a real list (e.g. sort or reverse it or modify it in place or concatenate it with other lists). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jan 3 00:19:14 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 18:19:14 -0500 Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom In-Reply-To: <20010102080915.A30892@glacier.fnational.com> Message-ID: [Tim] > test_compare is broken because the expected-output file has > bizarre stuff in it like: > > cmp(2, [1]) = -108 > cmp(2, (2,)) = -116 > cmp(2, None) = -78 > > What's up with that? [Neil Schemenauer] > My fault. I only ran regrtest.py and not "make test". Neil, my platform doesn't even *have* a "make": are you saying the test passes for you when you run regrtest.py? That's what I did. > I'm not sure why you say bizarre stuff though. Do you object to > testing that 2 is less than None (something that is not part of the > language spec) Only in part. Lang Ref 2.1.3 (Comparisons) says you can compare them, and guarantees they won't compare equal, but doesn't define it beyond that. 
If Python actually says "less", fine, we can test for that, although to minimize maintenance down the road it would be better to test for no more than we expect Python to guarantee across releases and implementations (suppose Jython says 2 is greater than None: that's fine too, and it would be better if the test suite didn't say Jython was broken). > or do you think that the results from cmp() should be clamped > between -1 and 1? Not that either ; cmp() isn't documented that way. They're "bizarre" simply because they're not what Python returns! C:\Code\python\dist\src\PCbuild>python Python 2.0 (#8, Dec 17 2000, 01:39:08) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> cmp(2, [1]) -1 >>> cmp(2, (2,)) -1 >>> cmp(2, None) -1 >>> The expected-output file is supposed to match what Python actually does. I have no idea where things like "-108" came from. So things like -108 look bizarre to me. So long as cmp(2, [1]) returns -1 in reality, an expected-output file that claims it returns -108 will never work no matter how you run the tests. One of us is missing something obvious here . From paulp at ActiveState.com Wed Jan 3 00:26:39 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 15:26:39 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> Message-ID: <3A5263AF.CE6C8C81@ActiveState.com> Guido van Rossum wrote: > > ... > > I'm slowly warming up to xreadlines(), although we must be careful to > consider the consequences (do other file-like objects need to support > it too?). The implementation is such that it is pretty easy to add the method to other file-like objects. It is also easy to use the xreadlines module to get the same behavior for objects that do not have the method. 
Essentially, file.xreadlines is implemented like this:

    def xreadlines(self):
        import xreadlines
        return xreadlines.xreadlines(self)

Any object can add the method similarly.

 Paul Prescod

From nas at arctrix.com Tue Jan 2 17:51:48 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Tue, 2 Jan 2001 08:51:48 -0800
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 06:19:14PM -0500
References: <20010102080915.A30892@glacier.fnational.com>
Message-ID: <20010102085148.A30986@glacier.fnational.com>

On Tue, Jan 02, 2001 at 06:19:14PM -0500, Tim Peters wrote:
> Neil, my platform doesn't even *have* a "make": are you saying the test
> passes for you when you run regrtest.py?

Yes. Isn't checking in code without running regrtest a capital offence? :)

> Lang Ref 2.1.3 (Comparisons) says you can compare them, and
> guarantees they won't compare equal, but doesn't define it beyond that.

Okay, I'll use == rather than cmp(). When I was working on the coercion patch I found cmp() useful. I guess it shouldn't be in the standard test suite, especially since Jython may implement things differently.

[Neil]
> or, do you think that the results from cmp() should be clamped
> between -1 and 1?

[Tim]
> Not that either ; cmp() isn't documented that way.
>
> They're "bizarre" simply because they're not what Python returns!

They do on my box:

Python 2.0 (#19, Nov 21 2000, 18:13:04)
[GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> cmp(1, None)
-78

I guess MS uses a different strcmp than GNU. Do you mind trying the attached C code? I get "-78" as output. I should have thought a little more before checking in the patch. -78 is quite obviously a machine/library dependent thing.

[Tim again]
> One of us is missing something obvious here .

I don't know about that. The implementation of coercion and comparison is not simple.
I've been studying it for some time now and I obviously still don't know what the hell is going on. AFAICT, the problem is that instances without a comparison method can compare larger or smaller than numbers depending on where in memory the objects are stored.

 Neil

#include <stdio.h>
#include <string.h>

int main()
{
    printf("%d\n", strcmp("", "None"));
}

From tim.one at home.com Wed Jan 3 01:30:26 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 2 Jan 2001 19:30:26 -0500
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: <20010102085148.A30986@glacier.fnational.com>
Message-ID:

[Neil]
> They do on my box:
>
> Python 2.0 (#19, Nov 21 2000, 18:13:04)
> [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
> Type "copyright", "credits" or "license" for more information.
> >>> cmp(1, None)
> -78

Well, who cares about your silly box ? Messier than I thought! Yes, Windows strcmp is always in {-1, 0, 1}. Rather than run tests, here's the tail end of MS's strcmp.c:

        if ( ret < 0 )
                ret = -1 ;
        else if ( ret > 0 )
                ret = 1 ;

        return( ret );

Wasted cycles and stupid formatting .

> ...
> AFAICT, the problem is that instances without a comparison method can
> compare larger or smaller than numbers depending on where in memory
> the objects are stored.

If so, that's a bug ... OK, it *is* a bug, at least in current CVS. Did you cause that, or was it always this way? I was able to provoke this badness:

>>> j < c < i
1
>>> j < i
0
>>>

i.e. it violates transitivity, and that's never supposed to happen in the absence of user-supplied __cmp__. Here c is an instance of "class C: pass", and i and j are ints.

>>> type(i), type(j), type(c)
(<type 'int'>, <type 'int'>, <type 'instance'>)
>>> i, j, c
(999999, 1000000, <__main__.C instance at 00791B7C>)
>>> id(i), id(j), id(c)
(7941572, 7744676, 7936892)
>>>

Guido thought he fixed this kind of stuff once (and I believed him ) by treating all numbers as if they had type name "" (i.e., yes, an empty string) when compared to non-numbers.
Then the usual "mixed-type comparisons in the absence of __cmp__ compare via type name string" rule ensured that numbers would always compare "less than" instances of any other type. That's the intent of the tail end: else if (vtp->tp_as_number != NULL) vname = ""; else if (wtp->tp_as_number != NULL) wname = ""; /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); of PyObject_Compare. So, in the example above, we *should* have i < c == 1 j < c == 1 j < c < i == 0 Unfortunately, we actually have i < c == 0 in that example. We're apparently not getting to the "number hack" code because c is an instance, and I'll confess up front that my eyes always glazed over long before I got to PyInstance_HalfBinOp <0.half wink>. Whatever, there's at least one bug somewhere in that path! We should have n < i == 1 for any numeric type n and any non-numeric type i (in the absence of user-defined __cmp__). From skip at mojam.com Wed Jan 3 02:27:03 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 2 Jan 2001 19:27:03 -0600 (CST) Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <3A525D63.17ABCC87@ActiveState.com> References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> <3A525D63.17ABCC87@ActiveState.com> Message-ID: <14930.32743.525564.69044@beluga.mojam.com> Paul> I suggested this at one point but it was pointed out that there is Paul> probably a lot of code that works with the resulting list *as a Paul> list* How about this idea? What if readlines() was allowed to return a lazy evaluator if a sizehint > 0 was given? 
I only saw one example outside of test cases in the current CVS tree where readlines(sizehint) was used (Tools/idle/GrepDialog.py), and it used it as expected:

    while 1:
        block = f.readlines(sizehint)
        if not block:
            break
        for line in block:
            more stuff

My suspicion is that most uses of sizehint will be like this. It hasn't been around all that long in Python-years (since 1.5a2), so there's probably not tons of code to break (I agree the semantics would change), and the majority of code that uses it probably looks like the above, which is almost safe (if it returned "" instead of an empty evaluator when nothing was left to read it would be safe). The advantage would be that the above could become the more obvious

    for line in f.readlines(sizehint):
        more stuff

and the change to file reading code that is "too slow" becomes much simpler. (Of course, xreadlines() has that advantage as well.)

I scanned my own code quickly. I found about 10 uses with sizehint and 300 without.

I presume we are talking about 2.1 here. In any case, it seems to me that in Py3k readlines should be lazy.

Skip

P.S. Why did FileInput class never grow a readlines method?

From nas at arctrix.com Tue Jan 2 20:38:53 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Tue, 2 Jan 2001 11:38:53 -0800
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: ; from tim.one@home.com on Tue, Jan 02, 2001 at 07:30:26PM -0500
References: <20010102085148.A30986@glacier.fnational.com>
Message-ID: <20010102113853.A31341@glacier.fnational.com>

On Tue, Jan 02, 2001 at 07:30:26PM -0500, Tim Peters wrote:
> > AFAICT, the problem is that instances without a comparison method can
> > compare larger or smaller than numbers depending on where in memory
> > the objects are stored.
>
> If so, that's a bug ... OK, it *is* a bug, at least in current CVS. Did you
> cause that, or was it always this way?

To quote Bart Simpson: I didn't do it. I'm pretty sure the bug is in PyInstance_DoBinOp.
I don't think it's worth fixing though. I'm ready to check in my coercion overhaul patch, assuming no vetoes from the list. It should fix this bug (and introduce a whole slew of new ones :).

Guido suggested that I remove the "number types compare smaller than other types" behavior. What's your take on that? The current patch on SF always uses the type names. It should be easy to implement the old behavior though.

 Neil

From nas at arctrix.com Tue Jan 2 20:48:09 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Tue, 2 Jan 2001 11:48:09 -0800
Subject: [Python-Dev] Applying the PEP 208 (coercion overhaul) patch
Message-ID: <20010102114809.B31341@glacier.fnational.com>

I'm almost ready to apply SF patch #102652. Guido has given the okay assuming there are no objections from the rest of python-dev. The patch is large and modifies some complicated parts of the interpreter. I expect there will be some bugs. If you would like me to wait, speak now.

Guido has sent me some comments on the patch today which I plan to review and address tonight. I will probably apply the patch tomorrow evening.

 Neil

From tim.one at home.com Wed Jan 3 04:05:59 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 2 Jan 2001 22:05:59 -0500
Subject: [Python-Dev] Std test failures on WIndows: test_compare, test_minidom
In-Reply-To: <20010102113853.A31341@glacier.fnational.com>
Message-ID:

[Neil Schemenauer, on a violation of transitivity j < c < i but not j < i]
> To quote Bart Simpson: I didn't do it. I'm pretty sure the bug
> is in PyInstance_DoBinOp. I don't think it's worth fixing though.
> I'm ready to check in my coercion overhaul patch, assuming no
> vetoes from the list. It should fix this bug (and introduce a
> whole slew of new ones :).

Sounds good to me!

> Guido suggested that I remove the "number types compare smaller
> than other types" behavior. What's your take on that? The
> current patch on SF always uses the type names. It should be
> easy to implement the old behavior though.
It doesn't matter that they're specifically smaller, it matters that they can't violate transitivity. "numbers compare smaller" was introduced deliberately (by Guido) because, e.g., before that we had 99 < [99] < 99L despite that 99 == 99L, because "int" < "list" < "long int" Even stranger, we had 100 < [99] < 0L < 100 and 100 < [] < -101L < -100 Making numbers compare smaller than other types is one way to ensure stuff like that can't happen; I can't think of a simpler way (although making them compare larger than other types would be equally simple, as would making them compare as if their type name were "Neil" ). From paulp at ActiveState.com Wed Jan 3 04:34:59 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 02 Jan 2001 19:34:59 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range References: <200101021540.KAA13446@cj20424-a.reston1.va.home.com> <20010102160605.A5211@kronos.cnri.reston.va.us> <200101022246.RAA16384@cj20424-a.reston1.va.home.com> <14930.23498.53540.401218@beluga.mojam.com> <3A525D63.17ABCC87@ActiveState.com> <14930.32743.525564.69044@beluga.mojam.com> Message-ID: <3A529DE3.D93C3916@ActiveState.com> Skip Montanaro wrote: > >... > > I presume we are talking about 2.1 here. In any case, it seems to me that > in Py3k readlines should be lazy. I agree, but I'm ambivalent about your suggestion for polymorphic return values from readlines(). Yet another option is a "lazy=1" option. Paul Prescod From tim.one at home.com Wed Jan 3 05:33:29 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 2 Jan 2001 23:33:29 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101021456.JAA12633@cj20424-a.reston1.va.home.com> Message-ID: [Guido, writes a timing program] [Jeff, if you weren't copied on all this stuff, you can play catch-up by reading the archives, at http://mail.python.org/pipermail/python-dev/ ] > ... > I am including the timer program below my signature. 
The test input > was the current access_log of dinsdale.python.org, which has about 119 > Mbytes and 1M lines (as counted by the test program). For a contrast, I cobbled together a large test file out of various chunks of C source, .py source, HTML source, and email archives. I was shooting for the same size you used (~119Mb), but ended up with more than 3x as many lines. > I measure about a factor of 2 between readlines with a sizehint (of 1 > MB) and fileinput; Factor of 7 here (Jeff, NeilS eventually figured out that Guido was using a CVS version of Python that has AndrewK's glibc getline patch, a zippier line-input routine than Python 2.0 has; but it only applies to platforms using glibc). > ... > Output (the first time is realtime seconds, the second CPU seconds): > > total 119808333 chars and 1009350 lines > count_chars_lines 7.944 7.890 > readlines_sizehint 5.375 5.320 > using_fileinput 15.861 15.740 > while_readline 8.648 8.570 > > This was on a 600 MHz Pentium-III Linux box (RH 6.2). total 117615824 chars and 3237568 count_chars_lines 14.780 14.772 readlines_sizehint 9.390 9.375 using_fileinput 66.130 66.157 while_readline 30.380 30.337 866 MHz P3 Win98SE, current CVS Python. I have no handy explanation for why clock() and time() differ on my box (Win98 has no notions of "user time" or "CPU time" distinct from clock time). > Note that count_chars_lines and readlines_sizehint use the same > algorithm -- the difference is that readlines_sizehint uses 'pass' as > the inner loop body, while count_chars_lines adds two counters. > > Given that very light per-line processing (counting lines and > characters) already increases the time considerably, I'm not sure I > buy the arguments that the I/O overhead is always considerable. I disagree that this is "very light processing", although I agree it's hard to think of lighter processing : it's a few Python statements per line, which I'd say is pretty *typical* processing. 
Read a line, run a string find or regexp search on it, test the result, sometimes fiddle the line accordingly and sometimes not. File-crunching apps generally aren't rocket science! For example, I changed count_chars_lines to tally the number of lines containing the string "Guido" instead, and the runtime went up by just 0.8 seconds (BTW, it found 13808 of them ): if you're thinking in C terms, millions of failing searches for "Guido" may seem like more work, but the number of Python stmts executed usually counts more than what the stmts do at the C level.

> ...
> Now what to do? I still don't like xreadlines very much, but I do
> see that it can save some time. But my test doesn't confirm Neel's
> times as posted by Tim:
>
>> Slowest: for line in fileinput.input('foo'):     # Time 100
>>        : while 1: line = file.readline()         # Time 75
>>        : for line in LinesOf(open('foo')):       # Time 25
>> Fastest: for line in file.readlines():           # Time 10
>>          while 1: lines = file.readlines(hint)   # Time 10
>>          for line in xreadlines(file):           # Time 10
>
> I only see a factor of 3 between fastest and slowest, and
> readline is only about 60% slower than readlines_sizehint.

I don't know what Neel used for an input file, or which platform he used either. And this is bound to vary a lot across platforms. As above, I saw a factor of 7 between fastest and slowest and a factor of 3 between readline and readlines_sizehint.

BTW, on my platform the Perl script (using a recent ActiveState Windows Perl)

    open(FILE, "ga.txt");
    while (<FILE>) {
        1;
    }

ran in about 6 seconds (I never figured how to get Perl to compute usable timings itself)-- substantially faster than even readlines_sizehint! --and changing the body to

    $nc = $nl = 0;
    while (<FILE>) {
        ++$nl;
        $nc += length;
    }
    print "$nc $nl\n";

boosted that to about 8 seconds. So Perl has gotten zippier too over the years.
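
For readers following the timings, here is a minimal sketch of two of the line-reading idioms being compared: the block-at-a-time readlines(sizehint) loop and the plain readline loop. It is not Guido's timer program, which is not reproduced in this part of the thread. The function names only echo the labels in the timing output, the bodies are assumptions, and the code is written for a current Python rather than the 2.0-era interpreter under discussion. The 1 MB default mirrors the sizehint Guido mentions.

```python
def readlines_sizehint(path, sizehint=1024 * 1024):
    """Count characters and lines, reading about `sizehint` bytes of lines per call."""
    nchars = nlines = 0
    with open(path) as f:
        while True:
            block = f.readlines(sizehint)
            if not block:          # an empty list signals end of file
                break
            for line in block:
                nchars += len(line)
                nlines += 1
    return nchars, nlines


def while_readline(path):
    """Count characters and lines, one readline() call per line."""
    nchars = nlines = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:           # an empty string signals end of file
                break
            nchars += len(line)
            nlines += 1
    return nchars, nlines
```

Both functions do the same per-line work, so any difference in their running time comes from how the lines are fetched, which is the quantity the thread is arguing about. Actual ratios will depend on the platform and C library, as the numbers above show.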
From tim.one at home.com Wed Jan 3 10:32:55 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 3 Jan 2001 04:32:55 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101022253.RAA16482@cj20424-a.reston1.va.home.com> Message-ID: [Guido & Tim, wonder about faking getline-like functionality for Windows] The attached is kinda baffling. The std tests pass with it, and it changes my test timings from: count_chars_lines 14.780 14.772 readlines_sizehint 9.390 9.375 using_fileinput 66.130 66.157 while_readline 30.380 30.337 to: count_chars_lines 14.880 14.854 readlines_sizehint 9.280 9.302 using_fileinput 48.610 48.589 while_readline 13.450 13.451 Big win? You bet. But ... The baffling parts: 1. That Perl still takes only 6 seconds in line-at-a-time mode. 2. I originally wrote a getline workalike, instead of building directly into a PyString buffer. That made my test run *slower*, and I'm talking factor of 2, not a yawn. To judge from my usually silent disk (I've got 256Mb RAM on this box), I'm afraid the extra mallocs required may have triggered the horrid Win9x malloc-thrashing problem I wrote about while I was still at Dragon. Consider that another vote for Vlad's PyMalloc -- we've got no handle on x-platform dynamic memory behavior now. Python's destiny is to replace both the platform OS and libc anyway <0.9 wink>. The scary parts: + As the "XXX" comments indicate, this is full of little insecurities. + Another one I just thought of: if the user's last operations on the fp were two or more consecutive ungetc calls, all bets are off. But then MS doesn't define what happens then either. + This is much less ambitious than I recall Perl's code being: it doesn't try to guess anything about the file, and effectively captures only what would happen if you could unroll the guts of a getc-in-a-loop and optimize the snot out of it. 
The good news is that this means it's much easier to maintain (it touches only two of the MS FILE* fields, and in ways that are pretty obviously correct). The bad news is that this seems also pretty clearly all there *is* to be gotten out of breaking into the FILE* abstraction for the particular test case I'm using; and increasing TUNEME doesn't save any time at all: the sucker is flying at full speed already. + It drops (line-at-a-time) drops to a little under 13 seconds if I comment out the thread macros. + I haven't looked at Perl's implementation in a year, and they must have dreamt up another trick since then. That's a "scary part" indeed to anyone who has ever looked at Perl's implementation. retreating-into-a-fetal-position-ly y'rs - tim Anyone wants to play, the sandbox is fileobject.c. Do two things: insert this new chunk somewhere above get_line: #ifdef MS_WIN32 static PyObject* win32_getline(FILE *fp) { /* XXX ignores thread safety -- but so does MS's getc macro! */ PyObject* v; char* pBuf; /* next free slot in v's buffer */ /* MS's internals are declared in terms of ints, but it's a sure bet * that won't last forever -- use size_t now & live w/ the casting; * ditto for Python's routines */ size_t total_buf_size = 100; size_t free_buf_size = total_buf_size; #define TUNEME 1000 /* how much to boost the string buffer when exhausted */ v = PyString_FromStringAndSize((char *)NULL, (int)total_buf_size); if (v == NULL) return NULL; pBuf = BUF(v); Py_BEGIN_ALLOW_THREADS for (;;) { char ch; size_t ms_cnt; /* FILE->_cnt shadow */ char* ms_ptr; /* FILE->_ptr shadow */ size_t max_to_copy, i; /* stdio buffer empty or in unknown state; rather * than try to simulate every quirk of MS's internals, * let the MS macros deal with it. 
*/ /* XXX we also wind up here when we simply run out of string * XXX buffer space, but I'm not sure I care: making this a * XXX double-nested loop doesn't seem worth it */ ch = getc(fp); if (ch == EOF) break; /* make sure we've got some breathing room */ if (free_buf_size < 100) { size_t currentoffset = pBuf - BUF(v); total_buf_size += TUNEME; /* XXX check for overflow */ Py_BLOCK_THREADS if (_PyString_Resize(&v, (int)total_buf_size) < 0) return NULL; Py_UNBLOCK_THREADS pBuf = BUF(v) + currentoffset; free_buf_size = TUNEME; } /* ch wasn't EOF, so store it */ *pBuf++ = ch; --free_buf_size; if (ch == '\n') { break; } ms_cnt = (size_t)fp->_cnt; if (!ms_cnt) { /* XXX this is a slow way to read one character at * XXX a time if, e.g., the stream is unbuffered */ continue; } /* payback! now we don't have to check for buffer overflows or * EOF inside the loop, nor does the macro _filbuf() branch force * _ptr and _cnt in and out of memory on each iteration */ ms_ptr = fp->_ptr; assert(ms_cnt > 0); i = max_to_copy = ms_cnt < free_buf_size ? ms_cnt : free_buf_size; do { /* XXX unclear to me why MS's getc macro does "& 0xff" */ *pBuf++ = ch = *ms_ptr++ & 0xff; } while (--i && ch != '\n'); /* update the shadows & counters */ fp->_ptr = ms_ptr; free_buf_size -= max_to_copy - i; fp->_cnt = ms_cnt - (max_to_copy - i); if (ch == '\n') break; } Py_END_ALLOW_THREADS _PyString_Resize(&v, pBuf - BUF(v)); return v; } #endif 2. Within get_line, add this before the #endif (this is the getline #if block): #elif defined(MS_WIN32) if (n == 0) { return win32_getline(fp); } From ping at lfw.org Wed Jan 3 12:40:47 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 3 Jan 2001 05:40:47 -0600 (CST) Subject: [Python-Dev] inspect.py In-Reply-To: <14840.19556.127151.457533@anthem.concentric.net> Message-ID: Uh... hi. I know i've all but dropped out of existence for a long time, what with my simultaneous first stints as a grad student, a teaching assistant, and a house cook (!) 
and all, but i didn't want to let this work go to waste. Now that the holidays are here i can *finally* try to get some work done! So, i've updated inspect.py in response to Barry's comments, and below is my reply to this old thread. I also wrote some regression tests. I tried to submit inspect.py to SourceForge, but i got: ERROR Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 Does anyone know what's going on with that? Anyway, the latest module and regression tests are available at: http://www.lfw.org/python/inspect.py http://www.lfw.org/python/test_inspect.py for your perusal. On Thu, 26 Oct 2000 barry at wooz.org wrote: > Some thoughts after an initial scan of inspect.py: > > - The doc strings for the is*() functions aren't accurate. > E.g. ismodule() says that it asks whether "the object is a module > with the __file__ special attribute", but that isn't really what it > tests! Guido points out that builtin modules don't currently have > __file__ and besides, you're really testing that the type of the > object is ModuleType. Perhaps a different wording would be better, but i should at least clarify the intention: i wrote them that way because it seemed that the current objects export an unofficial "interface" by means of the special attributes they provide. The purpose of the "is*()" functions is to determine whether an object meets one of these interfaces. A complete interface would provide (1) a type-checker, (2) a constructor, and (3) the methods. As for (2), we don't normally allow construction of these things (except for wizards using the newmodule). As for (3), i suppose that one could further encapsulate these interfaces by providing spelled-out methods like "def getcode(f): return f.func_code", but it didn't seem worth the trouble. So that left just (1), and i had the other parts in mind while trying to describe (1). 
The type-checkers aren't of much use unless they accurately reflect the availability of the special attributes. Do you see what i'm trying to do? Maybe you can suggest a better way of doing it... anyway, i've tried to compromise in the docstrings as submitted. > - Don't make the predicate in getmembers() default to "lambda x: 1" > Instead make the default None, and skip the predicate test if it is > None. Okay, fine. > - getdoc()'s docstring should describe the margin munging it does. Okay, done. > - findsource() seems off-by one, e.g. > > >>> x = inspect.findsource(inspect.findsource) > >>> x[1] > 138 > > but the function really stars on line 139. 138 was the intended result here. Indeed the function starts on line 139 if you start counting from 1. The reason it returns 138 is that it's the index you would use for the array of lines (thus x[0][x[1]] or file.readlines()[138] is the first line of the function). Which way makes more sense? Should it be changed? > - I notice that currentframe() still uses the try/except trick to get > the frame object. It's much more efficient to provide a C > trampoline for getting that information. Sure, if there's a faster way, that's fine. It just wasn't something i expected to be used really often, and i wanted to write the module in pure Python so it could be easily maintained. I added a line to clobber the pure-Python currentframe() with sys._getframe() if it exists. > - If this were included in the library, we might want to 2.0-ify it. It currently doesn't rely on any 2.0 features, and it would be kind of nice to have it still work with 1.5 (especially if it is part of a drop-in documentation tool, as it is now, since it goes with htmldoc). -- ?!ng "Computers are useless. They can only give you answers." 
-- Pablo Picasso From guido at python.org Wed Jan 3 13:06:33 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 07:06:33 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Apparently getc_unlocked() is in the Single Unix spec. Not sure how widespread that is -- do Linux developers pay attention to this standard at all? According to the webpage it's (c) 1997. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 03 Jan 2001 10:58:44 +0200 From: Erno Kuusela To: guido at python.org Subject: getc_unlocked note hello, i was reading the python-dev archives and saw that someone had noticed my getline/getc_unlocked post from the newsgroup. a correction to the python-dev thread: getc_unlocked and friends are infact standard (not c99 though since c99 doesn't specify threads); they are part of the single unix specification. link: http://www.opennc.org/onlinepubs/007908799/xsh/getc_unlocked.html -- erno ------- End of Forwarded Message From guido at python.org Wed Jan 3 13:37:11 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 07:37:11 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 03 Jan 2001 04:32:55 EST." References: Message-ID: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> > 1. That Perl still takes only 6 seconds in line-at-a-time mode. Are you sure Perl still uses stdio at all? If so, does it open the file in binary or in text mode? Based on the APIs in MS's libc, I presume that the crlf->lf translation is not done by stdio proper but by the Unix I/O emulation just underneath it (open() has an O_BINARY option flag, so read() probably does the translation). That comes down to copying most bytes an extra time. 
(To test this hypothesis, you could try to open the test file with
mode "rb" and see if it makes a difference.)

> 2. I originally wrote a getline workalike, instead of building
> directly into a PyString buffer. That made my test run *slower*,
> and I'm talking factor of 2, not a yawn. To judge from my usually
> silent disk (I've got 256Mb RAM on this box), I'm afraid the extra
> mallocs required may have triggered the horrid Win9x
> malloc-thrashing problem I wrote about while I was still at Dragon.
> Consider that another vote for Vlad's PyMalloc -- we've got no
> handle on x-platform dynamic memory behavior now. Python's destiny
> is to replace both the platform OS and libc anyway <0.9 wink>.
>
> The scary parts:
>
> + As the "XXX" comments indicate, this is full of little
> insecurities.

My biggest worry: thread-safety. There must be a way to lock the file
(you indicated that fgets() uses it).

> + Another one I just thought of: if the user's last operations on
> the fp were two or more consecutive ungetc calls, all bets are off.
> But then MS doesn't define what happens then either.

Python doesn't have an interface to ungetc(), and I believe the stdio
standard says you can only call ungetc() once consecutively. Assuming
other C code linked with Python obeys this rule (a pretty safe
assumption), we should be fine. And if the assumption is violated, I
presume it's really that C code's fault -- plus, code that only uses
getc() would be screwed just as badly.

> + This is much less ambitious than I recall Perl's code being: it
> doesn't try to guess anything about the file, and effectively
> captures only what would happen if you could unroll the guts of a
> getc-in-a-loop and optimize the snot out of it. The good news is
> that this means it's much easier to maintain (it touches only two of
> the MS FILE* fields, and in ways that are pretty obviously correct).
> The bad news is that this seems also pretty clearly all there *is*
> to be gotten out of breaking into the FILE* abstraction for the
> particular test case I'm using; and increasing TUNEME doesn't save
> any time at all: the sucker is flying at full speed already.

You probably don't have many lines longer than 1000 characters.

> + It drops (line-at-a-time) drops to a little under 13 seconds if I
> comment out the thread macros.

If you mean the Py_BLOCK_THREADS around the resize, that can be safely
dropped. (If/when we introduce Vladimir's malloc, we'll have to decide
whether it is threadsafe by itself or whether it requires the global
interpreter lock. I vote to make it threadsafe by itself.)

> + I haven't looked at Perl's implementation in a year, and they must
> have dreamt up another trick since then. That's a "scary part"
> indeed to anyone who has ever looked at Perl's implementation.
>
> retreating-into-a-fetal-position-ly y'rs - tim
>
>
> Anyone wants to play, the sandbox is fileobject.c. Do two things:
> insert this new chunk somewhere above get_line:
>
> #ifdef MS_WIN32
> static PyObject*
> win32_getline(FILE *fp)
> {
>     /* XXX ignores thread safety -- but so does MS's getc macro! */
>     PyObject* v;
>     char* pBuf; /* next free slot in v's buffer */
>     /* MS's internals are declared in terms of ints, but it's a sure bet
>      * that won't last forever -- use size_t now & live w/ the casting;
>      * ditto for Python's routines
>      */
>     size_t total_buf_size = 100;
>     size_t free_buf_size = total_buf_size;
> #define TUNEME 1000 /* how much to boost the string buffer when exhausted */
>
>     v = PyString_FromStringAndSize((char *)NULL, (int)total_buf_size);
>     if (v == NULL)
>         return NULL;
>     pBuf = BUF(v);
>     Py_BEGIN_ALLOW_THREADS
>     for (;;) {
>         char ch;
>         size_t ms_cnt; /* FILE->_cnt shadow */
>         char* ms_ptr; /* FILE->_ptr shadow */
>         size_t max_to_copy, i;
>         /* stdio buffer empty or in unknown state; rather
>          * than try to simulate every quirk of MS's internals,
>          * let the MS macros deal with it.
>          */
>         /* XXX we also wind up here when we simply run out of string
>          * XXX buffer space, but I'm not sure I care: making this a
>          * XXX double-nested loop doesn't seem worth it
>          */
>         ch = getc(fp);
>         if (ch == EOF)
>             break;
>         /* make sure we've got some breathing room */
>         if (free_buf_size < 100) {
>             size_t currentoffset = pBuf - BUF(v);
>             total_buf_size += TUNEME; /* XXX check for overflow */
>             Py_BLOCK_THREADS
>             if (_PyString_Resize(&v, (int)total_buf_size) < 0)
>                 return NULL;
>             Py_UNBLOCK_THREADS
>             pBuf = BUF(v) + currentoffset;
>             free_buf_size = TUNEME;
>         }
>         /* ch wasn't EOF, so store it */
>         *pBuf++ = ch;
>         --free_buf_size;
>         if (ch == '\n') {
>             break;
>         }
>         ms_cnt = (size_t)fp->_cnt;
>         if (!ms_cnt) {
>             /* XXX this is a slow way to read one character at
>              * XXX a time if, e.g., the stream is unbuffered
>              */
>             continue;
>         }
>         /* payback! now we don't have to check for buffer overflows or
>          * EOF inside the loop, nor does the macro _filbuf() branch force
>          * _ptr and _cnt in and out of memory on each iteration
>          */
>         ms_ptr = fp->_ptr;
>         assert(ms_cnt > 0);
>         i = max_to_copy = ms_cnt < free_buf_size ? ms_cnt : free_buf_size;

Doesn't it make more sense to delay the resize until this point? I
don't know how much the character copying accounts for, but I could
imagine a strategy based on memchr() and memcpy() that first searches
for a \n, and if found, allocates to the right size before copying.
Typically, the buffer contains many lines, so this could be optimized
into requiring a single exactly-sized malloc() call in the common case
(where the buffer doesn't wrap). But possibly scanning the buffer for
\n and then copying the bytes separately, even with memchr() and
memcpy(), slows things down too much for this to be faster.

>         do {
>             /* XXX unclear to me why MS's getc macro does "& 0xff" */
>             *pBuf++ = ch = *ms_ptr++ & 0xff;

I know why. getchar() returns an int in the range [-1, 255]. If
chars are signed the &0xff is needed else you would get a return in
the range [-128, 127] and -1 would be ambiguous (EOF==-1). Not sure
if they *are* unsigned on any MS platform -- if they aren't, whoever
coded this wasn't thinking -- on the other hand the compiler probably
optimizes it out. But here since you're copying to another character,
it's pointless.

>         } while (--i && ch != '\n');
>         /* update the shadows & counters */
>         fp->_ptr = ms_ptr;
>         free_buf_size -= max_to_copy - i;
>         fp->_cnt = ms_cnt - (max_to_copy - i);
>         if (ch == '\n')
>             break;
>     }
>     Py_END_ALLOW_THREADS
>     _PyString_Resize(&v, pBuf - BUF(v));
>     return v;
> }
> #endif
>
> 2. Within get_line, add this before the #endif (this is the getline #if block):
>
> #elif defined(MS_WIN32)
>     if (n == 0) {
>         return win32_getline(fp);
>     }

Note that get_line() with negative n could be implemented as
get_line(0) with some post processing. This should be done completely
separately, in PyFile_GetLine. The negative n case is only used by
raw_input() -- it means strip the \n and raise EOFError for EOF, and I
expect that this is rarely if ever used in a speed-conscious situation.
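
[Editorial sketch] The layering suggested above (negative n handled as
get_line(0) plus post-processing) can be modelled in a few lines of
Python. This is only an illustration under stated assumptions, not the
fileobject.c code: read_raw_line() stands in for get_line(fp, 0), and
getline_stripped() is a hypothetical name for the raw_input()-style
wrapper.

```python
import io


def read_raw_line(stream):
    """Stand-in for get_line(fp, 0): one line, trailing newline included."""
    return stream.readline()


def getline_stripped(stream):
    """Model of the negative-n case: strip the newline, EOFError at EOF."""
    line = read_raw_line(stream)
    if not line:
        # readline() returns an empty string only at end of file.
        raise EOFError("EOF when reading a line")
    if line.endswith("\n"):
        line = line[:-1]
    return line


stream = io.StringIO("first\nlast line without newline")
print(getline_stripped(stream))  # first
print(getline_stripped(stream))  # last line without newline
```

The fast path stays untouched in this arrangement; only the rarely
used wrapper pays for the extra check and slice.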
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 3 15:56:31 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 03 Jan 2001 09:56:31 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 03 Jan 2001 07:06:33 EST." <200101031206.HAA19182@cj20424-a.reston1.va.home.com> References: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Message-ID: <200101031456.JAA19990@cj20424-a.reston1.va.home.com> > Apparently getc_unlocked() is in the Single Unix spec. Not sure how > widespread that is -- do Linux developers pay attention to this > standard at all? According to the webpage it's (c) 1997. Erno Kuusela gave me some more info about this; glibc supports it. I did a quick test which suggests that it is a lot faster than regular getc() -- on a small test file it's actually faster than GNU getline(), even with the proper flockfile() / funlockfile() calls. (The test file was 6Mb -- 10 copies of /etc/termcap, which has short lines -- avg 43 chars.) This together with Tim's Win32x specific hacks might be the best we can do for get_line(). However, raw xreadlines is still almost twice as fast, so it's still under consideration. Maybe MS supports a similar unlocked getc macro, and a separate primitive to lock/unlock a file? That would allow more unified code. (Quick research shows that it exists, but only in internal form. We could probably call _lock_file() and _unlock_file(), and define our own getc_lk(), protected by the proper set of macros. This could all be presented by config.h as flockfile(), funlockfile(), and getc_unlocked() macros.) 
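
[Editorial sketch] The gain being described comes from taking the file
lock once per line instead of once per character. A toy Python model
of that idea, assuming a threading.Lock in the role of flockfile() /
funlockfile() and a private unlocked read in the role of
getc_unlocked(); the class and method names are hypothetical and this
says nothing about actual stdio performance:

```python
import io
import threading


class LockedReader:
    """Toy stream whose underlying file is guarded by a single lock."""

    def __init__(self, raw):
        self._raw = raw
        self._lock = threading.Lock()  # plays flockfile()/funlockfile()
        self.lock_count = 0            # how many times the lock was taken

    def _read1_unlocked(self):
        # Analogue of getc_unlocked(): the caller must hold the lock.
        return self._raw.read(1)

    def getc(self):
        # Analogue of a thread-safe getc(): lock around every character.
        with self._lock:
            self.lock_count += 1
            return self._read1_unlocked()

    def readline_per_char_locking(self):
        chars = []
        while True:
            ch = self.getc()
            if not ch:
                break
            chars.append(ch)
            if ch == b"\n":
                break
        return b"".join(chars)

    def readline_lock_once(self):
        # Lock once, then use the unlocked primitive for the whole line.
        chars = []
        with self._lock:
            self.lock_count += 1
            while True:
                ch = self._read1_unlocked()
                if not ch:
                    break
                chars.append(ch)
                if ch == b"\n":
                    break
        return b"".join(chars)


reader = LockedReader(io.BytesIO(b"hello\nworld\n"))
reader.readline_per_char_locking()
print(reader.lock_count)  # 6
reader.readline_lock_once()
print(reader.lock_count)  # 7
```

Both methods return the same bytes; only the number of lock
acquisitions differs.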
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Wed Jan 3 16:27:09 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 10:27:09 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101031206.HAA19182@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 07:06:33AM -0500 References: <200101031206.HAA19182@cj20424-a.reston1.va.home.com> Message-ID: <20010103102709.A19451@kronos.cnri.reston.va.us> On Wed, Jan 03, 2001 at 07:06:33AM -0500, Guido van Rossum wrote: >Apparently getc_unlocked() is in the Single Unix spec. Not sure how >widespread that is -- do Linux developers pay attention to this >standard at all? According to the webpage it's (c) 1997. It seems to be in glibc 2.1, but I don't know how much it would help, and the added complexity of having to lock the file separately worries me, perhaps due to a superstitious fear of angering the Thread Gods. --amk From akuchlin at mems-exchange.org Wed Jan 3 16:44:57 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 3 Jan 2001 10:44:57 -0500 Subject: [Python-Dev] Help wanted with setup.py script In-Reply-To: <017201c0759a$c2b180c0$e000a8c0@thomasnotebook>; from thomas.heller@ion-tof.com on Wed, Jan 03, 2001 at 04:35:10PM +0100 References: <200012290046.TAA01346@207-172-57-128.s128.tnt2.ann.va.dialup.rcn.com> <017201c0759a$c2b180c0$e000a8c0@thomasnotebook> Message-ID: <20010103104457.A19493@kronos.cnri.reston.va.us> [Cc'ing to python-dev]. On Wed, Jan 03, 2001 at 04:35:10PM +0100, Thomas Heller wrote: >You didn't expect this script run under windows? >(It does not run) It shouldn't matter, I think, since the makesetup stuff doesn't run on Windows either; presumably the compiled-in modules are specified by an MSVC project file, or something similar. Can anyone confirm that I don't care if setup.py works on Windows? (Well, I *know* for a fact I don't care; but should I? 
:) )

--amk

From guido at python.org Wed Jan 3 16:49:43 2001
From: guido at python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 10:49:43 -0500
Subject: [Python-Dev] Help wanted with setup.py script
In-Reply-To: Your message of "Wed, 03 Jan 2001 10:44:57 EST." <20010103104457.A19493@kronos.cnri.reston.va.us>
References: <200012290046.TAA01346@207-172-57-128.s128.tnt2.ann.va.dialup.rcn.com> <017201c0759a$c2b180c0$e000a8c0@thomasnotebook> <20010103104457.A19493@kronos.cnri.reston.va.us>
Message-ID: <200101031549.KAA20188@cj20424-a.reston1.va.home.com>

> It shouldn't matter, I think, since the makesetup stuff doesn't run on
> Windows either; presumably the compiled-in modules are specified by an
> MSVC project file, or something similar. Can anyone confirm that I
> don't care if setup.py works on Windows? (Well, I *know* for a fact I
> don't care; but should I? :) )

Personally, I don't think it's worth making setup.py work for Windows.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From akuchlin at mems-exchange.org Wed Jan 3 21:04:07 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Wed, 3 Jan 2001 15:04:07 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked()
In-Reply-To: ; from noreply@sourceforge.net on Wed, Jan 03, 2001 at 08:47:30AM -0800
References:
Message-ID: <20010103150407.D20301@kronos.cnri.reston.va.us>

On Wed, Jan 03, 2001 at 08:47:30AM -0800, GvR wrote:
>Summary: speed up readline() using getc_unlocked()

So what does the performance of this version look like?

--amk

From guido at python.org Wed Jan 3 21:25:53 2001
From: guido at python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 15:25:53 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked()
In-Reply-To: Your message of "Wed, 03 Jan 2001 15:04:07 EST." <20010103150407.D20301@kronos.cnri.reston.va.us>
References: <20010103150407.D20301@kronos.cnri.reston.va.us>
Message-ID: <200101032025.PAA27457@cj20424-a.reston1.va.home.com>

> >Summary: speed up readline() using getc_unlocked()
>
> So what does the performance of this version look like?

Very slightly faster than the GNU getline() version. Without GNU
getline, the old code was about 3.5 times slower.

Here are the current times on a 6 Mb file (fileinput.py has my
sourceforge speedup patch too):

$ ./python ~/rltest.py ~/termcapx10
total 6252720 chars and 146250 lines; average line length 42.8
count_chars_lines     0.943  0.930
readlines_sizehint    0.544  0.540
using_fileinput       2.089  2.090
while_readline        0.956  0.960

For comparison, here's what Python 1.5.2 does with the same test
(which should be pretty close to what the released Python 2.0 does; I
don't have a copy of that handy).

$ python1.5 ~/rltest.py ~/termcapx10
total 6252720 chars and 146250 lines; average line length 42.8
count_chars_lines     0.836  0.820
readlines_sizehint    0.523  0.520
using_fileinput       5.739  5.740
while_readline        3.670  3.670

I don't know why count_chars_lines slowed down proportionally more
than readlines_sizehint. (The += operator didn't make a difference
either way.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Wed Jan 3 21:45:38 2001
From: guido at python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 15:45:38 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103082] speed up readline() using getc_unlocked()
In-Reply-To: Your message of "Wed, 03 Jan 2001 15:25:53 EST." <200101032025.PAA27457@cj20424-a.reston1.va.home.com>
References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com>
Message-ID: <200101032045.PAA27595@cj20424-a.reston1.va.home.com>

I should add that the patches are on SourceForge:

fileinput.py:
http://sourceforge.net/patch/?func=detailpatch&patch_id=103081&group_id=5470

fileobject.c:
http://sourceforge.net/patch/?func=detailpatch&patch_id=103082&group_id=5470

I'm ready to check these in, but I'm waiting 24 hours in case there's
something I've missed. (I haven't actually tested these on any other
platform besides Linux.)

Jeff Epler's xreadlines patch is here:
http://sourceforge.net/patch/?func=detailpatch&patch_id=102915&group_id=5470

Note that Jeff's patch includes a patch to fileinput.py that does the
same thing as mine but using his xreadlines module instead of directly
using readlines(sizehint) as does mine. I like my approach better,
mostly because it reduces dependencies.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From akuchlin at mems-exchange.org Wed Jan 3 22:25:30 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Wed, 3 Jan 2001 16:25:30 -0500
Subject: [Python-Dev] speed up readline() using getc_unlocked()
In-Reply-To: <200101032045.PAA27595@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 03:45:38PM -0500
References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com> <200101032045.PAA27595@cj20424-a.reston1.va.home.com>
Message-ID: <20010103162530.A20433@kronos.cnri.reston.va.us>

On Wed, Jan 03, 2001 at 03:45:38PM -0500, Guido van Rossum wrote:
>I'm ready to check these in, but I'm waiting 24 hours in case there's
>something I've missed. (I haven't actually tested these on any other
>platform besides Linux.)

On Solaris 2.6, the configure script doesn't detect that
getc_unlocked() & friends are supported; details available from the
patch.
After editing config.h manually to enable them, the results are:

Before getc_unlocked patch:
total 1559913 chars and 32513 lines
count_chars_lines     0.892  0.730
readlines_sizehint    0.329  0.300
using_fileinput       4.612  4.470
while_readline        2.739  2.670

After patch:
total 1559913 chars and 32513 lines
count_chars_lines     0.698  0.680
readlines_sizehint    0.273  0.270
using_fileinput       2.707  2.700
while_readline        0.778  0.780
amarok src>

With a patched version of fileinput.py:
using_fileinput       1.675  1.680

--amk

From guido at python.org Wed Jan 3 22:36:07 2001
From: guido at python.org (Guido van Rossum)
Date: Wed, 03 Jan 2001 16:36:07 -0500
Subject: [Python-Dev] speed up readline() using getc_unlocked()
In-Reply-To: Your message of "Wed, 03 Jan 2001 16:25:30 EST." <20010103162530.A20433@kronos.cnri.reston.va.us>
References: <20010103150407.D20301@kronos.cnri.reston.va.us> <200101032025.PAA27457@cj20424-a.reston1.va.home.com> <200101032045.PAA27595@cj20424-a.reston1.va.home.com> <20010103162530.A20433@kronos.cnri.reston.va.us>
Message-ID: <200101032136.QAA07752@cj20424-a.reston1.va.home.com>

> On Solaris 2.6, the configure script doesn't detect that
> getc_unlocked() & friends are supported; details available from the
> patch.

(Fixed now, see the new patch.)

> After editing config.h manually to enable them, the results are:
>
> Before getc_unlocked patch:
> total 1559913 chars and 32513 lines
> count_chars_lines     0.892  0.730
> readlines_sizehint    0.329  0.300
> using_fileinput       4.612  4.470
> while_readline        2.739  2.670
>
> After patch:
> total 1559913 chars and 32513 lines
> count_chars_lines     0.698  0.680
> readlines_sizehint    0.273  0.270
> using_fileinput       2.707  2.700
> while_readline        0.778  0.780
> amarok src>
>
> With a patched version of fileinput.py:
> using_fileinput       1.675  1.680

Thanks!

The bottom line seems to be that your basic readline loop is still 3x
as slow as the fastest way -- so there's still a lot to say for
xreadlines...
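
[Editorial sketch] The test script (rltest.py / test.py) is never
shown in this thread, so the following is only a guess at the shape of
the four loops being timed, written for current Python. The function
names are taken from the benchmark output; the bodies, the hint value,
and the choice to print wall-clock and CPU seconds (the two columns
are not labelled in the thread) are all assumptions.

```python
import fileinput
import os
import tempfile
import time


def count_chars_lines(path):
    # Assumed: read the whole file at once, then count.
    with open(path) as f:
        data = f.read()
    return len(data), data.count("\n")


def readlines_sizehint(path, hint=8192):
    # Read batches of lines; hint bounds the size of each batch.
    nlines = 0
    with open(path) as f:
        while True:
            batch = f.readlines(hint)
            if not batch:
                break
            nlines += len(batch)
    return nlines


def using_fileinput(path):
    nlines = 0
    with fileinput.FileInput(files=[path]) as f:
        for _line in f:
            nlines += 1
    return nlines


def while_readline(path):
    # The "basic readline loop": one call per line.
    nlines = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            nlines += 1
    return nlines


# Small self-made input so the sketch runs anywhere.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w") as out:
    out.write(("x" * 42 + "\n") * 2000)

for func in (count_chars_lines, readlines_sizehint,
             using_fileinput, while_readline):
    wall, cpu = time.perf_counter(), time.process_time()
    func(demo_path)
    print("%-20s %.3f %.3f" % (func.__name__,
                               time.perf_counter() - wall,
                               time.process_time() - cpu))
os.remove(demo_path)
```

On a file this small the timings mean little; the point is only the
difference in how often each loop crosses into the file object.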
--Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Jan 3 22:42:48 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 03 Jan 2001 22:42:48 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib codecs.py,1.13,1.14 References: Message-ID: <3A539CD8.367361B8@lemburg.com> "M.-A. Lemburg" wrote: > > Update of /cvsroot/python/python/dist/src/Lib > In directory usw-pr-cvs1:/tmp/cvs-serv26608/Lib > > Modified Files: > codecs.py > Log Message: > ... > > This patch closes the bugs #116285 and #119960. I was too fast... the subject line of #119960 was misleading. It is still open. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Thu Jan 4 00:13:15 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 3 Jan 2001 18:13:15 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Are you sure Perl still uses stdio at all? Pretty sure, but there are so many layers of macros the code is undecipherable, and I can't step thru macros in the debugger either (that's assuming I wanted to devote N hours to building Perl from source too -- which I don't). Perl also makes heavy use of macroizing std library names, so e.g. when I see "fopen" (which I do!), that doesn't mean I'm getting the fopen I'm thinking of. But the MSVC config files define all sorts of macros to get at the MS stdio _cnt and _ptr (and most other) FILE* fields, and the version of fopen in the Win32 stuff appears to defer to the platform fopen (after doing Perlish stuff, like if someone passed "/dev/null" as the file name, Perl changes it to "NUL"). 
This is what it's like: the first line of Perl's win32_fopen is this: dTHXo; That's conditionally defined in perl.h, either as #define dTHXo dTHXoa(PERL_GET_THX) or, if pTHXo is not defined, as # define dTHXo dTHX dTHX in turn is #defined in 4 different places across 3 different files in 2 different directories. I'll skip those. OTOH, dTHXoa is easy! It's only defined once: #define dTHXoa(a) pTHXo = a Ah, *that* clears it up . Etc. 20 years ago I may have thought this was fun. I thought debugging large systems of m4 macros was fun then, and I'm not sure this is either better or worse than that -- well, it's worse, because I understood m4's implementation. > If so, does it open the file in binary or in text mode? Sorry, but I really don't know and it's a pit to pursue. If it's not native text mode, they do a good job of faking it (e.g., Ctrl-Z acts like an EOF when reading a text file from Perl on Windows -- not something even Larry would be likely to do on his own ). > Based on the APIs in MS's libc, I presume that the crlf->lf > translation is not done by stdio proper but by the Unix I/O > emulation just underneath it (open() has an O_BINARY option > flag, so read() probably does the translation). Yes; and late in the last release cycle, import.c's open_exclusive had a Windows bug related to this (fdopen() used "wb", but the earlier open() didn't use O_BINARY, and fdopen *acted* like it had used "w"). Also, the MS setmode() function works on file handles, not streams. > That comes down to copying most bytes an extra time. Understood. But the CRLF are stored physically on disk, so unless the disk controller is converting them, *someone's* software (whether MS's or Perl's) is doing it. By the time Perl is doing its fast line-input stuff, and doing what sure looks like a straight copy out of an IO buffer, it's clear from the code that CRLF has already been translated to LF. 
> (To test this hypothesis, you could try to open the test file > with mode "rb" and see if it makes a difference.) In Python, that saved about 10% (but got the wrong answers ). In Perl, about 15-20%. But I don't think that tells us who's doing the translation. Assuming that the translation takes about the same total time for each, it makes sense that the percentage would be higher for Perl (since its total runtime is lower: same-sized slice of a smaller pie). > My biggest worry: thread-safety. There must be a way to lock > the file (you indicated that fgets() uses it). Yes, via the unadvertised _lock_str and _unlock_str macros defined in MS mtdll.h, which is not on the include path: /* * This is an internal C runtime header file. It is used when building * the C runtimes only. It is not to be used as a public header file. */ The routines and macros it calls are also unadvertised. After an hour of thrashing I wasn't able to successfully link any code trying to call these routines. Doesn't mean it's impossible, does means they're internal to MS libc and aren't meant to be called by anything else. That's why it's called "cheating" . Perl appears to ignore the whole issue (but Perl's thread story is muddy at best). [... ungetc ...] Not worried here either. > ... > You probably don't have many lines longer than 1000 characters. None, in fact. >> + It drops (line-at-a-time) drops to a little under 13 seconds if I >> comment out the thread macros. > If you mean the Py_BLOCK_THREADS around the resize, that can be safely > dropped. I meant *all* thread-related macros -- was just trying to get a feel for how much that fiddling cost (it's an expense Perl doesn't seem to have -- yet). Was measurable but not substantial. WRT the resize, there's now a "fast path" that avoids it. > (If/when we introduce Vladimir's malloc, we'll have to decide whether > it is threadsafe by itself or whether it requires the global > interpreter lock. I vote to make it threadsafe by itself.) 
As feared, this thread is going to consume my life <0.5 wink>.

> ...
> Doesn't it make more sense to delay the resize until this point? I
> don't know how much the character copying accounts for, but I could
> imagine a strategy based on memchr() and memcpy() that first searches
> for a \n, and if found, allocates to the right size before copying.
> Typically, the buffer contains many lines, so this could be optimized
> into requiring a single exactly-sized malloc() call in the common case
> (where the buffer doesn't wrap). But possibly scanning the buffer for
> \n and then copying the bytes separately, even with memchr() and
> memcpy(), slows things down too much for this to be faster.

Turns out that Perl does very much what I was doing; the Perl code is
actually more burdensome, because its routine is trying to deal not
only with \n-termination, but also arbitrary-string termination
(Perl's Awk-like input record separator), and "paragraph mode", and
fixed-size reads, and some other stuff I can't figure out from the
macro names. In all cases with a terminator, though, it's doing the
same business of both copying and testing in a very tight inner loop.
It doesn't appear to make any serious attempts to avoid resizing the
buffer. But, Perl has its own malloc routines, and I'm guessing
they're highly tuned for this stuff.

Since we're stuck with the MS malloc-- and Win9x's in particular seems
lame --adding this near the start of my stuff did yield a nice
speedup:

    if (fp->_cnt > 0 &&
        (pBuf = (char *)memchr(fp->_ptr, '\n', fp->_cnt)) != NULL) {
        /* it's all in the buffer so don't bother releasing the
         * global lock
         */
        total_buf_size = pBuf - fp->_ptr + 1;
        v = PyString_FromStringAndSize(fp->_ptr, (int)total_buf_size);
        if (v != NULL) {
            pBuf = BUF(v) + total_buf_size;
            fp->_cnt -= total_buf_size;
            fp->_ptr += total_buf_size;
        }
        goto done;
    }

So that builds the result string directly from the stdio buffer when
it can.
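
[Editorial sketch] The fast path above can be modelled in Python:
search the current buffer for a newline first, and if one is there,
build the result with a single exactly-sized slice; otherwise fall
back to collecting pieces across refills. This is a rough model of the
technique, not the Win32 code. BufferedLineReader and its attributes
are hypothetical names, with _buf and _pos standing in for the stdio
buffer and FILE->_ptr.

```python
import io


class BufferedLineReader:
    """Toy line reader with a search-then-slice fast path."""

    def __init__(self, raw, bufsize=4096):
        self._raw = raw
        self._bufsize = bufsize
        self._buf = b""  # stands in for the stdio buffer
        self._pos = 0    # stands in for FILE->_ptr

    def readline(self):
        # Fast path: the whole line is already in the buffer, so one
        # search plus one exactly-sized copy is enough.
        end = self._buf.find(b"\n", self._pos)
        if end != -1:
            line = self._buf[self._pos:end + 1]
            self._pos = end + 1
            return line

        # Slow path: keep the leftover and refill until a newline or EOF.
        pieces = [self._buf[self._pos:]]
        while True:
            self._buf = self._raw.read(self._bufsize)
            self._pos = 0
            if not self._buf:  # end of file
                break
            end = self._buf.find(b"\n")
            if end != -1:
                pieces.append(self._buf[:end + 1])
                self._pos = end + 1
                break
            pieces.append(self._buf)
        return b"".join(pieces)


reader = BufferedLineReader(io.BytesIO(b"one\ntwo\nthree"), bufsize=8)
print(reader.readline())  # b'one\n'
print(reader.readline())  # b'two\n'
print(reader.readline())  # b'three'
```

With short lines and a large buffer, nearly every call takes the fast
path, which matches the observation that most calls are satisfied
without any resize.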
Times dropped from (before this particular small hack)

count_chars_lines    14.880 14.854
readlines_sizehint    9.280  9.302
using_fileinput      48.610 48.589
while_readline       13.450 13.451

to

count_chars_lines    14.780 14.784
readlines_sizehint    9.550  9.514
using_fileinput      43.560 43.584
while_readline       10.600 10.578

Since I have no long lines in this test data, and the stdio buffer
typically contains thousands of chars, most calls should be satisfied
by the fast path. Compared to the previous code, the fast path (1)
avoids global lock fiddling (but that didn't account for much in a
distinct test); (2) crawls over the buffer twice instead of once; and,
(3) avoids one (shrinking!) realloc. So crawling over the buffer an
extra time costs nothing compared to the cost of a resize; and that's
likely just more evidence that malloc/realloc suck on this platform.

CAUTION: no file locking is going on now (because I haven't found a
way to do it). My previous claim that the MS getc macro did no locking
was wrong, as I discovered by stepping thru the generated machine
code. stdio.h #defines getc without locking, but in _MT mode it later
gets #undef'ed and turned into a function call.

>> /* XXX unclear to me why MS's getc macro does "& 0xff" */
>> *pBuf++ = ch = *ms_ptr++ & 0xff;

> I know why. getchar() returns an int in the range [-1, 255]. If
> chars are signed the &0xff is needed else you would get a return in
> the range [-128, 127] and -1 would be ambiguous (EOF==-1).

Bingo -- MS chars are signed.

> ...
> But here since you're copying to another character, it's pointless.

Yup! Gone.

> ....
> Note that get_line() with negative n could be implemented as
> get_line(0) with some post processing.

Andrew's glibc getline code appears to have wanted to do that, but
looks to me like it's unreachable (unless I'm hallucinating, the
"n < 0" test after return from glibc getline can't succeed, because
the enclosing block is guarded by an "n==0" test).

> This should be done completely separately, in PyFile_GetLine.

I assume you have an editor .

> The negative n case is only used by raw_input() -- it means strip
> the \n and raise EOFError for EOF, and I expect that this is rarely
> if ever used in a speed-conscious situation.

I've never seen raw_input used except when stdin and stdout were
connected to a tty. When I tried raw_input from a DOS box under the
debugger, it never called get_line. Something trickier is going on
there; I suspect it's actually calling fgets (eventually) instead in
that case.

more-mysteries-than-i-really-need-ly y'rs - tim

From jeremy at alum.mit.edu Thu Jan 4 01:06:58 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 3 Jan 2001 19:06:58 -0500 (EST)
Subject: [Python-Dev] Mailman problems?
In-Reply-To:
References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com>
Message-ID: <14931.48802.273143.209933@localhost.localdomain>

Tim & Barry,

It looks like there is some problem with Mailman that is garbling
messages to python-dev. It may only affect lines that begin with a
tab; not sure.

Your most recent message came through with the following line

> dTHXo;

(This was not the only example.)

I think this was supposed to be a line of C code, but whatever
meaningful contents it had were rendered as gobbledygook.

Jeremy

From loewis at informatik.hu-berlin.de Thu Jan 4 01:13:16 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Thu, 4 Jan 2001 01:13:16 +0100 (MET)
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
Message-ID: <200101040013.BAA13436@pandora.informatik.hu-berlin.de>

> Apparently getc_unlocked() is in the Single Unix spec. Not sure how
> widespread that is -- do Linux developers pay attention to this
> standard at all?

Ulrich Drepper, who is in charge of glibc, is always interested in
following Single Unix to the letter; getc_unlocked is supported at
least since glibc 2.0.

http://www.sun.com/smcc/solaris-migration/docs/courses/threadsHTML/adv.html

claims that getc_unlocked is already in POSIX.1c; Solaris apparently
supports it at least since Solaris 2.4.

Irix has it since 6.5, Tru64 at least since 4.0d (probably much
longer); HPUX since 11.0, AIX since at least 4.3.

Of the BSDs, only OpenBSD appears to support it; it knows that it is
in ANSI 1003.1 since 1996-07-12.

SCO OpenServer doesn't support it.

Regards,
Martin

From fredrik at effbot.org Thu Jan 4 01:20:41 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 4 Jan 2001 01:20:41 +0100
Subject: [Python-Dev] Mailman problems?
References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> <14931.48802.273143.209933@localhost.localdomain>
Message-ID: <011901c075e4$2ce96360$e46940d5@hagrid>

> It looks like there is some problem with Mailman that is garbling
> messages to python-dev. It may only affect lines that begin with a
> tab; not sure.
>
> Your most recent message came through with the following line
>
> > dTHXo;
>
> (This was not the only example.)
>
> I think this was supposed to be a line of C code, but whatever
> meaningful contents it had were rendered as gobbledygook.

also looks like Mailman removed all smileys from Jeremy's post ;-)

From thomas at xs4all.net Thu Jan 4 01:27:54 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 4 Jan 2001 01:27:54 +0100
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: <200101040013.BAA13436@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Thu, Jan 04, 2001 at 01:13:16AM +0100
References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de>
Message-ID: <20010104012753.D2467@xs4all.nl>

On Thu, Jan 04, 2001 at 01:13:16AM +0100, Martin von Loewis wrote:

> Of the BSDs, only OpenBSD appears to support it; it knows that it is
> in ANSI 1003.1 since 1996-07-12.

BSDI supports getc_unlocked() at least since BSDI 3.1.
I don't have any older boxes to check, but the manpage for getc and all its friends carries the timestamp 'June 4, 1993', which implies it could have been available a lot longer. (Note that BSD was once known to *define* the standard ;-) I concur that FreeBSD does not currently support getc_unlocked, but since BSDI and FreeBSD are merging, I suspect it will, soonish. In other words: use it! :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry at wooz.org Thu Jan 4 03:59:01 2001 From: barry at wooz.org (Barry A. Warsaw) Date: Wed, 3 Jan 2001 21:59:01 -0500 Subject: [Python-Dev] Re: Mailman problems? References: <200101031237.HAA19244@cj20424-a.reston1.va.home.com> <14931.48802.273143.209933@localhost.localdomain> Message-ID: <14931.59125.391596.730296@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> It looks like the is some problem with Mailman that is JH> garbling messages to python-dev. It may only affect lines JH> that begin with a tab; not sure. JH> Your most recent message came through with the following line >> dTHXo; JH> (This was not the only example.) JH> I think this was supposed to be a line of C code, but whatever JH> meaningful contents it had were rendered as gobbledygook. Oh shoot, my bad. I dropped in an experimental Perl filter module in the delivery pipeline. It's been so long since I hacked Perl, I think I meant to write $%_-> when I really wrote %$_-> -Barry From tim.one at home.com Thu Jan 4 05:26:51 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 3 Jan 2001 23:26:51 -0500 Subject: [Python-Dev] RE: Mailman problems? In-Reply-To: <14931.48802.273143.209933@localhost.localdomain> Message-ID: [Jeremy] > It looks like the is some problem with Mailman that is garbling > messages to python-dev. It may only affect lines that begin with a > tab; not sure. > > Your most recent message came through with the following line > >> dTHXo; > > (This was not the only example.) 
> > I think this was supposed to be a line of C code, but whatever > meaningful contents it had were rendered as gobbledygook. I have no idea where that "o" came from! It was supposed to be "o". Barry, fix it! BTW, the second line of Perl implementation functions is usually a lot less mysterious than the first. If anyone wants the joy of reverse-engineering Perl's supernaturally fast input, it's function Perl_sv_gets in file sv.c. sv.c? Yes! The destination of a one-line input is a Scalar Value, hence, sc. I expect there's similar method behind all of this stuff, but I never stumbled into the key. To get you started, here's the first line of Perl_sv_gets: dTHR; The line you're looking for is 119 lines down from that: if ((*bp++ = *ptr++) == rslast) /* really | dust */ the-comment-makes-more-sense-in-context-ly y'rs - tim From thomas at xs4all.net Thu Jan 4 07:51:17 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 07:51:17 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101040037.TAA08699@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 03, 2001 at 07:37:22PM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> Message-ID: <20010104075116.J402@xs4all.nl> On Wed, Jan 03, 2001 at 07:37:22PM -0500, Guido van Rossum wrote: > > In other words: use it! :) > > Mind doing a few platform tests on the (new version of the) patch? Well, only a bit :) It's annoying that BSDI doesn't come with autoconf, but I managed to use all my early-morning wit (it's 6:30AM ) to work around it. I've tested it on BSDI 4.1 and FreeBSD 4.2-RELEASE. > I already know that it works on Red Hat Linux 6.2 (my box) and Solaris > 2.6 (Andrew's box). I would be delighted to know that it works on at > least one other platform that has getc_unlocked() and one platform > that doesn't have it! Sorry, I have to disappoint you. 
FreeBSD does have getc_unlocked, they just didn't document it. Hurrah for autoconf ;P

Anyway, it worked like a charm on BSDI:

(Python 2.0)
total 1794310 chars and 37660 lines
count_chars_lines     0.310  0.300
readlines_sizehint    0.150  0.150
using_fileinput       2.013  2.017
while_readline        1.006  1.000

(CVS Python + getc_unlocked)
daemon2:~/python/python/dist/src > ./python test.py termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.354  0.350
readlines_sizehint    0.182  0.183
using_fileinput       1.594  1.583
while_readline        0.363  0.367

But something weird is going on on FreeBSD:

(Standard CVS Python)
> ./python ~thomas/test.py ~thomas/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.265  0.266
readlines_sizehint    0.148  0.148
using_fileinput       0.943  0.938
while_readline        0.214  0.219

(CVS+getc_unlocked)
> ./python-getc-unlocked ~thomas/test.py ~thomas/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.266  0.266
readlines_sizehint    0.151  0.141
using_fileinput       1.066  1.078
while_readline        0.283  0.281

This was sufficiently unexpected that I looked a bit further. The FreeBSD Python was compiled without editing Modules/Setup, so it was statically linked, no readline etc, but *with* threads (which are on by default, and functional on both FreeBSD and BSDI 4.1.) Here's the timings after I enabled just '*shared*':

(CVS + *shared*)
> ./python ~thomas/test.py ~thomas/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.276  0.273
readlines_sizehint    0.150  0.156
using_fileinput       0.902  0.898
while_readline        0.206  0.203

(This was not a fluke, I repeated it several times, getting hardly any variation.) Enabling readline and cursesmodule had no additional effect. Adding *shared* to the getc_unlocked tree saw roughly the same improvement, but was still slower than without getc_unlocked.
(CVS + *shared* + getc_unlocked)
> ./python ~thomas/test.py ~thomas/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.272  0.273
readlines_sizehint    0.149  0.148
using_fileinput       1.031  1.031
while_readline        0.267  0.266

Increasing the size of the testfile didn't change anything, other than the absolute numbers. I browsed stdio.h, where both getc() and getc_unlocked() are defined as macros. getc_unlocked is defined as:

#define __sgetc(p) (--(p)->_r < 0 ? __srget(p) : (int)(*(p)->_p++))
#define getc_unlocked(fp) __sgetc(fp)

and getc either as

#define getc(fp) getc_unlocked(fp)

(without threads) or

static __inline int \
__getc_locked(FILE *_fp) \
{ \
        extern int __isthreaded; \
        int _ret; \
        if (__isthreaded) \
                _FLOCKFILE(_fp); \
        _ret = getc_unlocked(_fp); \
        if (__isthreaded) \
                funlockfile(_fp); \
        return (_ret); \
}
#define getc(fp) __getc_locked(fp)

_FLOCKFILE(x) is defined as flockfile(x), so that isn't the difference. The speed difference has to be in the quick-and-easy test for whether the locking is even necessary. Starting a thread on 'time.sleep(900)' in test.py shows these numbers:

(standard CVS python)
> ./python-shared-std ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.433  0.445
readlines_sizehint    0.204  0.188
using_fileinput       1.595  1.594
while_readline        0.456  0.453

(getc_unlocked)
> ./python-getc-unlocked-shared ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.441  0.453
readlines_sizehint    0.206  0.195
using_fileinput       1.677  1.688
while_readline        0.509  0.508

So... using getc_unlocked manually for performance reasons isn't a cardinal sin on FreeBSD only if you are really using threads :-)

Lets-outsmart-the-OS-scheduler-next!-ly y'rs
--
Thomas Wouters
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
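The test.py script behind the timings in this thread is not included in the archive. The following is only a rough sketch of what such a benchmark might look like in present-day Python. The three reader names mirror the labels in the timing output; their bodies, the helpers `time_it` and `make_sample`, and the 64 KB sizehint are assumptions made for illustration, not the original code.

```python
import fileinput
import os
import tempfile
import time


def while_readline(path):
    """Count chars and lines with one readline() call per line."""
    chars = lines = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            chars += len(line)
            lines += 1
    return chars, lines


def readlines_sizehint(path, sizehint=64 * 1024):
    """Count chars and lines by reading batches via readlines(sizehint)."""
    chars = lines = 0
    with open(path) as f:
        while True:
            batch = f.readlines(sizehint)
            if not batch:
                break
            for line in batch:
                chars += len(line)
                lines += 1
    return chars, lines


def using_fileinput(path):
    """Count chars and lines by iterating with the fileinput module."""
    chars = lines = 0
    with fileinput.input(files=[path]) as stream:
        for line in stream:
            chars += len(line)
            lines += 1
    return chars, lines


def time_it(func, path):
    """Return (result, elapsed seconds, CPU seconds) for one call."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    result = func(path)
    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, elapsed, cpu


def make_sample(n_lines=1000):
    """Write a small throwaway input file (hypothetical test data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write("line %d of sample data\n" % i)
    return path


if __name__ == "__main__":
    sample = make_sample()
    try:
        for func in (readlines_sizehint, using_fileinput, while_readline):
            (chars, lines), elapsed, cpu = time_it(func, sample)
            print("%-20s %.3f %.3f" % (func.__name__, elapsed, cpu))
    finally:
        os.remove(sample)
```

Absolute numbers from a sketch like this say little about the 2001 measurements; only the relative ordering on one machine is comparable.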
From thomas at xs4all.net Thu Jan 4 08:57:26 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 08:57:26 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test/output test_coercion,1.2,1.3 In-Reply-To: ; from nascheme@users.sourceforge.net on Wed, Jan 03, 2001 at 05:36:27PM -0800 References: Message-ID: <20010104085726.E2467@xs4all.nl> On Wed, Jan 03, 2001 at 05:36:27PM -0800, Neil Schemenauer wrote: > Update of /cvsroot/python/python/dist/src/Lib/test/output > In directory usw-pr-cvs1:/tmp/cvs-serv21710/Lib/test/output > > Modified Files: > test_coercion > Log Message: > Sequence repeat works now for in-place multiply with an integer type > as the left operand. I don't know if this is a feature or a bug. > ! 2 *= [1] => [1, 1] It's a feature. x = 2 * [1] works, so x = 2 x *= [1] does, too. Obviously, '2 *= [1]' shouldn't, but I'm assuming you don't actually execute that (it should give a SyntaxError) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik at effbot.org Thu Jan 4 10:32:55 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 4 Jan 2001 10:32:55 +0100 Subject: [Python-Dev] RE: Mailman problems? References: Message-ID: <00a701c07631$531983b0$e46940d5@hagrid> tim wrote: > I have no idea where that "o" came from! It was supposed to be "o". > Barry, fix it! no need. from the perlguts man page: "You can ignore [pad]THX[xo] when browsing the Perl headers/sources." in-my-dictionary-perl's-an-american-physicist-ly yrs /F From mal at lemburg.com Thu Jan 4 11:02:35 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 04 Jan 2001 11:02:35 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 References: Message-ID: <3A544A3B.32B86792@lemburg.com> Neil Schemenauer wrote: > > Update of /cvsroot/python/python/dist/src/Include > In directory usw-pr-cvs1:/tmp/cvs-serv21006/Include > > Modified Files: > classobject.h > Log Message: > Remove PyInstance_*BinOp functions. > > Index: classobject.h > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Include/classobject.h,v > retrieving revision 2.33 > retrieving revision 2.34 > diff -C2 -r2.33 -r2.34 > *** classobject.h 2000/09/01 23:29:26 2.33 > --- classobject.h 2001/01/04 01:30:34 2.34 > *************** > *** 60,71 **** > extern DL_IMPORT(int) PyClass_IsSubclass(PyObject *, PyObject *); > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > - char *, char *, > - PyObject * (*)(PyObject *, > - PyObject *)); > - > - extern DL_IMPORT(int) > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > - PyObject * (*)(PyObject *, PyObject *), int); Wouldn't it be safer to provide emulation APIs for these ? There might be code out there using these APIs. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Thu Jan 4 15:06:53 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 09:06:53 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 In-Reply-To: Your message of "Thu, 04 Jan 2001 11:02:35 +0100." 
<3A544A3B.32B86792@lemburg.com> References: <3A544A3B.32B86792@lemburg.com> Message-ID: <200101041406.JAA11926@cj20424-a.reston1.va.home.com> > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > > - char *, char *, > > - PyObject * (*)(PyObject *, > > - PyObject *)); > > - > > - extern DL_IMPORT(int) > > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > > - PyObject * (*)(PyObject *, PyObject *), int); > > Wouldn't it be safer to provide emulation APIs for these ? There > might be code out there using these APIs. No. These were never intended to be part of the API (and it was a mistake that they used DL_IMPORT()). They had to be extern because they were defined in one file and used in another. I'm glad they're gone. They are so obscure that I'd be *very* surprised if anybody was using them, and even more if they even *wanted* emulation under the new scheme -- I'd expect them to eagerly convert their code to using new-style numbers right away. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jan 4 15:16:39 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 09:16:39 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 04 Jan 2001 07:51:17 +0100." <20010104075116.J402@xs4all.nl> References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> Message-ID: <200101041416.JAA11983@cj20424-a.reston1.va.home.com> [Thomas finds that on FreeBSD, getc() is faster than getc_unlocked().] Thomas, I really don't understand it. The getc() source code you showed calls getc_unlocked(). So how can it be faster? The answer must be somewhere else... Cache line conflicts, the rewriting of the loop that I did, a compiler bug, the inlining, who knows. Can you compare the generated assembly code? 
On other platforms, getc_unlocked() typically speeds the readline() test case up by a significant factor (as in your BSDI numbers, where it's almost 3x faster). Could it be that you're mistaken and that somehow getc_unlocked() is *not* chosen on FreeBSD? Then I could believe it, the rewritten loop is so different that the optimizer might have done something different to it. (Check config.h. When all else fails, I put an #error in the #ifdef branch that I expect not to be taken.) Could it be that somehow getc_unlocked() is later defined to be the same as getc(), so choosing it just adds the overhead of calling f[un]lockfile() for each line? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Thu Jan 4 15:59:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 15:59:05 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041416.JAA11983@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 04, 2001 at 09:16:39AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> Message-ID: <20010104155904.L402@xs4all.nl> On Thu, Jan 04, 2001 at 09:16:39AM -0500, Guido van Rossum wrote: > [Thomas finds that on FreeBSD, getc() is faster than getc_unlocked().] > Thomas, I really don't understand it. The getc() source code you > showed calls getc_unlocked(). So how can it be faster? The answer > must be somewhere else... Cache line conflicts, the rewriting of the > loop that I did, a compiler bug, the inlining, who knows. Can you > compare the generated assembly code? On other platforms, > getc_unlocked() typically speeds the readline() test case up by a > significant factor (as in your BSDI numbers, where it's almost 3x > faster). Nono, reread my message, and your code. 
getc() isn't faster than getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, etc.) Significantly so when there is only one thread running (which is still the common case, for most systems, and FreeBSD's libc has easy inside knowledge about) and marginally so when there is at least one other thread. The small advantage in the multi-threaded case can be explained by the rest of the changes.

You see, I was comparing a patched tree versus a non-patched tree, not a getc_unlocked() enabled one versus a disabled one, so I was measuring the speed difference of the *patch*, not of the use of getc_unlocked() vs getc(). Here is the speed difference of just the use of getc() vs getc_unlocked() (same tree, hand-edited config.h) in a non-threaded environment:

> ./python-getc-disabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.271  0.273
readlines_sizehint    0.149  0.148
using_fileinput       0.898  0.898
while_readline        0.214  0.211

> ./python-getc-enabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.271  0.273
readlines_sizehint    0.148  0.148
using_fileinput       0.898  0.898
while_readline        0.214  0.211

As you see, no significant difference. Here is the difference in a threaded environment (a second thread that does just 'time.sleep(900)'):

> ./python-getc-disabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.429  0.422
readlines_sizehint    0.200  0.211
using_fileinput       1.604  1.594
while_readline        0.465  0.461

> ./python-getc-enabled ~/test.py ~/termcapx10
total 1794310 chars and 37660 lines
count_chars_lines     0.429  0.430
readlines_sizehint    0.201  0.203
using_fileinput       1.600  1.602
while_readline        0.463  0.461

... where I have to note that the getc-disabled version's 'using_fileinput' time fluctuates a lot more, mostly upwards, in the threaded environment. (I see it jump to 1.609, 1.617 cputime, every few runs.)
Still not a terribly significant difference, but a hint that we, too, can use inside knowledge ;)

> Could it be that you're mistaken and that somehow getc_unlocked() is
> *not* chosen on FreeBSD? Then I could believe it, the rewritten loop
> is so different that the optimizer might have done something different
> to it. (Check config.h. When all else fails, I put an #error in the
> #ifdef branch that I expect not to be taken.)

Yah, #error is great for debugging, I use it a lot ;) But I'm sure of this. FreeBSD's getc() is just craftily optimized. Note that if we can get get_line using getc_unlocked() to run as fast as get_line using getc() on FreeBSD, it should also benefit other platforms, because the only speed to be had is in our own code :) Not that I'm saying it can be improved, just that it apparently got slower, because of this patch.

I can't be much help doing any performance tuning, though, I've about used up my lunchhour and I'm working late tonight ;P

Good-thing-my-boss-can't-tell-the-difference-between-Apache-and-Python-src-ly y'rs,
--
Thomas Wouters
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido at python.org Thu Jan 4 16:27:28 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 04 Jan 2001 10:27:28 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Thu, 04 Jan 2001 15:59:05 +0100." <20010104155904.L402@xs4all.nl>
References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl>
Message-ID: <200101041527.KAA12181@cj20424-a.reston1.va.home.com>

[Me & Thomas in violent agreement that there's something weird about the speed of getc_unlocked() vs. getc() on FreeBSD.]

I just realized what's the probable cause.
Read your timing post again:

# BSDI:
#
# (Python 2.0)
# while_readline        1.006  1.000
#
# (CVS Python + getc_unlocked)
# while_readline        0.363  0.367

# FreeBSD:
#
# (Standard CVS Python)
# while_readline        0.214  0.219
#
# (CVS+getc_unlocked)
# while_readline        0.283  0.281

Standard CVS Python, as opposed to Python 2.0 as released, uses GNU getline()! So on FreeBSD, for this test case, GNU getline() is faster than getc_unlocked().

So the question is, should I leave the GNU getline() code in? I'm inclined against it -- it's not that much faster, and on other platforms getc_unlocked() is faster. Given that getc_unlocked() is a standard (of some sort) and GNU getline() is, well, just that, I'd say let's stick with getc_unlocked().

(Unfortunately, from a phone conversation I had last night with Tim, there's not much hope of doing something there -- and that platform sorely needs it! The hacks that Tim reported earlier are definitely not thread-safe. While it's easy to come up with getc_unlocked() for Windows, the locking operations used internally there by the /MT code are not exported from MSVCRT.DLL, and that's crucial.)
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Thu Jan 4 16:31:39 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 16:31:39 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041527.KAA12181@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 04, 2001 at 10:27:28AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <200101041527.KAA12181@cj20424-a.reston1.va.home.com> Message-ID: <20010104163139.M402@xs4all.nl> On Thu, Jan 04, 2001 at 10:27:28AM -0500, Guido van Rossum wrote: > [Me & Thomas in violent agreement that there's something weird about > the speed of getc_unlocked() vs. getc() on FreeBSD.] > I just realized what's the probable cause. Read your timing post > again: > Standard CVS Python, as opposed to Python 2.0 as released, uses GNU > getline()! Sorry, no go. You need two things to use getline(): getline() itself, and a GNU libc. FreeBSD has neither. (And autoconf agrees with me.) If you *really really* want me to, I can compile 2.0-standard on FreeBSD and show you. But I'd rather not :) Now go back and read my other mail about why FreeBSD is faster :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
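The trade-off argued over in this thread is where the stream lock gets taken: once per character, as in a thread-safe getc(), or once per line, as in flockfile() followed by a run of getc_unlocked() calls. The Python sketch below is only an analogy under stated assumptions: `CountingLock` and both reader functions are hypothetical names invented here, and nothing in it models FreeBSD's `__isthreaded` shortcut, which skips the lock entirely in a single-threaded process.

```python
import io
import threading


class CountingLock:
    """A lock wrapper that counts how often it is acquired (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc_info):
        self._lock.release()
        return False


def read_line_lock_per_char(stream, lock):
    """Analogue of a locking getc(): take the lock for every character."""
    chars = []
    while True:
        with lock:
            c = stream.read(1)
        if not c:
            break
        chars.append(c)
        if c == "\n":
            break
    return "".join(chars)


def read_line_lock_per_line(stream, lock):
    """Analogue of flockfile() + getc_unlocked(): take the lock once per line."""
    chars = []
    with lock:
        while True:
            c = stream.read(1)
            if not c:
                break
            chars.append(c)
            if c == "\n":
                break
    return "".join(chars)


if __name__ == "__main__":
    per_char, per_line = CountingLock(), CountingLock()
    read_line_lock_per_char(io.StringIO("hello\n"), per_char)
    read_line_lock_per_line(io.StringIO("hello\n"), per_line)
    print(per_char.acquisitions, per_line.acquisitions)
```

Per-line locking wins whenever acquiring the lock has a real cost. It buys nothing, and can lose, when the library already avoids that cost in the single-threaded case, which is what the FreeBSD numbers suggest.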
From akuchlin at mems-exchange.org Thu Jan 4 16:43:15 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 4 Jan 2001 10:43:15 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104155904.L402@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 04, 2001 at 03:59:05PM +0100 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> Message-ID: <20010104104315.C23803@kronos.cnri.reston.va.us> On Thu, Jan 04, 2001 at 03:59:05PM +0100, Thomas Wouters wrote: >getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ >the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, >etc.) Significantly so when there is only one thread running (which is still So it looks like the ALLOW_THREADS should be moved out of the for loop. This produced no measureable performance difference on Solaris; I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some unusually slow thread operation? 
--amk From thomas at xs4all.net Thu Jan 4 16:59:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 4 Jan 2001 16:59:25 +0100 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104104315.C23803@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 04, 2001 at 10:43:15AM -0500 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> Message-ID: <20010104165925.G2467@xs4all.nl> On Thu, Jan 04, 2001 at 10:43:15AM -0500, Andrew Kuchling wrote: > On Thu, Jan 04, 2001 at 03:59:05PM +0100, Thomas Wouters wrote: > >getc_unlocked(). getc() is faster than flockfile(f) + getc_unlocked(f) (+ > >the rearranging of the function, use of PyTHREAD_ALLOW inside the outer loop, > >etc.) Significantly so when there is only one thread running (which is still > So it looks like the ALLOW_THREADS should be moved out of the for > loop. This produced no measureable performance difference on Solaris; > I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some > unusually slow thread operation? Note that I was just guessing there. I did a quick scan of the function, and noticed that the ALLOW_THREADS statements had moved into the outer loop. I didn't even contemplate whether that made a difference, so don't trust that judgement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From akuchlin at mems-exchange.org Thu Jan 4 17:10:29 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 4 Jan 2001 11:10:29 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010104165925.G2467@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 04, 2001 at 04:59:25PM +0100 References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> <20010104165925.G2467@xs4all.nl> Message-ID: <20010104111029.A28510@kronos.cnri.reston.va.us> On Thu, Jan 04, 2001 at 04:59:25PM +0100, Thomas Wouters wrote: >Note that I was just guessing there. I did a quick scan of the function, and >noticed that the ALLOW_THREADS statements had moved into the outer loop. I >didn't even contemplate whether that made a difference, so don't trust that >judgement. According to your benchmark, the performance of the threaded version was the same whether or not getc_unlocked() was unused, so it's not that flockfile() is really slow. I can't believe the compiler optimized the old, ungainly loop better than the newer, tighter loop. That leaves the ALLOW_THREADS as the most reasonable culprit. --amk From akuchlin at mems-exchange.org Thu Jan 4 18:10:11 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 04 Jan 2001 12:10:11 -0500 Subject: [Python-Dev] SGI's Digital Media SDK Message-ID: SGI just made a source release of their digital media SDK for IRIX and Linux at http://oss.sgi.com/projects/dmsdk/ . According to the FAQ, this is derived from previous SGI libraries, "including the Video Library (VL), the Audio Library (AL), Digital Media Image Convertor (DMIC), Digital Media Audio Convertor (DMAC), and the Compression Library (CL)." 
Interested parties may want to look into this, because Python still has the al, cd, cl, and sv modules; maybe they'd work with the new software with a reasonable amount of fixing, and at least now there's a reasonable chance that non-IRIX platforms will be supported. --amk From guido at python.org Thu Jan 4 20:07:13 2001 From: guido at python.org (Guido van Rossum) Date: Thu, 04 Jan 2001 14:07:13 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Thu, 04 Jan 2001 10:43:15 EST." <20010104104315.C23803@kronos.cnri.reston.va.us> References: <200101040013.BAA13436@pandora.informatik.hu-berlin.de> <20010104012753.D2467@xs4all.nl> <200101040037.TAA08699@cj20424-a.reston1.va.home.com> <20010104075116.J402@xs4all.nl> <200101041416.JAA11983@cj20424-a.reston1.va.home.com> <20010104155904.L402@xs4all.nl> <20010104104315.C23803@kronos.cnri.reston.va.us> Message-ID: <200101041907.OAA12573@cj20424-a.reston1.va.home.com> > So it looks like the ALLOW_THREADS should be moved out of the for > loop. This produced no measureable performance difference on Solaris; > I'll leave it to GvR to try it on Linux. I wonder if FreeBSD has some > unusually slow thread operation? I kind of doubt that it's Py_ALLOW_THREADS -- it's in the outer loop, which typically only gets executed once. It only goes around a second time when the line is longer than the initial buffer. We could tweak the initial buffer size (currently 100, with increments of 1000). --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Jan 4 20:32:15 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 04 Jan 2001 20:32:15 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include classobject.h,2.33,2.34 References: <3A544A3B.32B86792@lemburg.com> <200101041406.JAA11926@cj20424-a.reston1.va.home.com> Message-ID: <3A54CFBF.CDD2138B@lemburg.com> Guido van Rossum wrote: > > > > - extern DL_IMPORT(PyObject *) PyInstance_DoBinOp(PyObject *, PyObject *, > > > - char *, char *, > > > - PyObject * (*)(PyObject *, > > > - PyObject *)); > > > - > > > - extern DL_IMPORT(int) > > > - PyInstance_HalfBinOp(PyObject *, PyObject *, char *, PyObject **, > > > - PyObject * (*)(PyObject *, PyObject *), int); > > > > Wouldn't it be safer to provide emulation APIs for these ? There > > might be code out there using these APIs. > > No. These were never intended to be part of the API (and it was a > mistake that they used DL_IMPORT()). They had to be extern because > they were defined in one file and used in another. I'm glad they're > gone. They are so obscure that I'd be *very* surprised if anybody was > using them, and even more if they even *wanted* emulation under the > new scheme -- I'd expect them to eagerly convert their code to using > new-style numbers right away. I'll see whether I can get mxDateTime working with the new scheme later this year -- it would be really great to do away with the coercion hack I was using until now :-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Fri Jan 5 07:04:56 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 5 Jan 2001 01:04:56 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101041527.KAA12181@cj20424-a.reston1.va.home.com> Message-ID: [Guido van Rossum] > ... 
> (Unfortunately, from a phone conversation I had last night with
> Tim, there's not much hope of doing something there -- and that
> platform [Win32] sorely needs it! The hacks that Tim reported
> earlier are definitely not thread-safe. While it's easy to come
> up with getc_unlocked() for Windows, the locking operations used
> internally there by the /MT code are not exported from MSVCRT.DLL,
> and that's crucial.)

The short course is that I still haven't found a workable way to lock streams on Windows: they do have a complete set of stream-locking functions and macros, but there's no way short of deep magic I can find to get at them ("deep magic" == resort to assembler and patch in function addresses). The only file-locking functions advertised in the C and platform SDK libraries are trivial variants of Python's msvcrt.locking, but that has to do with locking specific file byte-position ranges across processes, not ensuring the integrity of runtime stream structures across threads. Perl appears to ignore the issue of thread safety here (on Windows and everywhere else).

Revealing experiment!

1. I threw away my changes and rebuilt from current CVS.

2. I made one change, expanding the getc() call in get_line to what MSVC *would* expand it to if we weren't building in thread mode:

    if ((c = (--fp->_cnt >= 0 ? 0xff & *fp->_ptr++ : _filbuf(fp))) == EOF) {

That alone reduced the runtime of my "while 1: readline" test case from over 30 seconds to 12.8. What I did before went beyond that, by also (in effect) unrolling the loop and optimizing it. That bought an additional ~2 seconds. So compared to Perl's 6 seconds, it looks like we're paying (on Win98SE) approximately:

    17 seconds for compiling with _MT (threadsafe libc)
     6 seconds to do the work
     5 seconds for "other stuff", best guess mostly a poor platform malloc/realloc
     2 seconds for not optimizing the loop
    --
    30 total

Unfortunately, the smoking gun is the only one whose firing pin we can't file down on this platform.
so-the-good-news-is-that-it's-impossible-for-perl-not-to-be-at-
    least-twice-as-fast-ly y'rs - tim

From guido at python.org Fri Jan 5 16:29:05 2001
From: guido at python.org (Guido van Rossum)
Date: Fri, 05 Jan 2001 10:29:05 -0500
Subject: [Python-Dev] Python 2.1 release schedule (PEP 226)
Message-ID: <200101051529.KAA19100@cj20424-a.reston1.va.home.com>

We had our first PythonLabs meeting of the year yesterday, and we went over the 2.1 release schedule. The release schedule is posted in PEP 226: http://python.sourceforge.net/peps/pep-0226.html

We found that the schedule previously posted there was a bit too aggressive, given our goals for this release, so we have adjusted the dates somewhat. We have also decided on a date for the first alpha release (previously unmentioned in the PEP). So, here are the relevant dates:

    19-Jan-2001: First 2.1 alpha release
    23-Feb-2001: First 2.1 beta release
    01-Apr-2001: 2.1 final release

We're already in PEP freeze mode -- no more PEPs will be considered for inclusion in 2.1. Below is a list of the PEPs that we are currently considering, with some comments. But first some general remarks:

- The alpha release cycle is for testing of tentative features. Alpha releases contain working code that we want to see widely tested; however, it's possible that a feature present in an alpha release is changed or even retracted in a later release.

- Beta releases represent a feature freeze -- after the first beta release, we will resign ourselves to fixing bugs. Once beta 1 is released, no new features will be introduced, and no features will be withdrawn.

The alpha cycle is especially important for features (such as nested scopes) that (may) introduce backwards incompatibilities. There may be more than one alpha release depending on feedback on the alpha 1 release. (But having too many alpha releases is not good -- people won't bother downloading.)
Thus, we can only introduce a new feature in beta 1 if we're very sure that it is mature enough to stay without interface changes. The final decision on all PEPs under consideration has to be made before the beta 1 release.

The beta cycle is important to ensure stability of the final release.

Specific PEPs under consideration:

 I     42  pep-0042.txt  Small Feature Requests                Hylton

    Actually, most of these won't be fulfilled in 2.1.

 SD   205  pep-0205.txt  Weak References                       Drake

    Fred is still working on this. I hope Tim can assist. But we may have to postpone this.

 S    207  pep-0207.txt  Rich Comparisons                      Lemburg, van Rossum

    I'm pretty sure that this is a piece of cake now that the coercion patches are checked in.

 S    208  pep-0208.txt  Reworking the Coercion Model          Schemenauer

    All checked in. Great work, Neil!

 S    217  pep-0217.txt  Display Hook for Interactive Use      Zadka

    Moshe, this was accepted ages ago. Would you mind submitting a patch to SourceForge? If you don't champion this (and nobody else does), we may have to postpone it still.

 S    222  pep-0222.txt  Web Library Enhancements              Kuchling

    This is really up to Andrew. It seems he plans to create new modules, so he won't be introducing incompatibilities in existing APIs.

 S    227  pep-0227.txt  Statically Nested Scopes              Hylton

    Jeremy is still working on a proper implementation, which he hopes to have ready in time for the first alpha release date.

 S    229  pep-0229.txt  Using Distutils to Build Python       Kuchling

    I just moved this from pie-in-the-sky to active. Andrew has a working prototype, it just doesn't work 100% yet, so I'm very hopeful.

 S    230  pep-0230.txt  Warning Framework                     van Rossum

    All done.

 S    232  pep-0232.txt  Function Attributes                   Warsaw

    Still waiting for Barry to implement this, but it's pretty straightforward.

 S    233  pep-0233.txt  Python Online Help                    Prescod

    Paul, what's up with this? Tim & I recommended to do something simple and working, and then you disappeared from the face of the earth.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake at acm.org Fri Jan 5 16:28:16 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 5 Jan 2001 10:28:16 -0500 (EST)
Subject: [Python-Dev] new "theme" on SourceForge!
Message-ID: <14933.59408.512734.105160@cj42289-a.reston1.va.home.com>

While "theme-ability" is becoming very popular for desktop software (think about the latest Gnome and KDE systems for Unix, and some of the multimedia applications for Windows, and the newest MacOS desktops), it can be a huge drain on Web sites; too many graphics is a pain, and too many tables just makes it worse. SourceForge had definitely fallen prey to the overly-fancy themes, and all of us developers paid the price with slow rendering.

But they've fixed that! The SF crew has announced a new "theme" called "Ultra Light" which is optimized for slow connections. What that really means is less embedded graphics and fewer nested tables, so rendering is *much* faster.

To try the new theme, go to the "Change My Theme" link near the top of the left-hand navigation area. Use the form to select "Ultra Light"; you can preview the theme first if you want.

Guido also thinks it's cool that the bug & patch report pages are printable with this theme. (Sheesh... managers! ;)

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim.one at home.com Fri Jan 5 18:46:16 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 5 Jan 2001 12:46:16 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Lib fileinput.py,1.5,1.6
In-Reply-To:
Message-ID:

[Guido]
> Modified Files:
> fileinput.py
> Log Message:
> Speed it up by using readlines(sizehint). It's still slower than
> other ways of reading input.
:-(

On my box, it's now head-to-head with (maybe even a little quicker
than) the while 1: line-at-a-time way:

total 117615824 chars and 3237568 lines
readlines_sizehint     9.450    9.459
using_fileinput       29.880   29.884
while_readline        30.480   30.506

(stock CVS Python under Win98SE)

So that's a huge improvement!

the-two-people-using-fileinput-should-be-delighted-ly y'rs - tim

From skip at mojam.com Fri Jan 5 20:05:14 2001
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 5 Jan 2001 13:05:14 -0600 (CST)
Subject: [Python-Dev] fileinput.py
In-Reply-To:
References:
Message-ID: <14934.6890.160122.384692@beluga.mojam.com>

    Tim> the-two-people-using-fileinput-should-be-delighted-ly

What do you think contributes to fileinput's relative disfavor?

This whole thread on Python's file reading performance was started by
the eternal whine "why is Python so much slower than Perl?" which
really means why is

    line = f.readline()
    while line:
        process(line)

so much slower than whatever that thing is in Perl that everybody uses
as the be-all-end-all performance benchmark (something with <> in it).

Given that fileinput is supposed to make the I/O loop in Python more
familiar to those people wandering over from Perl (at least in part),
you'd think that people would naturally gravitate to it. Would it
benefit from some exposure in the Python tutorial? Is it fast enough
now to warrant the extra exposure?

just-whining-out-loud-ly y'rs

Skip

From tim.one at home.com Fri Jan 5 20:11:00 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 5 Jan 2001 14:11:00 -0500
Subject: [Python-Dev] new "theme" on SourceForge!
In-Reply-To: <14933.59408.512734.105160@cj42289-a.reston1.va.home.com>
Message-ID:

[Fred L. Drake, Jr.]

Who would have guessed that the "L." stands for Light?

> ...
> The SF crew has announced a new "theme" called "Ultra Light" which
> is optimized for slow connections.

Indeed, I think I can cancel my cable modem now and go back to a 28.8
phone modem.
liking-it!-ly y'rs - tim From jeremy at alum.mit.edu Fri Jan 5 20:14:49 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 5 Jan 2001 14:14:49 -0500 (EST) Subject: [Python-Dev] unit testing bake-off Message-ID: <14934.7465.360749.199433@localhost.localdomain> There was a brief discussion of unit testing last millennium, which did not reach any conclusions. I'd like to restart the discussion and set some specific goals. The action item is a unit testing bake-off, held next week, to choose a tool. The primary goal is to choose a unit testing framework for the regression test suite. Tests written with this framework would eventually replace the current regrtest.py framework, based on comparing test output to expected output. For the 2.1 release, the goal would be to choose a test framework to include in the standard distribution and use it to write some or all of the new tests. We would need to integrate it in some way with regrtest.py, so that a single command can be used to run all the tests. In the long run, we can migrate existing tests to use the new system. The new system can help us address some other goals: - running an entire test suite to completion instead of stopping on the first failure - clearer reporting of what went wrong - better support for conditional tests, e.g. write a test for httplib that only runs if the network is up. This is tied into better error reporting, since the current test suite could only report that httplib succeeded or failed. Does anyone disagree with the goal? Three tools have been proposed: PyUnit, Quixote unittest, and doctest. doctest has been championed by Peter Funk, who wants a few new features, but Tim, its author, isn't pushing it as a tool for writing stand alone tests. I think the best way to use doctest is for module writers to consider it when writing a new module. If doctest is used from the start for a module, we could integrate it with the regression test. 
It seems quite useful for what it is intended for, but is not a general solution. That leaves PyUnit and Quixote's unittest. The two tools are fairly similar, but differ on a number of non-trivial details. Quixote also integrates code coverage, which is quite handy. If we don't adopt its unittest, we should add code coverage to PyUnit. Is anyone else interested in the choice between the two? If so, I suggest you try writing some tests with each tool and reporting back with your feedback. I propose leaving one week for such a bake-off and making a decision next Friday. Jeremy From fredrik at effbot.org Fri Jan 5 20:55:18 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 5 Jan 2001 20:55:18 +0100 Subject: [Python-Dev] unit testing bake-off References: <14934.7465.360749.199433@localhost.localdomain> Message-ID: <004c01c07751$6eed84d0$e46940d5@hagrid> Jeremy Hylton wrote: > Is anyone else interested in the choice between the two? yes. I suggest adding doctest.py plus one unit test implementation. > If so, I suggest you try writing some tests with each tool and > reporting back with your feedback. we've recently migrated from a 30-minute reimplementation of Kent Beck's original framework to one of the frameworks you mention. with that background, the choice was easy. let me know when it's time to vote... From guido at python.org Fri Jan 5 20:55:33 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 14:55:33 -0500 Subject: [Python-Dev] fileinput.py In-Reply-To: Your message of "Fri, 05 Jan 2001 13:05:14 CST." <14934.6890.160122.384692@beluga.mojam.com> References: <14934.6890.160122.384692@beluga.mojam.com> Message-ID: <200101051955.OAA20190@cj20424-a.reston1.va.home.com> > What do you think contributes to fileinput's relative disfavor? In my view, fileinput is one of those unfortunate features that exist solely to shut up a particular kind of criticism. 
Without fileinput, Perl zealots would have an easy argument for a "trivial reject" of even considering Python. Now, when somebody claims the superiority of Perl's "loop involving a <> thingie", you can point to fileinput to prevent them from scoring a point. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 5 21:01:13 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 15:01:13 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: Your message of "Fri, 05 Jan 2001 20:55:18 +0100." <004c01c07751$6eed84d0$e46940d5@hagrid> References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> Message-ID: <200101052001.PAA20238@cj20424-a.reston1.va.home.com> > yes. I suggest adding doctest.py plus one unit test implementation. I second this vote for doctest (in addition to a unittest thing). I propose that Tim checks in his latest version of doctest. It should go under Lib, not under Lib/test, I think. (Certainly that's how Tim has been proposing its use.) It requires LaTeX docs, but since it's got a great docstring, that should be easy. > > If so, I suggest you try writing some tests with each tool and > > reporting back with your feedback. > > we've recently migrated from a 30-minute reimplementation of Kent > Beck's original framework to one of the frameworks you mention. with > that background, the choice was easy. let me know when it's time to > vote... Which framework are you now using? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 5 21:14:41 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 05 Jan 2001 15:14:41 -0500 Subject: [Python-Dev] Add __exports__ to modules Message-ID: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Please have a look at this SF patch: http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 This implements control over which names defined in a module are externally visible: if there's a variable __exports__ in the module, it is a list of identifiers, and any access from outside the module to names not in the list is disallowed. This affects access using the getattr and setattr protocols (which raise AttributeError for disallowed names), as well as "from M import v" (which raises ImportError). I like it. This has been asked for many times. Does anybody see a reason why this should *not* be added? Tim remarked that introducing this will prompt demands for a similar feature on classes and instances, where it will be hard to implement without causing a bit of a slowdown. It causes a slight slowdown (an extra dictionary lookup for each use of "M.v") even when it is not used, but for accessing module variables that's acceptable. I'm not so sure about instance variable references. 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu Fri Jan 5 21:19:55 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 5 Jan 2001 15:19:55 -0500 (EST)
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <200101052001.PAA20238@cj20424-a.reston1.va.home.com>
References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com>
Message-ID: <14934.11371.879059.610988@localhost.localdomain>

If anyone is interested in experimenting with a test suite, here is a
summary of the code coverage for the current regression test suite as
run on my Linux box. Pick a module with low code coverage and your
experiment can also improve the regression test suite.

Jeremy

 67.42%    798 Modules/arraymodule.c
 74.39%    773 Modules/audioop.c
 81.84%    380 Modules/binascii.c
 62.36%    449 Modules/bsddbmodule.c
 78.29%    152 Modules/cmathmodule.c
 67.89%    246 Modules/_codecsmodule.c
 47.41%   2647 Modules/cPickle.c
 87.50%      8 Modules/cryptmodule.c
 64.34%    272 Modules/cStringIO.c
  0.00%   1351 Modules/_cursesmodule.c
  0.00%    202 Modules/_curses_panel.c
 99.28%    139 Modules/errnomodule.c
 30.71%    127 Modules/fcntlmodule.c
 81.90%    315 Modules/gcmodule.c
  0.00%      4 Modules/getbuildinfo.c
 47.29%    277 Modules/getpath.c
 72.22%     54 Modules/grpmodule.c
 79.95%    419 Modules/imageop.c
  0.00%     11 Modules/../Include/cStringIO.h
 13.25%    234 Modules/linuxaudiodev.c
 14.80%    223 Modules/_localemodule.c
 30.66%    137 Modules/main.c
 73.20%     97 Modules/mathmodule.c
 98.39%    124 Modules/md5c.c
 69.70%     66 Modules/md5module.c
 48.62%    362 Modules/mmapmodule.c
 66.22%     74 Modules/newmodule.c
 84.91%     53 Modules/operator.c
 50.57%   1236 Modules/parsermodule.c
  0.00%    350 Modules/pcremodule.c
 28.88%   1077 Modules/posixmodule.c
 82.05%     39 Modules/pwdmodule.c
 77.96%    431 Modules/pyexpat.c
  0.00%   1876 Modules/pypcre.c
 50.00%      2 Modules/python.c
  0.00%    189 Modules/readline.c
 78.35%    425 Modules/regexmodule.c
 72.93%    931 Modules/regexpr.c
  0.00%     81 Modules/resource.c
 76.98%    443 Modules/rgbimgmodule.c
 82.70%    289 Modules/rotormodule.c
 82.47%    291 Modules/selectmodule.c
 85.10%    208 Modules/shamodule.c
 81.52%    276 Modules/signalmodule.c
 51.18%    678 Modules/socketmodule.c
 78.64%   1105 Modules/_sre.c
 69.67%    689 Modules/stropmodule.c
 80.49%    656 Modules/structmodule.c
  4.88%    123 Modules/termios.c
 60.71%    140 Modules/threadmodule.c
 68.78%    205 Modules/timemodule.c
 76.92%     65 Modules/ucnhash.c
 87.50%     16 Modules/unicodedatabase.c
 65.83%    120 Modules/unicodedata.c
 68.81%    420 Modules/zlibmodule.c
 64.68%   1005 Objects/abstract.c
 18.77%    261 Objects/bufferobject.c
 68.77%   1204 Objects/classobject.c
 27.59%     58 Objects/cobject.c
 59.41%    271 Objects/complexobject.c
 78.32%    678 Objects/dictobject.c
 52.14%    723 Objects/fileobject.c
 80.43%    368 Objects/floatobject.c
 84.86%    185 Objects/frameobject.c
 60.40%    149 Objects/funcobject.c
 78.68%    455 Objects/intobject.c
 77.66%    779 Objects/listobject.c
 81.17%   1142 Objects/longobject.c
 50.68%    148 Objects/methodobject.c
 58.82%    136 Objects/moduleobject.c
 76.50%    549 Objects/object.c
 15.24%    105 Objects/rangeobject.c
 41.03%     78 Objects/sliceobject.c
 76.63%   1797 Objects/stringobject.c
 77.00%    287 Objects/tupleobject.c
 22.22%     18 Objects/typeobject.c
 84.26%    108 Objects/unicodectype.c
 66.61%   2743 Objects/unicodeobject.c
 90.79%     76 Parser/acceler.c
  0.00%     28 Parser/bitset.c
  0.00%     67 Parser/firstsets.c
 18.18%     22 Parser/grammar1.c
  0.00%    139 Parser/grammar.c
  0.00%     30 Parser/intrcheck.c
  0.00%     38 Parser/listnode.c
  0.00%      2 Parser/metagrammar.c
  0.00%     63 Parser/myreadline.c
 90.70%     43 Parser/node.c
 82.26%    124 Parser/parser.c
 79.38%     97 Parser/parsetok.c
  0.00%    366 Parser/pgen.c
  0.00%     85 Parser/pgenmain.c
  0.00%     60 Parser/printgrammar.c
 76.70%    588 Parser/tokenizer.c
 62.31%   1231 Python/bltinmodule.c
 76.55%   2021 Python/ceval.c
 64.78%    230 Python/codecs.c
 73.85%   2367 Python/compile.c
 76.67%     30 Python/dynload_shlib.c
 75.75%    301 Python/errors.c
 65.59%    401 Python/exceptions.c
  0.00%     31 Python/frozenmain.c
 56.83%    776 Python/getargs.c
100.00%      2 Python/getcompiler.c
100.00%      2 Python/getcopyright.c
 80.00%      5 Python/getmtime.c
 15.62%     32 Python/getopt.c
100.00%      2 Python/getplatform.c
100.00%      4 Python/getversion.c
 61.78%   1167 Python/import.c
 66.67%     42 Python/importdl.c
 51.35%    483 Python/marshal.c
 60.58%    274 Python/modsupport.c
 88.73%     71 Python/mystrtoul.c
  0.00%      2 Python/pyfpe.c
 91.15%    113 Python/pystate.c
 37.80%    635 Python/pythonrun.c
  0.00%      5 Python/sigcheck.c
 12.67%    150 Python/structmember.c
 53.87%    323 Python/sysmodule.c
100.00%      5 Python/thread.c
 53.47%    144 Python/thread_pthread.h
 21.74%    138 Python/traceback.c
 58.65%  48417 TOTAL

From tim.one at home.com Fri Jan 5 21:46:10 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 5 Jan 2001 15:46:10 -0500
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: <14934.6890.160122.384692@beluga.mojam.com>
Message-ID:

[Skip Montanaro]
> What do you think contributes to fileinput's relative disfavor?

Only half jokingly, because I never use it , and I don't think Fredrik
or Alex Martelli do either. That means it rarely gets mentioned by the
c.l.py reply bots. Plus it's not *used* anywhere in the Python
distribution, so nobody stumbles into it that way either. Plus the
docs require more than one line to explain what it does, and get bogged
down describing the Awk-like (Perl took this from Awk) convolutions
before the simplest (one explicitly named file) case. It *is* regularly
mentioned in the eternal "while 1:" debate, but that's it.

> This whole thread on Python's file reading performance was started
> by the eternal whine "why is Python so much slower than Perl?"

No, it started with Guido's objections to Jeff's xreadlines patch. I
dragged Perl into it -- because, like it or not, that was the right
thing to do .

> which really means why is
>
>     line = f.readline()
>     while line:
>         process(line)
>
> so much slower than whatever that thing is in Perl that everybody
> uses as the be-all-end-all performance benchmark (something with
> <> in it).

"<FILE>" is simply Perl's way of spelling Python's FILE.readline() (and
FILE.readlines(), when <FILE> appears in an array context; and
FILE.read() when Perl's Awkish "record separator" is disabled; and
...). "<>" without an explicit filehandle does all the
inherited-from-Awk magic with argv, else that stuff doesn't come into
play. "<>" (without a filehandle) seems rarely used in Perl practice,
though, *except* in support of

    your_shell_prompt> some_perl_script < some_file

That is, "<>" is usually used simply as an abbreviation for <STDIN>,
and I bet *most* Perl programmers don't even know "<>" is more general
than that.

> Given that fileinput is supposed to make the I/O loop in Python more
> familiar to those people wandering over from Perl (at least in part),
> you'd think that people would naturally gravitate to it.

I guess you didn't actually read the timing results . Really, it's
been an outrageously slow way to do input. That's better now, and I'm
much more likely now than I used to be to use

    for line in fileinput.input('file'):

instead of

    f = open('file')
    while 1:
        line = f.readline()
        if not line:
            break

The relative attraction of the former is obvious if it's reasonably
quick. I don't really have any use for the Awk complications (note
that I'm running on Windows, though, and the shells here don't expand
wildcards -- the Awk gimmicks are much more useful on Unix systems).

> Would it benefit from some exposure in the Python tutorial?

Heh -- that's a tough one. The *simplest* case is the only one
deserving of promotion. But in that case, Jeff's xreadlines is about
as convenient and much quicker. I bet we'll all be afraid to change
the tutorial to mention either <0.9 wink>.

> Is it fast enough now to warrant the extra exposure?

Don't know. It's the same speed as "while 1:" on *my* box now, but
still 3x slower than the double-loop method.
> just-whining-out-loud-ly y'rs

so-do-*you*-want-to-use-it-now?-ly y'rs - tim

From thomas at xs4all.net Fri Jan 5 22:19:42 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 5 Jan 2001 22:19:42 +0100
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: ; from tim.one@home.com on Fri, Jan 05, 2001 at 03:46:10PM -0500
References: <14934.6890.160122.384692@beluga.mojam.com>
Message-ID: <20010105221942.J2467@xs4all.nl>

On Fri, Jan 05, 2001 at 03:46:10PM -0500, Tim Peters wrote:

> "<>" (without a filehandle) seems
> rarely used in Perl practice, though, *except* in support of
>
>     your_shell_prompt> some_perl_script < some_file
>
> That is, "<>" is usually used simply as an abbreviation for <STDIN>, and I
> bet *most* Perl programmers don't even know "<>" is more general than that.

Well, I can't say anything about *most* Perl programmers, but all Perl
programmers I know (including me) know damned well what <> does, and
use it frequently. And in all the ways: no arguments meaning <STDIN>, a
list of files meaning open those files one at a time, using - to
include stdin in that list, accessing the filename and linenumber, etc.

None of them can be called newbies, though. But then, I like using
Python's fileinput, too, so maybe I'm just weird :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From ping at lfw.org Fri Jan 5 23:01:53 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 5 Jan 2001 16:01:53 -0600 (CST)
Subject: [Python-Dev] RE: fileinput.py
In-Reply-To: <20010105221942.J2467@xs4all.nl>
Message-ID:

On Fri, 5 Jan 2001, Thomas Wouters wrote:
> On Fri, Jan 05, 2001 at 03:46:10PM -0500, Tim Peters wrote:
> > That is, "<>" is usually used simply as an abbreviation for <STDIN>, and I
> > bet *most* Perl programmers don't even know "<>" is more general than that.
>
> Well, I can't say anything about *most* Perl programmers, but all Perl
> programmers I know (including me) know damned well what <> does, and use it
> frequently. And in all the ways: no arguments meaning <STDIN>, a list of
> files meaning open those files one at a time, using - to include stdin in
> that list, accessing the filename and linenumber, etc.

I was just about to chime in and say the same thing. I don't even
program in Perl any more, and i still remember all the ways that <>
works. For text-processing scripts, it's unbeatable. It does pretty
much exactly everything you want, and the idiom

    while (<>) {
        ...
    }

is simple, quickly learned, frequently used, and instantly recognizable.

    import sys
    if len(sys.argv) > 1:
        file = open(sys.argv[1])
    else:
        file = sys.stdin
    while 1:
        line = file.readline()
        if not line:
            break
        ...

is much more complex, harder to explain, harder to learn, and runs slower.

I have two separate suggestions:

1. Include 'sys' in builtins. It's silly to have to 'import sys'
   just to be able to see sys.argv and sys.stdin.

2. Put fileinput.input() in sys.

With both, the while (<>) idiom becomes:

    for line in sys.input():
        ...

-- ?!ng

"This code is better than any code that doesn't work has any right to be."
    -- Roger Gregory, on Xanadu

From skip at mojam.com Fri Jan 5 23:19:36 2001
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 5 Jan 2001 16:19:36 -0600 (CST)
Subject: [Python-Dev] unit testing bake-off
In-Reply-To: <14934.11371.879059.610988@localhost.localdomain>
References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com> <14934.11371.879059.610988@localhost.localdomain>
Message-ID: <14934.18552.749081.871226@beluga.mojam.com>

    Jeremy> If anyone is interested in experimenting with a test suite, here
    Jeremy> is a summary of the code coverage for the current regression
    Jeremy> test suite as run on my Linux box.

Speaking of which, I am still running my nightly code coverage thing
(still with warts) whose results are available at

    http://musi-cal.mojam.com/~skip/python/Python/dist/src/

Does anyone care? Should I turn it off?
Skip From thomas at xs4all.net Sat Jan 6 00:18:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 6 Jan 2001 00:18:58 +0100 Subject: [Python-Dev] RE: fileinput.py In-Reply-To: ; from ping@lfw.org on Fri, Jan 05, 2001 at 04:01:53PM -0600 References: <20010105221942.J2467@xs4all.nl> Message-ID: <20010106001858.B402@xs4all.nl> On Fri, Jan 05, 2001 at 04:01:53PM -0600, Ka-Ping Yee wrote: > while (<>) { > ... > } > is simple, quickly learned, frequently used, and instantly recognizable. > import sys > if len(sys.argv) > 1: > file = open(sys.argv[1]) > else: > file = sys.stdin > while 1: > line = file.readline() > if not line: > break > ... ... Except that it can take more than one filename, and will do the one after another, and that it takes "-" as a filename for stdin. Doing it in a script is not dead simple, unless you open up all files at once (which can be harmful, and Perl, for one, doesn't do) or you do most of the work fileinput does. That is why I use fileinput (and while-diamond) -- I might not need it now, but when I do need it, it already works :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
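
[Editorial sketch, not part of the thread.] The behaviour Thomas describes (several filenames read one after another, "-" standing for stdin, access to the current filename and line number) is what the standard fileinput module provides. A minimal illustration in present-day Python 3 syntax rather than the Python 2.0 of the thread; the helper name number_lines is made up for this example:

```python
import fileinput
import os


def number_lines(paths):
    """Return (file name, line number within that file, text) for every line.

    number_lines is a hypothetical helper written for this illustration.
    """
    rows = []
    # fileinput.input() reads the named files one after another; a "-"
    # entry means standard input, and an empty list also falls back to
    # standard input.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            rows.append((os.path.basename(stream.filename()),
                         stream.filelineno(),
                         line.rstrip("\n")))
    return rows
```

A script would typically call number_lines(sys.argv[1:]), which gets close to the Perl "while (<>)" loop without opening all the files up front.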
From moshez at zadka.site.co.il Sat Jan 6 12:00:33 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Sat, 6 Jan 2001 13:00:33 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Message-ID: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> On Fri, 05 Jan 2001 15:14:41 -0500, Guido van Rossum wrote: > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). Ummmmm.....why do we want this? What's wrong with the current suggestion of using "_"? __exports__ feels somehow wrong to me. None of the rest of Python has any access control, and I really like that. A big -1 from me, for what it's worth. > I like it. I'm surprised. Why do you like that? > This has been asked for many times. So has adding curly-braces as control structure, with all due respect. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From billtut at microsoft.com Sat Jan 6 04:43:06 2001 From: billtut at microsoft.com (Bill Tutt) Date: Fri, 5 Jan 2001 19:43:06 -0800 Subject: [Python-Dev] Add __exports__ to modules Message-ID: <58C671173DB6174A93E9ED88DCB0883DB8637E@red-msg-07.redmond.corp.microsoft.com> I think I'm with Moshe on this one, whats wrong with just using underscores (__) to play the hiding game. Here's my silly language suggestion for this week: with self: .bar = foo bar.blah = .fubar .bar = .bar + 1 # etc.... 
Bill From skip at mojam.com Sat Jan 6 05:15:12 2001 From: skip at mojam.com (Skip Montanaro) Date: Fri, 5 Jan 2001 22:15:12 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <14934.39888.908416.983794@beluga.mojam.com> > On Fri, 05 Jan 2001 15:14:41 -0500, Guido van Rossum wrote: > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). I have to agree with Moshe. If __exports__ is implemented for modules we'll have multiple, different access control mechanisms for different things, some of which thoughtful programmers would be able to get around, some of which they wouldn't. Here are the ways I'm aware of to control attribute visibility (there may be others - I don't usually delve too deeply into this stuff): * preface module globals with "_": This just prevents those globals from being added to the current namespace when a programmer executes "from module import *". Programmers can workaround this by attribute access through the module object or by explicitly importing it: "from module import _foo" works, yes? * preface class or instance attributes with "__": This just mangles the name by prefacing the visible name with _. The programmer can still access it by knowing the simple name mangling rule. 
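
[Editorial sketch, not part of the thread.] The two conventions in the list above can be shown in a few lines. This uses present-day Python 3 syntax; the module name demo and the names _hidden, shown and Account are invented for the illustration, and the in-memory module is only a stand-in for a real file on disk:

```python
import sys
import types

# Build a throwaway module in memory; "demo", "_hidden" and "shown" are
# invented names used only for this illustration.
demo = types.ModuleType("demo")
exec("_hidden = 1\nshown = 2\n", demo.__dict__)
sys.modules["demo"] = demo

# "from demo import *" leaves out names with a leading underscore
# (this module defines no __all__) ...
star = {}
exec("from demo import *", star)
star_has_hidden = "_hidden" in star
star_has_shown = "shown" in star

# ... but an explicit import, or plain attribute access, still reaches them.
explicit = {}
exec("from demo import _hidden", explicit)
explicit_value = explicit["_hidden"]
attribute_value = demo._hidden


class Account:
    def __init__(self):
        # Inside the class body, __balance is rewritten to _Account__balance.
        self.__balance = 10


acct = Account()
plain_name_visible = hasattr(acct, "__balance")
mangled_value = acct._Account__balance
```

In both cases the name is only tucked out of the way: anyone who knows the rule can still reach the value, which is the point the message goes on to make.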
In both cases the programmer can still get at the attribute value when necessary. If you were to add some sort of access control to module globals, I would have thought it would have been along the same lines as the existing mechanisms in place to "hide" class/instance attributes. Would it be possible (or desirable) to add the name mangling restriction to module globals as an alternative to this more restrictive implementation? What about the chances that class/instance attribute hiding will get more restrictive in the future? Finally, are the motivations for wanting to restrict access to module globals and class/instance attributes that much different from one another that they call for fundamentally different mechanisms? Skip From barry at digicool.com Sat Jan 6 06:15:20 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sat, 6 Jan 2001 00:15:20 -0500 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <14934.43496.322436.612746@anthem.wooz.org> I'm -0 on this, largely for the reasons already brought up: if modules grow __exports__ then there will be pressure to add it to classes, and modules already have a limited version of access control through leading underscore names. I might be more positive on the addition if __exports__ were added to classes, because at least there'd be a consistently stronger fence added to name access rules that prevented even consenting adults from fiddling with the naughty bits. 
-Barry From nas at arctrix.com Sat Jan 6 00:20:58 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 5 Jan 2001 15:20:58 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <14934.43496.322436.612746@anthem.wooz.org>; from barry@digicool.com on Sat, Jan 06, 2001 at 12:15:20AM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> Message-ID: <20010105152058.A6016@glacier.fnational.com> On Sat, Jan 06, 2001 at 12:15:20AM -0500, Barry A. Warsaw wrote: > I might be more positive on the addition if __exports__ were added to > classes, because at least there'd be a consistently stronger fence > added to name access rules that prevented even consenting adults from > fiddling with the naughty bits. I think you, Skip and Moshe are missing a big advantage of having the __exports__ mechanism. It should allow some attribute access inside of modules to become faster (like LOAD_FAST for locals). I think that optimization could be implemented without too much difficultly. I've never channeled Guido before so I could be off the mark. If the only advantage is encapsulation then I'm -0. Neil From barry at digicool.com Sat Jan 6 08:09:31 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sat, 6 Jan 2001 02:09:31 -0500 Subject: [Python-Dev] PEP 232 update and patch Message-ID: <14934.50347.851118.581484@anthem.wooz.org> I've updated PEP 232, function attributes, and uploaded a patch to SF. I couldn't coax cvs diff into including the new files Lib/test/test_funcattrs.py and Lib/test/output/test_funcattrs so I'll attach them below. PEP 232: http://python.sourceforge.net/peps/pep-0232.html SF patch #103123: http://sourceforge.net/patch/?func=detailpatch&patch_id=103123&group_id=5470 Enjoy, -Barry -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: test_funcattrs.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_funcattrs URL: From martin at loewis.home.cs.tu-berlin.de Sat Jan 6 11:06:49 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 6 Jan 2001 11:06:49 +0100 Subject: [Python-Dev] PEP 208 comment Message-ID: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> I just studied PEP 208 for the first time. Overall, it seems all natural and nice, but there is one one aspect I'd like to see changed: the naming of the type flag. Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a program should be called "new". The flag will still be there five years from now, but it won't be new anymore. Also, while the flag indicates that style of the numbers is new, it does not say what it does. So I propose to rename it; if nobody finds a better name, I propose to call it Py_TPFLAGS_UNCOERCED. Regards, Martin From thomas at xs4all.net Sat Jan 6 13:52:19 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 6 Jan 2001 13:52:19 +0100 Subject: [Python-Dev] PEP 208 comment In-Reply-To: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sat, Jan 06, 2001 at 11:06:49AM +0100 References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> Message-ID: <20010106135219.L2467@xs4all.nl> On Sat, Jan 06, 2001 at 11:06:49AM +0100, Martin v. Loewis wrote: > Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a > program should be called "new". The flag will still be there five > years from now, but it won't be new anymore. Also, while the flag > indicates that style of the numbers is new, it does not say what it > does. So I propose to rename it; if nobody finds a better name, I > propose to call it Py_TPFLAGS_UNCOERCED. Wrong name. 
The TPFLAGs only indicate whether a struct is large enough to contain a
particular member, not whether that member is going to contain or do
anything. 'Py_TPFLAGS_HASCOERCE' or some such would seem more
appropriate to me.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From martin at loewis.home.cs.tu-berlin.de Sat Jan 6 14:36:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 6 Jan 2001 14:36:39 +0100
Subject: [Python-Dev] PEP 208 comment
In-Reply-To: <20010106135219.L2467@xs4all.nl> (message from Thomas Wouters on Sat, 6 Jan 2001 13:52:19 +0100)
References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <20010106135219.L2467@xs4all.nl>
Message-ID: <200101061336.f06DadP02895@mira.informatik.hu-berlin.de>

> Wrong name. The TPFLAGs only indicate whether a struct is large enough to
> contain a particular member, not whether that member is going to contain or
> do anything.

That may have been the original intention; *this* specific flag is not
of that kind. Please look at abstract.c:binary_op1, which has

    if (v->ob_type->tp_as_number != NULL && NEW_STYLE_NUMBER(v)) {
        slot = NB_BINOP(v->ob_type->tp_as_number, op_slot);
        if (*slot) {
            x = (*slot)(v, w);
            if (x != Py_NotImplemented) {
                return x;
            }
            Py_DECREF(x); /* can't do it */
        }
        if (v->ob_type == w->ob_type) {
            goto binop_error;
        }
    }

Here, no additional member was added: there always was tp_as_number,
and that also supported all possible op_slot values. What is new here
is that the slot may be called even if v and w have different types;
that was not allowed before the PEP 208 changes.

Yet it tests for NEW_STYLE_NUMBER(v), which is

    PyType_HasFeature((o)->ob_type, Py_TPFLAGS_NEWSTYLENUMBER)

So the presence of this flag is indeed a promise that a specific member
will do something that it normally wouldn't do.

> 'Py_TPFLAGS_HASCOERCE' or some such would seem more appropriate to
> me.
Well, all numbers still have coercion - it just may not be used if the flag is present. It's not a matter of having or not having something (well, only the "new style" numbers may have nb_cmp, but calling it Py_TPFLAGS_HAS_NB_CMP would be besides the point, IMO). Anyway, I don't want to defend my version too much - I just want to request that the current name is changed to *something* more descriptive. Regards, Martin From skip at mojam.com Sat Jan 6 15:40:30 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 6 Jan 2001 08:40:30 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010105152058.A6016@glacier.fnational.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> <20010105152058.A6016@glacier.fnational.com> Message-ID: <14935.11870.360839.235102@beluga.mojam.com> Neil> I think you, Skip and Moshe are missing a big advantage of having Neil> the __exports__ mechanism. It should allow some attribute access Neil> inside of modules to become faster (like LOAD_FAST for locals). I Neil> think that optimization could be implemented without too much Neil> difficultly. True enough, that hadn't occurred to me. Knowing that now, I still don't think consistency of the interface should suffer as a result of under-the-covers performance gains. Skip From skip at mojam.com Sat Jan 6 15:42:25 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 6 Jan 2001 08:42:25 -0600 (CST) Subject: [Python-Dev] Re: [Patches] [Patch #103123] PEP 232 implementation (function attributes) In-Reply-To: References: Message-ID: <14935.11985.972526.108391@beluga.mojam.com> Oooo... 
I tried went to check out Barry's function attribute patch at http://sourceforge.net/patch/?func=detailpatch&patch_id=103123&group_id=5470 and got Fatal error: Call to a member function on a non-object in /usr/local/htdocs/alexandria/www/patch/index.php on line 55 in response. Any idea whazzup? Skip From akuchlin at cnri.reston.va.us Sat Jan 6 15:47:59 2001 From: akuchlin at cnri.reston.va.us (Andrew Kuchling) Date: Sat, 6 Jan 2001 09:47:59 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14934.18552.749081.871226@beluga.mojam.com>; from skip@mojam.com on Fri, Jan 05, 2001 at 04:19:36PM -0600 References: <14934.7465.360749.199433@localhost.localdomain> <004c01c07751$6eed84d0$e46940d5@hagrid> <200101052001.PAA20238@cj20424-a.reston1.va.home.com> <14934.11371.879059.610988@localhost.localdomain> <14934.18552.749081.871226@beluga.mojam.com> Message-ID: <20010106094759.A13723@newcnri.cnri.reston.va.us> On Fri, Jan 05, 2001 at 04:19:36PM -0600, Skip Montanaro wrote: >Speaking of which, I am still running my nightly code coverage thing (still >with warts) whose results are available at > http://musi-cal.mojam.com/~skip/python/Python/dist/src/ Add a link to it from the Python development pages on SourceForge; I suspect much of the problem is that people don't remember the URL for it, and don't want to dig through the archives to find it. --amk From mal at lemburg.com Sat Jan 6 16:15:27 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 06 Jan 2001 16:15:27 +0100 Subject: [Python-Dev] PEP 208 comment References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> Message-ID: <3A57368F.FC01F78@lemburg.com> "Martin v. Loewis" wrote: > > I just studied PEP 208 for the first time. Overall, it seems all > natural and nice, but there is one one aspect I'd like to see changed: > the naming of the type flag. > > Currently, it is called Py_TPFLAGS_NEWSTYLENUMBER. IMHO, nothing in a > program should be called "new". 
The flag will still be there five > years from now, but it won't be new anymore. Also, while the flag > indicates that style of the numbers is new, it does not say what it > does. So I propose to rename it; if nobody finds a better name, I > propose to call it Py_TPFLAGS_UNCOERCED. Given that the design could well be applied to other slots as well, I think you've got a point there. The idea behind the flag was to signal that slots will no longer make object type assumptions which they could previously. Right now, only numeric types support this feature. In the future I could imagine strings and other types involving coercion would also want to use the feature. Given this design idea, how about calling the flag Py_TPFLAGS_CHECKTYPES ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Sat Jan 6 16:35:20 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 6 Jan 2001 09:35:20 -0600 (CST) Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error Message-ID: <14935.15160.130742.390323@beluga.mojam.com> You know, I thought of something (which was probably already obvious to the rest of you) while perusing Barry's patch. Attaching function attributes to unbound methods could really function like C++ static data members. You'd have to write accessor functions to make setting the attributes look clean, but that wouldn't be all bad. Precisely because you couldn't modify them through the bound method, there'd be no chance you could make the mistake of modifying them that way and having them transmogrify into instance attributes.
Here's a quick example:

    class C:
        def __init__(self):
            self.just_resting()
        __init__.howmany = 0

        def __del__(self):
            self.hes_dead()

        def hes_dead(self):
            C.__init__.howmany -= 1

        def just_resting(self):
            C.__init__.howmany += 1

        def howmany(self):
            return C.__init__.howmany

    def howmany():
        return C.__init__.howmany

    c = C()
    print c.howmany()
    d = C()
    print d.howmany()
    del c
    print d.howmany()

After applying Barry's patch, if I execute this script from the command line
it displays

    1
    2
    1

as one would expect, but then catches an attribute error during cleanup:

    Exception exceptions.AttributeError: "'None' object has no attribute '__init__'" in ignored

If I add "del d" to the end of the script the exception disappears. I
suspect there is a cleanup order problem of some sort. It seems like C is
getting reclaimed before d (not possible), or that d's __class__ attribute
is set to None before its __del__ method is called. Is this a known problem
or something introduced by Barry's patch?

Skip

From barry at digicool.com Sat Jan 6 17:09:47 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Sat, 6 Jan 2001 11:09:47 -0500
Subject: [Python-Dev] Re: [Patches] [Patch #103123] PEP 232 implementation (function attributes)
References: <14935.11985.972526.108391@beluga.mojam.com>
Message-ID: <14935.17227.634808.132783@anthem.wooz.org>

>>>>> "SM" == Skip Montanaro writes:

    SM> and got

    | Fatal error: Call to a member function on a non-object in
    | /usr/local/htdocs/alexandria/www/patch/index.php on line 55

    SM> in response. Any idea whazzup?

I got a similar error on SF when I tried to find my patch on the
patches page. I still think the patch manager just gives you no way
to see all the patches when there's more than what fits on one page.

The error dropped a cookie in my lap that logged me out too. After I
logged in again, it all seemed to work.

-Barry

From martin at loewis.home.cs.tu-berlin.de Sat Jan 6 16:20:51 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 6 Jan 2001 16:20:51 +0100 Subject: [Python-Dev] PEP 208 comment In-Reply-To: <3A57368F.FC01F78@lemburg.com> (mal@lemburg.com) References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <3A57368F.FC01F78@lemburg.com> Message-ID: <200101061520.f06FKpu03218@mira.informatik.hu-berlin.de> > Given this design idea, how about calling the flag > Py_TPFLAGS_CHECKTYPES ?! Sounds good to me. Martin From thomas at xs4all.net Sat Jan 6 17:47:24 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 6 Jan 2001 17:47:24 +0100 Subject: [Python-Dev] PEP 208 comment In-Reply-To: <200101061336.f06DadP02895@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Sat, Jan 06, 2001 at 02:36:39PM +0100 References: <200101061006.f06A6nn01722@mira.informatik.hu-berlin.de> <20010106135219.L2467@xs4all.nl> <200101061336.f06DadP02895@mira.informatik.hu-berlin.de> Message-ID: <20010106174724.M2467@xs4all.nl> On Sat, Jan 06, 2001 at 02:36:39PM +0100, Martin v. Loewis wrote: > That may have been the original intention; *this* specific flag is not > of that kind. Please look at abstract.c:binary_op1, which has You're right, I stand corrected, I retract my proposal :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Sat Jan 6 23:05:23 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 06 Jan 2001 17:05:23 -0500 Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error In-Reply-To: Your message of "Sat, 06 Jan 2001 09:35:20 CST." <14935.15160.130742.390323@beluga.mojam.com> References: <14935.15160.130742.390323@beluga.mojam.com> Message-ID: <200101062205.RAA23603@cj20424-a.reston1.va.home.com> > You know, I thought of something (which was probably already obvious to the > rest of you) while perusing Barry's patch. Attaching function attributes to > unbound methods could really function like C++ static data members. 
You'd > have to write accessor functions to make setting the attributes look clean, > but that wouldn't be all bad. Precisely because you couldn't modify them > through the bound method, there's be no chance you could make the mistake of > modifying them that way and having them transmogrify into instance > attributes. > > Here's a quick example: > > class C: > def __init__(self): > self.just_resting() > __init__.howmany = 0 > > def __del__(self): > self.hes_dead() > > def hes_dead(self): > C.__init__.howmany -= 1 > > def just_resting(self): > C.__init__.howmany += 1 > > def howmany(self): > return C.__init__.howmany > > def howmany(): > return C.__init__.howmany > > c = C() > print c.howmany() > d = C() > print d.howmany() > del c > print d.howmany() Skip, I don't find this better than the existing solution, which uses C._howmany instead of C.__init__.howmany. True, you can access it as self._howmany and if you assign to self._howmany you'd transform it into an instance attribute -- but that falls in the "then don't do that" category. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Sat Jan 6 23:14:44 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 17:14:44 -0500 Subject: [Python-Dev] Rehabilitating fgets Message-ID: [Guido] > ... > Unfortunately we can't use fgets(), even if it were faster than > getline(), because it doesn't tell how many characters it read. Let's think about that a little harder, because it appears to be our only hope on Windows (the MS fgets isn't optimized like the Perl inner loop, but it does lock/unlock the stream only at routine entry/exit, and uses a hidden non-locking (== much faster) variant of getc in the guts -- we've seen that the "locking" part of MS getc accounts for 17 of 30 seconds in my test case). 
> On files containing null bytes, readline() is supposed to treat > these like any other character; fgets does too (at least it does on Windows, and I believe that's std behavior). The problem is that it also makes up a null byte on its own. > If your input is "abc\0def\nxyz\n", the first readline() call > should return "abc\0def\n". Yes. > But with fgets(), you're left to look in the returned buffer for > a null byte, Also yes. But suppose I search "from the right", and ensure the buffer is free of null bytes before the fgets. For your input file above, fgets overwrites the initial 9 bytes of the buffer (assuming the buffer is at least 9 bytes long ...) with "abc\0def\n\0" and there's no problem if I search from the right. > and there's no way (in general) to distinguish this result from > an input file that only consisted of the three characters "abc". As above, I'm not convinced of that. The input file "abc" would overwrite the first four bytes of the buffer with "abc\0" and leave the tail end alone (well, the MS fgets leaves the tail alone, although I'm not sure ANSI C guarantees that). Of course I've *read* any number of Unix(tm) FAQs that also claim it's impossible, but I never believed them either . This extra buffer fiddling is surely an expense I don't want to pay, but the timing evidence on Windows so far says that I can probably search and/or copy the whole buffer 100 times and still be faster than enduring the threadsafe getc. Am I missing something obvious? From guido at python.org Sat Jan 6 23:33:00 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 06 Jan 2001 17:33:00 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Your message of "Sat, 06 Jan 2001 17:14:44 EST." References: Message-ID: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> [Tim suggests to use fgets(), preparing the buffer with non-null bytes, and searching for a null byte from the right.] If this is really sufficiently fast, I'd say, go for it. 
Looks bullet-proof as long as the source code to MSVCRT doesn't change. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Sat Jan 6 23:34:42 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 17:34:42 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Message-ID: [Tim, pondering] > ... But suppose I search "from the right", and ensure the buffer is > free of null bytes before the fgets. Even better, suppose I ensure the buffer is free of both null bytes and newlines before the fgets; then if I search from the *left* for a newline and find one, it must be that fgets found a line and it ends right there, and this should usually obtain. There's no need to search from the right unless I don't find a newline ... From skip at mojam.com Sun Jan 7 02:15:08 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 6 Jan 2001 19:15:08 -0600 (CST) Subject: [Python-Dev] function attributes as "true" class attributes & reclamation error In-Reply-To: <200101062205.RAA23603@cj20424-a.reston1.va.home.com> References: <14935.15160.130742.390323@beluga.mojam.com> <200101062205.RAA23603@cj20424-a.reston1.va.home.com> Message-ID: <14935.49948.574427.668588@beluga.mojam.com> Skip> Attaching function attributes to unbound methods could really Skip> function like C++ static data members.... Guido> Skip, I don't find this better than the existing solution, which Guido> uses C._howmany instead of C.__init__.howmany. It was more a "hey, I never thought of it quite that way" than a "hey, I think this would be a great new idiom". In fact, I believe the more important part of my note was the bit about the attribute error on exit. I'm sure function attributes will attract their fair share of abuse. 
;-) Skip From tim_one at email.msn.com Sun Jan 7 04:16:31 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 22:16:31 -0500 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow Message-ID: I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing. test_builtin fails because raw_input() isn't stripping a trailing newline. I've got my own code in this area that *may* be to blame, but I don't see how it could be. I note that fileobject.c's new function get_line_raw has the comment /* Internal routine to get a line for raw_input(): strip trailing '\n', raise EOFError if EOF reached immediately */ but the code doesn't look for a trailing newline (let alone strip one). From tim_one at email.msn.com Sun Jan 7 04:33:02 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 6 Jan 2001 22:33:02 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> Message-ID: > [Tim suggests to use fgets(), preparing the buffer with non-null > bytes, and searching for a null byte from the right.] [Guido] > If this is really sufficiently fast, I'd say, go for it. Looks > bullet-proof as long as the source code to MSVCRT doesn't change. :-) Surprise? Despite all the memsets, memchrs (looking for a newline), and one-at-a-time backward searches (looking for a null byte), it's a huge win on Windows: total 117615824 chars and 3237568 lines readlines_sizehint 9.550 9.578 using_fileinput 28.790 28.781 while_readline 13.120 13.134 The last one was 30.5 seconds before the fgets hackery. 
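
A minimal sketch of the buffer trick discussed in this thread, simulated in Python rather than taken from the fileobject.c change. `fake_fgets` and `read_line` are hypothetical names; `fake_fgets` only imitates the documented behaviour of C's fgets() (copy up to and including a newline, then append a null byte), and the sketch assumes, as the thread does, that bytes past the terminator are left untouched.

```python
def fake_fgets(buf, data, pos):
    """Hypothetical stand-in for C fgets(buf, len(buf), stream).

    Copies bytes from data[pos:] into buf, stopping after a newline or
    after len(buf) - 1 bytes, then writes a terminating null byte.
    Returns the new read position. (Real fgets returns NULL at EOF and
    leaves the buffer alone; this stand-in just stores an empty string.)
    """
    limit = len(buf) - 1
    n = 0
    while n < limit and pos < len(data):
        byte = data[pos]
        buf[n] = byte
        n += 1
        pos += 1
        if byte == 0x0A:  # newline ends the line
            break
    buf[n] = 0  # the null byte fgets "makes up" on its own
    return pos


def read_line(data, pos, bufsize=32):
    """Recover one chunk, embedded nulls included, without a byte count."""
    # Pre-fill with a byte that is neither null nor newline.
    buf = bytearray(b"\x01" * bufsize)
    pos = fake_fgets(buf, data, pos)
    # Any newline in the buffer must have come from fgets, and fgets
    # stops at the first one, so a search from the left finds the end.
    newline = buf.find(b"\n")
    if newline >= 0:
        return bytes(buf[:newline + 1]), pos
    # No newline: the rightmost null is the terminator fgets wrote.
    end = buf.rfind(b"\x00")
    return bytes(buf[:end]), pos
```

With the input "abc\0def\nxyz\n" the first call returns the whole first line, null byte included. A chunk that comes back without a trailing newline means either EOF or a line longer than the buffer, so a real caller would have to loop.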
I'll check it in tomorrow after sleeping on it (there's a large pile of messy endcases (not only does fgets() invent a null byte, it can't tell you whether it stopped reading due to EOF, so maybe the last line in the file ends with 10000 null bytes + no newline + exactly lines up with a buffer boundary -- etc); test_builtin is failing in a closely related area but nobody would have checked in code that failed a std test ; and it's been a frustrating day all around). i-want-my-cable-modem-back-now-ly y'rs - tim From esr at thyrsus.com Sun Jan 7 05:01:25 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sat, 6 Jan 2001 23:01:25 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: ; from tim_one@email.msn.com on Sat, Jan 06, 2001 at 10:33:02PM -0500 References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> Message-ID: <20010106230125.A29058@thyrsus.com> Tim Peters : > > [Tim suggests to use fgets(), preparing the buffer with non-null > > bytes, and searching for a null byte from the right.] No, I haven't forgotten about the curses autoconfig stuff. But... This mess reminds me. For some work I'm doing right now, it would be very useful if there were a way to query the end-of-file status of a file descriptor without actually doing a read. I don't see this ability anywhere in the 2.0 API. Questions: 1. Am I missing something obvious? 2. If the answer to 1 is that I am not, in fact, being a dumbass, what is the right way to support this? The obvious alternatives are an eof member (analogous to the existing `closed' member, or an eof() method. I favor the latter. 3. If we agree on a design, I'm willing to implement this at least for Unix. Should be a small project. -- Eric S. Raymond The direct use of physical force is so poor a solution to the problem of limited resources that it is commonly employed only by small children and great nations. 
-- David Friedman From skip at mojam.com Sun Jan 7 05:05:22 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 6 Jan 2001 22:05:22 -0600 (CST) Subject: [Python-Dev] readline module seems crippled - am I missing something? Message-ID: <14935.60162.726131.593211@beluga.mojam.com> For a more-or-less throwaway script I'm working on I need a little input function similar to Emacs's read-from-minibuffer, which accepts both a prompt and an initial string for the input buffer. Seems like I ought to be able to whip something up using readline, but it's not happening. GNU readline's docs aren't the greatest, but I thought this simple script would work: import readline readline.insert_text("default") x = raw_input("?") print x I expected to see an editable "default" displayed after the prompt and have x default to "default" if I just hit the return key. I see nothing displayed after the question mark, and x is the empty string if I just hit return. This does print "default": readline.insert_text("default") x = readline.get_line_buffer() print x so I know that insert_text and get_line_buffer seem to be working as intended. Looking at call_readline in Modules/readline.c I see nothing that would disrupt the line buffer before the call to readline(). Am I missing something totally obvious about how GNU readline works or the conditions under which readline is used (only at the interactive prompt?) or is some required bit of GNU readline not exposed through Python's readline module? Skip From tim.one at home.com Sun Jan 7 11:09:02 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 7 Jan 2001 05:09:02 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <20010106230125.A29058@thyrsus.com> Message-ID: [Eric S. Raymond] > ... > For some work I'm doing right now, it would be very useful if > there were a way to query the end-of-file status of a file > descriptor without actually doing a read. 
> > I don't see this ability anywhere in the 2.0 API. When someone says "API", I think "C API". In that case you can use feof(stream) directly, or whatever the heck your platform supports for handles (_eof(handle) on Windows, which I know is an OS you're secretly longing to master ). I don't believe there's a way to find out from Python short of trying to read, though. Well, I suppose you could try to compare f.tell() to the size, if you knew that f.tell() and "the size" made sense for f ... > 1. Am I missing something obvious? I don't know! I never asked Guido about this, and given that he's not on vacation now I'm not allowed to channel him. I would hazard a guess, though, that he thinks "you do or don't get something back when you read" is clearer than "you may or may not get something back when you read, regardless of which answer I give you in response to .eof() -- depending". The latter is particularly muddy in a threaded environment, even for plain old disk files. > 2. If the answer to 1 is that I am not, in fact, being a dumbass, > what is the right way to support this? The obvious alternatives > are an eof member (analogous to the existing `closed' member, or > an eof() method. I favor the latter. > > 3. If we agree on a design, I'm willing to implement this at least > for Unix. Should be a small project. I agree an .eof() method would be better than a data member. Note that whenever Python internals hit stream EOF today, they call clearerr(), so simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to make sure that feof() would never be useful <0.8 wink>. 
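
A rough sketch of the "compare f.tell() to the size" idea, written for a current Python rather than the 2.0 file object. `at_eof` and `demo` are hypothetical helpers, and the sketch assumes a regular disk file opened in binary mode that nobody else is appending to; it says nothing useful about pipes, sockets or terminals.

```python
import os
import tempfile


def at_eof(f):
    """Best-effort EOF test for a regular binary file; performs no read."""
    # Compare the logical read position with the size reported by the OS.
    return f.tell() >= os.fstat(f.fileno()).st_size


def demo():
    """Return at_eof() before, during and after reading a two-line file."""
    with tempfile.TemporaryFile() as f:  # binary mode by default
        f.write(b"one\ntwo\n")
        f.flush()
        f.seek(0)
        before = at_eof(f)
        f.readline()
        middle = at_eof(f)
        f.readline()
        after = at_eof(f)
    return before, middle, after
```

The answer can go stale as soon as another writer extends the file, which is the muddiness described above.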
one-of-life's-little-mysteries-ly y'rs - tim From gstein at lyra.org Sun Jan 7 11:46:54 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 02:46:54 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.96,2.97 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Fri, Jan 05, 2001 at 06:43:07AM -0800 References: Message-ID: <20010107024654.W17220@lyra.org> On Fri, Jan 05, 2001 at 06:43:07AM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv3183 > > Modified Files: > fileobject.c > Log Message: > Restructured get_line() for clarity and speed. > > - The raw_input() functionality is moved to a separate function. > > - Drop GNU getline() in favor of getc_unlocked(), which exists on more > platforms (and is even a tad faster on my system). The "configure" tests for getline() can be punted if we won't use it any more... Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Jan 7 13:27:57 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 04:27:57 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101052014.PAA20328@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 05, 2001 at 03:14:41PM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> Message-ID: <20010107042757.X17220@lyra.org> It feels wrong. Whatever happened to the "we're all adults here" mantra. Besides people asking for it, what is a good reason *for* it to be added? 
Cheers, -g On Fri, Jan 05, 2001 at 03:14:41PM -0500, Guido van Rossum wrote: > Please have a look at this SF patch: > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > This implements control over which names defined in a module are > externally visible: if there's a variable __exports__ in the module, > it is a list of identifiers, and any access from outside the module to > names not in the list is disallowed. This affects access using the > getattr and setattr protocols (which raise AttributeError for > disallowed names), as well as "from M import v" (which raises > ImportError). > > I like it. This has been asked for many times. Does anybody see a > reason why this should *not* be added? > > Tim remarked that introducing this will prompt demands for a similar > feature on classes and instances, where it will be hard to implement > without causing a bit of a slowdown. It causes a slight slowdown (an > extra dictionary lookup for each use of "M.v") even when it is not > used, but for accessing module variables that's acceptable. I'm not > so sure about instance variable references. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Greg Stein, http://www.lyra.org/ From guido at python.org Sun Jan 7 17:52:11 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 07 Jan 2001 11:52:11 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Sat, 06 Jan 2001 23:01:25 EST." <20010106230125.A29058@thyrsus.com> References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> <20010106230125.A29058@thyrsus.com> Message-ID: <200101071652.LAA31411@cj20424-a.reston1.va.home.com> > This mess reminds me. 
For some work I'm doing right now, it would be > very useful if there were a way to query the end-of-file status of a > file descriptor without actually doing a read. I hope you really mean file object (== wrapper around stdio FILE object). A file descriptor (small little integer in Unix) doesn't have a way to find this out. Even for file objects, it is typically only known that there's an EOF condition after a lowest-level read operation returned 0 bytes. So in effect you must still do a read in order to determine EOF status. I just ran a small test program, and fread() appears to set the eof status when it returns a short count. Normally, Python's read() uses fread() so this might be useful. However after a readline(), you can't know the eof status (unless the last line of the file doesn't end in a newline). > I don't see this ability anywhere in the 2.0 API. Questions: > > 1. Am I missing something obvious? > > 2. If the answer to 1 is that I am not, in fact, being a dumbass, what > is the right way to support this? The obvious alternatives are an > eof member (analogous to the existing `closed' member, or an eof() > method. I favor the latter. > > 3. If we agree on a design, I'm willing to implement this at least for > Unix. Should be a small project. Before adding an eof() method, can you explain what your program is trying to do? Is it reading from a pipe or socket? Then select() or poll() might be useful. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Sun Jan 7 19:30:32 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 13:30:32 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: ; from tim.one@home.com on Sun, Jan 07, 2001 at 05:09:02AM -0500 References: <20010106230125.A29058@thyrsus.com> Message-ID: <20010107133032.F4586@thyrsus.com> Tim Peters : > I agree an .eof() method would be better than a data member. 
Note that > whenever Python internals hit stream EOF today, they call clearerr(), so > simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to > make sure that feof() would never be useful <0.8 wink>. That's inconvenient, but only means the internal Python state flag that feof() would inspect would have to be checked after each read. -- Eric S. Raymond "...The Bill of Rights is a literal and absolute document. The First Amendment doesn't say you have a right to speak out unless the government has a 'compelling interest' in censoring the Internet. The Second Amendment doesn't say you have the right to keep and bear arms until some madman plants a bomb. The Fourth Amendment doesn't say you have the right to be secure from search and seizure unless some FBI agent thinks you fit the profile of a terrorist. The government has no right to interfere with any of these freedoms under any circumstances." -- Harry Browne, 1996 USA presidential candidate, Libertarian Party From esr at thyrsus.com Sun Jan 7 19:45:41 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 13:45:41 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101071652.LAA31411@cj20424-a.reston1.va.home.com>; from guido@python.org on Sun, Jan 07, 2001 at 11:52:11AM -0500 References: <200101062233.RAA23942@cj20424-a.reston1.va.home.com> <20010106230125.A29058@thyrsus.com> <200101071652.LAA31411@cj20424-a.reston1.va.home.com> Message-ID: <20010107134541.G4586@thyrsus.com> Guido van Rossum : > > This mess reminds me. For some work I'm doing right now, it would be > > very useful if there were a way to query the end-of-file status of a > > file descriptor without actually doing a read. > > I hope you really mean file object (== wrapper around stdio FILE > object). A file descriptor (small little integer in Unix) doesn't > have a way to find this out. You're right, my bad. 
> Even for file objects, it is typically only known that there's an EOF
> condition after a lowest-level read operation returned 0 bytes. So in
> effect you must still do a read in order to determine EOF status.
>
> I just ran a small test program, and fread() appears to set the eof
> status when it returns a short count. Normally, Python's read() uses
> fread() so this might be useful. However after a readline(), you
> can't know the eof status (unless the last line of the file doesn't
> end in a newline).

I considered trying a zero-length read() in Python, but this strikes me
as inelegant even if it would work.

> Before adding an eof() method, can you explain what your program is
> trying to do? Is it reading from a pipe or socket? Then select() or
> poll() might be useful.

Sadly, it's exactly the wrong case. Hmmm...omitting irrelevant details,
it's a situation where a markup file can contain sections in two different
languages. The design requires the first interpreter to exit on seeing
either EOF or a marker that says "switching to second language". For
reasons too complicated to explain, it would be best if the parser for
the first language didn't simply call the second parser.

The logic I wanted to write amounts to:

    while 1:
        line = fp.readline()
        if not line or line == "history":
            break
        interpret_in_language_1(line)

    if not fp.feof():
        while 1:
            line = fp.readline()
            if not line:
                break
            interpret_in_language_2(line)

I just tested the zero-length-read method. That worked. I guess I'll
use it.
-- 
Eric S. Raymond

"Today, we need a nation of Minutemen, citizens who are not only prepared to
take arms, but citizens who regard the preservation of freedom as the basic
purpose of their daily life and who are willing to consciously work and
sacrifice for that freedom."
        -- John F. Kennedy

From martin at loewis.home.cs.tu-berlin.de Sun Jan 7 19:45:15 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sun, 7 Jan 2001 19:45:15 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? Message-ID: <200101071845.f07IjFi01249@mira.informatik.hu-berlin.de> Authors of extension packages often find the need to auto-import some of their modules. This is often needed for registration, e.g. a codec author (like Tamito KAJIYAMA, who wrote the JapaneseCodecs package) may need to register a search function with codecs.register. This is currently only possible by writing into sitecustomize.py, which must be done by the system administrator manually. To enhance the service of site.py, I've written the patch http://sourceforge.net/patch/?func=detailpatch&patch_id=103134&group_id=5470 which treats lines in PTH files which start with "import" as statements and executes them, instead of appending these lines to sys.path. The patch is relatively small, but since it is an extension: Do I need to write a PEP for it? Regards, Martin From tismer at tismer.com Sun Jan 7 19:05:21 2001 From: tismer at tismer.com (Christian Tismer) Date: Sun, 07 Jan 2001 20:05:21 +0200 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <14934.43496.322436.612746@anthem.wooz.org> <20010105152058.A6016@glacier.fnational.com> <14935.11870.360839.235102@beluga.mojam.com> Message-ID: <3A58AFE1.3AB619BD@tismer.com> Skip Montanaro wrote: > > Neil> I think you, Skip and Moshe are missing a big advantage of having > Neil> the __exports__ mechanism. It should allow some attribute access > Neil> inside of modules to become faster (like LOAD_FAST for locals). I > Neil> think that optimization could be implemented without too much > Neil> difficultly. > > True enough, that hadn't occurred to me. Knowing that now, I still don't > think consistency of the interface should suffer as a result of > under-the-covers performance gains. 
Ok, vice versa: Given that we can support access control via __exports__ for modules, classes and instances as well, *and* if we can think up a scheme that allows a LOAD_FAST like speedup for all of these cases at the same time, then I would say +1, otherwise -0, half-hearted solution. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at python.org Sun Jan 7 22:13:01 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 07 Jan 2001 16:13:01 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Sun, 07 Jan 2001 13:30:32 EST." <20010107133032.F4586@thyrsus.com> References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> Message-ID: <200101072113.QAA32467@cj20424-a.reston1.va.home.com> > Tim Peters : > > I agree an .eof() method would be better than a data member. Note that > > whenever Python internals hit stream EOF today, they call clearerr(), so > > simply adding an feof() wrapper wouldn't suffice. Guido seemed to try to > > make sure that feof() would never be useful <0.8 wink>. > [ESR] > That's inconvenient, but only means the internal Python state flag > that feof() would inspect would have to be checked after each read. This was done because some platforms set feof() when there's still a possibility to read more (e.g. after an interactive user typed ^D), while others don't. It's inconvenient to get an endless stream of EOFs from stdin when a user typed ^D to one particular prompt, so I decided to clear the EOF status. [ESR in a later message] > I considered trying a zero-length read() in Python, but this strikes me > as inelegant even if it would work.
I doubt that a zero-length read conveys any information. It should return "" whether or not there is more to read! Plus, look at the implementation of readline() (file_readline() in Objects/fileobject.c): it shortcuts the n == 0 case and returns an empty string without touching the file. [me] > > Before adding an eof() method, can you explain what your program is > > trying to do? Is it reading from a pipe or socket? Then select() or > > poll() might be useful. [ESR again] > Sadly, it's exactly the wrong case. Hmmm...omitting irrelevant details, > it's a situation where a markup file can contain sections in two different > languages. The design requires the first interpreter to exit on seeing > either EOF or a marker that says "switching to second language". For > reasons too compllicated to explain, it would be best if the parser for > the first language didn't simply call the second parser. > > The logic I wanted to write amounts to: > > while 1: > line = fp.readline() > if not line or line == "history": > break > interpret_in-language_1(line) > > if not fp.feof() > while 1: > line = fp.readline() > if not line: > break > interpret_in-language_2(line) > > I just tested the zero-length-read method. That worked. I guess I'll > use it. Bizarre (given what I know about zero-length read). But in the above code, you can replace "if not fp.feof()" with "if line". In other words, you just have to carry the state over within your program. So, I see no reason why the logic in your program couldn't take care of this, which in general is a preferred way to solve a problem than to change the language. Also note that in Python it's no sin to attempt to read a line even when the file is already at EOF -- you will simply get an empty line again. 
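A minimal sketch of the state-carrying approach suggested above, written for present-day Python 3 rather than the Python 2.0 under discussion. The function name split_sections, the sample input, and the marker "history\n" (with its trailing newline, since readline() keeps it) are assumptions made for illustration, not code from the thread.

```python
import io

def split_sections(fp, marker="history\n"):
    """Split a stream into lines before the marker and lines after it."""
    first, second = [], []

    # Language 1: stop at EOF (readline() returns "") or at the marker line.
    line = fp.readline()
    while line and line != marker:
        first.append(line)
        line = fp.readline()

    # 'line' still holds the last readline() result: "" means EOF was hit,
    # anything else means the marker was seen. No feof() call is needed.
    if line:
        line = fp.readline()
        while line:
            second.append(line)
            line = fp.readline()

    return first, second

# Reading again at EOF is harmless: it just returns "" once more.
sample = io.StringIO("alpha\nbeta\nhistory\ngamma\n")
print(split_sections(sample))   # (['alpha\n', 'beta\n'], ['gamma\n'])
print(repr(sample.readline()))  # ''
```

The last value of line doubles as the EOF flag here, which is the substitution of "if line" for "if not fp.feof()" described in the message.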
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik at effbot.org Sun Jan 7 22:29:46 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Sun, 7 Jan 2001 22:29:46 +0100
Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets)
References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com>
Message-ID: <035901c078f0$f6180f70$e46940d5@hagrid>

Guido van Rossum wrote:
> Bizarre (given what I know about zero-length read).  But in the above
> code, you can replace "if not fp.feof()" with "if line".  In other
> words, you just have to carry the state over within your program.

and if that's too hard, just hide the state in a class:

    class FileWrapper:

        def __init__(self, file):
            self.__file = file
            self.__line = None

        def __more(self):
            # try reading another line
            if not self.__line:
                self.__line = self.__file.readline()

        def eof(self):
            self.__more()
            return not self.__line

        def readline(self):
            self.__more()
            line = self.__line
            self.__line = None
            return line

    file = open("myfile.txt")

    file = FileWrapper(file)

    while not file.eof():
        print repr(file.readline())

From guido at python.org Sun Jan 7 22:32:26 2001
From: guido at python.org (Guido van Rossum)
Date: Sun, 07 Jan 2001 16:32:26 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: Your message of "Sat, 06 Jan 2001 22:16:31 EST."
References:
Message-ID: <200101072132.QAA32627@cj20424-a.reston1.va.home.com>

> I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing.
>
> test_builtin fails because raw_input() isn't stripping a trailing newline.
> I've got my own code in this area that *may* be to blame, but I don't see
> how it could be.
I note that fileobject.c's new function get_line_raw has > the comment > > /* Internal routine to get a line for raw_input(): > strip trailing '\n', raise EOFError if EOF reached immediately > */ > > but the code doesn't look for a trailing newline (let alone strip one). My bad. Try the latest CVS now. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Sun Jan 7 23:15:27 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 17:15:27 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101072113.QAA32467@cj20424-a.reston1.va.home.com>; from guido@python.org on Sun, Jan 07, 2001 at 04:13:01PM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> Message-ID: <20010107171527.A5093@thyrsus.com> Guido van Rossum : > [ESR in a later message] > > I considered trying a zero-length read() in Python, but this strikes me > > as inelegant even if it would work. > > I doubt that a zero-length read conveys any information. It should > return "" whether or not there is more to read! Duh. Of course it would. You know, I've always been half-consciously dissatisfied with Python's use of "" as an EOF marker, and now I know why. It's precisely because there's no way to distinguish these cases. I think a zero-length read ought to return "" and a read on EOF ought to return None. > Bizarre (given what I know about zero-length read). But in the above > code, you can replace "if not fp.feof()" with "if line". In other > words, you just have to carry the state over within your program. > > So, I see no reason why the logic in your program couldn't take care > of this, which in general is a preferred way to solve a problem than > to change the language. OK, two objections, one practical and one (more important) esthetic: Practical: I guess I oversimplified the code for expository purposes. 
What's actually going on is that I have two parser classes both based on shlex -- they do character-at-a-time input and don't actually *have* accessible line buffers. Esthetic: Yes, I can have the first parser set a flag, or return some EOF token. But this seems deeply wrong to me, because EOFness is not a property of the parser but of the underlying stream object. It seems to me that my program ought to be able to ask the stream object whether it's at EOF rather than carrying its own flag for that state. In Python as it is, there's no clean way to do this. I'd have to do a nonzero-length read to test it (I failed to check the right alternate case before when I tried zero-length). That's really broken. What if neither the underlying stream nor the parser supports pushback? Do you see now why I think this is a more general issue? Now, another and more general way to handle this would be to make an equivalent of the old FIONREAD ioctl part of Python's standard set of file object methods -- a way to ask "how many bytes are ready to be read in this stream?" Trivial to make it work for plain files, of course. Harder to make it work usefully for pipes/fifos/sockets/terminals. Having it pass up the st_size field from fstat() (corrected for the current seek address if you're reading a plain file) would be a good start. -- Eric S. Raymond Live free or die; death is not the worst of evils. -- General George Stark. From tismer at tismer.com Sun Jan 7 23:37:55 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 08 Jan 2001 00:37:55 +0200 Subject: [Python-Dev] ANN: Stackless Python 2.0 Message-ID: <3A58EFC3.5A722FF0@tismer.com> Dear community, I'm happy to announce that Stackless Python 2.0 is finally ready and available for download. Stackless Python for Python 1.5.2+ also got some minor enhancements.
Both versions are available as Win32 installer files here: http://www.stackless.com/spc20-win32.exe http://www.stackless.com/spc15-win32.exe Speed: Stackless Python for Python 2.0 is again a bit faster than the original. This time even better: About 9-10 percent. I have to say that optimization was much harder this time. My speed patches are now done by a Python script, which will make maintenance and diff reading much easier in the future. There is now also a bit of example code available, like the uthread9.py Microthreads module from Will Ware, Just van Rossum, and Mike Fletcher. Source code and an update to the website will become available in the next days. enjoy - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mal at lemburg.com Mon Jan 8 01:26:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 01:26:00 +0100 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow References: Message-ID: <3A590918.E90031AA@lemburg.com> Tim Peters wrote: > > I'm pretty sure the test_pow and test_charmapcodec failures aren't my doing. test_charmapcodec is my fault... I should run the tests in a clean room environment before checkin: my PYTHONPATH picked up some other file which it was not supposed to do. I'll fix it next week. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 8 05:13:26 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 7 Jan 2001 23:13:26 -0500 Subject: [Python-Dev] Rehabilitating fgets In-Reply-To: Message-ID: The "Win32" readline() hack is now checked in, but there's really nothing Win32-specific about it anymore. It makes one mild assumption about what the C std doesn't clearly address but may have intended: that in case of a non-NULL return, fgets doesn't overwrite any of the buffer positions beyond the terminating null byte (the std is clear that it doesn't overwrite anything at all in case of a NULL-because-EOF return, but I can't say whether they're pointing that out as a consequence, or pointing that out as an exception). I'm curious about how it performs (relative to the getc_unlocked hack) on other platforms. If you'd like to try that, just recompile fileobject.c with USE_MS_GETLINE_HACK #define'd. It should *work* on any platform with fgets() meeting the assumption. The new test_bufio.py std test gives it a pretty good correctness workout, if you're worried about that. From esr at snark.thyrsus.com Mon Jan 8 05:16:53 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Sun, 7 Jan 2001 23:16:53 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge Message-ID: <200101080416.f084GrM10912@snark.thyrsus.com> Setting things up so curses is autoconfigured into the default build if your system has it in the expected places turned out to be dead easy. Some clever person (the BDFL himself?) wrote the build process so that there is *already* a Setup.config.in that gets configure expansions done on it, with the generated Setup.config used when makesetup does its magic. As a bonus, I've also added autoconfiguration for readline. 
A small detail, but one which I suspect many people building their own
Pythons frequently trip over.

The technique generalizes easily.  The archetype for a facility for
autoconfiguring libfoo with a Python extension foo.c if it's present
has just two steps:

Add this to Modules/Setup.config.in:

    @USE_FOO_MODULE@foo foo.c -lfoo

Add this to configure.in:

    # This is used to generate Setup.config
    AC_SUBST(USE_FOO_MODULE)
    AC_CHECK_LIB(foo, random_foo_function,
                 [USE_FOO_MODULE=""],
                 [USE_FOO_MODULE="#"])

(Apologies for the lack of description with the patch.  I tripped over
a SourceForge interface bug.)
-- 
Eric S. Raymond

The possession of arms by the people is the ultimate warrant
that government governs only with the consent of the governed.
	-- Jeff Snyder

From tim.one at home.com Mon Jan 8 06:34:20 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 8 Jan 2001 00:34:20 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: <3A590918.E90031AA@lemburg.com>
Message-ID:

An update:  test_builtin works again (thanks, Guido!), and
test_charmapcodec will "next week" (thanks, MAL!).

Still unknown (to me):  is the test_pow failure unique to Windows?  One
response from a Unix(tm) geek would settle that.

From nas at arctrix.com Sun Jan 7 23:59:49 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 7 Jan 2001 14:59:49 -0800
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 12:34:20AM -0500
References: <3A590918.E90031AA@lemburg.com>
Message-ID: <20010107145949.A14166@glacier.fnational.com>

On Mon, Jan 08, 2001 at 12:34:20AM -0500, Tim Peters wrote:
> Still unknown (to me):  is the test_pow failure unique to Windows?  One
> response from a Unix(tm) geek would settle that.

It works fine for me on Linux.  I thought I tested on Windows
before checking in the coerce patch.  I'll try again.

  Neil

From nas at arctrix.com Mon Jan 8 00:29:14 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 7 Jan 2001 15:29:14 -0800
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: <20010107145949.A14166@glacier.fnational.com>; from nas@arctrix.com on Sun, Jan 07, 2001 at 02:59:49PM -0800
References: <3A590918.E90031AA@lemburg.com> <20010107145949.A14166@glacier.fnational.com>
Message-ID: <20010107152914.A14228@glacier.fnational.com>

On Sun, Jan 07, 2001 at 02:59:49PM -0800, Neil Schemenauer wrote:
> It works fine for me on Linux.  I thought I tested on Windows
> before checking in the coerce patch.  I'll try again.

Weird.  rt.bat does not run the test_pow script.  If I run
"regrtest test_pow" then the test fails.  It could be a problem
with line endings (I copied the source for a Unix CVS checkout).

Anyhow, I found the bug.  I don't know how test_pow was passing
under Linux.  Time to reboot again.

  Neil

From tim.one at home.com Mon Jan 8 07:39:20 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 8 Jan 2001 01:39:20 -0500
Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow
In-Reply-To: <20010107152914.A14228@glacier.fnational.com>
Message-ID:

[NeilS]
> Weird.  rt.bat does not run the test_pow script.

Works for me, else I never would have noticed .  Also works for me in
single-test mode:

C:\Code\python\dist\src\PCbuild>rt test_pow

C:\Code\python\dist\src\PCbuild>python ../lib/test/regrtest.py test_pow
test_pow
The actual stdout doesn't match the expected stdout.
This much did match (between asterisk lines):
**********************************************************************
test_pow
Testing integer mode...
    Testing 2-argument pow() function...
    Testing 3-argument pow() function...
Testing long integer mode...
    Testing 2-argument pow() function...
    Testing 3-argument pow() function...
Testing floating point mode...
    Testing 3-argument pow() function...
The number in both columns should match.
3 3
-5 -5
-1 -1
5 5
-3 -3
-7 -7
3L 3L
-5L -5L
-1L -1L
5L 5L
-3L -3L
-7L -7L
3.0 3.0
-5.0 -5.0
-1.0 -1.0
-7.0 -7.0
**********************************************************************
Then ...
We expected (repr): ''
But instead we got: 'Float mismatch:'
test test_pow failed -- Writing: 'Float mismatch:', expected: ''
1 test failed: test_pow

C:\Code\python\dist\src\PCbuild>

That may point to the problem, too:  the canned output file is truncated?

> If I run "regrtest test_pow" then the test fails.  It could be a
> problem with line endings (I copied the source for a Unix CVS
> checkout).

Don't understand; e.g., "copied" what, from where to where?  I'm not sure I
gave you write access to my box, and hacking into Windows machines is uncool
because it's not challenging .

> Anyhow, I found the bug.  I don't know how test_pow was passing
> under Linux.  Time to reboot again.

Cool!  BTW, Windows solves the "don't reboot enough" problem for you via
automation, sometimes on an hourly basis.

Thanks for sharing the brain cells, Neil!

From thomas at xs4all.net Mon Jan 8 07:44:11 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 8 Jan 2001 07:44:11 +0100
Subject: [Python-Dev] autoconfigure patch submitted on SourceForge
In-Reply-To: <200101080416.f084GrM10912@snark.thyrsus.com>; from esr@snark.thyrsus.com on Sun, Jan 07, 2001 at 11:16:53PM -0500
References: <200101080416.f084GrM10912@snark.thyrsus.com>
Message-ID: <20010108074411.N2467@xs4all.nl>

On Sun, Jan 07, 2001 at 11:16:53PM -0500, Eric S. Raymond wrote:

> Setting things up so curses is autoconfigured into the default build
> if your system has it in the expected places turned out to be dead
> easy.  Some clever person (the BDFL himself?) wrote the build process
> so that there is *already* a Setup.config.in that gets configure
> expansions done on it, with the generated Setup.config used when
> makesetup does its magic.

Skip, actually, IIRC.
It was added in the last stages of 2.0 development, to auto-detect bsddb. However, I still think it should be a separate 'configure', in the Modules directory. Especially now that Andrew is practically checking in the distutils setup ;) The main configure can make an educated guess whether Python and distutils are available, and call configure with some passed-through options if not. It does depend on what the distutils setup does, though, and I'll shamefully admit that I haven't looked at that ;P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From nas at arctrix.com Mon Jan 8 00:51:16 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 7 Jan 2001 15:51:16 -0800 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 01:39:20AM -0500 References: <20010107152914.A14228@glacier.fnational.com> Message-ID: <20010107155116.A14312@glacier.fnational.com> On Mon, Jan 08, 2001 at 01:39:20AM -0500, Tim Peters wrote: > [NeilS] > > If I run "regrtet test_pow" then the test fails. It could be a > > problem with line endings (I copied the source for a Unix CVS > > checkout). > > Don't understand; e.g., "copied" what, from where to where? I should have been clearer. I mean the problem with rt.bat not running test_pow. I copied the CVS source from my Linux ext2 filesystem to a VFAT filesystem. I was too lazy to fix the line endings. 
Neil From nas at arctrix.com Mon Jan 8 00:52:38 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 7 Jan 2001 15:52:38 -0800 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <20010107152914.A14228@glacier.fnational.com>; from nas@arctrix.com on Sun, Jan 07, 2001 at 03:29:14PM -0800 References: <3A590918.E90031AA@lemburg.com> <20010107145949.A14166@glacier.fnational.com> <20010107152914.A14228@glacier.fnational.com> Message-ID: <20010107155238.A14291@glacier.fnational.com> On Sun, Jan 07, 2001 at 03:29:14PM -0800, Neil Schemenauer wrote: > I don't know how test_pow was passing under Linux. Under Linux with the buggy float_pow: >>> pow(10.0, 0, 10) nan >>> pow(10.0, 0, 10) == 1 1 >>> pow(10.0, 0, 10) == 0 1 Under Windows NAN obviously behaves differently. floating-point-is-fun-ly y'rs Neil From esr at thyrsus.com Mon Jan 8 07:49:45 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 8 Jan 2001 01:49:45 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108074411.N2467@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 08, 2001 at 07:44:11AM +0100 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> Message-ID: <20010108014945.A19516@thyrsus.com> Thomas Wouters : > On Sun, Jan 07, 2001 at 11:16:53PM -0500, Eric S. Raymond wrote: > > Setting things up so curses is autoconfigured into the default build > > if your system has it in the expected places turned out to be dead > > easy. Some clever person (the BDFL himself?) wrote the build process > > so that there is *already* a Setup.config.in that gets configure > > expansions done on it, with the generated Setup.config used when > > makesetup does its magic. > > Skip, actually, IIRC. It was added in the last stages of 2.0 development, to > auto-detect bsddb. However, I still think it should be a separate > 'configure', in the Modules directory. You may be right. 
Still, this patch solves the immediate problem in a reasonably clean way, and I urge that it should go in. We can do a more complete reorganization of the build process later. (I'll help with that; I'm pretty expert with autoconf and friends.) -- Eric S. Raymond "As to the species of exercise, I advise the gun. While this gives [only] moderate exercise to the body, it gives boldness, enterprise, and independence to the mind. Games played with the ball and others of that nature, are too violent for the body and stamp no character on the mind. Let your gun, therefore, be the constant companion to your walks." -- Thomas Jefferson, writing to his teenaged nephew. From tim.one at home.com Mon Jan 8 08:05:46 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 02:05:46 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: Well, I like __exports__ (but not some details of the patch, for which see my SF comments). Guido is aware of the optimization possibilities, but that's not what's driving it. I don't know why he likes it; I like it because the only normal use for a module is to do module.attr, or "from module import attr", and dir(module) very often exposes stuff today that the module author had no intention of exporting. For example, if I do import os dir(os) under CVS Python today, on my box I see that os exports "i". It's bound to _exit. That's baffling, and is purely an accident of how module os.py initialization works when you're running on Windows. Couple that with that I've hardly ever seen (or bothered to write) a module docstring spelling out everything a module *intends* to export, and an __exports__ line near the top (when present) would also automagically give a solid answer to that question. modules aren't classes or instances, and in normal practice modules accumulate all sorts of accidental attrs (due to careless (== normal) imports, and module init code). 
It doesn't make any *sense* that os exports "sys" either, or that random exports "cos", or that cgi exports "string", or ... this inelegance is ubiquitous. In a world with an __exports__ that gets used, though, I do wonder whether people will or won't export their test() functions. I really like that they do now. or-maybe-it's-just-that-i-like-modules-that-*have*-a- test-function-ly y'rs - tim From gstein at lyra.org Mon Jan 8 08:25:32 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 7 Jan 2001 23:25:32 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 02:05:46AM -0500 References: <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <20010107232532.V17220@lyra.org> On Mon, Jan 08, 2001 at 02:05:46AM -0500, Tim Peters wrote: >... > modules aren't classes or instances, and in normal practice modules > accumulate all sorts of accidental attrs (due to careless (== normal) > imports, and module init code). It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. Simple question: so what? "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Mon Jan 8 08:29:39 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 02:29:39 -0500 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: <20010107155238.A14291@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Under Linux with the buggy float_pow: > > >>> pow(10.0, 0, 10) > nan > >>> pow(10.0, 0, 10) == 1 > 1 > >>> pow(10.0, 0, 10) == 0 > 1 > > Under Windows NAN obviously behaves differently. Comparisons with NaN are a platform-dependent accident, partly because some C compilers generate nonsense code, partly because Python isn't coded to cater to NaN's peculiarities either. 
The behavior under Windows is (accidentally) better in these cases today (NaN should never compare equal to anything -- not even to itself -- and, curiously, MSVC's codegen mistakes cancel out Python's mistakes in this case!). Thank you for fixing the bug. Only test_charmapcodec is failing for me now, and MAL knows the cause and cure. nothing-can-stop-the-alpha-now-ly y'rs - tim From thomas at xs4all.net Mon Jan 8 08:42:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 08:42:30 +0100 Subject: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 02:29:39AM -0500 References: <20010107155238.A14291@glacier.fnational.com> Message-ID: <20010108084230.O2467@xs4all.nl> On Mon, Jan 08, 2001 at 02:29:39AM -0500, Tim Peters wrote: > (NaN should never compare equal to anything -- not even to itself You know that's impossible, in Python, right ? (Due to the shortcut taken by '==', based on object identity.) Is that going to be 'fixed', too ? :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From ping at lfw.org Mon Jan 8 08:51:11 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 7 Jan 2001 23:51:11 -0800 (PST) Subject: [Python-Dev] inspect.py In-Reply-To: Message-ID: Hi again. Sorry to bother you if you're busy -- i haven't seen any responses about inspect.py for a few days and wanted to know what your reactions were. The module and test suite are still at: http://www.lfw.org/python/inspect.py http://www.lfw.org/python/test_inspect.py The only change since my announcement last Wednesday is that getframe() has been renamed to getframeinfo(). Thanks, -- ?!ng "Old code doesn't die -- it just smells that way." 
-- Bill Frantz From tim.one at home.com Mon Jan 8 09:17:57 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 03:17:57 -0500 Subject: NaN nonsense (was RE: [Python-Dev] Std tests failing, Windows: test_builtin test_charmapcodec test_pow) In-Reply-To: <20010108084230.O2467@xs4all.nl> Message-ID: >> (NaN should never compare equal to anything -- not even to itself [Thomas Wouters] > You know that's impossible, in Python, right ? (Due to the > shortcut taken by '==', based on object identity.) Surely you jest: I probably knew that while you were still nursing . OTOH, Python on WinTel comes remarkably close (by accident): C:\Code\python\dist\src\PCbuild>python Python 2.0 (#8, Jan 5 2001, 00:33:19) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> inf = 1e300**2 >>> inf 1.#INF >>> nan = inf - inf >>> nan -1.#IND >>> nan2 = nan * 1.0 >>> nan2 -1.#IND >>> nan == nan2 0 >>> > Is that going to be 'fixed', too ? :) Not if I can help it. I'd be in favor of adding an fcmp function that needs to be called explicitly when you want the full complexity of 754 comparisons. Count them all up, and there are 32 distinct 754 binary float comparison operators! The 754 std says 26 (from memory, may be 2 more or less) of those have to be supplied, but-- since 754 is not a language std --says nothing about how they're to be spelled. OTOH, C99 resolutely tries to map that into C, and 754 True Believers will use that as a club. On the third hand, as Tom MacDonald posted here earlier (he was X3J11 chair), he's not sure anyone will ever implement C99 in whole. The complexities of full 754 support are a large part of why he worries about that. 
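A short sketch of the NaN behaviour discussed above, as it can be observed in present-day CPython 3 (not the Python 2.0 build shown in the session). It assumes a platform with IEEE 754 doubles; float("inf") is used because 1e300**2 raises OverflowError in current Python instead of producing an infinity.

```python
import math

inf = float("inf")
nan = inf - inf       # inf - inf has no meaningful value, so the result is a NaN
nan2 = nan * 1.0      # arithmetic on a NaN yields another NaN

print(nan == nan2)        # False: a NaN never compares equal to another float
print(nan == nan)         # False: not even to itself
print(math.isnan(nan))    # True: the explicit way to test for a NaN
print([nan] == [nan])     # True: list comparison checks identity before ==
print(nan in [nan])       # True: containment takes the same identity shortcut
```

The last two lines show where an identity shortcut of the kind mentioned in the thread still surfaces: inside container comparisons, not in the float comparison itself.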
too-much-too-late-ly y'rs - tim From tim.one at home.com Mon Jan 8 09:17:59 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 03:17:59 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010107232532.V17220@lyra.org> Message-ID: [Greg Stein] > Simple question: so what? > > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Couldn't care less about the module author. It's the module user who has to sort this stuff out. "Don't use 'import *'" is good advice but not followed either, and after I do from MyPackage import sys # intentionally exports its own sys from GregSnort import * # accidentally exports some other sys madness ensues. Like I said, it's inelegant, and at best. Simple question for you: what would __exports__ hurt? "Oh, no! Tim's module explicitly lists what it intended to export! Oh, woe is me!". Gimme a break. From gstein at lyra.org Mon Jan 8 09:26:03 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 00:26:03 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:17:59AM -0500 References: <20010107232532.V17220@lyra.org> Message-ID: <20010108002603.X17220@lyra.org> On Mon, Jan 08, 2001 at 03:17:59AM -0500, Tim Peters wrote: > [Greg Stein] > > Simple question: so what? > > > > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* > > Couldn't care less about the module author. It's the module user who has to > sort this stuff out. "Don't use 'import *'" is good advice but not followed > either, and after I do > > from MyPackage import sys # intentionally exports its own sys > from GregSnort import * # accidentally exports some other sys > > madness ensues. Like I said, it's inelegant, and at best. > > Simple question for you: what would __exports__ hurt? "Oh, no! Tim's > module explicitly lists what it intended to export! Oh, woe is me!". Gimme > a break. hehe... adding __exports__ to your module is fine. 
Adding more crud to Python, in opposition to the "we're all adults" motto, doesn't seem Right. Somebody wants to use "from foo import *" on a module not designed for it? Too bad for them. If you're suggesting __exports__ is to patch over problems caused by "from foo import *", then I think you're barking up the wrong tree :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at zadka.site.co.il Mon Jan 8 17:50:57 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 8 Jan 2001 18:50:57 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010107232532.V17220@lyra.org> References: <20010107232532.V17220@lyra.org>, <20010106110033.52127A84F@darjeeling.zadka.site.co.il> Message-ID: <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> [Tim Peters] > modules aren't classes or instances, and in normal practice modules > accumulate all sorts of accidental attrs (due to careless (== normal) > imports, and module init code). It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. [Greg Stein] > Simple question: so what? > > "Oh, no! My module exposes mod.sys! Oh, woe is me!" *snort* Let me "me too" here: Put another way, what Greg said is just a rephrase of "don't use from foo import * unless foo's docos say it's OK". Add to that the simple access control of a leading underscore, and I don't see any place which needs it. Something better to do would be to use import foo as _foo In some standard library modules, and minimize using from foo import bar in them. Since everyone knows that leading underscore means "implementation detail - ignore at your convenience, use at your peril", this would keep the "we're all adults" philosophy of Python, with all the advantages *I* see in __exports__.
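A hedged sketch of the "import foo as _foo" convention described above, in present-day Python 3. The module name demo_mod, its contents, and the helper star_import are hypothetical and exist only for this illustration. The second half uses the module-level __all__ list that current Python honours for "import *"; it is shown as a related mechanism, not as the __exports__ patch debated in this thread.

```python
import sys
import types

SOURCE = '''
import os as _os      # underscore alias: skipped by "from demo_mod import *"
import string         # plain import: leaks out as an accidental export

def basename(path):
    return _os.path.basename(path)
'''

def star_import(module_name):
    """Return the sorted names that 'from module_name import *' would bind."""
    namespace = {}
    exec("from %s import *" % module_name, namespace)
    return sorted(name for name in namespace if name != "__builtins__")

# Build a throwaway module in memory and register it so it can be imported.
demo = types.ModuleType("demo_mod")
exec(SOURCE, demo.__dict__)
sys.modules["demo_mod"] = demo

print(star_import("demo_mod"))   # ['basename', 'string']

# An explicit export list narrows import * to the intended names only.
demo.__all__ = ["basename"]
print(star_import("demo_mod"))   # ['basename']
```

The underscore alias keeps os out of the star import without any new language feature, while the plain "import string" shows the accidental export that the thread is arguing about.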
One more point against __exports__, which I hoped I would not have to make (but when I'm up against the timbot *and* Guido, I need to pull out the heavy artillery): it would *totally* stop any hope in the future of module level __getattr__ (or at least complicate the semantics). I think Alex M. is thinking of a PEP, but he's taking his time, since no PEPs can be considered until 2.1 is out. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From tim.one at home.com Mon Jan 8 09:49:58 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 03:49:58 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010108002603.X17220@lyra.org> Message-ID: [Greg Stein] > hehe... adding __exports__ to your module is fine. Adding more > crud to Python, in opposition to the "we're all adults" motto, > doesn't seem Right. My idea of what's Right is copied from my boss . > Somebody wants to use "from foo import *" on a module not designed > for it? Too bad for them. How is someone supposed to know whether a module "was designed" for import*? Even Tkinter (which just about everyone does "import *" on) also exports sys, and everything from the "types" module, by accident too. > If you're suggesting __exports__ is to patch over problems > caused by "from foo import *", then I think you're barking up the > wrong tree > :-) Indeed. But I'm suggesting that the problems that *can* arise from "import*" illustrate the fundamental silliness of exporting things by accident. It's come up much more often for me when I'm looking over someone's shoulder, teaching them how to use dir() in an interactive shell to answer their own damn questions <0.5 wink>. It's usually the case that dir(M) shows them something that isn't documented, and over time I am *not* pleased that "oh, I guess the 'string' in there is just crap" is how they learn to view it. 
I can live without __exports__; but I'd prefer not to, because I would always use it if it were there. if-i'd-both-use-it-and-heartily-recommend-it-it's-hard-to-oppose-it-ly y'rs - tim From m.favas at per.dem.csiro.au Mon Jan 8 12:48:40 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Mon, 08 Jan 2001 19:48:40 +0800 Subject: [Python-Dev] _cursesmodule.c clobbered since Christmas Message-ID: <3A59A918.E0D02E0D@per.dem.csiro.au> I last successfully downloaded from CVS, compiled, linked and tested on Dec. 22 last year. For the last week or so, the current CVS _cursesmodule.c gives a bunch of compiler warning messages of the form: cc: Warning: ./_cursesmodule.c, line 619: In this statement, "derwin(...)" of type "int", is being converted to "pointer to struct _win_st". (cvtdiftypes) win = derwin(self->win,nlines,ncols,begin_y,begin_x); --^ cc: Warning: ./_cursesmodule.c, line 1259: In this statement, "subpad(...)" of type "int", is being converted to "pointer to struct _win_st". (cvtdiftypes) win = subpad(self->win, nlines, ncols, begin_y, begin_x); ----^ cc: Warning: ./_cursesmodule.c, line 1488: In this statement, "termname(...)" of type "int", is being converted to "pointer to const char". (cvtdiftypes) NoArgReturnStringFunction(termname) ^ (more elided) and cc: Warning: ./_cursesmodule.c, line 305: The scalar variable "arg1" is fetched but not initialized. And there may be other such fetches of this variable that have not been reported in this compilation. (uninit1) Window_NoArg2TupleReturnFunction(getparyx, int, "(ii)") ^ cc: Warning: ./_cursesmodule.c, line 305: The scalar variable "arg2" is fetched but not initialized. And there may be other such fetches of this variable that have not been reported in this compilation.
(uninit1) Window_NoArg2TupleReturnFunction(getparyx, int, "(ii)") ^ (more elided) and at link time, fails with: ld: Unresolved: getbegyx getmaxyx getparyx I've held off bothering anyone about this, but it begins to look as though no-one else has noticed... My platform? Tru64 Unix, V4.0F (aka OSF1). The recent pow() bug hit this platform, too. Happy to do any testing... -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From guido at python.org Mon Jan 8 15:27:50 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 09:27:50 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: Your message of "Mon, 08 Jan 2001 01:49:45 EST." <20010108014945.A19516@thyrsus.com> References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> Message-ID: <200101081427.JAA03146@cj20424-a.reston1.va.home.com> > You may be right. Still, this patch solves the immediate problem in a > reasonably clean way, and I urge that it should go in. We can do a > more complete reorganization of the build process later. (I'll help with > that; I'm pretty expert with autoconf and friends.) I expect Andrew's code to go in before 2.1 is released. So I don't see a reason why we should hurry and check in a stop-gap measure. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 8 15:33:09 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 09:33:09 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Mon, 08 Jan 2001 00:26:03 PST." <20010108002603.X17220@lyra.org> References: <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> Message-ID: <200101081433.JAA03185@cj20424-a.reston1.va.home.com> > hehe... adding __exports__ to your module is fine. 
Adding more crud to > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > Somebody wants to use "from foo import *" on a module not designed for it? > Too bad for them. If you're suggesting __exports__ is to patch over problems > caused by "from foo import *", then I think you're barking up the wrong tree > :-) You haven't been answering many newbie questions lately, have you? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 8 16:06:28 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 10:06:28 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Sun, 07 Jan 2001 17:15:27 EST." <20010107171527.A5093@thyrsus.com> References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> Message-ID: <200101081506.KAA03404@cj20424-a.reston1.va.home.com> > > So, I see no reason why the logic in your program couldn't take care > > of this, which in general is a preferred way to solve a problem than > > to change the language. > > OK, two objections, one practical and one (more important) esthetic: > > Practical: I guess I oversimplified the code for expository purposes. > What's actually going on is that I have two parser classes both based > on shlex -- they do character-at-a-time input and don't actually > *have* accessible line buffers. And what's wrong with always starting the second parser? If the stream was at EOF it will simply process zero lines. Or does your parser have a problem with empty input? > Esthetic: Yes, I can have the first parser set a flag, or return some > EOF token. But this seems deeply wrong to me, because EOFness is not > a property of the parser but of the underlying stream object. 
It > seems to me that my program ought to be able to ask the stream object > whether it's at EOF rather than carrying its own flag for that state. Eric, before we go further, can you give an exact definition of EOFness to me? > In Python as it is, there's no clean way to do this. I'd have to do a > nonzero-length read to test it (I failed to check the right alternate > case before when I tried zero-length). That's really broken. What if > neither the underlying stream nor the parser supports pushback? > > Do you see now why I think this is a more general issue? No. What's wrong with just setting the parser loose on the input and letting it deal with EOF? In your example, apparently a line containing the word "history" signals that the rest of the file must be parsed by the second parser. What if "history" is the last line of the file? The eof() test can't tell you *that*! > Now, another and more general way to handle this would be to make an > equivalent of the old FIONCLEX ioctl part of Python's standard set of > file object methods -- a way to ask "how many bytes are ready to be > read in this stream?" There's no portable way to do that. > Trivial to make it work for plain files, of course. Harder to make it > work usefully for pipes/fifos/sockets/terminals. Having it pass up the > results of the fstat.size field (corrected for the current seek address > if you're reading a plain file) would be a good start. This seems totally the wrong level to solve your problem.
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Tue Jan 9 00:13:21 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 01:13:21 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101081433.JAA03185@cj20424-a.reston1.va.home.com> References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> Message-ID: <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 09:33:09 -0500, Guido van Rossum wrote: > > hehe... adding __exports__ to your module is fine. Adding more crud to > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > Too bad for them. If you're suggesting __exports__ is to patch over problems > > caused by "from foo import *", then I think you're barking up the wrong tree > > :-) > > You haven't been answering many newbie questions lately, have you? :-) Well, I have. And frankly, I think having "from foo import *" issue a warning at 2.1 a *much* better solution. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido at python.org Mon Jan 8 16:15:20 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 10:15:20 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Tue, 09 Jan 2001 01:13:21 +0200." <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> [Greg] > > > hehe... adding __exports__ to your module is fine. Adding more crud to > > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. 
> > > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > > Too bad for them. If you're suggesting __exports__ is to patch over problems > > > caused by "from foo import *", then I think you're barking up the wrong tree > > > :-) [Guido] > > You haven't been answering many newbie questions lately, have you? :-) [Moshe] > Well, I have. > And frankly, I think having "from foo import *" issue a warning at 2.1 > a *much* better solution. (1) For what problem? (2) Under exactly what circumstances do you want from foo import * issue a warning? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jan 8 16:26:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 16:26:21 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? References: <200101071845.f07IjFi01249@mira.informatik.hu-berlin.de> Message-ID: <3A59DC1D.29DE500B@lemburg.com> "Martin v. Loewis" wrote: > > Authors of extension packages often find the need to auto-import some > of their modules. This is often needed for registration, e.g. a codec > author (like Tamito KAJIYAMA, who wrote the JapaneseCodecs package) > may need to register a search function with codecs.register. This is > currently only possible by writing into sitecustomize.py, which must > be done by the system administrator manually. > > To enhance the service of site.py, I've written the patch > > http://sourceforge.net/patch/?func=detailpatch&patch_id=103134&group_id=5470 > > which treats lines in PTH files which start with "import" as > statements and executes them, instead of appending these lines to > sys.path. > > The patch is relatively small, but since it is an extension: Do I need > to write a PEP for it? Just curious: wouldn't this introduce a /tmp-style problem to Python ? The scenario is quite simple: a Python script runs under root. The script could pick up a lingering .pth file (e.g. 
from /tmp or one of its subdirs -- distutils does this !) and then executes arbitrary code as *root*. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at interet.com Mon Jan 8 16:43:05 2001 From: jim at interet.com (James C. Ahlstrom) Date: Mon, 08 Jan 2001 10:43:05 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? Message-ID: <3A59E009.96922CA5@interet.com> There are a number of problems which frequently recur on c.l.p that can serve as a source of Python improvement ideas. On December 30, 2000 gerson.kurz at t-online.de (Gerson Kurz) writes: If I embed Python in a Win32 console application (using Demo\embed.c), everything works fine. If I take the very same piece of code and put it in a Win32 Windows application (not MFC, just a plain WinMain()) I see no output (and more importantly so, no errors), because the application does not have a stdout/stderr set up. This is well known. Windows developers must replace sys.stdout and sys.stderr with alternative mechanisms. Unfortunately this solution does not completely work because errors can occur before sys.stdout is replaced. I propose patching pythonw.exe (WinMain.c) and adding a new module to fix this so it Just Works. The patch is completely Windows specific. I am not sure if this constitutes a PEP, but would like everyone's feedback anyway. Design Requirements 1) "pythonw.exe myfile.py" will give the usual error message if myfile.py does not exist. 2) "pythonw.exe myfile.py" will give the usual traceback for a syntax error in myfile.py. 3) python.exe will provide a useful C-language stdout/stderr so the user does not have to replace sys.stdout/err herself. 4) None of the above will interfere with the user's replacement of sys.stdout/err for her own purposes.
Description of Patch A new module winstdoutmodule.c (138 lines) is included in Windows builds. It contains a C entry point PyWin_StdoutReplace() which creates a valid C stdout/err, and code to display output in a popup dialog box. There is a Python entry point winstdout.print() to display output, but it is only used for special purposes, and the typical user will never import winstdout. The file WinMain.c calls PyWin_StdoutReplace() before it calls Py_Main(), and PyWin_StdoutPrint() afterwards. This is meant to display startup error messages. Normally, any available output is displayed when the system is idle. Technical Details Some experimentation (as opposed to documentation) shows that Win32 programs have a valid FILE * stdout, but fileno(stdout) gives INVALID_HANDLE_VALUE; the FILE * has an invalid OS file object. It is tempting to hack the FILE structure directly. But it is more prudent to use the only documented way to replace stdout, namely the standard call "freopen()" (also available on Unix). The design uses this call to open a temporary file to append stdout and stderr output. To display output, the file is checked when the system is idle, and MessageBox() is called with the file contents if any. Status After a few false starts, I now have working code. Is this a good idea? If so, is the implementation optimal (comments from MarkH especially welcome)? JimA From mal at lemburg.com Mon Jan 8 16:52:32 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 16:52:32 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <3A59E240.7F77790E@lemburg.com> Moshe Zadka wrote: > > On Mon, 08 Jan 2001 09:33:09 -0500, Guido van Rossum wrote: > > > hehe... adding __exports__ to your module is fine. 
Adding more crud to > > > Python, in opposition to the "we're all adults" motto, doesn't seem Right. > > > > > > Somebody wants to use "from foo import *" on a module not designed for it? > > > Too bad for them. If you're suggesting __exports__ is to patch over problems > > > caused by "from foo import *", then I think you're barking up the wrong tree > > > :-) > > > > You haven't been answering many newbie questions lately, have you? :-) > > Well, I have. > And frankly, I think having "from foo import *" issue a warning at 2.1 > a *much* better solution. Why raise a warning? "from xyz import *" is still very useful in interactive sessions and also has some merits when it comes to importing all subpackages of a package (well, at least those listed in __all__). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From barry at digicool.com Mon Jan 8 16:54:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 8 Jan 2001 10:54:10 -0500 Subject: [Python-Dev] Add __exports__ to modules References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> Message-ID: <14937.58018.792925.31985@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> it would *totally* stop any hope in the future of module level MZ> __getattr__ (or at least complicate the semantics). I think MZ> Alex M. is thinking of a PEP, but he's taking his time, since MZ> no PEPs can be considered until 2.1 is out. Given the current discussion, I'm now -1 on __exports__ unless a PEP is written. I think enough issues and interactions have been brought up that a PEP is warranted first.
-Barry From moshez at zadka.site.co.il Tue Jan 9 01:03:00 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 02:03:00 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com>, <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> Message-ID: <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 10:15:20 -0500, Guido van Rossum wrote: > (1) For what problem? Users seeing things they didn't expect in their modules. > (2) Under exactly what circumstances do you want from foo import * > issue a warning? All. If you want to be less extreme, don't warn if the module defines a __from_star_ok__ But in any case, I'm done with this thread. We probably won't manage to convince each other. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido at python.org Mon Jan 8 17:04:58 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 11:04:58 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Mon, 08 Jan 2001 10:54:10 EST." <14937.58018.792925.31985@anthem.wooz.org> References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <14937.58018.792925.31985@anthem.wooz.org> Message-ID: <200101081604.LAA04464@cj20424-a.reston1.va.home.com> > Given the current discussion, I'm now -1 on __exports__ unless a PEP > is written. I think enough issues and interactions have been brought > up that a PEP is warranted first. I have to agree. I am no longer championing this patch.
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Mon Jan 8 17:27:17 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 8 Jan 2001 10:27:17 -0600 (CST) Subject: [Python-Dev] inspect.py In-Reply-To: References: Message-ID: <14937.60005.951163.80255@beluga.mojam.com> Ping> Sorry to bother you if you're busy -- i haven't seen any responses Ping> about inspect.py for a few days and wanted to know what your Ping> reactions were. Fiddling code bits is not the sort of stuff I do very often, but every time I do I wind up having to reacquaint myself with all sorts of object details that slip out of my brain shortly after the latest need is gone. Having a module that hides the details seems like a good idea to me. +1. I vote it go into 2.1 assuming a bit for the library reference can be written in time. Skip From akuchlin at mems-exchange.org Mon Jan 8 17:31:09 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:31:09 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <200101081427.JAA03146@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 09:27:50AM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> Message-ID: <20010108113109.C7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 09:27:50AM -0500, Guido van Rossum wrote: >I expect Andrew's code to go in before 2.1 is released. So I don't >see a reason why we should hurry and check in a stop-gap measure. But it might not; the final version might be unacceptable or run into some intractable problem. Assuming the patch is correct (I haven't looked at it), why not check it in? The work has already been done to write it, after all. 
--amk From akuchlin at mems-exchange.org Mon Jan 8 17:41:10 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:41:10 -0500 Subject: [Python-Dev] _cursesmodule.c clobbered since Christmas In-Reply-To: <3A59A918.E0D02E0D@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Mon, Jan 08, 2001 at 07:48:40PM +0800 References: <3A59A918.E0D02E0D@per.dem.csiro.au> Message-ID: <20010108114110.D7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 07:48:40PM +0800, Mark Favas wrote: >I last successfully downloaded from CVS, compiled, linked and tested on >Dec. 22 last year. For the last week or so, the current CVS >_cursesmodule.c gives a bunch of compiler warning messages of the form: Hmm... on Dec. 22 there was a sizable change to export a C API from the module; since then there's only been one minor change. Perhaps the last version you compiled successfully was from before I checked in those changes. In any case, I'll look into it as soon as my Compaq test drive account is usable and I have access to a Tru64 4.0 machine again. Thanks for the report! Once the PEP 229 changes go in, many more modules will be tried on many more platforms. It might be worth considering setting up a Tinderbox for Python, or at least doing a systematic test on several platforms before releases. --amk From paulp at ActiveState.com Mon Jan 8 17:46:47 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:46:47 -0800 Subject: [Python-Dev] Add __exports__ to modules References: Message-ID: <3A59EEF7.BB4118BD@ActiveState.com> Tim Peters wrote: > > ... It doesn't make any *sense* that os exports > "sys" either, or that random exports "cos", or that cgi exports "string", or > ... this inelegance is ubiquitous. I agree strongly. I think that Python people are careless about what their module dictionaries look like. My two main annoyances are modules that export other modules randomly and modules that export huge wacks of constants. > Indeed. 
But I'm suggesting that the problems that *can* arise from > "import*" illustrate the fundamental silliness of exporting things by > accident. It's come up much more often for me when I'm looking over > someone's shoulder, teaching them how to use dir() in an interactive shell > to answer their own damn questions <0.5 wink>. It's usually the case that > dir(M) shows them something that isn't documented, and over time I am *not* > pleased that "oh, I guess the 'string' in there is just crap" is how they > learn to view it. Screw dir()! Let's talk about important stuff: Komodo. And Idle. And WingIDE. And PythonWorks and PythonWin. :) How are class browsers and "intellisense prompters" supposed to know that it "makes sense" to prompt the user with os.path but not CGIHTTPServer.os.path? Overall, I think Tim is right. We are all adults here and part of being adults is keeping your privates private and your nose clean. Paul Prescod From paulp at ActiveState.com Mon Jan 8 17:47:39 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:47:39 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <20010107232532.V17220@lyra.org>, <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> Message-ID: <3A59EF2B.792801E5@ActiveState.com> Moshe Zadka wrote: > > ... > Let me "me to" here: > Put another way, what Greg said is just a rephrase of "don't use from > foo import * unless foo's docos say it's OK". That's not the issue. It's not about keeping people out of your module. In fact I would propose that mod.__dict__ should be as loose as ever. It's a user interface issue. If we encourage people to learn about modules in interactive environments like the prompt using dir(), class browsers and IDEs then we need to create modules that are friendly for those users. I think that the current situation is pretty bad that way. Why does CGIHTTPServer export BaseHTTPServer?
And why is CGIHTTPServer.CGIHTTPServer a class but CGIHTTPServer.BaseHTTPServer is a module? We go to great lengths to make the syntax newbie friendly. I think that we should make similar efforts in a cleanly reflective class library. > Add to that the simple > access control of a leading underscore, and I don't see any place > which needs it. > > Something better to do would be to use > import foo as _foo It's pretty clear that nobody does this now and nobody is going to start doing it in the near future. It's too invasive and it makes the code too ugly. Why obfuscate thousands of lines of code when a simple feature can mitigate that? >... > One more point against __exports__, which I hoped I would not have to > make (but when I'm up against the timbot *and* Guido, I need to pull > out the heavy artillery): it would *totally* stop any hope in the > future of module level __getattr__ (or at least complicate the semantics). > I think Alex M. is thinking of a PEP, but he's taking his time, since > no PEPs can be considered until 2.1 is out. __exports__ would merely be considered an implementation detail of the "default __getattr__". Custom __getattr__'s could decide whether to respect it or not. It doesn't complicate anything much. Paul Prescod From nas at arctrix.com Mon Jan 8 10:54:55 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 8 Jan 2001 01:54:55 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A59E009.96922CA5@interet.com>; from jim@interet.com on Mon, Jan 08, 2001 at 10:43:05AM -0500 References: <3A59E009.96922CA5@interet.com> Message-ID: <20010108015455.A15138@glacier.fnational.com> On Mon, Jan 08, 2001 at 10:43:05AM -0500, James C. Ahlstrom wrote: > Is this a good idea? If so, is the implementation optimal > (comments from MarkH especially welcome)? The general idea sounds good to me. Having tracebacks go nowhere when running pythonw is un-Python-like. I don't know enough about MFC, etc. 
to comment on the specifics of your patch. Neil From akuchlin at mems-exchange.org Mon Jan 8 17:49:13 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 11:49:13 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59EEF7.BB4118BD@ActiveState.com>; from paulp@ActiveState.com on Mon, Jan 08, 2001 at 08:46:47AM -0800 References: <3A59EEF7.BB4118BD@ActiveState.com> Message-ID: <20010108114913.E7563@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 08:46:47AM -0800, Paul Prescod wrote: >How are class browsers and "intellisense prompters" supposed to know >that it "makes sense" to prompt the user with os.path but not >CGIHTTPServer.os.path. Could we then simply adopt __exports__ as a convention for such browsers, but with no changes to core Python to support it? Browsers would then follow the algorithm "Use __exports__ if present, dir() if not." --amk From paulp at ActiveState.com Mon Jan 8 17:51:26 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 08:51:26 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <3A59F00E.53A0A32A@ActiveState.com> Tim Peters wrote: > > .... > > Perl appears to ignore the issue of thread safety here (on Windows and > everywhere else). If you can create a sample program that demonstrates the unsafety I'll anonymously submit it as a bug on our internal system and ensure that the next version of Perl is as slow as Python. :) Seriously: If someone comes at me with Perl-IO-is-way-faster-than-Python-IO, I'd like to know what concretely they've given up in order to achieve that performance. And even just for my own interest I'd like to understand the cost/benefit of stream thread safety. For instance would it make sense to just write a thread-safe wrapper for streams used from multiple threads? 
Paul Prescod From paulp at ActiveState.com Mon Jan 8 18:01:49 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 08 Jan 2001 09:01:49 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <3A59EEF7.BB4118BD@ActiveState.com> <20010108114913.E7563@kronos.cnri.reston.va.us> Message-ID: <3A59F27D.C27B8CD0@ActiveState.com> Andrew Kuchling wrote: > > ... > > Could we then simply adopt __exports__ as a convention for such > browsers, but with no changes to core Python to support it? Browsers > would then follow the algorithm "Use __exports__ if present, dir() if > not." dir() is one of the "interactive tools" I'd like to work better in the presence of __exports__. On the other hand, dir() works pretty poorly for object instances today so maybe we need something new anyhow. Perhaps attrs()? If there were an "attrs()" and it basically returned __exports__ if it existed and dir() if it didn't, then I would buy it. Graphical apps would just build on attrs(). Paul From MarkH at ActiveState.com Mon Jan 8 18:04:31 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 8 Jan 2001 09:04:31 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A59E009.96922CA5@interet.com> Message-ID: > Is this a good idea? If so, is the implementation optimal I'm really on the fence here. Note however that your solution does not solve the original problem. E.g., your example is: > On December 30, 2000 gerson.kurz at t-online.de (Gerson Kurz) writes: > > If I embed Python in a Win32 console application (using > Demo\embed.c), everything works fine. If I take the very same piece But your solution involves: > The file WinMain.c calls PyWin_StdoutReplace() before it > calls Py_Main(), and PyWin_StdoutPrint() afterwards. This Note that the original problem was _embedding_ Python - thus, you need to patch _their_ WinMain to make it work for them - something you can't do.
Even if PyWin_StdoutReplace() was a public symbol so they _could_ call it, I am not convinced they would - it is almost certain they will still need to redirect output to somewhere useful, so why bother redirecting it temporarily just to redirect it for real immediately after? Finally, I am slightly concerned about the possibility of "hanging" certain programs. For example, I believe that DCOM will often invoke a COM server in a different "desktop" than the user (this is also true for Services, but Python services don't use pythonw.exe). Thus, a Python program may end up hanging with a dialog box, but in the context where no user is able to see it. However, this could be addressed by adding a command-line option to prevent this new behaviour kicking in. I would prefer to see a decent API for extracting error and traceback information from Python. On the other hand, I _do_ see the problem for "newbies" trying to use pythonw.exe. So - I guess I am saying that I don't see this as optimal, and it doesn't solve the original problem you pointed at - but in the interests of making pythonw.exe seem "less broken" for newbies, I could live with this as long as I could prevent it when necessary. Another option would be to use the Win32 Console APIs, and simply attempt to create a console for the error message. E.g., maybe PyErr_Print() could be changed to check for the existence of a console, and if not found, create it. However, the problem with this approach is that the error message will often be printed just as the process is terminating - meaning you will see a new console with the error message for about 0.025 of a second before it vanishes due to process termination. Any sort of "press any key to terminate" option then leaves us in the same position - if no user can see the message, the process appears hung. Mark.
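The wish above for "a decent API for extracting error and traceback information" can be illustrated from the Python side. What follows is only a minimal sketch, assuming a modern Python 3 interpreter and the standard traceback module; run_and_capture is a hypothetical helper name, not part of the patch under discussion and not anything pythonw.exe provides. It returns the traceback as text so the host program can decide where to show it, rather than relying on stdout or stderr being visible.

```python
import traceback


def run_and_capture(source, filename="<embedded>"):
    """Run source code; return None on success, or the traceback text.

    Hypothetical helper for illustration only. Syntax errors raised by
    compile() are captured the same way as runtime errors.
    """
    try:
        code = compile(source, filename, "exec")
        exec(code, {"__name__": "__main__"})
    except Exception:
        # format_exc() renders the exception currently being handled.
        return traceback.format_exc()
    return None


ok = run_and_capture("x = 1 + 1")
syntax_report = run_and_capture("def broken(:\n    pass")
runtime_report = run_and_capture("1 / 0")
```

A GUI host could pass a non-None result to whatever dialog or log it already has, which sidesteps the question of an invisible message box.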
From andreas at andreas-jung.com Mon Jan 8 18:06:16 2001 From: andreas at andreas-jung.com (Andreas Jung) Date: Mon, 8 Jan 2001 18:06:16 +0100 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 In-Reply-To: <3A58EFC3.5A722FF0@tismer.com>; from tismer@tismer.com on Mon, Jan 08, 2001 at 12:37:55AM +0200 References: <3A58EFC3.5A722FF0@tismer.com> Message-ID: <20010108180616.A18993@yetix.sz-sb.de> On Mon, Jan 08, 2001 at 12:37:55AM +0200, Christian Tismer wrote: > Dear community, > > I'm happy to announce that > > Stackless Python 2.0 > > is finally ready and available for download. > > Stackless Python for Python 1.5.2+ also got some minor > enhancements. Both versions are available as Win32 > installer files here: Are there patches available against the standard Python 2.0 source code tree? Andreas From tismer at tismer.com Mon Jan 8 17:15:55 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 08 Jan 2001 18:15:55 +0200 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 References: <3A58EFC3.5A722FF0@tismer.com> <20010108180616.A18993@yetix.sz-sb.de> Message-ID: <3A59E7BB.6908B7E2@tismer.com> Andreas Jung wrote: > > On Mon, Jan 08, 2001 at 12:37:55AM +0200, Christian Tismer wrote: > > Dear community, > > > > I'm happy to announce that > > > > Stackless Python 2.0 > > > > is finally ready and available for download. > > > > Stackless Python for Python 1.5.2+ also got some minor > > enhancements. Both versions are available as Win32 > > installer files here: > > Are there patches available against the standard Python 2.0 > source code tree ? I had no time yet to put the source trees on the web. Should happen in one or two days. Then I will probably not provide patches, hoping that some other Unix people will catch up and provide that part. This worked the same for the 1.5.2 version. The 2.0 port consists of 10 or so files, which can be used as direct replacements for the same files in the 2.0 distro. I think on Unix this is the right way to go.
For me it is simpler to have my own little tree, since I'm working with Windows, and I just have to modify my VC++ project file. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From moshez at zadka.site.co.il Tue Jan 9 02:30:09 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 03:30:09 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59F27D.C27B8CD0@ActiveState.com> References: <3A59F27D.C27B8CD0@ActiveState.com>, <3A59EEF7.BB4118BD@ActiveState.com> <20010108114913.E7563@kronos.cnri.reston.va.us> Message-ID: <20010109013009.37D6DA82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 09:01:49 -0800, Paul Prescod wrote: > dir() is one of the "interactive tools" I'd like to work better in the > presence of __exports__. On the other hand, dir() works pretty poorly > for object instances today so maybe we need something new anyhow. > Perhaps attrs()? > > If there were an "attrs()" and it basically returned __exports__ if it > existed and dir() if it didn't, then I would buy it. Graphical apps > would just build on attrs(). Even better, __exports__ could be what was imported in from foo import *. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses!
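The attrs() idea quoted above ("returned __exports__ if it existed and dir() if it didn't") is small enough to sketch. This is an illustration only, assuming modern Python 3; attrs() and __exports__ are the proposal's names, not an existing builtin or a recognised module attribute, and the demo module is made up for the example.

```python
import types


def attrs(obj):
    """Return obj.__exports__ (sorted) if present, else dir(obj).

    Sketch of the proposed convention; nothing in core Python
    consults __exports__.
    """
    exports = getattr(obj, "__exports__", None)
    if exports is not None:
        return sorted(exports)
    return dir(obj)


# A made-up module that declares what it exports.
demo = types.ModuleType("demo")
demo.public_func = lambda: None
demo._helper = lambda: None
demo.__exports__ = ["public_func"]

# A module with no declaration falls back to dir().
plain = types.ModuleType("plain")

declared = attrs(demo)
fallback = attrs(plain)
```

A class browser built on attrs() would list only public_func for demo, while modules that declare nothing keep today's dir() behaviour.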
From andreas at andreas-jung.com Mon Jan 8 18:25:36 2001 From: andreas at andreas-jung.com (Andreas Jung) Date: Mon, 8 Jan 2001 18:25:36 +0100 Subject: [Python-Dev] Re: ANN: Stackless Python 2.0 In-Reply-To: <3A59E7BB.6908B7E2@tismer.com>; from tismer@tismer.com on Mon, Jan 08, 2001 at 06:15:55PM +0200 References: <3A58EFC3.5A722FF0@tismer.com> <20010108180616.A18993@yetix.sz-sb.de> <3A59E7BB.6908B7E2@tismer.com> Message-ID: <20010108182536.A20361@yetix.sz-sb.de> On Mon, Jan 08, 2001 at 06:15:55PM +0200, Christian Tismer wrote: > > The 2.0 port consists of 10 or so files, which can be used > as direct replacements for the same files in the 2.0 distro. > I think on Unix this is the right way to go. > For me it is simpler to have my own litle tree, since I'm > working with Windows, and I just have to modify my VC++ > project file. I would prefer a tar.gz archive that contains just the modified files. With this approach it is easy possible to extract the archive inside the Python source tree. Andreas From loewis at informatik.hu-berlin.de Mon Jan 8 18:51:28 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 8 Jan 2001 18:51:28 +0100 (MET) Subject: [Python-Dev] Extending startup code: PEP needed? Message-ID: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> > Just curious: wouldn't this introduce a /tmp-style problem to > Python ? I tried, but I could not produce such a problem. > The scenario is quite simple: a Python script runs under root. > The script could pick up a lingering .pth file (e.g. from /tmp > or one of its subdirs -- distutils does this !) and then executes > arbitrary code as *root*. No, Python looks only in a few places for pth file: {,}{,/lib/python/site-packages,/lib/site-python} so it won't pick up pth files in /tmp. Regards, Martin From esr at thyrsus.com Mon Jan 8 19:01:37 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Mon, 8 Jan 2001 13:01:37 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101081506.KAA03404@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 10:06:28AM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> Message-ID: <20010108130137.E22834@thyrsus.com> Guido van Rossum : > Eric, before we go furhter, can you give an exact definition of > EOFness to me? A file is at EOF when attempts to read more data from it will fail returning no data. > What's wrong with just setting the parser loose on the input and > letting it deal with EOF? Nothing wrong in theory, but it's a problem in practice. I don't want to import the second parser unless it's actually needed, because it's much larger than the first one. > In your example, apparently a line > containing the word "history" signals that the rest of the file must > be parsed by the second parser. What if "history" is the last line of > the file? The eof() test can't tell you *that*! Right. That case never happens. I mean it *really* never happens :-). What we're talking about is a game system. The first parser recognizes a spec language for describing games of a particular class (variants of Diplomacy, if that's meaningful to you). The system keeps logfiles which consist of a a section in the game description language, optionally followed by the token "history" and an order log. The parser for the order log language is a *lot* larger than the one for the description language. This is why I said I don't want the first parser to just call the second. I want to test for EOF to know whether I have to import the second parser at all! Here's the beginning of my problem: the first parser can't export a line buffer, because it doesn't *have* a line buffer. 
It's a subclass of shlex and does single-character reads. There are two ways I can cope with this. One is to do a (nonzero) length read after the first parser exits; the other is to have the first parser set a state flag controlling whether the second parser loads. This is where it bites that I can't test for EOF with a read(0). The second shlex parser only has token-level pushback! If I do a nonzero-length read and I get data, I'm screwed. On the other hand (as I said before) setting a lexer state flag seems wrong, because EOFness is a property of the underlying stream rather than the parser. I'd be duplicating state that exists in the stdio stream structure anyway; it ought to be accessible. > > Now, another and more general way to handle this would be to make an > > equivalent of the old FIONCLEX ioctl part of Python's standard set of > > file object methods -- a way to ask "how many bytes are ready to be > > read in this stream?" > > There's no portable way to do that. Actually, fstat(2) is portable enough to support a very useful approximation of FIONCLEX. I know, because I tried it. Last night I coded up a "waiting" method for file objects that calls fstat(2) on the associated file descriptor. For a plain file, it then subtracts the result of ftell() from the fstat size field and returns that -- for other files, it simply returns the size field. I then tested this on plain files, FIFOs, and sockets under Linux. It turns out fstat(2) gives useful information in all three cases (a count of characters waiting in the buffer in the latter two). I expected this; it should be true under all current Unixes. fstat(2) does not give useful size-field results for Linux block devices. I didn't test the character (terminal) devices. (I documented my results in Python's Doc/lib/stat.tex, in a patch I have already submitted to SourceForge.) I would be quite surprised if the plain-file case didn't work on Mac and Windows.
I would be a little surprised if the socket case failed, because all three probably inherited fstat(2) from the ancestral BSD TCP/IP stack. Just having the plain-file case work would, IMHO, be justification enough for this method. If it turns out to be portable across Mac and Windows sockets as well, *huge* win. Could this be tested by someone with access to Windows and Mac systems? -- Eric S. Raymond An armed society is a polite society. Manners are good when one may have to back up his acts with his life. -- Robert A. Heinlein, "Beyond This Horizon", 1942 From mal at lemburg.com Mon Jan 8 19:10:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 19:10:50 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> Message-ID: <3A5A02AA.675A35D1@lemburg.com> Martin von Loewis wrote: > > > Just curious: wouldn't this introduce a /tmp-style problem to > > Python ? > > I tried, but I could not produce such a problem. > > > The scenario is quite simple: a Python script runs under root. > > The script could pick up a lingering .pth file (e.g. from /tmp > > or one of its subdirs -- distutils does this !) and then executes > > arbitrary code as *root*. > > No, Python looks only in a few places for pth file: > {,}{,/lib/python/site-packages,/lib/site-python} > > so it won't pick up pth files in /tmp. Hmm, but what if the Python script picks up a site.py which is different from the standard one distributed with Python ? The code adding (and with the patch: executing) the .pth files is defined in site.py and it is rather easy to override this file by adding a modified site.py file to the current working dir... 
a potential security hole in its own right, I guess :( -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Jan 8 19:30:34 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 13:30:34 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: Your message of "Mon, 08 Jan 2001 13:01:37 EST." <20010108130137.E22834@thyrsus.com> References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> <20010108130137.E22834@thyrsus.com> Message-ID: <200101081830.NAA05301@cj20424-a.reston1.va.home.com> Eric, take a hint. You're not going to get your eof() method no matter what arguments you bring up. But I'll explain it to you again anyway... :-) > Guido van Rossum : > > Eric, before we go furhter, can you give an exact definition of > > EOFness to me? [Eric] > A file is at EOF when attempts to read more data from it will fail > returning no data. I was afraid you would say this. That's not a condition that's easy to calculate without doing I/O, *and* that's not the condition that you are interested in for your problem. According to your definition, f.eof() should be true in this example: f = open("/etc/passwd") f.seek(0, 2) # Seek to end of file print f.eof() # What will this print??? print `f.readline()` # Will print '' But getting the right result here requires a lot of knowledge about how the file is implemented! While you've explained how this can be implemented on Unix, it can't be implemented with just the tools that stdio gives us. Going beyond stdio in order to implement a feature is a grave decision. 
After all, Python is portable to many less-than-mainstream operating systems (VxWorks, OS/9, VMS...). Now, if this was just a speed hack (like xreadlines) I could accept having some platform-dependent code, if at least there was a portable way to do it that was just a bit slower. But here you can't convince me that this can be done in a portable way, and I don't want to force porters to figure out how to do this for their platform before their port can work. I also don't want to make f.eof() a non-portable feature: *if* it is provided, it's too important for that. Note that stdio's feof() doesn't have this definition! It is set when the last *read* (or getc(), etc.) stumbled upon an EOF condition. That's also of limited value; it's mostly defined so you can distinguish between errors and EOF when you get a short read. The stdio feof() flag would be false in the above example. > > What's wrong with just setting the parser loose on the input and > > letting it deal with EOF? > > Nothing wrong in theory, but it's a problem in practice. I don't want > to import the second parser unless it's actually needed, because it's much > larger than the first one. So be practical and let the first parser set a global flag that tells you whether it's necessary to load the second one. > > In your example, apparently a line > > containing the word "history" signals that the rest of the file must > > be parsed by the second parser. What if "history" is the last line of > > the file? The eof() test can't tell you *that*! > > Right. That case never happens. I mean it *really* never happens :-). > > What we're talking about is a game system. The first parser recognizes > a spec language for describing games of a particular class (variants of > Diplomacy, if that's meaningful to you). The system keeps logfiles which > consist of a a section in the game description language, optionally > followed by the token "history" and an order log. 
> > The parser for the order log language is a *lot* larger than the one > for the description language. This is why I said I don't want the > first parser to just call the second. I want to test for EOF to > know whether I have to import the second parser at all! > > Here's the beginning of my problem: the first parser can't export a line > buffer, because it doesn't *have* a line buffer. It's a subclass of > shlex and does single-character reads. > > There are two ways I can cope with this. One is to do a (nonzero) > length read after the first parser exits; the other is to have the > first parser set a state flag controlling whether the second parser > loads. Do the latter. Nothing wrong with it that I can see. > This is where it bites that I can't test for EOF with a read(0). And can you tell me a system where you *can* test for EOF with a read(0)? I've never heard of such a thing. The Unix read() system call has the same properties as Python's f.read(). I'm pretty sure that fread() with a zero count also doesn't give you the information you're after. > The > second shlex parser only has token-level pushback! If do a > nonzero-length read and I get data, I'm screwed. On the other hand > (as I said before) setting a lexer state flag seems wrong, because > EOFness is a property of the underlying stream rather than the parser. > I'd be duplicating state that exists in the stdio stream structure > anyway; it ought to be accessible. Bullshit. The EOFness that you're after (according to your own definition) is not the same as the EOFness of the stdio stream. The EOFness in the stdio stream could help you, but Python resets it -- so that making it available wouldn't be as easy as you claim. Anyway, you seem to have a sufficiently vague idea of what "EOFness" means that I don't think providing access to whatever low-level EOFness condition might exist would do you much good. 
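The "set a flag" advice above can be sketched with the standard shlex module. This is a minimal illustration under stated assumptions, not the game system's actual parser: DescriptionParser, saw_history and the sample input are all hypothetical, and the code targets modern Python 3.

```python
import io
import shlex


class DescriptionParser(shlex.shlex):
    """Hypothetical first-stage parser.

    Collects tokens until it meets the token "history" or runs out of
    input, and records which of the two happened in saw_history.
    """

    def __init__(self, stream):
        super().__init__(stream)
        self.saw_history = False

    def parse(self):
        tokens = []
        while True:
            tok = self.get_token()
            if tok == self.eof:      # end of input, no order log follows
                break
            if tok == "history":     # the rest belongs to the second parser
                self.saw_history = True
                break
            tokens.append(tok)
        return tokens


with_log = DescriptionParser(io.StringIO("players 7\nhistory\nmove A\n"))
with_log_tokens = with_log.parse()

without_log = DescriptionParser(io.StringIO("players 7\n"))
without_log_tokens = without_log.parse()

# The caller would import the larger parser only when the flag is set,
# e.g. (hypothetical module name):
#     if with_log.saw_history:
#         import order_log_parser
```

The flag records what the first parser observed in its own token stream, so no probing of the underlying file is needed to decide whether the second parser has to be loaded.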
> > > Now, another and more general way to handle this would be to make an > > > equivalent of the old FIONCLEX ioctl part of Python's standard set of > > > file object methods -- a way to ask "how many bytes are ready to be > > > read in this stream? > > > > There's no portable way to do that. > > Actually, fstat(2) is portable enough to support a very useful > approximation of FIONCLEX. I know, because I tried it. > > Last night I coded up a "waiting" method for file objects that calls > fstat(2) on the associated file descriptor. For a plain file, it > then subtracts the result of ftell() from the fstat size field and > returns that -- for other files, it simply returns the size field. > > I then tested this on plain files, FIFOs, and sockets under Linux. It > turns out fstat(2) gives useful information in all three cases (a > count of characters waiting in the buffer in the latter two). I expected > this; it should be true under all current Unixes. > > fstat(2) does not give useful size-field results for Linux block > devices. I didn't test the character (terminal) devices. (I > documented my results in Python's Doc/lib/stat.tex, in a patch I have > already submitted to SourceForge.) > > I would be quite surprised if the plain-file case didn't work on Mac > and Windows. I would be a little surprised if the socket case failed, > because all three probably inherited fstat(2) from the ancestral BSD > TCP/IP stack. > > Just having the plain-file case work would, IMHO, be justification > enough for this method. If it turns out to be portable across Mac and > Windows sockets as well, *huge* win. Could this be tested by someone > with access to Windows and Mac systems? I don't see the huge win. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 8 19:33:26 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 13:33:26 -0500 Subject: [Python-Dev] Extending startup code: PEP needed? 
In-Reply-To: Your message of "Mon, 08 Jan 2001 19:10:50 +0100." <3A5A02AA.675A35D1@lemburg.com> References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> Message-ID: <200101081833.NAA05325@cj20424-a.reston1.va.home.com> Discussions based on Python running as root and picking up untrusted code from $PYTHONPATH are pointless. Of course this is a security hole. If root runs *any* Python script in a way that could pick up even a single untrusted module, there's a security hole. site.py or *.pth files are just a special case of this, so I don't see why this is used as an example. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Mon Jan 8 19:48:40 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 13:48:40 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59EF2B.792801E5@ActiveState.com> Message-ID: [Moshe] > Something better to do would be to use > import foo as _foo [Paul] > It's pretty clear that nobody does this now and nobody is going > to start doing it in the near future. It's too invasive and it > makes the code too ugly. Actually, this function is one of my std utilities:

def _pvt_import(globs, modname, *items):
    """globs, modname, *items -> import into globs with leading "_".

    If *items is empty, set globs["_" + modname] to module modname.
    If *items is not empty, import each item similarly but don't
    import the module into globs.
    Leave names that already begin with an underscore as-is.

    # import math as _math
    >>> _pvt_import(globals(), "math")
    >>> round(_math.pi, 0)
    3.0

    # import math.sin as _sin and math.floor as _floor
    >>> _pvt_import(globals(), "math", "sin", "floor")
    >>> _floor(3.14)
    3.0
    """

    mod = __import__(modname, globals())
    if items:
        for name in items:
            xname = name
            if xname[0] != "_":
                xname = "_" + xname
            globs[xname] = getattr(mod, name)
    else:
        xname = modname
        if xname[0] != "_":
            xname = "_" + xname
        globs[xname] = mod

Note that it begins with an underscore because it's *meant* to be exported <0.5 wink>. That is, the module importing this does

    from utils import _pvt_import

because they don't already have _pvt_import to automate adding the underscore, and without the underscore almost everyone would accidentally export "pvt_import" in turn. IOW,

    import M
    from N import M

not only import M, by default they usually export it too, but the latter is rarely *intended*. So, over the years, I've gone thru several phases of naming objects I *intend* to export with a leading underscore. That's the only way to prevent later imports from exporting by accident. I don't believe I've distributed any code using _pvt_import, though, because it fights against the language and expectations. Metaprogramming against the grain should be a private sin <0.9 wink>. _metaprogramming-ly y'rs - tim From mal at lemburg.com Mon Jan 8 19:40:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 19:40:37 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> Message-ID: <3A5A09A5.D0DC33A1@lemburg.com> Guido van Rossum wrote: > > Discussions based on Python running as root and picking up untrusted > code from $PYTHONPATH are pointless. Of course this is a security > hole. If root runs *any* Python script in a way that could pick up > even a single untrusted module, there's a security hole.
site.py or > *.pth files are just a special case of this, so I don't see why this > is used as an example. Agreed; see my reply to Martin. Still, wouldn't it be wise to add some logic to Python to prevent importing untrusted modules, e.g. by making sys.path read-only and disabling the import hook usage using a command line ? This would at least prevent the most obvious attacks. I wonder how RedHat works around these problems. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at interet.com Mon Jan 8 20:16:45 2001 From: jim at interet.com (James C. Ahlstrom) Date: Mon, 08 Jan 2001 14:16:45 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? References: Message-ID: <3A5A121D.FDD8C2C1@interet.com> Mark Hammond wrote: > Note that the original problem was _embedding_ Python - thus, you need to > patch _their_ WinMain to make it work for them - something you can't do. Correct, if they don't use pythonw.exe, but use a different main program, the new stdout will not be installed. But then they must have their own main.c, and they can add the C call. > Even if PyWin_StdoutReplace() was a public symbol so they _could_ call it, I Yes, the symbol PyWin_StdoutReplace() is public, and they can call it. > am not convinced they would - it is almost certain they will still need to > redirect output to somewhere useful, so why bother redirecting it > temporarily just to redirect it for real immediately after? Redirecting it temporarily is valuable, because if the sys.stdout replacement occurs in (for example) myprog.py, then "pythonw.exe myprog.py" will fail to produce any error messages for a syntax error in myprog.py. Also, I was hoping further sys.stdout redirection would be unnecessary. > Finally, I am slightly concerned about the possibility of "hanging" certain > programs. 
For example, I believe that DCOM will often invoke a COM server in > a different "desktop" than the user (this is also true for Services, but > Python services don't use pythonw.exe). Thus, a Python program may end up > hanging with a dialog box, but in the context where no user is able to see > it. However, this could be addressed by adding a command-line option to > prevent this new behaviour kicking in. Limiting the code to pythonw.exe instead of trying to install it in python20.dll was supposed to prevent damage to the use of Python in servers. Since pythonw.exe is a Windows (GUI) program, I am assuming there is a screen. The dialog box is started with MessageBox() and a window handle of GetForegroundWindow(). So there doesn't need to be an application window. I have tested it with GUI programs, and it also works when run from a console. Having said that, you may be right that there is some way to hang on a dialog box which can not be seen. It depends on what MessageBox() and GetForegroundWindow() actually do. If it seems that this patch has merit, I would be grateful if you would review the code to look for issues of this type. > I would prefer to see a decent API for extracting error and traceback > information from Python. On the other hand, I _do_ see the problem for > "newbies" trying to use pythonw.exe. There could be an API added to the winstdout module such as msg = winstdout.GetMessageText() which would return saved text, control its display etc. But then the problem remains of actually displaying the messages especially in the context of tracebacks and errors. And it is probably easier to redirect sys.stdout so it does what you want rather than use the API. I do not view winstdout as a "newbie" feature, but rather a generally useful C-language addition to Python. 
> So - I guess I am saying that I don't see this as optimal, and it doesn't > solve the original problem you pointed at - but in the interests of making > pythonw.exe seem "less broken" for newbies, I could live with this as long > as I could prevent it when necessary. I guess I am saying, perhaps incorrectly, that the mechanism provided will make further redirection of sys.stdout unnecessary 99% of the time. Experimentation shows that Python composes tracebacks and error messages a line or partial line at a time. That is, you cannot display each call to printf(), but must wait until the system is idle to be sure that multiple calls to printf() are complete. So this forces you to use the idle processing loop, not rocket science but at least inconvenient. And the only source of stdout/err is tracebacks, error messages and the "print" statement. What would you do with these in a Windows program except display an "OK" dialog box? If someone out there knows of a different example of sys.stdout redirection in use in the real world, it would be helpful if they would describe it. Maybe it could be incorporated. > Another option would be to use the Win32 Console APIs, and simply attempt to > create a console for the error message. E.g., maybe PyErr_Print() could be > changed to check for the existence of a console, and if not found, create > it. However, the problem with this approach is that the error message will > often be printed just as the process is terminating - meaning you will see a > new console with the error message for about 0.025 of a second before it > vanishes due to process termination. Any sort of "press any key to > terminate" option then leaves us in the same position - if no user can see > the message, the process appears hung. Yes, this is a problem with the console API approach. Another is that popping up a black console for output instead of the usual "OK" dialog box is unnatural, and will force the user to replace sys.stdout.
I was hoping this C stdout will make this unnecessary. JimA From esr at thyrsus.com Mon Jan 8 20:17:50 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 8 Jan 2001 14:17:50 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <200101081830.NAA05301@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 01:30:34PM -0500 References: <20010106230125.A29058@thyrsus.com> <20010107133032.F4586@thyrsus.com> <200101072113.QAA32467@cj20424-a.reston1.va.home.com> <20010107171527.A5093@thyrsus.com> <200101081506.KAA03404@cj20424-a.reston1.va.home.com> <20010108130137.E22834@thyrsus.com> <200101081830.NAA05301@cj20424-a.reston1.va.home.com> Message-ID: <20010108141750.C23214@thyrsus.com> Guido van Rossum : > [Eric] > > A file is at EOF when attempts to read more data from it will fail > > returning no data. > > I was afraid you would say this. That's not a condition that's easy > to calculate without doing I/O, *and* that's not the condition that > you are interested in for your problem. According to your definition, > f.eof() should be true in this example: > > f = open("/etc/passwd") > f.seek(0, 2) # Seek to end of file > print f.eof() # What will this print??? > print `f.readline()` # Will print '' I agree that after f.seek(0, 2) f is in an end-of-file condition. But I think it's precisely the definition that would be useful for my problem. Contrary to what you say, I think my definition of EOF is quite sharp -- a sequential read would return no data. Better to think of what I need as an "is there data waiting?" query. I should have framed it that way, rather than about EOFness, from the beginning. > But getting the right result here requires a lot of knowledge about > how the file is implemented! While you've explained how this can be > implemented on Unix, it can't be implemented with just the tools that > stdio gives us. Granted. 
However, it looks possible that "is there data waiting" *can* be portably implemented with the help of fstat(2), which by precedent is also part of Python's toolkit. > I also don't want to make f.eof() a non-portable feature: *if* > it is provided, it's too important for that. Agreed. > Note that stdio's feof() doesn't have this definition! It is set when > the last *read* (or getc(), etc.) stumbled upon an EOF condition. > That's also of limited value; it's mostly defined so you can > distinguish between errors and EOF when you get a short read. The > stdio feof() flag would be false in the above example. OK. You're right about that. I should have thought more clearly about the difference between the state of stdio and the state of the underlying file or device. Access to stdio state won't do by itself. > > This is where it bites that I can't test for EOF with a read(0). > > And can you tell me a system where you *can* test for EOF with a > read(0)? I've never heard of such a thing. The Unix read() system > call has the same properties as Python's f.read(). I'm pretty sure > that fread() with a zero count also doesn't give you the information > you're after. I'd have to test -- but what Unix read(2) does in this case isn't really my point. My real point is that I can't probe for whether there's data waiting to be read in what seems like the obvious way. I expect Python to compensate for the deficiencies of the underlying C, not reflect them. > > Just having the plain-file case work would, IMHO, be justification > > enough for this method. If it turns out to be portable across Mac and > > Windows sockets as well, *huge* win. Could this be tested by someone > > with access to Windows and Mac systems? > > I don't see the huge win. Try "polling after a non-blocking open". A lower-overhead and more natural way to do it than with a poller object. (This is on my mind because I used a poller object to query FIFOs just last week.) 
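The fstat-based "is there data waiting" query described in this thread can be approximated in a few lines. The following is a sketch only, assuming modern Python 3 and a binary file object; waiting() is a hypothetical stand-in for the method described, not the submitted patch. The plain-file branch is the only one exercised here; the fallback simply mirrors the description ("for other files, it simply returns the size field") and its usefulness on FIFOs or sockets is reported above for Linux only.

```python
import os
import stat
import tempfile


def waiting(f):
    """Approximate number of bytes still unread on file object f.

    For a regular file: size reported by fstat minus the current
    position. For anything else: the fstat size field as-is, which is
    platform-dependent and may be meaningless.
    """
    st = os.fstat(f.fileno())
    if stat.S_ISREG(st.st_mode):
        return max(st.st_size - f.tell(), 0)
    return st.st_size


with tempfile.TemporaryFile() as f:
    f.write(b"hello world")   # 11 bytes
    f.flush()                 # make sure fstat sees the full size
    f.seek(0)
    f.read(5)
    remaining = waiting(f)    # bytes left after reading "hello"
    f.read()
    at_end = waiting(f)       # nothing left to read
```

A result of zero on a regular file corresponds to the "a sequential read would return no data" condition, without performing a read that would need to be pushed back.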
The game system I'm working on, BTW, has another point of interest for this list. It is a rather large and complex suite of C programs that makes heavy use of dynamic-memory allocation; I am translating to Python partly in order to avoid chronic misallocation problems (leaks and wild pointers) and partly because the thing needed to be rewritten anyway to eliminate global state so I can embed it in a multithreaded server. Side-by-side comparison of the original C and its translation should be quite an interesting educational experience once it's done. That just might be my next year's paper. -- Eric S. Raymond It is the assumption of this book that a work of art is a gift, not a commodity. Or, to state the modern case with more precision, that works of art exist simultaneously in two "economies," a market economy and a gift economy. Only one of these is essential, however: a work of art can survive without the market, but where there is no gift there is no art. -- Lewis Hyde, The Gift: Imagination and the Erotic Life of Property From guido at python.org Mon Jan 8 20:36:02 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 14:36:02 -0500 Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: Your message of "Mon, 08 Jan 2001 19:40:37 +0100." <3A5A09A5.D0DC33A1@lemburg.com> References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> Message-ID: <200101081936.OAA05440@cj20424-a.reston1.va.home.com> > Still, wouldn't it be wise to add some logic to Python to prevent > importing untrusted modules, e.g. by making sys.path read-only and > disabling the import hook usage using a command line ? > > This would at least prevent the most obvious attacks. I wonder how > RedHat works around these problems. I don't understand what kind of attacks you are thinking of. What would making sys.path read-only prevent?
You seem to be thinking that some malicious piece of code could try to subvert you by setting sys.path. But what you forget is that if this piece of code cannot be trusted with sys.path, it should not be trusted to run at all! --Guido van Rossum (home page: http://www.python.org/~guido/) From loewis at informatik.hu-berlin.de Mon Jan 8 20:45:44 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 8 Jan 2001 20:45:44 +0100 (MET) Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: <3A5A02AA.675A35D1@lemburg.com> (mal@lemburg.com) References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> Message-ID: <200101081945.UAA12178@pandora.informatik.hu-berlin.de> > The code adding (and with the patch: executing) the .pth files > is defined in site.py and it is rather easy to override this > file by adding a modified site.py file to the current working dir... > a potential security hole in its own right, I guess :( Indeed - independent of my patch changing the other site.py :-) Regards, Martin From skip at mojam.com Mon Jan 8 20:49:22 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 8 Jan 2001 13:49:22 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A59EF2B.792801E5@ActiveState.com> References: <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <3A59EF2B.792801E5@ActiveState.com> Message-ID: <14938.6594.44596.509259@beluga.mojam.com> Paul> It's not about keeping people out of your module. In fact I would Paul> propose that mod.__dict__ should be as loose as ever. Okay, how about this as a compromise first step? Allow programmers to put __exports__ lists in their modules but don't do anything with them *except* modify dir() to respect that if it exists?
That would pretty up dir() output for newbies, almost certainly not break anything, improve the internal documentation of the modules that use __exports__, and still allow us to move in a more restrictive direction at a later time if we so choose. Skip From moshez at zadka.site.co.il Tue Jan 9 05:04:23 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 06:04:23 +0200 (IST) Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: <3A5A02AA.675A35D1@lemburg.com> References: <3A5A02AA.675A35D1@lemburg.com>, <200101081751.SAA08918@pandora.informatik.hu-berlin.de> Message-ID: <20010109040423.68AA4A82D@darjeeling.zadka.site.co.il> On Mon, 08 Jan 2001 19:10:50 +0100, "M.-A. Lemburg" wrote: > Hmm, but what if the Python script picks up a site.py which is > different from the standard one distributed with Python ? Then the site.py can do whatever it wants. No need to go through PTHs -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From tim.one at home.com Mon Jan 8 20:59:48 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 14:59:48 -0500 Subject: feof status (was: Re: [Python-Dev] Rehabilitating fgets) In-Reply-To: <20010108130137.E22834@thyrsus.com> Message-ID: Quickie: [Guido] > Eric, before we go further, can you give an exact definition of > EOFness to me? [Eric] > A file is at EOF when attempts to read more data from it will fail > returning no data. To be very clear about this, that's not what C's feof() means: in general, the end-of-file indicator in std C stream input is set only *after* you've attempted a read that "didn't work". For example,

    #include <stdio.h>

    void main()
    {
        FILE* fp = fopen("guts", "wb");
        fputs("abc", fp);
        fclose(fp);
        fp = fopen("guts", "rb");
        for (;;) {
            int c;
            c = getc(fp);
            printf("getc returned %c (%d)\n", c, c);
            printf("At EOF after getc? %d\n", feof(fp));
            if (c == EOF)
                break;
        }
    }

Unless your C is broken, feof() will return 0 after getc() returns 'a', and again after 'b', and again after 'c'. It's not until getc() returns EOF that feof() first returns a non-zero result. Then add these two lines after the "for":

    fseek(fp, 0L, SEEK_END);
    printf("after seeking to the end, feof() says %d\n", feof(fp));

Unless your fseek() is non-std, that clears the end-of-file indicator, and regardless of to where you seek. So the std behavior throughout libc is much like Python's behavior: there's nothing that can tell you whether you're at the end of the file, in general, short of trying to read and failing to get something back. In your case you seem to *know* that you have a "plain old file", meaning that its size is well-defined and that ftell() makes sense for it. You also seem to know that you don't have to worry about anyone else, e.g., appending to it (or in any other way changing its size, or changing your stream's file position), while you're mucking with it. So why not just do f.tell() and compare that to the size yourself? This sounds easy for you to do, but in this particular case you enjoy the benefits of a world of assumptions that aren't true in general. > ... > This is where it bites that I can't test for EOF with a read(0). You can't in std C using an fread of 0 bytes either -- that has no effect on the end-of-file indicator. Add

    if (c == 'c') {
        char buf[100];
        size_t i = fread(buf, 1, 0, fp);
        printf("after fread of 0 bytes, feof() says %d\n", feof(fp));
    }

before the "(c == EOF)" test above to try that on your platform. > ... > I would be quite surprised if the plain-file case didn't work on Mac > and Windows. Don't know about Mac. On Windows everything is grossly complicated because of line-end translations in text mode. Like the C std says, the only *portable* thing you can do with an ftell() result for a text file is feed it back unaltered to fseek().
It so happens that on Windows, using MS's libc, if f.readline() returns "abc\n" for the first line of a native text file, f.tell() returns 5, reflecting the actual byte offset in the file (including the \r that .readline() doesn't show you). So you *can* get away with comparing f.tell() to the file's size on Windows too (using the MS C compiler; don't know about others). the-operational-defn-of-eof-is-the-only-portable-defn- there-is-ly y'rs - tim From moshez at zadka.site.co.il Tue Jan 9 05:08:29 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 9 Jan 2001 06:08:29 +0200 (IST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <14938.6594.44596.509259@beluga.mojam.com> References: <14938.6594.44596.509259@beluga.mojam.com>, <20010107232532.V17220@lyra.org> <20010106110033.52127A84F@darjeeling.zadka.site.co.il> <20010108165057.8FED8A82D@darjeeling.zadka.site.co.il> <3A59EF2B.792801E5@ActiveState.com> Message-ID: <20010109040829.BDB66A82D@darjeeling.zadka.site.co.il> [Paul Prescod] > It's not about keeping people out of your module. In fact I would > propose that mod.__dict__ should be as loose as ever. [Skip Montanaro] > Okay, how about this as a compromise first step? Allow programmers to put > __exports__ lists in their modules but don't do anything with them *except* > modify dir() to respect that if it exists? That would pretty up dir() > output for newbies, almost certainly not break anything, improve the > internal documentation of the modules that use __exports__, and still allow > us to move in a more restrictive direction at a later time if we so choose. I'm +1 on that personally. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal at lemburg.com Mon Jan 8 21:38:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 08 Jan 2001 21:38:00 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? 
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> Message-ID: <3A5A2528.C289BE1D@lemburg.com> Guido van Rossum wrote: > > > Still, wouldn't it be wise to add some logic to Python to prevent > > importing untrusted modules, e.g. by making sys.path read-only and > > disabling the import hook usage using a command line ? > > > > This would at least prevent the most obvious attacks. I wonder how > > RedHat works around these problems. > > I don't understand what kind of attacks you are thinking of. What > would making sys.path read-only prevent? You seem to be thinking that > some malicious piece of code could try to subvert you by setting > sys.path. But what you forget is that if this piece of code cannot be > trusted with sys.path, it should not be trusted to run at all! I was thinking of an attack where knowledge of common temporary execution locations is used to trick Python into executing untrusted code -- the untrusted code would only have to be copied to the known temporary execution directory and then gets executed by Python next time the program using the temporary location is invoked. But you're right: this is possible whether or not sys.path is writeable.
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Mon Jan 8 21:45:57 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 21:45:57 +0100 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <200101081427.JAA03146@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 08, 2001 at 09:27:50AM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> Message-ID: <20010108214557.H402@xs4all.nl> On Mon, Jan 08, 2001 at 09:27:50AM -0500, Guido van Rossum wrote: > > You may be right. Still, this patch solves the immediate problem in a > > reasonably clean way, and I urge that it should go in. We can do a > > more complete reorganization of the build process later. (I'll help with > > that; I'm pretty expert with autoconf and friends.) > I expect Andrew's code to go in before 2.1 is released. So I don't > see a reason why we should hurry and check in a stop-gap measure. Oh, we're gonna distribute binaries of Python 2.0/1.5.2-with-distutils for every known platform that can run configure ? :) I still think there are more than enough platforms without Python to warrant using autoconf for configuring modules. The module list and their demands are stable enough to make maintenance a fair breeze, IMHO. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From akuchlin at mems-exchange.org Mon Jan 8 22:57:58 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 16:57:58 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108214557.H402@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 08, 2001 at 09:45:57PM +0100 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108214557.H402@xs4all.nl> Message-ID: <20010108165758.B9260@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 09:45:57PM +0100, Thomas Wouters wrote: >every known platform that can run configure ? :) I still think there are >more than enough platforms without Python to warrant using autoconf for >configuring modules. The module list and their demands are stable enough to >make maintenance a fair breeze, IMHO. Umm... the proposed PEP 229 patch would compile a Python binary with sre, posix, and strop statically linked; this minimal Python is then used to run the setup.py script. You shouldn't require a preinstalled Python, though the current version of the patch doesn't meet this requirement yet. --amk From tim.one at home.com Mon Jan 8 21:59:40 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 15:59:40 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: [Tim] > Perl appears to ignore the issue of thread safety here (on Windows and > everywhere else). [Paul Prescod] > If you can create a sample program that demonstrates the unsafety > I'll anonymously submit it as a bug on our internal system I don't want to spend time on that, as I *assume* it's already well-known within the Perl thread community. Besides, the last version of Perl I got from ActiveState complains: No threads in this perl at temp.pl line 14 if I try to use Perl threads. 
That's:

    > \perl\bin\perl -v

    This is perl, v5.6.0 built for MSWin32-x86-multi-thread
    (with 1 registered patch, see perl -V for more detail)

    Copyright 1987-2000, Larry Wall

    Binary build 620 provided by ActiveState Tool Corp. http://www.ActiveState.com
    Built 18:31:05 Oct 31 2000
    ...

If I can repair that by downloading a more recent release, let me know. > and ensure that the next version of Perl is as slow as Python. :) I don't want to slow them down! To the contrary, now I've got a solid reason for why I keep using Perl for simple high-volume text-crunching jobs. > Seriously: If someone comes at me with Perl-IO-is-way-faster-than- > Python-IO, I'd like to know what concretely they've given up in order > to achieve that performance. My line-at-a-time test case used (rounding to nearest whole integers) 30 seconds in Python and 6 in Perl. The result of testing many changes to Python's implementation was that the excess 24 seconds broke down like so:

    17  spent inside internal MS threadsafe getc() lock/unlock
        routines
     5  uncertain, but evidence suggests much of it due to MS
        malloc/realloc (Perl does its own memory mgmt)
     2  for not copying directly out of the platform FILE*
        implementation struct in a highly optimized loop (like
        Perl does)

My last checkin to fileobject.c reclaimed 17 seconds on Win98SE while remaining threadsafe, via a combination of locking per line instead of per character, and invoking realloc much less often (only for lines exceeding 200 chars). (BTW, I'm still curious to know how that compares to the getc_unlocked hack on a platform other than Windows!) > And even just for my own interest I'd like to understand the cost/ > benefit of stream thread safety. If you're not *using* threads, or not using them to muck with the same stream at the same time, the ratio is infinite. And that's usually the case. > For instance would it make sense to just write a thread-safe > wrapper for streams used from multiple threads?
Alas, on Windows you can't pick and choose: you get the threadsafe libc, or you don't. So long as anyone may want to use threads for any reason whatsoever, we must link with threadsafe libraries. But, as above, on Windows we're not paying much for that anymore in this case (unless maybe the threadsafe MS malloc family is also outrageously slower than its careless counterpart ...). It does prevent me from pursuing the "optimized inner loop" business, because MS doesn't expose its locking primitives (so I can't do in C everything I would need to do to optimize the inner loop while remaining threadsafe). there-are-damn-few-pieces-of-libc-we-wouldn't-be-better-off- writing-ourselves-but-then-we'd-have-a-much-harder-time- playing-with-others'-code-ly y'rs - tim From akuchlin at mems-exchange.org Mon Jan 8 22:15:34 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 8 Jan 2001 16:15:34 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:59:40PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: <20010108161534.A2392@kronos.cnri.reston.va.us> On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: >200 chars). (BTW, I'm still curious to know how that compares to the >getc_unlocked hack on a platform other than Windows!) On Solaris and Linux, the results seemed to be lost in the noise. Repeated runs of filetest.py were sometimes faster than without USE_MS_GETLINE_HACK, so the variation is probably large enough to swamp any difference between the two. (Assuming I enabled the getline hack correctly of course; someone please replicate...)
--amk

Linux: w/o USE_MS_GETLINE_HACK
kronos Python-2.0>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.186 0.190
readlines_sizehint    0.108 0.110
using_fileinput       0.447 0.450
while_readline        0.184 0.180

Linux w/ USE_MS_GETLINE_HACK:
kronos Python-2.0>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.178 0.180
readlines_sizehint    0.108 0.110
using_fileinput       0.434 0.430
while_readline        0.183 0.190

Solaris w/o USE_MS_GETLINE_HACK:
amarok src>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.640 0.630
readlines_sizehint    0.278 0.280
using_fileinput       1.874 1.820
while_readline        0.839 0.840

Solaris w/ USE_MS_GETLINE_HACK:
amarok src>./python ~/filetest.py
total 1559913 chars and 32513 lines
count_chars_lines     0.569 0.570
readlines_sizehint    0.275 0.280
using_fileinput       1.902 1.900
while_readline        0.769 0.770

From gstein at lyra.org Mon Jan 8 22:29:40 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 13:29:40 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108161534.A2392@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Jan 08, 2001 at 04:15:34PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> <20010108161534.A2392@kronos.cnri.reston.va.us> Message-ID: <20010108132940.G4141@lyra.org> On Mon, Jan 08, 2001 at 04:15:34PM -0500, Andrew Kuchling wrote: > On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: > >200 chars). (BTW, I'm still curious to know how that compares to the > >getc_unlocked hack on a platform other than Windows!) > > On Solaris and Linux, the results seemed to be lost in the noise. Your times are so small... I'd suggest doing a few iterations within filetest.py so your margin of error isn't so noticeable. Cheers, -g >...
> Linux: w/o USE_MS_GETLINE_HACK
> kronos Python-2.0>./python ~/filetest.py
> total 1559913 chars and 32513 lines
> count_chars_lines     0.186 0.190
> readlines_sizehint    0.108 0.110
> using_fileinput       0.447 0.450
> while_readline        0.184 0.180
>
> Linux w/ USE_MS_GETLINE_HACK:
> kronos Python-2.0>./python ~/filetest.py
> total 1559913 chars and 32513 lines
> count_chars_lines     0.178 0.180
> readlines_sizehint    0.108 0.110
> using_fileinput       0.434 0.430
> while_readline        0.183 0.190
> Solaris w/o USE_MS_GETLINE_HACK:
> amarok src>./python ~/filetest.py
> total 1559913 chars and 32513 lines
> count_chars_lines     0.640 0.630
> readlines_sizehint    0.278 0.280
> using_fileinput       1.874 1.820
> while_readline        0.839 0.840
>
> Solaris w/ USE_MS_GETLINE_HACK:
> amarok src>./python ~/filetest.py
> total 1559913 chars and 32513 lines
> count_chars_lines     0.569 0.570
> readlines_sizehint    0.275 0.280
> using_fileinput       1.902 1.900
> while_readline        0.769 0.770

-- Greg Stein, http://www.lyra.org/ From thomas at xs4all.net Mon Jan 8 22:59:17 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 22:59:17 +0100 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: <20010108165758.B9260@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Jan 08, 2001 at 04:57:58PM -0500 References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108214557.H402@xs4all.nl> <20010108165758.B9260@kronos.cnri.reston.va.us> Message-ID: <20010108225916.P2467@xs4all.nl> On Mon, Jan 08, 2001 at 04:57:58PM -0500, Andrew Kuchling wrote: > Umm... the proposed PEP 229 patch would compile a Python binary with > sre, posix, and strop statically linked; this minimal Python is then > used to run the setup.py script. You shouldn't require a preinstalled > Python, though the current version of the patch doesn't meet this > requirement yet. Apologies.
I should've bothered to read the PEP first, but I haven't found the time yet :P I retract all my comments on the subject until I do. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Mon Jan 8 23:08:50 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 8 Jan 2001 23:08:50 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Tue, Jan 09, 2001 at 02:03:00AM +0200 References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com>, <200101081433.JAA03185@cj20424-a.reston1.va.home.com>, <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> <200101081515.KAA03474@cj20424-a.reston1.va.home.com> <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> Message-ID: <20010108230850.Q2467@xs4all.nl> On Tue, Jan 09, 2001 at 02:03:00AM +0200, Moshe Zadka wrote: > > (2) Under exactly what circumstances do you want from foo import * > > issue a warning? > All. > If you want to be less extreme, don't warn if the module defines > a __from_star_ok__ We already have a perfectly acceptable way of turning off warnings in particular circumstances. I'm +1 on warning against using 'from spam import *' by the way, though it would be even better (+2!) if there was a 'import * considered harmful' page/chapter in the documentation somewhere, so we could point to it. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Mon Jan 8 23:23:02 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 08 Jan 2001 17:23:02 -0500 Subject: [Python-Dev] Extending startup code: PEP needed? In-Reply-To: Your message of "Mon, 08 Jan 2001 21:38:00 +0100." 
<3A5A2528.C289BE1D@lemburg.com> References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> <3A5A2528.C289BE1D@lemburg.com> Message-ID: <200101082223.RAA05858@cj20424-a.reston1.va.home.com> > I was thinking an attack where knowledge of common temporary > execution locations is used to trick Python into executing > untrusted code -- the untrusted code would only have to be > copied to the known temporary execution directory and then > gets executed by Python next time the program using the temporary > location is invoked. When does Python execute code from a predictable common temporary location? When is that likely to be used from a Python script running as root? Note that if you use tempfile.TemporaryFile(), you can create a temporary file that's not subvertible. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Jan 8 23:35:17 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 8 Jan 2001 17:35:17 -0500 (EST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010108230850.Q2467@xs4all.nl> References: <200101081515.KAA03474@cj20424-a.reston1.va.home.com> <200101081433.JAA03185@cj20424-a.reston1.va.home.com> <20010107232532.V17220@lyra.org> <20010108002603.X17220@lyra.org> <20010108231321.AB08FA82D@darjeeling.zadka.site.co.il> <20010109000300.DF2A5A82D@darjeeling.zadka.site.co.il> <20010108230850.Q2467@xs4all.nl> Message-ID: <14938.16549.944123.917467@cj42289-a.reston1.va.home.com> Thomas Wouters writes: > *' by the way, though it would be even better (+2!) if there was a 'import * > considered harmful' page/chapter in the documentation somewhere, so we could > point to it. Care to write it? -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From MarkH at ActiveState.com Tue Jan 9 00:00:01 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 8 Jan 2001 15:00:01 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? In-Reply-To: <3A5A05DA.86B3EB86@interet.com> Message-ID: > Limiting the code to pythonw.exe instead of trying to install > it in python20.dll was supposed to prevent damage to the use > of Python in servers. Since pythonw.exe is a Windows (GUI) program, > I am assuming there is a screen. Sometimes _no_ screen at all is wanted - i.e., no main GUI window, and no console window. pythonw is used in this case. COM uses pythonw.exe in just this way, and when executed by DCOM, it will be executed in a context where the user cannot see any such dialog. However, I would be happy to ensure the correct command-line is used to prevent this behaviour in this case. Indeed, in _every_ case I use pythonw.exe I would disable this - but I accept that other users have simpler requirements. > Having said that, you may be right that there is some way to > hang on a dialog box which can not be seen. It depends on what > MessageBox() and GetForegroundWindow() actually do. If it seems > that this patch has merit, I would be grateful if you would review > the code to look for issues of this type. There will be no issues in the code - it is just that Win2k will execute in a different "workspace" (I think that is the term). This is identical to the problem of a service attempting to display a messagebox - the code is perfect and works perfectly - just in a context where no one can see it, or dismiss it. > > I would prefer to see a decent API for extracting error and traceback > > information from Python. On the other hand, I _do_ see the problem for > > "newbies" trying to use pythonw.exe. > > There could be an API added to the winstdout module such as > msg = winstdout.GetMessageText() > which would return saved text, control its display etc.
I was thinking more of a "Py_GetTraceback()", which would return a complete exception string. Thus, embedders could write code similar to:

    whatever = Py_BuildValue(...);
    ret = PyObject_Call(foo, whatever);
    ...
    if (!ok) {
        char *text = Py_GetTraceback();
        MsgBox(text);
    }

Thus, with only a small amount of work, they have _complete_ control over the output. However, I agree this doesn't really solve pythonw.exe's problems. > I do not view winstdout as a "newbie" feature, but rather a > generally useful C-language addition to Python. Hrm. I don't believe a commercial app, for example, would find this suitable - they would roll their own solution. Hence I see this purely for newbie users. Advanced users have complete control now - a simple try/except block around their main code, and you are pretty good. A builtin module for displaying a messagebox is as robust as an experienced user needs to emulate this, IMO. > I guess I am saying, perhaps incorrectly, that the mechanism provided > will make further redirection of sys.stdout unnecessary 99% of the > time. Yes, I disagree here. IMO it is no good for a commercial, real app. As I said, I see this as a feature so the newbie will not believe pythonw.exe is broken. Advanced users can already do similar things themselves. > Experimentation shows that Python composes tracebacks and > error messages a line or partial line at a time. That is, you can > not display each call to printf(), but must wait until the system is > idle to be sure that multiple calls to printf() are complete. So this > forces you to use the idle processing loop, not rocket science but > at least inconvenient. What "idle processing loop"? > And the only source of stdout/err is tracebacks, > error messages and the "print" statement. What would you do with > these in a Windows program except display an "OK" dialog box? Log the error to a file, and display a "friendly" dialog - possibly offering to automatically submit a support request/bug report.
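The log-it-and-show-something-friendly approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: run_guarded and its notify callback are made-up names, and notify stands in for whatever dialog a real application would display.

```python
import traceback

def run_guarded(main, log_path, notify=print):
    """Run main(); on an unhandled exception, log the full traceback
    and show the user a short, traceback-free message instead."""
    try:
        main()
    except Exception:
        # Full details go to the log file for the developer.
        with open(log_path, "a") as log:
            log.write(traceback.format_exc())
        # notify is a hypothetical hook standing in for a GUI dialog.
        notify("Sorry, something went wrong. Details were saved to %s."
               % log_path)
        return False
    return True
```

The traceback never reaches the end user; only the log file sees it, which is the "simple try/except block around the main code" pattern in its barest form.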
The casual user is going to be _very_ scared by a Python traceback. This is a sin of a similar magnitude to those crappy applications with unhandled VB exceptions. IMO, nothing looks more unprofessional than an app that displays an internal VB error message. Python is no different IMO. For real applications, there is a good chance that the majority of your users have never heard of Python. Thus, I don't believe your solution is suitable for the real, professional, commercial user. However, I agree that your solution does not prevent this user doing the "right thing"... But all this does keep me believing this is a "newbie" helper. > > If someone out there knows of a different example of sys.stdout > redirection in use in the real world, it would be helpful if > they would describe it. Maybe it could be incorporated. Sure. Komodo to a file with a friendly dialog (sometimes ;-). Pythonwin actually attempts a few things first - e.g., not every exception Pythonwin causes at startup should be logged. Python services write unhandled errors to the event log. I don't believe I have worked on 2 projects with the same requirement here!!! Mark. From nas at arctrix.com Mon Jan 8 17:22:10 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 8 Jan 2001 08:22:10 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: ; from tim.one@home.com on Mon, Jan 08, 2001 at 03:59:40PM -0500 References: <3A59F00E.53A0A32A@ActiveState.com> Message-ID: <20010108082210.A16149@glacier.fnational.com> On Mon, Jan 08, 2001 at 03:59:40PM -0500, Tim Peters wrote: > My line-at-a-time test case used (rounding to nearest whole integers) 30 > seconds in Python and 6 in Perl.
The result of testing many changes to > Python's implementation was that the excess 24 seconds broke down like so: > > 17 spent inside internal MS threadsafe getc() lock/unlock > routines > 5 uncertain, but evidence suggests much of it due to MS > malloc/realloc (Perl does its own memory mgmt) > 2 for not copying directly out of the platform FILE* > implementation struct in a highly optimized loop (like > Perl does) Have you tried pymalloc? Neil From billtut at microsoft.com Tue Jan 9 01:38:14 2001 From: billtut at microsoft.com (Bill Tutt) Date: Mon, 8 Jan 2001 16:38:14 -0800 Subject: [Python-Dev] Create a synthetic stdout for Windows? Message-ID: <58C671173DB6174A93E9ED88DCB0883D0A6202@red-msg-07.redmond.corp.microsoft.com> > From: Mark Hammond [mailto:MarkH at ActiveState.com] > There will be no issues in the code - it is just that Win2k will execute in > a different "workspace" (I think that is the term). This is identical to > the problem of a service attempting to display a messagebox - the code is > perfect and works perfectly - just in a context where no one can see it, or > dismiss it. The term Mark is looking for here is Windowstation, and it's an NT thing, not just a Win2k thing. Windowstations have been around for ages. Bill From ping at lfw.org Tue Jan 9 02:51:15 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 8 Jan 2001 17:51:15 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <14938.6594.44596.509259@beluga.mojam.com> Message-ID: On Mon, 8 Jan 2001, Skip Montanaro wrote: > Okay, how about this as a compromise first step? Allow programmers to put > __exports__ lists in their modules but don't do anything with them *except* > modify dir() to respect that if it exists? I'd say: Just have dir() and import * pay attention to __exports__. Don't mess with getattr or __dict__. -- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is only our vanity bleeding.
To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E. Buchrose

From ping at lfw.org Tue Jan 9 03:00:08 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Mon, 8 Jan 2001 18:00:08 -0800 (PST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <3A59F27D.C27B8CD0@ActiveState.com>
Message-ID:

On Mon, 8 Jan 2001, Paul Prescod wrote:
> dir() is one of the "interactive tools" I'd like to work better in the
> presence of __exports__. On the other hand, dir() works pretty poorly
> for object instances today so maybe we need something new anyhow.

I suggest a built-in function "methods()" that works like this:

    def methods(obj):
        if type(obj) is InstanceType:
            return methods(obj.__class__)
        results = []
        if hasattr(obj, '__bases__'):
            for base in obj.__bases__:
                results.extend(methods(base))
        results.extend(
            filter(lambda k, o=obj: type(getattr(o, k)) in
                   [MethodType, BuiltinMethodType], dir(obj)))
        return unique(results)

    def unique(seq):
        dict = {}
        for item in seq:
            dict[item] = 1
        results = dict.keys()
        results.sort()
        return results

    >>> import sys
    >>>
    >>> methods(sys.stdin)
    ['close', 'fileno', 'flush', 'isatty', 'read', 'readinto', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
    >>>
    >>> import SocketServer
    >>>
    >>> methods(SocketServer.ForkingTCPServer)
    ['__init__', 'collect_children', 'fileno', 'finish_request', 'get_request', 'handle_error', 'handle_request', 'process_request', 'serve_forever', 'server_activate', 'server_bind', 'verify_request']
    >>>

-- ?!ng

Happiness comes more from loving than being loved; and often when our affection seems wounded it is only our vanity bleeding. To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E.
Buchrose From gstein at lyra.org Tue Jan 9 03:20:56 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 8 Jan 2001 18:20:56 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.102,2.103 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, Jan 08, 2001 at 06:00:13PM -0800 References: Message-ID: <20010108182056.C4640@lyra.org> On Mon, Jan 08, 2001 at 06:00:13PM -0800, Guido van Rossum wrote: >... > Modified Files: > fileobject.c > Log Message: > Tsk, tsk, tsk. Treat FreeBSD the same as the other BSDs when defining > a fallback for TELL64. Fixes SF Bug #128119. >... > *** fileobject.c 2001/01/08 04:02:07 2.102 > --- fileobject.c 2001/01/09 02:00:11 2.103 > *************** > *** 59,63 **** > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) > /* NOTE: this is only used on older > NetBSD prior to f*o() funcions */ > --- 59,63 ---- > #if defined(MS_WIN64) > #define TELL64 _telli64 > ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) > /* NOTE: this is only used on older > NetBSD prior to f*o() funcions */ All of those #ifdefs could be tossed and it would be more robust (long term) if an autoconf macro were used to specify when TELL64 should be defined. [ I've looked thru fileobject.c and am a bit confused: the conditions for defining TELL64 do not match the conditions for *using* it. that would seem to imply a semantic error somewhere and/or a potential gotcha when they get skewed (like I assume what happened to FreeBSD). simplifying with an autoconf macro may help to rationalize it. 
] Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Tue Jan 9 05:29:02 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:29:02 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <20010108161534.A2392@kronos.cnri.reston.va.us> Message-ID: [Andrew Kuchling] I'll chop everything except while_readline (which is most affected by this stuff): > Linux: w/o USE_MS_GETLINE_HACK > while_readline 0.184 0.180 > > Linux w/ USE_MS_GETLINE_HACK: > while_readline 0.183 0.190 > > Solaris w/o USE_MS_GETLINE_HACK: > while_readline 0.839 0.840 > > Solaris w/ USE_MS_GETLINE_HACK: > while_readline 0.769 0.770 So it's probably a wash. In that case, do we want to maintain two hacks for this? I can't use the FLOCKFILE/etc approach on Windows, while "the Windows" approach probably works everywhere (although its speed relies on the platform factoring out at least the locking/unlocking in fgets). Both methods lack a refinement I would like to see, but can't achieve in "the Windows way": ensure that consistency is on no worse than a per-line basis. Right now, both methods lock/unlock the file only for the extent of the current buffer size, so that two threads *can* get back different interleaved pieces of a single long line. 
Like so:

    import thread

    def read(f):
        x = f.readline()
        print "thread saw " + `len(x)` + " chars"
        m.release()

    f = open("ga", "w") # a file with one long line
    f.write("x" * 100000 + "\n")
    f.close()

    m = thread.allocate_lock()
    for i in range(10):
        print i
        f = open("ga", "r")
        m.acquire()
        thread.start_new_thread(read, (f,))
        x = f.readline()
        print "main saw " + `len(x)` + " chars"
        m.acquire(); m.release()
        f.close()

Here's a typical run on Windows (current CVS Python):

    0
    main saw 95439 chars
    thread saw 4562 chars
    1
    main saw 97941 chars
    thread saw 2060 chars
    2
    thread saw 43801 chars
    main saw 56200 chars
    3
    thread saw 8011 chars
    main saw 91990 chars
    4
    main saw 46546 chars
    thread saw 53455 chars
    5
    thread saw 53125 chars
    main saw 46876 chars
    6
    main saw 98638 chars
    thread saw 1363 chars
    7
    main saw 72121 chars
    thread saw 27880 chars
    8
    thread saw 70031 chars
    main saw 29970 chars
    9
    thread saw 27555 chars
    main saw 72446 chars

So, yes, it's threadsafe now: between them, the threads always see a grand total of 100001 characters. But what friggin' good is that ? If, e.g., Guido wants multiple threads to chew over his giant logfile, there's no guarantee that .readline() ever returns an actual line from the file.

Not that Python 2.0 was any better in this respect ...

From tim.one at home.com Tue Jan 9 05:48:25 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 8 Jan 2001 23:48:25 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: <20010108082210.A16149@glacier.fnational.com>
Message-ID:

[Tim]
> 5 uncertain, but evidence suggests much of it due to MS
> malloc/realloc (Perl does its own memory mgmt)

[NeilS]
> Have you tried pymalloc?

Not recently, and don't expect to find time for it this week. IIRC, Vladimir did get significant speedups-- lo those many years ago! --when he tried it on Windows, though.
Maybe (or maybe not) that was due to exploiting the global lock (i.e., exploiting that pymalloc didn't need to do its own serialization, when called from the Python core). From tim.one at home.com Tue Jan 9 05:52:25 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 8 Jan 2001 23:52:25 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: [Tim] > ... > Here's a typical run on Windows (current CVS Python): > > 0 > main saw 95439 chars > thread saw 4562 chars > 1 > main saw 97941 chars > thread saw 2060 chars > 2 > thread saw 43801 chars > main saw 56200 chars > 3 > thread saw 8011 chars > main saw 91990 chars > 4 > main saw 46546 chars > thread saw 53455 chars > 5 > thread saw 53125 chars > main saw 46876 chars > 6 > main saw 98638 chars > thread saw 1363 chars > 7 > main saw 72121 chars > thread saw 27880 chars > 8 > thread saw 70031 chars > main saw 29970 chars > 9 > thread saw 27555 chars > main saw 72446 chars Oops! I lied. That was the released 2.0. Current CVS is either better or worse, depending on whether you think "working" by accident more often is a good thing or leads to false confidence : 0 main saw 100001 chars thread saw 0 chars 1 main saw 100001 chars thread saw 0 chars 2 main saw 100001 chars thread saw 0 chars 3 main saw 100001 chars thread saw 0 chars 4 main saw 100001 chars thread saw 0 chars 5 thread saw 25802 chars main saw 74199 chars 6 thread saw 802 chars main saw 99199 chars 7 main saw 100001 chars thread saw 0 chars 8 main saw 100001 chars thread saw 0 chars 9 main saw 100001 chars thread saw 0 chars From mal at lemburg.com Tue Jan 9 08:23:42 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 09 Jan 2001 08:23:42 +0100 Subject: [Python-Dev] Extending startup code: PEP needed? 
References: <200101081751.SAA08918@pandora.informatik.hu-berlin.de> <3A5A02AA.675A35D1@lemburg.com> <200101081833.NAA05325@cj20424-a.reston1.va.home.com> <3A5A09A5.D0DC33A1@lemburg.com> <200101081936.OAA05440@cj20424-a.reston1.va.home.com> <3A5A2528.C289BE1D@lemburg.com> <200101082223.RAA05858@cj20424-a.reston1.va.home.com> Message-ID: <3A5ABC7E.E953962B@lemburg.com> Guido van Rossum wrote: > > > I was thinking an attack where knowledge of common temporary > > execution locations is used to trick Python into executing > > untrusted code -- the untrusted code would only have to be > > copied to the known temporary execution directory and then > > gets executed by Python next time the program using the temporary > > location is invoked. > > When does Python execute code from a predictable common temporary > location? When is that likely to be used from a Python script running > as root? > > Note that if you use tempfile.TemporaryFile(), you can create a > temporary file that's not subvertible. It's not Python itself that's running temporary files. Tools like distutils, RPM, etc. tend to run Python code in temporary locations during build stages. That's what I was thinking about. OTOH, root should know where these tools run their code, so I guess it's moot to discuss who's fault this really is, e.g. distutils style distributions should never be unzipped to /tmp for subsequent installation, but nobody will prevent root from doing so. 
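
A small sketch of the tempfile.TemporaryFile() usage mentioned in the quoted text above, assuming a current Python 3 rather than the 2.x of this thread. scratch_roundtrip is a hypothetical helper name used only for illustration.

```python
import tempfile


def scratch_roundtrip(text):
    """Write text to an anonymous temporary file and read it back."""
    # TemporaryFile() creates the file securely and destroys it on close.
    # Under Unix the directory entry is either never created or removed
    # immediately, so there is no predictable path for someone else to
    # plant a file at.
    with tempfile.TemporaryFile(mode="w+") as tmp:
        tmp.write(text)
        tmp.seek(0)  # rewind before reading back what was written
        return tmp.read()
```

This only covers scratch data held by the running process; it does not help with the build-tool case discussed here, where code is unpacked into a well-known directory and executed later.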
-- Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From tim.one at home.com Tue Jan 9 08:35:09 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 9 Jan 2001 02:35:09 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: <200101031237.HAA19244@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> Are you sure Perl still uses stdio at all?

I've got solid answers now, but I'll paraphrase them anonymously to save the bother of untangling multi-person email etiquette snarls:

+ Yes, Perl uses platform stdio. Usually. Yes on Windows anyway.

+ But Perl "cheats" on Windows (well, everywhere it can ...), as I've explained in great detail half a dozen times over the years. No reason to retract any of that.

+ The cheating is not thread-safe.

+ The last stab at threads accessible from Perl was an experiment that got dropped. There are no user-muckable threads in std Perl builds.

+ But there is a notion of threads available at the C level.

+ This latter notion of threads is used to implement Perl's fork() on Windows, so can be exploited to test Windows Perl thread safety without writing a Perl extension module in C.

+ This Perl program (very much like the 2-threaded one I just posted for Python) uses that trick:

-------------------------------------------------------------------
sub counter {
    my $nc = 0;
    while (<FILE>) {
        $nc += length;
    }
    print "num bytes seen = $nc\n";
}

open(FILE, "ga");
binmode FILE;
fork();
&counter();
-------------------------------------------------------------------

Under the covers, that really shares the FILE filehandle on Windows via threads. Running it multiple times yields multiple wild results; the number of bytes seen by parent and child rarely sum to the number of bytes actually in the input file ("ga").
The most common output for me is that one thread sees the entire file, while the other sees "a lot" of it (since the Perl inner loop registerizes its FILE* struct member shadows for as long as possible, that's actually what I expected). So the code is exactly as thread-unsafe as it looked. bosses-demand-answers-but-they-forget-their-questions-ly y'rs - tim From guido at python.org Tue Jan 9 14:41:24 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 09 Jan 2001 08:41:24 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Mon, 08 Jan 2001 23:29:02 EST." References: Message-ID: <200101091341.IAA09132@cj20424-a.reston1.va.home.com> > So it's probably a wash. In that case, do we want to maintain two hacks for > this? I can't use the FLOCKFILE/etc approach on Windows, while "the > Windows" approach probably works everywhere (although its speed relies on > the platform factoring out at least the locking/unlocking in fgets). I'm much more confident about the getc_unlocked() approach than about fgets() -- with the latter we need much more faith in the C library implementers. (E.g. that fgets() never writes beyond the null bytes it promises, and that it locks/unlocks only once.) Also, you're relying on blindingly fast memchr() and memset() implementations. > Both methods lack a refinement I would like to see, but can't achieve in > "the Windows way": ensure that consistency is on no worse than a per-line > basis. [Example omitted] The only portable way to ensure this that I can see, is to have a separate mutex in the Python file object. Since this is hardly a common thing to do, I think it's better to let the application manage that lock if they need it. (Then why are we bothering with flockfile(), you may ask? Because otherwise, accidental multithreaded reading from the same file could cause core dumps.) 
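
A minimal sketch of the application-managed lock described above, assuming a current Python 3 and its threading module rather than the 2.x thread module used elsewhere in this thread. LockedLineReader and read_all_lines are hypothetical names for illustration, not part of Python or of any patch discussed here.

```python
import threading


class LockedLineReader:
    """Wrap a file object so each readline() call is serialized."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._lock = threading.Lock()

    def readline(self):
        # Hold the application-level lock for the whole call, so no other
        # thread can pick up part of the same line.
        with self._lock:
            return self._file.readline()


def read_all_lines(path, num_threads=4):
    """Read every line of path using several threads sharing one file."""
    seen = []
    seen_lock = threading.Lock()

    def worker(reader):
        while True:
            line = reader.readline()
            if not line:  # empty string means end of file
                break
            with seen_lock:
                seen.append(line)

    with open(path, "r") as f:
        reader = LockedLineReader(f)
        threads = [threading.Thread(target=worker, args=(reader,))
                   for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return seen
```

Because the lock is held for the whole readline() call, each thread gets back complete lines, at the cost of serializing the readers. The order in which lines land in the result is not defined.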
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Tue Jan 9 16:48:13 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 9 Jan 2001 10:48:13 -0500 Subject: [Python-Dev] Python 2.1 release schedule (PEP 226) In-Reply-To: <200101051529.KAA19100@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 05, 2001 at 10:29:05AM -0500 References: <200101051529.KAA19100@cj20424-a.reston1.va.home.com> Message-ID: <20010109104813.D6203@kronos.cnri.reston.va.us> On Fri, Jan 05, 2001 at 10:29:05AM -0500, Guido van Rossum wrote: > S 222 pep-0222.txt Web Library Enhancements Kuchling > > This is really up to Andrew. It seems he plans to create > new modules, so he won't be introducing incompatibilities in > existing APIs. I don't think PEP 222 will be worked on for 2.1; there have only been a few reactions, and none at all on the python-web-modules mailing list, so I don't think anyone really cares very much at this point. Maybe for 2.2, or maybe I'll just write new classes for Quixote. That leaves PEP 229 as the only PEP I need to work on for 2.1. --amk From tim.one at home.com Tue Jan 9 22:12:42 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 9 Jan 2001 16:12:42 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101091341.IAA09132@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I'm much more confident about the getc_unlocked() approach than about > fgets() -- with the latter we need much more faith in the C library > implementers. (E.g. that fgets() never writes beyond the null bytes > it promises, and that it locks/unlocks only once.) Also, you're > relying on blindingly fast memchr() and memset() implementations. Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a bit quicker on Solaris, despite that it's paying an extra layer of function call per line, to keep it out of get_line proper). That tells me the assumptions are indeed mild. 
The business about not writing beyond the null byte is a concern only I would have raised: the possibility is an aggressively paranoid reading of the std (I do *lots* of things with libc I'm paranoid about <0.9 wink>). If even *Microsoft* didn't blow these things, it's hard to imagine any other vendor exploding ... Still, I'd rather get rid of ms_getline_hack if I could, because the code is so much more complicated. >> Both methods lack a refinement I would like to see, but can't >> achieve in "the Windows way": ensure that consistency is on no >> worse than a per-line basis. [Example omitted] > The only portable way to ensure this that I can see, is to have a > separate mutex in the Python file object. Since this is hardly a > common thing to do, I think it's better to let the application manage > that lock if they need it. Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the file locked until the line was complete, and I wouldn't be opposed to making life saner on platforms that allow it. But there's another problem here: part of the reason we release Python threads around the fgets is in case some other thread is trying to write the data we're trying to read, yes? But since FLOCKFILE is in effect, other threads *trying* to write to the stream we're reading will get blocked anyway. Seems to give us potential for deadlocks. > (Then why are we bothering with flockfile(), you may ask? I wouldn't ask that, no . > Because otherwise, accidental multithreaded reading from the same > file could cause core dumps.) Ugh ... turns out that on my box I can provoke core dumps anyway, with this program. 
Blows up under released 2.0 and CVS Pythons (so it's not due to anything new): import thread def read(f): import time time.sleep(.01) n = 0 while n < 1000000: x = f.readline() n += len(x) print "r", print "read " + `n` m.release() m = thread.allocate_lock() f = open("ga", "w+") print "opened" m.acquire() thread.start_new_thread(read, (f,)) n = 0 x = "x" * 113 + "\n" while n < 1000000: f.write(x) print "w", n += len(x) m.acquire() print "done" Typical run: C:\Python20>\code\python\dist\src\pcbuild\python temp.py opened w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r w r r w r w r w r w r w r and then it dies in msvcrt.dll with a bad pointer. Also dies under the debugger (yay!) ... always dies like so: + We (Python) call the MS fwrite, from fileobject.c file_write. + MS fwrite succeeds with its _lock_str(stream) call. + MS fwrite then calls MS _fwrite_lk. + MS _fwrite_lk calls memcpy, which blows up for a non-obvious reason. Looks like the stream's _cnt member has gone mildly negative, which _fwrite_lk casts to unsigned and so treats like a giant positive count, and so memcpy eventually runs off the end of the process address space. Only thing I can conclude from this is that MS's internal stream-locking implementation is buggy. At least on W98SE. Other flavors of Windows? Other platforms? Note that I don't claim the program above is *sensible*, just that it shouldn't blow up. 
Alas, short of indeed adding a separate mutex in Python file objects-- or writing our own stdio --I don't believe I can fix this. the-best-thing-to-do-with-threads-is-don't-ly y'rs - tim From fdrake at acm.org Tue Jan 9 23:58:49 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 9 Jan 2001 17:58:49 -0500 (EST) Subject: [Python-Dev] Updated development documentation Message-ID: <14939.38825.218757.535010@cj42289-a.reston1.va.home.com> I've just updated the development version of the documentation, but am not sure the automated notice got sent. This version contains a wide variety of smaller updates, plus added documentation on the fpectl and xreadlines modules. http://python.sourceforge.net/devel-docs/ -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From MarkH at ActiveState.com Wed Jan 10 01:00:03 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 9 Jan 2001 16:00:03 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: > Only thing I can conclude from this is that MS's internal stream-locking > implementation is buggy. At least on W98SE. Other flavors of Windows? > Other platforms? Same behaviour on Win2k for me. Mark. From tim.one at home.com Wed Jan 10 01:55:11 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 9 Jan 2001 19:55:11 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Message-ID: Final report (I've spent way more time on this than I can afford already, so it's "final" by defn <0.3 wink>). 
We started here (on my Win98SE box, using Guido's test program):

    total 117615824 chars and 3237568 lines
    count_chars_lines     14.780 14.772
    readlines_sizehint     9.390  9.375
    using_fileinput       66.130 66.157
    while_readline        30.380 30.337

Here's where we are today:

    total 117615824 chars and 3237568 lines
    count_chars_lines     14.670 14.667
    readlines_sizehint     9.500  9.506
    using_fileinput       28.670 28.708
    while_readline        13.680 13.676
    for_xreadlines         7.630  7.635

Same box, same input file, same test program except for this addition:

    def for_xreadlines(fn):
        f = open(fn, MODE)
        for line in xreadlines.xreadlines(f):
            pass
        f.close()

This last is within 25% of Perl "while (<>)" speed, but-- unlike Perl --is thread-safe. Good show!

The other speedups are nothing to snort at either. The strangest thing left to my eye is why xreadlines enjoys a significant advantage over the double-loop buffering method (readlines_sizehint) on my box; reducing the very large (1Mb) buffer in Guido's test program made no material difference to that.

nothing's-ever-finished-but-everything-ends-ly y'rs - tim

From tim.one at home.com Wed Jan 10 06:46:24 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 10 Jan 2001 00:46:24 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To:
Message-ID:

[Tim]
> Only thing I can conclude from this is that MS's internal stream-
> locking implementation is buggy. At least on W98SE. Other flavors
> of Windows? Other platforms?

[Mark Hammond]
> Same behaviour on Win2k for me.

Thanks, Mark! I opened a bug on SF to record more clues:

http://sourceforge.net/bugs/?func=detailbug&bug_id=128210&group_id=5470

I didn't assign it to anyone because-- best I can tell --there's nothing realistic we can do about it. Probably won't happen in practice anyway .
there's-a-reason-thread-problems-pop-up-on-windows-first-but- ms-isn't-it-ly y'rs - tim From billtut at microsoft.com Wed Jan 10 10:10:51 2001 From: billtut at microsoft.com (Bill Tutt) Date: Wed, 10 Jan 2001 01:10:51 -0800 Subject: [Python-Dev] xreadlines : readlines :: xrange : range Message-ID: <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> With a nice simple C test case from Tim, I've submitted this one to internal support. I'll let everybody know what happens when I know more. Bill -----Original Message----- From: Tim Peters [mailto:tim.one at home.com] Sent: Tuesday, January 09, 2001 9:46 PM To: python-dev at python.org Subject: RE: [Python-Dev] xreadlines : readlines :: xrange : range [Tim] > Only thing I can conclude from this is that MS's internal stream- > locking implementation is buggy. At least on W98SE. Other flavors > of Windows? Other platforms? [Mark Hammond] > Same behaviour on Win2k for me. Thanks, Mark! I opened a bug on SF to record more clues: http://sourceforge.net/bugs/?func=detailbug&bug_id=128210&group_id=5470 I didn't assign it to anyone because-- best I can tell --there's nothing realistic we can do about it. Probably won't happen in practice anyway . 
there's-a-reason-thread-problems-pop-up-on-windows-first-but- ms-isn't-it-ly y'rs - tim _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://www.python.org/mailman/listinfo/python-dev From m.favas at per.dem.csiro.au Wed Jan 10 12:57:56 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Wed, 10 Jan 2001 19:57:56 +0800 Subject: [Python-Dev] xreadline speed vs readlines_sizehint Message-ID: <3A5C4E44.23B593E9@per.dem.csiro.au> Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same behaviour as Tim's WinBox wrt the new xreadline and the double-loop readlines (so it's not just something funny with MS (not that there's not anything funny with MS...)): total 131426612 chars and 514216 lines count_chars_lines 5.450 5.066 readlines_sizehint 4.112 4.083 using_fileinput 10.928 10.916 while_readline 11.766 11.733 for_xreadlines 3.569 3.533 -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tismer at tismer.com Wed Jan 10 12:06:42 2001 From: tismer at tismer.com (Christian Tismer) Date: Wed, 10 Jan 2001 13:06:42 +0200 Subject: [Python-Dev] Add __exports__ to modules References: Message-ID: <3A5C4242.E445C3A1@tismer.com> Ka-Ping Yee wrote: > > On Mon, 8 Jan 2001, Skip Montanaro wrote: > > Okay, how about this as a compromise first step? Allow programmers to put > > __exports__ lists in their modules but don't do anything with them *except* > > modify dir() to respect that if it exists? > > I'd say: Just have dir() and import * pay attention to __exports__. > Don't mess with getattr or __dict__. quadruple-nodd - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com

From mal at lemburg.com Wed Jan 10 14:21:28 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 10 Jan 2001 14:21:28 +0100
Subject: [Python-Dev] Add __exports__ to modules
References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com>
Message-ID: <3A5C61D8.2E5D098C@lemburg.com>

Guido van Rossum wrote:
>
> Please have a look at this SF patch:
>
> http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470
>
> This implements control over which names defined in a module are
> externally visible: if there's a variable __exports__ in the module,
> it is a list of identifiers, and any access from outside the module to
> names not in the list is disallowed. This affects access using the
> getattr and setattr protocols (which raise AttributeError for
> disallowed names), as well as "from M import v" (which raises
> ImportError).

Can't we use the existing attribute __all__ (this is currently only used for packages) for this kind of thing?

As others have already remarked: I would rather like to see this attribute being used as basis for 'from M import *' rather than enforce the access restrictions like the patch suggests.

Access control mechanisms should be treated in different ways such as wrapping objects using access-control proxies (see mx.Proxy for an example of such an implementation) and on-demand only. I wouldn't want to pay the performance hit for each and every lookup in all my Python applications just because someone out there feels that "from M import *" has a meaning in life apart from being useful in interactive sessions to ease typing ;-)

> I like it. This has been asked for many times. Does anybody see a
> reason why this should *not* be added?
>
> Tim remarked that introducing this will prompt demands for a similar
> feature on classes and instances, where it will be hard to implement
> without causing a bit of a slowdown.
It causes a slight slowdown (an > extra dictionary lookup for each use of "M.v") even when it is not > used, but for accessing module variables that's acceptable. I'm not > so sure about instance variable references. Again, I'd rather see these implemented using different techniques which are under programmer control and made explicit and visible in the program flow. Proxies are ideal for these things, since they allow great flexibility while still providing reasonable security at Python level. I have been using the proxy approach for years now and so far with great success. What's even better is that weak references and garbage finalization aids come along with it for free. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Jan 10 16:12:56 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 10:12:56 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 09 Jan 2001 19:55:11 EST." References: Message-ID: <200101101512.KAA26193@cj20424-a.reston1.va.home.com> > The strangest thing left to my eye is why xreadlines enjoys a significant > advantage over the double-loop buffering method (readlines_sizehint) on my > box; reducing the very large (1Mb) buffer in Guido's test program made no > material difference to that. I was baffled at this too (same difference on my box), until I discovered that the buffer size is specified *twice*: once as a default in the arg list of readlines_sizehint(), then *again* in the call to timer() near the bottom of the file. Take the latter one out and the times are comparable, in fact readlines_sizehint() is a few percent quicker. 
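
For reference, the double-loop buffering idiom that readlines_sizehint presumably times looks roughly like the sketch below. It is written for a current Python 3 and is not Guido's test program; the function name count_lines_sizehint and the 64 KB default are assumptions made for illustration.

```python
def count_lines_sizehint(path, sizehint=64 * 1024):
    """Count lines by reading them in batches of roughly sizehint characters."""
    count = 0
    with open(path, "r") as f:
        while True:
            # Outer loop: fetch a batch of whole lines; readlines() stops
            # adding lines once the batch size exceeds the hint.
            lines = f.readlines(sizehint)
            if not lines:  # an empty list means end of file
                break
            # Inner loop: process the lines of this batch one at a time.
            for line in lines:
                count += 1
    return count
```

The buffer size here has a single source, the sizehint default. Passing a second, different value at the call site is exactly the kind of duplication that skewed the timings described above.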
--Guido van Rossum (home page: http://www.python.org/~guido/) From jim at interet.com Wed Jan 10 16:19:01 2001 From: jim at interet.com (James C. Ahlstrom) Date: Wed, 10 Jan 2001 10:19:01 -0500 Subject: [Python-Dev] Create a synthetic stdout for Windows? References: Message-ID: <3A5C7D65.780065C6@interet.com> Mark Hammond wrote: > Sometimes _no_ screen at all is wanted - ie, no main GUI window, and no > console window. pythonw is used in this case. COM uses pythonw.exe in just > this way, and when executed by DCOM, it will be executed in a context where > the user can not see any such dialog. > > However, I would be happy to ensure the correct command-line is used to > prevent this behaviour in this case. > > Indeed, in _every_ case I use pythonw.exe I would disable this - but I > accept that other users have simpler requirements. It would be easier to have a pythonw2.exe where this feature is built in, rather than a command line option. But see below. > > I do not view winstdout as a "newbie" feature, but rather a > > generally useful C-language addition to Python. > > Hrm. I dont believe a commercial app, for example, would find this > suitable - they would roll their own solution. ... > > I guess I am saying, perhaps incorrectly, that the mechanism provided > > will make further redirection of sys.stdout unnecessary 99% of the > > time. > > Yes, I disagree here. IMO it is no good for a commercial, real app. As I ... > > If someone out there knows of a different example of sys.stdout > > redirection in use in the real world, it would be helpful if > > they would describe it. Maybe it could be incorporated. > > Sure. Komodo to a file with a friendly dialog (sometimes ;-). ... > I don't believe I have worked on 2 projects with the same requirement > here!!! Well, that is the problem. Is this feature "generally useful"? I am writing Windows programs in which Python is the "main" and provides the GUI, so I find this useful. And I do show my users tracebacks. 
But perhaps this is unique to me. I don't see users of wxPython nor tkinter replying "great idea" so maybe they don't use pythonw. Absent more support, I don't think this idea has enough merit to justify a patch. JimA From guido at python.org Wed Jan 10 17:39:34 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:39:34 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Wed, 10 Jan 2001 01:10:51 PST." <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> References: <58C671173DB6174A93E9ED88DCB0883DB863A8@red-msg-07.redmond.corp.microsoft.com> Message-ID: <200101101639.LAA26776@cj20424-a.reston1.va.home.com> > With a nice simple C test case from Tim, I've submitted this one to internal > support. > I'll let everybody know what happens when I know more. I bet you it's rejected on the basis of "the docs tell you not to mix reading and writing on the same stream without intervening seek or flush." If I were on the support line I would do that. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 10 17:38:16 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:38:16 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Tue, 09 Jan 2001 16:12:42 EST." References: Message-ID: <200101101638.LAA26759@cj20424-a.reston1.va.home.com> > [Guido] > > I'm much more confident about the getc_unlocked() approach than about > > fgets() -- with the latter we need much more faith in the C library > > implementers. (E.g. that fgets() never writes beyond the null bytes > > it promises, and that it locks/unlocks only once.) Also, you're > > relying on blindingly fast memchr() and memset() implementations. 
[Tim] > Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a > bit quicker on Solaris, despite that it's paying an extra layer of function > call per line, to keep it out of get_line proper). That tells me the > assumptions are indeed mild. The business about not writing beyond the null > byte is a concern only I would have raised: the possibility is an > aggressively paranoid reading of the std (I do *lots* of things with libc > I'm paranoid about <0.9 wink>). If even *Microsoft* didn't blow these > things, it's hard to imagine any other vendor exploding ... > > Still, I'd rather get rid of ms_getline_hack if I could, because the code is > so much more complicated. Which is another argument to prefer the getc_unlocked() code when it works -- it's obviously correct. :-) > >> Both methods lack a refinement I would like to see, but can't > >> achieve in "the Windows way": ensure that consistency is on no > >> worse than a per-line basis. [Example omitted] > > > The only portable way to ensure this that I can see, is to have a > > separate mutex in the Python file object. Since this is hardly a > > common thing to do, I think it's better to let the application manage > > that lock if they need it. > > Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the > file locked until the line was complete, and I wouldn't be opposed to making > life saner on platforms that allow it. Hm... That would be possible, except for one unfortunate detail: _PyString_Resize() may call PyErr_BadInternalCall() which touches thread state. > But there's another problem here: > part of the reason we release Python threads around the fgets is in case > some other thread is trying to write the data we're trying to read, yes? NO, NO NO! Mixing reads and writes on the same stream wasn't what we are locking against at all. (As you've found out, it doesn't even work.) We're only trying to protect against concurrent *reads*. 
> But since FLOCKFILE is in effect, other threads *trying* to write to the > stream we're reading will get blocked anyway. Seems to give us potential > for deadlocks. Only if they are holding other locks at the same time. I haven't done a thorough survey of fileobject.c, but I've skimmed it, and I believe it's religious about releasing the Global Interpreter Lock around I/O calls. But, of course, 3rd party C code might not be. > > (Then why are we bothering with flockfile(), you may ask? > > I wouldn't ask that, no . > > > Because otherwise, accidental multithreaded reading from the same > > file could cause core dumps.) > > Ugh ... turns out that on my box I can provoke core dumps anyway, with this > program. Blows up under released 2.0 and CVS Pythons (so it's not due to > anything new): Yeah. But this is insane use -- see my comments on SF. It's only worth fixing because it could be used to intentionally crash Python -- but there are easier ways... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Wed Jan 10 17:41:47 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 10 Jan 2001 10:41:47 -0600 (CST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? Message-ID: <14940.37067.893679.750918@beluga.mojam.com> I just noticed that the "Environment" options for Python on the SF site are listed as Console (Text Based), Win32 (MS Windows), X11 Applications Shouldn't something Macintosh-related be in that list as well? Skip From guido at python.org Wed Jan 10 17:53:16 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 11:53:16 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Wed, 10 Jan 2001 14:21:28 +0100."
<3A5C61D8.2E5D098C@lemburg.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> Message-ID: <200101101653.LAA28986@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > > > Please have a look at this SF patch: > > > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > > > This implements control over which names defined in a module are > > externally visible: if there's a variable __exports__ in the module, > > it is a list of identifiers, and any access from outside the module to > > names not in the list is disallowed. This affects access using the > > getattr and setattr protocols (which raise AttributeError for > > disallowed names), as well as "from M import v" (which raises > > ImportError). [Marc-Andre] > Can't we use the existing attribute __all__ (this is currently > only used for packages) for this kind of thing. As other have already > remarked: I would rather like to see this attribute being used > as basis for 'from M import *' rather than enforce the access > restrictions like the patch suggests. Yes -- I came up with the same thought. So here's a plan: somebody please submit a patch that does only one thing: from...import * looks for __all__ and if it exists, imports exactly those names. No changes to dir(), or anything. > Access control mechanisms should be treated in different ways > such as wrapping objects using access-control proxies (see mx.Proxy > for an example of such an implementation) and on-demand only. > I wouldn't wan't to pay the performance hit for each and every > lookup in all my Python applications just because someone out > there feels that "from M import *" has a meaning in life > apart from being useful in interactive sessions to ease typing ;-) In the process of looking into Zope internals I've noticed that proxies are indeed very useful! 
I note that the IMPORT opcodes in ceval.c require that the imported module (as found in sys.modules[name] or returned by __import__()) is a real module object. I think this is unnecessary -- at least IMPORT_FROM should work even if the module is a proxy or some other thing (I've been known to smuggle class instances into sys.modules :-) and IMPORT_STAR should work with a non-module at least if it has an __all__ attribute. > > I like it. This has been asked for many times. Does anybody see a > > reason why this should *not* be added? > > > > Tim remarked that introducing this will prompt demands for a similar > > feature on classes and instances, where it will be hard to implement > > without causing a bit of a slowdown. It causes a slight slowdown (an > > extra dictionary lookup for each use of "M.v") even when it is not > > used, but for accessing module variables that's acceptable. I'm not > > so sure about instance variable references. > > Again, I'd rather see these implemented using different > techniques which are under programmer control and made > explicit and visible in the program flow. Proxies are ideal > for these things, since they allow great flexibility while > still providing reasonable security at Python level. > > I have been using the proxy approach for years now and > so far with great success. What's even better is that > weak references and garbage finalization aids come along with > it for free. Agreed. Which reminds me -- would you mind reviewing Fred's new version of PEP 205 (weak refs)? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Jan 10 18:12:20 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 10 Jan 2001 18:12:20 +0100 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <3A5C97F4.945D0C1@lemburg.com> Guido van Rossum wrote: > > > Guido van Rossum wrote: > > > > > > Please have a look at this SF patch: > > > > > > http://sourceforge.net/patch/?func=detailpatch&patch_id=102808&group_id=5470 > > > > > > This implements control over which names defined in a module are > > > externally visible: if there's a variable __exports__ in the module, > > > it is a list of identifiers, and any access from outside the module to > > > names not in the list is disallowed. This affects access using the > > > getattr and setattr protocols (which raise AttributeError for > > > disallowed names), as well as "from M import v" (which raises > > > ImportError). > > [Marc-Andre] > > Can't we use the existing attribute __all__ (this is currently > > only used for packages) for this kind of thing. As other have already > > remarked: I would rather like to see this attribute being used > > as basis for 'from M import *' rather than enforce the access > > restrictions like the patch suggests. > > Yes -- I came up with the same thought. Sorry, I didn't read the whole thread on the topic. Rereading the above paragraph I guess I should have had some more coffee at the time of writing ;-) > So here's a plan: somebody please submit a patch that does only one > thing: from...import * looks for __all__ and if it exists, imports > exactly those names. No changes to dir(), or anything. +1 -- this won't be me though (at least not this week). > > Access control mechanisms should be treated in different ways > > such as wrapping objects using access-control proxies (see mx.Proxy > > for an example of such an implementation) and on-demand only. 
> > I wouldn't wan't to pay the performance hit for each and every > > lookup in all my Python applications just because someone out > > there feels that "from M import *" has a meaning in life > > apart from being useful in interactive sessions to ease typing ;-) > > In the process of looking into Zope internals I've noticed that > proxies are indeed very useful! > > I note that the IMPORT opcodes in ceval.c require that the imported > module (as found in sys.modules[name] or returned by __import__()) is > a real module object. I think this is unnecessary -- at least > IMPORT_FROM should work even if the module is a proxy or some other > thing (I've been known to smuggle class instances into sys.modules :-) > and IMPORT_STAR should work with a non-module at least if it has an > __all__ attribute. Cool. This could make Python instances usable as "modules" -- with full getattr() hook support ! For IMPORT_STAR I'd suggest first looking for __all__ and then reverting to __dict__.items() in case this fails. BTW, is __dict__ needed by the import mechanism or would the getattr/setattr slots suffice ? And if yes, must it be a real Python dictionary ? > > > I like it. This has been asked for many times. Does anybody see a > > > reason why this should *not* be added? > > > > > > Tim remarked that introducing this will prompt demands for a similar > > > feature on classes and instances, where it will be hard to implement > > > without causing a bit of a slowdown. It causes a slight slowdown (an > > > extra dictionary lookup for each use of "M.v") even when it is not > > > used, but for accessing module variables that's acceptable. I'm not > > > so sure about instance variable references. > > > > Again, I'd rather see these implemented using different > > techniques which are under programmer control and made > > explicit and visible in the program flow. 
Proxies are ideal > > for these things, since they allow great flexibility while > > still providing reasonable security at Python level. > > > > I have been using the proxy approach for years now and > > so far with great success. What's even better is that > > weak references and garbage finalization aids come along with > > it for free. > > Agreed. Which reminds me -- would you mind reviewing Fred's new > version of PEP 205 (weak refs)? I'll have a look at it next week. Is that OK ? > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Wed Jan 10 18:37:58 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 10 Jan 2001 12:37:58 -0500 (EST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: <14940.37067.893679.750918@beluga.mojam.com> References: <14940.37067.893679.750918@beluga.mojam.com> Message-ID: <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > I just noticed that the "Environment" options for Python on the SF site are > listed as > > Console (Text Based), Win32 (MS Windows), X11 Applications > > Shouldn't something Macintosh-related be in that list as well? Are the maintainers of the MacOS port using the SF bug tracker or something else? If they're using it, then by all means we should add it. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From thomas at xs4all.net Wed Jan 10 19:06:06 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 10 Jan 2001 19:06:06 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,NONE,1.1 Setup.dist,1.3,1.4 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Tue, Jan 09, 2001 at 01:46:53PM -0800 References: Message-ID: <20010110190606.T2467@xs4all.nl> On Tue, Jan 09, 2001 at 01:46:53PM -0800, Guido van Rossum wrote: > static void > xreadlines_dealloc(PyXReadlinesObject *op) { > Py_XDECREF(op->file); > Py_XDECREF(op->lines); > PyObject_DEL(op); > } I'm confuzzled. Is this breach of the style guidelines intentional, accidental, or just not cared enough about ? The style isn't even consistent in that single module! > void > initxreadlines(void) > { > PyObject *m; > > m = Py_InitModule("xreadlines", xreadlines_methods); > } -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at mojam.com Wed Jan 10 19:11:52 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 10 Jan 2001 12:11:52 -0600 (CST) Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> References: <14940.37067.893679.750918@beluga.mojam.com> <14940.40438.1654.487682@cj42289-a.reston1.va.home.com> Message-ID: <14940.42472.174920.866172@beluga.mojam.com> Fred> Are the maintainers of the MacOS port using the SF bug tracker or Fred> something else? If they're using it, then by all means we should Fred> add it. Even if they aren't, I think it would be valuable to list. There aren't all that many tools (open source or otherwise) that run on Unix, Windows and Mac and can be used as either a console app or a GUI. I assume the reason Fred asks is that the Environment: list is generated on-the-fly and somehow ties into use of the SF bug tracker. 
Skip From thomas at xs4all.net Wed Jan 10 19:45:44 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 10 Jan 2001 19:45:44 +0100 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101101653.LAA28986@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 10, 2001 at 11:53:16AM -0500 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <20010110194544.V2467@xs4all.nl> On Wed, Jan 10, 2001 at 11:53:16AM -0500, Guido van Rossum wrote: > I note that the IMPORT opcodes in ceval.c require that the imported > module (as found in sys.modules[name] or returned by __import__()) is > a real module object. I think this is unnecessary -- at least > IMPORT_FROM should work even if the module is a proxy or some other > thing (I've been known to smuggle class instances into sys.modules :-) > and IMPORT_STAR should work with a non-module at least if it has an > __all__ attribute. Hmm.... Have you been sneaking looks at python-list again, Guido ? :-) I'm certain the expanding of IMPORT would make a lot of people very happy. Alex Martelli only just discovered the fact you can populate sys.modules yourself, with non-module objects, and was wondering about its legality and compatibility. I, for one, am very +1 on the idea, also on MAL's idea to do our best in the IMPORT_STAR case (try dict.items(), etc.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
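[Archive note, not part of the thread: a minimal sketch of the idea discussed above, namely importing from a non-module object placed in sys.modules. It assumes a current Python 3 interpreter, where this works; the names FakeModule and fake_mod are hypothetical and chosen only for illustration.]

```python
import sys


class FakeModule:
    """Plain class whose instance stands in for a module (hypothetical)."""

    __all__ = ["greeting"]            # names offered to "from ... import *"
    greeting = "hello"
    internal = "not listed in __all__"


# Smuggle an instance into sys.modules under a made-up name.
sys.modules["fake_mod"] = FakeModule()

# Explicit from-import: resolved by attribute lookup on the instance.
from fake_mod import greeting, internal

# Star-import into a scratch namespace: only names in __all__ are copied.
star_ns = {}
exec("from fake_mod import *", star_ns)
star_names = sorted(name for name in star_ns if not name.startswith("__"))

print(greeting, internal, star_names)
```

[The star-import is run through exec() only so the sketch also works when pasted inside a function, where "import *" is not allowed.]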
From tim.one at home.com Wed Jan 10 19:49:40 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 13:49:40 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101101512.KAA26193@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > The strangest thing left to my eye is why xreadlines enjoys a > significant advantage over the double-loop buffering method > (readlines_sizehint) on my box; reducing the very large > (1Mb) buffer in Guido's test program made no material difference > to that. [Guido] > I was baffled at this too (same difference on my box), until I > discovered that the buffer size is specified *twice*: once as a > default in the arg list of readlines_sizehint(), then *again* in > the call to timer() near the bottom of the file. Bingo! > Take the latter one out and the times are comparable, in fact > readlines_sizehint() is a few percent quicker. They're indistinguishable then on my box (on one run xreadlines is .1 seconds (out of around 7.6 total) quicker, on another readlines_sizehint), *provided* that I specify the same buffer size (8192) that xreadlines uses internally. However, if I even double that, readlines_sizehint is uniformly about 10% slower. It's also a tiny bit slower if I cut the sizehint buffer size to 4096. I'm afraid Mysteries will remain no matter how many person-decades we spend staring at this <0.5 wink> ... From guido at python.org Wed Jan 10 19:50:10 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 13:50:10 -0500 Subject: [Python-Dev] Shouldn't the Mac be listed as an environment? In-Reply-To: Your message of "Wed, 10 Jan 2001 10:41:47 CST." 
<14940.37067.893679.750918@beluga.mojam.com> References: <14940.37067.893679.750918@beluga.mojam.com> Message-ID: <200101101850.NAA29744@cj20424-a.reston1.va.home.com> > I just noticed that the "Environment" options for Python on the SF site are > listed as > > Console (Text Based), Win32 (MS Windows), X11 Applications > > Shouldn't something Macintosh-related be in that list as well? Yeah, except for two problems: :-) (1) This is a selection from a drop-down menu that doesn't have a Mac option; (2) There are only three slots allowed. So this is the best we can do. --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Wed Jan 10 19:53:32 2001 From: gstein at lyra.org (Greg Stein) Date: Wed, 10 Jan 2001 10:53:32 -0800 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <20010110194544.V2467@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 10, 2001 at 07:45:44PM +0100 References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> <20010110194544.V2467@xs4all.nl> Message-ID: <20010110105332.T4640@lyra.org> On Wed, Jan 10, 2001 at 07:45:44PM +0100, Thomas Wouters wrote: > On Wed, Jan 10, 2001 at 11:53:16AM -0500, Guido van Rossum wrote: > > > I note that the IMPORT opcodes in ceval.c require that the imported > > module (as found in sys.modules[name] or returned by __import__()) is > > a real module object. I think this is unnecessary -- at least > > IMPORT_FROM should work even if the module is a proxy or some other > > thing (I've been known to smuggle class instances into sys.modules :-) > > and IMPORT_STAR should work with a non-module at least if it has an > > __all__ attribute. > > Hmm.... Have you been sneaking looks at python-list again, Guido ? :-) I'm > certain the expanding of IMPORT would make a lot of people very happy. 
Alex > Martelli only just discovered the fact you can populate sys.modules > yourself, with non-module objects, and was wondering about its legality and > compatibility. > > I, for one, am very +1 on the idea, also on MAL's idea to do our best in the > IMPORT_STAR case (try dict.items(), etc.) +1 ... I'm always up for removing type restrictions. Did that with the bytecodes in function objects a while back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From MarkH at ActiveState.com Wed Jan 10 19:54:34 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Wed, 10 Jan 2001 10:54:34 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,NONE,1.1 Setup.dist,1.3,1.4 In-Reply-To: <20010110190606.T2467@xs4all.nl> Message-ID: > I'm confuzzled. Is this breach of the style guidelines intentional, > accidental, or just not cared enough about ? I vote the latter! Who-really-cares ly, Mark. From guido at python.org Wed Jan 10 20:00:24 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 14:00:24 -0500 Subject: [Python-Dev] autoconfigure patch submitted on SourceForge In-Reply-To: Your message of "Mon, 08 Jan 2001 11:31:09 EST." <20010108113109.C7563@kronos.cnri.reston.va.us> References: <200101080416.f084GrM10912@snark.thyrsus.com> <20010108074411.N2467@xs4all.nl> <20010108014945.A19516@thyrsus.com> <200101081427.JAA03146@cj20424-a.reston1.va.home.com> <20010108113109.C7563@kronos.cnri.reston.va.us> Message-ID: <200101101900.OAA30486@cj20424-a.reston1.va.home.com> [me] > >I expect Andrew's code to go in before 2.1 is released. So I don't > >see a reason why we should hurry and check in a stop-gap measure. [Andrew] > But it might not; the final version might be unacceptable or run into > some intractable problem. Assuming the patch is correct (I haven't > looked at it), why not check it in? The work has already been done to > write it, after all. OK, done. 
It was more work than I had hoped for, because Eric apparently (despite having developer privileges!) doesn't use the CVS tree -- he sent in a diff relative to the 2.0 release. I munged it into place, adding the feature that readline, _curses and bsddb are built as shared libraries by default. You'd have to edit Setup.config.in to change this. Hope this doesn't break anybody's setup. (Skip???) Question for Eric: do you still want developer privileges? They come with responsibilities too. Please check out the @#$%& CVS tree! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 10 20:03:07 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Jan 2001 14:03:07 -0500 Subject: [Python-Dev] Re: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: Your message of "Mon, 01 Jan 2001 19:49:35 CST." <20010101194935.19672@falcon.inetnebr.com> References: <20010101194935.19672@falcon.inetnebr.com> Message-ID: <200101101903.OAA30522@cj20424-a.reston1.va.home.com> Hi Jeff, I'm glad to tell you that I've accepted your xreadlines patches. It's all checked into the CVS tree now, except for your patch to fileinput.py, where I had already checked in a similar change using readlines(sizehint) directly. Thanks again for your contribution! --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Wed Jan 10 21:08:31 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 10 Jan 2001 12:08:31 -0800 Subject: [Python-Dev] Add __exports__ to modules References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> Message-ID: <3A5CC13F.DFB26A0B@ActiveState.com> Guido van Rossum wrote: > > ... > > Yes -- I came up with the same thought.
> > So here's a plan: somebody please submit a patch that does only one > thing: from...import * looks for __all__ and if it exists, imports > exactly those names. No changes to dir(), or anything. Why? From my point of view, the changes to dir() are much more important. I seldom tell newbies about import * but I always tell them how they can browse objects (especially modules) with dir. If dir() is changed then IDEs and so forth would use that and inherit the right behavior. If the module exporting behavior gets more sophisticated in a future version of Python they will continue to inherit the behavior. Also, dir() could look for an __all__ on all objects including "module proxies", classes and "plain old instances". In other words we can extend the convention to other objects "for free". Paul From tim.one at home.com Wed Jan 10 21:25:24 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 15:25:24 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101101638.LAA26759@cj20424-a.reston1.va.home.com> Message-ID: [Tim] >> Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method >> to keep the file locked until the line was complete, and I >> wouldn't be opposed to making life saner on platforms that allow it. [Guido] > Hm... That would be possible, except for one unfortunate detail: > _PyString_Resize() may call PyErr_BadInternalCall() which touches > thread state. FLOCKFILE/FUNLOCKFILE are independent of Python's notion of thread state. IOW, do FLOCKFILE once before the for(;;), and FUNLOCKFILE once on every *exit* path thereafter. We can block/unblock Python threads as often as desired between those *file*-locking brackets. The only thing the repeated FLOCKFILE/FUNLOCKFILE calls do to my eyes now is to create the *possibility* for multiple readers to get partial lines of the file. > ... > NO, NO NO! Mixing reads and writes on the same stream wasn't what we > are locking against at all. 
(As you've found out, it doesn't even > work.) On Windows, yes, but that still seems to me to be a bug in MS's code. If anyone had reported a core dump on any other platform, I'd be more tractable on this point. > We're only trying to protect against concurrent *reads*. As above, I believe that we could do a better job of that, then, on platforms that HAVE_GETC_UNLOCKED, by protecting not only against core dumps but also against .readline() not delivering an intact line from the file. >> But since FLOCKFILE is in effect, other threads *trying* to write >> to the stream we're reading will get blocked anyway. Seems to give us >> potential for deadlocks. > Only if they are holding other locks at the same time. I'm not being clear, then. Thread X does f.readline(), on a HAVE_GETC_UNLOCKED platform. get_line allows other threads to run and invokes FLOCKFILE on f->f_fp. get_line's GETC in thread X eventually hits the end of the stdio buffer, and does its platform's version of _filbuf. _filbuf may wait (depending on the nature of the stream) for more input to show up. Simultaneously, thread Y attempts to write some data to f. But the *FLOCKFILE* lock prevents it from doing anything with f. So X is waiting for Y to write data inside platform _filbuf, but Y is waiting for X to release the platform stream lock inside some platform stream-output routine (if I'm being clear now, Python locks have nothing to do with this scenario: it's the platform stream lock). I think this is purely the user's fault if it happens. Just pointing it out as another insecurity we're probably not able to protect users from. > ... > Yeah. But this is insane use -- see my comments on SF. It's only > worth fixing because it could be used to intentionally crash Python -- > but there are easier ways... If it's unique to MS (as I suspect), I see no reason to even consider trying to fix it in Python. Unless the Perl Mongers use it to crash Zope .
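[Archive note, not part of the thread: a rough Python analogy for the thread X / thread Y scenario described above. It is only a sketch under stated assumptions: a threading.Lock stands in for the platform stream lock, a threading.Event stands in for "more input showed up", and timeouts are added so the sketch terminates instead of hanging. No real stdio locking is involved, and the function name is hypothetical.]

```python
import threading


def simulate_stream_lock_stall():
    """Model a reader that waits for data while holding the stream lock."""
    stream_lock = threading.Lock()         # stand-in for the FLOCKFILE lock
    data_ready = threading.Event()         # stand-in for "buffer was refilled"
    reader_holds_lock = threading.Event()  # lets the writer start at the right time
    outcome = {}

    def reader():                          # plays thread X
        with stream_lock:
            reader_holds_lock.set()
            # Waits for data without releasing the lock (like a blocking refill).
            outcome["reader_got_data"] = data_ready.wait(timeout=1.0)

    def writer():                          # plays thread Y
        reader_holds_lock.wait()
        # Needs the same lock before it can produce the data X is waiting for.
        got_lock = stream_lock.acquire(timeout=0.2)
        outcome["writer_got_lock"] = got_lock
        if got_lock:
            data_ready.set()
            stream_lock.release()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


print(simulate_stream_lock_stall())
```

[With the timeouts removed, both threads would wait on each other indefinitely, which is the stall described in the message.]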
From cgw at fnal.gov Wed Jan 10 22:57:41 2001 From: cgw at fnal.gov (Charles G Waldman) Date: Wed, 10 Jan 2001 15:57:41 -0600 (CST) Subject: [Python-Dev] Interning filenames of imported modules Message-ID: <14940.56021.646147.770080@buffalo.fnal.gov> I have a question about the following code in compile.c:jcompile (line 3678) filename = PyString_InternFromString(sc.c_filename); name = PyString_InternFromString(sc.c_name); In the case of a long-running server which constantly imports modules, this causes the interned string dict to grow without bound. Is there a strong reason that the filename needs to be interned? How about the module name? How about some way to enforce a limit on the size of the interned strings dictionary? From mwh21 at cam.ac.uk Wed Jan 10 23:02:49 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: Wed, 10 Jan 2001 22:02:49 +0000 (GMT) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5CC13F.DFB26A0B@ActiveState.com> Message-ID: On Wed, 10 Jan 2001, Paul Prescod wrote: > Guido van Rossum wrote: > > > > ... > > > > Yes -- I came up with the same thought. > > > > So here's a plan: somebody please submit a patch that does only one > > thing: from...import * looks for __all__ and if it exists, imports > > exactly those names. No changes to dir(), or anything. > > Why? From my point of view, the changes to dir() are much more > important. I seldom tell newbies about import * but I always tell them > how they can browse objects (especially modules) with dir. If dir() is > changed then IDEs and so forth would use that and inherit the right > behavior. If the module exporting behavior gets more sophisticated in a > future version of Python they will continue to inherit the behavior. Changing dir would also make rlcompleter nicer - it's something of a pain to use with a module that has, eg, "from TERMIOS import *"-ed. This might also make "from ... import *" less of a pariah... Sounds good to me, IOW. Cheers, M. 
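[Archive note, not part of the thread: a minimal sketch of the narrower plan quoted above, in which "from M import *" honours __all__ while dir() and ordinary attribute access are left alone. It assumes a current Python 3 interpreter, which behaves this way; the module name demo_mod and its attributes are hypothetical.]

```python
import sys
import types

# Build a throwaway module in memory instead of writing a file.
demo = types.ModuleType("demo_mod")
demo.public_func = lambda: "public"
demo.helper = lambda: "helper"
demo.__all__ = ["public_func"]        # only this name is exported by "import *"
sys.modules["demo_mod"] = demo

# Star-import into a scratch namespace and see what arrived.
ns = {}
exec("from demo_mod import *", ns)
imported = sorted(name for name in ns if not name.startswith("__"))

# dir() still lists everything, and direct access is not restricted.
helper_listed = "helper" in dir(demo)
helper_result = demo.helper()

print(imported, helper_listed, helper_result)
```

[Filtering dir() by __all__, as suggested in the messages above, would be a separate change; this sketch does not attempt it.]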
From tim.one at home.com Wed Jan 10 23:23:14 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 17:23:14 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101101639.LAA26776@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I bet you it's rejected on the basis of "the docs tell you not to mix > reading and writing on the same stream without intervening seek or > flush." If I were on the support line I would do that. So would I if I were a typical first-line support idiot . But the *implementers*-- if they ever see it --should be very keen to figure out how they managed to let the _iobuf get corrupted. *I'm* not mucking with their internals, nor doing wild pointer stores, nor anything else sneaky to subvert their locking protection. I wasn't even trying to break it. The only code reading from or storing into the _iobuf is theirs. They're ordinary stdio calls with ordinary arguments, and if *any* sequence of those can cause internal corruption, they've almost certainly got a problem that will manifest in other situations too. Think like an implementer here <0.5 wink>: they've lost track of how many characters are in the buffer despite a locking scheme whose purpose is to prevent that. If it were my implementation, that would be a top-priority bug no matter how silly the first program I saw that triggered it. 
but-willing-to-let-them-decide-whether-they-care-ly y'rs - tim From skip at mojam.com Wed Jan 10 23:52:55 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 10 Jan 2001 16:52:55 -0600 (CST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <3A5CC13F.DFB26A0B@ActiveState.com> References: <200101052014.PAA20328@cj20424-a.reston1.va.home.com> <3A5C61D8.2E5D098C@lemburg.com> <200101101653.LAA28986@cj20424-a.reston1.va.home.com> <3A5CC13F.DFB26A0B@ActiveState.com> Message-ID: <14940.59335.723701.574821@beluga.mojam.com> Paul> Also, dir() could look for an __all__ on all objects including Paul> "module proxies", classes and "plain old instances". In other Paul> words we can extend the convention to other objects "for free". The __exports__/dir() patch I submitted will do this if you remove the PyModule_Check that guards it. Skip From tim.one at home.com Thu Jan 11 00:06:05 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 10 Jan 2001 18:06:05 -0500 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: <3A5C4E44.23B593E9@per.dem.csiro.au> Message-ID: [Mark Favas] > Just Another Data Point - my box (DEC Alpha, Tru64 Unix) shows the same > behaviour as Tim's WinBox wrt the new xreadline and the double-loop > readlines (so it's not just something funny with MS (not that there's > not anything funny with MS...)): > > total 131426612 chars and 514216 lines You average over 255 chars/line? Really? What kind of file are you reading? I don't really want to measure the speed of line-at-a-time input on binary files where "line" doesn't actually make sense <0.6 wink>. > count_chars_lines 5.450 5.066 > readlines_sizehint 4.112 4.083 > using_fileinput 10.928 10.916 > while_readline 11.766 11.733 > for_xreadlines 3.569 3.533 Guido pointed out that his readlines_sizehint test forced use of a 1Mb buffer (in the call, not only the default value). For whatever reason, that was significantly slower than using an 8Kb sizehint on my box. 
Another oddity is that while_readline is slower than using_fileinput for
you. From that I take it Python config does *not* #define

    HAVE_GETC_UNLOCKED

on your platform. If that's true (or esp. if it's not!), would you do me
a favor? Recompile fileobject.c with

    USE_MS_GETLINE_HACK

#define'd, try the timing test again (while_readline is the most
interesting test for this), and run the test_bufio.py std test to make
sure you're actually getting the right answers.

At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack
whenever HAVE_GETC_UNLOCKED isn't available. I'd be surprised if
ms_getline_hack failed to work correctly on any platform; a bigger
unknown (to me) is whether it will yield a speedup. So far it yields a
large speedup on Windows, and looks like a speedup equal to
getc_unlocked() yields on Linux and Solaris. Info on a platform from Mars
(like Tru64 Unix ) would be valuable in deciding whether to boost +0.5.

don't-want-your-python-to-run-slower-than-possible-if-possible-ly
y'rs - tim

From tismer at tismer.com Wed Jan 10 23:38:57 2001
From: tismer at tismer.com (Christian Tismer)
Date: Thu, 11 Jan 2001 00:38:57 +0200
Subject: [Python-Dev] [Stackless] ANN: Sourcecode for Stackless Python 2.0
Message-ID: <3A5CE481.24A7656@tismer.com>

On Monday, Jan 8th, I spake
"""
Source code and an update to the website will become available
in the next days.
"""

Now, here it is, together with a slightly updated website, which tries to
mention all the people who are helping or sponsoring me (yes, there are
sponsors!). If somebody feels ignored by me, let me know. I'm good at
making mistakes.

Let me also know if there are problems building the code, or if there are
*no* problems understanding the code. I don't expect either :-)

There is nearly no support for Unix, but Stackless *should* build on Unix
as it did before without problems.

enjoy - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

From nas at arctrix.com Wed Jan 10 19:15:45 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 10 Jan 2001 10:15:45 -0800
Subject: [Python-Dev] xreadline speed vs readlines_sizehint
In-Reply-To: ; from tim.one@home.com on Wed, Jan 10, 2001 at 06:06:05PM -0500
References: <3A5C4E44.23B593E9@per.dem.csiro.au>
Message-ID: <20010110101545.A21305@glacier.fnational.com>

On Wed, Jan 10, 2001 at 06:06:05PM -0500, Tim Peters wrote:
> At this point I'm +0.5 on the idea of fileobject.c using ms_getline_hack
> whenever HAVE_GETC_UNLOCKED isn't available.

Leave it to the timbot to use floating point votes. :) Compare
ms_getline_hack to what Perl does in order to speed up IO. I think it's
worth maintaining that piece of relatively portable code given the
benefit. If the code has to be maintained then it might as well be used.
If we find a platform that breaks we can always disable it before the
final release.

  Neil

From m.favas at per.dem.csiro.au Thu Jan 11 02:28:59 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Thu, 11 Jan 2001 09:28:59 +0800
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
Message-ID: <3A5D0C5B.162F624A@per.dem.csiro.au>

[Tim produces a warped threader that crashes on MS OS's]
>> ...
>> NO, NO NO! Mixing reads and writes on the same stream wasn't what
>> we are locking against at all. (As you've found out, it doesn't
>> even work.)

>On Windows, yes, but that still seems to me to be a bug in MS's code.
>If anyone had reported a core dump on any other platform, I'd be more
>tractable on this point.

On Tru64 Unix, I get an infinite generator of 'r's (after an initial few
'w's) to the screen (but no crashes).
If I reduce the size of the loop counters from 1000000 to 3000, I get the
following output:

opened
w w w w w w w w w w w w w w w w w w w w w w w w w w w r read 5114
done

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From m.favas at per.dem.csiro.au Thu Jan 11 04:40:18 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Thu, 11 Jan 2001 11:40:18 +0800
Subject: [Python-Dev] xreadline speed vs readlines_sizehint
Message-ID: <3A5D2B22.B8028AC@per.dem.csiro.au>

[Tim responded]
>>
>> total 131426612 chars and 514216 lines

>You average over 255 chars/line? Really? What kind of file are you
>reading? I don't really want to measure the speed of line-at-a-time
>input on binary files where "line" doesn't actually make sense <0.6 wink>.

Real-life input, my boy! It's actually a syslog from my mailserver,
consisting mainly of sendmail log messages, and I have a current need to
process these things (MS Exchange, corrupted database, clobbered backup
tapes), so this thread came along at the right time...

>Guido pointed out that his readlines_sizehint test forced use of a 1Mb
>buffer (in the call, not only the default value). For whatever
>reason, that was significantly slower than using an 8Kb sizehint on my
>box.

Removing the buffer size arg in the call to readlines_sizehint results in
this (using up-to-the-minute CVS):

total 131426612 chars and 514216 lines
count_chars_lines     4.922   4.916
readlines_sizehint    3.881   3.850
using_fileinput      10.371  10.366
while_readline       10.943  10.916
for_xreadlines        2.990   2.967

and with an 8Kb sizehint:

total 131426612 chars and 514216 lines
count_chars_lines     5.241   5.216
readlines_sizehint    2.917   2.900
using_fileinput      10.351  10.333
while_readline       10.990  10.983
for_xreadlines        2.877   2.867

>Another oddity is that while_readline is slower than using_fileinput
>for you. From that I take it Python config does *not* #define
>
>    HAVE_GETC_UNLOCKED
>
>on your platform.
>If that's true

Nope, HAVE_GETC_UNLOCKED is indeed #define'd

>(or esp. if it's not!), would you do me a
>favor? Recompile fileobject.c with
>
>    USE_MS_GETLINE_HACK
>
>#define'd, try the timing test again (while_readline is the most
>interesting test for this), and run the test_bufio.py std test to make
>sure you're actually getting the right answers.

Sure:

With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd (although
defining the former makes the latter def irrelevant):
(test_bufio also OK)

total 131426612 chars and 514216 lines
count_chars_lines     5.056   5.050
readlines_sizehint    3.771   3.667
using_fileinput      11.128  11.116
while_readline        8.287   8.233
for_xreadlines        3.090   3.083

With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just for
completeness):

total 131426612 chars and 514216 lines
count_chars_lines     4.916   4.900
readlines_sizehint    3.875   3.867
using_fileinput      14.404  14.383
while_readline      322.728 321.837
for_xreadlines        7.113   7.100

So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From nas at arctrix.com Wed Jan 10 22:55:23 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 10 Jan 2001 13:55:23 -0800
Subject: [Python-Dev] xreadline speed vs readlines_sizehint
In-Reply-To: <3A5D2B22.B8028AC@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Thu, Jan 11, 2001 at 11:40:18AM +0800
References: <3A5D2B22.B8028AC@per.dem.csiro.au>
Message-ID: <20010110135523.A21894@glacier.fnational.com>

On Thu, Jan 11, 2001 at 11:40:18AM +0800, Mark Favas wrote:
[with getc_unlocked]
> while_readline       10.943  10.916
[without]
> while_readline      322.728 321.837

Holy crap. Great work team.
  Neil

From tim.one at home.com Thu Jan 11 06:03:51 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 11 Jan 2001 00:03:51 -0500
Subject: [Python-Dev] Baffled on Windows
Message-ID:

In version 2.26 of mmapmodule.c, Guido replaced (as part of a contributed
Cygwin patch):

    #ifdef MS_WIN32
    __declspec(dllexport) void
    #endif /* MS_WIN32 */
    #ifdef UNIX
    extern void
    #endif

by:

    DL_EXPORT(void)

before initmmap.

1. Windows Python can no longer import mmap:

   >>> import mmap
   Traceback (most recent call last):
     File "", line 1, in ?
   ImportError: dynamic module does not define init function (initmmap)
   >>>

   This is because GetProcAddress returns NULL.

2. Everything's fine if I revert Guido's change (although I assume that
   breaks Cygwin then).

3. DL_EXPORT(void) expands to "void".

4. The way mmapmodule.c is coded and built after Guido's change appears
   to me to be the same as how every other non-builtin module is coded
   and built on Windows. For example, winsound.c, which uses
   DL_EXPORT(void) before its initwinsound and where that macro also
   expands to "void". But importing winsound works fine.

Since what I'm seeing makes no consistent sense, I'm at a loss how to fix
it. But then I'm punch-drunk too <0.7 wink>.

Any Windows geek got a clue?

From tim.one at home.com Thu Jan 11 07:10:40 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 11 Jan 2001 01:10:40 -0500
Subject: [Python-Dev] RE: xreadline speed vs readlines_sizehint
In-Reply-To: <3A5D2B22.B8028AC@per.dem.csiro.au>
Message-ID:

[Tim, to MarkF]
>> You average over 255 chars/line? [nag, nag, nag]

[Mark Favas]
> Real-life input, my boy! It's actually a syslog from my
> mailserver, consisting mainly of sendmail log messages, and I
> have a current need to process these things (MS Exchange,
> corrupted database, clobbered backup tapes), so this thread
> came along at the right time...

Hmm. I tuned ms_getline_hack for Guido's logfiles, which he said don't
often exceed 160 chars/line.
I guess if you're on a 64-bit platform, though, it must take about twice
as many chars per line to record a log msg .

> ...
> Removing the buffer size arg in the call to readlines_sizehint results
> in this (using up-to-the-minute CVS):
> total 131426612 chars and 514216 lines
> count_chars_lines     4.922   4.916
> readlines_sizehint    3.881   3.850
> using_fileinput      10.371  10.366
> while_readline       10.943  10.916
> for_xreadlines        2.990   2.967
>
> and with an 8Kb sizehint:
> total 131426612 chars and 514216 lines
> count_chars_lines     5.241   5.216
> readlines_sizehint    2.917   2.900
> using_fileinput      10.351  10.333
> while_readline       10.990  10.983
> for_xreadlines        2.877   2.867

That's sure consistent across platforms, then. I guess we'll write it off
to "cache effects" (a catch-all explanation for any timing mystery -- go
ahead, just *try* to prove it's wrong <0.5 wink>).

[and Mark has HAVE_GETC_UNLOCKED on his Tru64 Unix box, yet
 using_fileinput is quicker than while_readline]

> With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #define'd
> (although defining the former makes the latter def irrelevant):
> (test_bufio also OK)
> total 131426612 chars and 514216 lines
> count_chars_lines     5.056   5.050
> readlines_sizehint    3.771   3.667
> using_fileinput      11.128  11.116
> while_readline        8.287   8.233
> for_xreadlines        3.090   3.083

So ms_getline_hack is significantly faster on your box (I'm only looking
at while_readline: 11 using getc_unlocked, 8.3 using ms_getline_hack).
There are only two reasons I can imagine for that:

1. Your vendor optimizes the inner loop in fgets (as all vendors should,
   but few do).

and/or

2. Despite the long average length of your lines, many of them are
   nevertheless shorter than 200 chars, and so all the pain
   ms_getline_hack endures to avoid a realloc pays off.

Unfortunately, there's not enough info to figure out if either, both, or
none of those are on-target.
It's such a large percentage speedup, though, that my bet goes primarily
to #1 -- unless realloc is really pig slow on your box. Which some things
*are*:

> With USE_MS_GETLINE_HACK and HAVE_GETC_UNLOCKED both #undef'ed (just
> for completeness):
> total 131426612 chars and 514216 lines
> count_chars_lines     4.916   4.900
> readlines_sizehint    3.875   3.867
> using_fileinput      14.404  14.383
> while_readline      322.728 321.837
> for_xreadlines        7.113   7.100
>
> So, having HAVE_GETC_UNLOCKED #define'd does make a small improvement
>

Yes, that's the "platform from Mars" evidence I was seeking: if
ms_getline_hack survives test_bufio on *your* crazy box, it's as close to
provably correct as any algorithm in all of Computer Science .

a-factor-of-39-is-almost-big-enough-to-notice!-ly y'rs - tim

From m.favas at per.dem.csiro.au Thu Jan 11 08:26:37 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Thu, 11 Jan 2001 15:26:37 +0800
Subject: [Python-Dev] Re: xreadline speed vs readlines_sizehint
References:
Message-ID: <3A5D602D.9DC991CB@per.dem.csiro.au>

[Tim speculates on getc_unlocked and his ms_getline_hack]:
>
> So ms_getline_hack is significantly faster on your box (I'm only
> looking at while_readline: 11 using getc_unlocked, 8.3 using
> ms_getline_hack). There are only two reasons I can imagine for that:
>
> 1. Your vendor optimizes the inner loop in fgets (as all vendors
>    should, but few do).

Digital engineering, Compaq management/marketing <0.6 wink>

>
> and/or
>
> 2. Despite the long average length of your lines, many of them are
>    nevertheless shorter than 200 chars, and so all the pain
>    ms_getline_hack endures to avoid a realloc pays off.
>
> Unfortunately, there's not enough info to figure out if either, both,
> or none of those are on-target. It's such a large percentage
> speedup, though, that my bet goes primarily to #1 -- unless realloc
> is really pig slow on your box.
The lines range in length from 96 to 747 characters, with 11% @ 233,
17% @ 252 and 52% @ 254 characters, so #1 looks promising - most lines
are long enough to trigger a realloc. Cranking up INITBUFSIZE in
ms_getline_hack to 260 from 200 improves things again, by another 25%:

total 131426612 chars and 514216 lines
count_chars_lines     5.081   5.066
readlines_sizehint    3.743   3.717
using_fileinput      11.113  11.100
while_readline        6.100   6.083
for_xreadlines        3.027   3.033

Apart from the name , I like ms_getline_hack...

tho'-a-factor-of-100-makes-xreadlines-a-welcome-addition!-ly y'rs

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From m.favas at per.dem.csiro.au Thu Jan 11 10:08:29 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Thu, 11 Jan 2001 17:08:29 +0800
Subject: [Python-Dev] Current CVS version of sysmodule.c fails to compile
Message-ID: <3A5D780D.62D0F473@per.dem.csiro.au>

On Tru64 Unix, with Compaq's C/CXX compilers, the current CVS version of
sysmodule.c produces the following errors:

cc -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c -o
sysmodule.o sysmodule.c
cc: Error: sysmodule.c, line 73: Invalid declarator. (declarator)
        PyObject *o, *stdout;
----------------------^
cc: Error: sysmodule.c, line 79: In this statement, "o" is not declared.
(undeclared)
        if (!PyArg_ParseTuple(args, "O:displayhook", &o))
------------------------------------------------------^
cc: Error: sysmodule.c, line 93: In this statement, "(&_iob[1])" is not
an lvalue, but occurs in a context that requires one. (needlvalue)
        stdout = PySys_GetObject("stdout");
--------^
cc: Warning: sysmodule.c, line 98: In this statement, the referenced
type of the pointer value "(&_iob[1])" is "struct declared without a
tag", which is not compatible with "struct _object".
(ptrmismatch)
        if (PyFile_WriteObject(o, stdout, 0) != 0)
----------------------------------^
cc: Warning: sysmodule.c, line 100: In this statement, the referenced
type of the pointer value "(&_iob[1])" is "struct declared without a
tag", which is not compatible with "struct _object". (ptrmismatch)
        PyFile_SoftSpace(stdout, 1);
-------------------------^

The problem is that stdout is a macro #define'd in stdio.h as (&_iob[1])
(stdin and stderr also are similarly #define'd).

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From gstein at lyra.org Thu Jan 11 10:18:44 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 11 Jan 2001 01:18:44 -0800
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.216,2.217 sysmodule.c,2.80,2.81
In-Reply-To: ; from moshez@users.sourceforge.net on Wed, Jan 10, 2001 at 09:41:29PM -0800
References:
Message-ID: <20010111011843.W4640@lyra.org>

On Wed, Jan 10, 2001 at 09:41:29PM -0800, Moshe Zadka wrote:
> Update of /cvsroot/python/python/dist/src/Python
> In directory usw-pr-cvs1:/tmp/cvs-serv21213/Python
>
> Modified Files:
> 	ceval.c sysmodule.c
>...
> --- 1246,1269 ----
>   		case PRINT_EXPR:
>   			v = POP();
> ! 			w = PySys_GetObject("displayhook");
> ! 			if (w == NULL) {
> ! 				PyErr_SetString(PyExc_RuntimeError,
> ! 						"lost sys.displayhook");
> ! 				err = -1;
>   			}
> + 			if (err == 0) {
> + 				x = Py_BuildValue("(O)", v);
> + 				if (x == NULL)
> + 					err = -1;
> + 			}
> + 			if (err == 0) {
> + 				w = PyEval_CallObject(w, x);
> + 				if (w == NULL)
> + 					err = -1;
> + 			}
>   			Py_DECREF(v);
> + 			Py_XDECREF(x);

x was never initialized to NULL. In fact, the loop sets it to Py_None. If
you get an error in the initial "w" setup case, then you could
erroneously decref None.

Further, there is no DECREF for the CallObject result ("w"). But watch
out: you don't want to DECREF the PySys_GetObject result (that is a
borrowed reference).
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From gstein at lyra.org Thu Jan 11 10:28:16 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 11 Jan 2001 01:28:16 -0800
Subject: [Python-Dev] Current CVS version of sysmodule.c fails to compile
In-Reply-To: <3A5D780D.62D0F473@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Thu, Jan 11, 2001 at 05:08:29PM +0800
References: <3A5D780D.62D0F473@per.dem.csiro.au>
Message-ID: <20010111012815.X4640@lyra.org>

You're quite right! I've checked in a change, renaming it to "outf".

Cheers,
-g

On Thu, Jan 11, 2001 at 05:08:29PM +0800, Mark Favas wrote:
> On Tru64 Unix, with Compaq's C/CXX compilers, the current CVS version of
> sysmodule.c produces the following errors:
>
> cc -O -Olimit 1500 -I./../Include -I.. -DHAVE_CONFIG_H -c -o
> sysmodule.o sysmodule.c
> cc: Error: sysmodule.c, line 73: Invalid declarator. (declarator)
>         PyObject *o, *stdout;
> ----------------------^
> cc: Error: sysmodule.c, line 79: In this statement, "o" is not declared.
> (undeclared)
>         if (!PyArg_ParseTuple(args, "O:displayhook", &o))
> ------------------------------------------------------^
> cc: Error: sysmodule.c, line 93: In this statement, "(&_iob[1])" is not
> an lvalue, but occurs in a context that requires one. (needlvalue)
>         stdout = PySys_GetObject("stdout");
> --------^
> cc: Warning: sysmodule.c, line 98: In this statement, the referenced
> type of the pointer value "(&_iob[1])" is "struct declared without a
> tag", which is not compatible with "struct _object". (ptrmismatch)
>         if (PyFile_WriteObject(o, stdout, 0) != 0)
> ----------------------------------^
> cc: Warning: sysmodule.c, line 100: In this statement, the referenced
> type of the pointer value "(&_iob[1])" is "struct declared without a
> tag", which is not compatible with "struct _object".
> (ptrmismatch)
>         PyFile_SoftSpace(stdout, 1);
> -------------------------^
>
> The problem is that stdout is a macro #define'd in stdio.h as (&_iob[1])
> (stdin and stderr also are similarly #define'd).
>
> --
> Mark Favas - m.favas at per.dem.csiro.au
> CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://www.python.org/mailman/listinfo/python-dev

--
Greg Stein, http://www.lyra.org/

From skip at mojam.com Thu Jan 11 15:13:55 2001
From: skip at mojam.com (Skip Montanaro)
Date: Thu, 11 Jan 2001 08:13:55 -0600 (CST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218
In-Reply-To:
References:
Message-ID: <14941.49059.26189.733094@beluga.mojam.com>

    Moshe> * Did not DECREF result from displayhook function
    ...
    Moshe>   w = PyEval_CallObject(w, x);
    Moshe> + Py_XDECREF(w);
    Moshe>   if (w == NULL)
    ...

While it works, is it really kosher to test w's value after the DECREF?
Just seems like an odd construct to me. I'm used to seeing the test
immediately after it's been set.

Skip

From guido at python.org Thu Jan 11 15:44:58 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 09:44:58 -0500
Subject: [Python-Dev] Interning filenames of imported modules
In-Reply-To: Your message of "Wed, 10 Jan 2001 15:57:41 CST." <14940.56021.646147.770080@buffalo.fnal.gov>
References: <14940.56021.646147.770080@buffalo.fnal.gov>
Message-ID: <200101111444.JAA14597@cj20424-a.reston1.va.home.com>

> I have a question about the following code in compile.c:jcompile (line 3678)
>
>     filename = PyString_InternFromString(sc.c_filename);
>     name = PyString_InternFromString(sc.c_name);
>
> In the case of a long-running server which constantly imports modules,
> this causes the interned string dict to grow without bound. Is there
> a strong reason that the filename needs to be interned? How about the
> module name?
It's probably not *necessary* for the filename, but I know why I am
interning it: since a module typically contains a bunch of functions, and
each function has its own code object with a reference to the filename,
I'm trying to save memory (the filename is a C string pointer in the "sc"
structure, so it has to be turned into a Python string when creating the
code object).

The module name is used as an identifier elsewhere so will become
interned anyway.

> How about some way to enforce a limit on the size of the interned
> strings dictionary?

I've never thought of this -- but I suppose that a weak dictionary could
be used. Fred's working on a PEP for weak references, so there's a chance
that we might use this eventually.

In the mean time, a possibility would be to provide a service function
that goes through the "interned" dictionary and looks for values with a
reference count of 1, and deletes them. You could then explicitly call
this service function occasionally in your program. I would let it return
a tuple: (number of values kept, number of values deleted).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Thu Jan 11 16:08:48 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 10:08:48 -0500
Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Wed, 10 Jan 2001 13:49:40 EST."
References:
Message-ID: <200101111508.KAA14870@cj20424-a.reston1.va.home.com>

> They're indistinguishable then on my box (on one run xreadlines is .1
> seconds (out of around 7.6 total) quicker, on another readlines_sizehint),
> *provided* that I specify the same buffer size (8192) that xreadlines uses
> internally. However, if I even double that, readlines_sizehint is uniformly
> about 10% slower. It's also a tiny bit slower if I cut the sizehint buffer
> size to 4096.
>
> I'm afraid Mysteries will remain no matter how many person-decades we spend
> staring at this <0.5 wink> ...

8192 happens to be the size of the stack-allocated buffer readlines()
uses, and also the stdio BUFSIZ parameter, on many systems. Look for
SMALLCHUNK in fileobject.c.

Would it make sense to tie the two constants together more to tune this
optimally even when BUFSIZ is different?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at digicool.com Thu Jan 11 16:09:54 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 11 Jan 2001 10:09:54 -0500
Subject: [Python-Dev] autoconfigure patch submitted on SourceForge
References: <200101080416.f084GrM10912@snark.thyrsus.com>
 <20010108074411.N2467@xs4all.nl>
 <20010108014945.A19516@thyrsus.com>
 <200101081427.JAA03146@cj20424-a.reston1.va.home.com>
 <20010108113109.C7563@kronos.cnri.reston.va.us>
 <200101101900.OAA30486@cj20424-a.reston1.va.home.com>
Message-ID: <14941.52418.18484.898061@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    GvR> It was more work than I had hoped for, because Eric
    GvR> apparently (despite having developer privileges!) doesn't use
    GvR> the CVS tree -- he sent in a diff relative to the 2.0
    GvR> release. I munged it into place, adding the feature that
    GvR> readline, _curses and bsddb are built as shared libraries by
    GvR> default. You'd have to edit Setup.config.in to change this.

    GvR> Hope this doesn't break anybody's setup. (Skip???)

We may need to move dbm module to Setup.config from Setup and build it
shared too. The problem I ran into when building the pybsddb3 module was
that even though I'd built the standard bsddb shared, I was also building
in dbm statically. This pulled in a dependency to the old db.so module
(under RH6.1) and core dumped me during the test suite for pybsddb.
Commenting out dbm did the trick, so building it shared should work too.
Couple of things: dbm isn't enabled by default I believe so moving it to
Setup.config may not be the right thing after all (would that imply an
autoconf test and auto-enabling if it's detected?) Also, Andrew's
distutils-based build procedure may obviate the need for this change.

-Barry

From ping at lfw.org Thu Jan 11 16:14:17 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Thu, 11 Jan 2001 07:14:17 -0800 (PST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <200101101653.LAA28986@cj20424-a.reston1.va.home.com>
Message-ID:

On Wed, 10 Jan 2001, Guido van Rossum wrote:
> Yes -- I came up with the same thought.
>
> So here's a plan: somebody please submit a patch that does only one
> thing: from...import * looks for __all__ and if it exists, imports
> exactly those names. No changes to dir(), or anything.

Please don't use __all__. At the moment, __all__ is the only way to
easily tell whether a particular module object really represents a
package, and the only way to get the list of submodule names.

If __all__ is overloaded to also represent exportable symbols in modules,
these two pieces of information will be impossible (or require much ugly
hackery) to obtain.

-- ?!ng

From guido at python.org Thu Jan 11 16:23:26 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 10:23:26 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Wed, 10 Jan 2001 15:25:24 EST."
References:
Message-ID: <200101111523.KAA14982@cj20424-a.reston1.va.home.com>

> [Tim]
> >> Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method
> >> to keep the file locked until the line was complete, and I
> >> wouldn't be opposed to making life saner on platforms that allow it.
>
> [Guido]
> > Hm... That would be possible, except for one unfortunate detail:
> > _PyString_Resize() may call PyErr_BadInternalCall() which touches
> > thread state.

[Tim]
> FLOCKFILE/FUNLOCKFILE are independent of Python's notion of thread state.
> IOW, do FLOCKFILE once before the for(;;), and FUNLOCKFILE once on every
> *exit* path thereafter. We can block/unblock Python threads as often as
> desired between those *file*-locking brackets. The only thing the repeated
> FLOCKFILE/FUNLOCKFILE calls do to my eyes now is to create the *possibility*
> for multiple readers to get partial lines of the file.

I don't want to call FLOCKFILE while holding the Python lock, as this
means that *if* we're blocked in FLOCKFILE (e.g. we're reading from a
pipe or socket), no other Python thread can run!

> > ...
> > NO, NO NO! Mixing reads and writes on the same stream wasn't what we
> > are locking against at all. (As you've found out, it doesn't even
> > work.)
>
> On Windows, yes, but that still seems to me to be a bug in MS's code. If
> anyone had reported a core dump on any other platform, I'd be more tractable
> on this point.

Yes, it's a Windows bug.

> > We're only trying to protect against concurrent *reads*.
>
> As above, I believe that we could do a better job of that, then, on
> platforms that HAVE_GETC_UNLOCKED, by protecting not only against core dumps
> but also against .readline() not delivering an intact line from the file.

See above for a reason why I think that's not safe. I think that
applications that want to do this can do their own locking. (They'll find
out soon enough that readline() isn't atomic. :-)

> >> But since FLOCKFILE is in effect, other threads *trying* to write
> >> to the stream we're reading will get blocked anyway. Seems to give us
> >> potential for deadlocks.
>
> > Only if they are holding other locks at the same time.
>
> I'm not being clear, then. Thread X does f.readline(), on a
> HAVE_GETC_UNLOCKED platform. get_line allows other threads to run and
> invokes FLOCKFILE on f->f_fp. get_line's GETC in thread X eventually hits
> the end of the stdio buffer, and does its platform's version of _filbuf.
> _filbuf may wait (depending on the nature of the stream) for more input to
> show up.
> Simultaneously, thread Y attempts to write some data to f. But
> the *FLOCKFILE* lock prevents it from doing anything with f. So X is
> waiting for Y to write data inside platform _filbuf, but Y is waiting for X
> to release the platform stream lock inside some platform stream-output
> routine (if I'm being clear now, Python locks have nothing to do with this
> scenario: it's the platform stream lock).

I don't think that _filbuf can possibly wait for another thread to write
data to the same stream object. A single stream object doesn't act like a
pipe, even if it is open for simultaneous reading and writing. So if
there's no more data in the file, _filbuf will simply return with an EOF
status, not wait for the data that the other thread would write.

> I think this is purely the user's fault if it happens. Just pointing it out
> as another insecurity we're probably not able to protect users from.

I don't think this can happen.

> > ...
> > Yeah. But this is insane use -- see my comments on SF. It's only
> > worth fixing because it could be used to intentionally crash Python --
> > but there are easier ways...
>
> If it's unique to MS (as I suspect), I see no reason to even consider trying
> to fix it in Python. Unless the Perl Mongers use it to crash Zope .

OK. It's unique to MS. So close the bug report with a "won't fix"
resolution. There's no point in having bug reports remain open that we
know we can't fix.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Thu Jan 11 16:27:05 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 10:27:05 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Wed, 10 Jan 2001 17:23:14 EST."
References:
Message-ID: <200101111527.KAA15005@cj20424-a.reston1.va.home.com>

> Think like an implementer here <0.5 wink>: they've lost track of how many
> characters are in the buffer despite a locking scheme whose purpose is to
> prevent that.
> If it were my implementation, that would be a top-priority
> bug no matter how silly the first program I saw that triggered it.

The locking prevents concurrent threads accessing the stream. But mixing
reads and writes (without intervening fseek etc.) is illegal use of the
stream, and the C standard allows them to be lax here, even if the
program was single-threaded.

In other words: the locking is so good that it serializes the sequence of
reads and writes; but if the sequence of reads and writes is illegal,
they don't guarantee anything.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Thu Jan 11 16:28:23 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 10:28:23 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Thu, 11 Jan 2001 09:28:59 +0800." <3A5D0C5B.162F624A@per.dem.csiro.au>
References: <3A5D0C5B.162F624A@per.dem.csiro.au>
Message-ID: <200101111528.KAA15021@cj20424-a.reston1.va.home.com>

> On Tru64 Unix, I get an infinite generator of 'r's (after an initial few
> 'w's) to the screen (but no crashes).

Same here on Linux.

> If I reduce the size of the loop
> counters from 1000000 to 3000, I get the following output:
> opened
> w w w w w w w w w w w w w w w w w w w w w w w w w w w r read 5114
> done

I still get an infinite amount of 'r's.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at xs4all.net Thu Jan 11 16:28:21 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 11 Jan 2001 16:28:21 +0100
Subject: [Python-Dev] Rehabilitating fgets
In-Reply-To: ; from tim.one@home.com on Sun, Jan 07, 2001 at 11:13:26PM -0500
References:
Message-ID: <20010111162820.W2467@xs4all.nl>

On Sun, Jan 07, 2001 at 11:13:26PM -0500, Tim Peters wrote:

> I'm curious about how it performs (relative to the getc_unlocked hack) on
> other platforms. If you'd like to try that, just recompile fileobject.c
> with
>
>     USE_MS_GETLINE_HACK
>
> #define'd.
It should *work* on any platform with fgets() meeting the
> assumption. The new test_bufio.py std test gives it a pretty good
> correctness workout, if you're worried about that.

FreeBSD seems to work fine. Speed is practically the same as without
USE_MS_GETLINE_HACK (but with HAVE_GETC_UNLOCKED), though still not quite
the same as before all this hackery :-) Not by much though. For most tests
it's smaller than the margin of error, though the difference is still as
much as 20, 30% for the while_readline test. When using a second thread
somewhere in the test, the difference vanishes further.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mal at lemburg.com Thu Jan 11 16:33:28 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 11 Jan 2001 16:33:28 +0100
Subject: [Python-Dev] Add __exports__ to modules
References:
Message-ID: <3A5DD248.8EE0DF63@lemburg.com>

Ka-Ping Yee wrote:
>
> On Wed, 10 Jan 2001, Guido van Rossum wrote:
> > Yes -- I came up with the same thought.
> >
> > So here's a plan: somebody please submit a patch that does only one
> > thing: from...import * looks for __all__ and if it exists, imports
> > exactly those names. No changes to dir(), or anything.
>
> Please don't use __all__. At the moment, __all__ is the only way
> to easily tell whether a particular module object really represents
> a package, and the only way to get the list of submodule names.

But __all__ has to be user-defined, so I don't buy that argument.
Note that the only true way to recognize a package is by looking
for an attribute "__path__" since Python adds this for packages
only.

> If __all__ is overloaded to also represent exportable symbols in
> modules, these two pieces of information will be impossible (or
> require much ugly hackery) to obtain.

Again, __all__ is not automatically generated, so trusting it
doesn't get you very far.
To be able to find subpackages you
will always have to apply some hackery (based on __path__) in order
to be sure.

It would be better to add a helper function to packages to query
this kind of information -- the package usually knows best where to
look and what to look for.

Note that __all__ was explicitly invented to be used by
from package import * so I think it is the right choice here.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr at thyrsus.com Thu Jan 11 16:37:19 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 11 Jan 2001 10:37:19 -0500
Subject: [Python-Dev] autoconfigure patch submitted on SourceForge
In-Reply-To: <14941.52418.18484.898061@anthem.wooz.org>; from barry@digicool.com on Thu, Jan 11, 2001 at 10:09:54AM -0500
References: <200101080416.f084GrM10912@snark.thyrsus.com>
 <20010108074411.N2467@xs4all.nl>
 <20010108014945.A19516@thyrsus.com>
 <200101081427.JAA03146@cj20424-a.reston1.va.home.com>
 <20010108113109.C7563@kronos.cnri.reston.va.us>
 <200101101900.OAA30486@cj20424-a.reston1.va.home.com>
 <14941.52418.18484.898061@anthem.wooz.org>
Message-ID: <20010111103719.A7191@thyrsus.com>

    GvR> It was more work than I had hoped for, because Eric
    GvR> apparently (despite having developer privileges!) doesn't use
    GvR> the CVS tree -- he sent in a diff relative to the 2.0
    GvR> release.

I'm using the CVS tree now. I did that patch relative to 2.0 for boring
reasons having to do with the state of my laptop.
--
Eric S. Raymond

The IRS has become morally corrupted by the enormous power which we in
Congress have unwisely entrusted to it. Too often it acts like a
Gestapo preying upon defenseless citizens.
        -- Senator Edward V.
Long

From thomas at xs4all.net Thu Jan 11 16:48:32 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 11 Jan 2001 16:48:32 +0100
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <3A5DD248.8EE0DF63@lemburg.com>; from mal@lemburg.com on Thu, Jan 11, 2001 at 04:33:28PM +0100
References: <3A5DD248.8EE0DF63@lemburg.com>
Message-ID: <20010111164831.X2467@xs4all.nl>

On Thu, Jan 11, 2001 at 04:33:28PM +0100, M.-A. Lemburg wrote:

> > Please don't use __all__. At the moment, __all__ is the only way
> > to easily tell whether a particular module object really represents
> > a package, and the only way to get the list of submodule names.
>
> But __all__ has to be user-defined, so I don't buy that argument.
> Note that the only true way to recognize a package is by looking
> for an attribute "__path__" since Python adds this for packages
> only.

Ehm.... What, exactly, prevents usercode from doing

    __path__ = "neener, neener"

? In other words, even *that* isn't a true way to recognize a package. You
can see what isn't a package, but not what is.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido at python.org Thu Jan 11 16:58:55 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 10:58:55 -0500
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: Your message of "Thu, 11 Jan 2001 07:14:17 PST."
References:
Message-ID: <200101111558.KAA15447@cj20424-a.reston1.va.home.com>

> Please don't use __all__. At the moment, __all__ is the only way
> to easily tell whether a particular module object really represents
> a package, and the only way to get the list of submodule names.
>
> If __all__ is overloaded to also represent exportable symbols in
> modules, these two pieces of information will be impossible (or
> require much ugly hackery) to obtain.

Marc-Andre already explained that __all__ is not to be trusted.

If you want a reasonably good test for package-ness, use the presence
of __path__.

For a really good test, check whether __file__ ends in __init__.py[c].

--Guido van Rossum (home page: http://www.python.org/~guido/)

From akuchlin at mems-exchange.org Thu Jan 11 17:14:00 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Thu, 11 Jan 2001 11:14:00 -0500
Subject: [Python-Dev] PEP 229: setup.py revised
Message-ID:

I've put a new version of the setup.py script at
http://www.mems-exchange.org/software/files/python/setup.py

(I'm at work and can't remember the password to get into
www.amk.ca. :) )

This version improves the detection of Tcl/Tk, handles the
_curses_panel module, and doesn't do a chdir(). Same drill as before:
just grab the script, drop it in the root of your Python source tree
(2.0 or current CVS), run "./python setup.py build", and look at the
modules it compiles. I can try it on Linux, so I'm most interested in
hearing reports for other Unix versions (*BSD, HP-UX, etc.)

--amk

From ping at lfw.org Thu Jan 11 17:36:36 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Thu, 11 Jan 2001 08:36:36 -0800 (PST)
Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python)
Message-ID:

I'm pleased to announce a reasonable first pass at a documentation
utility for interactive use. "pydoc" is usable in three ways:

1. At the shell prompt, "pydoc " displays documentation
   on , very much like "man".

2. At the shell prompt, "pydoc -k " lists modules
   whose one-line descriptions mention the keyword, like "man -k".

3. Within Python, "from pydoc import help" provides a "help"
   function to display documentation at the interpreter prompt.

All of them use sys.path in order to guarantee that the documentation
you see matches the modules you get.

To try "pydoc", download:

    http://www.lfw.org/python/pydoc.py
    http://www.lfw.org/python/htmldoc.py
    http://www.lfw.org/python/textdoc.py
    http://www.lfw.org/python/inspect.py

I would very much appreciate your feedback, especially from testing
on non-Unix platforms. Thank you!

I've pasted some examples from my shell below (when you actually
run pydoc, the output is piped through "less", "more", or a pager
implemented in Python, depending on what is available).


-- ?!ng

"If I have seen farther than others, it is because I was standing on a
really big heap of midgets."
    -- K. Eric Drexler


skuld[1268]% pydoc -k mail
mailbox - Classes to handle Unix style, MMDF style, and MH style mailboxes.
mailcap - Mailcap file handling. See RFC 1524.
mimify - Mimification and unmimification of mail messages.
test.test_mailbox - (no description)

skuld[1269]% pydoc -k text
textdoc - Generate text documentation from live Python objects.
collab - Routines for collaboration, especially group editing of text documents.
gettext - Internationalization and localization support.
test.test_gettext - (no description)
curses.textpad - Simple textbox editing widget with Emacs-like keybindings.
distutils.text_file - text_file
ScrolledText - (no description)

skuld[1270]% pydoc -k html
htmldoc - Generate HTML documentation from live Python objects.
htmlentitydefs - HTML character entity references.
htmllib - HTML 2.0 parser.

skuld[1271]% pydoc md5
Python Library Documentation: built-in module md5

NAME
    md5

FILE
    (built-in)

DESCRIPTION
    This module implements the interface to RSA's MD5 message digest
    algorithm (see also Internet RFC 1321). Its use is quite
    straightforward: use the new() to create an md5 object. You can now
    feed this object with arbitrary strings using the update() method, and
    at any point you can ask it for the digest (a strong kind of 128-bit
    checksum, a.k.a. ``fingerprint'') of the contatenation of the strings
    fed to it so far using the digest() method.

    Functions:

    new([arg]) -- return a new md5 object, initialized with arg if provided
    md5([arg]) -- DEPRECATED, same as new, but for compatibility

    Special Objects:

    MD5Type -- type object for md5 objects

FUNCTIONS
    md5(no arg info)
        new([arg]) -> md5 object

        Return a new md5 object. If arg is present, the method call update(arg)
        is made.

    new(no arg info)
        new([arg]) -> md5 object

        Return a new md5 object. If arg is present, the method call update(arg)
        is made.

skuld[1272]% pydoc types
Python Library Documentation: module types

NAME
    types

FILE
    /home/ping/sw/Python-1.5.2/Lib/types.py

DESCRIPTION
    # Define names for all type symbols known in the standard interpreter.
    # Types that are part of optional modules (e.g. array) are not listed.

skuld[1273]% pydoc abs
Python Library Documentation: built-in function abs

abs (no arg info)
    abs(number) -> number

    Return the absolute value of the argument.

skuld[1274]% pydoc repr
Python Library Documentation: built-in function repr

repr (no arg info)
    repr(object) -> string

    Return the canonical string representation of the object.
    For most object types, eval(repr(object)) == object.

Python Library Documentation: module repr

NAME
    repr - # Redo the `...` (representation) but with limits on most sizes.

FILE
    /home/ping/sw/Python-1.5.2/Lib/repr.py

CLASSES
    Repr

    class Repr
        __init__(self)
        repr(self, x)
        repr1(self, x, level)
        repr_dictionary(self, x, level)
        repr_instance(self, x, level)
        repr_list(self, x, level)
        repr_long_int(self, x, level)
        repr_string(self, x, level)
        repr_tuple(self, x, level)

FUNCTIONS
    repr(no arg info)

skuld[1275]% pydoc re.MatchObject
Python Library Documentation: class MatchObject in re

class MatchObject
    __init__(self, re, string, pos, endpos, regs)

    end(self, g=0)
        Return the end of the substring matched by group g

    group(self, *groups)
        Return one or more groups of the match

    groupdict(self, default=None)
        Return a dictionary containing all named subgroups of the match

    groups(self, default=None)
        Return a tuple containing all subgroups of the match object

    span(self, g=0)
        Return (start, end) of the substring matched by group g

    start(self, g=0)
        Return the start of the substring matched by group g

skuld[1276]% pydoc xml
Python Library Documentation: package xml

NAME
    xml - Core XML support for Python.

FILE
    /home/ping/dev/python/dist/src/Lib/xml/__init__.py

DESCRIPTION
    This package contains three sub-packages:

    dom -- The W3C Document Object Model. This supports DOM Level 1 +
           Namespaces.

    parsers -- Python wrappers for XML parsers (currently only supports Expat).

    sax -- The Simple API for XML, developed by XML-Dev, led by David
           Megginson and ported to Python by Lars Marius Garshol. This
           supports the SAX 2 API.

VERSION
    1.8

skuld[1277]% pydoc lovelyspam
no Python documentation found for lovelyspam

skuld[1278]% python
Python 1.5.2 (#1, Dec 12 2000, 02:25:44) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>>
>>> from pydoc import help
>>> help(int)
Help on built-in function int:

int (no arg info)
    int(x) -> integer

    Convert a string or number to an integer, if possible.
    A floating point argument will be truncated towards zero.

>>> help("urlparse.urljoin")
Help on function urljoin in module urlparse:

urljoin(base, url, allow_fragments=1)
    # Join a base URL and a possibly relative URL to form an absolute
    # interpretation of the latter.

>>> import random
>>> help(random.generator)
Help on class generator in module random:

class generator(whrandom.whrandom)
    Random generator class.

    __init__(self, a=None)
        Constructor. Seed from current time or hashable value.

    seed(self, a=None)
        Seed the generator from current time or hashable value.

>>>

From moshez at zadka.site.co.il Fri Jan 12 01:48:30 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 02:48:30 +0200 (IST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <3A5C97F4.945D0C1@lemburg.com>
References: <3A5C97F4.945D0C1@lemburg.com>, <200101052014.PAA20328@cj20424-a.reston1.va.home.com>
 <3A5C61D8.2E5D098C@lemburg.com>
 <200101101653.LAA28986@cj20424-a.reston1.va.home.com>
Message-ID: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il>

On Wed, 10 Jan 2001 18:12:20 +0100, "M.-A. Lemburg" wrote:

> > So here's a plan: somebody please submit a patch that does only one
> > thing: from...import * looks for __all__ and if it exists, imports
> > exactly those names. No changes to dir(), or anything.
>
> +1 -- this won't be me though (at least not this week).

I'm working on it -- I'll have a patch ready as soon as my slow
modem will manage to finish the "cvs diff". Guido, I'll
assign it to you, OK?

> Cool. This could make Python instances usable as "modules"
> -- with full getattr() hook support !

My Patch already does that -- if the instance supports __all__

> For IMPORT_STAR I'd suggest first looking for __all__ and
> then reverting to __dict__.items() in case this fails.

That's what my patch is doing.

> BTW, is __dict__ needed by the import mechanism or would
> the getattr/setattr slots suffice ? And if yes, must it
> be a real Python dictionary ?

My patch works with getattr (no setattr) as long as there is an __all__
attribute.

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From ping at lfw.org Thu Jan 11 17:42:44 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Thu, 11 Jan 2001 08:42:44 -0800 (PST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <200101111558.KAA15447@cj20424-a.reston1.va.home.com>
Message-ID:

On Thu, 11 Jan 2001, Guido van Rossum wrote:
>
> Marc-Andre already explained that __all__ is not to be trusted.
>
> If you want a reasonably good test for package-ness, use the presence
> of __path__.

Sorry, you're right. I retract my comment about __all__.


-- ?!ng

From skip at mojam.com Thu Jan 11 17:47:13 2001
From: skip at mojam.com (Skip Montanaro)
Date: Thu, 11 Jan 2001 10:47:13 -0600 (CST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <20010111164831.X2467@xs4all.nl>
References: <3A5DD248.8EE0DF63@lemburg.com>
 <20010111164831.X2467@xs4all.nl>
Message-ID: <14941.58257.304339.437443@beluga.mojam.com>

    Thomas> __path__ = "neener, neener"

I believe correct English usage here is "neener, neener, neener", with a
little extra emphasis on the first syllable of the third "neener"...

does-that-help?-ly y'rs,

Skip

From MarkH at ActiveState.com Fri Jan 12 17:55:29 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Fri, 12 Jan 2001 08:55:29 -0800
Subject: [Python-Dev] RE: Baffled on Windows
In-Reply-To:
Message-ID:

> 4. The way mmapmodule.c is coded and built after Guido's change appears to
> me to be the same as how every other non-builtin module is coded and built
> on Windows. For example, winsound.c, which uses DL_EXPORT(void)
> before its
> initwinsound and where that macro also expands to "void". But importing
> winsound works fine.

winsound adds "/export:initwinsound" to the link line. This is an
alternative to __declspec in the sources.

This all gets back to a discussion we had here nearly a year or so ago - that
"DL_EXPORT" isn't capturing our semantics, and that we should probably create
#defines that match the _intent_ of the definition, rather than the
implementation details - ie, replace DL_EXPORT with (say) PY_API_DECL and
PY_MODULEINIT_DECL or some such.

I'm happy to think about this and help implement it if the time is now
right...

> Any Windows geek got a clue?

Isn't that question a paradox? ;-)

Mark.

From skip at mojam.com Thu Jan 11 18:11:23 2001
From: skip at mojam.com (Skip Montanaro)
Date: Thu, 11 Jan 2001 11:11:23 -0600 (CST)
Subject: [Python-Dev] dir()/__all__/etc
Message-ID: <14941.59707.632995.224116@beluga.mojam.com>

I know Guido has said he doesn't want to fiddle with dir(), but my sense of
things from the overall discussion of the __exports__ concept tells me that
when used interactively dir() often presents confusing output for new Python
users.

I twiddled CGIHTTPServer to have __all__ and added the following dir()
function to my PYTHONSTARTUP file:

    def dir(o,showall=0):
        if not showall and hasattr(o, "__all__"):
            x = list(o.__all__)
            x.sort()
            return x
        from __builtin__ import dir as d
        return d(o)

Compare its output with and without showall set:

    >>> dir(CGIHTTPServer)
    ['CGIHTTPRequestHandler', 'test']
    >>> dir(CGIHTTPServer,1)
    ['BaseHTTPServer', 'CGIHTTPRequestHandler', 'SimpleHTTPServer', '__all__',
     '__builtins__', '__doc__', '__file__', '__name__', '__version__',
     'executable', 'nobody', 'nobody_uid', 'os', 'string', 'sys', 'test',
     'urllib']

I haven't demonstrated any great programming prowess with this little
function, but I rather suspect it may be beyond most brand new users. If
Guido can't be convinced to allow dir() to change, how about adding a sample
PYTHONSTARTUP file to the distribution that contains little bits like this
and Ping's pydoc.help stuff (assuming it gets into the distro, which I hope
it does)?
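
For readers on current Python 3, the dir() wrapper above can be sketched
as follows. This is only an illustration under stated assumptions: the
name public_dir and the in-memory demo module are hypothetical, and the
builtins module stands in for the Python 2 __builtin__ module used in the
original function.

```python
import builtins
import types


def public_dir(obj, showall=False):
    """Return sorted(obj.__all__) when it exists, else the normal dir()."""
    if not showall and hasattr(obj, "__all__"):
        return sorted(obj.__all__)
    return builtins.dir(obj)


# Hypothetical module built in memory, purely for demonstration.
demo = types.ModuleType("demo")
demo.__all__ = ["spam", "eggs"]
demo.spam = demo.eggs = demo.helper = None

print(public_dir(demo))
print("helper" in public_dir(demo, showall=True))
```

Giving the wrapper its own name, rather than shadowing dir() as the
original does, keeps the builtin reachable without a second import.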

Skip

From mal at lemburg.com Thu Jan 11 18:25:20 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 11 Jan 2001 18:25:20 +0100
Subject: [Python-Dev] Add __exports__ to modules
References: <3A5DD248.8EE0DF63@lemburg.com> <20010111164831.X2467@xs4all.nl>
Message-ID: <3A5DEC80.596F0818@lemburg.com>

Thomas Wouters wrote:
>
> On Thu, Jan 11, 2001 at 04:33:28PM +0100, M.-A. Lemburg wrote:
>
> > > Please don't use __all__. At the moment, __all__ is the only way
> > > to easily tell whether a particular module object really represents
> > > a package, and the only way to get the list of submodule names.
> >
> > But __all__ has to be user-defined, so I don't buy that argument.
> > Note that the only true way to recognize a package is by looking
> > for an attribute "__path__" since Python adds this for packages
> > only.
>
> Ehm.... What, exactly, prevents usercode from doing
>
>     __path__ = "neener, neener"
>
> ? In other words, even *that* isn't a true way to recognize a package. You
> can see what isn't a package, but not what is.

Purists.... ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From moshez at zadka.site.co.il Fri Jan 12 03:06:37 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 04:06:37 +0200 (IST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218
In-Reply-To: <14941.49059.26189.733094@beluga.mojam.com>
References: <14941.49059.26189.733094@beluga.mojam.com>,
Message-ID: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il>

On Thu, 11 Jan 2001 08:13:55 -0600 (CST), Skip Montanaro wrote:

> While it works, is it really kosher to test w's value after the DECREF?

Yes. It may not point to anything valid, but it won't be NULL.

> Just seems like an odd construct to me. I'm used to seeing the test
> immediately after it's been set.

It was more convenient that way. And I'm pretty certain the _DECREF
macros do not change their arguments.
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From moshez at zadka.site.co.il Fri Jan 12 03:09:13 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 04:09:13 +0200 (IST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To:
References:
Message-ID: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il>

On Thu, 11 Jan 2001 07:14:17 -0800 (PST), Ka-Ping Yee wrote:
> On Wed, 10 Jan 2001, Guido van Rossum wrote:
> > Yes -- I came up with the same thought.
> >
> > So here's a plan: somebody please submit a patch that does only one
> > thing: from...import * looks for __all__ and if it exists, imports
> > exactly those names. No changes to dir(), or anything.
>
> Please don't use __all__. At the moment, __all__ is the only way
> to easily tell whether a particular module object really represents
> a package

Why not __init__? It has to be there, and is in no other module object.
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From moshez at zadka.site.co.il Fri Jan 12 03:23:16 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 04:23:16 +0200 (IST)
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il>
References: <20010112004830.21E10A82F@darjeeling.zadka.site.co.il>, <3A5C97F4.945D0C1@lemburg.com>, <200101052014.PAA20328@cj20424-a.reston1.va.home.com>
 <3A5C61D8.2E5D098C@lemburg.com>
 <200101101653.LAA28986@cj20424-a.reston1.va.home.com>
Message-ID: <20010112022316.BE682A82D@darjeeling.zadka.site.co.il>

On Fri, 12 Jan 2001, Moshe Zadka wrote:

> I'm working on it -- I'll have a patch ready as soon as my slow
> modem will manage to finish the "cvs diff". Guido, I'll
> assign it to you, OK?

OK, it's 103200.
Unfortunately, I couldn't assign it to Guido, since I couldn't
upload it at all (yeah, still those lynx problems). This time I managed to
get one specific person to upload for me, but someone else will have
to assign to Guido.
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!

From nas at arctrix.com Thu Jan 11 12:42:51 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 11 Jan 2001 03:42:51 -0800
Subject: [Python-Dev] PEP 229: setup.py revised
In-Reply-To: ; from akuchlin@mems-exchange.org on Thu, Jan 11, 2001 at 11:14:00AM -0500
References:
Message-ID: <20010111034251.A23512@glacier.fnational.com>

Here is what I get on my Debian Linux machine:

_codecs.so        cPickle.so    imageop.so        pwd.so       termios.so
_curses.so        cStringIO.so  linuxaudiodev.so  regex.so     time.so
_curses_panel.so  cmath.so      math.so           resource.so  timing.so
_locale.so        crypt.so      md5.so            rgbimg.so    ucnhash.so
_socket.so        dbm.so        mmap.so           rotor.so     unicodedata.so
_tkinter.so       errno.so      new.so            select.so    zlib.so
array.so          fcntl.so      nis.so            sha.so
audioop.so        fpectl.so     operator.so       signal.so
binascii.so       gdbm.so       parser.so         strop.so
bsddb.so          grp.so        pcre.so           syslog.so

I think that is every module which can be compiled on my machine.
Great work Andrew (and the distutil developers).

Neil

From nas at arctrix.com Thu Jan 11 12:47:09 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 11 Jan 2001 03:47:09 -0800
Subject: [Python-Dev] dir()/__all__/etc
In-Reply-To: <14941.59707.632995.224116@beluga.mojam.com>; from skip@mojam.com on Thu, Jan 11, 2001 at 11:11:23AM -0600
References: <14941.59707.632995.224116@beluga.mojam.com>
Message-ID: <20010111034709.C23512@glacier.fnational.com>

I'm -1 on making dir() pay attention to __all__. I'm +1 on
adding a help() function which pays attention to __all__ and
(optionally?) prints doc strings.
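
The help() idea above might be sketched like this on current Python 3.
It is a hypothetical helper, not pydoc's implementation: the name
brief_help, the underscore-based fallback when __all__ is missing, and
the in-memory demo module are all assumptions made for illustration.

```python
import inspect
import types


def brief_help(module):
    """Return one 'name - first docstring line' entry per public name."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Assumed fallback: treat names without a leading underscore as public.
        names = [n for n in vars(module) if not n.startswith("_")]
    lines = []
    for name in sorted(names):
        obj = getattr(module, name, None)
        doc = inspect.getdoc(obj) if obj is not None else None
        summary = doc.splitlines()[0] if doc else "(no description)"
        lines.append("%s - %s" % (name, summary))
    return "\n".join(lines)


def spam():
    """Fry some spam.

    The remaining lines of the docstring are not shown.
    """


def eggs():
    pass


# Hypothetical module built in memory, purely for demonstration.
demo = types.ModuleType("demo")
demo.spam, demo.eggs, demo.helper = spam, eggs, spam
demo.__all__ = ["spam", "eggs"]

print(brief_help(demo))
```

Because only __all__ is consulted, demo.helper stays out of the listing
even though it carries a docstring.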

Neil

From gstein at lyra.org Thu Jan 11 20:38:50 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 11 Jan 2001 11:38:50 -0800
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <200101111558.KAA15447@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 11, 2001 at 10:58:55AM -0500
References: <200101111558.KAA15447@cj20424-a.reston1.va.home.com>
Message-ID: <20010111113850.F4640@lyra.org>

On Thu, Jan 11, 2001 at 10:58:55AM -0500, Guido van Rossum wrote:
> > Please don't use __all__. At the moment, __all__ is the only way
> > to easily tell whether a particular module object really represents
> > a package, and the only way to get the list of submodule names.
> >
> > If __all__ is overloaded to also represent exportable symbols in
> > modules, these two pieces of information will be impossible (or
> > require much ugly hackery) to obtain.
>
> Marc-Andre already explained that __all__ is not to be trusted.
>
> If you want a reasonably good test for package-ness, use the presence
> of __path__.
>
> For a really good test, check whether __file__ ends in __init__.py[c].

Even that isn't safe: if the module was pulled from an archive, __file__
might not get set.

Determining whether something is a package is highly dependent upon how it
was brought into the system. It is entirely possible that you *can't* know
something represents a package.

You can get close by looking in sys.modules to look for modules "below" the
given module. But if none have been imported yet, then you're out of luck.

If you're using imputil, then you can look for __ispkg__ in the module.
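
The heuristics discussed in this thread (presence of __path__, __file__
ending in __init__.py[c], submodules already present in sys.modules) can
be collected in a small sketch for current Python 3. The function name
package_hints and the fake modules are hypothetical; as the thread points
out, none of these checks is conclusive, since user code can set __path__
itself and __file__ may be missing.

```python
import os
import sys
import types


def package_hints(module):
    """Return the package heuristics from the thread; none is conclusive."""
    filename = getattr(module, "__file__", None) or ""
    stem = os.path.splitext(os.path.basename(filename))[0]
    prefix = module.__name__ + "."
    return {
        "has_path": hasattr(module, "__path__"),
        "file_is_init": stem == "__init__",
        "submodule_loaded": any(
            name.startswith(prefix) for name in list(sys.modules)
        ),
    }


# Hypothetical modules built in memory, so nothing is read from disk.
pkg = types.ModuleType("fakepkg")
pkg.__path__ = ["/nowhere/fakepkg"]
pkg.__file__ = "/nowhere/fakepkg/__init__.pyc"

spoof = types.ModuleType("spoof")
spoof.__path__ = "neener, neener"  # nothing stops user code from doing this

print(package_hints(pkg))
print(package_hints(spoof))
```

Returning all three hints, rather than a single yes or no, leaves the
caller to decide how much weight each one deserves.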

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From thomas at xs4all.net Thu Jan 11 20:50:24 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 11 Jan 2001 20:50:24 +0100
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Fri, Jan 12, 2001 at 04:09:13AM +0200
References: <20010112020913.1FE70A82F@darjeeling.zadka.site.co.il>
Message-ID: <20010111205024.Z2467@xs4all.nl>

On Fri, Jan 12, 2001 at 04:09:13AM +0200, Moshe Zadka wrote:

> Why not __init__? It has to be there, and is in no other module object.

Wrong association... __init__ would be a method that gets executed. (At
least that's what I'd expect :)

'sides,-everyone-was-in-agreement-on-__all__-ly y'rs,
--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From MarkH at ActiveState.com Thu Jan 11 21:25:30 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Thu, 11 Jan 2001 12:25:30 -0800
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218
In-Reply-To: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il>
Message-ID:

> It was more convenient that way. And I'm pretty certain the _DECREF
> macros do not change their arguments.

Pretty certain??? That doesn't inspire confidence . How certain are
you that this will be true in the future?

I think it bad style indeed - for example, I could see benefit in having
DECREF (or _Py_Dealloc, called by decref) set the object to NULL in debug
builds. What if that decision is taken in the future?

I thought rules were pretty clear with reference counting - don't assume
_anything_ about the object unless you hold a reference (or are damn sure
someone else does!)

Mark.

From thomas at xs4all.net Thu Jan 11 22:41:57 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 11 Jan 2001 22:41:57 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218
In-Reply-To: ; from MarkH@ActiveState.com on Thu, Jan 11, 2001 at 12:25:30PM -0800
References: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il>
Message-ID: <20010111224157.A2467@xs4all.nl>

On Thu, Jan 11, 2001 at 12:25:30PM -0800, Mark Hammond wrote:

> I thought rules were pretty clear with reference counting - dont assume
> _anything_ about the object unless you hold a reference (or are damn sure
> someone else does!)

Moshe isn't breaking that rule. He isn't assuming anything about the object,
just about the value of the pointer to that object. I agree, though, that
it's bad practice to rely on it having the old value, after DECREFing it.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido at python.org Thu Jan 11 22:48:46 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 16:48:46 -0500
Subject: [Python-Dev] Add __exports__ to modules
In-Reply-To: Your message of "Thu, 11 Jan 2001 08:42:44 PST."
References:
Message-ID: <200101112148.QAA16227@cj20424-a.reston1.va.home.com>

> Sorry, you're right. I retract my comment about __all__.

Can you explain *why* you wanted to test for package-ness?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Thu Jan 11 22:55:24 2001
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Jan 2001 16:55:24 -0500
Subject: [Python-Dev] PEP 229: setup.py revised
In-Reply-To: Your message of "Thu, 11 Jan 2001 11:14:00 EST."
References:
Message-ID: <200101112155.QAA16678@cj20424-a.reston1.va.home.com>

> I've put a new version of the setup.py script at
> http://www.mems-exchange.org/software/files/python/setup.py
>
> (I'm at work and can't remember the password to get into
> www.amk.ca.
:) )
>
> This version improves the detection of Tcl/Tk, handles the
> _curses_panel module, and doesn't do a chdir(). Same drill as before:
> just grab the script, drop it in the root of your Python source tree
> (2.0 or current CVS), run "./python setup.py build", and look at the
> modules it compiles. I can try it on Linux, so I'm most interested in
> hearing reports for other Unix versions (*BSD, HP-UX, etc.)

Good work -- but I still can't run this inside a platform-specific
subdirectory. Are you planning on supporting this?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at loewis.home.cs.tu-berlin.de Thu Jan 11 22:20:45 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 11 Jan 2001 22:20:45 +0100
Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python)
Message-ID: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de>

> I would very much appreciate your feedback

At the first glance, it looks *very* promising. I really look forward
to see it in 2.1.

However, robustness probably needs to be improved:

>>> help()
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: not enough arguments to help(); expected 1, got 0

Wasn't there even a proposal that

>>> help

should do something meaningful (by implementing __repr__)?

>>> import string
>>> help(string)
Traceback (most recent call last):
  File "", line 1, in ?
  File "pydoc.py", line 183, in help
    pager('Help on %s:\n\n' % desc + textdoc.document(thing))
  File "./textdoc.py", line 171, in document
    if inspect.ismodule(object): results = document_module(object)
  File "./textdoc.py", line 87, in document_module
    if (inspect.getmodule(value) or object) is object:
  File "./inspect.py", line 190, in getmodule
    file = getsourcefile(object)
  File "./inspect.py", line 204, in getsourcefile
    filename = getfile(object)
  File "./inspect.py", line 172, in getfile
    raise TypeError, 'arg is a built-in class'
TypeError: arg is a built-in class

Also, the tools could use some command line options:

martin at mira:~/pydoc > ./pydoc.py --help
Traceback (most recent call last):
  File "./pydoc.py", line 190, in ?
    opts[args[i][1:]] = args[i+1]
IndexError: list index out of range

At a minimum, I propose -h, --help, -v, -V.

Regards,
Martin

From fdrake at acm.org Thu Jan 11 23:11:24 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 11 Jan 2001 17:11:24 -0500 (EST)
Subject: [Python-Dev] [PEP 205] Weak References PEP updated, patch available!
Message-ID: <14942.12172.129547.770776@cj42289-a.reston1.va.home.com>

I've updated the Weak References PEP a little:

    http://python.sourceforge.net/peps/pep-0205.html

A preliminary version of the implementation and documentation is
available as well:

    http://sourceforge.net/patch/?func=detailpatch&patch_id=103203&group_id=5470

Please send feedback on the PEP or implementation to me.

Thanks!

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations From akuchlin at mems-exchange.org Thu Jan 11 23:26:33 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 11 Jan 2001 17:26:33 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: <200101112155.QAA16678@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Jan 11, 2001 at 04:55:24PM -0500 References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> Message-ID: <20010111172633.A26249@kronos.cnri.reston.va.us> On Thu, Jan 11, 2001 at 04:55:24PM -0500, Guido van Rossum wrote: >Good work -- but I still can't run this inside a platform-specific >subdirectory. Are you planning on supporting this? I didn't really understand this when you pointed it out, but forgot to ask for clarification. What does your directory layout look like? --amk From ping at lfw.org Thu Jan 11 23:26:53 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 14:26:53 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de> Message-ID: On Thu, 11 Jan 2001, Martin v. Loewis wrote: > > However, robustness probably needs to be improved: Agreed. > Wasn't there even a proposal that > > >>> help > > should do something meaningful (by implementing __repr__)? There was. I am planning to incorporate Paul Prescod's mechanism for doing this; i just didn't have time to throw in that feature yet, and wanted feedback on the man-like stuff first. My next two targets are: 1. Generating text from the HTML documentation files using Paul Prescod's stuff in onlinehelp.py. 2. Running a background HTTP server that produces its pages using htmldoc.py. Both are pieces we already have and only need to integrate; i just wanted to get at least a working candidate done first. Did using pydoc like "man" work okay for you? > >>> import string > >>> help(string) > Traceback (most recent call last): ... 
> TypeError: arg is a built-in class Mine doesn't do this for me. I think i may have left up an older version of inspect.py by mistake. Try downloading http://www.lfw.org/python/inspect.py again -- apologies for the hassle. > Also, the tools could use some command line options: > > martin at mira:~/pydoc > ./pydoc.py --help > Traceback (most recent call last): > File "./pydoc.py", line 190, in ? > opts[args[i][1:]] = args[i+1] > IndexError: list index out of range > > At a minimum, I propose -h, --help, -v, -V. Okay. There is usage help already; i just failed to make it sufficiently robust about deciding when to show it. skuld[1010]% pydoc /home/ping/bin/pydoc ... Show documentation on something. may be the name of a Python function, module, package, or a dotted reference to a class or function within a module or module in a package. /home/ping/bin/pydoc -k Search for a keyword in the short descriptions of modules. -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. Eric Drexler From ping at lfw.org Thu Jan 11 23:28:44 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 14:28:44 -0800 (PST) Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: <200101112148.QAA16227@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Guido van Rossum wrote: > > Sorry, you're right. I retract my comment about __all__. > > Can you explain *why* you wanted to test for package-ness? Auto-generating documentation. pydoc.py currently tests for __path__, and looks for the presence of __init__.py in a subdirectory to mean that the subdirectory name is a package name. Is it safe on all platforms to just list all .py files in the subdirectory to get all submodules? -- ?!ng "If I have seen farther than others, it is because I was standing on a really big heap of midgets." -- K. 
Eric Drexler From tim.one at home.com Fri Jan 12 00:17:06 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 18:17:06 -0500 Subject: [Python-Dev] RE: Baffled on Windows In-Reply-To: Message-ID: [Mark Hammond] > winsound adds "/export:initwinsound" to the link line. This is an > alternative to __declspec in the sources. Yup/arghghghgh. It's fixed now. Thanks! > This all gets back to a discussion we had here nearly a year > or so ago - Yup/arghghghgh. . > that "DL_EXPORT" isnt capturing our semantics, and that we should > probably create #defines that match the _intent_ of the > definition, rather than the implementation details - ie, replace > DL_EXPORT with (say) PY_API_DECL and PY_MODULEINIT_DECL or some > such. Yup/noarghghghgh. > I'm happy to think about this and help implement it if the time > is now right... Same here. Now how can we tell whether the time is right? I must say, it hasn't gotten better by leaving it alone for a year. I think we need a Unix dweeb to play along, though -- if only to confirm that their compilers are no help. >> Any Windows geek got a clue? > Isn't that question a paradox? ;-) Well, nobody else will understand this, but *we* know that Windows geeks need more clues than everyone else put together just to get the box booted each day (or hour <0.9 wink>). From michel at digicool.com Fri Jan 12 02:15:52 2001 From: michel at digicool.com (Michel Pelletier) Date: Thu, 11 Jan 2001 20:15:52 -0500 Subject: [Python-Dev] New Draft PEP: Python Interfaces Message-ID: Hello, I have roughed out a draft PEP that proposes the extension of Python to include an interface framework. It is posted online here: http://www.zope.org/Members/michel/InterfacesPEP/PEP.txt This is my first revision and stab at a PEP. I'd like to find out what you think about the PEP and maybe discuss it some more offline on a different list. Thanks! 
-Michel From martin at loewis.home.cs.tu-berlin.de Fri Jan 12 02:15:25 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 12 Jan 2001 02:15:25 +0100 Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: (message from Ka-Ping Yee on Thu, 11 Jan 2001 14:26:53 -0800 (PST)) References: Message-ID: <200101120115.f0C1FPx03702@mira.informatik.hu-berlin.de> > Did using pydoc like "man" work okay for you? Yes, that is very impressive. > Mine doesn't do this for me. I think i may have left up an older version > of inspect.py by mistake. Try downloading > > http://www.lfw.org/python/inspect.py > > again -- apologies for the hassle. No need to apologize. It works fine now. Thanks, Martin From moshez at zadka.site.co.il Fri Jan 12 10:53:35 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 11:53:35 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: References: Message-ID: <20010112095335.E8A15A82D@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001, "Mark Hammond" wrote: > I think it bad style indeed - for example, I could see benefit in having > DECREF (or _Py_Dealloc, called by decref) set the object to NULL in debug > builds. What if that decision is taken in the future? > > I thought rules were pretty clear with reference counting - dont assume > _anything_ about the object unless you hold a reference (or are damn sure > someone else does!) I'm not assuming anything about the object -- I'm assuming something about the pointer. And macros should not change their arguments -- DECREF is basically a wrapper around _Py_Dealloc((PyObject *)(op)). Just like free(pointer); if (pointer == NULL) do_something(); is perfectly legal C. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
From moshez at zadka.site.co.il Fri Jan 12 10:57:32 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Fri, 12 Jan 2001 11:57:32 +0200 (IST)
Subject: [Python-Dev] dir()/__all__/etc
In-Reply-To: <14941.59707.632995.224116@beluga.mojam.com>
References: <14941.59707.632995.224116@beluga.mojam.com>
Message-ID: <20010112095732.1F65BA82D@darjeeling.zadka.site.co.il>

On Thu, 11 Jan 2001 11:11:23 -0600 (CST), Skip Montanaro wrote:
>
> I know Guido has said he doesn't want to fiddle with dir(), but my sense of
> things from the overall discussion of the __exports__ concept tells me that
> when used interactively dir() often presents confusing output for new Python
> users.
>
> I twiddled CGIHTTPServer to have __all__ and added the following dir()
> function to my PYTHONSTARTUP file:
>
>     def dir(o,showall=0):
>         if not showall and hasattr(o, "__all__"):
>             x = list(o.__all__)
>             x.sort()
>             return x
>         from __builtin__ import dir as d
>         return d(o)
>
> Compare its output with and without showall set:
>
>     >>> dir(CGIHTTPServer)
>     ['CGIHTTPRequestHandler', 'test']
>     >>> dir(CGIHTTPServer,1)
>     ['BaseHTTPServer', 'CGIHTTPRequestHandler', 'SimpleHTTPServer', '__all__',
>     '__builtins__', '__doc__', '__file__', '__name__', '__version__',
>     'executable', 'nobody', 'nobody_uid', 'os', 'string', 'sys', 'test',
>     'urllib']
>
> I haven't demonstrated any great programming prowess with this little
> function, but I rather suspect it may be beyond most brand new users. If
> Guido can't be convinced to allow dir() to change, how about adding a sample
> PYTHONSTARTUP file to the distribution that contains little bits like this
> and Ping's pydoc.help stuff (assuming it gets into the distro, which I hope
> it does)?

And, while we're at it, the following bit too can be in the PYTHONSTARTUP:

    def display(x):
        import __builtin__
        __builtin__._ = None
        if type(x) == type(''):
            print `x`
        else:
            print x
        __builtin__._ = x

    import sys
    sys.displayhook = display

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses! From tim.one at home.com Fri Jan 12 03:33:59 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 21:33:59 -0500 Subject: [Python-Dev] dir()/__all__/etc In-Reply-To: <20010111034709.C23512@glacier.fnational.com> Message-ID: [Neil Schemenauer] > I'm -1 on making dir() pay attention to __all__. Me too. The original __exports__ idea was an ironclad guarantee about which names were externally visible for *any* purpose. Then it made sense to restrict dir() accordingly. But if __all__ is just "a hint" (to be ignored or honored at whim, by whoever chooses), the introspective uses of dir() must be served too. > I'm +1 on adding a help() function which pays attention to > __all__ and (optionally?) prints doc strings. I can't be +1 on anything that vague -- although I'm +1 on each part of it if done in exactly the way I envision . From ping at lfw.org Fri Jan 12 03:51:54 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 11 Jan 2001 18:51:54 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101120115.f0C1FPx03702@mira.informatik.hu-berlin.de> Message-ID: On Fri, 12 Jan 2001, Martin v. Loewis wrote: > > Did using pydoc like "man" work okay for you? > > Yes, that is very impressive. Good. What platform did you try it on? I have updated the scripts now to provide a very rudimentary HTTP server feature: skuld[1316]% pydoc -p 8080 starting server on port 8080 This starts a server on port 8080 that generates HTML documentation for modules on the fly. The root page (http://localhost:8080/) shows an index of modules -- it badly needs some cleaning up, but at least it provides access to all the documentation. http://www.lfw.org/python/pydoc.py http://www.lfw.org/python/htmldoc.py Also, as you requested: skuld[1324]% pydoc -h /home/ping/bin/pydoc ... Show documentation on something. 
may be the name of a Python function, module, package, or a dotted
reference to a class or function within a module or module in a package.

/home/ping/bin/pydoc -k
    Search for a keyword in the short descriptions of modules.

/home/ping/bin/pydoc -p
    Start an HTTP server on the given port on the local machine.

More to come.

-- ?!ng

From fdrake at acm.org Fri Jan 12 04:02:00 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 11 Jan 2001 22:02:00 -0500 (EST)
Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python)
In-Reply-To:
References: <200101112120.f0BLKjc01982@mira.informatik.hu-berlin.de>
Message-ID: <14942.29609.19618.534613@cj42289-a.reston1.va.home.com>

Ka-Ping Yee writes:
> My next two targets are:
>   1. Generating text from the HTML documentation files
>      using Paul Prescod's stuff in onlinehelp.py.

You mean the ones I publish as the standard documentation? Relying on
the structure of that HTML is pure folly! I don't think I can make any
guarantees that the HTML structures won't change as the processing
evolves.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim.one at home.com Fri Jan 12 04:49:47 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 11 Jan 2001 22:49:47 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: <200101111523.KAA14982@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> I don't want to call FLOCKFILE while holding the Python lock, as
> this means that *if* we're blocked in FLOCKFILE (e.g. we're reading
> from a pipe or socket), no other Python thread can run!

Ah, good point! Doesn't appear an essential point, though: the
HAVE_GETC_UNLOCKED code could still be fiddled easily enough to call
FLOCKFILE and FUNLOCKFILE exactly once per line, but with the first thread
release before the (dynamically only) FLOCKFILE and the last thread grab
after the (dynamically only) FUNLOCKFILE. It's just a question of will, but
since that's lacking I'll drop it.

> ...
> I don't think that _filbuf can possibly wait for another thread to > write data to the same stream object. OK, I'll buy that. Dropped too. > ... > OK. It's unique to MS. So close the bug report with a "won't fix" > resolution. There's no point in having bug reports remain open that > we know we can't fix. We don't really have a policy about that. Perhaps you're articulating one here, though! I've always left bugs open if they're (a) bugs, and (b) open . For example, I left the Norton Blue-Screen crash bug open (although I see now you eventually closed that). Ditto the "Rare hangs in w9xpopen.exe" bug (which is still open, but will never be fixed by *us*). Just other examples of things we'll almost certainly never fix ourselves (we have no handle on them, and all evidence says the OS is screwing up). My view has been that if a user comes to the bug site, it's most helpful for them if active (== "still happens") crashes and hangs appear among the open problems. Now that your view of it is clearer, I'll switch to yours. too-easy-ly y'rs - tim From tim.one at home.com Fri Jan 12 05:22:40 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 11 Jan 2001 23:22:40 -0500 Subject: [Python-Dev] xreadlines : readlines :: xrange : range In-Reply-To: <200101111527.KAA15005@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > The locking prevents concurrent threads accessing the stream. > > But mixing reads and writes (without intervening fseek etc.) is > illegal use of the stream, and the C standard allows them to be lax > here, even if the program was single-threaded. > > In other words: the locking is so good that it serializes the > sequence of reads and writes; but if the sequence of reads and > writes is illegal, they don't guarantee anything. We're never going to agree on this one, you know. My definition of "bug" here has nothing to do with the std: something's "a bug" if it's not functioning as designed. That's all. So if the implementers would say "oops! 
that should not have happened!", then to me it's "a bug". It so happens I believe the MS implementers would consider this to be a bug under that defn. Multi-threaded libraries have to be written to a much higher level than the C std guarantees (been there, done that, and so have you), and this is specifically corruption in a crucial area vulnerable to races. They have a timing hole! That's clear. If the MS implementers don't believe that's "a bug", then I'd say they're too unprofessional to be allowed in the same country as a multithreaded library <0.1 wink>. Your definition of "bug" seems to be more "I don't want it in Python's open bug list, so I'll do what Tim usually does and appeal to the std in a transparent effort to convince someone that it's not really 'a bug' -- then maybe I'll get it off of Python's bug list". I'm sure you'll agree that's a fair summary of both sides . it's-a-bug-and-it's-no-longer-on-python's-open-bug-list-ly y'rs - tim From tim.one at home.com Fri Jan 12 07:54:47 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 12 Jan 2001 01:54:47 -0500 Subject: [Python-Dev] RE: [Patches] [Patch #102915] xreadlines : readlines :: xrange : range In-Reply-To: <200101111508.KAA14870@cj20424-a.reston1.va.home.com> Message-ID: [Tim, on for_xreadlines vs readlines_sizehint, after disabling the default 1Mb buffer size in the latter] > They're indistinguishable then on my box (on one run xreadlines > is .1 seconds (out of around 7.6 total) quicker, on another > readlines_sizehint), *provided* that I specify the same buffer > size (8192) that xreadlines uses internally. However, if I even > double that, readlines_sizehint is uniformly about 10% slower. It's > also a tiny bit slower if I cut the sizehint buffer size to 4096. [Guido] > 8192 happens to be the size of the stack-allocated buffer readlines() > uses, and also the stdio BUFSIZ parameter, on many systems. Look for > SMALLCHUNK in fileobject.c. 
> > Would it make sense to tie the two constants together more to tune > this optimally even when BUFSIZ is different? Have to repeat what I first said: > I'm afraid Mysteries will remain no matter how many > person-decades we spend staring at this <0.5 wink> ... I'm repeating that because BUFSIZ is 4096 on WinTel, but SMALLCHUNK (8192) worked best for me. Now we're in some complex balancing act among how often the outer loop needs to refill the readlines_sizehint buffer;, how out of whack the latter is with the platform stdio buffer; whether platform malloc takes only twice as long to allocate space for 2*N strings as for N; and, if the readlines buffer is too large, at exactly which point the known Win9x eventually-quadratic-time behavior of PyList_Append starts to kick in. I can't out-think all that. Indeed, I can't out-think any of it . After staring at the code, I expect my "only a tiny bit slower" was an illusion: if 0 < sizehint <= SMALLCHUNK, sizehint appears to have no effect on the operation on file_readline. BTW, changing fileobject.c's SMALLCHUNK to a copy of BUFSIZ didn't make any difference on Windows. From moshez at zadka.site.co.il Fri Jan 12 17:03:58 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 12 Jan 2001 18:03:58 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,1.2,1.3 In-Reply-To: References: Message-ID: <20010112160358.B0AC0A82D@darjeeling.zadka.site.co.il> On Thu, 11 Jan 2001, Thomas Wouters wrote: > Noone but me cares, but Guido said to go ahead and fix it if it bothered me. I think you meant no one. Noone is an archaic spelling of noon. quid-pro-quo-ly y'rs, Z. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
From fredrik at effbot.org Fri Jan 12 09:17:11 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 12 Jan 2001 09:17:11 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules xreadlinesmodule.c,1.2,1.3 References: <20010112160358.B0AC0A82D@darjeeling.zadka.site.co.il> Message-ID: <012a01c07c70$11aac700$e46940d5@hagrid> > > Noone but me cares, but Guido said to go ahead and fix it if it bothered me. > > I think you meant no one. Noone is an archaic spelling of noon. no, he meant me. I care. From martin at loewis.home.cs.tu-berlin.de Fri Jan 12 09:09:00 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 12 Jan 2001 09:09:00 +0100 Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: (message from Ka-Ping Yee on Thu, 11 Jan 2001 18:51:54 -0800 (PST)) References: Message-ID: <200101120809.f0C890B00802@mira.informatik.hu-berlin.de> > Good. What platform did you try it on? Linux, in a Konsole. I guess that is an environment you'd been using as well :-) Martin From jack at oratrix.nl Fri Jan 12 10:57:27 2001 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 12 Jan 2001 10:57:27 +0100 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message by Ka-Ping Yee , Thu, 11 Jan 2001 08:36:36 -0800 (PST) , Message-ID: <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> > I'm pleased to announce a reasonable first pass at a documentation > utility for interactive use. "pydoc" is usable in three ways: [...] > I would very much appreciate your feedback, especially from testing > on non-Unix platforms. Thank you! Wow, I'm impressed! To make it run on the mac I had to add tests for the existence of os.system only. (So all statements "if os.system(...) > 0:" got to be "if hasattr(os, "system") and os.system(...) > 0:"). There are however various other niceties that could be added to make it more useful, can this be put into the repository or something? 
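A minimal sketch of the guard described above, written for a current Python 3 rather than the 2.0-era interpreter discussed in this thread. The helper name command_failed and its os_module parameter are assumptions made for illustration and do not appear in pydoc.py; the parameter exists only so the missing-os.system case can be exercised on any platform.

```python
import os


def command_failed(command, os_module=os):
    """Return True only if os_module.system exists and reports failure.

    Hypothetical helper (not from pydoc.py). When the platform module has
    no system() function, the check short-circuits to False instead of
    raising AttributeError.
    """
    return hasattr(os_module, "system") and os_module.system(command) > 0


class NoSystem:
    """Stand-in for a platform whose os module lacks system()."""


class FakeOS:
    """Stand-in whose system() returns a preset exit status."""

    def __init__(self, status):
        self.status = status

    def system(self, command):
        return self.status


print(command_failed("less", NoSystem()))   # guard skips the call entirely
print(command_failed("less", FakeOS(0)))    # command ran and succeeded
print(command_failed("less", FakeOS(256)))  # command ran, non-zero status
```

Putting the hasattr test first matters: "and" evaluates left to right, so system() is never looked up on a platform that lacks it.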
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein at lyra.org Fri Jan 12 11:31:53 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 12 Jan 2001 02:31:53 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.217,2.218 In-Reply-To: <20010111224157.A2467@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 11, 2001 at 10:41:57PM +0100 References: <20010112020637.EF4D5A82F@darjeeling.zadka.site.co.il> <20010111224157.A2467@xs4all.nl> Message-ID: <20010112023153.Q4640@lyra.org> On Thu, Jan 11, 2001 at 10:41:57PM +0100, Thomas Wouters wrote: > On Thu, Jan 11, 2001 at 12:25:30PM -0800, Mark Hammond wrote: > > > I thought rules were pretty clear with reference counting - dont assume > > _anything_ about the object unless you hold a reference (or are damn sure > > someone else does!) > > Moshe isn't breaking that rule. He isn't assuming anything about the object, > just about the value of the pointer to that object. I agree, though, that > it's bad practice to rely on it having the old value, after DECREFing it. Oh, that is just so much baloney. If I said Py_DECREF(&ptr), *then* I'd be worried. But if I ever call Py_DECREF(foo) and it modifies foo, then I'd be quite upset. "functions" just aren't supposed to do that. -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Jan 12 14:51:51 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 08:51:51 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Thu, 11 Jan 2001 17:26:33 EST." 
<20010111172633.A26249@kronos.cnri.reston.va.us> References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> Message-ID: <200101121351.IAA19676@cj20424-a.reston1.va.home.com> > >Good work -- but I still can't run this inside a platform-specific > >subdirectory. Are you planning on supporting this? > > I didn't really understand this when you pointed it out, but forgot to > ask for clarification. What does your directory layout look like? Ah. It's very simple. I create a directory "linux" as a subdirectory of the Python source tree (i.e. at the same level as Lib, Objects, etc.). Then I chdir into that directory, and I say "../configure". The configure script creates subdirectories to hold the object files for me: Grammar, Parser, Objects, Python, Modules, and sticks Makefiles in them. The "srcdir" variable in the Makefiles is set to "..". Then I say "make" and it builds Python. The source directories are used but no files are created or modified there: all files are created in the "linux" directory. This lets me have several separate configurations: the feature used to be intended for sharing a source tree between multiple platforms, but now I use it to have threaded, nonthreaded, debugging, and regular builds under a single source tree. This also works where the build directory is completely outside the source tree (some people apparently mount the source tree read-only). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 12 14:54:12 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 08:54:12 -0500 Subject: [Python-Dev] Add __exports__ to modules In-Reply-To: Your message of "Thu, 11 Jan 2001 14:28:44 PST." References: Message-ID: <200101121354.IAA19700@cj20424-a.reston1.va.home.com> > > Can you explain *why* you wanted to test for package-ness? > > Auto-generating documentation. 
pydoc.py currently tests for __path__,
> and looks for the presence of __init__.py in a subdirectory to mean
> that the subdirectory name is a package name. Is it safe on all platforms
> to just list all .py files in the subdirectory to get all submodules?

Yes, that should work. Of course there could also be extension modules
or .pyc-only files there -- you could use imp.get_suffixes() to find
out all modules (even if that means you don't always have the source
code available).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Jan 12 15:07:30 2001
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Jan 2001 09:07:30 -0500
Subject: [Python-Dev] xreadlines : readlines :: xrange : range
In-Reply-To: Your message of "Thu, 11 Jan 2001 22:49:47 EST."
References:
Message-ID: <200101121407.JAA19781@cj20424-a.reston1.va.home.com>

> [Guido]
> > I don't want to call FLOCKFILE while holding the Python lock, as
> > this means that *if* we're blocked in FLOCKFILE (e.g. we're reading
> > from a pipe or socket), no other Python thread can run!

[Tim]
> Ah, good point! Doesn't appear an essential point, though: the
> HAVE_GETC_UNLOCKED code could still be fiddled easily enough to call
> FLOCKFILE and FUNLOCKFILE exactly once per line, but with the first thread
> release before the (dynamically only) FLOCKFILE and the last thread grab
> after the (dynamically only) FUNLOCKFILE. It's just a question of will, but
> since that's lacking I'll drop it.

Yes, but if the line is very long, you'd have to use malloc() -- you
can't use _PyString_Resize() since that can access the thread state.
You're right that I don't want to do this.

> > OK. It's unique to MS. So close the bug report with a "won't fix"
> > resolution. There's no point in having bug reports remain open that
> > we know we can't fix.
>
> We don't really have a policy about that. Perhaps you're articulating one
> here, though!
I've always left bugs open if they're (a) bugs, and (b) open > . For example, I left the Norton Blue-Screen crash bug open (although > I see now you eventually closed that). Ditto the "Rare hangs in > w9xpopen.exe" bug (which is still open, but will never be fixed by *us*). > Just other examples of things we'll almost certainly never fix ourselves (we > have no handle on them, and all evidence says the OS is screwing up). Yes, as I was thinking about this I realized that that was the policy I wanted. So, yes, the w9xpopen popen bug can be closed as WontFix too. > My view has been that if a user comes to the bug site, it's most helpful for > them if active (== "still happens") crashes and hangs appear among the open > problems. Now that your view of it is clearer, I'll switch to yours. I find it more important that the bug list gives us developers an overview of tasks to be tackled. The problems that won't go away can be listed in the Python 2.0 MoinMoin web! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jan 12 15:27:43 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 09:27:43 -0500 Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Your message of "Fri, 12 Jan 2001 10:57:27 +0100." <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> References: <20010112095727.C56D13BD8B0@snelboot.oratrix.nl> Message-ID: <200101121427.JAA20034@cj20424-a.reston1.va.home.com> > There are however various other niceties that could be added to make it more > useful, can this be put into the repository or something? Ping, do you think you could check this in into the nondist tree? nondist/sandbox/help would seem a good name (next to Paul's nondist/sandbox/doctools). 
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Fri Jan 12 17:37:57 2001 From: skip at mojam.com (Skip Montanaro) Date: Fri, 12 Jan 2001 10:37:57 -0600 (CST) Subject: [Python-Dev] [Patch #103154] Cygwin Check Import Case Patch In-Reply-To: References: Message-ID: <14943.13029.103771.261362@beluga.mojam.com> Guido> Summary: Cygwin Check Import Case Patch ... Guido> But I believe the solution is that the TERMIOS module should be Guido> renamed. Isn't this a general problem? As I recall, the convention when generating Python modules from C header files is to simply convert the base name to upper case and replace ".h" with ".py" (errno.h -> ERRNO.py). From h2py.py: # Without filename arguments, acts as a filter. # If one or more filenames are given, output is written to corresponding # filenames in the local directory, translated to all uppercase, with # the extension replaced by ".py". Perhaps the convention should be instead to append "d" or "data" to the base name (errno.h -> errnodata.py). Skip From guido at python.org Fri Jan 12 18:47:46 2001 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Jan 2001 12:47:46 -0500 Subject: [Python-Dev] [Patch #103154] Cygwin Check Import Case Patch In-Reply-To: Your message of "Fri, 12 Jan 2001 10:37:57 CST." <14943.13029.103771.261362@beluga.mojam.com> References: <14943.13029.103771.261362@beluga.mojam.com> Message-ID: <200101121747.MAA27504@cj20424-a.reston1.va.home.com> > Guido> Summary: Cygwin Check Import Case Patch > ... > Guido> But I believe the solution is that the TERMIOS module should be > Guido> renamed. > > Isn't this a general problem? As I recall, the convention when generating > Python modules from C header files is to simply convert the base name to > upper case and replace ".h" with ".py" (errno.h -> ERRNO.py). From h2py.py: > > # Without filename arguments, acts as a filter. 
> # If one or more filenames are given, output is written to corresponding > # filenames in the local directory, translated to all uppercase, with > # the extension replaced by ".py". > > Perhaps the convention should be instead to append "d" or "data" to the base > name (errno.h -> errnodata.py). An even better solution is to get rid of those generated headers and incorporate the desired symbols directly in the C extension modules. That's happened for errno and socket, for example; maybe it's time to do that for termios, too! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jan 12 19:54:47 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 12 Jan 2001 13:54:47 -0500 Subject: [Python-Dev] Patch 103216 - dbmmodule Setup changes Message-ID: <14943.21239.382891.661026@anthem.wooz.org> I've just uploaded patch 103216 to the Python project at SF. This does a couple of things. First, it auto-detects (in configure) whether dbmmodule can be built, and if so whether the -lndbm library needs to be specified. Second, it moves the entry for dbmmodule to Setup.conf, after the *shared* key so that it'll be built as a dynamic library by default. This should fix the problem where compiling in dbmmodule sets up a dependency to libdb which later hoses pybsddb3. I'd have just checked it in, but I'd like someone else to just proof it first. I've only tested this with the current CVS tree on a fairly stock RH6.1. BTW, I didn't include the changes to configure in the patch, because it's large and made SF's patch manager cough. Besides it can be generated from configure.in and config.h.in which are included in the patch. Cheers, -Barry From martin at loewis.home.cs.tu-berlin.de Fri Jan 12 23:19:57 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Fri, 12 Jan 2001 23:19:57 +0100 Subject: [Python-Dev] PEP 205 comments Message-ID: <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de> Before commenting on the patch itself, I'd like to comment on the PEP describing it. I'm missing a discussion as to why weak references don't act as proxies (or why they do now). A weak proxy would provide the same attributes as the object which it encapsulates, so it could be used transparently in place of the original object. I can think of a number of reasons why it is not done this way (e.g. complete transparency is impossible to achieve); now that a revision of the patch provides proxies, the documentation should state which features are forwarded to the proxy and which aren't (it lists the type() as a difference, but I doubt that is the only difference - repr is also different). Next, I wonder whether weakref.new is allowed to return an existing weak reference to the same object. If that is not acceptable, I'd like to know why - if it was acceptable, then weakref.new(instance) (i.e. without callback) could return the same weak reference all the time. A smart implementation might choose to put the weak reference with no callback at the start of the list, so creation of additional weak references to the same object would be inexpensive. Likewise, I'd like to know the rationale for the clear method. Why is it desirable to drop the object, yet keep the weak reference? Isn't it easier for the application to either ignore clearing altogether, or to drop the reference to the weak reference? So I'd propose to kill the clear method. Again on proxies, there is no discussion or documentation of the ReferenceError. Why is it a RuntimeError? LookupError, ValueError, and AttributeError seem to be just as fine or better. On to the type type extensions: Should there be a type flag indicating presence of tp_weaklistoffset?
It appears that the type structure had tp_xxx7 for a long time, so likely all in-use binary modules have that field set to zero. Is that sufficient? Thanks for reading all of this message, Martin From skip at mojam.com Sat Jan 13 16:37:55 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 13 Jan 2001 09:37:55 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tempfile.py,1.23,1.24 In-Reply-To: References: Message-ID: <14944.30291.658931.489979@beluga.mojam.com> Tim> On Linux, someone please run that standalone with more files and/or Tim> more threads; e.g., Tim> python lib/test/test_threadedtempfile.py -f 1000 -t 10 Tim> to run with 10 threads each creating (and deleting) 1000 temp files. After capitalizing "Lib", it worked fine for me: % ./python Lib/test/test_threadedtempfile.py -f 1000 -t 10 Creating Starting Reaping Done: errors 0 ok 10000 Skip From dkwolfe at pacbell.net Sat Jan 13 19:48:21 2001 From: dkwolfe at pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 10:48:21 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore Message-ID: <0G740027Q6Q1KL@mta6.snfc21.pbi.net> Howdy Folks, I need some help here. I'd like to see Python build out of the box with a ./configure, make, make test, and make install on Darwin and Mac OS X. Having it build out of the box will make it easier to be incorporated into both Darwin and the base Mac OS X distribution - although not for the initial release of the latter but definitely doable for subsequent releases. In order to do this, I need to have it build cleanly on HFS and UFS filesystems. Under the HFS system, I've got a name conflict due to case insensitivity between the build target and the "Python" directory that forces me to build with a -with-suffix command on HFS and manually change the name after install - which is an automatic knockout factor when it comes to incorporating it in an automatic build system.
Not to mention a problem with unix newbies trying to build from source... Last night, I did some quick investigation to determine the best way to fix this problem as documented in PEP-42 in the build section and Sourceforge bug 122215 and determined that the easiest and least error-prone way was to change the directory name Python to PyCore. It's apparent from the comments that I'm missing something here as the reaction has been negative so far - to the point where Guido has rejected the patch. Can someone explain what I'm missing that's causing such strong feelings? My second question is how do I resolve the name conflict in an approved way? It's been suggested that a build directory be created (/src/build ?) and that the target be placed here. The problem that I had with this suggestion is that it would require an additional layer to execute the target and I wasn't sure what impact it would have on running python from a new directory... which is the reason I took the more known path. :-) Bottom line, come March 24th, Mac OS X 1.0 will be released and as of July 2001 all Macintoshes will come with Mac OS X. I'd like to see Python build easily "out of the box" on these machines - rather than come with a haphazard list of instructions or commands as currently needed for 1.5.2 and 2.0 releases. And hopefully, at some point be incorporated into the base Mac OS X installation... - Dan Wolfe From esr at thyrsus.com Sat Jan 13 21:23:50 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sat, 13 Jan 2001 15:23:50 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? Message-ID: <20010113152350.A17338@thyrsus.com> I have a new goodie for the 2.1 standard library, a module called "simil" that supports computation of similarity indices between strings such as one might use for recovery-matching of misspellings against a dictionary.
The three methods supported are stemming, normalized Hamming similarity, and (the star of the show) Ratcliff-Obershelp gestalt subpattern matching. The latter is spookily effective for detecting not just substitution typos but insertions and deletions. The module is a C extension (my first!) for speed and because the Ratcliff-Obershelp implementation uses pointer arithmetic heavily. It's documented, tested, and ready to go. But having written it, I now have a question: why is soundex marked obsolete? Is there something wrong with the algorithm or implementation? If not, then it would be natural for simil to absorb the existing soundex implementation as a fourth entry point. -- Eric S. Raymond Whether the authorities be invaders or merely local tyrants, the effect of such [gun control] laws is to place the individual at the mercy of the state, unable to resist. -- Robert Anson Heinlein, 1949 -- Eric S. Raymond Americans have the right and advantage of being armed - unlike the citizens of other countries whose governments are afraid to trust the people with arms. -- James Madison, The Federalist Papers From tim.one at home.com Sat Jan 13 22:34:10 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 16:34:10 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113152350.A17338@thyrsus.com> Message-ID: [Eric S. Raymond] > I have a new goodie for the 2.1 standard library, a module called > "simil" that supports computation of similarity indices between > strings such as one might use for recovery-matching of misspellings > against a dictionary. My guess is that Guido won't accept it. > The three methods supported are stemming, normalized Hamming > similarity, and (the star of the show) Ratcliff-Obershelp gestalt > subpattern matching. The latter is spookily effective for detecting > not just substitution typos but insertions and deletions. The module is > a C extension (my first!)
for speed and because the Ratcliff-Obershelp > implementation uses pointer arithmetic heavily. Never heard of R-O, so tracked down some C code via google. It appears I invented the same algorithm at Cray Research in the early 80's for a diff generator, which later got reincarnated in my ndiff.py (in the Tools/scripts/ directory). ndiff generates "human-friendly" diffs between text files, at both the "file is a sequence of lines" and "line is a sequence of characters" levels. I didn't have the hyperbolic marketing genius to call it "gestalt subpattern matching", though -- I thought of it as what Unix diff *would* do if it constrained itself to matching *contiguous* subsequences, and under the theory people would find that more natural because contiguity is something the human visual system naturally latches on to. ndiff can be spookily natural in practice too. > It's documented, tested, and ready to go. But having written it, I > now have a question: why is soundex marked obsolete? Is there > something wrong with the algorithm or implementation? What is the soundex algorithm? Not joking. Skip Montanaro and I were unable to find the algorithm implemented by soundex.c anywhere in the literature, and I never found *any* two definitions that were the same. Even Knuth changed his description of Soundex between editions 2 and 3 of volume 3. Skip eventually merged my and Fred Drake's Python implementations of Knuth Vol 3 Ed 3 Soundex (see the Vaults of Parnassus). > If not, then it would be natural for simil to absorb the existing > soundex implementation as a fourth entry point. Well, soundex.c doesn't match any other Soundex on earth, so it's not worth reproducing in new code. Guido doesn't want to be in the middle of fighting over ill-defined algorithms, so booted Soundex entirely. Another candidate for inclusion is the NYSIIS algorithm, which is probably in more "serious" use than Soundex anyway. 
Same thing with NYSIIS, though (i.e., what-- exactly --is "the NYSIIS algorithm"?), except that Knuth didn't do us the favor of making up his own variation that will *become* "the std" via force of reputation. Sean True implemented *a* NYSIIS in Python (and again see the Vaults for a link to that). So that's why the module is unlikely to make it into the core: + There are any number of algorithms people may want to see (I don't know what "normalized Hamming similarity" means, but if it's not the same as Levenshtein edit distance then add the latter to the pot too). + Each algorithm on its own is likely controversial. + Computing string similarity is something few apps need anyway. Lots of hassle + little demand == not a natural for the core. ndiff is in the core only because many people found the *app* useful; its SequenceMatcher class isn't even advertised. may-never-understand-how-bigints-got-into-python-ly y'rs - tim From fdrake at acm.org Sat Jan 13 22:45:12 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 13 Jan 2001 16:45:12 -0500 (EST) Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: References: <20010113152350.A17338@thyrsus.com> Message-ID: <14944.52328.558763.46161@cj42289-a.reston1.va.home.com> Tim Peters writes: > + Computing string similarity is something few apps need anyway. And this is a biggie. > Lots of hassle + little demand == not a natural for the core. ndiff is in But it *is* an excellent type of thing to have around -- Eric: just post it on your Web site and register it with the Vaults. > the core only because many people found the *app* useful; its > SequenceMatcher class isn't even advertised. Did you ever write documentation for it? ;-) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From nas at arctrix.com Sat Jan 13 16:17:58 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sat, 13 Jan 2001 07:17:58 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Sat, Jan 13, 2001 at 02:06:07PM -0800 References: Message-ID: <20010113071758.C28643@glacier.fnational.com> [Guido van Rossum on Demo/embed/loop] > (Except it still leaks, but that's probably a separate issue.) Could this be caused by modules adding things to their dict and then forgetting to decref them? I know I've been guilty of that. Neil From esr at thyrsus.com Sat Jan 13 23:15:28 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sat, 13 Jan 2001 17:15:28 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 04:34:10PM -0500 References: <20010113152350.A17338@thyrsus.com> Message-ID: <20010113171528.A17480@thyrsus.com> OK, now I understand why soundex isn't in the core -- there's no canonical version. Tim Peters : > + There are any number of algorithms people may want to see (I don't know > what "normalized Hamming similarity" means, but if it's not the same as > Levenshtein edit distance then add the latter to the pot too). Normalized Hamming similarity: it's an inversion of Hamming distance -- number of pairwise matches in two strings of the same length, divided by the common string length. Gives a measure in [0.0, 1.0]. I've looked up "Levenshtein edit distance" and you're right. I'll add it as a fourth entry point as soon as I can find C source to crib. (Would you happen to have a pointer?) > + Each algorithm on its own is likely controversial. Not these. There *are* canonical versions of all these, and exact equivalents are all heavily used in commercial OCR software. > + Computing string similarity is something few apps need anyway. Tim, this isn't true.
Any time you need to validate user input against a controlled vocabulary and give feedback on probable right choices, R/O similarity is *very* useful. I've had it in my personal toolkit for a decade and used it heavily for this -- you take your unknown input, check it against a dictionary and kick "maybe you meant foo?" to the user for every foo with an R/O similarity above 0.6 or so. The effects look like black magic. Users love it. -- Eric S. Raymond "I hold it, that a little rebellion, now and then, is a good thing, and as necessary in the political world as storms in the physical." -- Thomas Jefferson, Letter to James Madison, January 30, 1787 From guido at python.org Sat Jan 13 23:25:12 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Jan 2001 17:25:12 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: Your message of "Sat, 13 Jan 2001 07:17:58 PST." <20010113071758.C28643@glacier.fnational.com> References: <20010113071758.C28643@glacier.fnational.com> Message-ID: <200101132225.RAA03197@cj20424-a.reston1.va.home.com> > [Guido van Rossum on Demo/embed/loop] > > (Except it still leaks, but that's probably a separate issue.) > > Could this be caused by modules adding things to their dict and > then forgetting to decref them? I know I've been guilty of that. Do you have a tool that detects leaks? Barry has one: Insure++. It's expensive and we don't have a site license, so I'll ask Barry to investigate this. (Barry: go to Demo/embed and do "make looptest". Then in another shell window use "top" to watch the "loop" process grow slowly. I'd love to find out what's the problem here. It's not dependent on what you ask it to loop over; "./loop pass" also grows. Of course it could be one of the modules loaded during initialization...) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Jan 13 23:33:34 2001 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Jan 2001 17:33:34 -0500 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Your message of "Sat, 13 Jan 2001 10:48:21 PST." <0G740027Q6Q1KL@mta6.snfc21.pbi.net> References: <0G740027Q6Q1KL@mta6.snfc21.pbi.net> Message-ID: <200101132233.RAA03229@cj20424-a.reston1.va.home.com> > Howdy Folks, > > I need some help here. I'd like to see Python build out of the box with a > ./configure, make, make test, and make install on Darwin and Mac OS X. > Having it build out of the box will make it easier to be incorporated > into both Darwin and the base Mac OS X distribution - although not for > the initial release of the latter but definitely doable for subsequent > releases. In order to do this, I need to have it build cleanly on HFS and > UFS filesystems. > > Under the HFS system, I've got a name conflict due to case insensitivity > between the build target and the "Python" directory that forces me to > build with a -with-suffix command on HFS and manually change the name > after install - which is an automatic knockout factor when it comes to > incorporating it in an automatic build system. Not to mention a problem > with unix newbies trying to build from source... > > Last night, I did some quick investigation to determine the best way to > fix this problem as documented in PEP-42 in the build section and > Sourceforge bug 122215 and determined that the easiest and least error-prone > way was to change the directory name Python to PyCore. > > It's apparent from the comments that I'm missing something here as the > reaction has been negative so far - to the point where Guido has rejected > the patch. Can someone explain what I'm missing that's causing such > strong feelings? We use CVS to manage the sources.
CVS makes it very hard to move a directory; it doesn't have a command for this, so you have to do the move directly in the repository, which will then break checkouts for everyone who has a work directory linked to the CVS repository. Using SourceForge makes it a bit harder still: we have to ask the SF sysadmins to do the move for us. And if we did the move, it would be much harder to reproduce old versions of the source tree with a single CVS command. A way around that would be to do a copy instead of a move, but that would cause the directory "PyCore" to pop up in all old versions, too. I just don't want to go through this hassle in order to make building easier for one relatively little-used platform. > My second question is how do I resolve the name conflict in an approved > way? It's been suggested that a build directory be created (/src/build > ?) and that the target be placed here. The problem that I had with this > suggestion is that it would require an additional layer to execute the > target and I wasn't sure what impact it would have on running python > from a new directory... which is the reason I took the more known path. > :-) I don't understand what you are proposing here; I can't imagine that an extra directory level could cause a slowdown. A suggestion I would be open to: change the executable name during build (currently a .exe suffix is added), but change it back (removing the .exe suffix) during the install. That should be a small change to the Makefile. > Bottom line, come March 24th, Mac OS X 1.0 will be released and as of > July 2001 all Macintoshes will come with Mac OS X. I'd like to see > Python build easily "out of the box" on these machines - rather than come > with a haphazard list of instructions or commands as currently needed > for 1.5.2 and 2.0 releases. And hopefully, at some point be incorporated > into the base Mac OS X installation...
Just get Apple to include Python with their standard distribution and nobody will *have* to build Python on Mac OS X. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sun Jan 14 00:59:44 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 18:59:44 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113171528.A17480@thyrsus.com> Message-ID: [Eric] > OK, now I understand why soundex isn't in the core -- there's no > canonical version. Actually, I think Knuth Vol 3 Ed 3 is canonical *now* -- nobody would dare to oppose him <0.5 wink>. > Normalized Hamming similarity: it's an inversion of Hamming distance > -- number of pairwise matches in two strings of the same length, > divided by the common string length. Gives a measure in [0.0, 1.0]. > > I've looked up "Levenshtein edit distance" and you're right. I'll add > it as a fourth entry point as soon as I can find C source to crib. > (Would you happen to have a pointer?) If you throw almost everything out of Unix diff, that's what you'll be left with. Offhand I don't know of unencumbered, industrial-strength C source; a problem is that writing a program to compute this is a std homework exercise (it's a common first "dynamic programming" example), so you can find tons of bad C source. Caution: many people want small variations of "edit distance", usually via assigning different weights to insertions, replacements and deletions. A less common but still popular variant is to say that a transposition ("xy" vs "yx") is less costly than a delete plus an insert. Etc. "edit distance" is really a family of algorithms. >> + Each algorithm on its own is likely controversial. > Not these. There *are* canonical versions of all these, See the "edit distance" gloss above. > and exact equivalents are all heavily used in commercial OCR > software. God forbid that core Python may lose the commercial OCR developer market .
It's not accepted that for every field F, core Python needs to supply the algorithms F uses heavily. Heck, core Python doesn't even ship with an FFT! Doesn't bother the folks working in signal processing. >> + Computing string similarity is something few apps need anyway. > Tim, this isn't true. Any time you need to validate user input > against a controlled vocabulary and give feedback on probable right > choices, Which is something few apps need anyway -- in my experience, but more so in my *primary* role here of trying to channel for you (& Guido) what Guido will say. It should be clear that I've got some familiarity with these schemes, so it should also be clear that Guido is likely to ask me about them whenever they pop up. But Guido has hardly ever asked me about them over the past decade, with the exception of the short-lived Soundex brouhaha. From that I guess hardly anyone ever asks *him* about them, and that's how channeling works: if this were an area where Guido felt core Python needed beefier libraries, I'm pretty sure I would have heard about it by now. But now Guido can speak for himself. There's no conceivable argument that could change what I *predict* he'll say. > R/O similarity is *very* useful. I've had it in my personal > toolkit for a decade and used it heavily for this -- you take your > unknown input, check it against a dictionary and kick "maybe you meant > foo?" to the user for every foo with an R/O similarity above 0.6 or so. > > The effects look like black magic. Users love it. I believe that. And I'd guess we all have things in our personal toolkits our users love. That isn't enough to get into the core, as I expect Guido will belabor on the next iteration of this . 
doesn't-mean-the-code-isn't-mondo-cool-ly y'rs - tim From dkwolfe at pacbell.net Sun Jan 14 01:19:56 2001 From: dkwolfe at pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 16:19:56 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore Message-ID: <0G7400EZQM2TXD@mta5.snfc21.pbi.net> >CVS makes it very hard to move a directory... >which will then break checkouts for everyone... with the potential to cause development code to be lost >Using SourceForge...have to ask the SF sysadmins I understand... we also use CVS and periodically (usually pre alpha) reorganize the source... going thru SF sysadmin makes it doubly hard... yuck! However, since you have "released" tarball archives, it seems to me that the loss of the diffs and log notes is more troubling than the need to create an old version.... at least that's been my experience when building software. ;-) >I just don't want to go through this hassle in order to make building >easier for one relatively little-used platform. humph. Ok, I'll accept that for now as we've only sold 100,000 Beta copies of Mac OS X... but if we're not over 1 million users by this time next year... I'll eat my words. ;-) >> It's been suggested that a build directory be created (/src/build ?) >> and that the target be placed here. >I don't understand what you are proposing here; I can't imagine that >an extra directory level could cause a slowdown. moshez suggested this in his comment on the patch - moving the target to a separate directory. I'm not sure of the implications of doing this however, and wondered if it might affect the running of the regression suite and the executable before it was installed. >A suggestion I would be open to: change the executable name during >build (currently a .exe suffix is added), but change it back (removing >the .exe suffix) during the install. That should be a small change to >the Makefile. You mean without using the -with-suffix command? That can probably be done...
but based on my readings, I'd thought you'd reject it as not being "clean" and complicating the build process more than it should - not to mention renaming the executable behind the builder's back... Lesser of two evils I guess - I'll investigate this however... >> I'd like to see Python be easily built on "out of the box"... >> [and] incorporated into the base Mac OS X installation... > >Just get Apple to include Python with their standard distribution and >nobody will *have* to build Python on Mac OSX. :-) Easier said than done as they already have the other P language installed. ;-) But then on the other hand, there are quite a few Pythonatics including me who use it in daily work at Apple. As I mentioned, the road to getting it in Mac OS X begins with getting it to build cleanly with the automated build system... so I've got to get this problem fixed before I start working on getting it in the build. - Dan (yes, I work for Apple, but this is something that I'm doing on my own!) From mwh21 at cam.ac.uk Sun Jan 14 01:41:35 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 14 Jan 2001 00:41:35 +0000 Subject: [Python-Dev] a readline replacement? In-Reply-To: Michael Hudson's message of "17 Dec 2000 18:18:24 +0000" References: <200012142319.MAA02140@s454.cosc.canterbury.ac.nz> <3A39ED07.6B3EE68E@lemburg.com> <14906.17412.221040.895357@anthem.concentric.net> <20001215040304.A22056@glacier.fnational.com> <20001215235425.A29681@xs4all.nl> Message-ID: Michael Hudson writes: > It wouldn't be particularly hard to rewrite editline in Python (we > have termios & the terminal handling functions in curses - and even > ioctl if we get really keen).
> > I've been hacking on my own Python line reader on and off for a while; > it's still pretty buggy, but if you're feeling brave you could look at: > > http://www-jcsu.jesus.cam.ac.uk/~mwh21/hacks/pyrl-0.0.0.tar.gz As I secretly planned , the embarrassment of having code that full of holes publicly accessible spurred me to writing a much better version, to be found at: http://www-jcsu.jesus.cam.ac.uk/~mwh21/hacks/pyrl-0.2.0.tar.gz (or, now rsync works there again, in the equivalent place on the starship...). If you unpack it and execute $ python python_reader.py you should get something that closely mimics the current interpreter top level. It supports a wide range of cursor motion commands, built-in support for multiple line input and history (including incremental search). It doesn't do completion, basically because I haven't got round to it yet, and it will get into severe trouble if you enter an input that is taller than your terminal (I think this should be surmountable, but I haven't gotten round to this either). Another thing that I haven't gotten round to yet is documentation. After I've tackled these points I'll probably stick it up on parnassus. I've been using it as my standard python shell for a week or so, and quite like it, though the lack of completion is a drag. It is probably staggeringly unportable, so I'd appreciate finding out how it breaks on systems other that Linux with terminals other than xterms... Have the changes to enable use of editline been checked in yet? I worry that the licensing situation around the readline module is grey at best... Cheers, M. -- That's why the smartest companies use Common Lisp, but lie about it so all their competitors think Lisp is slow and C++ is fast. (This rumor has, however, gotten a little out of hand. :) -- Erik Naggum, comp.lang.lisp From esr at thyrsus.com Sun Jan 14 01:58:08 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Sat, 13 Jan 2001 19:58:08 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 06:59:44PM -0500 References: <20010113171528.A17480@thyrsus.com> Message-ID: <20010113195808.B17712@thyrsus.com> Tim Peters : > If you throw almost everything out of Unix diff, that's what you'll be left > with. Offhand I don't know of unencumbered, industrial-strength C source; a > problem is that writing a program to compute this is a std homework exercise > (it's a common first "dynamic programming" example), so you can find tons of > bad C source. I found some formal descriptions of the algorithm and some unencumbered Oberon source. I'm coding up C now. It's not complicated if you're willing to hold the cost matrix in memory, which is reasonable for a string comparator in a way it wouldn't be for a file diff. > Caution: many people want small variations of "edit distance", usually via > assigning different weights to insertions, replacements and deletions. A > less common but still popular variant is to say that a transposition ("xy" > vs "yx") is less costly than a delete plus an insert. Etc. "edit distance" > is really a family of algorithms. Which about collapse into one if your function has three weight arguments for insert/replace/delete weights, as mine does. It don't get more general than that -- I can see that by looking at the formal description. OK, so I'll give you that I don't weight transpositions separately, but neither does any other variant I found on the web nor the formal descriptions. A fourth optional weight argument someday, maybe :-). > God forbid that core Python may lose the commercial OCR developer market > . It's not accepted that for every field F, core Python needs to > supply the algorithms F uses heavily. That's not my point -- I don't see OCR as a big Python market either.
My point in observing that OCR uses Ratcliff/Obershelp heavily was simply to show that it's a well-established algorithm, not `controversial'. > Heck, core Python doesn't even ship > with an FFT! Doesn't bother the folks working in signal processing. It probably won't surprise you that I considered writing an FFT extension module at one point :-). > > Tim, this isn't true. Any time you need to validate user input > > against a controlled vocabulary and give feedback on probable right > > choices, > > Which is something few apps need anyway I fundamentally disagree. Few application designers *know* they need it, but user interfaces would get a hell of a lot better if the technique were more commonly applied -- and that's why I want it in the Python library, so doing the right thing in Python will be a minimum-effort proposition. -- Eric S. Raymond What if you were an idiot, and what if you were a member of Congress? But I repeat myself. -- Mark Twain From tim.one at home.com Sun Jan 14 04:17:34 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 22:17:34 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <14944.52328.558763.46161@cj42289-a.reston1.va.home.com> Message-ID: [Fred] > Did you ever write documentation for it? ;-) A lot more than you did . just-show-me-"write-docs"-in-my-job-description-ly y'rs - tim From tim.one at home.com Sun Jan 14 05:39:59 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 13 Jan 2001 23:39:59 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010113195808.B17712@thyrsus.com> Message-ID: [Eric, on "edit distance"] > I found some formal descriptions of the algorithm and some > unencumbered Oberon source. I'm coding up C now. It's not > complicated if you're willing to hold the cost matrix in memory, > which is reasonable for a string comparator in a way it wouldn't > be for a file diff. All agreed, and it should be a straightforward task then.
I'm assuming it will work with Unicode strings too . [on differing weights] > Which about collapse into one if your function has three weight > arguments for insert/replace/delete weights, as mine does. It don't > get more general than that -- I can see that by looking at the formal > description. > > OK, so I'll give you that I don't weight transpositions separately, > but neither does any other variant I found on the web nor the formal > descriptions. A fourth optional weight argument someday, maybe :-). > ... > and that's why I want it in the Python library, so doing the right > thing in Python will be a minimum-effort proposition. Guido will depart from you at a different point. I depart here: it's not "the right thing". It's a bunch of hacks that appeal not because they solve a problem, but because they're cute algorithms that are pretty easy to implement and kinda solve part of a problem. "The right thing"-- which you can buy --at least involves capturing a large base of knowledge about phonetics and spelling. In high school, one of my buddies was Dan Pryzbylski. If anyone who knew him (other than me ) were to type his name into the class reunion guy's web page, they'd probably spell it the way they remember him pronouncing it: sha-bill-skey (and that's how he pronounced "Dan" ). If that hit on the text string "Pryzbylski", *then* it would be "the right thing" in a way that makes sense to real people, not just to implementers. Working six years in commercial speech recog really hammered that home to me: 95% solutions are on the margin of unsellable, because an error one try in 20 is intolerable for real people. Developers writing for developers get "whoa! cool!" where my sisters walk away going "what good is that?". Edit distance doesn't get within screaming range of 95% in real life.
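For reference, the three-weight edit distance under discussion can be sketched as below. This is a minimal illustration in modern Python, not Eric's C code: the name edit_distance and the default weights of 1 are assumptions, and transpositions are not weighted separately. It holds the whole cost matrix in memory, as Eric describes, and works on any two sequences whose elements compare with ==, Unicode strings included.

```python
def edit_distance(a, b, ins=1, rep=1, dele=1):
    """Weighted edit distance from a to b (hypothetical helper, not simil's API).

    ins, rep and dele are the costs of one insertion, replacement and
    deletion.  The full (len(a)+1) x (len(b)+1) cost matrix is kept in memory.
    """
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]

    # Turning a prefix of a into the empty string takes only deletions.
    for i in range(1, rows):
        cost[i][0] = i * dele
    # Turning the empty string into a prefix of b takes only insertions.
    for j in range(1, cols):
        cost[0][j] = j * ins

    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + dele,                     # drop a[i-1]
                cost[i][j - 1] + ins,                      # add b[j-1]
                cost[i - 1][j - 1] + (0 if same else rep)  # keep or replace
            )
    return cost[-1][-1]
```

With unit weights, a transposition such as "xy" versus "yx" costs 2, which is the behaviour Tim cautions about; a fourth weight would need an extra case in the inner loop.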
Even for most developers, it would be better to package up the single best approach you've got (f(list, word) -> list of possible matches sorted in confidence order), instead of a module with 6 (or so) functions they don't understand and a pile of equally mysterious knobs. Then it may actually get used! Developers of the breed who would actually take the time to understand what you've done are, I suggest, similar to us: they'd skim the docs, ignore the code, and write their own variations. Or, IOW: > so doing the right thing in Python will be a minimum-effort > proposition. Make someone think first, and 95% of developers will just skip over it too. BTW, the theoretical literature ignored transposition at first, because it didn't fit well in the machinery. IIRC, I first read about it in an issue of SP&E (Software Practice & Experience), where the authors were forced into it because the "traditional" edit sequence measure sucked in their practice. They were much happier after taking transposition into account. The theoreticians have more than caught up since, and research is still active; e.g., 1997's PATTERN RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS, DELETIONS AND GENERALIZED TRANSPOSITIONS B. J. Oommen and R. K. S. Loke http://www.scs.carleton.ca/~oommen/papers/GnTrnsJ2.PDF is a good read. As they say there, If one views the elements of the confusion matrices as probabilities, this [treating each character independent of all others, as "edit distance" does] is equivalent to assuming that the transformation probabilities at each position in the string are statistically independent and possess first-order Markovian characteristics. This model is usually assumed for simplicity rather it [sic] having any statistical significance. 
IOW, because it's easy to analyze, not because it solves a real problem -- and they're complaining about an earlier generalization of edit distance that makes the weights depend on the individual symbols involved as well as on the edit/delete/insert distinction (another variation trying to make this approach genuinely useful in real life). The Oommen-Loke algorithm appears much more realistic, taking into account the observed probabilities of mistyping specific letter pairs (although it still ignores phonetics), and they report accuracies approaching 98% in correctly identifying mangled words. 98% (more than twice as good as 95% -- the error rate is actually more useful to think about, 2% vs 5%) is truly useful for non-geek end users, and the state of the art here is far beyond what's easy to find and dead easy to implement. > ... > It probably won't surprise you that I considered writing an FFT > extension module at one point :-). Nope! More power to you, Eric. At least FFTs *are* state of the art, although *coding* them optimally is likely beyond human ability on modern machines: http://www.fftw.org/ (short course: they've generally got the fastest FFTs available, and their code is generated by program, systematically *trying* every trick in the book, timing it on a given box, and synthesizing a complete strategy out of the quickest pieces). sooner-or-later-the-only-code-real-people-will-use-won't-be-written- by-people-at-all-ly y'rs - tim From tim.one at home.com Sun Jan 14 06:38:52 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 00:38:52 -0500 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: <0G7400EZQM2TXD@mta5.snfc21.pbi.net> Message-ID: [Dan Wolfe] > ... > As I mentioned, the road to getting it in Mac OS X begins with > getting it to build cleanly with the automated build system... so > I've got to get this problem fixed before I start working on > getting it in the build. 
> > - Dan > (yes, I work for Apple, but this is something that I'm doing > on my own!) Hang in there, Dan! I did the first Python port to the KSR-1 on my own time too, despite working for the visionless bastards at the time. The rest is history: the glory, the fame, the riches, the groupies, the adulation of my peers. We won't mention the financial scandal and subsequent bankruptcy lest it discourage you for no good reason . BTW, "do the simplest thing that can possibly work"! It's OK if it's a little ugly. Better that than force hundreds of Python-builders to get divorced from a decade-old directory naming scheme. From esr at thyrsus.com Sun Jan 14 08:08:57 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 02:08:57 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sat, Jan 13, 2001 at 11:39:59PM -0500 References: <20010113195808.B17712@thyrsus.com> Message-ID: <20010114020857.E19782@thyrsus.com> Tim Peters : > All agreed, and it should be a straightforward task then. I'm assuming it > will work with Unicode strings too . Thought about that. Want to get it working for 8 bits first. > Guido will depart from you at a different point. I depart here: it's not > "the right thing". It's a bunch of hacks that appeal not because they solve > a problem, but because they're cute algorithms that are pretty easy to > implement and kinda solve part of a problem. Again, my experience says differently. I have actually *used* Ratcliff-Obershelp to implement Do What I Mean (actually, Tell Me What I Mean) -- and had it work very well for non-geek users. That's why I want other Python programmers to have easy access to the capability. > Working six years in commercial speech recog really hammered that home to > me: 95% solutions are on the margin of unsellable, because an error one try > in 20 is intolerable for real people. Developers writing for developers get > "whoa! cool!" 
where my sisters walk away going "what good is that?". Edit > distance doesn't get within screaming range of 95% in real life. I suspect your speech recognition experience has given you an unhelpful bias. For English, what you say is certainly true -- but that's a gross worst-case application of R/O and Levenshtein that I'm not interested in pursuing. Nor do I expect Python hackers to use my module for that. Where techniques like Ratcliff-Obershelp really shine (and what I expect the module to be used for) is with controlled vocabularies such as command interfaces. These tend to have better orthogonality than NL, so antinoise filtering by R/O or Levenshtein distance (a kindred technique I somehow didn't learn until today -- there are disadvantages to being an autodidact) can really go to town on them. (Actually, my gut after thinking about both algorithms hard is that R/O is still a better technique than Levenshtein for the kind of application I have in mind. But I also suspect the difference is marginal.) (Other good uses for algorithms in this class include cladistics and genomic analysis.) > Even for most developers, it would be better to package up the single best > approach you've got (f(list, word) -> list of possible matches sorted in > confidence order), instead of a module with 6 (or so) functions they don't > understand and a pile of equally mysterious knobs. That's why good documentation, with motivating usage hints, is important. I write good documentation, Tim. > PATTERN RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS, > DELETIONS AND GENERALIZED TRANSPOSITIONS > B. J. Oommen and R. K. S. Loke > http://www.scs.carleton.ca/~oommen/papers/GnTrnsJ2.PDF Thanks for the pointer; I've downloaded it and will read it. If the description of Oommen's algorithm is good enough, I'll implement it and add it to the module. -- Eric S. Raymond Power concedes nothing without a demand. It never did, and it never will.
Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From dkwolfe at pacbell.net Sun Jan 14 08:48:51 2001 From: dkwolfe at pacbell.net (Dan Wolfe) Date: Sat, 13 Jan 2001 23:48:51 -0800 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Message-ID: <0G75009ZD6UYYE@mta5.snfc21.pbi.net> On Saturday, January 13, 2001, at 09:38 PM, Tim Peters wrote: > [Dan Wolfe] >> ... >> As I mentioned, the road to getting it in Mac OS X begins with >> getting it to build cleanly with the automated build system... so >> I've got to get this problem fixed before I start working on >> getting it in the build. >> >> - Dan >> (yes, I work for Apple, but this is something that I'm doing >> on my own!) > > Hang in there, Dan! I did the first Python port to the KSR-1 on my own > time > too, despite working for the visionless bastards at the time. Well, I won't go that far..... some of them are quite visionaries (I can't stop drooling over a Ti portable....). > The rest is > history: the glory, the fame, the riches, the groupies, the adulation > of my > peers. We won't mention the financial scandal and subsequent bankruptcy > lest it discourage you for no good reason . You left out the part where they turn ya into a timbot... > BTW, "do the simplest thing that can possibly work"! It's OK if it's a > little ugly. Better that than force hundreds of Python-builders to get > divorced from a decade-old directory naming scheme. Well the mv Python to PyCore was the simplest... but obviously the most painful.... 
The longer ugly fix is working but it's such a hack that I'd rather not show it off... I need to fix it so that it allows nice things such as allowing -with-suffix to be used... and then test all the edge cases such as clobber, etc., so that I don't break anything. :-) appreciating-your-note-after-attempting-to-understand-makefiles-on-Saturday-night'ly yours, - Dan From tim.one at home.com Sun Jan 14 11:45:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 05:45:53 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010114020857.E19782@thyrsus.com> Message-ID: [Tim] >> ...It's a bunch of hacks that appeal not because they solve >> a problem, but because they're cute algorithms that are pretty >> easy to implement and kinda solve part of a problem. [Eric] > Again, my experience says differently. I have actually *used* > Ratcliff-Obershelp to implement Do What I Mean (actually, Tell Me What > I Mean) -- and had it work very well for non-geek users. That's why I > want other Python programmers to have easy access to the capability. > ... > Where techniques like Ratcliff-Obershelp really shine (and what I > expect the module to be used for) is with controlled vocabularies > such as command interfaces. Yet the narrower the domain, the less call for a library with multiple approaches. If R-O really shone for you, why bother with anything else? Seriously. You haven't used some (most?) of these. The core isn't a place for research modules either (note that I have no objection whatsoever to writing any module you like -- the only question here is what belongs in the core, and any algorithm *nobody* here has experience with in your target domain is plainly a poor *core* candidate for that reason alone -- we have to maintain, justify and explain it for years to come).
> I suspect your speech recognition experience has given you an > unhelpful bias. Try to think of it as a helpfully different perspective <0.5 wink>. It's in favor of measuring error rate by controlled experiments, skeptical of intuition, and dismissive of anecdotal evidence. I may well agree you don't need all that heavy machinery if I had a clear definition of what problem it is you're trying to solve (I've learned it's not the kinds of problems *I* had in mind when I first read your description!). BTW, telephone speech recog requires controlled vocabularies because phone acoustics are too poor for the customary close-talking microphone approaches to work well enough. A std technique there is to build a "confusability matrix" of the words *in* the vocabulary, to spot trouble before it happens: if two words are acoustically confusable, it flags them and bounces that info back to the vocabulary designer. A similar approach should work well in your domain: if you get to define the cmd interface, run all the words in it pairwise through your similarity measure of choice, and dream up new words whenever a pair is "too close". That all but ensures that even a naive similarity algorithm will perform well (in telephone speech recog, the unconstrained error rate is up to 70% on cell phones; by constraining the vocabulary with the aid of confusability measures, we cut that to under 1%). > ... > (Actually, my gut after thinking about both algorithms hard is that > R/O is still a better technique than Levenshtein for the kind of > application I have in mind. But I also suspect the difference is > marginal.) So drop Levenshtein -- go with your best shot. Do note that they both (usually) consider a single transposition to be as much a mutation as two replacements (or an insert plus a delete -- "pure" Levenshtein treats those the same). What happens when the user doesn't enter an exact match? Does the kind of app you have in mind then just present them with a list of choices? 
If that's all (as opposed to, e.g., substituting its best guess for what the user actually typed and proceeding as if the user had given that from the start), then the evidence from studies says users are almost as pleased when the correct choice appears somewhere in the first three choices as when it appears as *the* top choice. A well-designed vocabulary can almost guarantee that happy result (note that most of the current research is aimed at the much harder job of getting the intended word into the #1 slot on the choice list). > (Other good uses for algorithms in this class include cladistics and > genomic analysis.) I believe you'll find current work in those fields has moved far beyond these simplest algorithms too, although they remain inspirational (for example, see "Protein Sequence Alignment and Database Scanning" at http://barton.ebi.ac.uk/papers/rev93_1/rev93_1.html Much as in typing, some mutations are more likely than others for *physical* reasons, so treating all pairs of symbols in the alphabet alike is too gross a simplification.). >> Even for most developers, it would be better to package up the >> single best approach you've got (f(list, word) -> list of possible >> matches sorted in confidence order), instead of a module with 6 >> (or so) functions they don't understand and a pile of equally >> mysterious knobs. > That's why good documentation, with motivating usage hints, is > important. I write good documentation, Tim. You're not going to find offense here even if you look for it, Eric : while only a small percentage of developers don't read docs at all, everyone else spaces out at least in linear proportion to the length of the docs. Most people will be looking for "a solution", not for "a toolkit". If the docs read like a toolkit, it doesn't matter how good they are, the bulk of the people you're trying to reach will pass on it. 
If you really want this to be *used*, supply one class that does *all* the work, including making the expert-level choices of which algorithm is used under the covers and how it's tuned. That's good advice. I still expect Guido won't want it in the core before wide use is a demonstrated fact, though (and no, that's not a chicken-vs-egg thing: "wide use" for a thing outside the core is narrower than "wide use" for a thing in the core). An exception would likely get made if he tried it and liked it a lot. But to get it under his radar, it's again much easier if the usage docs are no longer than a couple paragraphs. I'll attach a tiny program that uses ndiff's SequenceMatcher to guess which of the 147 std 2.0 top-level library modules a user may be thinking of (and best I can tell, these are the same results case-folding R/O would yield): Module name? random Hmm. My best guesses are random, whrandom, anydbm (BTW, the first choice was an exact match) Module name? disect Hmm. My best guesses are bisect, dis, UserDict Module name? password Hmm. My best guesses are keyword, getpass, asyncore Module name? chitchat Hmm. My best guesses are whichdb, stat, asynchat Module name? xml Hmm. My best guesses are xmllib, mhlib, xdrlib [So far so good] Module name? http Hmm. My best guesses are httplib, tty, stat [I was thinking of httplib, but note that it missed SimpleHTTPServer: a name that long just isn't going to score high when the input is that short] Module name? dictionary Hmm. My best guesses are Bastion, ConfigParser, tabnanny [darn, I *think* I was thinking of UserDict there] Module name? uuencode Hmm. My best guesses are code, codeop, codecs [Missed uu] Module name? parse Hmm. My best guesses are tzparse, urlparse, pre Module name? browser Hmm. My best guesses are webbrowser, robotparser, user Module name? brower Hmm. My best guesses are webbrowser, repr, reconvert Module name? Thread Hmm. My best guesses are threading, whrandom, sched Module name? pickle Hmm. 
My best guesses are pickle, profile, tempfile (BTW, the first choice was an exact match) Module name? shelf Hmm. My best guesses are shelve, shlex, sched Module name? katmandu Hmm. My best guesses are commands, random, anydbm [I really was thinking of "commands"!] Module name? temporary Hmm. My best guesses are tzparse, tempfile, fpformat So it gets what I was thinking of into the top 3 very often, and despite some wildly poor guesses at the correct spelling -- you'd *almost* think it was doing a keyword search, except the *unintended* choices on the list are so often insane . Something like that may be a nice addition to Paul/Ping's help facility someday too. Hard question: is that "good enough" for what you want? Checking against 147 things took no perceptible time, because SequenceMatcher is already optimized for "compare one thing against N", doing preprocessing work on the "one thing" that greatly speeds the N similarity computations (I suspect you're not -- yet). It's been tuned and tested in practice for years; it works for any sequence type with hashable elements (so Unicode strings are already covered); it works for long sequences too. And if R-O is the best trick we've got, I believe it already does it. Do we need more? Of course *I'm* not convinced we even need *it* in the core, but packaging a match-1-against-N class is just a few minutes' editing of what follows. 
something-to-play-with-anyway-ly y'rs - tim

NDIFFPATH = "/Python20/Tools/Scripts"
LIBPATH = "/Python20/Lib"

import sys, os
sys.path.append(NDIFFPATH)
from ndiff import SequenceMatcher

modules = {}  # map lowercase module stem to module name
for f in os.listdir(LIBPATH):
    if f.endswith(".py"):
        f = f[:-3]
        modules[f.lower()] = f

def match(fname, numchoices=3):
    lower = fname.lower()
    s = SequenceMatcher()
    s.set_seq2(lower)
    scores = []
    for lowermod, mod in modules.items():
        s.set_seq1(lowermod)
        scores.append((s.ratio(), mod))
    scores.sort()
    scores.reverse()
    return modules.has_key(lower), [x[1] for x in scores[:numchoices]]

while 1:
    name = raw_input("Module name? ")
    is_exact, choices = match(name)
    print "Hmm. My best guesses are", ", ".join(choices)
    if is_exact:
        print "(BTW, the first choice was an exact match)"

From esr at thyrsus.com Sun Jan 14 13:15:33 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Sun, 14 Jan 2001 07:15:33 -0500
Subject: [Python-Dev] Why is soundex marked obsolete?
In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 05:45:53AM -0500
References: <20010114020857.E19782@thyrsus.com>
Message-ID: <20010114071533.A5812@thyrsus.com>

Tim Peters :
> Yet the narrower the domain, the less call for a library with multiple
> approaches. If R-O really shone for you, why bother with anything else?

Well, I was bothering with Levenshtein because *you* suggested it. :-)

I put in Hamming similarity and stemming because they're O(n) where R/O is quadratic, and both widely used in situations where a fast sloppy job is preferable to a good but slow one. My documentation page is explicit about the tradeoff.

> Seriously. You haven't used some (most?) of these.

I've used stemming and R-O. Haven't used Hamming or Levenshtein.
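A rough sketch of the two linear-time measures Eric mentions, written from the descriptions in his draft documentation (longest common prefix over the length of the longer string; pairwise matches over the common length, None for unequal lengths). The function bodies are an assumption about behaviour, not the simil source, and the handling of two empty strings is a guess.

```python
def stem(a, b):
    """Length of the longest common prefix divided by the longer length."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0  # assumption: two empty strings count as identical
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / longer


def hamming(a, b):
    """Fraction of positions that agree; None if the lengths differ."""
    if len(a) != len(b):
        return None
    if not a:
        return 1.0  # assumption: two empty strings count as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)
```

Both run in time linear in string length, which is the tradeoff against the quadratic worst case of Ratcliff/Obershelp.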
> The core isn't a place > for research modules either (note that I have no objection whatsoever to > writing any module you like -- the only question here is what belongs in the > core, and any algorithm *nobody* here has experience with in your target > domain is plainly a poor *core* candidate for that reason alone -- we have > to maintain, justify and explain it for years to come). Fair point. I read it, in this context, as good advice to drop the Hamming entry point and forget about the Levenshtein implementation -- stick to what I've used and know is useful as opposed to what I think might be useful. > I may well agree you don't > need all that heavy machinery if I had a clear definition of what problem it > is you're trying to solve (I've learned it's not the kinds of problems *I* > had in mind when I first read your description!). I think you have it by now, judging by the following... > What happens when the user doesn't enter an exact match? Does the kind of > app you have in mind then just present them with a list of choices? Yes. I've used this technique a lot. It gives users not just guidance but warm fuzzy feelings -- they react as though there's a friendly homunculus inside the software looking out for them. Actually, in my experience, the less techie they are the more they like this. > If that's all (as opposed to, e.g., substituting its best guess for what the > user actually typed and proceeding as if the user had given that from the > start), then the evidence from studies says users are almost as pleased when > the correct choice appears somewhere in the first three choices as when it > appears as *the* top choice. Interesting. That does fit what I've seen. > A well-designed vocabulary can almost > guarantee that happy result (note that most of the current research is aimed > at the much harder job of getting the intended word into the #1 slot on the > choice list). Yes. 
One of my other tricks is to design command vocabularies so the first three characters are close to unique. This means R/O will almost always nail the right thing. > Much as in typing, some mutations are more likely than others for *physical* > reasons, so treating all pairs of symbols in the alphabet alike is too gross > a simplification.). Indeed. Couple weeks ago I was a speaker at a conference called "After the Genome 6" at which one of the most interesting papers was given by a lady mathematician who designs algorithms for DNA sequence matching. She made exactly this point. > > That's why good documentation, with motivating usage hints, is > > important. I write good documentation, Tim. > > You're not going to find offense here even if you look for it, Eric : No worries, I wasn't looking. :-) > Most people will be looking for "a solution", not for "a toolkit". If the > docs read like a toolkit, it doesn't matter how good they are, the bulk of > the people you're trying to reach will pass on it. If you really want this > to be *used*, supply one class that does *all* the work, including making > the expert-level choices of which algorithm is used under the covers and how > it's tuned. That's good advice. I don't think that's possible in this case -- the proper domains for stemming and R-O are too different. But maybe this is another nudge to drop the Hamming code. > But to get it under his radar, it's again much easier if the usage > docs are no longer than a couple paragraphs. How's this? \section{\module{simil} -- String similarity metrics} \declaremodule{standard}{simil} \moduleauthor{Eric S. Raymond}{esr at thyrsus.com} \modulesynopsis{String similarity metrics.} \sectionauthor{Eric S. Raymond} The \module{simil} module provides similarity functions for approximate word or string matching. One important application is for checking input words against a dictionary to match possible misspellings with the right terms in a controlled vocabulary.
The entry points provide different tradeoffs ranging from crude and fast (stemming) to effective but slow (Ratcliff-Obershelp gestalt subpattern matching). The latter is one of the standard techniques used in commercial OCR software. The \module{simil} module defines the following functions: \begin{funcdesc}{stem}{} Returns the length of the longest common prefix of two strings divided by the length of the longer. Similarity scores range from 0.0 (no common prefix) to 1.0 (identity). Running time is linear in string length. \end{funcdesc} \begin{funcdesc}{hamming}{} Computes a normalized Hamming similarity between two strings of equal length -- the number of pairwise matches in the strings, divided by their common length. It returns None if the strings are of unequal length. Similarity scores range from 0.0 (no positions equal) to 1.0 (identity). Running time is linear in string length. \end{funcdesc} \begin{funcdesc}{ratcliff}{} Returns a Ratcliff/Obershelp gestalt similarity score based on co-occurrence of subpatterns. Similarity scores range from 0.0 (no common subpatterns) to 1.0 (identity). Running time is best-case linear, worst-case quadratic in string length. \end{funcdesc} > Module name? http > Hmm. My best guesses are httplib, tty, stat > > [I was thinking of httplib, but note that it missed > SimpleHTTPServer: a name that long just isn't going to score > high when the input is that short] >>> simil.ratcliff("http", "httplib") 0.72727274894714355 >>> simil.ratcliff("http", "tty") 0.57142859697341919 >>> simil.ratcliff("http", "stat") 0.5 >>> simil.ratcliff("http", "simplehttpserver") 0.40000000596046448 So with the 0.6 threshold I normally use R-O does better at eliminating the false matches but doesn't catch SimpleHTTPServer (case is, I'm sure you'll agree, an irrelevant detail here). > Module name? dictionary > Hmm. 
My best guesses are Bastion, ConfigParser, tabnanny > > [darn, I *think* I was thinking of UserDict there] >>> simil.ratcliff("dictionary", "bastion") 0.47058823704719543 >>> simil.ratcliff("dictionary", "configparser") 0.45454546809196472 >>> simil.ratcliff("dictionary", "tabnanny") 0.4444444477558136 >>> simil.ratcliff("dictionary", "userdict") 0.4444444477558136 R-O would have booted all of these. Highest score to configparser. Interesting -- I'm beginning to think R-O overweights lots of small subpattern matches relative to a few big ones, something I didn't notice before because the statistics of my vocabularies masked it. > Module name? uuencode > Hmm. My best guesses are code, codeop, codecs >>> simil.ratcliff("uuencode", "code") 0.66666668653488159 >>> simil.ratcliff("uuencode", "codeops") 0.53333336114883423 >>> simil.ratcliff("uuencode", "codecs") 0.57142859697341919 >>> simil.ratcliff("uuencode", "uu") 0.40000000596046448 R-O would pick "code" and boot the rest. > [Missed uu] > > Module name? parse > Hmm. My best guesses are tzparse, urlparse, pre >>> simil.ratcliff("parse", "tzparse") 0.83333331346511841 >>> simil.ratcliff("parse", "urlparse") 0.76923078298568726 >>> simil.ratcliff("parse", "pre") 0.75 Same result. > Module name? browser > Hmm. My best guesses are webbrowser, robotparser, user >>> simil.ratcliff("browser", "webbrowser") 0.82352942228317261 >>> simil.ratcliff("browser", "robotparser") 0.55555558204650879 >>> simil.ratcliff("browser", "user") 0.54545456171035767 Big win for R-O. Picks the right one, boots the wrong two. > Module name? brower > Hmm. My best guesses are webbrowser, repr, reconvert >>> simil.ratcliff("brower", "webbrowser") 0.75 >>> simil.ratcliff("brower", "repr") 0.60000002384185791 >>> simil.ratcliff("brower", "reconvert") 0.53333336114883423 Small win for R/O -- boots reconvert, and repr squeaks in under the wire. > Module name? Thread > Hmm.
My best guesses are threading, whrandom, sched >>> simil.ratcliff("thread", "threading") 0.80000001192092896 >>> simil.ratcliff("thread", "whrandom") 0.57142859697341919 >>> simil.ratcliff("thread", "sched") 0.54545456171035767 Big win for R-O. > Module name? pickle > Hmm. My best guesses are pickle, profile, tempfile >>> simil.ratcliff("pickle", "pickle") 1.0 >>> simil.ratcliff("pickle", "profile") 0.61538463830947876 >>> simil.ratcliff("pickle", "tempfile") 0.57142859697341919 R-O wins again. > (BTW, the first choice was an exact match) > Module name? shelf > Hmm. My best guesses are shelve, shlex, sched >>> simil.ratcliff("shelf", "shelve") 0.72727274894714355 >>> simil.ratcliff("shelf", "shlex") 0.60000002384185791 >>> simil.ratcliff("shelf", "sched") 0.60000002384185791 Interesting. Shelve scores highest, both the others squeak in. > Module name? katmandu > Hmm. My best guesses are commands, random, anydbm > > [I really was thinking of "commands"!] >>> simil.ratcliff("commands", "commands") 1.0 >>> simil.ratcliff("commands", "random") 0.4285714328289032 >>> simil.ratcliff("commands", "anydbm") 0.4285714328289032 R-O wins big. > Module name? temporary > Hmm. My best guesses are tzparse, tempfile, fpformat >>> simil.ratcliff("temporary", "tzparse") 0.5 >>> simil.ratcliff("temporary", "tempfile") 0.47058823704719543 >>> simil.ratcliff("temporary", "fpformat") 0.47058823704719543 R-O boots all of these. > Hard question: is that "good enough" for what you want? Um...notice that R-O filtering, even though it seems to be underweighting large matches, did a rather better job on your examples! With an 0.66 threshold it would have done *much* better. I think you've just made an argument for replacing your SequenceMatcher with simil.ratcliff. Mine's even documented. :-). -- Eric S. Raymond Militias, when properly formed, are in fact the people themselves and include all men capable of bearing arms. [...]
To preserve liberty it is essential that the whole body of the people always possess arms and be taught alike, especially when young, how to use them. -- Senator Richard Henry Lee, 1788, on "militia" in the 2nd Amendment From ping at lfw.org Sun Jan 14 13:38:42 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 04:38:42 -0800 (PST) Subject: [Python-Dev] Why both r'' and R'', u'' and U''? Message-ID: Sorry i'm being forgetful -- could someone please refresh my memory: Was there a good reason for allowing both lowercase and capital 'r' as a prefix for raw-strings? I assume that the availability of both r'' and R'' is what led to having both u'' and U''. Is there any good reason for that either? This just seems to lead to ambiguity and unneeded complexity: more cases in tokenize.py, more cases in tokenize.c, more work for IDLE, more annoying when searching for u' in your editor. (I was about to fix the lack of u'' support in tokenize.py and that made me think about this.) What happened to TOOWTDI? Would you believe we now have 36 different ways of starting a string: ' " ''' """ r' r" r''' r""" u' u" u''' u""" ur' ur" ur''' ur""" R' R" R''' R""" U' U" U''' U""" uR' uR" uR''' uR""" Ur' Ur" Ur''' Ur""" UR' UR" UR''' UR""" Would it be outrageous to suggest deprecating the last five rows? -- ?!ng [1] We started with 4. Perl has (by my count) 381 ways of starting a string literal, so we're halfway there, logarithmically speaking. Perl has 757 if you count the fancier operators qx, qw, s, and tr. From mal at lemburg.com Sun Jan 14 14:33:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 14 Jan 2001 14:33:29 +0100 Subject: [Python-Dev] Why is soundex marked obsolete? 
References: Message-ID: <3A61AAA9.F6F1EA9F@lemburg.com> [Lots of talk about interesting algorithms for "human" pattern matching] I just want to add my 2 cents to the discussion: * Eric's package seems very useful for pattern matching, but that is a very specific domain -- not main stream * I would opt to create a neat distutils style package for it for people to install at their own liking (I would certainly like it :) * If wrapped up as a separate package, I'd suggest to add all known algorithms to the package and also make it Unicode aware. There are similar package for e.g. RNGs on Parnassus. BTW, are there less English centric "sounds alike" matchers around ? The NIST soundex algorithm as published on the internet: http://physics.nist.gov/cuu/Reference/soundex.html works fine for English texts, but other languages of course have different letter coding requirements (or even different alphabets). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Jan 14 14:53:03 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 14 Jan 2001 14:53:03 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? References: Message-ID: <3A61AF3F.EE6DAB88@lemburg.com> Ka-Ping Yee wrote: > > Sorry i'm being forgetful -- could someone please refresh my memory: > > Was there a good reason for allowing both lowercase and capital 'r' > as a prefix for raw-strings? I assume that the availability of both > r'' and R'' is what led to having both u'' and U''. Right. > Is there any > good reason for that either? No idea... I have never used anything other than the lowercase versions. > This just seems to lead to ambiguity and unneeded complexity: > more cases in tokenize.py, more cases in tokenize.c, more work > for IDLE, more annoying when searching for u' in your editor. 
> (I was about to fix the lack of u'' support in tokenize.py and > that made me think about this.) > > What happened to TOOWTDI? > > Would you believe we now have 36 different ways of starting a string: > > ' " ''' """ > r' r" r''' r""" > u' u" u''' u""" > ur' ur" ur''' ur""" > R' R" R''' R""" > U' U" U''' U""" > uR' uR" uR''' uR""" > Ur' Ur" Ur''' Ur""" > UR' UR" UR''' UR""" > > Would it be outrageous to suggest deprecating the last five rows? No. + 1 on the idea. > -- ?!ng > > [1] We started with 4. Perl has (by my count) 381 ways of starting > a string literal, so we're halfway there, logarithmically speaking. > Perl has 757 if you count the fancier operators qx, qw, s, and tr. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Sun Jan 14 15:24:08 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 14 Jan 2001 15:24:08 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: ; from ping@lfw.org on Sun, Jan 14, 2001 at 04:38:42AM -0800 References: Message-ID: <20010114152408.G1005@xs4all.nl> On Sun, Jan 14, 2001 at 04:38:42AM -0800, Ka-Ping Yee wrote: > [1] We started with 4. Perl has (by my count) 381 ways of starting > a string literal, so we're halfway there, logarithmically speaking. > Perl has 757 if you count the fancier operators qx, qw, s, and tr. Don't forget 'qr//', which is quite like a raw string, except that Perl uses it to 'precompile' regular expressions as a side effect. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
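[Editor's sketch, not part of the original thread.] The count of 36 string openers quoted above can be checked mechanically. The snippet below is a minimal sketch that assumes the prefix rules as described in this thread (an optional u/U followed by an optional r/R, then one of four quote styles); the names QUOTES, string_openers, openers and lowercase_only are made up for illustration and do not come from tokenize.py. It only builds the opener strings, so it does not depend on which prefixes the running interpreter actually accepts.

```python
from itertools import product

# The four quote styles that can follow a prefix.
QUOTES = ("'", '"', "'''", '"""')


def string_openers():
    """Build every prefix+quote combination described in the thread.

    Assumption: a prefix is an optional 'u'/'U' followed by an
    optional 'r'/'R', which gives 3 * 3 = 9 prefixes including
    the empty one.
    """
    prefixes = [u + r for u, r in product(("", "u", "U"), ("", "r", "R"))]
    return [p + q for p in prefixes for q in QUOTES]


openers = string_openers()

# Openers that would remain if the uppercase variants
# (the "last five rows" of the table) were deprecated.
lowercase_only = [o for o in openers if o == o.lower()]

print(len(openers), len(lowercase_only))
```

Under these assumptions the nine prefixes times four quote styles give the 36 openers, and dropping every prefix that contains an uppercase letter leaves the first four rows of the table.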
From guido at python.org Sun Jan 14 18:08:28 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 12:08:28 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Your message of "Sun, 14 Jan 2001 14:53:03 +0100." <3A61AF3F.EE6DAB88@lemburg.com> References: <3A61AF3F.EE6DAB88@lemburg.com> Message-ID: <200101141708.MAA11161@cj20424-a.reston1.va.home.com> > Ka-Ping Yee wrote: > > > > Sorry i'm being forgetful -- could someone please refresh my memory: > > > > Was there a good reason for allowing both lowercase and capital 'r' > > as a prefix for raw-strings? I assume that the availability of both > > r'' and R'' is what led to having both u'' and U''. > > Right. > > > Is there any > > good reason for that either? > > No idea... I have never used anything other than the lowercase > versions. It comes from the numeric literals. C allows 0x0 and 0X0, and 0L as well as 0l. So does Python (and also 0j == 0J). > > This just seems to lead to ambiguity and unneeded complexity: > > more cases in tokenize.py, more cases in tokenize.c, more work > > for IDLE, more annoying when searching for u' in your editor. > > (I was about to fix the lack of u'' support in tokenize.py and > > that made me think about this.) > > > > What happened to TOOWTDI? > > > > Would you believe we now have 36 different ways of starting a string: > > > > ' " ''' """ > > r' r" r''' r""" > > u' u" u''' u""" > > ur' ur" ur''' ur""" > > R' R" R''' R""" > > U' U" U''' U""" > > uR' uR" uR''' uR""" > > Ur' Ur" Ur''' Ur""" > > UR' UR" UR''' UR""" > > > > Would it be outrageous to suggest deprecating the last five rows? > > No. + 1 on the idea. Why bother? All that does is outdate a bunch of documentation. I don't see the extra effort in various parsers as a big deal. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Sun Jan 14 18:53:32 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sun, 14 Jan 2001 18:53:32 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? Message-ID: <010f01c07e52$e9801fc0$e46940d5@hagrid> The name database portions of SF task 17335 ("add compressed unicode database") were postponed to 2.1. My current patch replaces the ~450k large ucnhash module with a new ~160k large module. (See earlier posts for more info on how the new database works). Should I check it in? From skip at mojam.com Sun Jan 14 18:51:52 2001 From: skip at mojam.com (Skip Montanaro) Date: Sun, 14 Jan 2001 11:51:52 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core Message-ID: <14945.59192.400783.403810@beluga.mojam.com> Ping's pydoc is awesome! Move it out of the sandbox and put it in the standard distribution. Biggest hook for me: 1. execute "pydoc -p 3200" 2. visit "http://localhost:3200/" 3. knock yourself out Skip From martin at mira.cs.tu-berlin.de Sun Jan 14 18:57:57 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 14 Jan 2001 18:57:57 +0100 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? Message-ID: <200101141757.f0EHvvt01407@mira.informatik.hu-berlin.de> > > Would it be outrageous to suggest deprecating the last five rows? > Why bother? All that does is outdate a bunch of documentation. He suggested to deprecate it, not to remove it. By the time it is removed, the documentation still mentioning it should be outdated for other reasons (e.g. the string module might have disappeared). In general, the rationale for deprecating things would be that the simplification will make everybody's life easier in the long run. In the case of a small change (such as this one), that advantage would be small. 
OTOH, the hassle for users that rely on the then-removed feature will be also small; I see it as quite unlikely that anybody uses that feature actively (although I do think that people use 0X10 and 100L; the latter is common since 100l is oft confused with 1001). Regards, Martin From tim.one at home.com Sun Jan 14 20:00:21 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 14:00:21 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <20010114071533.A5812@thyrsus.com> Message-ID: Very quick (swamped): > I think you've just made an argument for replacing your > SequenceMatcher with simil.ratcliff. Actually, I'm certain they're the same algorithm now, except the C is showing through in ratcliff to the floating-point eye . For demonstration, I *always* printed the top three scorers (that's logic in the little driver I posted, not in SequenceMatcher), without any notion of cutoff (ndiff does use a cutoff). Add this line before the return (in the posted driver) to see the actual scores: print scores[:numchoices] For example: Module name? browser [(0.82352941176470584, 'webbrowser'), (0.55555555555555558, 'robotparser'), (0.54545454545454541, 'user')] Hmm. My best guesses are webbrowser, robotparser, user Module name? On this example you reported: >>> simil.ratcliff("browser", "webbrowser") 0.82352942228317261 >>> simil.ratcliff("browser", "robotparser") 0.55555558204650879 >>> simil.ratcliff("browser", "user") 0.54545456171035767 which strongly suggests you're using C floats instead of Python floats to compute the final score. I didn't try every example in your email, but it's the same story on the three I did try (scores identical modulo simil.ratcliff dropping about 30 of the low-order result bits -- which is about the difference between a C double and a C float on most boxes). > Mine's even documented. :-). Which I appreciate! 
I dreamt up the SequenceMatcher algorithm going on 20 years ago for a friendly diff generator, and never even considered using it for other purposes. But then I may have mentioned that these other purposes never come up in my apps . or-at-least-they-haven't-in-contexts-where-r/o-would-have-been- strong-enough-ly y'rs - tim From bckfnn at worldonline.dk Sun Jan 14 20:00:33 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Sun, 14 Jan 2001 19:00:33 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <010f01c07e52$e9801fc0$e46940d5@hagrid> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: <3a61f12a.36601630@smtp.worldonline.dk> On Sun, 14 Jan 2001 18:53:32 +0100, you wrote: >The name database portions of SF task 17335 ("add >compressed unicode database") were postponed to >2.1. > >My current patch replaces the ~450k large ucnhash >module with a new ~160k large module. (See earlier >posts for more info on how the new database works). Do you have a link or an approx date of this earlier posts? I must have missed it. The patch on sourceforge seems a bit empty: https://sourceforge.net/patch/index.php?func=detailpatch&patch_id=100899&group_id=5470 As a result I invented my own compression format for the ucnhash for jython. I managed to achive ~100k but that probably have different performance properties. regards, finn From esr at thyrsus.com Sun Jan 14 20:09:01 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 14:09:01 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 02:00:21PM -0500 References: <20010114071533.A5812@thyrsus.com> Message-ID: <20010114140901.A6431@thyrsus.com> Tim Peters : > > I think you've just made an argument for replacing your > > SequenceMatcher with simil.ratcliff. > > Actually, I'm certain they're the same algorithm now, except the C is > showing through in ratcliff to the floating-point eye . 
Take a look:

/*****************************************************************************
 *
 * Ratcliff-Obershelp common-subpattern similarity.
 *
 * This code first appeared in a letter to the editor in Doctor
 * Dobbs's Journal, 11/1988. The original article on the algorithm,
 * "Pattern Matching by Gestalt" by John Ratcliff, had appeared in the
 * July 1988 issue (#181) but the algorithm was presented in assembly.
 * The main drawback of the Ratcliff-Obershelp algorithm is the cost
 * of the pairwise comparisons. It is significantly more expensive
 * than stemming, Hamming distance, soundex, and the like.
 *
 * Running time quadratic in the data size, memory usage constant.
 *
 *****************************************************************************/

static int RatcliffObershelp(char *st1, char *end1, char *st2, char *end2)
{
    register char *a1, *a2;
    char *b1, *b2;
    char *s1 = st1, *s2 = st2;    /* initializations are just to pacify GCC */
    short max, i;

    if (end1 <= st1 || end2 <= st2)
        return(0);
    if (end1 == st1 + 1 && end2 == st2 + 1)
        return(0);

    max = 0;
    b1 = end1;
    b2 = end2;

    for (a1 = st1; a1 < b1; a1++)
    {
        for (a2 = st2; a2 < b2; a2++)
        {
            if (*a1 == *a2)
            {
                /* determine length of common substring */
                for (i = 1; a1[i] && (a1[i] == a2[i]); i++)
                    continue;
                if (i > max)
                {
                    max = i;
                    s1 = a1;
                    s2 = a2;
                    b1 = end1 - max;
                    b2 = end2 - max;
                }
            }
        }
    }

    if (!max)
        return(0);

    max += RatcliffObershelp(s1 + max, end1, s2 + max, end2);    /* rhs */
    max += RatcliffObershelp(st1, s1, st2, s2);                  /* lhs */

    return max;
}

static float ratcliff(char *s1, char *s2)
/* compute Ratcliff-Obershelp similarity of two strings */
{
    short l1, l2;

    l1 = strlen(s1);
    l2 = strlen(s2);

    /* exact match end-case */
    if (l1 == 1 && l2 == 1 && *s1 == *s2)
        return(1.0);

    return 2.0 * RatcliffObershelp(s1, s1 + l1, s2, s2 + l2) / (l1 + l2);
}

static PyObject *
simil_ratcliff(PyObject *self, PyObject *args)
{
    char *str1, *str2;

    if(!PyArg_ParseTuple(args, "ss:ratcliff", &str1, &str2))
        return NULL;

    return Py_BuildValue("f", ratcliff(str1, str2));
}

-- Eric S. Raymond "Taking my gun away because I might shoot someone is like cutting my tongue out because I might yell `Fire!' in a crowded theater." -- Peter Venetoklis From fredrik at effbot.org Sun Jan 14 20:31:06 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sun, 14 Jan 2001 20:31:06 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3a61f12a.36601630@smtp.worldonline.dk> Message-ID: <040e01c07e60$8c74d100$e46940d5@hagrid> finn wrote: > As a result I invented my own compression format for the ucnhash for > jython. I managed to achive ~100k but that probably have different > performance properties. here's the description: --- From: "Fredrik Lundh" Date: Sun, 16 Jul 2000 20:40:46 +0200 /.../ The unicodenames database consists of two parts: a name database which maps character codes to names, and a code database, mapping names to codes. * The Name Database (getname) First, the 10538 text strings are split into 42193 words, and combined into a 4949-word lexicon (a 29k array). Each word is given a unique index number (common words get lower numbers), and there's a "lexicon offset" table mapping from numbers to words (10k). To get back to the original text strings, I use a "phrase book". For each original string, the phrase book stores a list of word numbers. Numbers 0-127 are stored in one byte, higher numbers (less common words) use two bytes. At this time, about 65% of the words can be represented by a single byte. The result is a 56k array. The final data structure is an offset table, which maps code points to phrase book offsets. Instead of using one big table, I split each code point into a "page number" and a "line number" on that page. 
offset = line[ (page[code>>SHIFT]< From tim.one at home.com Sun Jan 14 20:46:44 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 14:46:44 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: <3A61AAA9.F6F1EA9F@lemburg.com> Message-ID: [M.-A. Lemburg] > BTW, are there less English centric "sounds alike" matchers > around ? Yes, but if anything there are far too many of them: like Soundex, they're just heuristics, and *everybody* who cares adds their own unique twists, while proper studies are almost non-existent. Few variants appear to be in use much beyond their inventor's friends; one notable exception in the Jewish community is the Daitch-Mokotoff variation, originally tailored to their unique needs but later generalized; a brief description here: http://www.avotaynu.com/soundex.html The similarly involved NYSIIS algorithm (New York State Identification Intelligence System -- look for NYSIIS on Parnassus) was the winner from a field of about two dozen competing algorithms, after measuring their effectiveness on assorted databases maintained by the state of New York. Since New York has a large immigrant population, NYSIIS isn't as Anglocentric as Soundex either. But state-of-the-art has given up on purely computational algorithms for these purposes: proper names are simply too much a mess. For example, if I search for "Richard", it *ought* to match on "Dick"; if my Arab buddy searches on "Mohammed", it *ought* to match on "Mhd"; "the rules" people actually use just aren't reducible to pure computation -- it takes a large knowledge base to capture what people "just know". You may enjoy visiting this commercial site (AFAIK, nobody is giving away state-of-the-art for free): http://www.las-inc.com/ > ... > http://physics.nist.gov/cuu/Reference/soundex.html > > works fine for English texts, If that were true, the English-speaking researchers would have declared victory 120 years ago . 
But English pronunciation is *notoriously* difficult to predict from spelling, partly because English is the Perl of human languages. or-maybe-the-borg-assuming-there's-a-difference-ly y'rs - tim From esr at thyrsus.com Sun Jan 14 21:17:53 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 15:17:53 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 02:46:44PM -0500 References: <3A61AAA9.F6F1EA9F@lemburg.com> Message-ID: <20010114151753.A6671@thyrsus.com> Tim Peters : > If that were true, the English-speaking researchers would have declared > victory 120 years ago . But English pronunciation is *notoriously* > difficult to predict from spelling, partly because English is the Perl of > human languages. Actually, according to the Oxford Encyclopedia of Linguistics, this is an urban myth. The orthography of English is, in fact, quite consistent; it looks much more wacked out than it is because the maddening irregularities are concentrated in the 400 most commonly used words. The situation is much like that with French verb forms -- most French verbs have a very regular inflection pattern, but the twenty or so exceptions are the most commonly used ones. In fact it's a general rule in language evolution that irregularities are preserved in common forms and not rare ones -- in the rare ones they get forgotten. American personal names are a problem precisely because they sometimes do *not* have English orthography. -- Eric S. Raymond "...quemadmodum gladius neminem occidit, occidentis telum est." [...a sword never kills anybody; it's a tool in the killer's hand.] -- (Lucius Annaeus) Seneca "the Younger" (ca. 4 BC-65 AD), From tim.one at home.com Sun Jan 14 21:31:06 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 15:31:06 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? 
In-Reply-To: <20010114140901.A6431@thyrsus.com> Message-ID: [Tim] > Actually, I'm certain they're the same algorithm now, except the C is > showing through in ratcliff to the floating-point eye . [Eric] > Take a look: Yup, same thing, except: > static float ratcliff(char *s1, char *s2) accounts for the numeric differences (change "float"->"double" and they'd be the same; Python has to convert it to a double anyway, lacking any internal support for C's floats; and the C code is *computing* in double regardless, cutting it back to a float upon return just because of the "float" decl). The code in SequenceMatcher doesn't *look* anything like it, though, due to years of dreaming up faster ways to do this (in its original role as a diff generator, it routinely had to deal with sequences containing 10s of thousands of elements, and code very much like the code you posted was just too slow for that). One simple trick that can enormously speed the worst cases: the "find the longest match starting here" innermost loop is guarded by > if (*a1 == *a2) However, it can't possibly find a *bigger* max unless it's also the case that a1[max] == a2[max] That's usually false in real life, so by adding that test to the guard you usually get to skip the innermost loop entirely. Probably more important in a diff-generator role, though. SequenceMatcher's prime trick is to preprocess one of the strings, in linear time building up a hash table mapping each character in the string to a list of the indices at which it appears. Then the second-innermost loop is saved from needing to do any search: when we get to, e.g., 'x' in the other string, the precomputed hash table tells us directly where to find all the x's in the original string. And in the match-1-against-N case, this hash table can be computed once & reused N times. That's a monster win. However, I never had the patience to code that in C, so I never *did* that before I reimplemented my stuff in Python. 
Now the Python ndiff runs circles around the old Pascal and C versions. I'm sure that has nothing to do with machines having gotten 100x faster in the meantime > for-short-1-against-1-matches-yours-will-certainly-be-quicker-ly y'rs - tim From guido at python.org Sun Jan 14 21:55:21 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 15:55:21 -0500 Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: Your message of "Sun, 14 Jan 2001 11:51:52 CST." <14945.59192.400783.403810@beluga.mojam.com> References: <14945.59192.400783.403810@beluga.mojam.com> Message-ID: <200101142055.PAA13041@cj20424-a.reston1.va.home.com> > Ping's pydoc is awesome! Move it out of the sandbox and put it in the > standard distribution. > > Biggest hook for me: > > 1. execute "pydoc -p 3200" > 2. visit "http://localhost:3200/" > 3. knock yourself out Yes, wow! Now, if we could somehow get this to show both the docs that Fred maintains and the stuff that Ping extracts from the source code, that would be even better! (I think that Ping's stuff should also run on the python.org site, by the way.) --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Sun Jan 14 21:59:28 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Sun, 14 Jan 2001 15:59:28 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? In-Reply-To: ; from tim.one@home.com on Sun, Jan 14, 2001 at 03:31:06PM -0500 References: <20010114140901.A6431@thyrsus.com> Message-ID: <20010114155928.A6793@thyrsus.com> Tim Peters : > [Tim] > > Actually, I'm certain they're the same algorithm now, except the C is > > showing through in ratcliff to the floating-point eye . 
> > [Eric] > > Take a look: > > Yup, same thing, except: > > > static float ratcliff(char *s1, char *s2) > > accounts for the numeric differences (change "float"->"double" and they'd be > the same; Python has to convert it to a double anyway, lacking any internal > support for C's floats; and the C code is *computing* in double regardless, > cutting it back to a float upon return just because of the "float" decl). OK, so the right answer is to make your version visible and documented in the library. -- Eric S. Raymond No one is bound to obey an unconstitutional law and no courts are bound to enforce it. -- 16 Am. Jur. Sec. 177 late 2d, Sec 256 From tim.one at home.com Sun Jan 14 22:01:19 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 16:01:19 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Message-ID: [?!ng] > [1] We started with 4. Na, *we* started with two, just ' and ". And at the time, I thought that was arguably one too many already . Allowing the modifiers to be case-insensitive seems to me much more Pythonic than the original sin of making ' and " mean the same thing. OTOH, if only " had been allowed at the start, we'd probably spell raw strings with ' today, and that doesn't really scream that they're so very different from " strings. leaving-this-one-be-ly y'rs - tim From barry at digicool.com Sun Jan 14 22:02:07 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sun, 14 Jan 2001 16:02:07 -0500 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> Message-ID: <14946.5071.92879.789400@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> Ping's pydoc is awesome! Move it out of the sandbox and put SM> it in the standard distribution. SM> Biggest hook for me: | 1. execute "pydoc -p 3200" | 2. visit "http://localhost:3200/" | 3. knock yourself out Whoa. Awesome. 
From ping at lfw.org Sun Jan 14 22:01:45 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:01:45 -0800 (PST) Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: <200101141708.MAA11161@cj20424-a.reston1.va.home.com> Message-ID: On Sun, 14 Jan 2001, Guido van Rossum wrote: > > It comes from the numeric literals. C allows 0x0 and 0X0, and 0L as > well as 0l. So does Python (and also 0j == 0J). I just did a little test. Neither Python, Perl, nor Tcl support "\X66", only "\x66". Perl doesn't support 0X1234, only 0x1234. Tcl's "expr" routine does support 0X1234. Javascript supports 0X1234, but not "\X66". I'd bet that no one really relies on or expects the uppercase forms except L. -- ?!ng From ping at lfw.org Sun Jan 14 22:14:34 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:14:34 -0800 (PST) Subject: [Python-Dev] Re: pydoc.py (show docs both inside and outside of Python) In-Reply-To: <14942.29609.19618.534613@cj42289-a.reston1.va.home.com> Message-ID: On Thu, 11 Jan 2001, Fred L. Drake, Jr. wrote: > Ka-Ping Yee writes: > > My next two targets are: > > 1. Generating text from the HTML documentation files > > using Paul Prescod's stuff in onlinehelp.py. > > You mean the ones I publish as the standard documentation? Relying > on the structure of that HTML is pure folly! Paul's onlinehelp.py is using the HTMLParser and AbstractFormatter to turn HTML into text. It also contains paths to specific files, e.g. help('assert') looks for "ref/assert.html". Are you okay with this technique? Have you tried onlinehelp.py? I was planning to do the same to provide help on the language in pydoc. 
-- ?!ng From skip at mojam.com Sun Jan 14 22:26:48 2001 From: skip at mojam.com (Skip Montanaro) Date: Sun, 14 Jan 2001 15:26:48 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <200101142055.PAA13041@cj20424-a.reston1.va.home.com> References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> Message-ID: <14946.6552.542015.620760@beluga.mojam.com> Guido> Now, if we could somehow get this to show both the docs that Fred Guido> maintains and the stuff that Ping extracts from the source code, Guido> that would be even better! I had exactly the same thought. I suspect that if the install target were modified to install the html-ized sections of the lib reference manual pydoc could grovel around in sys and find the root of the library reference manual pretty easily. If not, it could simply redirect to the relevant section of http://www.python.org/doc/current/lib/. Skip From tim.one at home.com Sun Jan 14 22:45:48 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 16:45:48 -0500 Subject: [Python-Dev] Why both r'' and R'', u'' and U''? In-Reply-To: Message-ID: [?!ng] > ... > I'd bet that no one really relies on or expects the uppercase > forms except L. And 0X. I don't think it's in the std library, but I've certainly seen Python code do stuff like magic = 0XFEEDFACE Plus it's always good for a language to be able parse the stuff it prints, and "0X..." is generated by Python's %#X format code. Don't believe I've ever seen the "u" or "r" string modifiers in uppercase, though, but really don't see the harm in allowing that. From ping at lfw.org Sun Jan 14 22:50:43 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 13:50:43 -0800 (PST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <14946.5071.92879.789400@anthem.wooz.org> Message-ID: On Sun, 14 Jan 2001, Barry A. Warsaw wrote: > Whoa. Awesome. Thanks! 
Two things added recently: constants (any numbers, lists, tuples, strings, or types) in modules are shown; and packages are listed in the index as they should be. -- ?!ng From bckfnn at worldonline.dk Sun Jan 14 23:20:51 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Sun, 14 Jan 2001 22:20:51 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <040e01c07e60$8c74d100$e46940d5@hagrid> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3a61f12a.36601630@smtp.worldonline.dk> <040e01c07e60$8c74d100$e46940d5@hagrid> Message-ID: <3a622615.50148579@smtp.worldonline.dk> [/F] >here's the description: Thanks. >From: "Fredrik Lundh" >Date: Sun, 16 Jul 2000 20:40:46 +0200 > >/.../ > > The unicodenames database consists of two parts: a name > database which maps character codes to names, and a code > database, mapping names to codes. > >* The Name Database (getname) > > First, the 10538 text strings are split into 42193 words, > and combined into a 4949-word lexicon (a 29k array). I only added a word to the lexicon if it was used more than once and if the length was larger then the lexicon index. I ended up with 1385 entries in the lexicon. (a 7k array) > Each word is given a unique index number (common words get > lower numbers), and there's a "lexicon offset" table mapping > from numbers to words (10k). My lexicon offset table is 3k and I also use 4k on a perfect hash of the words. > To get back to the original text strings, I use a "phrase > book". For each original string, the phrase book stores a a > list of word numbers. Numbers 0-127 are stored in one byte, > higher numbers (less common words) use two bytes. At this > time, about 65% of the words can be represented by a single > byte. The result is a 56k array. Because not all words are looked up in the lexicon, I used the values 0-38 for the letters and number, 39-250 are used for one byte lexicon index, and 251-255 are combined with following byte to form a two byte. 
This also result in a 57k array So far it is only minor variations. > The final data structure is an offset table, which maps code > points to phrase book offsets. Instead of using one big > table, I split each code point into a "page number" and a > "line number" on that page. > > offset = line[ (page[code>>SHIFT]< > Since the unicode space is sparsely populated, it's possible > to split the code so that lots of pages gets no contents. I > use a brute force search to find the optimal SHIFT value. > > In the current database, the page table has 1024 entries > (SHIFT is 6), and there are 199 unique pages in the line > table. The total size of the offset table is 26k. > >* The code database (getcode) > > For the code table, I use a straight-forward hash table to store > name to code mappings. It's basically the same implementation > as in Python's dictionary type, but a different hash algorithm. > The table lookup loop simply uses the name database to check > for hits. > > In the current database, the hash table is 32k. I chose to split a unicode name into words even when looking up a unicode name. Each word is hashed to a lexicon index and a "phrase book string" is created. The sorted phrase book is then search with a binary search among 858 entries that can be address directly followed by a sequential search among 12 entries. The phrase book search index is 8k and a table that maps phrase book indexes to codepoints is another 20k. The searching I do makes jython slower then the direct calculation you do. I'll take another look at this after jython 2.0 to see if I can improve performance with your page/line number scheme and a total hashing of all the unicode names. regards, finn From ping at lfw.org Sun Jan 14 23:44:47 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 14 Jan 2001 14:44:47 -0800 (PST) Subject: [Python-Dev] SourceForge and long patches Message-ID: Okay, this is getting really annoying. SourceForge won't accept any patches > 16k. Why not? 
Is there a way around this? SourceForge: Exiting with Error ERROR Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 I'm trying to submit the update to tokenize.py, but it's too long because i've changed test/output/test_tokenize and that's a big file. -- ?!ng From guido at python.org Sun Jan 14 23:58:03 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 17:58:03 -0500 Subject: [Python-Dev] SourceForge and long patches In-Reply-To: Your message of "Sun, 14 Jan 2001 14:44:47 PST." References: Message-ID: <200101142258.RAA13606@cj20424-a.reston1.va.home.com> > Okay, this is getting really annoying. SourceForge won't accept > any patches > 16k. Why not? Is there a way around this? I have no idea why; can only assume it's a limitation in the database package they use. The standard workaround is to upload a URL pointing to the patch. :-( > SourceForge: Exiting with Error > > ERROR > > Patch Uploaded ERROR - Submission failed PQsendQuery() -- query is too long. Maximum length is 16382 --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jan 15 00:35:51 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 00:35:51 +0100 Subject: [Python-Dev] Where's Greg Ward ? Message-ID: <3A6237D7.673BBB30@lemburg.com> He seems to be offline and the people on the distutils list have some patches and other things which would be nice to have in distutils for 2.1. I suppose we could simply check in the patches, but we still want to get his OK on things before applying patches to the distutils tree. 
Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 15 00:57:45 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 14 Jan 2001 18:57:45 -0500 Subject: [Python-Dev] Where's Greg Ward ? In-Reply-To: <3A6237D7.673BBB30@lemburg.com> Message-ID: [MAL] > He seems to be offline and the people on the distutils list have > some patches and other things which would be nice to have in > distutils for 2.1. Greg's somewhere near the end of the process of moving from Virginia to Canada; I expect he'll become visible again Real Soon. > I suppose we could simply check in the patches, but we still want > to get his OK on things before applying patches to the distutils > tree. The distutils SIG could elect a Shadow Dictator in his place; if everyone agrees to vote for Andrew, you save the effort of counting votes . From tismer at tismer.com Mon Jan 15 02:35:57 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 15 Jan 2001 02:35:57 +0100 Subject: [Python-Dev] Minor Bug-fix release for Stackless Python 2.0 Message-ID: <3A6253FD.E9B30462@tismer.com> Wolfgang Lipp reported that Microthreads were executing sequentially with SLP 2.0 . The bug fix is available on the website. Please use this new version, or microthreads will not give you much fun. http://www.stackless.com/spc20-win32.exe http://www.stackless.com/spc-src-010115.zip enjoy - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From tommy at ilm.com Mon Jan 15 03:18:20 2001 From: tommy at ilm.com (Captain Senorita) Date: Sun, 14 Jan 2001 18:18:20 -0800 (PST) Subject: [Python-Dev] chomp()? In-Reply-To: <14923.31238.65155.496546@buffalo.fnal.gov> References: <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> Message-ID: <14946.23981.694472.406438@mace.lucasdigital.com> Charles G Waldman writes: | | P=NP (Python is not Perl) Is it too late to suggest this for the SPAM9 t-shirt? :) From guido at python.org Mon Jan 15 03:24:36 2001 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Jan 2001 21:24:36 -0500 Subject: [Python-Dev] chomp()? In-Reply-To: Your message of "Sun, 14 Jan 2001 18:18:20 PST." <14946.23981.694472.406438@mace.lucasdigital.com> References: <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> <14946.23981.694472.406438@mace.lucasdigital.com> Message-ID: <200101150224.VAA15254@cj20424-a.reston1.va.home.com> > Charles G Waldman writes: > | > | P=NP (Python is not Perl) > > Is it too late to suggest this for the SPAM9 t-shirt? :) By just about a day -- I haven't seen the new design yet, but Just & Eric were supposed to design it today and hand in the final proofs tomorrow. I believe the slogan will be "it fits your brain" (or "it fits my brain"). But if you print a bunch of P=NP shirts, I'm sure you can sell them with a profit, both in Long Beach and in San Diego (at the O'Reilly Open Source conference)... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Mon Jan 15 07:35:05 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 01:35:05 -0500 Subject: [Python-Dev] xreadline speed vs readlines_sizehint In-Reply-To: <20010110101545.A21305@glacier.fnational.com> Message-ID: [Timmy] > At this point I'm +0.5 on the idea of fileobject.c using > ms_getline_hack whenever HAVE_GETC_UNLOCKED isn't available. 
[NeilS, from Wednesday] > Compare ms_getline_hack to what Perl does in order to speed up IO. Believe me, I have . > I think it's worth maintaining that piece of relatively portable > code given the benefit. If the code has to be maintained then it > might as well be used. If we find a platform that breaks we can > always disable it before the final release. Given that hearty encouragement, and the utterly non-scary results so far, I just checked in a new scheme: On a platform with getc_unlocked(): By default, use getc_unlocked(). If you want to use fgets() instead, #define USE_FGETS_IN_GETLINE. [so motivated people can use fgets() instead if it's faster on their platform] On a platform without getc_unlocked(): By default, use fgets(). If you don't want to use fgets(), #define DONT_USE_FGETS_IN_GETLINE. [so if we stumble into a platform it fails on between releases, the user will have an easy time turning it off themselves] From gstein at lyra.org Mon Jan 15 08:18:20 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 14 Jan 2001 23:18:20 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Sat, Jan 13, 2001 at 08:55:35AM -0800 References: Message-ID: <20010114231820.C6081@lyra.org> On Sat, Jan 13, 2001 at 08:55:35AM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Lib > In directory usw-pr-cvs1:/tmp/cvs-serv14586 > > Modified Files: > httplib.py > Log Message: > SF Patch #103225 by Ping: httplib: smallest Python patch ever >... Not so small: >... > *** 333,337 **** > i = host.find(':') > if i >= 0: > ! port = int(host[i+1:]) > host = host[:i] > else: > --- 333,340 ---- > i = host.find(':') > if i >= 0: > ! try: > ! port = int(host[i+1:]) > ! except ValueError, msg: > ! raise socket.error, str(msg) > host = host[:i] > else: Did you intend to commit this?
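For readers following the diff quoted above, here is a minimal sketch of what the changed lines do, written in present-day Python rather than the Python 2.0 of the thread. It is an illustration only and not the httplib source: the helper name `split_host_port` and the default port of 80 are assumptions, and `OSError` stands in for the `socket.error` of the patch (the two are the same class in current Python 3).

```python
def split_host_port(host, default_port=80):
    """Split 'host:port' into (host, port).

    Hypothetical helper for illustration; default_port=80 is an assumption.
    """
    i = host.find(':')
    if i >= 0:
        try:
            port = int(host[i + 1:])
        except ValueError as msg:
            # The patch under discussion turned a malformed port
            # into a socket-level error instead of a ValueError.
            raise OSError(str(msg)) from None
        host = host[:i]
    else:
        port = default_port
    return host, port
```

Without the try/except, a malformed port such as "example.com:http" surfaces as a plain ValueError from int(); with it, callers that only catch socket errors see the failure too. Which of the two is preferable is exactly what the thread leaves open.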
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at zadka.site.co.il Mon Jan 15 16:53:58 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 17:53:58 +0200 (IST) Subject: [Python-Dev] chomp()? In-Reply-To: <200101150224.VAA15254@cj20424-a.reston1.va.home.com> References: <200101150224.VAA15254@cj20424-a.reston1.va.home.com>, <200012281504.KAA25892@cj20424-a.reston1.va.home.com> <14923.31238.65155.496546@buffalo.fnal.gov> <14946.23981.694472.406438@mace.lucasdigital.com> Message-ID: <20010115155358.86E5AA828@darjeeling.zadka.site.co.il> On Sun, 14 Jan 2001 21:24:36 -0500, Guido van Rossum wrote: > But if you print a bunch of P=NP shirts, I'm sure you can sell them > with a profit, both in Long Beach and in San Diego (at the O'Reilly > Open Source conference)... And the Libre Software Meeting (http://lsm.abul.org), which has a Python subtopic too. (Since it's in France, no one is calling it "free", so it's probable you can sell those T-shirts there...) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal at lemburg.com Mon Jan 15 10:44:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 10:44:14 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: <3A62C66E.2BB69E61@lemburg.com> Fredrik Lundh wrote: > > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? Since the Unicode character names are probably not used for performance sensitive tasks, I suggest to checkin the smallest version possible. 
If it is too much work to get Finn's version recoded in C (presuming it's written in Java), then I'd suggest checking in your version until someone comes up with a yet smaller edition. Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 15 10:48:49 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 10:48:49 +0100 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> <14946.6552.542015.620760@beluga.mojam.com> Message-ID: <3A62C781.22240D3C@lemburg.com> Skip Montanaro wrote: > > Guido> Now, if we could somehow get this to show both the docs that Fred > Guido> maintains and the stuff that Ping extracts from the source code, > Guido> that would be even better! > > I had exactly the same thought. I suspect that if the install target were > modified to install the html-ized sections of the lib reference manual pydoc > could grovel around in sys and find the root of the library reference manual > pretty easily. If not, it could simply redirect to the relevant section of > http://www.python.org/doc/current/lib/. Since Fred remarked that the URLs for the different docs are not fixed, how about adding a __onlinedocs__ attribute to the standard Python modules providing the correct URL ? Or, alternatively, pass the module's name through some Google like "I feel lucky" documentation search engine... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 15 10:51:40 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Mon, 15 Jan 2001 10:51:40 +0100 Subject: [Python-Dev] Where's Greg Ward ? References: Message-ID: <3A62C82C.EA25AAF5@lemburg.com> [CCed to distutils, since it matters there] Tim Peters wrote: > > [MAL] > > He seems to be offline and the people on the distutils list have > > some patches and other things which would be nice to have in > > distutils for 2.1. > > Greg's somewhere near the end of the process of moving from Virginia to > Canada; I expect he'll become visible again Real Soon. Great :) > > I suppose we could simply check in the patches, but we still want > > to get his OK on things before applying patches to the distutils > > tree. > > The distutils SIG could elect a Shadow Dictator in his place; if everyone > agrees to vote for Andrew, you save the effort of counting votes . Ok, let's agree to vote for Andrew :) Andrew, is that OK with you ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 15 11:52:09 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 05:52:09 -0500 Subject: [Python-Dev] RE: xreadline speed vs readlines_sizehint In-Reply-To: <3A5D602D.9DC991CB@per.dem.csiro.au> Message-ID: [Mark Favas] > ... > The lines range in length from 96 to 747 characters, with > 11% @ 233, 17% @ 252 and 52% @ 254 characters, so #1 [a vendor > who actually optimized fgets()] looks promising - most lines are > long enough to trigger a realloc. Plus as soon as you spill over the stack buffer, I make you pay for filling 1024 new bytes with newlines before the next fgets() call, and almost all of those are irrelevant to you. It doesn't degrade gracefully. 
Alas, I tried several "adaptive" schemes (adjusting how much of the initial segment of a larger stack buffer they would use, based on the actual line lengths seen in the past), but the costs always exceeded the savings on my box. > Cranking up INITBUFSIZE in ms_getline_hack to 260 from 200 > improves thing again, by another 25%: > total 131426612 chars and 514216 lines > count_chars_lines 5.081 5.066 > readlines_sizehint 3.743 3.717 > using_fileinput 11.113 11.100 > while_readline 6.100 6.083 > for_xreadlines 3.027 3.033 Well, I couldn't let you forego *all* of 25%. The current fileobject.c has a stack buffer of 300 bytes, but only uses 100 of them on the first gets() call. On a very quiet machine, that saved 3-4% of the runtime on *my* test case, whose line lengths are typical of the text files I crunch over, so I'm happy for me. If 100 bytes aren't enough, it must call fgets() again, but just appends the next call into the full 300-byte buffer. So it saves the realloc for lines under 300 chars. > Apart from the name , I like ms_getline_hack... Ya, it's now the non-pejorative getline_via_fgets(). I hate that I became a grown-up <0.9 wink>. time-to-pick-wings-off-of-flies-ly y'rs - tim From ping at lfw.org Mon Jan 15 12:11:16 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 03:11:16 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: <20010114231820.C6081@lyra.org> Message-ID: On Sun, 14 Jan 2001, Greg Stein wrote: > Not so small: > > >... > > *** 333,337 **** > > i = host.find(':') > > if i >= 0: > > ! port = int(host[i+1:]) > > host = host[:i] > > else: > > --- 333,340 ---- > > i = host.find(':') > > if i >= 0: > > ! try: > > ! port = int(host[i+1:]) > > ! except ValueError, msg: > > ! raise socket.error, str(msg) > > host = host[:i] > > else: The above changes were not part of the patch i submitted; the patch i submitted was exactly a one-character change. 
Guido has already edited the file, so there's no need to commit anything further here. -- ?!ng From mal at lemburg.com Mon Jan 15 12:56:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 12:56:37 +0100 Subject: [Python-Dev] Why is soundex marked obsolete? References: Message-ID: <3A62E575.9A584108@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > BTW, are there less English centric "sounds alike" matchers > > around ? > > Yes, but if anything there are far too many of them: like Soundex, they're > just heuristics, and *everybody* who cares adds their own unique twists, > while proper studies are almost non-existent. Few variants appear to be in > use much beyond their inventor's friends; one notable exception in the > Jewish community is the Daitch-Mokotoff variation, originally tailored to > their unique needs but later generalized; a brief description here: > > http://www.avotaynu.com/soundex.html > > The similarly involved NYSIIS algorithm (New York State Identification > Intelligence System -- look for NYSIIS on Parnassus) was the winner from a > field of about two dozen competing algorithms, after measuring their > effectiveness on assorted databases maintained by the state of New York. > Since New York has a large immigrant population, NYSIIS isn't as > Anglocentric as Soundex either. Thanks for the pointer. I'll add that module to my lib :) http://metagram.webreply.com/downloads/nysiis.py Perhaps Eric ought to add this one to his package as well ?! BTW, where can I find your package on the web, Eric ? I'd like to give it a ride under German language conditions ;) > But state-of-the-art has given up on purely computational algorithms for > these purposes: proper names are simply too much a mess. 
For example, if I > search for "Richard", it *ought* to match on "Dick"; if my Arab buddy > searches on "Mohammed", it *ought* to match on "Mhd"; "the rules" people > actually use just aren't reducible to pure computation -- it takes a large > knowledge base to capture what people "just know". You may enjoy visiting > this commercial site (AFAIK, nobody is giving away state-of-the-art for > free): > > http://www.las-inc.com/ Sad -- "patent pending" algorithms don't help anyone on this planet :( > > ... > > http://physics.nist.gov/cuu/Reference/soundex.html > > > > works fine for English texts, > > If that were true, the English-speaking researchers would have declared > victory 120 years ago . But English pronunciation is *notoriously* > difficult to predict from spelling, partly because English is the Perl of > human languages. Then Dutch must be the Python of human languages... ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at zadka.site.co.il Mon Jan 15 21:13:18 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:13:18 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 In-Reply-To: References: Message-ID: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> On Sun, 14 Jan 2001 19:26:38 -0800, Tim Peters wrote: > Modified Files: > tabnanny.py > Log Message: > Whitespace normalization. hmmmmmm....... -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From mal at lemburg.com Mon Jan 15 13:10:30 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Mon, 15 Jan 2001 13:10:30 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 References: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: <3A62E8B6.3DFC1FA2@lemburg.com> Moshe Zadka wrote: > > On Sun, 14 Jan 2001 19:26:38 -0800, Tim Peters wrote: > > Modified Files: > > tabnanny.py > > Log Message: > > Whitespace normalization. > > hmmmmmm....... Perhaps you ought to make this a CRON job ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at zadka.site.co.il Mon Jan 15 21:24:48 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:24:48 +0200 (IST) Subject: [Python-Dev] Someone should be shot In-Reply-To: <3A62E8B6.3DFC1FA2@lemburg.com> References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: <20010115202448.38F60A828@darjeeling.zadka.site.co.il> I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! Of course, the real culprit is the person who fixed up the reply-to in the checkin messages to point to python-dev. Why was it done, and isn't there a better way? This makes it painful to personally comment on people's checkin messages. I suggest instead to add a mail-followup-to header (Didn't anyone read "Reply-To Munging Considered Harmful"?) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From esr at thyrsus.com Mon Jan 15 13:23:25 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 07:23:25 -0500 Subject: [Python-Dev] Why is soundex marked obsolete? 
In-Reply-To: <3A62E575.9A584108@lemburg.com>; from mal@lemburg.com on Mon, Jan 15, 2001 at 12:56:37PM +0100 References: <3A62E575.9A584108@lemburg.com> Message-ID: <20010115072325.A10377@thyrsus.com> M.-A. Lemburg : > Perhaps Eric ought to add this one to his package as well ?! Actually, at this point, my plan is to give Tim a decent interval to refactor ndiff so his SequenceMatcher class is exposed and documented -- otherwise *I'll* go in and do it (har! waving a bloody knife!). His turns out to be the same as the Ratcliff-Obershelp technique I was using, except Tim had his bullshit threshold set too low (:-)) and let through matches I wouldn't have. -- Eric S. Raymond The only purpose for which power can be rightfully exercised over any member of a civilized community, against his will, is to prevent harm to others. His own good, either physical or moral, is not a sufficient warrant -- John Stuart Mill, "On Liberty", 1859 From mal at lemburg.com Mon Jan 15 13:26:59 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 13:26:59 +0100 Subject: [Python-Dev] Re: Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <3A62EC93.9AA60ABA@lemburg.com> Moshe Zadka wrote: > > I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! > Of course, the real culprit is the person who fixed up the reply-to in > the checkin messages to point to python-dev. Why was it done, and > isn't there a better way? This makes it painful to personally comment > on people's checkin messages. I suggest instead to add a mail-followup-to > header > > (Didn't anyone read "Reply-To Munging Considered Harmful"?) Naa, noone needs to be shot in the foot ;) In fact I like it, that replies go to python-dev ... after all, that's where these things should be discussed. 
BTW, in case you misunderstood my reply: it would indeed make sense to automate these kinds of check (tabnanny et al). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at zadka.site.co.il Mon Jan 15 21:42:15 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 15 Jan 2001 22:42:15 +0200 (IST) Subject: [Python-Dev] Re: Someone should be shot In-Reply-To: <3A62EC93.9AA60ABA@lemburg.com> References: <3A62EC93.9AA60ABA@lemburg.com>, <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <20010115204215.84F0CA828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001 13:26:59 +0100, "M.-A. Lemburg" wrote: > In fact I like it, that replies go to python-dev ... after all, > that's where these things should be discussed. Well, that's the mailing list where things should be discussed. But when I press the "Reply" button (as opposed to "Reply to List" button) I expect my e-mail to go to the person originating the e-mail. Reply-To: means "I'd like to get replies to some other address". What if, say, a checkin message relates to some private topic I'd discussed with someone: I'd like to reply to him personally. I agree that responses to Python-Checkins should be handled on Python-Dev: that's what the mail-followup-to header is for. > BTW, in case you misunderstood my reply: it would indeed make > sense to automate these kinds of check (tabnanny et al). Oh, ok. The "cron" part threw me off (why cron?) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From barry at digicool.com Mon Jan 15 14:15:28 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 08:15:28 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
References: <3A62C82C.EA25AAF5@lemburg.com> Message-ID: <14946.63472.282750.828218@anthem.wooz.org> >>>>> "M" == M writes: >> The distutils SIG could elect a Shadow Dictator in his place; >> if everyone agrees to vote for Andrew, you save the effort of >> counting votes . M> Ok, let's agree to vote for Andrew :) M> Andrew, is that OK with you ? He's got my vote. I've been experiencing some weird problems with the distutils installation of pybsddb3 out of the current Python cvs tree. It'd be nice if the outstanding distutils patches are integrated before I dive in. I don't see anything relevant in patches or bugs, but I don't know if there are other repositories of distutils fixes (like the archives?). -Barry From barry at digicool.com Mon Jan 15 14:27:02 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 08:27:02 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <14946.64166.348139.425223@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> I'm sorry! I meant to reply to tim alone, and ended up MZ> spamming python-dev! Of course, the real culprit is the MZ> person who fixed up the reply-to in the checkin messages to MZ> point to python-dev. Why was it done, and isn't there a better MZ> way? This makes it painful to personally comment on people's MZ> checkin messages. I suggest instead to add a mail-followup-to MZ> header MZ> (Didn't anyone read "Reply-To Munging Considered Harmful"?) Or how about http://www.metasystema.org/essays/reply-to-useful.mhtml for a dissenting view. Of course Mail-Followup-To is completely non-standard, but even if it were, having the mailing list munge it in isn't recommended: http://cr.yp.to/proto/replyto.html Bottom line (IMHO), this is just something about email that is and will forever remain broken. 
Given that, it was voted a long while back to make Reply-To for checkins point to python-dev so until there's a hue and cry to change it back, I'll leave it as is. And yeah, it bites me sometimes too! -Barry From tony at lsl.co.uk Mon Jan 15 15:18:36 2001 From: tony at lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 15 Jan 2001 14:18:36 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message-ID: <002801c07efe$0c728a80$f05aa8c0@lslp7o.int.lsl.co.uk> Neat stuff. Ka-Ping Yee strikes again. And it works with Python 1.5.2. Running on NT (4.00.1381) in an "MS-DOS" window, using Python 1.5.2 installed in the effbot manner, it works, with the slight strangeness that if I do: python pydoc.py I get the documentation for OK, but it is preceded with a line claiming that: The system cannot find the path specified. I don't have the time to pursue this at the moment - it's possibly an artefact of our system? (one minor "prettiness" hack - those of us who have been tainted by Emacs Lisp programming tend to start module documentation off with a line of the form: .py -- information about the module which, when pydoc'ed, results in a NAME line which starts with twice... Of course, if I'm the only person doing this, I'll just have to, well, stop...) A request - a "-f" switch to allow the user to specify a particular Python file (i.e., something not on the PYTHONPATH). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) 
From jack at oratrix.nl Mon Jan 15 15:32:02 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 15 Jan 2001 15:32:02 +0100 Subject: [Python-Dev] Regarding Patch #103222: mv Python to PyCore In-Reply-To: Message by Guido van Rossum , Sat, 13 Jan 2001 17:33:34 -0500 , <200101132233.RAA03229@cj20424-a.reston1.va.home.com> Message-ID: <20010115143203.A44B63C2031@snelboot.oratrix.nl> Also note that the problem only occurs when trying to build a unix-Python out-of-the-box on MacOSX. If you're building a Carbon Python from the MacPython sources (something very few people can do right now:-) the executable isn't called "python". And when a real MacOSX-Python will be done it'll have all the nifty packaging stuff that will also make sure that there's nothing called "python" in the toplevel folder. And the two workarounds (1-Use a UFS filesystem, 2-Put a ".exe" extension in the Makefile) work fine for the mean time. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Mon Jan 15 15:33:23 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 09:33:23 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib httplib.py,1.26,1.27 In-Reply-To: Your message of "Sun, 14 Jan 2001 23:18:20 PST." <20010114231820.C6081@lyra.org> References: <20010114231820.C6081@lyra.org> Message-ID: <200101151433.JAA17944@cj20424-a.reston1.va.home.com> > >... > > *** 333,337 **** > > i = host.find(':') > > if i >= 0: > > ! port = int(host[i+1:]) > > host = host[:i] > > else: > > --- 333,340 ---- > > i = host.find(':') > > if i >= 0: > > ! try: > > ! port = int(host[i+1:]) > > ! except ValueError, msg: > > ! raise socket.error, str(msg) > > host = host[:i] > > else: > > Did you intend to commit this? Oops. 
That was a patch submitted a while ago that I applied as an experiment but then decided I didn't like (argument: why bother). I've reverted it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 15 15:40:30 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 09:40:30 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 22:24:48 +0200." <20010115202448.38F60A828@darjeeling.zadka.site.co.il> References: <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <200101151440.JAA18045@cj20424-a.reston1.va.home.com> > I'm sorry! I meant to reply to tim alone, and ended up spamming python-dev! > Of course, the real culprit is the person who fixed up the reply-to in > the checkin messages to point to python-dev. Why was it done, and > isn't there a better way? This makes it painful to personally comment > on people's checkin messages. I suggest instead to add a mail-followup-to > header > > (Didn't anyone read "Reply-To Munging Considered Harmful"?) I agree with you, but Barry (who set this up) seems to believe that there's a good reason to do it this way. Barry, do you still feel that way? The auto-reply-all has probably tripped me up more than anyone. Anyone else have a strong reason why this should be set? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Tue Jan 16 00:03:25 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 16 Jan 2001 01:03:25 +0200 (IST) Subject: [Python-Dev] Someone should be shot In-Reply-To: <14946.64166.348139.425223@anthem.wooz.org> References: <14946.64166.348139.425223@anthem.wooz.org>, <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> Message-ID: <20010115230325.1C7F5A828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001 08:27:02 -0500, barry at digicool.com (Barry A. Warsaw) wrote: > > Or how about > > http://www.metasystema.org/essays/reply-to-useful.mhtml If your mailer doesn't have this option, you should request it from its development team. Any mailer, whose development team refuses this simple request due to some ideological position, cannot be said to be reasonable. As some people here know, I'm my mailer's "development team". I refuse to add it due to an ideological position. Anyone who knows me knows I'm quite unreasonable. Hmmm....I'm not making much headway, am I ;-) > for a dissenting view. Of course Mail-Followup-To is completely > non-standard, but even if it were, having the mailing list munge it in > isn't recommended: > > http://cr.yp.to/proto/replyto.html This has no relevance to the current case, since python-checkin messages are machine-generated -- so this is closer to doing this in the script generating the checkin message, and only differs in implementation. > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! I won't continue this thread, but remember that my vote is "no".
I simply shudder at the thought that I might send someone e-mail with something like "nice bugfix. Didn't know you were back from the sex-change operation", and it would be broadcast out to all Python-Dev *and* the archives, for posterity. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From thomas at xs4all.net Mon Jan 15 16:31:22 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 16:31:22 +0100 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14946.64166.348139.425223@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 08:27:02AM -0500 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> Message-ID: <20010115163122.I1005@xs4all.nl> On Mon, Jan 15, 2001 at 08:27:02AM -0500, Barry A. Warsaw wrote: > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! I've said this before, on the Mailman-devel list, but I'll repeat it here for the record (in case this issue ever comes up for vote again :) The main bite (for me) is that to reply to a person in private, you have to cut&paste the 'From' header from the original mail, and edit your new mail's headers, in order to reply to a specific person. My mailer is mature enough to have a 'reply', 'reply-group' and 'reply-list' keybinding, so the 'Reply-To' only interferes. 
There probably is a 'reply-to-from-ignoring-replyto' keybinding in there, too, somewhere, or it could be added, but remembering to type that different key is almost as much trouble as typing the email address by hand ;P So, my vote, like Moshe's, is just back from a sex change, and reads 'no'. Recount-recount-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Mon Jan 15 16:38:01 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 10:38:01 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 08:27:02 EST." <14946.64166.348139.425223@anthem.wooz.org> References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> Message-ID: <200101151538.KAA21937@cj20424-a.reston1.va.home.com> > Bottom line (IMHO), this is just something about email that is and > will forever remain broken. Given that, it was voted a long while > back to make Reply-To for checkins point to python-dev so until > there's a hue and cry to change it back, I'll leave it as is. And > yeah, it bites me sometimes too! It sounds like a hue and cry to change it to me! It looks like it's time for a BDFL Pronouncement. I pronounce: Given that: - we all know how to mail to python-dev; - replying to the sender is by far the most common kind of reply; - the mistake of replying to the sender when a reply-all was intended does much less potential harm than the mistake of replying to all when reply-to-sender was intended, the reply-to header shall be removed. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Mon Jan 15 17:57:19 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 11:57:19 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
In-Reply-To: <14946.63472.282750.828218@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 08:15:28AM -0500 References: <3A62C82C.EA25AAF5@lemburg.com> <14946.63472.282750.828218@anthem.wooz.org> Message-ID: <20010115115719.B919@kronos.cnri.reston.va.us> On Mon, Jan 15, 2001 at 08:15:28AM -0500, Barry A. Warsaw wrote: >tree. It'd be nice if the outstanding distutils patches are >integrated before I dive in. I don't see anything relevant in patches >or bugs, but I don't know if there are other repositories of distutils >fixes (like the archives?). There are a few patches buried in the back archives, but I don't know of any outstanding bugfixes, so please report whatever problem you're seeing. Oh, and Barry, did the issue holding up your patch for adding shar support (#102313) ever get resolved? --amk From guido at python.org Mon Jan 15 17:02:39 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 11:02:39 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Mon, 08 Jan 2001 18:20:56 PST." <20010108182056.C4640@lyra.org> References: <20010108182056.C4640@lyra.org> Message-ID: <200101151602.LAA22272@cj20424-a.reston1.va.home.com> Greg Stein noticed me checking in *yet* another system that needs the fallback TELL64() definition in fileobject.c, and wrote: > All of those #ifdefs could be tossed and it would be more robust (long term) > if an autoconf macro were used to specify when TELL64 should be defined. > > [ I've looked thru fileobject.c and am a bit confused: the conditions for > defining TELL64 do not match the conditions for *using* it. that would > seem to imply a semantic error somewhere and/or a potential gotcha when > they get skewed (like I assume what happened to FreeBSD). simplifying with > an autoconf macro may help to rationalize it. ] I have a better idea. 
Since "lseek((fd),0,SEEK_CUR)" seems to be the universal fallback, why not just define TELL64 to be that if it's not previously defined (currently only MS_WIN64 has a different definition)? It isn't always *used* (the conditions under which _portable_fseek() uses it are quite complex), but *when* it is used, this seems to be the most common definition... Patch: *** fileobject.c 2001/01/15 10:36:56 2.106 --- fileobject.c 2001/01/15 16:02:06 *************** *** 58,66 **** /* define the appropriate 64-bit capable tell() function */ #if defined(MS_WIN64) #define TELL64 _telli64 ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) ! /* NOTE: this is only used on older ! NetBSD prior to f*o() funcions */ #define TELL64(fd) lseek((fd),0,SEEK_CUR) #endif --- 58,65 ---- /* define the appropriate 64-bit capable tell() function */ #if defined(MS_WIN64) #define TELL64 _telli64 ! #else ! /* Fallback for older systems that don't have the f*o() funcions */ #define TELL64(fd) lseek((fd),0,SEEK_CUR) #endif I'll check this in after 24 hours unless a better idea comes up. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jan 15 17:17:07 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 11:17:07 -0500 Subject: [Python-Dev] PEP 205 comments In-Reply-To: Your message of "Fri, 12 Jan 2001 23:19:57 +0100." <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de> References: <200101122219.f0CMJvp01376@mira.informatik.hu-berlin.de> Message-ID: <200101151617.LAA22359@cj20424-a.reston1.va.home.com> I'll leave most of this to Fred, but I'll reply to two items (Fred can add these replies to the PEP): > Again on proxies, there is no discussion or documentation of the > ReferenceError. Why is it a RuntimeError? LookupError, ValueError, and > AttributeError seem to be just as fine or better. RuntimeError was my suggestion. 
The error doesn't really qualify as a LookupError in my view (there's no key that could be valid or invalid) and ValueError seems too general (that's typically used for out-of-range arguments and unparseable strings and the like). Do you have a reason why RuntimeError is inappropriate? > On to the type type extensions: Should there be a type flag indicating > presence of tp_weaklistoffset? It appears that the type structure had > tp_xxx7 for a long time, so likely all in-use binary modules have > that field set to zero. Is that sufficient? Yes, that should be sufficient. (I'm also going to claim tp_xxx7 for the rich comparison function slot, but either patch can be modified to use tp_xxx8 instead.) Maybe it's time to add a bunch of new spares? > Thanks for reading all of this message, You're welcome. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Mon Jan 15 17:39:03 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 11:39:03 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> Message-ID: <14947.10151.575008.869188@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> the reply-to header shall be removed. I'm more than happy to do this (I remember adding the reply-to munging reluctantly). Understand one thing: anybody who naively replies to the whole list will send those replies to python-checkins, not python-dev. Still want it? -Barry From barry at digicool.com Mon Jan 15 17:46:28 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 11:46:28 -0500 Subject: [Python-Dev] Where's Greg Ward ? 
References: <3A62C82C.EA25AAF5@lemburg.com> <14946.63472.282750.828218@anthem.wooz.org> <20010115115719.B919@kronos.cnri.reston.va.us> Message-ID: <14947.10596.733726.995351@anthem.wooz.org> >>>>> "AK" == Andrew Kuchling writes: AK> There are a few patches buried in the back archives, but I AK> don't know of any outstanding bugfixes, so please report AK> whatever problem you're seeing. Okay, will do. AK> Oh, and Barry, did the issue holding up your patch for adding AK> shar support (#102313) ever get resolved? No, but I'll try to take another poke at it. -Barry From moshez at zadka.site.co.il Tue Jan 16 02:07:48 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Tue, 16 Jan 2001 03:07:48 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: References: Message-ID: <20010116010748.41869A828@darjeeling.zadka.site.co.il> On Mon, 15 Jan 2001, Guido van Rossum wrote: > Modified Files: > Meta.py > Log Message: > Geoffrey Gerrietts discovered that a KeyError was caught that probably > should have been a NameError. I'm checking in a change that catches > both, just to be sure -- I can't be bothered trying to understand this > code any more. :-) ... > ! except (KeyError, AttributeError): Ummmm....can you be bothered to make sure you really meant AttributeError when you said NameError? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From guido at python.org Mon Jan 15 18:06:07 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 12:06:07 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: Your message of "Mon, 15 Jan 2001 11:39:03 EST." 
<14947.10151.575008.869188@anthem.wooz.org> References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> Message-ID: <200101151706.MAA22884@cj20424-a.reston1.va.home.com> > I'm more than happy to do this (I remember adding the reply-to munging > reluctantly). Understand one thing: anybody who naively replies to > the whole list will send those replies to python-checkins, not > python-dev. > > Still want it? Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Mon Jan 15 18:11:29 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 12:11:29 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <200101151706.MAA22884@cj20424-a.reston1.va.home.com> Message-ID: <14947.12097.613433.580928@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> I'm more than happy to do this (I remember adding the reply-to >> munging reluctantly). Understand one thing: anybody who >> naively replies to the whole list will send those replies to >> python-checkins, not python-dev. Still want it? GvR> Yes. Done. 
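[Editor's sketch, not part of the original thread.] The alternative raised above, having the checkin script emit a Mail-Followup-To header instead of letting the list rewrite Reply-To, might look roughly like the following in present-day Python. This is only an illustration under stated assumptions: the addresses are placeholders, build_checkin_mail is a hypothetical helper rather than anything in syncmail, and Mail-Followup-To remains a non-standard header that only some mail clients honor.

```python
from email.message import EmailMessage

# Placeholder addresses for illustration; not the real list addresses.
CHECKINS_LIST = "python-checkins@example.org"
DISCUSSION_LIST = "python-dev@example.org"


def build_checkin_mail(committer, subject, body):
    """Build a checkin notification that leaves Reply-To unset (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = committer
    msg["To"] = CHECKINS_LIST
    msg["Subject"] = subject
    # Non-standard header: a hint for group replies, ignored by many clients.
    msg["Mail-Followup-To"] = DISCUSSION_LIST
    msg.set_content(body)
    return msg


mail = build_checkin_mail(
    "committer@example.org",
    "CVS: python/dist/src/Lib example.py,1.1,1.2",
    "Log Message:\nWhitespace normalization.\n",
)
print(mail["Reply-To"])          # no Reply-To, so a plain reply goes to the sender
print(mail["Mail-Followup-To"])  # where a list-aware client would send follow-ups
```

With no Reply-To set, an ordinary reply reaches only the committer, which matches the pronouncement's reasoning that a misdirected private reply does less harm than an accidental broadcast.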
From thomas at xs4all.net Mon Jan 15 18:34:37 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 18:34:37 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ftplib.py,1.47,1.48 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, Jan 15, 2001 at 08:32:52AM -0800 References: Message-ID: <20010115183437.J1005@xs4all.nl> On Mon, Jan 15, 2001 at 08:32:52AM -0800, Guido van Rossum wrote: > This is slightly controversial, but after reading the argumentation in > the bug tracker for and against, I believe this is the right solution. It's really only slightly controversial. 'mfisk' convinced me too, and I used to use ftp to a server behind a firewall :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Mon Jan 15 19:21:54 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 15 Jan 2001 19:21:54 +0100 Subject: [Python-Dev] Re: Someone should be shot References: <3A62EC93.9AA60ABA@lemburg.com>, <3A62E8B6.3DFC1FA2@lemburg.com>, <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <20010115204215.84F0CA828@darjeeling.zadka.site.co.il> Message-ID: <3A633FC2.11F90E94@lemburg.com> Moshe Zadka wrote: > > On Mon, 15 Jan 2001 13:26:59 +0100, "M.-A. Lemburg" wrote: > > > In fact I like it, that replies go to python-dev ... after all, > > that's where these things should be discussed. > > Well, that's the mailing list where things should be discussed. > But when I press the "Reply" button (as opposed to "Reply to List" button) > I expect my e-mail to go to the person originating the e-mail. > Reply-To: means "I'd like to get replies to some other address". > What if, say, a checkin message relates to some private topic > I'd discussed with someone: I'd like to reply to him personally. 
> > I agree that responses to Python-Checkins should be handled on Python-Dev: > that's what the mail-followup-to header is for. Ah, ok. I thought you pressed Reply-All and then wondered why your message got copied to python-dev... > > BTW, in case you misunderstood my reply: it would indeed make > > sense to automate these kinds of check (tabnanny et al). > > Oh, ok. The "cron" part threw me off (why cron?) CRON is what's used on Unix to implement jobs which run on a regular basis... perhaps we just need to seup the CRON job in timbot though ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Jan 15 19:35:54 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 13:35:54 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: Your message of "Tue, 16 Jan 2001 03:07:48 +0200." <20010116010748.41869A828@darjeeling.zadka.site.co.il> References: <20010116010748.41869A828@darjeeling.zadka.site.co.il> Message-ID: <200101151835.NAA26712@cj20424-a.reston1.va.home.com> > > Modified Files: > > Meta.py > > Log Message: > > Geoffrey Gerrietts discovered that a KeyError was caught that probably > > should have been a NameError. I'm checking in a change that catches > > both, just to be sure -- I can't be bothered trying to understand this > > code any more. :-) > ... > > ! except (KeyError, AttributeError): > > Ummmm....can you be bothered to make sure you really meant AttributeError > when you said NameError? The code is correct. Ignore the comment. 
:-( --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Mon Jan 15 12:55:51 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 15 Jan 2001 03:55:51 -0800 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14947.10151.575008.869188@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 15, 2001 at 11:39:03AM -0500 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> Message-ID: <20010115035550.B4336@glacier.fnational.com> [Barry on removing the reply-to header on python-checkins messages] > I'm more than happy to do this (I remember adding the reply-to munging > reluctantly). Understand one thing: anybody who naively replies to > the whole list will send those replies to python-checkins, not > python-dev. Could you make the script generate mail-followup-to instead of reply-to? I know it's not a standard header but some MUAs understand it and it is exactly what is needed to solve this problem. I think promoting it is a good thing. 
Neil From thomas at xs4all.net Mon Jan 15 19:59:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 15 Jan 2001 19:59:12 +0100 Subject: [Python-Dev] Someone should be shot In-Reply-To: <20010115035550.B4336@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 15, 2001 at 03:55:51AM -0800 References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <20010115035550.B4336@glacier.fnational.com> Message-ID: <20010115195912.K1005@xs4all.nl> On Mon, Jan 15, 2001 at 03:55:51AM -0800, Neil Schemenauer wrote: > [Barry on removing the reply-to header on python-checkins messages] > > I'm more than happy to do this (I remember adding the reply-to munging > > reluctantly). Understand one thing: anybody who naively replies to > > the whole list will send those replies to python-checkins, not > > python-dev. > Could you make the script generate mail-followup-to instead of > reply-to? I know its not a standard header but some MUA > understand it and it is exactly what is needed to solve this > problem. I think promoting it is a good thing. The script just calls '/bin/mail'. The Reply-To munging is done by Mailman, which is slightly more than 'a script'. syncmail could do it, but that would mean using sendmail instead of mail, and writing all headers itself. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Mon Jan 15 20:17:27 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 14:17:27 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: Your message of "Fri, 05 Jan 2001 14:14:49 EST." 
<14934.7465.360749.199433@localhost.localdomain> References: <14934.7465.360749.199433@localhost.localdomain> Message-ID: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> There doesn't seem to be a lot of enthusiasm for a Unittest bakeoff... Certainly I don't think I'll get to this myself before the conference. How about the following though: talking of low-hanging fruit, Tim's doctest module is an excellent thing even if it isn't a unit testing framework! (I found this out when I played with it -- it's real easy to get used to...) Would anyone object to Tim checking this in? Since it isn't a contender in the unit test bake-off, it shouldn't affect the outcome there at all. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Mon Jan 15 20:40:03 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 14:40:03 -0500 Subject: [Python-Dev] Someone should be shot References: <3A62E8B6.3DFC1FA2@lemburg.com> <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> <20010115202448.38F60A828@darjeeling.zadka.site.co.il> <14946.64166.348139.425223@anthem.wooz.org> <200101151538.KAA21937@cj20424-a.reston1.va.home.com> <14947.10151.575008.869188@anthem.wooz.org> <20010115035550.B4336@glacier.fnational.com> <20010115195912.K1005@xs4all.nl> Message-ID: <14947.21011.310090.686632@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> Could you make the script generate mail-followup-to instead of >> reply-to? I know it's not a standard header but some MUAs >> understand it and it is exactly what is needed to solve this >> problem. I think promoting it is a good thing. TW> The script just calls '/bin/mail'. The Reply-To munging is TW> done by Mailman, which is slightly more than 'a TW> script'. syncmail could do it, but that would mean using TW> sendmail instead of mail, and writing all headers itself. I'm sure Fred or I would be happy to review such a patch to syncmail . 
-Barry From jeremy at alum.mit.edu Mon Jan 15 20:31:44 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 15 Jan 2001 14:31:44 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> Message-ID: <14947.20512.140859.119597@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: GvR> There doesn't seem to be a lot of enthousiasm for a Unittest GvR> bakeoff... Certainly I don't think I'll get to this myself GvR> before the conference. Let's have all the interested parties vote now, then. It would certainly be helpful to have the new unittest module in the alpha release of 2.1. I'd like to write some new tests and I'd rather use the new stuff than the old stuff. Jeremy From tim.one at home.com Mon Jan 15 21:01:52 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:01:52 -0500 Subject: [Python-Dev] Someone should be shot In-Reply-To: <14947.10151.575008.869188@anthem.wooz.org> Message-ID: [Barry] > ... > Understand one thing: anybody who naively replies to the whole > list will send those replies to python-checkins, not python-dev. IIRC, that's why the redirect to python-dev was added to begin with: of course people will reply to python-checkins, and then the next guy x-posts to python-dev too, and the next three in turn variously remove one or the other groups, or keep both or add c.l.py too. In the end, no single archive contains a coherent record on its own, and the random mix of "[Python-Dev]" and "[Python-checkins]" Subject tags even make it impossible to sort by (true) subject easily in your own mail client. > Still want it? Don't care . 
From tim.one at home.com Mon Jan 15 21:08:15 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:08:15 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib tabnanny.py,1.10,1.11 telnetlib.py,1.8,1.9 tempfile.py,1.26,1.27 threading.py,1.10,1.11 toaiff.py,1.8,1.9 tokenize.py,1.15,1.16 traceback.py,1.18,1.19 tty.py,1.2,1.3 tzparse.py,1.8,1.9 In-Reply-To: <20010115201318.A2E73A828@darjeeling.zadka.site.co.il> Message-ID: [] > Modified Files: > tabnanny.py > Log Message: > Whitespace normalization. [Moshe] > hmmmmmm....... LOL! I was hoping nobody would notice that <0.7 wink>. The appalling truth is that late in tabnanny's development I deliberately indented a large block of code by one column, and actually thought it was a good idea at the time. I'm as delighted to see that finally fixed as I am embarrassed by the necessity. although-perhaps-more-appalled-that-was-there-was-followup- debate-about-followups-containing-more-msgs-than-there- were-characters-in-moshe's-followup-ly y'rs - tim From ping at lfw.org Mon Jan 15 21:10:10 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 12:10:10 -0800 (PST) Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: <002801c07efe$0c728a80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: On Mon, 15 Jan 2001, Tony J Ibbs (Tibs) wrote: > I get the documentation for OK, but it is preceded with a line > claiming that: > > The system cannot find the path specified. Thanks for the NT testing. That's funny -- i put in a special case for Windows to avoid messages like the above a couple of days ago. How recently did you download pydoc.py? Does your copy contain: if hasattr(sys, 'winver'): return lambda text: tempfilepager(text, 'more') ? > .py -- information about the module > > which, when pydoc'ed, results in a NAME line which starts with > twice... > Of course, if I'm the only person doing this, I'll just have to, well, > stop...) 
I think i'm going to ask you to stop, unless Guido prefers otherwise. Guido, do you have a style pronouncement for module docstrings? > A request - a "-f" switch to allow the user to specify a particular > Python file (i.e., something not on the PYTHONPATH). Yes, it's on my to-do list. So you can see what i'm up to, here's my current to-do list: make boldness optional (only if using more/less? only Unix?) document a .py file given on the command line + webserver in background help should have a repr write a better htmlrepr (\n should look special, max length limit, etc.) generate docs from lib HTML generate HTML index from precis and __path__ and package contents list have help(...) produce a directory of available things to ask for help on curses.wrapper is broken: both function and package respect package __all__ coherent answer to .py vs .pyc: do we show .pyc? fix getcomments() bug: last two lines stuck together + grey out shadowed modules/packages refactor .py/.pyc/.module.so/.module.so.1 listers in htmldoc, textdoc skip __main__ module + index built-in modules too Windows and Mac testing default to HTTP mode on GUI platforms? (win, mac) The ones marked with + i consider done. Feel free to comment on or suggest priorities for the others; in particular, what do you think of the last one? The idea is that double-clicking on pydoc.py in Windows or MacOS could launch the server and then open the localhost URL using webbrowser.py to display the documentation index. Should it do this by default? -- ?!ng From guido at python.org Mon Jan 15 21:41:25 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 15:41:25 -0500 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Your message of "Mon, 15 Jan 2001 12:10:10 PST." 
References: Message-ID: <200101152041.PAA32298@cj20424-a.reston1.va.home.com> > > .py -- information about the module > > > > which, when pydoc'ed, results in a NAME line which starts with > > twice... > > Of course, if I'm the only person doing this, I'll just have to, well, > > stop...) > > I think i'm going to ask you to stop, unless Guido prefers > otherwise. Guido, do you have a style pronouncement for module > docstrings? I'm with Ping. None of the examples in the style guide start the docstring with the function name. Almost none of the standard library modules start their module docstring with the module name (codecs is an exception, but I didn't write it :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn at worldonline.dk Mon Jan 15 21:45:02 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Mon, 15 Jan 2001 20:45:02 GMT Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <3A62C66E.2BB69E61@lemburg.com> References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3A62C66E.2BB69E61@lemburg.com> Message-ID: <3a636122.45847835@smtp.worldonline.dk> [Fredrik Lundh] > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? [M.-A. Lemburg] >Since the Unicode character names are probably >not used for performance sensitive tasks, I suggest to >checkin the smallest version possible. > >If it is too much work to get Finn's version recoded in C >(presuming it's written in Java), then I'd suggest checking >in your version until someone comes up with a yet smaller >edition. FWIW, I agree that the 160k module should be used. Please, nobody should use the jython compression as an argument to delay any improvements in CPython. 
I certainly didn't post because I wanted to complicate your processes. I just wanted to show off . regards, finn From fredrik at effbot.org Mon Jan 15 21:58:11 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Mon, 15 Jan 2001 21:58:11 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <010f01c07e52$e9801fc0$e46940d5@hagrid> <3A62C66E.2BB69E61@lemburg.com> <3a636122.45847835@smtp.worldonline.dk> Message-ID: <001f01c07f35$e2c09500$e46940d5@hagrid> mal, finn: > >If it is too much work to get Finn's version recoded in C > >(presuming it's written in Java), then I'd suggest checking > >in your version until someone comes up with a yet smaller > >edition. > > FWIW, I agree the that 160k module should be used. Please, nobody should > use the jython compression as an argument to delay any improvements in > CPython. okay, unless someone throws in a -1 vote, I'll check this in tomorrow. Cheers /F From tim.one at home.com Mon Jan 15 21:57:26 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 15:57:26 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <010f01c07e52$e9801fc0$e46940d5@hagrid> Message-ID: [Fredrik Lundh] > The name database portions of SF task 17335 ("add > compressed unicode database") were postponed to > 2.1. > > My current patch replaces the ~450k large ucnhash > module with a new ~160k large module. (See earlier > posts for more info on how the new database works). > > Should I check it in? Absolutely! But not like as for 2.0: check it in *now*, so we have a few days to deal with surprises before the alpha release. With 300K sitting on the table waiting to be taken, it's not worth delaying one hour to worry about 60K additional that may or may not be achievable later. 
From ping at lfw.org Mon Jan 15 22:02:38 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 13:02:38 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Demo/metaclasses Meta.py,1.3,1.4 In-Reply-To: <20010116010748.41869A828@darjeeling.zadka.site.co.il> Message-ID: On Tue, 16 Jan 2001, Moshe Zadka wrote: > Ummmm....can you be bothered to make sure you really meant AttributeError > when you said NameError? Nice bugfix. Didn't know you were back from the sex-change operation. -- ?!ng From tim.one at home.com Mon Jan 15 22:15:54 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 16:15:54 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <200101151917.OAA29687@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > There doesn't seem to be a lot of enthousiasm for a Unittest > bakeoff... I'm enthusiastic, but ... > Certainly I don't think I'll get to this myself before the > conference. Ditto. Takes time that's not there. > ... > Would anyone object against Tim checking [doctest] in? You suggested that before, and so it was already on my 2.1a1 todo list. Hoped to get to it over the weekend but didn't. Hope to get to it today, but won't . On the chance that I do, anyone inclined to object should do so before the sun sets in Reston. 
or-if-it-never-sets-the-world-ends-anyway-ly y'rs - tim From akuchlin at mems-exchange.org Mon Jan 15 22:26:19 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 16:26:19 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14947.20512.140859.119597@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 15, 2001 at 02:31:44PM -0500 References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> <14947.20512.140859.119597@localhost.localdomain> Message-ID: <20010115162619.A19484@kronos.cnri.reston.va.us> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: >Let's have all the interested parties vote now, then. It would >certainly be helpful to have the new unittest module in the alpha >release of 2.1. I'd like to write some new tests and I'd rather use >the new stuff than the old stuff. Huh? If no one has tried the different modules, what's the point of having a vote? (Given that doctest is going to be added, though, it should be checked in ASAP.) --amk From trentm at ActiveState.com Mon Jan 15 23:10:26 2001 From: trentm at ActiveState.com (Trent Mick) Date: Mon, 15 Jan 2001 14:10:26 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <200101151602.LAA22272@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 11:02:39AM -0500 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> Message-ID: <20010115141026.I29870@ActiveState.com> On Mon, Jan 15, 2001 at 11:02:39AM -0500, Guido van Rossum wrote: > Greg Stein noticed me checking in *yet* another system that needs > the fallback TELL64() definition in fileobjects.c, and wrote: > > > All of those #ifdefs could be tossed and it would be more robust (long term) > > if an autoconf macro were used to specify when TELL64 should be defined. 
> > > > [ I've looked thru fileobject.c and am a bit confused: the conditions for > > defining TELL64 do not match the conditions for *using* it. that would > > seem to imply a semantic error somewhere and/or a potential gotcha when > > they get skewed (like I assume what happened to FreeBSD). simplifying with > > an autoconf macro may help to rationalize it. ] The problem is that these systems lie when they "say" (according to Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have largefile support. This seems to have happened for a particular release of BSD (which has since been fixed). I think that the Right(tm) (meaning the cleanest solution where the tests and definitions in the code actually represent the truth) answer is a proper configure test (sort of as Greg suggests). I don't really feel comfortable writing that patch (because (1) lack of time and (2) inability to test, I don't have any access to any of these BSD machines). [Guido] > > I have a better idea. Since "lseek((fd),0,SEEK_CUR)" seems to be the > universal fallback, why not just define TELL64 to be that if it's not > previously defined (currently only MS_WIN64 has a different > definition)? It isn't always *used* (the conditions under which > _portable_fseek() uses it are quite complex), but *when* it is used, > this seems to be the most common definition... While I agree that it is annoying that the build breaks for these platforms I think that it is appropriate that the build breaks. Having to put these: #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__) definitions here gives a nice list of those platforms that *do* lie. I would prefer that to having an "#else" block that just captures all other cases, but that is just my opinion. Options (in order of preference): (1) Update the configure test for HAVE_LARGEFILE_SUPPORT such that the proper versions of these OSes do *not* #define it. (2) Guido's suggestion. 
(2) Keep extending the "#elif" list. ^---- using (2) twice was intentional Trent
>
> *** fileobject.c 2001/01/15 10:36:56 2.106
> --- fileobject.c 2001/01/15 16:02:06
> ***************
> *** 58,66 ****
>   /* define the appropriate 64-bit capable tell() function */
>   #if defined(MS_WIN64)
>   #define TELL64 _telli64
> ! #elif defined(__NetBSD__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(_HAVE_BSDI) || defined(__APPLE__)
> ! /* NOTE: this is only used on older
> !    NetBSD prior to f*o() funcions */
>   #define TELL64(fd) lseek((fd),0,SEEK_CUR)
>   #endif
>
> --- 58,65 ----
>   /* define the appropriate 64-bit capable tell() function */
>   #if defined(MS_WIN64)
>   #define TELL64 _telli64
> ! #else
> ! /* Fallback for older systems that don't have the f*o() funcions */
>   #define TELL64(fd) lseek((fd),0,SEEK_CUR)
>   #endif
>
>
> I'll check this in after 24 hours unless a better idea comes up.
>
Better idea but no patch. :( Trent -- Trent Mick TrentM at ActiveState.com From skip at mojam.com Mon Jan 15 23:10:36 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 15 Jan 2001 16:10:36 -0600 (CST) Subject: [Python-Dev] should we start instrumenting modules with __all__? Message-ID: <14947.30044.934204.951564@beluga.mojam.com> I see the from-import-* patch for __all__ has been checked in. Should we make an effort to add __all__ to at least some modules before 2.1a1?
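A minimal sketch of what __all__ controls for "from module import *", assuming a current Python 3 interpreter rather than the 2.1 alpha under discussion. The module name demo_mod and the functions public_func and helper are hypothetical, made up for illustration only:

```python
import sys
import types

# Source for a throwaway module; all names here are hypothetical.
source = '''
__all__ = ["public_func"]

def public_func():
    return "exported"

def helper():
    return "not exported by import *"
'''

# Build the module in memory and register it so that import can find it.
demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Run a star-import into a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_mod import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Only the names listed in __all__ reach the importing namespace; helper stays reachable as demo_mod.helper but is not pulled in by the star-import.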
Skip From akuchlin at mems-exchange.org Mon Jan 15 23:13:03 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 15 Jan 2001 17:13:03 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: <200101121351.IAA19676@cj20424-a.reston1.va.home.com>; from guido@python.org on Fri, Jan 12, 2001 at 08:51:51AM -0500 References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> <200101121351.IAA19676@cj20424-a.reston1.va.home.com> Message-ID: <20010115171303.A23626@kronos.cnri.reston.va.us> On Fri, Jan 12, 2001 at 08:51:51AM -0500, Guido van Rossum wrote: >Ah. It's very simple. I create a directory "linux" as a subdirectory >of the Python source tree (i.e. at the same level as Lib, Objects, >etc.). Then I chdir into that directory, and I say "../configure". >The configure script creates subdirectories to hold the object files ... >Then I say "make" and it builds Python. This doesn't work at all for me in my copy of the CVS tree. Are there other steps or requirements to make this work. (Transcript available upon request, but I suspect I'm missing something simple.) --amk From tim.one at home.com Mon Jan 15 23:32:51 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 17:32:51 -0500 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: [Jeremy] > Let's have all the interested parties vote now, then. It would > certainly be helpful to have the new unittest module in the alpha > release of 2.1. I'd like to write some new tests and I'd rather use > the new stuff than the old stuff. [Andrew] > Huh? If no one has tried the different modules, what's the point of > having a vote? Presumably so that *something* gets into 2.1a1. At least you, Jeremy and Fredrik have tried them, and if that's all there can't be a tie . I would agree this is not an ideal decision procedure. 
the-question-is-whether-it's-better-than-paralysis-ly y'rs - tim From ping at lfw.org Mon Jan 15 23:35:47 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 14:35:47 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' Message-ID: I don't know whether this is going to be obvious or controversial, but here goes. Most of the time we're used to seeing a newline as '\n', not as '\012', and newlines are typed in as '\n'. A newcomer to Python is likely to do >>> 'hello\n' 'hello\012' and ask "what's \012?" -- whereupon one has to explain that it's an octal escape, that 012 in octal equals 10, and that chr(10) is newline, which is the same as '\n'. You're bound to run into this, and you'll see \012 a lot, because \n is such a common character. Aside from being slightly more frightening, '\012' also takes up twice as many characters as necessary. So... i'm submitting a patch that causes the three most common special whitespace characters, '\n', '\r', and '\t', to appear in their natural form rather than as octal escapes when strings are printed and repr()ed. Mm? -- ?!ng From esr at thyrsus.com Tue Jan 16 00:15:50 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 18:15:50 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from ping@lfw.org on Mon, Jan 15, 2001 at 02:35:47PM -0800 References: Message-ID: <20010115181550.A11566@thyrsus.com> Ka-Ping Yee : > I don't know whether this is going to be obvious or controversial, > but here goes. Most of the time we're used to seeing a newline as > '\n', not as '\012', and newlines are typed in as '\n'. > > A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > > and ask "what's \012?" -- whereupon one has to explain that it's an > octal escape, that 012 in octal equals 10, and that chr(10) is > newline, which is the same as '\n'. You're bound to run into this, > and you'll see \012 a lot, because \n is such a common character. 
> Aside from being slightly more frightening, '\012' also takes up > twice as many characters as necessary. > > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. Works for me. I'd add \v, \b and \a to cover the whole ANSI C standard escape set (hmmm...am I missing any?) -- Eric S. Raymond Live free or die; death is not the worst of evils. -- General George Stark. From thomas at xs4all.net Tue Jan 16 00:49:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 00:49:30 +0100 Subject: [Python-Dev] time functions Message-ID: <20010116004930.L1005@xs4all.nl> Maybe this is a dead and buried subject, but I'm going to try anyway, since everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood lately :) Why do we need the following atrocity : timestr = time.strftime("", time.localtime(time.time())) To do the simple task of 'date +' ? I never really understood why there isn't a way to get a timetuple directly from C, rather than converting a float that we got from C a bytecode before, even though the higher level almost always deals with timetuples. How about making the float-to-tuple functions (time.localtime, time.gmtime) accept 0 arguments as well, and defaulting to time.time() in that case ? Even better, how about doing the same for the other functions, too ? (where it makes sense, of course :) Actually, I'll split it up in three proposals: - Making the time in time.strftime default to 'now', so that the above becomes the ever so slightly confusing: timestr = time.strftime("") (confusing because it looks a bit like a regexp constructor...) 
- Making the time in time.asctime and time.ctime optional, defaulting to 'now', so you can just call 'time.ctime()' without having to pass time.time() (which are about half the calls in my own code :) - Making the time in time.localtime and time.gmtime default to 'now'. I'm 0/+1/+1 myself :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue Jan 16 00:55:36 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 00:55:36 +0100 Subject: [Python-Dev] TELL64 In-Reply-To: <20010115141026.I29870@ActiveState.com>; from trentm@ActiveState.com on Mon, Jan 15, 2001 at 02:10:26PM -0800 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> Message-ID: <20010116005536.M1005@xs4all.nl> On Mon, Jan 15, 2001 at 02:10:26PM -0800, Trent Mick wrote: > > > [ I've looked thru fileobject.c and am a bit confused: the conditions > > > for defining TELL64 do not match the conditions for *using* it. that > > > would seem to imply a semantic error somewhere and/or a potential > > > gotcha when they get skewed (like I assume what happened to > > > FreeBSD). simplifying with an autoconf macro may help to rationalize > > > it. ] > The problem is that these systems lie when they "say" (according to > Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have > largefile support. This seems to have happened for a particular release of > BSD (which has since been fixed). I think that the Right(tm) (meaning the > cleanest solution where the tests and definitions in the code actually > represent the truth) answer is a proper configure test (sort of as Greg > suggests). I don't really feel comfortable writing that patch (because (1) > lack of time and (2) inability to test, I don't have any access to any of > these BSD machines). 
There is no (longer any) 'single BSD release', so I doubt it has 'since been fixed' :) We should consider the different BSD derived OSes as separate, if slightly related, systems (much like SunOS <-> BSD.) The problem in the BSDI case is really simple: the autoconf test doesn't test whether the fs really supports large files, but rather whether the system has an off_t type that is 64 bits. BSDI has that type, but does not actually use it in any of the seek/tell functions. This has not been 'fixed' as far as I know, precisely because it isn't 'broken' :) I tried to fix the test, but I have been completely unable to find a proper test. There doesn't seem to be a 'standard' one, and I wasn't able to figure out what, say, 'zsh' uses -- black autoconf magic, for sure. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From trentm at ActiveState.com Tue Jan 16 01:24:54 2001 From: trentm at ActiveState.com (Trent Mick) Date: Mon, 15 Jan 2001 16:24:54 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <20010116005536.M1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 12:55:36AM +0100 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> Message-ID: <20010115162454.D3864@ActiveState.com> On Tue, Jan 16, 2001 at 12:55:36AM +0100, Thomas Wouters wrote: > On Mon, Jan 15, 2001 at 02:10:26PM -0800, Trent Mick wrote: > > > The problem is that these systems lie when they "say" (according to > > Python's configure tests for HAVE_LARGEFILE_SUPPORT) that they have > > largefile support. This seems to have happened for a particular release of > > BSD (which has since been fixed). I think that the Right(tm) (meaning the > > cleanest solution where the tests and definitions in the code actually > > represent the truth) answer is a proper configure test (sort of as Greg > > suggests). 
I don't really feel comfortable writing that patch (because (1) > > lack of time and (2) inability to test, I don't have any access to any of > > these BSD machines). > > There is no (longer any) 'single BSD release', so I doubt it has 'since been > fixed' :) Okay sure (showing my ignorance). My only understanding was that this "lying" was the case for some unspecified BSDs a while ago but that the latest releases of any of them *did* have largefile support. > > I tried to fix the test, but I have been completely unable to find a proper > test. There doesn't seem to be a 'standard' one, and I wasn't able to figure > out what, say, 'zsh' uses -- black autoconf magic, for sure. Hmmm... if one could encode whether or not a 64-bit fseek could be implemented (either using fseek, fseeko, fseek64, _fseek, fsetpos/fgetpos, etc.) in a short C program then that would be the test (or at least most of the test; one might have to see if ftell could be implemented as well). Or are there other requirements? Trent -- Trent Mick TrentM at ActiveState.com From esr at thyrsus.com Tue Jan 16 02:26:14 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 20:26:14 -0500 Subject: [Python-Dev] time functions In-Reply-To: <20010116004930.L1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 12:49:30AM +0100 References: <20010116004930.L1005@xs4all.nl> Message-ID: <20010115202614.A11732@thyrsus.com> Thomas Wouters : > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'.
> > I'm 0/+1/+1 myself :) Likewise. -- Eric S. Raymond Never trust a man who praises compassion while pointing a gun at you. From barry at digicool.com Tue Jan 16 03:14:33 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 15 Jan 2001 21:14:33 -0500 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <14947.44681.254332.976234@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> I'm 0/+1/+1 myself :) Maybe I'm an inch on the +0/+1/+1 side. :) From jeremy at alum.mit.edu Tue Jan 16 01:11:59 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 15 Jan 2001 19:11:59 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115162619.A19484@kronos.cnri.reston.va.us> References: <14934.7465.360749.199433@localhost.localdomain> <200101151917.OAA29687@cj20424-a.reston1.va.home.com> <14947.20512.140859.119597@localhost.localdomain> <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: <14947.37327.395622.66435@localhost.localdomain> >>>>> "AMK" == Andrew Kuchling writes: AMK> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: >> Let's have all the interested parties vote now, then. It would >> certainly be helpful to have the new unittest module in the alpha >> release of 2.1. I'd like to write some new tests and I'd rather >> use the new stuff than the old stuff. AMK> Huh? If no one has tried the different modules, what's the AMK> point of having a vote? (Given that doctest is going to be AMK> added, though, it should be checked in ASAP.) Guido is the only person that said he hadn't tried anything. If others have given it a whirl, they ought to chime in now. If very few people have given them a try, we should decide whether we wait for them or proceed without them. We can't wait indefinitely. I'm not sure when we need to decide. 
Jeremy From nas at arctrix.com Mon Jan 15 20:40:55 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 15 Jan 2001 11:40:55 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.82,2.83 In-Reply-To: <200101132225.RAA03197@cj20424-a.reston1.va.home.com>; from guido@python.org on Sat, Jan 13, 2001 at 05:25:12PM -0500 References: <20010113071758.C28643@glacier.fnational.com> <200101132225.RAA03197@cj20424-a.reston1.va.home.com> Message-ID: <20010115114055.A5879@glacier.fnational.com> On Sat, Jan 13, 2001 at 05:25:12PM -0500, Guido van Rossum wrote: > Do you have a tool that detects leaks? debauch is showing promise although it is still pretty rough around the edges. memprof is another option. It looks like init_exceptions may be leaking memory. Some debauch output: 1 Leaked Memory 0x0849cf98, size 44 (from 0x0) AllocTime: 79269 FreeTime: 43436 return stack: ???:?? (0x40016005) classobject.c:84 (0x805c16d) exceptions.c:337 (0x8088594) exceptions.c:1061 (0x80898dc) pythonrun.c:151 (0x8053581) loop.c:23 (0x8053305) I haven't figured out if this is a real leak yet. Neil From michel at digicool.com Tue Jan 16 07:33:00 2001 From: michel at digicool.com (Michel Pelletier) Date: Mon, 15 Jan 2001 22:33:00 -0800 (PST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: <14947.37327.395622.66435@localhost.localdomain> Message-ID: On Mon, 15 Jan 2001, Jeremy Hylton wrote: > >>>>> "AMK" == Andrew Kuchling writes: > > AMK> On Mon, Jan 15, 2001 at 02:31:44PM -0500, Jeremy Hylton wrote: > >> Let's have all the interested parties vote now, then. It would > >> certainly be helpful to have the new unittest module in the alpha > >> release of 2.1. I'd like to write some new tests and I'd rather > >> use the new stuff than the old stuff. > > AMK> Huh? If no one has tried the different modules, what's the > AMK> point of having a vote? (Given that doctest is going to be > AMK> added, though, it should be checked in ASAP.)
> > Guido is the only person that said he hadn't tried anything. If > others have given it a whirl, they ought to chime in now. I have used pyunit to create a simple set of tests. It seemed to do the job well and it was very easy. I'd never done it before and the docs were fat and A+. I can only give a one-sided opinion. I know of AMK's work but I have not used it, are there others? -Michel From akuchlin at mems-exchange.org Tue Jan 16 04:03:31 2001 From: akuchlin at mems-exchange.org (A.M. Kuchling) Date: Mon, 15 Jan 2001 22:03:31 -0500 Subject: [Python-Dev] Detecting install time Message-ID: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> For PEP 229, the setup.py script needs to figure out if it's running from the build directory, because then distutils.sysconfig needs to look at different config files; ./Modules/Makefile instead of /usr/lib/python2.0/config/Makefile, and so forth. Is there a simple/clean way to do this? --amk From guido at python.org Tue Jan 16 04:21:43 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:21:43 -0500 Subject: [Python-Dev] PEP 229: setup.py revised In-Reply-To: Your message of "Mon, 15 Jan 2001 17:13:03 EST." <20010115171303.A23626@kronos.cnri.reston.va.us> References: <200101112155.QAA16678@cj20424-a.reston1.va.home.com> <20010111172633.A26249@kronos.cnri.reston.va.us> <200101121351.IAA19676@cj20424-a.reston1.va.home.com> <20010115171303.A23626@kronos.cnri.reston.va.us> Message-ID: <200101160321.WAA00648@cj20424-a.reston1.va.home.com> > On Fri, Jan 12, 2001 at 08:51:51AM -0500, Guido van Rossum wrote: > >Ah. It's very simple. I create a directory "linux" as a subdirectory > >of the Python source tree (i.e. at the same level as Lib, Objects, > >etc.). Then I chdir into that directory, and I say "../configure". > >The configure script creates subdirectories to hold the object files ... > >Then I say "make" and it builds Python. 
> > This doesn't work at all for me in my copy of the CVS tree. Are there > other steps or requirements to make this work? (Transcript available > upon request, but I suspect I'm missing something simple.) You can't start doing this in a tree where you have already built Python using the default way -- you have to use a pristine tree. The reason is the funny way Make's VPATH feature works: it sees the .o files in the source directory and then thinks it doesn't have to create the .o file in the build directory. I think a "make clobber" at the top level would probably eradicate everything that confuses Make. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 04:24:04 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:24:04 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 14:35:47 PST." References: Message-ID: <200101160324.WAA00677@cj20424-a.reston1.va.home.com> > I don't know whether this is going to be obvious or controversial, > but here goes. Most of the time we're used to seeing a newline as > '\n', not as '\012', and newlines are typed in as '\n'. > > A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > > and ask "what's \012?" -- whereupon one has to explain that it's an > octal escape, that 012 in octal equals 10, and that chr(10) is > newline, which is the same as '\n'. You're bound to run into this, > and you'll see \012 a lot, because \n is such a common character. > Aside from being slightly more frightening, '\012' also takes up > twice as many characters as necessary. > > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. +1 on the idea; no time to study the patch tonight.
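The quoting rule proposed above can be sketched as follows. This is only an illustration under stated assumptions, not the submitted patch: proposed_repr and SHORT_ESCAPES are hypothetical names, the input is assumed to be ASCII-range text, and the octal fallback simply mimics the '\012' style of output the thread describes as the existing behaviour:

```python
# Hypothetical helper illustrating the proposal; this is not the actual patch.
SHORT_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}


def proposed_repr(s):
    # Quote s so that newline, carriage return and tab use their short
    # escapes, while other control characters keep a 3-digit octal escape.
    parts = []
    for ch in s:
        if ch in SHORT_ESCAPES:
            parts.append(SHORT_ESCAPES[ch])
        elif ch in ("'", "\\"):
            # Quote and backslash still need a backslash in front.
            parts.append("\\" + ch)
        elif ord(ch) < 32 or ord(ch) >= 127:
            # Assumption: everything else non-printable stays octal.
            parts.append("\\%03o" % ord(ch))
        else:
            parts.append(ch)
    return "'" + "".join(parts) + "'"


print(proposed_repr("hello\n"))
print(proposed_repr("a\tb\x0c"))
```

With this rule a newcomer typing 'hello\n' would get the same spelling back, while a form feed would still come out in octal, which is the narrower of the two variants debated in the thread.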
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 04:28:38 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:28:38 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 18:15:50 EST." <20010115181550.A11566@thyrsus.com> References: <20010115181550.A11566@thyrsus.com> Message-ID: <200101160328.WAA00723@cj20424-a.reston1.va.home.com> > > So... i'm submitting a patch that causes the three most common > > special whitespace characters, '\n', '\r', and '\t', to appear in > > their natural form rather than as octal escapes when strings are > > printed and repr()ed. > > Works for me. I'd add \v, \b and \a to cover the whole ANSI C > standard escape set (hmmm...am I missing any?) You missed \f [*]. Unclear to me whether it's a good idea to add the lesser-known ones; they are just as likely binary gobbledegook rather than what their escapes stand for. [*] http://www.python.org/doc/current/ref/strings.html --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 04:31:19 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:31:19 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 00:49:30 +0100." <20010116004930.L1005@xs4all.nl> References: <20010116004930.L1005@xs4all.nl> Message-ID: <200101160331.WAA00780@cj20424-a.reston1.va.home.com> > Maybe this is a dead and buried subject, but I'm going to try anyway, since > everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood > lately :) > > Why do we need the following atrocity : > > timestr = time.strftime("", time.localtime(time.time())) > > To do the simple task of 'date +' ? 
I never really understood why > there isn't a way to get a timetuple directly from C, rather than converting > a float that we got from C a bytecode before, even though the higher level > almost always deals with timetuples. How about making the float-to-tuple > functions (time.localtime, time.gmtime) accept 0 arguments as well, and > defaulting to time.time() in that case ? Even better, how about doing the > same for the other functions, too ? (where it makes sense, of course :) > > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) I don't see the confusion. > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. > > I'm 0/+1/+1 myself :) Yes, I've wondered this myself too. I guess the current API is based too much on the C API... +1/+1/+1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 04:47:32 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 22:47:32 -0500 Subject: [Python-Dev] Detecting install time In-Reply-To: Your message of "Mon, 15 Jan 2001 22:03:31 EST." <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> References: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> Message-ID: <200101160347.WAA01132@cj20424-a.reston1.va.home.com> > For PEP 229, the setup.py script needs to figure out if it's running > from the build directory, because then distutils.sysconfig needs to > look at different config files; ./Modules/Makefile instead of > /usr/lib/python2.0/config/Makefile, and so forth. 
Is there a > simple/clean way to do this? You could check for the presence of config.status -- that file is not installed. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 16 04:53:16 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 15 Jan 2001 22:53:16 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Message-ID: [?!ng] > So... i'm submitting a patch that causes the three most common > special whitespace characters, '\n', '\r', and '\t', to appear in > their natural form rather than as octal escapes when strings are > printed and repr()ed. -1 on doing that when they're printed (although I probably misunderstand what you mean there). +1 for changing repr() as suggested. -0 on generalizing to \a \b \f \v too (I've never used one of those in a string literal in my life, so would be more baffled by seeing one come back than I would the octal equivalent). I would also be +1 on using hex escapes instead of octal (I grew up on 36- and 60-bit machines, but that was the last time octal looked *natural*!). Octal and hex escapes both consume 4 characters, so I can't imagine what octal has going for it in the 21st century . 377-is-an-irritating-way-to-spell-ff-ly y'rs - tim PS: Note that C doesn't define what numerical values \a etc have, just that: Each of these escape sequences shall produce a unique implementation-defined value which can be stored in a single char object. The external representations in a text file need not be identical to the internal representations, and are outside the scope of this International Standard. The current method does have the advantage of extreme clarity. From guido at python.org Tue Jan 16 05:08:46 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 23:08:46 -0500 Subject: [Python-Dev] TELL64 In-Reply-To: Your message of "Mon, 15 Jan 2001 16:24:54 PST." 
<20010115162454.D3864@ActiveState.com> References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> Message-ID: <200101160408.XAA01368@cj20424-a.reston1.va.home.com> Looking at the code (in _portable_fseek()) that uses TELL64, I don't understand why it can't use fgetpos(). That code is used only when fpos_t -- the type used by fgetpos() and fsetpos() -- is 64-bit. Trent, you wrote that code. Why wouldn't this work just as well?
(your code):
    if ((pos = TELL64(fileno(fp))) == -1L)
        return -1;
(my suggestion):
    if (fgetpos(fp, &pos) != 0)
        return -1;
It can't be because fgetpos() doesn't exist or is otherwise unusable, because the SEEK_CUR case uses it. We also know that offset is 64-bit capable (the #if around the declaration of _portable_fseek() ensures that). I would even go as far as to collapse the entire switch as follows:
    fpos_t pos;
    switch (whence) {
    case SEEK_END:
        /* do a "no-op" seek first to sync the buffering so that
           the low-level tell() can be used correctly */
        if (fseek(fp, 0, SEEK_END) != 0)
            return -1;
        /* fall through */
    case SEEK_CUR:
        if (fgetpos(fp, &pos) != 0)
            return -1;
        offset += pos;
        break;
    /* case SEEK_SET: break; */
    }
    return fsetpos(fp, &offset);
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 05:13:40 2001 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jan 2001 23:13:40 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 22:53:16 EST." References: Message-ID: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> > [?!ng] > > So... i'm submitting a patch that causes the three most common > > special whitespace characters, '\n', '\r', and '\t', to appear in > > their natural form rather than as octal escapes when strings are > > printed and repr()ed.
> > -1 on doing that when they're printed (although I probably misunderstand > what you mean there). Ping was using imprecise language here -- he meant repr() and "printed at the command line prompt." > +1 for changing repr() as suggested. > > -0 on generalizing to \a \b \f \v too (I've never used one of those in a > string literal in my life, so would be more baffled by seeing one come back > than I would the octal equivalent). > > I would also be +1 on using hex escapes instead of octal (I grew up on 36- > and 60-bit machines, but that was the last time octal looked *natural*!). Me too. One summer vacation while in college I had nothing better to do than decode the Pascal runtime system for the University's CDC-6600 from an octal dump into assembly. Learned lots! > Octal and hex escapes both consume 4 characters, so I can't imagine what > octal has going for it in the 21st century . Originally, using \x for these was impractical (at least) because of the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics of the \x escape. Now we've fixed this, I agree. > 377-is-an-irritating-way-to-spell-ff-ly y'rs - tim > > > PS: Note that C doesn't define what numerical values \a etc have, just > that: > > Each of these escape sequences shall produce a unique > implementation-defined value which can be stored in a single > char object. The external representations in a text file need > not be identical to the internal representations, and are > outside the scope of this International Standard. > > The current method does have the advantage of extreme clarity. Python doesn't support non-ASCII machines, like the C standard (pretends to). --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Tue Jan 16 05:26:13 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Mon, 15 Jan 2001 23:26:13 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101160328.WAA00723@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 10:28:38PM -0500 References: <20010115181550.A11566@thyrsus.com> <200101160328.WAA00723@cj20424-a.reston1.va.home.com> Message-ID: <20010115232613.B12166@thyrsus.com> Guido van Rossum : > > > So... i'm submitting a patch that causes the three most common > > > special whitespace characters, '\n', '\r', and '\t', to appear in > > > their natural form rather than as octal escapes when strings are > > > printed and repr()ed. > > > > Works for me. I'd add \v, \b and \a to cover the whole ANSI C > > standard escape set (hmmm...am I missing any?) > > You missed \f [*]. Unclear to me whether it's a good idea to add the > lesser-known ones; they are just as likely binary gobbledegook rather > than what their escapes stand for. > > [*] http://www.python.org/doc/current/ref/strings.html Truth is, Guido, I'm kind of iffy about whether there'd be a gain in clarity myself. But I find I'm rather attached to the idea of maintaining strictest possible symmetry between what Python handles on input and what it emits on output. So unless we think adding \f, \v, \b, and \a to the special set would actually produce a *loss* of clarity relative to octal gibberish (!), I say do 'em all. Aesthetically, that feels to me like the right thing, and the *Pythonic* thing, to do here. Have I erred in my intuition, O BDFL? -- Eric S. Raymond A man who has nothing which he is willing to fight for, nothing which he cares about more than he does about his personal safety, is a miserable creature who has no chance of being free, unless made and kept so by the exertions of better men than himself. -- John Stuart Mill, writing on the U.S. 
Civil War in 1862

From nas at arctrix.com Mon Jan 15 22:45:28 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon, 15 Jan 2001 13:45:28 -0800
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: <20010115232613.B12166@thyrsus.com>; from esr@thyrsus.com on Mon, Jan 15, 2001 at 11:26:13PM -0500
References: <20010115181550.A11566@thyrsus.com> <200101160328.WAA00723@cj20424-a.reston1.va.home.com> <20010115232613.B12166@thyrsus.com>
Message-ID: <20010115134528.B6193@glacier.fnational.com>

On Mon, Jan 15, 2001 at 11:26:13PM -0500, Eric S. Raymond wrote:
> [...] I find I'm rather attached to the idea of maintaining
> strictest possible symmetry between what Python handles on
> input and what it emits on output.
>
> So unless we think adding \f, \v, \b, and \a to the special set would
> actually produce a *loss* of clarity relative to octal gibberish (!),
> I say do 'em all.

Symmetry is good but I bet most people who would see \f, \v, \b, \a wouldn't have entered those characters using escapes. Most likely those characters would have been read from a binary file. That said, I don't really mind either way.

Neil

From tim.one at home.com Tue Jan 16 05:43:06 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 15 Jan 2001 23:43:06 -0500
Subject: [Python-Dev] Whitesapce normalization
Message-ID:

You may have noticed that I checked in changes to most of the modules in the top level of Lib yesterday (Sunday). This is part of a Crusade that was supposed to happen before 2.0a1, but got dropped on the floor then due to misunderstandings: make the Python code we distribute adhere to Guido's style guide (4-space indents, no hard tabs), + clean up minor whitespace nits (no stray blank lines at the ends of files, no trailing whitespace on lines, last line of the file should end with a newline).

It would be nice if people cleaned up their code this way too; I'm not going to go thru the entire distribution doing this.
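
As an illustration of the minor whitespace nits listed above, here is a minimal sketch in present-day Python. It is not the logic of any existing tool: the function name clean_whitespace_nits is made up for this example, and it deliberately leaves indentation (tabs versus 4-space indents) alone.

```python
def clean_whitespace_nits(text):
    """Apply the three minor cleanups; indentation is not touched.

    - strip trailing whitespace from every line
    - drop stray blank lines at the end of the file
    - make sure the last line ends with a newline
    """
    lines = [line.rstrip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()                      # stray blank lines at end of file
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    messy = "def f():  \n\treturn 1\t\n\n\n"
    print(repr(clean_whitespace_nits(messy)))
```

Note that a blind rstrip() also changes trailing whitespace inside multi-line string literals, so the result of a pass like this still needs to be reviewed as a diff before checking anything in.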
So, if you give a rip, pick a directory or some modules you're fond of, and clean 'em up. The program Tools/scripts/reindent.py does all of the above for you, so it's not hard. But it takes some care in two areas, which is why I did the top level of Lib one file at a time by hand, and studied diffs by eyeball before checking in any changes: + It's unlikely but possible that some program file *depends* on trailing whitespace. That plain sucks (it's *going* to break sooner or later), but reindent.py can't help you there. + While reindent should never otherwise damage program logic, very strange commenting or docstring styles may get mangled by it, making code and/or docs hard to read. reindent works very hard to do a good job on that, and indeed I found no need to make manual changes to anything it did in the top level of Lib. But check anyway. Especially some of the very oldest modules are littered with ugly stuff like # all over the place, from back when nobody had an editor smart enough to skip over preceding blank lines when suggesting indentation for the current line. Then again, maybe we should just drop the Irix5 directory . voice-in-the-wilderness-ly y'rs - tim From esr at thyrsus.com Tue Jan 16 05:43:24 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 15 Jan 2001 23:43:24 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from tim.one@home.com on Mon, Jan 15, 2001 at 10:53:16PM -0500 References: Message-ID: <20010115234324.C12166@thyrsus.com> Tim Peters : > I would also be +1 on using hex escapes instead of octal (I grew up on 36- > and 60-bit machines, but that was the last time octal looked *natural*!). > Octal and hex escapes both consume 4 characters, so I can't imagine what > octal has going for it in the 21st century . Tim, on the level of aesthetic preference I'm totally with you. I've always found octal really ugly myself. Hex fits my brain better; somehow I find it easier to visualize the bit patterns from. 
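
To make the octal-versus-hex comparison concrete, here is a small sketch. It assumes a present-day Python 3 interpreter, whose repr() postdates this thread, and the helper name spellings is invented for illustration.

```python
def spellings(ch):
    """Return the octal escape, hex escape, and repr() of one character."""
    code = ord(ch)
    return "\\%03o" % code, "\\x%02x" % code, repr(ch)


# One character, three source spellings: symbolic, octal, and hex.
assert "\n" == "\012" == "\x0a"

if __name__ == "__main__":
    # The three common whitespace escapes plus the lesser-known C ones.
    for ch in "\n\r\t\a\b\f\v":
        print(*spellings(ch))
```

Both numeric forms take four characters, which is the point made in the thread. On Python 3, repr() gives \n, \r and \t back symbolically and the other four as hex escapes.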
Sadly, there are so many other related ways in which Python intelligently follows C/Unix conventions that I think changing to a default of hex escapes rather than octal would violate the Rule of Least Surprise. One of the things I like about Python is precisely its conservatism in areas like string escapes, that Guido refrained from inventing new OS APIs or new conventions for things like string escapes in places where Unix and C did them in a well-established and reasonable way. He didn't make the mistake, all too typical in academic languages, of confusing novelty with value... This conservatism is valuable because it frees the C-experienced programmer's mind from having to think about where the language is trivially different, so he can concentrate on where it's importantly different. It's worth maintaining. On the other hand, the change would mesh well with the Unicode support. Hmm. Tough call. I could go either way, I guess. -- Eric S. Raymond The politician attempts to remedy the evil by increasing the very thing that caused the evil in the first place: legal plunder. -- Frederick Bastiat From tim.one at home.com Tue Jan 16 06:07:16 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 16 Jan 2001 00:07:16 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <20010115234324.C12166@thyrsus.com> Message-ID: [Eric] > Tim, on the level of aesthetic preference I'm totally with you. > I've always found octal really ugly myself. Hex fits my brain > better; somehow I find it easier to visualize the bit patterns from. > > Sadly, there are so many other related ways in which Python > intelligently follows C/Unix conventions that I think changing to > a default of hex escapes rather than octal would violate the Rule > of Least Surprise. > > ... [and skipping nice stuff I *do* agree with ] ... The saving grace here is that repr() is a form of ASCII dump. 
C has nothing to say about that, while last time I used Unix it was real easy to get dumps in hex (and indeed that's what everyone I knew routinely did). I expect that od retains both its name and its octal defaults on most systems simply due to inertia. An octal dump would be infinitely surprising on Windows (I'm not sure I can even get one without writing it myself). Do people actually use octal dumps on Unices anymore? I'd be surprised, if they're running on power-of-2 boxes. Defaults aren't conventions when *everyone* overrides them, they're just old and in the way. takes-one-to-know-one-ly y'rs - tim From ping at lfw.org Tue Jan 16 06:27:33 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 21:27:33 -0800 (PST) Subject: [Python-Dev] time functions In-Reply-To: <20010116004930.L1005@xs4all.nl> Message-ID: On Tue, 16 Jan 2001, Thomas Wouters wrote: > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. I like all of these suggestions. Go for it! -- ?!ng From esr at thyrsus.com Tue Jan 16 06:31:14 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 16 Jan 2001 00:31:14 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: ; from tim.one@home.com on Tue, Jan 16, 2001 at 12:07:16AM -0500 References: <20010115234324.C12166@thyrsus.com> Message-ID: <20010116003114.A12365@thyrsus.com> Tim Peters : > Do people actually use octal dumps on Unices anymore? 
Well, we do when we momentarily forget to give od(1) the -x escape :-) This so annoyed me that back around 1983 I wrote my own hex dumper specifically to emulate the 16-hex-bytes-with-midpage-gutter-and-ASCII- over-on-the-right-side format that CP/M used and DOS inherited. It's still available at . Do you know the history on this? C speaks octal because a bunch of mode fields in the PDP-11 instruction word were three bits wide. Time was it was actually useful to have the output from (say) core files chunk that way. But I haven't seen an octal code dump in over a decade, probably pushing fifteen years now. -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From ping at lfw.org Tue Jan 16 06:33:42 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 15 Jan 2001 21:33:42 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> Message-ID: On Mon, 15 Jan 2001, Guido van Rossum wrote: > > > special whitespace characters, '\n', '\r', and '\t', to appear in > > > their natural form rather than as octal escapes when strings are > > > printed and repr()ed. > > > > -1 on doing that when they're printed (although I probably misunderstand > > what you mean there). > > Ping was using imprecise language here -- he meant repr() and "printed > at the command line prompt." Yes, i referred to "when strings are printed and repr()ed" as two cases because both string_print() and string_repr() have to be changed. 
(Side question: when are *_print() and *_repr() ever different, and why?) > Originally, using \x for these was impractical (at least) because of > the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics > of the \x escape. Now we've fixed this, I agree. Oh, now i understand. Good point. I'll update the patch to do hex. 0xdeadbeef-ly yours, -- ?!ng From fredrik at effbot.org Tue Jan 16 08:11:38 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 16 Jan 2001 08:11:38 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <00b201c07f8b$93996820$e46940d5@hagrid> thomas wrote: > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) where "now" is local time, I assume? since you're assuming a time zone, you could make it accept an integer as well... > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) same here. From thomas at xs4all.net Tue Jan 16 08:18:38 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 08:18:38 +0100 Subject: [Python-Dev] time functions In-Reply-To: <00b201c07f8b$93996820$e46940d5@hagrid>; from fredrik@effbot.org on Tue, Jan 16, 2001 at 08:11:38AM +0100 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> Message-ID: <20010116081838.N1005@xs4all.nl> On Tue, Jan 16, 2001 at 08:11:38AM +0100, Fredrik Lundh wrote: > thomas wrote: > > - Making the time in time.strftime default to 'now', so that the above > > becomes the ever so slightly confusing: > > > > timestr = time.strftime("") > > (confusing because it looks a bit like a regexp constructor...) > where "now" is local time, I assume? Yes. 
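
For reference, the shortening under discussion looks roughly like this. The sketch assumes a present-day Python, where the time argument of these functions is optional; whether that matches the patch mentioned here in every detail is not something the thread confirms. The format string is a stand-in, since the one in the original message did not survive.

```python
import time

fmt = "%Y-%m-%d %H:%M:%S"   # hypothetical format, chosen only for this example

# The long spelling quoted in the thread: float -> local time tuple -> string.
long_form = time.strftime(fmt, time.localtime(time.time()))

# The short spelling: the time argument defaults to "now", in local time.
short_form = time.strftime(fmt)

if __name__ == "__main__":
    print(long_form)
    print(short_form)
    print(time.ctime())             # same as time.ctime(time.time())
    print(time.localtime().tm_year) # same as time.localtime(time.time()).tm_year
```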
See the patch I'll upload later today (meetings first, grrr) > since you're assuming a time zone, you could make it accept > an integer as well... Could, yes... I'll include it in the 2nd revision of the patch, it can be rejected (or accepted) separately. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue Jan 16 09:22:11 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 09:22:11 +0100 Subject: [Python-Dev] time functions In-Reply-To: <20010116081838.N1005@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 16, 2001 at 08:18:38AM +0100 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> Message-ID: <20010116092211.O1005@xs4all.nl> On Tue, Jan 16, 2001 at 08:18:38AM +0100, Thomas Wouters wrote: > On Tue, Jan 16, 2001 at 08:11:38AM +0100, Fredrik Lundh wrote: > > > timestr = time.strftime("") > > since you're assuming a time zone, you could make it accept > > an integer as well... > Could, yes... Actually, on second thought, lets not, not just yet anyway. Doing that for all functions in the time module would continue to pollute the already toxic waters of a C API translated into Python :P Who knows what 'ctime' stands for, anyway ? And 'asctime' ? How can we expect Python programmers who think 'C' is a high note or average grade, to understand how the time module is supposed to be used ? 
:)

We now have:

    time() -- return current time in seconds since the Epoch as a float
    gmtime() -- convert seconds since Epoch to UTC tuple
    localtime() -- convert seconds since Epoch to local time tuple
    asctime() -- convert time tuple to string
    ctime() -- convert time in seconds to string
    mktime() -- convert local time tuple to seconds since Epoch
    strftime() -- convert time tuple to string according to format specification

where asctime and ctime are basically wrappers around strftime, and would do the exact same thing if they both accepted tuples and floats. I think we should have something like:

    time() -- current time in float
    timetuple() -- current (local) time in timetuple
    tuple2time(tuple) -- tuple -> float
    time2tuple(float, tz=local) -- float -> tuple using timezone tz
    stringtime(time=now, format="ctimeformat") -- convert time value to string

Those are just working names, to make the point, I don't have time to think up better ones :) I'm not sure if the timezone support in the above list is extensive enough, mostly because I hardly use timezones myself. Also, tuple2time() could be merged with time(), and likewise for time2tuple() and timetuple().

I think keeping strftime() and maybe ctime() for ease-of-use is a good idea, but the rest could eventually be deprecated.

Off-to-important-meetings-*cough*-ly y'rs
-- Thomas Wouters
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From fredrik at effbot.org Tue Jan 16 09:30:28 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Tue, 16 Jan 2001 09:30:28 +0100
Subject: [Python-Dev] unit testing bake-off
References:
Message-ID: <01ba01c07f96$967b7870$e46940d5@hagrid>

Tim Peters wrote:
> At least you, Jeremy and Fredrik have tried them, and
> if that's all there can't be a tie .

let me guess:

    Jeremy: PyUnit
    Andrew: unittest
    Fredrik: unittest

(I find pyunit a bit unpythonic, and both overengineered and underengineered at the same time...
hard to explain, but I strongly prefer unittest) > I would agree this is not an ideal decision procedure. well, any decision procedure that comes up with what I want just has to be ideal ;-) From andy at reportlab.com Tue Jan 16 10:20:45 2001 From: andy at reportlab.com (Andy Robinson) Date: Tue, 16 Jan 2001 09:20:45 -0000 Subject: [Python-Dev] unit testing bake-off In-Reply-To: <20010115204701.11972EA6B@mail.python.org> Message-ID: > Subject: Re: [Python-Dev] unit testing bake-off > From: Guido van Rossum > Date: Mon, 15 Jan 2001 14:17:27 -0500 > > There doesn't seem to be a lot of enthousiasm for a Unittest > bakeoff... Certainly I don't think I'll get to this myself before the > conference. > > How about the following though: talking of low-hanging fruit, Tim's > doctest module is an excellent thing even if it isn't a unit testing > framework! (I found this out when I played with it -- it's real easy > to get used to...) > > Would anyone object against Tim checking this in? Since it isn't a > contender in the unit test bake-off, it shouldn't affect the outcome > there at all. > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think it should definitely go in. Ditto with whatever testing framework and documentation tools (pydoc etc.) shortly emerge as "best of breed". I spend my time on corporate consulting projects, and saying things like "Python has standard tools for unit testing and documentation" is even better than saying "We have standard tools for unit testing and documentation". BTW, ReportLab has recently adopted PyUnit's unittest.py It feels a bit Java-like to me - a few more lines of code than needed - but it certainly works. One key feature is aggregating test suites; a big app we installed on a customer site can run the test suite for itself, the ReportLab library (whose test suite we are just getting to work on) and four or five dependent utilities; another is that people have heard of JUnit. 
Just my 2p worth, Andy Robinson From tony at lsl.co.uk Tue Jan 16 10:47:01 2001 From: tony at lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 16 Jan 2001 09:47:01 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: <200101152041.PAA32298@cj20424-a.reston1.va.home.com> Message-ID: <003901c07fa1$46e10c70$f05aa8c0@lslp7o.int.lsl.co.uk> In the context of my starting doc strings in an Emacs Lisp manner, Ka-Ping Yee said: > I think i'm going to ask you to stop, unless Guido prefers > otherwise. Guido, do you have a style pronouncement for module > docstrings? and since Guido replied > I'm with Ping. None of the examples in the style guide start the > docstring with the function name. Almost none of the standard library > modules start their module docstring with the module name (codecs is > an exception, but I didn't write it :-). I shall indeed stop (of course, my habit started before we HAD documentation tools, and if we're going to browse things with pydoc, et al, then there's no need for it. To be honest, it's the answer I expected. Oh dear, another item for my TO DO list (i.e., remove the offending nits). Still, if it's only me it's hardly high impact! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ Which is safer, driving or cycling? Cycling - it's harder to kill people with a bike... My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony at lsl.co.uk Tue Jan 16 11:13:31 2001 From: tony at lsl.co.uk (Tony J Ibbs (Tibs)) Date: Tue, 16 Jan 2001 10:13:31 -0000 Subject: [Python-Dev] RE: [Doc-SIG] pydoc.py (show docs both inside and outside of Python) In-Reply-To: Message-ID: <003a01c07fa4$fa0883c0$f05aa8c0@lslp7o.int.lsl.co.uk> I mentioned a "spurious" > The system cannot find the path specified. on NT, and Ka-Ping Yee said: > Thanks for the NT testing. That's funny -- i put in a special case > for Windows to avoid messages like the above a couple of days ago. 
> How recently did you download pydoc.py? Does your copy contain: > > if hasattr(sys, 'winver'): > return lambda text: tempfilepager(text, 'more') Hmm. I downloaded it when I read the email message announcing it, which was yesterday some time. But it doesn't look like the lines you mention are there - I'll try re-downloading... ...I've redownloaded the files from http://www.lfw.org/python/pydoc.py, etc., and done a grep for hasattr within them. There's no check such as the one you mention, so I guess it's "download impedance". > So you can see what i'm up to, here's my current to-do list: > > make boldness optional (only if using more/less? only Unix?) probably sensible. By the way, I don't get boldness on the NT box - any chance (he says, not intending to help *at all* in doing it!) of it happening there as well? (or would that depend on what curses support is built into the Python?) > document a .py file given on the command line also allow for a directory module (i.e., something with __init__.py in it) given on the command line? > write a better htmlrepr (\n should look special, max > length limit, etc.) yes, but these things can always get better - the fact it's working allows for improoooovement down the line. > generate HTML index from precis and __path__ and package a neat idea - definitely Good Stuff! > contents list well, I always do these, so I'm for this one as well > have help(...) produce a directory of available things to > ask for help on bouncy fun! > Windows and Mac testing I'm running Windows 98 with Python 1.5.2 at home, and will willingly try it out on that (after all, it's not a very big download) - although it might sometimes take a day or two to get round to it (for instance, I haven't yet done so!). But I suspect I shan't be a very demanding user... > default to HTTP mode on GUI platforms? (win, mac) > > The ones marked with + i consider done. 
Feel free to comment on > or suggest priorities for the others; in particular, what do you > think of the last one? The idea is that double-clicking on > pydoc.py in Windows or MacOS could launch the server and then open > the localhost URL using webbrowser.py to display the documentation > index. Should it do this by default? I'll leave that to better designers than myself (although if one is to *have* a double click action, that seems sensible to me). (looks up webbrowser.py - ah, a 2.0 module). Personally, I'd also like to have the option of having a "mini-browser" supported directly, perhaps in Tkinter, so I don't need to start up a whole web browser. But again I may be odd in that wish (I can't remember what IDLE does). Oh - that also means "integrate into IDLE" presumably goes on at least a WishList as well... Other ideas: * command line switch to *output* HTML to a file (i.e., documentation generation) (presumably something like "-o .html", where the "html" indicates the output format - an alternative being "txt" * if I ever finish the docutils effort (I should be getting back to it soon) then use that to format the texts (this would mean I need not worry about the "frontend" to docutils too much, since pydoc is already doing so much). Or maybe the docutils tool should be importing pydoc... Tibs (must do some (paid) work now!) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "Bounce with the bunny. Strut with the duck. Spin with the chickens now - CLUCK CLUCK CLUCK!" BARNYARD DANCE! by Sandra Boynton My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From mal at lemburg.com Tue Jan 16 11:18:44 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 16 Jan 2001 11:18:44 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> Message-ID: <3A642004.F6197E86@lemburg.com> Thomas Wouters wrote: > > Maybe this is a dead and buried subject, but I'm going to try anyway, since > everyone's been in such a wonderful 'lets fix ugly but harmless nits' mood > lately :) > > Why do we need the following atrocity : > > timestr = time.strftime("", time.localtime(time.time())) > > To do the simple task of 'date +' ? I never really understood why > there isn't a way to get a timetuple directly from C, rather than converting > a float that we got from C a bytecode before, even though the higher level > almost always deals with timetuples. How about making the float-to-tuple > functions (time.localtime, time.gmtime) accept 0 arguments as well, and > defaulting to time.time() in that case ? Even better, how about doing the > same for the other functions, too ? (where it makes sense, of course :) > > Actually, I'll split it up in three proposals: > > - Making the time in time.strftime default to 'now', so that the above > becomes the ever so slightly confusing: > > timestr = time.strftime("") > (confusing because it looks a bit like a regexp constructor...) > > - Making the time in time.asctime and time.ctime optional, defaulting to > 'now', so you can just call 'time.ctime()' without having to pass > time.time() (which are about half the calls in my own code :) > > - Making the time in time.localtime and time.gmtime default to 'now'. > > I'm 0/+1/+1 myself :) +1 all the way -- though these days I tend not to use the time module anymore. mxDateTime already does everything I want and there date/time values are objects rather than Python integers or tuples... 
ok, I'm just showing off a little :)

-- Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Tue Jan 16 11:32:21 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 16 Jan 2001 11:32:21 +0100
Subject: [Python-Dev] Strings: '\012' -> '\n'
References: <200101160413.XAA01404@cj20424-a.reston1.va.home.com>
Message-ID: <3A642335.82358B02@lemburg.com>

Minor nit about this idea: it makes decoding repr() style strings harder for external tools and it could cause breakage (e.g. if "\n" is used by the encoding for some other purpose).

BTW, since there are a gazillion ways to encode strings into 7-bit ASCII, why not use the new codec design to add additional output schemes for 8-bit strings ?! Strings have an .encode() method as well...

-- Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From ping at lfw.org Tue Jan 16 11:37:42 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 16 Jan 2001 02:37:42 -0800 (PST)
Subject: [Python-Dev] pydoc.py (show docs both inside and outside of Python)
In-Reply-To: <003a01c07fa4$fa0883c0$f05aa8c0@lslp7o.int.lsl.co.uk>
Message-ID:

Before somebody decides to shoot us for spamming both lists, i'm taking this thread off of python-dev and solely to doc-sig. Please continue further discussion there...
-- ?!ng

From ping at lfw.org Tue Jan 16 11:47:02 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 16 Jan 2001 02:47:02 -0800 (PST)
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To:
Message-ID:

On Mon, 15 Jan 2001, Ka-Ping Yee wrote:
> On Mon, 15 Jan 2001, Guido van Rossum wrote:
> > Originally, using \x for these was impractical (at least) because of
> > the stupid gobble-up-everything-that-looks-like-a-hex-digit semantics
> > of the \x escape. Now we've fixed this, I agree.
>
> Oh, now i understand. Good point. I'll update the patch to do hex.

I assume you would like Unicode strings to do the same (\n, \t, \r, and \xff rather than \377). Guido, do you have a Pronouncement on \v, \f, \b, \a?

By the way, why do Unicode escapes appear in capitals?

    >>> u'\uface'
    u'\uFACE'

(If someone tells me that there happens to be a picture of a face at that code point, i'll laugh. Is there a cow at \uBEEF?)

Does anyone care that \x will be followed by lowercase and \u by uppercase?

I noticed that the tutorial claims Unicode strings can be str()-ified and will encode themselves using UTF-8 as default. But this doesn't actually work for me:

    >>> us = u'\uface'
    >>> us
    u'\uFACE'
    >>> str(us)
    Traceback (most recent call last):
      File "", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    >>> us.encode()
    Traceback (most recent call last):
      File "", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    >>> us.encode('UTF-8')
    '\xef\xab\x8e'

Assuming i have understood this correctly, i have submitted a patch to correct tut.tex.

-- ?!ng

From bckfnn at worldonline.dk Tue Jan 16 11:52:10 2001
From: bckfnn at worldonline.dk (Finn Bock)
Date: Tue, 16 Jan 2001 10:52:10 GMT
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To:
References:
Message-ID: <3a642768.6426631@smtp.worldonline.dk>

[Ping]
>I don't know whether this is going to be obvious or controversial,
>but here goes.
Most of the time we're used to seeing a newline as >'\n', not as '\012', and newlines are typed in as '\n'. > >A newcomer to Python is likely to do > > >>> 'hello\n' > 'hello\012' > >and ask "what's \012?" -- whereupon one has to explain that it's an >octal escape, that 012 in octal equals 10, and that chr(10) is >newline, which is the same as '\n'. You're bound to run into this, >and you'll see \012 a lot, because \n is such a common character. >Aside from being slightly more frightening, '\012' also takes up >twice as many characters as necessary. > >So... i'm submitting a patch that causes the three most common >special whitespace characters, '\n', '\r', and '\t', to appear in >their natural form rather than as octal escapes when strings are >printed and repr()ed. I like it, because it removes yet another difference between Python and Jython. Jython happens to handle these chars specially: \n, \t, \b, \f and \r. regards, finn From esr at thyrsus.com Tue Jan 16 11:53:00 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 16 Jan 2001 05:53:00 -0500 Subject: [Python-Dev] time functions In-Reply-To: <3A642004.F6197E86@lemburg.com>; from mal@lemburg.com on Tue, Jan 16, 2001 at 11:18:44AM +0100 References: <20010116004930.L1005@xs4all.nl> <3A642004.F6197E86@lemburg.com> Message-ID: <20010116055300.C12847@thyrsus.com> M.-A. Lemburg : > +1 all the way -- though these days I tend not to use the > time module anymore. mxDateTime already does everything I want > and there date/time values are objects rather than Python integers > or tuples... ok, I'm just showing opff a little :) mxDateTime is on my short list of "why isn't this in the Python library already?" Has it ever been discussed? -- Eric S. Raymond You need only reflect that one of the best ways to get yourself a reputation as a dangerous citizen these days is to go about repeating the very phrases which our founding fathers used in the great struggle for independence. 
-- Attributed to Charles Austin Beard (1874-1948) From mal at lemburg.com Tue Jan 16 12:18:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 16 Jan 2001 12:18:24 +0100 Subject: [Python-Dev] time functions References: <20010116004930.L1005@xs4all.nl> <3A642004.F6197E86@lemburg.com> <20010116055300.C12847@thyrsus.com> Message-ID: <3A642E00.BD330647@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > +1 all the way -- though these days I tend not to use the > > time module anymore. mxDateTime already does everything I want > > and there date/time values are objects rather than Python integers > > or tuples... ok, I'm just showing opff a little :) > > mxDateTime is on my short list of "why isn't this in the Python library > already?" Has it ever been discussed? Yes. I'd rather keep it separate from the standard dist for various reasons. One of these reasons is that I will be moving the mx tools into a new packaging scheme built on distutils -- installing it should then boil down to a simple RPM install or maybe a "python setup.py install" thanks to distutils. The package will then become a subpackage of the mx package. BTW, I see distutils as strong argument for *not* including more exotic packages in Python's stdlib. If this catches on, I expect that together with the Vaults we are not far away from having our own CPAN style archive of add-on packages. I also expect the commercial vendors like ActiveState et al. to take care of wrapping SUMO distributions of Python and the existing add-ons. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Tue Jan 16 12:20:18 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Tue, 16 Jan 2001 06:20:18 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <3a642768.6426631@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Tue, Jan 16, 2001 at 10:52:10AM +0000 References: <3a642768.6426631@smtp.worldonline.dk> Message-ID: <20010116062018.A12935@thyrsus.com> Finn Bock : > I like it, because it removes yet another difference between Python and > Jython. Jython happens to handle these chars specially: \n, \t, \b, \f > and \r. This is an argument for adding \b and \f to the special set in CPython. If the BDFL looks benignly on adding \v and \a, those should go into Jython's special set too. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? -- Thomas Jefferson, in his 1801 inaugural address From fredrik at pythonware.com Tue Jan 16 12:37:10 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 16 Jan 2001 12:37:10 +0100 Subject: [Python-Dev] Strings: '\012' -> '\n' References: Message-ID: <03eb01c07fb0$aaaa19e0$0900a8c0@SPIFF> ping wrote: > By the way, why do Unicode escapes appear in capitals? > > >>> u'\uface' > u'\uFACE' > > (If someone tells me that there happens to be a picture of a face at > that code point, i'll laugh. Is there a cow at \uBEEF?) iirc, 0xFACE and 0xBEEF are part of the CJK and Hangul spaces. not sure 0xFACE is assigned, but 0xBEEF glyph looks like a ribcage with four legs... you'll find faces at 0x263A etc. From skip at mojam.com Tue Jan 16 14:09:51 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 16 Jan 2001 07:09:51 -0600 (CST) Subject: [Python-Dev] bummer - regsub/regex no longer in module index Message-ID: <14948.18463.971334.401426@beluga.mojam.com> I am now getting deprecation warnings about regsub so I decided to start replacing it with more zeal than I had previously. First thing I wanted to replace were some regsub.split calls. 
I went to the module index to look up the description but regsub was nowhere to be found. (I know, I know. I can use pydoc.) Still... how about continuing to include deprecated modules in the library reference manual but in a separate Deprecated Modules section and annotate them as such in the module index? Skip From guido at python.org Tue Jan 16 14:44:01 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 08:44:01 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 08:11:38 +0100." <00b201c07f8b$93996820$e46940d5@hagrid> References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> Message-ID: <200101161344.IAA04513@cj20424-a.reston1.va.home.com> > thomas wrote: > > - Making the time in time.strftime default to 'now', so that the above > > becomes the ever so slightly confusing: > > > > timestr = time.strftime("") > > (confusing because it looks a bit like a regexp constructor...) > > where "now" is local time, I assume? > > since you're assuming a time zone, you could make it accept > an integer as well... What would the integer mean? > > - Making the time in time.asctime and time.ctime optional, defaulting to > > 'now', so you can just call 'time.ctime()' without having to pass > > time.time() (which are about half the calls in my own code :) > > same here. Same what here? "now" == local time, sure. But accept an integer? It already accepts an integer! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 14:55:01 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 08:55:01 -0500 Subject: [Python-Dev] time functions In-Reply-To: Your message of "Tue, 16 Jan 2001 09:22:11 +0100." 
<20010116092211.O1005@xs4all.nl> References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> <20010116092211.O1005@xs4all.nl> Message-ID: <200101161355.IAA04802@cj20424-a.reston1.va.home.com> Let's not redesign the time module API too much. I'm all for adding the default argument values that Thomas proposes. Then, instead of changing the API, we should look into a higher-level Python module. That's how those things typically go. Digital Creations has its own time extension type somewhere in Zope, a bit similar to mxDateTime. I looked into making this a standard Python extension but quickly gave up. The problems with these things seems to be that it's hard to come up with a design that makes everyone happy: some people want small objects (because they have a lot of them around, e.g. a timestamp on almost every other object); others want timezone support; yet others want microsecond resolution; leap-second support; pre-Christian era support; support for nonstandard calendars; interval arithmetic; support for dates without times or times without dates... Python could use a better time type, but we'll have to look into which requirements make sense for a generalized type, and which don't. I fear that a committee could easily pee away years designing an interface to satisfy absolutely every wish. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jan 16 15:02:29 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 09:02:29 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Mon, 15 Jan 2001 21:33:42 PST." References: Message-ID: <200101161402.JAA05045@cj20424-a.reston1.va.home.com> > Yes, i referred to "when strings are printed and repr()ed" as two cases > because both string_print() and string_repr() have to be changed. > > (Side question: when are *_print() and *_repr() ever different, and why?) 
You mean the tp_print and tp_str function slots in type objects,
right? tp_print *should* always render exactly the same as tp_str.
tp_print is used by the print statement, not by value display at the
interactive prompt. tp_print and tp_str have differed historically
for 3rd party extension types by accident. So, string_print most
definitely should *not* be changed -- only string_repr!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue Jan 16 15:06:23 2001
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Jan 2001 09:06:23 -0500
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: Your message of "Tue, 16 Jan 2001 02:47:02 PST."
References:
Message-ID: <200101161406.JAA05153@cj20424-a.reston1.va.home.com>

> I assume you would like Unicode strings to do the same (\n, \t, \r,
> and \xff rather than \377).

Yeah.

> Guido, do you have a Pronouncement on \v, \f, \b, \a?

Practicality beats purity: these will remain octal.

> By the way, why do Unicode escapes appear in capitals?
>
>     >>> u'\uface'
>     u'\uFACE'

Could it be just that that's what Unicode folks are expecting?

> (If someone tells me that there happens to be a picture of a face at
> that code point, i'll laugh. Is there a cow at \uBEEF?)

I'm laughing even though I don't see pictures. :-)

> Does anyone care that \x will be followed by lowercase and \u by uppercase?

It's mildly weird, and I think hex escapes in lowercase are more
Pythonic than in upper case.

> I noticed that the tutorial claims Unicode strings can be str()-ified
> and will encode themselves using UTF-8 as default. But this doesn't
> actually work for me:
>
>     >>> us = u'\uface'
>     >>> us
>     u'\uFACE'
>     >>> str(us)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeError: ASCII encoding error: ordinal not in range(128)
>     >>> us.encode()
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeError: ASCII encoding error: ordinal not in range(128)
>     >>> us.encode('UTF-8')
>     '\xef\xab\x8e'
>
> Assuming i have understood this correctly, i have submitted a patch
> to correct tut.tex.

Yeah, I guess that part of the tutorial was written before we changed
our minds about this. :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue Jan 16 15:09:56 2001
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Jan 2001 09:09:56 -0500
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: Your message of "Tue, 16 Jan 2001 11:32:21 +0100." <3A642335.82358B02@lemburg.com>
References: <200101160413.XAA01404@cj20424-a.reston1.va.home.com> <3A642335.82358B02@lemburg.com>
Message-ID: <200101161409.JAA05268@cj20424-a.reston1.va.home.com>

> Minor nit about this idea: it makes decoding repr() style
> strings harder for external tools and it could cause breakage
> (e.g. if "\n" is used by the encoding for some other purpose).

Such a tool would be broken. If it accepts string literals it should
accept all forms of escapes.

> BTW, since there are a gazillion ways to encode strings into
> 7-bit ASCII, why not use the new codec design to add additional
> output schemes for 8-bit strings ?!
>
> Strings have an .encode() method as well...

Good idea! This could also be used to "hexify" a string, for which
currently one of the quickest ways is still the hack

    "%02x"*len(s) % tuple(s)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue Jan 16 15:11:53 2001
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Jan 2001 09:11:53 -0500
Subject: [Python-Dev] Strings: '\012' -> '\n'
In-Reply-To: Your message of "Tue, 16 Jan 2001 06:20:18 EST."
<20010116062018.A12935@thyrsus.com> References: <3a642768.6426631@smtp.worldonline.dk> <20010116062018.A12935@thyrsus.com> Message-ID: <200101161411.JAA05336@cj20424-a.reston1.va.home.com> > Finn Bock : > > I like it, because it removes yet another difference between Python and > > Jython. Jython happens to handle these chars specially: \n, \t, \b, \f > > and \r. [ESR] > This is an argument for adding \b and \f to the special set in > CPython. If the BDFL looks benignly on adding \v and \a, those > should go into Jython's special set too. No, I think Jython should remove \b and \f. Or the language standard could allow implementations some freedom here (as long as the output is a string literal). --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Jan 16 16:06:34 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 16 Jan 2001 10:06:34 -0500 (EST) Subject: [Python-Dev] unit testing bake-off In-Reply-To: References: <20010115162619.A19484@kronos.cnri.reston.va.us> Message-ID: <14948.25466.698063.240902@cj42289-a.reston1.va.home.com> Tim Peters writes: > Presumably so that *something* gets into 2.1a1. At least you, Jeremy and > Fredrik have tried them, and if that's all there can't be a tie . I > would agree this is not an ideal decision procedure. I've been using PyUNIT some, but haven't tried the Quixote unittest module, which tells me I can't make a particularly informed recommendation (vote, whatever). -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From thomas at xs4all.net Tue Jan 16 16:23:52 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 16:23:52 +0100 Subject: [Python-Dev] time functions In-Reply-To: <200101161355.IAA04802@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 08:55:01AM -0500 References: <20010116004930.L1005@xs4all.nl> <00b201c07f8b$93996820$e46940d5@hagrid> <20010116081838.N1005@xs4all.nl> <20010116092211.O1005@xs4all.nl> <200101161355.IAA04802@cj20424-a.reston1.va.home.com> Message-ID: <20010116162350.A21010@xs4all.nl> On Tue, Jan 16, 2001 at 08:55:01AM -0500, Guido van Rossum wrote: > Let's not redesign the time module API too much. [snip] Agreed. > I fear that a committee could easily pee away years designing an > interface to satisfy absolutely every wish. A committee is a life form with six or more legs and no brain. Lazarus Long in "Time Enough For Love", by R. A. Heinlein. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at mojam.com Tue Jan 16 18:23:56 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 16 Jan 2001 11:23:56 -0600 (CST) Subject: [Python-Dev] Re: [Patches] [Patch #102891] Alternative readline module In-Reply-To: References: Message-ID: <14948.33708.332464.107009@beluga.mojam.com> Michael> ... (or I'll just call it pyttyinput) Which, like "Guido", when properly pronounced should leave your monitor slightly moist... 
;-) Skip From thomas at xs4all.net Tue Jan 16 18:36:03 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 16 Jan 2001 18:36:03 +0100 Subject: [Python-Dev] Re: [Patches] [Patch #102891] Alternative readline module In-Reply-To: <14948.33708.332464.107009@beluga.mojam.com>; from skip@mojam.com on Tue, Jan 16, 2001 at 11:23:56AM -0600 References: <14948.33708.332464.107009@beluga.mojam.com> Message-ID: <20010116183603.B2776@xs4all.nl> On Tue, Jan 16, 2001 at 11:23:56AM -0600, Skip Montanaro wrote: > Which, like "Guido", when properly pronounced should leave your monitor > slightly moist... ;-) Nono, 'Guido' should be pronounced using a hard, back-of-your-throat 'G', more like a growl than a hiss. The less moisture the better :) You-were-thinking-of-Centraal-Wiskunde-Instituut-(cwi.nl)-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From trentm at ActiveState.com Tue Jan 16 19:36:29 2001 From: trentm at ActiveState.com (Trent Mick) Date: Tue, 16 Jan 2001 10:36:29 -0800 Subject: [Python-Dev] TELL64 In-Reply-To: <200101160408.XAA01368@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 11:08:46PM -0500 References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> Message-ID: <20010116103626.D30209@ActiveState.com> On Mon, Jan 15, 2001 at 11:08:46PM -0500, Guido van Rossum wrote: > > Trent, you wrote that code. Why wouldn't this work just as well? > > (your code): > if ((pos = TELL64(fileno(fp))) == -1L) > return -1; > (my suggestion): > if (fgetpos(fp, &pos) != 0) > return -1; I agree, that looks to me like it would. I guess I just missed that when I wrote it. 
>
> I would even go as far as to collapse the entire switch as follows:
>
>     fpos_t pos;
>     switch (whence) {
>     case SEEK_END:
>         /* do a "no-op" seek first to sync the buffering so that
>            the low-level tell() can be used correctly */
>         if (fseek(fp, 0, SEEK_END) != 0)
>             return -1;
>         /* fall through */
>     case SEEK_CUR:
>         if (fgetpos(fp, &pos) != 0)
>             return -1;
>         offset += pos;
>         break;
>     /* case SEEK_SET: break; */
>     }
>     return fsetpos(fp, &offset);

Sure. Just get rid of the """do a "no-op" seek...""" comment because it is no
longer applicable. I am not setup to test this on Win64 right and I don't
suppose there are a lot of you out there with your own Win64 setups. I will
be able to test this before the scheduled 2.1 beta (late Feb), though.

Trent

--
Trent Mick
TrentM at ActiveState.com

From trentm at ActiveState.com Tue Jan 16 20:34:17 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Tue, 16 Jan 2001 11:34:17 -0800
Subject: [Python-Dev] TELL64
In-Reply-To: <20010116103626.D30209@ActiveState.com>; from trentm@ActiveState.com on Tue, Jan 16, 2001 at 10:36:29AM -0800
References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> <20010116103626.D30209@ActiveState.com>
Message-ID: <20010116113417.I30209@ActiveState.com>

On Tue, Jan 16, 2001 at 10:36:29AM -0800, Trent Mick wrote:
> Sure. Just get rid of the """do a "no-op" seek...""" comment because it is no
> longer applicable.
> I am not setup to test this on Win64 right and I don't

s/right/right now/

Trent

--
Trent Mick
TrentM at ActiveState.com

From cgw at fnal.gov Tue Jan 16 21:19:09 2001
From: cgw at fnal.gov (Charles G Waldman)
Date: Tue, 16 Jan 2001 14:19:09 -0600 (CST)
Subject: [Python-Dev] Re: [Patch #103248] Fix a memory leak in _sre.c
Message-ID: <14948.44221.876681.838046@buffalo.fnal.gov>

Fredrik -

I noticed that you chose to check in a slightly different patch than
the one I submitted. I wonder why you chose to do this? In
particular at line 1238 I had:

    if (PyErr_Occurred()) {
        Py_DECREF(self);
        return NULL;
    }

and you changed this to

    if (PyErr_Occurred()) {
        PyObject_DEL(self);
        return NULL;
    }

Can you explain why you made this (seemingly arbitrary) change?

I think that since "self" was created via:

    self = PyObject_NEW_VAR(PatternObject, &Pattern_Type, n);

which calls PyObject_INIT, which in turn calls _Py_NewReference, which
increments _Py_RefTotal, it is incorrect to simply do a PyObject_DEL
to de-allocate it -- won't this screw up the value of _Py_RefTotal?

Admittedly this is a minor nit and only matters if Py_TRACE_REFS is
defined - I just wanted to check to make sure my understanding of
reference counting w.r.t. memory allocation and deallocation is
correct - if the above is in error, I'd appreciate any corrections...

From guido at python.org Tue Jan 16 21:53:41 2001
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Jan 2001 15:53:41 -0500
Subject: [Python-Dev] TELL64
In-Reply-To: Your message of "Tue, 16 Jan 2001 10:36:29 PST."
<20010116103626.D30209@ActiveState.com> References: <20010108182056.C4640@lyra.org> <200101151602.LAA22272@cj20424-a.reston1.va.home.com> <20010115141026.I29870@ActiveState.com> <20010116005536.M1005@xs4all.nl> <20010115162454.D3864@ActiveState.com> <200101160408.XAA01368@cj20424-a.reston1.va.home.com> <20010116103626.D30209@ActiveState.com> Message-ID: <200101162053.PAA13099@cj20424-a.reston1.va.home.com> > I agree, that looks to me like it would. I guess I just missed that when I > wrote it. Excellent! I've checked this in now -- we'll hear if it breaks anywhere soon enough. >I am not setup to test this on Win64 right [now] and I don't > suppose there are a lot of you out there with your own Win64 setups. What happened to ActiveState's Itanium boxes? --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Tue Jan 16 22:53:22 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 16 Jan 2001 16:53:22 -0500 Subject: [Python-Dev] Re: Detecting install time In-Reply-To: <200101160347.WAA01132@cj20424-a.reston1.va.home.com>; from guido@python.org on Mon, Jan 15, 2001 at 10:47:32PM -0500 References: <200101160303.WAA11632@207-172-111-91.s91.tnt1.ann.va.dialup.rcn.com> <200101160347.WAA01132@cj20424-a.reston1.va.home.com> Message-ID: <20010116165322.B29674@kronos.cnri.reston.va.us> [CC'ing to the distutils-sig] On Mon, Jan 15, 2001 at 10:47:32PM -0500, Guido van Rossum wrote: >> For PEP 229, the setup.py script needs to figure out if it's running >> from the build directory, because then distutils.sysconfig needs to > >You could check for the presence of config.status -- that file is not >installed. This isn't a check suitable for inclusion in distutils.sysconfig, though, because it's so liable to being fooled (consider a Distutils-packaged module that comes with a configure script to build some library). 
Right now I'm using a hacked version of sysconfig with several patches
like this:

@@ -120,12 +121,16 @@
 def get_config_h_filename():
     """Return full pathname of installed config.h file."""
     inc_dir = get_python_inc(plat_specific=1)
+    # XXX
+    if 1: inc_dir = '.'
     return os.path.join(inc_dir, "config.h")

One hackish approach would be to add an assume_build_directories() to
distutils.sysconfig, a little back door to be used by the setup.py
script that comes with Python, so the above would become
'if build_time_flag: ...'. Anyone have a cleaner idea?

--amk

From akuchlin at mems-exchange.org Wed Jan 17 02:46:47 2001
From: akuchlin at mems-exchange.org (A.M. Kuchling)
Date: Tue, 16 Jan 2001 20:46:47 -0500
Subject: [Python-Dev] PEP 229 issues
Message-ID: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com>

I'm in a quandary about the patch implementing PEP 229. The patch is
quite close to being ready, with only a few minor issues remaining,
but to fix those issues, I need to make some changes to the Distutils,
such as the sysconfig modification I recently suggested.

Problem: I believe the patch *must* go in at the alpha stage, because
there are bound to be lots of platform-specific problems that will
show up; it should not be added in the beta stage, because it'll need
time to get tested and debugged, and I wouldn't be surprised if it has
to be reverted later because of some insurmountable problem.

Problem: Greg Ward, the Distutils maintainer, is away at the moment.
I can check in changes to the Distutils without his say-so, but when
Greg gets back he might shriek in horror and rip all of the changes
out again. (Or he's stuck with maintaining them until 2.2.)

Problem: 2.1alpha1 is due on Friday.

So, what to do? If I know there's going to be an alpha2, that's
probably fine; Greg should have resurfaced by then, and the patch can
go in for alpha2.
Or, I can check in the changes before Friday, and if they're unacceptable, they can be fixed for alpha2/beta1, or simply backed out. Or, I can leave Distutils alone and make setup.py a tissue of hacks and workarounds. For example, it might insert new versions of various functions into the distutils.sysconf module. Icky and fragile, but cleaning it up for beta1 would then be a priority. Suggestions? Pronouncements? --amk From guido at python.org Wed Jan 17 02:39:35 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 20:39:35 -0500 Subject: [Python-Dev] PEP 229 issues In-Reply-To: Your message of "Tue, 16 Jan 2001 20:46:47 EST." <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> References: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> Message-ID: <200101170139.UAA17954@cj20424-a.reston1.va.home.com> I expect that there will be an alpha2, but I still recommend that you check in *something* that works for alpha1, to get maximal testing coverage. Alpha1 may slip a day or so (Jeremy and I are both late with our big patches, respectively nested scopes and rich comparisons, that we really want to have in alpha1). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jan 17 03:04:53 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 16 Jan 2001 21:04:53 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101161409.JAA05268@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Good idea [using string.encode()]! This could also be used to > "hexify" a string, for which currently one of the quickest ways > is still the hack > > "%02x"*len(s) % tuple(s) Note that as of 2.0, a far quicker way is to use binascii.b2a_hex(), or its absurdist (read "Barry" ) synonym binascii.hexlify(). 
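[A small sketch of the two "hexify" approaches mentioned here, written for present-day Python 3 rather than the 2.0 interpreter under discussion. The helper names hexify_by_format and hexify_by_binascii are made up for illustration; only binascii.hexlify() and binascii.b2a_hex() come from the thread.]

```python
import binascii


def hexify_by_format(data: bytes) -> str:
    """The "%02x" hack quoted above.

    In Python 3, iterating over bytes yields ints, so tuple(data) can be
    fed straight to the repeated "%02x" format.
    """
    return "%02x" * len(data) % tuple(data)


def hexify_by_binascii(data: bytes) -> str:
    """The binascii route; b2a_hex() and hexlify() are the same function."""
    return binascii.hexlify(data).decode("ascii")


sample = b"\n\xff"
print(hexify_by_format(sample))    # 0aff
print(hexify_by_binascii(sample))  # 0aff
```

Both helpers give the same digits; the binascii version avoids building a format string as long as the input.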
I'm wary of using string.encode() for this, because one normally hexlifies binary data (e.g., like sha checksums), and 4 days of 7 we're more than not in favor of moving away from strings to carry binary data. Of course we can change our minds about this across releases, and have even-numbered releases deprecate the function forms while odd-numbered ones abjure methods. Works for me . From nas at arctrix.com Tue Jan 16 22:08:23 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 16 Jan 2001 13:08:23 -0800 Subject: [Python-Dev] [droux@tuks.co.za: Our application doesn't work with Debian packaged Python] Message-ID: <20010116130823.C9640@glacier.fnational.com> This message was on the debian-python list. Does anyone know why the patch is needed? Neil ----- Forwarded message from Danie Roux ----- Date: Tue, 16 Jan 2001 11:44:48 +0200 From: Danie Roux Subject: Our application doesn't work with Debian packaged Python To: Debian Python Good they all, Our program is an archiver for gnome that uses gnome-python with one widget written in C. I converted our program to autoconf and automake so anyone can (and please do!) compile it and see what I mean. Everything compiles fine. But when it runs it just throws a weird exception. The funny thing is, if I alien RedHat 6.2's python package, and install that, it works! I need to change nothing else. Only the python package. I then went and look at the source rpm. They have this patch in there: --- Python-1.5.2/Python/importdl.c.global Sat Jul 17 16:52:26 1999 +++ Python-1.5.2/Python/importdl.c Sat Jul 17 16:53:19 1999 @@ -441,13 +441,13 @@ #ifdef RTLD_NOW /* RTLD_NOW: resolve externals now (i.e. 
core dump now if some are missing) */ - void *handle = dlopen(pathname, RTLD_NOW); + void *handle = dlopen(pathname, RTLD_NOW | RTLD_GLOBAL); #else void *handle; if (Py_VerboseFlag) printf("dlopen(\"%s\", %d);\n", pathname, - RTLD_LAZY); - handle = dlopen(pathname, RTLD_LAZY); + RTLD_LAZY | RTLD_GLOBAL); + handle = dlopen(pathname, RTLD_LAZY | RTLD_GLOBAL); #endif /* RTLD_NOW */ if (handle == NULL) { PyErr_SetString(PyExc_ImportError, dlerror()); Sure enough this fixes my problem. The thing is that this means our program only works on Redhat (and who ever patched python 1.5.2 with this). So what can I do now? How can I get this patch into debian-python? How can I change my program to not need the patch? btw the program is garchiver, it will be hosted at sourceforge as soon as they get back to me, in the mean time I will mail anyone a copy of the sources. -- Danie Roux *shuffle* Adore Unix -- To UNSUBSCRIBE, email to debian-python-request at lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster at lists.debian.org ----- End forwarded message ----- From guido at python.org Wed Jan 17 05:16:48 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 23:16:48 -0500 Subject: [Python-Dev] [droux@tuks.co.za: Our application doesn't work with Debian packaged Python] In-Reply-To: Your message of "Tue, 16 Jan 2001 13:08:23 PST." <20010116130823.C9640@glacier.fnational.com> References: <20010116130823.C9640@glacier.fnational.com> Message-ID: <200101170416.XAA20515@cj20424-a.reston1.va.home.com> > This message was on the debian-python list. Does anyone know why > the patch is needed? > - handle = dlopen(pathname, RTLD_LAZY); > + handle = dlopen(pathname, RTLD_LAZY | RTLD_GLOBAL); This comes back every once in a while. It means that they have an module whose shared library implementation exports symbols that are needed by another shared library (probably another module). 
IMO this approach is evil, because RTLD_GLOBAL means that *all* external symbols defined by any module are exported to all other shared libraries, and this will cause conflicts if the same symbol is exported by two different modules -- which can happen quite easily. (I don't know what happens on conflicts -- maybe you get an error, maybe it links to the wrong symbol.) The proper solution would be to put the needed entry points beside the init entry point in a separate shared library. But that's often not how quick-and-dirty extension modules are designed... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jan 17 05:22:54 2001 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Jan 2001 23:22:54 -0500 Subject: [Python-Dev] Rich Comparisons technical prerelease Message-ID: <200101170422.XAA20626@cj20424-a.reston1.va.home.com> I've got a working version of the rich comparisons ready for preview. The patch is here: http://www.python.org/~guido/richdiff.txt It's also referenced at sourceforge: http://sourceforge.net/patch/?func=detailpatch&patch_id=103283&group_id=5470 Here's a summary: - The comparison operators support "rich comparison overloading" (PEP 207). C extension types can provide a rich comparison function in the new tp_richcompare slot in the type object. The cmp() function and the C function PyObject_Compare() first try the new rich comparison operators before trying the old 3-way comparison. There is also a new C API PyObject_RichCompare() (which also falls back on the old 3-way comparison, but does not constrain the outcome of the rich comparison to a Boolean result). 
  The rich comparison function takes two objects (at least one of
  which is guaranteed to have the type that provided the function) and
  an integer indicating the opcode, which can be Py_LT, Py_LE, Py_EQ,
  Py_NE, Py_GT, Py_GE (for <, <=, ==, !=, >, >=), and returns a Python
  object, which may be NotImplemented (in which case the tp_compare
  slot function is used as a fallback, if defined).

  Classes can overload individual comparison operators by defining one
  or more of the methods __lt__, __le__, __eq__, __ne__, __gt__,
  __ge__. There are no explicit "reversed argument" versions of
  these; instead, __lt__ and __gt__ are each other's reverse, likewise
  for __le__ and __ge__; __eq__ and __ne__ are their own reverse
  (similar at the C level). No other implications are made; in
  particular, Python does not assume that == is the inverse of !=, or
  that < is the inverse of >=. This makes it possible to define types
  with partial orderings.

  Classes or types that want to implement (in)equality tests but not
  the ordering operators (i.e. unordered types) should implement ==
  and !=, and raise an error for the ordering operators.

  It is possible to define types whose comparison results are not
  Boolean; e.g. a matrix type might want to return a matrix of bits
  for A < B, giving elementwise comparisons. Such types should ensure
  that any interpretation of their value in a Boolean context raises
  an exception, e.g. by defining __nonzero__ (or the tp_nonzero slot
  at the C level) to always raise an exception.
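[A rough illustration of the class-level rules described above, written as a sketch in present-day Python 3 rather than taken from the patch. The Interval class and its containment ordering are hypothetical, chosen only because containment is a partial ordering. It defines __lt__ and __le__ but not __gt__ or __ge__, so the "each other's reverse" rule is what makes > and >= work.]

```python
class Interval:
    """Closed interval [lo, hi], ordered by containment (a partial order)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __eq__(self, other):
        if not isinstance(other, Interval):
            return NotImplemented
        return (self.lo, self.hi) == (other.lo, other.hi)

    def __ne__(self, other):
        # Spelled out on purpose: the description above says == and != are
        # not assumed to be each other's inverse.
        if not isinstance(other, Interval):
            return NotImplemented
        return (self.lo, self.hi) != (other.lo, other.hi)

    def __le__(self, other):
        # "self is contained in other"
        if not isinstance(other, Interval):
            return NotImplemented
        return other.lo <= self.lo and self.hi <= other.hi

    def __lt__(self, other):
        # "self is strictly contained in other"
        if not isinstance(other, Interval):
            return NotImplemented
        return self <= other and self != other

    # No __gt__ / __ge__: `a > b` is answered by the reflected b.__lt__(a),
    # and `a >= b` by b.__le__(a).


small = Interval(1, 2)
big = Interval(0, 5)
apart = Interval(3, 9)

print(small < big)    # True
print(big > small)    # True, via the reflected small.__lt__(big)
print(big < apart)    # False
print(big >= apart)   # False as well: < is not treated as the inverse of >=
```

Comparing an Interval with an unrelated object such as `small < 3` makes both sides return NotImplemented, which current Python turns into a TypeError; the 2.1-era fallback to tp_compare described above no longer exists.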
  XXX TO DO for this feature:

  - the test "test_compare" fails, because of the changed semantics
    for complex number comparisons (1j<2j raises an error now)

  - tuple, dict should implement EQ/NE so containers containing
    complex numbers can be compared for equality (list is already
    done) -- or complex numbers should be reverted to old behavior

  - list.sort() should use rich comparison

  - check for memory leaks

  - int, long, float contain new-style-cmp functions that aren't used
    to their full potential any more (the new-style-cmp functions
    introduced by Neil's coercion work are gone again)

  - decide on unresolved issues from PEP 207

  - documentation

  - more testing

  - compare performance to 2.0 (microbench?)

Please give this a good spin -- I'm hoping to check this in and make
it part of the alpha 1 release Friday...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at digicool.com Wed Jan 17 05:50:25 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 16 Jan 2001 23:50:25 -0500
Subject: [Python-Dev] Strings: '\012' -> '\n'
References: <200101161409.JAA05268@cj20424-a.reston1.va.home.com>
Message-ID: <14949.9361.591610.684695@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes:

    TP> Note that as of 2.0, a far quicker way is to use
    TP> binascii.b2a_hex(), or its absurdist (read "Barry" )
    TP> synonym binascii.hexlify().

Thanks for the compliment Tim, but I can't take credit for that name.
If it was me I'd have called it wudduptify() (and its inverse,
notmuchlify()). I stole the name from Emacs's hexlify-buffer function
which kind of does the same thing.

would-converting-to-octal-digits-be-called-octopuslify-ly y'rs,
-Barry

From fredrik at effbot.org Wed Jan 17 09:12:32 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 17 Jan 2001 09:12:32 +0100
Subject: [Python-Dev] Re: [Patch #103248] Fix a memory leak in _sre.c
References: <14948.44221.876681.838046@buffalo.fnal.gov>
Message-ID: <00fe01c0805d$432d4cd0$e46940d5@hagrid>

Charles G Waldman wrote:
> Can you explain why you made this (seemingly arbitrary) change?
>
> I think that since "self" was created via:
>
>     self = PyObject_NEW_VAR(PatternObject, &Pattern_Type, n);
>
> which calls PyObject_INIT, which in turn calls _Py_NewReference, which
> increments _Py_RefTotal, it is incorrect to simply do a PyObject_DEL
> to de-allocate it -- won't this screw up the value of _Py_RefTotal?

and what do you think will happen if you call the destructor
before you've initialized all pointer fields in the object?

(according to the docs, the NEW/New functions return uninitialized
memory. in this case, we're bailing out before the object has been
fully initialized. pattern_dealloc definitely isn't prepared to deal
with random pointer values...)

> Admittedly this is a minor nit and only matters if Py_TRACE_REFS is
> defined - I just wanted to check to make sure my understanding of
> reference counting w.r.t. memory allocation and deallocation is
> correct - if the above is in error, I'd appreciate any corrections...

same here. I don't doubt it's working as you say it does, but
I find it strange that you shouldn't be able to DEL an object
you just created with NEW... maybe DEL should be fixed?
Cheers /F From thomas at xs4all.net Wed Jan 17 10:48:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 17 Jan 2001 10:48:12 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules Setup.config.in,1.7,1.8 Setup.dist,1.7,1.8 In-Reply-To: ; from esr@users.sourceforge.net on Wed, Jan 17, 2001 at 12:25:13AM -0800 References: Message-ID: <20010117104812.F2776@xs4all.nl> On Wed, Jan 17, 2001 at 12:25:13AM -0800, Eric S. Raymond wrote: > + # ndbm(3) may require -lndbm or similar > + @USE_NDBM_MODULE@ndbm ndbmmodule.c @HAVE_LIBNDBM@ This is an interesting module... It's not in the Modules/ directory :-) Did you mean 'dbmmodule.c' with a different library argument ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at mojam.com Wed Jan 17 16:17:39 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 17 Jan 2001 09:17:39 -0600 (CST) Subject: [Python-Dev] Rich comparison confusion Message-ID: <14949.46995.259157.871323@beluga.mojam.com> I'm a bit confused about Guido's rich comparison stuff. In the description he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. From akuchlin at mems-exchange.org Wed Jan 17 16:42:13 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 10:42:13 -0500 Subject: [Python-Dev] PEP 229 issues In-Reply-To: <200101170139.UAA17954@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 08:39:35PM -0500 References: <200101170146.UAA00542@207-172-112-159.s159.tnt4.ann.va.dialup.rcn.com> <200101170139.UAA17954@cj20424-a.reston1.va.home.com> Message-ID: <20010117104213.B490@kronos.cnri.reston.va.us> On Tue, Jan 16, 2001 at 08:39:35PM -0500, Guido van Rossum wrote: >I expect that there will be an alpha2, but I still recommend that you >check in *something* that works for alpha1, to get maximal testing >coverage.
Alpha1 may slip a day or so (Jeremy and I are both late >with our big patches, respectively nested scopes and rich comparisons, >that we really want to have in alpha1). OK; thanks for the pronouncement! I've checked in all the smaller changes that shouldn't break anything. All that's left now is to actually enable the new feature, which requires the nasty changes: * In the top-level Makefile.in, the "sharedmods" target simply runs "./python setup.py build", and "sharedinstall" runs "./python setup.py install". The "clobber" target also deletes the build/ subdirectory where Distutils puts its output. * Rip stuff out of the Setup files. Modules/Setup.config.in only contains entries for the gc and thread modules; the readline, curses, and db modules are removed because it's now setup.py's job to handle them. * Modules/Setup.dist now contains entries for only 3 modules -- _sre, posix, and strop. Guido and Jeremy are rushing to finish their patches in time for the alpha release, though Guido seems to be checking in the rich comparison stuff now. I don't want to impede them by making them stop to debug build problems, so I can either wait until they've landed their changes (at which point there's nothing major left, I think), or they can simply not do a 'cvs update' after the serious changes go in. Thoughts? --amk From barry at digicool.com Wed Jan 17 16:54:06 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 17 Jan 2001 10:54:06 -0500 Subject: [Python-Dev] Breakage in latest CVS Message-ID: <14949.49182.636526.292265@anthem.wooz.org> Looks like the latest CVS (updated just minutes ago) is broken. I'm trying to fix some of these complaints, but thought I'd at least report what I've found... -Barry ... gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -I./../Include -I.. 
-DHAVE_CONFIG_H -c floatobject.c -o floatobject.o floatobject.c:675: warning: excess elements in struct initializer after `float_as_number' floatobject.c:700: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) floatobject.c:700: initializer element for `PyFloat_Type.tp_flags' is not constant ... intobject.c:800: warning: excess elements in struct initializer after `int_as_number' intobject.c:825: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) intobject.c:825: initializer element for `PyInt_Type.tp_flags' is not constant make[1]: *** [intobject.o] Error 1 ... gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -I./../Include -I.. -DHAVE_CONFIG_H -c longobject.c -o longobject.o longobject.c:1865: warning: excess elements in struct initializer after `long_as_number' longobject.c:1890: `Py_TPFLAGS_NEWSTYLENUMBER' undeclared here (not in a function) longobject.c:1890: initializer element for `PyLong_Type.tp_flags' is not constant make[1]: *** [longobject.o] Error 1 From guido at python.org Wed Jan 17 17:09:27 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Jan 2001 11:09:27 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Wed, 17 Jan 2001 09:17:39 CST." <14949.46995.259157.871323@beluga.mojam.com> References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: <200101171609.LAA04102@cj20424-a.reston1.va.home.com> > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. Yes. By this I mean that A<B and B>A are interchangeable, ditto for A<=B and B>=A. Also A==B interchanges for B==A, and A!=B for B!=A. > From a boolean standpoint this just can't be so. Guido mentions partial > orderings, but I'm still confused. Consider this example: Objects of type A > implement rich comparisons. Objects of type B don't. If my code looks like > > a = A() > b = B() > ... > if b < a: > ...
> > My interpretation of the rich comparison stuff is that either > > 1. Since b doesn't implement rich comparisons, the interpreter falls > back to old fashioned comparisons which may or may not allow the > comparison of B objects and A objects. > > or > > 2. The sense of the inequality is switched (a > b) and the rich > comparison code in A's implementation is called. It's case 2. > That's my reading of it. It has to be wrong. The inverse comparison should > be a >= b, not a > b, but the described pairing of comparison functions > would imply otherwise. We're trying very hard *not* to make any connections between a<b and a>=b. You've learned in grade school that these are each other's Boolean inverse (a<b is true iff a>=b is false). However, for partial orderings this may not be true: for unordered a and b, none of a<b, a<=b, a>b, a>=b, a==b may be true. On the other hand, even for partially ordered types, a<b and b>a (note: swapped arguments *and* swapped sense of comparison) always give the same outcome! > I'm sure I'm missing something obvious or revealing some fundamental failure > of my grade school education. Please explain... I think what threw you off was the ambiguity of "inverse". This means Boolean negation. I'm not relying on Boolean negation here -- I'm relying on the more fundamental property that a<b and b>a have the same outcome. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh21 at cam.ac.uk Wed Jan 17 17:13:32 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 17 Jan 2001 16:13:32 +0000 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Skip Montanaro's message of "Wed, 17 Jan 2001 09:17:39 -0600 (CST)" References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: Skip Montanaro writes: > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. > From a boolean standpoint this just can't be so. Guido mentions partial > orderings, but I'm still confused.
Consider this example: Objects of type A > implement rich comparisons. Objects of type B don't. If my code looks like > > a = A() > b = B() > ... > if b < a: > ... > > My interpretation of the rich comparison stuff is that either > > 1. Since b doesn't implement rich comparisons, the interpreter falls > back to old fashioned comparisons which may or may not allow the > comparison of B objects and A objects. > > or > > 2. The sense of the inequality is switched (a > b) and the rich > comparison code in A's implementation is called. > > That's my reading of it. It has to be wrong. The inverse comparison should > be a >= b, not a > b, but the described pairing of comparison functions > would imply otherwise. > > I'm sure I'm missing something obvious or revealing some fundamental failure > of my grade school education. Please explain... For a total order: a < b if and only if b > a. This is what the rich comparison code does. a < b if and only if not a >= b. This is what the rich comparison code doesn't do. Does this make sense? Cheers, M. -- Presumably pronging in the wrong place zogs it. -- Aldabra Stoddart, ucam.chat From moshez at zadka.site.co.il Thu Jan 18 01:08:06 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Thu, 18 Jan 2001 02:08:06 +0200 (IST) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <14949.46995.259157.871323@beluga.mojam.com> References: <14949.46995.259157.871323@beluga.mojam.com> Message-ID: <20010118000806.D1C04A828@darjeeling.zadka.site.co.il> On Wed, 17 Jan 2001 09:17:39 -0600 (CST), Skip Montanaro wrote: > I'm a bit confused about Guido's rich comparison stuff. In the description > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. I think that you're confused between two meanings of inverses. You think: op is an inverse of op' if for every a,b (a op b) = not (a op' b) Guido meant (and I hope, implemented): op is an inverse of op' if for every a,b (a op b) = (b op' a) And a<b iff b>a a<=b iff b>=a Sounds sane.
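[Illustrative aside, not part of the thread: a minimal sketch of the two properties discussed above, written for a current Python 3 interpreter rather than the 2.1 alpha. The class names A and B follow Skip's example; the gt_calls counter and the arbitrary True return value are assumptions made up for the demonstration, and frozenset stands in for "a partially ordered type".]

```python
class B:
    """Like Skip's type B: defines no rich comparison methods of its own."""


class A:
    """Like Skip's type A: implements __gt__ and counts how often it runs."""

    def __init__(self):
        self.gt_calls = 0

    def __gt__(self, other):
        self.gt_calls += 1
        return True          # arbitrary answer, for illustration only


a, b = A(), B()

# B cannot answer "b < a", so the interpreter swaps the operands *and* the
# sense of the comparison and asks A: "a > b" (Skip's case 2).
result = b < a

# Sets ordered by inclusion are only partially ordered: {1} and {2} are
# unordered, so none of the five comparisons holds.
x, y = frozenset([1]), frozenset([2])
none_hold = not (x < y or x <= y or x > y or x >= y or x == y)

# Swapping arguments and sense always agrees ...
swapped_agrees = (x < y) == (y > x)
# ... but Boolean negation does not: x < y is False, and so is x >= y.
negation_agrees = (x < y) == (not (x >= y))
```

Here result comes from A.__gt__, swapped_agrees is true, and negation_agrees is false, which is the distinction between the two meanings of "inverse".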
Unless I'm the one confused.... -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! From fredrik at effbot.org Wed Jan 17 17:47:29 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 17:47:29 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: Message-ID: <012901c080a5$306023a0$e46940d5@hagrid> tim wrote: > > Should I check it in? > > Absolutely! But not like as for 2.0: check it in *now*, so we have a few > days to deal with surprises before the alpha release. as it turned out, the source I had didn't build, and the table- building python script generated something that wasn't quite compatible with the C code. bit rot. I've almost sorted it all out. will check it in later tonight (local time). From tim.one at home.com Wed Jan 17 19:27:11 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 17 Jan 2001 13:27:11 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Tools/idle CallTipWindow.py,1.2,1.3 CallTips.py,1.7,1.8 ClassBrowser.py,1.11,1.12 Debugger.py,1.14,1.15 Delegator.py,1.2,1.3 FileList.py,1.7,1.8 FormatParagraph.py,1.8,1.9 IdleConf.py,1.5,1.6 IdleHistory.py,1.3,1 In-Reply-To: <200101171358.IAA27661@cj20424-a.reston1.va.home.com> Message-ID: [an anonymous developer panics, after Tim "reindent"s the IDLE dir] > Oh no! > > I have a whole slew of changes to IDLE sitting in my work directory. > If I do an update half of these will turn into merge conflicts. :-( > > Don't worry, I'll get over it. I imagine this will pop up from time to time until everything is normalized. If it's about to burn you, run reindent.py on the affected directory *before* you update ("python reindent.py -v ."). That will make all the same changes to your local versions as were checked in, modulo the rare hand-edit (of which there were none in the IDLE directory).
From akuchlin at mems-exchange.org Wed Jan 17 20:04:04 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 14:04:04 -0500 Subject: [Python-Dev] PEP 229 checked in Message-ID: I've checked in the last bit of the PEP 229 changes. Be sure to rename your Modules/Setup file (or do a 'make distclean') before rebuilding. Squeal if you run into trouble, or file bugs on SF. --am"Aieee!"k From jeremy at alum.mit.edu Wed Jan 17 20:12:47 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 17 Jan 2001 14:12:47 -0500 (EST) Subject: [Python-Dev] unexpected consequence of function attributes Message-ID: <14949.61103.258714.325465@localhost.localdomain> I have found one place in the library that depended on hasattr(func, '__dict__') to return false -- dis.dis. You might want to check and see if there is any other code that doesn't expect functions to have extra attributes. I expect that only introspective code would be affected. Jeremy From barry at wooz.org Wed Jan 17 20:46:36 2001 From: barry at wooz.org (Barry A. Warsaw) Date: Wed, 17 Jan 2001 14:46:36 -0500 Subject: [Python-Dev] Re: unexpected consequence of function attributes References: <14949.61103.258714.325465@localhost.localdomain> Message-ID: <14949.63132.583025.303677@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> I have found one place in the library that depended on JH> hasattr(func, '__dict__') to return false -- dis.dis. You JH> might want to check and see if there is any other code JH> that doesn't expect functions to have extra attributes. I JH> expect that only introspective code would be affected. I guess we need a test_dis.py in the regression test suite, eh? :) Here's an extremely quick and dirty fix to dis.py.
-Barry -------------------- snip snip -------------------- Index: dis.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/dis.py,v retrieving revision 1.28 diff -u -r1.28 dis.py --- dis.py 2001/01/14 23:36:05 1.28 +++ dis.py 2001/01/17 19:45:40 @@ -15,6 +15,10 @@ return if type(x) is types.InstanceType: x = x.__class__ + if hasattr(x, 'func_code'): + x = x.func_code + if hasattr(x, 'im_func'): + x = x.im_func if hasattr(x, '__dict__'): items = x.__dict__.items() items.sort() @@ -28,17 +32,12 @@ except TypeError, msg: print "Sorry:", msg print + elif hasattr(x, 'co_code'): + disassemble(x) else: - if hasattr(x, 'im_func'): - x = x.im_func - if hasattr(x, 'func_code'): - x = x.func_code - if hasattr(x, 'co_code'): - disassemble(x) - else: - raise TypeError, \ - "don't know how to disassemble %s objects" % \ - type(x).__name__ + raise TypeError, \ + "don't know how to disassemble %s objects" % \ + type(x).__name__ def distb(tb=None): """Disassemble a traceback (default: last traceback).""" From barry at wooz.org Wed Jan 17 20:49:51 2001 From: barry at wooz.org (Barry A. Warsaw) Date: Wed, 17 Jan 2001 14:49:51 -0500 Subject: [Python-Dev] Re: unexpected consequence of function attributes References: <14949.61103.258714.325465@localhost.localdomain> Message-ID: <14949.63327.22745.359978@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> I have found one place in the library that depended on JH> hasattr(func, '__dict__') to return false -- dis.dis. You JH> might want to check and see if there is any other code JH> that doesn't expect functions to have extra attributes. I JH> expect that only introspective code would be affected.
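[Illustrative aside, not part of the thread: a rough sketch of why a hasattr(x, '__dict__') test stops telling functions apart once functions can carry attributes, and of the "look for code-bearing attributes first" ordering. It is written for a current Python 3 interpreter, so __func__ and __code__ stand in for the im_func and func_code names used in 2001; code_of, sample and Holder are hypothetical names, and this is not the dis.py patch itself.]

```python
def sample():
    return 42


# Functions accept arbitrary attributes, so they have a __dict__ ...
sample.note = "functions can carry attributes"
has_dict = hasattr(sample, "__dict__")


def code_of(x):
    """Return the code object behind x, testing code-bearing attributes
    before anything that relies on __dict__ (hypothetical helper)."""
    if hasattr(x, "__func__"):      # bound method -> underlying function
        x = x.__func__
    if hasattr(x, "__code__"):      # function -> code object
        x = x.__code__
    if hasattr(x, "co_code"):       # already a code object
        return x
    raise TypeError("don't know how to find code for %s objects"
                    % type(x).__name__)


class Holder:
    def method(self):
        return 1


func_code = code_of(sample)
method_code = code_of(Holder().method)
```

With this ordering a function or bound method is resolved to its code object even though it also has a __dict__.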
Patch #103303 http://sourceforge.net/patch/?func=detailpatch&patch_id=103303&group_id=5470 From tim.one at home.com Wed Jan 17 21:51:57 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 17 Jan 2001 15:51:57 -0500 Subject: [Python-Dev] Windows Python totally hosed Message-ID: Failures range from test test_winsound skipped -- Module use of python20.dll conflicts with this version of Python. to test test_tokenize crashed -- exceptions.AttributeError: 're' module has no attribute 'compile' I suspect the latter is really a disguised version of C:\Code\python\dist\src\PCbuild>python Python 2.1a1 (#8, Jan 17 2001, 13:15:23) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import re Traceback (most recent call last): File "", line 1, in ? File "c:\code\python\dist\src\lib\re.py", line 28, in ? from sre import * File "c:\code\python\dist\src\lib\sre.py", line 17, in ? import sre_compile File "c:\code\python\dist\src\lib\sre_compile.py", line 11, in ? import _sre ImportError: Module use of python20.dll conflicts with this version of Python. >>> Suspect all of this has to do with patchlevel.h changing. I'll try to dope it out, but if anyone knows the cure off the top of their head, don't be shy! From akuchlin at mems-exchange.org Wed Jan 17 22:00:56 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 16:00:56 -0500 Subject: [Python-Dev] Re: 'Setup' buglet In-Reply-To: <200101171928.OAA21460@cj20424-a.reston1.va.home.com>; from guido@python.org on Wed, Jan 17, 2001 at 02:28:36PM -0500 References: <200101171928.OAA21460@cj20424-a.reston1.va.home.com> Message-ID: <20010117160056.A20603@kronos.cnri.reston.va.us> [Taking this bug public] On Wed, Jan 17, 2001 at 02:28:36PM -0500, Guido van Rossum wrote: >One problem seems to be that the creation >of the (minimal) Modules/Setup file doesn't seem to be doing the right >thing. 
When I delete Modules/Setup, the next "make" doesn't create >it; it used to be copied from Setup.dist if it doesn't exist. This seems to have been removed from Modules/Makefile.pre.in in revision 1.69 by Fred; instead the configure script now copies Setup.dist to Setup, so you have to rerun configure in order to create Modules/Setup after deleting it. --amk From mal at lemburg.com Wed Jan 17 22:04:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 17 Jan 2001 22:04:29 +0100 Subject: [Python-Dev] Usage of "assert" in regression tests Message-ID: <3A6608DD.E12A2422@lemburg.com> I've just checked in a patch which removes all uses of the assert statement in the regression tests. This makes the tests compatible with the -O mode of Python and also allows centralizing error reporting (many tests already provide their own little test function for this purpose). I urge you to only check in tests which use the new API verify() to verify a certain condition. The API is defined in the regression tools module test_support. Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at effbot.org Wed Jan 17 22:21:56 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 22:21:56 +0100 Subject: [Python-Dev] Windows Python totally hosed References: Message-ID: <028801c080cb$86658350$e46940d5@hagrid> tim wrote: > Suspect all of this has to do with patchlevel.h changing. I'll try to dope > it out, but if anyone knows the cure off the top of their head, don't be > shy! 
text.replace("python20", "python21") for all files in the PCBuild directory, plus PC/config.h From tim.one at home.com Wed Jan 17 22:42:13 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 17 Jan 2001 16:42:13 -0500 Subject: [Python-Dev] Windows Python totally hosed In-Reply-To: <028801c080cb$86658350$e46940d5@hagrid> Message-ID: [/F] > text.replace("python20", "python21") for all files in > the PCBuild directory, plus PC/config.h Brrrr. It strikes me as insane to have the core Python files in an MS project file *named* after the release number (python20.dsp). So I'm going to change that to core.dsp so that at least that much never needs to be changed again. gratefully y'rs - tim From fredrik at effbot.org Wed Jan 17 22:47:28 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 17 Jan 2001 22:47:28 +0100 Subject: [Python-Dev] Usage of "assert" in regression tests References: <3A6608DD.E12A2422@lemburg.com> Message-ID: <02b401c080cf$1a3a5530$e46940d5@hagrid> mal wrote: > I urge you to only check in tests which use the new API > verify() to verify a certain condition. The API is defined > in the regression tools module test_support. did you run the test yourself after applying that patch? (a patch to the patch is on the way in. please check that the test suite still runs on non-Windows boxes...) From gstein at lyra.org Wed Jan 17 22:45:44 2001 From: gstein at lyra.org (Greg Stein) Date: Wed, 17 Jan 2001 13:45:44 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.106,2.107 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jan 17, 2001 at 01:27:04PM -0800 References: Message-ID: <20010117134544.H7731@lyra.org> On Wed, Jan 17, 2001 at 01:27:04PM -0800, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv14991 > > Modified Files: > object.c > Log Message: > Deal properly (?) with comparing recursive datastructures. >... 
> - Change the in-progress code to use static variables instead of > globals (both the nesting level and the key for the thread dict were > globals but have no reason to be globals; the key can even be a > function-static variable in get_inprogress_dict()). The "compare_nesting" variable is a bit troublesome long-term -- it will cause threading issues in a free-threaded implementation. The solution is to put the value into the thread-state. [ not sure if it matters right now, but just bringing it up ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Wed Jan 17 22:55:02 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 17 Jan 2001 16:55:02 -0500 (EST) Subject: [Python-Dev] [PEP 205] weak references patch Message-ID: <14950.5302.356566.778486@cj42289-a.reston1.va.home.com> I've updated the patch that implements PEP 205: http://sourceforge.net/patch/?func=detailpatch&patch_id=103203&group_id=5470 The actual patch is too big for SF: http://starship.python.net/crew/fdrake/patches/weakref.patch-5 One thing about this is that it changes some of the low-level object creation macros, so you'll need to do a "make clean" before "make" when testing it. Have fun! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at lemburg.com Wed Jan 17 23:16:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 17 Jan 2001 23:16:29 +0100 Subject: [Python-Dev] Usage of "assert" in regression tests References: <3A6608DD.E12A2422@lemburg.com> <02b401c080cf$1a3a5530$e46940d5@hagrid> Message-ID: <3A6619BD.2AC8F6D3@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > I urge you to only check in tests which use the new API > > verify() to verify a certain condition. The API is defined > > in the regression tools module test_support. > > did you run the test yourself after applying that patch? Yes, but as I wrote in the SF patch message: I can only test it on Linux and there not all tests are run due to missing extensions. 
The alpha testing will hopefully catch all possible bugs this patch introduced. > (a patch to the patch is on the way in. please check > that the test suite still runs on non-Windows boxes...) I'll have to leave that to the Windows wizards, sorry. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Wed Jan 17 23:49:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 17 Jan 2001 23:49:25 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: ; from akuchlin@mems-exchange.org on Wed, Jan 17, 2001 at 02:04:04PM -0500 References: Message-ID: <20010117234925.A17392@xs4all.nl> On Wed, Jan 17, 2001 at 02:04:04PM -0500, Andrew Kuchling wrote: > I've checked in the last bit of the PEP 229 changes. Be sure to > rename your Modules/Setup file (or do a 'make distclean' before > rebuilding. make distclean doesn't remove Modules/Setup anymore :) Also, I couldn't get it to work with an old tree, even after several make distclean/reconfigures. I got tired looking for it, so I just grabbed a new tree. > Squeal if you run into trouble, or file bugs on SF. I have a couple of questions: what to do when setup.py doesn't work ? Is there a way to make it bypass a module ? What about specifying include dirs manually, for some modules (for instance, when you have readline source in a separate directory, and want to link it statically.) Here are are some specific squeals. See at the bottom for the most important one :) On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by setup.py. Also, SSL support for the socket module was not enabled, though OpenSSL is installed, in the default path. On Debian GNU/Linux' 'woody', the 'testing' (soon 'stable') branch, I can't compile dbmmodule: building 'dbm' extension gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -fpic -I. 
-I/home/thomas/python/python/dist/src/./Include -IInclude/ -c /home/thomas/python/python/dist/src/Modules/dbmmodule.c -o build/temp.linux-i686-2.1/dbmmodule.o /home/thomas/python/python/dist/src/Modules/dbmmodule.c:24: #error "No ndbm.h available!" error: command 'gcc' failed with exit status 1 make: *** [sharedmods] Error 1 (ndbm.h does exist, as /usr/include/db1/ndbm.h. There is also /usr/include/gdbm-ndbm.h, but I'm not sure if that's the same.) Nor can I build the _tkinter module there: building '_tkinter' extension gcc -g -O2 -Wall -Wstrict-prototypes -fPIC -fpic -DWITH_APPINIT=1 -I/usr/X11R6/include -I. -I/home/thomas/python/python/dist/src/./Include -IInclude/ -c /home/thomas/python/python/dist/src/Modules/_tkinter.c -o build/temp.linux-i686-2.1/_tkinter.o /home/thomas/python/python/dist/src/Modules/_tkinter.c:44: tcl.h: No such file or directory In file included from /home/thomas/python/python/dist/src/Modules/_tkinter.c:45:/usr/include/tk.h:66: tcl.h: No such file or directory error: command 'gcc' failed with exit status 1 make: *** [sharedmods] Error 1 The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian, which I personally like a lot, though it's probably a bitch to autodetect. 
(I tried, using autoconf ;-P) On Debian GNU/Linux 'sid', the current unstable branch, I can't compile Python at all, now: c++ -Xlinker -export-dynamic python.o \ ../libpython2.1.a -lpthread -ldl -lutil -lm -o python ../libpython2.1.a(posixmodule.o): In function `posix_tmpnam': /home/thomas/python/python-write/dist/src/Modules/./posixmodule.c:4115: the use of `tmpnam_r' is dangerous, better use `mkstemp' ../libpython2.1.a(posixmodule.o): In function `posix_tempnam': /home/thomas/python/python-write/dist/src/Modules/./posixmodule.c:4071: the use of `tempnam' is dangerous, better use `mkstemp' mv python ../python make[1]: Leaving directory `/home/thomas/python/python-write/dist/src/Modules' ./python ./setup.py build running build running build_ext Traceback (most recent call last): File "./setup.py", line 460, in ? main() File "./setup.py", line 455, in main ext_modules=[Extension('struct', ['structmodule.c'])] File "/home/thomas/python/python-write/dist/src/Lib/distutils/core.py", line 138, in setup dist.run_commands() File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 871, in run_commands self.run_command(cmd) File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 891, in run_command cmd_obj.run() File "/home/thomas/python/python-write/dist/src/Lib/distutils/command/build.py", line 106, in run self.run_command(cmd_name) File "/home/thomas/python/python-write/dist/src/Lib/distutils/cmd.py", line 328, in run_command self.distribution.run_command(command) File "/home/thomas/python/python-write/dist/src/Lib/distutils/dist.py", line 891, in run_command cmd_obj.run() File "/home/thomas/python/python-write/dist/src/Lib/distutils/command/build_ext.py", line 202, in run customize_compiler(self.compiler) File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", line 121, in customize_compiler (cc, opt, ccshared, ldshared, so_ext) = \ File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", 
line 389, in get_config_vars func() File "/home/thomas/python/python-write/dist/src/Lib/distutils/sysconfig.py", line 302, in _init_posix raise DistutilsPlatformError, my_msg distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory) make: *** [sharedmods] Error 1 For the record, I don't have a /usr/lib/python2.1 directory on the other machines either. I haven't been able to test FreeBSD yet, will get to that later tonight. And most importantly(!), on all these machines, 'make test' stops functioning. In fact, after setup.py started building, you can't run 'make' without 'make clean' anymore. You get a lot of undefined-symbol warnings (see below.) If you run 'make clean;make test' it also doesn't work, because the build directory is not in the Python library path, and regrtest.py requires (at least) the time module. c++ -Xlinker -export-dynamic python.o \ ../libpython2.1.a -lpthread -ldl -lutil -lm -o python ../libpython2.1.a(posixmodule.o): In function `posix_tmpnam': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:4115: the use of `tmpnam_r' is dangerous, better use `mkstemp' ../libpython2.1.a(posixmodule.o): In function `posix_tempnam': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:4071: the use of `tempnam' is dangerous, better use `mkstemp' ../libpython2.1.a(myreadline.o): In function `my_fgets': /home/thomas/python/python/dist/src/Parser/myreadline.c:41: undefined reference to `PyOS_InterruptOccurred' /home/thomas/python/python/dist/src/Parser/myreadline.c:35: undefined reference to `PyOS_InterruptOccurred' ../libpython2.1.a(errors.o): In function `PyErr_SetFromErrnoWithFilename': /home/thomas/python/python/dist/src/Python/errors.c:260: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(pythonrun.o): In function `Py_Finalize': /home/thomas/python/python/dist/src/Python/pythonrun.c:193: undefined reference to `PyOS_FiniInterrupts' 
../libpython2.1.a(pythonrun.o): In function `initsigs': /home/thomas/python/python/dist/src/Python/pythonrun.c:1161: undefined reference to `PyOS_InitInterrupts' ../libpython2.1.a(traceback.o): In function `tb_printinternal': /home/thomas/python/python/dist/src/Python/traceback.c:213: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(fileobject.o): In function `get_line': /home/thomas/python/python/dist/src/Objects/fileobject.c:883: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(longobject.o): In function `long_format': /home/thomas/python/python/dist/src/Objects/longobject.c:644: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(longobject.o): In function `x_divrem': /home/thomas/python/python/dist/src/Objects/longobject.c:855: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(longobject.o): In function `long_mul': /home/thomas/python/python/dist/src/Objects/longobject.c:1193: undefined reference to `PyErr_CheckSignals' ../libpython2.1.a(object.o):/home/thomas/python/python/dist/src/Objects/object.c:174: more undefined references to `PyErr_CheckSignals' follow ../libpython2.1.a(posixmodule.o): In function `posix_fork': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:1666: undefined reference to `PyOS_AfterFork' ../libpython2.1.a(posixmodule.o): In function `posix_forkpty': /home/thomas/python/python/dist/src/Modules/./posixmodule.c:1733: undefined reference to `PyOS_AfterFork' collect2: ld returned 1 exit status make[1]: *** [link] Error 1 make[1]: Leaving directory `/home/thomas/python/python/dist/src/Modules' make: *** [python] Error 2 -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Wed Jan 17 23:56:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 17 Jan 2001 23:56:58 +0100 Subject: [Python-Dev] Standard install locations for Python ? 
Message-ID: <3A66233A.A6AE07BD@lemburg.com> I'm currently busy building new version of my mx packages. While trying to convert all of them to distutils I found that there seems to be no standard for installing documentation or other data files of Python extensions. I also noted, that for Windows the standard extension installation defaults to \Python instead of some \Python\Site-Packages. So the general question is: Where should Python extensions install themselves and their docs ? (On Linux the typical place for docs is /usr/doc/packages, for Python code it is /usr/local/lib/pythonX.X/site-packages, BTW) Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Thu Jan 18 00:04:09 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 17 Jan 2001 18:04:09 -0500 Subject: [Python-Dev] Rich Comparisons technical prerelease In-Reply-To: <200101170422.XAA20626@cj20424-a.reston1.va.home.com>; from guido@python.org on Tue, Jan 16, 2001 at 11:22:54PM -0500 References: <200101170422.XAA20626@cj20424-a.reston1.va.home.com> Message-ID: <20010117180409.A17897@thyrsus.com> Guido van Rossum : > This makes it possible to define types with partial orderings. Guido's time machine is working again, and seems now to have been augmented by telepathy. I was just thinking about bugging him about this... I will definitely check this out with my set() class -- it was waiting on rich comparisons so I could do partial-orderings properly. If it works, we'll have set algebra for the standard library. Coolness. -- Eric S. Raymond Under democracy one party always devotes its chief energies to trying to prove that the other party is unfit to rule--and both commonly succeed, and are right... The United States has never developed an aristocracy really disinterested or an intelligentsia really intelligent. 
Its history is simply a record of vacillations between two gangs of frauds. --- H. L. Mencken From akuchlin at mems-exchange.org Thu Jan 18 00:09:47 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 17 Jan 2001 18:09:47 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010117234925.A17392@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 17, 2001 at 11:49:25PM +0100 References: <20010117234925.A17392@xs4all.nl> Message-ID: <20010117180947.E9384@kronos.cnri.reston.va.us> On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: >I have a couple of questions: what to do when setup.py doesn't work ? Is >there a way to make it bypass a module ? What about specifying include dirs There's a 'disabled_module_list' global in the code, but no way to set it from the command-line yet, since I couldn't figure out how to do that in time. >On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by >setup.py. Also, SSL support for the socket module was not enabled, though >OpenSSL is installed, in the default path. Can you take a look at the detection code in setup.py and see what's going wrong? I believe it should be found if OpenSSL is in /usr/local/, but /usr/contrib isn't checked currently. >The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian, >which I personally like a lot, though it's probably a bitch to autodetect. >(I tried, using autoconf ;-P) There's code to handle Debian, though I have no way of testing it, and it worked on Neil's Debian box for some reason. Search for debian_tcl_include in setup.py, and see if you can fix it. >distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory) Are you sure setup.py is up to date? Do a 'cvs update setup.py' to check. You might get a 'setup.py is in the way; remove it' message if you downloaded the first setup.py script manually. >without 'make clean' anymore.
You get a lot of undefined-symbol warnings >(see below.) If you run 'make clean;make test' it also doesn't work, because >the build directory is not in the Python library path, and regrtest.py >requires (at least) the time module. Again, be sure the tree is up to date; I think this stems from attempting to compile the signal module as shared, which doesn't work. I know that "make test" doesn't work, but am not sure how to fix it yet. --amk From tim.one at home.com Thu Jan 18 00:42:24 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 17 Jan 2001 18:42:24 -0500 Subject: [Python-Dev] Windows Python totally rad Message-ID: Windows Python runs normally again, modulo four test failures I figure are due to the "get rid of assert" patch. Note that the python20 DevStudio subproject is gone. It's been replaced by a new subproject named pythoncore. From thomas at xs4all.net Thu Jan 18 00:44:00 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 18 Jan 2001 00:44:00 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010117234925.A17392@xs4all.nl>; from thomas@xs4all.net on Wed, Jan 17, 2001 at 11:49:25PM +0100 References: <20010117234925.A17392@xs4all.nl> Message-ID: <20010118004400.B17392@xs4all.nl> On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: I got around to testing on FreeBSD now, and it actually went pretty smooth! However, some small points: > On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by > setup.py. Also, SSL support for the socket module was not enabled, though > OpenSSL is installed, in the default path. Curiously enough, FreeBSD, with OpenSSL installed in /usr/include/openssl, *did* get the socketmodule compiled with SSL support, but without the necessary -I directive, so the compile failed. > And most importantly(!), on all these machines, 'make test' stops > functioning. In fact, after setup.py started building, you can't run 'make' > without 'make clean' anymore. 
You get a lot of undefined-symbol warnings Strangely enough, this problem does not exist on FreeBSD. I can run 'make' or 'make test' after 'make' just fine. 'make test' still doesn't work because of the incorrect library path, but it doesn't barf like the other systems (BSDI and Debian Linux) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From esr at thyrsus.com Thu Jan 18 01:32:53 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 17 Jan 2001 19:32:53 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <20010118000806.D1C04A828@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 18, 2001 at 02:08:06AM +0200 References: <14949.46995.259157.871323@beluga.mojam.com> <20010118000806.D1C04A828@darjeeling.zadka.site.co.il> Message-ID: <20010117193253.A18565@thyrsus.com> Moshe Zadka : > I think that you're confused between two meanings of inverses. > > You think: > op is an inverse of op' if for every a,b (a op b) = not (a op' b) > > Guido meant (and I hope, implemented): > op is an inverse of op' if for every a,b (a op b) = (b op' a) I thought the same. if (a op1 b) <=> (b op2 a), op2 is properly described as the "reflection" of op1, and vice-versa. -- Eric S. Raymond Sometimes the law defends plunder and participates in it. Sometimes the law places the whole apparatus of judges, police, prisons and gendarmes at the service of the plunderers, and treats the victim -- when he defends himself -- as a criminal. -- Frederic Bastiat, "The Law" From greg at cosc.canterbury.ac.nz Thu Jan 18 01:22:11 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 18 Jan 2001 13:22:11 +1300 (NZDT) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: <200101180022.NAA00898@s454.cosc.canterbury.ac.nz> Michael Hudson : > a < b if and only if b > a. > This is what the rich comparison code does. 
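A minimal sketch of the "reflection" rule quoted above, assuming present-day Python 3 rather than the 2.1 prerelease under discussion. The class name Version is made up for illustration. It shows that b > a is answered by the reflected a < b, and that reflection is not the same thing as negation: defining only __lt__ does not give you >= for free.

```python
class Version:
    """Hypothetical class that implements only '<'."""

    def __init__(self, n):
        self.n = n

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.n < other.n


a, b = Version(1), Version(2)

lt = a < b  # calls a.__lt__(b) directly
gt = b > a  # Version defines no __gt__ of its own, so the reflected a.__lt__(b) is used

# Reflection is not negation: 'a >= b' is NOT derived from 'not (a < b)'.
try:
    a >= b
    ge_supported = True
except TypeError:
    ge_supported = False
```

Swapping the operands is all the interpreter does here; it never negates a result on your behalf.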
Someone is bound to come up with a use for comparison operator overloading in which this isn't true, just to be difficult! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at python.org Thu Jan 18 04:40:31 2001 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Jan 2001 22:40:31 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.106,2.107 In-Reply-To: Your message of "Wed, 17 Jan 2001 13:45:44 PST." <20010117134544.H7731@lyra.org> References: <20010117134544.H7731@lyra.org> Message-ID: <200101180340.WAA00655@cj20424-a.reston1.va.home.com> > > - Change the in-progress code to use static variables instead of > > globals (both the nesting level and the key for the thread dict were > > globals but have no reason to be globals; the key can even be a > > function-static variable in get_inprogress_dict()). > > The "compare_nesting" variable is a bit troublesome long-term -- it will > cause threading issues in a free-threaded implementation. The solution is to > put the value into the thread-state. > > [ not sure if it matters right now, but just bringing it up ] Good point -- especially since the in-progress-dict is already part of the thread state. Jeremy explained to me that the compare_nesting variable is mostly an optimization (avoiding the work with the in-progress-dict when we don't know for sure that it's worth it) but yes, mixing nesting levels (even if the dicts are separate) could cause coupling or interference between threads... 
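A rough Python-level sketch of the "keep the nesting level in the thread state" idea, assuming present-day Python 3 and using threading.local as a stand-in for the C-level thread state. The names _state, nesting_level and compare_guard are hypothetical and are not taken from object.c.

```python
import threading

_state = threading.local()  # hypothetical stand-in for per-thread interpreter state


def nesting_level():
    """Return the current thread's nesting level (0 if never set)."""
    return getattr(_state, "compare_nesting", 0)


class compare_guard:
    """Bump the calling thread's nesting level for the duration of a block."""

    def __enter__(self):
        _state.compare_nesting = nesting_level() + 1
        return _state.compare_nesting

    def __exit__(self, *exc):
        _state.compare_nesting -= 1
        return False


results = {}


def worker(name, depth):
    def recurse(d):
        with compare_guard() as level:
            if d > 1:
                return recurse(d - 1)
            return level  # deepest level seen by this thread

    results[name] = recurse(depth)


threads = [threading.Thread(target=worker, args=("t%d" % d, d)) for d in (1, 3, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread only ever sees its own counter, so a deep comparison in one thread cannot push another thread over a nesting threshold.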
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Thu Jan 18 05:20:30 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 17 Jan 2001 22:20:30 -0600 (CST) Subject: [Python-Dev] urllib.urlencode & repeated values Message-ID: <14950.28430.572215.10643@beluga.mojam.com> I'm pretty sure this has come up before, but urllib.urlencode doesn't handle repeated parameters properly. If I call urllib.urlencode({"performers": ("U2","Lawrence Martin")}) instead of getting performers=U2&performers=Lawrence+Martin I get a quoted stringified tuple: performers=%28%27U2%27%2c+%27Lawrence+Martin%27%29 Obviously, fixing this will change the function's current semantics, but I think it's worth treating lists and tuples (actually, any sequence) as repeated values. If the existing semantics are deemed valuable enough, a third default parameter could be added to switch on the new behavior when desired. If others agree I'd be happy to whip up a patch. I think it's a bug. Skip From jeremy at alum.mit.edu Thu Jan 18 03:58:19 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 17 Jan 2001 21:58:19 -0500 (EST) Subject: [Python-Dev] bug in grammar Message-ID: <14950.23499.275398.963621@localhost.localdomain> As part of the implementation of PEP 227 (and in an attempt to reach some low-hanging fruit Guido mentioned on the types-sig long ago), I have been working on a compiler pass that generates a module-level symbol table. I recently discovered a bug in the handling of list comprehensions that was giving me headaches. I realize now that the problem is with the current grammar and/or compiler. Here's a simple demonstration; try it in your friendly python 2.0 interpreter. >>> [i for i in range(10)] = (1, 2, 3) Traceback (most recent call last): File "", line 1, in ? 
ValueError: unpack list of wrong size The generated bytecode is:
  0 SET_LINENO 0
  3 SET_LINENO 1
  6 LOAD_CONST 0 (1)
  9 LOAD_CONST 1 (2)
 12 LOAD_CONST 2 (3)
 15 BUILD_TUPLE 3
 18 UNPACK_SEQUENCE 1
 21 STORE_NAME 0 (i)
 24 LOAD_CONST 3 (None)
 27 RETURN_VALUE
I assume this isn't intended :-). The compiler is ignoring everything after the initial atom in the list comprehension. It's basically compiling the code as if it were: [i] = (1, 2, 3) I'm not sure how to try and fix this. Should the grammar allow one to construct the example statement above? If not, I'm not sure how to fix the grammar. If not, I suppose the compiler should detect that the list comp is misplaced. This seems fairly messy, since there are about 10 nodes between the expr_stmt and the list_for. Or is this a cool way to use list comprehensions to generate ValueErrors? Jeremy From akuchlin at mems-exchange.org Thu Jan 18 06:19:31 2001 From: akuchlin at mems-exchange.org (A.M. Kuchling) Date: Thu, 18 Jan 2001 00:19:31 -0500 Subject: [Python-Dev] Embedded language discussion Message-ID: <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com> http://www.kuro5hin.org/?op=displaystory;sid=2001/1/16/11334/2280 The poster is on a project that's trying to use Python, but they're encountering unspecified problems (perhaps because of the global interpreter lock). --amk From mal at lemburg.com Thu Jan 18 10:32:54 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jan 2001 10:32:54 +0100 Subject: [Python-Dev] Windows Python totally rad References: Message-ID: <3A66B846.3D24B959@lemburg.com> Tim Peters wrote: > > Windows Python runs normally again, modulo four test failures I figure are > due to the "get rid of assert" patch. Could you tell me which these are ? The tests tested all passed just fine, so I guess these must be Windows-related problems.
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at effbot.org Thu Jan 18 07:48:41 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 18 Jan 2001 07:48:41 +0100 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? References: <012901c080a5$306023a0$e46940d5@hagrid> Message-ID: <008701c0811a$b3371c00$e46940d5@hagrid> I wrote: > I've almost sorted it all out. will check it in later tonight (local > time). python build problems and real life got in the way. will 2.1a1 be released according to plan? will there be a 2.1a2 release? maybe I should postpone this? From esr at thyrsus.com Thu Jan 18 08:23:21 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 18 Jan 2001 02:23:21 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? Message-ID: <20010118022321.A9021@thyrsus.com> So I'm writing a module that needs to generate unique cookies. The module will run inside one of two environments: (1) a trivial test wrapper, not threaded, and (2) a long-running multithreaded server. Because Python garbage-collects, hash() of a just-created object isn't good enough. Because we may be threading, millisecond time isn't good enough. Because we may *not* be threading, thread ID isn't good either. On the other hand, I'm on Linux getting millisecond time resolution. And it's not hard to notice that an object hash is a memory address. So, how about `time.time()` + hex(hash([]))? It looks to me like this will remain unique forever, because another thread would have to create an object at the same memory address during the same millisecond to collide. Furthermore, it looks to me like this hack might be portable to any OS with a clock tick shorter than its timeslice. Comments? -- Eric S. Raymond Good intentions will always be pleaded for every assumption of authority.
It is hardly too strong to say that the Constitution was made to guard the people against the dangers of good intentions. There are men in all ages who mean to govern well, but they mean to govern. They promise to be good masters, but they mean to be masters. -- Daniel Webster From ping at lfw.org Thu Jan 18 10:29:13 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 18 Jan 2001 01:29:13 -0800 (PST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: <200101161402.JAA05045@cj20424-a.reston1.va.home.com> Message-ID: On Tue, 16 Jan 2001, Guido van Rossum wrote: > You mean the tp_print and tp_str function slots in type objects, > right? tp_print *should* always render exactly the same as tp_str. > tp_print is used by the print statement, not by value display at the > interactive prompt. Uh, i hate to disagree with you about your own interpreter, but: com_expr_stmt in Python/compile.c inserts a PRINT_EXPR opcode if c_interactive is true; eval_code2 in Python/ceval.c handles PRINT_EXPR by calling displayhook; sys_displayhook in Python/sysmodule.c prints the object by calling PyFile_WriteObject on sys.stdout; PyFile_WriteObject in Objects/fileobject.c calls PyObject_Print if the file is really a PyFileObject; PyObject_Print in Objects/object.c calls op->ob_type->tp_print if it's not NULL. The print statement produces a PRINT_ITEM opcode, which invokes PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW flag is propagated down to PyObject_Print and into string_print, where it causes the string to fwrite itself directly without quoting. > So, string_print most definitely should *not* be changed -- only > string_repr! I had to change them both before i actually saw the change in the interactive interpreter. Actually, your statement above (that the two should always render the same) seems to imply that if i change one, i must also change the other. 
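A small sketch of the distinction being argued over, assuming present-day Python 3: the value display at the interactive prompt goes through the display hook and shows the repr, while print writes the string itself. This says nothing about the C call chain in the 2.1 tree described above; the helper name captured is made up for illustration.

```python
import contextlib
import io
import sys


def captured(func, *args):
    """Run func(*args) and return whatever it wrote to sys.stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        func(*args)
    return buf.getvalue()


# sys.__displayhook__ is the default hook the interactive prompt uses to
# display the value of an expression statement.
shown_at_prompt = captured(sys.__displayhook__, "a\n")

# print() writes the string's own characters, with no quoting or escaping.
shown_by_print = captured(print, "a\n")
```

Here shown_at_prompt holds the quoted, escaped form and shown_by_print holds the raw characters followed by print's own newline.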
-- ?!ng From sjoerd at oratrix.nl Thu Jan 18 11:11:09 2001 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Thu, 18 Jan 2001 11:11:09 +0100 Subject: [Python-Dev] distutils in Python 2.1 not ready for prime time Message-ID: <20010118101110.6D29C31E1B8@bireme.oratrix.nl> I just updated my copy of python with the current CVS version and I am not happy. The current version uses distutils for configuring and compiling most modules that are written in C. That is a nice idea in theory, but in practice it's not ready for prime time yet. The major advantage of using a Setup file is that you can add your own -I and -L compiler flags on a module-by-module basis. I *need* those flags since not all libraries and include files are in standard places (e.g. I need -I/usr/local/include and -L/usr/local/lib for some modules which my compiler doesn't provide by itself). There seems to be no way to tell distutils to supply those flags. The documentation (only on the web site, also not great, but I assume more documentation (at least an up-to-date README) will be provided in the final release) says that that has not yet been implemented. -- Sjoerd Mullender From ping at lfw.org Thu Jan 18 11:14:19 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 18 Jan 2001 02:14:19 -0800 (PST) Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: <3A66BCCC.14997FE3@lemburg.com> Message-ID: I hope you don't mind that i'm taking this over to python-dev, because it led me to discover a more general issue (see below). For the others on python-dev, here's the background: MAL was about to check in the unistr() function, described as follows: > This patch adds a utility function unistr() which works just like > the standard builtin str() -- only that the return value will > always be a Unicode object. > > The patch also adds a new object level C API PyObject_Unicode() > which complements PyObject_Str(). 
I responded: > Why are unistr() and unicode() two separate functions? > > str() performs one task: convert to string. It can convert anything, > including strings or Unicode strings, numbers, instances, etc. > > The other type-named functions e.g. int(), long(), float(), list(), > tuple() are similar in intent. > > Why have unicode() just for converting strings to Unicode strings, > and unistr() for converting everything else to a Unicode string? > What does unistr(x) do differently from unicode(x) if x is a string? MAL responded: > unistr() is meant to complement str() very closely. unicode() > works as constructor for Unicode objects which can also take > care of decoding encoded data. str() and unistr() don't provide > this capability but instead always assume the default encoding. > > There's also a subtle difference in that str() and unistr() > try the tp_str slot which unicode() doesn't. unicode() > supports any character buffer which str() and unistr() don't. Okay, given this explanation, i still feel fairly confident that unicode() should subsume unistr(). Many of the other type-named functions try various slots: int() looks for __int__ float() looks for __float__ long() looks for __long__ str() looks for __str__ In testing this i also discovered the following: >>> class Foo: ... def __int__(self): ... return 3 ... >>> f = Foo() >>> int(f) 3 >>> long(f) Traceback (most recent call last): File "", line 1, in ? AttributeError: Foo instance has no attribute '__long__' >>> float(f) Traceback (most recent call last): File "", line 1, in ? AttributeError: Foo instance has no attribute '__float__' This is kind of surprising. How about: int() looks for __int__ float() looks for __float__, then tries __int__ long() looks for __long__, then tries __int__ str() looks for __str__ unicode() looks for __unicode__, then tries __str__ The extra parameter to unicode() is very similar to the extra parameter to int(), so i think there is a natural parallel here. Hmm... 
what about the other types? Wow!! __complex__ can produce a segfault! >>> complex >>> class Foo: ... def __complex__(self): return 3 ... >>> Foo() <__main__.Foo instance at 0x81e8684> >>> f = _ >>> complex(f) Segmentation fault (core dumped) This happens because builtin_complex first retrieves and saves the PyNumberMethods of the argument (in this case, from the instance), then tries to call __complex__ (in this case, returning 3), and THEN coerces the result using nbr->nb_float if the result is not complex! (This calls the instance's nb_float method on the integer object 3!!) I think __complex__ should probably look for __complex__, then __float__, then __int__. One could argue for __list__, __tuple__, or __dict__, but that seems much weaker; the Pythonic way has always been to implement __getitem__ instead. There is no built-in dict(); if it existed i suppose it would do the opposite of x.items(); again a weak argument, though i might have found such a function useful once or twice. And that about covers the built-in types for data. -- ?!ng From ping at lfw.org Thu Jan 18 11:16:42 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 18 Jan 2001 02:16:42 -0800 (PST) Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Message-ID: On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > str() looks for __str__ Oops. I forgot that str() looks for __str__, then tries __repr__ So, presumably, unicode() should look for __unicode__, then __str__, then __repr__ -- ?!ng From mal at lemburg.com Thu Jan 18 11:51:46 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jan 2001 11:51:46 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: Message-ID: <3A66CAC2.74FC894@lemburg.com> Ka-Ping Yee wrote: > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > str() looks for __str__ > > Oops. 
I forgot that > > str() looks for __str__, then tries __repr__ > > So, presumably, > > unicode() should look for __unicode__, then __str__, then __repr__ Not quite... str() does this:
1. strings are passed back as-is
2. the type slot tp_str is tried
3. the method __str__ is tried
4. Unicode returns are converted to strings
5. anything other than a string return value is rejected
unistr() does the same, but makes sure that the return value is a Unicode object. unicode() does the following:
1. for instances, __str__ is called
2. Unicode objects are returned as-is
3. string objects or character buffers are used as basis for decoding
4. decoding is applied to the character buffer and the results are returned
I think we should perhaps merge the two approaches into one which then applies all of the above in unicode() (and then forget about unistr()). This might hide some type errors, but since all other generic constructors behave more or less in the same way, I think unicode() should too. Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin at mira.cs.tu-berlin.de Thu Jan 18 11:48:30 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 18 Jan 2001 11:48:30 +0100 Subject: [Python-Dev] Having extensions builtin Message-ID: <200101181048.f0IAmU210251@mira.informatik.hu-berlin.de> With the new distutils configuration scheme, it appears to be difficult to build modules in a non-shared way. Building modules non-shared is desirable when freezing is attempted, and also to reduce the startup time and memory consumption. It is still possible to add modules to Setup or Setup.local, so that they will be built into the interpreter. However, setup.py will still build them in a shared way afterwards. I propose that setup.py builds only those modules that are not builtin.
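One way the proposal could be sketched, assuming present-day Python 3: filter the candidate extension list against sys.builtin_module_names, which lists the modules compiled into the running interpreter. This is only an illustration of the idea, not setup.py's actual logic; modules_to_build and the name _hypothetical_ext are invented for the example.

```python
import sys


def modules_to_build(candidates, builtin=sys.builtin_module_names):
    """Return the candidates that are not already compiled into the interpreter."""
    builtin = set(builtin)
    return [name for name in candidates if name not in builtin]


# 'sys' is always builtin; '_hypothetical_ext' is a made-up extension name.
wanted = ["sys", "_hypothetical_ext"]
to_build = modules_to_build(wanted)
```

Anything that was added to Setup or Setup.local and linked statically would show up in the builtin list and be skipped.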
Regards, Martin From martin at mira.cs.tu-berlin.de Thu Jan 18 13:20:06 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 18 Jan 2001 13:20:06 +0100 Subject: [Python-Dev] Standard install locations for Python ? Message-ID: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> > Where should Python extensions install themselves and their docs? I feel that extensions should not need to care. For extensions, distutils will pick a location, and the system administrator configuring the package can choose a different location. Unfortunately, distutils does not support the installation of documentation, which I think it should. Now switching sides, as an administrator, I'd wish distutils to follow the system conventions by default. That means on Linux, documentation should go into the system's directory, which is /usr/share/doc according to latest standards. Distributions vary, so distutils should find out - e.g. by querying the location from rpm. In addition, when building RPMs, distutils should declare these files as %doc in the spec file, so RPM will install it following the system conventions. On Windows, the convention apparently is to put the documentation "nearby" the software, so it should probably go into Doc or a subdirectory thereof. On Unix, there appears to be no standard location, unless the documentation consists of man pages or perhaps info files. So /share/doc is probably a place as good as any other. Regards, Martin From martin at mira.cs.tu-berlin.de Thu Jan 18 11:39:30 2001 From: martin at mira.cs.tu-berlin.de (Martin v.
Loewis) Date: Thu, 18 Jan 2001 11:39:30 +0100 Subject: [Python-Dev] SSL detection problem Message-ID: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> The distutils-based configuration fails to build on my system (SuSE 7.0) with the error /usr/src/python/Modules/socketmodule.c:159: rsa.h: Datei oder Verzeichnis nicht gefunden /usr/src/python/Modules/socketmodule.c:160: crypto.h: Datei oder Verzeichnis nicht gefunden /usr/src/python/Modules/socketmodule.c:161: x509.h: Datei oder Verzeichnis nicht gefunden /usr/src/python/Modules/socketmodule.c:162: pem.h: Datei oder Verzeichnis nicht gefunden /usr/src/python/Modules/socketmodule.c:163: ssl.h: Datei oder Verzeichnis nicht gefunden The problem is that these header files are in /usr/include/openssl, which is not in the standard include search path. So the obvious request is: could this be fixed? I guess when setup.py finds the openssl library, it should also try to find ssl.h, in some obvious locations. The not-so-obvious question: How can one work-around such a problem with the new setup scheme? In the old scheme, I could have chosen to either provide the right -I option in Modules/Setup, to disable SSL support, or to disable the _socket module altogether. How can I achieve either configuration with the new scheme? Regards, Martin P.S. As a quick hack, I added a custom include_dirs parameter to the SSL extension. From martin at mira.cs.tu-berlin.de Thu Jan 18 13:39:54 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 18 Jan 2001 13:39:54 +0100 Subject: [Python-Dev] bug in grammar Message-ID: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de> > Should the grammar allow one to construct the example statement > above? It should not. Please note that the grammar allows a number of other things, e.g. a+b = c (pass this to parser.suite to see details) > If not, I'm not sure how to fix the grammar. 
The central problem is that it allows testlist on the LHS of an augassign or '=', whereas the language only allows a small subset in that position. It is not possible to restrict the grammar in itself, as that will necessarily produce a conflict - you only know that the '+' was incorrect when you see the '='. > I suppose the compiler should detect that the list comp is misplaced I think there should be a well-formedness pass in-between. I.e. after the AST has been built, a single pass should descend through the tree, looking for an expr_statement with more than a single testlist. Once it finds one, it should confirm that this really is a well-formed lvalue (in C speak). In this case, the test should be that each term is an atom without factors. If the parser itself performs such checks, the compiler could be simplified in many places, I guess. Regards, Martin From thomas at xs4all.net Thu Jan 18 10:53:14 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 18 Jan 2001 10:53:14 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010117180947.E9384@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Wed, Jan 17, 2001 at 06:09:47PM -0500 References: <20010117234925.A17392@xs4all.nl> <20010117180947.E9384@kronos.cnri.reston.va.us> Message-ID: <20010118105314.D17392@xs4all.nl> On Wed, Jan 17, 2001 at 06:09:47PM -0500, Andrew Kuchling wrote: > >On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by > >setup.py. Also, SSL support for the socket module was not enabled, though > >OpenSSL is installed, in the default path. > > Can you take a look at the detection code in setup.py and see what's > going wrong. I believe it should be found if OpenSSL is in > /usr/local/, but /usr/contrib isn't checked currently. Well, OpenSSL rests in the default location, which is /usr/local/ssl/include/openssl. Haven't the time to look into it right now, sorry.
> >The Tcl/Tk header files are stored in /usr/include/tcl/ on Debian, > >which I personally like a lot, though it's probably a bitch to autodetect. > >(I tried, using autoconf ;-P) > There's code to handle Debian, though I have no way of testing it, and > it worked on Neil's Debian box for some reason. Search for > debian_tcl_include in setup.py, and see if you can fix it. Ah, yes. The problem in my case is that the *library* files are just in /usr/lib, but the include files are not. I re-indented the code to pull the debian-specific code out of the 'if prefix + os.sep + 'lib' not in lib_dirs' block, and it works now. Haven't tested it on other code yet, but I think it should work regardless. > >distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /usr/lib/python2.1/config/Makefile (No such file or directory) > Are you sure setup.py is up to date; do a 'cvs update setup.py' to check. > You might get a "setup.py is in the way; remove it' message if you > downloaded the first setup.py script manually. D'oh, I guess not. I thought I did (I did on all other platforms :) but I guess I didn't, 'cause it works now. Thanx. > >without 'make clean' anymore. You get a lot of undefined-symbol warnings > >(see below.) If you run 'make clean;make test' it also doesn't work, because > >the build directory is not in the Python library path, and regrtest.py > >requires (at least) the time module. > Again, be sure the tree is up to date; I think this stems from > attempting to compile the signal module as shared, which doesn't work. This happened even with completely fresh, newly checked out trees, on all but FreeBSD (three different trees: Debian woody, BSDI 4.0 and BSDI 4.1) so I'm pretty sure that's not it. 
It works now, though, so I guess the move from a dynamic signalmodule to a static one does the trick ;) I got 'make test' working by applying the following patch to Makefile{,.in}, and running 'make PYTHONPATH=.: test' (determining builddir by hand, for now.): *************** *** 216,223 **** TESTPYTHON= ./python$(EXE) -tt test: all -rm -f $(srcdir)/Lib/test/*.py[co] ! -PYTHONPATH= $(TESTPYTHON) $(TESTPROG) $(TESTOPTS) ! PYTHONPATH= $(TESTPYTHON) $(TESTPROG) $(TESTOPTS) # Install everything install: altinstall bininstall maninstall --- 216,223 ---- TESTPYTHON= ./python$(EXE) -tt test: all -rm -f $(srcdir)/Lib/test/*.py[co] ! -PYTHONPATH=$(PYTHONPATH) $(TESTPYTHON) $(TESTPROG) $(TESTOPTS) ! PYTHONPATH=$(PYTHONPATH) $(TESTPYTHON) $(TESTPROG) $(TESTOPTS) # Install everything install: altinstall bininstall maninstall And because of that, I also noticed something funny: BSDI calls itself 'BSD/OS ', so distutils actually makes a directory called 'lib.bsd' and 'temp.bsd', with inside those a directory 'os--i386-2.1'. Is that a distutils bug, a setup.py bug, or intentional behaviour of one of the two ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From nas at arctrix.com Thu Jan 18 08:59:22 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 17 Jan 2001 23:59:22 -0800 Subject: [Python-Dev] new Makefile.in Message-ID: <20010117235922.A12356@glacier.fnational.com> Spurred on by comments made by Andrew, I spent some time last night overhauling the Python Makefiles. I now have a toplevel non-recursive Makefile.in that seems to work fairly well. I'm pretty sure it still should be portable. It doesn't use includes or any special GNU make features. It is half the size of the old Makefiles. The build is faster and its now easier to follow if something goes wrong. A question: is it possible to break the Python static library up? For example, instead of having libpython.a have Parser/parser.a, Objects/objects.a, etc? 
There would still only be one shared library. This would speed up
incremental builds and also help Andrew with PEP 229. I'm thinking that
the Makefile would do something like this:

    all: python$(EXE)

    PYLIBS= Parser/parser.a Objects/objects.a ... Modules/modules.a

    python$(EXE): $(PYLIBS)
            $(LINKCC) -o python$(EXE) $(PYLIBS) ...

    Modules/modules.a: minpython$(EXE)
            ./minpython$(EXE) setup.py

AFAICT, the only thing affected by splitting up the static library is
Misc/Makefile.pre.in. Is this correct?

  Neil

From guido at digicool.com Thu Jan 18 15:52:23 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 18 Jan 2001 09:52:23 -0500
Subject: [Python-Dev] Rich comparison confusion
In-Reply-To: Your message of "Thu, 18 Jan 2001 13:22:11 +1300." <200101180022.NAA00898@s454.cosc.canterbury.ac.nz>
References: <200101180022.NAA00898@s454.cosc.canterbury.ac.nz>
Message-ID: <200101181452.JAA06899@cj20424-a.reston1.va.home.com>

> > a < b if and only if b > a.
> > This is what the rich comparison code does.
>
> Someone is bound to come up with a use for comparison
> operator overloading in which this isn't true, just
> to be difficult!

They'll get what they deserve -- this will be clearly documented!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu Thu Jan 18 16:15:25 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 18 Jan 2001 10:15:25 -0500 (EST)
Subject: [Python-Dev] Re: bug in grammar
In-Reply-To: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de>
References: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de>
Message-ID: <14951.2189.14393.52725@localhost.localdomain>

If I summarize your suggestion, I think you've said that ideally the
grammar should not allow assignment to list comprehensions (or a
variety of other constructs) -- but it doesn't so the compiler has to
deal with it.

This morning it seemed a lot easier to fix the bug than it did last night :-).
com_assign() already has a number of checks for syntax errors in assignments. A test for list comprehensions belongs at the same place as tests for assignment to [] and augmented assignments applied to lists. I'll include a fix for assignment to list comprehensions in my big compiler patch. Jeremy From akuchlin at mems-exchange.org Thu Jan 18 16:28:19 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 10:28:19 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <20010118022321.A9021@thyrsus.com>; from esr@thyrsus.com on Thu, Jan 18, 2001 at 02:23:21AM -0500 References: <20010118022321.A9021@thyrsus.com> Message-ID: <20010118102819.A21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 02:23:21AM -0500, Eric S. Raymond wrote: >And it's not hard to notice that an object hash is a memory address. Unless the object defines __hash__()! If you want the memory address, use id() instead. --amk From akuchlin at mems-exchange.org Thu Jan 18 16:30:36 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 10:30:36 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010118004400.B17392@xs4all.nl>; from thomas@xs4all.net on Thu, Jan 18, 2001 at 12:44:00AM +0100 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> Message-ID: <20010118103036.B21503@kronos.cnri.reston.va.us> >On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: >> On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by >> setup.py. Also, SSL support for the socket module was not enabled, though >> OpenSSL is installed, in the default path. What does the layout of /usr/contrib look like? Is it /usr/contrib/openssl/include/, /usr/contrib/include/, or something else? >Strangely enough, this problem does not exist on FreeBSD. I can run 'make' >or 'make test' after 'make' just fine. 
'make test' still doesn't work >because of the incorrect library path, but it doesn't barf like the other >systems (BSDI and Debian Linux) Have you already run "make install"? Perhaps it's picking up the already-installed modules when running "make test", because it really shouldn't be working. --amk From gward at cnri.reston.va.us Thu Jan 18 16:42:51 2001 From: gward at cnri.reston.va.us (Greg Ward) Date: Thu, 18 Jan 2001 10:42:51 -0500 Subject: [Python-Dev] Where's Greg Ward ? In-Reply-To: <3A6237D7.673BBB30@lemburg.com>; from mal@lemburg.com on Mon, Jan 15, 2001 at 12:35:51AM +0100 References: <3A6237D7.673BBB30@lemburg.com> Message-ID: <20010118104250.A27049@thrak.cnri.reston.va.us> On 15 January 2001, M.-A. Lemburg said: > He seems to be offline and the people on the distutils list have some > patches and other things which would be nice to have in distutils > for 2.1. Tim was right -- I'm *really* close to being back online. Just have to figure out why qmail's not answering port 25 and why LILO doesn't like my newly repartitioned hard drive, and all will be well. Oh yeah, and getting insurance, and a credit card, and unpacking all these cardboard boxes, and getting some furniture, ... (If anyone is considering it, I do *not* recommend buying a new computer, moving internationally, and getting a high speed home Internet connection all at the same time.) BTW I quite approve of Andrew being temporary Distutils dictator. Should have done it in December, but I didn't think I'd be out of commission for so long. Sigh. 
Greg From moshez at zadka.site.co.il Fri Jan 19 01:19:45 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 19 Jan 2001 02:19:45 +0200 (IST) Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: References: Message-ID: <20010119001945.80DC8A83E@darjeeling.zadka.site.co.il> On Thu, 18 Jan 2001 01:29:13 -0800 (PST), Ka-Ping Yee wrote: > On Tue, 16 Jan 2001, Guido van Rossum wrote: > > You mean the tp_print and tp_str function slots in type objects, > > right? tp_print *should* always render exactly the same as tp_str. > > tp_print is used by the print statement, not by value display at the > > interactive prompt. > > Uh, i hate to disagree with you about your own interpreter, but: > > com_expr_stmt in Python/compile.c > inserts a PRINT_EXPR opcode if c_interactive is true; > eval_code2 in Python/ceval.c > handles PRINT_EXPR by calling displayhook; > sys_displayhook in Python/sysmodule.c > prints the object by calling PyFile_WriteObject on sys.stdout; > PyFile_WriteObject in Objects/fileobject.c > calls PyObject_Print if the file is really a PyFileObject; > PyObject_Print in Objects/object.c > calls op->ob_type->tp_print if it's not NULL. > > The print statement produces a PRINT_ITEM opcode, which invokes > PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW > flag is propagated down to PyObject_Print and into string_print, > where it causes the string to fwrite itself directly without quoting. > > > So, string_print most definitely should *not* be changed -- only > > string_repr! > > I had to change them both before i actually saw the change in the > interactive interpreter. Actually, your statement above (that the > two should always render the same) seems to imply that if i change > one, i must also change the other. > > > -- ?!ng > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > > -- Moshe Zadka This is a signature anti-virus. 
Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From guido at digicool.com Thu Jan 18 17:23:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:23:19 -0500 Subject: [Python-Dev] unistr() vs. unicode() Message-ID: <200101181623.LAA07389@cj20424-a.reston1.va.home.com> Ping wrote in response to a SourceForge mail about MAL's unistr() checking: ------- Forwarded Message Date: Wed, 17 Jan 2001 23:51:48 -0800 From: Ka-Ping Yee To: noreply at sourceforge.net cc: mal at lemburg.com, guido at python.org, patches at python.org Subject: Re: [Patches] [Patch #101664] Add new unistr() builtin + PyObject_Unic ode() C API On Wed, 17 Jan 2001 noreply at sourceforge.net wrote: > Comment: > This patch adds a utility function unistr() which works just like > the standard builtin str() -- only that the return value will > always be a Unicode object. Sorry for barging in, but i have an issue/question: Why are unistr() and unicode() two separate functions? str() performs one task: convert to string. It can convert anything, including strings or Unicode strings, numbers, instances, etc. The other type-named functions e.g. int(), long(), float(), list(), tuple() are similar in intent. Why have unicode() just for converting strings to Unicode strings, and unistr() for converting everything else to a Unicode string? What does unistr(x) do differently from unicode(x) if x is a string? - -- ?!ng ------- End of Forwarded Message (And no, Tim, this did *not* end up in the patches list because I made Barry remove the reply-to. SourceForge mails never had reply-to to begin with.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jan 18 17:28:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 11:28:12 -0500 Subject: [Python-Dev] urllib.urlencode & repeated values In-Reply-To: Your message of "Wed, 17 Jan 2001 22:20:30 CST." 
<14950.28430.572215.10643@beluga.mojam.com> References: <14950.28430.572215.10643@beluga.mojam.com> Message-ID: <200101181628.LAA07406@cj20424-a.reston1.va.home.com> > I'm pretty sure this has come up before, but urllib.urlencode doesn't handle > repeated parameters properly. If I call > > urllib.urlencode({"performers": ("U2","Lawrence Martin")}) > > instead of getting > > performers=U2&performers=Lawrence+Martin > > I get a quoted stringified tuple: > > performers=%28%27U2%27%2c+%27Lawrence+Martin%27%29 > > Obviously, fixing this will change the function's current semantics, but I > think it's worth treating lists and tuples (actually, any sequence) as > repeated values. If the existing semantics are deemed valuable enough, a > third default parameter could be added to switch on the new behavior when > desired. > > If others agree I'd be happy to whip up a patch. I think it's a bug. Agreed. If you can come up with something that supports all sequence types, and treats singleton sequences the same as their one and only item, it would even be the inverse of cgi.parse_qs()! --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Thu Jan 18 17:43:49 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 18 Jan 2001 17:43:49 +0100 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: ; from ping@lfw.org on Thu, Jan 18, 2001 at 02:14:19AM -0800 References: <3A66BCCC.14997FE3@lemburg.com> Message-ID: <20010118174349.E17392@xs4all.nl> On Thu, Jan 18, 2001 at 02:14:19AM -0800, Ka-Ping Yee wrote: > Wow!! __complex__ can produce a segfault! > >>> complex > > >>> class Foo: > ... def __complex__(self): return 3 > ... 
> >>> Foo() > <__main__.Foo instance at 0x81e8684> > >>> f = _ > >>> complex(f) > Segmentation fault (core dumped) > This happens because builtin_complex first retrieves and saves > the PyNumberMethods of the argument (in this case, from the > instance), then tries to call __complex__ (in this case, returning 3), > and THEN coerces the result using nbr->nb_float if the result is > not complex! (This calls the instance's nb_float method on the > integer object 3!!) I've noticed that lurking bug in the coercion code when I added augmented assignment, though I don't recall whether I fixed it then, nor do I know if that part's been "touched" by the recent coercion changes. If none of the coercion champions speak up, I'll look at this sometime this weekend. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From akuchlin at mems-exchange.org Thu Jan 18 17:50:28 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 11:50:28 -0500 Subject: [Python-Dev] SSL detection problem In-Reply-To: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 11:39:30AM +0100 References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> Message-ID: <20010118115028.D21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 11:39:30AM +0100, Martin v. Loewis wrote: >The problem is that these header files are in /usr/include/openssl, >which is not in the standard include search path. I have an improved version of setup.py (not checked in yet) that tries to do better, checking for both header and library files. One point: the OpenSSL docs imply that the headers should be loaded as , not as ; the header files themselves use the openssl/*.h form, which means you'd need two -I directives.. I'll patch the socket module accordingly. >The not-so-obvious question: How can one work-around such a problem >with the new setup scheme? 
In the old scheme, I could have chosen to
>either provide the right -I option in Modules/Setup, to disable SSL
>support, or to disable the _socket module altogether. How can I
>achieve either configuration with the new scheme?

I still need to implement command-line options to specify such
overrides, but that couldn't possibly get done in time for alpha1. I
was thinking of something like ---libs="foo bar",
---includes="/usr/include/blah/", and so forth. Suggestions for a
better interface welcomed...

--amk

From guido at digicool.com Thu Jan 18 17:55:39 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 18 Jan 2001 11:55:39 -0500
Subject: [Python-Dev] bug in grammar
In-Reply-To: Your message of "Wed, 17 Jan 2001 21:58:19 EST." <14950.23499.275398.963621@localhost.localdomain>
References: <14950.23499.275398.963621@localhost.localdomain>
Message-ID: <200101181655.LAA08001@cj20424-a.reston1.va.home.com>

> As part of the implementation of PEP 227 (and in an attempt to reach
> some low-hanging fruit Guido mentioned on the types-sig long ago), I
> have been working on a compiler pass that generates a module-level
> symbol table. I recently discovered a bug in the handling of list
> comprehensions that was giving me headaches.
>
> I realize now that the problem is with the current grammar and/or
> compiler. Here's a simple demonstration; try it in your friendly
> python 2.0 interpreter.
>
> >>> [i for i in range(10)] = (1, 2, 3)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ValueError: unpack list of wrong size
>
> The generated bytecode is:
>
>           0 SET_LINENO               0
>
>           3 SET_LINENO               1
>           6 LOAD_CONST               0 (1)
>           9 LOAD_CONST               1 (2)
>          12 LOAD_CONST               2 (3)
>          15 BUILD_TUPLE              3
>          18 UNPACK_SEQUENCE          1
>          21 STORE_NAME               0 (i)
>          24 LOAD_CONST               3 (None)
>          27 RETURN_VALUE
>
> I assume this isn't intended :-). The compiler is ignoring everything
> after the initial atom in the list comprehension.
It's basically > compiling the code as if it were: > > [i] = (1, 2, 3) > > I'm not sure how to try and fix this. Should the grammar allow one to > construct the example statement above? If not, I'm not sure how to > fix the grammar. If not, I suppose the compiler should detect that > the list comp is misplaced. This seems fairly messy, since there are > about 10 nodes between the expr_stmt and the list_for. > > Or is this a cool way to use list comprehensions to generate > ValueErrors? Good catch! Not everything cool deserves to be preserved. It looks like this happens because the code that traverses lists on the left-hand side of an assignment was never told about list comprehensions. You're right that the grammar can't be fixed; it's for the same reason that it can't be fixed to disallow "f() = 1". The solution is to add a test for this to the compiler that flags this as an error. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jan 18 18:01:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 12:01:02 -0500 Subject: [Python-Dev] Embedded language discussion In-Reply-To: Your message of "Thu, 18 Jan 2001 00:19:31 EST." <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com> References: <200101180519.AAA00612@207-172-111-227.s227.tnt1.ann.va.dialup.rcn.com> Message-ID: <200101181701.MAA08046@cj20424-a.reston1.va.home.com> > http://www.kuro5hin.org/?op=displaystory;sid=2001/1/16/11334/2280 > > The poster is on a project that's trying to use Python, but they're > encountering unspecified problems (perhaps because of the global > interpreter lock). I've sent the poster an email asking to be more specific about his questions; probably doing the right dance when calling Python from a thread created in C++ should do the trick. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jan 18 18:04:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 12:04:43 -0500 Subject: [Python-Dev] Strings: '\012' -> '\n' In-Reply-To: Your message of "Thu, 18 Jan 2001 01:29:13 PST." References: Message-ID: <200101181704.MAA08074@cj20424-a.reston1.va.home.com> > On Tue, 16 Jan 2001, Guido van Rossum wrote: > > You mean the tp_print and tp_str function slots in type objects, > > right? tp_print *should* always render exactly the same as tp_str. > > tp_print is used by the print statement, not by value display at the > > interactive prompt. > > Uh, i hate to disagree with you about your own interpreter, but: > > com_expr_stmt in Python/compile.c > inserts a PRINT_EXPR opcode if c_interactive is true; > eval_code2 in Python/ceval.c > handles PRINT_EXPR by calling displayhook; > sys_displayhook in Python/sysmodule.c > prints the object by calling PyFile_WriteObject on sys.stdout; > PyFile_WriteObject in Objects/fileobject.c > calls PyObject_Print if the file is really a PyFileObject; > PyObject_Print in Objects/object.c > calls op->ob_type->tp_print if it's not NULL. > > The print statement produces a PRINT_ITEM opcode, which invokes > PyFile_WriteObject with a Py_PRINT_RAW flag. That Py_PRINT_RAW > flag is propagated down to PyObject_Print and into string_print, > where it causes the string to fwrite itself directly without quoting. > > > So, string_print most definitely should *not* be changed -- only > > string_repr! > > I had to change them both before i actually saw the change in the > interactive interpreter. Actually, your statement above (that the > two should always render the same) seems to imply that if i change > one, i must also change the other. Oops. I'm so grateful that we have a collective memory! :-) You're right: tp_print() can be invoked in two modes: with or without Py_PRINT_RAW flag. 
In raw mode, it should behave exactly like str(); in cooked mode
exactly like repr().

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at mira.cs.tu-berlin.de Thu Jan 18 20:31:29 2001
From: martin at mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 18 Jan 2001 20:31:29 +0100
Subject: [Python-Dev] Weird use of hash() -- will this work?
Message-ID: <200101181931.f0IJVTc00932@mira.informatik.hu-berlin.de>

> Comments?

Yes, three of them:

1. To guarantee uniqueness at least within the process, the easiest
   solution would be

   if using_threads:
       import thread
       lock=thread.allocate_lock()
       _acquire = lock.acquire_lock
       _release = lock.release_lock
   else:
       _acquire = _release = lambda:None

   _cookie = time.time()

   def getCookie():
       global _cookie
       _acquire()
       _cookie+=1
       result = _cookie
       _release()
       return result

2. Invoking [] repeatedly likely returns an object with the same
   id() when called twice in a row (i.e. with no intermediate objects
   allocated in-between).

3. Why did you send this question to python-dev? python-list is more
   appropriate.

Regards,
Martin

From tim.one at home.com Thu Jan 18 20:49:12 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 18 Jan 2001 14:49:12 -0500
Subject: [Python-Dev] Windows Python totally rad
In-Reply-To: <3A66B846.3D24B959@lemburg.com>
Message-ID:

[MAL]
> Could you tell me which these are [new test failures on Windows]?
> The tests tested all passed just fine, so I guess these must be
> Windows-related problems.

Not to worry, all the tests pass now. Don't want to spend time
backtracking, as I'm not the one who fixed them and don't know who did.
FWIW, they "smelled like" shallow failures (== easy to diagnose & fix).

onward!-ly y'rs - tim

From martin at mira.cs.tu-berlin.de Thu Jan 18 20:37:04 2001
From: martin at mira.cs.tu-berlin.de (Martin v.
Loewis)
Date: Thu, 18 Jan 2001 20:37:04 +0100
Subject: [Python-Dev] new Makefile.in
Message-ID: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de>

> A question: is it possible to break the Python static library up?
> For example, instead of having libpython.a have
> Parser/parser.a, Objects/objects.a, etc?

Please, no. It was that way in Python 1.4 (libModules, libObjects, and
I forgot which the others were :-). We had that all documented in our
book, then Guido tried to build an extension module for the first
time, saw that these many libraries were terrible, and combined them
into a single one. That was a good thing, and we have it documented in
our book. I'm not at all looking forward to answering all the
questions why the build infrastructure of Python changed yet again...

Regards,
Martin

From fdrake at acm.org Thu Jan 18 21:22:30 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 18 Jan 2001 15:22:30 -0500 (EST)
Subject: [Python-Dev] weak references in 2.1alpha
Message-ID: <14951.20614.176140.672447@cj42289-a.reston1.va.home.com>

I'd like to put the weak references patch into the alpha, but
haven't received any feedback on the latest patch. I have some
comments from Martin von Löwis on the PEP that need to be addressed,
and that could change the implementation a bit, but the basic
machinery seems to be pretty reasonable and works for me.

Does anyone have any objections to it going into the alpha? I'd
like to enable more wide-spread testing.

Thanks!

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From mal at lemburg.com Thu Jan 18 18:10:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 18 Jan 2001 18:10:14 +0100
Subject: [Python-Dev] Weird use of hash() -- will this work?
References: <20010118022321.A9021@thyrsus.com>
Message-ID: <3A672376.4B951848@lemburg.com>

"Eric S. Raymond" wrote:
>
> So I'm writing a module to that needs to generate unique cookies.
The > module will run inside one of two environments: (1) a trivial test wrapper, > not threaded, and (2) a lomg-running multithreaded server. > > Because Python garbage-collects, hash() of a just-created object isn't > good enough. Because we may be threading, millisecond time isn't > good enough. Because we may *not* be threading, thread ID isn't good > either. > > On the other hand, I'm on Linux getting millisecond time resolution. > And it's not hard to notice that an object hash is a memory address. > > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because another thread > would have to create an object at the same memory address during the same > millisecond to collide. > > Furthermore, it looks to me like this hack might be portable to any OS > with a clock tick shorter than its timeslice. > > Comments? A combination of time.time(), process id and counter should work in all cases. Make sure you use a lock around the counter, though. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Jan 18 18:30:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jan 2001 18:30:52 +0100 Subject: [Python-Dev] Standard install locations for Python ? References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> Message-ID: <3A67284C.B6C617A@lemburg.com> "Martin v. Loewis" wrote: > > > Where should Python extensions install themselves and their docs? > > I feel that extensions should not need to care. For extensions, > distutils will pick a location, and the system administrator > configuration the package can chose a different location. > > Unfortunately, distutils does not support the installation of > documentation, which I think it should. Right. 
> Now switching sides, as an administrator, I'd wish distutils to follow > the system conventions by default. > > That means on Linux, documentation should go into the system's > directory, which is /usr/share/doc according to latest > standards. Distributions vary, so distutils should find out - e.g. by > querying the location from rpm. In addition, when building RPMs, > distutils should declare these files as %doc in the spec file, so RPM > will install it following the system conventions. You currently have to do this by hand (e.g. in setup.cfg or using the doc_files option). It should fairly easy to add a command similar to install_data though which then applies all the necessary magic to the paths. If there a common landmark to look for on Unix (e.g. in case the system does not use RPM) ? Which paths should distutils check ? (/usr/share/doc/packages, /usr/share/doc, /usr/doc/packages, /usr/doc in that order ?) > On Windows, the convention apparently is to put the documentation > "nearby" the software, so it should probably go into Doc or a > subdirectory thereof. Na, I'd rather have \Python\Site-Packages and \Python\Site-Docs for that purpose. > On Unix, there appears to be no standard location, unless the > documentation consists of man pages or perhaps info files. So > /share/doc is probably a place as good as any other. 
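
A minimal sketch of the Unix search order floated earlier in this message,
assuming a present-day Python rather than the 2.1-era distutils under
discussion. The name find_doc_dir and its fallback argument are hypothetical
and are not part of distutils; the candidate paths are the ones listed above.
It simply returns the first candidate that exists as a directory:

```python
import os

# Candidate order copied from the message; everything else here is an
# illustrative assumption, not an existing distutils interface.
DOC_DIR_CANDIDATES = (
    "/usr/share/doc/packages",
    "/usr/share/doc",
    "/usr/doc/packages",
    "/usr/doc",
)


def find_doc_dir(candidates=DOC_DIR_CANDIDATES, fallback=None):
    """Return the first candidate that is an existing directory.

    If none of the candidates exists, return `fallback` unchanged so the
    caller can decide what to do (for example, use a prefix-relative path).
    """
    for path in candidates:
        if os.path.isdir(path):
            return path
    return fallback
```

A caller might use find_doc_dir(fallback=os.path.join(prefix, "share", "doc")),
where prefix is whatever installation prefix is in effect; querying rpm, as
suggested above, would be a separate step that this sketch does not attempt.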
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Thu Jan 18 18:45:29 2001 From: skip at mojam.com (Skip Montanaro) Date: Thu, 18 Jan 2001 11:45:29 -0600 (CST) Subject: [Python-Dev] urllib.urlencode & repeated values In-Reply-To: <200101181628.LAA07406@cj20424-a.reston1.va.home.com> References: <14950.28430.572215.10643@beluga.mojam.com> <200101181628.LAA07406@cj20424-a.reston1.va.home.com> Message-ID: <14951.11193.150232.564700@beluga.mojam.com> >> If others agree I'd be happy to whip up a patch. I think it's a bug. Guido> Agreed. Patch #103314: http://sourceforge.net/patch/?func=detailpatch&patch_id=103314&group_id=5470 I assigned it to Fred for doc review. Skip From akuchlin at mems-exchange.org Thu Jan 18 19:56:40 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 18 Jan 2001 13:56:40 -0500 Subject: [Python-Dev] Standard install locations for Python ? In-Reply-To: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 01:20:06PM +0100 References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> Message-ID: <20010118135640.G21503@kronos.cnri.reston.va.us> On Thu, Jan 18, 2001 at 01:20:06PM +0100, Martin v. Loewis wrote: >On Unix, there appears to be no standard location, unless the >documentation consists of man pages or perhaps info files. So >/share/doc is probably a place as good as any other. This seems like a good suggestion. Should docs go in /share/doc/python/, then? Perhaps with subdirectories for different extensions? 
--amk

From tismer at tismer.com Thu Jan 18 22:39:18 2001
From: tismer at tismer.com (Christian Tismer)
Date: Thu, 18 Jan 2001 22:39:18 +0100
Subject: [Python-Dev] Rich comparison confusion
References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com>
Message-ID: <3A676286.C33823B4@tismer.com>

Guido van Rossum wrote:
>
> > I'm a bit confused about Guido's rich comparison stuff. In the description
> > he states that __le__ and __ge__ are inverses as are __lt__ and __gt__.
>
> Yes. By this I mean that A<B and B>A are interchangeable, ditto for
> A<=B and B>=A. Also A==B interchanges for B==A, and A!=B for B!=A.

...

> I think what threw you off was the ambiguity of "inverse". This means
> Boolean negation. I'm not relying on Boolean negation here -- I'm
> relying on the more fundamental property that a<b and b>a have the
> same outcome.

Yes, the "inverse" is confusing.
Is what you mean the "reverse" ?

Like the other right-side operators __radd__, is it correct
to think of

    __ge__ == __rle__

if __rle__ was written in the same fashion as __radd__ ?
It looks semantically the same, although the reason for a
call might be different.

And if my above view is right, would it perhaps be less confusing
to use in fact __rle__ and __rlt__, or would it be more confusing,
since __rlt__ would also be invoked left-to-right, implementing ">".

Not sure if I added even more confusion.

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

From tim.one at home.com Thu Jan 18 22:53:44 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 18 Jan 2001 16:53:44 -0500
Subject: [Python-Dev] Weird use of hash() -- will this work?
In-Reply-To: <20010118022321.A9021@thyrsus.com>
Message-ID:

[Eric S.
Raymond, in search of uniqueness] > ... > So, how about `time.time()` + hex(hash([]))? > > It looks to me like this will remain unique forever, because > another thread would have to create an object at the same memory > address during the same millisecond to collide. I'm afraid it's much more vulnerable than that: Python's thread granularity is at the bytecode level, not the statement level. It's very easy for thread A and B to see the same `time.time()` value, and after that arbitrarily long amounts of time may pass before they get around to doing the hash([]) business. When hash() completes, the storage for [] is immediately reclaimed under CPython, and it's again very easy for another thread to reuse the storage. I'm attaching an executable test case. It uses time.clock() because that has much higher resolution than time.time() on Windows (better than microsecond), but rounds it back to three decimal places to simulate millisecond resolution. The first three runs: saw 14600 unique in 30000 total saw 14597 unique in 30000 total saw 14645 unique in 30000 total So it sucks bigtime on my box. Better idea: borrow the _ThreadSafeCounter class from the tail end of the current CVS tempfile.py. The code works whether or not threads are available. Then `time.time()` + str(_counter.get_next()) is thread-safe. For that matter, plain old str(_counter.get_next()) will always be unique within a single run. However, in either case you're still not safe against concurrent *processes* generating the same cookies. tempfile.py has to worry about that too, of course, so the *best* idea is to call tempfile.mktemp() and leave it at that. It wastes some time checking the filesystem for a file of the same name (which, btw, goes much quicker on Linux than on Windows). From tismer at tismer.com Thu Jan 18 22:56:08 2001 From: tismer at tismer.com (Christian Tismer) Date: Thu, 18 Jan 2001 22:56:08 +0100 Subject: [Python-Dev] Weird use of hash() -- will this work? 
References: <20010118022321.A9021@thyrsus.com> Message-ID: <3A676678.7E4AF278@tismer.com> "Eric S. Raymond" wrote: > > So I'm writing a module to that needs to generate unique cookies. The > module will run inside one of two environments: (1) a trivial test wrapper, > not threaded, and (2) a lomg-running multithreaded server. What do you mean by "unique"? Unique regarding your long-running server? If so, then I wonder why one should do > > So, how about `time.time()` + hex(hash([]))? > instead of using a single, simple counter for all sessions? > It looks to me like this will remain unique forever, because another thread > would have to create an object at the same memory address during the same > millisecond to collide. > > Furthermore, it looks to me like this hack might be portable to any OS > with a clock tick shorter than its timeslice. > > Comments? If I'm not overlooking something fundamental, the counter approach seems to be simpler and most portable. :-) but-sometimes-my-brain-malfunctions-badly-ly y'rs - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From nas at arctrix.com Thu Jan 18 16:07:13 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 07:07:13 -0800 Subject: [Python-Dev] Re: new Makefile.in In-Reply-To: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 18, 2001 at 08:37:04PM +0100 References: <200101181937.f0IJb4N00997@mira.informatik.hu-berlin.de> Message-ID: <20010118070713.A13581@glacier.fnational.com> On Thu, Jan 18, 2001 at 08:37:04PM +0100, Martin v. Loewis wrote: > > A question: is it possible to break the Python static library up? 
> > For example, instead of having libpython.a have > > Parser/parser.a, Objects/objects.a, etc? > > Please, no. Okay. > I'm not at all looking forward to answering all the questions > why the build infrastructure of Python changed yet again... My Makefile patch shouldn't change the way you build extensions. Neil From tim.one at home.com Fri Jan 19 02:45:42 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 20:45:42 -0500 Subject: [Python-Dev] unistr() vs. unicode() In-Reply-To: <200101181623.LAA07389@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > (And no, Tim, this did *not* end up in the patches list because I made > Barry remove the reply-to. SourceForge mails never had reply-to to > begin with.) Aha! Another thing to blame Barry for . From tim.one at home.com Thu Jan 18 23:11:23 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 17:11:23 -0500 Subject: [Python-Dev] 2.1 alpha: what about the unicode name database? In-Reply-To: <008701c0811a$b3371c00$e46940d5@hagrid> Message-ID: [/F] > python build problems and real life got in the way. > > will 2.1a1 be released according to plan? will there > be a 2.1a2 release? maybe I should postpone this? Depends on how confident you are. Since this is purely an optimization, I don't think it *needs* to get into a1 in order to make the final release; postponing a few days would be better than pushing too hard on something that's proved hairier than anticipated. do-the-right-thing-whatever-that-is-ly y'rs - tim From guido at digicool.com Fri Jan 19 03:17:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 21:17:36 -0500 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Your message of "Thu, 18 Jan 2001 02:14:19 PST." References: Message-ID: <200101190217.VAA01497@cj20424-a.reston1.va.home.com> > I hope you don't mind that i'm taking this over to python-dev, > because it led me to discover a more general issue (see below). 
No -- in fact I wanted to see this here! (My mail backlog seems to be clearing -- or maybe it was only a temporary unclogging... :-) > For the others on python-dev, here's the background: MAL was > about to check in the unistr() function, described as follows: > > > This patch adds a utility function unistr() which works just like > > the standard builtin str() -- only that the return value will > > always be a Unicode object. > > > > The patch also adds a new object level C API PyObject_Unicode() > > which complements PyObject_Str(). > > I responded: > > Why are unistr() and unicode() two separate functions? > > > > str() performs one task: convert to string. It can convert anything, > > including strings or Unicode strings, numbers, instances, etc. > > > > The other type-named functions e.g. int(), long(), float(), list(), > > tuple() are similar in intent. > > > > Why have unicode() just for converting strings to Unicode strings, > > and unistr() for converting everything else to a Unicode string? > > What does unistr(x) do differently from unicode(x) if x is a string? > > MAL responded: > > unistr() is meant to complement str() very closely. unicode() > > works as constructor for Unicode objects which can also take > > care of decoding encoded data. str() and unistr() don't provide > > this capability but instead always assume the default encoding. > > > > There's also a subtle difference in that str() and unistr() > > try the tp_str slot which unicode() doesn't. unicode() > > supports any character buffer which str() and unistr() don't. > > Okay, given this explanation, i still feel fairly confident > that unicode() should subsume unistr(). Many of the other > type-named functions try various slots: > > int() looks for __int__ > float() looks for __float__ > long() looks for __long__ > str() looks for __str__ > > In testing this i also discovered the following: > > >>> class Foo: > ... def __int__(self): > ... return 3 > ... 
> >>> f = Foo() > >>> int(f) > 3 > >>> long(f) > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: Foo instance has no attribute '__long__' > >>> float(f) > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: Foo instance has no attribute '__float__' > > This is kind of surprising. How about: > > int() looks for __int__ > float() looks for __float__, then tries __int__ > long() looks for __long__, then tries __int__ > str() looks for __str__ > unicode() looks for __unicode__, then tries __str__ For the numeric types this could perhaps be done by calling PyNumber_Long() from PyNumber_Float(), calling PyNumber_Int() from PyNumber_Long(). Complex is a bit of an exception -- there's no PyNumber_Complex(), just because I felt that nobody would need it. :-) > The extra parameter to unicode() is very similar to the extra > parameter to int(), so i think there is a natural parallel here. Makes sense. > Hmm... what about the other types? > > Wow!! __complex__ can produce a segfault! > > >>> complex > > >>> class Foo: > ... def __complex__(self): return 3 > ... > >>> Foo() > <__main__.Foo instance at 0x81e8684> > >>> f = _ > >>> complex(f) > Segmentation fault (core dumped) > > This happens because builtin_complex first retrieves and saves > the PyNumberMethods of the argument (in this case, from the > instance), then tries to call __complex__ (in this case, returning 3), > and THEN coerces the result using nbr->nb_float if the result is > not complex! (This calls the instance's nb_float method on the > integer object 3!!) Thanks! Fixed now in CVS. > I think __complex__ should probably look for __complex__, then > __float__, then __int__. I make it call PyNumber_Float(), which could be made smarter as explained above. > One could argue for __list__, __tuple__, or __dict__, but that > seems much weaker; the Pythonic way has always been to implement > __getitem__ instead. Yes -- since __list__ etc. 
aren't used, let's not add them. > There is no built-in dict(); if it existed > i suppose it would do the opposite of x.items(); again a weak > argument, though i might have found such a function useful once > or twice. Yeah, it's not very common. Dict comprehensions anyone? d = {k:v for k,v in zip(range(10), range(10))} # :-) > And that about covers the built-in types for data. Thanks! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Thu Jan 18 23:13:14 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 17:13:14 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <20010118022321.A9021@thyrsus.com> Message-ID: BTW, why doesn't hash([]) blow up in 2.1a1? In 2.0 it raised TypeError: unhashable type Did someone change this deliberately? From tim.one at home.com Thu Jan 18 23:58:22 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 17:58:22 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Message-ID: [Tim whined] > BTW, why doesn't hash([]) blow up in 2.1a1? In 2.0 it raised > > TypeError: unhashable type > > Did someone change this deliberately? Answer: it's an unintended consequence of the rich-comparison changes. Guido knows how to fix it and probably will. The list type grew a tp_richcompare slot but lost its non-NULL tp_compare pointer. PyObject_Hash wasn't changed accordingly (it now believes lists support neither direct hashing nor comparison, so does them a favor and hashes their memory addresses). Something trickier is probably going wrong elsewhere too, but I won't try to remember what that is unless Guido gets hit by a bus tonight. 
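A minimal sketch of the behaviour discussed above, written for present-day Python rather than 2.1a1: hash() on a list raises TypeError, while id() still works on unhashable objects. The helper name try_hash is hypothetical and used only for illustration.

```python
def try_hash(obj):
    """Return hash(obj), or None when the object is unhashable."""
    try:
        return hash(obj)
    except TypeError:
        # Lists are mutable, so they refuse to be hashed.
        return None


# A list has no hash; a tuple of hashable items does.
list_hash = try_hash([])
tuple_hash = try_hash((1, 2))

# id() is defined for every object, hashable or not. In CPython it is
# derived from the object's address, so it is only meaningful while the
# object is alive.
list_identity = id([])
```

Because the temporary list is reclaimed as soon as id() returns, the value in list_identity may be handed out again to the next object created.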
in-which-case-we-can-push-off-the-funeral-until-after-the-release-ly y'rs - tim From thomas at xs4all.net Fri Jan 19 00:02:09 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 00:02:09 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010118103036.B21503@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 18, 2001 at 10:30:36AM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> Message-ID: <20010119000209.F17392@xs4all.nl> On Thu, Jan 18, 2001 at 10:30:36AM -0500, Andrew Kuchling wrote: > >On Wed, Jan 17, 2001 at 11:49:25PM +0100, Thomas Wouters wrote: > >> On BSDI, readline sits in /usr/local or /usr/contrib, and isn't detected by > >> setup.py. Also, SSL support for the socket module was not enabled, though > >> OpenSSL is installed, in the default path. > What does the layout of /usr/contrib look like? Is it > /usr/contrib/openssl/include/, /usr/contrib/include/, or something > else? Actually, it's /usr/local, not /usr/contrib. I've never installed OpenSSL in /usr/contrib, though I could, and maybe BSDI will, in the future. (BSDI installs its own software in /usr, and optional free, pre-compiled software in /usr/contrib.) OpenSSL installs into /usr/local/ssl/include/openssl by default, and installing into /usr/contrib would make it /usr/contrib/ssl/include/openssl. > >Strangely enough, this problem does not exist on FreeBSD. I can run 'make' > >or 'make test' after 'make' just fine. 'make test' still doesn't work > >because of the incorrect library path, but it doesn't barf like the other > >systems (BSDI and Debian Linux) > Have you already run "make install"? Perhaps it's picking up the > already-installed modules when running "make test", because it really > shouldn't be working. Hm, I think you misread my statement. 'make test' *doesn't* work. But it doesn't barf on the signal module being built dynamically either. 
You fixed that for every platform now, I was just pointing out that this was not a problem for FreeBSD for some reason. 'make test' still doesn't work, but I can make it work by specifying a hand-tweaked PYTHONPATH that includes the OS/arch-dependant build directory. This brings me to another point: how can 'make test' work at all ? Does python always check for './Lib' (and './Modules') for modules ? If that's specific for 'make test' and running python in the source distribution, that sounds like a bit of a weird hack. I can't find any such hackery in the source, but I also can't figure out how else it's working :) More-later--Meteor-((c)-1979)-is-on-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at mira.cs.tu-berlin.de Fri Jan 19 00:14:05 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 19 Jan 2001 00:14:05 +0100 Subject: [Python-Dev] weak references in 2.1alpha Message-ID: <200101182314.f0INE5B00338@mira.informatik.hu-berlin.de> > Does anyone have any objections to it going into the alpha? I'd like to request that the .clear() method is removed from the patch for this alpha, and also that the weak dictionaries are removed until their semantics is clarified. It's always easier to add stuff later than to remove it. Regards, Martin From nas at arctrix.com Thu Jan 18 17:31:09 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 08:31:09 -0800 Subject: [Python-Dev] SSL detection problem In-Reply-To: <20010118115028.D21503@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Jan 18, 2001 at 11:50:28AM -0500 References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> <20010118115028.D21503@kronos.cnri.reston.va.us> Message-ID: <20010118083109.A13972@glacier.fnational.com> On Thu, Jan 18, 2001 at 11:50:28AM -0500, Andrew Kuchling wrote: > On Thu, Jan 18, 2001 at 11:39:30AM +0100, Martin v. 
Loewis wrote:
> >The not-so-obvious question: How can one work-around such a problem
> >with the new setup scheme?
>
> I still need to implement command-line options to specify such
> overrides, but that couldn't possibly get done in time for alpha1.

My non-recursive makefile patch allows you to use both Setup and
setup.py.  It's not quite ready for prime time but it's getting close.
I would be interested if someone could point me to the source for some
crappy makes.  I've tried GNU make, BSD 4.4 pmake and whatever comes
with SunOS 5.6.  Searching for "make" doesn't work too well. :-(

  Neil

From thomas at xs4all.net  Fri Jan 19 00:45:32 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 19 Jan 2001 00:45:32 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1
In-Reply-To: ; from gvanrossum@users.sourceforge.net on Thu, Jan 18, 2001 at 08:46:54AM -0800
References:
Message-ID: <20010119004532.G17392@xs4all.nl>

On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote:

> filename = '/tmp/delete_me'

This reminds me: we need a portable way to handle test-files :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido at digicool.com  Fri Jan 19 00:56:04 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 18 Jan 2001 18:56:04 -0500
Subject: [Python-Dev] new Makefile.in
In-Reply-To: Your message of "Wed, 17 Jan 2001 23:59:22 PST." <20010117235922.A12356@glacier.fnational.com>
References: <20010117235922.A12356@glacier.fnational.com>
Message-ID: <200101182356.SAA19616@cj20424-a.reston1.va.home.com>

Hi Neil,

My mail suffers delays of 12-24 hours while mail.python.org is working
on some enormous backlog.  So I just saw your message about a new
Makefile...

> Spurred on by comments made by Andrew, I spent some time last
> night overhauling the Python Makefiles.  I now have a toplevel
> non-recursive Makefile.in that seems to work fairly well.
> I'm pretty sure it still should be portable.  It doesn't use includes
> or any special GNU make features.  It is half the size of the old
> Makefiles.  The build is faster and it's now easier to follow if
> something goes wrong.

I'd like to see this!

> A question: is it possible to break the Python static library up?
> For example, instead of having libpython.a have
> Parser/parser.a, Objects/objects.a, etc?  There
> would still only be one shared library.  This would speed up
> incremental builds and also help Andrew with PEP 229.  I'm
> thinking that the Makefile do something like this:
>
>     all: python$(EXE)
>
>     PYLIBS= Parser/parser.a Objects/objects.a ... Modules/modules.a
>
>     python$(EXE): $(PYLIBS)
>         $(LINKCC) -o python$(EXE) $(PYLIBS) ...
>
>     Modules/modules.a: minpython$(EXE)
>         ./minpython$(EXE) setup.py

Sounds cool to me.  (Where's the patch for a shared libpython???)

> AFAICT, the only thing affected by splitting up the static library
> is Misc/Makefile.pre.in.  Is this correct?

Yeah, and that should be phased out in favor of distutils anyway.  Now
would be a great time!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com  Fri Jan 19 01:34:02 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 18 Jan 2001 19:34:02 -0500
Subject: [Python-Dev] Mail delays and SourceForge bugs
Message-ID: <200101190034.TAA26664@cj20424-a.reston1.va.home.com>

Through no fault of my own, email to guido at python.org (which
includes the python-dev list) is currently suffering delays of 12-24
hours.  I have a feeling this is probably true for all mail going
through python.org, so checkin messages and python-dev discussion have
been greatly frustrated, with about 1 day to go until the planned
2.1a1 release date!

On top of that, the SourceForge bug manager has developed a problem:
all references to http://sourceforge.net/bugs/?group_id=5470/ come
back with this error:

An error occured in the logger.
ERROR: pg_atoi: error in "5470/": can't parse "/" I'm still hoping to release Python 2.1a1 tomorrow, unless Jeremy tells me that he needs more time for his nested scopes patch. In the mean time, please everybody, do check out the latest CVS version and give it a good workout! Andrew's setup.py still has some rough edges, I believe that in order to run it from the build directory you still have to point PYTHONPATH to the build/lib* directory, where he hides the shared libraries for all modules. Andrew, are you planning to fix this? If there's anything that you need me to know about, please mail to guido at digicool.com -- that address suffers no delays. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jan 19 01:51:19 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 19:51:19 -0500 Subject: [Python-Dev] RE: [Pycabal] Mail delays and SourceForge bugs In-Reply-To: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Message-ID: [Guido. notes current woes w/ python.org email, and SourceForge] Note too that, over the past two days, it's not possible to follow Python-Dev email via http://mail.python.org/pipermail/python-dev/2001-January/date.html either, as (unlike during previous occurrences of python.org email delays) msgs aren't showing up there in a timely fashion either (for example, the msg of Guido's to which I'm replying isn't there). good-thing-guido's-so-easy-to-channel-ly y'rs - tim From guido at digicool.com Fri Jan 19 01:52:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:52:02 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Your message of "Thu, 18 Jan 2001 02:23:21 EST." <20010118022321.A9021@thyrsus.com> References: <20010118022321.A9021@thyrsus.com> Message-ID: <200101190052.TAA26849@cj20424-a.reston1.va.home.com> > So I'm writing a module to that needs to generate unique cookies. 
> The module will run inside one of two environments: (1) a trivial test wrapper,
> not threaded, and (2) a long-running multithreaded server.
>
> Because Python garbage-collects, hash() of a just-created object isn't
> good enough.  Because we may be threading, millisecond time isn't
> good enough.  Because we may *not* be threading, thread ID isn't good
> either.
>
> On the other hand, I'm on Linux getting millisecond time resolution.
> And it's not hard to notice that an object hash is a memory address.
>
> So, how about `time.time()` + hex(hash([]))?
>
> It looks to me like this will remain unique forever, because another thread
> would have to create an object at the same memory address during the same
> millisecond to collide.
>
> Furthermore, it looks to me like this hack might be portable to any OS
> with a clock tick shorter than its timeslice.

Argh!  hash([]) should raise TypeError, since lists are not hashable
objects -- mutable objects can't be allowed as dictionary keys.
hash([]) accidentally returned a value for a brief period after I
checked in the rich comparisons -- I've fixed that now.

But not to worry: instead of using hash([]), you can use
hex(id([])).  Same thing.

On the other hand, remember how much you can do in a millisecond!
(E.g. I can call tempfile.mktemp() 5 times in that time.)  And when
you create an object and immediately delete it, the next object
created is very likely to have the same address.

But what's wrong with this:

    try:
        from thread import get_ident as unique_id
    except ImportError:
        def unique_id(): return id([])

--Guido van Rossum (home page: http://www.python.org/~guido/)

From billtut at microsoft.com  Fri Jan 19 01:53:15 2001
From: billtut at microsoft.com (Bill Tutt)
Date: Thu, 18 Jan 2001 16:53:15 -0800
Subject: [Python-Dev] MS CRT crashing:
Message-ID: <58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com>

From guido at digicool.com  Fri Jan 19 01:53:13 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 18 Jan 2001 19:53:13 -0500
Subject: [Python-Dev] 2.1 alpha: what about the unicode name database?
In-Reply-To: Your message of "Thu, 18 Jan 2001 07:48:41 +0100." <008701c0811a$b3371c00$e46940d5@hagrid>
References: <012901c080a5$306023a0$e46940d5@hagrid> <008701c0811a$b3371c00$e46940d5@hagrid>
Message-ID: <200101190053.TAA26862@cj20424-a.reston1.va.home.com>

> I wrote:
> > I've almost sorted it all out.  will check it in later tonight (local
> > time).
>
> python build problems and real life got in the way.

What?  You've got a real life?  Can't be allowed, not when we're
working on a release!

> will 2.1a1 be released according to plan?  will there
> be a 2.1a2 release?  maybe I should postpone this?

Please check it in, there's still time (2.1a1 won't go out before
Friday night, possibly it'll be delayed until Monday).  And yes, there
will be a 2.1a2.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com  Fri Jan 19 01:55:15 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 18 Jan 2001 19:55:15 -0500
Subject: [Python-Dev] SSL detection problem
In-Reply-To: Your message of "Thu, 18 Jan 2001 11:39:30 +0100."
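A minimal sketch of the counter approach suggested elsewhere in this thread, written for present-day Python. It is not the _ThreadSafeCounter class from tempfile.py; the names CookieCounter, unique_cookie and make_cookies are hypothetical and exist only for this illustration. The idea is that a lock-protected counter never repeats within one process, whereas an address-based value can be reused as soon as the object is freed.

```python
import itertools
import threading


class CookieCounter:
    """Hypothetical helper: hands out 1, 2, 3, ... safely across threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._numbers = itertools.count(1)

    def get_next(self):
        # The lock makes "read and advance" a single atomic step.
        with self._lock:
            return next(self._numbers)


_counter = CookieCounter()


def unique_cookie():
    # Unique within one process only; concurrent processes can still collide.
    return "cookie-%d" % _counter.get_next()


def make_cookies(threads=8, per_thread=1000):
    """Generate cookies from several threads and return them all."""
    # One private list per thread, so the result collection needs no locking.
    buckets = [[] for _ in range(threads)]

    def worker(bucket):
        for _ in range(per_thread):
            bucket.append(unique_cookie())

    workers = [threading.Thread(target=worker, args=(b,)) for b in buckets]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [cookie for bucket in buckets for cookie in bucket]
```

Comparing len(set(make_cookies())) with the total number generated is a quick way to check for collisions. The sketch still does nothing about separate processes producing the same cookie.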
<200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> References: <200101181039.f0IAdUT09947@mira.informatik.hu-berlin.de> Message-ID: <200101190055.TAA26905@cj20424-a.reston1.va.home.com> > The distutils-based configuration fails to build on my system (SuSE > 7.0) with the error > > /usr/src/python/Modules/socketmodule.c:159: rsa.h: Datei oder Verzeichnis nicht > gefunden > /usr/src/python/Modules/socketmodule.c:160: crypto.h: Datei oder Verzeichnis nicht gefunden > /usr/src/python/Modules/socketmodule.c:161: x509.h: Datei oder Verzeichnis nicht gefunden > /usr/src/python/Modules/socketmodule.c:162: pem.h: Datei oder Verzeichnis nicht > gefunden > /usr/src/python/Modules/socketmodule.c:163: ssl.h: Datei oder Verzeichnis nicht > gefunden The same happened to Fred on Mandrake 7.0 (except for the German messages :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 01:58:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 19:58:16 -0500 Subject: [Python-Dev] Re: unistr() vs. unicode() Message-ID: <200101190058.TAA26931@cj20424-a.reston1.va.home.com> MAL's reply to Ping in this thread. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Thu, 18 Jan 2001 10:52:12 +0100 From: "M.-A. Lemburg" To: Ka-Ping Yee cc: guido at python.org, patches at python.org Subject: Re: [Patches] [Patch #101664] Add new unistr() builtin + PyObject_Unic ode()C API Ka-Ping Yee wrote: > > On Wed, 17 Jan 2001 noreply at sourceforge.net wrote: > > Comment: > > This patch adds a utility function unistr() which works just like > > the standard builtin str() -- only that the return value will > > always be a Unicode object. > > Sorry for barging in, but i have an issue/question: > > Why are unistr() and unicode() two separate functions? > > str() performs one task: convert to string. It can convert anything, > including strings or Unicode strings, numbers, instances, etc. 
> > The other type-named functions e.g. int(), long(), float(), list(), > tuple() are similar in intent. > > Why have unicode() just for converting strings to Unicode strings, > and unistr() for converting everything else to a Unicode string? > What does unistr(x) do differently from unicode(x) if x is a string? unistr() is meant to complement str() very closely. unicode() works as constructor for Unicode objects which can also take care of decoding encoded data. str() and unistr() don't provide this capability but instead always assume the default encoding. There's also a subtle difference in that str() and unistr() try the tp_str slot which unicode() doesn't. unicode() supports any character buffer which str() and unistr() don't. Perhaps you are right though in that we should make all three APIs behave in the same way with respect to coercing their arguments. This could hide some errors... still in the long run, I agree that the existing setup probably causes more confusion than good. Guido ? - -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ _______________________________________________ Patches mailing list Patches at python.org http://mail.python.org/mailman/listinfo/patches ------- End of Forwarded Message From guido at digicool.com Fri Jan 19 02:04:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 20:04:22 -0500 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() In-Reply-To: Your message of "Thu, 18 Jan 2001 11:51:46 +0100." <3A66CAC2.74FC894@lemburg.com> References: <3A66CAC2.74FC894@lemburg.com> Message-ID: <200101190104.UAA27056@cj20424-a.reston1.va.home.com> > Ka-Ping Yee wrote: > > > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > > str() looks for __str__ > > > > Oops. 
I forgot that > > > > str() looks for __str__, then tries __repr__ > > > > So, presumably, > > > > unicode() should look for __unicode__, then __str__, then __repr__ > > Not quite... str() does this: > > 1. strings are passed back as-is > 2. the type slot tp_str is tried > 3. the method __str__ is tried > 4. Unicode returns are converted to strings > 5. anything other than a string return value is rejected > > unistr() does the same, but makes sure that the return > value is an Unicode object. > > unicode() does the following: > > 1. for instances, __str__ is called > 2. Unicode objects are returned as-is > 3. string objects or character buffers are used as basis for decoding > 4. decoding is applied to the character buffer and the results > are returned > > I think we should perhaps merge the two approaches into one > which then applies all of the above in unicode() (and then > forget about unistr()). This might lose hide some type errors, > but since all other generic constructors behave more or less > in the same way, I think unicode() should too. Yes, I would like to see these merged. I noticed that e.g. there is special code to compare Unicode strings in the comparison code (I think I *could* get rid of this now we have rich comparisons, but I decided to put that off), and when I looked at it it uses the same set of conversions as unicode(). Some of these seem questionable to me -- why do you try so many ways to get a string out of an object? (On the other hand the merge of unicode() and unistr() might have this effect anyway...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 02:06:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 18 Jan 2001 20:06:23 -0500 Subject: [Python-Dev] bug in grammar In-Reply-To: Your message of "Thu, 18 Jan 2001 13:39:54 +0100." 
<200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de>
References: <200101181239.f0ICdsY10703@mira.informatik.hu-berlin.de>
Message-ID: <200101190106.UAA27073@cj20424-a.reston1.va.home.com>

> I think there should be a well-formedness pass in-between. I.e. after
> the AST has been built, a single pass should descend through the tree,
> looking for an expr_statement with more than a single testlist. Once
> it finds one, it should confirm that this really is a well-formed
> lvalue (in C speak). In this case, the test should be that each term
> is an atom without factors.

Good idea.

> If the parser itself performs such checks, the compiler could be
> simplified in many places, I guess.

Not sure that in practice it makes much of a difference: there aren't
that many of these kinds of checks, and writing a separate pass is
expensive.  On the other hand, Jeremy is just writing a separate pass
anyway, to collect name usage information for the nested scopes.
Maybe it could be folded into that pass...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at alum.mit.edu  Fri Jan 19 04:20:08 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 18 Jan 2001 22:20:08 -0500 (EST)
Subject: [Python-Dev] deprecated regex used by un-deprecated modules
Message-ID: <14951.45672.806978.600944@localhost.localdomain>

There are several modules in the standard library that use the regex
module.  When they are imported, they print a warning about using a
deprecated module.  I think this is bad form.  Either the modules that
depend on regex should be updated to use re or they should be
deprecated themselves.

I discovered the following offenders:

    asynchat
    knee
    poplib
    reconvert

I would suggest fixing asynchat and poplib and deprecating knee.  The
reconvert module may be a special case.
Jeremy From jeremy at alum.mit.edu Fri Jan 19 04:31:02 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jan 2001 22:31:02 -0500 (EST) Subject: [Python-Dev] setup.py and build subdirectories Message-ID: <14951.46326.743921.988828@localhost.localdomain> I have a bunch of build directories under the source tree, e.g. src/python/dist/src/build src/python/dist/src/build-pg src/python/dist/src/build-O3 ... The new setup.py did not successfully build in these directories. I hacked distutils a tiny bit and had some success. Patch below. I'm not sure if the approach is kosher, but it allows me to build successfully. I also have a problem running 'make test' from these build directories. The reference to the distutils build directory has '..' prepended to it that shouldn't exist. Jeremy Index: setup.py =================================================================== RCS file: /cvsroot/python/python/dist/src/setup.py,v retrieving revision 1.8 diff -c -r1.8 setup.py *** setup.py 2001/01/18 20:39:34 1.8 --- setup.py 2001/01/19 03:26:55 *************** *** 536,540 **** # --install-platlib if __name__ == '__main__': ! sysconfig.set_python_build() main() --- 536,541 ---- # --install-platlib if __name__ == '__main__': ! path, file = os.path.split(sys.argv[0]) ! sysconfig.set_python_build(path) main() Index: Lib/distutils/sysconfig.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/distutils/sysconfig.py,v retrieving revision 1.31 diff -c -r1.31 sysconfig.py *** Lib/distutils/sysconfig.py 2001/01/17 15:16:52 1.31 --- Lib/distutils/sysconfig.py 2001/01/19 03:27:01 *************** *** 24,37 **** python_build = 0 ! def set_python_build(): """Set the python_build flag to true; this means that we're building Python itself. Only called from the setup.py script shipped with Python. """ global python_build ! 
python_build = 1 def get_python_inc(plat_specific=0, prefix=None): """Return the directory containing installed Python header files. --- 24,37 ---- python_build = 0 ! def set_python_build(loc): """Set the python_build flag to true; this means that we're building Python itself. Only called from the setup.py script shipped with Python. """ global python_build ! python_build = loc + "/" def get_python_inc(plat_specific=0, prefix=None): """Return the directory containing installed Python header files. *************** *** 48,54 **** prefix = (plat_specific and EXEC_PREFIX or PREFIX) if os.name == "posix": if python_build: ! return "Include/" return os.path.join(prefix, "include", "python" + sys.version[:3]) elif os.name == "nt": return os.path.join(prefix, "Include") # include or Include? --- 48,54 ---- prefix = (plat_specific and EXEC_PREFIX or PREFIX) if os.name == "posix": if python_build: ! return python_build + "Include/" return os.path.join(prefix, "include", "python" + sys.version[:3]) elif os.name == "nt": return os.path.join(prefix, "Include") # include or Include? From tim.one at home.com Fri Jan 19 04:46:16 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 18 Jan 2001 22:46:16 -0500 Subject: [Python-Dev] Type-converting functions, esp. unicode() vs. unistr() Message-ID: [attribution lost] > There is no built-in dict(); if it existed i suppose it would do > the opposite of x.items(); again a weak argument, though i might > have found such a function useful once or twice. [Guido] > Yeah, it's not very common. Dict comprehensions anyone? > > d = {k:v for k,v in zip(range(10), range(10))} # :-) It's very common in Perl code, but is in no sense the inverse of .items() there: when you build a dict from a list L in Perl, it acts like Python {L[0]: L[1], L[2]: L[3], L[4]: L[5], ... 
}

That's what seems most practical most often; e.g., when crunching over
text files with records of the form key value (e.g., mail headers are
of this form; simple contact databases; to-do lists segregated by
date; etc), whatever fancy re.split() is used to break things apart
naturally returns a flat list.  A list of two-tuples is natural only
if it was obtained from another dict's .items() <0.9 wink>.

pushing-the-limits-of-"practicality-beats-purity"?-ly y'rs  - tim

From tim.one at home.com  Fri Jan 19 07:00:27 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 19 Jan 2001 01:00:27 -0500
Subject: [Python-Dev] test_urllib failing on Windows
Message-ID:

test test_urllib crashed -- exceptions.AssertionError: urllib.quote problem

From tim.one at home.com  Fri Jan 19 07:39:30 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 19 Jan 2001 01:39:30 -0500
Subject: [Python-Dev] (no subject)
Message-ID:

[some MS internal support group]
> Turns out the C standard explicitly says you can't have an input
> follow output on a stream without doing fflush or fseek in-between,
> to make sure the stdio buffer is cleared. So this program is illegal.

It's undefined (there are no "illegal" programs -- that word doesn't
appear in the std; "undefined" does and has a precise technical
meaning).

In the presence of threads -- which the C std doesn't mention -- you
have to address issues the std doesn't touch.  To date, MS's is the
only C runtime we've seen that corrupts itself in this situation.  It
can do anything it likes short of blowing up and still be considered a
good threaded implementation.  As is, it has to be considered
sub-standard, in the ordinary sense of displaying worse behavior than
other threaded C stdio implementations.  It falls short there on other
counts too (like the lack of getc_unlocked() & friends), but internal
corruption is a particularly egregious failing.
and-that's-the-end-of-it-for-me-ly y'rs - tim From mwh21 at cam.ac.uk Fri Jan 19 09:31:18 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 19 Jan 2001 08:31:18 +0000 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Thomas Wouters's message of "Fri, 19 Jan 2001 00:02:09 +0100" References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: Thomas Wouters writes: > This brings me to another point: how can 'make test' work at all ? Does > python always check for './Lib' (and './Modules') for modules ? If that's > specific for 'make test' and running python in the source distribution, that > sounds like a bit of a weird hack. I can't find any such hackery in the > source, but I also can't figure out how else it's working :) It's in Modules/getpath.c Cheers, M. -- I really hope there's a catastrophic bug insome future e-mail program where if you try and send an attachment it cancels your ISP account, deletes your harddrive, and pisses in your coffee -- Adam Rixey From gstein at lyra.org Fri Jan 19 09:38:54 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 19 Jan 2001 00:38:54 -0800 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) In-Reply-To: ; from gvanrossum@users.sourceforge.net on Thu, Jan 18, 2001 at 04:28:10PM -0800 References: Message-ID: <20010119003854.F7731@lyra.org> On Thu, Jan 18, 2001 at 04:28:10PM -0800, Guido van Rossum wrote: >... > PyTypeObject PyCursesWindow_Type = { > ! PyObject_HEAD_INIT(NULL) > 0, /*ob_size*/ > "curses window", /*tp_name*/ >... > --- 2432,2443 ---- > /* Initialization function for the module */ > > ! 
DL_EXPORT(void) > init_curses(void) > { > PyObject *m, *d, *v, *c_api_object; > static void *PyCurses_API[PyCurses_API_pointers]; > + > + /* Initialize object type */ > + PyCursesWindow_Type.ob_type = &PyType_Type; > > /* Initialize the C API pointer array */ I've never truly understood this. Is it because Windows cannot initialize (at load-time) a pointer to a data structure that is located in a different DLL? It is a bit painful to keep moving inits from load-time to run-time. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Fri Jan 19 10:01:22 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 04:01:22 -0500 Subject: [Python-Dev] test_urllib failing on Windows Message-ID: Bet it was failing everywhere; it's fixed now. From moshez at zadka.site.co.il Fri Jan 19 18:53:36 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Fri, 19 Jan 2001 19:53:36 +0200 (IST) Subject: [Python-Dev] Dbm failure Message-ID: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il> test test_dbm skipped -- /home/moshez/prog/src/python/python/dist/src/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey Did it happen to anyone else? Anything else you need to know? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From mal at lemburg.com Fri Jan 19 10:58:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 10:58:08 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> Message-ID: <3A680FB0.AED2DB55@lemburg.com> Guido van Rossum wrote: > > > Ka-Ping Yee wrote: > > > > > > On Thu, 18 Jan 2001, Ka-Ping Yee wrote: > > > > str() looks for __str__ > > > > > > Oops. 
I forgot that > > > > > > str() looks for __str__, then tries __repr__ > > > > > > So, presumably, > > > > > > unicode() should look for __unicode__, then __str__, then __repr__ > > > > Not quite... str() does this: > > > > 1. strings are passed back as-is > > 2. the type slot tp_str is tried > > 3. the method __str__ is tried > > 4. Unicode returns are converted to strings > > 5. anything other than a string return value is rejected > > > > unistr() does the same, but makes sure that the return > > value is a Unicode object. > > > > unicode() does the following: > > > > 1. for instances, __str__ is called > > 2. Unicode objects are returned as-is > > 3. string objects or character buffers are used as basis for decoding > > 4. decoding is applied to the character buffer and the results > > are returned > > > > I think we should perhaps merge the two approaches into one > > which then applies all of the above in unicode() (and then > > forget about unistr()). This might hide some type errors, > > but since all other generic constructors behave more or less > > in the same way, I think unicode() should too. > > Yes, I would like to see these merged. I noticed that e.g. there is > special code to compare Unicode strings in the comparison code (I > think I *could* get rid of this now we have rich comparisons, but I > decided to put that off), and when I looked at it it uses the same set > of conversions as unicode(). Some of these seem questionable to me -- > why do you try so many ways to get a string out of an object? (On the > other hand the merge of unicode() and unistr() might have this effect > anyway...) ... because there are so many ways to get at string representations of objects in Python at C level. If we agree to merge the semantics of the two APIs, then str() would have to change too: is this desirable ? 
(IMHO, yes) Here's what we could do:

 a) merge the semantics of unistr() into unicode()
 b) apply the same semantics in str()
 c) remove unistr() -- how's that for a short-living builtin ;)

About the semantics: These should be backward compatible to str() in that everything that worked before should continue to work after the merge.

A strawman for processing str() and unicode():

 1. strings/Unicode is passed back as-is
 2. tp_str is tried
 3. the method __str__ is tried
 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer)
 5. for str(): Unicode return values are converted to strings using the default encoding
    for unicode(): Unicode return values are passed back as-is; string return values are decoded according to the encoding parameter
 6. the return object is type-checked: str() will always return a string object, unicode() always a Unicode object

Note that passing back Unicode is only allowed in case no encoding was given. Otherwise an exception is raised: you can't decode Unicode.

As an extension we could add encoding and error parameters to str() as well. The result would be either an encoding of Unicode objects passed back by tp_str or __str__ or a recoding of string objects returned by checks 2, 3 or 4.

If we agree to take this approach, then we should remove the unistr() Python API before the alpha ships.

-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at effbot.org Fri Jan 19 11:19:06 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 11:19:06 +0100 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) References: <20010119003854.F7731@lyra.org> Message-ID: <010c01c08201$4b0ec050$e46940d5@hagrid> greg wrote: > I've never truly understood this. 
Is it because Windows cannot initialize > (at load-time) a pointer to a data structure that is located in a different > DLL? Windows can do it (via DLL initialization code), but the compiler doesn't generate initialization code for C programs. you can compile the module as C++, but that's also a bit painful... From jack at oratrix.nl Fri Jan 19 12:02:00 2001 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 19 Jan 2001 12:02:00 +0100 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments Message-ID: <20010119110200.9E455373C95@snelboot.oratrix.nl> I get the impression that I'm currently seeing a non-NULL third argument in my (C) methods even though the method is called without keyword arguments. Is this new semantics that I missed the discussion about, or is this a bug? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From thomas at xs4all.net Fri Jan 19 13:22:06 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:22:06 +0100 Subject: [Python-Dev] deprecated regex used by un-deprecated modules In-Reply-To: <14951.45672.806978.600944@localhost.localdomain>; from jeremy@alum.mit.edu on Thu, Jan 18, 2001 at 10:20:08PM -0500 References: <14951.45672.806978.600944@localhost.localdomain> Message-ID: <20010119132206.H17392@xs4all.nl> On Thu, Jan 18, 2001 at 10:20:08PM -0500, Jeremy Hylton wrote: > I would suggest fixing asynchat and poplib and deprecating knee. The > reconvert module may be a special case. Can't reconvert just disable the warning before importing regex ? That would seem the sane thing to do, at least to me. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
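The suggestion in the message above (silence the deprecation warning just around the import of the deprecated module) can be sketched roughly as follows. This is an illustration for a present-day Python, not the 2.1 code under discussion: `load_legacy_regex` is a hypothetical stand-in for `import regex`, the warning text is invented, and `warnings.catch_warnings()` is used on the assumption that a modern interpreter is available.

```python
import warnings


def load_legacy_regex():
    """Hypothetical stand-in for 'import regex': warns like a deprecated module."""
    warnings.warn("the regex module is deprecated; please use the re module",
                  DeprecationWarning, stacklevel=2)
    return "regex-stand-in"


def load_quietly():
    """Use the deprecated dependency without showing its warning to the caller."""
    # catch_warnings() restores the previous filter list on exit, so the
    # "ignore" rule below only applies inside this block.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="the regex module is deprecated",  # matched at the start of the text
            category=DeprecationWarning,
        )
        return load_legacy_regex()


if __name__ == "__main__":
    print(load_quietly())
```

Filtering on the specific message rather than on every DeprecationWarning keeps unrelated warnings visible, which seems closer to the intent of the thread.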
From thomas at xs4all.net Fri Jan 19 13:26:31 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:26:31 +0100 Subject: [Python-Dev] Mail delays and SourceForge bugs In-Reply-To: <200101190034.TAA26664@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jan 18, 2001 at 07:34:02PM -0500 References: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> Message-ID: <20010119132631.I17392@xs4all.nl> On Thu, Jan 18, 2001 at 07:34:02PM -0500, Guido van Rossum wrote: > Through no fault of my own, email to guido at python.org (which includes > the python-dev list) is currently suffering delays of 12-24 hours. I > have a feeling this is probably true for all mail going through > python.org, so checkin messages ans python-dev discussion have been > greatly frustrated, with about 1 day to go until the planned 2.1a1 > release date! I doubt it's (just) you, Guido. I'm seeing similar delays, and I already talked with Barry about it, too. It looks like it's clearing up a bit, now, but it's confusing as hell, for sure ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Fri Jan 19 13:33:47 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 13:33:47 +0100 Subject: [Python-Dev] Dbm failure In-Reply-To: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Fri, Jan 19, 2001 at 07:53:36PM +0200 References: <20010119175336.3B5A0A83E@darjeeling.zadka.site.co.il> Message-ID: <20010119133347.J17392@xs4all.nl> On Fri, Jan 19, 2001 at 07:53:36PM +0200, Moshe Zadka wrote: > test test_dbm skipped -- /home/moshez/prog/src/python/python/dist/src/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey > Did it happen to anyone else? Yes, to me. You're suffering from the same thing I did: GNU sucks. Okay, okay, not as much as MS products or most other UNIX software, but still ;) The problem is a conflict between gdbm and glibc. 
gdbm (1.7.3, which is what woody currently carries, not sure why it isn't updated) offers a dbm interface/replacement, which includes a libdbm.(so|a) and /usr/include/gdbm-ndbm.h. Glibc (or at least the debian package) *also* offers a dbm interface/replacement, which consists of libdb1.(so|a) and /usr/include/db1/ndbm.h (which needs /usr/include/db1/*.h). If you add /usr/include/db1 to your include path, and -ldbm to the dbmmodule, you end up with the wrong versions. You need either to include /usr/include/db1 in your includepath and use -ldb1, or fix up dbmmodule.c so it includes gdbm-ndbm.h and uses -ldbm. I only figured this out yesterday, and sent Andrew a mail about that... I'm not sure what the Right(tm) way to fix this is :( I've always loathed these library/version mismatches :P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Fri Jan 19 14:07:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 14:07:00 +0100 Subject: [Python-Dev] Standard install locations for Python ? References: <200101181220.f0ICK6K10612@mira.informatik.hu-berlin.de> <20010118135640.G21503@kronos.cnri.reston.va.us> Message-ID: <3A683BF4.BD74A979@lemburg.com> Andrew Kuchling wrote: > > On Thu, Jan 18, 2001 at 01:20:06PM +0100, Martin v. Loewis wrote: > >On Unix, there appears to be no standard location, unless the > >documentation consists of man pages or perhaps info files. So > >/share/doc is probably a place as good as any other. > > This seems like a good suggestion. Should docs go in > /share/doc/python/, then? Perhaps with > subdirectories for different extensions? Hmm, I guess it's better to follow bdist_rpm here: put the docs into a subdir under .../doc/ using the package name and version. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at alum.mit.edu Fri Jan 19 15:39:13 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 19 Jan 2001 09:39:13 -0500 (EST) Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: <20010119110200.9E455373C95@snelboot.oratrix.nl> References: <20010119110200.9E455373C95@snelboot.oratrix.nl> Message-ID: <14952.20881.848489.869512@localhost.localdomain> >>>>> "JJ" == Jack Jansen writes: JJ> I get the impression that I'm currently seeing a non-NULL third JJ> argument in my (C) methods even though the method is called JJ> without keyword arguments. JJ> Is this new semantics that I missed the discussion about, or is JJ> this a bug? This is a bug in the changes I made to the call function implementation. I wasn't sure what was supposed to happen to a function that expected a kw argument but was called without one. I thought I saw some crashes when I passed NULL, so I changed the implementation to pass an empty dictionary. (Is the correct behavior documented anywhere?) If a NULL value is correct, I'll update the implementation and see if I can rediscover those crashes. 
Jeremy From nas at arctrix.com Fri Jan 19 08:39:50 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 18 Jan 2001 23:39:50 -0800 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010119000209.F17392@xs4all.nl>; from thomas@xs4all.net on Fri, Jan 19, 2001 at 12:02:09AM +0100 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: <20010118233950.A15636@glacier.fnational.com> On Fri, Jan 19, 2001 at 12:02:09AM +0100, Thomas Wouters wrote: > I can't find any such hackery in the source, but I also can't > figure out how else it's working :) I think you want to look at getpath.c. Neil From jeremy at alum.mit.edu Fri Jan 19 15:44:50 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 19 Jan 2001 09:44:50 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.107,2.108 In-Reply-To: References: Message-ID: <14952.21218.416551.695660@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: GvR> Log Message: Changes to recursive-object comparisons, having to GvR> do with a test case I found where rich comparison of unequal GvR> recursive objects gave unintuitive results. In a discussion GvR> with Tim, where we discovered that our intuition on when a<=b GvR> should be true was failing, we decided to outlaw ordering GvR> comparisons on recursive objects. (Once we have fixed our GvR> intuition and designed a matching algorithm that's practical GvR> and reasonable to implement, we can allow such orderings GvR> again.) Sounds sensible to me! I was quite puzzled about what <= should return for recursive objects. 
GvR> - Changed the nesting limit to a more reasonable small 20; this GvR> only slows down comparisons of very deeply nested objects GvR> (unlikely to occur in practice), while speeding up GvR> comparisons of recursive objects (previously, this would GvR> first waste time and space on 500 nested comparisons before GvR> it would start detecting recursion). After we talked through this code yesterday, I was also thinking that the limit was too high :-). Jeremy From guido at digicool.com Fri Jan 19 16:49:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 10:49:54 -0500 Subject: [Python-Dev] new Makefile.in In-Reply-To: Your message of "Thu, 18 Jan 2001 18:56:04 EST." <200101182356.SAA19616@cj20424-a.reston1.va.home.com> References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> Message-ID: <200101191549.KAA28699@cj20424-a.reston1.va.home.com> [Neil] > > A question: is it possible to break the Python static library up? [me] > Sounds cool to me. Of course after Martin's response I agree with him -- let's keep it one library. (Although I expect that the combined effect of setup.py and Neil's flat Makefile will still affect the infrastructure to build extensions... :-( ) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 16:56:58 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 10:56:58 -0500 Subject: [Python-Dev] MS CRT crashing: In-Reply-To: Your message of "Thu, 18 Jan 2001 16:53:15 PST." 
<58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> References: <58C671173DB6174A93E9ED88DCB0883DB863F1@red-msg-07.redmond.corp.microsoft.com> Message-ID: <200101191556.KAA28761@cj20424-a.reston1.va.home.com> Bill Tutt writes: > From the internal support squad: > Turns out the C standard explicitly says you can't have an input follow > output on a stream without doing fflush or fseek in-between, to make sure > the stdio buffer is cleared. So this program is illegal. > > They've gone and resolved it by design. I'd just like to note for the record that this is exactly what I had predicted. I'd also like to note that I *agree*. Tim seems to think there's a race condition in the threading code, but it's really much simpler than that: the same bug can easily be provoked with a single-threaded program: just randomly read and write alternatingly. So obviously the people who wrote the threading code aren't interested in the bug, because it's not in their code -- and the people who wrote the code that doesn't behave well when abused are protected by the C standard... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 17:00:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:00:30 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Thu, 18 Jan 2001 22:39:18 +0100." <3A676286.C33823B4@tismer.com> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> Message-ID: <200101191600.LAA28788@cj20424-a.reston1.va.home.com> > Yes, the "inverse" is confusing. Is what you mean the "reverse" ? > Like the other right-side operators __radd__, is it correct to > think of > > __ge__ == __rle__ > > if __rle__ was written in the same fashion like __radd__ ? > It looks semantically the same, although the reason for a > call might be different. 
Yes, it's semantically the same, and the reason for the call is the same too ("the left argument doesn't support the operator so let's try if the right one knows"). > And if my above view is right, would it perhaps be less > confusing to use in fact __rle__ and __rlt__, > or woudl it be more confusing, since __rlt__ would also be > invoked left-to-right, implementing ">". I prefer 6 new operators over 12 any day. I can see no valid reason why someone would want to overload a>b different than b; from guido@digicool.com on Fri, Jan 19, 2001 at 10:49:54AM -0500 References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> <200101191549.KAA28699@cj20424-a.reston1.va.home.com> Message-ID: <20010119111455.C25056@kronos.cnri.reston.va.us> On Fri, Jan 19, 2001 at 10:49:54AM -0500, Guido van Rossum wrote: >Of course after Martin's response I agree with him -- let's keep it >one library. (Although I expect that the combined effect of setup.py >and Neil's flat Makefile will still affect the infrastructure to build >extensions... :-( ) Which reminds me... there should really be a way to ignore the setup.py stuff and use the old method. How should that be done. A --use-makesetup flag to configure, maybe? --amk From guido at digicool.com Fri Jan 19 17:14:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:14:20 -0500 Subject: [Python-Dev] Re: test_support.py In-Reply-To: Your message of "Thu, 18 Jan 2001 21:59:23 PST." References: Message-ID: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> > if not condition: > ! raise AssertionError(reason) Wouldn't it be better if this raised TestFailed rather than AssertionError? Or is there code that catches the AssertionError? [...grep...] Yes, there's code that catches AssertionError: (1) in Marc-Andre's own test_unicode.py; (2) in test_re, which catches AssertionError and raises TestFailed instead. 
Proposal: (1) change verify() to raise TestFailed; (2) change test_unicode.py to catch TestFailed instead. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Fri Jan 19 17:17:06 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 19 Jan 2001 17:17:06 +0100 Subject: [Python-Dev] Rich comparison confusion References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> Message-ID: <3A686882.F78C1268@tismer.com> Guido van Rossum wrote: > > > Yes, the "inverse" is confusing. Is what you mean the "reverse" ? > > Like the other right-side operators __radd__, is it correct to > > think of > > > > __ge__ == __rle__ > > > > if __rle__ was written in the same fashion like __radd__ ? > > It looks semantically the same, although the reason for a > > call might be different. > > Yes, it's semantically the same, and the reason for the call is the > same too ("the left argument doesn't support the operator so let's try > if the right one knows"). > > > And if my above view is right, would it perhaps be less > > confusing to use in fact __rle__ and __rlt__, > > or would it be more confusing, since __rlt__ would also be > > invoked left-to-right, implementing ">". > > I prefer 6 new operators over 12 any day. I can see no valid reason > why someone would want to overload a>b different than b there are plenty of reasons why a+b and b+a should be different: > e.g. string concatenation. Sure, I didn't want to introduce new operators, but use the "r" versions for three of the six new operators. But I should have read your proposal before. The confusion is not due to you, but Skip had a read error, since you don't talk about inverses at all: Skip==""" In the description he states that __le__ and __ge__ are inverses as are __lt__ and __gt__. 
""" Truth==""" There are no explicit "reversed argument" versions of these; instead, __lt__ and __gt__ are each other's reverse, likewise for__le__ and __ge__; __eq__ and __ne__ are their own reverse (similar at the C level). """ No reason for confusion at all > python-dev/null - ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From thomas at xs4all.net Fri Jan 19 17:20:56 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 17:20:56 +0100 Subject: [Python-Dev] test_ucn errors ? Message-ID: <20010119172056.K17392@xs4all.nl> I'm currently seeing a failure in test_ucn: test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding error: Illegal Unicode character It looks like one of the unicode literals in test_ucn is invalid, but it's damned hard to pin down which: Python 2.1a1 (#7, Jan 19 2001, 17:06:32) [GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import test.test_ucn Traceback (most recent call last): File "", line 1, in ? UnicodeError: Unicode-Escape decoding error: Illegal Unicode character >>> I get the same crashes on FreeBSD and (Debian) Linux. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri Jan 19 17:26:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:26:34 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Your message of "Fri, 19 Jan 2001 00:02:09 +0100." 
<20010119000209.F17392@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> Message-ID: <200101191626.LAA29165@cj20424-a.reston1.va.home.com> > This brings me to another point: how can 'make test' work at all ? Does > python always check for './Lib' (and './Modules') for modules ? Look at the logic in Modules/getpath.c, which calculates the initial (default) sys.path. It detects that it's running from the build tree and then modifies the default path a bit to include Lib and Modules relative to where the python executable was found. > If that's > specific for 'make test' and running python in the source distribution, that > sounds like a bit of a weird hack. I can't find any such hackery in the > source, but I also can't figure out how else it's working :) It's not just for 'make test' -- it's to make life easy for developers in general (and me in particular :-) who want to try out their hacks without going through 'make install'. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Jan 19 17:34:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 17:34:58 +0100 Subject: [Python-Dev] Re: test_support.py References: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> Message-ID: <3A686CB2.C75D184D@lemburg.com> Guido van Rossum wrote: > > > if not condition: > > ! raise AssertionError(reason) > > Wouldn't it be better if this raised TestFailed rather than > AssertionError? Or is there code that catches the AssertionError? > > [...grep...] > > Yes, there's code that catches AssertionError: > > (1) in Marc-Andre's own test_unicode.py; > > (2) in test_re, which catches AssertionError and raises TestFailed > instead. > > Proposal: > > (1) change verify() to raise TestFailed; > > (2) change test_unicode.py to catch TestFailed instead. 
+1 Why not simply make TestFailed a subclass of AssertionError ? Then we wouldn't have to worry about breaking test code... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Fri Jan 19 17:34:15 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 17:34:15 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <200101191626.LAA29165@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 11:26:34AM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> Message-ID: <20010119173415.M17295@xs4all.nl> On Fri, Jan 19, 2001 at 11:26:34AM -0500, Guido van Rossum wrote: > > This brings me to another point: how can 'make test' work at all ? Does > > python always check for './Lib' (and './Modules') for modules ? > Look at the logic in Modules/getpath.c, which calculates the initial > (default) sys.path. It detects that it's running from the build tree > and then modifies the default path a bit to include Lib and Modules > relative to where the python executable was found. Aye, I found it now. > > If that's > > specific for 'make test' and running python in the source distribution, that > > sounds like a bit of a weird hack. I can't find any such hackery in the > > source, but I also can't figure out how else it's working :) > It's not just for 'make test' -- it's to make life easy for developers > in general (and me in particular :-) who want to try out their hacks > without going through 'make install'. 
Well, after some old SF movies & some sleep, I realized that :) But it is going to have to change: you now have to include the build tree as well, and that is quite a bit more difficult to figure out. I'd suggest a 'make run' that calls python with the appropriate PYTHONPATH environment variable, but that doesn't cover test-scripts (which I use a lot myself.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri Jan 19 17:34:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:34:45 -0500 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Your message of "Fri, 19 Jan 2001 12:02:00 +0100." <20010119110200.9E455373C95@snelboot.oratrix.nl> References: <20010119110200.9E455373C95@snelboot.oratrix.nl> Message-ID: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> > I get the impression that I'm currently seeing a non-NULL third > argument in my (C) methods even though the method is called without > keyword arguments. > Is this new semantics that I missed the discussion about, or is this a bug? Can't tell without spending more time looking at the code and experimenting than I can afford today; but Jeremy refactored the calling code, and it could be that you're seeing an empty dictionary instead of a NULL. Do you really need the NULL? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 17:41:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:41:02 -0500 Subject: [Python-Dev] Mail delays and SourceForge bugs In-Reply-To: Your message of "Fri, 19 Jan 2001 13:26:31 +0100." <20010119132631.I17392@xs4all.nl> References: <200101190034.TAA26664@cj20424-a.reston1.va.home.com> <20010119132631.I17392@xs4all.nl> Message-ID: <200101191641.LAA29324@cj20424-a.reston1.va.home.com> > I doubt it's (just) you, Guido. 
I'm seeing similar delays, and I already > talked with Barry about it, too. It looks like it's clearing up a bit, now, > but it's confusing as hell, for sure ;) It's worse for me though than for most people: for others, only mail sent through mailman at mail.python.org is affected. For me, mail sent directly to guido at python.org is affected too (which is why I've changed my From address again to that old standby, guido at digicool.com). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 17:53:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 11:53:39 -0500 Subject: [Python-Dev] deprecated regex used by un-deprecated modules In-Reply-To: Your message of "Thu, 18 Jan 2001 22:20:08 EST." <14951.45672.806978.600944@localhost.localdomain> References: <14951.45672.806978.600944@localhost.localdomain> Message-ID: <200101191653.LAA29774@cj20424-a.reston1.va.home.com> > There are several modules in the standard library that use the regex > module. When they are imported, they print a warning about using a > deprecated module. I think this is bad form. Either the modules that > depend on regex should by updated to use re or they should be > deprecated themselves. > > I discovered the following offenders: > asynchat > knee > poplib > reconvert > > I would suggest fixing asynchat and poplib and deprecating knee. The > reconvert module may be a special case. Agreed. There's an idiom to disable the warning, which you can find in regsub.py: import warnings warnings.filterwarnings("ignore", "", DeprecationWarning, __name__) (The "" should be replaced by the specific warning message though.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jan 19 18:21:28 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 12:21:28 -0500 Subject: [Python-Dev] test_ucn errors ? In-Reply-To: Your message of "Fri, 19 Jan 2001 17:20:56 +0100." 
<20010119172056.K17392@xs4all.nl> References: <20010119172056.K17392@xs4all.nl> Message-ID: <200101191721.MAA31937@cj20424-a.reston1.va.home.com> > I'm currently seeing a failure in test_ucn: > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > error: Illegal Unicode character > > It looks like one of the unicode literals in test_ucn is invalid, but it's > damned hard to pin down which: Feels to me like there's a bug in the string literal processing that makes *any* string literal containing \N{...} fail during code generation. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Fri Jan 19 18:37:41 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 18:37:41 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> Message-ID: <023801c0823e$86fcedc0$e46940d5@hagrid> > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > error: Illegal Unicode character Make sure you rebuild Objects/unicodeobject.o and the ucnhash extension. If they build without warnings, run the following script. import ucnhash count = 0 for code in range(65536): try: name = ucnhash.getname(code) if ucnhash.getcode(name) != code: print name count += 1 except ValueError: pass print count if it prints anything but "10538", let me know. > It looks like one of the unicode literals in test_ucn is invalid, but it's > damned hard to pin down which: If the ucnhash extension cannot be found, the script won't even compile... shouldn't be too hard to fix. 
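For anyone retracing the round-trip check above on a current interpreter: ucnhash was an internal extension module at the time, so what follows is only a rough analogue, assuming Python 3 and the standard unicodedata module as a stand-in for ucnhash. It walks the same 16-bit range and checks that every name maps back to its own code point. The function name roundtrip_bmp is hypothetical, and the count it reports depends on the Unicode database shipped with the interpreter, so it should not be compared against the figure quoted in the message.

```python
import unicodedata


def roundtrip_bmp():
    """Return (named_count, mismatches) for code points 0..0xFFFF."""
    count = 0
    mismatches = []
    for code in range(0x10000):
        char = chr(code)
        try:
            name = unicodedata.name(char)
        except ValueError:
            # This code point has no name (controls, surrogates, unassigned).
            continue
        # A name that does not map back to the same character is a mismatch.
        if unicodedata.lookup(name) != char:
            mismatches.append(name)
        count += 1
    return count, mismatches


if __name__ == "__main__":
    count, mismatches = roundtrip_bmp()
    for name in mismatches:
        print(name)
    print(count)
```

An empty mismatch list means the name table and the reverse lookup agree over the whole range, which is the property the original script was probing.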
From Barrett at stsci.edu Fri Jan 19 18:32:26 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Fri, 19 Jan 2001 12:32:26 -0500 (EST) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <200101191600.LAA28788@cj20424-a.reston1.va.home.com> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> Message-ID: <14952.30800.112503.123675@nem-srvr.stsci.edu> Guido van Rossum writes: > > ... I can see no valid reason why someone would want to overload > a>b different than b<a. I agree. But this assumes that the result of A<B and B>A is a collection of Booleans. In the Interactive Data Language (IDL) these operators are essentially mapped to ceiling and floor functions which are not commutative. I personally find this silly, but IDL users coming to Python may be surprised when the comparison of two Numeric arrays returns a Boolean-like result. -- Dr. Paul Barrett Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From nas at arctrix.com Fri Jan 19 11:43:12 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 19 Jan 2001 02:43:12 -0800 Subject: [Python-Dev] new Makefile.in In-Reply-To: <20010119111455.C25056@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Fri, Jan 19, 2001 at 11:14:55AM -0500 References: <20010117235922.A12356@glacier.fnational.com> <200101182356.SAA19616@cj20424-a.reston1.va.home.com> <200101191549.KAA28699@cj20424-a.reston1.va.home.com> <20010119111455.C25056@kronos.cnri.reston.va.us> Message-ID: <20010119024312.A16179@glacier.fnational.com> On Fri, Jan 19, 2001 at 11:14:55AM -0500, Andrew Kuchling wrote: > Which reminds me... there should really be a way to ignore the > setup.py stuff and use the old method. How should that be done? A > --use-makesetup flag to configure, maybe? A different target for make would be easy.
Neil From fredrik at effbot.org Fri Jan 19 19:13:15 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 19:13:15 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> Message-ID: <03a201c08243$7fa62af0$e46940d5@hagrid> thomas wrote: > > I'm currently seeing a failure in test_ucn: > > > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > > error: Illegal Unicode character > > > > It looks like one of the unicode literals in test_ucn is invalid, but it's > > damned hard to pin down which: > > Feels to me like there's a bug in the string literal processing that > makes *any* string literal containing \N{...} fail during code > generation. I took another look at the error message: the only explanation I can see here is that the lookup succeeds, but the call to ucn- hash returns a value larger than 0x10ffff. What is Py_UCS4 set to under gcc? Confusing /F From guido at digicool.com Fri Jan 19 19:11:21 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 13:11:21 -0500 Subject: [Python-Dev] Re: test_support.py In-Reply-To: Your message of "Fri, 19 Jan 2001 17:34:58 +0100." <3A686CB2.C75D184D@lemburg.com> References: <200101191614.LAA28881@cj20424-a.reston1.va.home.com> <3A686CB2.C75D184D@lemburg.com> Message-ID: <200101191811.NAA32539@cj20424-a.reston1.va.home.com> > > Proposal: > > > > (1) change verify() to raise TestFailed; > > > > (2) change test_unicode.py to catch TestFailed instead. > > +1 > > Why not simply make TestFailed a subclass of AssertionError ? > Then we wouldn't have to fear about breaking test code... No, I'd rather see the two separated. There can be assert statements in the modules we're testing, and I'd prefer not to see those caught by test code that is trying to catch TestFailed. I'll check this in momentarily. 
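The separation argued for above can be sketched roughly as follows. This is a minimal illustration, assuming a TestFailed class and a verify() helper shaped like the ones named in the proposal. The bodies here, along with the helpers buggy_module_function and run, are hypothetical and are not the actual test_support code.

```python
class TestFailed(Exception):
    """Raised by verify() when a test condition does not hold."""


def verify(condition, reason="test failed"):
    # Deliberately not an AssertionError subclass: see run() below.
    if not condition:
        raise TestFailed(reason)


def buggy_module_function():
    # An assert inside the code under test, not in the test itself.
    assert False, "internal invariant broken"


def run(check):
    """Classify the outcome of calling check()."""
    try:
        check()
    except TestFailed as exc:
        return "test failed: %s" % exc
    except AssertionError as exc:
        return "assertion in tested code: %s" % exc
    return "ok"
```

Because the two exception types are unrelated, a handler for TestFailed never swallows an assert that fires inside the module being tested, which appears to be the concern raised against making TestFailed a subclass of AssertionError.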
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Fri Jan 19 19:19:37 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 19 Jan 2001 19:19:37 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> Message-ID: <03b301c08244$627f22a0$e46940d5@hagrid> > Feels to me like there's a bug in the string literal processing that > makes *any* string literal containing \N{...} fail during code > generation. umm. can anyone explain how this can happen: python ../lib/test/regrtest.py test_ucn test_ucn 1 test OK. python ../lib/test/test_ucn.py UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name how can a test that works under regrtest.py fail when it's run separately? what am I missing here? From mal at lemburg.com Fri Jan 19 19:48:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 19:48:53 +0100 Subject: [Python-Dev] test_ucn errors ? References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> <03a201c08243$7fa62af0$e46940d5@hagrid> Message-ID: <3A688C15.8C9CFF46@lemburg.com> Fredrik Lundh wrote: > > thomas wrote: > > > I'm currently seeing a failure in test_ucn: > > > > > > test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding > > > error: Illegal Unicode character > > > > > > It looks like one of the unicode literals in test_ucn is invalid, but it's > > > damned hard to pin down which: > > > > Feels to me like there's a bug in the string literal processing that > > makes *any* string literal containing \N{...} fail during code > > generation. > > I took another look at the error message: the only explanation > I can see here is that the lookup succeeds, but the call to ucn- > hash returns a value larger than 0x10ffff. > > What is Py_UCS4 set to under gcc? Should be "unsigned int" on all modern Intel platforms. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Fri Jan 19 19:48:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 13:48:45 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Your message of "Fri, 19 Jan 2001 12:32:26 EST." <14952.30800.112503.123675@nem-srvr.stsci.edu> References: <14949.46995.259157.871323@beluga.mojam.com> <200101171609.LAA04102@cj20424-a.reston1.va.home.com> <3A676286.C33823B4@tismer.com> <200101191600.LAA28788@cj20424-a.reston1.va.home.com> <14952.30800.112503.123675@nem-srvr.stsci.edu> Message-ID: <200101191848.NAA02765@cj20424-a.reston1.va.home.com> > > ... I can see no valid reason why someone would want to overload > > a>b different than b<a. > > I agree. But this assumes that the result of A<B and B>A is a > collection of Booleans. In the Interactive Data Language (IDL) these > operators are essentially mapped to ceiling and floor functions which > are not commutative. I personally find this silly, but IDL users > coming to Python may be surprised when the comparison of two Numeric > arrays returns a Boolean-like result. This means that Python can't be used to emulate this part of IDL. I don't understand how these can be not commutative unless they have a side effect on the left argument, and that's not possible in Python anyway. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jan 19 20:18:04 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 14:18:04 -0500 Subject: [Python-Dev] test_ucn errors ? Message-ID: [/F] > umm. can anyone explain how this can happen: > > python ../lib/test/regrtest.py test_ucn > test_ucn > test OK.
> > python ../lib/test/test_ucn.py > UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name > > how can a test that works under regrtest.py fail when > it's run separately? what am I missing here? Dunno, but add to the pile of mysteries that you're unique. Here on Win98SE: python ../lib/test/regrtest.py test_ucn test_ucn test test_ucn crashed -- exceptions.UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name 1 test failed: test_ucn python ../lib/test/test_ucn.py UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name I suggest you reformat your hard drive, and reinstall Windows . From mwh21 at cam.ac.uk Fri Jan 19 20:25:03 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 19 Jan 2001 19:25:03 +0000 Subject: [Python-Dev] test_ucn errors ? In-Reply-To: "Fredrik Lundh"'s message of "Fri, 19 Jan 2001 19:19:37 +0100" References: <20010119172056.K17392@xs4all.nl> <200101191721.MAA31937@cj20424-a.reston1.va.home.com> <03b301c08244$627f22a0$e46940d5@hagrid> Message-ID: "Fredrik Lundh" writes: > > Feels to me like there's a bug in the string literal processing that > > makes *any* string literal containing \N{...} fail during code > > generation. > > umm. can anyone explain how this can happen: > > python ../lib/test/regrtest.py test_ucn > test_ucn > 1 test OK. This will run the .pyc if present? > python ../lib/test/test_ucn.py > UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name This won't? Note: no traceback -> (in effect, if not design) compile time error. > how can a test that works under regrtest.py fail when > it's run separately? what am I missing here? Well, this is just my guess. Cheers, M. -- Well, you pretty much need Microsoft stuff to get misbehaviours bad enough to actually tear the time-space continuum. Luckily for you, MS Internet Explorer is available for Solaris. 
-- Calle Dybedahl, alt.sysadmin.recovery From skip at mojam.com Fri Jan 19 20:55:29 2001 From: skip at mojam.com (Skip Montanaro) Date: Fri, 19 Jan 2001 13:55:29 -0600 (CST) Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <20010119173415.M17295@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> Message-ID: <14952.39857.83065.24889@beluga.mojam.com> Thomas> But it is going to have to change: you now have to include the Thomas> build tree as well, and that is quite a bit more difficult to Thomas> figure out. I'd suggest a 'make run' that calls python with the Thomas> appropriate PYTHONPATH environment variable, but that doesn't Thomas> cover test-scripts (which I use a lot myself.) Doesn't Andrew's new "platform" target in the top-level Makefile do the right thing? It *should* generate a platform-specific path to the correct build subdirectory. Skip From MarkH at ActiveState.com Fri Jan 19 21:11:02 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Fri, 19 Jan 2001 12:11:02 -0800 Subject: [Python-Dev] initializing ob_type (was: CVS: python/dist/src/Modules _cursesmodule.c,2.46,2.47) In-Reply-To: <010c01c08201$4b0ec050$e46940d5@hagrid> Message-ID: > you can compile the module as C++, but that's also a bit painful... My understanding is that the C std doesn't guarantee the order of static object initialization, whereas C++ does provide these semantics. At least that is the excuse I found when digging into this some years ago. Can't-believe-I-mentioned-the-C-standard-while-Tim-is-listening ly, Mark. From guido at digicool.com Fri Jan 19 21:44:53 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 15:44:53 -0500 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. 
unistr() In-Reply-To: Your message of "Fri, 19 Jan 2001 10:58:08 +0100." <3A680FB0.AED2DB55@lemburg.com> References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> <3A680FB0.AED2DB55@lemburg.com> Message-ID: <200101192044.PAA04154@cj20424-a.reston1.va.home.com> > If we agree to merge the semantics of the two APIs, then str() > would have to change too: is this desirable ? (IMHO, yes) Not clear. Which is why I'm backing off from my initial support for merging the two. I believe unicode() (which is really just an interface to PyUnicode_FromEncodedObject()) currently already does too much. In particular this whole business with calling __str__ on instances seems to me to be unnecessary. I think it should *only* bother to look for something that supports the buffer interface (checking for regular strings only as a tiny optimization), or existing unicode objects. > Here's what we could do: > > a) merge the semantics of unistr() into unicode() > b) apply the same semantics in str() > c) remove unistr() -- how's that for a short-living builtin ;) > > About the semantics: > > These should be backward compatible to str() in that everything > that worked before should continue to work after the merge. > > A strawman for processing str() and unicode(): > > 1. strings/Unicode is passed back as-is I hope you mean str() passes 8-bit strings back as-is, unicode() passes Unicode strings back as-is, right? > 2. tp_str is tried > 3. the method __str__ is tried Shouldn't have to -- instances should define tp_str and all the magic for calling __str__ should be there. I don't understand why it's not done that way, probably just for historical reasons. I also don't think __str__ should be tried for non-instance types. But, more seriously, I believe tp_str or __str__ shouldn't be tried at all by unicode(). > 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer) > 5. 
for str(): Unicode return values are converted to strings using > the default encoding > for unicode(): Unicode return values are passed back as-is; > string return values are decoded according to the > encoding parameter > 6. the return object is type-checked: str() will always return > a string object, unicode() always a Unicode object > > Note that passing back Unicode is only allowed in case no encoding > was given. Otherwise an execption is raised: you can't decode > Unicode. > > As extension we could add encoding and error parameters to str() > as well. The result would be either an encoding of Unicode objects > passed back by tp_str or __str__ or a recoding of string objects > returned by checks 2, 3 or 4. Naaaah! > If we agree to take this approach, then we should remove the > unistr() Python API before the alpha ships. Frankly, I believe we need more time to sort this out, and therefore I propose to remove the unistr() built-in before the release. Marc, would you do the honors? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Fri Jan 19 21:55:53 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 19 Jan 2001 21:55:53 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <14952.39857.83065.24889@beluga.mojam.com>; from skip@mojam.com on Fri, Jan 19, 2001 at 01:55:29PM -0600 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> <14952.39857.83065.24889@beluga.mojam.com> Message-ID: <20010119215552.O17295@xs4all.nl> On Fri, Jan 19, 2001 at 01:55:29PM -0600, Skip Montanaro wrote: > > Thomas> But it is going to have to change: you now have to include the > Thomas> build tree as well, and that is quite a bit more difficult to > Thomas> figure out. 
I'd suggest a 'make run' that calls python with the > Thomas> appropriate PYTHONPATH environment variable, but that doesn't > Thomas> cover test-scripts (which I use a lot myself.) > Doesn't Andrew's new "platform" target in the top-level Makefile do the > right thing? It *should* generate a platform-specific path to the correct > build subdirectory. Yes, it does, that's what I meant with 'make run'. But that isn't quite as user-friendly as the current method. How would you run a script with the current python ? 'make SCRIPT=./spamtest.py runscript' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri Jan 19 23:06:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 17:06:03 -0500 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: Your message of "Fri, 19 Jan 2001 17:34:15 +0100." <20010119173415.M17295@xs4all.nl> References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> Message-ID: <200101192206.RAA12072@cj20424-a.reston1.va.home.com> I finally figured the best way to fix sys.path to find shared modules built by setup.py. At first I thought I had to add it to getpath.c, but the problem is that the name is calculated by calling distutils.util.get_platform(), and that requires a working Python interpreter, so we'd end up with a chicken-or-egg situation. So instead I added 5 lines to site.py, which tests for os.name=='posix', then for sys.path[-1] ending in '/Modules' -- this tests only succeeds when running from the build directory. Then it calls distutils.util.get_platform() and uses the result to calculate the correct directory name, which is then appended to sys.path. 
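The check described above can be sketched along these lines. This is only an approximation of the idea, not the lines actually added to site.py. It assumes sysconfig.get_platform() as a present-day stand-in for distutils.util.get_platform(), and the "build/lib.<platform>-<version>" layout is an assumption about where setup.py puts shared modules, since the message does not spell the directory name out. The function name build_lib_dir is hypothetical.

```python
import os
import posixpath
import sys
import sysconfig


def build_lib_dir(last_entry, os_name=os.name, platform=None, version=None):
    """Return the directory to append to sys.path, or None if not applicable."""
    # Only applies on posix, and only when running from the build directory,
    # which is recognised by the last sys.path entry ending in '/Modules'.
    if os_name != "posix" or not last_entry.endswith("/Modules"):
        return None
    if platform is None:
        platform = sysconfig.get_platform()  # stand-in, see note above
    if version is None:
        version = "%d.%d" % sys.version_info[:2]
    top = posixpath.dirname(last_entry)
    # Assumed layout: <top>/build/lib.<platform>-<version>
    return posixpath.join(top, "build", "lib.%s-%s" % (platform, version))


if __name__ == "__main__":
    extra = build_lib_dir(sys.path[-1])
    if extra is not None:
        sys.path.append(extra)
```

Keeping the decision in a small function with explicit arguments makes it easy to try out with made-up paths, without needing an actual build tree.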
Yes, this slows down startup (it imports a large portion of the distutils package), but I don't care -- after all this is mostly for me so I can play with the interpreter right after I've built it, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Jan 19 22:32:34 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:32:34 +0100 Subject: [Python-Dev] Re: Type-converting functions, esp. unicode() vs. unistr() References: <3A66CAC2.74FC894@lemburg.com> <200101190104.UAA27056@cj20424-a.reston1.va.home.com> <3A680FB0.AED2DB55@lemburg.com> <200101192044.PAA04154@cj20424-a.reston1.va.home.com> Message-ID: <3A68B272.BBBAECD1@lemburg.com> Guido van Rossum wrote: > > > If we agree to merge the semantics of the two APIs, then str() > > would have to change too: is this desirable ? (IMHO, yes) > > Not clear. Which is why I'm backing off from my initial support for > merging the two. > > I believe unicode() (which is really just an interface to > PyUnicode_FromEncodedObject()) currently already does too much. In > particular this whole business with calling __str__ on instances seems > to me to be unnecessary. I think it should *only* bother to look for > something that supports the buffer interface (checking for regular > strings only as a tiny optimization), or existing unicode objects. Hmm, unicode() should (just like str()) take an object and convert it to a Unicode string. Since many objects either don't support the tp_str slot (instances don't for some reason -- just like they don't tp_call), I had to add some special cases to make Python instances compatible to Unicode in the same way str() does. What I think is really needed is a concept for "stringification" in Python. We currently have these schemes: 1. tp_str 2. method __str__ (not only of Python instances, but any object) 3. character buffer interface These three could easily be unified into the tp_str slot: e.g. 
tp_str could do the necessary magic to call __str__ or the buffer interface. Note that the same is true for e.g. tp_call -- the special cases we have in ceval.c for the different builtin callable objects would not be necessary if they would implement tp_call. > > Here's what we could do: > > > > a) merge the semantics of unistr() into unicode() > > b) apply the same semantics in str() > > c) remove unistr() -- how's that for a short-living builtin ;) > > > > About the semantics: > > > > These should be backward compatible to str() in that everything > > that worked before should continue to work after the merge. > > > > A strawman for processing str() and unicode(): > > > > 1. strings/Unicode is passed back as-is > > I hope you mean str() passes 8-bit strings back as-is, unicode() > passes Unicode strings back as-is, right? Right. > > 2. tp_str is tried > > 3. the method __str__ is tried > > Shouldn't have to -- instances should define tp_str and all the magic > for calling __str__ should be there. I don't understand why it's not > done that way, probably just for historical reasons. I also don't > think __str__ should be tried for non-instance types. Ok. > But, more seriously, I believe tp_str or __str__ shouldn't be tried at > all by unicode(). Hmm, but how would you implement generic conversion to Unicode then ? We'll need some way for instances (and other types) to provide a conversion to Unicode. Some time ago we discussed this issue and came to the conclusion that tp_str should be allowed to return Unicode data instead of inventing a new tp_unicode slot for this purpose. > > 4. the PyObject_AsCharBuffer() API is tried (bf_getcharbuffer) > > 5. for str(): Unicode return values are converted to strings using > > the default encoding > > for unicode(): Unicode return values are passed back as-is; > > string return values are decoded according to the > > encoding parameter > > 6. 
the return object is type-checked: str() will always return > > a string object, unicode() always a Unicode object > > > > Note that passing back Unicode is only allowed in case no encoding > > was given. Otherwise an execption is raised: you can't decode > > Unicode. > > > > As extension we could add encoding and error parameters to str() > > as well. The result would be either an encoding of Unicode objects > > passed back by tp_str or __str__ or a recoding of string objects > > returned by checks 2, 3 or 4. > > Naaaah! Would be nice for symmetry and useful in the light of making Unicode the only string type in Py4k ;-) > > If we agree to take this approach, then we should remove the > > unistr() Python API before the alpha ships. > > Frankly, I believe we need more time to sort this out, and therefore I > propose to remove the unistr() built-in before the release. Marc, > would you do the honors? Ok. I'll remove the builtin and the docs, but will leave the PyObject_Unicode() API enabled. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From uche.ogbuji at fourthought.com Fri Jan 19 22:42:40 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Fri, 19 Jan 2001 14:42:40 -0700 Subject: [Python-Dev] Extension doc bugs Message-ID: <200101192142.OAA29168@localhost.localdomain> I'm using the bleeding-edge documentation at http://python.sourceforge.net/devel-docs/api/api.html I know that it's not complete until someone has the time to do so, but I've run into a few places where it's completely wrong. For instance, from the object protocol docs: """ int PyObject_Cmp (PyObject *o1, PyObject *o2, int *result) Compare the values of o1 and o2 using a routine provided by o1, if one exists, otherwise with a routine provided by o2. The result of the comparison is returned in result. 
Returns -1 on failure. This is the equivalent of the Python statement "result = cmp(o1, o2)". """ After getting weird behavior implementing this, and then squinting at the relevant Python 2.0 code, it appears that in actuality the Cmp function is to return the direct comparison results (-1, 0, 1 based on ordering of the parameters) furthermore, there is no such "result" argument. 4Suite has a lot of C extension code developed by squinting at Python sources and long gdb sessions and I have a feeling that in many cases we're taking up hacks that would get us into trouble across versions, and all that; but the "official" interfaces and behaviors are not documented (or only poorly documented). In general, the C API docs are in a rather sorry state and though I doubt I could do a great deal about fixing it, I'd be interested in discussion of the matter, and perhaps making what contribution I can. Is the doc-sig the best place for this? My experience there wouldn't seem to encourage this conclusion (most of the discussion is of docstring syntax and neat-o automagic document generators). -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From mal at lemburg.com Fri Jan 19 22:46:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:46:24 +0100 Subject: [Python-Dev] readline and setup.py Message-ID: <3A68B5B0.771412F7@lemburg.com> The new setup.py procedure for Python causes readline not to be built on my machine. Instead I get a linker error telling me that termcap is not found. Looking at my old Setup file, I have this line: readline readline.c \ -I/usr/include/readline -L/usr/lib/termcap \ -lreadline -lterm I guess, setup.py should be modified to include additional library search paths -- shouldn't hurt on platforms which don't need them. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Jan 19 22:50:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jan 2001 22:50:53 +0100 Subject: [Python-Dev] _tkinter and setup.py Message-ID: <3A68B6BD.BAD038D6@lemburg.com> Why does setup.py stop with an error in case _tkinter cannot be built (due to an old Tk/Tcl version in my case) ? I think the policy in setup.py should be to output warnings, but continue building the rest of the Python modules. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Fri Jan 19 23:38:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 17:38:22 -0500 Subject: [Python-Dev] 2.1 alpha 1 release schedule Message-ID: <200101192238.RAA12413@cj20424-a.reston1.va.home.com> Practicality beats purity: we're very close to a release, but I've decided to hold off to give Jeremy a chance to finish the nested scopes, to give Fred a chance to revise the weak references according to Martin's wishes, and in general for things to settle. Most likely we'll be able to release Monday night (Jan 22). Unfortunately email through python.org seems to be wedged again (I swear, it seems like it starts getting wedged every afternoon between 3 and 4!) 
so I don't have a clear view of what the latest checkins were; but from cvs update it seems that the following things happened this afternoon: - Barry fixed a core dump in function attribute assignments - Marc-Andre withdrew unistr(), pending more discussion - Fredrik fixed the ucnhash problem - I fixed two path problems in the new build process that only occurred when you were building in a subdirectory of the source tree Good work, crew! I'm taking the weekend off. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Sat Jan 20 00:23:18 2001 From: jack at oratrix.nl (Jack Jansen) Date: Sat, 20 Jan 2001 00:23:18 +0100 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Message by Guido van Rossum , Fri, 19 Jan 2001 11:34:45 -0500 , <200101191634.LAA29239@cj20424-a.reston1.va.home.com> Message-ID: <20010119232323.70B03116392@oratrix.oratrix.nl> Recently, Guido van Rossum said: > > I get the impression that I'm currently seeing a non-NULL third > > argument in my (C) methods even though the method is called without > > keyword arguments. > > > Is this new semantics that I missed the discussion about, or is this a bug? > > [...] > Do you really need the NULL? The places that I know I was counting on the NULL now have "if ( kw && PyObject_IsTrue(kw))", so I'll just have to hope there aren't any more lingering in there. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim.one at home.com Sat Jan 20 01:04:10 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 19 Jan 2001 19:04:10 -0500 Subject: [Python-Dev] MS CRT crashing: In-Reply-To: <200101191556.KAA28761@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I'd just like to note for the record that this is exactly what I had > predicted.
I would have hoped you'd be content to let the record speak for itself . > I'd also like to note that I *agree*. With what? That the program is undefined by the C std was never in dispute. > Tim seems to think there's a race condition in the threading code, > but it's really much simpler than that: the same bug can easily be > provoked with a single-threaded program: just randomly read and > write alternatingly. And this is a point in their favor?! "It's OK that the MT library corrupts itself, because even the single-threaded library does"? > So obviously the people who wrote the threading code aren't interested > in the bug, I don't know that it ever got as far as the people who wrote the threading code, but I sure doubt it: when the reply starts "Turns out the C standard explicitly says ...", it strongly suggests it was written by someone who didn't already know what the C std says, and went looking for an excuse to get it off their plate without further effort. Par for the course, if so. > because it's not in their code -- and the people who wrote the code > that doesn't behave well when abused are protected by the C standard... The behavior of things designated "undefined" and "implementation-defined" by the std fall under "quality of implementation". In the real world, the latter is what vendors compete on; meeting the letter of the std is a bare minimum for playing the game at all. The plain fact is that their library is less robust than others in this case. I worked on a multithreaded stdio implementation at KSR, and that sure couldn't corrupt itself. Looks like no flavor of Linux does either. It's not *reasonable* for a library to corrupt itself in this case, although it's certainly reasonable for its behavior to vary from run to run. There's nothing in the C std that says a conforming implementation can't *crash* on the program void main() {int i = 1;} either . 
a-std-is-a-floor-on-acceptable-behavior-not-a-ceiling-ly y'rs - tim From gstein at lyra.org Sat Jan 20 02:21:56 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 19 Jan 2001 17:21:56 -0800 Subject: [Python-Dev] initializing ob_type In-Reply-To: ; from MarkH@ActiveState.com on Fri, Jan 19, 2001 at 12:11:02PM -0800 References: <010c01c08201$4b0ec050$e46940d5@hagrid> Message-ID: <20010119172156.Y7731@lyra.org> On Fri, Jan 19, 2001 at 12:11:02PM -0800, Mark Hammond wrote: > > you can compile the module as C++, but that's also a bit painful... > > My understanding is that the C std doesn't guarantee the order of static > object initialization, whereas C++ does provide these semantics. At least > that is the excuse I found when digging into this some years ago. True, but when PyWhatever_Type is initialized, &PyType_Type ought to be ready (even if it isn't initialized). Heck, &PyType_Type points into the Python core which is *definitely* loaded by that point. Now, if "initialization" also means "relocation to a specific address" then I can understand. Hrm... I've just spent some time with the Windows SDK docs, and I can't find anything that really discusses the problem and resolution. There certainly isn't any warning about "don't do this." It all talks about how fixups are stored with the DLL, how you can optionally use BIND to pre-bind the values, blah blah blah. But nothing saying "it doesn't work." It would be interesting to know more about the actual symptoms that appears when the ob_type init is performed by the structure (rather than at runtime). What happens? Bad address? NULL value? Failure to resolve and load? Is PyType_Type not exported correctly or something? Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at digicool.com Sat Jan 20 03:05:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 19 Jan 2001 21:05:39 -0500 Subject: [Python-Dev] How to get setup.py to build expat? 
Message-ID: <200101200205.VAA13299@cj20424-a.reston1.va.home.com> The setup.py script does not build the expat module for me. I have expat installed in /usr/local, at least I believe so: I have /usr/local/include/xmlparse.h and /usr/local/lib/libexpat.a -- do I need more? How can I get setup.py to spit out what it tries, and why it fails? setup.py -v build doesn't give any extra output. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Sat Jan 20 03:41:43 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 20 Jan 2001 03:41:43 +0100 Subject: [Python-Dev] initializing ob_type References: <010c01c08201$4b0ec050$e46940d5@hagrid> <20010119172156.Y7731@lyra.org> Message-ID: <00f001c0828a$bc903900$e46940d5@hagrid> greg wrote: > It would be interesting to know more about the actual symptoms that appears > when the ob_type init is performed by the structure (rather than at runtime). > What happens? http://www.python.org/doc/FAQ.html#3.24 "3.24. "Initializer not a constant" while building DLL on MS-Windows "Static type object initializers in extension modules may cause compiles to fail with an error message like "initializer not a constant" Cheers /F From uche.ogbuji at fourthought.com Sat Jan 20 06:29:23 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Fri, 19 Jan 2001 22:29:23 -0700 Subject: [Python-Dev] Extension doc bugs In-Reply-To: Message from uche.ogbuji@fourthought.com of "Fri, 19 Jan 2001 14:42:40 MST." <200101192142.OAA29168@localhost.localdomain> Message-ID: <200101200529.WAA30349@localhost.localdomain> > For instance, from the object protocol docs: > > """ > int PyObject_Cmp (PyObject *o1, PyObject *o2, int *result) > Compare the values of o1 and o2 using a routine provided by o1, if one > exists, otherwise with a routine provided by o2. The result of the > comparison is returned in result. Returns -1 on failure. 
This is the > equivalent of the Python statement "result = cmp(o1, o2)". > """ > > After getting weird behavior implementing this, and then squinting at the > relevant Python 2.0 code, it appears that in actuality the Cmp function is to > return the direct comparison results (-1, 0, 1 based on ordering of the > parameters) furthermore, there is no such "result" argument. Bother. I didn't squint hard enough. I mistook the tp_compare slot for the PyObject_Cmp equivalent. I have indeed run into what I'm sure are nits in the Python/C API but given that my greatest alarm was false, I'll be more careful before bringing up the others. I'm still curious as to the best forum for this. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one at home.com Sat Jan 20 06:36:12 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 00:36:12 -0500 Subject: [Python-Dev] Extension doc bugs In-Reply-To: <200101192142.OAA29168@localhost.localdomain> Message-ID: [uche.ogbuji at fourthought.com] > ... > In general, the C API docs are in a rather sorry state and though > I doubt I could do a great deal about fixing it, I'd be interested in > discussion of the matter, and perhaps making what contribution I can. > > Is the doc-sig the best place for this? Nope! Discussing it won't do any good, there or anywhere else. What it needs is for people to send better docs to python-docs at python.org or upload LaTeX patches to SourceForge, and to report doc bugs on SourceForge (which is where the start of this msg should have gone!). Most days we just work on whatever is backed up at SourceForge; if doc bugs don't show up there, they won't get repaired. 
the-docs-are-only-10x-better-than-the-sum-of-the-individual-
    contributions-ly y'rs - tim

From tim.one at home.com Sat Jan 20 07:17:04 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 20 Jan 2001 01:17:04 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Objects object.c,2.109,2.110
In-Reply-To:
Message-ID:

[Barry]
> Modified Files:
>     object.c
> Log Message:
> default_3way_compare(): When comparing the pointers, they must be cast
> to integer types (i.e. Py_uintptr_t, our spelling of C9X's uintptr_t).
> ANSI specifies that pointer compares other than == and != to
> non-related structures are undefined. This quiets an Insure
> portability warning.

Barry, that comment belongs in the code, not in the checkin msg. The code *used* to do this correctly (as you well know, since you & I went thru considerable pain to fix this the first time). However, because the *reason* for the convolution wasn't recorded in the code as a comment, somebody threw it all away the first time it got reworked.

c-code-isn't-often-self-explanatory-ly y'rs - tim

From tim.one at home.com Sat Jan 20 07:30:42 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 20 Jan 2001 01:30:42 -0500
Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1
Message-ID:

I had a huge string and wanted to put a double-quote on each end. The boring:

    '"' + huge + '"'

does the job, but is inefficient . Then this transparent variation sprang unbidden from my hoary brow:

    huge.join('""')

*That* should put to rest the argument over whether .join() is more properly a method of the separator or the sequence -- '""'.join(huge) instead would look plain silly .
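A minimal sketch of why the trick works, written in present-day Python 3 syntax (an assumption; the thread itself is 2.1-era) and with a deliberately small stand-in for `huge`. The variable names `via_concat` and `via_join` are illustrative only.

```python
# `huge` is kept tiny here purely so the result is easy to read.
huge = "spam" * 3

# '""' is a two-element sequence of one-character strings, so joining that
# sequence with `huge` as the separator produces '"' + huge + '"'.
via_concat = '"' + huge + '"'
via_join = huge.join('""')

# The reversed spelling does something quite different: it drops a pair of
# quotes between every character of the argument.
reversed_spelling = '""'.join("ab")

print(via_join)
print(reversed_spelling)
```

Both spellings of the quoting produce the same string; the reversed one interleaves quotes between characters rather than wrapping the text.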
not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim From tim.one at home.com Sat Jan 20 10:28:18 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 04:28:18 -0500 Subject: [Python-Dev] Comparison of recursive objects In-Reply-To: <14952.21218.416551.695660@localhost.localdomain> Message-ID: [Guido's checkin msg] > ... > In a discussion with Tim, where we discovered that our intuition > on when a<=b should be true was failing, we decided to outlaw > ordering comparisons on recursive objects. (Once we have fixed our > intuition and designed a matching algorithm that's practical and > reasonable to implement, we can allow such orderings again.) [Jeremy] > Sounds sensible to me! I was quite puzzled about what <= should > return for recursive objects. That's easy: x <= y for recursive objects should return true if and only if x < y or x == y return true <0.9 wink>. x == y isn't a problem, although Python gives a remarkable answer: recursive objects in Python are instances of rooted, ordered, directed, finite, node-labeled graphs, and "x == y" in Python answers whether their graphs are isomorphic. Viewed that way (which is the correct way <0.5 wink>), the *natural* meaning for "x <= y" is "y contains a subgraph isomorphic to x". And that has *almost* all the nice properties we like: x <= x is true (x <= y and y <= z) implies x <= z (x <= y and y <= x) if and only if x == y However, 1. That's much harder to compute. 2. It implies, e.g., [2] <= [1, 2], and that's not what we *want* non-recursive sequence comparison to mean. 3. It's a partial ordering: given arbitrary x and y, it may be that neither contains an isomorphic image of the other. 4. 
We've again given up on avoiding surprises in *simple* comparisons among builtin types, like (under current CVS):

    >>> 1 < [1] < 0L < 1
    1
    >>> 1 < 1
    0
    >>>

so it's hard to see why we should do any work at all to avoid violating "intuition" when comparing recursive objects: we're already scrubbing the face of intuition with steel wool, setting it on fire, then putting it out with an axe .

Now let's look at Guido's example (or one of them, anyway):

    >>> a = []
    >>> a.append(a)
    >>> a.append("x")
    >>> b = []
    >>> b.append(b)
    >>> b.append("y")
    >>> a
    [[...], 'x']
    >>> b
    [[...], 'y']
    >>>

I think it's a trick of *typography* that caused my first thought to be "well, clearly, a < b". That is, the *display* shows me two 2-element lists, each with the same "blob" as the first element, and where a[1] is obviously less than b[1]. Since "the blobs" are the same, the second elements control the outcome.

But those "blobs" aren't really the same: a[0] is a, and b[0] is b, so asking whether a < b by looking first at their first elements just leads back to the original question: asking whether a[0] < b[0] is again asking whether a < b, and that makes no progress. Saying that a is less than b by fiat is *consistent* with the rules for lexicographic ordering, but so is insisting that a is greater than b. There's no basis for picking one over the other, and so no clear hope of coming up with a generally consistent scheme.

Well, one clear hope: if recursive comparison says "not equal", it could resolve the dilemma by comparing object id instead. That would be consistent (I mostly think at the moment ...), but if you run the program above multiple times it may say a < b on some runs and b < a on others.

WRT "the right way", it should be clear from the attached picture that neither a nor b contains an isomorphic image of the other, so from that POV they're not comparable (a != b, but neither a <= b nor b <= a holds).
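The "makes no progress" point can be checked directly. This is a sketch in present-day Python 3 (an assumption; the sessions in this thread are from 2.1-era CVS), with hypothetical helper names `build_pair` and `try_less_than`. On a current CPython the element-by-element comparison is expected to keep re-entering itself until the interpreter's recursion guard stops it, which is not the behaviour of the interpreter under discussion here.

```python
def build_pair():
    """Rebuild the two self-containing lists from the example."""
    a = []
    a.append(a)
    a.append("x")
    b = []
    b.append(b)
    b.append("y")
    return a, b


def try_less_than(x, y):
    """Attempt x < y, reporting a recursion blow-up instead of raising."""
    try:
        return x < y
    except RecursionError:
        return "RecursionError"


a, b = build_pair()

# The first element of each list is the list itself, so comparing first
# elements restates the original question.
print(a[0] is a, b[0] is b)
print(repr(a))
print(try_less_than(a, b))
```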
So this is what Guido made Python do:

    >>> a == b  # still cool: they're not isomorphic and Python knows it
    0
    >>> a < b
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: can't order recursive values
    >>> a <= b
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: can't order recursive values

In light of that, I still find these mildly surprising:

    >>> a < a
    0
    >>> a <= a
    1
    >>>

I guess some recursive values are more orderable than others .

    >>> import copy
    >>> c = copy.deepcopy(a)
    >>> c
    [[...], 'x']
    >>> a == c
    1
    >>> a <= c
    1
    >>> a < c
    0
    >>>

BTW, this kind of construction appears to give equality-testing that's at best(!) exponential-time in the size of the dicts:

    def timeeq(x, y):
        from time import clock
        import sys
        s = clock()
        result = x == y
        f = clock()
        print x, result, round(f-s, 1), "seconds"
        sys.stdout.flush()

    d = {}
    e = {}
    timeeq(d, e)
    d[0] = d
    e[0] = e
    timeeq(d, e)
    d[1] = d
    e[1] = e
    timeeq(d, e)
    d[2] = d
    e[2] = e
    timeeq(d, e)

Output:

    {} 1 0.0 seconds
    {0: {...}} 1 0.0 seconds
    {1: {...}, 0: {...}} 1 6.5 seconds

After more than 15 minutes, the 3-element dict comparison still hasn't completed (yikes!).

ackerman's-function-eat-your-heart-out-ly y'rs - tim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: loopy.jpg Type: image/jpeg Size: 11363 bytes Desc: not available URL: From thomas at xs4all.net Sat Jan 20 15:30:26 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 15:30:26 +0100 Subject: [Python-Dev] PEP 229 checked in In-Reply-To: <200101192206.RAA12072@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, Jan 19, 2001 at 05:06:03PM -0500 References: <20010117234925.A17392@xs4all.nl> <20010118004400.B17392@xs4all.nl> <20010118103036.B21503@kronos.cnri.reston.va.us> <20010119000209.F17392@xs4all.nl> <200101191626.LAA29165@cj20424-a.reston1.va.home.com> <20010119173415.M17295@xs4all.nl> <200101192206.RAA12072@cj20424-a.reston1.va.home.com> Message-ID: <20010120153026.L17392@xs4all.nl> On Fri, Jan 19, 2001 at 05:06:03PM -0500, Guido van Rossum wrote: > So instead I added 5 lines to site.py, which tests for > os.name=='posix', then for sys.path[-1] ending in '/Modules' -- this > tests only succeeds when running from the build directory. Then it > calls distutils.util.get_platform() and uses the result to calculate > the correct directory name, which is then appended to sys.path. > Yes, this slows down startup (it imports a large portion of the > distutils package), but I don't care -- after all this is mostly for > me so I can play with the interpreter right after I've built it, > right? Right. The only downside (as far as I can tell) is that 'python -S' no longer works, in the build tree. I don't think that's that big a deal, but it should be documented somewhere, so we don't end up being boggled by it once we forget about it :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Sat Jan 20 17:18:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 11:18:39 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: Your message of "Fri, 19 Jan 2001 00:45:32 +0100." 
<20010119004532.G17392@xs4all.nl> References: <20010119004532.G17392@xs4all.nl> Message-ID: <200101201618.LAA15675@cj20424-a.reston1.va.home.com> > On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote: > > > filename = '/tmp/delete_me' > > This reminds me: we need a portable way to handle test-files :) Yeah, I noticed that this test failed on Windows -- fixed now. The test_support module exports TESTFN; there's also tempfile.mktemp() which should generate temporary files on all platforms. Is that enough? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Sat Jan 20 17:36:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 17:36:05 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201618.LAA15675@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Jan 20, 2001 at 11:18:39AM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> Message-ID: <20010120173605.P17295@xs4all.nl> On Sat, Jan 20, 2001 at 11:18:39AM -0500, Guido van Rossum wrote: > > On Thu, Jan 18, 2001 at 08:46:54AM -0800, Guido van Rossum wrote: > > > > > filename = '/tmp/delete_me' > > > > This reminds me: we need a portable way to handle test-files :) > Yeah, I noticed that this test failed on Windows -- fixed now. > The test_support module exports TESTFN; there's also tempfile.mktemp() > which should generate temporary files on all platforms. > Is that enough? Well, there is one more issue, which we can't fix terribly easy: test_fcntl tries to flock() the file. 
flock() doesn't work on all filesystems (like NFS) :P If we cared a lot, we could try several alternatives (current dir, /tmp, /var/tmp) in the specific case of flock, but personally I don't want to bother, and real sysadmins (who should care about the test failure) are more likely to build Python on a local disk than in their NFS-mounted homedirectory. At least that's how we do it :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Sat Jan 20 17:43:49 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 11:43:49 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 In-Reply-To: Your message of "Sat, 20 Jan 2001 01:30:42 EST." References: Message-ID: <200101201643.LAA16269@cj20424-a.reston1.va.home.com> > I had a huge string and wanted to put a double-quote on each end. The > boring: > > '"' + huge + '"' > > does the job, but is inefficent . Then this transparent variation > sprang unbidden from my hoary brow: > > huge.join('""') Points off for obscurity though! My favorite for this is: '"%s"' % huge Worth a microbenchmark? > *That* should put to rest the argument over whether .join() is more properly > a method of the separator or the sequence -- '""'.join(huge) instead would > look plain silly . > > not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim Give up the channeling for a while -- there's too much interference in the air from the Microsoft threaded stdio debate still. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Sat Jan 20 17:47:44 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 20 Jan 2001 10:47:44 -0600 (CST) Subject: [Python-Dev] how to test my __all__ lists? Message-ID: <14953.49456.654121.987189@beluga.mojam.com> How do I test the __all__ lists I'm building? I'm worried about a couple things: 1. I may have typos 2. 
I may leave something out of a list that should be imported by from-module-import-*. Thoughts? Skip From guido at digicool.com Sat Jan 20 18:00:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 12:00:05 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: Your message of "Sat, 20 Jan 2001 17:36:05 +0100." <20010120173605.P17295@xs4all.nl> References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> Message-ID: <200101201700.MAA16491@cj20424-a.reston1.va.home.com> > > > > filename = '/tmp/delete_me' > > > > > > This reminds me: we need a portable way to handle test-files :) > > Yeah, I noticed that this test failed on Windows -- fixed now. > > > The test_support module exports TESTFN; there's also tempfile.mktemp() > > which should generate temporary files on all platforms. > > Is that enough? > > Well, there is one more issue, which we can't fix terribly easy: test_fcntl > tries to flock() the file. flock() doesn't work on all filesystems (like > NFS) :P If we cared a lot, we could try several alternatives (current dir, > /tmp, /var/tmp) in the specific case of flock, but personally I don't want to > bother, and real sysadmins (who should care about the test failure) are more > likely to build Python on a local disk than in their NFS-mounted > homedirectory. At least that's how we do it :-) These days, I would think that it's a pretty sure bet that the system's tmp directory is not on NFS. Then we could just use tempfile.mktemp() in that module, right? Or does the /tmp filesystem on Linux (which AFAIK is a RAM disk implemented in virtual memory so it uses swap space when it runs out of RAM) not support locking? I don't particularly care about fixing this -- I haven't seen bug reports about this. 
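For the hard-coded '/tmp/delete_me' problem, a minimal sketch of a portable scratch file in present-day Python 3 (an assumption; the thread predates it). It uses tempfile.mkstemp(), a later and safer relative of the tempfile.mktemp() mentioned above, because it creates the file rather than only naming it. The helper name `make_scratch_file` and the prefix are hypothetical, and this does nothing about whether the chosen directory supports flock().

```python
import os
import tempfile


def make_scratch_file(data):
    """Create a temporary file in the platform's temp directory and fill it.

    Returns the path; the caller is responsible for removing the file.
    """
    fd, path = tempfile.mkstemp(prefix="delete_me_")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path = make_scratch_file(b"scratch data")
try:
    with open(path, "rb") as handle:
        contents = handle.read()
finally:
    os.remove(path)

print(contents)
```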
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jan 20 18:38:38 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 12:38:38 -0500 Subject: [Python-Dev] how to test my __all__ lists? In-Reply-To: Your message of "Sat, 20 Jan 2001 10:47:44 CST." <14953.49456.654121.987189@beluga.mojam.com> References: <14953.49456.654121.987189@beluga.mojam.com> Message-ID: <200101201738.MAA16636@cj20424-a.reston1.va.home.com> > How do I test the __all__ lists I'm building? I'm worried about a couple > things: > > 1. I may have typos Do "from M import *" -- this will raise an AttributeError if there's something in __all__ that's not defined in the module. > 2. I may leave something out of a list that should be imported by > from-module-import-*. That's what alpha-testing's for. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at netaxs.com Sat Jan 20 18:49:43 2001 From: esr at netaxs.com (Eric Raymond) Date: Sat, 20 Jan 2001 12:49:43 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <3A672376.4B951848@lemburg.com>; from M.-A. Lemburg on Thu, Jan 18, 2001 at 06:10:14PM +0100 References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> Message-ID: <20010120124943.C6073@unix3.netaxs.com> > A combination of time.time(), process id and counter should > work in all cases. Make sure you use a lock around the counter, > though. Yes, but...this hack has to work in a multithreaded environment, so process ID isn't good enough. And I don't want to keep a counter around if I don't have to. -- Eric S. Raymond From guido at digicool.com Sat Jan 20 19:01:04 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 20 Jan 2001 13:01:04 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: Your message of "Sat, 20 Jan 2001 12:49:43 EST." 
<20010120124943.C6073@unix3.netaxs.com> References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> <20010120124943.C6073@unix3.netaxs.com> Message-ID: <200101201801.NAA16880@cj20424-a.reston1.va.home.com> > > A combination of time.time(), process id and counter should > > work in all cases. Make sure you use a lock around the counter, > > though. > > Yes, but...this hack has to work in a multithreaded environment, > so process ID isn't good enough. And I don't want to keep a counter > around if I don't have to. Sorry Eric, this just doesn't make sense. Keeping a counter around in your module (protected by a semaphore) is obviously the right solution. Why are you fighting it? --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at netaxs.com Sat Jan 20 19:20:26 2001 From: esr at netaxs.com (Eric Raymond) Date: Sat, 20 Jan 2001 13:20:26 -0500 Subject: [Python-Dev] Weird use of hash() -- will this work? In-Reply-To: <200101201801.NAA16880@cj20424-a.reston1.va.home.com>; from Guido van Rossum on Sat, Jan 20, 2001 at 01:01:04PM -0500 References: <20010118022321.A9021@thyrsus.com> <3A672376.4B951848@lemburg.com> <20010120124943.C6073@unix3.netaxs.com> <200101201801.NAA16880@cj20424-a.reston1.va.home.com> Message-ID: <20010120132026.E6073@unix3.netaxs.com> On Sat, Jan 20, 2001 at 01:01:04PM -0500, Guido van Rossum wrote: > > Yes, but...this hack has to work in a multithreaded environment, > > so process ID isn't good enough. And I don't want to keep a counter > > around if I don't have to. > > Sorry Eric, this just doesn't make sense. Keeping a counter around in > your module (protected by a semaphore) is obviously the right > solution. Why are you fighting it? Actually, I'm not fighting it any more. I changed my mind a few minutes after shipping that response. -- Eric S. 
Raymond From thomas at xs4all.net Sat Jan 20 19:37:10 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 20 Jan 2001 19:37:10 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201700.MAA16491@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Sat, Jan 20, 2001 at 12:00:05PM -0500 References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> Message-ID: <20010120193710.Q17295@xs4all.nl> On Sat, Jan 20, 2001 at 12:00:05PM -0500, Guido van Rossum wrote: > > Well, there is one more issue, which we can't fix terribly easy: test_fcntl > > tries to flock() the file. flock() doesn't work on all filesystems (like > > NFS) :P If we cared a lot, we could try several alternatives (current dir, > > /tmp, /var/tmp) in the specific case of flock, but personally I don't want to > > bother, and real sysadmins (who should care about the test failure) are > > more likely to build Python on a local disk than in their NFS-mounted > > homedirectory. At least that's how we do it :-) > These days, I would think that it's a pretty sure bet that the > system's tmp directory is not on NFS. Then we could just use > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > it uses swap space when it runs out of RAM) not support locking? Actually, most Linux distributions don't care enough about /tmp to make it a RAM-based filesystem. At least Debian and RedHat don't :) (There's a good reason for that: Linux's disk-data cache rocks if you have enough RAM, so there's no real gain in using a ramdisk) BSDI does (optionally) have such a /tmp, and probably the other BSD derived systems as well. But that doesn't mean it doesn't support locking, so that's not a real excuse. 
But like I said, I don't care enough to worry about it. I'll look at it before alpha2.

--
Thomas Wouters
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.one at home.com Sat Jan 20 21:10:51 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 20 Jan 2001 15:10:51 -0500
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To:
Message-ID:

[Tim]
> ...
> 4. We've again given up on avoiding surprises in *simple* comparisons
>    among builtin types, like (under current CVS):
>
>     >>> 1 < [1] < 0L < 1
>     1
>     >>> 1 < 1
>     0
>     >>>

I really dislike that. Here's a consequence at a higher level:

    N = 5
    x = [1 for i in range(N)] + \
        [[1] for i in range(N)] + \
        [0L for i in range(N)]
    x.sort()
    print x

    from random import shuffle
    tries = failures = 0
    while failures < 5:
        tries += 1
        y = x[:]
        shuffle(y)
        y.sort()
        if x != y:
            print "oops, on try number", tries
            print y
            failures += 1

and here's a typical run (2.1a1):

    [1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L, 0L]
    oops, on try number 3
    [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
    oops, on try number 5
    [[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]]
    oops, on try number 6
    [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
    oops, on try number 7
    [[1], 0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1]]
    oops, on try number 8
    [0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1], 0L, 0L, 0L, 0L]

I've often used list.sort() on a heterogeneous list simply to bring the elements of the same type next to each other. But as "try number 5" shows, I can no longer rely on even getting all the lists together. Indeed, heterogeneous list.sort() has become a very bad (biased and slow) implementation of random.shuffle() .
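For comparison, a sketch of the same shuffle-and-resort check in present-day Python 3 (an assumption; none of this is the 2.1a1 code under discussion). Python 3 refuses to order an int against a list at all, so grouping by type has to be spelled out with an explicit key. The names `type_then_value` and `always_sorts_the_same` are hypothetical, and plain ints stand in for the 0L longs.

```python
import random


def type_then_value(item):
    """Sort key: group by type name first, then order within each group."""
    return (type(item).__name__, item)


def always_sorts_the_same(items, trials=50, seed=0):
    """Shuffle and re-sort repeatedly; report whether the result ever changes."""
    rng = random.Random(seed)
    expected = sorted(items, key=type_then_value)
    for _ in range(trials):
        shuffled = items[:]
        rng.shuffle(shuffled)
        if sorted(shuffled, key=type_then_value) != expected:
            return False
    return True


N = 5
x = [1] * N + [[1] for _ in range(N)] + [0] * N

grouped = sorted(x, key=type_then_value)
print(grouped)
print(always_sorts_the_same(x))
```

Because the key defines a total order over these values, the sorted result no longer depends on the input order.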
Under 2.0, the program never prints "oops", because the only violations of transitivity in 2.0's ordering of builtin types were bugs in the implementation (none of which show up in this simple test case); 2.0's .sort() *always* produces [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]] The base trick in 2.0 was sound: when falling back to the "compare by name of the type" last resort, treat all numeric types as if they had the same name. While Python can't enforce that any user-defined __cmp__ is consistent, I think it should continue to set a good example in the way it implements its own comparisons. grumblingly y'rs - tim From skip at mojam.com Sat Jan 20 21:42:27 2001 From: skip at mojam.com (Skip Montanaro) Date: Sat, 20 Jan 2001 14:42:27 -0600 (CST) Subject: [Python-Dev] should a module's thread safety be documented? Message-ID: <14953.63539.629197.232848@beluga.mojam.com> A bit late for 2.1alpha1, but it just occurred to me that perhaps there should be an annotation in the documentation that indicates whether or not a module is thread-safe. For example, many functions in fileinput rely on a module global called _state. It strikes me that this module is not likely to be thread-safe, yet the documentation doesn't appear to mention this, certainly not in an obvious fashion. Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of LaTex macros in Fred's arsenal? This would make documenting these properties both easy and consistent across modules. Skip From tim.one at home.com Sat Jan 20 22:13:41 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 20 Jan 2001 16:13:41 -0500 Subject: [Python-Dev] Stupid Python Tricks, Volume 38 Number 1 In-Reply-To: <200101201643.LAA16269@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > huge.join('""') [Guido] > Points off for obscurity though! The Subject line was "Stupid Python Tricks" for a reason . 
Those who don't know the language inside-out should be tickled by figuring out why it even *works* (hint for the baffled: you have to view '""' as a sequence rather than as an atomic string).

> My favorite for this is:
>
>     '"%s"' % huge
>
> Worth a microbenchmark?

Absolutely! I get:

         obvious  15.574
         obscure   8.165
         sprintf   8.133

after running:

    ITERS = 1000
    indices = [0] * ITERS

    def obvious(huge):
        for i in indices:
            '"' + huge + '"'

    def obscure(huge):
        for i in indices:
            huge.join('""')

    def sprintf(huge):
        for i in indices:
            '"%s"' % huge

    def runtimes(huge):
        from time import clock
        for f in obvious, obscure, sprintf:
            start = clock()
            f(huge)
            finish = clock()
            print "%12s %7.3f" % (f.__name__, finish - start)

    runtimes("x" * 1000000)

under current 2.1a1. Not a dead-quiet machine, but the difference is too small to care. Speed up huge.join attr lookup, and it would probably be faster .

Hmm: if I boost ITERS high enough and cut back the size of huge, "obscure" eventually becomes *slower* than "obvious", and even if the "huge.join" lookup is floated out of the loop. I guess that points to the relative burden of calling a bound method. So, in real life, the huge.join approach may well be the slowest!

>> not-entirely-sure-i'm-channeling-on-this-one-ly y'rs - tim

> Give up the channeling for a while -- there's too much interference in
> the air from the Microsoft threaded stdio debate still. :-)

What debate? You need two arguably valid points of view for a debate to even start .

gloating-in-victory-vicious-in-defeat-but-simply-unbearable-in-
    ambiguity-ly y'rs - tim

From fdrake at acm.org Sat Jan 20 22:23:58 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Sat, 20 Jan 2001 16:23:58 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1 In-Reply-To: <200101201700.MAA16491@cj20424-a.reston1.va.home.com> References: <20010119004532.G17392@xs4all.nl> <200101201618.LAA15675@cj20424-a.reston1.va.home.com> <20010120173605.P17295@xs4all.nl> <200101201700.MAA16491@cj20424-a.reston1.va.home.com> Message-ID: <14954.494.223724.705495@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > tempfile.mktemp() in that module, right? Or does the /tmp filesystem > on Linux (which AFAIK is a RAM disk implemented in virtual memory so > it uses swap space when it runs out of RAM) not support locking? I thought it was Solaris that used available+virtual memory for /tmp; that was what we ran into at CNRI. (Which doesn't preclude Linux from doing the same, I just don't recall that we've encountered that.) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Sat Jan 20 23:05:27 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 20 Jan 2001 17:05:27 -0500 (EST) Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: <14953.63539.629197.232848@beluga.mojam.com> References: <14953.63539.629197.232848@beluga.mojam.com> Message-ID: <14954.2983.450755.761653@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > A bit late for 2.1alpha1, but it just occurred to me that perhaps there > should be an annotation in the documentation that indicates whether or not a > module is thread-safe. For example, many functions in fileinput rely on a If you can create a list of the known thread safe and known thread unsafe modules, I'll come up with appropriate annotations for the documentation. > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of > LaTex macros in Fred's arsenal? This would make documenting these > properties both easy and consistent across modules. 

Not sure that this is exactly the right approach to the markup;
I'll think about this one.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From skip at mojam.com Sat Jan 20 23:31:52 2001
From: skip at mojam.com (Skip Montanaro)
Date: Sat, 20 Jan 2001 16:31:52 -0600 (CST)
Subject: [Python-Dev] should a module's thread safety be documented?
In-Reply-To: <14954.2983.450755.761653@cj42289-a.reston1.va.home.com>
References: <14953.63539.629197.232848@beluga.mojam.com>
 <14954.2983.450755.761653@cj42289-a.reston1.va.home.com>
Message-ID: <14954.4568.460875.662560@beluga.mojam.com>

    Fred> If you can create a list of the known thread safe and known thread
    Fred> unsafe modules, I'll come up with appropriate annotations for the
    Fred> documentation.

I think that's going to be a significant undertaking, requiring examination
of a lot of Python and C code. I'd rather approach it incrementally, which
was why I suggested the LaTeX macros. As modules are determined to be safe
or unsafe, the appropriate safety macro could just be inserted into the
correct lib*.tex file. It would (in my mind) expand to a stock bit of text
inserted at a standard place in the file.

Skip

From tim.one at home.com Sat Jan 20 23:52:09 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 20 Jan 2001 17:52:09 -0500
Subject: [Python-Dev] should a module's thread safety be documented?
In-Reply-To: <14953.63539.629197.232848@beluga.mojam.com>
Message-ID:

[Skip Montanaro]
> ...
> Anyone for adding \notthreadsafe{} and \threadsafe{} macros to
> the litany of LaTeX macros in Fred's arsenal? This would make
> documenting these properties both easy and consistent across
> modules.

When a module is *not* threadsafe, that's usually considered "a bug" in the
module. So we should just point out modules that aren't threadsafe by
design. Alas, that's A Project.
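
[Editorial sketch, not part of the original thread.] The fileinput concern
raised above comes down to hidden state shared by the module-level
functions. The following is a minimal single-threaded illustration, written
in present-day Python 3 rather than the 2.x syntax of the period. The helper
make_temp_file and the variable names are made up for this example. It does
not prove anything about thread safety as such; it only shows that the
module-level API shares one state object, while separate FileInput instances
each keep their own.

```python
import fileinput
import os
import tempfile


def make_temp_file(text):
    """Write text to a new temporary file and return its path (helper for this sketch)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


path_a = make_temp_file("a1\na2\n")
path_b = make_temp_file("b1\nb2\n")

try:
    # Module-level API: fileinput.input() keeps one FileInput in a module global.
    shared = fileinput.input([path_a])
    first_line = next(shared)  # reading has started, so the shared state is active

    try:
        fileinput.input([path_b])  # a second caller collides with the first
        collided = False
    except RuntimeError:
        collided = True
    finally:
        fileinput.close()  # release the shared module-level state

    # Instance API: each caller owns a private FileInput, so nothing is shared.
    reader_a = fileinput.FileInput([path_a])
    reader_b = fileinput.FileInput([path_b])
    interleaved = [next(reader_a), next(reader_b), next(reader_a), next(reader_b)]
    reader_a.close()
    reader_b.close()
finally:
    os.remove(path_a)
    os.remove(path_b)

print(collided)
print(interleaved)
```

The second call to fileinput.input() is refused while the first is still
reading, which is the visible symptom of the shared global; the two
FileInput instances can be read in alternation without interfering.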

From nas at arctrix.com Sat Jan 20 16:59:14 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Sat, 20 Jan 2001 07:59:14 -0800
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: ; from tim.one@home.com on Sat, Jan 20, 2001 at 03:10:51PM -0500
References:
Message-ID: <20010120075914.B18840@glacier.fnational.com>

On Sat, Jan 20, 2001 at 03:10:51PM -0500, Tim Peters wrote:
> While Python can't enforce that any user-defined __cmp__ is consistent, I
> think it should continue to set a good example in the way it implements its
> own comparisons.

I think the 2.0 behavior should be fairly easy to restore. I'll
leave it up to Guido though since he's "Mr. Comparison" now and I
haven't looked at the code since I checked in the coercion patch.

  Neil

From nas at arctrix.com Sat Jan 20 17:03:36 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Sat, 20 Jan 2001 08:03:36 -0800
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_dumbdbm.py,NONE,1.1
In-Reply-To: <14954.494.223724.705495@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Sat, Jan 20, 2001 at 04:23:58PM -0500
References: <20010119004532.G17392@xs4all.nl>
 <200101201618.LAA15675@cj20424-a.reston1.va.home.com>
 <20010120173605.P17295@xs4all.nl>
 <200101201700.MAA16491@cj20424-a.reston1.va.home.com>
 <14954.494.223724.705495@cj42289-a.reston1.va.home.com>
Message-ID: <20010120080336.C18840@glacier.fnational.com>

On Sat, Jan 20, 2001 at 04:23:58PM -0500, Fred L. Drake, Jr. wrote:
>
> Guido van Rossum writes:
>  > tempfile.mktemp() in that module, right? Or does the /tmp filesystem
>  > on Linux (which AFAIK is a RAM disk implemented in virtual memory so
>  > it uses swap space when it runs out of RAM) not support locking?
>
> I thought it was Solaris that used available+virtual memory for
> /tmp; that was what we ran into at CNRI. (Which doesn't preclude
> Linux from doing the same, I just don't recall that we've encountered
> that.)

I don't know of any Linux system that uses a RAM based /tmp. The
Linux implementation of ext2 is so fast it doesn't make any sense.
If you have enough memory all the data is stored in the buffer,
page, and inode caches anyhow.

  Neil

From trentm at ActiveState.com Sun Jan 21 00:35:56 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Sat, 20 Jan 2001 15:35:56 -0800
Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...?
Message-ID: <20010120153556.C18375@ActiveState.com>

... or am I missing something?

With Python 2.0 on Windows 2000, when playing with sys.exit() and sys.argv
I get some unexpected results.

First here is a simple case that shows what I expect. I run "caller_good.py"
which calls "callee_good.py" and prints its return value. "callee_good.py"
returns 42 so "42" is printed:

----------------- caller_good.py --------------------
import os
retval = os.system("python callee_good.py")
print "caller: the retval is", retval
-----------------------------------------------------

----------------- callee_good.py --------------------
import sys
sys.exit(42)
-----------------------------------------------------

D:\trentm\tmp>python caller_good.py
caller: the retval is 42

Now here is what I didn't expect. I changed "caller_bad.py" to pass, as an
argument, the value that "callee_bad.py" should return.

----------------- caller_bad.py ---------------------
import os
retval = os.system("python callee_bad.py 42")
print "caller: the retval is", retval
-----------------------------------------------------

----------------- callee_bad.py ---------------------
import sys
firstarg = sys.argv[1]
print "callee_bad: firstarg is", firstarg
sys.exit(firstarg)
-----------------------------------------------------

D:\trentm\tmp>python caller_bad.py
callee_bad: firstarg is 42
42                         # <---- where did *this* print come from?
caller: the retval is 1    # <---- and this retval is incorrect

Any ideas?
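
[Editorial sketch, not part of the original message.] The two callee scripts
above differ only in the type of the object handed to sys.exit(): an integer
in the good case, a string taken from sys.argv in the bad one. The following
is a minimal illustration of how sys.exit() treats each, written in
present-day Python 3 and using subprocess instead of os.system (whose return
value is platform dependent). The helper run_child and the variable names
are made up for this example.

```python
import subprocess
import sys


def run_child(source):
    """Run source in a fresh interpreter; return (exit status, stderr text)."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr.strip()


# An integer argument becomes the process exit status.
int_status, int_stderr = run_child("import sys; sys.exit(42)")

# A string argument is printed to stderr and the exit status is 1.
str_status, str_stderr = run_child("import sys; sys.exit('42')")

# Converting the string first restores the intended exit status.
fixed_status, fixed_stderr = run_child("import sys; sys.exit(int('42'))")

print(int_status, repr(int_stderr))
print(str_status, repr(str_stderr))
print(fixed_status, repr(fixed_stderr))
```

That would account for both surprises in the transcript: the extra "42" is
the string being printed on the way out, and the status of 1 is what a
non-integer argument produces.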

I have not tried to track this down yet nor have I tried the latest
Python-CVS state.

Trent

--
Trent Mick
TrentM at ActiveState.com

From moshez at zadka.site.co.il Sun Jan 21 13:37:57 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Sun, 21 Jan 2001 14:37:57 +0200 (IST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,NONE,1.1
In-Reply-To:
References:
Message-ID: <20010121123757.D897BA83E@darjeeling.zadka.site.co.il>

Yay! I can change to python-dev manually!
(hear sounds of the timbot's teeth grinding)

On Sat, 20 Jan 2001, Skip Montanaro wrote:

> def check_all(_modname):
>     exec "import %s" % _modname
>     verify(hasattr(sys.modules[_modname],"__all__"),
>            "%s has no __all__ attribute" % _modname)
>     exec "del %s" % _modname
>     exec "from %s import *" % _modname
>
>     _keys = locals().keys()
....

Wouldn't it be better to use the

d = {}
exec "foo", d

And verify "d" instead?
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From guido at digicool.com Sun Jan 21 17:51:45 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 21 Jan 2001 11:51:45 -0500
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: Your message of "Sat, 20 Jan 2001 15:10:51 EST."
References:
Message-ID: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>

[Tim, complaining that numerical types are no longer lumped together
in default comparisons:]

> I've often used list.sort() on a heterogeneous list simply to bring the
> elements of the same type next to each other. But as "try number 5" shows,
> I can no longer rely on even getting all the lists together. Indeed,
> heterogeneous list.sort() has become a very bad (biased and slow)
> implementation of random.shuffle() .
>
> Under 2.0, the program never prints "oops", because the only violations of
> transitivity in 2.0's ordering of builtin types were bugs in the
> implementation (none of which show up in this simple test case); 2.0's
> .sort() *always* produces
>
>     [0L, 0L, 0L, 0L, 0L, 1, 1, 1, 1, 1, [1], [1], [1], [1], [1]]
>
> The base trick in 2.0 was sound: when falling back to the "compare by name
> of the type" last resort, treat all numeric types as if they had the same
> name.
>
> While Python can't enforce that any user-defined __cmp__ is consistent, I
> think it should continue to set a good example in the way it implements its
> own comparisons.

I think I can put this behavior back. (I believe that before I
reorganized the comparison code, it seemed really tricky to do this,
but after refactoring the code, it's quite easy to do.)

My only concern is that under the old scheme, two different numeric
extension types that somehow can't be compared will end up being
*equal*. To fix this, I propose that if the names compare equal, as a
last resort we compare the type pointers -- this should be consistent
too.

Here's a patch that stops your test program from reporting failures:

*** object.c    2001/01/21 16:25:18     2.112
--- object.c    2001/01/21 16:50:16
***************
*** 522,527 ****
--- 522,528 ----
  default_3way_compare(PyObject *v, PyObject *w)
  {
      int c;
+     char *vname, *wname;

      if (v->ob_type == w->ob_type) {
          /* When comparing these pointers, they must be cast to
***************
*** 550,557 ****
      }

      /* different type: compare type names */
!     c = strcmp(v->ob_type->tp_name, w->ob_type->tp_name);
!     return (c < 0) ? -1 : (c > 0) ? 1 : 0;
  }

  #define CHECK_TYPES(o) PyType_HasFeature((o)->ob_type, Py_TPFLAGS_CHECKTYPES)
--- 551,571 ----
      }

      /* different type: compare type names */
!     if (v->ob_type->tp_as_number)
!         vname = "";
!     else
!         vname = v->ob_type->tp_name;
!     if (w->ob_type->tp_as_number)
!         wname = "";
!     else
!         wname = w->ob_type->tp_name;
!     c = strcmp(vname, wname);
!     if (c < 0)
!         return -1;
!     if (c > 0)
!         return 1;
!     /* Same type name, or (more likely) incomparable numeric types */
!     return (v->ob_type < w->ob_type) ? -1 : 1;
  }

  #define CHECK_TYPES(o) PyType_HasFeature((o)->ob_type, Py_TPFLAGS_CHECKTYPES)

Let me know if you agree with this.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Sun Jan 21 18:00:02 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 21 Jan 2001 12:00:02 -0500
Subject: [Python-Dev] should a module's thread safety be documented?
In-Reply-To: Your message of "Sat, 20 Jan 2001 14:42:27 CST." <14953.63539.629197.232848@beluga.mojam.com>
References: <14953.63539.629197.232848@beluga.mojam.com>
Message-ID: <200101211700.MAA25479@cj20424-a.reston1.va.home.com>

> A bit late for 2.1alpha1, but it just occurred to me that perhaps there
> should be an annotation in the documentation that indicates whether or not a
> module is thread-safe. For example, many functions in fileinput rely on a
> module global called _state. It strikes me that this module is not likely
> to be thread-safe, yet the documentation doesn't appear to mention this,
> certainly not in an obvious fashion.
>
> Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of
> LaTeX macros in Fred's arsenal? This would make documenting these
> properties both easy and consistent across modules.

It's hard to say whether a *whole module* is threadsafe. E.g. in the
fileinput example, there's the clear implication that if you use this
in multiple threads, you should instantiate your own FileInput
instances, and then you're totally thread-safe. Clearly the semantics
of the module-global functions are thread-unsafe though.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Sun Jan 21 19:45:07 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 13:45:07 -0500
Subject: [Python-Dev] test_sax failing (Windows)
Message-ID:

test test_sax crashed -- exceptions.SystemError: 'finally' pops bad
exception

Sometimes it crashes (some flavor of memory fault) instead.

Elsewhere?

From nas at arctrix.com Sun Jan 21 13:28:35 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 21 Jan 2001 04:28:35 -0800
Subject: [Python-Dev] autoconf --enable vs. --with
Message-ID: <20010121042835.A19774@glacier.fnational.com>

I've been working a bit on the build process lately. I came
across this in the autoconf documentation:

    If a software package has optional compile-time features, the
    user can give `configure' command line options to specify
    whether to compile them. The options have one of these forms:

        --enable-FEATURE[=ARG]
        --disable-FEATURE

    Some packages require, or can optionally use, other software
    packages which are already installed. The user can give
    `configure' command line options to specify which such
    external software to use. The options have one of these
    forms:

        --with-package[=ARG]
        --without-package

Is it worth fixing the Python configure script to comply with
these definitions? It looks like with-cycle-gc and maybe
with-pydebug would have to be changed.

  Neil

AC_ARG_ENABLE

From tim.one at home.com Sun Jan 21 20:44:38 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 14:44:38 -0500
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido, on again lumping numbers together]
> I think I can put this behavior back. (I believe that before I
> reorganized the comparison code, it seemed really tricky to do this,
> but after refactoring the code, it's quite easy to do.)

I can believe that; and I believe the "bugs" in 2.0 ended up somewhere in or
around the bowels of the xxxHalfBinOp-like routines (which were really
tricky to my eyes -- the interactions among coercions and comparisons were
hard to keep straight).

> My only concern is that under the old scheme, two different numeric
> extension types that somehow can't be compared will end up being
> *equal*. To fix this, I propose that if the names compare equal, as a
> last resort we compare the type pointers -- this should be consistent
> too.

Agreed, and sounds fine! Save Barry a little work, though:

> !     /* Same type name, or (more likely) incomparable numeric types */
> !     return (v->ob_type < w->ob_type) ? -1 : 1;

That's non-std C in a way Insure complains about elsewhere; change to

    return ((Py_uintptr_t)v->ob_type <
            (Py_uintptr_t)w->ob_type) ? -1 : 1;

if-vendors-stuck-to-the-letter-of-the-c-std-python-wouldn't-
    compile-at-all-ly y'rs - tim

From trentm at ActiveState.com Sun Jan 21 21:01:44 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Sun, 21 Jan 2001 12:01:44 -0800
Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...?
In-Reply-To: <20010120153556.C18375@ActiveState.com>; from trentm@ActiveState.com on Sat, Jan 20, 2001 at 03:35:56PM -0800
References: <20010120153556.C18375@ActiveState.com>
Message-ID: <20010121120144.B28643@ActiveState.com>

On Sat, Jan 20, 2001 at 03:35:56PM -0800, Trent Mick wrote:
>
> ... or am I missing something?

Ignore me. RTFM (sys.exit), Trent.

Sorry,
Trent

--
Trent Mick
TrentM at ActiveState.com

From tim.one at home.com Sun Jan 21 21:13:02 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 15:13:02 -0500
Subject: [Python-Dev] spurious print and faulty return values: Is this a bug...?
In-Reply-To: <20010121120144.B28643@ActiveState.com>
Message-ID:

[Trent, quoting Trent]
>>
>> ... or am I missing something?

[and back to Trent]
> Ignore me. RTFM (sys.exit), Trent.

Nobody wants to ignore *you*, Trent!
If it's not the case that you wanted to code

    sys.exit(int(firstarg))

instead, holler, cuz if that wasn't the problem I'm still baffled.

or-if-it-was-it-caught-you-because-sys.exit's-tricks-aren't-
    really-pythonic-ly y'rs - tim

From loewis at informatik.hu-berlin.de Sun Jan 21 22:21:24 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Sun, 21 Jan 2001 22:21:24 +0100 (MET)
Subject: [Python-Dev] test_sax failing (Windows)
Message-ID: <200101212121.WAA16327@pandora.informatik.hu-berlin.de>

> Elsewhere?

Not for me, on neither Solaris nor Linux. What expat version?

Regards,
Martin

From loewis at informatik.hu-berlin.de Sun Jan 21 22:22:44 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Sun, 21 Jan 2001 22:22:44 +0100 (MET)
Subject: [Python-Dev] autoconf --enable vs. --with
Message-ID: <200101212122.WAA16371@pandora.informatik.hu-berlin.de>

> It looks like with-cycle-gc and maybe with-pydebug would have to be
> changed.

I'm in favour of changing it.

Regards,
Martin

From loewis at informatik.hu-berlin.de Sun Jan 21 22:34:08 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Sun, 21 Jan 2001 22:34:08 +0100 (MET)
Subject: [Python-Dev] test___all__ fails with no bsddb
Message-ID: <200101212134.WAA16446@pandora.informatik.hu-berlin.de>

On my Solaris 2.6 installation, with no bsddb module, I get

test test___all__ failed -- dbhash has no __all__ attribute

This is caused by anydbm importing dbhash first. After that fails,
dbhash is still in sys.modules, and the next import of dbhash silently
loads an incomplete module.

Regards,
Martin

From tim.one at home.com Sun Jan 21 22:38:11 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 16:38:11 -0500
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To: <200101212121.WAA16327@pandora.informatik.hu-berlin.de>
Message-ID:

[Martin von Loewis]
> Not for me, on neither Solaris nor Linux. What expat version?

Tell me how to answer the question, and I'll be happy to (I have no idea
what any of this stuff is or does).

My pyexpat.c (well, my *everything*) is current CVS, pyexpat.c in
particular is revision 2.33.

xmltok.dll and xmlparse.dll were obtained from

    ftp://ftp.jclark.com/pub/xml/expat.zip

for the 2.0 release.

Is any of that relevant?

The tests passed in the wee hours (EST; UTC -0500) this morning.
They began failing after I updated around 1pm EST today.

From thomas at xs4all.net Sun Jan 21 22:54:05 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sun, 21 Jan 2001 22:54:05 +0100
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 02:44:38PM -0500
References: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>
Message-ID: <20010121225405.M17392@xs4all.nl>

On Sun, Jan 21, 2001 at 02:44:38PM -0500, Tim Peters wrote:

> > !     /* Same type name, or (more likely) incomparable numeric types */
> > !     return (v->ob_type < w->ob_type) ? -1 : 1;

> That's non-std C in a way Insure complains about elsewhere; change to

>     return ((Py_uintptr_t)v->ob_type <
>             (Py_uintptr_t)w->ob_type) ? -1 : 1;

Why is comparing v->ob_type with w->ob_type illegal ? They're both pointers
to the same type, aren't they ?

> if-vendors-stuck-to-the-letter-of-the-c-std-python-wouldn't-
>     compile-at-all-ly y'rs - tim

That's easy to check, gcc has these nice (and from a user's point of view,
fairly useless) options: '-ansi', '-pedantic' and '-pedantic-errors'.
'-ansi' disables some GCC-specific features, -pedantic turns gcc into a
whiney pedantic I'm sure you'd get along with just fine , and
-pedantic-errors turns those whines into errors.

Doing a quick check I see one error I added myself (but haven't committed)
in the continue-inside-try patch (a trailing comma in an enumerator
definition), and one error in configure (it mis-detects the arguments to
setpgrp() in strict-ANSI mode, for some reason.)
I don't see any errors in the core Python. I see an error in the nis module
(missing function prototype, and broken system-include file) and a *lot* of
errors in linuxaudiodev, but nothing else in the set of modules I can
compile. Not bad!

Note that this was tested in a current tree. I couldn't find either Guido's
'broken' code or your proposed 'good' code, so I don't know if you checked
in a fix yet. If you didn't, don't bother, it's not broken :-)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From loewis at informatik.hu-berlin.de Sun Jan 21 23:00:47 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Sun, 21 Jan 2001 23:00:47 +0100 (MET)
Subject: [Python-Dev] Re: test_sax failing (Windows)
In-Reply-To:
References:
Message-ID: <200101212200.XAA16672@pandora.informatik.hu-berlin.de>

> [Martin von Loewis]
> > Not for me, on neither Solaris nor Linux. What expat version?
>
> Tell me how to answer the question, and I'll be happy to (I have no idea
> what any of this stuff is or does).
>
> My pyexpat.c (well, my *everything*) is current CVS, pyexpat.c in
> particular is revision 2.33.

That's good; mine too.

> xmltok.dll and xmlparse.dll were obtained from
>
>     ftp://ftp.jclark.com/pub/xml/expat.zip
>
> for the 2.0 release.
>
> Is any of that relevant?

That gives some clue, yes. Unfortunately, that URL itself is a symlink
that was expat1_1.zip (157936 bytes) at some point, and now is
expat1_2.zip (153591 bytes). The files themselves are not
self-identifying, it's hard to tell once unzipped...

Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either
works for me. I never tested 1.95.x (which is also not available from
jclark.com).

> The tests passed in the wee hours (EST; UTC -0500) this morning.
> They began failing after I updated around 1pm EST today.

I just merged pyexpat changes from PyXML into Python 2 so that could
be the cause.
However, this very code has been used for some time by
PyXML users, why it crashes for you is a mystery to me.

Any chance of producing a C backtrace?

Regards,
Martin

From tim.one at home.com Sun Jan 21 23:09:30 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 17:09:30 -0500
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To:
Message-ID:

FYI, under the debug-build Python, running test_sax.py under the debugger
dies like so:

Passed test_attrs_empty
Passed test_attrs_wattr
Passed test_escape_all
Passed test_escape_basic
Passed test_escape_extra
Passed test_expat_attrs_empty
Passed test_expat_attrs_wattr
Passed test_expat_dtdhandler
Passed test_expat_entityresolver
Passed test_expat_file
Traceback (most recent call last):
  File "../lib/test/test_sax.py", line 603, in ?
    confirm(value(), name)
  File "../lib/test/test_sax.py", line 435, in test_expat_incomplete
    parser.parse(StringIO(""))
  File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 42, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\code\python\dist\src\lib\xml\sax\xmlreader.py", line 122, in parse
    self.close()
  File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 91, in close
    self.feed("", isFinal = 1)
  File "c:\code\python\dist\src\lib\xml\sax\expatreader.py", line 82, in feed
    except expat.error:
SystemError: 'finally' pops bad exception

Running it from a command line instead produces the same output up to but
not including the traceback, and Python crashes with a memory fault then.
Attaching to the process with a debugger at that point shows it trying to do
_Py_Dealloc on an op whose op->ob_type member is NULL.
Here's the call stack at that point:

_Py_Dealloc(_object * 0x007af100) line 1304 + 6 bytes
insertdict(dictobject * 0x007637ec, _object * 0x007a8270, long -1601350627, _object * 0x1e1eff18 __Py_NoneStruct) line 364 + 48 bytes
PyDict_SetItem(_object * 0x007637ec, _object * 0x007a8270, _object * 0x1e1eff18 __Py_NoneStruct) line 498 + 21 bytes
PyDict_SetItemString(_object * 0x007637ec, char * 0x1e1d84fc, _object * 0x1e1eff18 __Py_NoneStruct) line 1272 + 17 bytes
PySys_SetObject(char * 0x1e1d84fc, _object * 0x1e1eff18 __Py_NoneStruct) line 67 + 17 bytes
reset_exc_info(_ts * 0x00760630) line 2207 + 17 bytes
eval_code2(PyCodeObject * 0x00993df0, _object * 0x0098794c, _object * 0x00000000, _object * * 0x007a9d28, int 2, _object * * 0x007a9d30, int 1, _object * * 0x009a0b60, int 1) line 2125 + 9 bytes
fast_function(_object * 0x009a4f6c, _object * * * 0x0063f5a0, int 4, int 2, int 1) line 2817 + 61 bytes
eval_code2(PyCodeObject * 0x00993910, _object * 0x0098794c, _object * 0x00000000, _object * * 0x007a05e8, int 1, _object * * 0x007a05ec, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes
fast_function(_object * 0x009a549c, _object * * * 0x0063f738, int 1, int 1, int 0) line 2817 + 61 bytes
eval_code2(PyCodeObject * 0x007b35e0, _object * 0x0098110c, _object * 0x00000000, _object * * 0x009beb10, int 2, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes
call_eval_code2(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2765 + 57 bytes
call_object(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2594 + 17 bytes
call_method(_object * 0x0098a97c, _object * 0x009beafc, _object * 0x00000000) line 2717 + 17 bytes
call_object(_object * 0x007e125c, _object * 0x009beafc, _object * 0x00000000) line 2592 + 17 bytes
do_call(_object * 0x007e125c, _object * * * 0x0063f96c, int 2, int 0) line 2915 + 17 bytes
eval_code2(PyCodeObject * 0x00991560, _object * 0x0098794c, _object * 0x00000000, _object * * 0x009bce98, int 2, _object * * 0x009bcea0, int 0, _object * * 0x00000000, int 0) line 1863 + 30 bytes
fast_function(_object * 0x009a7dfc, _object * * * 0x0063fb04, int 2, int 2, int 0) line 2817 + 61 bytes
eval_code2(PyCodeObject * 0x009f7e00, _object * 0x0076f14c, _object * 0x00000000, _object * * 0x00775904, int 0, _object * * 0x00775904, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes
fast_function(_object * 0x009bc8ac, _object * * * 0x0063fc9c, int 0, int 0, int 0) line 2817 + 61 bytes
eval_code2(PyCodeObject * 0x009f86d0, _object * 0x0076f14c, _object * 0x0076f14c, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0, _object * * 0x00000000, int 0) line 1860 + 37 bytes
PyEval_EvalCode(PyCodeObject * 0x009f86d0, _object * 0x0076f14c, _object * 0x0076f14c) line 338 + 29 bytes
run_node(_node * 0x007aa740, char * 0x00760dd9, _object * 0x0076f14c, _object * 0x0076f14c) line 919 + 17 bytes
run_err_node(_node * 0x007aa740, char * 0x00760dd9, _object * 0x0076f14c, _object * 0x0076f14c) line 907 + 21 bytes
PyRun_FileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 257, _object * 0x0076f14c, _object * 0x0076f14c, int 1) line 899 + 21 bytes
PyRun_SimpleFileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 1) line 612 + 30 bytes
PyRun_AnyFileEx(_iobuf * 0x10261888, char * 0x00760dd9, int 1) line 466 + 17 bytes
Py_Main(int 2, char * * 0x00760da0) line 295 + 44 bytes
main(int 2, char * * 0x00760da0) line 10 + 13 bytes

insertdict is doing

    Py_DECREF(old_value);

reset_exc_info is doing

    PySys_SetObject("exc_type", frame->f_exc_type);

Bet that's as helpful to you as it was to me .

From thomas at xs4all.net Sun Jan 21 23:13:02 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sun, 21 Jan 2001 23:13:02 +0100
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: <20010121225405.M17392@xs4all.nl>; from thomas@xs4all.net on Sun, Jan 21, 2001 at 10:54:05PM +0100
References: <200101211651.LAA25346@cj20424-a.reston1.va.home.com>
 <20010121225405.M17392@xs4all.nl>
Message-ID: <20010121231302.N17392@xs4all.nl>

On Sun, Jan 21, 2001 at 10:54:05PM +0100, Thomas Wouters wrote:

> I see an error in the nis module (missing function prototype, and broken
> system-include file) and a *lot* of errors in linuxaudiodev

The errors in linuxaudiodev are only errors because for some reason, in
-ansi -pedantic-errors mode, gcc doesn't define the 'linux' symbol. IMHO,
not worth fixing. The nismodule is 'broken' because of this:

static
nismaplist *
nis_maplist (void)
{
        nisresp_maplist *list;
        char *dom;
        CLIENT *cl, *clnt_create();

clnt_create() should be declared by the system include files. Anyone have
objections to me moving it to pyport.h, inside the '#if 0' ?

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.one at home.com Sun Jan 21 23:28:45 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 17:28:45 -0500
Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects)
In-Reply-To: <20010121225405.M17392@xs4all.nl>
Message-ID:

[Thomas Wouters]
> Why is comparing v->ob_type with w->ob_type illegal ? They're
> both pointers to the same type, aren't they ?

Non-equality comparison of pointers is defined if and only if the pointers
are both addresses in the same contiguous structure (think struct or array);
an exception is made for a pointer "one beyond the end" of an array, i.e.
if

    sometype a[N];

then

    &a[0] < &a[N] == 1

is guaranteed despite that &a[N] is outside the bounds of a; but

    &a[0] < &a[N+1]

is undefined (which *means* undefined! e.g., it's OK if they compare equal,
or if the comparison causes a hardware fault, or ...).

> That's easy to check, gcc has these nice (and from a user's point of view,
> fairly useless) options: '-ansi', '-pedantic' and '-pedantic-errors'.
> '-ansi' disables some GCC-specific features, -pedantic turns gcc into a
> whiney pedantic I'm sure you'd get along with just fine , and
> -pedantic-errors turns those whines into errors.

Your faith in gcc is as charming as it is naive : the most interesting
cases of undefined behavior can't be checked no-way, no-how at compile-time.
That's why Barry keeps talking employers into dumping thousands of dollars
into a single Insure++ license. Insure++ actually tags every pointer at
runtime with its source, and gripes if non-equality comparisons are done on
a pair not derived from the same array or malloc etc. Since Python type
objects are individually allocated (not taken from a preallocated contiguous
vector), Insure++ should complain about that compare.

> ...
> Note that this was tested in a current tree. I couldn't find
> either Guido's 'broken' code or your proposed 'good' code, so I
> don't know if you checked in a fix yet. If you didn't, don't bother,
> it's not broken :-)

Guido hasn't checked it in yet, but gcc isn't smart enough to detect *this*
breakage anyway.

From fredrik at effbot.org Mon Jan 22 00:02:10 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Mon, 22 Jan 2001 00:02:10 +0100
Subject: [Python-Dev] more unicode database changes
Message-ID: <030501c083fe$2fe7dbf0$e46940d5@hagrid>

Just checked in another unicode database patch, which
saves another ~60k. On my Windows box, the Unicode
tables are now about 200k (down from 600k in 2.0).

After this change, Modules/unicodedatabase.[ch] are no
longer used.

Since I'm on a Windows box with MSVC 5.0, I don't really
want to try removing them from the official build files.
Instead, I've checked in empty versions of the files.

Can anyone help me get rid of all references to them from
the build files (and CVS)?

PS. btw, if my changes broke the build somewhere, let me
know asap!

From tim.one at home.com Mon Jan 22 00:07:14 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 21 Jan 2001 18:07:14 -0500
Subject: [Python-Dev] RE: test_sax failing (Windows)
In-Reply-To: <200101212200.XAA16672@pandora.informatik.hu-berlin.de>
Message-ID:

[Martin, on ftp://ftp.jclark.com/pub/xml/expat.zip]
> ...
> That gives some clue, yes. Unfortunately, that URL itself is a symlink
> that was expat1_1.zip (157936 bytes) at some point,

That's the one I've been using.

> and now is expat1_2.zip (153591 bytes).

I'm assuming you're recommending that one! Based on that assumption, I've
downloaded a new one and will put that in the 2.1a1 Windows release. Scream
if that's not what you want.

> ...
> Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either
> works for me. I never tested 1.95.x (which is also not available from
> jclark.com).

If you do and love it, let me know where to get it and I'll ship that
instead.

>> The tests passed in the wee hours (EST; UTC -0500) this morning.
>> They began failing after I updated around 1pm EST today.

> I just merged pyexpat changes from PyXML into Python 2 so that could
> be the cause. However, this very code has been used for some time by
> PyXML users, why it crashes for you is a mystery to me.

Perhaps gc, perhaps uninitialized vars, ..., hard to say. Unfortunately,
it's not unusual for flawed code to display different behavior across
platforms; or, from the long-term QA perspective, it's *great* that flawed
code doesn't always appear to work on all platforms .

> Any chance of producing a C backtrace?
Sent that before; doesn't look like much help; we're seeing a NULL type pointer, but at that stage there's no telling when or where or why it *became* NULL. I'm going to rebuild the world from scratch, and use the new DLLs. You should assume that didn't help unless I say otherwise within 15 minutes. From tim.one at home.com Mon Jan 22 00:09:51 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:09:51 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: [/F] > Just checked in another unicode database patch, which > saves another ~60k. On my Windows box, the Unicode > tables are now about 200k (down from 600k in 2.0). Yay! I take it CNRI wasn't paying you by the byte . > After this change, Modules/unicodedatabase.[ch] are no > longer used. > > Since I'm on a Windows box with MSVC 5.0, I don't really > want to try removing them from the official build files. In- > stead, I've checked in empty versions of the files. That's fine. > Can anyone help me get rid of all references to them from > the build files (and CVS)? > > > > PS. btw, if my changes broke the build somewhere, let me > know asap! I'll take care of the MS project files -- and I was just about to rebuild the world from scratch anyway. From tim.one at home.com Mon Jan 22 00:20:03 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:20:03 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: > After this change, Modules/unicodedatabase.[ch] are no > longer used. Not so: unicodedata.c still #includes unicodedatabase.h. From tim.one at home.com Mon Jan 22 00:53:13 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 18:53:13 -0500 Subject: [Python-Dev] more unicode database changes In-Reply-To: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: [/F] > ... > PS. btw, if my changes broke the build somewhere, let me > know asap! 
The Windows build is fine now and changes checked-in. You can remove Modules/unicodedatabase.[ch] from the project without hurting it (although I imagine the Unixish builds still need to learn about this!). From tim.one at home.com Mon Jan 22 01:12:21 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 19:12:21 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> Message-ID: More FYI: With the new expat1_2.zip (153591 bytes) DLLs, all tests pass on Windows except for test_sax. No change in symptoms. The failure modes for test_sax depend on all of: + Whether run in release or debug builds. + Whether test_sax.py is run directly or via regrtest.py. + Whether I delete all .pyc/.pyo files first, or use precompiled ones. + In debug builds, whether the test is started from within the debugger, or I start it via cmdline and attach to the process after it crashes (with a memory fault). Here's a new failure mode: test test_sax crashed -- XMLParserType: no element found: line 1, column 5 So this smells to high heaven of either a nasty gc problem or referencing uninitialized memory. Symptoms don't change if I stick import gc gc.disable() at the start of test_sax.py. Barry, can you try running test_sax under Insure? I've got little chance of making enough time tonight to figure this out the hard way ... From nas at arctrix.com Sun Jan 21 18:28:52 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 21 Jan 2001 09:28:52 -0800 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 07:12:21PM -0500 References: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> Message-ID: <20010121092852.A24605@glacier.fnational.com> On Sun, Jan 21, 2001 at 07:12:21PM -0500, Tim Peters wrote: > So this smells to high heaven of either a nasty gc problem or referencing > uninitialized memory.
Symptoms don't change if I stick > > import gc > gc.disable() > > at the start of test_sax.py. Can you try it with WITH_CYCLE_GC undefined? Neil From greg at cosc.canterbury.ac.nz Mon Jan 22 01:25:08 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jan 2001 13:25:08 +1300 (NZDT) Subject: [Python-Dev] a>b == b Message-ID: <200101220025.NAA01809@s454.cosc.canterbury.ac.nz> Suppose I have a class which checks whether it knows how to do a comparison, and if not, wants to pass it on to the other operand in case it knows: class Foo: def __lt__(self, other): if I_know_about(other): # do the comparison else: return other.__gt__(self) If the other operand has a __gt__ method which is doing similar tricks, infinite recursion could result. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Jan 22 01:36:51 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jan 2001 13:36:51 +1300 (NZDT) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: <200101191848.NAA02765@cj20424-a.reston1.va.home.com> Message-ID: <200101220036.NAA01813@s454.cosc.canterbury.ac.nz> Guido: > I don't understand how these can be not commutative unless they have a > side effect on the left argument I think he meant "not reflective". If ab == ceil(a,b), then clearly aa. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From mwh21 at cam.ac.uk Mon Jan 22 01:48:16 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 22 Jan 2001 00:48:16 +0000 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Greg Ewing's message of "Mon, 22 Jan 2001 13:36:51 +1300 (NZDT)" References: <200101220036.NAA01813@s454.cosc.canterbury.ac.nz> Message-ID: Greg Ewing writes: > Guido: > > > I don't understand how these can be not commutative unless they have a > > side effect on the left argument > > I think he meant "not reflective". If ab == > ceil(a,b), then clearly aa. What's floor of two arguments? In common lisp, (floor a b) is the largest integer n such that (<= n (/ a b)), in Python it's a type error... if you meant min(a,b), then I then think the programmer who thinks "min(a,b)" is spelt "a Message-ID: <200101220052.NAA01817@s454.cosc.canterbury.ac.nz> > Non-equality comparison of pointers is defined if and only if the pointers > are both addresses in the same contiguous structure I'm not sure that the proposed alternative (casting both pointers to ints and comparing the ints) is any better. Does the C std define the result of doing that to two unrelated pointers? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Mon Jan 22 01:56:16 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 19:56:16 -0500 Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: <20010121092852.A24605@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Can you try it with WITH_CYCLE_GC undefined? Good idea -- for someone with an infinite amount of free time . But being a good sport, I did as you asked with giddy cheer. 
Alas, it didn't help (all the same bizarre context-dependent test_sax failure modes). I'm sure I disabled WITH_CYCLE_GC correctly, because "import gc" now fails with ImportError in both release and debug builds. BTW, a refcount-too-low problem is another good candidate. From greg at cosc.canterbury.ac.nz Mon Jan 22 02:00:46 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jan 2001 14:00:46 +1300 (NZDT) Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: <200101220100.OAA01820@s454.cosc.canterbury.ac.nz> Michael Hudson : > if you meant min(a,b), Yes, sorry, that's what I meant. Or at least that's what I thought the original poster meant - if he didn't, then I'm confused, too! Anyway, I agree that it's a silly thing to want to make a>b mean, and I'm not all that disappointed that it won't be possible. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Mon Jan 22 02:11:52 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 20:11:52 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101220052.NAA01817@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I'm not sure that the proposed alternative (casting both > pointers to ints and comparing the ints) is any better. > Does the C std define the result of doing that to two > unrelated pointers? C99 guarantees that, if the type exists, casting a pointer to type uintptr_t won't blow up, and also guarantees that comparisons between (at least) ints of the same type won't blow up. Beyond that, we don't care what it returns. Mostly we're trying to eliminate warnings Barry has to wade thru from Insure++ -- same reason we have a "no compiler warnings!" build policy. 
Doing the cast is obviously "better" when viewed through Barry's 4AM eyes. You can find out *why* C has this rule (which was in C89, not new in C99) by reading the C FAQ. From tim.one at home.com Mon Jan 22 02:23:27 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 20:23:27 -0500 Subject: [Python-Dev] Rich comparison confusion In-Reply-To: Message-ID: [Michael Hudson] > ... > if you meant min(a,b), then I then think the programmer who > thinks "min(a,b)" is spelt "a deal with (if min has a symbol it's /\, but never mind that). Curiously, in the Icon language, if a is less than b then a < b returns b while b > a returns a. In this way they get the same effect as Python's chained comparisons a < b < c < d via purely binary operators (if a is *not* less than b, a < b in Icon "fails", which is a silent event that causes the expression's context to backtrack -- but we won't go into that here ). Anyway, that accounts for this curious Icon idiom: a <:= b which is short for a := a < b and binds a to max(a, b) (if a is smaller, a < b returns b and the assignment proceeds; but if a is not smaller, a < b fails and that propagates into its context, which here has no other possibilities to backtrack into, so the stmt just ends leaving a alone). "<"-and-">"-are-just-bags-of-pixels-ly y'rs - tim From uche.ogbuji at fourthought.com Mon Jan 22 02:24:46 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Sun, 21 Jan 2001 18:24:46 -0700 Subject: [Python-Dev] should a module's thread safety be documented? In-Reply-To: Message from Guido van Rossum of "Sun, 21 Jan 2001 12:00:02 EST." <200101211700.MAA25479@cj20424-a.reston1.va.home.com> Message-ID: <200101220124.SAA08868@localhost.localdomain> > > A bit late for 2.1alpha1, but it just occurred to me that perhaps there > > should be an annotation in the documentation that indicates whether or not a > > module is thread-safe. 
For example, many functions in fileinput rely on a > > module global called _state. It strikes me that this module is not likely > > to be thread-safe, yet the documentation doesn't appear to mention this, > > certainly not in an obvious fashion. > > > > Anyone for adding \notthreadsafe{} and \threadsafe{} macros to the litany of > > LaTex macros in Fred's arsenal? This would make documenting these > > properties both easy and consistent across modules. > > It's hard to say whether a *whole module* is threadsafe. E.g. in the > fileinput example, there's the clear implication that if you use this > in multiple threads, you should instantiate your own FileInput > instances, and then you're totally thread-safe. Clearly the semantics > of the module-global functions are thread-unsafe though. Perhaps what is needed rather is a prose annotation for thread-safety issues. My TeX is rusty, but in Docbook, with the use of role attributes, one could have, taking your FileInput example The module-global functions are not safe, but if you instantiate your own FileInput instances, they will be totally thread-safe. That way the MT issues could be styled differently on rendering, gathered into separate documentation, stripped by those who don't care, etc. I imagine this is also possible in TeX. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one at home.com Mon Jan 22 02:32:30 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 21 Jan 2001 20:32:30 -0500 Subject: [Python-Dev] a>b == b Message-ID: [Greg Ewing] > Suppose I have a class which checks whether it knows > how to do a comparison, and if not, wants to pass it > on to the other operand in case it knows: > > class Foo: > > def __lt__(self, other): > if I_know_about(other): > # do the comparison > else: > return other.__gt__(self) > > If the other operand has a __gt__ method which is > doing similar tricks, infinite recursion could result. Does this have something to do with comparisons? That is, wouldn't the same be true if you coded two methods named "spam" and "eggs" in this way? whatever = 0 class Foo: def spam(self, other): if whatever: return 1 else: return other.eggs(self) class Bar: def eggs(self, other): if whatever: return 1 else: return other.spam(self) Foo().spam(Bar()) # RuntimeError: Maximum recursion depth exceeded It that's all there is to it, you got what you asked for. From greg at cosc.canterbury.ac.nz Mon Jan 22 04:31:41 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jan 2001 16:31:41 +1300 (NZDT) Subject: [Python-Dev] a>b == b Message-ID: <200101220331.QAA01833@s454.cosc.canterbury.ac.nz> Tim Peters : > Does this have something to do with comparisons? That is, wouldn't the same > be true if you coded two methods named "spam" and "eggs" in this > way? Yes, but Guido hasn't decreed that a.spam(b) and b.eggs(a) are to have a reflective relationship with each other. But don't worry - I've belatedly realised that the correct way to do what I was talking about is to return NotImplemented and let the interpreter take care of calling the reflected method. So I withdraw my objection. 
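The NotImplemented approach described above can be sketched roughly as follows. This is only an illustration, assuming present-day Python 3 semantics (under which an unresolved ordering comparison raises TypeError); the class names Metres and Feet and the conversion factor are hypothetical stand-ins, not anything from the original thread.

```python
class Metres:
    """Hypothetical class that only knows how to compare with other Metres."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, Metres):
            return self.value < other.value
        # Do not call other.__gt__(self) by hand; returning NotImplemented
        # lets the interpreter try the reflected method exactly once.
        return NotImplemented


class Feet:
    """Hypothetical class that knows about both Feet and Metres."""

    FEET_TO_METRES = 0.3048

    def __init__(self, value):
        self.value = value

    def _as_metres(self, other):
        # Convert a supported operand to metres, or return None if unknown.
        if isinstance(other, Feet):
            return other.value * self.FEET_TO_METRES
        if isinstance(other, Metres):
            return other.value
        return None

    def __gt__(self, other):
        theirs = self._as_metres(other)
        if theirs is None:
            return NotImplemented
        return self.value * self.FEET_TO_METRES > theirs


# Metres.__lt__ gives up, so the interpreter calls Feet.__gt__ instead.
print(Metres(1) < Feet(10))
```

Because neither method calls the other directly, two classes that both "pass the buck" cannot recurse forever: once both sides have returned NotImplemented, the interpreter stops and reports the failure itself.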
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Mon Jan 22 08:54:32 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 02:54:32 -0500 Subject: [Python-Dev] Worse news Message-ID: I still don't have a clue about test_sax, but have stumbled into more failure modes. Most of them seem related to the SystemError ("'finally' pops bad exception"). Around that part of ceval.c, sometimes the v popped off the stack has a NULL type pointer, other times it's a pointer to a damaged PyTuple_Type (for example, with a tp_dealloc field of 0x61, which leads to an illegal instruction exception). The MS debug heap routines fill all newly malloc'ed memory with 0xcd ("clean landfill"), fill free'ed memory with 0xdd ("dead landfill"), and *pad* malloc'ed memory with some number of 0xfd bytes on both sides ("no-man's land"). The clean landfill and no-man's land patterns are showing up more often than they should "by chance", and especially in high-order bytes. Just more evidence of the obvious: something is really screwed up . I cannot get the subtest that test_sax is calling (test_expat_incomplete) to fail in isolation. Next headache: If I delete all .pyc files from Lib/ and Lib/test/, and then run: python ../lib/test/regrtest.py -x test_sax by hand, all the 98 tests that *should* run on Windows (excluding, of course, test_sax, which is no longer tried) pass. If I immediately run them again (without deleting .pyc) by hand: python ../lib/test/regrtest.py -x test_sax then they again pass.
However, if I do rt -x test_sax which does exactly the steps (delete .pyc, run regrtest excluding test_sax, run regrtest again) via the little MS batch file rt.bat, then on the second time thru regrtest, and 5 times out of 5, it died in test_extcall with an "illegal operation", while executing if (TYPE(c) == DOUBLESTAR) { near the end of symtable_params in compile.c. This is an optimized build, and the debugger has no idea what's in c at this point; to judge from the offending machine instruction and register contents, though, c is a bad pointer. Have not been able to get test_extcall to fail in isolation. Have also been unable to get test_extcall to fail in the debug build. So there's evidence of Deep Rot beyond test_sax, but test_sax remains the only test that fails every time and under both build types. Running regrtest with -r (randomize test order) is also "interesting": first time I tried that, test_cpickle failed (truncated output) as well as test_sax. I doubt anyone has run the tests more often than me over the last week, so I'm not surprised I'm seeing the most problems. However, since *nobody* is seeing anything on Linux, I'd at least like to get *someone* else to run the tests on Windows. While I'm not having any unusual problems with my box, it's certainly possible that I've got a corrupted file or a flaky memory chip etc, or that MSVC is generating bad code for some recent change (although that's unlikely since the debug build generates *really* straightforward code). Deleting my entire PCbuild subtree and refetching it from CVS didn't make any difference. From esr at thyrsus.com Mon Jan 22 09:01:27 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 03:01:27 -0500 Subject: [Python-Dev] autoconf --enable vs.
--with In-Reply-To: <200101212122.WAA16371@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Sun, Jan 21, 2001 at 10:22:44PM +0100 References: <200101212122.WAA16371@pandora.informatik.hu-berlin.de> Message-ID: <20010122030127.C20804@thyrsus.com> Martin von Loewis : > > It looks like with-cycle-gc and mybe with-pydebug would have to be > > changed. > > I'm in favour of changing it. Likewise. Let's be good neighbors. -- Eric S. Raymond Where rights secured by the Constitution are involved, there can be no rule making or legislation which would abrogate them. -- Miranda vs. Arizona, 384 US 436 p. 491 From loewis at informatik.hu-berlin.de Mon Jan 22 09:26:15 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 22 Jan 2001 09:26:15 +0100 (MET) Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: References: Message-ID: <200101220826.JAA20819@pandora.informatik.hu-berlin.de> > Running it from a command line instead produces the same output up to but > not including the traceback, and Python crashes with a memory fault then. > Attaching to the process with a debugger at that point shows it trying to do > _Py_Dealloc on an op whose op->op_type member is NULL. [...] > Bet that's as helpful to you as it was to me . Well, it was atleast motivating enough to try it out on my Whistler installation. Purify would probably find this rather quickly; the code writes into the 257th element of a 256-elements array. I've committed a fix. Depending on the exact organization of globals, this could have easily gone unnoticed. MSVC packs variables more than gcc does, so the write would overwrite one byte in ErrorObject, which would then not point to a PyObject anymore. 
Thanks for your patience, Martin From tim.one at home.com Mon Jan 22 10:18:04 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 04:18:04 -0500 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <200101220826.JAA20819@pandora.informatik.hu-berlin.de> Message-ID: [Martin] > Well, it was atleast motivating enough to try it out on my Whistler > installation. Purify would probably find this rather quickly; the code > writes into the 257th element of a 256-elements array. Ah! You shouldn't do that . > I've committed a fix. But you should do that. Thank you! Here's where I am now: ========================================================================= All test_sax failures have gone away (yay!). ========================================================================= Running rt -x test_sax on Windows still blows up in test_extcall on the 2nd pass. It does not blow up: using the debug build; or if test_sax is *not* excluded; or in the 1st pass; or when running test_extcall in isolation; or if the steps rt performs are done by hand ========================================================================= Running rt -r on Windows still sees test_cpickle fail in the first pass (with truncated output), but succeed in the second pass. First-pass failure is always like so (modulo line breaks I'm inserting by hand): test test_cpickle failed -- Tail of expected stdout unseen: 'dumps()\012 loads()\012 ok\012 loads() DATA\012 ok\012 dumps() binary\012 loads() binary\012 ok\012 loads() BINDATA\012 ok\012 dumps() RECURSIVE\012 ok\012' I've also seen it fail at least once when doing the same thing by hand: del ..\lib\*.pyc del ..\lib\test\*.pyc python ../lib/test/regrtest.py -r else-i-would-have-asked-martin-to-look-for-a-digit-to-change-in-command.com-ly y'rs - tim From mal at lemburg.com Mon Jan 22 11:19:18 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Mon, 22 Jan 2001 11:19:18 +0100 Subject: [Python-Dev] more unicode database changes References: <030501c083fe$2fe7dbf0$e46940d5@hagrid> Message-ID: <3A6C0926.D0A004E4@lemburg.com> Fredrik Lundh wrote: > > Just checked in another unicode database patch, which > saves another ~60k. On my Windows box, the Unicode > tables are now about 200k (down from 600k in 2.0). Great work, Fredrik :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 22 11:42:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 11:42:52 +0100 Subject: [Python-Dev] readline and setup.py References: <3A68B5B0.771412F7@lemburg.com> Message-ID: <3A6C0EAC.7D322174@lemburg.com> "M.-A. Lemburg" wrote: > > The new setup.py procedure for Python causes readline not to > be built on my machine. Instead I get a linker error telling > me that termcap is not found. > > Looking at my old Setup file, I have this line: > > readline readline.c \ > -I/usr/include/readline -L/usr/lib/termcap \ > -lreadline -lterm > > I guess, setup.py should be modified to include additional > library search paths -- shouldn't hurt on platforms which > don't need them. Here's a patch which works for me: projects/Python> diff CVS-Python/setup.py Dev-Python/ --- CVS-Python/setup.py Mon Jan 22 11:36:56 2001 +++ Dev-Python/setup.py Mon Jan 22 11:40:15 2001 @@ -216,10 +216,11 @@ class PyBuildExt(build_ext): exts.append( Extension('rgbimg', ['rgbimgmodule.c']) ) # readline if (self.compiler.find_library_file(lib_dirs, 'readline')): exts.append( Extension('readline', ['readline.c'], + library_dirs=['/usr/lib/termcap'], libraries=['readline', 'termcap']) ) # The crypt module is now disabled by default because it breaks builds # on many systems (where -lcrypt is needed), e.g. Linux (I believe). 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 22 11:52:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 11:52:17 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> Message-ID: <3A6C10E1.EF890356@lemburg.com> "M.-A. Lemburg" wrote: > > Why does setup.py stop with an error in case _tkinter cannot > be built (due to an old Tk/Tcl version in my case) ? > > I think the policy in setup.py should be to output warnings, > but continue building the rest of the Python modules. I haven't heard anything from the powers to be... what should the policy be for auto-detected and -configured modules ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Mon Jan 22 13:37:04 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 13:37:04 +0100 Subject: [Python-Dev] _tkinter and setup.py In-Reply-To: <3A6C10E1.EF890356@lemburg.com>; from mal@lemburg.com on Mon, Jan 22, 2001 at 11:52:17AM +0100 References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> Message-ID: <20010122133704.O17392@xs4all.nl> On Mon, Jan 22, 2001 at 11:52:17AM +0100, M.-A. Lemburg wrote: > "M.-A. Lemburg" wrote: > > I think the policy in setup.py should be to output warnings, > > but continue building the rest of the Python modules. > I haven't heard anything from the powers to be... what should the > policy be for auto-detected and -configured modules ? I think Andrew is still working on a way to disable modules from the command line somehow. 
(I think moving setup.py to setup.py.in, and using autoconf --options would be easiest on both developer and user, but that's just me.) I also think everyone agrees with you that a module that can't be built shouldn't stop the entire process in the final release (and possibly the betas) but that it's definitely a good way to debug setup.py in the alphas. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tismer at tismer.com Mon Jan 22 14:13:46 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 14:13:46 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: Message-ID: <3A6C320A.37CBB4E5@tismer.com> Maybe I can help. Tim Peters wrote: ... > Here's where I am now: > > ========================================================================= > All test_sax failures have gone away (yay!). > ========================================================================= > Running > > rt -x test_sax > > on Windows still blows up in test_extcall on the 2nd pass. It does not blow > up: > > using the debug build; or > if test_sax is *not* excluded; or > in the 1st pass; or > when running test_extcall in isolation; or > if the steps rt performs are done by hand ... I got problems with XML as well. I'm not using SAX, but plain expat for speed. The following error happens after parsing thousands of small XML files: from_my_log_window=""" \\bned-s1\tismer\pxml\sdf\mdl\DisplayRGB\1 \\bned-s1\tismer\pxml\sdf\mdl\DisplayVideo\1 Traceback (innermost last): File "", line 1, in ?
File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 151, in getall getall(here, res) File "D:\crml_doc\pxml\clean.py", line 149, in getall res.append(p.parse()) File "D:\crml_doc\pxml\clean.py", line 81, in parse self.parsers[0].Parse(self.txt1, 1) File "D:\crml_doc\pxml\clean.py", line 53, in endElementMaster if self.txt2: self.parsers[1].Parse(self.txt2, 1) File "D:\crml_doc\pxml\clean.py", line 46, in startElementOther if name <> "MASTER": UnicodeError: UTF-8 decoding error: invalid data """ The good news: The error is reproducible, happens the same under PythonWin and DOS Python, and I can reduce it to a single XML file. That indicates to me that I am near the reason of the bug, not at late, indirect effects. It also *might* be related to Unicode. I will now try to create a minimized script and XML data that produces the above again. back in an hour - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From thomas at xs4all.net Mon Jan 22 14:52:44 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 14:52:44 +0100 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: ; from tim.one@home.com on Sun, Jan 21, 2001 at 05:28:45PM -0500 References: <20010121225405.M17392@xs4all.nl> Message-ID: <20010122145244.Y17295@xs4all.nl> On Sun, Jan 21, 2001 at 05:28:45PM -0500, Tim Peters wrote: > [Thomas Wouters] > > Why is comparing v->ob_type with w->ob_type illegal ? They're > > both pointers to the same type, aren't they ? 
> Non-equality comparison of pointers is defined if and only if the pointers > are both addresses in the same contiguous structure (think struct or array); > an exception is made for a pointer "one beyond the end" of an array, i.e. if > sometype a[N]; > then &a[0] < &a[N] == 1 is guaranteed despite that &a[N] is outside the > bounds of a; but &a[0] < &a[N+1] is undefined (which *means* undefined! > e.g., it's OK if they compare equal, or if the comparison causes a hardware > fault, or ...). Ok, I guess I stand corrected. I was confused by the name of Py_uintptr_t: I thought it was a pointer-to-int, not an int large enough to hold a pointer. I'm also positively appalled by the fact the standard refuses to define sane behaviour for out-of-bounds access on an array, but attaches some weird significance to what pointers are pointing *to*, when comparing the values of those pointers, regardless of what type of object they are stored in. But I guess I don't have to whine about that to you, Tim :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tismer at tismer.com Mon Jan 22 15:03:25 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 15:03:25 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> Message-ID: <3A6C3DAD.522CE623@tismer.com> Christian Tismer wrote: > > Maybe I can help. ... ... > I will now try to create a minimized script and XML data that > produces the above again. > > back in an hour - chris Here we go. The following session produces the mentioned UTF8 error: >>> txt = "" >>> def startelt(name, dic): ... print name, dic ... >>> p=expat.ParserCreate() >>> p.StartElementHandler = startelt >>> p.Parse(txt) Traceback (innermost last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data Behavior depends of the ASCII code. 
From jeremy at alum.mit.edu Mon Jan 22 15:19:34 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 09:19:34 -0500 (EST) Subject: [Python-Dev] Worse news In-Reply-To: References: Message-ID: <14956.16758.68050.257212@localhost.localdomain> Tim, Funny (strange or haha?) that test_extcall is failing since the two pieces of code I've modified most recently are compile.c and the section of ceval.c that handles extended call syntax. I just got through my mail this morning and I'll see what I can reproduce on Linux. As for the test_sax failure, is any of the Python code being executed conditional on platform? The compiler may be generating bad bytecode for a code path that is only executed on Windows. Jeremy From mal at lemburg.com Mon Jan 22 15:27:38 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 15:27:38 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> Message-ID: <3A6C4359.BCB06252@lemburg.com> Christian Tismer wrote: > > Christian Tismer wrote: > > > > Maybe I can help. > > ... > > ... > > I will now try to create a minimized script and XML data that > > produces the above again. > > > > back in an hour - chris > > Here we go. > The following session produces the mentioned UTF8 error: > > >>> txt = "" > >>> def startelt(name, dic): > ... print name, dic > ... > >>> p=expat.ParserCreate() > >>> p.StartElementHandler = startelt > >>> p.Parse(txt) > Traceback (innermost last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > Behavior depends on the ASCII code. > From code 128 (0200) to 191 (0277) the parser gives a > not well-formed exception, as it should be. > > The codes from 192 to 236, 238-243 produce > "UTF-8 decoding error: invalid data", > the rest gives "not well-formed". > > I would like to know if this happens with your (Tim) modified > version as well.
I'm using plain vanilla BeOpen Python 2.0 . This has nothing to do with Python. UTF-8 marks the codes from 128-191 as illegal prefix. See Object/unicodeobject.c: static char utf8_code_length[256] = { /* Map UTF-8 encoded prefix byte to sequence length. zero means illegal prefix. see RFC 2279 for details */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 0, 0 }; Perhaps the parser should catch the UnicodeError and instead return a not-wellformed exception ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jan 22 15:38:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 15:38:14 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> Message-ID: <3A6C45D5.9A6FA25C@lemburg.com> Thomas Wouters wrote: > > On Mon, Jan 22, 2001 at 11:52:17AM +0100, M.-A. Lemburg wrote: > > "M.-A. Lemburg" wrote: > > > > I think the policy in setup.py should be to output warnings, > > > but continue building the rest of the Python modules. 
> > > I haven't heard anything from the powers to be... what should the > > policy be for auto-detected and -configured modules ? > > I think Andrew is still working on a way to disable modules from the command > line somehow. (I think moving setup.py to setup.py.in, and using autoconf > --options would be easiest on both developer and user, but that's just me.) This is fairly simple to do: distutils allows great flexibility when it comes to adding user options, e.g. we could have python setup.py --enable-tkinter --disable-readline or more generic python setup.py --enable-package tkinter --disable-package readline The options could then be edited in setup.cfg. > I also think everyone agrees with you that a module that can't be build > shouldn't stop the entire process in the final release (and possibly the > betas) but that it's definately a good way to debug setup.py in the alphas. True... but currently the only way to get Python to compile is to hand-edit setup.py and this is not easy for people with no prior distutils experience. BTW, in my case, setup.py did find the TK-libs for 8.0, but for a beta version -- as a result, _tkinter.c's version #error line triggered and the build failed. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Mon Jan 22 15:38:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 09:38:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,NONE,1.1 In-Reply-To: Your message of "Sun, 21 Jan 2001 14:37:57 +0200." 
<20010121123757.D897BA83E@darjeeling.zadka.site.co.il> References: <20010121123757.D897BA83E@darjeeling.zadka.site.co.il> Message-ID: <200101221438.JAA29303@cj20424-a.reston1.va.home.com> > Wouldn't it be better to use the > > d = {} > exec "foo", d Surely you meant exec "foo" in d --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Mon Jan 22 15:43:42 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 15:43:42 +0100 Subject: [Python-Dev] _tkinter and setup.py In-Reply-To: <3A6C45D5.9A6FA25C@lemburg.com>; from mal@lemburg.com on Mon, Jan 22, 2001 at 03:38:14PM +0100 References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> <3A6C45D5.9A6FA25C@lemburg.com> Message-ID: <20010122154342.B17295@xs4all.nl> On Mon, Jan 22, 2001 at 03:38:14PM +0100, M.-A. Lemburg wrote: > > I think Andrew is still working on a way to disable modules from the command > > line somehow. (I think moving setup.py to setup.py.in, and using autoconf > > --options would be easiest on both developer and user, but that's just me.) > This is fairly simple to do: distutils allows great flexibility > when it comes to adding user options, e.g. we could have > > python setup.py --enable-tkinter --disable-readline > > or more generic > > python setup.py --enable-package tkinter --disable-package readline > > The options could then be edited in setup.cfg. Note that the 'user' only has 'configure' and 'make' to run, so optimally, the options would have to be given to one of those (preferably to 'configure', to keep it similar to 90% of the packages out there.) > but currently the only way to get Python to compile is > to hand-edit setup.py and this is not easy for people with no > prior distutils experience. You only have to edit the 'disabled_module_list' variable... not too hard even if you don't have distutils experience (though you do need some python experience.) 
I don't think it's wrong to expect people who compile alpha versions to have at least that much knowledge (though it should be noted in the README somewhere.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis at informatik.hu-berlin.de Mon Jan 22 15:46:39 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 22 Jan 2001 15:46:39 +0100 (MET) Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <3A6C4359.BCB06252@lemburg.com> (mal@lemburg.com) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> <3A6C4359.BCB06252@lemburg.com> Message-ID: <200101221446.PAA05164@pandora.informatik.hu-berlin.de> > This has nothing to do with Python. UTF-8 marks the codes > from 128-191 as illegal prefix. [...] > Perhaps the parser should catch the UnicodeError and > instead return a not-wellformed exception ?! Right on both counts. If no encoding is specified, and if the document appears not to be UTF-16 in any endianness, an XML processor shall assume it is UTF-8. As Marc-Andre explains, your document is not proper UTF-8, hence the error. The confusing thing is that expat itself does not care about it not being UTF-8; that is only detected when the callback is invoked in pyexpat, and therefore conversion to a Unicode object is attempted. The right solution probably would be to change expat so that it determines correctness of the encoding for each string it gets as part of the wellformedness analysis, and produces illformedness exceptions when an encoding error occurs. Patches are welcome, although they probably should go to sourceforge.net/projects/expat.
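To make the byte ranges discussed in this thread concrete, here is a minimal Python sketch of the lead-byte classification from RFC 2279, which is what the utf8_code_length table quoted earlier in the thread encodes. The function name utf8_sequence_length is made up for illustration; it is not part of Python, pyexpat, or expat. Note also that current UTF-8 (RFC 3629) no longer permits the 5- and 6-byte forms.

```python
def utf8_sequence_length(lead_byte):
    """Return the sequence length announced by a UTF-8 lead byte.

    Follows the RFC 2279 layout; 0 means the byte cannot start a sequence.
    (Hypothetical helper, written only to illustrate the table.)
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must be in range(256)")
    if lead_byte < 0x80:
        return 1  # 0-127: plain ASCII, one byte
    if lead_byte < 0xC0:
        return 0  # 128-191: continuation bytes, illegal as a prefix
    if lead_byte < 0xE0:
        return 2  # 192-223
    if lead_byte < 0xF0:
        return 3  # 224-239
    if lead_byte < 0xF8:
        return 4  # 240-247
    if lead_byte < 0xFC:
        return 5  # 248-251 (RFC 2279 only)
    if lead_byte < 0xFE:
        return 6  # 252-253 (RFC 2279 only)
    return 0      # 254-255: never valid in UTF-8


# Build the full 256-entry table, analogous to the C array in the thread.
table = [utf8_sequence_length(b) for b in range(256)]
```

Under this classification a byte from 128 to 191 can only appear as a continuation byte, while a byte from 192 upward announces a multi-byte sequence, so a lone high byte followed by ASCII is invalid data either way.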
Regards, Martin From jack at oratrix.nl Mon Jan 22 15:57:33 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 22 Jan 2001 15:57:33 +0100 Subject: [Python-Dev] test_sax and site-python Message-ID: <20010122145733.85E51373C95@snelboot.oratrix.nl> I'm not sure whether this is really a bug, but I had the problem that there was something wrong with the xml package I had installed into my Lib/site-python, and this caused test_sax to complain. If the test stuff is expected to test only the core functionality maybe sys.path should be edited so that it only contains directories that are part of the core distribution? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From tismer at tismer.com Mon Jan 22 16:05:24 2001 From: tismer at tismer.com (Christian Tismer) Date: Mon, 22 Jan 2001 16:05:24 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <3A6C320A.37CBB4E5@tismer.com> <3A6C3DAD.522CE623@tismer.com> <3A6C4359.BCB06252@lemburg.com> Message-ID: <3A6C4C34.4D1252C9@tismer.com> "M.-A. Lemburg" wrote: ... > > The codes from 192 to 236, 238-243 produce > > "UTF-8 decoding error: invalid data", > > the rest gives "not well-formed". > > > > I would like to know if this happens with your (Tim) modified > > version as well. I'm using plain vanilla BeOpen Python 2.0 . > > This has nothing to do with Python. UTF-8 marks the codes > from 128-191 as illegal prefix. See Object/unicodeobject.c: ... Schade. > Perhaps the parser should catch the UnicodeError and > instead return a not-wellformed exception ?! I belive it would be better. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at digicool.com Mon Jan 22 16:06:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:06:06 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.24,2.25 In-Reply-To: Your message of "Sun, 21 Jan 2001 15:34:14 PST." References: Message-ID: <200101221506.KAA29773@cj20424-a.reston1.va.home.com> > Move declaration of 'clnt_create()' NIS function to pyport.h, as it's > supposed to be declared in system include files (with a proper prototype.) > Should be moved to a platform-specific block if anyone finds out which > broken platforms need it :-) [The following is inside #if 0] > + /* From Modules/nismodule.c */ > + CLIENT *clnt_create(); > + Thomas, I'm not sure if this particular declaration belongs in pyport.h, even inside #if 0. CLIENT is declared in a NIS-specific header file that's not included by pyport.h, but which *is* included by nismodule.c. I think you did the right thing to nismodule.c; the pyport.h patch is redundant in my eyes. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jan 22 16:12:49 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 16:12:49 +0100 Subject: [Python-Dev] _tkinter and setup.py References: <3A68B6BD.BAD038D6@lemburg.com> <3A6C10E1.EF890356@lemburg.com> <20010122133704.O17392@xs4all.nl> <3A6C45D5.9A6FA25C@lemburg.com> <20010122154342.B17295@xs4all.nl> Message-ID: <3A6C4DF1.F71AA631@lemburg.com> Thomas Wouters wrote: > > On Mon, Jan 22, 2001 at 03:38:14PM +0100, M.-A. Lemburg wrote: > > > > I think Andrew is still working on a way to disable modules from the command > > > line somehow. 
(I think moving setup.py to setup.py.in, and using autoconf > > > --options would be easiest on both developer and user, but that's just me.) > > > This is fairly simple to do: distutils allows great flexibility > > when it comes to adding user options, e.g. we could have > > > > python setup.py --enable-tkinter --disable-readline > > > > or more generic > > > > python setup.py --enable-package tkinter --disable-package readline > > > > The options could then be edited in setup.cfg. > > Note that the 'user' only has 'configure' and 'make' to run, so optimally, > the options would have to be given to one of those (preferably to > 'configure', to keep it similar to 90% of the packages out there.) Hmm, but then you'll have to hack autoconf again... (even if only to pass the options to setup.py somehow, e.g. via your proposed setup.cfg.in trick). > > but currently the only way to get Python to compile is > > to hand-edit setup.py and this is not easy for people with no > > prior distutils experience. > > You only have to edit the 'disabled_module_list' variable... not too hard > even if you don't have distutils experience (though you do need some python > experience.) I don't think its wrong to expect people who compile alpha > versions to have at least that much knowledge (though it should be noted in > the README somewhere.) Oops, you're right; must have overlooked that one in setup.py. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From thomas at xs4all.net Mon Jan 22 16:14:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 16:14:02 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.24,2.25 In-Reply-To: <200101221506.KAA29773@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 10:06:06AM -0500 References: <200101221506.KAA29773@cj20424-a.reston1.va.home.com> Message-ID: <20010122161402.D17295@xs4all.nl> On Mon, Jan 22, 2001 at 10:06:06AM -0500, Guido van Rossum wrote: > > Move declaration of 'clnt_create()' NIS function to pyport.h, as it's > > supposed to be declared in system include files (with a proper prototype.) > > Should be moved to a platform-specific block if anyone finds out which > > broken platforms need it :-) > > [The following is inside #if 0] > > + /* From Modules/nismodule.c */ > > + CLIENT *clnt_create(); > > + > > Thomas, I'm not sure if this particular declaration belongs in > pyport.h, even inside #if 0. > > CLIENT is declared in a NIS-specific header file that's not included by > pyport.h, but which *is* included by nismodule.c. > > I think you did the right thing to nismodule.c; the pyport.h patch is > redundant in my eyes. The same goes for most prototypes inside that '#if 0'. I see it more as an easy list to see what prototypes were removed than as proper examples of the prototype. You're right about CLIENT being defined in system-specific include files, I just wasn't worried about it because it was inside an '#if 0' that will never be turned into an '#if 1'. If a specific platform needs that prototype, we'll figure out how to arrange the prototype then :) But if you want me to remove it, that's fine. -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread! From guido at digicool.com Mon Jan 22 16:22:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:22:29 -0500 Subject: [Python-Dev] autoconf --enable vs. --with In-Reply-To: Your message of "Mon, 22 Jan 2001 03:01:27 EST." <20010122030127.C20804@thyrsus.com> References: <200101212122.WAA16371@pandora.informatik.hu-berlin.de> <20010122030127.C20804@thyrsus.com> Message-ID: <200101221522.KAA30287@cj20424-a.reston1.va.home.com> > I've been working a bit on the build process lately. I came > across this in the autoconf documentation: > > > If a software package has optional compile-time features, the > user can give `configure' command line options to specify > whether to compile them. The options have one of these forms: > > --enable-FEATURE[=ARG] > --disable-FEATURE > > Some packages require, or can optionally use, other software > packages which are already installed. The user can give > `configure' command line options to specify which such > external software to use. The options have one of these > forms: > > --with-package[=ARG] > --without-package > > > Is it worth fixing the Python configure script to comply with > these definitions? It looks like with-cycle-gc and maybe > with-pydebug would have to be changed. OK, but please add explicit checks for the old --with[out]-cycle-gc and --with[out]-pydebug flags that cause errors (not just warnings) when these forms are used. It's bad enough that configure doesn't flag typos in such options as errors; if we change the option names, we really owe users who were using the old forms a clear error. (Is this stupid autoconf behavior changeable? Does it also apply to enable/disable?) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Jan 22 16:19:49 2001 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 22 Jan 2001 10:19:49 -0500 (EST) Subject: [Python-Dev] RE: test_sax failing (Windows) In-Reply-To: References: <200101212200.XAA16672@pandora.informatik.hu-berlin.de> Message-ID: <14956.20373.104748.573294@cj42289-a.reston1.va.home.com> [Martin, on ftp://ftp.jclark.com/pub/xml/expat.zip] > Anyway, I was using 1.1 in my own tests, and 1.2 in PyXML - either > works for me. I never tested 1.95.x (which is also not available from > jclark.com). Tim Peters writes: > If you do and love it, let me know where to get it and I'll ship that > instead. I'll recommend not updating to 1.95.1; let's wait at least until 1.95.2 is out. These are really just pre-2.0 releases to shake things out. I have been using the current Expat CVS lightly, but need to do more testing before I can be confident in it and our bindings (not yet checked in anywhere; should be in PyXML soon). -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jeremy at alum.mit.edu Mon Jan 22 16:44:41 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 10:44:41 -0500 (EST) Subject: [Python-Dev] Worse news In-Reply-To: References: Message-ID: <14956.21865.943601.735426@localhost.localdomain> On Linux, I am also seeing test_cpickle failures. I have not been able to reproduce failures in test_extcall or test_sax. I ran 'regrtest.py -r -x test_thread test_unicodedata test_signal test_select test_poll' 10 times and test_cpickle failed five times. (I did the peculiar run because excluding those five tests shaves two minutes off the running time of the test suite.) No more time to look into this... Jeremy From jeremy at alum.mit.edu Mon Jan 22 16:26:27 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 10:26:27 -0500 (EST) Subject: [Python-Dev] getcode() function in pyexpat.c Message-ID: <14956.20771.447958.389724@localhost.localdomain> The pyexpat module uses functions named getcode() and call_with_frame() for handlers of some sort.
I can make this much out from the code, but the rest is a bit of a mystery. I was trying to read this code because of the errors Tim is seeing with test_sax on Windows. A few comments to explain this highly stylized and macro-laden code would be appreciated. The module appears to be creating empty code objects and calling them. I say they appear to be empty, because when they are created they don't appear to have anything initialized except name, filename, and firstlineno. getcode(EndNamespaceDecl, 419) (The freevars and cellvars entries are part of the support for nested scopes. They can be safely ignored for the moment.) I simply don't understand what's going on -- and I'm deeply suspicious that it is the source of whatever problems Tim is seeing with test_sax. Jeremy From thomas at xs4all.net Mon Jan 22 16:55:35 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 16:55:35 +0100 Subject: [Python-Dev] 'make distclean' broken. Message-ID: <20010122165535.P17392@xs4all.nl> 'make distclean' seems broken, at least on non-GNU make's: [snip] clobbering subdirectory Modules rm -f *.o python core *~ [@,#]* *.old *.orig *.rej rm -f add2lib hassignal rm -f *.a tags TAGS config.c Makefile.pre rm -f *.so *.sl so_locations make -f ./Makefile.in SUBDIRS="Include Lib Misc Demo" clobber "./Makefile.in", line 134: Need an operator make: fatal errors encountered -- cannot continue *** Error code 1 (ignored) rm -f config.status config.log config.cache config.h Makefile rm -f buildno platform rm -f Modules/Makefile [snip] (This is using FreeBSD's 'make'.) Looking at line 134, I'm not sure why it works with GNU make other than that it avoids complaining about syntax errors it doesn't run into (which could be both bad and good :) or that it avoids complaining about obvious GNU autoconf tricks. But I don't know enough about make to say for sure, nor to fix the above problem. -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread! From guido at digicool.com Mon Jan 22 16:55:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 10:55:42 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Your message of "Sun, 21 Jan 2001 17:28:45 EST." References: Message-ID: <200101221555.KAA30935@cj20424-a.reston1.va.home.com> > Your faith in gcc is as charming as it is naive : the most > interesting cases of undefined behavior can't be checked no-way, no-how at > compile-time. That's why Barry keeps talking employers into dumping > thousands of dollars into a single Insure++ license. Insure++ actually tags > every pointer at runtime with its source, and gripes if non-equality > comparisons are done on a pair not derived from the same array or malloc > etc. Since Python type objects are individually allocated (not taken from a > preallocated contiguous vector), Insure++ should complain about that > compare. IMHO, *this* *particular* gripe of Insure++ is just a pain in the butt, and I wish there was a way to turn it off in Insure++ without having to fix the code. IMHO, this was included in the standard to allow segmented-memory implementations of C. Think certain DOS or Windows 3.1 memory models where a pointer is a segment plus an offset. This is not current practice even on Palmpilots! The standard may say that such comparisons are undefined, but I don't care about this particular undefinedness, and I'm annoyed by the required patches. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 22 17:02:15 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 11:02:15 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: Your message of "Sun, 21 Jan 2001 14:44:38 EST." 
References: Message-ID: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> > > My only concern is that under the old scheme, two different numeric > > extension types that somehow can't be compared will end up being > > *equal*. To fix this, I propose that if the names compare equal, as a > > last resort we compare the type pointers -- this should be consistent > > too. > > Agreed, and sounds fine! Checked in now. While fixing the test_b1 code again, which depends on this behavior, I thought of a refinement: it wouldn't be hard to make None compare smaller than *anything* (including numbers). Is this worth it?

diff -c -r2.113 object.c
*** object.c	2001/01/22 15:59:32	2.113
--- object.c	2001/01/22 16:03:38
***************
*** 550,555 ****
--- 550,561 ----
  		PyErr_Clear();
  	}
  
+ 	/* None is smaller than anything */
+ 	if (v == Py_None)
+ 		return -1;
+ 	if (w == Py_None)
+ 		return 1;
+ 
  	/* different type: compare type names */
  	if (v->ob_type->tp_as_number)
  		vname = "";

--Guido van Rossum (home page: http://www.python.org/~guido/) From mwh21 at cam.ac.uk Mon Jan 22 17:12:47 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: Mon, 22 Jan 2001 16:12:47 +0000 (GMT) Subject: [Python-Dev] Worse news In-Reply-To: <14956.21865.943601.735426@localhost.localdomain> Message-ID: On Mon, 22 Jan 2001, Jeremy Hylton wrote: > On Linux, I am also seeing test_cpickle failures. I have not been > able to reproduce failures in test_extcall or test_sax. Hmm - my machine's done 28 exemplary "make clean; make test" runs this morning. I last updated yesterday afternoon my time (~1700 GMT). Of course, I don't build pyexpat... > No more time to look into this... Don't you just love memory corruption bugs? Cheers, M. From akuchlin at mems-exchange.org Mon Jan 22 17:28:59 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Mon, 22 Jan 2001 11:28:59 -0500 Subject: [Python-Dev] Python 2.1 article Message-ID: I've put together an almost-complete first draft of a "What's New in 2.1" article.
The only missing piece is a section on the Nested Scopes PEP, which obviously has to wait for the changes to get checked in. http://www.amk.ca/python/2.1/ ; as usual, nitpicking comments are welcomed. --amk From nas at arctrix.com Mon Jan 22 11:00:43 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 02:00:43 -0800 Subject: [Python-Dev] Worse news In-Reply-To: ; from mwh21@cam.ac.uk on Mon, Jan 22, 2001 at 04:12:47PM +0000 References: <14956.21865.943601.735426@localhost.localdomain> Message-ID: <20010122020043.A25687@glacier.fnational.com> On Mon, Jan 22, 2001 at 04:12:47PM +0000, Michael Hudson wrote: > Don't you just love memory corruption bugs? Great fun. I've played around with efence and debauch on the weekend. I even went as far as merging an updated fmalloc from the XFree source tree into debauch and writing a reporting script in Python. I probably would have caught the pyexpat overrun if I had used efence with EF_ALIGNMENT=0 and compiled with -fpack-struct. I'll have to try it tonight. Maybe something else will turn up. Neil From guido at digicool.com Mon Jan 22 18:12:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 12:12:29 -0500 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: Your message of "Mon, 22 Jan 2001 16:55:35 +0100."
<20010122165535.P17392@xs4all.nl> References: <20010122165535.P17392@xs4all.nl> Message-ID: <200101221712.MAA00694@cj20424-a.reston1.va.home.com> > 'make distclean' seems broken, at least on non-GNU make's: > > [snip] > clobbering subdirectory Modules > rm -f *.o python core *~ [@,#]* *.old *.orig *.rej > rm -f add2lib hassignal > rm -f *.a tags TAGS config.c Makefile.pre > rm -f *.so *.sl so_locations > make -f ./Makefile.in SUBDIRS="Include Lib Misc Demo" clobber > "./Makefile.in", line 134: Need an operator > make: fatal errors encountered -- cannot continue > *** Error code 1 (ignored) > rm -f config.status config.log config.cache config.h Makefile > rm -f buildno platform > rm -f Modules/Makefile > [snip] > > (This is using FreeBSD's 'make'.) > > Looking at line 134, I'm not sure why it works with GNU make other than that > it avoids complaining about syntax errors it doesn't run into (which could > be both bad and good :) or that it avoids complaining about obvious GNU > autoconf tricks. But I don't know enough about make to say for sure, nor to > fix the above problem. There's one line in Makefile.in that trips over Make (mine also complains about it): @SET_DLLLIBRARY@ Looking at the code in configure.in that generates this macro: AC_SUBST(SET_DLLLIBRARY) LDLIBRARY='' SET_DLLLIBRARY='' . . (and later) . cygwin*) LDLIBRARY='libpython$(VERSION).dll.a' SET_DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))' ;; I don't see why we couldn't change this so that Makefile.in just contains DLLLIBRARY= @DLLLIBRARY@ and then configure.in could be changed to AC_SUBST(DLLLIBRARY) LDLIBRARY='' DLLLIBRARY='' . . (and later) . cygwin*) LDLIBRARY='libpython$(VERSION).dll.a' DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))' ;; Or am I missing something? Does this fix the problem? --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Mon Jan 22 18:21:09 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Mon, 22 Jan 2001 12:21:09 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221602.LAA31103@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 11:02:15AM -0500 References: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> Message-ID: <20010122122109.A14952@thyrsus.com> Guido van Rossum : > While fixing the test_b1 code again, which depends on this behavior, I > thought of a refinement: it wouldn't be hard to make None compare > smaller than *anything* (including numbers). > > Is this worth it? I think so, if only for the sake of well-definedness. -- Eric S. Raymond "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- Benjamin Franklin, Historical Review of Pennsylvania, 1759. From thomas at xs4all.net Mon Jan 22 18:25:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 18:25:30 +0100 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: <200101221712.MAA00694@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 12:12:29PM -0500 References: <20010122165535.P17392@xs4all.nl> <200101221712.MAA00694@cj20424-a.reston1.va.home.com> Message-ID: <20010122182530.E17295@xs4all.nl> On Mon, Jan 22, 2001 at 12:12:29PM -0500, Guido van Rossum wrote: > and then configure.in could be changed to > AC_SUBST(DLLLIBRARY) > LDLIBRARY='' > DLLLIBRARY='' > . > . (and later) > . > cygwin*) > LDLIBRARY='libpython$(VERSION).dll.a' > DLLLIBRARY='DLLLIBRARY= $(basename $(LDLIBRARY))' > ;; You mean DLLLIBRARY='$(basename $(LDLIBRARY))' But yes, that fixes it. > Or am I missing something? Well, on *that* I'm not sure, that's why I asked :P If things in the Python source boggle me, they are always there for a good reason. Well, maybe just 'almost always', but practically always :) -- Thomas Wouters Hi! I'm a .signature virus! 
copy me into your .signature file to help me spread! From nas at arctrix.com Mon Jan 22 11:39:59 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 02:39:59 -0800 Subject: [Python-Dev] 'make distclean' broken. In-Reply-To: <200101221712.MAA00694@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 12:12:29PM -0500 References: <20010122165535.P17392@xs4all.nl> <200101221712.MAA00694@cj20424-a.reston1.va.home.com> Message-ID: <20010122023959.A25798@glacier.fnational.com> [Guido on change SET_DLLLIBRARY] > Or am I missing something? I don't think so. My new Makefile uses "FOO = @FOO@" everywhere. SET_CXX is the same way in the current Makefile. Neil From esr at thyrsus.com Mon Jan 22 18:41:59 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 12:41:59 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? Message-ID: <20010122124159.A14999@thyrsus.com> \section{\module{set} --- Basic set algebra for Python} \declaremodule{standard}{set} \modulesynopsis{Basic set algebra operations on sequences.} \moduleauthor{Eric S. Raymond}{esr at thyrsus.com} \sectionauthor{Eric S. Raymond}{esr at thyrsus.com} The \module{set} module defines functions for treating lists and other sequences as mathematical sets, and defines a set class that uses these operations natively and overloads Python's standard operator set. The \module{set} functions work on any sequence type and return lists. The set methods can take a set or any sequence type as an argument. Set or sequence elements may be of any type and may be mutable. Comparisons and membership tests of elements against sequence objects are done using \keyword{in}, and so can be customized by supplying a suitable \method{__getattr__} method for the sequence type. The running time of these functions is O(n**2) in the worst case unless otherwise noted. For cases that can be short-circuited by cardinality comparisons, this has been done. 
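As a rough illustration of how list-based set operations of the kind this section documents can work using nothing but membership tests, here is a minimal Python sketch. It is not the module's actual source: only the function names are taken from the documentation, the bodies are an assumption, and keeping elements in first-seen order is a choice made here rather than something the documentation promises.

```python
def setify(seq):
    """Return a list of seq's elements with duplicates removed.

    Uses only 'in' on a list, so elements may be unhashable (e.g. lists);
    that is also what makes the worst case O(n**2).
    """
    result = []
    for item in seq:
        if item not in result:
            result.append(item)
    return result


def union(seq1, seq2):
    """All elements of either sequence, without duplicates."""
    return setify(list(seq1) + list(seq2))


def intersection(seq1, seq2):
    """Elements common to both sequences."""
    return [item for item in setify(seq1) if item in seq2]


def difference(seq1, seq2):
    """Elements of seq1 that are not present in seq2."""
    return [item for item in setify(seq1) if item not in seq2]


def symmetric_difference(seq1, seq2):
    """Elements present in exactly one of the two sequences."""
    return difference(seq1, seq2) + difference(seq2, seq1)
```

Because nothing here requires hashing, a call such as setify([[1], [1], [2]]) works on mutable elements, at the cost of the quadratic running time mentioned above.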
\begin{funcdesc}{setify}{list1}
Returns a list of the argument sequence's elements with duplicates removed.
\end{funcdesc}

\begin{funcdesc}{union}{list1, list2}
Set union. All elements of both sets or sequences are returned.
\end{funcdesc}

\begin{funcdesc}{intersection}{list1, list2}
Set intersection. All elements common to both sets or sequences are returned.
\end{funcdesc}

\begin{funcdesc}{difference}{list1, list2}
Set difference. All elements of the first set or sequence not present in the second are returned.
\end{funcdesc}

\begin{funcdesc}{symmetric_difference}{list1, list2}
Set symmetric difference. All elements present in one sequence or the other but not in both are returned.
\end{funcdesc}

\begin{funcdesc}{cartesian}{list1, list2}
Returns a list of tuples consisting of all possible pairs of elements from the first and second sequences or sets.
\end{funcdesc}

\begin{funcdesc}{equality}{list1, list2}
Set comparison. Return 1 if the two sets or sequences contain exactly the same elements, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{subset}{list1, list2}
Set subset test. Return 1 if all elements of the first set or sequence are members of the second, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{proper_subset}{list1, list2}
Set subset test, excluding equality. Return 1 if the arguments fail a set equality test, and all elements of the first set or sequence are members of the second, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{powerset}{list1}
Return the set of all subsets of the argument set or sequence. Warning: this produces huge results from small arguments and is O(2**n) in both running time and space requirements; you can readily run yourself out of memory using it.
\end{funcdesc}

\subsection{set Objects \label{set-objects}}

A \class{set} instance uses the \module{set} module functions to implement set semantics on the list it contains, and to support a full set of Python list methods and operators.
Thus, the set methods can take a set or any sequence type as an argument. A set object contains a single data member: \begin{memberdesc}{elements} List containing the elements of the set. \end{memberdesc} Set objects can be treated as mutable sequences; they support the special methods \method{__len__}, \method{__getitem__}, \method{__setitem__}, and \method{__delitem__}. Through \method{__getitem__}, they support the membership test via \keyword{in}. All the standard mutable-sequence methods \method{list}, \method{append}, \method{extend}, \method{count}, \method{index}, \method{insert} (the index argument is ignored), \method{pop}, \method{remove}, \method{reverse}, and \method{sort} are also supported. After method calls that add elements (\method{__setitem__}, \method{append}, \method{extend}, \method{insert}), the elements of the data member are re-setified, so it is not possible to introduce duplicates. Calling \function{repr()} on a set returns the result of calling \function{repr} on its element list. Calling \function{str()} returns a representation resembling mathematical notation for the set; an open set bracket, followed by a comma-separated list of \function{str()} representations of the elements, followed by a close set bracket. Set objects support the following Python operators: \begin{tableiii}{l|l|l}{code}{Operator}{Function}{Description} \lineiii{|,+}{union}{Union} \lineiii{&}{intersection}{Intersection} \lineiii{-}{difference}{Difference} \lineiii{^}{symmetric_difference}{Symmetric difference} \lineiii{*}{cartesian}{Cartesian product} \lineiii{==}{equality}{Equality test} \lineiii{!=,<>}{}{Inequality test} \lineiii{<}{proper_subset}{Proper-subset test} \lineiii{<=}{subset}{Subset test} \lineiii{>}{}{Proper superset test} \lineiii{>=}{}{Superset test} \end{tableiii} -- Eric S. Raymond Government is actually the worst failure of civilized man.
There has never been a really good one, and even those that are most tolerable are arbitrary, cruel, grasping and unintelligent. -- H. L. Mencken From esr at snark.thyrsus.com Mon Jan 22 19:28:57 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 13:28:57 -0500 Subject: [Python-Dev] I still can't build HTML in a current CVS tree. Message-ID: <200101221828.f0MISvH15121@snark.thyrsus.com> Fred, I still can't build HTML documentation in a current CVS tree -- same complaint about lib/modindex.html being absent. Can we get this fixed before 2.1 ships? -- Eric S. Raymond ...Virtually never are murderers the ordinary, law-abiding people against whom gun bans are aimed. Almost without exception, murderers are extreme aberrants with lifelong histories of crime, substance abuse, psychopathology, mental retardation and/or irrational violence against those around them, as well as other hazardous behavior, e.g., automobile and gun accidents." -- Don B. Kates, writing on statistical patterns in gun crime From fredrik at effbot.org Mon Jan 22 19:33:56 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Mon, 22 Jan 2001 19:33:56 +0100 Subject: [Python-Dev] Python 2.1 article References: Message-ID: <059b01c084a1$e431e490$e46940d5@hagrid> > I've put together an almost-complete first draft of a "What's New in > 2.1" article. The only missing piece is a section on the Nested > Scopes PEP, which obviously has to wait for the changes to get checked > in. what's the current 2.1a1 eta? (pep 226 still says last friday) today? wednesday? this week? this month? Curious /F From mal at lemburg.com Mon Jan 22 19:33:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 22 Jan 2001 19:33:24 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
References: <20010122124159.A14999@thyrsus.com> Message-ID: <3A6C7CF4.F10AA77B@lemburg.com> [LaTeX file] Eric, we are all hackers, but plain LaTeX is not really the right format for a posting to a mailing list... at least not if you really expect feedback ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From martin at mira.cs.tu-berlin.de Mon Jan 22 19:36:16 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 22 Jan 2001 19:36:16 +0100 Subject: [Python-Dev] getcode() function in pyexpat.c Message-ID: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> > A few comments to explain this highly stylized and macro-laden code > would be appreciated. I probably can't do that before 2.1a1, but I promise to suggest something right afterwards. In general, the macro magic is designed to make the many expat callbacks available to Python. RC_HANDLER (for return code) is the most general template; VOID_HANDLER and INT_HANDLER are common specializations. In the core of RC_HANDLER, a tuple is built and a Python function is called. The code used to do PyEval_CallObject right inside the macro; the call_with_frame feature is new compared to 2.0. It solves the specific problem of incomprehensible tracebacks. In a typical SAX application, the user code calls expatreader.ExpatParser.parse, which in turn calls self._parser.Parse(data, isFinal) Now, in 2.0, a common problem was a traceback self._parser.Parse(data, isFinal) TypeError: not enough arguments; expected 4, got 2 Everybody assumes a problem in the call to Parse; the real problem is in the call to the callback inside RC_HANDLER, which tried to call a user's function that expected four arguments with only two.
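[Editorial sketch.] The confusion described above can be imitated in pure Python. In the sketch below, `FakeParser` and `Handler` are hypothetical stand-ins, not pyexpat or xml.sax classes: the parser invokes a user callback with two arguments (counting the instance) while the callback was written to take four, so the TypeError originates in the callback invocation even though the only call the user wrote is `Parse`.

```python
class Handler:
    # Hypothetical user callback: four parameters, counting self.
    def characters(self, data, start, length):
        return data[start:start + length]


class FakeParser:
    """Stand-in for an extension-module parser; not the real pyexpat API."""

    def __init__(self, handler):
        self.handler = handler

    def Parse(self, data, is_final):
        # The parser calls the callback with fewer arguments than the
        # callback expects (self and data only).
        return self.handler.characters(data)


try:
    FakeParser(Handler()).Parse("<a>text</a>", 1)
except TypeError as exc:
    message = str(exc)
    print(message)
```

The error text names `characters`, not `Parse`; when the failing call happens inside C code with no frame of its own, that hint is all the traceback offers, which is the gap the extra frame is meant to close.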
2.1 would improve this slightly on its own, writing self._parser.Parse(data, isFinal) TypeError: characters() takes exactly 4 arguments (2 given) With that code, you get File "/usr/local/lib/python2.1/xml/sax/expatreader.py", line 81, in feed self._parser.Parse(data, isFinal) File "pyexpat.c", line 379, in CharacterData TypeError: characters() takes exactly 4 arguments (2 given) So that tells you that it is the CharacterData handler that invokes characters(). You are right that the frame object is not used otherwise; it is just there to make a nice traceback. > I simply don't understand what's going on -- and I'm deeply > suspicious that it is the source of whatever problems Tim is seeing > with test_sax. I thought so, too, at first; it turned out that the problem was elsewhere. Regards, Martin From guido at digicool.com Mon Jan 22 20:04:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 14:04:02 -0500 Subject: [Python-Dev] Python 2.1 article In-Reply-To: Your message of "Mon, 22 Jan 2001 19:33:56 +0100." <059b01c084a1$e431e490$e46940d5@hagrid> References: <059b01c084a1$e431e490$e46940d5@hagrid> Message-ID: <200101221904.OAA01170@cj20424-a.reston1.va.home.com> > what's the current 2.1a1 eta? (pep 226 still > says last friday) You missed my email that I sent out Friday. Tentatively it's going out tonight. No point in updating the PEP each time there's slippage. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 22 20:10:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 14:10:54 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Mon, 22 Jan 2001 12:41:59 EST." 
<20010122124159.A14999@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> Message-ID: <200101221910.OAA01218@cj20424-a.reston1.va.home.com> Eric, There's already a PEP on a set object type, and everybody and their aunt has already implemented a set datatype. If *your* set module is ready for prime time, why not publish it in the Vaults of Parnassus? --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Mon Jan 22 20:29:18 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 14:29:18 -0500 (EST) Subject: [Python-Dev] Re: getcode() function in pyexpat.c In-Reply-To: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> References: <200101221836.f0MIaGL00923@mira.informatik.hu-berlin.de> Message-ID: <14956.35342.724657.865367@localhost.localdomain> >>>>> "MvL" == Martin v Loewis writes: >> I simply don't understand what's going on -- and I'm deeply >> suspicious that it is the source of whatever problems Tim is >> seeing with test_sax. MvL> I thought so, too, at first; it turned out that the problem was MvL> elsewhere. What was the cause of that problem? I didn't see any mail after Tim's middle-of-the-night message "Worse news." Jeremy From tim.one at home.com Mon Jan 22 21:01:59 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:01:59 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221602.LAA31103@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > While fixing the test_b1 code again, which depends on this behavior, I > thought of a refinement: it wouldn't be hard to make None compare > smaller than *anything* (including numbers). > > Is this worth it? 
First, an attempt to see what Python did in this morning's CVS turned up an internal error for Jeremy: >>> [None < x for x in (1, 1L, 1j, 1.0, [1], {}, (1,))] name: None, in ?, file '', line 1 locals: {'[1]': 0, 'x': 1} globals: {} Fatal Python error: compiler did not label name as local or global abnormal program termination A simpler way to provoke that: >>> [None < 2 for x in "x"] name: None, in ?, file '', line 1 locals: {'[1]': 0, 'x': 1} globals: {} Fatal Python error: compiler did not label name as local or global Anyway, I think forcing None to be "the smallest" is cute! Inexpensive to do, and while I don't see a compelling *use* for it, I bet it would be least surprising to newbies. +1. From fdrake at acm.org Mon Jan 22 21:08:54 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 22 Jan 2001 15:08:54 -0500 (EST) Subject: [Python-Dev] Re: I still can't build HTML in a current CVS tree. In-Reply-To: <200101221828.f0MISvH15121@snark.thyrsus.com> References: <200101221828.f0MISvH15121@snark.thyrsus.com> Message-ID: <14956.37718.968912.189834@cj42289-a.reston1.va.home.com> Eric S. Raymond writes: > Fred, I still can't build HTML documentation in a current CVS tree -- same > complaint about lib/modindex.html being absent. Can we get this fixed > before 2.1 ships? I'm guessing I've lost a previous email on the topic, or it's buried in my inbox. If this is still a problem after today's checkins, could you please file a bug report and assign it to me? Thanks! -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From tim.one at home.com Mon Jan 22 21:26:15 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:26:15 -0500 Subject: More damage to intuition (was RE: [Python-Dev] Comparison of recursive objects) In-Reply-To: <200101221555.KAA30935@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > IMHO, *this* *particular* gripe of Insure++ is just a pain in the > butt, and I wish there was a way to turn it off in Insure++ without > having to fix the code. Maybe there is. Barry? > IMHO, this was included in the standard to allow segmented-memory > implementations of C. Think certain DOS or Windows 3.1 memory models > where a pointer is a segment plus an offset. This is not current > practice even on Palmpilots! I could ask Tom MacDonald (former X3J11 chair), but don't want to bother him. The way these things usually turn out: the committee debated it 100 times over 10 years, but some committee member steadfastly claimed it was important. Since ANSI/ISO committees work via consensus, one implacable objector is enough. WRT pointers, I know that while the C committee did worry about segmented architectures a lot in the past, tagged architectures gave them much thornier problems (the HW tags each "word" with some manner of metadata (such as a busy/free or empty/full bit, or read+write permission bits, or a data type identifier, or a "capability" tag tying into a HW-enforced security architecture, ...), and checks those on each access, and some of the metadata can propagate into a pointer, and the HW can raise faults on pointer comparisons if the metadata doesn't match). While such machines aren't in common use, the US Govt does all sorts of things they don't talk about -- if it's not IBM's representative protecting a 40-year old architecture, it's someone emphatically not from the NSA protecting something they're not at liberty to discuss. Of course Python wants to run there too, even if we never hear about it ... 
> The standard may say that such comparisons are undefined, but I don't > care about this particular undefinedness, and I'm annoyed by the > required patches. Ya, and I'm annoyed that MS stdio corrupts itself -- but they're just clinging to the letter of the std too, and I've learned to live with it gracefully . pointer-ordering-comparisons-should-be-very-rare-anyway-ly y'rs - tim From tim.one at home.com Mon Jan 22 21:55:30 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 15:55:30 -0500 Subject: [Python-Dev] Worse news In-Reply-To: Message-ID: [Michael Hudson] > Hmm - my machine's done 28 exemplary "make clean; make test" runs this > morning. I last updated yesterday afternoon my time (~1700 GMT). So does mine now. The remaining failures require *unusual* ways of running the test suite (with -r to get test_cpickle to fail, confirmed now by Jeremy under Linux; and in an extremely specialized and seemingly Windows-specific way to get test_extcall to blow up w/ a bad pointer). From tim.one at home.com Mon Jan 22 22:07:27 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 16:07:27 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <14956.16758.68050.257212@localhost.localdomain> Message-ID: [Jeremy Hylton] > Funny (strange or haha?) that test_extcall is failing since the two > pieces of code I've modified most recently are compile.c and the > section of ceval.c that handles extended call syntax. Ya, I knew that, but I avoided wagging a Finger of Shame in your direction because coincidence isn't proof . > ... > As for the test_sax failure, There is no test_sax failure anywhere anymore that I know of (Martin found a dead-wrong array decl in contributed pyexpat.c code and repaired it). 
And I believe my "rt -x test_sax" failure in test_extcall almost certainly has nothing to do with test_sax -- far more likely the connection to test_sax is an accident, and that if I spend umpteen hours trying other things at random I'll provoke the same memory accident leading to a bad pointer via excluding some other test. I just picked test_sax because that *was* broken and I wanted to get thru the rest of the tests. BTW, delighted(?) to hear that test_cpickle fails for you too! I'm sure test_extcall is going to blow up for other people eventually too -- but it is sooooo hard to provoke even for me. I've dropped the effort pending news from someone running Insure++ or efence or whatever. From guido at digicool.com Mon Jan 22 22:18:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 16:18:26 -0500 Subject: [Python-Dev] Worse news In-Reply-To: Your message of "Mon, 22 Jan 2001 16:07:27 EST." References: Message-ID: <200101222118.QAA28305@cj20424-a.reston1.va.home.com> [Tim] > So does mine now. The remaining failures require *unusual* ways of running > the test suite (with -r to get test_cpickle to fail, confirmed now by Jeremy > under Linux; [and later] > BTW, delighted(?) to hear that test_cpickle fails for you too! This (test_cpickle) is a red herring -- it's a shallow failure in the test suite. test_cpickle imports test_pickle, but test_pickle first outputs the test output from testing pickle -- unless test_pickle has been run before! This succeeds: ./python Lib/test/regrtest.py test_cpickle test_pickle and this fails: ./python Lib/test/regrtest.py test_pickle test_cpickle Use regrtest.py -v to find out why. :-) I'm not sure how to restructure this, but it's not of the same quality as test_extcall or test_sax failing. Neither of those has failed for me on Linux during hours of testing. However on Windows I get an occasional appfail dialog box when using rt.bat.
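[Editorial sketch.] The order dependence described above follows from import caching: a module's top-level code runs only on its first import. The sketch below imitates that with a hypothetical registry (`_loaded`, `load`) rather than real test modules; it illustrates the effect and is not how regrtest itself is written.

```python
_loaded = {}  # hypothetical stand-in for sys.modules


def load(name, body, output):
    """Run body only the first time name is loaded, like an import."""
    if name not in _loaded:
        _loaded[name] = True
        body(output)


def pickle_tests(output):
    output.append("output from testing pickle")


def run_test_pickle(output):
    load("test_pickle", pickle_tests, output)


def run_test_cpickle(output):
    # test_cpickle reuses test_pickle by importing it first.
    load("test_pickle", pickle_tests, output)
    output.append("output from testing cPickle")


def capture(order):
    """Run the tests in the given order with a fresh import cache."""
    _loaded.clear()
    captured = {}
    for name, runner in order:
        out = []
        runner(out)
        captured[name] = out
    return captured


a = capture([("test_cpickle", run_test_cpickle), ("test_pickle", run_test_pickle)])
b = capture([("test_pickle", run_test_pickle), ("test_cpickle", run_test_cpickle)])
print(a["test_cpickle"])
print(b["test_cpickle"])
```

The captured output of the cPickle test depends on whether the pickle test ran earlier in the same process, so a fixed expected-output file can match only one of the two orders.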
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Mon Jan 22 15:44:00 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 06:44:00 -0800 Subject: [Python-Dev] Worse news In-Reply-To: ; from tim.one@home.com on Mon, Jan 22, 2001 at 04:07:27PM -0500 References: <14956.16758.68050.257212@localhost.localdomain> Message-ID: <20010122064400.A26543@glacier.fnational.com> On Mon, Jan 22, 2001 at 04:07:27PM -0500, Tim Peters wrote: > I've dropped the effort pending news from someone running > Insure++ or efence or whatever. efence to the rescue! I compiled with -fstruct-pack and used EF_ALIGNMENT=0 and now I can trigger a core dump by running test_extcall. More news coming... Neil From tim.one at home.com Mon Jan 22 22:41:08 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 16:41:08 -0500 Subject: [Python-Dev] test_sax and site-python In-Reply-To: <20010122145733.85E51373C95@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > I'm not sure whether this is really a bug, but I had the problem > that there was something wrong with the xml package I had > installed into my Lib/site-python, and this caused test_sax to > complain. > > If the test stuff is expected to test only the core functionality > maybe sys.path should be edited so that it only contains directories > that are part of the core distribution? AFAIK, xml *is* considered part of the core now, and has been since 2.0 was released. The wisdom of that decision is debatable with hindsight, but AFAICT xml is in the same boat as, say, zlib now: not builtin, and requires 3rd-party code to work, but part of the core all the same. The Windows installer comes w/ the necessary xml (and zlib) pieces, and I suppose the Mac Python package also should.
From nas at arctrix.com Mon Jan 22 16:00:57 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 07:00:57 -0800 Subject: [Python-Dev] Worse news In-Reply-To: ; from tim.one@home.com on Mon, Jan 22, 2001 at 04:07:27PM -0500 References: <14956.16758.68050.257212@localhost.localdomain> Message-ID: <20010122070057.A26575@glacier.fnational.com> Perhaps this will help someone track down the bug: [running test_extcall...] unbound method method() must be called with instance as first argument unbound method method() must be called with instance as first argument Program received signal SIGSEGV, Segmentation fault. symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330 4330 if (TYPE(c) == DOUBLESTAR) { (gdb) l 4325 symtable_add_def(st, STR(CHILD(n, i)), 4326 DEF_PARAM | DEF_STAR); 4327 i += 2; 4328 c = CHILD(n, i); 4329 } 4330 if (TYPE(c) == DOUBLESTAR) { 4331 i++; 4332 symtable_add_def(st, STR(CHILD(n, i)), 4333 DEF_PARAM | DEF_DOUBLESTAR); 4334 } (gdb) p c $3 = (node *) 0x42a43fff (gdb) p *c $4 = {n_type = 0, n_str = 0x0, n_lineno = 0, n_nchildren = 0, n_child = 0x0} (gdb) p n $5 = (node *) 0x42a3ffd7 (gdb) p *n $6 = {n_type = 261, n_str = 0x0, n_lineno = 1, n_nchildren = 2, n_child = 0x42a43fc3} (gdb) bt 10 #0 symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330 #1 0x8060126 in symtable_funcdef (st=0x429bafd0, n=0x42a23feb) at Python/compile.c:4245 #2 0x805fd29 in symtable_node (st=0x429bafd0, n=0x429b0fc3) at Python/compile.c:4128 #3 0x80600da in symtable_node (st=0x429bafd0, n=0x4290cfeb) at Python/compile.c:4232 #4 0x805f443 in symtable_build (c=0xbffff5c8, n=0x4290cfeb) at Python/compile.c:3816 #5 0x805f130 in jcompile (n=0x4290cfeb, filename=0x80a040f "", base=0x0) at Python/compile.c:3720 #6 0x805f0c2 in PyNode_Compile (n=0x4290cfeb, filename=0x80a040f "") at Python/compile.c:3699 #7 0x8069adf in run_node (n=0x4290cfeb, filename=0x80a040f "", globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:915 #8
0x8069ac0 in run_err_node (n=0x4290cfeb, filename=0x80a040f "", globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:907 #9 0x8069a30 in PyRun_String ( str=0x429f9fd1 "def zv(*v): print \"ok zv\", a, b, d, e, v, k", start=257, globals=0x40644fe0, locals=0x40644fe0) at Python/pythonrun.c:881 (More stack frames follow...) From thomas at xs4all.net Mon Jan 22 23:13:29 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 23:13:29 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <20010122070057.A26575@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 22, 2001 at 07:00:57AM -0800 References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> Message-ID: <20010122231329.A27785@xs4all.nl> On Mon, Jan 22, 2001 at 07:00:57AM -0800, Neil Schemenauer wrote: > Perhaps this will help someone track down the bug: > [running test_extcall...] > unbound method method() must be called with instance as first argument > unbound method method() must be called with instance as first argument > > Program received signal SIGSEGV, Segmentation fault. > symtable_params (st=0x429bafd0, n=0x42a3ffd7) at Python/compile.c:4330 > 4330 if (TYPE(c) == DOUBLESTAR) { > (gdb) l > 4325 symtable_add_def(st, STR(CHILD(n, i)), > 4326 DEF_PARAM | DEF_STAR); > 4327 i += 2; > 4328 c = CHILD(n, i); > 4329 } > 4330 if (TYPE(c) == DOUBLESTAR) { > 4331 i++; > 4332 symtable_add_def(st, STR(CHILD(n, i)), > 4333 DEF_PARAM | DEF_DOUBLESTAR); > 4334 } > (gdb) p c > $3 = (node *) 0x42a43fff > (gdb) p *c > $4 = {n_type = 0, n_str = 0x0, n_lineno = 0, n_nchildren = 0, n_child = 0x0} > (gdb) p n > $5 = (node *) 0x42a3ffd7 > (gdb) p *n > $6 = {n_type = 261, n_str = 0x0, n_lineno = 1, n_nchildren = 2, > n_child = 0x42a43fc3} n_child is 0x42a43fc3. That's n_child[0]. 0x42a43fff is the child being handled now. That would be n_child[3] (0x42a43fff - 0x42a43fc3 == 60, a struct node is 20 bytes.)
But n_nchildren is 2, so it's an off-by-two error somewhere -- and look, there's an "i += 2" right above it! It *looks* like this code will blow up whenever you use '*eggs' without '**spam' in a function definition. That's a fairly wild guess, but it's worth a try. Try this patch: Index: Python/compile.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/compile.c,v retrieving revision 2.148 diff -c -c -r2.148 compile.c *** Python/compile.c 2001/01/22 04:35:57 2.148 --- Python/compile.c 2001/01/22 22:12:31 *************** *** 4324,4329 **** --- 4324,4331 ---- i++; symtable_add_def(st, STR(CHILD(n, i)), DEF_PARAM | DEF_STAR); + if (NCH(n) <= i+2) + return; i += 2; c = CHILD(n, i); } -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From esr at thyrsus.com Mon Jan 22 21:13:09 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 22 Jan 2001 15:13:09 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101221910.OAA01218@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 02:10:54PM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> Message-ID: <20010122151309.C15236@thyrsus.com> Guido van Rossum : > There's already a PEP on a set object type, and everybody and their > aunt has already implemented a set datatype. I've just read the PEP. Greg's proposal has a couple of problems. The biggest one is that the interface design isn't very Pythonic -- it's formally adequate, but doesn't exploit the extent to which sets naturally have common semantics with existing Python sequence types. This is bad; it means that a lot of code that could otherwise ignore the difference between lists and sets would have to be specialized one way or the other for no good reason.
The only other set module I can find in the Vaults or anywhere else is kjBuckets (which I knew about before). Looks like a good design, but complicated -- and requires installation of an extension. > If *your* set module is ready for prime time, why not publish it in > the Vaults of Parnassus? I suppose that's what I'll do if you don't bless it for the standard library. But here are the reasons I suggest you should do so: 1. It supports a set of operations that are both often useful and fiddly to get right, thus enhancing the "batteries are included" effect. (I used its ancestor for representing seen-message numbers in a specialized mailreader, for example.) 2. It's simple for application programmers to use. No extension module to integrate. 3. It's unsurprising. My set objects behave almost exactly like other mutable sequences, with all the same built-in methods working, except for the fact that you can't introduce duplicates with the mutators. 4. It's already completely documented in a form suitable for the library. 5. It's simple enough not to cause you maintenance hassles down the road, and even if it did the maintainer is unlikely to disappear :-). -- Eric S. Raymond The United States is in no way founded upon the Christian religion -- George Washington & John Adams, in a diplomatic message to Malta. From guido at digicool.com Mon Jan 22 23:29:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 17:29:26 -0500 Subject: [Python-Dev] test_sax and site-python In-Reply-To: Your message of "Mon, 22 Jan 2001 16:41:08 EST." References: Message-ID: <200101222229.RAA28667@cj20424-a.reston1.va.home.com> > [Jack Jansen] > > I'm not sure whether this is really a bug, but I had the problem > > that there was something wrong with the xml package I had > > installed into my Lib/site-python, and this caused test_sax to > > complain.
> > > > If the test stuff is expected to test only the core functionality > > maybe sys.path should be edited so that it only contains directories > > that are part of the core distribution? > [Tim] > AFAIK, xml *is* considered part of the core now, and has been since 2.0 was > released. The wisdom of that decision is debatable with hindsight, but > AFAICT xml is in the same boat as, say, zlib now: not builtin, and requires > 3rd-party code to work, but part of the core all the same. The Windows > installer comes w/ the necessary xml (and zlib) pieces, and I suppose the > Mac Python package also should. Yes, but Jack was talking about a non-std xml package in site-python... I agree that this shouldn't be picked up. But is it worth taking draconian measures to avoid this? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Mon Jan 22 23:35:08 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 22 Jan 2001 17:35:08 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <200101222118.QAA28305@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This (test_cpickle) is a red herring -- it's a shallow failure in the > test suite. Fixed now -- thanks! Please note that Neil got text_extcall to fail in exactly the same place (see his recent Python-Dev) mail. That's the only remaining failure I know of. > ... > However on Windows I get an occasional appfail dialog box when > using rt.bat. I don't believe I've ever seen one of those ("appfail" rings no bells), and rt has never acted strangely for me. Your DOS-box properties may be screwed up: use Start -> Find -> Files or Folders ...; set "Look in" to C:; enter *.pif in the "Named:" box; click Find. You'll probably get a dozen hits. One of them will correspond to the method you use to open a DOS box (which I don't know). Right-click on that one and select Properties. On the Memory tab of the dialog that pops up, the four dropdown lists should have "Auto" selected. 
"Uses HMA" should be checked. Hmm ... looks like "Protected" *should* be checked but mine isn't ... oh, this goes on and on. I don't even know which version of Windows you're using here! How about I look at it next time I'm at your house ... From greg at cosc.canterbury.ac.nz Mon Jan 22 23:50:07 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Jan 2001 11:50:07 +1300 (NZDT) Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl> Message-ID: <200101222250.LAA01929@s454.cosc.canterbury.ac.nz> > 4330 if (TYPE(c) == DOUBLESTAR) { > 4325 symtable_add_def(st, STR(CHILD(n, i)), > 4326 DEF_PARAM | DEF_STAR); Shouldn't line 4330 say if (TYPE(c) == STAR) ? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From thomas at xs4all.net Mon Jan 22 23:56:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 22 Jan 2001 23:56:02 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <200101222250.LAA01929@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Tue, Jan 23, 2001 at 11:50:07AM +1300 References: <20010122231329.A27785@xs4all.nl> <200101222250.LAA01929@s454.cosc.canterbury.ac.nz> Message-ID: <20010122235602.B27785@xs4all.nl> On Tue, Jan 23, 2001 at 11:50:07AM +1300, Greg Ewing wrote: > > 4330 if (TYPE(c) == DOUBLESTAR) { > > 4325 symtable_add_def(st, STR(CHILD(n, i)), > > 4326 DEF_PARAM | DEF_STAR); > Shouldn't line 4330 say if (TYPE(c) == STAR) ? No, that's line 4323. You can't have doublestar without having star, and star should precede doublestar. (Grammar should enforce that.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From paulp at ActiveState.com Tue Jan 23 00:02:07 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 22 Jan 2001 15:02:07 -0800 Subject: [Python-Dev] pydoc - put it in the core References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> Message-ID: <3A6CBBEF.4732BFF2@ActiveState.com> Guido van Rossum wrote: > > .... > > Yes, wow! > > .... I apologize but I'm not clear on my responsibilities here, if any. I wrote a PEP for online help. I submitted a partial implementation. Ping wrote a full implementation that basically supersedes mine. There are various ideas for improving it, but I think that we agree that the core is solid. Several people have said that it should be moved into the core library. Nobody has said that it shouldn't. Whose move is it? What's next? Paul Prescod From fredrik at effbot.org Tue Jan 23 00:08:40 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 23 Jan 2001 00:08:40 +0100 Subject: [Python-Dev] test___all__ fails if bsddb not available Message-ID: <079a01c084c8$43023e40$e46940d5@hagrid> test___all__ test test___all__ failed -- dbhash has no __all__ attribute maybe this test shouldn't depend on optional modules? From nas at arctrix.com Mon Jan 22 17:24:34 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 22 Jan 2001 08:24:34 -0800 Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 22, 2001 at 11:13:29PM +0100 References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> <20010122231329.A27785@xs4all.nl> Message-ID: <20010122082433.B26765@glacier.fnational.com> On Mon, Jan 22, 2001 at 11:13:29PM +0100, Thomas Wouters wrote: > That's a fairly wild guess, but it's worth a try. Try this > patch: [...] Works for me.
Neil From greg at cosc.canterbury.ac.nz Tue Jan 23 00:21:14 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Jan 2001 12:21:14 +1300 (NZDT) Subject: [Python-Dev] Worse news In-Reply-To: <20010122235602.B27785@xs4all.nl> Message-ID: <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> Thomas Wouters : > You can't have doublestar without having star What?!? You could in 1.5.2. Has that changed? Anyway, it just looked a bit odd that it seemed to be testing for DOUBLESTAR and then adding a DEF_STAR thing to the symtab. But I guess I should shut up until I've seen all of the code. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From thomas at xs4all.net Tue Jan 23 00:26:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 23 Jan 2001 00:26:02 +0100 Subject: [Python-Dev] Worse news In-Reply-To: <200101222321.MAA01957@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Tue, Jan 23, 2001 at 12:21:14PM +1300 References: <20010122235602.B27785@xs4all.nl> <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> Message-ID: <20010123002602.C27785@xs4all.nl> On Tue, Jan 23, 2001 at 12:21:14PM +1300, Greg Ewing wrote: > Thomas Wouters : > > You can't have doublestar without having star > What?!? You could in 1.5.2. Has that changed? Sorry, my bad, I'm wrong. (I just tested this.) I could swear it was that way, but it's 0:25 right now, after a night with about 2 hours decent sleep, so ignore my delusions :) > Anyway, it just looked a bit odd that it seemed to be testing > for DOUBLESTAR and then adding a DEF_STAR thing to the symtab. > But I guess I should shut up until I've seen all of the code. No, it's not doing that. It's adding the symbol name to the symtab, with DEF_DOUBLESTAR as one of its flags. 
Not sure what the flag does, but I could guess. (But see the above mentioned
delusions as to why I'm not doing that out loud anymore :-) The 'if' in front
of it adds the symbol to the symtab with DEF_STAR as a flag, in the case of
'STAR' (rather than DOUBLESTAR). Really. go check :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Tue Jan 23 00:31:03 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 23 Jan 2001 00:31:03 +0100
Subject: [Python-Dev] Worse news
In-Reply-To: <20010123002602.C27785@xs4all.nl>; from thomas@xs4all.net on Tue, Jan 23, 2001 at 12:26:02AM +0100
References: <20010122235602.B27785@xs4all.nl> <200101222321.MAA01957@s454.cosc.canterbury.ac.nz> <20010123002602.C27785@xs4all.nl>
Message-ID: <20010123003103.D27785@xs4all.nl>

On Tue, Jan 23, 2001 at 12:26:02AM +0100, Thomas Wouters wrote:
> On Tue, Jan 23, 2001 at 12:21:14PM +1300, Greg Ewing wrote:
> > Thomas Wouters :
>
> > > You can't have doublestar without having star
>
> > What?!? You could in 1.5.2. Has that changed?
> Sorry, my bad, I'm wrong. (I just tested this.) I could swear it was that
> way, but it's 0:25 right now, after a night with about 2 hours decent sleep,
> so ignore my delusions :)

Ah, yeah, what I meant to *think* was: you can't have *spam *after* **eggs:

>>> def foo(x, **kwarg, *arg)
  File "<stdin>", line 1
    def foo(x, **kwarg, *arg)
                      ^
SyntaxError: invalid syntax

So the logic of the latter part of the function seems okay (after the little
patch I posted before.) Jeremy should give his expert opinion before it goes
in, though, since it's his code :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
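
A minimal sketch of the ordering rule discussed in this thread, assuming a current Python 3 interpreter rather than the 2.1 alpha under discussion. The helper name `compiles` and the sample `foo` definitions are illustrative only; the exact SyntaxError message differs between Python versions, so the sketch checks only whether the source is accepted.

```python
def compiles(source):
    """Return True if `source` parses as valid Python, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# *arg before **kwarg is the accepted order.
star_first = "def foo(x, *arg, **kwarg): pass"
# *arg after **kwarg is rejected by the parser.
doublestar_first = "def foo(x, **kwarg, *arg): pass"
# **kwarg with no *arg at all is accepted (doublestar without star).
doublestar_only = "def foo(x, **kwarg): pass"

results = {
    "star_first": compiles(star_first),
    "doublestar_first": compiles(doublestar_first),
    "doublestar_only": compiles(doublestar_only),
}
print(results)
```

Compiling the source strings, rather than writing the definitions inline, keeps the deliberately invalid form from stopping the whole script at parse time.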
From guido at digicool.com Tue Jan 23 00:36:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 18:36:17 -0500 Subject: [Python-Dev] test___all__ fails if bsddb not available In-Reply-To: Your message of "Tue, 23 Jan 2001 00:08:40 +0100." <079a01c084c8$43023e40$e46940d5@hagrid> References: <079a01c084c8$43023e40$e46940d5@hagrid> Message-ID: <200101222336.SAA30480@cj20424-a.reston1.va.home.com> > test test___all__ failed -- dbhash has no __all__ attribute > > maybe this test shouldn't depend on optional modules? Fixed -- I just skip dbhash if bsddb can't be imported. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Tue Jan 23 01:38:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 19:38:28 -0500 (EST) Subject: [Python-Dev] Worse news In-Reply-To: <20010122231329.A27785@xs4all.nl> References: <14956.16758.68050.257212@localhost.localdomain> <20010122070057.A26575@glacier.fnational.com> <20010122231329.A27785@xs4all.nl> Message-ID: <14956.53892.651549.493268@localhost.localdomain> Thomas, Your patch has the right diagnosis, although I would write it a tad differently. NCH(n) <= i + 2 should be NCH(n) < i + 2, because CHILD(n, NCH(i)) is not valid. I'll check it in. Jeremy From jeremy at alum.mit.edu Tue Jan 23 02:23:56 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 20:23:56 -0500 (EST) Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: <20010119232323.70B03116392@oratrix.oratrix.nl> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> Message-ID: <14956.56620.706531.647341@localhost.localdomain> >>>>> "JJ" == Jack Jansen writes: JJ> Recently, Guido van Rossum said: >> > I get the impression that I'm currently seeing a non-NULL third >> > argument in my (C) methods even though the method is called >> > without keyword arguments. 
>> >> > Is this new semantics that I missed the discussion about, or is >> > this a bug? >> >> [...] Do you really need the NULL? JJ> The places that I know I was counting on the NULL now have "if ( JJ> kw && PyObject_IsTrue(kw))", so I'll just have to hope there JJ> aren't any more lingering in there. Guido, Does your query ("Do you really need the NULL?") mean that you don't care whether the argument is NULL or an empty dictionary? I could change the code to do either for 2.1a2, if you have a preference. Jeremy From guido at digicool.com Tue Jan 23 02:33:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 20:33:20 -0500 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Your message of "Mon, 22 Jan 2001 20:23:56 EST." <14956.56620.706531.647341@localhost.localdomain> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> Message-ID: <200101230133.UAA04378@cj20424-a.reston1.va.home.com> > Guido, > > Does your query ("Do you really need the NULL?") mean that you don't > care whether the argument is NULL or an empty dictionary? I could > change the code to do either for 2.1a2, if you have a preference. > > Jeremy Robust code IMO should treat NULL and {} the same. But since traditionally we passed NULL, it's better to pass NULL rather than {}. I believe that's the status quo now, right? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Tue Jan 23 02:54:53 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jan 2001 20:54:53 -0500 (EST) Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: <200101230133.UAA04378@cj20424-a.reston1.va.home.com> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> <200101230133.UAA04378@cj20424-a.reston1.va.home.com> Message-ID: <14956.58477.874472.190937@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: [Jeremy wrote:] >> Does your query ("Do you really need the NULL?") mean that you >> don't care whether the argument is NULL or an empty dictionary? >> I could change the code to do either for 2.1a2, if you have a >> preference. GvR> Robust code IMO should treat NULL and {} the same. But since GvR> traditionally we passed NULL, it's better to pass NULL rather GvR> than {}. I believe that's the status quo now, right? The current status in CVS is to pass {}, because there appeared to be some case where a PyCFunction was not expecting NULL. I assumed, without checking, that {} was required and change the implementation to always pass a dictionary to METH_KEYWORDS functions. I could change it back to NULL and see if I can reproduce the error I was seeing. Jeremy From guido at digicool.com Tue Jan 23 03:01:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 21:01:12 -0500 Subject: [Python-Dev] Keyword arg dictionary without keyword arguments In-Reply-To: Your message of "Mon, 22 Jan 2001 20:54:53 EST." 
<14956.58477.874472.190937@localhost.localdomain> References: <200101191634.LAA29239@cj20424-a.reston1.va.home.com> <20010119232323.70B03116392@oratrix.oratrix.nl> <14956.56620.706531.647341@localhost.localdomain> <200101230133.UAA04378@cj20424-a.reston1.va.home.com> <14956.58477.874472.190937@localhost.localdomain> Message-ID: <200101230201.VAA15993@cj20424-a.reston1.va.home.com> > [Jeremy wrote:] > >> Does your query ("Do you really need the NULL?") mean that you > >> don't care whether the argument is NULL or an empty dictionary? > >> I could change the code to do either for 2.1a2, if you have a > >> preference. > > GvR> Robust code IMO should treat NULL and {} the same. But since > GvR> traditionally we passed NULL, it's better to pass NULL rather > GvR> than {}. I believe that's the status quo now, right? > > The current status in CVS is to pass {}, because there appeared to be > some case where a PyCFunction was not expecting NULL. I assumed, > without checking, that {} was required and change the implementation > to always pass a dictionary to METH_KEYWORDS functions. I could > change it back to NULL and see if I can reproduce the error I was > seeing. Yes, that's a good idea. I hope that the {} in alpha 1 won't make folks think that they will never see a NULL in the future and code accordingly... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 23 03:15:11 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 21:15:11 -0500 Subject: [Python-Dev] 2.1a1 release tonight -- but no nested scopes or weak refs Message-ID: <200101230215.VAA16577@cj20424-a.reston1.va.home.com> We've decided to release 2.1a1 without further ado, but without two big hopeful patches: Jeremy's nested scopes aren't finished and will take considerably more time, and Fred's weak references need more review (I haven't had the time to look at the code). 
Rather than wait longer, I've decided to try and release 2.1a1 tonight -- there's nothing I'm waiting for now before I can cut a tarball. There will be an alpha2 release around February 1. Please don't make any check-ins until I announce the 2.1a1 release here. (PythonLabs: please mail or phone me if you need to check in a last-minute thing -- I'm tagging the tree now.) More news as it happens, --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Jan 23 03:36:24 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 22 Jan 2001 20:36:24 -0600 (CST) Subject: [Python-Dev] test_grammar failing Message-ID: <14956.60968.363878.643640@beluga.mojam.com> At the end of this: make distclean ; ./configure ; make OPT='-g -pipe' ; make test I get this: rm -f ./Lib/test/*.py[co] PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l test_grammar name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 locals: {'x': 2, '[1]': 1, 'l': 0} globals: {} Fatal Python error: compiler did not label name as local or global make: *** [test] Aborted PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l test_grammar name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 locals: {'x': 2, '[1]': 1, 'l': 0} globals: {} Fatal Python error: compiler did not label name as local or global make: *** [test] Aborted Any ideas? I notice that Jeremy checked in some changes to test_grammar.py this evening. Skip From gvwilson at nevex.com Tue Jan 23 03:47:33 2001 From: gvwilson at nevex.com (Greg Wilson) Date: Mon, 22 Jan 2001 21:47:33 -0500 (EST) Subject: [Python-Dev] re: I think my set module is ready for prime time Message-ID: > > Guido van Rossum: > > There's already a PEP on a set object type, and everybody and their > > aunt has already implemented a set datatype. > Eric Raymond: > Greg's proposal has a couple of problems. 
> The biggest one is that the interface design isn't very Pythonic -- > ...doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. I agree with Eric's point; I put the interface design on hold while I went off to try to find an efficient implementation capable of handling mutable values (i.e. one that would allow things like sets of sets). I'm still looking :-(, but would appreciate comments from this list on Eric's interface. Thanks, Greg From guido at digicool.com Tue Jan 23 04:02:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 22:02:50 -0500 Subject: [Python-Dev] test_grammar failing In-Reply-To: Your message of "Mon, 22 Jan 2001 20:36:24 CST." <14956.60968.363878.643640@beluga.mojam.com> References: <14956.60968.363878.643640@beluga.mojam.com> Message-ID: <200101230302.WAA27104@cj20424-a.reston1.va.home.com> > At the end of this: > > make distclean ; ./configure ; make OPT='-g -pipe' ; make test > > I get this: > > rm -f ./Lib/test/*.py[co] > PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l > test_grammar > name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 > locals: {'x': 2, '[1]': 1, 'l': 0} > globals: {} > Fatal Python error: compiler did not label name as local or global > make: *** [test] Aborted > PYTHONPATH=./build/lib.`cat platform` ./python -tt ./Lib/test/regrtest.py -l > test_grammar > name: None, in test_in_func, file './Lib/test/test_grammar.py', line 617 > locals: {'x': 2, '[1]': 1, 'l': 0} > globals: {} > Fatal Python error: compiler did not label name as local or global > make: *** [test] Aborted > > Any ideas? I notice that Jeremy checked in some changes to test_grammar.py > this evening. Try another cvs update and rebuild. 
The test that Jeremy checked in is supposed to catch a bug in the compiler code that he checked in. The latest compile.c is 103277 bytes long (in Unix). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 23 04:33:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 22 Jan 2001 22:33:02 -0500 Subject: [Python-Dev] Python 2.1 alpha 1 released! Message-ID: <200101230333.WAA28376@cj20424-a.reston1.va.home.com> Thanks to the PythonLabs developers and the many hard-working volunteers, I'm proud to release Python 2.1a1 -- the first alpha release of Python version 2.1. The release mechanics are different than for previous releases: we're only releasing through SourceForge for now. The official source tarball is already available from the download page: http://sourceforge.net/project/showfiles.php?group_id=5470 Additional files will be released soon: a Windows installer, Linux RPMs, and documentation. Please give it a good try! The only way Python 2.1 can become a rock-solid product is if people test the alpha releases. Especially if you are using Python for demanding applications or on extreme platforms we are interested in hearing your feedback. Are you embedding Python or using threads? Please test your application using Python 2.1a1! Please submit all bug reports through SourceForge: http://sourceforge.net/bugs/?group_id=5470 Here's the NEWS file: What's New in Python 2.1 alpha 1? ================================= Core language, builtins, and interpreter - There is a new Unicode companion to the PyObject_Str() API called PyObject_Unicode(). It behaves in the same way as the former, but assures that the returned value is an Unicode object (applying the usual coercion if necessary). - The comparison operators support "rich comparison overloading" (PEP 207). C extension types can provide a rich comparison function in the new tp_richcompare slot in the type object. 
The cmp() function and the C function PyObject_Compare() first try the new rich comparison operators before trying the old 3-way comparison. There is also a new C API PyObject_RichCompare() (which also falls back on the old 3-way comparison, but does not constrain the outcome of the rich comparison to a Boolean result).

The rich comparison function takes two objects (at least one of which is guaranteed to have the type that provided the function) and an integer indicating the opcode, which can be Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE (for <, <=, ==, !=, >, >=), and returns a Python object, which may be NotImplemented (in which case the tp_compare slot function is used as a fallback, if defined).

Classes can overload individual comparison operators by defining one or more of the methods __lt__, __le__, __eq__, __ne__, __gt__, __ge__. There are no explicit "reflected argument" versions of these; instead, __lt__ and __gt__ are each other's reflection, likewise for __le__ and __ge__; __eq__ and __ne__ are their own reflection (similar at the C level). No other implications are made; in particular, Python does not assume that == is the Boolean inverse of !=, or that < is the Boolean inverse of >=. This makes it possible to define types with partial orderings.

Classes or types that want to implement (in)equality tests but not the ordering operators (i.e. unordered types) should implement == and !=, and raise an error for the ordering operators.

It is possible to define types whose rich comparison results are not Boolean; e.g. a matrix type might want to return a matrix of bits for A < B, giving elementwise comparisons. Such types should ensure that any interpretation of their value in a Boolean context raises an exception, e.g. by defining __nonzero__ (or the tp_nonzero slot at the C level) to always raise an exception.

- Complex numbers use rich comparisons to define == and != but raise an exception for <, <=, > and >=.
Unfortunately, this also means that cmp() of two complex numbers raises an exception when the two numbers differ. Since it is not mathematically meaningful to compare complex numbers except for equality, I hope that this doesn't break too much code. - Functions and methods now support getting and setting arbitrarily named attributes (PEP 232). Functions have a new __dict__ (a.k.a. func_dict) which hold the function attributes. Methods get and set attributes on their underlying im_func. It is a TypeError to set an attribute on a bound method. - The xrange() object implementation has been improved so that xrange(sys.maxint) can be used on 64-bit platforms. There's still a limitation that in this case len(xrange(sys.maxint)) can't be calculated, but the common idiom "for i in xrange(sys.maxint)" will work fine as long as the index i doesn't actually reach 2**31. (Python uses regular ints for sequence and string indices; fixing that is much more work.) - Two changes to from...import: 1) "from M import X" now works even if M is not a real module; it's basically a getattr() operation with AttributeError exceptions changed into ImportError. 2) "from M import *" now looks for M.__all__ to decide which names to import; if M.__all__ doesn't exist, it uses M.__dict__.keys() but filters out names starting with '_' as before. Whether or not __all__ exists, there's no restriction on the type of M. - File objects have a new method, xreadlines(). This is the fastest way to iterate over all lines in a file: for line in file.xreadlines(): ...do something to line... See the xreadlines module (mentioned below) for how to do this for other file-like objects. - Even if you don't use file.xreadlines(), you may expect a speedup on line-by-line input. The file.readline() method has been optimized quite a bit in platform-specific ways: on systems (like Linux) that support flockfile(), getc_unlocked(), and funlockfile(), those are used by default. 
On systems (like Windows) without getc_unlocked(), a complicated (but still thread-safe) method using fgets() is used by default.

You can force use of the fgets() method by #define'ing USE_FGETS_IN_GETLINE at build time (it may be faster than getc_unlocked()).

You can force fgets() not to be used by #define'ing DONT_USE_FGETS_IN_GETLINE (this is the first thing to try if std test test_bufio.py fails -- and let us know if it does!).

- In addition, the fileinput module, while still slower than the other methods on most platforms, has been sped up too, by using file.readlines(sizehint).

- Support for run-time warnings has been added, including a new command line option (-W) to specify the disposition of warnings. See the description of the warnings module below.

- Extensive changes have been made to the coercion code. This mostly affects extension modules (which can now implement mixed-type numerical operators without having to use coercion), but occasionally, in boundary cases, the coercion semantics have changed subtly. Since this was a terrible gray area of the language, this is considered an improvement. Also note that __rcmp__ is no longer supported -- instead of calling __rcmp__, __cmp__ is called with reflected arguments.

- In connection with the coercion changes, a new built-in singleton object, NotImplemented, is defined. This can be returned for operations that wish to indicate they are not implemented for a particular combination of arguments. From C, this is Py_NotImplemented.

- The interpreter now accepts bytecode files on the command line even if they do not have a .pyc or .pyo extension. On Linux, after executing

    echo ':pyc:M::\x87\xc6\x0d\x0a::/usr/local/bin/python:' > /proc/sys/fs/binfmt_misc/register

  any byte code file can be used as an executable (i.e. as an argument to execve(2)).

- %[xXo] formats of negative Python longs now produce a sign character.
In 1.6 and earlier, they never produced a sign, and raised an error if the value of the long was too large to fit in a Python int. In 2.0, they produced a sign if and only if too large to fit in an int. This was inconsistent across platforms (because the size of an int varies across platforms), and inconsistent with hex() and oct(). Example:

    >>> "%x" % -0x42L
    '-42'      # in 2.1
    'ffffffbe' # in 2.0 and before, on 32-bit machines
    >>> hex(-0x42L)
    '-0x42L'   # in all versions of Python

The behavior of %d formats for negative Python longs remains the same as in 2.0 (although in 1.6 and before, they raised an error if the long didn't fit in a Python int).

%u formats don't make sense for Python longs, but are allowed and treated the same as %d in 2.1. In 2.0, a negative long formatted via %u produced a sign if and only if too large to fit in an int. In 1.6 and earlier, a negative long formatted via %u raised an error if it was too big to fit in an int.

- Dictionary objects have an odd new method, popitem(). This removes an arbitrary item from the dictionary and returns it (in the form of a (key, value) pair). This can be useful for algorithms that use a dictionary as a bag of "to do" items and repeatedly need to pick one item. Such algorithms normally end up running in quadratic time; using popitem() they can usually be made to run in linear time.

Standard library

- In the time module, the time argument to the functions strftime, localtime, gmtime, asctime and ctime is now optional, defaulting to the current time (in the local timezone).

- The ftplib module now defaults to passive mode, which is deemed a more useful default given that clients are often inside firewalls these days. Note that this could break if ftplib is used to connect to a *server* that is inside a firewall, from outside; this is expected to be a very rare situation. To fix that, you can call ftp.set_pasv(0).

- The module site now treats .pth files not only for path configuration, but also supports extensions to the initialization code: Lines starting with import are executed.

- There's a new module, warnings, which implements a mechanism for issuing and filtering warnings. There are some new built-in exceptions that serve as warning categories, and a new command line option, -W, to control warnings (e.g. -Wi ignores all warnings, -We turns warnings into errors). warnings.warn(message[, category]) issues a warning message; this can also be called from C as PyErr_Warn(category, message).

- A new module xreadlines was added. This exports a single factory function, xreadlines(). The intention is that this code is the absolutely fastest way to iterate over all lines in an open file(-like) object:

    import xreadlines
    for line in xreadlines.xreadlines(file):
        ...do something to line...

  This is equivalent to the previous speed record holder using file.readlines(sizehint). Note that if file is a real file object (as opposed to a file-like object), this is equivalent:

    for line in file.xreadlines():
        ...do something to line...

- The bisect module has new functions bisect_left, insort_left, bisect_right and insort_right. The old names bisect and insort are now aliases for bisect_right and insort_right. XXX_right and XXX_left methods differ in what happens when the new element compares equal to one or more elements already in the list: the XXX_left methods insert to the left, the XXX_right methods to the right. Code that doesn't care where equal elements end up should continue to use the old, short names ("bisect" and "insort").

- The new curses.panel module wraps the panel library that forms part of SYSV curses and ncurses. Contributed by Thomas Gellekum.

- The SocketServer module now sets the allow_reuse_address flag by default in the TCPServer class.

- A new function, sys._getframe(), returns the stack frame pointer of the caller.
This is intended only as a building block for higher-level mechanisms such as string interpolation. Build issues - For Unix (and Unix-compatible) builds, configuration and building of extension modules is now greatly automated. Rather than having to edit the Modules/Setup file to indicate which modules should be built and where their include files and libraries are, a distutils-based setup.py script now takes care of building most extension modules. All extension modules built this way are built as shared libraries. Only a few modules that must be linked statically are still listed in the Setup file; you won't need to edit their configuration. - Python should now build out of the box on Cygwin. If it doesn't, mail to Jason Tishler (jlt63 at users.sourceforge.net). - Python now always uses its own (renamed) implementation of getopt() -- there's too much variation among C library getopt() implementations. - C++ compilers are better supported; the CXX macro is always set to a C++ compiler if one is found. Windows changes - select module: By default under Windows, a select() call can specify no more than 64 sockets. Python now boosts this Microsoft default to 512. If you need even more than that, see the MS docs (you'll need to #define FD_SETSIZE and recompile Python from source). - Support for Windows 3.1, DOS and OS/2 is gone. The Lib/dos-8x3 subdirectory is no more! --Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Tue Jan 23 05:11:09 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 20:11:09 -0800 (PST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <3A6CBBEF.4732BFF2@ActiveState.com> Message-ID: Guido van Rossum wrote: > Yes, wow! Paul Prescod wrote: > I apologize but I'm not clear on my responsibilities here, if any. I > wrote a PEP for online help. I submitted a partial implementation. Hi, guys. Sorry i haven't been sending updates on what i'm doing. Here's the current picture as i see it. 
> Ping wrote a full implementation that basically supercedes mine. My implementation is "full" in that it deploys and seems to work on arbitrary modules as it stands, but it doesn't really supercede Paul's because it leaves out the big piece of Paul's work that did conversion from packaged HTML docs to plain text. It also has the deficiency that it imports modules live; for untrusted modules, this is a security risk. I know Paul has been working on stuff to compile a module into a kind of skeleton object that has all the same name bindings but no live contents, and if that works reliably, we should definitely try plugging that in. > There are various ideas for improving it, but I think that we agree > that the core is solid. Yes. I believe that as it stands, pydoc is useful enough to be a net positive addition to the core. inspect.py alone has been stable and alpha-ready for some time, i believe. Here is a summary of its status and work that remains. pydoc has: inspecting live objects generating text docs from live objects generating HTML docs from live objects serving HTML docs from a little web server showing docs from the command line showing docs from within the interactive interpreter apropos-style module listing It's missing the following, and Paul had stuff for this: inspecting unsafe modules generating text docs from packaged HTML (e.g. language reference) It also needs these: generating docs from a file given on the command line (easy) more Windows and Mac testing and decisions various small bugfixes This past week i've been messing around with Windows and Mac stuff, trying to see whether it's possible to reliably spawn a webserver and launch a web browser at the same time (this would seem to be a good default action to do on GUI platforms). In trying to do the latter i've found the webbrowser module pretty unreliable, by the way. 
For example, it relies on a constant delay of 4 seconds to launch a new browser that can't be expected on all platforms, and fails to launch Netscape 3 because it supplies an illegal command-line option. When i've found good cross-platform ways to make this work i'll suggest some patches. I've so far considered this project blocked only on cross-platform testing -- do you agree? While i know that inspecting unsafe modules and processing packaged HTML are important features, i don't consider them essential. -- ?!ng From ping at lfw.org Tue Jan 23 05:14:50 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 20:14:50 -0800 (PST) Subject: [Python-Dev] webbrowser.py In-Reply-To: Message-ID: On Mon, 22 Jan 2001, Ka-Ping Yee wrote: > In trying to do the latter i've found the webbrowser module pretty > unreliable, by the way. For example, it relies on a constant delay > of 4 seconds to launch a new browser that can't be expected on all > platforms, and fails to launch Netscape 3 because it supplies an > illegal command-line option. When i've found good cross-platform > ways to make this work i'll suggest some patches. Oh, and i forgot to mention... i was pretty disappointed that: setenv BROWSER my_browser_program python -c 'import webbrowser; webbrowser.open("http://python.org/")' doesn't execute "my_browser_program http://python.org/" as i would have hoped. Even for a known browser type: setenv BROWSER lynx python -c 'import webbrowser; webbrowser.open("http://python.org/")' does not work as expected, either. (Red Hat Linux here.) -- ?!ng From ping at lfw.org Tue Jan 23 05:22:56 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 20:22:56 -0800 (PST) Subject: [Python-Dev] Is X a (sequence|mapping)? Message-ID: We can implement abstract interfaces (sequence, mapping, number) in Python with the appropriate __special__ methods, but i don't see an easy way to test if something supports one of these abstract interfaces in Python. 
At the moment, to see if something is a sequence i believe i have to
say something like

    try:
        x[0]
    except:
        # not a sequence
    else:
        # okay, it's a sequence

or

    if hasattr(x, '__getitem__') or type(x) in [type(()), type([])]:
        ...

Is there, or should there be, a better way to do this?

-- ?!ng

From greg at cosc.canterbury.ac.nz Tue Jan 23 05:46:26 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 23 Jan 2001 17:46:26 +1300 (NZDT)
Subject: [Python-Dev] re: I think my set module is ready for prime time
In-Reply-To:
Message-ID: <200101230446.RAA01992@s454.cosc.canterbury.ac.nz>

Greg Wilson :

> an efficient implementation capable of
> handling mutable values (i.e. one that would allow things like sets of
> sets)

I suspect that such a thing is impossible. To avoid a linear search
you have to take advantage of some kind of hashing or ordering, which
you can't do if your objects can change their values out from under
you.

Also, there's nothing to stop someone from mutating two previously
unequal elements so that they're equal. Then you have a "set" with two
identical elements, which isn't a set any more, it's just a collection.

So, I submit that the very concept of a set only makes sense for
immutable values.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz       +--------------------------------------+

From tim.one at home.com Tue Jan 23 06:03:18 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 23 Jan 2001 00:03:18 -0500
Subject: [Python-Dev] Is X a (sequence|mapping)?
In-Reply-To:
Message-ID:

[?!ng]
> ...
> At the moment, to see if something is a sequence i believe i have to
> say something like
>
>     try:
>         x[0]
>     except:
>         # not a sequence
>     else:
>         # okay, it's a sequence
>
> or
>
>     if hasattr(x, '__getitem__') or type(x) in [type(()), type([])]:
>         ...
> > Is there, or should there be, a better way to do this? Dunno. What's a sequence? If you want to know whether x[0] will blow up, trying x[0] is the most obvious way. BTW, I expect trying x[:0] is a better idea: doesn't succeed for dicts, and doesn't blow up for an irrelevant reason if x is an empty sequence. BTW2, your second method suggests an uncomfortable truth: many contexts that want "a sequence" don't want strings to pass the test, despite that strings are as much sequences as lists in Python, no matter how "a sequence" is defined. afraid that-what-you-want-to-do-with-it-is-more-important-than-what- python-calls-it-ly y'rs - tim From ping at lfw.org Tue Jan 23 06:27:30 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 22 Jan 2001 21:27:30 -0800 (PST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010122124159.A14999@thyrsus.com> Message-ID: On Mon, 22 Jan 2001, Eric S. Raymond wrote: > \section{\module{set} --- > Basic set algebra for Python} I'd like to look at the module. Did you actually show us the code for this, or am i a blind doofus? (Please, no answers to the unasked question of whether i am a doofus.) -- ?!ng From tim.one at home.com Tue Jan 23 07:05:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 01:05:26 -0500 Subject: [Python-Dev] Worse news In-Reply-To: <20010122064400.A26543@glacier.fnational.com> Message-ID: In finding and repairing the test_extcall bug, Neil and Thomas have once again contributed beyond the call of duty. Thank you! It took some doing to convince Guido to release his Dutch Death Grip on the PythonLabs coffers, but in the end he was overcome by the moral necessity of rewarding you sterling fellows for your golden deeds: you're both entitled to free(*)-- yes, FREE(*)! --copies of all Python 2.1 alpha, *and* beta, releases(*)! you-wouldn't-believe-how-much-he-charges-us-ly y'rs - tim (*) Does not apply to Jython releases. 
All applicable taxes are the responsibility of the recipient. No warranty is expressed or implied. This offer has not been reviewed or approved by CWI, CNRI, BeOpen.com, or Digital Creations 2. Export restrictions may apply. By acceptance of this offer, recipient grants perpetual license to use their name, image and likeness in Python promotional materials without compensation. Packaging, handling, shipping and insurance costs to be borne by recipient, but in no case to exceed 1 (one) US$/byte. This offer may be withdrawn at any time, including but not limited to retroactively, at the sole discretion of Guido van Rossum, or such of his heirs and successors as he may designate from time to time. From martin at mira.cs.tu-berlin.de Tue Jan 23 09:14:32 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 23 Jan 2001 09:14:32 +0100 Subject: [Python-Dev] Is X a (sequence|mapping)? Message-ID: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> > i don't see an easy way to test if something supports one of these > abstract interfaces in Python. Why do you want to test for that? If you have an algorithm that only operates on integer-indexed things, what can you do if the test fails? So it is always better to just use the object in the algorithm, and let it break with an exception if somebody passes a bad object. Regards, Martin From mal at lemburg.com Tue Jan 23 10:08:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 10:08:24 +0100 Subject: [Python-Dev] webbrowser.py References: Message-ID: <3A6D4A08.B3806984@lemburg.com> Ka-Ping Yee wrote: > > On Mon, 22 Jan 2001, Ka-Ping Yee wrote: > > In trying to do the latter i've found the webbrowser module pretty > > unreliable, by the way. For example, it relies on a constant delay > > of 4 seconds to launch a new browser that can't be expected on all > > platforms, and fails to launch Netscape 3 because it supplies an > > illegal command-line option. 
> > When i've found good cross-platform
> > ways to make this work i'll suggest some patches.
>
> Oh, and i forgot to mention... i was pretty disappointed that:
>
>     setenv BROWSER my_browser_program
>     python -c 'import webbrowser; webbrowser.open("http://python.org/")'
>
> doesn't execute "my_browser_program http://python.org/" as i would
> have hoped. Even for a known browser type:
>
>     setenv BROWSER lynx
>     python -c 'import webbrowser; webbrowser.open("http://python.org/")'
>
> does not work as expected, either. (Red Hat Linux here.)

Hmm, lynx should work (the module has explicit support for it)
and yes, I agree, webbrowser should trust BROWSER and use
a generic calling mechanism (program ) for opening the
URL. Too late for 2.1a1, but maybe for a2 ?!

BTW, I think that the second line here is causing the problem:

class CommandLineBrowser:
    _browsers = []    # <- this overrides the global of the same name
    if os.environ.get("DISPLAY"):
        _browsers.extend([
            ("netscape", "netscape %s >/dev/null &"),
            ("mosaic", "mosaic %s >/dev/null &"),
            ])
    _browsers.extend([
        ("lynx", "lynx %s"),
        ("w3m", "w3m %s"),
        ])

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Tue Jan 23 10:15:11 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 23 Jan 2001 10:15:11 +0100
Subject: [Python-Dev] Is X a (sequence|mapping)?
References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de>
Message-ID: <3A6D4B9F.38B17046@lemburg.com>

"Martin v. Loewis" wrote:
>
> > i don't see an easy way to test if something supports one of these
> > abstract interfaces in Python.
>
> Why do you want to test for that? If you have an algorithm that only
> operates on integer-indexed things, what can you do if the test fails?
>
> So it is always better to just use the object in the algorithm, and
> let it break with an exception if somebody passes a bad object.

Right. Polymorphic code will usually get you more out of an
algorithm than type-safe or interface-safe code.

BTW, there are Python interfaces to PySequence_Check() and
PyMapping_Check() buried in the builtin operator module in case
you really do care ;) ...

    operator.isSequenceType()
    operator.isMappingType()
    + some other C style _Check() APIs

These only look at the type slots though, so Python instances
will appear to support everything but when used fail with
an exception if they don't provide the proper __xxx__ hooks.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr at thyrsus.com Tue Jan 23 10:17:30 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Tue, 23 Jan 2001 04:17:30 -0500
Subject: [Python-Dev] webbrowser.py
Message-ID: <20010123041730.A25165@thyrsus.com>

Ping's complaints are justified -- I've been looking at and testing
webbrowser.py and it's a mess. Among other things:

1. The BROWSER variable is not interpreted properly.

2. The code is stupid about loading platform support it doesn't need.

3. It's not possible to specify lynx as a browser under Unix, because the
   computation of available browsers is split in two and partly done inside
   the CommandLineBrowser class.

4. The module code is excessively hard to read, obscuring these bugs.

Our mistake was hurriedly merging the launcher code from IDLE with the
browser-finder hack I wrote (the guts of CommandLineBrowser). The resulting
code is a bad, overcomplicated architecture with a nasty seam in it.

As co-designer/implementor I should have caught this sooner, but I was
in a hurry to get a CML2 prototype out the door and didn't test
anything but the case I needed. My apologies to all.
I'm rewriting to fix these problems now. Documented semantics of entry points will be preserved. -- Eric S. Raymond The politician attempts to remedy the evil by increasing the very thing that caused the evil in the first place: legal plunder. -- Frederick Bastiat From mal at lemburg.com Tue Jan 23 11:26:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 11:26:16 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> Message-ID: <3A6D5C48.A076DA0@lemburg.com> "Eric S. Raymond" wrote: > > Guido van Rossum : > > There's already a PEP on a set object type, and everybody and their > > aunt has already implemented a set datatype. > > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. > > The only other set module I can find in the Vaults or anywhere else is > kjBuckets (which I knew about before). Looks like a good design, but > complicated -- and requires installation of an extension. There's also a kjSet.py available at Aaron's site: http://www.chordate.com/kwParsing/index.html which is a pure Python version of the C extenion's kjSet type. > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. But here are the reasons I suggest you should do so: > > 1. 
It supports a set of operations that are both often useful and > fiddly to get right, thus enhancing the "batteries are included" > effect. (I used its ancestor for representing seen-message numbers in > a specialized mailreader, for example.) > > 2. It's simple for application programmers to use. No extension module > to integrate. > > 3. It's unsurprising. My set objects behave almost exactly like other > mutable sequences, with all the same built-in methods working, except for > the fact that you can't introduce duplicates with the mutators. > > 4. It's already completely documented in a form suitable for the library. > > 5. It's simple enough not to cause you maintainance hassles down the > road, and even if it did the maintainer is unlikely to disappear :-). All very well, but are sets really that essential to every day Python programming ? If we include sets then we ought to also include graphs, tries, btrees and all those other goodies we have in computer science. All of these types are available out there, but I believe the audience who really cares for these types is also capable of downloading the extensions and installing them. It would be nice if all of these extension could go into a SUMO edition of Python though... together with your set module. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Tue Jan 23 12:08:06 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 06:08:06 -0500 Subject: [Python-Dev] What does "batteries are included" mean? 
In-Reply-To: <3A6D5C48.A076DA0@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 11:26:16AM +0100
References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com>
Message-ID: <20010123060806.A25436@thyrsus.com>

M.-A. Lemburg :
> All very well, but are sets really that essential to every
> day Python programming ? If we include sets then we ought to
> also include graphs, tries, btrees and all those other goodies
> we have in computer science.

I use sets a lot. And there was enough demand to generate a PEP.

But the wider question here is how seriously we take "batteries are
included" as a design principle. Does a facility have to be useful
*every day* to be worth being in the standard library? And if so,
what are things like the POP3 and IMAP libraries (or, for that matter,
my own shlex and netrc modules) doing there?

I don't think so. I think there are at least four different
possible reasons for something to be in the standard library:

1. It's useful every day.

2. It's useful less frequently than every day, but is a stable
   cross-platform implementation of a wheel that would otherwise have to
   be reinvented frequently. That is, you can solve it *once* and have a
   zero-maintenance increment to the power of the language.

3. It's a technique that's not often used, and not necessarily stable
   in the face of platform variations, but nothing else will do
   when you need it and it's notably difficult to get right. (popen2 and
   BaseHTTPServer would be good examples of this.)

4. It's a developer checklist feature that improves Python's competitive
   position against Perl, Tcl, and other contenders for the same ecological
   niche.

IMO a lightweight set facility, like POP3 and IMAP, qualifies under 2 and 4
even if not under 1 and 3.

This question keeps coming up in different guises.
I'm often the one to raise it, because I favor an aggressive interpretation of "batteries are included" that would pull in a lot of stuff. Yes, this makes more work for us -- but I think it's work we should be doing. While minimalism is an excellent design heuristic for the core language, I think it's a bad one for the libraries. Python is a high-level language and programmers using it both expect and deserve high-level libraries -- yes, including graphs/tries/btrees and all that computer science stuff. Just as much to the point, Python competing against languages like Perl that frequently get design wins against it because of the richness of the environment *they* are willing to carry around. Guido and Tim and others are more conservative than I, which would be OK -- but it seems to me that the conservatives do not have consistent or well-thought-out criteria for what to include, which is *not* OK. We need to solve this problem. Some time back I initiated a library guidelines PEP, then dropped it due to press of overwork. But the general question is going to keep coming up and we ought to have policy guidelines that potential library developers can understand. Should I pick this up again? -- Eric S. Raymond I do not find in orthodox Christianity one redeeming feature. -- Thomas Jefferson From mal at lemburg.com Tue Jan 23 12:50:39 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 12:50:39 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> Message-ID: <3A6D700F.7A9E2509@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > All very well, but are sets really that essential to every > > day Python programming ? 
If we include sets then we ought to > > also include graphs, tries, btrees and all those other goodies > > we have in computer science. > > I use sets a lot. And there was enough demand to generate a PEP. Sure, but sets are fairly easy to implement using Python dictionaries -- at least at the level normally needed by Python programs. Sets, queues and graphs are examples of data types which can have many different faces; it is hard to design APIs for these which meet everyones needs. > But the wider question here is how seriously we take "batteries are > included" as a design principle. Does a facility have to be useful > *every day* to be worth being in the standard library? And if so, > what are things like the POP3 and IMAP libraries (or, for that matter, > my own shlex and netrc modules) doing there? You can argue the same way for all kinds of extensions and packages you find in the Vaults. That's why there's demand for a different packaging of Python and this is what Moshe's PEP 206 addresses: http://python.sourceforge.net/peps/pep-0206.html > I don't think so. I think there are at least four different > possible reasons for something to be in the standard library: > > 1. It's useful every day. > > 2. It's useful less frequently than every day, but is a stable > cross-platform implementation of a wheel that would otherwise have to > be reinvented frequently. That is, you can solve it *once* and have a > zero-maintainance increment to the power of the language. > > 3. It's a technique that's not often used, and not necessarily stable > in the face of platform variations, but nothing else will do > when you need it and it's notably difficult to get right. (popen2 and > BaseHTTPServer would be good examples of this.) > > 4. It's a developer checklist feature that improves Python's competitive > position against Perl, Tcl, and other contenders for the same ecological > niche. 
> > IMO a lightweight set facility, like POP3 and IMAP, qualifies under 2 and 4 > even if not under 1 and 3. > > This question keeps coming up in different guises. I'm often the one to > raise it, because I favor an aggressive interpretation of "batteries > are included" that would pull in a lot of stuff. Yes, this makes more > work for us -- but I think it's work we should be doing. > > While minimalism is an excellent design heuristic for the core language, > I think it's a bad one for the libraries. Python is a high-level language > and programmers using it both expect and deserve high-level libraries -- > yes, including graphs/tries/btrees and all that computer science stuff. > > Just as much to the point, Python competing against languages like > Perl that frequently get design wins against it because of the > richness of the environment *they* are willing to carry around. > > Guido and Tim and others are more conservative than I, which would be > OK -- but it seems to me that the conservatives do not have consistent > or well-thought-out criteria for what to include, which is *not* OK. > We need to solve this problem. > > Some time back I initiated a library guidelines PEP, then dropped it > due to press of overwork. But the general question is going to keep > coming up and we ought to have policy guidelines that potential > library developers can understand. > > Should I pick this up again? Hmm, we already have the PEP 206 which focusses on the topic. Perhaps you could work with Moshe to sort out the "which batteries do we need" sub-topic ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From esr at thyrsus.com Tue Jan 23 13:20:46 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 07:20:46 -0500 Subject: [Python-Dev] What does "batteries are included" mean? 
In-Reply-To: <3A6D700F.7A9E2509@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 12:50:39PM +0100
References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com>
Message-ID: <20010123072046.A25593@thyrsus.com>

M.-A. Lemburg :
> > But the wider question here is how seriously we take "batteries are
> > included" as a design principle. Does a facility have to be useful
> > *every day* to be worth being in the standard library? And if so,
> > what are things like the POP3 and IMAP libraries (or, for that matter,
> > my own shlex and netrc modules) doing there?
>
> You can argue the same way for all kinds of extensions and
> packages you find in the Vaults. That's why there's demand for
> a different packaging of Python and this is what Moshe's
> PEP 206 addresses:
>
> http://python.sourceforge.net/peps/pep-0206.html

Muttering "PEP 206" evades the fundamental problem rather than solving it.

Not that I'm saying Moshe hasn't made a valiant effort, within the political
constraint that the BDFL and others seem unwilling to confront the deeper
issue. But PEP 206 is not enough. Here is why:

1. If the "Sumo" packaging ever happens, the vanilla non-Sumo version that
   Guido issues will quickly become of mostly theoretical interest -- because
   Red Hat and everybody else will move to Sumo instantly, figuring they have
   nothing to lose by including more features.

2. If by some chance I'm wrong about 1, the outcome will be worse;
   we'll in effect have fragmented the language, because there won't be
   consistency in what library stuff is available between Sumo and
   non-Sumo builds on the same platform.

3. There are documentation issues as well. It's already a blot on
   Python that the standard documentation set doesn't cover Tkinter.
In the Sumo distribution, the gap between what's installed and what's documented is likely to widen further. Developers will see this as pointlessly irritating -- and they'll be right. The stock distribution should *be* the Sumo distribution. If we're really so terrified of the extra maintainence load, then the right fix is to mark some modules and documentation as "externally maintained" with prominent pointers back to the responsible people. -- Eric S. Raymond The day will come when the mystical generation of Jesus by the Supreme Being as his father, in the womb of a virgin, will be classed with the fable of the generation of Minerva in the brain of Jupiter. -- Thomas Jefferson, 1823 From mal at lemburg.com Tue Jan 23 13:48:09 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 23 Jan 2001 13:48:09 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> Message-ID: <3A6D7D89.A6BE1B74@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > > But the wider question here is how seriously we take "batteries are > > > included" as a design principle. Does a facility have to be useful > > > *every day* to be worth being in the standard library? And if so, > > > what are things like the POP3 and IMAP libraries (or, for that matter, > > > my own shlex and netrc modules) doing there? > > > > You can argue the same way for all kinds of extensions and > > packages you find in the Vaults. That's why there's demand for > > a different packaging of Python and this is what Moshe's > > PEP 206 addresses: > > > > http://python.sourceforge.net/peps/pep-0206.html > > Muttering "PEP 206" evades the fundamental problem rather than solving it. 
> > Not that I'm saying Moshe hasn't made a valiant effort, within the political > constraint that the BDFL and others seem unwilling to confront the deeper > issue. But PEP 206 is not enough. Here is why: > > 1. If the "Sumo" packaging ever happens, the vanilla non-Sumo version that > Guido issues will quickly become of mostly theoretical interest -- because > Red Hat and everybody else will move to Sumo instantly, figuring they have > nothing to lose by including more features. > > 2. If by some change I'm wrong about 1, the outcome will be worse; > we'll in effect have fragmented the language, because there won't be > consistency in what library stuff is available between Sumo and > non-Sumo builds on the same platform. > > 3. There are documentation issues as well. It's already a blot on > Python that the standard documentation set doesn't cover Tkinter. In > the Sumo distribution, the gap between what's installed and what's > documented is likely to widen further. Developers will see this as > pointlessly irritating -- and they'll be right. > > The stock distribution should *be* the Sumo distribution. If we're really > so terrified of the extra maintainence load, then the right fix is to > mark some modules and documentation as "externally maintained" with > prominent pointers back to the responsible people. That's your POV, others think different and since this is not a democracy, the Sumo distribution is a feasable way of satisfying both needs. 
There are a few other issues to consider as well:

* licensing is a problem (and this is also mentioned in PEP 206)
  since some of the nicer additions are GPLed and thus not in the
  spirit of Python's closed-source friendliness which has provided
  it with a large user base in the commercial field

* package authors are not all the same and some may not want to
  split their distribution due to the integration of their package
  in a Sumo-distribution

* the packages mentioned in PEP 206 are very complex and usually
  largish; maintaining them will cause much more effort compared to
  the standard lib modules and extensions

* the build process varies widely between packages; even though we
  have distutils, some of the packages extend it to fit their
  specific needs (which is OK, but causes extra effort in getting
  the build process combined)

I'm not objecting to the Sumo-distribution project; on the contrary --
I tried a similar project a few years ago: the Python PowerTools
distribution which you can download from:

    http://www.lemburg.com/python/PowerTools-0.2.zip

The project died quickly though, as I wasn't able to keep
up with the maintenance effort.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From akuchlin at cnri.reston.va.us Tue Jan 23 14:40:06 2001
From: akuchlin at cnri.reston.va.us (Andrew Kuchling)
Date: Tue, 23 Jan 2001 08:40:06 -0500
Subject: [Python-Dev] What does "batteries are included" mean?
In-Reply-To: <3A6D7D89.A6BE1B74@lemburg.com>; from mal@lemburg.com on Tue, Jan 23, 2001 at 01:48:09PM +0100
References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> <3A6D7D89.A6BE1B74@lemburg.com>
Message-ID: <20010123084006.A23485@newcnri.cnri.reston.va.us>

On Tue, Jan 23, 2001 at 01:48:09PM +0100, M.-A. Lemburg wrote:
>There are a few other issues to consider as well:
>

To add a few:

* The larger the amount of code in the distribution, the more effort it is
  to maintain it all.

* Minor fixes aren't available until the next Python release. For example,
  to drag out the XML code again: there have been two PyXML releases since
  Python 2.0 fixing various bugs, but someone who sticks to installing just
  Python will not be able to get at those bugfixes until April (when 2.1
  is supposed to get finalized).

If there were a core Python distribution and a sumo distribution, and the
sumo distribution was the one that most people downloaded and used, that
would be perfectly OK. Practically no one assembles their own Linux
distribution, and that's not considered a problem. To some degree, if
you're using a well-packaged Linux distribution such as Debian, you also
have a Python distribution mechanism with intermodule dependencies; we just
have to reinvent the wheel for people on other platforms.

>The project died quickly though, as I wasn't able to keep
>up with the maintenance effort.

Interesting. Did you get much feedback indicating that people used it much?
Perhaps when you were doing that effort the Python community was composed
more of self-reliant early adopter types; there are probably more newbies
around now.

--amk

From mal at lemburg.com Tue Jan 23 15:05:13 2001
From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 23 Jan 2001 15:05:13 +0100 Subject: [Python-Dev] What does "batteries are included" mean? References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <3A6D5C48.A076DA0@lemburg.com> <20010123060806.A25436@thyrsus.com> <3A6D700F.7A9E2509@lemburg.com> <20010123072046.A25593@thyrsus.com> <3A6D7D89.A6BE1B74@lemburg.com> <20010123084006.A23485@newcnri.cnri.reston.va.us> Message-ID: <3A6D8F99.53A0F411@lemburg.com> Andrew Kuchling wrote: > > On Tue, Jan 23, 2001 at 01:48:09PM +0100, M.-A. Lemburg wrote: > >There are a few other issues to consider as well: > > > > To add a few: > > * The larger the amount of code in the distribution, the more effort it is > maintain it all. > > * Minor fixes aren't available until the next Python release. For example, > to drag out the XML code again: there have been two PyXML releases since > Python 2.0 fixing various bugs, but someone who sticks to installing just > Python will not be able to get at those bugfixes until April (when 2.1 > is supposed to get finalized). > > If there were a core Python distribution and a sumo distribution, and the > sumo distribution was the one that most people downloaded and used, that > would be perfectly OK. Practically no one assembles their own Linux > distribution, and that's not considered a problem. To some degree, if > you're using a well-packaged Linux distribution such as Debian, you also > have Python distribution mechanism with intermodule dependencies; we just > have to reinvent the wheel for people on other platforms. > > >The project died quickly though, as I wasn't able to keep > >up with the maintenance effort. > > Interesting. Did you get much feedback indicating that people used it much? Not much -- the interested parties were mostly Python experts (the lib started out as a project called expert-lib). 
> Perhaps when you were doing that effort the Python community was composed > more of self-reliant early adopter types; there are probably more newbies > around now. True. The included packages are dated 1997-1998 -- at that time Starship was just starting to get off the ground (this are moving at a much faster pace now). The PowerTools package still uses the Makefile.pre.in mechanism (with much success though) as distutils wasn't even considered at the time. Perhaps Moshe could pick this up to have a head start for Sumo-Python ?! Some of the included packages are not available elsewhere, AFAIK, so it may well be worthwhile having a look (e.g. the LGPLed trie and btree implementations donated by John W. M. Stevens). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Tue Jan 23 15:06:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 09:06:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: Your message of "Tue, 23 Jan 2001 04:17:30 EST." <20010123041730.A25165@thyrsus.com> References: <20010123041730.A25165@thyrsus.com> Message-ID: <200101231406.JAA04765@cj20424-a.reston1.va.home.com> > Ping's complaints are justified -- I've been looking at and testing > webbrowser.py and it's a mess. Among other things: > > 1. The BROWSER variable is not interpreted properly. > > 2. The code is stupid about loading platform support it doesn't need. > > 3. It's not possible to specify lynx as a browser under Unix, because the > computation of available browsers is split in two and partly done inside > the CommandLineBrowser class. > > 3. The module code is excessively hard to read, obscuring these bugs. > > Our mistake was hurriedly merging the launcher code from IDLE with the > browser-finder hack I wrote (the guts of CommandLineBrowser). 
The resulting > code is a bad, overcomplicated architecture with a nasty seam in it. > > As co-designer/implementor I should have caught this sooner, but I was > in a hurry to get a CML2 prototype out the door and didn't test > anything but the case I needed. My apologies to all. > > I'm rewriting to fix these problems now. Documented semantics of entry > points will be preserved. Excellent, Eric! That's the spirit. Can you point me to docs explaining the meaning of the BROWSER environment variable? I've never heard of it... The last new environment variables I learned were PAGER and EDITOR, probably 15 years ago when 4.1BSD was released... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Tue Jan 23 15:22:26 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 09:22:26 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101231406.JAA04765@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 09:06:47AM -0500 References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> Message-ID: <20010123092226.A25968@thyrsus.com> Guido van Rossum : > Can you point me to docs explaining the meaning of the BROWSER > environment variable? I've never heard of it... The last new > environment variables I learned were PAGER and EDITOR, probably 15 > years ago when 4.1BSD was released... :-) You've never heard of BROWSER because I invented it and have not widely popularized it yet :-). Ping knew about it either because he read the module code and saw that it was supposed to work, or because he remembered the design discussion when webbrowser.py was first implemented. I've had conversations with some key Perl and Tcl people (Larry Wall, Tom Christiansen, Clif Flynt) about the BROWSER convention, and they agree it's a good idea. I'll probably hack support for it into Perl's browser launcher next. 
It's documented in the version of libwebbrowser.tex now in the CVS tree. -- Eric S. Raymond Power concedes nothing without a demand. It never did, and it never will. Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From nas at arctrix.com Tue Jan 23 09:30:56 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 23 Jan 2001 00:30:56 -0800 Subject: [Python-Dev] Does autoconfig detect INSTALL incorrectly? Message-ID: <20010123003056.A28309@glacier.fnational.com> Why is the configure.in file set to always use "install-sh"? There is a comment that says: # Install just never works :-( I don't think that statement is accurate. /usr/bin/install works quite well on my machine. The only comments I can find in the changelog are: revision 1.16 date: 1995/01/20 14:12:16; author: guido; state: Exp; lines: +27 -2 add INSTALL_PROGRAM and INSTALL_DATA; check for getopt and: revision 1.5 date: 1994/08/19 15:33:51; author: guido; state: Exp; lines: +14 -6 Simplify value of INSTALL (always 'cp'). Is there any reason why the autoconf macro AC_PROG_INSTALL is not used? The documentation seems to indicate that it does what we want. Neil From guido at digicool.com Tue Jan 23 16:31:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 10:31:39 -0500 Subject: [Python-Dev] Is X a (sequence|mapping)? In-Reply-To: Your message of "Tue, 23 Jan 2001 10:15:11 +0100." <3A6D4B9F.38B17046@lemburg.com> References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> <3A6D4B9F.38B17046@lemburg.com> Message-ID: <200101231531.KAA05122@cj20424-a.reston1.va.home.com> > Polymorphic code will usually get you more out of an > algorithm, than type-safe or interface-safe code.
Right. But there are times when people want to write methods that take e.g. either a sequence or a mapping, and need to distinguish between the two. That's not easy in Python! Java and C++ support it very well though, and thus we'll always keep seeing this kind of complaint. Not sure what to do, except to recommend "find out which methods you expect in one case but not in the other (e.g. keys()) and do a hasattr() test for that." > BTW, there are Python interfaces to PySequence_Check() and > PyMapping_Check() burried in the builtin operator module in case > you really do care ;) ... > > operator.isSequenceType() > operator.isMappingType() > + some other C style _Check() APIs > > These only look at the type slots though, so Python instances > will appear to support everything but when used fail with > an exception if they don't provide the proper __xxx__ hooks. Yes, these should probably be deprecated. I certainly have never used them! (The operator module doesn't seem to get much use in general... Was it a bad idea?) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 23 16:49:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 10:49:23 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Mon, 22 Jan 2001 15:13:09 EST." <20010122151309.C15236@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> Message-ID: <200101231549.KAA05172@cj20424-a.reston1.va.home.com> > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. 
> This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. Actually, I thought that Greg's proposal has some charm: it seems to be using a natural extension of the existing dictionary syntax, where a set is a dictionary without the values. I haven't thought about this deeply enough, but I see a lot of potential here. I understand that you have probably given this more thought than I have recently, so I'd like to see your more detailed analysis of what you do and don't like about Greg's proposal! > The only other set module I can find in the Vaults or anywhere else is > kjBuckets (which I knew about before). Looks like a good design, but > complicated -- and requires installation of an extension. > > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. But here are the reasons I suggest you should do so: > > 1. It supports a set of operations that are both often useful and > fiddly to get right, thus enhancing the "batteries are included" > effect. (I used its ancestor for representing seen-message numbers in > a specialized mailreader, for example.) I haven't read your docs yet (and no time because Digital Creations is requiring my attention all of today), but I expect that designing a universal set type, one that is good enough to be used in all sorts of applications, is very difficult. > 2. It's simple for application programmers to use. No extension module > to integrate. This is a silly argument for wanting something to be added to the core. If it's part of the core, the need for an extension is immaterial because that extension will always be available. So I conclude that your module is set up perfectly for a popular module in the Vaults. :-) > 3. It's unsurprising. 
My set objects behave almost exactly like other > mutable sequences, with all the same built-in methods working, except for > the fact that you can't introduce duplicates with the mutators. Ah, so you see a set as an extension of a sequence. That may be the big rift between your version and Greg's PEP: are sets more like sequences or more like dictionaries? > 4. It's already completely documented in a form suitable for the library. Much appreciated. > 5. It's simple enough not to cause you maintainance hassles down the > road, and even if it did the maintainer is unlikely to disappear :-). I'll be the judge of that, and since you prefer not to show your source code (why is that?), I can't tell yet. [...time flows...] Having just skimmed your docs, I'm disappointed that you choose lists as your fundamental representation type -- this makes it slow to test for membership and hence makes intersection and union slow. I suppose that you have evidence from using this that those operations aren't used much, or not for large sets? This is one of the problems with coming up with a set type for the core: it has to work for (nearly) everybody. It's no big deal if the Vaults contain three or more set modules -- perfect even, people can choose the best one for their purpose. But in the core, there's only room for one set type or module. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Tue Jan 23 17:30:50 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 11:30:50 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <200101231549.KAA05172@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 10:49:23AM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> Message-ID: <20010123113050.A26162@thyrsus.com> Guido van Rossum : > I understand that you have probably given this more thought than I > have recently, so I'd like to see your more detailed analysis of what > you do and don't like about Greg's proposal! I've already covered my big objection, the fact that it doesn't support the degree of polymorphic crossover one might expect with sequence types (and Greg has agreed that I have a point there). Another problem is the lack of support for mutable elements (and yes, I'm quite aware of the problems with this.) One thing I do like is the proposal for an actual set input syntax. Of course this would require that the set type become one of the builtins, with compiler support. > I haven't read your docs yet (and no time because Digital Creations is > requiring my attention all of today), but I expect that designing a > universal set type, one that is good enough to be used in all sorts of > applications, is very difficult. For "difficult" read "can't be done". This is one of those cases where no matter what implementation you choose, some of the operations you want to be cheap will be worst-case quadratic. Life is like that. So I chose a dead-simple representation and accepted quadratic times for union/intersection. > > 2. It's simple for application programmers to use. No extension module > > to integrate. > > This is a silly argument for wanting something to be added to the > core. If it's part of the core, the need for an extension is > immaterial because that extension will always be available. So > I conclude that your module is set up perfectly for a popular module > in the Vaults. 
:-) Reasonable point. > > 3. It's unsurprising. My set objects behave almost exactly like other > > mutable sequences, with all the same built-in methods working, except for > > the fact that you can't introduce duplicates with the mutators. > > Ah, so you see a set as an extension of a sequence. That may be the > big rift between your version and Greg's PEP: are sets more like > sequences or more like dictionaries? Indeed it is. > > 5. It's simple enough not to cause you maintainance hassles down the > > road, and even if it did the maintainer is unlikely to disappear :-). > > I'll be the judge of that, and since you prefer not to show your > source code (why is that?), I can't tell yet. No nefarious concealment going on here here :-), I've sent versions of the code to Greg and Ping already. I'll shoot you a copy too. > Having just skimmed your docs, I'm disappointed that you choose lists > as your fundamental representation type -- this makes it slow to test > for membership and hence makes intersection and union slow. Not quite. Membership test is still linear-time; so is adding and deleting elements. It's true that union and intersection are quadratic, but see below. > I suppose > that you have evidence from using this that those operations aren't > used much, or not for large sets? Exactly! In my experience the usage pattern of a class like this runs heavily to small sets (usually < 64 elements); membership tests dominate usage, with addition and deletion of elements running second and the "classical" boolean operations like union and intersection being uncommon. What you get by going with a dictionary representation is that membership test becomes close to constant-time, while insertion and deletion become sometimes cheap and sometimes quite expensive (depending of course on whether you have to allocate a new hash bucket). Given the usage pattern I described, the overall difference in performance is marginal. 
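A minimal sketch of the two representations being compared, written in present-day Python 3 rather than the 2.x of this thread. The class names ListSet and DictSet are invented for illustration and are not taken from either proposal. It shows only the trade-off described above: the list version accepts mutable (unhashable) elements but scans linearly, while the dictionary version hashes each element and so rejects unhashable ones.

```python
class ListSet:
    """Hypothetical list-backed set: linear-time membership, any element type."""

    def __init__(self, items=()):
        self.elements = []
        for item in items:
            self.add(item)

    def add(self, item):
        # Linear scan: compares the new item against every stored element.
        if item not in self.elements:
            self.elements.append(item)

    def __contains__(self, item):
        return item in self.elements

    def __len__(self):
        return len(self.elements)


class DictSet:
    """Hypothetical dict-backed set: hashed membership, hashable elements only."""

    def __init__(self, items=()):
        self.elements = {}
        for item in items:
            self.add(item)

    def add(self, item):
        # Hash lookup; raises TypeError for unhashable items such as lists.
        self.elements[item] = None

    def __contains__(self, item):
        return item in self.elements

    def __len__(self):
        return len(self.elements)


# Lists are mutable and unhashable, so only the list-backed version takes them.
nested = ListSet([[1], [1], [2]])
flat = DictSet([1, 1, 2])
```

For the small sets described above, either representation is likely adequate; the difference shows up mainly as sets grow or when elements cannot be hashed.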
> This is one of the problems with > coming up with a set type for the core: it has to work for (nearly) > everybody. As I pointed out above (and someone else on the list had made the same point earlier), "works for everybody" isn't really possible here. So my solution does the next best thing -- pick a choice of tradeoffs that isn't obviously worse than the alternatives and keeps things bog-simple. -- Eric S. Raymond Alcohol still kills more people every year than all `illegal' drugs put together, and Prohibition only made it worse. Oppose the War On Some Drugs! -------------- next part --------------
"""
A set-algebra module for Python.

The functions work on any sequence type and return lists.
The set methods can take a set or any sequence type as an argument.
They are insensitive to the types of the elements.

Lists are used rather than dictionaries so the elements can be mutable.

"""
# Design and implementation by ESR, January 2001.

def setify(list1):                      # Used by set constructor
    "Remove duplicates in sequence."
    res = []
    for i in range(len(list1)):
        duplicate = 0
        for j in range(i):
            if list1[i] == list1[j]:
                duplicate = 1
                break
        if not duplicate:
            res.append(list1[i])
    return res

def union(list1, list2):                # Used for |
    "Compute set union of sequences."
    res = list1[:]
    for x in list2:
        if not x in list1:
            res.append(x)
    return res

def intersection(list1, list2):         # Used for &
    "Compute set intersection of sequences."
    res = []
    for x in list1:
        if x in list2:
            res.append(x)
    return res

def difference(list1, list2):           # Used for -
    "Compute set difference of sequences."
    res = []
    for x in list1:
        if not x in list2:
            res.append(x)
    return res

def symmetric_difference(list1, list2): # Used for ^
    "Compute set symmetric-difference of sequences."
    res = []
    for x in list1:
        if not x in list2:
            res.append(x)
    for x in list2:
        if not x in list1:
            res.append(x)
    return res

def cartesian(list1, list2):            # Used for *
    "Cartesian product of sequences considered as sets."
    res = []
    for x in list1:
        for y in list2:
            res.append((x,y))
    return res

def equality(list1, list2):             # Used for == and !=
    "Test sequences considered as sets for equality."
    if len(list1) != len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    for x in list2:
        if not x in list1:
            return 0
    return 1

def proper_subset(list1, list2):
    "Return 1 if first argument is a proper subset of second, 0 otherwise."
    if not len(list1) < len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    return 1

def subset(list1, list2):
    "Return 1 if first argument is a subset of second, 0 otherwise."
    if not len(list1) <= len(list2):
        return 0
    for x in list1:
        if not x in list2:
            return 0
    return 1

def powerset(base):
    "Compute the set of all subsets of a set."
    powerset = []
    for n in xrange(2 ** len(base)):
        subset = []
        for e in xrange(len(base)):
            if n & 2 ** e:
                subset.append(base[e])
        powerset.append(subset)
    return powerset

class set:
    "Lists with set-theoretic operations."

    def __init__(self, value):
        self.elements = setify(value)

    def __len__(self):
        return len(self.elements)

    def __getitem__(self, ind):
        return self.elements[ind]

    def __setitem__(self, ind, val):
        if val not in self.elements:
            self.elements[ind] = val

    def __delitem__(self, ind):
        del self.elements[ind]

    def list(self):
        return self.elements

    def append(self, new):
        if new not in self.elements:
            self.elements.append(new)

    def extend(self, new):
        self.elements.extend(new)
        self.elements = setify(self.elements)

    def count(self, x):
        # Return the result; it was computed but dropped before.
        return self.elements.count(x)

    def index(self, x):
        return self.elements.index(x)

    def insert(self, i, x):
        # Was self.elements.index(i, x), which never inserted anything.
        if x not in self.elements:
            self.elements.insert(i, x)

    def pop(self, i=-1):
        # Default to the last element, as list.pop() does, and return it.
        return self.elements.pop(i)

    def remove(self, x):
        self.elements.remove(x)

    def reverse(self):
        self.elements.reverse()

    def sort(self, cmp=None):
        self.elements.sort(cmp)

    def __or__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(union(self.elements, other))

    __add__ = __or__

    def __and__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(intersection(self.elements, other))

    def __sub__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(difference(self.elements, other))

    def __xor__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(symmetric_difference(self.elements, other))

    def __mul__(self, other):
        if type(other) == type(self):
            other = other.elements
        return set(cartesian(self.elements, other))

    def __eq__(self, other):
        # Compare as sets (order-insensitive) rather than as lists.
        if type(other) == type(self):
            other = other.elements
        return equality(self.elements, other)

    def __ne__(self, other):
        if type(other) == type(self):
            other = other.elements
        return not equality(self.elements, other)

    def __lt__(self, other):
        if type(other) == type(self):
            other = other.elements
        return proper_subset(self.elements, other)

    def __le__(self, other):
        if type(other) == type(self):
            other = other.elements
        return subset(self.elements, other)

    def __gt__(self, other):
        if type(other) == type(self):
            other = other.elements
        return proper_subset(other, self.elements)

    def __ge__(self, other):
        if type(other) == type(self):
            other = other.elements
        return subset(other, self.elements)

    def __str__(self):
        # Joining the items also renders the empty set correctly as "{}".
        return "{" + ", ".join(map(str, self.elements)) + "}"

    def __repr__(self):
        return repr(self.elements)

if __name__ == '__main__':
    a = set([1, 2, 3, 4])
    b = set([1, 4])
    c = set([5, 6])
    d = [1, 1, 2, 1]
    print `d`, "setifies to", set(d)
    print `a`, "|", `b`, "is", `a | b`
    print `a`, "^", `b`, "is", `a ^ b`
    print `a`, "&", `b`, "is", `a & b`
    print `b`, "*", `c`, "is", `b * c`
    print `a`, '<', `b`, "is", `a < b`
    print `a`, '>', `b`, "is", `a > b`
    print `b`, '<', `c`, "is", `b < c`
    print `b`, '>', `c`, "is", `b > c`
    print "Power set of", `c`, "is", powerset(c)

# end
From sdm7g at virginia.edu Tue Jan 23 18:12:22 2001 From: sdm7g at virginia.edu (Steven D. Majewski) Date: Tue, 23 Jan 2001 12:12:22 -0500 (EST) Subject: [Python-Dev] libraries=['m'] in config.py [Re: Python 2.1 alpha 1 released!]
In-Reply-To: <200101230333.WAA28376@cj20424-a.reston1.va.home.com> Message-ID: Is there a simple way (other than editing config.py) to remove the effect of all of the "libraries=['m']" options from config.py ? This breaks the MacOSX build as there's no libm -- that functionality is built into the System.framework . Shouldn't these types of flags be acquired from configure or the make environment somehow ? -- Steve Majewski ( BTW: OSX build also needs a "-traditional-cpp" flag to get thru compiling classobject.c without error. ) From uche.ogbuji at fourthought.com Tue Jan 23 18:28:18 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 10:28:18 -0700 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: Message from Martin von Loewis of "Mon, 22 Jan 2001 15:46:39 +0100." <200101221446.PAA05164@pandora.informatik.hu-berlin.de> Message-ID: <200101231728.KAA03408@localhost.localdomain> > > This has nothing to do with Python. UTF-8 marks the codes > > from 128-191 as illegal prefix. > [...] > > Perhaps the parser should catch the UnicodeError and > > instead return a not-wellformed exception ?! > > Right on both accounts. If no encoding is specified, and if the > document appears not to be UTF-16 in any endianness, an XML processor > shall assume it is UTF-8. As Marc-Andre explains, your document is not > proper UTF-8, hence the error. > > The confusing thing is that expat itself does not care about it not > being UTF-8; that is only detected when the callback is invoked in > pyexpat, and therefore conversion to a Unicode object is attempted. Pyexpat violates the XML spec here. XML parsers are not allowed to "recover" from well-formedness errors. And I would classify blithely reporting the character data as "recovery". However, I'm amazed that this wouldn't have come up before, considering the pedigree of expat. I'll poke around, and raise a bug on the expat site if need be.
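A small illustration of the UTF-8 rule quoted above, written in present-day Python 3 and not taken from this thread. The helper name is_valid_utf8 is an assumption made for the example. Bytes 128-191 are continuation bytes, so none of them can begin a character, and a strict decode rejects each one.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Every byte in 128..191 is rejected when it appears where a lead byte is expected.
bad_lead_bytes = [b for b in range(128, 192) if not is_valid_utf8(bytes([b]))]

# The same character is fine as UTF-8 but not when encoded as Latin-1.
utf8_ok = is_valid_utf8("\u00e9".encode("utf-8"))
latin1_ok = is_valid_utf8("\u00e9".encode("latin-1"))
```

A document with no encoding declaration that contains Latin-1 bytes like these is therefore not well-formed under the UTF-8 default.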
-- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tismer at tismer.com Tue Jan 23 18:35:08 2001 From: tismer at tismer.com (Christian Tismer) Date: Tue, 23 Jan 2001 18:35:08 +0100 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) References: <200101231728.KAA03408@localhost.localdomain> Message-ID: <3A6DC0CC.C4FF83DF@tismer.com> uche.ogbuji at fourthought.com wrote: > > > > This has nothing to do with Python. UTF-8 marks the codes > > > from 128-191 as illegal prefix. > > [...] > > > Perhaps the parser should catch the UnicodeError and > > > instead return a not-wellformed exception ?! > > > > Right on both accounts. If no encoding is specified, and if the > > document appears not to be UTF-16 in any endianness, an XML processor > > shall assume it is UTF-8. As Marc-Andre explains, your document is not > > proper UTF-8, hence the error. > > > > The confusing thing is that expat itself does not care about it not > > being UTF-8; that is only detected when the callback is invoked in > > pyexpat, and therefore conversion to a Unicode object is attempted. > > Pyexpat violates the XML spec here. XML parsers are not allowed to "recover" > from well-formedness errors. And I would classify blithley reporting the > character data as "recovery". > > However, I'm amazed that this wouldn't have come up before, considering the > pedigree of expat. Well, I had to write a preprocessor which turns some "xml-like" but not well-formed stuff into something useable. This was a bulk of 100 MB of data, partially hand-written, partially machine-generated, but not really well-formed. 
Some special characters appeared very late in the data set, raising an error in Python 2.0, but not in 1.5.2, so I perceived it as an error in the parser first, not the data. :-) ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From uche.ogbuji at fourthought.com Tue Jan 23 18:55:12 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 10:55:12 -0700 Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: Message from Christian Tismer of "Mon, 22 Jan 2001 16:05:24 +0100." <3A6C4C34.4D1252C9@tismer.com> Message-ID: <200101231755.KAA03471@localhost.localdomain> > "M.-A. Lemburg" wrote: > ... > > > The codes from 192 to 236, 238-243 produce > > > "UTF-8 decoding error: invalid data", > > > the rest gives "not well-formed". > > > > > > I would like to know if this happens with your (Tim) modified > > > version as well. I'm using plain vanilla BeOpen Python 2.0 . > > > > This has nothing to do with Python. UTF-8 marks the codes > > from 128-191 as illegal prefix. See Object/unicodeobject.c: > ... > > Schade. > > > Perhaps the parser should catch the UnicodeError and > > instead return a not-wellformed exception ?! > > I believe it would be better. Yes, and given there is not much time before the 2.1 release, doing so is an acceptable stop-gap. However, I think the real fix has to lie in expat. I just had a *very* quick and dirty perusal of expat 1.2 and 1.95.1, and not only do the UTF-8 validity checks (at the top of xmltok.c) seem wrong, but it doesn't look as if they're ever invoked. I'll try to find some time to look into this more closely, or perhaps someone will straighten me out if I'm on the wrong trail.
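The stop-gap discussed above could look roughly like the following sketch. It is written in present-day Python 3, is not pyexpat's actual code, and uses invented names: NotWellFormedError and character_data are assumptions for illustration only. The idea is just to translate a decoding failure into the parser's own "not well-formed" error instead of letting the UnicodeError escape.

```python
class NotWellFormedError(ValueError):
    """Hypothetical stand-in for a parser's 'not well-formed' error."""


def character_data(raw: bytes) -> str:
    """Decode character data as UTF-8, reporting bad bytes as a parse error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Re-raise as a well-formedness error so callers see one error type.
        raise NotWellFormedError(
            "not well-formed: invalid UTF-8 at byte offset %d" % exc.start
        ) from exc


good = character_data(b"plain text")

try:
    character_data(b"caf\xe9")  # a Latin-1 byte, not valid UTF-8
    error = None
except NotWellFormedError as caught:
    error = caught
```

This only hides the symptom at the wrapper level; as noted above, a complete fix would have the underlying parser reject the bytes itself.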
-- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From fredrik at effbot.org Tue Jan 23 19:03:42 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 23 Jan 2001 19:03:42 +0100 Subject: [Python-Dev] getting rid of ucnhash Message-ID: <013901c08566$d2a8f360$e46940d5@hagrid> It's probably just me, but the names of the two unicode modules tend to irritate me: > ls u*.pyd ucnhash.pyd unicodedata.pyd (the former contains names, the latter data) I've been meaning to rename the former, but I just realized that it might be better to get rid of it completely, and move its functionality into the unicodedata module. The result is a single 200k unicodedata module, which con- tains the name database as well as two new functions: name(character [, default]) => map unicode character to name. if the name doesn't exist, return the default object, or raise ValueError. lookup(name) => unicode character (or raise KeyError if it doesn't exist) Should I check it in now, change the names/semantics and check it in, or post it to sourceforge? Cheers /F From uche.ogbuji at fourthought.com Tue Jan 23 19:00:19 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:00:19 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "Eric S. Raymond" of "Mon, 22 Jan 2001 12:41:59 EST." <20010122124159.A14999@thyrsus.com> Message-ID: <200101231800.LAA03515@localhost.localdomain> > \section{\module{set} --- > Basic set algebra for Python} Looks good. Are you making this available for download? I could put this to experimental use right away (experimental since, IIRC, you are using the new rich comparisons). 
-- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji at fourthought.com Tue Jan 23 19:16:27 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:16:27 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "Eric S. Raymond" of "Mon, 22 Jan 2001 15:13:09 EST." <20010122151309.C15236@thyrsus.com> Message-ID: <200101231816.LAA03551@localhost.localdomain> > Guido van Rossum : > > There's already a PEP on a set object type, and everybody and their > > aunt has already implemented a set datatype. Tim mentioned that he had one, and he also claimed that every other dodder had a set class, but the only one listed in the vaults is kjBuckets, which I'm not sure is maintained any more. (Is Aaron Watters hereabouts?) > I've just read the PEP. Greg's proposal has a couple of problems. > The biggest one is that the interface design isn't very Pythonic -- > it's formally adequate, but doesn't exploit the extent to which sets > naturally have common semantics with existing Python sequence types. > This is bad; it means that a lot of code that could otherwise ignore > the difference between lists and sets would have to be specialized > one way or the other for no good reason. IMO, Eric's Set interface is close to perfect. PEP 218 is interesting, but I'm not sure it's worth slogging through the inevitable uproar over an entirely new syntactic construct (the "{}" notation) before getting something as useful as a set class into the standard library. > > If *your* set module is ready for prime time, why not publish it in > > the Vaults of Parnassus? > > I suppose that's what I'll do if you don't bless it for the standard > library. 
But here are the reasons I suggest you should do so: For what it's worth, I'm +1 on adding this to the standard library. I've seen so many set hacks with dictionaries (memory ouch) and list hacks (speed ouch) in Python code out there, that I'm convinced it would meet much more common usage than, say zlib, xdr, or even expat. On this hacker list everyone's aunt might whip up set extensions on boring weekends, but I doubt this describes the overall Python populace. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From uche.ogbuji at fourthought.com Tue Jan 23 19:29:36 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:29:36 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "M.-A. Lemburg" of "Tue, 23 Jan 2001 11:26:16 +0100." <3A6D5C48.A076DA0@lemburg.com> Message-ID: <200101231829.LAA03575@localhost.localdomain> > All very well, but are sets really that essential to every > day Python programming ? Not everyday, but as I said, the standard library has zlib, expat, tkinter, colorsys, and a whole lot of other stuff that is undoubtedly less useful than a set class. > If we include sets then we ought to > also include graphs, tries, btrees I see all of these as far less commonly useful than sets (at least in situations where implementations using existing data structures won't suffice). I run into needs for sets all the time. I don't have as much trouble with your other examples, though I've always considered tries as a possible performance boost in XPath. Oddly enough another data structure I often wish I had is a splay tree, and I hope to wrap my old C++ splay tree implementation for Python one of these days. 
> and all those other goodies > we have in computer science. All of these types are available > out there, but I believe the audience who really cares for these > types is also capable of downloading the extensions and installing > them. > > It would be nice if all of these extension could go into a SUMO > edition of Python though... together with your set module. Considering "batteries included", it's worth considering these very important "batteries". -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From skip at mojam.com Tue Jan 23 19:35:04 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 23 Jan 2001 12:35:04 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: References: Message-ID: <14957.52952.48739.53360@beluga.mojam.com> Guido> - Use "exec ... in dict" to avoid having to walk on eggshells; Guido> locals no don't have to start with underscore. Thanks. I have just been incredibly short on time lately. Guido> - Only test dbhash if bsddb can be imported. (Wonder if there Guido> are more like this?) Alpha testing should pick those up, yes? ;-) Guido> ! try: Guido> ! import bsddb Guido> ! except ImportError: Guido> ! if verbose: Guido> ! print "can't import bsddb, so skipping dbhash" Guido> ! else: Guido> ! check_all("dbhash") Instead of having to know that dbhash includes bsddb, shouldn't dbhash be the module that's imported here? Skip From uche.ogbuji at fourthought.com Tue Jan 23 19:36:59 2001 From: uche.ogbuji at fourthought.com (uche.ogbuji at fourthought.com) Date: Tue, 23 Jan 2001 11:36:59 -0700 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message from "Eric S. Raymond" of "Tue, 23 Jan 2001 11:30:50 EST." 
<20010123113050.A26162@thyrsus.com> Message-ID: <200101231836.LAA03655@localhost.localdomain> > """ > A set-algebra module for Python. > > The functions work on any sequence type and return lists. > The set methods can take a set or any sequence type as an argument. > They are insensitive to the types of the elements. > > Lists are used rather than dictionaries so the elements can be mutable. > > """ Hmm. I was hoping this was actually a C extension for the performance boost, esp. given the number of __foo__ methods in the set class. Implementation in Python makes my interest in adding it to the standard lib more tepid (not to cast the least bit of aspersion on your work). -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From skip at mojam.com Tue Jan 23 19:37:44 2001 From: skip at mojam.com (Skip Montanaro) Date: Tue, 23 Jan 2001 12:37:44 -0600 (CST) Subject: [Python-Dev] pydoc - put it in the core In-Reply-To: <3A6CBBEF.4732BFF2@ActiveState.com> References: <14945.59192.400783.403810@beluga.mojam.com> <200101142055.PAA13041@cj20424-a.reston1.va.home.com> <3A6CBBEF.4732BFF2@ActiveState.com> Message-ID: <14957.53112.119272.797494@beluga.mojam.com> Paul> I apologize but I'm not clear on my responsibilities here, if Paul> any. I wrote a PEP for online help. I submitted a partial Paul> implementation. Perhaps I am the one who should apologize. I started the thread. I tried Ping's code and was simply amazed at how useful it was. I didn't bother checking the list of PEPs to see if it overlapped with something there, and I suspect any discussion of this stuff has taken place in the doc sig, where I don't hang out. Skip From esr at thyrsus.com Tue Jan 23 19:39:04 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Tue, 23 Jan 2001 13:39:04 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101231816.LAA03551@localhost.localdomain>; from uche.ogbuji@fourthought.com on Tue, Jan 23, 2001 at 11:16:27AM -0700 References: <200101231816.LAA03551@localhost.localdomain> Message-ID: <20010123133904.B26487@thyrsus.com> uche.ogbuji at fourthought.com : > I've seen so many set hacks with dictionaries (memory ouch) and list > hacks (speed ouch) in Python code out there, that I'm convinced it > would meet much more common usage than, say zlib, xdr, or even > expat. Uche brings up a point I meant to make in my reply to Guido. The dict- vs.-list choice in set representation is indeed a choice between memory ouch and speed ouch. I believe most uses of sets are small sets. That reduces the speed ouch of using a list representation and increases the proportional memory ouch of a dictionary implementation. -- Eric S. Raymond Question with boldness even the existence of a God; because, if there be one, he must more approve the homage of reason, than that of blindfolded fear.... Do not be frightened from this inquiry from any fear of its consequences. If it ends in the belief that there is no God, you will find incitements to virtue in the comfort and pleasantness you feel in its exercise... -- Thomas Jefferson, in a 1787 letter to his nephew From jeremy at alum.mit.edu Tue Jan 23 19:41:23 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 23 Jan 2001 13:41:23 -0500 (EST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123113050.A26162@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> Message-ID: <14957.53331.342827.462297@localhost.localdomain> >>>>> "ESR" == Eric S Raymond writes: ESR> Guido van Rossum : >> Having just skimmed your docs, I'm disappointed that you choose >> lists as your fundamental representation type -- this makes it >> slow to test for membership and hence makes intersection and >> union slow. ESR> Not quite. Membership test is still linear-time; so is adding ESR> and deleting elements. It's true that union and intersection ESR> are quadratic, but see below. >> I suppose that you have evidence from using this that those >> operations aren't used much, or not for large sets? ESR> Exactly! In my experience the usage pattern of a class like ESR> this runs heavily to small sets (usually < 64 elements); ESR> membership tests dominate usage, with addition and deletion of ESR> elements running second and the "classical" boolean operations ESR> like union and intersection being uncommon. I use a Set type in the compiler package (Tools/compiler/compiler) to collect the names for a code block. I implemented a trivial Set type using a dictionary, because it supported the operations I was most interested in: addition, membership tests, intersection, and get elements as sequence (in arbitrary order). Those are the only operations the compiler uses. I think I use sets for this purpose frequently, although I can't think of any other good examples at the moment. I usually just use a dictionary explicitly. In the compiler, I chose an explicit Set class with unique method names (add, has_elt, elements) to make it obvious for readers that I was using a set. 
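A minimal sketch of the kind of dictionary-backed set described above may make the interface concrete. Only the method names add, has_elt, and elements come from the message; the class name DictSet, the method bodies, and the sample names are assumptions for illustration, and the code is written for current Python 3 rather than the 2.x interpreter under discussion. It is not the compiler package's actual code.

```python
class DictSet:
    """Illustrative set of hashable elements stored as dictionary keys.

    Hypothetical reconstruction: only the method names add, has_elt and
    elements are taken from the message; everything else is assumed.
    """

    def __init__(self, elements=()):
        self._elts = {}
        for elt in elements:
            self.add(elt)

    def add(self, elt):
        # The stored value is irrelevant; only the key matters.
        self._elts[elt] = 1

    def has_elt(self, elt):
        # A dictionary lookup, so independent of the set's size on average.
        return elt in self._elts

    def elements(self):
        # Elements as a sequence; callers should not rely on the order.
        return list(self._elts)

    def intersection(self, other):
        # One pass over self with a dictionary lookup per element.
        return DictSet(e for e in self._elts if other.has_elt(e))


names = DictSet(["x", "y"])
names.add("x")  # adding an existing element changes nothing
common = names.intersection(DictSet(["y", "z"]))
```

The distinct method names are the point of the design: a reader who sees has_elt or elements knows a set is intended, which a bare dictionary does not signal.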
ESR> What you get by going with a dictionary representation is that ESR> membership test becomes close to constant-time, while insertion ESR> and deletion become sometimes cheap and sometimes quite ESR> expensive (depending of course on whether you have to allocate ESR> a new hash bucket). Given the usage pattern I described, the ESR> overall difference in performance is marginal. The cost of insertion would presumably be dominated by the frequency of dictionary resizes. I don't know how often they occur, but I assume the dictionary type is designed to accommodate efficient insert. I did a quick and dirty performance comparison of dictionary-based and list-based sets. (I'll include the code below.) It uses sample data collected from running the compiler; so it is measuring actual usage. The tests showed that dictionary-based sets were always faster. For small tests (3 operations), the difference was about 10 percent. For larger tests (88 operations), the difference ranged from 180 to almost 700 percent. >> This is one of the problems with coming up with a set type for >> the core: it has to work for (nearly) everybody. ESR> As I pointed out above (and someone else on the list had made ESR> the same point earlier), "works for everbody" isn't really ESR> possible here. So my solution does the next best thing -- pick ESR> a choice of tradeoffs that isn't obviously worse than the ESR> alternatives and keeps things bog-simple. For my applications, the dictionary-based approach is faster and offers a natural interface. If a set implementation were included in the standard library, I would like to see either (1) the implementation that favors my needs or (2) multiple implementations tuned for different uses. I think it would be just as easy to make set implementations available separately, though. Jeremy -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: sets.tar URL: From loewis at informatik.hu-berlin.de Tue Jan 23 19:51:37 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 23 Jan 2001 19:51:37 +0100 (MET) Subject: Partial victory (was RE: [Python-Dev] RE: test_sax failing (Windows)) In-Reply-To: <200101231755.KAA03471@localhost.localdomain> (uche.ogbuji@fourthought.com) References: <200101231755.KAA03471@localhost.localdomain> Message-ID: <200101231851.TAA19488@pandora.informatik.hu-berlin.de> > I'll try to some time to look into this more closely, or perhaps > someone will straighten me out if I'm on the wrong trail. Spending only a little time myself, either, I'd agree with your conclusions. Regards, Martin From esr at thyrsus.com Tue Jan 23 19:55:30 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 23 Jan 2001 13:55:30 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <14957.53331.342827.462297@localhost.localdomain>; from jeremy@alum.mit.edu on Tue, Jan 23, 2001 at 01:41:23PM -0500 References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> <14957.53331.342827.462297@localhost.localdomain> Message-ID: <20010123135530.A26565@thyrsus.com> Jeremy Hylton : Content-Description: message body text > The tests showed that dictionary-based sets were always faster. For > small tests (3 operations), the difference was about 10 percent. For > larger tests (88 operations), the difference ranged from 180 to almost > 700 percent. Not surprising. 88 elements is getting pretty large. -- Eric S. Raymond Hoplophobia (n.): The irrational fear of weapons, correctly described by Freud as "a sign of emotional and sexual immaturity". Hoplophobia, like homophobia, is a displacement symptom; hoplophobes fear their own "forbidden" feelings and urges to commit violence. 
This would be harmless, except that they project these feelings onto others. The sequelae of this neurosis include irrational and dangerous behaviors such as passing "gun-control" laws and trashing the Constitution.

From petrilli at amber.org Tue Jan 23 20:06:05 2001
From: petrilli at amber.org (Christopher Petrilli)
Date: Tue, 23 Jan 2001 14:06:05 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010123133904.B26487@thyrsus.com>; from esr@thyrsus.com on Tue, Jan 23, 2001 at 01:39:04PM -0500
References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com>
Message-ID: <20010123140604.E18796@trump.amber.org>

Eric S. Raymond [esr at thyrsus.com] wrote:
> I believe most uses of sets are small sets. That reduces the speed ouch
> of using a list representation and increases the proportional memory
> ouch of a dictionary implementation.

The problem is that there are a lot of uses for large sets, especially when you begin to introduce intersections and unions. If an implementation is only useful for a few dozen (or a hundred) items in the set, that eliminates a lot of places where the real use of set types is useful---optimizing large scale manipulations. Zope, for example, manipulates sets with 10,000 items in them on a regular basis when doing text index manipulation. The data structures are heavily optimized for this kind of behaviour, without a major sacrifice in space. I think Jim perhaps can talk to this.

Unfortunately, for me, a Python implementation of Sets is only interesting academically. Any time I've needed to work with them at a large scale, I've needed them *much* faster than Python could achieve without a C extension.

Perhaps the difference is in problem domain. In the "scripting" problem domain, I would agree that Sets would rarely reach large sizes, and so an algorithm which performed in quadratic time might be fine, because the actual resultant time is small.
However, in more full-blown applications, this would be counter productive, and the user would be forced to implement their own (or use Aaron's excellent kjBuckets).

Just my opinion, of course.

Chris
--
| Christopher Petrilli
| petrilli at amber.org

From ping at lfw.org Tue Jan 23 20:27:38 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 23 Jan 2001 11:27:38 -0800 (PST)
Subject: [Python-Dev] Sets: elt in dict, lst.include
In-Reply-To: <14957.53331.342827.462297@localhost.localdomain>
Message-ID:

On Tue, 23 Jan 2001, Jeremy Hylton wrote:
> For my applications, the dictionary-based approach is faster and
> offers a natural interface.

The only change that needs to be made to support sets of immutable elements is to provide "in" on dictionaries. The rest is then all quite natural:

    dict[key] = 1
    if key in dict: ...
    for key in dict: ...

(Then we can also get rid of the ugly has_key method.)

For those that need mutable set elements badly enough to sacrifice a little speed, we can add two methods to lists:

    lst.include(elt)    # same as - if elt not in lst: lst.append(elt)
    lst.exclude(elt)    # same as - while elt in lst: lst.remove(elt)

(These are generally useful methods to have anyway.)

This proposal has the following advantages:

    1. You still get to choose which implementation best suits your needs.

    2. No new types are introduced; lists and dicts are well understood.

    3. Both features are extremely simple to understand and explain.

    4. Both features are useful in their own right, and could stand as independent proposals to improve lists and dicts respectively. (For instance, i spotted about 10 places in the std library where the 'include' method could be used, and i know i would use it myself -- certainly more often than pop or reverse!)

    5. In all cases this is faster than a new Python class. (For instance, Jeremy's implementation even contained a commented-out optimization that stored self.elts.has_key as self.has_elt to speed things up a bit.
Using straight dicts would see this optimization and raise it one, with no effort at all.) 6. Either feature can be independently approved or rejected without affecting the other. -- ?!ng From loewis at informatik.hu-berlin.de Tue Jan 23 20:33:00 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 23 Jan 2001 20:33:00 +0100 (MET) Subject: [Python-Dev] getting rid of ucnhash Message-ID: <200101231933.UAA02223@pandora.informatik.hu-berlin.de> > Should I check it in now, change the names/semantics and check it > in, or post it to sourceforge? Is that two or three options? If three, what change in semantics did you propose? Anyway, I feel it could go in right now; the only breakage would be to applications that use ucnhash.ucnhashAPI, right? Regards, Martin From fredrik at effbot.org Tue Jan 23 20:49:09 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 23 Jan 2001 20:49:09 +0100 Subject: [Python-Dev] Re: getting rid of ucnhash References: <200101231933.UAA02223@pandora.informatik.hu-berlin.de> Message-ID: <01e801c08575$8f71c680$e46940d5@hagrid> martin wrote: > > Should I check it in now, change the names/semantics and check it > > in, or post it to sourceforge? > > Is that two or three options? three, I think. > If three, what change in semantics did you propose? none -- but maybe someone else has a better name for "lookup"? (the "name" function behaves like the existing property methods in 2.0's unicodedata) > Anyway, I feel it could go in right now; the only breakage would be to > applications that use ucnhash.ucnhashAPI, right? yup -- and those applications are already broken, since the CObject was renamed in 2.1a1. (well, any code using 2.1a1's new ucnhash.getcode/getname functions will of course also break. 
but I think we can live with that ;-) Cheers /F From ping at lfw.org Tue Jan 23 20:43:50 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 23 Jan 2001 11:43:50 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Message-ID: Christopher Petrilli wrote: > The problem is that there are a lot of uses for large sets, especially > when you begin to introduce intersections and unions. [...] > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. On Tue, 23 Jan 2001, Ka-Ping Yee wrote: > This proposal has the following advantages: [six nice things about 'in dict' and 'lst.include'] I forgot to mention an important seventh advantage: 7. The list and dictionary data structures are implemented in the C core, so we leave open the possibility of a wizard going and optimizing the snot out of them later. Just as there's e.g. a boundary on recursion levels before Python invokes the cycle detection algorithm during comparison, if we decide we need more speed for big sets, Python could notice when a list or dictionary gets very big and invoke more powerful optimizations. We don't have to do this now, but the important thing is that we will always have the option to make Christopher's dream come true. (A wizard can do this once, and every Python script on the planet benefits.) In general i support Python deciding on the Right Thing to do under the hood, performance-wise, so that the programmer doesn't have to think too hard about what data structure to choose. -- ?!ng From nas at arctrix.com Tue Jan 23 14:08:07 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 23 Jan 2001 05:08:07 -0800 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123140604.E18796@trump.amber.org>; from petrilli@amber.org on Tue, Jan 23, 2001 at 02:06:05PM -0500 References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> Message-ID: <20010123050807.A29115@glacier.fnational.com> On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. I think this argues that if sets are added to the core they should be implemented as an extension type with the speed of dictionaries and the memory usage of lists. Basicly, we would use the implementation of PyDict but drop the values. Neil From jeremy at alum.mit.edu Tue Jan 23 20:48:18 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 23 Jan 2001 14:48:18 -0500 (EST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <14957.53331.342827.462297@localhost.localdomain> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> <14957.53331.342827.462297@localhost.localdomain> Message-ID: <14957.57346.248852.656387@localhost.localdomain> Sorry about the garbled attachment on the previous message; I think I got the content-type wrong. Here's a second try. Jeremy -------------- next part -------------- A non-text attachment was scrubbed... Name: sets.tar Type: application/octet-stream Size: 20480 bytes Desc: not available URL: From petrilli at amber.org Tue Jan 23 21:06:16 2001 From: petrilli at amber.org (Christopher Petrilli) Date: Tue, 23 Jan 2001 15:06:16 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123050807.A29115@glacier.fnational.com>; from nas@arctrix.com on Tue, Jan 23, 2001 at 05:08:07AM -0800 References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> Message-ID: <20010123150616.F18796@trump.amber.org> Neil Schemenauer [nas at arctrix.com] wrote: > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > Unfortunately, for me, a Python implementation of Sets is only > > interesting academicaly. Any time I've needed to work with them at a > > large scale, I've needed them *much* faster than Python could achieve > > without a C extension. > > I think this argues that if sets are added to the core they > should be implemented as an extension type with the speed of > dictionaries and the memory usage of lists. Basicly, we would > use the implementation of PyDict but drop the values. This is effectively the implementation that Zope has for Sets. In addition we have "buckets" that have scores on them (which are implemented as a modified BTree). Unfortunately Jim Fulton (who wrote all the code for that level) is in a meeting, but I hope he'll comment on the implementation that was chosen for our software. Chris -- | Christopher Petrilli | petrilli at amber.org From jeremy at alum.mit.edu Tue Jan 23 20:56:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 23 Jan 2001 14:56:05 -0500 (EST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? 
In-Reply-To: <20010123135530.A26565@thyrsus.com> References: <20010122124159.A14999@thyrsus.com> <200101221910.OAA01218@cj20424-a.reston1.va.home.com> <20010122151309.C15236@thyrsus.com> <200101231549.KAA05172@cj20424-a.reston1.va.home.com> <20010123113050.A26162@thyrsus.com> <14957.53331.342827.462297@localhost.localdomain> <20010123135530.A26565@thyrsus.com> Message-ID: <14957.57813.23072.723418@localhost.localdomain> >>>>> "ESR" == Eric S Raymond writes: ESR> Jeremy Hylton : Content-Description: ESR> message body text >> The tests showed that dictionary-based sets were always faster. >> For small tests (3 operations), the difference was about 10 >> percent. For larger tests (88 operations), the difference ranged >> from 180 to almost 700 percent. ESR> Not surprising. 88 elements is getting pretty large. Large for what? I've got directories with that many files and modules with the many names defined at the top-level :-). I'm just reporting the range of set sizes I've encountered for a real application. In general, I expect a few hundred elements should be handled without trouble by most Python containers. Jeremy From gvwilson at nevex.com Tue Jan 23 21:26:22 2001 From: gvwilson at nevex.com (Greg Wilson) Date: Tue, 23 Jan 2001 15:26:22 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123200601.87817EF68@mail.python.org> Message-ID: <001101c0857a$c0dce420$770a0a0a@nevex.com> Greg Wilson: Meta-question: do people want to continue to discuss sets on the general python-dev list, or take it out-of-line (e.g. to an egroups list)? I'm finding all of the discussion very useful, but I realize that many readers might prefer to concentrate on the 2.1 release... > Jeremy Hylton : > > The tests showed that dictionary-based sets were always faster. > > small tests (3 operations), the difference was about 10 percent. > > larger tests (88 operations), the difference ranged from > > 180 to almost 700 percent. 
> Eric Raymond : > Not surprising. 88 elements is getting pretty large. Greg Wilson: Really? I was testing my implementation with sets of email addresses grep'd out of old mail folders --- typical sizes were several thousand elements. > From: Christopher Petrilli > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. Greg Wilson: I had been expecting to implement this in C, not in pure Python, for performance. > From: Christopher Petrilli > In the "scripting" problem domain, I would agree that Sets would > rarely reach large sizes, > and so a algorithm which performed in quadratic time might be fine, Greg Wilson: I strongly disagree (see the email address example above --- it was the first thing that occurred to me to try). I am still hoping to find a sub-quadratic (preferably sub-linear) implementation. I can do it in C++ with observer/observable (contained items notify containers of changes in value, sets store all equivalent items in the same bucket), but that doesn't really help... > From: Ka-Ping Yee > The only change that needs to be made to support sets of immutable > elements is to provide "in" on dictionaries... and: > From: Neil Schemenauer > ...if sets are added to the core...we would > use the implementation of PyDict but drop the values. Unfortunately, if values are required to be immutable, then sets of sets aren't possible... :-( Thanks, everyone, Greg From esr at thyrsus.com Tue Jan 23 21:38:39 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Tue, 23 Jan 2001 15:38:39 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: ; from ping@lfw.org on Tue, Jan 23, 2001 at 11:27:38AM -0800 References: <14957.53331.342827.462297@localhost.localdomain> Message-ID: <20010123153839.B26676@thyrsus.com> Ka-Ping Yee : > The only change that needs to be made to support sets of immutable > elements is to provide "in" on dictionaries. The rest is then all > quite natural: > > dict[key] = 1 > if key in dict: ... > for key in dict: ... Independently of implementation issues about sets, I think this is a damn fine idea. +1. > (Then we can also get rid of the ugly has_key method.) > > For those that need mutable set elements badly enough to sacrifice > a little speed, we can add two methods to lists: > > lst.include(elt) # same as - if elt not in lst: lst.append(elt) > lst.exclude(elt) # same as - while elt in lst: lst.remove(elt) +1 on the concept, -0 on the names. -- Eric S. Raymond [The disarming of citizens] has a double effect, it palsies the hand and brutalizes the mind: a habitual disuse of physical forces totally destroys the moral [force]; and men lose at once the power of protecting themselves, and of discerning the cause of their oppression. -- Joel Barlow, "Advice to the Privileged Orders", 1792-93 From tim.one at home.com Tue Jan 23 23:02:41 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 17:02:41 -0500 Subject: [Python-Dev] Is X a (sequence|mapping)? In-Reply-To: <200101231531.KAA05122@cj20424-a.reston1.va.home.com> Message-ID: >> operator.isMappingType() >> + some other C style _Check() APIs [Guido] > Yes, these should probably be deprecated. I certainly have never > used them! (The operator module doesn't seem to get much use in > general... It's used heavily by test_operator.py . Outside of that, it's used maybe three times in the std distribution, nowhere essential; the return map(operator.__div__, rgbtuple, _maxtuple) in Pynche's ColorDB.py is typical. 
2.0's return [x / 256. for x in rgbtuple] does the same thing more clearly (_maxtuple is a module constant). It appeals to functional-language fans and extreme micro-optimizers, so they don't have to type "lambda" in the simplest cases. At least operator.truth(x) is *clearer* than "not not x". > Was it a bad idea?) Mixed, but I'd say more bad than good overall. From thomas at xs4all.net Wed Jan 24 00:38:14 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 24 Jan 2001 00:38:14 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010123153839.B26676@thyrsus.com>; from esr@thyrsus.com on Tue, Jan 23, 2001 at 03:38:39PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> Message-ID: <20010124003814.F27785@xs4all.nl> On Tue, Jan 23, 2001 at 03:38:39PM -0500, Eric S. Raymond wrote: > > The only change that needs to be made to support sets of immutable > > elements is to provide "in" on dictionaries. The rest is then all > > quite natural: > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > Independently of implementation issues about sets, I think this is a > damn fine idea. +1. It's come up before. The problem with it is that it's not quite obvious whether it is 'if key in dict' or 'if value in dict'. Sure, from the above example it's obvious what you *expect*, but I suspect that 'for x in dict' will result in a 40/60 split in expectations, and like American voters, the 20% middle section will change their vote each recount :-) Now, if only there was a terribly obvious way to spell it... so that it's immediately obvious which of the two you wanted.... something like, oh, I donno, this, maybe: if key in dict.keys: ... if value in dict.values: ... Ponder-ponder--Guido-should-use-the-time-machine-for-this-one!-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
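The key-versus-value ambiguity raised above can be illustrated with a short sketch. It shows how current Python 3 behaves, which is an assumption relative to this thread: none of it was available to the participants at the time, the message writes dict.keys and dict.values without parentheses, and the dictionary contents and variable names here are made up for illustration.

```python
# Hypothetical sample data; the names and values are placeholders.
counts = {"spam": 1, "eggs": 2}

key_hit = "spam" in counts                     # 'in' on a dict tests keys
value_as_key = 2 in counts                     # so a value alone is not found
value_hit = 2 in counts.values()               # explicit spelling for values
pair_hit = ("eggs", 2) in counts.items()       # explicit spelling for pairs
iterated = sorted(key for key in counts)       # iteration also yields keys
```

Both explicit spellings remain available, so the plain form does not have to cover every reading.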
From fredrik at effbot.org Wed Jan 24 01:13:20 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 01:13:20 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> Message-ID: <02f401c0859a$765d07c0$e46940d5@hagrid> > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. you forgot "if (key, value) in dict" on the other hand, it's not quite obvious that "list.sort" doesn't return the sorted list, "print >>None" prints to standard output, "except KeyError, ValueError" doesn't catch a ValueError exception, etc, etc, etc. (nor that it's "has_key" and "hasattr", and not "has_key" and "has_attr" or "haskey" and "hasattr" ;-) let's just say that "in" is the same thing as "has_key", and be done with it. Cheers /F From tim.one at home.com Wed Jan 24 02:51:22 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 20:51:22 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123140604.E18796@trump.amber.org> Message-ID: [Christopher Petrilli] > .... > Unfortunately, for me, a Python implementation of Sets is only > interesting academicaly. Any time I've needed to work with them at a > large scale, I've needed them *much* faster than Python could achieve > without a C extension. How do you know that? I've used large sets in Python happily without resorting to C or kjbuckets (which is really aiming at fast operations on *graphs*, in which area it has no equal). 
Everyone (except Eric ) uses dicts to implement sets in Python, and "most" set operations can work at full C speed then; e.g., assuming both sets have N elements:

    membership testing
        O(1) -- it's just dict.has_key()

    element insertion
        O(1) -- dict[element] = 1

    element removal
        O(1) -- del dict[element]

    union
        O(N), but at full C speed -- dict1.update(dict2)

    intersection
        O(N), but at Python speed (the only 2.1 dog in the bunch!)

    choose some element and remove it
        took O(N) time and additional space in 2.0, but is O(1) in both
        since dict.popitem() was introduced

    iteration
        O(N), with O(N) additional space using dict.keys(), or O(1)
        additional space using dict.popitem() repeatedly

What are you going to do in C that's faster than using a Python dict for this purpose? Most key set operations are straightforward Python dict 1-liners then, and Python dicts are very fast. kjbuckets sets were slower last time I timed them (several years ago, but Python dicts have gotten faster since then while kjbuckets has been stagnant).

There's a long tradition in the Lisp world of using unordered lists to represent sets (when the only tool you have is a hammer ... <0.5 wink>), but it's been easy to do much better than that in Python almost since the start. Even in the Python list world, enormous improvements for large sets can be gotten by maintaining lists in sorted order (then most O(N) operations drop to O(log2(N)), and O(N**2) to O(N)). Curiously, though, in 2.1 we can still use a dict-set for complex numbers, but no longer a sorted-list-set! Requiring a total ordering can get in the way more than requiring hashability (and vice versa -- that's a tough one).
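The dictionary one-liners listed above can be sketched as follows. This is written for current Python 3 as an assumption (so membership uses `in` where the 2.x spelling was dict.has_key()), the variable names and sample elements are placeholders, and the destructive iteration uses dict.popitem(), the method that removes and returns an arbitrary item.

```python
# Two "sets" represented as dictionaries whose values are ignored.
a = dict.fromkeys([1, 2, 3], 1)
b = dict.fromkeys([3, 4], 1)

has_two = 2 in a        # membership testing (dict.has_key(2) in 2.x)
a[5] = 1                # element insertion
del a[1]                # element removal

# Union: copy one dict and let update() merge the other in a single call.
union = dict(a)
union.update(b)

# Intersection: an explicit loop with one lookup per element.
intersection = {}
for elt in a:
    if elt in b:
        intersection[elt] = 1

# "Choose some element and remove it", repeated until the set is empty.
work = dict(union)
drained = []
while work:
    elt, _ = work.popitem()
    drained.append(elt)
```

Only the intersection needs a loop written in Python; every other step is a single built-in dictionary operation.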
measurement-is-the-measure-of-all-measurable-things-ly y'rs - tim From greg at cosc.canterbury.ac.nz Wed Jan 24 03:45:01 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:45:01 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010124003814.F27785@xs4all.nl> Message-ID: <200101240245.PAA02098@s454.cosc.canterbury.ac.nz> Thomas Wouters : > Now, if only there was a terribly obvious way to spell it... so that it's > immediately obvious which of the two you wanted... Well, in the case of for key in d: or for value in d: it's immediately obvious to a *human* reader what is meant, so all we need to do is make the compiler a bit smarter. This can easily be done by the use of a small table, containing the equivalents of the words 'key' and 'value' in all known natural languages, against which the target variable name is matched using some suitable fuzzy matching algorithm. Soundex could be used for this, if we can decide on which version to use... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed Jan 24 03:46:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 21:46:37 -0500 Subject: [Python-Dev] getting rid of ucnhash In-Reply-To: Your message of "Tue, 23 Jan 2001 19:03:42 +0100." 
<013901c08566$d2a8f360$e46940d5@hagrid> References: <013901c08566$d2a8f360$e46940d5@hagrid> Message-ID: <200101240246.VAA06336@cj20424-a.reston1.va.home.com> > It's probably just me, but the names of the two unicode > modules tend to irritate me: > > > ls u*.pyd > ucnhash.pyd unicodedata.pyd > > (the former contains names, the latter data) > > I've been meaning to rename the former, but I just realized > that it might be better to get rid of it completely, and move > its functionality into the unicodedata module. > > The result is a single 200k unicodedata module, which con- > tains the name database as well as two new functions: > > name(character [, default]) => map unicode > character to name. if the name doesn't exist, > return the default object, or raise ValueError. > > lookup(name) => unicode character > (or raise KeyError if it doesn't exist) > > Should I check it in now, change the names/semantics and check > it in, or post it to sourceforge? To me, both of these are irrelevant details of the Unicode implementation. :-) IOW, feel free to check it in. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed Jan 24 03:49:21 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:49:21 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Message-ID: <200101240249.PAA02101@s454.cosc.canterbury.ac.nz> Tim Peters : > Requiring a total ordering can get in the way more than requiring > hashability Often it's useful to have *some* total ordering, and you don't really care what it is as long as its consistent. Maybe all types should be required to support cmp(x,y) even if doing x < y via the rich comparison route raises a NotOrderable exception. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Wed Jan 24 03:52:43 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jan 2001 15:52:43 +1300 (NZDT) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123050807.A29115@glacier.fnational.com> Message-ID: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Neil Schemenauer : > Basicly, we would > use the implementation of PyDict but drop the values. This could be incorporated into PyDict. Instead of storing keys and values in the same array, keep them in separate arrays and only allocate the values array the first time someone stores a value other than 1. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed Jan 24 03:58:59 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 21:58:59 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Wed, 24 Jan 2001 01:13:20 +0100." <02f401c0859a$765d07c0$e46940d5@hagrid> References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> Message-ID: <200101240258.VAA06479@cj20424-a.reston1.va.home.com> > let's just say that "in" is the same thing as "has_key", > and be done with it. You know, I've long resisted this, but I agree now -- this is the right thing. 
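A minimal sketch of the semantics agreed on above, written for modern Python 3 (an assumption; the thread predates it, and dict.has_key() no longer exists there). The dictionary name and contents are made up for illustration.

```python
# Hypothetical example data; any dict would do.
inventory = {"spam": 3, "eggs": 0}

# `in` on a dict tests keys, which is the old has_key() meaning.
assert "spam" in inventory
assert "eggs" in inventory        # present even though its value is falsy
assert 3 not in inventory         # values are not consulted

# Membership among the values has to be spelled out explicitly.
assert 3 in inventory.values()

# Iterating over a dict likewise yields its keys.
assert sorted(inventory) == ["eggs", "spam"]
```

The point of the sketch is only that key membership is the default reading and value membership needs an explicit .values().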
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 04:11:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:11:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: Your message of "Tue, 23 Jan 2001 12:35:04 CST." <14957.52952.48739.53360@beluga.mojam.com> References: <14957.52952.48739.53360@beluga.mojam.com> Message-ID: <200101240311.WAA06582@cj20424-a.reston1.va.home.com> > Guido> - Use "exec ... in dict" to avoid having to walk on eggshells; > Guido> locals now don't have to start with underscore. > > Thanks. I have just been incredibly short on time lately. You're welcome. > Guido> - Only test dbhash if bsddb can be imported. (Wonder if there > Guido> are more like this?) > > Alpha testing should pick those up, yes? ;-) Yes. :-) > Guido> ! try: > Guido> ! import bsddb > Guido> ! except ImportError: > Guido> ! if verbose: > Guido> ! print "can't import bsddb, so skipping dbhash" > Guido> ! else: > Guido> ! check_all("dbhash") > > Instead of having to know that dbhash includes bsddb, shouldn't dbhash be > the module that's imported here? I think I saw a complaint about this that specifically said that when dbhash is imported when bsddb can't be imported, an incomplete dbhash is left behind in sys.modules, and then a second import of dbhash will succeed -- but of course it will define no objects. Since dbhash may be imported elsewhere, testing for bsddb is safer. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 04:22:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:22:14 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.114,2.115 In-Reply-To: Your message of "Tue, 23 Jan 2001 08:24:38 PST." References: Message-ID: <200101240322.WAA06671@cj20424-a.reston1.va.home.com> > A few miscellaneous helpers.
> > PyObject_Dump(): New function that is useful when debugging Python's C > runtime. In something like gdb it can be a pain to get some useful > information out of PyObject*'s. This function prints the str() of the > object to stderr, along with the object's refcount and hex address. > > PyGC_Dump(): Similar to PyObject_Dump() but knows how to cast from the > garbage collector prefix back to the PyObject* structure. > > [See Misc/gdbinit for some useful gdb hooks] > > none_dealloc(): Rather than SEGV if we accidentally decref None out of > existence, we assign None's and NotImplemented's destructor slot to > this function, which just calls abort(). Barry, since these are only gdb helpers, would it perhaps be better if their names started with "_Py" to indicate that they aren't part of the regular API? They violate an important rule: you shouldn't write to stderr directly, but always to sys.stderr. (There's a helper routine to write to stderr: PySys_WriteStderr().) I understand that for the gdb helper it's important to use the real stderr, and I don't object to having these functions present at all times (they're so small), but I do think that we should make it clear (by a _Py name, and also by a comment) that they should not be called! --Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Wed Jan 24 04:29:24 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 23 Jan 2001 19:29:24 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010124003814.F27785@xs4all.nl> Message-ID: I wrote: > The only change that needs to be made to support sets of immutable > elements is to provide "in" on dictionaries. Thomas Wouters wrote: > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. Yes, and i've seen this objection before, and i think it's silly.
> Sure, from the above > example it's obvious what you *expect*, but I suspect that 'for x in dict' > will result in a 40/60 split in expectations, No way... it's at least 90/10. How often do you write 'dict.has_key(x)'? (std lib says: 206) How often do you write 'for x in dict.keys()'? (std lib says: 49) How often do you write 'x in dict.values()'? (std lib says: 0) How often do you write 'for x in dict.values()'? (std lib says: 3) I rest my case. -- ?!ng From barry at digicool.com Wed Jan 24 04:44:31 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 23 Jan 2001 22:44:31 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.114,2.115 References: <200101240322.WAA06671@cj20424-a.reston1.va.home.com> Message-ID: <14958.20383.795064.832967@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Barry, since these are only gdb helpers, would it perhaps be GvR> better if their names started with "_Py" to indicate that GvR> they aren't part of the regular API? They violate an GvR> important rule: you shouldn't write to stderr directly, but GvR> always to sys.stderr. (There's a helper routines to write to GvR> stderr: PySys_WriteStderr().) I understand that for the gdb GvR> helper it's important to use the real stderr, and I don't GvR> object to having these functions present at all times GvR> (they're so small), but I do think that we should make it GvR> clear (by a _Py name, and also by a comment) that they should GvR> not be called! I thought about it, couldn't decide and figured I'd check it in anyway, knowing that you'd let me know. See how wise I was? :) I will rename them as _Py* and fix the gdbinit file accordingly. One note: these functions /ought/ to be useful for dbx or any other command line debugger. I just haven't used anything but gdb for years. If anybody's got a dbxinit equivalent I could add that to Misc too. 
nothing-an-adjacent-office-wouldn't-have-solved-much-more-quick-ly y'rs, -Barry From guido at digicool.com Wed Jan 24 04:46:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:46:47 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: Your message of "Tue, 23 Jan 2001 09:22:26 EST." <20010123092226.A25968@thyrsus.com> References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> Message-ID: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> > Guido van Rossum : > > Can you point me to docs explaining the meaning of the BROWSER > > environment variable? I've never heard of it... The last new > > environment variables I learned were PAGER and EDITOR, probably 15 > > years ago when 4.1BSD was released... :-) ESR replies: > You've never heard of BROWSER because I invented it and have not > widely popularized it yet :-). Ping knew about it either because he > read the module code and saw that it was supposed to work, or because > he remembered the design discussion when webbrowser.py was first > implemented. > > I've had conversations with some key Perl and Tcl people (Larry Wall, > Tom Christiansen, Clif Flynt) about the BROWSER convention, and they > agree it's a good idea. I'll probably hack support for it into Perl's > browser launcher next. > > It's documented in the version of libwebbrowser.tex now in the CVS > tree. Grumble. That wasn't the kind of answer I expected. I don't like it if Python is used as a wedge to get a particular thing introduced to the rest of the world, no matter how useful it may seem at the time. If something is already a popular convention, I'll happily adopt it, but I'm not comfortable being put in front of somebody else's cart. There just are too many carts that would like to be pulled by a horse as strong as Python, and I don't want to take sides if I can avoid it. 
BROWSER seems unlikely to take the world by storm and I don't feel I need to be involved in the effort to get it accepted. (And yes, I know there are enough cases where I *did* take sides. There were some cases where I *do* want to take a side, and there were some mistakes -- which is one of the reasons why I'm shy about taking sides now.) Anyway, shouldn't you also talk to the developers of packages like KDE and Gnome? Surely their users would like to be able to configure the default webbrowser. Talking just to the scripting language people seems like you're thinking too small. There must be lots of C apps with the desire to invoke a browser. Also Emacs, which has an extensive list of browser-url-* functions (you might even learn a few tricks from it about how to invoke various external browsers) but AFAIK no default browser selection. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 04:54:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 22:54:25 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Wed, 24 Jan 2001 15:52:43 +1300." <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Message-ID: <200101240354.WAA06903@cj20424-a.reston1.va.home.com> > Neil Schemenauer : > > > Basicly, we would > > use the implementation of PyDict but drop the values. > > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. Not a bad idea! (But shouldn't the default value be something else, like none?) 
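A toy model of the idea quoted above, in modern Python 3: keys always live in one array, and the values array is allocated only when a value other than the default is first stored. This is a sketch under stated assumptions, not how PyDict works; the class name LazyValueTable, its methods, and the use of a plain dict as a stand-in for the hash probe are all hypothetical.

```python
class LazyValueTable:
    """Toy mapping: keys are always stored, values only once they are needed."""

    DEFAULT = 1  # the "set member" marker from the discussion above

    def __init__(self):
        self._index = {}     # key -> slot number (stand-in for the real hash probe)
        self._keys = []      # slot -> key
        self._values = None  # slot -> value; stays None while every value is DEFAULT

    def _is_default(self, value):
        # Exact-type check so that True or 1.0 are not mistaken for the marker.
        return type(value) is int and value == self.DEFAULT

    def __setitem__(self, key, value):
        slot = self._index.get(key)
        if slot is None:
            # New key: claim the next slot in the keys array.
            slot = len(self._keys)
            self._index[key] = slot
            self._keys.append(key)
            if self._values is not None:
                self._values.append(self.DEFAULT)
        if self._values is None:
            if self._is_default(value):
                return  # still a pure set; no values array needed
            # First non-default value: allocate the values array now.
            self._values = [self.DEFAULT] * len(self._keys)
        self._values[slot] = value

    def __getitem__(self, key):
        slot = self._index[key]  # raises KeyError for a missing key
        return self.DEFAULT if self._values is None else self._values[slot]

    def __contains__(self, key):
        return key in self._index

    def has_values_array(self):
        return self._values is not None


table = LazyValueTable()
table["a"] = 1
table["b"] = 1
used_as_set = not table.has_values_array()  # only keys stored so far
table["b"] = "payload"
used_as_dict = table.has_values_array()     # values array now allocated
```

Swapping DEFAULT for None would model the alternative default raised in the reply; the allocation logic stays the same.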
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 24 05:20:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 23 Jan 2001 23:20:56 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Wed, 24 Jan 2001 00:38:14 +0100." <20010124003814.F27785@xs4all.nl> References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> Message-ID: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> > > > dict[key] = 1 > > > if key in dict: ... > > > for key in dict: ... > > > Independently of implementation issues about sets, I think this is a > > damn fine idea. +1. > > It's come up before. The problem with it is that it's not quite obvious > whether it is 'if key in dict' or 'if value in dict'. Sure, from the above > example it's obvious what you *expect*, but I suspect that 'for x in dict' > will result in a 40/60 split in expectations, and like American voters, the > 20% middle section will change their vote each recount :-) > > Now, if only there was a terribly obvious way to spell it... so that it's > immediately obvious which of the two you wanted.... something like, oh, I > donno, this, maybe: > > if key in dict.keys: ... > if value in dict.values: ... > > Ponder-ponder--Guido-should-use-the-time-machine-for-this-one!-ly y'rs, No chance of a time-machine escape, but I *can* say that I agree that Ping's proposal makes a lot of sense. This is a reversal of my previous opinion on this matter. (Take note -- those don't happen very often! 
:-) First to submit a working patch gets a free copy of 2.1a2 and subsequent releases, --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jan 24 05:50:49 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 23 Jan 2001 23:50:49 -0500 Subject: [Python-Dev] getting rid of ucnhash In-Reply-To: <013901c08566$d2a8f360$e46940d5@hagrid> Message-ID: [/F] > It's probably just me, but the names of the two unicode > modules tend to irritate me: I don't care much about the names, but having two Unicode subprojects in the MS build seems overkill . > ls u*.pyd > ucnhash.pyd unicodedata.pyd > > (the former contains names, the latter data) Maybe that's the reason: the names don't get loaded at all unless you *use* one of the name APIs? Hard to say whether that's worth the bother; now that everything has been nicely compressed, it's sure not as compelling as it may have been earlier. > I've been meaning to rename the former, but I just realized > that it might be better to get rid of it completely, and move > its functionality into the unicodedata module. > > The result is a single 200k unicodedata module, which con- > tains the name database as well as two new functions: > > name(character [, default]) => map unicode > character to name. if the name doesn't exist, > return the default object, or raise ValueError. > > lookup(name) => unicode character > (or raise KeyError if it doesn't exist) > > Should I check it in now, change the names/semantics and check > it in, or post it to sourceforge? I have no opinion on what's best: you're working with it, you're the best judge of that. I only vote for checking in whatever you decide sooner rather than later; I'll fiddle the MS project files and readmes accordingly ASAP after that. 
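For reference, a short sketch of the two functions described in the quoted proposal, as they behave in the unicodedata module of modern Python 3 (an assumption about the reader's interpreter; the 2001 check-in may have differed in detail). The helper name describe and the placeholder string are made up for illustration.

```python
import unicodedata


def describe(ch, placeholder="<no name>"):
    """Return the Unicode name of ch, or the placeholder if it has none."""
    return unicodedata.name(ch, placeholder)


# name(): character -> name
capital_a = unicodedata.name("A")

# Control characters have no name; the default is returned instead.
nul_name = describe("\x00")

# Without a default, a nameless character raises ValueError.
try:
    unicodedata.name("\x00")
    name_raised = False
except ValueError:
    name_raised = True

# lookup(): name -> character, KeyError for an unknown name.
small_a = unicodedata.lookup("LATIN SMALL LETTER A")
try:
    unicodedata.lookup("NO SUCH CHARACTER NAME")
    lookup_raised = False
except KeyError:
    lookup_raised = True
```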
From moshez at zadka.site.co.il Wed Jan 24 15:07:08 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 24 Jan 2001 16:07:08 +0200 (IST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Message-ID: <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001, Greg Ewing wrote: > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. Cool idea, but even cooler (would catch more idioms, that is) is "the first time someone stores something not 'is' something in the dict, allocate the values array". This would catch small numbers, None and identifier-looking strings, for the measly cost of one pointer/dict object. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From moshez at zadka.site.co.il Wed Jan 24 15:15:39 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 24 Jan 2001 16:15:39 +0200 (IST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>, <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> Message-ID: <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il> On Tue, 23 Jan 2001 22:46:47 -0500, Guido van Rossum wrote: [ESR] > You've never heard of BROWSER because I invented it and have not > widely popularized it yet :-). [Guido v. Rossum] > Grumble. That wasn't the kind of answer I expected. 
I don't like it > if Python is used as a wedge to get a particular thing introduced to > the rest of the world, no matter how useful it may seem at the time. Guido, I think you're being over-dramatic. BROWSER is right in the tradition of PAGER and EDITOR, and a lot of other programs need it. I know Eric uses RH and mutt, so probably RH's urlview program (which mutt uses to jump to URLs) uses BROWSER. I was just about to submit a bug report to Debian that their urlview doesn't respect it. And if you really don't want to be a horse in front of a cart... > Anyway, shouldn't you also talk to the developers of packages like KDE > and Gnome? Surely their users would like to be able to configure the > default webbrowser. Yes -- via GNOME/KDE specific mechanisms. I have 0 experience with KDE, but I'm guessing the GNOME guys would do it via the GNOME "registry". KDE probably has something similar. I'm sure you wouldn't want Python to depend on GNOME, though it would be nice to make the browser-choosing part pluggable so when "import gnome" is done, it automatically tries to choose the user's browser. On UNIX (as opposed to GNOME/KDE, which are pretty much operating systems themselves), these things are done via environment variable. And $BROWSER doesn't seem like that much of an innovation. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From skip at mojam.com Wed Jan 24 07:28:21 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 00:28:21 -0600 (CST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4 In-Reply-To: <200101240311.WAA06582@cj20424-a.reston1.va.home.com> References: <14957.52952.48739.53360@beluga.mojam.com> <200101240311.WAA06582@cj20424-a.reston1.va.home.com> Message-ID: <14958.30213.325584.373062@beluga.mojam.com> Guido> I think I saw a complaint about this that specifically said that Guido> when dbhash is imported when bsddb can't be imported, an Guido> incomplete dbhash is left behind in sys.modules, and then a Guido> second import of dbhash will succeed -- but of course it will Guido> define no objects. So it does: % ./python Python 2.1a1 (#2, Jan 23 2001, 23:30:41) [GCC 2.95.3 19991030 (prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import dbhash Traceback (most recent call last): File "", line 1, in ? File "/home/beluga/skip/src/python/dist/src/Lib/dbhash.py", line 3, in ? import bsddb ImportError: No module named bsddb >>> import dbhash >>> Can that be construed as a bug? If import fails, shouldn't the stub module that was inserted in sys.modules be removed? Skip From skip at mojam.com Wed Jan 24 07:31:08 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 00:31:08 -0600 (CST) Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <14958.30380.851599.764535@beluga.mojam.com> Guido> BROWSER seems unlikely to take the world by storm and I don't Guido> feel I need to be involved in the effort to get it accepted. 
Editors and web browsers are classes of tools which (one would hope) will always come in several varieties. Users have to have some way to specify what to launch. BROWSER seems analogous to the EDITOR environment variable which is commonly used in Unix environments for just that purpose. Skip From thomas at xs4all.net Wed Jan 24 08:03:09 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 24 Jan 2001 08:03:09 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101240420.XAA07153@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 11:20:56PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <200101240420.XAA07153@cj20424-a.reston1.va.home.com> Message-ID: <20010124080308.G27785@xs4all.nl> On Tue, Jan 23, 2001 at 11:20:56PM -0500, Guido van Rossum wrote: > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, Patch submitted. It only implements 'if key in dict', not 'for key in dict'. The latter is kind of hard until we have a separate iteration protocol. (PEP, anyone ?) Once we have it, we could consider 'for key, value in dict', which is now easily explained with 'dict.popitem()'. Does this mean I get a legally sound and thus empty legal statement with every Python release for the rest of your, its or my life, Guido, or will you just make me 'Free Python Release Receiver For Life' ? :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From pf at artcom-gmbh.de Wed Jan 24 08:31:30 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 24 Jan 2001 08:31:30 +0100 (MET) Subject: OT: contribution rewards (was Re: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> from Guido van Rossum at "Jan 23, 2001 11:20:56 pm" Message-ID: Hi, Guido van Rossum: [...] 
> Ping's proposal makes a lot of sense. This is a reversal of my > previous opinion on this matter. (Take note -- those don't happen > very often! :-) It gives a warm and fuzzy feeling to see that happen sometimes at all. ;-) > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, This repeated offer of free copies of Python becomes increasingly boring. For quite a while I myself have not contributed anything useful and I am nevertheless hoarding free copies of Python here. ;-) What about offering another immaterial reward to potential contributors instead? What about "fame points"? Anybody contributing something useful to Python receives a certain number of "fame points": These fame points will be added and placed in front of the name of the contributor into the ACKS file and the file will be sorted accordingly turning the ACKS file effectively into some kind of "Python contribution high score" ... ;-) Just kidding, Peter From tim.one at home.com Wed Jan 24 09:08:50 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 03:08:50 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <20010123050807.A29115@glacier.fnational.com> Message-ID: [Neil Schemenauer] > I think this argues that if sets are added to the core they > should be implemented as an extension type with the speed of > dictionaries and the memory usage of lists. Basicly, we would > use the implementation of PyDict but drop the values. They'll be slower than dicts and take more memory than lists then. WRT memory, dicts cache the hash code with each entry for speed (so double the memory of a list even without the value field), and are never more than 2/3 full anyway. The dict implementation also gets low-level speed benefits out of using both the key and value fields to characterize the nature of a slot (the key field is NULL iff the slot is virgin; the value field is NULL iff the slot is available (virgin or dummy)).
Dummy slots can be avoided (and so also the need for runtime code to distinguish them from active slots) by using a hash table of pointers to linked lists-- or flex vectors, or linked lists of small vectors --instead, and in most ways that leads to much simpler code (no more fiddling with dummies, no more probe-sequence hassles, no more boosting the size before the table is full). But without fine control over the internals of malloc, that takes even more memory in the end. Interesting twist: "a dict" *is* "a set", but a set of (key, value) pairs further constrained so that no two elements have the same key. So any set implementation can be used as-is to implement a dict as a set of 2-tuples, customizing the hash and "is equal" functions to look at just the tuples' first elements. That was the view taken by SETL in 1969, although their "map" (dict) type was eventually optimized to get away from actually constructing 2-tuples. Indeed, SETL eventually grew an elaborate optional type declaration sublanguage, allowing the user to influence many details of its many internal set-storage schemes; e.g., from pg 399 of "Programming With Sets: An Introduction to SETL": For example, we can declare [I'm putting their keywords in UPPERCASE for, umm, clarity] successors: LOCAL MMAP(ELMT b) REMOTE SET(ELMT b); This declaration specifies that for each x in b the image set successors{x} is stored in the element block of x, and that this image set is always to be represented as a bit vector. Similarly, the declaration successors: LOCAL MMAP(ELMT b) SPARSE SET(ELMT b); specifies that for each x in b the image set successors{x} is to be stored as a hash table containing pointers to elements of b. Note that the attribute LOCAL cannot be used for image sets of multivalued maps. This follows from the remarks in section 10.4.3 on the awkwardness of making local objects into subparts of composite objects. Clear? Snort.
Here are some citations lifted from the web for their experience in trying to make these kinds of decisions by magic: @article{dewar:79, title="Programming by Refinement, as Exemplified by the {SETL} Representation Sublanguage", author="Robert B. K. Dewar and Arthur Grand and Ssu-Cheng Liu and Jacob T. Schwartz and Edmond Schonberg", journal=toplas, year=1979, month=jul, volume=1, number=1, pages="27--49" } @article{schonberg:81, title="An Automatic Technique for Selection of Data Structures in {SETL} Programs", author="Edmond Schonberg and Jacob T. Schwartz and Micha Sharir", journal=toplas, year=1981, month=apr, volume=3, number=2, pages="126--143" } @article{freudenberger:83, title="Experience with the {SETL} Optimizer", author="Stefan M. Freudenberger and Jacob T. Schwartz and Micha Sharir", pages="26--45", journal=toplas, year=1983, month=jan, volume=5, number=1 } If someone wanted to take sets seriously today, a better approach would be to define a minimal "set interface" ("abstract base class" in C++ terms), then supply multiple implementations of that interface, letting the user choose directly which implementation strategy they want for each of their sets. And people are doing just that in the C++ and Java worlds; e.g., http://developer.java.sun.com/developer/onlineTraining/ collections/Collection.html#SetInterface Curiously, the newer Java Collections Framework (covering multiple implementations of list, set, and dict interfaces) gave up on thread-safety by default, because it cost too much at runtime. Just another thing to argue about . 
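A minimal sketch of the "one interface, several implementations" approach described above, in modern Python 3. Everything here is hypothetical: the names MinimalSet, HashSet, SortedListSet, and count_distinct are invented for illustration, and the two implementations simply contrast a hashing requirement with an ordering requirement.

```python
from abc import ABC, abstractmethod
import bisect


class MinimalSet(ABC):
    """Hypothetical minimal set interface: add, membership, size."""

    @abstractmethod
    def add(self, item): ...

    @abstractmethod
    def __contains__(self, item): ...

    @abstractmethod
    def __len__(self): ...


class HashSet(MinimalSet):
    """Requires hashable items; backed by a dict used as a set."""

    def __init__(self):
        self._table = {}

    def add(self, item):
        self._table[item] = 1

    def __contains__(self, item):
        return item in self._table

    def __len__(self):
        return len(self._table)


class SortedListSet(MinimalSet):
    """Requires mutually orderable items; backed by a sorted list."""

    def __init__(self):
        self._items = []

    def _find(self, item):
        # Position where item is, or where it would be inserted.
        return bisect.bisect_left(self._items, item)

    def add(self, item):
        pos = self._find(item)
        if pos == len(self._items) or self._items[pos] != item:
            self._items.insert(pos, item)

    def __contains__(self, item):
        pos = self._find(item)
        return pos < len(self._items) and self._items[pos] == item

    def __len__(self):
        return len(self._items)


def count_distinct(items, set_factory):
    """Client code picks the implementation; the algorithm does not care."""
    seen = set_factory()
    for item in items:
        seen.add(item)
    return len(seen)


data = [3, 1, 3, 2, 1]
hashed_count = count_distinct(data, HashSet)
sorted_count = count_distinct(data, SortedListSet)
```

The caller chooses the storage strategy per set, which is the design choice the paragraph attributes to the C++ and Java collection libraries.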
we're-not-exactly-pioneers-here-ly y'rs - tim From fredrik at effbot.org Wed Jan 24 09:29:30 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 09:29:30 +0100 Subject: [Python-Dev] getting rid of ucnhash References: <013901c08566$d2a8f360$e46940d5@hagrid> <200101240246.VAA06336@cj20424-a.reston1.va.home.com> Message-ID: <019801c085df$c7ee0540$e46940d5@hagrid> guido wrote: > > It's probably just me, but the names of the two unicode > > modules tend to irritate me: > > > > > ls u*.pyd > > ucnhash.pyd unicodedata.pyd > > To me, both of these are irrelevant details of the Unicode > implementation. :-) IOW, feel free to check it in. Done. Note that Include/ucnhash.h is still there; it declares the "ucnhash_CAPI" structure used to access names from the unicodeobject module. (and all name-related tests are still kept in test_ucn) I'll leave it to Tim to update the MSVC build files. Cheers /F From tim.one at home.com Wed Jan 24 09:28:34 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 03:28:34 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can you point me to docs explaining the meaning of the BROWSER > environment variable? I've never heard of it... The last new > environment variables I learned were PAGER and EDITOR, probably 15 > years ago when 4.1BSD was released... :-) I gotta say, politics aside, BROWSER is a screamingly natural answer to the question "what comes next in this sequence?": PAGER, EDITOR, ... Dear Lord, even *I* use a browser almost every week . explicit-is-better-than-implicit-ly y'rs - tim From esr at thyrsus.com Wed Jan 24 10:02:59 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Wed, 24 Jan 2001 04:02:59 -0500 Subject: OT: contribution rewards (was Re: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: ; from pf@artcom-gmbh.de on Wed, Jan 24, 2001 at 08:31:30AM +0100 References: <200101240420.XAA07153@cj20424-a.reston1.va.home.com> Message-ID: <20010124040259.A28086@thyrsus.com> Peter Funk : > What about offering another immaterial reward to potential contributors > instead? What about "fame points"? Anybody contributing something > useful to Python receives a certain number of "fame points": These > fame points will be added and placed in front of the name of > the contributor into the ACKS file and the file will be sorted > accordingly turning the ACKS file effectively into some kind of > "Python contribution high score" ... ;-) > > Just kidding, Peter You may be joking, but as an observer of how gift cultures work I say this isn't a bad idea. -- Eric S. Raymond "One of the ordinary modes, by which tyrants accomplish their purposes without resistance, is, by disarming the people, and making it an offense to keep arms." -- Constitutional scholar and Supreme Court Justice Joseph Story, 1840 From esr at thyrsus.com Wed Jan 24 10:09:18 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:09:18 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101240258.VAA06479@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 09:58:59PM -0500 References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> <200101240258.VAA06479@cj20424-a.reston1.va.home.com> Message-ID: <20010124040918.B28086@thyrsus.com> Guido van Rossum : > > let's just say that "in" is the same thing as "has_key", > > and be done with it. > > You know, I've long resisted this, but I agree now -- this is the > right thing. 
I think we've just justified the time and energy that went into this discussion. -- Eric S. Raymond What is a magician but a practicing theorist? -- Obi-Wan Kenobi, 'Return of the Jedi' From esr at thyrsus.com Wed Jan 24 10:14:27 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:14:27 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 23, 2001 at 10:46:47PM -0500 References: <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010124041427.D28086@thyrsus.com> Guido van Rossum : > Grumble. That wasn't the kind of answer I expected. I don't like it > if Python is used as a wedge to get a particular thing introduced to > the rest of the world, no matter how useful it may seem at the time. Oh, stop! I'm not using Python as an argument for other people to adopt the BROWSER convention. The idea sells itself quite nicely by analogy to EDITOR and PAGER the second people hear it. > Anyway, shouldn't you also talk to the developers of packages like KDE > and Gnome? Surely their users would like to be able to configure the > default webbrowser. Talking just to the scripting language people > seems like you're thinking too small. There must be lots of C apps > with the desire to invoke a browser. Also Emacs, which has an > extensive list of browser-url-* functions (you might even learn a few > tricks from it about how to invoke various external browsers) but > AFAIK no default browser selection. All on my TO-DO list. -- Eric S. Raymond It is proper to take alarm at the first experiment on our liberties. We hold this prudent jealousy to be the first duty of citizens and one of the noblest characteristics of the late Revolution. 
The freemen of America did not wait till usurped power had strengthened itself by exercise and entangled the question in precedents. They saw all the consequences in the principle, and they avoided the consequences by denying the principle. We revere this lesson too much ... to forget it -- James Madison. From esr at thyrsus.com Wed Jan 24 10:16:12 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:16:12 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 03:28:34AM -0500 References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> Message-ID: <20010124041612.E28086@thyrsus.com> Tim Peters : > I gotta say, politics aside, BROWSER is a screamingly natural answer to the > question "what comes next in this sequence?": > > PAGER, EDITOR, ... That's exactly what I thought when I was struck by the obvious. Everybody I spread this meme to seems to agree. -- Eric S. Raymond Government is actually the worst failure of civilized man. There has never been a really good one, and even those that are most tolerable are arbitrary, cruel, grasping and unintelligent. -- H. L. Mencken From esr at thyrsus.com Wed Jan 24 10:21:56 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 24 Jan 2001 04:21:56 -0500 Subject: [Python-Dev] webbrowser.py In-Reply-To: <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Wed, Jan 24, 2001 at 04:15:39PM +0200 References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>, <20010123041730.A25165@thyrsus.com> <200101231406.JAA04765@cj20424-a.reston1.va.home.com> <20010123092226.A25968@thyrsus.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124141539.76C3FA83E@darjeeling.zadka.site.co.il> Message-ID: <20010124042156.F28086@thyrsus.com> Moshe Zadka : > I know Eric uses RH and mutt, so probably RH's urlview program (which > mutt uses to jump to URLs) uses BROWSER. 
I was just about to submit > a bug report to Debian that their urlview doesn't respect it.

Oh, *do* that!

Note: BROWSER may consist of a colon-separated series of parts, browser commands to be tried in order (this is useful so you can put an X browser first, then a console browser, and have the right thing happen). If a part contains %s, the URL is substituted there; otherwise, the URL is concatenated to the command after a space.

-- Eric S. Raymond Gun Control: The theory that a woman found dead in an alley, raped and strangled with her panty hose, is somehow morally superior to a woman explaining to police how her attacker got that fatal bullet wound. -- L. Neil Smith

From tim.one at home.com Wed Jan 24 10:24:26 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 04:24:26 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <200101240354.WAA06903@cj20424-a.reston1.va.home.com>
Message-ID:

[Greg Ewing] > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1.

[Guido] > Not a bad idea!

In theory, but if Vladimir were here he'd bust a gut over the possibly bad cache effects on "real dicts" (by keeping everything together, simply accessing the cached hash code brings both the key and value pointers into L1 cache too). We would need to quantify the effect of breaking that connection.

> (But shouldn't the default value be something else, > like none?)

Bleech. I hate the idiom of using a false value to mean "present".

    d = {}
    for x in seq:
        d[x] = 1

runs faster too (None needs a LOAD_GLOBAL now).
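
A minimal sketch of the dict-as-set idiom shown in the message above, written for current Python. The helper name `as_set` and the sample data are made up for illustration, and the `key in dict` test is the spelling this thread proposes as a synonym for `has_key`; it is assumed to be available here.

```python
def as_set(seq):
    """Return a dict whose keys are the distinct elements of seq."""
    d = {}
    for x in seq:
        d[x] = 1  # a true value marks the key as "present"
    return d


members = as_set(["spam", "eggs", "spam"])
has_spam = "spam" in members  # membership test on the keys
has_ham = "ham" in members
```

Storing 1 rather than None keeps the stored value true, which is the point made above about not using a false value to mean "present".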
From tim.one at home.com Wed Jan 24 11:01:36 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 05:01:36 -0500 Subject: [Python-Dev] test___all__ failing; Windows Message-ID: > python ../lib/test/regrtest.py test___all__ test___all__ test test___all__ crashed -- exceptions.AttributeError: 'locale' module has no attribute 'LC_MESSAGES' And indeed it does not: > python Python 2.1a1 (#9, Jan 24 2001, 04:40:55) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import locale >>> dir(locale) ['CHAR_MAX', 'Error', 'LC_ALL', 'LC_COLLATE', 'LC_CTYPE', 'LC_MONETARY', 'LC_NUMERIC', 'LC_TIME', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '_build_localename', '_group', '_parse_localename', '_print_locale', '_setlocale', '_test', 'atof', 'atoi', 'encoding_alias', 'format', 'getdefaultlocale', 'getlocale', 'locale_alias', 'localeconv', 'normalize', 'resetlocale', 'setlocale', 'str', 'strcoll', 'string', 'strxfrm', 'sys', 'windows_locale'] >>> Nor is LC_MESSAGES std C (the other LC_XXX guys are). I pin the blame on from _locale import * in locale.py -- who knows what that's supposed to export? Certainly not Skip . From tim.one at home.com Wed Jan 24 11:17:47 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 24 Jan 2001 05:17:47 -0500 Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: Message-ID: Nevermind; checked in a hack to stop the error on Windows. From mal at lemburg.com Wed Jan 24 14:00:28 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 14:00:28 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <14957.53331.342827.462297@localhost.localdomain> <20010123153839.B26676@thyrsus.com> <20010124003814.F27785@xs4all.nl> <02f401c0859a$765d07c0$e46940d5@hagrid> Message-ID: <3A6ED1EC.237B5B1D@lemburg.com> Fredrik Lundh wrote: > > > It's come up before. 
The problem with it is that it's not quite obvious > > whether it is 'if key in dict' or 'if value in dict'. > > you forgot "if (key, value) in dict" > > on the other hand, it's not quite obvious that "list.sort" > doesn't return the sorted list, "print >>None" prints to > standard output, "except KeyError, ValueError" doesn't > catch a ValueError exception, etc, etc, etc. > > (nor that it's "has_key" and "hasattr", and not "has_key" > and "has_attr" or "haskey" and "hasattr" ;-) > > let's just say that "in" is the same thing as "has_key", > and be done with it. +1 all the way :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 24 15:01:33 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 15:01:33 +0100 Subject: [Python-Dev] Interfaces (Is X a (sequence|mapping)?) References: <200101230814.f0N8EWQ00849@mira.informatik.hu-berlin.de> <3A6D4B9F.38B17046@lemburg.com> <200101231531.KAA05122@cj20424-a.reston1.va.home.com> Message-ID: <3A6EE03D.4D5DFD17@lemburg.com> Guido van Rossum wrote: > > > Polymorphic code will usually get you more out of an > > algorithm, than type-safe or interface-safe code. > > Right. > > But there are times when people want to write methods that take > e.g. either a sequence or a mapping, and need to distinguish between > the two. That's not easy in Python! Java and C++ support it very > well though, and thus we'll always keep seeing this kind of > complaint. Not sure what to do, except to recommend "find out which > methods you expect in one case but not in the other (e.g. keys()) and > do a hasattr() test for that." Perhaps we should provide simple means for testing a set of available methods and slots ?! E.g. 
hasinterface(obj, ('keys', 'items', '__len__')) Objects could provide an __interface__ special attribute for this purpose (since not all slots can be auto-detected and -verified without side-effects). > > BTW, there are Python interfaces to PySequence_Check() and > > PyMapping_Check() burried in the builtin operator module in case > > you really do care ;) ... > > > > operator.isSequenceType() > > operator.isMappingType() > > + some other C style _Check() APIs > > > > These only look at the type slots though, so Python instances > > will appear to support everything but when used fail with > > an exception if they don't provide the proper __xxx__ hooks. > > Yes, these should probably be deprecated. I certainly have never used > them! (The operator module doesn't seem to get much use in > general... Was it a bad idea?) Some of these are nice to have and provide some good performance boost (e.g. the numeric slot access APIs). The type slot checking APIs are not too useful though. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at digicool.com Wed Jan 24 10:05:44 2001 From: jim at digicool.com (Jim Fulton) Date: Wed, 24 Jan 2001 04:05:44 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> <20010123150616.F18796@trump.amber.org> Message-ID: <3A6E9AE8.6C2D3CF0@digicool.com> Christopher Petrilli wrote: > > Neil Schemenauer [nas at arctrix.com] wrote: > > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > > Unfortunately, for me, a Python implementation of Sets is only > > > interesting academicaly. 
Any time I've needed to work with them at a > > > large scale, I've needed them *much* faster than Python could achieve > > > without a C extension. > > > > I think this argues that if sets are added to the core they > > should be implemented as an extension type with the speed of > > dictionaries and the memory usage of lists. Basically, we would > > use the implementation of PyDict but drop the values. > > This is effectively the implementation that Zope has for Sets.

Except we use sorted collections with binary search for sets. I think that a simple hash-based set would make a lot of sense.

> In > addition we have "buckets" that have scores on them (which are > implemented as a modified BTree). > > Unfortunately Jim Fulton (who wrote all the code for that level) is in > a meeting, but I hope he'll comment on the implementation that was > chosen for our software.

We have a number of special needs:

- Scalability is critical. We make some special optimizations, like sets of integers and mapping objects with integer keys and values. In these cases, data are stored using C int arrays, allowing very efficient data storage and manipulation, especially when using integer keys.

- We need to spread data over multiple database records. Our data structures may be hundreds of megabytes in size. We have ZODB-aware structures that use multiple independently stored database objects.

- Range searches are very common, and under some circumstances, sorted collections and BTrees can have very little overhead compared to dictionaries. For this reason, our mapping objects and sets have been based on BTrees and sorted collections.

Unfortunately, our current BTree implementation has a flaw that causes an excessive number of objects to be updated when items are added and removed. (Each BTree internal node keeps track of the number of objects contained in it.) Also, our current sets are limited to integers and cannot be spread over multiple database records.
We are completing a new BTree implementation that overcomes these limitations. IN this implementation, we will provide sets as value-less BTrees. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org From gvwilson at nevex.com Wed Jan 24 15:10:41 2001 From: gvwilson at nevex.com (Greg Wilson) Date: Wed, 24 Jan 2001 09:10:41 -0500 Subject: [Python-Dev] re: sets In-Reply-To: <20010124032401.EB329F199@mail.python.org> Message-ID: <000301c0860f$6fa29010$770a0a0a@nevex.com> 1. I did a poll overnight by email of 22 friends and colleagues, none of whom are regular Python users (yet). My question was, "Would you expect the interface of a set class to be like the interface of a vector or list, or like the interface of a map or hash?" 15 people have replied; all 15 have said, "map or hash". Several respondents are Perl hackers, so I'm sure the answer is influenced by previous exposure to the set-as-valueless-hash idiom. Still, I think 15-0 is a pretty convincing score... Four, unprompted, said that they thought the STL's hierarchy of containers was as good as it gets, and that other languages should mirror it. (One of those added that this makes teaching much simpler --- students can transfer instincts from one language to another.) 2. Is there enough interest in sets for a BOF at IPC9? Please reply to me point-to-point if you're interested; I'll summarize and post the result. I volunteer to bring the donuts... > > Ka-Ping Yee: > > The only change that needs to be made to support sets of immutable > > elements is to provide "in" on dictionaries. The rest is then all > > quite natural: > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > > various: > > [but what about 'value in dict' or '(key, value) in dict'?] > Fredrik Lundh: > let's just say that "in" is the same thing as "has_key", > and be done with it. 
> Guido van Rossum: > You know, I've long resisted this, but I agree now -- this is the > right thing. Greg Wilson: Woo hoo! Now, on a related note, what is the status of the 'indices()' proposal, as in: for i in indices(someList): instead of: for i in range(len(someList)): Would 'indices(dict)' be the same as 'dict.keys()', to allow uniform iteration? Or would it be more economical to introduce a 'keys()' method on lists and tuples, so that: for i in collection.keys(): would work on dicts, lists, and tuples? I know that 'keys()' is the wrong name for lists and tuples, but dicts are already using it, and it's completely unambiguous... Thanks, Greg From mal at lemburg.com Wed Jan 24 15:46:10 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 24 Jan 2001 15:46:10 +0100 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <200101231816.LAA03551@localhost.localdomain> <20010123133904.B26487@thyrsus.com> <20010123140604.E18796@trump.amber.org> <20010123050807.A29115@glacier.fnational.com> <20010123150616.F18796@trump.amber.org> <3A6E9AE8.6C2D3CF0@digicool.com> Message-ID: <3A6EEAB2.5E6A4E83@lemburg.com> Jim Fulton wrote: > > Christopher Petrilli wrote: > > > > Neil Schemenauer [nas at arctrix.com] wrote: > > > On Tue, Jan 23, 2001 at 02:06:05PM -0500, Christopher Petrilli wrote: > > > > Unfortunately, for me, a Python implementation of Sets is only > > > > interesting academicaly. Any time I've needed to work with them at a > > > > large scale, I've needed them *much* faster than Python could achieve > > > > without a C extension. > > > > > > I think this argues that if sets are added to the core they > > > should be implemented as an extension type with the speed of > > > dictionaries and the memory usage of lists. Basicly, we would > > > use the implementation of PyDict but drop the values. > > > > This is effectively the implementation that Zope has for Sets. > > Except we use sorted collections with binary search for sets. 
> > I think that a simple hash-based set would make alot of sense. > > > In > > addition we have "buckets" that have scores on them (which are > > implemented as a modified BTree). > > > > Unfortunately Jim Fulton (who wrote all the code for that level) is in > > a meeting, but I hope he'll comment on the implementation that was > > chosen for our software. > > We have a number of special needs: > > - Scalability is critical. We make some special opimizations, > like sets of integers and mapping objects with integer keys > and values. In these cases, data are stored using C int arrays, > allowing very efficient data storage and manipulation, especially > when using integer keys. > > - We need to spread data over multiple database records. Our data > structures may be hundreds of megabytes in size. We have ZODB-aware > structures that use multiple independently stored database objects. > > - Range searches are very common, and under some circomstances, > sorted collections and BTrees can have very little overhead > compared to dictionaries. For this reason, out mapping objects > and sets have been based on BTrees and sorted collections. > > Unfortunately, our current BTree implementation has a flaw that > causes excessive number of objects to be updated when items are > added and removed. (Each BTree internal node keeps track of the number > of objects contained in it.) Also, out current sets are limited > to integers and cannot be spread over multiple database records. > > We are completing a new BTree implementation that overcomes these > limitations. IN this implementation, we will provide sets as > value-less BTrees. You may want to check out a soon to be released new mx package: mxBeeBase. This is an on-disk b+tree implementation which supports data files up to 2GB on 32-bit platforms. Here's a preview: http://www.lemburg.com/python/mxBeeBase.html (The links on that page are not functional.) 
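
The sorted-collection approach described in the quoted message above (binary search for membership, cheap range searches) can be sketched with the standard `bisect` module. This is only an in-memory illustration under that assumption; it is not the Zope or mxBeeBase implementation, and the class name `SortedSet` and its methods are hypothetical.

```python
import bisect


class SortedSet:
    """A set kept as a sorted list; lookups use binary search."""

    def __init__(self, items=()):
        self._items = sorted(set(items))

    def add(self, item):
        # Find the insertion point and insert only if the item is absent.
        i = bisect.bisect_left(self._items, item)
        if i == len(self._items) or self._items[i] != item:
            self._items.insert(i, item)

    def __contains__(self, item):
        i = bisect.bisect_left(self._items, item)
        return i < len(self._items) and self._items[i] == item

    def range(self, lo, hi):
        """Return the members m with lo <= m < hi, in sorted order."""
        i = bisect.bisect_left(self._items, lo)
        j = bisect.bisect_left(self._items, hi)
        return self._items[i:j]


s = SortedSet([5, 1, 3])
s.add(4)
s.add(3)  # already present, so nothing changes
in_range = s.range(2, 5)
```

The range query is where this layout pays off: it costs two binary searches plus a slice, which a hash-based set cannot offer without sorting its members first.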
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Wed Jan 24 15:42:23 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 08:42:23 -0600 (CST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: References: Message-ID: <14958.59855.4855.52638@beluga.mojam.com> Tim> Nor is LC_MESSAGES std C (the other LC_XXX guys are). Tim> I pin the blame on Tim> from _locale import * Tim> in locale.py -- who knows what that's supposed to export? Tim> Certainly not Skip . Was that a roundabout way of complimenting me for having found a bug? ;-) Skip From skip at mojam.com Wed Jan 24 15:50:02 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 08:50:02 -0600 (CST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: References: Message-ID: <14958.60314.482226.825611@beluga.mojam.com> Tim> Nevermind; checked in a hack to stop the error on Windows. Probably should file a bug report (if you haven't already) so the root problem isn't forgotten because the hack obscures it. I see this code in localemodule.c: #ifdef LC_MESSAGES x = PyInt_FromLong(LC_MESSAGES); PyDict_SetItemString(d, "LC_MESSAGES", x); Py_XDECREF(x); #endif /* LC_MESSAGES */ Martin, looks like this module is your baby. Care to hazard a guess about whether LC_MESSAGES should always or never be there? Skip From fredrik at effbot.org Wed Jan 24 16:11:33 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 24 Jan 2001 16:11:33 +0100 Subject: [Python-Dev] test___all__ failing; Windows References: <14958.60314.482226.825611@beluga.mojam.com> Message-ID: <04de01c08617$f56216f0$e46940d5@hagrid> Skip wrote: > Probably should file a bug report (if you haven't already) so the root > problem isn't forgotten because the hack obscures it. 
I see this code in > localemodule.c: > > #ifdef LC_MESSAGES > x = PyInt_FromLong(LC_MESSAGES); > PyDict_SetItemString(d, "LC_MESSAGES", x); > Py_XDECREF(x); > #endif /* LC_MESSAGES */ > > Martin, looks like this module is your baby. Care to hazard a guess about > whether LC_MESSAGES should always or never be there? I think the correct answer is "sometimes": ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, and LC_TIME Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME in other words, if it's supported, it should be exposed by the Python bindings. Cheers /F From tismer at tismer.com Wed Jan 24 15:40:04 2001 From: tismer at tismer.com (Christian Tismer) Date: Wed, 24 Jan 2001 16:40:04 +0200 Subject: [Python-Dev] I think my set module is ready for prime time; comments? References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> Message-ID: <3A6EE944.C8CC6EF7@tismer.com> Greg Ewing wrote: > > Neil Schemenauer : > > > Basicly, we would > > use the implementation of PyDict but drop the values. > > This could be incorporated into PyDict. Instead of storing keys and > values in the same array, keep them in separate arrays and only > allocate the values array the first time someone stores a value other > than 1. Very good idea. It fits also in my view of how dicts should be implemented: Keep keys and values apart, since this information has different access patterns. I think (or at least hope) that dictionaries become faster, when hashes, keys and values are in seperate areas, giving more cache hits. Not sure if hashes and keys should be apart, but sure for values. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com

From guido at digicool.com Wed Jan 24 16:37:03 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 24 Jan 2001 10:37:03 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test___all__.py,1.3,1.4
In-Reply-To: Your message of "Wed, 24 Jan 2001 00:28:21 CST." <14958.30213.325584.373062@beluga.mojam.com>
References: <14957.52952.48739.53360@beluga.mojam.com> <200101240311.WAA06582@cj20424-a.reston1.va.home.com> <14958.30213.325584.373062@beluga.mojam.com>
Message-ID: <200101241537.KAA27039@cj20424-a.reston1.va.home.com>

> Guido> I think I saw a complaint about this that specifically said that
> Guido> when dbhash is imported when bsddb can't be imported, an
> Guido> incomplete dbhash is left behind in sys.modules, and then a
> Guido> second import of dbhash will succeed -- but of course it will
> Guido> define no objects.
>
> So it does:
>
> % ./python
> Python 2.1a1 (#2, Jan 23 2001, 23:30:41)
> [GCC 2.95.3 19991030 (prerelease)] on linux2
> Type "copyright", "credits" or "license" for more information.
> >>> import dbhash
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/home/beluga/skip/src/python/dist/src/Lib/dbhash.py", line 3, in ?
>     import bsddb
> ImportError: No module named bsddb
> >>> import dbhash
> >>>
>
> Can that be construed as a bug? If import fails, shouldn't the stub module
> that was inserted in sys.modules be removed?

Yep, but not a very important bug -- typically this isn't caught. Feel free to check in a change; I think you should be able to insert something like

    import sys
    try:
        import bsddb
    except ImportError:
        del sys.modules[__name__]
        raise

into dbhash. If this works for you in testing, forget the patch manager, just check it in. (I'm too busy to do much myself, the company needs me.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Wed Jan 24 16:32:55 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 24 Jan 2001 16:32:55 +0100 (MET) Subject: LC_MESSAGES (was Re: [Python-Dev] test___all__ failing; Windows) In-Reply-To: <14958.60314.482226.825611@beluga.mojam.com> from Skip Montanaro at "Jan 24, 2001 8:50: 2 am" Message-ID: Hi, Skip Montanaro: > > Tim> Nevermind; checked in a hack to stop the error on Windows. > > Probably should file a bug report (if you haven't already) so the root > problem isn't forgotten because the hack obscures it. I see this code in > localemodule.c: > > #ifdef LC_MESSAGES > x = PyInt_FromLong(LC_MESSAGES); > PyDict_SetItemString(d, "LC_MESSAGES", x); > Py_XDECREF(x); > #endif /* LC_MESSAGES */ > > Martin, looks like this module is your baby. Care to hazard a guess about > whether LC_MESSAGES should always or never be there? AFAI found out, LC_MESSAGES was added to the POSIX "standard" in Posix.2. Non-posix2 compatible systems probably miss the proper functionality behind 'setlocale()'. So the best solution would be to add a clever emulation/approximation of this feature, if the underlying platform (here windows) doesn't provide it. This would require to wrap 'setlocale()'. But I'm not sure how to emulate for example 'setlocale(LC_MESSAGES, 'DE_de') on a Windows box. May be it is impossible to achieve. What I would love to see is that the typical query 'setlocale(LC_MESSAGES)' would return 'DE_de' on a Box running for example the german version of Windows or MacOS. This would eliminate the need for ugly language selection menus on these platforms in a portable fashion. Regards, Peter From guido at digicool.com Wed Jan 24 16:41:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 24 Jan 2001 10:41:07 -0500 Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: Your message of "Wed, 24 Jan 2001 16:07:08 +0200." 
<20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> Message-ID: <200101241541.KAA27082@cj20424-a.reston1.va.home.com> > > This could be incorporated into PyDict. Instead of storing keys and > > values in the same array, keep them in separate arrays and only > > allocate the values array the first time someone stores a value other > > than 1. > > Cool idea, but even cooler (would catch more idioms, that is) is > "the first time someone stores something not 'is' something in the ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > dict, allocate the values array". This would catch small numbers, > None and identifier-looking strings, for the measly cost of one > pointer/dict object. Sorry, but I don't understand what you mean by the ^^^ marked phrase. Can you please elaborate? Regarding storing one for "present", that's all well and fine, but it suggests to me that storing a false value could mean "not present". Do we really want that? --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Thu Jan 25 01:50:13 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 02:50:13 +0200 (IST) Subject: [Python-Dev] I think my set module is ready for prime time; comments? In-Reply-To: <200101241541.KAA27082@cj20424-a.reston1.va.home.com> References: <200101241541.KAA27082@cj20424-a.reston1.va.home.com>, <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> Message-ID: <20010125005013.58C12A840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 10:41:07 -0500, Guido van Rossum wrote: > > Cool idea, but even cooler (would catch more idioms, that is) is > > "the first time someone stores something not 'is' something in the > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > dict, allocate the values array". 
This would catch small numbers, > > None and identifier-looking strings, for the measly cost of one > > pointer/dict object. > > Sorry, but I don't understand what you mean by the ^^^ marked phrase. > Can you please elaborate?

I should really stop writing incomprehensible bits like that. Heck, I can't even understand it on second reading.

I meant that the dictionary would keep a slot for "the one and only value". First time someone puts a value in the dict, it puts it in the "one and only value" slot, and doesn't initialize the value array. The second time someone puts a value, it checks for pointer equality with that "one and only value". If it is the same, it still doesn't initialize the value array. The only time when the dictionary initializes the value array is when two pointer-different values are put in.

This would let me code

    a[key] = None

For my sets (but consistent in the same set!)

    a[key] = 1

When the timbot codes (again, consistent in the same set)

and

    a[key] = 'present'

If you're really weird.

(identifier-like strings get interned)

That's not *semantics*, that's *optimization* for a commonly used (I think) idiom with dictionaries -- you can't predict the value, but it will probably remain the same.

-- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From skip at mojam.com Wed Jan 24 17:44:17 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 24 Jan 2001 10:44:17 -0600 (CST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <04de01c08617$f56216f0$e46940d5@hagrid> References: <14958.60314.482226.825611@beluga.mojam.com> <04de01c08617$f56216f0$e46940d5@hagrid> Message-ID: <14959.1633.163407.779930@beluga.mojam.com> Fredrik> I think the correct answer is "sometimes": Fredrik> ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, Fredrik> LC_MONETARY, LC_NUMERIC, and LC_TIME Fredrik> Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, Fredrik> LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and Fredrik> LC_TIME Fredrik> in other words, if it's supported, it should be exposed by Fredrik> the Python bindings. Then this suggests that either Tim's hack is the correct fix (leave it out because we can't rely on it always being there) or I should add it to __all__ at the bottom of the file if and only if it's present in the module's namespace. Skip From moshez at zadka.site.co.il Thu Jan 25 01:57:22 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Thu, 25 Jan 2001 02:57:22 +0200 (IST) Subject: [Python-Dev] test___all__ failing; Windows In-Reply-To: <04de01c08617$f56216f0$e46940d5@hagrid> References: <04de01c08617$f56216f0$e46940d5@hagrid>, <14958.60314.482226.825611@beluga.mojam.com> Message-ID: <20010125005722.D2229A840@darjeeling.zadka.site.co.il> On Wed, 24 Jan 2001 16:11:33 +0100, "Fredrik Lundh" wrote: > I think the correct answer is "sometimes": > > ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE, > LC_MONETARY, LC_NUMERIC, and LC_TIME > > Unix mandates LC_ALL, LC_COLLATE,LC_CTYPE, > LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and > LC_TIME > > in other words, if it's supported, it should be exposed by > the Python bindings. In that case, the __all__ attribute in the module has to be calculated dynamically. 
Say, adding code like

    try:
        LC_MESSAGES
    except NameError:
        pass
    else:
        __all__.append('LC_MESSAGES')

Ditto for anything else. Should I check in a patch?

-- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From trentm at ActiveState.com Wed Jan 24 17:49:17 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Wed, 24 Jan 2001 08:49:17 -0800
Subject: [Python-Dev] webbrowser.py
In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 03:28:34AM -0500
References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com>
Message-ID: <20010124084917.C29977@ActiveState.com>

How will the expected adherence of apps to BROWSER jive with the current (and poorly understood by me) Windows convention of specifying the "default" browser somewhere in the registry?

Trent

-- Trent Mick TrentM at ActiveState.com

From skip at mojam.com Wed Jan 24 17:49:23 2001
From: skip at mojam.com (Skip Montanaro)
Date: Wed, 24 Jan 2001 10:49:23 -0600 (CST)
Subject: [Python-Dev] test___all__ failing; Windows
In-Reply-To: <20010125005722.D2229A840@darjeeling.zadka.site.co.il>
References: <04de01c08617$f56216f0$e46940d5@hagrid> <14958.60314.482226.825611@beluga.mojam.com> <20010125005722.D2229A840@darjeeling.zadka.site.co.il>
Message-ID: <14959.1939.398029.896891@beluga.mojam.com>

Moshe> In that case, the __all__ attribute in the module has to be
Moshe> calculated dynamically. Say, adding code like

No need. I've already got this exact change in my local copy and I'll be adding a few more __all__ lists later today.

Skip

From paulp at ActiveState.com Wed Jan 24 17:56:26 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 24 Jan 2001 08:56:26 -0800
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
References: <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> <200101241541.KAA27082@cj20424-a.reston1.va.home.com>
Message-ID: <3A6F093A.A311C71E@ActiveState.com>

Guido van Rossum wrote: > >... > > > Cool idea, but even cooler (would catch more idioms, that is) is > > "the first time someone stores something not 'is' something in the > > Sorry, but I don't understand what you mean by the ^^^ marked phrase. > Can you please elaborate?

I wasn't clear about that either. The idea is:

    def add(new_value):
        if not values_array:
            if self.magic_value is NULL:
                self.magic_value = new_value
            elif new_value is not self.magic_value:
                self.values_array=[self.magic_value, new_value, ... ]
            else:
                # new_value is self.magic_value: do nothing

I am neutral on this proposal myself. I think that even if we optimize any code where you pass the same thing over and over again, we should document a convention for consistency. So I'm not sure there is much advantage.

Paul Prescod

From esr at thyrsus.com Wed Jan 24 17:53:31 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 24 Jan 2001 11:53:31 -0500
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010124084917.C29977@ActiveState.com>; from trentm@ActiveState.com on Wed, Jan 24, 2001 at 08:49:17AM -0800
References: <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com>
Message-ID: <20010124115331.A15059@thyrsus.com>

Trent Mick : > How will the expected adherence of apps to BROWSER jive with the current (and > poorly understood by me) Windows convention of specifying the "default" > browser somewhere in the registry?

BROWSER overrides the registry setting. Which is OK; under Windows, only wizards are going to muck with it.

-- Eric S. Raymond Ideology, politics and journalism, which luxuriate in failure, are impotent in the face of hope and joy. -- P. J.
O'Rourke

From guido at digicool.com Wed Jan 24 17:59:00 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 24 Jan 2001 11:59:00 -0500
Subject: [Python-Dev] test___all__ failing; Windows
In-Reply-To: Your message of "Wed, 24 Jan 2001 10:44:17 CST." <14959.1633.163407.779930@beluga.mojam.com>
References: <14958.60314.482226.825611@beluga.mojam.com> <04de01c08617$f56216f0$e46940d5@hagrid> <14959.1633.163407.779930@beluga.mojam.com>
Message-ID: <200101241659.LAA27650@cj20424-a.reston1.va.home.com>

> Fredrik> I think the correct answer is "sometimes":
>
> Fredrik> ANSI C mandates LC_ALL, LC_COLLATE, LC_CTYPE,
> Fredrik> LC_MONETARY, LC_NUMERIC, and LC_TIME
>
> Fredrik> Unix mandates LC_ALL, LC_COLLATE, LC_CTYPE,
> Fredrik> LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and
> Fredrik> LC_TIME
>
> Fredrik> in other words, if it's supported, it should be exposed by
> Fredrik> the Python bindings.
>
> Then this suggests that either Tim's hack is the correct fix (leave it out
> because we can't rely on it always being there) or I should add it to
> __all__ at the bottom of the file if and only if it's present in the
> module's namespace.

The latter.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From moshez at zadka.site.co.il Thu Jan 25 18:05:44 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Thu, 25 Jan 2001 19:05:44 +0200 (IST)
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010124084917.C29977@ActiveState.com>
References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com>
Message-ID: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>

On Wed, 24 Jan 2001 08:49:17 -0800, Trent Mick wrote:
> How will the expected adherence of apps to BROWSER jive with the current (and
> poorly understood by me) Windows convention of specifying the "default"
> browser somewhere in the registry?

The "webbrowser" module should prefer to take the setting from the registry on windows.
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From guido at digicool.com Wed Jan 24 18:17:09 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 24 Jan 2001 12:17:09 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: Your message of "Thu, 25 Jan 2001 02:50:13 +0200." <20010125005013.58C12A840@darjeeling.zadka.site.co.il>
References: <200101241541.KAA27082@cj20424-a.reston1.va.home.com>, <200101240252.PAA02105@s454.cosc.canterbury.ac.nz> <20010124140708.2B6A2A83E@darjeeling.zadka.site.co.il> <20010125005013.58C12A840@darjeeling.zadka.site.co.il>
Message-ID: <200101241717.MAA27852@cj20424-a.reston1.va.home.com>

> I meant that the dictionary would keep a slot for "the one and only
> value". First time someone puts a value in the dict, it puts it
> in the "one and only value" slot, and doesn't initialize the value
> array. The second time someone puts a value, it checks for pointer
> equality with that "one and only value". If it is the same,
> it still doesn't initialize the value array. The only time when
> the dictionary initializes the value array is when two pointer-different
> values are put in.
>
> This would let me code
>
> a[key] = None
>
> For my sets (but consistent in the same set!)
>
> a[key] = 1
>
> When the timbot codes (again, consistent in the same set)
>
> and
>
> a[key] = 'present'
>
> If you're really weird.
>
> (identifier-like strings get interned)
>
> That's not *semantics*, that's *optimization* for a commonly
> used (I think) idiom with dictionaries -- you can't predict
> the value, but it will probably remain the same.

This I like! But note that a dict currently uses 12 bytes per slot in the hash table (on a 32-bit platform: long me_hash; PyObject *me_key, *me_value). The hash table's fill factor is typically between 50 and 67%.
I think removing the hashes would slow down lookups too much, so optimizing identical values out would only save 6-8 bytes per existing key on average. Not clear if it's worth enough.

I think I have to agree with Tim's expectation that two (or three) separate parallel arrays will reduce the cache locality and thus slow things down. Once you start probing, you jump through the hashtable at large random strides, causing bad cache performance (for largeish hash tables); but since often enough the first slot tried is right, you have the hash, key and value right next together, typically on the same cache line.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From esr at thyrsus.com Wed Jan 24 18:31:55 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 24 Jan 2001 12:31:55 -0500
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 25, 2001 at 07:05:44PM +0200
References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>
Message-ID: <20010124123155.A15203@thyrsus.com>

Moshe Zadka :
> > How will the expected adherence of apps to BROWSER jive with the
> > current (and poorly understood by me) Windows convention of
> > specifying the "default" browser somewhere in the registry?
>
> The "webbrowser" module should prefer to take the setting from the
> registry on windows.

Um, that's not the way it works right now. The windows-default browser choice launches the registered default browser, but BROWSER may have something else in its search list first.
--
Eric S. Raymond

The real point of audits is to instill fear, not to extract revenue; the IRS aims at winning through intimidation and (thereby) getting maximum voluntary compliance
-- Paul Strassel, former IRS Headquarters Agent Wall St.
Journal 1980

From esr at thyrsus.com Wed Jan 24 18:52:11 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 24 Jan 2001 12:52:11 -0500
Subject: [Python-Dev] BROWSER status
Message-ID: <20010124125211.A15276@thyrsus.com>

I spent the morning writing and testing patches to make urlview and GNU Emacs BROWSER-aware, and have sent them off to the relevant maintainers. I've also sent a patch to Andries Brouwer for the environ(5) man page.

Those of you interested in my latest bit of social engineering can take a look at http://www.tuxedo.org/~esr/BROWSER/

A bow in Guido's direction -- if he hadn't been grouchy about this I probably wouldn't have gotten to shipping those patches for a while.
--
Eric S. Raymond

A right is not what someone gives you; it's what no one can take from you.
-- Ramsey Clark

From thomas at xs4all.net Wed Jan 24 19:33:27 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 24 Jan 2001 19:33:27 +0100
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>; from moshez@zadka.site.co.il on Thu, Jan 25, 2001 at 07:05:44PM +0200
References: <20010124084917.C29977@ActiveState.com>, <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010124084917.C29977@ActiveState.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il>
Message-ID: <20010124193326.B962@xs4all.nl>

On Thu, Jan 25, 2001 at 07:05:44PM +0200, Moshe Zadka wrote:
> On Wed, 24 Jan 2001 08:49:17 -0800, Trent Mick wrote:
>
> > How will the expected adherence of apps to BROWSER jive with the current (and
> > poorly understood by me) Windows convention of specifying the "default"
> > browser somewhere in the registry?

> The "webbrowser" module should prefer to take the setting from the
> registry on windows.

Why ? That's a lot harder to change, and not settable per 'shell'/'thread'/'process'.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
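
As a rough illustration of the search-list behaviour described in this thread (entries named in BROWSER are tried first, with the platform default as the fallback), here is a minimal sketch. It targets a current Python 3 interpreter rather than the 2.1 alpha under discussion. The function name browser_candidates, the sep parameter and the exact fallback handling are assumptions made for the example; this is not the webbrowser.py code itself.

```python
import os


def browser_candidates(environ=None, sep=os.pathsep, fallback="windows-default"):
    """Return browser names to try, in order (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    raw = environ.get("BROWSER", "")
    # Keep the non-empty entries in the order the user listed them.
    names = [name.strip() for name in raw.split(sep) if name.strip()]
    # The platform default goes last unless the user already placed it.
    if fallback not in names:
        names.append(fallback)
    return names


if __name__ == "__main__":
    print(browser_candidates({"BROWSER": "netscape:lynx"}, sep=":"))
```

Because the list is read from the process environment, it can differ per shell or per process, which is the property Thomas asks about above.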
From tim.one at home.com Wed Jan 24 20:54:47 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 14:54:47 -0500
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010124115331.A15059@thyrsus.com>
Message-ID:

Guys, while I like BROWSER, don't think it has anything to do with Windows! Windows is not Unix; doesn't have PAGER or EDITOR either; and, in general, use of envars is an abomination under Windows.

The old webbrowser.py uses the Windows-specific os.startfile(url) because that's the *right* way to do it on Windows, wizard or not. And you would have to be a Windows wizard to succeed in launching a browser under Windows in any other way anyway.

You may as well try to sell the notion that, on Unix, Python should maintain a dict mapping file extensions to the user's preferred ways of opening such files <0.9 wink>.

From tim.one at home.com Wed Jan 24 20:56:32 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 14:56:32 -0500
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010124193326.B962@xs4all.nl>
Message-ID:

>> The "webbrowser" module should prefer to take the setting from the
>> registry on windows.

> Why ? That's a lot harder to change, and not settable per
> 'shell'/'thread'/'process'.

A Windows user has a legitimate expectation that *every* time an .html file is opened, it will come up in their browser of choice. That choice is made via the registry, and this is how *all* apps work under Windows. Ditto for .htm files (and that may be a different browser than is used for .html files, but again the user has set up their registry to do what *they* want done with it).

It's not supposed to be easy to change; it is supposed to be consistent. Using a different browser per shell/thread/process is a foreign concept; it's also a useless concept on Windows <0.5 wink>.
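
The position argued above (let the Windows file association decide, and use something portable elsewhere) can be sketched roughly as follows. This is only an illustration for a current Python 3 interpreter: os.startfile exists only on Windows, and the helper name open_url and its injectable startfile and fallback parameters are assumptions introduced so the logic can be exercised without launching anything.

```python
import os
import webbrowser


def open_url(url, startfile=getattr(os, "startfile", None), fallback=webbrowser.open):
    """Open url via the Windows file association when available (sketch)."""
    if startfile is not None:
        # On Windows, os.startfile defers to the association the user
        # configured, so no environment variable is consulted.
        startfile(url)
        return "startfile"
    # os.startfile does not exist on other platforms; use the portable module.
    fallback(url)
    return "fallback"
```

Passing stand-in callables for startfile and fallback makes it possible to check which branch runs on any platform.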
From tim.one at home.com Wed Jan 24 21:32:35 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 15:32:35 -0500
Subject: LC_MESSAGES (was Re: [Python-Dev] test___all__ failing; Windows)
In-Reply-To:
Message-ID:

[Peter Funk]
> ...
> AFAI found out, LC_MESSAGES was added to the POSIX "standard" in Posix.2.

FYI, it appears that C99 declined to adopt this extension to C89, but don't know why (the C99 Rationale doesn't mention it). That means the vendors who don't already support it can (well, *will*) use the new C99 std as "a reason" to continue leaving it out.

From tim.one at home.com Wed Jan 24 21:15:28 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 15:15:28 -0500
Subject: [Python-Dev] test___all__ failing; Windows
In-Reply-To: <14959.1633.163407.779930@beluga.mojam.com>
Message-ID:

[Skip]
> Then this suggests that either Tim's hack is the correct fix (leave it out
> because we can't rely on it always being there) or I should add it to
> __all__ at the bottom of the file if and only if it's present in the
> module's namespace.

What you suggest at the end *is* the hack I checked in. That is, it's already done. The existence of LC_MESSAGES is clearly platform-specific; if anyone can say for sure a priori *which* platforms it's available on, tell Fred Drake so he can update the docs accordingly.

From skip at mojam.com Wed Jan 24 22:25:45 2001
From: skip at mojam.com (Skip Montanaro)
Date: Wed, 24 Jan 2001 15:25:45 -0600 (CST)
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <20010124123155.A15203@thyrsus.com>
References: <20010124084917.C29977@ActiveState.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> <20010124123155.A15203@thyrsus.com>
Message-ID: <14959.18521.648454.488731@beluga.mojam.com>

>>>>> "Eric" == Eric S Raymond writes:

Moshe Zadka :
>> The "webbrowser" module should prefer to take the setting from the
>> registry on windows.
Eric> Um, that's not the way it works right now. The windows-default
Eric> browser choice launches the registered default browser, but
Eric> BROWSER may have something else in its search list first.

Why not have a special REGISTRY token you can place in the BROWSER path to tell it when to consult the registry? On non-Windows platforms it can simply be ignored:

BROWSER=netscape:REGISTRY:explorer

Skip

From esr at thyrsus.com Wed Jan 24 22:30:44 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 24 Jan 2001 16:30:44 -0500
Subject: [Python-Dev] webbrowser.py
In-Reply-To: <14959.18521.648454.488731@beluga.mojam.com>; from skip@mojam.com on Wed, Jan 24, 2001 at 03:25:45PM -0600
References: <20010124084917.C29977@ActiveState.com> <200101240346.WAA06790@cj20424-a.reston1.va.home.com> <20010125170544.A4C7DA840@darjeeling.zadka.site.co.il> <20010124123155.A15203@thyrsus.com> <14959.18521.648454.488731@beluga.mojam.com>
Message-ID: <20010124163044.A15877@thyrsus.com>

Skip Montanaro :
> Why not have a special REGISTRY token you can place in the BROWSER path to
> tell it when to consult the registry? On non-Windows platforms it can
> simply be ignored:

In effect, windows-default is that special token.
--
Eric S. Raymond

The Bible is not my book, and Christianity is not my religion. I could never give assent to the long, complicated statements of Christian dogma.
-- Abraham Lincoln

From martin at mira.cs.tu-berlin.de Wed Jan 24 22:41:11 2001
From: martin at mira.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 24 Jan 2001 22:41:11 +0100
Subject: [Python-Dev] Tkinter documentation (Was: What does "batteries are included" mean?)
Message-ID: <200101242141.f0OLfBT01812@mira.informatik.hu-berlin.de>

> It's already a blot on Python that the standard documentation set
> doesn't cover Tkinter.

Just point your friendly web browser to Ping's HTML generator and ask for Tkinter, or invoke "pydoc.py Tkinter".
[I wouldn't have brought this up if it hadn't been the contribution of my friend Nils Fischbeck:-]

Regards,
Martin

From nas at arctrix.com Wed Jan 24 16:31:55 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 24 Jan 2001 07:31:55 -0800
Subject: [Python-Dev] Makefile changes
Message-ID: <20010124073155.B32266@glacier.fnational.com>

I've checked in my new makefile. Hopefully everything goes well. The following files are no longer used so please don't patch them:

    Grammar/Makefile.in
    Include/Makefile
    Lib/Makefile
    Modules/Makefile.pre.in
    Objects/Makefile.in
    Parser/Makefile.in
    Python/Makefile.in
    Makefile.in

They will be removed in a few days assuming all goes well. You should re-run configure to use the new makefile.

I would appreciate it if people using platforms other than Linux and GNU make could give me some feedback on the build process. Does configure and make work okay? Does "make test" and "make install" work? Thanks.

Neil

From greg at cosc.canterbury.ac.nz Wed Jan 24 23:55:00 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 25 Jan 2001 11:55:00 +1300 (NZDT)
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <200101240354.WAA06903@cj20424-a.reston1.va.home.com>
Message-ID: <200101242255.LAA02208@s454.cosc.canterbury.ac.nz>

Guido:

> But shouldn't the default value be something else,
> like none?

It should really be whatever is the first value that gets stored after the dict is created. That way people can use whatever they want for their dummy value and it will Just Work. And it will probably catch most existing uses of a dict as a set as well.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury, | A citizen of NewZealandCorp, a |
Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg at cosc.canterbury.ac.nz +--------------------------------------+

From ping at lfw.org Wed Jan 24 21:33:43 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Wed, 24 Jan 2001 12:33:43 -0800 (PST)
Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm!
Message-ID:

Hi -- after updating my CVS tree today with Python 2.1a1, i ran the tests and test_inspect failed. This revealed that the format of code.co_varnames has changed. At first i tried to update the inspect.py module to check the Python version number and track the change, but now i believe this is actually symptomatic of a real interpreter problem.

Consider the function:

    def f(a, (b, c), *d):
        x = 1
        print a, b, c, d, x

Whereas in Python 1.5.2:

    f.func_code.co_argcount = 2
    f.func_code.co_nlocals = 6
    f.func_code.co_names = ('x', 'a', 'b', 'c', 'd')
    f.func_code.co_varnames = ('a', '.2', 'd', 'b', 'c', 'x')

In Python 2.1a1:

    f.func_code.co_argcount = 2
    f.func_code.co_nlocals = 6
    f.func_code.co_names = ('b', 'c', 'x', 'a', 'd')
    f.func_code.co_varnames = ('a', '.2', 'b', 'c', 'd', 'x')

Notice how the ordering of the variable names has changed. I went and looked at the CO_VARARGS clause in eval_code2 to see if it put the varargs and kwdict arguments in different slots, but it appears unchanged! It still puts varargs at locals[co_argcount] and kwdict at locals[co_argcount + 1].

Please try:

    >>> def f(a, (b, c), *d):
    ...     x = 1
    ...     print a, b, c, d, x
    ...
    >>> f(1, (2, 3), 4)
    1 2 3
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 3, in f
    UnboundLocalError: local variable 'd' referenced before assignment
    >>>

In Python 1.5.2, this prints "1 2 3 (4,)" as expected.

I only have 1.5.2 and 2.1a1 to test. I hope this problem isn't present in 2.0...

Note that test_inspect was the only test to fail! It might be the only test that checks anonymous and *varargs at the same time. (Yet another reason to put inspect in the core...)
I did recently check in additions to test_extcall that made the test much beefier -- but that only tested combinations of regular, keyword, varargs, and kwdict arguments; it neglected to test anonymous (tuple) arguments as well.

-- ?!ng

From tim.one at home.com Thu Jan 25 00:56:25 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 18:56:25 -0500
Subject: [Python-Dev] Re: test___all__ failing; Windows
Message-ID:

> In that case, the __all__ attribute in the module has to be calculated
> dynamically. Say, adding code like
>
> try:
>     LC_MESSAGES
> except NameError:
>     pass
> else:
>     __all__.append('LC_MESSAGES')
>
> Ditto for anything else.
>
> Should I check in a patch?

SourceForge CVS doesn't appear to be broken, so I can only conclude everyone decided this was a bad day to stop taking drugs <0.9 wink>.

From tim.one at home.com Thu Jan 25 01:04:50 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 19:04:50 -0500
Subject: [Python-Dev] (no subject)
Message-ID:

[Skip]
> Why not have a special REGISTRY token you can place in the BROWSER
> path to tell it when to consult the registry? On non-Windows
> platforms it can simply be ignored:
>
> BROWSER=netscape:REGISTRY:explorer

Because non-Windows platforms shouldn't be bothered with Windows silliness any more than Windows users should be bothered with Unix silliness. BROWSER isn't of any use on Windows, and REGISTRY isn't of any use on Unix. Eric may still *think* BROWSER is of use on Windows, but if so that's not really a technical problem .
From thomas at xs4all.net Thu Jan 25 01:25:54 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 25 Jan 2001 01:25:54 +0100
Subject: [Python-Dev] Makefile changes
In-Reply-To: <20010124073155.B32266@glacier.fnational.com>; from nas@arctrix.com on Wed, Jan 24, 2001 at 07:31:55AM -0800
References: <20010124073155.B32266@glacier.fnational.com>
Message-ID: <20010125012554.F962@xs4all.nl>

On Wed, Jan 24, 2001 at 07:31:55AM -0800, Neil Schemenauer wrote:

> I would appreciate it if people using platforms other than Linux
> and GNU make could give me some feedback on the build process.
> Does configure and make work okay? Does "make test" and "make
> install" work? Thanks.

Only have time for a quick check now, and no time whatsoever tomorrow, but at first glance, it looks okay (read: it compiles Python) on BSDI 4.0.1, BSDI 4.1 and FreeBSD 4.2.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From esr at thyrsus.com Thu Jan 25 01:15:10 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 24 Jan 2001 19:15:10 -0500
Subject: [Python-Dev] (no subject)
In-Reply-To: ; from tim.one@home.com on Wed, Jan 24, 2001 at 07:04:50PM -0500
References:
Message-ID: <20010124191510.A17782@thyrsus.com>

Tim Peters :
> Because non-Windows platforms shouldn't be bothered with Windows silliness
> any more than Windows users should be bothered with Unix silliness. BROWSER
> isn't of any use on Windows, and REGISTRY isn't of any use on Unix. Eric
> may still *think* BROWSER is of use on Windows, but if so that's not really
> a technical problem .

Actually that's not something I have an opinion on. I addressed the original question because I know it would be technically possible to set a BROWSER variable under Windows. Yes, an unlikely move, but possible.
--
Eric S.
Raymond

A man who has nothing which he is willing to fight for, nothing which he cares about more than he does about his personal safety, is a miserable creature who has no chance of being free, unless made and kept so by the exertions of better men than himself.
-- John Stuart Mill, writing on the U.S. Civil War in 1862

From tim.one at home.com Thu Jan 25 05:38:54 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 24 Jan 2001 23:38:54 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <3A6EE944.C8CC6EF7@tismer.com>
Message-ID:

[Christian Tismer]
> ...
> Not sure if hashes and keys should be apart, but
> sure for values.

How so? That is, under what assumptions? Any savings from separation would appear to require that I look up keys a lot more than I access the associated values; while trivially true for dicts used as sets, it seems dubious to me for use of dicts as mappings (count[word] += 1, etc).

From Jason.Tishler at dothill.com Thu Jan 25 07:09:47 2001
From: Jason.Tishler at dothill.com (Jason Tishler)
Date: Thu, 25 Jan 2001 01:09:47 -0500
Subject: [Python-Dev] Re: Python 2.1 alpha 1 released!
In-Reply-To: <200101230333.WAA28376@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 22, 2001 at 10:33:02PM -0500
References: <200101230333.WAA28376@cj20424-a.reston1.va.home.com>
Message-ID: <20010125010947.M1256@dothill.com>

On Mon, Jan 22, 2001 at 10:33:02PM -0500, Guido van Rossum wrote:
> - Python should now build out of the box on Cygwin. If it doesn't,
> mail to Jason Tishler (jlt63 at users.sourceforge.net).

Although Python CVS built OOTB under Cygwin until 2001/01/17 18:54:54, Python 2.1a1 needs a small patch in order to build cleanly under Cygwin. If interested, please see the following for details:

http://www.cygwin.com/ml/cygwin-apps/2001-01/msg00019.html

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.
Fax: +1 (732) 264-8798
82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA WWW: http://www.dothill.com

From tim.one at home.com Thu Jan 25 08:29:19 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 02:29:19 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <200101231549.KAA05172@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> ...
> It's no big deal if the Vaults contain three or more set modules --
> perfect even, people can choose the best one for their purpose.

They really can't, not realistically, unless all the modules in question conform to the same interface (which users can't control), and users restrict themselves to methods defined only in the interface (which users can control). The problem is that "their purpose" changes over time, and in some cases the effects of representation on performance simply can't be out-guessed in advance of actual measurement. If people need to change any more than just the import statement, *then* a single implementation has to be all things to all people.

I hate to say this (bet ?), but I suspect the fact that Python's basic types are all builtin and not classes has kept us from fully appreciating the class-based "1 interface, N implementations" approach that C++ and Java hackers are having so much fun with. They're not all that easy to find, but people who have climbed the steep STL learning curve often end up in the same ecstatic trance I used to see only among fellow Pythoneers.

> But in the core, there's only room for one set type or module.

I don't like the conclusion: it implies there's no room in the core for more than one implementation of anything, yet one-size-fits-all doesn't. I have no problem with the idea that there's only room for one Set *interface* in the core.
Then you only need Pronounce on a reasonable set of abstract operations, and leave the implementation tradeoffs to be made by different people in different ways (I've really got no use for Eric's list-based sets; he's really got no use for my sets-of-sets).

That said, if there can be at most one, and must be at least one, a hashtable based set is the best compromise there is, and mutable objects as elements should not be supported (they add great implementation complexity for the benefit of relatively few applications).

jeremy's-set-class-couldn't-be-accused-of-overkill-ly y'rs - tim

From tim.one at home.com Thu Jan 25 08:57:18 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 02:57:18 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <20010123113050.A26162@thyrsus.com>
Message-ID:

[Eric S. Raymond]
> ...
> What you get by going with a dictionary representation is that
> membership test becomes close to constant-time, while insertion and
> deletion become sometimes cheap and sometimes quite expensive
> (depending of course on whether you have to allocate a new
> hash bucket).

Note that Python's dicts aren't vulnerable to that: they use open addressing in a contiguous, preallocated vector. There are no mallocs() or free()s going on for lookups, deletes, or inserts, unless an insert happens to hit a "time to double the size of the vector" boundary. Deletes never cost more than a lookup; inserts never more unless the table-size boundary is hit (one in 2**N unique inserts, at which point N goes up too).

> ...
> "works for everybody" isn't really possible here. So my solution
> does the next best thing -- pick a choice of tradeoffs that isn't
> obviously worse than the alternatives and keeps things bog-simple.
I agree that this shouldn't be an either/or choice, but if it's going to be forced into that mold I have to protest that the performance of unordered lists would kill most of the set applications I've ever had. I typically have a small number of very large sets (and I'm talking not 100s, but often 100s of 1000s of elements). The relatively large memory burden of a dict representation wouldn't bother me unless I instead had 100s of 1000s of very small sets.

which-we-may-happen-in-my-next-life-but-not-in-this-one-ly y'rs - tim

From tim.one at home.com Thu Jan 25 09:08:30 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 03:08:30 -0500
Subject: [Python-Dev] I think my set module is ready for prime time; comments?
In-Reply-To: <001101c0857a$c0dce420$770a0a0a@nevex.com>
Message-ID:

[Greg Wilson]
> ...
> Unfortunately, if values are required to be immutable, then sets of
> sets aren't possible... :-(

Sure they are. I wrote about how before, and Moshe put up a simple implementation as a SourceForge patch. Not bulletproof, though: "consenting adults". No matter *what* you implement, I'll find *some* way to trick it into believing my sets are immutable , so don't worry about that. Bulletproof is very hard, and is a minority distraction at best.

IIRC, SETL had "by value" semantics when inserting a set into another set as an element, and had some exceedingly hairy copy-on-write scheme under the covers to make that bearably quick. That may be wrong, though. Herman Venter's Slim (Sets, Lists and Maps) language does work that way (Guido, Herman was a friend of the departed Stoffel Erasmus, who you may recall fondly from Python's very early days -- if *that* doesn't make sets attractive to you, nothing will ).

Ah! Meant to post this before:

http://birch.eecs.lehigh.edu/~bacon/setlprog.ps.gz

That's a readable and very good intro to SETL Classic. People pondering computerized sets should at least catch up with what was common knowledge 30 years ago .
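
To make the two ideas in this exchange concrete (a dict used as a set, and sets whose elements are themselves immutable sets), here is a short sketch for a current Python 3 interpreter. Note that frozenset and the set literal syntax are later additions to the language and were not available to the participants in this thread; the variable names are illustrative only.

```python
# Dict-as-set idiom discussed in the thread: only the keys matter,
# and every key maps to the same dummy value.
seen = {}
for word in ["spam", "eggs", "spam"]:
    seen[word] = None
unique_words = sorted(seen)

# Sets of sets: the inner sets must be immutable so they can be hashed.
family = {frozenset({1, 2}), frozenset({3})}
has_pair = frozenset({2, 1}) in family  # element order is irrelevant

# A mutable set is unhashable and is rejected as an element.
try:
    family.add({4, 5})
except TypeError:
    rejected_mutable = True
else:
    rejected_mutable = False
```

Membership tests on both structures go through hashing, which is the property that makes the dict representation attractive for very large sets.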
From thomas at xs4all.net Thu Jan 25 10:24:24 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 25 Jan 2001 10:24:24 +0100
Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm!
In-Reply-To: ; from ping@lfw.org on Wed, Jan 24, 2001 at 12:33:43PM -0800
References:
Message-ID: <20010125102424.G962@xs4all.nl>

On Wed, Jan 24, 2001 at 12:33:43PM -0800, Ka-Ping Yee wrote:

> Please try:
>
>     >>> def f(a, (b, c), *d):
>     ...     x = 1
>     ...     print a, b, c, d, x
>     ...
>     >>> f(1, (2, 3), 4)
>     1 2 3
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>       File "<stdin>", line 3, in f
>     UnboundLocalError: local variable 'd' referenced before assignment
>     >>>

> In Python 1.5.2, this prints "1 2 3 (4,)" as expected.

> I only have 1.5.2 and 2.1a1 to test. I hope this problem
> isn't present in 2.0...

It isn't present in 2.0. This is probably related to Jeremy's changes in the call mechanism or the compiler track, though Jeremy himself is the best person to claim that for sure :)

> Note that test_inspect was the only test to fail! It might be the
> only test that checks anonymous and *varargs at the same time.
> (Yet another reason to put inspect in the core...)

Well, this is not an inspect-specific test, so it shouldn't *be* in test_inspect, it should be in test_extcall :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From fredrik at effbot.org Thu Jan 25 10:45:31 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 25 Jan 2001 10:45:31 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/lib libwinsound.tex,1.5,1.6
References:
Message-ID: <003801c086b3$8ff41560$e46940d5@hagrid>

tim accidentally wrote:
> \versionadded{1.5.3} % XXX fix this version number when release is scheduled!

1.5.3? time for a 1.5.3 => 1.6 query replace?
> fgrep 1.5.3 doc/*/*.tex
doc/lib/libcmp.tex:\deprecated{1.5.3}{Use the \module{filecmp} module inste
doc/lib/libcmpcache.tex:\deprecated{1.5.3}{Use the \module{filecmp} module ad.}
doc/lib/libwinsound.tex: \versionadded{1.5.3} % XXX fix this version number

or am I missing something?

Cheers /F

From tim.one at home.com Thu Jan 25 12:20:18 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 06:20:18 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/lib libwinsound.tex,1.5,1.6
In-Reply-To: <003801c086b3$8ff41560$e46940d5@hagrid>
Message-ID:

Gotta ask Fred about this one!

> or am I missing something?

Yes, the Python 1.5.3 release. I use it all the time .

From tismer at tismer.com Thu Jan 25 13:22:32 2001
From: tismer at tismer.com (Christian Tismer)
Date: Thu, 25 Jan 2001 14:22:32 +0200
Subject: [Python-Dev] Intended to work? (lambda x,y:map(eval, ["x", "y"]))(2,3)
Message-ID: <3A701A88.F2C68635@tismer.com>

In a function like this:

def f(x):
    return eval("x")

, eval uses the local function namespace, and the above works. This is according to chapter 2.3 of the Python library ref.

Now on my problem: When eval() is used with map, the same mechanism takes place:

def f(x):
    return map(eval,["x"])

It works the same as the above, because map is a builtin function that does not modify the frame chain, so eval finds the local namespace. Not so with Stackless Python (at the moment), since Stackless map assigns an own frame to map without passing the correct namespaces to it. (Reported by Bernd Rinn)

Question: Is this by chance, or is eval() *meant* to function with the local namespace, even if it is executed in the context of a function like map() ?

The description of map() does not state whether it has to pass its surrounding namespace to the mapped function, and if one simulates map() by writing one's own python implementation, it will fail exactly like Stackless does today. The same applies to apply().
I think I should fix Stackless here, anyway? ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at digicool.com Thu Jan 25 14:35:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 25 Jan 2001 08:35:12 -0500 Subject: [Python-Dev] Re: Intended to work? (lambda x,y:map(eval, ["x", "y"]))(2,3) In-Reply-To: Your message of "Thu, 25 Jan 2001 14:22:32 +0200." <3A701A88.F2C68635@tismer.com> References: <3A701A88.F2C68635@tismer.com> Message-ID: <200101251335.IAA16713@cj20424-a.reston1.va.home.com> > In a function like this: > > def f(x): > return eval("x") > > , eval uses the local function namespace, and the above works. > This is according to chapter 2.3 of the Python library ref. > > Now on my problem: When eval() is used with map, the same > mechanism takes place: > > def f(x): > return map(eval,["x"]) > > It works the same as the above, because map is a builtin function > that does not modify the frame chain, so eval finds the local > namespace. > Not so with Stackless Python (at the moment), since Stackless map > assigns an own frame to map without passing the correct namespaces > to it. (Reported by Bernd Rinn) > > Question: Is this by chance, or is eval() *meant* to function with > the local namespace, even if it is executed in the context of > a function like map() ? Map, being a built-in, is transparent to namespaces. > The description of map() does not state whether it has to pass > its surrounding namespace to the mapped function, and if one > simulates map() by writing one's own python implementation, > it will fail exactly like Stackless does today. The same > applies to apply(). So you can't simulate a built-in. > I think I should fix Stackless here, anyway? 
Yes. Note: beware of Jeremy's nested scopes. That adds a whole slew of namespaces! (But eval() is more crippled there.) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Thu Jan 25 16:20:45 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 10:20:45 -0500 (EST) Subject: [Python-Dev] Anonymous + varargs: possible serious breakage -- please confirm! In-Reply-To: <20010125102424.G962@xs4all.nl> References: <20010125102424.G962@xs4all.nl> Message-ID: <14960.17485.549337.5476@localhost.localdomain> >>>>> "TW" == Thomas Wouters writes: TW> On Wed, Jan 24, 2001 at 12:33:43PM -0800, Ka-Ping Yee wrote: >> Please try: >> >>> def f(a, (b, c), *d): >> ... x = 1 ... print a, b, c, d, x ... >> >>> f(1, (2, 3), 4) >> 1 2 3 Traceback (most recent call last): File "", line 1, >> in ? File "", line 3, in f UnboundLocalError: local >> variable 'd' referenced before assignment >> >>> >> In Python 1.5.2, this prints "1 2 3 (4,)" as expected. >> I only have 1.5.2 and 2.1a1 to test. I hope this problem isn't >> present in 2.0... TW> It isn't present in 2.0. This is probably related to Jeremy's TW> changes in the call mechanism or the compiler track, though TW> Jeremy himself is the best person to claim that for sure :) The bug is in the compiler. It creates varnames while it is parsing the argument list. While I got the handling of the anonymous tuples right, I forgot to insert *varargs or **kwargs in varnames *before* the names defined in the tuple. I will fix it real soon now. >> Note that test_inspect was the only test to fail! It might be >> the only test that checks anonymous and *varargs at the same >> time. (Yet another reason to put inspect in the core...) TW> Well, this is not an inspect-specific test, so it shouldn't *be* TW> in test_inspect, it should be in test_extcall :) It should probably be in test_grammar. The ext call mechanism is only invoked when the caller uses a form like 'f(*arg)'. 
Perhaps the name "ext call" isn't very clear. Jeremy From esr at thyrsus.com Thu Jan 25 17:19:36 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 25 Jan 2001 11:19:36 -0500 Subject: [Python-Dev] Waiting method for file objects Message-ID: <20010125111936.A23512@thyrsus.com> I have been researching the question of how to ask a file descriptor how much data it has waiting for the next sequential read, with a view to discovering what cross-platform behavior we could count on for a hypothetical `waiting' method in Python's built-in file class. 1: Why bother? I have these main applications in mind: 1. Detecting EOF on a static plain file. 2. Non-blocking poll of a socket opened in non-blocking mode. 3. Non-blocking poll of a FIFO opened in non-blocking mode. 4. Non-blocking poll of a terminal device opened in non-blocking mode. These are all frequently requested capabilities on C newsgroups -- how often have *you* seen the "how do I detect an individual keypress" question from beginning programmers? I believe having these capabilities would substantially enhance Python's appeal. 2: What would be under the hood? Summary: We can do this portably, and we can do it with only one (1) new #ifdef. Our tools for this purpose will be the fstat(2) st_size field and the FIONREAD ioctl(2) call. They are complementary. In all supposedly POSIX-conformant environments I know of, the st_size field has a documented meaning for plain files (S_IFREG) and may or may not give a meaningful number for FIFOs, sockets, and tty devices. The Single Unix Specification is silent on the meaning of st_size for file types other than regular files (S_IFREG). I have filed a defect report about this with OpenGroup and am discussing appropriate language with them. (The last sentence of the Inferno operating system's language on stat(2) is interesting: "If the file resides on permanent storage and is not a directory, the length returned by stat is the number of bytes in the file. 
For directories, the length returned is zero. Some devices report a length that is the number of bytes that may be read from the device without blocking.") The FIONREAD ioctl(2) call, on the other hand, returns bytes waiting on character devices such as FIFOs, sockets, or ttys -- but does not return a useful value for files or directories or block devices. The FIONREAD ioctl was supported in both SVr4 and 4.2BSD. It's present in all the open-source Unixes, SunOS, Solaris, and AIX. Via Google search I have discovered that it's also supported in the Windows Sockets API and the GUSI POSIX libraries for the Macintosh. Thus, it can be considered portable for Python's purposes even though it's rather sparsely documented. I was able to obtain confirming information on Linux from Linus Torvalds himself. My information on Windows and the Mac is from Gavriel State, formerly a lead developer on Corel's WINE team and a programmer with extensive cross-platform experience. Gavriel reported on the MSCRT POSIX environment, on the Metrowerks Standard Library POSIX implementation for the Mac, and on the GUSI POSIX implementation for the Mac. 2.1: Plain files Torvalds and State confirm that for plain files (S_IFREG) the st_size field is reliable on all three platforms. On the Mac it gives the file's data fork size. One apparent difficulty with the plain-file case is that POSIX does not guarantee anything about seek_t quantities such as lseek(2) returns and the st_size field except that they can be compared for equality. Thus, under the strict letter of POSIX law, `waiting' can be used to detect EOF but not to get a reliable read-size return in any other file position. Fortunately, this is less an issue than it appears. The weakness of the POSIX language was a 1980s-era concession to a generation of mainframe operating systems with record-oriented file structures -- all of which are now either thoroughly obsolete or (in the case of IBM VM/CMS) have become Linux emulators :-). 
On modern operating systems under which files have character granularity, stat(2) emulations can be and are written to give the right result. 2.2: Block devices The directory case (S_IFDIR) is a complete loss. Under Unixes, including Linux, the fstat(2) size field gives the allocated size of the directory as if it were a plain file. Under MSCRT POSIX the meaning is undocumented and unclear. Metrowerks returns garbage. GUSI POSIX returns the number of files in the directory! FIONREAD cannot be used on directories. Block devices (S_IFBLK) are a mess again. Linus points out that a system with removable or unmountable volumes *cannot* return a useful st_size field -- what happens when the device is dismounted? 2.3: Character devices Pipes and FIFOs (S_IFIFO) look better. On MSCRT the fstat(2) size field returns the number of bytes waiting to be read. This is also true under current Linuxes, though Torvalds says it is "an implementation detail" and recommends polling with the FIONREAD ioctl instead. Fortunately, FIONREAD is available under Unix, Windows, and the Mac. Sockets (S_IFSOCK) look better too. Under Linux, the fstat(2) size field gives the number of bytes waiting. Torvalds again says this is "an implementation detail" and recommends polling with the FIONREAD ioctl. Neither MSCRT POSIX nor Metrowerks has direct support for sockets. GUSI POSIX returns 1 (!) in the st_size field. But FIONREAD is available under Unix, Windows, and the GUSI POSIX libraries on the Mac. Character devices (S_IFCHR) can be polled with FIONREAD. This technique has a long history of use with tty devices under Unix. I don't know whether it will work with the equivalents of terminal devices for Windows and the Mac. Fortunately this is not a very important question, as those are GUI environments in which terminal devices are rarely if ever used. 3. How does this turn into Python?
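As a rough illustration of the dispatch that sections 2.1 through 2.3 point to, here is a sketch in present-day Python 3 for a POSIX system. It is not the C patch itself. The function name waiting mirrors the proposed method but is a hypothetical stand-alone helper, and the sketch assumes the fcntl module and termios.FIONREAD are available and that plain files are opened in binary mode, so that tell() is a byte offset.

```python
import fcntl
import os
import stat
import struct
import termios


def waiting(fileobj):
    """Hypothetical helper: bytes that a read() could return right now."""
    fd = fileobj.fileno()
    st = os.fstat(fd)
    mode = st.st_mode
    if stat.S_ISDIR(mode) or stat.S_ISBLK(mode):
        # Section 2.2: no usable answer for directories or block devices.
        raise OSError("can't poll a block device or directory")
    if stat.S_ISREG(mode):
        # Section 2.1: file size minus current position; 0 means EOF.
        return st.st_size - fileobj.tell()
    if stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode) or stat.S_ISCHR(mode):
        # Section 2.3: ask the kernel how many bytes are queued.
        raw = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
        return struct.unpack("i", raw)[0]
    raise OSError("unknown file type")


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    os.write(write_end, b"hello")
    with os.fdopen(read_end, "rb", buffering=0) as pipe_in:
        print(waiting(pipe_in))
    os.close(write_end)
```

One caveat with this sketch: for a buffered stream, bytes already pulled into Python's own read buffer are no longer counted by FIONREAD, which is why the usage example opens the pipe unbuffered.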
The upshot of our portability analysis is that by using FIONREAD and fstat(2), we can get useful results for plain files, pipes, and sockets on all three platforms. Directories and block devices are a complete loss. Character devices (in particular, ttys) we can poll reliably under Unix. What we'll get polling the equivalents of tty or character devices under Windows and the Mac is presently unknown, but also unimportant. My proposed semantics for a Python `waiting' method is that it reports the amount of data that would be returned by a read() call at the time of the waiting-method invocation. The interpreter throws OSError if such a report is impossible or forbidden. I have enclosed a patch against the current CVS sources, including documentation. This patch is tested and working against plain files, sockets, and FIFOs under Linux. I have also attached the Python test program I used under Linux. I would appreciate it if those of you on Windows and Macintosh machines would test the waiting method. The test program will take some porting, because it needs to write to a FIFO in background. Under Linux I do it this way: (echo -n '%s' >testfifo; echo 'Data written to FIFO.') & I don't know how to do the equivalent under Windows or Mac. When you run this program, it will try to mail me your test results. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? 
-- Thomas Jefferson, in his 1801 inaugural address -------------- next part -------------- Index: fileobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/fileobject.c,v retrieving revision 2.108 diff -c -r2.108 fileobject.c *** fileobject.c 2001/01/18 03:03:16 2.108 --- fileobject.c 2001/01/25 16:16:10 *************** *** 35,40 **** --- 35,44 ---- #include #endif + #ifndef DONT_HAVE_IOCTL_H + #include + #endif + typedef struct { PyObject_HEAD *************** *** 423,428 **** --- 427,513 ---- } static PyObject * + file_waiting(PyFileObject *f, PyObject *args) + { + struct stat stbuf; + #ifdef HAVE_FSTAT + int ret; + #endif + + if (f->f_fp == NULL) + return err_closed(); + if (!PyArg_NoArgs(args)) + return NULL; + #ifndef HAVE_FSTAT + PyErr_SetString(PyExc_OSError, "fstat(2) is not available."); + clearerr(f->f_fp); + return NULL; + #else + Py_BEGIN_ALLOW_THREADS + errno = 0; + ret = fstat(fileno(f->f_fp), &stbuf); + Py_END_ALLOW_THREADS + if (ret == -1) { /* the fstat failed */ + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } else if (S_ISDIR(stbuf.st_mode) || S_ISBLK(stbuf.st_mode)) { + PyErr_SetString(PyExc_IOError, + "Can't poll a block device or directory."); + clearerr(f->f_fp); + return NULL; + } else if (S_ISREG(stbuf.st_mode)) { /* plain file */ + #if defined(HAVE_LARGEFILE_SUPPORT) && SIZEOF_OFF_T < 8 && SIZEOF_FPOS_T >= 8 + fpos_t pos; + #else + off_t pos; + #endif + Py_BEGIN_ALLOW_THREADS + errno = 0; + pos = _portable_ftell(f->f_fp); + Py_END_ALLOW_THREADS + if (pos == -1) { + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } + #if !defined(HAVE_LARGEFILE_SUPPORT) + return PyInt_FromLong(stbuf.st_size - pos); + #else + return PyLong_FromLongLong(stbuf.st_size - pos); + #endif + } else if (S_ISFIFO(stbuf.st_mode) + || S_ISSOCK(stbuf.st_mode) + || S_ISCHR(stbuf.st_mode)) { /* stream device */ + #ifndef FIONREAD + 
PyErr_SetString(PyExc_OSError, + "FIONREAD is not available."); + clearerr(f->f_fp); + return NULL; + #else + int waiting; + + Py_BEGIN_ALLOW_THREADS + errno = 0; + ret = ioctl(fileno(f->f_fp), FIONREAD, &waiting); + Py_END_ALLOW_THREADS + if (ret == -1) { + PyErr_SetFromErrno(PyExc_IOError); + clearerr(f->f_fp); + return NULL; + } + + return Py_BuildValue("i", waiting); + #endif /* FIONREAD */ + } else { /* should never happen! */ + PyErr_SetString(PyExc_OSError, "Unknown file type."); + clearerr(f->f_fp); + return NULL; + } + #endif /* HAVE_FSTAT */ + } + + static PyObject * file_fileno(PyFileObject *f, PyObject *args) { if (f->f_fp == NULL) *************** *** 1263,1268 **** --- 1348,1354 ---- {"truncate", (PyCFunction)file_truncate, 1}, #endif {"tell", (PyCFunction)file_tell, 0}, + {"waiting", (PyCFunction)file_waiting, 0}, {"readinto", (PyCFunction)file_readinto, 0}, {"readlines", (PyCFunction)file_readlines, 1}, {"xreadlines", (PyCFunction)file_xreadlines, 1}, -------------- next part -------------- #!/usr/bin/env python import sys, os, random, string, time, socket, smtplib, readline print "This program tests the `waiting' method of file objects." fp = open("waiting_test.py") if hasattr(fp, "waiting"): print "Good, you're running a patched Python with `waiting' available." else: print "You haven't installed the `waiting' patch yet. This won't work." sys.exit(1) successes = "" failures = "" nogo = "" print "" print "First, plain files:" filesize = fp.waiting() print "There are %d bytes waiting to be read in this file." % filesize if os.name == 'posix': os.system("ls -l waiting_test.py") print "That should match the number in the ls listing above." else: print "Please check this with your OS's directory tools." get = random.randrange(fp.waiting()) print "I'll now read a random number (%d) of bytes." % get fp.read(get) print "The waiting method sees %d bytes left." % fp.waiting() if get + fp.waiting() == filesize: print "%d + %d = %d. That's consistent. 
Test passed." % \ (get, fp.waiting(), filesize) successes += "Plain file random-read test passed.\n" else: print "That's not consistent. Test failed." failures += "Plain file random-read test failed\n" print "Now let's see if we can detect EOF reliably." fp.read() left = fp.waiting() print "I'll do a read()...the waiting method now returns %d" % left if left == 0: print "That looks like EOF." successes += "Plain file EOF test passed.\n" else: print "%d bytes left. Test failed." % left failures += "Plain file EOF test failed\n" fp.close() print "" print "Now sockets:" print "Connecting to imap.netaxs.com's IMAP server now..." sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) file = sock.makefile('rb') sock.connect(("imap.netaxs.com", 143)) print "Waiting a few seconds to avoid a race condition..." time.sleep(3) greetsize = file.waiting() print "There appear to be %d bytes waiting..." % greetsize greeting = file.readline() print "I just read the greeting line..." sys.stdout.write(greeting) if len(greeting) == greetsize: print "...and the size matches. Test passed." successes += "Socket test passed.\n" else: print "That's not right. Test failed." failures += "Socket test failed.\n" sock.close() print "" if not hasattr(os, "mkfifo"): print "Your platform doesn't have FIFOs (mkfifo() is absent), so I can't test them." nogo = "FIFO test could not be performed." else: print "Now FIFOs:" print "I'm making a FIFO named testfifo."; os.mkfifo("testfifo") str = string.letters[:random.randrange(len(string.letters))] print "I'm going to send it the following string '%s' of random length %d:" \ % (str, len(str),) # Note: Unix dependency here! os.system("(echo -n '%s' >testfifo; echo 'Data written to FIFO.') &" % str) fp = open("testfifo", "r") print "Waiting a few seconds to avoid a race condition..." time.sleep(3) ready = fp.waiting() print "I see %d bytes waiting in the FIFO." % ready if ready == len(str): print "That's consistent. Test passed." 
successes += "FIFO test passed.\n" else: print "That's not consistent. Test failed." failures += "FIFO test failed\n" os.remove("testfifo") print "\nSummary:" report = "Platform is: %s, version is %s\n" % (sys.platform, sys.version) if successes: report += "The following tests succeeded:\n" + successes if failures: report += "The following tests failed:\n" + failures if nogo: report += "The following tests could not be performed:\n" + nogo if not nogo: report += "No tests were skipped.\n" if not failures: report += "All tests succeeded.\n" print report if os.name == 'posix': me = os.environ["USER"] + "@" + socket.getfqdn() else: me = raw_input("Enter your email address, please?") try: server = smtplib.SMTP('localhost') report = ("From: %s\nTo: esr at thyrsus.com\nSubject: waiting_test\n\n" % me) + report server.sendmail(me, ["esr at thyrsus.com"], report) server.quit() except: print "The attempt to mail your test result failed.\n" From esr at snark.thyrsus.com Thu Jan 25 17:46:20 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Thu, 25 Jan 2001 11:46:20 -0500 Subject: [Python-Dev] Documentation patch for waiting method. Message-ID: <200101251646.f0PGkKM23567@snark.thyrsus.com> Index: libstdtypes.tex =================================================================== RCS file: /cvsroot/python/python/dist/src/Doc/lib/libstdtypes.tex,v retrieving revision 1.50 diff -u -r1.50 libstdtypes.tex --- libstdtypes.tex 2001/01/17 01:18:00 1.50 +++ libstdtypes.tex 2001/01/25 16:46:40 @@ -1142,6 +1142,24 @@ \UNIX{} versions support this operation). \end{methoddesc} +\begin{methoddesc}[file]{waiting}{} + Return the number of bytes waiting to be read from this file object. + For regular files, this returns the size of the file in bytes minus + the current seek address, as would be returned by \method{tell()}; a + zero return can be used to detect EOF.
For streams such as FIFOs, + sockets, Unix ttys, and other Unix character devices, this method + returns the number of bytes currently buffered up and waiting to be + read. Attempts to call this method on Unix block devices or + on directories will raise an error. + \footnote{The \method{waiting()} method uses + \cfunction{fstat(2)} and \cfunction{lseek(2)} on plain files; + these should be reliable on all of Unix, Windows, and MacOS. + It uses the FIONREAD ioctl(2) call to query FIFOs, sockets, + Unix ttys, and other POSIX character devices; FIFO and socket + behavior should be consistent across all three platforms, but + the results from querying other character devices may vary.} +\end{methoddesc} + \begin{methoddesc}[file]{write}{str} Write a string to the file. There is no return value. Note: Due to buffering, the string may not actually show up in the file until -- Eric S. Raymond "To disarm the people... was the best and most effectual way to enslave them." -- George Mason, speech of June 14, 1788 From fredrik at effbot.org Thu Jan 25 20:23:50 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 25 Jan 2001 20:23:50 +0100 Subject: [Python-Dev] Fw: random.py gives wrong results (+ a solution) Message-ID: <00f701c08704$59bde510$e46940d5@hagrid> I'm pretty sure Tim's seen this already, but just in case... ----- Original Message ----- From: "Ivan Frohne" Newsgroups: comp.lang.python Sent: Thursday, January 25, 2001 5:20 PM Subject: Re: random.py gives wrong results (+ a solution) > > "Janne Sinkkonen" wrote in message > news:m3u26oy1rw.fsf at kinos.nnets.fi... > > > > At least in Python 2.0 and earlier, the samples returned by the > > function betavariate() of random.py are not from a beta distribution > > although the function name misleadingly suggests so. 
> > > > The following would give beta-distributed samples: > > > > def betavariate(alpha, beta): > > y = gammavariate(alpha,1) > > if y==0: return 0.0 > > else: return y/(y+gammavariate(beta,1)) > > > > This is from matlab. A comment in the original matlab code refers to > > Devroye, L. (1986) Non-Uniform Random Variate Generation, theorem 4.1A > > (p. 430). Another reference would be Gelman, A. et al. (1995) Bayesian > > data analysis, p. 481, which I have checked and found to agree with > > the code above. > > > I'm convinced that Janne Sinkkonen is right: The beta distribution > generator in module random.py does not return Beta-distributed > random numbers. Janne's suggested fix should work just fine. > > Here's my guess on how and why this bug bit -- it won't be of interest to > most but > this subject is so obscure sometimes that there needs to be a detailed > analysis. > > The probability density function of the gamma distribution with (positive) > parameters > A and B is usually written > > g(x; A, B) = (x**(A-1) * exp(-x/B)) / (Gamma(A) * B**A), where x, A, and > B > 0. > > Here Gamma(A) is the gamma function -- for A a positive integer, Gamma(A) is > the > factorial of A - 1, Gamma(A) = (A-1)!. In fact, this is the definition used > by the authors of random.py in defining gammavariate(alpha, beta), the gamma > distribution random number generator. > > Now it happens that a gamma-distributed random variable with parameters A = > 1 and > B has the (much simpler) exponential distribution with density function > > g(x; 1, B) = exp(-x/B) / B. > > Keep that in mind. > > The reference "Discrete Event Simulation in ," by Kevin Watkins > (McGraw-Hill, 1993) > was consulted by the random.py authors. But this reference defines the > gamma probability distribution a little differently, as > > g1(x; A, B) = (B**A * x**(A-1) * exp(-B*x)) / Gamma(A), where x, A, B > > 0. > > (See p. 85).
On page 87, Watkins states (incorrectly) that if grv(A, B) is > a function which > returns a gamma random variable with parameters A and B (using his > definition on p. 85), > then the function > > brv(A, B) = grv(1, 1/B) / ( grv(1, 1/B) + grv(1, A) ) [ not > true!] > > will return a random variable which has the beta distribution with > parameters A and B. > > Believing Watkins to be correct, the random.py authors remembered that a > gamma > random variable with parameter A = 1 is just an exponential random variable > and > further simplified their beta generator to > > brv(A, B) = erv(1/B) / (erv(1/B) + erv(A)), where erv(K) is a random > variable > > having the exponential distribution with > > parameter K. > > The corrected equation for a beta random variable, using Watkins' definition > of the > gamma density, is > > brv(A, B) = grv(A, 1) / ( grv(A, 1) + grv(1/B, 1) ), > > which translates to > > brv(A, B) = grv(A, 1) / (grv(A, 1) + grv(B, 1)) > > using the more common gamma density definition (the one used in random.py). > Many standard statistical references give this equation -- two are > "Non-Uniform Random Variate Generation," by Luc Devroye, Springer-Verlag, > 1986, > p. 432, and "Monte Carlo Concepts, Algorithms and Applications," by > George S. Fishman, Springer, 1996, p. 200. > > --Ivan Frohne > > > > > >>> > > > > From jeremy at alum.mit.edu Thu Jan 25 18:13:03 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 12:13:03 -0500 (EST) Subject: [Python-Dev] Makefile changes In-Reply-To: <20010124073155.B32266@glacier.fnational.com> References: <20010124073155.B32266@glacier.fnational.com> Message-ID: <14960.24223.599357.388059@localhost.localdomain> Neil, What would it take to add useful dependency information to the Makefile? Or does it already exist? When I was working on the nested scopes, building was tedious at times because a change to funcobject.h meant that, e.g., newmodule.c needed to be recompiled.
The Makefiles didn't capture that information, so I had been adding it to the individual Makefiles, e.g. newmodule.o: newmodule.c ../Include/funcobject.h (I think this worked.) It would be great if the Makefile captured all the dependencies. Could we just use makedepend? Jeremy From MarkH at ActiveState.com Thu Jan 25 20:43:35 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Thu, 25 Jan 2001 11:43:35 -0800 Subject: [Python-Dev] Waiting method for file objects In-Reply-To: <20010125111936.A23512@thyrsus.com> Message-ID: > I would appreciate it if those of you on Windows and Macintosh > machines would test the waiting method. The test program will take > some porting, because it needs to write to a FIFO in background. This didn't compile under Windows. I have a patch (against CVS) that compiles, but doesn't appear to work (and will be forwarded to Eric under separate cover) [news flash :-) Changing the open call to add "rb" as the mode makes it work - text v binary bites again] I didn't try any sort of fifo test. The sockets test failed with a socket error, but would certainly have failed had the socket connected, as my patch includes: #ifndef S_ISSOCK # define S_ISSOCK(mode) (0) #endif I have no idea if it managed to mail the results, but I guess not, so the output is below. The test file (after some small mods, including the "rb" param) is indeed 4252 bytes long. Hope this is useful! Mark. This program tests the `waiting' method of file objects. Good, you're running a patched Python with `waiting' available. First, plain files: There are 4252 bytes waiting to be read in this file. Please check this with your OS's directory tools. I'll now read a random number (3091) of bytes. The waiting method sees 1161 bytes left. 3091 + 1161 = 4252. That's consistent. Test passed. Now let's see if we can detect EOF reliably. I'll do a read()...the waiting method now returns 0 That looks like EOF. Now sockets: Connecting to imap.netaxs.com's IMAP server now...
Traceback (most recent call last): File "c:\temp\waiting_test.py", line 57, in ? sock.connect(("imap.netaxs.com", 143)) File "", line 1, in connect socket.error: (10060, 'Operation timed out') From nas at arctrix.com Thu Jan 25 14:07:53 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 25 Jan 2001 05:07:53 -0800 Subject: [Python-Dev] Makefile changes In-Reply-To: <14960.24223.599357.388059@localhost.localdomain>; from jeremy@alum.mit.edu on Thu, Jan 25, 2001 at 12:13:03PM -0500 References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> Message-ID: <20010125050753.A1573@glacier.fnational.com> On Thu, Jan 25, 2001 at 12:13:03PM -0500, Jeremy Hylton wrote: > What would it take to add useful dependency information to the > Makefile? Or does it already exist? Some of it exists but I don't think it's complete. > When I was working on the nested scopes, building was tedious at times > because a change to funcobject.h meant that, e.g., newmodule.c needed > to be recompiled. The Makefiles didn't capture that information, so I > had been adding it to the individual Makefiles, e.g. > > newmodule.o: newmodule.c ../Include/funcobject.h > > (I think this worked.) Hmm, I don't think so. Which makefile did you add this to? Are you using the new makefile? The Makefile.pre.in file contains a line like: $(LIBRARY_OBJS) $(MAINOBJ): $(PYTHON_HEADERS) but newmodule.o is not in LIBRARY_OBJS. By default it's not compiled by make but with distutils. If you add newmodule to Setup then a line like: Modules/newmodule.o: $(PYTHON_HEADERS) would do the trick. I think I will add a line like: $(MODOBJS): $(PYTHON_HEADERS) to fix the problem. I could easily restore the mkdep target but my feeling right now is that explicitly including the header dependencies is better. What do you think?
Neil From jeremy at alum.mit.edu Thu Jan 25 21:02:46 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 15:02:46 -0500 (EST) Subject: [Python-Dev] PEP 227 checkins to follow Message-ID: <14960.34406.342961.834827@localhost.localdomain> I am about to check in the changes that implement PEP 227. There are many changes, which I will make via separate commits. You might want to wait until the checkins are done to do an update. I'll send a note when I'm done. I also wanted to mention that the PEP has fallen a little out of date. There are a few wrinkles that it doesn't deal with, e.g. def f(x): def g(y): return x + y del x return g For now, this raises a SyntaxError. I'll flesh out the PEP to reflect the current implementation and spec out some of the less obvious cases. I'd welcome any comments on the code itself. I know there are a number of rough edges and also, most likely, a bunch of memory leaks. I'll be working to clean things up before 2.1a2, but wanted to get the code into CVS ASAP. Jeremy From jeremy at alum.mit.edu Thu Jan 25 21:15:01 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jan 2001 15:15:01 -0500 (EST) Subject: [Python-Dev] checkins done for PEP 227 Message-ID: <14960.35141.237252.468467@localhost.localdomain> It looks like python-dev is very slow, so you'll see my original warning well after the checkins occurred. Oh, well. They're done. Jeremy From tim.one at home.com Thu Jan 25 21:58:03 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 25 Jan 2001 15:58:03 -0500 Subject: [Python-Dev] Fw: random.py gives wrong results (+ a solution) Message-ID: [/F, fwds a c.l.py claim that random.betavariate is dead wrong] Not to worry; I had already entered that into the SF bug database and assigned it to me (hmm: why would you send it to Python-Dev instead of putting it in the database?). I suspect he's correct, and, more importantly, so does Ivan Frohne. We'll settle it before 2.1a2, but perhaps not today.
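For anyone who wants to poke at the claim in the meantime, the proposed fix (a beta variate as the ratio of two unit-scale gamma variates) can be sketched and sanity-checked along these lines. This is written in present-day Python 3, not the 2.x of this thread. beta_from_gammas and sample_mean are made-up names for illustration, and the check only compares a sample mean against the textbook mean alpha/(alpha+beta) of a beta distribution; it is not a full goodness-of-fit test.

```python
import random


def beta_from_gammas(alpha, beta, rng=random):
    """Beta(alpha, beta) sample built from two gamma variates with scale 1."""
    y = rng.gammavariate(alpha, 1.0)
    if y == 0:
        return 0.0
    return y / (y + rng.gammavariate(beta, 1.0))


def sample_mean(alpha, beta, n=20000, seed=12345):
    # Seeded private generator so that repeated runs give the same answer.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += beta_from_gammas(alpha, beta, rng)
    return total / n


if __name__ == "__main__":
    alpha, beta = 2.0, 5.0
    print("sample mean:  ", sample_mean(alpha, beta))
    print("expected mean:", alpha / (alpha + beta))
```

With 20000 samples the two printed numbers should agree to roughly two decimal places; a generator that is not beta-distributed would generally show a visibly different mean for at least some choices of alpha and beta.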
Alas, I have no idea where the original code came from ("Guido" isn't a useful answer -- he was just converting somebody else's C++ code to Python). From fredrik at effbot.org Thu Jan 25 21:42:05 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 25 Jan 2001 21:42:05 +0100 Subject: [Python-Dev] Waiting method for file objects References: <20010125111936.A23512@thyrsus.com> Message-ID: <01fb01c0870f$48517110$e46940d5@hagrid> eric wrote: > Fortunately, this is less an issue than it appears. only if you ignore Windows... -1 on making this a file method +0 on adding it as an optional support function to the os module. From martin at mira.cs.tu-berlin.de Thu Jan 25 21:42:39 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 25 Jan 2001 21:42:39 +0100 Subject: [Python-Dev] jeremy@alum.mit.edu Message-ID: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de> > It would be great if the Makefile captured all the dependencies. That would be great, yes. However, setup.py should probably also consider dependencies. > Could we just use makedepend? Not sure. Certainly not in the build process. I dislike distributions which, as the first thing, perform dependency generation. Dependencies change less often than the actual source, so it should be sufficient to update them manually. Furthermore, generated files as part of the CVS repository fail to work properly unless everybody uses the exact same generator. For autoconf alone, that's a problem because of multiple autoconf versions. I don't know how many different makedepend versions are in use. Regards, Martin From tim.one at home.com Thu Jan 25 22:02:11 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 25 Jan 2001 16:02:11 -0500 Subject: [Python-Dev] Windows compile broken Message-ID: Linking...
   Creating library ./python21.lib and object ./python21.exp
ceval.obj : error LNK2001: unresolved external symbol _PyCell_Set
ceval.obj : error LNK2001: unresolved external symbol _PyCell_Get
frameobject.obj : error LNK2001: unresolved external symbol _PyCell_New
./python21.dll : fatal error LNK1120: 3 unresolved externals
Error executing link.exe.

Sorry if this has already been discussed. I don't see mention of it in the Python-Dev archive, and my email is almost worse than useless (random delays of minutes to days, due to what appears to be the simultaneous worldwide wedging of every email server servicing every email account I have).

From esr at thyrsus.com Thu Jan 25 22:12:25 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 25 Jan 2001 16:12:25 -0500
Subject: [Python-Dev] Waiting method for file objects
In-Reply-To: <01fb01c0870f$48517110$e46940d5@hagrid>; from fredrik@effbot.org on Thu, Jan 25, 2001 at 09:42:05PM +0100
References: <20010125111936.A23512@thyrsus.com> <01fb01c0870f$48517110$e46940d5@hagrid>
Message-ID: <20010125161225.A24305@thyrsus.com>

Fredrik Lundh :
> > Fortunately, this is less an issue than it appears.
>
> only if you ignore Windows...

I don't understand this. Explain?
--
Eric S. Raymond

Sometimes the law defends plunder and participates in it. Sometimes the law places the whole apparatus of judges, police, prisons and gendarmes at the service of the plunderers, and treats the victim -- when he defends himself -- as a criminal.
-- Frederic Bastiat, "The Law"

From esr at thyrsus.com Thu Jan 25 22:13:31 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 25 Jan 2001 16:13:31 -0500
Subject: [Python-Dev] jeremy@alum.mit.edu
In-Reply-To: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de>; from martin@mira.cs.tu-berlin.de on Thu, Jan 25, 2001 at 09:42:39PM +0100
References: <200101252042.f0PKgd101532@mira.informatik.hu-berlin.de>
Message-ID: <20010125161331.B24305@thyrsus.com>

Martin v. Loewis :
> Not sure.
Certainly not in the build process. I dislike distributions
> which, as the first thing, perform dependency generation. Dependencies
> change less often than the actual source, so it should be
> sufficient to update them manually. Furthermore, generated files as
> part of the CVS repository fail to work properly unless everybody uses
> the exact same generator. For autoconf alone, that's a problem because
> of multiple autoconf versions. I don't know how many different
> makedepend versions are in use.

Easily solved -- there are script versions of makedepend we can just ship with the distribution.
--
Eric S. Raymond

Morality is always the product of terror; its chains and strait-waistcoats are fashioned by those who dare not trust others, because they dare not trust themselves, to walk in liberty.
-- Aldous Huxley

From mal at lemburg.com Thu Jan 25 22:26:04 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 25 Jan 2001 22:26:04 +0100
Subject: [Python-Dev] Windows compile broken
References:
Message-ID: <3A7099EC.81689EA5@lemburg.com>

Tim Peters wrote:
>
> Linking...
> Creating library ./python21.lib and object ./python21.exp
> ceval.obj : error LNK2001: unresolved external symbol _PyCell_Set
> ceval.obj : error LNK2001: unresolved external symbol _PyCell_Get
> frameobject.obj : error LNK2001: unresolved external symbol _PyCell_New
> ./python21.dll : fatal error LNK1120: 3 unresolved externals
> Error executing link.exe.
>
> Sorry if this has already been discussed. I don't see mention of it in the
> Python-Dev archive, and my email is almost worse than useless (random delays
> of minutes to days, due to what appears to be the simultaneous worldwide
> wedging of every email server servicing every email account I have).

These must be related to checkins by Jeremy and his nested scopes...
(I knew these would get us into trouble ;-)

I think Jeremy forgot to check in the needed change for Objects/Makefile.in and probably the Windows project file is missing the new object type too (Objects/cellobject.c).

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jeremy at alum.mit.edu Thu Jan 25 22:14:52 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 16:14:52 -0500 (EST)
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <3A7099EC.81689EA5@lemburg.com>
References: <3A7099EC.81689EA5@lemburg.com>
Message-ID: <14960.38732.773129.793360@localhost.localdomain>

>>>>> "MAL" == M -A Lemburg writes:

MAL> Tim Peters wrote:
>>
>> Linking... Creating library ./python21.lib and object
>> ./python21.exp ceval.obj : error LNK2001: unresolved external
>> symbol _PyCell_Set ceval.obj : error LNK2001: unresolved external
>> symbol _PyCell_Get frameobject.obj : error LNK2001: unresolved
>> external symbol _PyCell_New ./python21.dll : fatal error LNK1120:
>> 3 unresolved externals Error executing link.exe.
>>
>> Sorry if this has already been discussed. I don't see mention of
>> it in the Python-Dev archive, and my email is almost worse than
>> useless (random delays of minutes to days, due to what appears to
>> be the simultaneous worldwide wedging of every email server
>> servicing every email account I have).

MAL> These must be related to checkins by Jeremy and his nested
MAL> scopes... (I knew these would get us into trouble ;-)

Just you wait and see!

MAL> I think Jeremy forgot to check in the needed change for
MAL> Objects/Makefile.in and probably the Windows project file is
MAL> missing the new object type too (Objects/cellobject.c).

That's right. I didn't change the Makefile in Objects or do anything with Windows.
Don't know how to do the latter, but perhaps Tim will stop by my desk next week and show me. As for the Makefile, I thought I saw a message from Neil saying not to update those anymore.

Jeremy

From nas at arctrix.com Thu Jan 25 16:10:56 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 07:10:56 -0800
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include cellobject.h,NONE,2.1 Python.h,2.30,2.31
In-Reply-To: ; from jhylton@users.sourceforge.net on Thu, Jan 25, 2001 at 12:04:16PM -0800
References:
Message-ID: <20010125071056.A2390@glacier.fnational.com>

On Thu, Jan 25, 2001 at 12:04:16PM -0800, Jeremy Hylton wrote:
> A cell contains a reference to a single PyObject. It could be
> implemented as a mutable, one-element sequence, but the separate type
> has less overhead.

Can this object be involved in reference cycles? If so, it should probably have the GC methods added to it.

Neil

From jeremy at alum.mit.edu Thu Jan 25 22:42:04 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 25 Jan 2001 16:42:04 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include cellobject.h,NONE,2.1 Python.h,2.30,2.31
In-Reply-To: <20010125071056.A2390@glacier.fnational.com>
References: <20010125071056.A2390@glacier.fnational.com>
Message-ID: <14960.40364.594582.353511@localhost.localdomain>

>>>>> "NS" == Neil Schemenauer writes:

NS> On Thu, Jan 25, 2001 at 12:04:16PM -0800, Jeremy Hylton wrote:
>> A cell contains a reference to a single PyObject. It could be
>> implemented as a mutable, one-element sequence, but the separate
>> type has less overhead.

NS> Can this object be involved in reference cycles? If so, it
NS> should probably have the GC methods added to it.

It's already there. (Last five lines of cellobject.c quoted as proof.)
>     Py_TPFLAGS_DEFAULT | Py_TPFLAGS_GC,  /* tp_flags */
>     0,                                   /* tp_doc */
>     (traverseproc)cell_traverse,         /* tp_traverse */
>     (inquiry)cell_clear,                 /* tp_clear */
>};

From nas at arctrix.com Thu Jan 25 16:19:22 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 07:19:22 -0800
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <3A7099EC.81689EA5@lemburg.com>; from mal@lemburg.com on Thu, Jan 25, 2001 at 10:26:04PM +0100
References: <3A7099EC.81689EA5@lemburg.com>
Message-ID: <20010125071922.B2390@glacier.fnational.com>

On Thu, Jan 25, 2001 at 10:26:04PM +0100, M.-A. Lemburg wrote:
> I think Jeremy forgot to check in the needed change for
> Objects/Makefile.in

That file is dead. Should I remove it now? I haven't heard any major complaints about Makefile.pre.in yet. Maybe the messages are all sitting in the python.org mail spool. Barry, what the hell is going on? You need to drop that Postfix crap and get qmail. :-)

Neil

From thomas at xs4all.net Thu Jan 25 23:19:37 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 25 Jan 2001 23:19:37 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include modsupport.h,2.35,2.36
In-Reply-To: ; from fdrake@users.sourceforge.net on Thu, Jan 25, 2001 at 02:13:36PM -0800
References:
Message-ID: <20010125231937.I962@xs4all.nl>

On Thu, Jan 25, 2001 at 02:13:36PM -0800, Fred L. Drake wrote:

> The addition of new parameters to functions in the Python/C API requires
> that PYTHON_API_VERSION be incremented.

When we update the API version, isn't it time to clean up the TP_HASFEATURE stuff ? Since we updated the API, all the current slots should be there, right ?

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
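
For readers following the cell exchange between Neil and Jeremy above: the checkin note describes a cell as holding a reference to a single object, comparable to a mutable one-element sequence, and Neil asks whether cells can take part in reference cycles. The following is only a minimal sketch in present-day Python, not code from the 2.1 tree. It assumes the `__closure__` and `cell_contents` attributes, which postdate this thread, and `make_adder`, `make_cycle` and `add_box` are hypothetical names chosen for illustration.

```python
def make_adder(x):
    # x is stored in a cell because the nested function g refers to it.
    def g(y):
        return x + y
    return g


def make_cycle():
    # g refers to itself through its own closure cell,
    # so the references run function -> cell -> function.
    def g():
        return g
    return g


add5 = make_adder(5)
cell = add5.__closure__[0]       # the cell holding x
result = add5(3)                 # calls g with x taken from the cell
contents = cell.cell_contents    # the object the cell refers to

# The same idea with a one-element list standing in for the cell.
box = [5]


def add_box(y):
    return box[0] + y


cyc = make_cycle()
# True when the cell points back at the function that owns it.
cycle_is_real = cyc.__closure__[0].cell_contents is cyc
```

The self-referencing function is the kind of cycle Neil's question is about: neither the function nor its cell can be freed by reference counting alone, which is why the type needs traverse and clear slots like the ones Jeremy quotes.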
From guido at digicool.com Thu Jan 25 23:32:32 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 25 Jan 2001 17:32:32 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include modsupport.h,2.35,2.36
In-Reply-To: Your message of "Thu, 25 Jan 2001 23:19:37 +0100." <20010125231937.I962@xs4all.nl>
References: <20010125231937.I962@xs4all.nl>
Message-ID: <200101252232.RAA20013@cj20424-a.reston1.va.home.com>

> > The addition of new parameters to functions in the Python/C API requires
> > that PYTHON_API_VERSION be incremented.
>
> When we update the API version, isn't it time to clean up the TP_HASFEATURE
> stuff ? Since we updated the API, all the current slots should be there,
> right ?

No, we're issuing a warning about old API versions but still try to work with them. After all most extensions don't create frame or code objects.

I added the flags for the tp_richcompare field when I tried 2.1a1 with Zope's ExtensionClasses and Acquisition modules. Turns out I got a core dump, while 2.1 ran flawlessly. The reason: they have their own type struct which has the same lay-out as the Python 1.5.2 (or even older) type struct, followed by fields of their own. They have the tp_flags field set to 0, so up to 2.0, it was compatible.

I expect that 2.1a2 will work with the unchanged Zope code because of the flag I added.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at lemburg.com Fri Jan 26 00:04:54 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 00:04:54 +0100
Subject: [Python-Dev] Windows compile broken
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com>
Message-ID: <3A70B116.12BF756B@lemburg.com>

Neil Schemenauer wrote:
>
> On Thu, Jan 25, 2001 at 10:26:04PM +0100, M.-A. Lemburg wrote:
> > I think Jeremy forgot to check in the needed change for
> > Objects/Makefile.in
>
> That file is dead. Should I remove it now?
I haven't heard any
> major complaints about Makefile.pre.in yet.

What about that file ? Are you saying that Makefile.pre.in will no longer work in 2.1 ???

Please don't remove that mechanism -- it has been in use for quite a while and is much more stable than distutils. We should at least wait a few more distutils releases for the dust to settle before removing the old fallback solution.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From guido at digicool.com Fri Jan 26 00:06:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 25 Jan 2001 18:06:40 -0500
Subject: [Python-Dev] Windows compile broken
In-Reply-To: Your message of "Fri, 26 Jan 2001 00:04:54 +0100." <3A70B116.12BF756B@lemburg.com>
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com>
Message-ID: <200101252306.SAA20173@cj20424-a.reston1.va.home.com>

> > That file is dead. Should I remove it now? I haven't heard any
> > major complaints about Makefile.pre.in yet.
>
> What about that file ? Are you saying that Makefile.pre.in
> will no longer work in 2.1 ???
>
> Please don't remove that mechanism -- it has been in use for
> quite a while and is much more stable than distutils. We should
> at least wait a few more distutils releases for the dust to
> settle before removing the old fallback solution.

Let's at least mark it clearly as obsolete though -- it's a pain to maintain.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nas at arctrix.com Thu Jan 25 17:31:28 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 08:31:28 -0800
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <3A70B116.12BF756B@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 12:04:54AM +0100
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com>
Message-ID: <20010125083128.A2699@glacier.fnational.com>

On Fri, Jan 26, 2001 at 12:04:54AM +0100, M.-A. Lemburg wrote:
> What about that file ? Are you saying that Makefile.pre.in
> will no longer work in 2.1 ???

I'm talking about Objects/Makefile.in. Which Makefile.pre.in are you talking about? Modules/Makefile.pre.in is dead too. There is a Makefile.pre.in in the toplevel directory which does the same thing. There is also Misc/Makefile.pre.in. That file gets installed into lib and still works as it always did. The toplevel Makefile.pre.in can use Modules/Setup* just like the old Modules/Makefile.pre.in could. Does this address your concerns?

> Please don't remove that mechanism -- it has been in use for
> quite a while and is much more stable than distutils. We should
> at least wait a few more distutils releases for the dust to
> settle before removing the old fallback solution.

No doubt.
Neil

From nas at arctrix.com Thu Jan 25 17:33:48 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 25 Jan 2001 08:33:48 -0800
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <200101252306.SAA20173@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jan 25, 2001 at 06:06:40PM -0500
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> <200101252306.SAA20173@cj20424-a.reston1.va.home.com>
Message-ID: <20010125083348.B2699@glacier.fnational.com>

On Thu, Jan 25, 2001 at 06:06:40PM -0500, Guido van Rossum wrote:
> Let's at least mark it clearly as obsolete though -- it's a pain to
> maintain.

Are you talking about Misc/Makefile.pre.in? If so, how do you suggest we mark it? I don't think Modules/Setup should go away any time soon. I often like to build lots of modules statically into the interpreter. setup.py has no support for building static modules.

Neil

From tim.one at home.com Fri Jan 26 00:27:52 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 25 Jan 2001 18:27:52 -0500
Subject: [Python-Dev] Windows compile broken
In-Reply-To: <14960.38732.773129.793360@localhost.localdomain>
Message-ID:

Thanks for the clues, everyone! I'll fix it for Windows.

Note that I'm getting email in wild bursts, and most often delayed. So I'm generally not seeing any checkin msgs, or SF bug email, or Python-Dev email, ..., anywhere near the time (or, alas, sometimes even day) they're generated. So I simply didn't see the checkin msg introducing cellobject.c.

all's-well-that-looks-like-it-may-end-ly y'rs - tim

From mal at lemburg.com Fri Jan 26 10:32:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 10:32:14 +0100
Subject: [Python-Dev] Makefile.pre.in (Windows compile broken)
References: <3A7099EC.81689EA5@lemburg.com> <20010125071922.B2390@glacier.fnational.com> <3A70B116.12BF756B@lemburg.com> <20010125083128.A2699@glacier.fnational.com>
Message-ID: <3A71441E.4584A5C8@lemburg.com>

Neil Schemenauer wrote:
>
> On Fri, Jan 26, 2001 at 12:04:54AM +0100, M.-A. Lemburg wrote:
> > What about that file ? Are you saying that Makefile.pre.in
> > will no longer work in 2.1 ???
>
> I'm talking about Objects/Makefile.in. Which Makefile.pre.in are
> you talking about? Modules/Makefile.pre.in is dead too. There
> is a Makefile.pre.in in the toplevel directory which does the
> same thing. There is also Misc/Makefile.pre.in. That file gets
> installed into lib and still works as it always did. The toplevel
> Makefile.pre.in can use Modules/Setup* just like the old
> Modules/Makefile.pre.in could. Does this address your concerns?

Yes. Thanks. I was talking about the Misc/Makefile.pre.in mechanism which was used in the past by many Python C extensions to provide a portable way of compiling the extension into a shared module or statically into the Python interpreter.

I have been using that mechanism for years now and with much success. Even though I am currently moving to distutils I have no idea how stable distutils is on exotic platforms or ones which have special needs (like e.g. AIX).

> > Please don't remove that mechanism -- it has been in use for
> > quite a while and is much more stable than distutils. We should
> > at least wait a few more distutils releases for the dust to
> > settle before removing the old fallback solution.
>
> No doubt.

Ok.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Fri Jan 26 10:37:12 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 10:37:12 +0100
Subject: [Python-Dev] setup.py
Message-ID: <3A714548.C487DCC9@lemburg.com>

I have posted two messages here regarding the new setup.py mechanism for building Modules/ but have received no comments on them so far. Here's another go:

1. I think that setup.py should output warnings about modules which cannot be built for some reason rather than having ot the build process completely.

2. I suggest adding -L/usr/lib/termcap to the readline extension. This doesn't hurt anywhere and will get this extension to compile on SuSE Linux too.

Thoughts ?

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr at thyrsus.com Fri Jan 26 13:27:56 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Fri, 26 Jan 2001 07:27:56 -0500
Subject: [Python-Dev] setup.py
In-Reply-To: <3A714548.C487DCC9@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 10:37:12AM +0100
References: <3A714548.C487DCC9@lemburg.com>
Message-ID: <20010126072756.A5013@thyrsus.com>

M.-A. Lemburg :
> 1. I think that setup.py should output warnings about modules
> which cannot be built for some reason rather than having
> ot the build process completely.
>
> 2. I suggest adding -L/usr/lib/termcap to the readline extension.
> This doesn't hurt anywhere and will get this extension to compile
> on SuSE Linux too.

Both good ideas.
--
Eric S. Raymond

Such are a well regulated militia, composed of the freeholders, citizen and husbandman, who take up arms to preserve their property, as individuals, and their rights as freemen.
-- "M.T. Cicero", in a newspaper letter of 1788 touching the "militia" referred to in the Second Amendment to the Constitution.

From mal at lemburg.com Fri Jan 26 15:13:45 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 15:13:45 +0100
Subject: [Python-Dev] setup.py
References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com>
Message-ID: <3A718619.6278AF41@lemburg.com>

"Eric S. Raymond" wrote:
>
> M.-A. Lemburg :
> > 1. I think that setup.py should output warnings about modules
> > which cannot be built for some reason rather than having
> > ot the build process completely.
> >
> > 2. I suggest adding -L/usr/lib/termcap to the readline extension.
> > This doesn't hurt anywhere and will get this extension to compile
> > on SuSE Linux too.
>
> Both good ideas.

Should I implement the two and check these in ?

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr at thyrsus.com Fri Jan 26 15:25:59 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Fri, 26 Jan 2001 09:25:59 -0500
Subject: [Python-Dev] setup.py
In-Reply-To: <3A718619.6278AF41@lemburg.com>; from mal@lemburg.com on Fri, Jan 26, 2001 at 03:13:45PM +0100
References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com> <3A718619.6278AF41@lemburg.com>
Message-ID: <20010126092559.A5623@thyrsus.com>

M.-A. Lemburg :
> "Eric S. Raymond" wrote:
> >
> > M.-A. Lemburg :
> > > 1. I think that setup.py should output warnings about modules
> > > which cannot be built for some reason rather than having
> > > ot the build process completely.
> > >
> > > 2. I suggest adding -L/usr/lib/termcap to the readline extension.
> > > This doesn't hurt anywhere and will get this extension to compile
> > > on SuSE Linux too.
> >
> > Both good ideas.
>
> Should I implement the two and check these in ?

I may not channel Guido the way Tim does, but I suspect he gave you developer privileges because he trusts you to do routine stuff like this.
--
Eric S. Raymond

The saddest life is that of a political aspirant under democracy.
His failure is ignominious and his success is disgraceful.
-- H.L. Mencken

From mal at lemburg.com Fri Jan 26 15:29:18 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 26 Jan 2001 15:29:18 +0100
Subject: [Python-Dev] setup.py
References: <3A714548.C487DCC9@lemburg.com> <20010126072756.A5013@thyrsus.com> <3A718619.6278AF41@lemburg.com> <20010126092559.A5623@thyrsus.com>
Message-ID: <3A7189BE.C6C2806E@lemburg.com>

"Eric S. Raymond" wrote:
>
> M.-A. Lemburg :
> > "Eric S. Raymond" wrote:
> > >
> > > M.-A. Lemburg :
> > > > 1. I think that setup.py should output warnings about modules
> > > > which cannot be built for some reason rather than having
> > > > ot the build process completely.
> > > >
> > > > 2. I suggest adding -L/usr/lib/termcap to the readline extension.
> > > > This doesn't hurt anywhere and will get this extension to compile
> > > > on SuSE Linux too.
> > >
> > > Both good ideas.
> >
> > Should I implement the two and check these in ?
>
> I may not channel Guido the way Tim does, but I suspect he gave you
> developer privileges because he trusts you to do routine stuff like this.

Just asking because setup.py is Andrew's baby. I'll add the above two later today.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mwh21 at cam.ac.uk Fri Jan 26 17:40:47 2001
From: mwh21 at cam.ac.uk (Michael Hudson)
Date: 26 Jan 2001 16:40:47 +0000
Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman.
Message-ID:

Following discussion on c.l.py I've just submitted:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103441&group_id=5470

which implements a syntax for adding function attributes inline:

>>> def f(a) having (publish=1):
...     print 1
...
>>> f.publish
1

It uses an "import-as" like strategy to avoid making "having" a keyword (which interacts a bit badly with error reporting, as it happens). Obviously, it would be easy to change "having" to a different word.

Another idea I had was:

>>> def f(a) having (.publish=1):
...     print 1
...
>>> f.publish
1

to emphasize the attributeness of what's going on, but I didn't like this as much in practice (I always forgot the period!).

Emile van Sebille also suggested

>>> d = {'a':1}
>>> def f(a) having (**d):
...     print 1
...
>>> f.a
1

which I haven't implemented, because I didn't really like it, but I thought I'd mention.

I'll do test suites and documentation in time, but I thought I'd call in here to check the idea wasn't DOA. What do you all think?

Cheers,
M.

--
surely, somewhere, somehow, in the history of computing, at least one manual has been written that you could at least remotely attempt to consider possibly glancing at.
-- Adam Rixey

From nas at arctrix.com Fri Jan 26 10:55:57 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Fri, 26 Jan 2001 01:55:57 -0800
Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman.
In-Reply-To: ; from mwh21@cam.ac.uk on Fri, Jan 26, 2001 at 04:40:47PM +0000
References:
Message-ID: <20010126015556.A4215@glacier.fnational.com>

I don't see what's wrong with:

    def f(a):
        print 1
    f.publish = 1

It's perfectly clear to me. As a bonus it works already. I'm -1 on inventing more syntax.

Neil

From evan at digicool.com Fri Jan 26 18:12:43 2001
From: evan at digicool.com (Evan Simpson)
Date: Fri, 26 Jan 2001 12:12:43 -0500
Subject: [Python-Dev] [PEP 232] Syntactic support for function attributes strawman.
References:
Message-ID: <00c001c087bb$322a9720$3e48a4d8@digicool.com>

From: Michael Hudson
> >>> def f(a) having (publish=1):
> ...     print 1

This doesn't really need special syntax. I would much rather have this (or something like it) as a way of spelling initialized local variables.
That is, when I want static local variables, instead of corrupting the function signature by writing:

def f(x, marker=[], foo=foo)

...I could write:

def f(x) having (marker=[], foo)

Cheers,

Evan @ digicool

From jeremy at alum.mit.edu Fri Jan 26 18:58:24 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 26 Jan 2001 12:58:24 -0500 (EST)
Subject: [Python-Dev] Makefile changes
In-Reply-To: <20010125050753.A1573@glacier.fnational.com>
References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com>
Message-ID: <14961.47808.315324.734238@localhost.localdomain>

>>>>> "NS" == Neil Schemenauer writes:

>> When I was working the nested scopes, building was tedious at
>> times because a change to funcobject.h meant that, e.g.,
>> newmodule.c needed to be recompiled. The Makefiles didn't
>> capture that information, so I had been adding it to the
>> individual Makefiles, e.g.
>>
>> newmodule.o: newmodule.c ../Include/funcobject.h
>>
>> (I think this worked.)

NS> Hmm, I don't think so. Which makefile did you add this to?

Just to clarify: I added this line to the old Makefile before you checked the new one in.

NS> Hmm, I don't think so. Which makefile did you add this to? Are
NS> you using the new makefile? The Makefile.pre.in file contains a
NS> line like:

NS>     $(LIBRARY_OBJS) $(MAINOBJ): $(PYTHON_HEADERS)

NS> but newmodule.o not in LIBRARY_OBJS. By default its not
NS> compiled by make but with distutils. If you add newmodule to
NS> Setup then a line like:

NS>     Modules/newmodule.o: $(PYTHON_HEADERS)

NS> would do the trick. I think I will add a line like:

NS>     $(MODOBJS): $(PYTHON_HEADERS)

NS> to fix the problem.

NS> I could easily restore the mkdep target but my feeling right now
NS> that explicitly including the header dependencies is better.
NS> What do you think?

Isn't it overkill to have every .o file depend on all the .h files?
If I change cobject.h, there are very few .o files that depend on this change. I suppose, however, it's not worth the effort to get it right at a finer granularity, e.g. that the only files that depend on cobject.h are cobject, cStringIO, unicodedata, _cursesmodule, object, and unicodeobject.

Jeremy

From fdrake at acm.org Fri Jan 26 21:36:18 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 26 Jan 2001 15:36:18 -0500 (EST)
Subject: [Python-Dev] Makefile changes
In-Reply-To: <14961.47808.315324.734238@localhost.localdomain>
References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com> <14961.47808.315324.734238@localhost.localdomain>
Message-ID: <14961.57282.880552.358709@cj42289-a.reston1.va.home.com>

Jeremy Hylton writes:
> Isn't it overkill to have every .o file depend on all the .h files?
> If I change cobject.h, there are very few .o files that depend on this
> change. I suppose, however, it's not worth the effort to get it right

Perhaps. It's definitely easier to maintain than tracking it more specifically and better than what we had, so I'll live with it. ;)

> at a finer granularity, e.g. that the only files that depend on
> cobject.h are cobject, cStringIO, unicodedata, _cursesmodule, object,
> and unicodeobject.

And py_curses.h, which is also used in _curses_panel.c.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From nas at arctrix.com Fri Jan 26 14:58:50 2001
From: nas at arctrix.com (Neil Schemenauer)
Date: Fri, 26 Jan 2001 05:58:50 -0800
Subject: [Python-Dev] Makefile changes
In-Reply-To: <14961.47808.315324.734238@localhost.localdomain>; from jeremy@alum.mit.edu on Fri, Jan 26, 2001 at 12:58:24PM -0500
References: <20010124073155.B32266@glacier.fnational.com> <14960.24223.599357.388059@localhost.localdomain> <20010125050753.A1573@glacier.fnational.com> <14961.47808.315324.734238@localhost.localdomain>
Message-ID: <20010126055850.C4918@glacier.fnational.com>

On Fri, Jan 26, 2001 at 12:58:24PM -0500, Jeremy Hylton wrote:
> Isn't it overkill to have every .o file depend on all the .h files?

Maybe, but Python compiles pretty fast anyhow. I'd rather err on the safe side (i.e. compiling too much). Trying to figure out which of the subheaders a .c file uses when it imports Python.h would be a lot of work and error prone. More power to you if you want to do it. ;-)

Neil

From dgoodger at atsautomation.com Fri Jan 26 22:46:13 2001
From: dgoodger at atsautomation.com (Goodger, David)
Date: Fri, 26 Jan 2001 16:46:13 -0500
Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25
Message-ID:

[CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.]

I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very rusty (long live Python!), I don't know my way around configure, and am not familiar with Python's Makefile. Python 2.0 compiled fine (with a couple of tweaks), but I'm getting caught by the new way of building things. Please help if you can! Many thanks in advance.

Here's an excerpt of my efforts:

# cd /tmp/py
# gunzip -c < python-2.1a1.tgz | tar -rf -
# cd Python-2.1a1
# ./configure 2>&1 | tee ../configure.1
# make 2>&1 | tee ../make.1
...
./python //5/tmp/py/Python-2.1a1/setup.py build
'import site' failed; use -v for traceback
Traceback (most recent call last):
  File "//5/tmp/py/Python-2.1a1/setup.py", line 4, in ?
    import sys, os, string, getopt
ImportError: No module named string

Running ./python results in stack overflow. The old QNX instructions in README recommend editing Modules/Makefile:

LDFLAGS= -N 64k

# make 2>&1 | tee ../make.2

Same error as first make. But now the stack doesn't overflow.

# python
'import site' failed; use -v for traceback
Python 2.1a1 (#2, Jan 26 2001, 11:38:55) [C] on qnxJ
Type "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/local/lib/python', '/home/dgoodger/lib/python',
'/5/tmp/py/Python-2.1a1/Lib', '/5/tmp/py/Python-2.1a1/Lib/plat-qnxJ',
'/tmp/py/Python-2.1a1/Modules']
>>> ^D

# fullpath .
. is //5/tmp/py/Python-2.1a1

The QNX node number prefix '//5' (machine or host number, equivalent to a 'hostname:' prefix for network paths) is being reduced somehow (path normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are required at the head of the path. Is this something that can be fixed?

I added a prefix (QNX virtual-to-real path mapping on the filesystem tree) to correct this:

# prefix -A /5=//5

Now /5 points to //5, similar to a link.

# make 2>&1 | tee ../make.3
...
./python //5/tmp/py/Python-2.1a1/setup.py build
unable to execute ld: No such file or directory
running build
running build_ext
building 'struct' extension
creating build
creating build/temp.qnx-J-PCI-2.1
cc -O -I. -I/5/tmp/py/Python-2.1a1/./Include -IInclude/ -I/usr/local/include -c /5/tmp/py/Python-2.1a1/Modules/structmodule.c -o build/temp.qnx-J-PCI-2.1/structmodule.o
creating build/lib.qnx-J-PCI-2.1
ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o build/lib.qnx-J-PCI-2.1/struct.so
error: command 'ld' failed with exit status 1
make: *** [sharedmods] Error 1

QNX doesn't have an 'ld' command. Is configure not getting its info to setup.py? (Is it supposed to?) What should I check?

I have logs of each of the configure & make runs. Should I submit this as a bug on SourceForge?

Hope to hear from somebody soon.
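
The '//5' to '/5' reduction described above is what one would see if a path normalizer collapsed every run of leading slashes into a single slash. What follows is a minimal sketch under that assumption, written in present-day Python. The three function names are hypothetical illustrations and are not the code in posixpath.py. It contrasts the collapsing behaviour with a variant that keeps exactly two leading slashes, which POSIX allows an implementation to treat specially.

```python
def _clean_components(path, absolute):
    """Drop empty and '.' components and resolve '..' where possible."""
    parts = []
    for comp in path.split('/'):
        if comp in ('', '.'):
            continue
        if comp == '..':
            if parts and parts[-1] != '..':
                parts.pop()             # step back out of the previous component
            elif not absolute:
                parts.append(comp)      # a relative path may start with '..'
            # a '..' at the root of an absolute path is simply dropped
        else:
            parts.append(comp)
    return parts


def collapse_all_slashes(path):
    """Normalize a path, reducing any run of leading slashes to one."""
    absolute = path.startswith('/')
    body = '/'.join(_clean_components(path, absolute))
    return '/' + body if absolute else (body or '.')


def keep_double_slash(path):
    """Normalize a path, but preserve exactly two leading slashes."""
    absolute = path.startswith('/')
    # Two leading slashes are kept; three or more collapse to one.
    two = path.startswith('//') and not path.startswith('///')
    lead = '//' if two else '/'
    body = '/'.join(_clean_components(path, absolute))
    return lead + body if absolute else (body or '.')


qnx_path = '//5/tmp/py/Python-2.1a1/./Lib'
collapsed = collapse_all_slashes(qnx_path)   # node prefix is lost
preserved = keep_double_slash(qnx_path)      # node prefix survives
```

With the first function the QNX node prefix is lost and the path no longer resolves; with the second it survives normalization.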
David Goodger Systems Administrator & Programmer, Advanced Systems Automation Tooling Systems Inc., Automation Systems Division direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695 e-mail: dgoodger at atsautomation.com From guido at digicool.com Fri Jan 26 22:52:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 26 Jan 2001 16:52:47 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: Your message of "Fri, 26 Jan 2001 16:46:13 EST." References: Message-ID: <200101262152.QAA26624@cj20424-a.reston1.va.home.com> > [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] > > I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very > rusty (long live Python!), I don't know my way around configure, and am not > familiar with Python's Makefile. Python 2.0 compiled fine (with a couple of > tweaks), but I'm getting caught by the new way of building things. Please > help if you can! Many thanks in advance. > > Here's an excerpt of my efforts: > > # cd /tmp/py > # gunzip -c < python-2.1a1.tgz | tar -rf - > # cd Python-2.1a1 > # ./configure 2>&1 | tee ../configure.1 > # make 2>&1 | tee ../make.1 > ... > ./python //5/tmp/py/Python-2.1a1/setup.py build > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "//5/tmp/py/Python-2.1a1/setup.py", line 4, in ? > import sys, os, string, getopt > ImportError: No module named string > > Running ./python results in stack overflow. The old QNX instructions in > README recommend editing Modules/Makefile: > LDFLAGS= -N 64k > > # make 2>&1 | tee ../make.2 > > Same error as first make. But now the stack doesn't overflow. > > # python > 'import site' failed; use -v for traceback > Python 2.1a1 (#2, Jan 26 2001, 11:38:55) [C] on qnxJ > Type "copyright", "credits" or "license" for more information. 
> >>> import sys > >>> sys.path > ['', '/usr/local/lib/python', '/home/dgoodger/lib/python', > '/5/tmp/py/Python-2.1a1/Lib', '/5/tmp/py/Python-2.1a1/Lib/plat-qnxJ', > '/tmp/py/Python-2.1a1/Modules'] > >>> ^D > > # fullpath . > . is //5/tmp/py/Python-2.1a1 > > The QNX node number prefix '//5' (machine or host number, equivalent to a > 'hostname:' prefix for network paths) is being reduced somehow (path > normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are > required at the head of the path. Is this something that can be fixed? Aha -- you may need QNX-specific path manipulation functions. What's going on is that site.py normalizes the entries in sys.path, using this function: def makepath(*paths): dir = os.path.join(*paths) return os.path.normcase(os.path.abspath(dir)) I've got a feeling that os.path.abspath(dir) here is the culprit in posixpath.py: def abspath(path): """Return an absolute path.""" if not isabs(path): path = join(os.getcwd(), path) return normpath(path) And here I think that normpath(path) is the routine that actually gets rid of the double leading /. Feel free to submit a patch that leaves double leading slashes in if on QNX. > I added a prefix (QNX virtual-to-real path mapping on the filesystem tree) > to correct this: > > # prefix -A /5=//5 > > Now /5 points to //5, similar to a link. > > # make 2>&1 | tee ../make.3 > ... > ./python //5/tmp/py/Python-2.1a1/setup.py build > unable to execute ld: No such file or directory > running build > running build_ext > building 'struct' extension > creating build > creating build/temp.qnx-J-PCI-2.1 > cc -O -I. 
-I/5/tmp/py/Python-2.1a1/./Include -IInclude/ > -I/usr/local/include -c /5/tmp/py/Python-2.1a1/Modules/structmodule.c -o > build/temp.qnx-J-PCI-2.1/structmodule.o > creating build/lib.qnx-J-PCI-2.1 > ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o > build/lib.qnx-J-PCI-2.1/struct.so > error: command 'ld' failed with exit status 1 > make: *** [sharedmods] Error 1 > > QNX doesn't have an 'ld' command. Is configure not getting its info to > setup.py? (Is it supposed to?) > > What should I check? I have logs of each of the configure & make runs. > Should I submit this as a bug on SourceForge? > > Hope to hear from somebody soon. This is probably in the realm of the distutils. I have no idea how to teach it to build on QNX, sorry! --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at cnri.reston.va.us Fri Jan 26 23:01:01 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 26 Jan 2001 17:01:01 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 04:46:13PM -0500 References: Message-ID: <20010126170101.B2762@amarok.cnri.reston.va.us> On Fri, Jan 26, 2001 at 04:46:13PM -0500, Goodger, David wrote: > ImportError: No module named string The 'import string' in setup.py actually seems to be redundant now, since nothing seems to actually refer to the string module. I've removed it from CVS. >The QNX node number prefix '//5' (machine or host number, equivalent to a >'hostname:' prefix for network paths) is being reduced somehow (path >normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are >required at the head of the path. Is this something that can be fixed? Ooh, very likely: >>> os.path.normpath('//5/foo/bar') '/5/foo/bar' Isn't // at the root a Unix convention of some sort for some network filesystems? Probably normpath() should just leave it alone. >QNX doesn't have an 'ld' command. 
Is configure not getting its info to >setup.py? (Is it supposed to?) setup.py should be parsing the Makefile. The old QNX instructions say Modules/Makefile should be edited, but with Neil's non-recursive Makefile patch (committed after alpha1's release), editing Modules/Makefile will have no effect. Try editing just the top-level Makefile, which should affect setup.py. --amk From mal at lemburg.com Fri Jan 26 23:15:09 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jan 2001 23:15:09 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> Message-ID: <3A71F6ED.D6D642A7@lemburg.com> "Andrew M. Kuchling" wrote: > >The QNX node number prefix '//5' (machine or host number, equivalent to a > >'hostname:' prefix for network paths) is being reduced somehow (path > >normalization?) to '/5', so paths don't resolve. 2 slashes ('//') are > >required at the head of the path. Is this something that can be fixed? > > Ooh, very likely: > >>> os.path.normpath('//5/foo/bar') > '/5/foo/bar' > > Isn't // at the root a Unix convention of some sort for some > network filesystems? Probably normpath() should just leave it alone. Samba uses ////. os.path.normpath() should probably leave the leading '//' untouched (having too many of those in the path doesn't do any harm, AFAIK). 
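[Illustrative sketch, not the patch that went into CVS. It shows one way the 2.0-style normpath() loop could be combined with the 1.5.2 habit of setting the leading slashes aside and putting them back untouched. The name normpath_keep_leading is made up for this example, and the code is written for a current Python rather than 2.1, so treat it as an assumption-laden outline only.]

```python
def normpath_keep_leading(path):
    """Normalize a POSIX-style path, leaving every leading slash in place.

    Hypothetical helper for illustration; not part of posixpath.
    """
    if path == '':
        return '.'
    # Set the leading slashes aside so they can be restored verbatim,
    # as the 1.5.2 version of normpath() did.
    stripped = path.lstrip('/')
    slashes = '/' * (len(path) - len(stripped))
    new_comps = []
    for comp in stripped.split('/'):
        # Empty components (from 'a//b' or a trailing '/') and '.' vanish.
        if comp in ('', '.'):
            continue
        # Keep '..' only when there is nothing to cancel it against
        # in a relative path; otherwise drop the previous component.
        if (comp != '..' or (not slashes and not new_comps) or
                (new_comps and new_comps[-1] == '..')):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    return (slashes + '/'.join(new_comps)) or '.'


if __name__ == '__main__':
    print(normpath_keep_leading('//5/tmp/py/Python-2.1a1/./Lib'))
```

With this shape a QNX node prefix such as '//5' passes through unchanged, while '.', '..' and doubled slashes later in the path are still collapsed.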
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nas at arctrix.com Fri Jan 26 16:26:12 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 07:26:12 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 04:46:13PM -0500 References: Message-ID: <20010126072611.A5345@glacier.fnational.com> On Fri, Jan 26, 2001 at 04:46:13PM -0500, Goodger, David wrote: > Running ./python results in stack overflow. The old QNX instructions in > README recommend editing Modules/Makefile: > LDFLAGS= -N 64k > > # make 2>&1 | tee ../make.2 The README should be changed to say edit the toplevel Makefile. Should those flags be the default? If you can give me the MACHDEP from your Makefile I can add it to configure.in. > QNX doesn't have an 'ld' command. Is configure not getting its info to > setup.py? (Is it supposed to?) I'm not sure how distutils figures out what to use for ld. It doesn't appear in the Makefile. I think this is probably some distutils thing. Andrew? Neil From fredrik at effbot.org Fri Jan 26 23:25:34 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 26 Jan 2001 23:25:34 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> Message-ID: <001a01c087e6$ec3b9710$e46940d5@hagrid> mal wrote: > > Ooh, very likely: > > >>> os.path.normpath('//5/foo/bar') > > '/5/foo/bar' > > > > Isn't // at the root a Unix convention of some sort for some > > network filesystems? Probably normpath() should just leave it alone. > > Samba uses ////. os.path.normpath() > should probably leave the leading '//' untouched (having too > many of those in the path doesn't do any harm, AFAIK).
from 1.5.2's posixpath: def normpath(path): """Normalize path, eliminating double slashes, etc.""" import string # Treat initial slashes specially slashes = '' while path[:1] == '/': slashes = slashes + '/' path = path[1:] ... return slashes + string.joinfields(comps, '/') from 2.0's posixpath: def normpath(path): """Normalize path, eliminating double slashes, etc.""" if path == '': return '.' import string initial_slash = (path[0] == '/') ... if initial_slash: path = '/' + path return path or '.' interesting... Cheers /F From akuchlin at cnri.reston.va.us Fri Jan 26 23:28:03 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Fri, 26 Jan 2001 17:28:03 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <20010126072611.A5345@glacier.fnational.com>; from nas@arctrix.com on Fri, Jan 26, 2001 at 07:26:12AM -0800 References: <20010126072611.A5345@glacier.fnational.com> Message-ID: <20010126172803.A2817@amarok.cnri.reston.va.us> On Fri, Jan 26, 2001 at 07:26:12AM -0800, Neil Schemenauer wrote: >I'm not sure how distutils figures out what to use for ld. It >doesn't appear in the Makefile. It think this is probably some >distutils thing. Andrew? It looks at LDSHARED. See customize_compiler in Lib/distutils/sysconfig.py. Looking in Modules/Makefile, LDFLAGS is only used for the final link to produce a Python executable, so I think this is up to the Makefile, not setup.py. --amk From nas at arctrix.com Fri Jan 26 16:56:41 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 07:56:41 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <20010126172803.A2817@amarok.cnri.reston.va.us>; from akuchlin@cnri.reston.va.us on Fri, Jan 26, 2001 at 05:28:03PM -0500 References: <20010126072611.A5345@glacier.fnational.com> <20010126172803.A2817@amarok.cnri.reston.va.us> Message-ID: <20010126075641.A5534@glacier.fnational.com> On Fri, Jan 26, 2001 at 05:28:03PM -0500, Andrew M. 
Kuchling wrote: > On Fri, Jan 26, 2001 at 07:26:12AM -0800, Neil Schemenauer wrote: > >I'm not sure how distutils figures out what to use for ld. > > It looks at LDSHARED. Okay. David, what should LDSHARED say for QNX? I can add the magic to configure.in. Neil From mal at lemburg.com Fri Jan 26 23:51:09 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jan 2001 23:51:09 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> Message-ID: <3A71FF5D.DC609775@lemburg.com> Fredrik Lundh wrote: > > mal wrote:> > Ooh, very likely: > > > >>> os.path.normpath('//5/foo/bar') > > > '/5/foo/bar' > > > > > > Isn't // at the root a Unix convention of some sort for some > > > network filesystems? Probably normpath() should just leave it alone. > > > > Samba uses ////. os.path.normpath() > > should probably leave the leading '//' untouched (having too > > many of those in the path doesn't do any harm, AFAIK). > > from 1.5.2's posixpath: > > def normpath(path): > """Normalize path, eliminating double slashes, etc.""" > import string > # Treat initial slashes specially > slashes = '' > while path[:1] == '/': > slashes = slashes + '/' > path = path[1:] > ... > return slashes + string.joinfields(comps, '/') > > from 2.0's posixpath: > > def normpath(path): > """Normalize path, eliminating double slashes, etc.""" > if path == '': > return '.' > import string > initial_slash = (path[0] == '/') > ... > if initial_slash: > path = '/' + path > return path or '.' > > interesting... Here's the log message: revision 1.34 date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23 added rewritten normpath from Moshe Zadka that does the right thing with paths containing .. and the diff: diff -r1.34 -r1.33 349,350d348 < if path == '': < return '.' 
352,367c350,372 < initial_slash = (path[0] == '/') < comps = string.split(path, '/') < new_comps = [] < for comp in comps: < if comp in ('', '.'): < continue < if (comp != '..' or (not initial_slash and not new_comps) or < (new_comps and new_comps[-1] == '..')): < new_comps.append(comp) < elif new_comps: < new_comps.pop() < comps = new_comps < path = string.join(comps, '/') < if initial_slash: < path = '/' + path < return path or '.' --- > # Treat initial slashes specially > slashes = '' > while path[:1] == '/': > slashes = slashes + '/' > path = path[1:] > comps = string.splitfields(path, '/') > i = 0 > while i < len(comps): > if comps[i] == '.': > del comps[i] > while i < len(comps) and comps[i] == '': > del comps[i] > elif comps[i] == '..' and i > 0 and comps[i-1] not in ('', '..'): > del comps[i-1:i+1] > i = i-1 > elif comps[i] == '' and i > 0 and comps[i-1] <> '': > del comps[i] > else: > i = i+1 > # If the path is now empty, substitute '.' > if not comps and not slashes: > comps.append('.') > return slashes + string.joinfields(comps, '/') Revision 1.33 clearly leaves initial slashes untouched. I guess we should restore this... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nas at arctrix.com Fri Jan 26 17:12:15 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 26 Jan 2001 08:12:15 -0800 Subject: [Python-Dev] LINKCC defaults to CXX Message-ID: <20010126081215.B5534@glacier.fnational.com> Dear lord why? So people can develop extensions using C++? Its not worth the pain inflicted on everyone else. Let them recompile with LINKCC=CXX. Linking with CXX opens a huge can of stinky worms. First of all, just because configure found a value for CXX doesn't mean it works. Even if it does that doesn't mean that using it is a good idea. Linking with CXX will bring in the C++ runtime. 
There are a large number of platforms where the C++ ABI has not been standardized; for example, anything that used g++. Can we please leave LINKCC default to CXX? It's easy enough for the crazies to override if they like. I'll even create a configure option for them. Neil From barry at digicool.com Sat Jan 27 00:09:57 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 26 Jan 2001 18:09:57 -0500 Subject: [Python-Dev] LINKCC defaults to CXX References: <20010126081215.B5534@glacier.fnational.com> Message-ID: <14962.965.464326.794431@anthem.wooz.org> >>>>> "NS" == Neil Schemenauer writes: NS> Can we please leave LINKCC default to CXX? I think you mean default it to CC, eh? +1 From mal at lemburg.com Sat Jan 27 01:16:01 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 27 Jan 2001 01:16:01 +0100 Subject: [Python-Dev] Nightly CVS tarballs Message-ID: <3A721341.3F348E51@lemburg.com> I just got a request from someone who wants to test the latest CVS version but unfortunately can't because he's behind a firewall. Is there any chance of reactivating the nightly tarball generation that was once in place? http://www.python.org/download/cvs.html Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From dgoodger at atsautomation.com Sat Jan 27 01:30:21 2001 From: dgoodger at atsautomation.com (Goodger, David) Date: Fri, 26 Jan 2001 19:30:21 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: Thank you all for your prompt replies. (Guido's was within seconds! Well, minutes, certainly.) I'll give it another go on Monday. I've got renovations to fill my weekend.
/David From thomas at xs4all.net Sat Jan 27 01:35:41 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 27 Jan 2001 01:35:41 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Fri, Jan 26, 2001 at 07:30:21PM -0500 References: Message-ID: <20010127013541.N962@xs4all.nl> On Fri, Jan 26, 2001 at 07:30:21PM -0500, Goodger, David wrote: > Thank you all for your prompt replies. (Guido's was within seconds! Well, > minutes, certainly.) Oh, the wonderful things one can do with a time machine.... -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From jeremy at alum.mit.edu Fri Jan 26 23:14:26 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 26 Jan 2001 17:14:26 -0500 (EST) Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <3A721341.3F348E51@lemburg.com> References: <3A721341.3F348E51@lemburg.com> Message-ID: <14961.63170.394043.790610@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: MAL> I just got a request from someone who wants to test the latest MAL> CVS version but unfortunately can't because he's behind a MAL> firewall. MAL> Is there any chance of reactivating the nightly tarball MAL> generation that was once in place ? MAL> http://www.python.org/download/cvs.html I plan to set up nightly cvs snapshots soon. We should be moving into our new office next week; I hope to have a machine that is on the net 24x7 shortly after that. 
Jeremy From bckfnn at worldonline.dk Sat Jan 27 08:58:38 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Sat, 27 Jan 2001 07:58:38 GMT Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <14961.63170.394043.790610@localhost.localdomain> References: <3A721341.3F348E51@lemburg.com> <14961.63170.394043.790610@localhost.localdomain> Message-ID: <3a727e79.835771@smtp.worldonline.dk> >>>>>> "MAL" == M -A Lemburg writes: > > MAL> I just got a request from someone who wants to test the latest > MAL> CVS version but unfortunately can't because he's behind a > MAL> firewall. > > MAL> Is there any chance of reactivating the nightly tarball > MAL> generation that was once in place ? > > MAL> http://www.python.org/download/cvs.html [Jeremy] >I plan to set up nightly cvs snapshots soon. We should be moving into >our new office next week; I hope to have a machine that is on the net >24x7 shortly after that. FWIW, I have been using this cron and shell script running on shell.sourceforge.net. This way I don't need 24x7 in order to make a cvs tarball (and .zip) available. 22 2 * * * $HOME/bin/jython-snap SHOTLABEL=`date +%Y%m%d` LOGLABEL=log.`date +%Y%m%d` cd /home/groups/jython/htdocs/cvssnaps (cvs -Qd :pserver:anonymous at cvs1:/cvsroot/jython checkout -d jython-$SHOTLABEL jython && \ tar zcf jython-nightly.tar.gz jython-$SHOTLABEL && \ rm -fr jython-nightly.zip && \ zip -qr9 jython-nightly.zip jython-$SHOTLABEL && \ rm -fr jython-$SHOTLABEL) >$LOGLABEL 2>&1 regards, finn From tim.one at home.com Sat Jan 27 10:35:14 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 27 Jan 2001 04:35:14 -0500 Subject: [Python-Dev] setup.py In-Reply-To: <20010126092559.A5623@thyrsus.com> Message-ID: [Eric S. Raymond] > I may not channel Guido the way Tim does, but I suspect he gave you > developer privileges because he trusts you to do routine stuff like this. Excellent, Eric! You're batting 1%. Here's how to boost it to 93%: whenever a new idea comes up, just grumble "no". 
You'll be right 92% of the time . Reminds me of a friend who got sucked into working at a neural-net startup trying to build a black box to predict whether the daily close of the S&P 500 would be above or below the previous day's. He was greatly impressed by the research they had done, showing that the prototype got the right answer more than half the time when fed historical data, and at a very high significance level (i.e., it almost certainly did better than flipping a coin). What he didn't realize at the time is that if they had written the prototype in Python: # S&P close daily direction predictor print "higher" it would have been right about 2/3rds the time <0.33 wink>. never-ascribe-to-insight-what-can-be-explained-by-idiocy-ly y'rs - tim From martin at mira.cs.tu-berlin.de Sat Jan 27 10:38:41 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 27 Jan 2001 10:38:41 +0100 Subject: [Python-Dev] Nightly CVS tarballs Message-ID: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> > Is there any chance of reactivating the nightly tarball generation > that was once in place ? What's wrong with http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz ? Regards, Martin From fredrik at effbot.org Sat Jan 27 11:43:50 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 27 Jan 2001 11:43:50 +0100 Subject: [Python-Dev] setup.py References: Message-ID: <008c01c0884e$09bd2030$e46940d5@hagrid> tim wrote: > Reminds me of a friend who got sucked into working at a neural-net startup > trying to build a black box to predict whether the daily close of the S&P > 500 would be above or below the previous day's. /.../ > > # S&P close daily direction predictor > print "higher" replace "higher" with "same", and you have a pretty decent weather predictor. Cheers /F From mal at lemburg.com Sat Jan 27 13:01:30 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 27 Jan 2001 13:01:30 +0100 Subject: [Python-Dev] Nightly CVS tarballs References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> Message-ID: <3A72B89A.E03C1912@lemburg.com> "Martin v. Loewis" wrote: > > > Is there any chance of reactivating the nightly tarball generation > > that was once in place ? > > What's wrong with > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz > > ? I didn't realize that SF does this automagically. Could someone please redirect the link on the python.org cvs page to the above address (David Ascher's tarball generation stopped in February 2000 !). Thanks for the hint, Martin. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Sat Jan 27 14:16:01 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 27 Jan 2001 08:16:01 -0500 (EST) Subject: [Python-Dev] Nightly CVS tarballs In-Reply-To: <3A72B89A.E03C1912@lemburg.com> References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> <3A72B89A.E03C1912@lemburg.com> Message-ID: <14962.51729.905084.154359@cj42289-a.reston1.va.home.com> "Martin v. Loewis" wrote: > What's wrong with > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz M.-A. Lemburg writes: > I didn't realize that SF does this automagically. Could someone > please redirect the link on the python.org cvs page to the > above address (David Ascher's tarball generation stopped in > February 2000 !). Did you want a "snapshot" or a copy of the repository? What SF produces is a tarball of the repository, not a snapshot. We still need to do something to create snapshots. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at lemburg.com Sat Jan 27 14:28:40 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 27 Jan 2001 14:28:40 +0100 Subject: [Python-Dev] Nightly CVS tarballs References: <200101270938.f0R9cfU08311@mira.informatik.hu-berlin.de> <3A72B89A.E03C1912@lemburg.com> <14962.51729.905084.154359@cj42289-a.reston1.va.home.com> Message-ID: <3A72CD08.F47DAA69@lemburg.com> "Fred L. Drake, Jr." wrote: > > "Martin v. Loewis" wrote: > > What's wrong with > > > > http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.gz > > M.-A. Lemburg writes: > > I didn't realize that SF does this automagically. Could someone > > please redirect the link on the python.org cvs page to the > > above address (David Ascher's tarball generation stopped in > > February 2000 !). > > Did you want a "snapshot" or a copy of the repository? What SF > produces is a tarball of the repository, not a snapshot. I meant a copy of what you get when you check out the Python CVS tree wrapped into a .tar.gz file. The size of the above archive (16MB) suggests that a lot more is going into the .tar.gz file. A .tar.gz of the CVS checkout is around 4MB in size. Looks like we still need to do something after all ;) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From armin at steinhoff.de Sat Jan 27 17:24:57 2001 From: armin at steinhoff.de (Armin Steinhoff) Date: Sat, 27 Jan 2001 17:24:57 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: <4.3.2.7.2.20010127170125.00b2ee80@mail.secureweb.de> Hello Guido, nice to see the first 2.1 version :) At 16:52 26.01.01 -0500, you wrote: > > [CC'ing to Armin Steinhoff, who maintains pyqnx on SourceForge.] > > > > I'm having trouble building Python 2.1a1 on QNX 4.25. Caveat: my C is very > > rusty (long live Python!), I don't know my way around configure, and am not > > familiar with Python's Makefile. 
Python 2.0 compiled fine (with a couple of > > tweaks), but I'm getting caught by the new way of building things. Please > > help if you can! Many thanks in advance. > > > > Here's an excerpt of my efforts: > > > > # cd /tmp/py > > # gunzip -c < python-2.1a1.tgz | tar -rf - > > # cd Python-2.1a1 > > # ./configure 2>&1 | tee ../configure.1 I did a fast hack with the new 2.1 version: CC=cc LINKCC=cc configure --without-gcc --shared=no --without-threads (Hope '--shared=no' works ... QNX4 doesn't support dynamic loading) Please replace all references to g++ by cc -> in the main Makefile and the Modules/Makefile. In the Modules/Makefile set LDFLAGS=250K ... the default stacksize of 32K seems to be too small. > > # make 2>&1 | tee ../make.1 > > ... > > ./python //5/tmp/py/Python-2.1a1/setup.py build > > 'import site' failed; use -v for traceback 'python -v' shows that the module 'distutils.util' isn't there .... it seems to be not included in the source distribution. 'import site' failed; traceback: Traceback (most recent call last): File "//1/Python-2.1a1/Lib/site.py", line 85, in ? from distutils.util import get_platform ImportError: No module named distutils.util ^^^^^^^^^^^^^^ [ clip ..] >This is probably in the realm of the distutils. I have no idea how to >teach it to build on QNX, sorry! IMHO ... it is not a path problem. In the moment there is no time left for me to go into these details. A clean port will happen in a few weeks. Please check out PyQNX for news regarding QNX4.25 and QNX6.0 (aka QNX Neutrino). Greetings Armin Steinhoff Life-Demo of PyDACHS http://www.dachs.net/PyDACHS_python-tilcon.htm in our booth at Embedded Systems 2001, Nuremberg, GER http://www.embedded-systems-messe.de Febr. 
14-16, 2000 Hall 11, Booth P 04 From guido at digicool.com Sat Jan 27 17:50:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:50:50 -0500 Subject: [Python-Dev] LINKCC defaults to CXX In-Reply-To: Your message of "Fri, 26 Jan 2001 08:12:15 PST." <20010126081215.B5534@glacier.fnational.com> References: <20010126081215.B5534@glacier.fnational.com> Message-ID: <200101271650.LAA30720@cj20424-a.reston1.va.home.com> > Dear lord why? So people can develop extensions using C++? Its > not worth the pain inflicted on everyone else. Let them > recompile with LINKCC=CXX. > > Linking with CXX opens a huge can of stinky worms. First of all, > just because configure found a value for CXX doesn't mean it > works. Even if it does that doesn't mean that using it is a good > idea. Linking with CXX will bring in the C++ runtime. There are > a large number of platforms where the C++ ABI has not been > standarized; for example, anything that used g++. > > Can we please leave LINKCC default to CXX? Its easy enough for > the crazies to override if they like. I'll even create a > configure option for them. Arg. My bad. I did this as an experiment; it didn't break on my machine, but I didn't intend this to become the standard! Thanks for changing it back. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jan 27 17:52:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:52:23 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: Your message of "Fri, 26 Jan 2001 23:51:09 +0100." 
<3A71FF5D.DC609775@lemburg.com> References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> Message-ID: <200101271652.LAA30750@cj20424-a.reston1.va.home.com> > revision 1.34 > date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23 > added rewritten normpath from Moshe Zadka that does the right thing with > paths containing .. [...] > Revision 1.33 clearly leaves initial slashes untouched. > I guess we should restore this... Yes, please! (Just the "leading extra slashes stay" behavior.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jan 27 17:57:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 11:57:40 -0500 Subject: [Python-Dev] New bug in function object hash() and comparisons In-Reply-To: Your message of "Fri, 26 Jan 2001 17:02:09 EST." References: Message-ID: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> Barry noticed: > Anyway, did you know that you can use functions as keys to a > dictionary, but that you can mutate them to "lose" the element? > > -------------------- snip snip -------------------- > Python 2.0 (#13, Jan 10 2001, 13:06:39) > [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> d = {} > >>> def foo(): pass > ... > >>> def bar(): pass > ... > >>> d[foo] = 1 > >>> d[foo] > 1 > >>> foocode = foo.func_code > >>> foo.func_code = bar.func_code > >>> d[foo] > Traceback (most recent call last): > File "", line 1, in ? > KeyError: > >>> d[bar] = 2 > >>> d[bar] > 2 > >>> d[foo] > 2 > >>> foo.func_code = foocode > >>> d[foo] > 1 > -------------------- snip snip -------------------- > > It's because a function's func_code attribute is used in its hash > calculation, but func_code is writable! Clearly, something changed. I'm pretty sure it's the function attributes. 
Either the function attributes shouldn't be used in comparing function objects, or hash() on functions should be unimplemented, or comparison on functions should use simple pointer compares. What's the right solution? Do people use functions as dict keys? If not, we can remove the hash() implementation. But I suspect they *are* used as dict keys. Not using the __dict__ on comparisons appears ugly, so probably the best solution is to change function comparisons to use simple pointer compares. That removes the possibility to see whether two different functions implement the same code -- but does anybody really use that? --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at zadka.site.co.il Sat Jan 27 18:17:50 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Sat, 27 Jan 2001 19:17:50 +0200 (IST) Subject: [Python-Dev] New bug in function object hash() and comparisons In-Reply-To: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com>, Message-ID: <20010127171750.91412A840@darjeeling.zadka.site.co.il> On Sat, 27 Jan 2001 11:57:40 -0500, Guido van Rossum wrote: (about function hash doing the wrong thing) > What's the right solution? I have no idea... > Do people use functions as dict keys? If > not, we can remove the hash() implementation. ...but this ain't it. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From gvwilson at ca.baltimore.com Sat Jan 27 18:23:42 2001 From: gvwilson at ca.baltimore.com (Greg Wilson) Date: Sat, 27 Jan 2001 12:23:42 -0500 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1119 - 17 msgs In-Reply-To: <20010127170103.DA6DEEA44@mail.python.org> Message-ID: <000001c08885$e5418c40$770a0a0a@nevex.com> > Guido wrote: > What's the right solution? Do people use functions as dict keys? 
Yup --- even use this as an example in the course (part of drumming home to students that functions are just a special kind of data). Greg From barry at digicool.com Sat Jan 27 18:43:43 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sat, 27 Jan 2001 12:43:43 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> Message-ID: <14963.2255.268933.615456@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Clearly, something changed. I'm pretty sure it's the GvR> function attributes. Actually no. func_code is used in func_hash() but somewhere in the Python 1.6 cycle, func_code was made assignable. GvR> Either the function attributes shouldn't be used in comparing GvR> function objects, or hash() on functions should be GvR> unimplemented, or comparison on functions should use simple GvR> pointer compares. GvR> What's the right solution? We should definitely continue to allow functions as keys to dictionaries, but probably just remove func_code as an input to the function's hash. -Barry From barry at digicool.com Sat Jan 27 18:48:33 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Sat, 27 Jan 2001 12:48:33 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> <14963.2255.268933.615456@anthem.wooz.org> Message-ID: <14963.2545.14600.667505@anthem.wooz.org> Me> We should definitely continue to allow functions as keys to Me> dictionaries, but probably just remove func_code as an input Me> to the function's hash. But of course, func_globals won't be sufficient as a hash for functions. Probably changing the hash to a pointer compare is the best thing after all. 
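A minimal sketch of the failure mode Barry describes, written for a current Python 3 interpreter rather than the 2.0 build used in the thread. The two classes are hypothetical stand-ins, not the real function type: CodeHashedFunc hashes on a mutable attribute (as func_hash() did with func_code), while IdentityFunc relies on the default identity-based hash, which is what a pointer compare amounts to. Small integers stand in for code objects only so that the hashes are predictable.

```python
class CodeHashedFunc:
    """Hypothetical stand-in for a 2.0-style function object: hash and
    equality depend on a mutable 'code' attribute (an int here)."""

    def __init__(self, code):
        self.code = code

    def __hash__(self):
        return hash(self.code)

    def __eq__(self, other):
        return isinstance(other, CodeHashedFunc) and self.code == other.code


class IdentityFunc:
    """Same payload, but with the default identity-based hash and
    comparison (no __hash__ or __eq__ defined)."""

    def __init__(self, code):
        self.code = code


def lookup_survives_mutation(cls):
    """Use an instance as a dict key, mutate it, then look it up again."""
    key = cls(1)
    table = {key: "value"}
    key.code = 2  # analogous to foo.func_code = bar.func_code
    return key in table


if __name__ == "__main__":
    print("hash on mutable state:", lookup_survives_mutation(CodeHashedFunc))
    print("hash on identity:     ", lookup_survives_mutation(IdentityFunc))
```

On CPython the first call should report False (the entry is still stored, but the changed hash sends the lookup to an empty slot) and the second True, which is the behaviour the pointer-compare proposal is after.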
-Barry From guido at digicool.com Sat Jan 27 18:49:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 27 Jan 2001 12:49:16 -0500 Subject: [Python-Dev] Re: New bug in function object hash() and comparisons In-Reply-To: Your message of "Sat, 27 Jan 2001 12:43:43 EST." <14963.2255.268933.615456@anthem.wooz.org> References: <200101271657.LAA30782@cj20424-a.reston1.va.home.com> <14963.2255.268933.615456@anthem.wooz.org> Message-ID: <200101271749.MAA32025@cj20424-a.reston1.va.home.com> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Clearly, something changed. I'm pretty sure it's the > GvR> function attributes. > > Actually no. func_code is used in func_hash() but somewhere in the > Python 1.6 cycle, func_code was made assignable. Argh! You're right. > GvR> Either the function attributes shouldn't be used in comparing > GvR> function objects, or hash() on functions should be > GvR> unimplemented, or comparison on functions should use simple > GvR> pointer compares. > > GvR> What's the right solution? > > We should definitely continue to allow functions as keys to > dictionaries, but probably just remove func_code as an input to the > function's hash. OK, that settles it. There's not much point in having a function compare do anything besides a pointer comparison when the code objects aren't compared. (Two completely different functions could compare equal e.g. if they has the same attribute dict.) So we should just punt, and compare functions by object pointer. The proper way to do this is to *delete* func_hash and func_compare from funcobject.c -- the default comparison will take care of this. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Sat Jan 27 19:58:30 2001 From: akuchlin at mems-exchange.org (A.M. 
Kuchling)
Date: Sat, 27 Jan 2001 13:58:30 -0500
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References:
Message-ID: <200101271858.NAA04898@mira.erols.com>

On Sat, 27 Jan 2001 18:28:02 +0100, Andreas Jung wrote:
>Is there a reason why 2.1 runs significantly slower ?
>Both Python versions were compiled with -g -O2 only.

[CC'ing to python-dev] Confirmed:

[amk at mira Python-2.0]$ ./python Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 3.14
This machine benchmarks at 3184.71 pystones/second
[amk at mira Python-2.0]$ python2.1 Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 3.81
This machine benchmarks at 2624.67 pystones/second

The ceval.c changes seem a likely candidate to have caused this.
Anyone want to run Marc-Andre's microbenchmarks and see how the
numbers have changed?

--amk

From moshez at zadka.site.co.il Sat Jan 27 20:14:28 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Sat, 27 Jan 2001 21:14:28 +0200 (IST)
Subject: [Python-Dev] Function Hash: Check it in?
Message-ID: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il>

Attached is an example Python session after I patched the interpreter.
The test-suite passes all right.

I want an OK to check this in.
Here is the patch:

Index: Objects/funcobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/funcobject.c,v
retrieving revision 2.33
diff -c -r2.33 funcobject.c
*** Objects/funcobject.c 2001/01/25 20:06:59 2.33
--- Objects/funcobject.c 2001/01/27 19:13:08
***************
*** 347,358 ****
      0, /*tp_print*/
      0, /*tp_getattr*/
      0, /*tp_setattr*/
!     (cmpfunc)func_compare, /*tp_compare*/
      (reprfunc)func_repr, /*tp_repr*/
      0, /*tp_as_number*/
      0, /*tp_as_sequence*/
      0, /*tp_as_mapping*/
!     (hashfunc)func_hash, /*tp_hash*/
      0, /*tp_call*/
      0, /*tp_str*/
      (getattrofunc)func_getattro, /*tp_getattro*/
--- 347,358 ----
      0, /*tp_print*/
      0, /*tp_getattr*/
      0, /*tp_setattr*/
!     0, /*tp_compare*/
      (reprfunc)func_repr, /*tp_repr*/
      0, /*tp_as_number*/
      0, /*tp_as_sequence*/
      0, /*tp_as_mapping*/
!     0, /*tp_hash*/
      0, /*tp_call*/
      0, /*tp_str*/
      (getattrofunc)func_getattro, /*tp_getattro*/

Python 2.1a1 (#1, Jan 27 2001, 21:01:24)
[GCC 2.95.3 20010111 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> def foo():
...     pass
...
>>> def bar():
...     pass
...
>>> hash(foo)
135484636
>>> hash(bar)
135481676
>>> foo == bar
0
>>> d = {}
>>> d[foo] =1
>>> def temp():
...     print "baz"
...
>>> foo.func_code = temp.func_code
>>> d[foo]
1

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From tim.one at home.com Sat Jan 27 21:06:20 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 27 Jan 2001 15:06:20 -0500
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
In-Reply-To: <200101271858.NAA04898@mira.erols.com>
Message-ID:

[A.M. Kuchling]
> [CC'ing to python-dev] Confirmed:
>
> [amk at mira Python-2.0]$ ./python Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 3.14
> This machine benchmarks at 3184.71 pystones/second
> [amk at mira Python-2.0]$ python2.1 Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 3.81
> This machine benchmarks at 2624.67 pystones/second
>
> The ceval.c changes seem a likely candidate to have caused this.
> Anyone want to run Marc-Andre's microbenchmarks and see how the
> numbers have changed?

Want to, yes, but it looks hopeless on my box: **** 2.0 C:\Python20>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 0.851013 This machine benchmarks at 11750.7 pystones/second C:\Python20>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.24279 This machine benchmarks at 8046.41 pystones/second **** 2.1a1 C:\Python21a1>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 0.823313 This machine benchmarks at 12146 pystones/second C:\Python21a1>python lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.27046 This machine benchmarks at 7871.15 pystones/second **** CVS C:\Code\python\dist\src\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.836391 This machine benchmarks at 11956.1 pystones/second C:\Code\python\dist\src\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 1.3055 This machine benchmarks at 7659.9 pystones/second That's after a reboot: no matter which Python I use, it gets about 12000 on the first run with a given python.exe, and about 8000 on the second. Not shown is that it *stays* at about 8000 until the next reboot. So there's a Windows (W98SE) Mystery, but also no evidence that timings have changed worth spit under the MS compiler. The eval loop is very touchy, and I suspect you won't track this down on your box until staring at the code gcc (I presume you're using gcc) generates. May be sensitive to which release of gcc you're using too. 
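For reference, the pystone figures quoted above reduce to simple arithmetic: the reported rate is the number of passes divided by the elapsed seconds. The sketch below recomputes the rates and relative slowdowns from the timings posted in this thread. The helper names are illustrative and not part of pystone.py, and it assumes a current Python 3 interpreter.

```python
def pystones_per_second(passes, seconds):
    """Rate as pystone reports it: passes divided by elapsed time."""
    return passes / seconds


def percent_slower(baseline_seconds, new_seconds):
    """How much longer the new run took, as a percentage of the baseline."""
    return (new_seconds / baseline_seconds - 1.0) * 100.0


PASSES = 10000

# Timings quoted in the thread (seconds for 10000 passes).
AMK_PYTHON_20 = 3.14        # amk, Linux, Python 2.0
AMK_PYTHON_21 = 3.81        # amk, Linux, Python 2.1
TIM_FIRST_RUN = 0.851013    # Tim, W98SE, Python 2.0, first run after reboot
TIM_SECOND_RUN = 1.24279    # Tim, W98SE, Python 2.0, second run

if __name__ == "__main__":
    print("amk 2.0 rate: %.2f" % pystones_per_second(PASSES, AMK_PYTHON_20))
    print("amk 2.1 rate: %.2f" % pystones_per_second(PASSES, AMK_PYTHON_21))
    print("amk 2.0 -> 2.1: %.1f%% slower"
          % percent_slower(AMK_PYTHON_20, AMK_PYTHON_21))
    print("Tim run 1 -> run 2: %.1f%% slower"
          % percent_slower(TIM_FIRST_RUN, TIM_SECOND_RUN))
```

Computed this way, the run-to-run swing on the Windows box is larger than the 2.0 versus 2.1 gap measured on Linux, which is why those Windows numbers cannot confirm or refute the slowdown.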
switch-to-windows-and-you'll-have-easier-things-to-worry-about-ly y'rs - tim From fredrik at pythonware.com Sun Jan 28 10:37:45 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 28 Jan 2001 10:37:45 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com> Message-ID: <00ed01c0890e$e3bf5ad0$e46940d5@hagrid> guido wrote: > > Revision 1.33 clearly leaves initial slashes untouched. > > I guess we should restore this... > > Yes, please! (Just the "leading extra slashes stay" behavior.) just looked this up in the specs, and POSIX seem to require that leading slashes are preserved only if there are exactly two of them: A pathname that begins with two successive slashes may be interpreted in an implementation-dependent manner, although more than two leading slashes are treated as a single slash. (from susv2) maybe we should add a if len(slashes) > 2: slashes = "/" test to the patch? Cheers /F From thomas at xs4all.net Sun Jan 28 18:39:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 28 Jan 2001 18:39:58 +0100 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: <00ed01c0890e$e3bf5ad0$e46940d5@hagrid>; from fredrik@pythonware.com on Sun, Jan 28, 2001 at 10:37:45AM +0100 References: <20010126170101.B2762@amarok.cnri.reston.va.us> <3A71F6ED.D6D642A7@lemburg.com> <001a01c087e6$ec3b9710$e46940d5@hagrid> <3A71FF5D.DC609775@lemburg.com> <200101271652.LAA30750@cj20424-a.reston1.va.home.com> <00ed01c0890e$e3bf5ad0$e46940d5@hagrid> Message-ID: <20010128183958.Q962@xs4all.nl> On Sun, Jan 28, 2001 at 10:37:45AM +0100, Fredrik Lundh wrote: > guido wrote: > > > Revision 1.33 clearly leaves initial slashes untouched. > > > I guess we should restore this... > > > > Yes, please! 
(Just the "leading extra slashes stay" behavior.) > just looked this up in the specs, and POSIX seem to > require that leading slashes are preserved only if there > are exactly two of them: > A pathname that begins with two successive slashes > may be interpreted in an implementation-dependent > manner, although more than two leading slashes are > treated as a single slash. > (from susv2) > maybe we should add a if len(slashes) > 2: slashes = "/" > test to the patch? How strictly do we need (or want, for that matter) to follow POSIX here ? I'm aware the module is called 'posixpath', but it's used in a bit more than just POSIX environments (or POSIX behaviours) so it might make sense to ignore this particular tidbit. What if there is a system that attaches a special meaning to ///, should we create a new path module for it ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at mira.cs.tu-berlin.de Sun Jan 28 21:50:35 2001 From: martin at mira.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 28 Jan 2001 21:50:35 +0100 Subject: [Python-Dev] XSLT parser interface Message-ID: <200101282050.f0SKoZr08809@mira.informatik.hu-berlin.de> Based on my previous IDL interface for XPath parsers, I've defined an API for a parser that parsers XSLT pattern expressions. It is an extension to the XPath API, so I attach only the additional functions. Any comments are appreciated. 
Martin

module XPath{
  // XSLT exprType values
  const unsigned short PATTERN = 17;
  const unsigned short LOCATION_PATTERN = 18;
  const unsigned short RELATIVE_PATH_PATTERN = 19;
  const unsigned short STEP_PATTERN = 20;

  interface Pattern;
  interface LocationPathPattern;
  interface RelativePathPattern;
  interface StepPattern;

  interface PatternFactory:ExprFactory{
    Pattern createPattern(in LocationPathPattern first);
    // idkey may be null, represents IdKeyPattern
    // if parent is true, it is '/', else '//'
    // rel may be null
    LocationPathPattern createLocationPathPattern(in FunctionCall idkey,
                                                  boolean parent,
                                                  in RelativePathPattern rel);
    // if parent is true, it is /, else //
    RelativePathPattern createRelativePathPattern(in RelativePathPattern rel,
                                                  boolean parent,
                                                  in StepPattern step);
    StepPattern createStepPattern(in AxisSpecifier axis,
                                  in NodeTest test,
                                  in PredicateList predicates);
  };

  typedef sequence<LocationPathPattern> LocationPathPatterns;

  interface Pattern:Expr{
    readonly attribute LocationPathPatterns patterns;
    void append(in LocationPathPattern pattern);
  };

  interface LocationPathPattern:Expr{
    readonly attribute FunctionCall idkey;
    readonly attribute boolean parent;
    readonly attribute RelativePathPattern relative_pattern;
  };

  interface RelativePathPattern:Expr{
    readonly attribute RelativePathPattern relative;
    readonly attribute boolean parent;
    readonly attribute StepPattern step;
  };

  interface StepPattern:Expr{
    readonly attribute AxisSpecifier axis;
    readonly attribute NodeTest test;
    readonly attribute PredicateList predicates;
  };

  interface XSLTParser:Parser{
    Pattern parsePattern(in DOMString pattern);
  };
};

From skip at mojam.com Sun Jan 28 22:40:28 2001
From: skip at mojam.com (Skip Montanaro)
Date: Sun, 28 Jan 2001 15:40:28 -0600 (CST)
Subject: [Python-Dev] What happened to Setup.local's functionality?
Message-ID: <14964.37324.642566.602319@beluga.mojam.com>

I just remembered Modules/Setup.local. I peeked at mine and noticed it had
been zeroed out.
I then copied a version of it over from another machine and reran make a couple times. Makesetup ran but nothing mentioned in Setup.local got built. I don't think 2.1 can be released without providing a way for users to recover from this change. I didn't see anything obvious in setup.py. Am I missing something? Skip From thomas at xs4all.net Mon Jan 29 01:39:04 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 29 Jan 2001 01:39:04 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.20,2.21 In-Reply-To: <20001104001415.A2093@53b.hoffleit.de>; from gregor@hoffleit.de on Sat, Nov 04, 2000 at 12:14:15AM +0100 References: <200010050142.SAA08326@slayer.i.sourceforge.net> <20001104001415.A2093@53b.hoffleit.de> Message-ID: <20010129013904.R962@xs4all.nl> On Sat, Nov 04, 2000 at 12:14:15AM +0100, Gregor Hoffleit wrote: > FYI: This misdefinition with LONG_BIT was due to a bug in glibc's limits.h. It > has been fixed in glibc 2.96. Do you mean gcc 2.96, or glibc 2.(1|2).96 ? Or is 2.96 some internal versioning for glibc that I was unaware of ? :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry at digicool.com Mon Jan 29 06:03:45 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 29 Jan 2001 00:03:45 -0500 Subject: [Python-Dev] Function Hash: Check it in? References: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <14964.63921.966960.445548@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> Attached is an example Python session after I patched the MZ> intepreter. The test-suite passes all right. MZ> I want an OK to check this in. Moshe, please remove the func_hash() and func_compare() functions, and if the patch passes the test suite, go ahead and check it all in. Please also check in a test case. Thanks, -Barry From barry at digicool.com Mon Jan 29 06:04:12 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Mon, 29 Jan 2001 00:04:12 -0500 Subject: [Python-Dev] Function Hash: Check it in? References: <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <14964.63948.492662.775413@anthem.wooz.org> Oh yeah, please also add an entry to the NEWS file. Thanks, -Barry From moshez at zadka.site.co.il Mon Jan 29 07:26:25 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 08:26:25 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <14964.63948.492662.775413@anthem.wooz.org> References: <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> Message-ID: <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 00:04:12 -0500, barry at digicool.com (Barry A. Warsaw) wrote: > Oh yeah, please also add an entry to the NEWS file. Done. The checkin to the NEWS file will be done in about a million years, when my antique of a modem finishes sending the data. I had to change test_opcodes since it tested that functions with the same code compare equal. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From gregor at mediasupervision.de Mon Jan 29 12:13:39 2001 From: gregor at mediasupervision.de (Gregor Hoffleit) Date: Mon, 29 Jan 2001 12:13:39 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.20,2.21 In-Reply-To: <20010129013904.R962@xs4all.nl>; from thomas@xs4all.net on Mon, Jan 29, 2001 at 01:39:04AM +0100 References: <200010050142.SAA08326@slayer.i.sourceforge.net> <20001104001415.A2093@53b.hoffleit.de> <20010129013904.R962@xs4all.nl> Message-ID: <20010129121339.A1166@mediasupervision.de> On Mon, Jan 29, 2001 at 01:39:04AM +0100, Thomas Wouters wrote: > On Sat, Nov 04, 2000 at 12:14:15AM +0100, Gregor Hoffleit wrote: > > FYI: This misdefinition with LONG_BIT was due to a bug in glibc's limits.h. 
It
> > has been fixed in glibc 2.96.
>
> Do you mean gcc 2.96, or glibc 2.(1|2).96 ? Or is 2.96 some internal
> versioning for glibc that I was unaware of ? :)

Sorry, it was fixed in glibc 2.1.96.

    Gregor

From mal at lemburg.com Mon Jan 29 12:31:11 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 12:31:11 +0100
Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25
References: <20010126170101.B2762@amarok.cnri.reston.va.us>
    <3A71F6ED.D6D642A7@lemburg.com>
    <001a01c087e6$ec3b9710$e46940d5@hagrid>
    <3A71FF5D.DC609775@lemburg.com>
    <200101271652.LAA30750@cj20424-a.reston1.va.home.com>
Message-ID: <3A75547F.A601E219@lemburg.com>

Guido van Rossum wrote:
>
> > revision 1.34
> > date: 2000/07/19 17:09:51; author: montanaro; state: Exp; lines: +18 -23
> > added rewritten normpath from Moshe Zadka that does the right thing with
> > paths containing ..
> [...]
> > Revision 1.33 clearly leaves initial slashes untouched.
> > I guess we should restore this...
>
> Yes, please! (Just the "leading extra slashes stay" behavior.)

Checked in a patch which preserves '/' and '//' but converts three or
more initial slashes into one (see Fredrik's note about the POSIX
standard on this).

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Mon Jan 29 13:24:15 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 13:24:15 +0100
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References: <200101271858.NAA04898@mira.erols.com>
Message-ID: <3A7560EF.39D6CF@lemburg.com>

Here are the results of my micro benchmark pybench 0.7:

PYBENCH 0.7

Benchmark: /home/lemburg/tmp/pybench-2.1a1.pyb (rounds=10, warp=20)

Tests:                      per run    per oper.   diff *
------------------------------------------------------------------------
    BuiltinFunctionCalls:  1102.30 ms    8.65 us   +7.56%
     BuiltinMethodLookup:   966.75 ms    1.84 us   +4.56%
           ConcatStrings:  1198.55 ms    7.99 us  +11.63%
           ConcatUnicode:  1835.60 ms   12.24 us  +19.29%
         CreateInstances:  1556.40 ms   37.06 us   +2.49%
 CreateStringsWithConcat:  1396.70 ms    6.98 us   +5.44%
 CreateUnicodeWithConcat:  1895.80 ms    9.48 us  +31.61%
            DictCreation:  1760.50 ms   11.74 us   +2.43%
                ForLoops:  1426.90 ms  142.69 us   -7.51%
              IfThenElse:  1155.25 ms    1.71 us   -6.24%
             ListSlicing:   555.40 ms  158.69 us   -4.14%
          NestedForLoops:   784.55 ms    2.24 us   -6.33%
    NormalClassAttribute:  1052.80 ms    1.75 us  -10.42%
 NormalInstanceAttribute:  1053.80 ms    1.76 us   +0.89%
     PythonFunctionCalls:  1127.50 ms    6.83 us  +12.56%
       PythonMethodCalls:   909.10 ms   12.12 us   +9.70%
               Recursion:   942.40 ms   75.39 us  +23.74%
            SecondImport:   924.20 ms   36.97 us   +3.98%
     SecondPackageImport:   951.10 ms   38.04 us   +6.16%
   SecondSubmoduleImport:  1211.30 ms   48.45 us   +7.69%
 SimpleComplexArithmetic:  1635.30 ms    7.43 us   +5.58%
  SimpleDictManipulation:   963.35 ms    3.21 us   -0.57%
   SimpleFloatArithmetic:   877.00 ms    1.59 us   -2.92%
SimpleIntFloatArithmetic:   851.10 ms    1.29 us   -5.89%
 SimpleIntegerArithmetic:   850.05 ms    1.29 us   -6.41%
  SimpleListManipulation:  1168.50 ms    4.33 us   +8.14%
    SimpleLongArithmetic:  1231.15 ms    7.46 us   +1.52%
              SmallLists:  2153.35 ms    8.44 us  +10.77%
             SmallTuples:  1314.65 ms    5.48 us   +3.80%
   SpecialClassAttribute:  1050.80 ms    1.75 us   +1.48%
SpecialInstanceAttribute:  1248.75 ms    2.08 us   -2.32%
          StringMappings:  1702.60 ms   13.51 us  +19.69%
        StringPredicates:  1024.25 ms    3.66 us  -25.49%
           StringSlicing:  1093.35 ms    6.25 us   +4.35%
               TryExcept:  1584.85 ms    1.06 us  -10.90%
          TryRaiseExcept:  1239.50 ms   82.63 us   +4.64%
            TupleSlicing:   983.00 ms    9.36 us   +3.36%
         UnicodeMappings:  1631.65 ms   90.65 us  +42.76%
       UnicodePredicates:  1762.10 ms    7.83 us  +15.99%
       UnicodeProperties:  1410.80 ms    7.05 us  +19.57%
          UnicodeSlicing:  1366.20 ms    7.81 us  +19.23%
------------------------------------------------------------------------
      Average round time: 58001.00 ms              +3.30%

*) measured against: /home/lemburg/tmp/pybench-2.0.pyb (rounds=10, warp=20)

The benchmark is available here in case someone wants to verify
the results on different platforms:

    http://www.lemburg.com/python/pybench-0.7.zip

The above tests were done on a Linux 2.2 system, AMD K6 233MHz.
The figures shown compare CVS Python (2.1a1) against stock
Python 2.0.

As you can see, Python function calls have suffered a lot for
some reason. Unicode mappings and other Unicode database related
methods show the effect of the compression of the Unicode
database -- a clear space/speed tradeoff.

I can't really explain why Unicode concatenation has had
a slowdown -- perhaps the new coercion logic has something to
do with this ?!

On the nice side: attribute lookups are faster; probably due
to the string key optimizations in the dictionary implementation.
Loops and exceptions are also a tad faster.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fredrik at pythonware.com Mon Jan 29 13:30:32 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 29 Jan 2001 13:30:32 +0100
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References: <200101271858.NAA04898@mira.erols.com>
    <3A7560EF.39D6CF@lemburg.com>
Message-ID: <01fc01c089ef$48072230$0900a8c0@SPIFF>

mal wrote:
>          UnicodeMappings:  1631.65 ms   90.65 us  +42.76%
>        UnicodePredicates:  1762.10 ms    7.83 us  +15.99%
>        UnicodeProperties:  1410.80 ms    7.05 us  +19.57%
>           UnicodeSlicing:  1366.20 ms    7.81 us  +19.23%
>
> Unicode mappings and other Unicode database related methods
> show the effect of the compression of the Unicode database -- a
> clear space/speed tradeoff.

umm.
the tests don't seem to test the "\N{name}" escapes, so the only thing that has changed in 2.1 is the "decomposition" method (used in the UnicodeProperties test). are you sure you're comparing against 2.0 final? Cheers /F From mal at lemburg.com Mon Jan 29 13:52:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jan 2001 13:52:12 +0100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> Message-ID: <3A75677C.E4FA82A0@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > UnicodeMappings: 1631.65 ms 90.65 us +42.76% > > UnicodePredicates: 1762.10 ms 7.83 us +15.99% > > UnicodeProperties: 1410.80 ms 7.05 us +19.57% > > UnicodeSlicing: 1366.20 ms 7.81 us +19.23% > > > > Unicode mappings and other Unicode database related methods > > show the effect of the compression of the Unicode database -- a > > clear space/speed tradeoff. > > umm. the tests don't seem to test the "\N{name}" escapes, so the > only thing that has changed in 2.1 is the "decomposition" method > (used in the UnicodeProperties test). The mappings figure surprised me too: the code has not changed, but the unicodetype_db.h look different. Don't know how this affects performance though. The differences could also be explained by a increase in Unicode object creation time (the concatenation is also a lot slower), so perhaps that's where we should look... > are you sure you're comparing against 2.0 final? Yes... after a check of the Makefile I found that I had compiled Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this makes a difference w/r to inlining of code. I'll recompile and rerun the benchmark. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim.one at home.com Mon Jan 29 13:56:49 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 07:56:49 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include Message-ID: [Ping] > dict[key] = 1 > if key in dict: ... > for key in dict: ... [Guido] > No chance of a time-machine escape, but I *can* say that I agree that > Ping's proposal makes a lot of sense. This is a reversal of my > previous opinion on this matter. (Take note -- those don't happen > very often! :-) > > First to submit a working patch gets a free copy of 2.1a2 and > subsequent releases, Thomas since submitted a patch to do the "if key in dict" part (which I reviewed and accepted, pending resolution of doc issues). It does not do the "for key in dict" part. It's not entirely clear whether you intended to approve that part too (I've simplified away many layers of quoting in the above ). In any case, nobody is working on that part. WRT that part, Ping produced some stats in: http://mail.python.org/pipermail/python-dev/2001-January/012106.html > How often do you write 'dict.has_key(x)'? (std lib says: 206) > How often do you write 'for x in dict.keys()'? (std lib says: 49) > > How often do you write 'x in dict.values()'? (std lib says: 0) > How often do you write 'for x in dict.values()'? (std lib says: 3) However, he did not report on occurrences of for k, v in dict.items() I'm not clear exactly which files he examined in the above, or how the counts were obtained. So I don't know how this compares: I counted 188 instances of the string ".items(" in 122 .py files, under the dist/ portion of current CVS. A number of those were assignment and return stmts, others were dict.items() in an arglist, and at least one was in a comment. 
After weeding those out, I was left with 153 legit "for" loops iterating
over x.items(). In all:

    153 iterating over x.items()
    118     "     over x.keys()
     17     "     over x.values()

So I conclude that iterating over .items() is significantly more common
than iterating over .keys().

On c.l.py about an hour ago, Thomas complained that two (out of two) of his
coworkers guessed wrong about what

    for x in dict:

would do, but didn't say what they *did* think it would do. Since Thomas
doesn't work with idiots, I'm guessing they *didn't* guess it would iterate
over either values or the lines of a freshly-opened file named "dict"
.

So if you did intend to approve "for x in dict" iterating over dict.keys(),
maybe you want to call me out on that "approval post" I forged under your
name.

falls-on-swords-so-often-there's-nothing-left-to-puncture-ly y'rs - tim

From mal at lemburg.com Mon Jan 29 14:18:52 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 14:18:52 +0100
Subject: [Python-Dev] Re: Python 2.1 slower than 2.0
References: <200101271858.NAA04898@mira.erols.com>
    <3A7560EF.39D6CF@lemburg.com>
    <01fc01c089ef$48072230$0900a8c0@SPIFF>
    <3A75677C.E4FA82A0@lemburg.com>
Message-ID: <3A756DBC.8EAC42F5@lemburg.com>

"M.-A. Lemburg" wrote:
>
> Fredrik Lundh wrote:
> >
> > mal wrote:
> > >          UnicodeMappings:  1631.65 ms   90.65 us  +42.76%
> > >        UnicodePredicates:  1762.10 ms    7.83 us  +15.99%
> > >        UnicodeProperties:  1410.80 ms    7.05 us  +19.57%
> > >           UnicodeSlicing:  1366.20 ms    7.81 us  +19.23%
> > >
> > > Unicode mappings and other Unicode database related methods
> > > show the effect of the compression of the Unicode database -- a
> > > clear space/speed tradeoff.
> >
> > umm. the tests don't seem to test the "\N{name}" escapes, so the
> > only thing that has changed in 2.1 is the "decomposition" method
> > (used in the UnicodeProperties test).
>
> The mappings figure surprised me too: the code has not changed,
> but the unicodetype_db.h look different. Don't know how this
> affects performance though.
>
> The differences could also be explained by a increase in Unicode
> object creation time (the concatenation is also a lot slower),
> so perhaps that's where we should look...
>
> > are you sure you're comparing against 2.0 final?
>
> Yes... after a check of the Makefile I found that I had compiled
> Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this makes
> a difference w/r to inlining of code. I'll recompile and rerun
> the benchmark.

Looks like there is an effect of choosing -O3 over -O2 (even
though not necessarily positive all the way); what results do you
get on Windows ?

--

PYBENCH 0.7

Benchmark: /home/lemburg/tmp/pybench-2.1a1.pyb (rounds=10, warp=20)

Tests:                      per run    per oper.   diff *
------------------------------------------------------------------------
    BuiltinFunctionCalls:  1065.10 ms    8.35 us   +3.93%
     BuiltinMethodLookup:  1286.30 ms    2.45 us  +39.12%
           ConcatStrings:  1243.30 ms    8.29 us  +15.80%
           ConcatUnicode:  1449.10 ms    9.66 us   -5.83%
         CreateInstances:  1639.25 ms   39.03 us   +7.95%
 CreateStringsWithConcat:  1453.45 ms    7.27 us   +9.73%
 CreateUnicodeWithConcat:  1558.45 ms    7.79 us   +8.19%
            DictCreation:  1869.35 ms   12.46 us   +8.77%
                ForLoops:  1526.85 ms  152.69 us   -1.03%
              IfThenElse:  1381.00 ms    2.05 us  +12.09%
             ListSlicing:   547.40 ms  156.40 us   -5.52%
          NestedForLoops:   824.50 ms    2.36 us   -1.56%
    NormalClassAttribute:  1233.55 ms    2.06 us   +4.96%
 NormalInstanceAttribute:  1215.50 ms    2.03 us  +16.37%
     PythonFunctionCalls:  1107.30 ms    6.71 us  +10.55%
       PythonMethodCalls:  1047.00 ms   13.96 us  +26.34%
               Recursion:   940.35 ms   75.23 us  +23.47%
            SecondImport:   894.05 ms   35.76 us   +0.59%
     SecondPackageImport:   915.05 ms   36.60 us   +2.14%
   SecondSubmoduleImport:  1131.10 ms   45.24 us   +0.56%
 SimpleComplexArithmetic:  1652.05 ms    7.51 us   +6.67%
  SimpleDictManipulation:  1150.25 ms    3.83 us  +18.72%
   SimpleFloatArithmetic:   889.65 ms    1.62 us   -1.52%
SimpleIntFloatArithmetic:   900.80 ms    1.36 us   -0.40%
 SimpleIntegerArithmetic:   901.75 ms    1.37 us   -0.72%
  SimpleListManipulation:  1125.40 ms    4.17 us   +4.15%
    SimpleLongArithmetic:  1305.15 ms    7.91 us   +7.62%
              SmallLists:  2102.85 ms    8.25 us   +8.18%
             SmallTuples:  1329.55 ms    5.54 us   +4.98%
   SpecialClassAttribute:  1234.60 ms    2.06 us  +19.23%
SpecialInstanceAttribute:  1422.55 ms    2.37 us  +11.28%
          StringMappings:  1585.55 ms   12.58 us  +11.46%
        StringPredicates:  1241.35 ms    4.43 us   -9.69%
           StringSlicing:  1206.20 ms    6.89 us  +15.12%
               TryExcept:  1764.35 ms    1.18 us   -0.81%
          TryRaiseExcept:  1217.40 ms   81.16 us   +2.77%
            TupleSlicing:   933.00 ms    8.89 us   -1.90%
         UnicodeMappings:  1137.35 ms   63.19 us   -0.49%
       UnicodePredicates:  1632.05 ms    7.25 us   +7.43%
       UnicodeProperties:  1244.05 ms    6.22 us   +5.44%
          UnicodeSlicing:  1252.10 ms    7.15 us   +9.27%
------------------------------------------------------------------------
      Average round time: 58804.00 ms              +4.73%

*) measured against: /home/lemburg/tmp/pybench-2.0.pyb (rounds=10, warp=20)

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Mon Jan 29 14:28:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 14:28:24 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
References:
Message-ID: <3A756FF8.B7185FA2@lemburg.com>

Tim Peters wrote:
>
> [Ping]
> > dict[key] = 1
> > if key in dict: ...
> > for key in dict: ...
>
> [Guido]
> > No chance of a time-machine escape, but I *can* say that I agree that
> > Ping's proposal makes a lot of sense. This is a reversal of my
> > previous opinion on this matter. (Take note -- those don't happen
> > very often! :-)
> >
> > First to submit a working patch gets a free copy of 2.1a2 and
> > subsequent releases,
>
> Thomas since submitted a patch to do the "if key in dict" part (which I
> reviewed and accepted, pending resolution of doc issues).
>
> It does not do the "for key in dict" part. It's not entirely clear whether
> you intended to approve that part too (I've simplified away many layers of
> quoting in the above ). In any case, nobody is working on that part.
>
> WRT that part, Ping produced some stats in:
>
> http://mail.python.org/pipermail/python-dev/2001-January/012106.html
>
> > How often do you write 'dict.has_key(x)'? (std lib says: 206)
> > How often do you write 'for x in dict.keys()'? (std lib says: 49)
> >
> > How often do you write 'x in dict.values()'? (std lib says: 0)
> > How often do you write 'for x in dict.values()'? (std lib says: 3)
>
> However, he did not report on occurrences of
>
>     for k, v in dict.items()
>
> I'm not clear exactly which files he examined in the above, or how the
> counts were obtained. So I don't know how this compares: I counted 188
> instances of the string ".items(" in 122 .py files, under the dist/ portion
> of current CVS. A number of those were assignment and return stmts, others
> were dict.items() in an arglist, and at least one was in a comment. After
> weeding those out, I was left with 153 legit "for" loops iterating over
> x.items(). In all:
>
>     153 iterating over x.items()
>     118     "     over x.keys()
>      17     "     over x.values()
>
> So I conclude that iterating over .items() is significantly more common
> than iterating over .keys().
>
> On c.l.py about an hour ago, Thomas complained that two (out of two) of his
> coworkers guessed wrong about what
>
>     for x in dict:
>
> would do, but didn't say what they *did* think it would do. Since Thomas
> doesn't work with idiots, I'm guessing they *didn't* guess it would iterate
> over either values or the lines of a freshly-opened file named "dict"
> .
>
> So if you did intend to approve "for x in dict" iterating over dict.keys(),
> maybe you want to call me out on that "approval post" I forged under your
> name.

Dictionaries are not sequences. I wonder what order a user of

    for k,v in dict:

(or whatever other of this proposal you choose) will expect...

Please also take into account that dictionaries are *mutable* and their internal state is not defined to e.g. not change due to lookups (take the string optimization for example...), so exposing PyDict_Next() in any to Python will cause trouble. In the end, you will need to create a list or tuple to iterate over one way or another, so why bother overloading for-loops w/r to dictionaries ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bckfnn at worldonline.dk Mon Jan 29 14:48:44 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Mon, 29 Jan 2001 13:48:44 GMT Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> References: <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> Message-ID: <3a75747e.17414620@smtp.worldonline.dk> On Mon, 29 Jan 2001 08:26:25 +0200 (IST), you wrote: >I had to change test_opcodes since it tested that functions with the >same code compare equal. Thanks. With this change, Jython too can complete the test_opcodes. In Jython a code object can never compare equal to anything but itself. regards, finn From moshez at zadka.site.co.il Mon Jan 29 15:04:47 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Mon, 29 Jan 2001 16:04:47 +0200 (IST) Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: <3a75747e.17414620@smtp.worldonline.dk> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> Message-ID: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> On Mon, 29 Jan 2001 13:48:44 GMT, bckfnn at worldonline.dk (Finn Bock) wrote: > Thanks. 
With this change, Jython too can complete the test_opcodes. In > Jython a code object can never compare equal to anything but itself. Great! I'm happy to have helped. I'm starting to wonder what the tests really test: the language definition, or accidents of the implementation? -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From MarkH at ActiveState.com Mon Jan 29 15:35:25 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 30 Jan 2001 01:35:25 +1100 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A756DBC.8EAC42F5@lemburg.com> Message-ID: "M.-A. Lemburg" wrote: > what results do you get on Windows ? Win2k, dual 800, relatively quiet! Python 2.0 F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.847605 This machine benchmarks at 11798 pystones/second F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.845104 This machine benchmarks at 11832.9 pystones/second F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.846069 This machine benchmarks at 11819.4 pystones/second F:\src\Python-2.0\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.849447 This machine benchmarks at 11772.4 pystones/second Python from CVS today: F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.885801 This machine benchmarks at 11289.2 pystones/second F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.889048 This machine benchmarks at 11248 pystones/second F:\src\python-cvs\PCbuild>python ..\lib\test\pystone.py Pystone(1.1) time for 10000 passes = 0.892422 This machine benchmarks at 11205.5 pystones/second Although I deleted Tim's earlier mail, from memory this is pretty similar in terms of performance lost. 
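A rough way to put a number on "performance lost" is to average the pystone runs quoted above and compare the two builds. This is only a sketch over the figures in this message; the `mean` helper and variable names are made up for illustration.

```python
# Pystone results (pystones/second) as quoted in the message above.
py20_runs = [11798, 11832.9, 11819.4, 11772.4]   # Python 2.0
cvs_runs = [11289.2, 11248, 11205.5]             # Python from CVS

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Relative slowdown of the CVS build versus 2.0, in percent.
slowdown_pct = (1 - mean(cvs_runs) / mean(py20_runs)) * 100
print("CVS build is about %.1f%% slower on pystone" % slowdown_pct)
```

On these seven runs the difference comes out a little under 5%, which is small enough that compiler flags and machine noise could plausibly account for part of it.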
I'm afraid I have no idea what your benchmarks are or how to build them, but I did check that the optimizer is set for "maximize for speed" (/O2). Other compiler options gave significantly smaller results (no optimizations around 8500, and "optimize for space" (/O1) at around 10000). Other fiddling with the optimizer couldn't get better results than the existing settings. Mark. From guido at digicool.com Mon Jan 29 15:48:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 09:48:22 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 07:56:49 EST." References: Message-ID: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> > [Ping] > > dict[key] = 1 > > if key in dict: ... > > for key in dict: ... > > [Guido] > > No chance of a time-machine escape, but I *can* say that I agree that > > Ping's proposal makes a lot of sense. This is a reversal of my > > previous opinion on this matter. (Take note -- those don't happen > > very often! :-) > > > > First to submit a working patch gets a free copy of 2.1a2 and > > subsequent releases, > > Thomas since submitted a patch to do the "if key in dict" part (which I > reviewed and accepted, pending resolution of doc issues). > > It does not do the "for key in dict" part. It's not entirely clear whether > you intended to approve that part too (I've simplified away many layers of > quoting in the above ). In any case, nobody is working on that part. > > WRT that part, Ping produced some stats in: > > http://mail.python.org/pipermail/python-dev/2001-January/012106.html > > > How often do you write 'dict.has_key(x)'? (std lib says: 206) > > How often do you write 'for x in dict.keys()'? (std lib says: 49) > > > > How often do you write 'x in dict.values()'? (std lib says: 0) > > How often do you write 'for x in dict.values()'?
(std lib says: 3) > > However, he did not report on occurrences of > > for k, v in dict.items() > > I'm not clear exactly which files he examined in the above, or how the > counts were obtained. So I don't know how this compares: I counted 188 > instances of the string ".items(" in 122 .py files, under the dist/ portion > of current CVS. A number of those were assignment and return stmts, others > were dict.items() in an arglist, and at least one was in a comment. After > weeding those out, I was left with 153 legit "for" loops iterating over > x.items(). In all: > > 153 iterating over x.items() > 118 " over x.keys() > 17 " over x.values() > > So I conclude that iterating over .values() is significantly more common > than iterating over .keys(). I did a less sophisticated count but came to the same conclusion: iterations over items() are (somewhat) more common than over keys(), and values() are 1-2 orders of magnitude less common. My numbers: $ cd python/src/Lib $ grep 'for .*items():' *.py | wc -l 47 $ grep 'for .*keys():' *.py | wc -l 43 $ grep 'for .*values():' *.py | wc -l 2 > On c.l.py about an hour ago, Thomas complained that two (out of two) of his > coworkers guessed wrong about what > > for x in dict: > > would do, but didn't say what they *did* think it would do. Since Thomas > doesn't work with idiots, I'm guessing they *didn't* guess it would iterate > over either values or the lines of a freshly-opened file named "dict" > . I don't attach much value to the readability argument: typically, one will write "for key in dict" or "for name in dict" and then it's obvious what is meant. > So if you did intend to approve "for x in dict" iterating over dict.keys(), > maybe you want to call me out on that "approval post" I forged under your > name. But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody has even asked me for a has_item() method). I can live with "x in list" checking the values and "x in dict" checking the keys.
But I can *not* live with "x in dict" equivalent to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()". I also think that defining "x in dict" but not "for x in dict" will be confusing. So we need to think more. How about: for key in dict: ... # ... over keys for key:value in dict: ... # ... over items This is syntactically unambiguous (a colon is currently illegal in that position). This also suggests: for index:value in list: ... # ... over zip(range(len(list)), list) which doesn't strike me as bad or ugly, and would fulfill my brother's dearest wish. (And why didn't we think of this before?) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Mon Jan 29 15:58:16 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 29 Jan 2001 15:58:16 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291448.JAA11473@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:48:22AM -0500 References: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> Message-ID: <20010129155816.T962@xs4all.nl> On Mon, Jan 29, 2001 at 09:48:22AM -0500, Guido van Rossum wrote: > How about: > for key in dict: ... # ... over keys > for key:value in dict: ... # ... over items > This is syntactically unambiguous (a colon is currently illegal in > that position). I won't comment on the syntax right now, I need to look at it for a while first :-) However, what about MAL's point about dict ordering, internally ? Wouldn't FOR_LOOP be forced to generate a list of keys anyway, to avoid skipping keys ? I know currently the dict implementation doesn't do any reordering except during adds/deletes, but there is nothing in the language ref that supports that -- it's an implementation detail. Would we make a future enhancement where (some form of) gc would 'clean up' large dictionaries impossible ? -- Thomas Wouters Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! From guido at digicool.com Mon Jan 29 16:00:38 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:00:38 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Mon, 29 Jan 2001 14:28:24 +0100." <3A756FF8.B7185FA2@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> Message-ID: <200101291500.KAA11569@cj20424-a.reston1.va.home.com> > Dictionaries are not sequences. I wonder what order a user of > for k,v in dict: (or whatever other of this proposal you choose) > will expect... The same order that for k,v in dict.items() will yield, of course. > Please also take into account that dictionaries are *mutable* > and their internal state is not defined to e.g. not change due to > lookups (take the string optimization for example...), so exposing > PyDict_Next() in any to Python will cause trouble. In the end, > you will need to create a list or tuple to iterate over one way > or another, so why bother overloading for-loops w/r to dictionaries ? Actually, I was going to propose to play dangerously here: the for k:v in dict: ... syntax I proposed in my previous message should indeed expose PyDict_Next(). It should be a big speed-up, and I'm expecting (though don't have much proof) that most loops over dicts don't mutate the dict. Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 29 16:30:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:30:17 -0500 Subject: [Python-Dev] Function Hash: Check it in? In-Reply-To: Your message of "Mon, 29 Jan 2001 16:04:47 +0200." 
<20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il> Message-ID: <200101291530.KAA12037@cj20424-a.reston1.va.home.com> > I'm starting to wonder what the tests really test: the language definition, > or accidents of the implementation? It's good to test conformance to the language definition, but this is also a regression test for the implementation. The "accidents of the implementation" definitely need to be tested. E.g. if we decide that repr(s) uses \n rather than \012 or \x0a, this should be tested too. The language definition gives the implementer a choice here; but once the implementer has made a choice, it's good to have a test that tests that this choice is implemented correctly. Perhaps there should be several parts to the regression test, e.g. language conformance, library conformance, platform-specific features, and implementation conformance? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jan 29 16:57:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 10:57:12 -0500 Subject: [Python-Dev] What happened to Setup.local's functionality? In-Reply-To: Your message of "Sun, 28 Jan 2001 15:40:28 CST." <14964.37324.642566.602319@beluga.mojam.com> References: <14964.37324.642566.602319@beluga.mojam.com> Message-ID: <200101291557.KAA12347@cj20424-a.reston1.va.home.com> > I just remembered Modules/Setup.local. I peeked at mine and noticed it had > been zeroed out. I then copied a version of it over from another machine > and reran make a couple times. Makesetup ran but nothing mentioned in > Setup.local got built. > > I don't think 2.1 can be released without providing a way for users to > recover from this change. 
I didn't see anything obvious in setup.py. Am I > missing something? Well, Modules/Setup is still used, so it should be trivial to add Setup.local back too. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Mon Jan 29 10:23:55 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 29 Jan 2001 01:23:55 -0800 Subject: [Python-Dev] What happened to Setup.local's functionality? In-Reply-To: <14964.37324.642566.602319@beluga.mojam.com>; from skip@mojam.com on Sun, Jan 28, 2001 at 03:40:28PM -0600 References: <14964.37324.642566.602319@beluga.mojam.com> Message-ID: <20010129012355.A14763@glacier.fnational.com> On Sun, Jan 28, 2001 at 03:40:28PM -0600, Skip Montanaro wrote: > Makesetup ran but nothing mentioned in Setup.local got built. I believe Setup.local should still work. One possibility is that the modules in Setup.local were marked as shared. Shared modules from Setup* don't get built by default. You have to do "make oldsharedmods". I'm not sure why oldsharedmods is not included in the all target. Andrew, can you think of any reason why it shouldn't be added? Neil From dgoodger at atsautomation.com Mon Jan 29 17:19:12 2001 From: dgoodger at atsautomation.com (Goodger, David) Date: Mon, 29 Jan 2001 11:19:12 -0500 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 Message-ID: Marc-Andre Lemburg's patch to posixpath.py clears up the path problem. Thanks! MACHDEP is qnxJ for QNX 4.25, qnxG for QNX 4.23. I don't know what it is for QNX 6 (Neutrino). Perhaps test for MACHDEP[:3]=='qnx'?
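The prefix test suggested above could look something like the following sketch. The function name is hypothetical, and where the MACHDEP string actually comes from at run time (for instance `sys.platform`) is an assumption, so the value is passed in as a plain string.

```python
def is_qnx(machdep):
    """Return True for any QNX MACHDEP string, e.g. 'qnxJ' or 'qnxG'.

    Only the first three characters are compared, so a release suffix
    that is not known in advance (such as the one for QNX 6) would
    still match as long as the string starts with 'qnx'.
    """
    return machdep[:3] == 'qnx'

# 'qnxJ' and 'qnxG' are the values reported above; 'linux2' is a contrast case.
for name in ('qnxJ', 'qnxG', 'linux2'):
    print(name, is_qnx(name))
```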
I'm still stuck at 'python setup.py build': unable to execute ld: no such file or directory running build running build_ext building 'struct' extension skipping //5/tmp/py/Python-2.1a1/Modules/structmodule.c (build/temp.qnx-J-PCI-2.1/structmodule.o up-to-date) ld build/temp.qnx-J-PCI-2.1/structmodule.o -L/usr/local/lib -o build/lib.qnx-J-PCI-2.1/struct.so error: command 'ld' failed with exit status 1 make: *** [sharedmods] Error 1 Armin Steinhoff said "QNX4 doesn't support dynamic loading". Is this compatible with distutils? If not, is there a workaround? Neil Schemenauer asked, "what should LDSHARED say for QNX?". I don't know. Python 2.0 compiled OK, and its makefile says LDSHARED=ld. However, Modules/Setup has no uncommented "*shared*" line. Those of us who rely on Python to get our work done, and who don't have the bandwidth for the implementation complexities, owe a lot to everyone who makes it possible to compile Python out-of-the-box. Very much appreciated. Thank you! David Goodger Systems Administrator & Programmer, Advanced Systems Automation Tooling Systems Inc., Automation Systems Division direct: (519) 653-4483 ext. 7121 fax: (519) 650-6695 e-mail: dgoodger at atsautomation.com From nas at arctrix.com Mon Jan 29 10:40:07 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 29 Jan 2001 01:40:07 -0800 Subject: [Python-Dev] problems building Python 2.1a1 on QNX 4.25 In-Reply-To: ; from dgoodger@atsautomation.com on Mon, Jan 29, 2001 at 11:19:12AM -0500 References: Message-ID: <20010129014007.C14763@glacier.fnational.com> On Mon, Jan 29, 2001 at 11:19:12AM -0500, Goodger, David wrote: > I'm still stuck at 'python setup.py build': ... > Armin Steinhoff said "QNX4 doesn't support dynamic loading". Is this > compatible with distutils? If not, is there a workaround? The setup.py script only builds shared modules. You're going to have to enable modules using the old Setup file.
I think Setup.dist should go back to including all the modules (commented out of course). This would make it easier for people who can't or don't want to build shared modules. Neil From akuchlin at cnri.reston.va.us Mon Jan 29 17:50:31 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 29 Jan 2001 11:50:31 -0500 Subject: [Python-Dev] What happened to Setup.local's functionality? In-Reply-To: <20010129012355.A14763@glacier.fnational.com>; from nas@arctrix.com on Mon, Jan 29, 2001 at 01:23:55AM -0800 References: <14964.37324.642566.602319@beluga.mojam.com> <20010129012355.A14763@glacier.fnational.com> Message-ID: <20010129115031.B4018@amarok.cnri.reston.va.us> On Mon, Jan 29, 2001 at 01:23:55AM -0800, Neil Schemenauer wrote: >from Setup* don't get built by default. You have to do "make >oldsharedmods". I'm not sure why oldsharedmods is not included >in the all target. Andrew, can you think of any reason why it >shouldn't be added? That's an excellent idea, particularly if we add back Setup.dist, too, and comment out all but the required modules. I'll try to do that today. Note that I'm leaving on vacation tomorrow, and will be back next Monday. Everyone, feel free to check in changes to setup.py that are required. --amk From jeremy at alum.mit.edu Mon Jan 29 17:48:11 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 11:48:11 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A75677C.E4FA82A0@lemburg.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> Message-ID: <14965.40651.233438.311104@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: MAL> Yes... after a check of the Makefile I found that I had MAL> compiled Python 2.0 with -O3 and 2.1a1 with -O2 -- perhaps this MAL> makes a difference w/r to inlining of code. I'll recompile and MAL> rerun the benchmark.
When I was working on the CALL_FUNCTION revision, I compared 2.0 final with my development working using -O3. At that time, I saw no significant performance difference between the two. And I did notice a difference between -O2 and -O3. The strange thing is that I notice a difference between -O2 and -O3 with 2.1a1, but in the opposite direction. On pystone, python -O2 runs consistently faster than -O3; the difference is .05 sec on my machine. Jeremy From esr at thyrsus.com Mon Jan 29 18:12:05 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 29 Jan 2001 12:12:05 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.40651.233438.311104@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 29, 2001 at 11:48:11AM -0500 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> Message-ID: <20010129121205.A8337@thyrsus.com> Jeremy Hylton : > The strange thing is that I notice a difference between -O2 and -O3 > with 2.1a1, but in the opposite direction. On pystone, python -O2 > runs consistently faster than -O3; the difference is .05 sec on my > machine. Bizarre. Makes me wonder if we have a C compiler problem. -- Eric S. Raymond In every country and in every age, the priest has been hostile to liberty. He is always in alliance with the despot, abetting his abuses in return for protection to his own.
-- Thomas Jefferson, 1814 From jeremy at alum.mit.edu Mon Jan 29 18:27:08 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 12:27:08 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <20010129121205.A8337@thyrsus.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> <20010129121205.A8337@thyrsus.com> Message-ID: <14965.42988.362288.154254@localhost.localdomain> >>>>> "ESR" == Eric S Raymond writes: ESR> Jeremy Hylton : >> The strange thing is that I notice a difference between -O2 and >> -O3 with 2.1a1, but in the opposite direction. On pystone, >> python -O2 runs consistently faster than -O3; the difference is >> .05 sec on my machine. ESR> Bizarre. Makes me wonder if we have a C compiler problem. Depends on your definition of "compiler problem" . If you mean, it compiles our code so it runs slower, then, yes, we've got one :-). One of the differences between -O2 and -O3, according to the man page, is that -O3 will perform optimizations that involve a space-speed tradeoff. It also includes -finline-functions. I can imagine that some of these optimizations hurt memory performance enough to make a difference.
not-really-understanding-but-not-really-expecting-too-ly y'rs, Jeremy From jeremy at alum.mit.edu Mon Jan 29 18:39:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 12:39:05 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.40651.233438.311104@localhost.localdomain> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> Message-ID: <14965.43705.367236.994786@localhost.localdomain> The recursion test in pybench is testing the performance of the nested scopes changes, which must do some extra bookkeeping to reference the recursive function in a nested scope. To some extent, a performance hit is a necessary consequence for nested functions with free variables. Nonetheless, there are two interesting things to say about this situation. First, there is a bug in the current implementation of nested scopes that the benchmark tickles. The problem is with code like this: def outer(): global f def f(x): if x > 0: return f(x - 1) The compiler determines that f is free in f. (It's recursive.) If f is free in f, in the absence of the global decl, the body of outer must allocate fresh storage (a cell) for f each time outer is called and add a reference to that cell to f's closure. If f is declared global in outer, then it ought to be treated as a global in nested scopes, too. In general terms, a free variable should use the binding found in the nearest enclosing scope. If the nearest enclosing scope has a global binding, then the reference is global. If I fix this problem, the recursion benchmark shouldn't be any slower than a normal function call. The second interesting thing to say is that frame allocation and dealloc is probably more expensive than it needs to be in the current implementation. 
The frame object has a new f_closure slot that holds a tuple that is freshly allocated every time the frame is allocated. (Unless the closure is empty, then f_closure is just NULL.) The extra tuple allocation can probably be done away with by using the same allocation strategy as locals & stack. If the f_localsplus array holds cells + frees + locals + stack, then a new frame will never require more than a single malloc (and often not even that). Jeremy From akuchlin at cnri.reston.va.us Mon Jan 29 18:54:37 2001 From: akuchlin at cnri.reston.va.us (Andrew M. Kuchling) Date: Mon, 29 Jan 2001 12:54:37 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.42988.362288.154254@localhost.localdomain>; from jeremy@alum.mit.edu on Mon, Jan 29, 2001 at 12:27:08PM -0500 References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <14965.40651.233438.311104@localhost.localdomain> <20010129121205.A8337@thyrsus.com> <14965.42988.362288.154254@localhost.localdomain> Message-ID: <20010129125437.E4018@amarok.cnri.reston.va.us> On Mon, Jan 29, 2001 at 12:27:08PM -0500, Jeremy Hylton wrote: >Depends on your defintion of "compiler problem" . If you mean, >it compiles our code so it runs slower, then, yes, we've got one :-). 
Compiling with gcc and -g, with no optimization, 2.0 and 2.1cvs seem to be very close, with 2.1 slightly slower: 2.0: Pystone(1.1) time for 10000 passes = 1.04 This machine benchmarks at 9615.38 pystones/second This machine benchmarks at 9345.79 pystones/second This machine benchmarks at 9433.96 pystones/second This machine benchmarks at 9433.96 pystones/second This machine benchmarks at 9523.81 pystones/second 2.1cvs: Pystone(1.1) time for 10000 passes = 1.09 This machine benchmarks at 9174.31 pystones/second This machine benchmarks at 9090.91 pystones/second This machine benchmarks at 9259.26 pystones/second This machine benchmarks at 9174.31 pystones/second This machine benchmarks at 9090.91 pystones/second Would it be worth experimenting with platform-specific compiler options to try to squeeze out the last bit of performance (can wait for the betas, probably). --amk From jeremy at alum.mit.edu Mon Jan 29 19:04:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jan 2001 13:04:28 -0500 (EST) Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <3A756DBC.8EAC42F5@lemburg.com> References: <200101271858.NAA04898@mira.erols.com> <3A7560EF.39D6CF@lemburg.com> <01fc01c089ef$48072230$0900a8c0@SPIFF> <3A75677C.E4FA82A0@lemburg.com> <3A756DBC.8EAC42F5@lemburg.com> Message-ID: <14965.45228.197778.579989@localhost.localdomain> I hope another set of benchmarks isn't overkill for the list. I see different results comparing 2.1 with 2.0 (both -O3) using pybench 0.6. The interesting differences I see in this benchmark that I didn't see in MAL's are: DictCreation +15.87% SeoncdImport +20.29% Other curious differences, which show up in both benchmarks, include: SpecialClassAttribute +17.91% (private variables) SpecialInstanceAttribute +15.34% (__methods__) Jeremy PYBENCH 0.6 Benchmark: py21 (rounds=10, warp=20) Tests: per run per oper. 
diff * ------------------------------------------------------------------------ BuiltinFunctionCalls: 305.05 ms 2.39 us +4.77% BuiltinMethodLookup: 319.65 ms 0.61 us +2.55% ConcatStrings: 383.70 ms 2.56 us +1.27% CreateInstances: 463.85 ms 11.04 us +1.96% CreateStringsWithConcat: 381.20 ms 1.91 us +2.39% DictCreation: 508.85 ms 3.39 us +15.87% ForLoops: 577.60 ms 57.76 us +5.65% IfThenElse: 443.70 ms 0.66 us +1.02% ListSlicing: 207.50 ms 59.29 us -4.18% NestedForLoops: 315.75 ms 0.90 us +3.54% NormalClassAttribute: 379.80 ms 0.63 us +7.39% NormalInstanceAttribute: 385.45 ms 0.64 us +8.04% PythonFunctionCalls: 400.00 ms 2.42 us +13.62% PythonMethodCalls: 306.25 ms 4.08 us +5.13% Recursion: 337.25 ms 26.98 us +19.00% SecondImport: 301.20 ms 12.05 us +20.29% SecondPackageImport: 298.20 ms 11.93 us +18.15% SecondSubmoduleImport: 339.15 ms 13.57 us +11.40% SimpleComplexArithmetic: 392.70 ms 1.79 us -10.52% SimpleDictManipulation: 350.40 ms 1.17 us +3.87% SimpleFloatArithmetic: 300.75 ms 0.55 us +2.04% SimpleIntFloatArithmetic: 347.95 ms 0.53 us +9.01% SimpleIntegerArithmetic: 356.40 ms 0.54 us +12.01% SimpleListManipulation: 351.85 ms 1.30 us +11.33% SimpleLongArithmetic: 309.00 ms 1.87 us -5.81% SmallLists: 584.25 ms 2.29 us +10.20% SmallTuples: 442.00 ms 1.84 us +10.33% SpecialClassAttribute: 406.50 ms 0.68 us +17.91% SpecialInstanceAttribute: 557.40 ms 0.93 us +15.34% StringSlicing: 336.45 ms 1.92 us +9.56% TryExcept: 650.60 ms 0.43 us +1.40% TryRaiseExcept: 345.95 ms 23.06 us +2.70% TupleSlicing: 266.35 ms 2.54 us +4.70% ------------------------------------------------------------------------ Average round time: 14413.00 ms +7.07% *) measured against: py20 (rounds=10, warp=20) From skip at mojam.com Mon Jan 29 19:07:26 2001 From: skip at mojam.com (Skip Montanaro) Date: Mon, 29 Jan 2001 12:07:26 -0600 (CST) Subject: [Python-Dev] What happened to Setup.local's functionality? 
In-Reply-To: <20010129012355.A14763@glacier.fnational.com> References: <14964.37324.642566.602319@beluga.mojam.com> <20010129012355.A14763@glacier.fnational.com> Message-ID: <14965.45406.933528.53857@beluga.mojam.com> Neil> You have to do "make oldsharedmods". This did the trick. This should be emblazoned in big red letters somewhere if the decision is made to not include oldsharedmods as a dependency for the all target. Thx, Skip From gvwilson at ca.baltimore.com Mon Jan 29 19:19:21 2001 From: gvwilson at ca.baltimore.com (Greg Wilson) Date: Mon, 29 Jan 2001 13:19:21 -0500 Subject: [Python-Dev] Re: Re: Sets: elt in dict, lst.include In-Reply-To: <20010129162012.32158ED49@mail.python.org> Message-ID: <001501c08a20$00dca2a0$770a0a0a@nevex.com> > > > [Ping] > > > dict[key] = 1 > > > if key in dict: ... > > > for key in dict: ... > "Tim Peters" > "if (k, v) in dict" is clearly useless... > I can live with "x in list" checking the values and "x in dict" > checking the keys. But I can *not* live with "x in dict" equivalent > to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()". > I also think that defining "x in dict" but not "for x in dict" will be > confusing. [Greg] Quick poll (four people): if the expression "if a in b" works, then all four expected "for a in b" to work as well. This is also my intuition; are there any exceptions in really existing Python? > [Guido] > for key in dict: ... # ... over keys > for key:value in dict: ... # ... over items [Greg] I'm probably revealing my ignorance of Python's internals here, but can the iteration protocol be extended so that the object (in this case, the dict) is told the number and type(s) of the values the loop is expecting? With: for key in dict: ... the dict would be asked for one value; with: for (key, value) in dict: the dict would be told that a two-element tuple was expected, and so on. This would allow multi-dimensional structures (e.g. 
NumPy arrays) to do things like:

    for (i, j, k) in array:        # please give me three indices

and:

    for ((i, j, k), v) in array:   # three indices and value

> [Guido]
> for index:value in list: ... # ... over zip(range(len(list)), list)

How do you feel about:

    for i in seq.keys():   # strings, tuples, etc.

"keys()" is kind of strange ("indices" or something would be more
natural), *but* this allows uniform iteration over all built-in
collections:

    def showem(c):
        for i in c.keys():
            print i, c[i]

Greg

From bckfnn at worldonline.dk Mon Jan 29 19:31:48 2001
From: bckfnn at worldonline.dk (Finn Bock)
Date: Mon, 29 Jan 2001 18:31:48 GMT
Subject: [Python-Dev] Function Hash: Check it in?
In-Reply-To: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il>
References: <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il>
Message-ID: <3a75aba9.31537178@smtp.worldonline.dk>

On Mon, 29 Jan 2001 16:04:47 +0200 (IST), you wrote:

>On Mon, 29 Jan 2001 13:48:44 GMT, bckfnn at worldonline.dk (Finn Bock) wrote:
>
>> Thanks. With this change, Jython too can complete the test_opcodes. In
>> Jython a code object can never compare equal to anything but itself.
>
>Great! I'm happy to have helped.
>I'm starting to wonder what the tests really test: the language definition,
>or accidents of the implementation?

Based on the amount of code in test_opcodes dedicated to code
comparison, I doubt this particular situation was an accident.

The problems I have had with the test suite are better described as
accidents of the tests themselves. From test_extcall:

    We expected (repr): "g() got multiple values for keyword argument 'b'"
    But instead we got: "g() got multiple values for keyword argument 'a'"

This is caused by a difference in iteration over a dictionary.
Or from test_import:

    test test_import crashed -- java.lang.ClassFormatError:
    java.lang.ClassFormatError: @test$py (Illegal Class name "@test$py")

where '@' isn't allowed in Java class names.

These are failures that have very little to do with the thing the
tests are about and nothing at all to do with the language definition.

regards,
finn

From cgw at alum.mit.edu Mon Jan 29 19:35:58 2001
From: cgw at alum.mit.edu (Charles G Waldman)
Date: Mon, 29 Jan 2001 12:35:58 -0600 (CST)
Subject: [Python-Dev] Re: Re: Sets: elt in dict, lst.include
In-Reply-To: <001501c08a20$00dca2a0$770a0a0a@nevex.com>
References: <20010129162012.32158ED49@mail.python.org> <001501c08a20$00dca2a0$770a0a0a@nevex.com>
Message-ID: <14965.47118.135246.700571@sirius.net.home>

Greg Wilson writes:

> This would allow multi-dimensional structures
> (e.g. NumPy arrays) to do things like:
>
> for (i, j, k) in array: # please give me three indices
>
> and:
>
> for ((i, j, k), v) in array: # three indices and value

And what if I had, for example, a 3-dimensional array where the values
are 3-tuples? Would "for (i,j,k) in array" refer to the indices or
the values?

From mal at lemburg.com Mon Jan 29 20:03:41 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 20:03:41 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
Message-ID: <3A75BE8D.1B7673EE@lemburg.com>

With all this confusion about how to actually write the iteration on
dictionary items, wouldn't it make more sense to implement an
extension module which then provides a __getitem__ style iterator for
dictionaries by interfacing to PyDict_Next() ?

The module could have three different iterators:

1. iterate over items
2. ... over keys
3. ... over values

The reasoning behind this is that the __getitem__ interface is well
established and this doesn't introduce any new syntax while still
providing speed and flexibility.
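
A rough pure-Python sketch of the three iterators listed above, offered
only as an illustration. The proposal is for a C extension module
built on PyDict_Next(); this stand-in instead copies the dict's
contents into a list up front, so it shows the __getitem__ interface
but none of the speed. The class names ItemsIter, KeysIter and
ValuesIter are hypothetical and do not come from any existing module.

```python
class _SnapshotIter:
    """Hypothetical base class for __getitem__-style dict iterators.

    A for loop over an object that defines __getitem__ but no __iter__
    calls it with 0, 1, 2, ... until IndexError is raised.
    """

    def __init__(self, values):
        # Assumption: a snapshot is taken, so later changes to the
        # dict do not affect an iteration already in progress.
        self._snapshot = list(values)

    def __getitem__(self, index):
        # The list raises IndexError past the end, which ends the loop.
        return self._snapshot[index]


class ItemsIter(_SnapshotIter):
    def __init__(self, d):
        _SnapshotIter.__init__(self, d.items())


class KeysIter(_SnapshotIter):
    def __init__(self, d):
        _SnapshotIter.__init__(self, d.keys())


class ValuesIter(_SnapshotIter):
    def __init__(self, d):
        _SnapshotIter.__init__(self, d.values())


phone = {"guido": 4127, "tim": 4098}
for name, number in ItemsIter(phone):
    print("%s -> %d" % (name, number))
```

No new syntax is involved: the for loop drives each object through
plain integer indexing, which is the point of the proposal.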
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Mon Jan 29 19:08:16 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 29 Jan 2001 19:08:16 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
Message-ID: <3A75B190.3FD2A883@lemburg.com>

Guido van Rossum wrote:
>
> > Dictionaries are not sequences. I wonder what order a user of
> > for k,v in dict: (or whatever other of this proposal you choose)
> > will expect...
>
> The same order that for k,v in dict.items() will yield, of course.

And then people find out that the order has some sorting
properties and start to use it... "how to sort a dictionary?"
comes up again, every now and then.

> > Please also take into account that dictionaries are *mutable*
> > and their internal state is not defined to e.g. not change due to
> > lookups (take the string optimization for example...), so exposing
> > PyDict_Next() in any way to Python will cause trouble. In the end,
> > you will need to create a list or tuple to iterate over one way
> > or another, so why bother overloading for-loops w/r to dictionaries ?
>
> Actually, I was going to propose to play dangerously here: the
>
> for k:v in dict: ...
>
> syntax I proposed in my previous message should indeed expose
> PyDict_Next(). It should be a big speed-up, and I'm expecting (though
> don't have much proof) that most loops over dicts don't mutate the
> dict.
>
> Maybe we could add a flag to the dict that issues an error when a new
> key is inserted during such a for loop? (I don't think the key order
> can be affected when a key is *deleted*.)

You mean: mark it read-only ? That would be a "nice to have"
property for a lot of mutable types indeed -- sort of like
low-level locks.
This would be another candidate for an object flag
(much like the one Fred wants to introduce for weak referenced
objects).

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From guido at digicool.com Mon Jan 29 20:22:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 14:22:07 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: Your message of "Mon, 29 Jan 2001 19:08:16 +0100." <3A75B190.3FD2A883@lemburg.com>
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com>
Message-ID: <200101291922.OAA13321@cj20424-a.reston1.va.home.com>

> > > Dictionaries are not sequences. I wonder what order a user of
> > > for k,v in dict: (or whatever other of this proposal you choose)
> > > will expect...
> >
> > The same order that for k,v in dict.items() will yield, of course.
>
> And then people find out that the order has some sorting
> properties and start to use it... "how to sort a dictionary?"
> comes up again, every now and then.

I don't understand why you bring this up. We're not revealing
anything new here, the random order of dict items has always been part
of the language. The answer to "how to sort a dict" should be "copy
it into a list and sort that."

Or am I missing something?

> > > Please also take into account that dictionaries are *mutable*
> > > and their internal state is not defined to e.g. not change due to
> > > lookups (take the string optimization for example...), so exposing
> > > PyDict_Next() in any way to Python will cause trouble. In the end,
> > > you will need to create a list or tuple to iterate over one way
> > > or another, so why bother overloading for-loops w/r to dictionaries ?
> >
> > Actually, I was going to propose to play dangerously here: the
> >
> > for k:v in dict: ...
> >
> > syntax I proposed in my previous message should indeed expose
> > PyDict_Next(). It should be a big speed-up, and I'm expecting (though
> > don't have much proof) that most loops over dicts don't mutate the
> > dict.
> >
> > Maybe we could add a flag to the dict that issues an error when a new
> > key is inserted during such a for loop? (I don't think the key order
> > can be affected when a key is *deleted*.)
>
> You mean: mark it read-only ? That would be a "nice to have"
> property for a lot of mutable types indeed -- sort of like
> low-level locks. This would be another candidate for an object flag
> (much like the one Fred wants to introduce for weak referenced
> objects).

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From gvwilson at ca.baltimore.com Mon Jan 29 20:38:50 2001
From: gvwilson at ca.baltimore.com (Greg Wilson)
Date: Mon, 29 Jan 2001 14:38:50 -0500
Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1124 - 13 msgs
In-Reply-To: <20010129193101.7BF83EF62@mail.python.org>
Message-ID: <001a01c08a2b$1ba5a040$770a0a0a@nevex.com>

> Greg Wilson writes:
> > This would allow multi-dimensional structures
> > (e.g. NumPy arrays) to do things like:
> > for (i, j, k) in array:
> > and:
> > for ((i, j, k), v) in array: # three indices and value

> Charles Waldman asks:
> And what if I had, for example, a 3-dimensional array where the values
> are 3-tuples? Would "for (i,j,k) in array" refer to the
> indices or the values?

Greg Wilson writes:
That would be up to the module's implementer --- my idea was to have
the 'for' loop provide more information to the object being iterated
over, so that it could "do the right thing" (just as objects do right
now with "x[i]").

Greg

From mal at lemburg.com Mon Jan 29 20:45:46 2001
From: mal at lemburg.com (M.-A.
Lemburg)
Date: Mon, 29 Jan 2001 20:45:46 +0100
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
Message-ID: <3A75C86A.3A4236E8@lemburg.com>

Guido van Rossum wrote:
>
> > > > Dictionaries are not sequences. I wonder what order a user of
> > > > for k,v in dict: (or whatever other of this proposal you choose)
> > > > will expect...
> > >
> > > The same order that for k,v in dict.items() will yield, of course.
> >
> > And then people find out that the order has some sorting
> > properties and start to use it... "how to sort a dictionary?"
> > comes up again, every now and then.
>
> I don't understand why you bring this up. We're not revealing
> anything new here, the random order of dict items has always been part
> of the language. The answer to "how to sort a dict" should be "copy
> it into a list and sort that."
>
> Or am I missing something?

I just wanted to hint at a problem which iterating over items
in an unordered set can cause. Especially new Python users will find
it confusing that the order of the items in an iteration can change
from one run to the next.

Not much of an argument, but I like explicit programming more
than magic under the cover.

What we really want is iterators for dictionaries, so why not
implement these instead of tweaking for-loops.

If you are looking for speedups w/r to for-loops, applying a
different indexing technique in for-loops would go a lot further
and provide better performance not only to dictionary loops,
but also to other sequences.

I have made some good experience with a special counter object
(sort of like a mutable integer) which is used instead of the
iteration index integer in the current implementation.
Using an iterator object instead of the integer + __getitem__
call machinery would allow more flexibility for all kinds of
sequences or containers. There could be an iterator type for
dictionaries, one for generic __getitem__ style sequences, one
for lists and tuples, etc. All of these could include special
logic to get the most out of the targeted datatype.

Well, just a thought...

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From esr at thyrsus.com Mon Jan 29 21:02:47 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 29 Jan 2001 15:02:47 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <200101291922.OAA13321@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 02:22:07PM -0500
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
Message-ID: <20010129150247.B10191@thyrsus.com>

Guido van Rossum :
> > > Maybe we could add a flag to the dict that issues an error when a new
> > > key is inserted during such a for loop? (I don't think the key order
> > > can be affected when a key is *deleted*.)
> >
> > You mean: mark it read-only ? That would be a "nice to have"
> > property for a lot of mutable types indeed -- sort of like
> > low-level locks. This would be another candidate for an object flag
> > (much like the one Fred wants to introduce for weak referenced
> > objects).
>
> Yes.

For different reasons, I'd like to be able to set a constant flag on an
object instance. Simple semantics: if you try to assign to a
member or method, it throws an exception.

Application? I have a large Python program that goes to a lot of effort
to build elaborate context structures in core.
It would be nice to know
they can't be even inadvertently trashed without throwing an exception
I can watch for.

--
Eric S. Raymond

No one is bound to obey an unconstitutional law and no courts are bound
to enforce it.
-- 16 Am. Jur. Sec. 177 late 2d, Sec 256

From esr at thyrsus.com Mon Jan 29 21:09:14 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 29 Jan 2001 15:09:14 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <3A75C86A.3A4236E8@lemburg.com>; from mal@lemburg.com on Mon, Jan 29, 2001 at 08:45:46PM +0100
References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com>
Message-ID: <20010129150914.C10191@thyrsus.com>

M.-A. Lemburg :
> If you are looking for speedups w/r to for-loops, applying a
> different indexing technique in for-loops would go a lot further
> and provide better performance not only to dictionary loops,
> but also to other sequences.

Which reminds me...

There's not much I miss from C these days, but one thing I wish Python
had is a more general for-loop. The C semantics that let you have
any initialization, any termination test, and any iteration you like
are rather cool.

Yes, I realize that

    for (; ; ) {}

can be simulated with:

    while 1:
        if :
            break

Still, having them spatially grouped the way a C for does it is nice.
Makes it easier to see invariants, I think.

--
Eric S. Raymond

"Rightful liberty is unobstructed action, according to our will, within limits
drawn around us by the equal rights of others."
-- Thomas Jefferson

From moshez at zadka.site.co.il Mon Jan 29 21:29:53 2001
From: moshez at zadka.site.co.il (Moshe Zadka)
Date: Mon, 29 Jan 2001 22:29:53 +0200 (IST)
Subject: [Python-Dev] Function Hash: Check it in?
In-Reply-To: <200101291530.KAA12037@cj20424-a.reston1.va.home.com>
References: <200101291530.KAA12037@cj20424-a.reston1.va.home.com>, <3a75747e.17414620@smtp.worldonline.dk>, <14964.63948.492662.775413@anthem.wooz.org>, <20010127191428.D71ADA840@darjeeling.zadka.site.co.il> <20010129062625.3A35DA840@darjeeling.zadka.site.co.il> <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il>
Message-ID: <20010129202953.D1498A840@darjeeling.zadka.site.co.il>

On Mon, 29 Jan 2001 10:30:17 -0500, Guido van Rossum wrote:

> It's good to test conformance to the language definition, but this is
> also a regression test for the implementation. The "accidents of the
> implementation" definitely need to be tested. E.g. if we decide that
> repr(s) uses \n rather than \012 or \x0a, this should be tested too.
> The language definition gives the implementer a choice here; but once
> the implementer has made a choice, it's good to have a test that tests
> that this choice is implemented correctly.

I agree.

> Perhaps there should be several parts to the regression test,
> e.g. language conformance, library conformance, platform-specific
> features, and implementation conformance?

This sounds like a good idea...probably for the 2.2 timeline.

--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From tim.one at home.com Mon Jan 29 22:51:56 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 16:51:56 -0500
Subject: [Python-Dev] Function Hash: Check it in?
In-Reply-To: <20010129140447.AE0E4A840@darjeeling.zadka.site.co.il>
Message-ID:

[Moshe Zadka]
> ...
> I'm starting to wonder what the tests really test: the language
> definition, or accidents of the implementation?

You'd be amazed (appalled?) at how hard it is to separate them.
In two previous lives as a Big Iron compiler hacker, we routinely had
to get our compilers validated by a govt agency before any US govt
account would be allowed to buy our stuff; e.g.,

    http://www.itl.nist.gov/div897/ctg/vpl/language.htm

This usually *started* as a two-day process, flying the inspector to
our headquarters, taking perhaps 2 minutes of machine time to run the
test suite, then sitting around that day and into the next arguing
about whether the "failures" were due to non-standard assumptions in
the tests, or compiler bugs. It was almost always the former, but
sometimes that didn't get fully resolved for months (if the inspector
was being particularly troublesome, it could require getting an
Official Interpretation from the relevant stds body -- not swift!).
(BTW, this is one reason huge customers are often very reluctant to
move to a new release: the validation process can be very expensive
and drag on for months)

>>> def f():
...     global g
...     g += 1
...     return g
...
>>> g = 0
>>> d = {f(): f()}
>>> d
{2: 1}
>>>

The Python Lang Ref doesn't really say whether {2: 1} or {1: 2}
"should be" the result, nor does it say it's implementation-defined.
If you *asked* Guido what he thought it should do, he'd probably say
{1: 2} (not much of a guess: I asked him in the past, and that's what
he did say ).

Something "like that" can show up in the test suite, but buried under
layers of obfuscating accidents. Nobody is likely to realize it in
the absence of a failure motivating people to search for it.

Which is a trap: sometimes ours was the only compiler (of dozens and
dozens) that had *ever* "failed" a particular test. This was most
often the case at Cray Research, which had bizarre (but exceedingly
fast -- which is what Cray's customers valued most) floating-point
arithmetic. I recall one test in particular that failed because
Cray's was the only box on earth that set I to 1 in

    INTEGER I
    I = 6.0/3.0

Fortran doesn't define that the result must be 2.
But-- you guessed it --neither does Python.

Cute: at KSR, INT(6.0/3.0) did return 2 -- but INT(98./49.) did not .

then-again-the-python-test-suite-is-still-shallow-ly y'rs - tim

From hughett at mercur.uphs.upenn.edu Mon Jan 29 23:05:22 2001
From: hughett at mercur.uphs.upenn.edu (Paul Hughett)
Date: Mon, 29 Jan 2001 17:05:22 -0500
Subject: [Python-Dev] Function Hash: Check it in?
In-Reply-To:
References:
Message-ID: <200101292205.RAA18790@mercur.uphs.upenn.edu>

tim says:

> Cray's was the only box on earth that set I to 1 in

> INTEGER I
> I = 6.0/3.0

> Fortran doesn't define that the result must be 2. But-- you guessed
> it --neither does Python.

I would _guess_ that the IEEE 754 floating point standard does require
that, but I haven't actually gotten my hands on a copy of the standard
yet. If it doesn't, I may have to stop writing code that depends on
the assumption that floating point computation is exact for exactly
representable integers. If so, then we're reasonably safe; there
aren't many non-IEEE machines left these days.

Un-lurking-ly yours,

Paul Hughett

From tim.one at home.com Mon Jan 29 23:53:43 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 17:53:43 -0500
Subject: [Python-Dev] Function Hash: Check it in?
In-Reply-To: <200101292205.RAA18790@mercur.uphs.upenn.edu>
Message-ID:

[Paul Hughett]
> I would _guess_ that the IEEE 754 floating point standard does require
> that [6./3. == 2.],

It does, but 754 is silent on how languages may or may not *bind* to
its semantics. The C99 std finally addresses that (15 years after
754), and Java does too (albeit in a way Kahan despises), but that's
about it for "name brand" languages.

> ...
> If it doesn't, I may have to stop writing code that depends on
> the assumption that floating point computation is exact for exactly
> representable integers. If so, then we're reasonably safe; there
> aren't many non-IEEE machines left these days.
I'm afraid you've got no guarantees even on a box with 100% conforming
754 hardware. One of the last "mystery bugs" I helped track down at
my previous employer only showed up under Intel's C++ compiler. It
turned out the compiler was looking for code of the form:

    double *a, *b, scale;
    for (i=0; i < n; ++i) {
        a[i] = b[i] / scale;
    }

and rewriting it as:

    double __temp = 1./scale;
    for (i=0; i < n; ++i) {
        a[i] = b[i] * __temp;
    }

for speed. As time goes on, PC compilers are becoming more and more
like Cray's and KSR's in this respect: float division is much more
expensive than float mult, and so variations of "so multiply by the
reciprocal instead" are hard for vendors to resist. And, e.g., under
754 double rules,

    (17. * 123.) * (1./123.)

must *not* yield exactly 17.0 if done wholly in 754 double (but then
754 says nothing about how any language maps that string to 754
operations).

if-you-like-logic-chopping-you'll-love-arguing-stds-ly y'rs - tim

From guido at digicool.com Tue Jan 30 00:59:34 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 18:59:34 -0500
Subject: [Python-Dev] Does autoconfig detect INSTALL incorrectly?
In-Reply-To: Your message of "Tue, 23 Jan 2001 00:30:56 PST." <20010123003056.A28309@glacier.fnational.com>
References: <20010123003056.A28309@glacier.fnational.com>
Message-ID: <200101292359.SAA20364@cj20424-a.reston1.va.home.com>

> Why is the configure.in file set to always use "install-sh"?
> There is a comment that says:
>
> # Install just never works :-(
>
> I don't think that statement is accurate. /usr/bin/install works
> quite well on my machine. The only comments I can find in the
> changelog are:
>
> revision 1.16
> date: 1995/01/20 14:12:16; author: guido; state: Exp; lines: +27 -2
> add INSTALL_PROGRAM and INSTALL_DATA; check for getopt
>
> and:
>
> revision 1.5
> date: 1994/08/19 15:33:51; author: guido; state: Exp; lines: +14 -6
> Simplify value of INSTALL (always 'cp').
>
> Is there any reason why the autoconf macro AC_PROG_INSTALL is not used? The
> documentation seems to indicate that it does what we want.

Neil,

It's too long for me to remember, and I bet this was before
AC_PROG_INSTALL. If there's a reason to prefer a working "install"
over install-sh, feel free to do the right thing!

(You're in charge of the Makefile anyway now, it seems. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at mojam.com Tue Jan 30 01:17:25 2001
From: skip at mojam.com (Skip Montanaro)
Date: Mon, 29 Jan 2001 18:17:25 -0600 (CST)
Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP
Message-ID: <14966.2069.950895.627663@beluga.mojam.com>

After reading through this thread and noticing (but not paying close
attention to) all the related posts on c.l.py (subject: "in for dicts"), it
seems to me that the whole "if/for something in dict" thing needs to be
hashed out in a PEP. There were a fair amount of "Python's changing too
fast" rants when 2.0 was released. Adding a major feature such as this at
the 2.1 stage is only going to generate that many more rants. The fact that
it was easy for Thomas to implement "if key in dict" doesn't make the
overall concept less controversial. There are apparently lots of varying
opinions about what's reasonable. This topic seems related to PEP 212 (Loop
Counter Iteration) and PEP 218 (Adding a Built-In Set Object Type), but may
well warrant its own.

That said, I have plenty enough on my plate trying to keep Mojam afloat
these days, so I can't step into the crevasse, just observe that it looks to
me like a very long ways to the bottom... ;-)

Skip

From guido at digicool.com Tue Jan 30 01:22:58 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 19:22:58 -0500
Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP
In-Reply-To: Your message of "Mon, 29 Jan 2001 18:17:25 CST."
<14966.2069.950895.627663@beluga.mojam.com>
References: <14966.2069.950895.627663@beluga.mojam.com>
Message-ID: <200101300022.TAA21244@cj20424-a.reston1.va.home.com>

> After reading through this thread and noticing (but not paying close
> attention to) all the related posts on c.l.py (subject: "in for dicts"), it
> seems to me that the whole "if/for something in dict" thing needs to be
> hashed out in a PEP. There were a fair amount of "Python's changing too
> fast" rants when 2.0 was released. Adding a major feature such as this at
> the 2.1 stage is only going to generate that many more rants. The fact that
> it was easy for Thomas to implement "if key in dict" doesn't make the
> overall concept less controversial. There are apparently lots of varying
> opinions about what's reasonable. This topic seems related to PEP 212 (Loop
> Counter Iteration) and PEP 218 (Adding a Built-In Set Object Type), but may
> well warrant its own.

Excellent. Good reminder also that this shouldn't go into 2.1 --
clearly the design space is too complicated for a quick decision.

> That said, I have plenty enough on my plate trying to keep Mojam afloat
> these days, so I can't step into the crevasse, just observe that it looks to
> me like a very long ways to the bottom... ;-)

I'm not able to lead such a PEP effort myself either, but I hope
*someone* will be. This PEP has a good chance for 2.2 though (what
with BDFL approval and all :-).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Tue Jan 30 02:39:17 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 20:39:17 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <200101291448.JAA11473@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> I did a less sophisticated count but come to the same conclusion:
> iterations over items() are (somewhat) more common than over keys(),
> and values() are 1-2 orders of magnitude less common.
> My numbers:
>
> $ cd python/src/Lib
> $ grep 'for .*items():' *.py | wc -l
> 47
> $ grep 'for .*keys():' *.py | wc -l
> 43
> $ grep 'for .*values():' *.py | wc -l
> 2

I like my larger sample and anal methodology better . A closer look
showed that it may have been unduly biased by the mass of files in
Lib/encodings/, where

    encoding_map = {}
    for k,v in decoding_map.items():
        encoding_map[v] = k

is at the end of most files (btw, MAL, that's the answer to your
question: people would expect "the same" ordering you expected there,
i.e. none in particular).

> ...
> I don't much value to the readability argument: typically, one will
> write "for key in dict" or "for name in dict" and then it's obvious
> what is meant.

Well, "fiddlesticks" comes to mind <0.9 wink>. If I've got a dict
mapping phone numbers to names, "for name in dict" is dead backwards.

    for vevent in keydefs.keys():
    for x in self.subdirs.keys():
    for name in lsumdict.keys():
    for locale in self.descriptions.keys():
    for name in attrs.keys():
    for func in other.top_level.keys():
    for func in target.keys():
    for i in u2.keys():
    for s in d.keys():
    for url in self.bad.keys():

are other cases in the CVS tree where I don't think the name makes it
obvious in the absence of ".keys()".

But I don't personally give any weight to whether people can guess
what something does at first glance. My rule is that it doesn't
matter, provided it's (a) easy to learn; and (especially), (b) hard to
*forget* once you've learned it. A classic example is Python's
"points between elements" treatment of slice indices: few people guess
right what that does at first glance, but once they "get it" they're
delighted and rarely mess up again.

And I think this is "like that".

> ...
> But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody
> has even asked me for a has_item() method).

Yup.

> I can live with "x in list" checking the values and "x in dict"
> checking the keys.
> But I can *not* live with "x in dict" equivalent
> to "dict.has_key(x)" if "for x in dict" would mean
> "for x in dict.items()".

That's why I brought it up -- it's not entirely clear what's to be
done here.

> I also think that defining "x in dict" but not "for x in dict" will
> be confusing.
>
> So we need to think more.

The hoped-for next step indeed.

> How about:
>
> for key in dict: ... # ... over keys
>
> for key:value in dict: ... # ... over items
>
> This is syntactically unambiguous (a colon is currently illegal in
> that position).

Cool! Can we resist adding

    if key:value in dict

for "parallelism"? (I know I can ...)

2/3rd of these are marginally more attractive:

    for key: in dict:     # over dict.keys()
    for :value in dict:   # over dict.values()
    for : in dict:        # a delay loop

> This also suggests:
>
> for index:value in list: ... # ... over zip(range(len(list)), list)
>
> which doesn't strike me as bad or ugly, and would fulfill my brother's
> dearest wish.

You mean besides the one that you fry in hell for not adding "for ...
indexing"? Ya, probably.

> (And why didn't we think of this before?)

Best guess: we were focused exclusively on sequences, and a colon just
didn't suggest itself in that context. Second-best guess: having
finally approved one of these gimmicks, you finally got desperate
enough to make it work .

ponderingly y'rs - tim

From tim.one at home.com Tue Jan 30 02:58:59 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 20:58:59 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> ...
> I'm expecting (though don't have much proof) that most loops over
> dicts don't mutate the dict.

Safe bet! I do recall writing one once: it del'ed keys for which the
associated count was 1, because the rest of the algorithm was only
interested in duplicates.
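
A minimal sketch of that kind of loop, with made-up data rather than
anything from the program recalled above. It assumes the usual
precaution of iterating over a copy of the keys, so that deleting
entries cannot disturb the iteration:

```python
# Hypothetical occurrence counts; only entries seen more than once matter.
counts = {"spam": 1, "eggs": 3, "ham": 1, "toast": 2}

# Loop over a snapshot of the keys, so the dict itself can be mutated.
for key in list(counts.keys()):
    if counts[key] == 1:
        del counts[key]   # drop everything that is not a duplicate

print(sorted(counts.items()))
```

A loop that walked the dict's internal table directly could not take
this shortcut, which is what makes the flag discussed here interesting.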
> Maybe we could add a flag to the dict that issues an error when a new
> key is inserted during such a for loop? (I don't think the key order
> can be affected when a key is *deleted*.)

That latter is true but specific to this implementation. "Can't
mutate the dict period" is easier to keep straight, and probably
harmless in practice (if not, it could be relaxed later).

Recall that a similar trick is played during list.sort(), replacing
the list's type pointer for the duration (to point to an internal
"immutable list" type, same as the list type except the "dangerous"
slots point to a function that raises an "immutable list" TypeError).
Then no runtime expense is incurred for regular lists to keep checking
flags. I thought of this as an elegant use for switching types at
runtime; you may still be appalled by it, though!

From tim.one at home.com Tue Jan 30 03:07:36 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 21:07:36 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <3A75B190.3FD2A883@lemburg.com>
Message-ID:

[Guido]
> The same order that for k,v in dict.items() will yield, of course.

[MAL]
> And then people find out that the order has some sorting
> properties and start to use it...

Except that it has none. dict insertion has never used any comparison
outcome beyond "equal"/"not equal", so any ordering you think you see
is-- and always was --an illusion.

From guido at digicool.com Tue Jan 30 03:06:35 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 21:06:35 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: Your message of "Mon, 29 Jan 2001 20:39:17 EST."
References:
Message-ID: <200101300206.VAA21925@cj20424-a.reston1.va.home.com>

This is all PEP material now. Tim, do you want to own the PEP? It
seems just up your alley!

> Cool! Can we resist adding
>
> if key:value in dict
>
> for "parallelism"? (I know I can ...)
That's easy to resist because, unlike ``for key:value in dict'', it's
not unambiguous: ``if key:value in dict'' is already legal syntax
currently, with 'key' as the condition and 'value in dict' as the (not
particularly useful) body of the if statement.

> > (And why didn't we think of this before?)
>
> Best guess: we were focused exclusively on sequences, and a colon just
> didn't suggest itself in that context.  Second-best guess: having finally
> approved one of these gimmicks, you finally got desperate enough to make it
> work .

I'm certainly more comfortable with just ``for key in dict'' than with
the whole slew of extensions using colons.  But, again, that's for the
PEP to fight over.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Tue Jan 30 03:15:04 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 21:15:04 -0500
Subject: [Python-Dev] C's for statement
In-Reply-To: Your message of "Mon, 29 Jan 2001 15:09:14 EST."
    <20010129150914.C10191@thyrsus.com>
References: <3A756FF8.B7185FA2@lemburg.com>
    <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
    <3A75B190.3FD2A883@lemburg.com>
    <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
    <3A75C86A.3A4236E8@lemburg.com>
    <20010129150914.C10191@thyrsus.com>
Message-ID: <200101300215.VAA21955@cj20424-a.reston1.va.home.com>

[ESR]
> There's not much I miss from C these days, but one thing I wish Python
> had is a more general for-loop.  The C semantics that let you have
> any initialization, any termination test, and any iteration you like
> are rather cool.
>
> Yes, I realize that
>
>     for (; ; ) {}
>
> can be simulated with:
>
>
>     while 1:
>         if :
>             break
>
>
> Still, having them spatially grouped the way a C for does it is nice.
> Makes it easier to see invariants, I think.

Hm, I've seen too many ugly C for loops to have much appreciation for
it.  I can recognize and appreciate the few common forms that clearly
iterate over an array; most other forms look rather contorted to me.
Check out the Python C sources; if you find anything more complicated
than ``for (i = n; i > 0; i--)'' I probably didn't write it. :-)

Common abominations include:

- writing a while loop as for(;;)

- putting arbitrary initialization code in

- having an empty condition, so the becomes an arbitrary
  extension of the body that's written out-of-sequence

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Tue Jan 30 03:19:12 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 29 Jan 2001 21:19:12 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <3A75C86A.3A4236E8@lemburg.com>
Message-ID:

[MAL]
> I just wanted to hint at a problem which iterating over items
> in an unordered set can cause. Especially new Python users will find
> it confusing that the order of the items in an iteration can change
> from one run to the next.

Do they find "for k, v in dict.items()" confusing now?  Would be the same.

> ...
> What we really want is iterators for dictionaries, so why not
> implement these instead of tweaking for-loops.

Seems an unrelated topic: would "iterators for dictionaries" solve the
supposed problem with iteration order?

> If you are looking for speedups w/r to for-loops, applying a
> different indexing technique in for-loops would go a lot further
> and provide better performance not only to dictionary loops,
> but also to other sequences.
>
> I have made some good experience with a special counter object
> (sort of like a mutable integer) which is used instead of the
> iteration index integer in the current implementation.

Please quantify, if possible.  My belief (based on past experiments) is that
in loops fancier than

    for i in range(n):
        pass

the loop overhead quickly falls into the noise even now.
> Using an iterator object instead of the integer + __getitem__ > call machinery would allow more flexibility for all kinds of > sequences or containers. ... This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new iteration *protocol* could have major attractions. From guido at digicool.com Tue Jan 30 03:17:27 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:17:27 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: Your message of "Mon, 29 Jan 2001 15:02:47 EST." <20010129150247.B10191@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> Message-ID: <200101300217.VAA21978@cj20424-a.reston1.va.home.com> [ESR] > For different reasons, I'd like to be able to set a constant flag on a > object instance. Simple semantics: if you try to assign to a > member or method, it throws an exception. > > Application? I have a large Python program that goes to a lot of effort > to build elaborate context structures in core. It would be nice to know > they can't be even inadvertently trashed without throwing an exception I > can watch for. Yes, this is a good thing. Easy to do on lists and dicts. Questions: - How to spell it? x.freeze()? x.readonly()? - Should this reversible? I.e. should there be an x.unfreeze()? - Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue Jan 30 03:29:25 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:29:25 -0500 Subject: [Python-Dev] C's for statement In-Reply-To: <200101300215.VAA21955@cj20424-a.reston1.va.home.com> Message-ID: Check out SETL's loop statement. I think Perl5 is a subset of it <0.9 wink>. 
From esr at thyrsus.com Tue Jan 30 03:34:01 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 29 Jan 2001 21:34:01 -0500
Subject: [Python-Dev] Re: C's for statement
In-Reply-To: <200101300215.VAA21955@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:15:04PM -0500
References: <3A756FF8.B7185FA2@lemburg.com>
    <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
    <3A75B190.3FD2A883@lemburg.com>
    <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
    <3A75C86A.3A4236E8@lemburg.com>
    <20010129150914.C10191@thyrsus.com>
    <200101300215.VAA21955@cj20424-a.reston1.va.home.com>
Message-ID: <20010129213401.A17235@thyrsus.com>

Guido van Rossum :
> Common abominations include:
>
> - writing a while loop as for(;;)

Agreed.  Bletch.

> - putting arbitrary initialization code in

Not sure what's "arbitrary", unless you mean unrelated to the
iteration variable.

> - having an empty condition, so the becomes an arbitrary
>   extension of the body that's written out-of-sequence

Again agreed.  Double bletch.

I guess my archetype of the cute C for-loop is the idiom for
pointer-list traversal:

    struct foo {int data; struct foo *next;} *ptr, *head;

    for (ptr = head; ptr; ptr = ptr->next)
        do_something_with(ptr->data);

This is elegant.  It separates the logic for list traversal from the
operation on the list element.

Not the highest on my list of wants -- I'd sooner have ?: back.  I submitted
a patch for that once, and the discussion sort of died.  Were you dead
set against it, or should I revive this proposal?
--
        Eric S. Raymond

"The bearing of arms is the essential medium through which the
individual asserts both his social power and his participation in
politics as a responsible moral being..."
        -- J.G.A. Pocock, describing the beliefs of the founders of the U.S.

From esr at thyrsus.com Tue Jan 30 03:49:59 2001
From: esr at thyrsus.com (Eric S.
Raymond)
Date: Mon, 29 Jan 2001 21:49:59 -0500
Subject: [Python-Dev] Re: Making mutable objects readonly
In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:17:27PM -0500
References: <3A756FF8.B7185FA2@lemburg.com>
    <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
    <3A75B190.3FD2A883@lemburg.com>
    <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
    <20010129150247.B10191@thyrsus.com>
    <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
Message-ID: <20010129214959.B17235@thyrsus.com>

Guido van Rossum :
> Yes, this is a good thing.  Easy to do on lists and dicts.  Questions:
>
> - How to spell it?  x.freeze()?  x.readonly()?

I like "freeze", it's a clear imperative where "readonly()" sounds
like a test (e.g. "is this readonly()?")

> - Should we support something like this for instances too?  Sometimes
>   it might be cool to be able to freeze changing attribute values...

Moshe Zadka sent me a hack that handles instances:

> class MarkableAsConstant:
>
>     def __init__(self):
>         self.mark_writable()
>
>     def __setattr__(self, name, value):
>         if self._writable:
>             self.__dict__[name] = value
>         else:
>             raise ValueError, "object is read only"
>
>     def mark_writable(self):
>         self.__dict__['_writable'] = 1
>
>     def mark_readonly(self):
>         self.__dict__['_writable'] = 0

> - Should this reversible?  I.e. should there be an x.unfreeze()?

I gave this some thought earlier today.  There are advantages to either
way.  Making freeze a one-way operation would make it possible to use
freezing to get certain kinds of security and integrity guarantees that
you can't have if freezing is reversible.

Fortunately, there's a semantics that captures both.  If we allow
freeze to take an optional key argument, and require that an unfreeze
call must supply the same key or fail, we get both worlds.  We can
even one-way-hash the keys so they don't have to be stored in the
bytecode.

Want to lock a structure permanently?  Pick a random long key.
Freeze with it. Then throw that key away... -- Eric S. Raymond Strict gun laws are about as effective as strict drug laws...It pains me to say this, but the NRA seems to be right: The cities and states that have the toughest gun laws have the most murder and mayhem. -- Mike Royko, Chicago Tribune From tim.one at home.com Tue Jan 30 03:57:59 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 21:57:59 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Yes, this is a good thing. Easy to do on lists and dicts. Questions: > > - How to spell it? x.freeze()? x.readonly()? See below. > - Should this reversible? Of course. Or x.freeze(solid=1) to default to permanent rigidity, but not require it. > I.e. should there be an x.unfreeze()? That conveniently answers the first question, since x.unreadonly() reads horribly . > - Should we support something like this for instances too? Sometimes > it might be cool to be able to freeze changing attribute values... "Should be" supported for every mutable object. Next step: as in endless C++ debates, endless Python debates about "representation freeze" vs "logical freeze" ("well, yes, I'm changing this member, but it's just an invisible cache so I *should* be able to tag the object as const anyway ..."; etc etc etc). keep-it-simple-ly y'rs - tim From guido at digicool.com Tue Jan 30 03:57:24 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 29 Jan 2001 21:57:24 -0500 Subject: [Python-Dev] Re: C's for statement In-Reply-To: Your message of "Mon, 29 Jan 2001 21:34:01 EST." 
<20010129213401.A17235@thyrsus.com>
References: <3A756FF8.B7185FA2@lemburg.com>
    <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
    <3A75B190.3FD2A883@lemburg.com>
    <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
    <3A75C86A.3A4236E8@lemburg.com>
    <20010129150914.C10191@thyrsus.com>
    <200101300215.VAA21955@cj20424-a.reston1.va.home.com>
    <20010129213401.A17235@thyrsus.com>
Message-ID: <200101300257.VAA22186@cj20424-a.reston1.va.home.com>

> > - putting arbitrary initialization code in
>
> Not sure what's "arbitrary", unless you mean unrelated to the
> iteration variable.

Yes, that.

> I guess my archetype of the cute C for-loop is the idiom for
> pointer-list traversal:
>
>     struct foo {int data; struct foo *next;} *ptr, *head;
>
>     for (ptr = head; ptr; ptr = ptr->next)
>         do_something_with(ptr->data);
>
> This is elegant.  It separates the logic for list traversal from the
> operation on the list element.

And it rarely happens in Python, because sequences are rarely
represented as linked lists.

> Not the highest on my list of wants -- I'd sooner have ?: back.  I submitted
> a patch for that once, and the discussion sort of died.  Were you dead
> set against it, or should I revive this proposal?

Not dead set against something like it, but dead set against the ?:
syntax because then : becomes too overloaded for the human reader, e.g.:

    if foo ? bar : bletch : spam = eggs

If you want to revive this, I strongly suggest writing a PEP first
before posting here.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Tue Jan 30 03:59:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 29 Jan 2001 21:59:17 -0500
Subject: [Python-Dev] Re: Making mutable objects readonly
In-Reply-To: Your message of "Mon, 29 Jan 2001 21:49:59 EST."
<20010129214959.B17235@thyrsus.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <20010129214959.B17235@thyrsus.com> Message-ID: <200101300259.VAA22208@cj20424-a.reston1.va.home.com> > > - How to spell it? x.freeze()? x.readonly()? > > I like "freeze", it'a a clear imperative where "readonly()" sounds > like a test (e.g. "is this readonly()?") Agreed. > > - Should we support something like this for instances too? Sometimes > > it might be cool to be able to freeze changing attribute values... > > Moshe Zadka sent me a hack that handles instances: [...] OK, so no special support needed there. > > - Should this reversible? I.e. should there be an x.unfreeze()? > > I gave this some thought earlier today. There are advantages to either > way. Making freeze a one-way operation would make it possible to use > freezing to get certain kinds of security and integrity guarantees that > you can't have if freezing is reversible. > > Fortunately, there's a semantics that captures both. If we allow > freeze to take an optional key argument, and require that an unfreeze > call must supply the same key or fail, we get both worlds. We can > even one-way-hash the keys so they don't have to be stored in the > bytecode. > > Want to lock a structure permanently? Pick a random long key. Freeze > with it. Then throw that key away... Way too cute. My suggestion freeze(0) freezes forever, freeze(1) can be unfrozen. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr at thyrsus.com Tue Jan 30 04:06:19 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Mon, 29 Jan 2001 22:06:19 -0500 Subject: [Python-Dev] Re: C's for statement In-Reply-To: <200101300257.VAA22186@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Mon, Jan 29, 2001 at 09:57:24PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <3A75C86A.3A4236E8@lemburg.com> <20010129150914.C10191@thyrsus.com> <200101300215.VAA21955@cj20424-a.reston1.va.home.com> <20010129213401.A17235@thyrsus.com> <200101300257.VAA22186@cj20424-a.reston1.va.home.com> Message-ID: <20010129220619.A17713@thyrsus.com> Guido van Rossum : > Not dead set against something like it, but dead set against the ?: > syntax because then : becomes too overloaded for the human reader, e.g.: > > if foo ? bar : bletch : spam = eggs > > If you want to revive this, I strongly suggest writing a PEP first > before posting here. Noted. Will do. -- Eric S. Raymond Such are a well regulated militia, composed of the freeholders, citizen and husbandman, who take up arms to preserve their property, as individuals, and their rights as freemen. -- "M.T. Cicero", in a newspaper letter of 1788 touching the "militia" referred to in the Second Amendment to the Constitution. From tim.one at home.com Tue Jan 30 04:18:47 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 22:18:47 -0500 Subject: [Python-Dev] Re: Making mutable objects readonly In-Reply-To: <20010129214959.B17235@thyrsus.com> Message-ID: Note that even adding a "frozen" flag would add 4 bytes to every freezable object on most machines. That's why I'd rather .freeze() replace the type pointer and .unfreeze() restore it. No time or space overhead; no cluttering up the normal-case (i.e., unfrozen) type implementations with new tests. 
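A rough sketch of the type-swapping idea, in present-day Python rather than at the C type-pointer level the message has in mind. The class names FreezableList and _FrozenList and the helper _refuse are hypothetical, and assigning to __class__ only stands in for replacing the type pointer; this is an illustration under those assumptions, not a description of how CPython implements anything:

```python
class FreezableList(list):
    """A list whose instances can be frozen and thawed in place."""

    def freeze(self):
        # Swap the instance's type; no per-object "frozen" flag is stored.
        self.__class__ = _FrozenList

    def unfreeze(self):
        pass  # already unfrozen, nothing to restore


def _refuse(self, *args, **kwargs):
    """Stand-in for every mutating slot of the frozen type."""
    raise TypeError("immutable list")


class _FrozenList(FreezableList):
    """Same data as FreezableList, but the 'dangerous' slots all raise."""

    __setitem__ = __delitem__ = __iadd__ = __imul__ = _refuse
    append = extend = insert = pop = remove = _refuse
    clear = sort = reverse = _refuse

    def freeze(self):
        pass  # already frozen

    def unfreeze(self):
        # Restore the ordinary type; the normal-case methods were never
        # burdened with a flag check.
        self.__class__ = FreezableList


if __name__ == "__main__":
    items = FreezableList([3, 1, 2])
    items.freeze()
    try:
        items.append(4)
    except TypeError as exc:
        print("refused:", exc)
    items.unfreeze()
    items.sort()
    print(items)
```

The unfrozen class carries no extra state and performs no extra checks, which is the property the message argues for; the cost is a second class whose mutating methods are all redirected.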
From tim.one at home.com Tue Jan 30 04:57:07 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 22:57:07 -0500 Subject: [Python-Dev] Re: Python 2.1 slower than 2.0 In-Reply-To: <14965.42988.362288.154254@localhost.localdomain> Message-ID: Note that optimizing compilers use a pile of linear-time heuristics to attempt to solve exponential-time optimization problems (from optimal register assignment to optimal instruction scheduling, they're all formally intractable even in isolation). When code gets non-trivial, not even a compiler's chief designer can reliably outguess what optimization may do. It's really not unusual for a higher optimization level to yield slower code, and especially not when the source code is pushing or exceeding machine limits (# of registers, # of instruction pipes, size of branch-prediction buffers; I-cache structure; dynamic restrictions on execution units; ...). [Jeremy] > ... > One of the differences between -O2 and -O3, according to the man page, > is that -O3 will perform optimizations that involve a space-speed > tradeoff. It also include -finline-functions. I can imagine that > some of these optimizations hurt memory performance enough to make a > difference. One of the time-consuming ongoing tasks at my last employer was running profiles and using them to override counterproductive compiler inlining decisions (in both directions). 
It's not just memory that excessive inlining can screw up, but also things
like running out of registers and so inserting gobs of register spill/restore
code, and inlining so much code that the instruction scheduler effectively
gives up (under many compilers, a sure sign of this is when you look at the
generated code for a function, and it looks beautiful "at the top" but
terrible "at the bottom"; some clever optimizers tried to get around that by
optimizing "bottom-up", and then it looks beautiful at the bottom but
terrible at the top <0.5 wink>; others work middle-out or burn the candle at
both ends, with visible consequences you should be able to recognize now!).

optimization-is-easier-than-speech-recog-but-the-latter-doesn't-work-
    all-that-well-either-ly y'rs  - tim

From barry at digicool.com Tue Jan 30 05:13:24 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 29 Jan 2001 23:13:24 -0500
Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP
References: <14966.2069.950895.627663@beluga.mojam.com>
Message-ID: <14966.16228.548177.112853@anthem.wooz.org>

>>>>> "SM" == Skip Montanaro writes:

    SM> it seems to me that the whole "if/for something in dict" thing
    SM> needs to be hashed out in a PEP.

    SM> There are apparently lots of varying opinions about what's
    SM> reasonable.  This topic seems related to PEP 212 (Loop Counter
    SM> Iteration) and PEP 218 (Adding a Built-In Set Object Type),
    SM> but may well warrant its own.

As keeper of PEP0, I have to agree.

I personally would vastly prefer a new iterator protocol to syntax
such as "for key:value in dict".  I'd really like to see a PEP on an
iterator protocol for Python, but like Skip, I'm too busy at the
moment to do it myself.  If nobody takes it on before then, I might be
willing to champion such a PEP for the 2.2 time frame.

Until then, I'm decidedly -1 on "for/if in dict".

-Barry

From barry at digicool.com Tue Jan 30 05:25:09 2001
From: barry at digicool.com (Barry A.
Warsaw) Date: Mon, 29 Jan 2001 23:25:09 -0500 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: <14966.16933.209494.214183@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Yes, this is a good thing. Easy to do on lists and dicts. GvR> Questions: GvR> - How to spell it? x.freeze()? x.readonly()? GvR> - Should this reversible? I.e. should there be an GvR> x.unfreeze()? GvR> - Should we support something like this for instances too? GvR> Sometimes it might be cool to be able to freeze changing GvR> attribute values... lock(x) ...? :) -Barry From barry at digicool.com Tue Jan 30 05:26:50 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 29 Jan 2001 23:26:50 -0500 Subject: [Python-Dev] Re: Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <20010129214959.B17235@thyrsus.com> Message-ID: <14966.17034.721204.305315@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Fortunately, there's a semantics that captures both. If we ESR> allow freeze to take an optional key argument, and require ESR> that an unfreeze call must supply the same key or fail, we ESR> get both worlds. We can even one-way-hash the keys so they ESR> don't have to be stored in the bytecode. ESR> Want to lock a structure permanently? Pick a random long ESR> key. Freeze with it. Then throw that key away... Clever! From esr at thyrsus.com Tue Jan 30 05:32:16 2001 From: esr at thyrsus.com (Eric S. 
Raymond) Date: Mon, 29 Jan 2001 23:32:16 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <14966.16933.209494.214183@anthem.wooz.org>; from barry@digicool.com on Mon, Jan 29, 2001 at 11:25:09PM -0500 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <14966.16933.209494.214183@anthem.wooz.org> Message-ID: <20010129233215.A18533@thyrsus.com> Barry A. Warsaw : > lock(x) ...? :) I was thinking that myself, Barry. -- Eric S. Raymond "Boys who own legal firearms have much lower rates of delinquency and drug use and are even slightly less delinquent than nonowners of guns." -- U.S. Department of Justice, National Institute of Justice, Office of Juvenile Justice and Delinquency Prevention, NCJ-143454, "Urban Delinquency and Substance Abuse," August 1995. From tim.one at home.com Tue Jan 30 05:56:09 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 29 Jan 2001 23:56:09 -0500 Subject: [Python-Dev] SSL socket read at EOF; SourceForge problem Message-ID: I tried to open an SF bug for the following msg from c.l.py, but SF balked: ERROR ERROR getting bug_id Logged out, logged in, tried it again, same outcome. Intended bug report content: Good question from c.l.py, assigned to Guido cuz he's a Socket Guy: From: Clarence Gardner Subject: RE: Thread Safety Date: Mon, 29 Jan 2001 09:51:03 -0800 ... I'm going to repeat a question that I posted about a week ago that passed without comment on the newsgroup. The issue is the SSL support in the socket module, which raises an exception when the reading socket is at EOF, rather than returning an empty string. I'm hesitant to call it a "bug", but I wouldn't have implemented it this way. 
There are the names of two people mentioned at the top of socketmodule.c,
but no contact information, so I'm suggesting here that it be changed to
conform to normal file/socket practice.

(SSL was actually added at 2.0, so I'm late to the party with this; mea
culpa, mea culpa.  I delayed trying Python2 because of the extension
rebuilding.)

From thomas at xs4all.net Tue Jan 30 07:14:20 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 30 Jan 2001 07:14:20 +0100
Subject: [Python-Dev] Re: C's for statement
In-Reply-To: <20010129213401.A17235@thyrsus.com>; from esr@thyrsus.com on Mon, Jan 29, 2001 at 09:34:01PM -0500
References: <3A756FF8.B7185FA2@lemburg.com>
    <200101291500.KAA11569@cj20424-a.reston1.va.home.com>
    <3A75B190.3FD2A883@lemburg.com>
    <200101291922.OAA13321@cj20424-a.reston1.va.home.com>
    <3A75C86A.3A4236E8@lemburg.com>
    <20010129150914.C10191@thyrsus.com>
    <200101300215.VAA21955@cj20424-a.reston1.va.home.com>
    <20010129213401.A17235@thyrsus.com>
Message-ID: <20010130071420.U962@xs4all.nl>

On Mon, Jan 29, 2001 at 09:34:01PM -0500, Eric S. Raymond wrote:

> I guess my archetype of the cute C for-loop is the idiom for
> pointer-list traversal:

>     struct foo {int data; struct foo *next;} *ptr, *head;

>     for (ptr = head; ptr; ptr = ptr->next)
>         do_something_with(ptr->data);

Note two things: in Python, you would use a list, so 'for x in list' does
exactly what you want here ;) And if you really need it, you could use
iterators for exactly this (once we have them, of course): you are inventing
a new storage type.  Quite common in C, since the only one it has is useless
for anything other than strings, but not so common in Python.

> Not the highest on my list of wants -- I'd sooner have ?: back.  I submitted
> a patch for that once, and the discussion sort of died.  Were you dead
> set against it, or should I revive this proposal?

Triple blech.  Guido will never go for it!  (There, increased your chance of
getting it approved!
:) Seriously though, I wouldn't like it much, it's too cryptic a syntax. I notice I use it less and less in C, too. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue Jan 30 07:18:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 07:18:25 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: ; from tim.one@home.com on Mon, Jan 29, 2001 at 08:39:17PM -0500 References: <200101291448.JAA11473@cj20424-a.reston1.va.home.com> Message-ID: <20010130071825.V962@xs4all.nl> On Mon, Jan 29, 2001 at 08:39:17PM -0500, Tim Peters wrote: > for key: in dict: # over dict.keys() > for :value in dict: # over dict.values() > for : in dict: # a delay loop Wot's the last one supposed to do ? 'for unused_var in range(len(dict)):' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Tue Jan 30 07:25:51 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 30 Jan 2001 01:25:51 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <20010130071825.V962@xs4all.nl> Message-ID: >> for key: in dict: # over dict.keys() >> for :value in dict: # over dict.values() >> for : in dict: # a delay loop [Thomas Wouters] > Wot's the last one supposed to do ? 'for unused_var in > range(len(dict)):' ? Well, as the preceding line said in the original: >> 2/3rd of these are marginally more attractive [than >> "if key:value in dict"]: I think you've guessed which 2/3 those are . I don't see that the last line has any visible semantics whatsoever, so Python can do whatever it likes, provided it doesn't do anything visible. You still hang out on c.l.py! So you gotta know that if something of the form x:y is suggested, people will line up to suggest meanings for the 3 obvious variations, along with x::y and x:-:y and x lambda y too <0.9 wink>. 
From thomas at xs4all.net Tue Jan 30 07:26:48 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 07:26:48 +0100 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <14966.2069.950895.627663@beluga.mojam.com>; from skip@mojam.com on Mon, Jan 29, 2001 at 06:17:25PM -0600 References: <14966.2069.950895.627663@beluga.mojam.com> Message-ID: <20010130072648.W962@xs4all.nl> On Mon, Jan 29, 2001 at 06:17:25PM -0600, Skip Montanaro wrote: > The fact that it was easy for Thomas to implement "if key in dict" doesn't > make the overall concept less controversial. Note that the fact I implemented it doesn't mean I'm +1 on it (witness my posts on python-list.) In fact, *while implementing it*, I grew from +0 to -0 and maybe even to a weak -1 (all in 5 minutes :) The enthousiastic subject of the patch was a weak attempt at 5AM humour, not a venting of an ancient desire :) More-5AM-humour-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue Jan 30 07:55:16 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 07:55:16 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: ; from jhylton@users.sourceforge.net on Mon, Jan 29, 2001 at 05:27:30PM -0800 References: Message-ID: <20010130075515.X962@xs4all.nl> On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote: > add note about two kinds of illegal imports that are now checked > + - The compiler will report a SyntaxError if "from ... import *" occurs > + in a function or class scope or if a name bound by the import > + statement is declared global in the same scope. The language > + reference has also documented that these cases are illegal, but > + they were not enforced. Woah. Is this really a good idea ? I have seen 'from ... 
import *' in a function scope put to good (relatively -- we're talking 'import *' here) use. I also thought of 'import' as yet another assignment statement, so to me it's both logical and consistent if 'import' would listen to 'global'. Otherwise we have to re-invent 'import spam; eggs = spam' if we want eggs to be global. Is there really a reason to enforce this, or are we enforcing the wording of the language reference for the sake of enforcing the wording of the language reference ? When writing 'import as' for 2.0, I fixed some of the inconsistencies in import, making it adhere to 'global' statements in as many cases as possible (all except 'from ... import *') but I was apparently not aware of the wording of the language reference. I'd suggest updating the wording in the language reference, not the implementation, unless there is a good reason to disallow this. I also have another issue with your recent patches, Jeremy, also in the backwards-compatibility departement :) You gave new.code two new, non-optional arguments, in the middle of the long argument list. I sent a note about it to python-checkins instead of python-dev by accident, but Fred seemed to agree with me there. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh21 at cam.ac.uk Tue Jan 30 09:30:15 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 30 Jan 2001 08:30:15 +0000 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: "Tim Peters"'s message of "Mon, 29 Jan 2001 22:57:07 -0500" References: Message-ID: In the interest of generating some numbers (and filling up my hard drive), last night I wrote a script to build lots & lots of versions of python (many of which turned out to be redundant - eg. -O6 didn't seem to do anything different to -O3 and pybench doesn't work with 1.5.2), and then run pybench with them. 
Summarised results below; first a key:

src-n: this morning's CVS (with Jeremy's f_localsplus optimisation) (only built this with -O3)
src: CVS from yesterday afternoon
src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc patch applied. More on this later...
Python-2.0: you can guess what this is.

All runs are compared against Python-2.0-O2:

Benchmark: src-n-O3 (rounds=10, warp=20)
    Average round time:   49029.00 ms    -0.86%
Benchmark: src (rounds=10, warp=20)
    Average round time:   67141.00 ms   +35.76%
Benchmark: src-O (rounds=10, warp=20)
    Average round time:   50167.00 ms    +1.44%
Benchmark: src-O2 (rounds=10, warp=20)
    Average round time:   49641.00 ms    +0.37%
Benchmark: src-O3 (rounds=10, warp=20)
    Average round time:   49104.00 ms    -0.71%
Benchmark: src-O6 (rounds=10, warp=20)
    Average round time:   49131.00 ms    -0.66%
Benchmark: src-obmalloc (rounds=10, warp=20)
    Average round time:   63276.00 ms   +27.94%
Benchmark: src-obmalloc-O (rounds=10, warp=20)
    Average round time:   46927.00 ms    -5.11%
Benchmark: src-obmalloc-O2 (rounds=10, warp=20)
    Average round time:   46146.00 ms    -6.69%
Benchmark: src-obmalloc-O3 (rounds=10, warp=20)
    Average round time:   46456.00 ms    -6.07%
Benchmark: src-obmalloc-O6 (rounds=10, warp=20)
    Average round time:   46450.00 ms    -6.08%
Benchmark: Python-2.0 (rounds=10, warp=20)
    Average round time:   68933.00 ms   +39.38%
Benchmark: Python-2.0-O (rounds=10, warp=20)
    Average round time:   49542.00 ms    +0.17%
Benchmark: Python-2.0-O3 (rounds=10, warp=20)
    Average round time:   48262.00 ms    -2.41%
Benchmark: Python-2.0-O6 (rounds=10, warp=20)
    Average round time:   48273.00 ms    -2.39%

My conclusion? Python 2.1 is slower than Python 2.0, but not by enough to care about. Interestingly, adding obmalloc speeds things up. Let's take a closer look:

$ python pybench.py -c src-obmalloc-O3 -s src-O3
PYBENCH 0.7

Benchmark: src-O3 (rounds=10, warp=20)

Tests:                         per run     per oper.     diff *
------------------------------------------------------------------------
BuiltinFunctionCalls:        843.35 ms      6.61 us     +2.93%
BuiltinMethodLookup:         878.70 ms      1.67 us     +0.56%
ConcatStrings:              1068.80 ms      7.13 us     -1.22%
ConcatUnicode:              1373.70 ms      9.16 us     -1.24%
CreateInstances:            1433.55 ms     34.13 us     +9.06%
CreateStringsWithConcat:    1031.75 ms      5.16 us    +10.95%
CreateUnicodeWithConcat:    1277.85 ms      6.39 us     +3.14%
DictCreation:               1275.80 ms      8.51 us    +44.22%
ForLoops:                   1415.90 ms    141.59 us     -0.64%
IfThenElse:                 1152.70 ms      1.71 us     -0.15%
ListSlicing:                 397.40 ms    113.54 us     -0.53%
NestedForLoops:              789.75 ms      2.26 us     -0.37%
NormalClassAttribute:        935.15 ms      1.56 us     -0.41%
NormalInstanceAttribute:     961.15 ms      1.60 us     -0.60%
PythonFunctionCalls:        1079.65 ms      6.54 us     -1.00%
PythonMethodCalls:           908.05 ms     12.11 us     -0.88%
Recursion:                   838.50 ms     67.08 us     -0.00%
SecondImport:                741.20 ms     29.65 us    +25.57%
SecondPackageImport:         744.25 ms     29.77 us    +18.66%
SecondSubmoduleImport:       947.05 ms     37.88 us    +25.60%
SimpleComplexArithmetic:    1129.40 ms      5.13 us   +114.92%
SimpleDictManipulation:     1048.55 ms      3.50 us     -0.00%
SimpleFloatArithmetic:       746.05 ms      1.36 us     -2.75%
SimpleIntFloatArithmetic:    823.35 ms      1.25 us     -0.37%
SimpleIntegerArithmetic:     823.40 ms      1.25 us     -0.37%
SimpleListManipulation:     1004.70 ms      3.72 us     +0.01%
SimpleLongArithmetic:        865.30 ms      5.24 us   +100.65%
SmallLists:                 1657.65 ms      6.50 us     +6.63%
SmallTuples:                1143.95 ms      4.77 us     +2.90%
SpecialClassAttribute:       949.00 ms      1.58 us     -0.22%
SpecialInstanceAttribute:   1353.05 ms      2.26 us     -0.73%
StringMappings:             1161.00 ms      9.21 us     +7.30%
StringPredicates:           1069.65 ms      3.82 us     -5.30%
StringSlicing:               846.30 ms      4.84 us     +8.61%
TryExcept:                  1590.40 ms      1.06 us     -0.49%
TryRaiseExcept:             1104.65 ms     73.64 us    +24.46%
TupleSlicing:                681.10 ms      6.49 us     -3.13%
UnicodeMappings:            1021.70 ms     56.76 us     +0.79%
UnicodePredicates:          1308.45 ms      5.82 us     -4.79%
UnicodeProperties:          1148.45 ms      5.74 us    +13.67%
UnicodeSlicing:              984.15 ms      5.62 us     -0.51%
------------------------------------------------------------------------
Average round time:        49104.00 ms                  +5.70%

*) measured against: src-obmalloc-O3 (rounds=10, warp=20)

Words fail me slightly, but maybe some tuning of the memory allocation of longs & complex numbers would be in order?

Time for lectures - I don't think algebraic geometry is going to make my head hurt as much as trying to explain benchmarks...

Cheers,
M.

--
ARTHUR: But which is probably incapable of drinking the coffee.
    -- The Hitch-Hikers Guide to the Galaxy, Episode 6

From ping at lfw.org Tue Jan 30 09:38:12 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 30 Jan 2001 00:38:12 -0800 (PST)
Subject: [Python-Dev] Read-only function attributes
Message-ID:

Hi there.

I see that the function attribute feature specifically allows assignment to func_code and func_defaults, but no other special attributes. This seems really suspect to me. Why would we want to allow the reassignment of special attributes at all?

Functions have always been immutable objects, and i can see some motivation for attaching mutable dictionaries to them, but it's a more serious move to make the functions mutable themselves.

I don't recall any discussion about changing special attributes; i don't see a clear purpose to them; and i do see a danger in making it harder to be certain that a program is safe and predictable. (Yes, i did notice that function attributes can't be set in restricted mode, but the addition of extra features requiring extra security checks makes me uneasy.)

-- ?!ng

From ping at lfw.org Tue Jan 30 09:52:43 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 30 Jan 2001 00:52:43 -0800 (PST)
Subject: [Python-Dev] Making mutable objects readonly
In-Reply-To: <200101300217.VAA21978@cj20424-a.reston1.va.home.com>
Message-ID:

Eric S. Raymond wrote:
> For different reasons, I'd like to be able to set a constant flag on a
> object instance. Simple semantics: if you try to assign to a
> member or method, it throws an exception.

Guido van Rossum wrote:
> Yes, this is a good thing.
Easy to do on lists and dicts. Questions: > > - How to spell it? x.freeze()? x.readonly()? I'm not so sure. There seem to be many issues here. More questions: What's the difference between a frozen list and a tuple? Is a frozen list hashable? > - Should this reversible? I.e. should there be an x.unfreeze()? What if two threads lock and then unlock the same structure? > - Should we support something like this for instances too? Sometimes > it might be cool to be able to freeze changing attribute values... If you do this, i bet people will immediately want to freeze individual attributes. Some might be confused by a.x = [1, 2, 3] lock(a.x) # intend to lock the attribute, not the list a.x = 3 # hey, why is this allowed? What does locking an extension object do? What happens when you lock an object that implements list or dict semantics? Do we care that locking a UserList accomplishes nothing? Should unfreeze/unlock() be disallowed in restricted mode? -- ?!ng No software is totally secure, but using [Microsoft] Outlook is like hanging a sign on your back that reads "PLEASE MESS WITH MY COMPUTER." -- Scott Rosenberg, Salon Magazine From fredrik at effbot.org Tue Jan 30 10:05:47 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 30 Jan 2001 10:05:47 +0100 Subject: [Python-Dev] Read-only function attributes References: Message-ID: <01d701c08a9b$d7a9fe60$e46940d5@hagrid> Ka-Ping Yee wrote: > I see that the function attribute feature specifically allows > assignment to func_code and func_defaults, but no other special > attributes. This seems really suspect to me. Why would we want > to allow the reassignment of special attributes at all? to allow an IDE to "patch" a running program? 
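To make the "patch a running program" use case concrete, here is a minimal sketch of replacing a live function's body by swapping its code object. It assumes a current Python 3 interpreter, where the attribute this thread calls func_code is spelled `__code__`; the function names `greet`, `greet_patched`, and `callback` are hypothetical.

```python
def greet():
    return "hello"


def greet_patched():
    return "hello, world"


# Some other part of the program already holds a reference to the function.
callback = greet

# Swap in the new body without rebinding the name or touching any caller.
# (Assumption: both functions have the same number of free variables,
# here zero, which the interpreter requires for this assignment.)
greet.__code__ = greet_patched.__code__

print(callback())
```

Every existing reference to the function object sees the new behaviour, which is what a reload that merely rebinds the module-level name cannot achieve.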
From gvwilson at ca.baltimore.com Tue Jan 30 14:08:42 2001 From: gvwilson at ca.baltimore.com (Greg Wilson) Date: Tue, 30 Jan 2001 08:08:42 -0500 (EST) Subject: [Python-Dev] re: Making mutable objects readonly In-Reply-To: <20010130085202.18E71EAC4@mail.python.org> Message-ID: > Barry Warsaw: > lock(x) ...? :) Greg Wilson: -1 --- everyone will assume it's mutual exclusion, rather than immutability. From guido at digicool.com Tue Jan 30 15:01:15 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 09:01:15 -0500 Subject: [Python-Dev] Read-only function attributes In-Reply-To: Your message of "Tue, 30 Jan 2001 00:38:12 PST." References: Message-ID: <200101301401.JAA25600@cj20424-a.reston1.va.home.com> > I see that the function attribute feature specifically allows > assignment to func_code and func_defaults, but no other special > attributes. This seems really suspect to me. Why would we want > to allow the reassignment of special attributes at all? As Effbot said, this is useful in certain circumstances where a development environment wants to implement a "better reload". For this same reason you can assign to a class's __bases__ and __dict__ and to an instance's __class__ and __dict__. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 30 16:00:58 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 10:00:58 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: Your message of "Tue, 30 Jan 2001 00:52:43 PST." References: Message-ID: <200101301500.KAA25733@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > Yes, this is a good thing. Easy to do on lists and dicts. Questions: > > > > - How to spell it? x.freeze()? x.readonly()? Ping: > I'm not so sure. There seem to be many issues here. More questions: > > What's the difference between a frozen list and a tuple? A frozen list can be unfrozen (maybe)? > Is a frozen list hashable? 
Yes -- that's what started this thread (using dicts as dict keys, actually). > > - Should this reversible? I.e. should there be an x.unfreeze()? > > What if two threads lock and then unlock the same structure? That's up to the threads -- it's no different that other concurrent access. > > - Should we support something like this for instances too? Sometimes > > it might be cool to be able to freeze changing attribute values... > > If you do this, i bet people will immediately want to freeze > individual attributes. Some might be confused by > > a.x = [1, 2, 3] > lock(a.x) # intend to lock the attribute, not the list > a.x = 3 # hey, why is this allowed? That's a matter of API. I wouldn't make this a built-in, but rather a method on freezable objects (please don't call it lock()!). > What does locking an extension object do? What does adding 1 to an extension object do? > What happens when you lock an object that implements list or dict > semantics? Do we care that locking a UserList accomplishes nothing? Who says it doesn't? > Should unfreeze/unlock() be disallowed in restricted mode? I don't see why not. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 30 16:06:57 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 10:06:57 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Tue, 30 Jan 2001 07:55:16 +0100." <20010130075515.X962@xs4all.nl> References: <20010130075515.X962@xs4all.nl> Message-ID: <200101301506.KAA25763@cj20424-a.reston1.va.home.com> > On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote: > > > add note about two kinds of illegal imports that are now checked > > > + - The compiler will report a SyntaxError if "from ... import *" occurs > > + in a function or class scope or if a name bound by the import > > + statement is declared global in the same scope. 
The language > > + reference has also documented that these cases are illegal, but > > + they were not enforced. > Woah. Is this really a good idea ? I have seen 'from ... import *' > in a function scope put to good (relatively -- we're talking 'import > *' here) use. I also thought of 'import' as yet another assignment > statement, so to me it's both logical and consistent if 'import' > would listen to 'global'. Otherwise we have to re-invent 'import > spam; eggs = spam' if we want eggs to be global. Note that Jeremy is only raising errors for "from M import *". > Is there really a reason to enforce this, or are we enforcing the > wording of the language reference for the sake of enforcing the > wording of the language reference ? When writing 'import as' for > 2.0, I fixed some of the inconsistencies in import, making it adhere > to 'global' statements in as many cases as possible (all except > 'from ... import *') but I was apparently not aware of the wording > of the language reference. I'd suggest updating the wording in the > language reference, not the implementation, unless there is a good > reason to disallow this. I think Jeremy has an excellent reason. Compilers want to do analysis of name usage at compile time. The value of * cannot be determined at compile time (you cannot know what module will actually be imported at run time). Up till now, we were able to fudge this, but Jeremy's new compiler needs to know exactly which names are defined in all local scopes, in order to do nested scopes right. > I also have another issue with your recent patches, Jeremy, also in > the backwards-compatibility departement :) You gave new.code two > new, non-optional arguments, in the middle of the long argument > list. I sent a note about it to python-checkins instead of > python-dev by accident, but Fred seemed to agree with me there. (Tim will love this. :-) I don't know what those new arguments represent. 
If they can reasonably be assumed to be empty for code that doesn't use the new features, I'd say move them to the end and default them properly. If they must be specified, I'd say too bad, the new module is an accident of the implementation anyway, and its users should update their code. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jan 30 16:08:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 10:08:39 -0500 Subject: [Python-Dev] Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: Your message of "Tue, 30 Jan 2001 07:26:48 +0100." <20010130072648.W962@xs4all.nl> References: <14966.2069.950895.627663@beluga.mojam.com> <20010130072648.W962@xs4all.nl> Message-ID: <200101301508.KAA25825@cj20424-a.reston1.va.home.com> > Note that the fact I implemented it doesn't mean I'm +1 on it (witness my > posts on python-list.) In fact, *while implementing it*, I grew from +0 to > -0 and maybe even to a weak -1 (all in 5 minutes :) The enthousiastic > subject of the patch was a weak attempt at 5AM humour, not a venting of an > ancient desire :) Can you say "PEP time"? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Tue Jan 30 16:29:43 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 30 Jan 2001 10:29:43 -0500 Subject: [Python-Dev] Read-only function attributes References: Message-ID: <14966.56807.288840.7850@anthem.wooz.org> >>>>> "KY" == Ka-Ping Yee writes: KY> I see that the function attribute feature specifically allows KY> assignment to func_code and func_defaults, but no other KY> special attributes. This seems really suspect to me. Why KY> would we want to allow the reassignment of special attributes KY> at all? ... and actually, none of that changed w/ the function attribute patch. You've been able to assign to func_code and func_defaults since Python 1.6! 
-Barry From thomas at xs4all.net Tue Jan 30 16:52:04 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 16:52:04 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <200101301506.KAA25763@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 30, 2001 at 10:06:57AM -0500 References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> Message-ID: <20010130165204.I962@xs4all.nl> On Tue, Jan 30, 2001 at 10:06:57AM -0500, Guido van Rossum wrote: > > On Mon, Jan 29, 2001 at 05:27:30PM -0800, Jeremy Hylton wrote: > > > > > add note about two kinds of illegal imports that are now checked > > > > > + - The compiler will report a SyntaxError if "from ... import *" occurs > > > + in a function or class scope or if a name bound by the import > > > + statement is declared global in the same scope. The language > > > + reference has also documented that these cases are illegal, but > > > + they were not enforced. > > Woah. Is this really a good idea ? I have seen 'from ... import *' > > in a function scope put to good (relatively -- we're talking 'import > > *' here) use. I also thought of 'import' as yet another assignment > > statement, so to me it's both logical and consistent if 'import' > > would listen to 'global'. Otherwise we have to re-invent 'import > > spam; eggs = spam' if we want eggs to be global. > Note that Jeremy is only raising errors for "from M import *". No, he says he's also raising errors for 'import spam' if 'spam' is declared global, like so: def viking(): global spam import spam > > Is there really a reason to enforce this, or are we enforcing the > > wording of the language reference for the sake of enforcing the > > wording of the language reference ? 
When writing 'import as' for > > 2.0, I fixed some of the inconsistencies in import, making it adhere > > to 'global' statements in as many cases as possible (all except > > 'from ... import *') but I was apparently not aware of the wording > > of the language reference. I'd suggest updating the wording in the > > language reference, not the implementation, unless there is a good > > reason to disallow this. > I think Jeremy has an excellent reason. Compilers want to do analysis > of name usage at compile time. The value of * cannot be determined at > compile time (you cannot know what module will actually be imported at > run time). Up till now, we were able to fudge this, but Jeremy's new > compiler needs to know exactly which names are defined in all local > scopes, in order to do nested scopes right. Hrrmm.... I guess I have to agree with that. None the less, I wish we could have a "ack! this is stupid code! it uses 'from larch import *'! All bets are off, we do a lot of slow complicated runtime checking now!" mode. The thing I still enjoy most about Python is that it always does what I want, and though I'd never want to do 'from different import *' in a local scope, I do want other, less wise people to have the same experience, where possible :) And I also want to be able to do: def fill_me(with): global me if with == 1: import me elif with == 2: import me_too as me elif with == 3: from me.Tools import me_me as me elif with == 4: me = FakeModule() sys.modules['me'] = me else: raise ValueError And I can't quite argue that away with 'the compiler needs to know ...' -- it's all there! > > I also have another issue with your recent patches, Jeremy, also in > > the backwards-compatibility departement :) You gave new.code two > > new, non-optional arguments, in the middle of the long argument > > list. I sent a note about it to python-checkins instead of > > python-dev by accident, but Fred seemed to agree with me there. > (Tim will love this. 
:-) > I don't know what those new arguments represent. If they can > reasonably be assumed to be empty for code that doesn't use the new > features, I'd say move them to the end and default them properly. If > they must be specified, I'd say too bad, the new module is an accident > of the implementation anyway, and its users should update their code. Okay, I can live with that. It's sure to cause some gripes though. Then again, from looking at the code I'd say those arguments (freevars and cellvars) can easily default to empty tuples. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From bckfnn at worldonline.dk Tue Jan 30 18:34:10 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Tue, 30 Jan 2001 17:34:10 GMT Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101291500.KAA11569@cj20424-a.reston1.va.home.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> Message-ID: <3a76df10.22007715@smtp.worldonline.dk> [Guido] >Maybe we could add a flag to the dict that issues an error when a new >key is inserted during such a for loop? FWIW, some of the java2 collections decided to throw a Concurrent- ModificationException in the iterator if the collection was modified during the iteration. Generally none of java2 collections can be modified while iterating over it (the exception is calling .remove() on the iterator object and not all collections support that). >(I don't think the key order can be affected when a key is *deleted*.) Probably also true for the Hashtables which is backing our PyDictionary, but I'll rather not depend too much on it being true. [Tim] >That latter is true but specific to this implementation. "Can't mutate the >dict period" is easier to keep straight, and probably harmless in practice >(if not, it could be relaxed later). Agree. 
>Recall that a similar trick is played >during list.sort(), replacing the list's type pointer for the duration (to >point to an internal "immutable list" type, same as the list type except the >"dangerous" slots point to a function that raises an "immutable list" >TypeError). Then no runtime expense is incurred for regular lists to keep >checking flags. I thought of this as an elegant use for switching types at >runtime; you may still be appalled by it, though! Changing the type of a type? Yuck! I might very likely be reading the CPython sources wrongly, but it seems this trick will cause an BadInternalCall if some other C extension are trying to modify a list while it is freezed by the type switching trick. I imagine this would happen if the extension called: PyList_SetItem(myList, 0, aValue); I guess Jython could support this from the python side, but its hard to ensure from the java side without adding an additional PyList_Check(..) to all list methods. It just doesn't feel like the right thing to go since it would cause slower access to all mutable objects. regards, finn From guido at digicool.com Tue Jan 30 21:42:58 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 15:42:58 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Tue, 30 Jan 2001 16:52:04 +0100." <20010130165204.I962@xs4all.nl> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> Message-ID: <200101302042.PAA29301@cj20424-a.reston1.va.home.com> > > > Woah. Is this really a good idea ? I have seen 'from ... import *' > > > in a function scope put to good (relatively -- we're talking 'import > > > *' here) use. I also thought of 'import' as yet another assignment > > > statement, so to me it's both logical and consistent if 'import' > > > would listen to 'global'. 
Otherwise we have to re-invent 'import > > > spam; eggs = spam' if we want eggs to be global. > > > Note that Jeremy is only raising errors for "from M import *". > > No, he says he's also raising errors for 'import spam' if 'spam' is declared > global, like so: > > def viking(): > global spam > import spam Yeah, this was just brought to my attention at our group meeting today. I'm with you on this one -- there really isn't a good reason why this shouldn't work. (I wonder why that constraint was ever added to the reference manual; maybe I was just upset that someone would *do* something as ugly as that, or maybe there was a J[P]ython reason???.) > > I think Jeremy has an excellent reason. Compilers want to do analysis > > of name usage at compile time. The value of * cannot be determined at > > compile time (you cannot know what module will actually be imported at > > run time). Up till now, we were able to fudge this, but Jeremy's new > > compiler needs to know exactly which names are defined in all local > > scopes, in order to do nested scopes right. > > Hrrmm.... I guess I have to agree with that. None the less, I wish we could > have a "ack! this is stupid code! it uses 'from larch import *'! All bets > are off, we do a lot of slow complicated runtime checking now!" mode. The > thing I still enjoy most about Python is that it always does what I want, > and though I'd never want to do 'from different import *' in a local scope, > I do want other, less wise people to have the same experience, where > possible :) Hm, maybe, just *maybe* Jeremy can do this if there are no nested scopes in sight. But I don't think it's a big deal as long as the error message is clear -- it's bad style. 
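The `global spam` / `import spam` case quoted above can be tried directly. This is only a minimal sketch, assuming a current Python 3 interpreter (which accepts this form); the standard-library module `colorsys` is an arbitrary stand-in for `spam`.

```python
def viking():
    # Declare the name global, then let the import statement bind it.
    global colorsys
    import colorsys


viking()

# The import inside the function bound the module-level name.
print(colorsys.__name__)
```

Here `import` behaves like any other assignment with respect to `global`, which is the consistency argued for in this thread.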
> And I also want to be able to do: > > def fill_me(with): > global me > if with == 1: > import me > elif with == 2: > import me_too as me > elif with == 3: > from me.Tools import me_me as me > elif with == 4: > me = FakeModule() > sys.modules['me'] = me > else: > raise ValueError > > And I can't quite argue that away with 'the compiler needs to know ...' -- > it's all there! Sort of, although I would prefer to do a two-stager here: first some variation of "import me as meohmy", and then "global me; me = meohmy" . > > > I also have another issue with your recent patches, Jeremy, also in > > > the backwards-compatibility departement :) You gave new.code two > > > new, non-optional arguments, in the middle of the long argument > > > list. I sent a note about it to python-checkins instead of > > > python-dev by accident, but Fred seemed to agree with me there. > > > (Tim will love this. :-) > > > I don't know what those new arguments represent. If they can > > reasonably be assumed to be empty for code that doesn't use the new > > features, I'd say move them to the end and default them properly. If > > they must be specified, I'd say too bad, the new module is an accident > > of the implementation anyway, and its users should update their code. > > Okay, I can live with that. It's sure to cause some gripes though. Then > again, from looking at the code I'd say those arguments (freevars and > cellvars) can easily default to empty tuples. OK. I hope Jeremy can fix this when he gets home. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Tue Jan 30 23:30:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 30 Jan 2001 23:30:25 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3a76df10.22007715@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Tue, Jan 30, 2001 at 05:34:10PM +0000 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> Message-ID: <20010130233025.J962@xs4all.nl> On Tue, Jan 30, 2001 at 05:34:10PM +0000, Finn Bock wrote: > >Recall that a similar trick is played during list.sort(), replacing the > >list's type pointer for the duration (to point to an internal "immutable > >list" type, same as the list type except the "dangerous" slots point to a > >function that raises an "immutable list" TypeError). Then no runtime > >expense is incurred for regular lists to keep checking flags. I thought > >of this as an elegant use for switching types at runtime; you may still > >be appalled by it, though! > Changing the type of a type? Yuck! No, the typeobject itself isn't changed -- that would freeze *all* dicts/lists/whatever, not just the one we want. We'd be changing the type of an object (or 'type instance', if you want, but not "type 'instance'"), not the type of a type. > I might very likely be reading the CPython sources wrongly, but it seems > this trick will cause an BadInternalCall if some other C extension are > trying to modify a list while it is freezed by the type switching trick. > I imagine this would happen if the extension called: > PyList_SetItem(myList, 0, aValue); Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() (or whatever), and methods/operations that modify the listobject would have to check if the list is frozen, and raise an appropriate error if so. 
This might throw 'unexpected' errors, but only in situations that can't happen right now! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fredrik at effbot.org Tue Jan 30 23:45:16 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 30 Jan 2001 23:45:16 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> <20010130233025.J962@xs4all.nl> Message-ID: <003501c08b0e$51f975c0$e46940d5@hagrid> > Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, > 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() > (or whatever), and methods/operations that modify the listobject would have > to check if the list is frozen, and raise an appropriate error if so. This > might throw 'unexpected' errors. did someone just subscribe me to the perl-porters list? -1 on "modal freeze" (it's madness) -0 on an "immutable dictionary" type in the core From tim.one at home.com Wed Jan 31 00:53:45 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 30 Jan 2001 18:53:45 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This is all PEP material now. Yup. > Tim, do you want to own the PEP? Not really. Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first). >> Cool! Can we resist adding >> >> if key:value in dict >> >> for "parallelism"? (I know I can ...) > That's easy to resist because, unlike ``for key:value in dict'', it's > not unambiguous: But if (key:value) in dict is. Just trying to help whoever *does* want the PEP . > ... 
> I'm certainly more comfortable with just ``for key in dict'' than with > the whole slow of extensions using colons. What about just the for key:value in dict for index:value in sequence extensions? The degenerate forms (omitting x or y or both in x:y) are mechanical variations so are likely to get raised. > But, again, that's for the PEP to fight over. PEPs are easier if you Pronounce on things you hate early so that those can get recorded in the "BDFL Pronouncements" section without further ado. whatever-this-may-look-like-it's-not-a-pep-discussion-ly y'rs - tim From nas at arctrix.com Tue Jan 30 18:12:15 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 30 Jan 2001 09:12:15 -0800 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <003501c08b0e$51f975c0$e46940d5@hagrid>; from fredrik@effbot.org on Tue, Jan 30, 2001 at 11:45:16PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3a76df10.22007715@smtp.worldonline.dk> <20010130233025.J962@xs4all.nl> <003501c08b0e$51f975c0$e46940d5@hagrid> Message-ID: <20010130091215.C18319@glacier.fnational.com> On Tue, Jan 30, 2001 at 11:45:16PM +0100, Fredrik Lundh wrote: > did someone just subscribe me to the perl-porters list? > > -1 on "modal freeze" (it's madness) > -0 on an "immutable dictionary" type in the core I'm glad I'm not the only one who had that feeling. I agree with your votes too. 
Neil From nas at arctrix.com Tue Jan 30 18:24:54 2001 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 30 Jan 2001 09:24:54 -0800 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: ; from tim.one@home.com on Tue, Jan 30, 2001 at 06:53:45PM -0500 References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> Message-ID: <20010130092454.D18319@glacier.fnational.com> [Tim Peters on adding yet more syntatic sugar] > Available time is finite, and this isn't at the top of the list > of things I'd like to see (resuming the discussion of > generators + coroutines + iteration protocol comes to mind > first). What's the chances of getting generators into 2.2? The implementation should not be hard. Didn't Steven Majewski have something years ago? Why do we always get sidetracked on trying to figure out how to do coroutines and continuations? Generators would add real power to the language and are simple enough that most users could benefit from them. Also, it should be possible to design an interface that does not preclude the addition of coroutines or continuations later. I'm not volunteering to champion the cause just yet. I just want to know if there is some issue I'm missing. Neil From barry at digicool.com Wed Jan 31 01:24:05 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 30 Jan 2001 19:24:05 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> Message-ID: <14967.23333.57259.347222@anthem.wooz.org> >>>>> "NS" == Neil Schemenauer writes: NS> What's the chances of getting generators into 2.2? The NS> implementation should not be hard. Didn't Steven Majewski NS> have something years ago? Why do we always get sidetracked on NS> trying to figure out how to do coroutines and continuations? 
I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, renaming it just "Generators" and filling it out for the 2.2 time frame. If we want to address coroutines and continuations later, we can write separate PEPs for them. Send me a draft. -Barry From guido at digicool.com Wed Jan 31 01:28:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 19:28:44 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Tue, 30 Jan 2001 18:53:45 EST." References: Message-ID: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> > Not really. Available time is finite, and this isn't at the top of the list > of things I'd like to see (resuming the discussion of generators + > coroutines + iteration protocol comes to mind first). OK, get going on that one then! > >> Cool! Can we resist adding > >> > >> if key:value in dict > >> > >> for "parallelism"? (I know I can ...) > > > That's easy to resist because, unlike ``for key:value in dict'', it's > > not unambiguous: > > But > > if (key:value) in dict > > is. Just trying to help whoever *does* want the PEP . OK, I'll pronounce -1 on this one. It looks ugly to me -- too reminiscent of C's if (...) required parentheses. Also it suggests that (key:value) is a new tuple notation that might be useful in other contexts -- which it's not. > > ... > > I'm certainly more comfortable with just ``for key in dict'' than with > > the whole slow of extensions using colons. > > What about just the > > for key:value in dict > for index:value in sequence > > extensions? I'm not against these -- I'd say +0.5. > The degenerate forms (omitting x or y or both in x:y) are > mechanical variations so are likely to get raised. For those, +0.2. > > But, again, that's for the PEP to fight over. > > PEPs are easier if you Pronounce on things you hate early so that those can > get recorded in the "BDFL Pronouncements" section without further ado. At your service -- see above. 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Wed Jan 31 01:49:24 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 30 Jan 2001 19:49:24 -0500
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: Your message of "Tue, 30 Jan 2001 09:24:54 PST." <20010130092454.D18319@glacier.fnational.com>
References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com>
Message-ID: <200101310049.TAA30197@cj20424-a.reston1.va.home.com>

> [Tim Peters on adding yet more syntactic sugar]
> > Available time is finite, and this isn't at the top of the list
> > of things I'd like to see (resuming the discussion of
> > generators + coroutines + iteration protocol comes to mind
> > first).
>
> What's the chances of getting generators into 2.2? The
> implementation should not be hard. Didn't Steven Majewski have
> something years ago? Why do we always get sidetracked on trying
> to figure out how to do coroutines and continuations?

I think there's a very good chance of getting them into 2.2. But it
*is* true that coroutines are a very attractive piece of land "just
next door". On the other hand, continuations are a mirage, so don't try
to go there. :-)

> Generators would add real power to the language and are simple
> enough that most users could benefit from them. Also, it should be
> possible to design an interface that does not preclude the
> addition of coroutines or continuations later.
>
> I'm not volunteering to champion the cause just yet. I just want
> to know if there is some issue I'm missing.

There are different ways to do iterators.

Here is a very "tame" proposal (and definitely in the realm of 2.2),
that doesn't require any coroutine-like tricks. Let's propose that

    for var in expr:
        ...do something with var...

will henceforth be translated into

    __iter = iterator(expr)
    while __iter.more():
        var = __iter.next()
        ...do something with var...
-- or some variation that combines more() and next() (I don't care).

Then a new built-in function iterator() is needed that creates an
iterator object. It should try two things:

(1) If the object implements __iterator__() (or a C API equivalent),
    call that and be done; this way arbitrary iterators can be
    created.

(2) If the object smells like a sequence (how to test???), use an
    iterator sort of like this:

    class Iterator:
        def __init__(self, sequence):
            self.sequence = sequence
            self.index = 0
        def more(self):
            # Store the item so that each index is tried exactly once
            try:
                self.item = self.sequence[self.index]
            except IndexError:
                return 0
            else:
                self.index = self.index + 1
                return 1
        def next(self):
            return self.item

    (I don't necessarily mean that all those instance variables should
    be publicly available.)

The built-in sequence types can use a very fast built-in iterator type
that uses a C int for the index and doesn't store the item in the
iterator. (This should be as fast as Marc-Andre's for loop
optimization using a C counter.)

Dictionaries can define an appropriate iterator that uses
PyDict_Next().

If the argument to iterator() is itself an iterator (how to test???),
it returns the argument unchanged, so that one can also write

    for var in iterator(obj):
        ...do something with var...

Files of course should have iterators that return the next input line.

We could build filtering and mapping iterators that take an iterator
argument and do certain manipulations with the elements; this would
effectively introduce the notion of lazy evaluation on sequences.

Etc., etc.

This does not come close to Icon generators -- but it doesn't require
any coroutine-like capabilities, unlike those.
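The proposal above can be sketched in present-day Python roughly as follows. This is only an illustration under stated assumptions: the names SequenceIterator and run_for_loop are made up here, and the two "how to test???" questions are answered with plain duck typing (an object counts as an iterator if it has both more() and next()), which is a guess and not something the message settles.

```python
class SequenceIterator:
    """Fallback iterator for anything indexable from 0 upward."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._index = 0
        self._item = None

    def more(self):
        # Fetch and store the item so each index is tried exactly once.
        try:
            self._item = self._sequence[self._index]
        except IndexError:
            return False
        self._index += 1
        return True

    def next(self):
        return self._item


def iterator(obj):
    """Hypothetical stand-in for the proposed iterator() built-in."""
    # Assumption: anything with more() and next() is already an iterator.
    if hasattr(obj, "more") and hasattr(obj, "next"):
        return obj
    # Step (1): let the object supply its own iterator.
    make = getattr(obj, "__iterator__", None)
    if make is not None:
        return make()
    # Step (2): otherwise treat it as a sequence.
    return SequenceIterator(obj)


def run_for_loop(expr, body):
    """Spell out the proposed translation of 'for var in expr: body'."""
    it = iterator(expr)
    while it.more():
        body(it.next())


seen = []
run_for_loop(["a", "b", "c"], seen.append)
print(seen)
```

Storing the item in more() is what keeps next() from indexing the sequence a second time, at the cost of one extra attribute per iterator.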
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jan 31 01:55:10 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 30 Jan 2001 19:55:10 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3a76df10.22007715@smtp.worldonline.dk> Message-ID: [Finn Bock] > Changing the type of a type? Yuck! No, it temporarily changes the type of the single list being sorted, like so, where "self" is a pointer to a PyListObject (which is a list, not a list *type* object): self->ob_type = &immutable_list_type; err = samplesortslice(self->ob_item, self->ob_item + self->ob_size, compare); self->ob_type = &PyList_Type; immutable_list_type is "just like" PyList_Type, except that the slots for mutating methods point to a function that raises a TypeError. Before this drastic step came years of increasingly ugly hacks trying to stop core dumps when people mutated a list during the sort. Python's sort is very complex, and lots of pointers are tucked away -- having the size of the array, or its position in memory, or the set of objects it contains, change as a side effect of doing a compare, would be difficult and expensive to recover from -- and by "difficult" read "nobody ever managed to get it right before this" <0.5 wink>. > I might very likely be reading the CPython sources wrongly, but it seems > this trick will cause an BadInternalCall if some other C extension are > trying to modify a list while it is freezed by the type switching trick. > I imagine this would happen if the extension called: > > PyList_SetItem(myList, 0, aValue); Well, in CPython it's not "legal" for any other thread to use the C API while the sort is in progress, because the thread doing the sort holds the global interpreter lock for the duration. So this could happen "legally" only if a comparison function called by the sort called out to a C extension attempting to mutate the list. 
In that case, fine, it *is* a bad call: mutation is not allowed during list sorting, so they deserve whatever they get -- and far better a "bad internal call" than a core dump. If the immutable_list_type were used more generally, it would require more general support (but I see Thomas already talked about that -- thanks). From guido at digicool.com Wed Jan 31 01:55:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 19:55:19 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: Your message of "Tue, 30 Jan 2001 19:24:05 EST." <14967.23333.57259.347222@anthem.wooz.org> References: <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> <14967.23333.57259.347222@anthem.wooz.org> Message-ID: <200101310055.TAA30250@cj20424-a.reston1.va.home.com> > I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, > renaming it just "Generators" and filling it out for the 2.2 time > frame. If we want to address coroutines and continuations later, we > can write separate PEPs for them. I think it's better not to re-use PEP 220 for that. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Wed Jan 31 01:58:32 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 01:58:32 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310028.TAA30090@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, Jan 30, 2001 at 07:28:44PM -0500 References: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> Message-ID: <20010131015832.K962@xs4all.nl> On Tue, Jan 30, 2001 at 07:28:44PM -0500, Guido van Rossum wrote: > > What about just the > > for key:value in dict > > for index:value in sequence > > extensions? > I'm not against these -- I'd say +0.5. What, fractions ? Isn't that against the whole idea of (+|-)(0|1) ? 
:) But since we are voting, I'm -0 on this right now, and might end up -1 or +0, depending on the implementation; I still can't *see* this, though I wouldn't be myself if I hadn't tried to implement it anyway :) And I ran into some fairly mind-boggling issues. The worst bit is 'how the f*ck does FOR_LOOP know if something's a dict or a list'. And the almost-as-bad bit is 'WTF to do for user classes, extension types and almost-list/almost-dict practically-builtin types (arrays, the *dbm's, etc.)'. After some sleep-deprived consideration I gave up and decided we need an iteration/generator protocol first. However, my life's been busy (or rather, my work has been) with all kinds of small and not so small details, and I haven't been getting much sleep in the last week or so, so I might be overlooking something very simple. That's why I can go either way based on implementation -- it might prove me wrong :) Until my boss is back and I stop being 'responsible' (end of this week, start of next week) and I get a chance to get rid of about 2 months of work backlog (the time he was away) I won't have time to champion or even contribute to such a PEP. Then again, by that time I might be preparing for IPC9 (_if_ my boss sends me there) or even my ApacheCon US presentation (which got accepted today, yay!) So, if that other message was an attempt to drop the PEP on me, Guido, the answer is the same as I tend to give to suits that show up next to my desk wanting to discuss something important (to them) right away: "b'gg'r 'ff" :) I'll-save-my-answer-to-PR-officers-doing-the-same-for-when-you-do-something- -*really*-offensive-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From guido at digicool.com Wed Jan 31 02:16:51 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 20:16:51 -0500 Subject: [Python-Dev] Let's release 2.1a2 Thursday night Message-ID: <200101310116.UAA30386@cj20424-a.reston1.va.home.com> Things look good for a release of 2.1a2 this week; we're aiming for Thursday night. I won't be in town (speaking to the press at LinuxWorld Expo in New York) but Jeremy will handle the release process and the other PythonLabs folks will assist him. Tomorrow Fred will check in his weak references after making some changes (mostly making it more Spartan :-) that I suggested in a code review. After that, I think we're good for the second (and last!) alpha release; and enough has changed (e.g. nested scopes, lots of setup.py changes, flat Makefile) to warrant going ahead now. Now is the time for those last-minute bugfixes that you're all so famous for! I propose a checkin freeze for non-PythonLabs folks Wednesday midnight US west coast time, to give Jeremy c.s. enough time to build the release and give it a good work-out. (An internal freeze is up to Jeremy to declare, but should probably take Tim's sleep cycle into account.) --Guido van Rossum (home page: http://www.python.org/~guido/) PS. I'll be out of reach from noon US east coast time tomorrow (Wednesday), traveling to New York by train. I probably won't check my email while out there; I'll be back Friday night. From guido at digicool.com Wed Jan 31 02:35:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 30 Jan 2001 20:35:25 -0500 Subject: [Python-Dev] SSL socket read at EOF; SourceForge problem In-Reply-To: Your message of "Mon, 29 Jan 2001 23:56:09 EST." References: Message-ID: <200101310135.UAA30629@cj20424-a.reston1.va.home.com> > I'm going to repeat a question that I posted about a week ago that passed > without comment on the newsgroup. 
The issue is the SSL support in the socket > module, which raises an exception when the reading socket is at EOF, rather > than returning an empty string. I'm hesitant to call it a "bug", but I > wouldn't have implemented it this way. There are the names of two people > mentioned at the top of socketmodule.c, but no contact information, so I'm > suggesting here that it be changed to conform to normal file/socket > practice. (SSL was actually added at 2.0, so I'm late to the party with > this; mea culpa, mea culpa. I delayed trying Python2 because of the > extension rebuilding.) I agree that it makes more sense if a read at EOF returns an empty string, since that's what other file-like objects in Python do. I can't do much about this right now, but I'd love to see a patch. It could go into 2.1a2 if small enough. Note that input() and raw_input() are specifically excepted because they are intended for use in interactive mode by newbies mostly; and because "" as return value for EOF would be ambiguous for these. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed Jan 31 05:12:23 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jan 2001 17:12:23 +1300 (NZDT) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310028.TAA30090@cj20424-a.reston1.va.home.com> Message-ID: <200101310412.RAA03140@s454.cosc.canterbury.ac.nz> : > for index:value in sequence -1, because we only construct dicts using that notation, not sequences. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed Jan 31 06:21:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 00:21:37 -0500 Subject: [Python-Dev] codecity.com Message-ID: <200101310521.AAA31653@cj20424-a.reston1.va.home.com> Should I spread this word, or is this a joke? The Python quiz category is laughable. --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Sat, 27 Jan 2001 23:16:02 -0800 From: "Jeff Cordova" To: Subject: New, fun way to learn Python. Hi Guido, I wanted to let you know about www.codecity.com After several years of managing large software projects in Silicon Valley, I realized that I was spending a lot of time teaching jr. programmers how to write code. So, I created CodeCity to help me automate some of that. If you go to the site, you'll see that I've created a category for Python. There's not much depth to the Python content yet (the site is only a week old) but I'm expecting the Python community to add their wisdom over a period of time. If you could spread the word, it would be highly appreciated. Thankyou, Jeff C. ------- End of Forwarded Message From tim.one at home.com Wed Jan 31 07:16:48 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 01:16:48 -0500 Subject: [Python-Dev] codecity.com In-Reply-To: <200101310521.AAA31653@cj20424-a.reston1.va.home.com> Message-ID: [Guido, on www.codecity.com] > Should I spread this word, or is this a joke? The Python quiz > category is laughable. While the Python section still seems to have only one question, the first day this was announced the third choice wasn't today's: Python is Open Source code, so it doesn't have a creator but: Martha Stewart I liked it better before <0.9 wink>. 
From moshez at zadka.site.co.il Wed Jan 31 07:30:07 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 31 Jan 2001 08:30:07 +0200 (IST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <200101310049.TAA30197@cj20424-a.reston1.va.home.com> References: <200101310049.TAA30197@cj20424-a.reston1.va.home.com>, <200101300206.VAA21925@cj20424-a.reston1.va.home.com> <20010130092454.D18319@glacier.fnational.com> Message-ID: <20010131063007.536ACA83E@darjeeling.zadka.site.co.il> On Tue, 30 Jan 2001 19:49:24 -0500, Guido van Rossum wrote: > There are different ways to do interators. > > Here is a very "tame" proposal (and definitely in the realm of 2.2), > that doesn't require any coroutine-like tricks. Let's propose that > > for var in expr: > ...do something with var... > > will henceforth be translated into > > __iter = iterator(expr) > while __iter.more(): > var = __iter.next() > ...do something with var... I'm +1 on that...but Tim's "try to use that to write something that will return the nodes of a binary tree" still haunts me. Personally, though, I'd thin down the interface to while 1: try: var = __iter.next() except NoMoreError: break # pseudo-break? With the usual caveat that this is a lie as far as "else" is concerned (IOW, pseudo-break gets into the else) > Then a new built-in function iterator() is needed that creates an > iterator object. It should try two things: > > (1) If the object implements __iterator__() (or a C API equivalent), > call that and be done; this way arbitrary iterators can be > created. > (2) If the object smells like a sequence (how to test???), use an > iterator sort of like this: Why not, "if the object doesn't have __iterator__, try this. If it won't work, we'll find out by the exception that will be thrown in our face". 
class Iterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0
    def next(self):
        try:
            try:
                return self.seq[self.index] # <- smells like
            except IndexError:
                raise NoMoreError(self.index)
        finally:
            self.index += 1

> (I don't necessarily mean that all those instance variables should
> be publicly available.)

But what about your poor brother? Er....I mean, this would make implementing
"indexing" really about just getting the index from the iterator.

> If the argument to iterator() is itself an iterator (how to test???),

No idea, and this looks problematic. I see your point -- but it's still
problematic.
--
Moshe Zadka
This is a signature anti-virus.
Please stop the spread of signature viruses!
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6

From tim.one at home.com Wed Jan 31 07:57:26 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 31 Jan 2001 01:57:26 -0500
Subject: [Python-Dev] Can't enter new Python bugs on SourceForge?
Message-ID:

Reported this earlier. Still can't create a new bug. Guido either. Here's
the SF Support request opened on this:

    http://sourceforge.net/support/
        index.php?func=detailsupport&support_id=113100&group_id=1

The good(?) news is that Python isn't the only project to report this
problem.

From tim.one at home.com Wed Jan 31 08:50:18 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 31 Jan 2001 02:50:18 -0500
Subject: [Python-Dev] FW: Python programmer needed (addition to urllib2 and HTTPS support)
Message-ID:

Get rich quick!

-----Original Message-----
From: python-list-admin at python.org
[mailto:python-list-admin at python.org]On Behalf Of Albert Chin-A-Young
Sent: Wednesday, January 31, 2001 2:31 AM
To: python-list at python.org
Subject: Python programmer needed (addition to urllib2 and HTTPS support)

We're in need of a contract Python programmer for the following:
  1. Allow connecting to a host with urlopen() which requires BASIC
     HTTP authentication with a proxy (via urllib2.py).
This should address bug #125217: http://sourceforge.net/bugs/?func=detailbug&bug_id=125217&group_id=5470 2. Allow connecting to a host with urlopen() which requires BASIC HTTP authentication with a proxy that requires BASIC HTTP authentication (via urllib2.py). 3. Support for non-authenticated clients to connect to a HTTPS server 4. Support for a client to authenticate the HTTPS host (to verify that it's certificate is valid) What we might consider adding (depends on cost): 1. Support for authenticated clients to connect to a HTTPS server. Please note that solutions to the four items above must be rolled back into the main Python distribution (implies the "community" and the Python developers need to agree on the adopted solution). -- albert chin (china at thewrittenword dot com) -- http://mail.python.org/mailman/listinfo/python-list From ping at lfw.org Wed Jan 31 10:47:10 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 31 Jan 2001 01:47:10 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: Message-ID: On Tue, 30 Jan 2001, Guido van Rossum wrote: > > Can you say "PEP time"? :-) Okay, i have written a draft PEP that tries to combine the "elt in dict", custom iterator, and "for k:v" issues into a coherent proposal. Have a look: http://www.lfw.org/python/pep-iterators.txt http://www.lfw.org/python/pep-iterators.html Could i get a number for this please? -- ?!ng "The only `intuitive' interface is the nipple. After that, it's all learned." 
-- Bruce Ediger, on user interfaces From moshez at zadka.site.co.il Wed Jan 31 11:14:49 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 31 Jan 2001 12:14:49 +0200 (IST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: References: Message-ID: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> On Wed, 31 Jan 2001 01:47:10 -0800 (PST), Ka-Ping Yee wrote: > Okay, i have written a draft PEP that tries to combine the > "elt in dict", custom iterator, and "for k:v" issues into a > coherent proposal. Have a look: > > http://www.lfw.org/python/pep-iterators.txt > http://www.lfw.org/python/pep-iterators.html Er....one problem with first reading: you forgot to mention in the while loop description that 'else:' would be executed if the exception is raised, so the 'break' is a pseudo-break'. Basic response: I *love* the iter(), sq_iter and __iter__ parts. I tremble at seeing the rest. Why not add a method to dictionaries .iteritems() and do for (k, v) in dict.iteritems(): pass (dict.iteritems() would return an an iterator to the items) -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From MarkH at ActiveState.com Wed Jan 31 11:34:01 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Wed, 31 Jan 2001 21:34:01 +1100 Subject: [Python-Dev] WARNING: Changed build process for zlib on Windows Message-ID: Hi all, In an attempt to solve "[ Bug #129293 ] zlib library used for binary win32 distribution can crash" (https://sourceforge.net/bugs/?func=detailbug&group_id=5470&bug_id=129293), Tim and I have decided that we should fix the build process of zlib.pyd on windows. The current process requires that the builder download _2_ zlib archives - a binary distribution for zlib.lib, and the source archive for the headers. We believe that slight differences between the 2 are causing the above bug. 
A particular warning-light is that the current process defines ZLIB_DLL even
though we are _not_ currently using the DLL but the static lib. Removing
this #define generates linker errors.

The new process is very simple, but may break some people's builds. In theory
it _should_ still work for everyone, but if it fails to build, please check
your directory structure.

From ping at lfw.org Wed Jan 31 12:00:48 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Wed, 31 Jan 2001 03:00:48 -0800 (PST)
Subject: [Python-Dev] Re: Sets: elt in dict, lst.include
In-Reply-To: <20010131015832.K962@xs4all.nl>
Message-ID:

On Wed, 31 Jan 2001, Thomas Wouters wrote:
> I still can't *see* this, though I
> wouldn't be myself if I hadn't tried to implement it anyway :) And I ran
> into some fairly mind-boggling issues. The worst bit is 'how the f*ck
> does FOR_LOOP know if something's a dict or a list'.

I believe the Pythonic answer to that is "see if the appropriate
method is available".

The best definition of "sequence-like" or "mapping-like" i can
come up with is:

    x is sequence-like if it provides __getitem__() but not keys()
    x is mapping-like if it provides __getitem__() and keys()

But in our case, since we need iteration, we can look for specific
methods that have to do with just what we need for iteration and
nothing else. Thus, e.g. a mapping-like class without a values()
method is no problem if we never ask to iterate over values.

> And the
> almost-as-bad bit is 'WTF to do for user classes, extension types and
> almost-list/almost-dict practically-builtin types

I think it can be done; the draft PEP at

    http://www.lfw.org/python/pep-iterators.html

is a best-attempt at supporting everything just as you would expect.
Let me know if you think there are important cases it doesn't cover.
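The "sequence-like" / "mapping-like" definitions given earlier in this message translate almost word for word into attribute checks. A minimal sketch, assuming hasattr() is an acceptable way to ask whether a method is "provided"; the function names are made up for illustration and do not come from the draft PEP:

```python
def is_mapping_like(x):
    # "x is mapping-like if it provides __getitem__() and keys()"
    return hasattr(x, "__getitem__") and hasattr(x, "keys")


def is_sequence_like(x):
    # "x is sequence-like if it provides __getitem__() but not keys()"
    return hasattr(x, "__getitem__") and not hasattr(x, "keys")


print(is_sequence_like([1, 2, 3]), is_mapping_like({"a": 1}))
```

Note that under this definition strings and tuples also count as sequence-like, while an object with neither method (an int, say) is neither.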
I know, the table mp_iteritems __iteritems__, __iter__, items, __getitem__ mp_iterkeys __iterkeys__, __iter__, keys, __getitem__ mp_itervalues __itervalues__, __iter__, values, __getitem__ sq_iter __iter__, __getitem__ might look a little frightening, but it's not so bad, and i think it's about as simple as you can make it while continuing to support existing pseudo-lists and pseudo-dictionaries. No instance should ever provide __iter__ at the same time as any of the other __iter*__ methods anyway. -- ?!ng "The only `intuitive' interface is the nipple. After that, it's all learned." -- Bruce Ediger, on user interfaces From mal at lemburg.com Wed Jan 31 12:56:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 12:56:12 +0100 Subject: [Python-Dev] Re: from ... import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154) References: Message-ID: <3A77FD5C.DE8729DC@lemburg.com> > Update of /cvsroot/python/python/dist/src/Python > In directory usw-pr-cvs1:/tmp/cvs-serv17061/Python > > Modified Files: > compile.c > Log Message: > Enforce two illegal import statements that were outlawed in the > reference manual but not checked: Names bound by import statemants may > not occur in global statements in the same scope. The from ... import * > form may only occur in a module scope. > > I guess these changes could break code, but the reference manual > warned about them. Jeremy, your code breaks all uses of "from package import submodule" inside packages. Try distutils for example or setup.py.... -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 13:01:24 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 31 Jan 2001 13:01:24 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> Message-ID: <3A77FE94.E5082136@lemburg.com> Guido van Rossum wrote: > > [ESR] > > For different reasons, I'd like to be able to set a constant flag on a > > object instance. Simple semantics: if you try to assign to a > > member or method, it throws an exception. > > > > Application? I have a large Python program that goes to a lot of effort > > to build elaborate context structures in core. It would be nice to know > > they can't be even inadvertently trashed without throwing an exception I > > can watch for. > > Yes, this is a good thing. Easy to do on lists and dicts. Questions: > > - How to spell it? x.freeze()? x.readonly()? How about .lock() and .unlock() ? > - Should this reversible? I.e. should there be an x.unfreeze()? Yes. These low-level locks could be used in thread programming since the above calls are C level functions and thus thread safe w/r to the global interpreter lock. > - Should we support something like this for instances too? Sometimes > it might be cool to be able to freeze changing attribute values... Sure :) Eric, could you write a PEP for this ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 13:08:15 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 13:08:15 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: Message-ID: <3A78002F.DC8F0582@lemburg.com> Tim Peters wrote: > > [MAL] > > ... 
> > What we really want is iterators for dictionaries, so why not > > implement these instead of tweaking for-loops. > > Seems an unrelated topic: would "iterators for dictionaries" solve the > supposed problem with iteration order? No, but it would solve the problem in a more elegant and generalized way. Besides, it also allows writing code which is thread safe, since the iterator can take special actions to assure that the dictionary doesn't change during the iteration phase (see the other thread about "making mutable objects readonly"). > > If you are looking for speedups w/r to for-loops, applying a > > different indexing technique in for-loops would go a lot further > > and provide better performance not only to dictionary loops, > > but also to other sequences. > > > > I have made some good experience with a special counter object > > (sort of like a mutable integer) which is used instead of the > > iteration index integer in the current implementation. > > Please quantify, if possible. My belief (based on past experiments) is that > in loops fancier than > > for i in range(n): > pass > > the loop overhead quickly falls into the noise even now. I don't remember the figures, but these micor optimizations do speedup loops by a noticable amount. Just compare the performance of stock Python 1.5 against my patched version. > > Using an iterator object instead of the integer + __getitem__ > > call machinery would allow more flexibility for all kinds of > > sequences or containers. ... > > This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new > iteration *protocol* could have major attractions. Not really... the counter object is just a special case of an iterator -- in this case iteration is over the IN. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 13:10:43 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 13:10:43 +0100 Subject: [Python-Dev] Re: Making mutable objects readonly References: Message-ID: <3A7800C3.B5D3203F@lemburg.com> Tim Peters wrote: > > Note that even adding a "frozen" flag would add 4 bytes to every freezable > object on most machines. That's why I'd rather .freeze() replace the type > pointer and .unfreeze() restore it. No time or space overhead; no > cluttering up the normal-case (i.e., unfrozen) type implementations with new > tests. Note that Fred's weak ref implementation also need a flag on every weak referencable object (at least last time I looked at his patches). Why not add a flag byte or word to these objects -- then we'd have 8 or 16 choices of what to do with them ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From MarkH at ActiveState.com Wed Jan 31 13:18:12 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Wed, 31 Jan 2001 23:18:12 +1100 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com> Message-ID: MAL writes: > > - How to spell it? x.freeze()? x.readonly()? > > How about .lock() and .unlock() ? I'm with Greg here - lock() and unlock() imply an operation similar to threading.Lock() - ie, exclusivity rather than immutability. I don't have a strong opinion on the other names, but definately prefer any of the others over lock() for this operation. Mark. From mal at lemburg.com Wed Jan 31 13:26:07 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 31 Jan 2001 13:26:07 +0100 Subject: [Python-Dev] Making mutable objects readonly References: Message-ID: <3A78045F.7DB50871@lemburg.com> Mark Hammond wrote: > > MAL writes: > > > > - How to spell it? x.freeze()? x.readonly()? > > > > How about .lock() and .unlock() ? > > I'm with Greg here - lock() and unlock() imply an operation similar to > threading.Lock() - ie, exclusivity rather than immutability. > > I don't have a strong opinion on the other names, but definately prefer any > of the others over lock() for this operation. Funny, I though that .lock() and .unlock() could be used to implement exactly what threading.Lock() does... Anyway, names really don't matter much, so how about: .mutable([flag]) -> integer If called without argument, returns 1/0 depending on whether the object is mutable or not. When called with a flag argument, sets the mutable state of the object to the value indicated by flag and returns the previous flag state. The semantics of this interface would be in sync with many other state APIs in Python and C (e.g. setlocale()). The advantage of making this a method should be clear: it allows writing polymorphic code. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pedroni at inf.ethz.ch Wed Jan 31 13:34:32 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Wed, 31 Jan 2001 13:34:32 +0100 (MET) Subject: [Python-Dev] weak refs and jython Message-ID: <200101311234.NAA24584@core.inf.ethz.ch> Hi. I have read weak ref PEP, maybe too late. I don't know if portability of code using weak refs between python and jython was a goal or could be one, and up to which extent actual impl. will correspond to the PEP. But about The callbacks registered with weak references must accept a single parameter, which will be the weak-ly referenced object itself. 
The object can be resurrected by creating some other reference to the object in the callback, in which case the weak reference generating the callback will still be cleared but no remaining weak references to the object will be cleared. AFAIK using Java weak refs (which I think is a natural choice) I see no way (at least no worth-the-effort way) to implement this in jython. Java weak refs cannot be resurrected. regards, Samuele Pedroni. PS: Mr. X is a jython developer. From bckfnn at worldonline.dk Wed Jan 31 13:49:22 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Wed, 31 Jan 2001 12:49:22 GMT Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <200101302042.PAA29301@cj20424-a.reston1.va.home.com> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> Message-ID: <3a7809c0.14839067@smtp.worldonline.dk> >> > Note that Jeremy is only raising errors for "from M import *". >> >> No, he says he's also raising errors for 'import spam' if 'spam' is declared >> global, like so: >> >> def viking(): >> global spam >> import spam > >Yeah, this was just brought to my attention at our group meeting >today. I'm with you on this one -- there really isn't a good reason >why this shouldn't work. (I wonder why that constraint was ever added >to the reference manual; maybe I was just upset that someone would >*do* something as ugly as that, or maybe there was a J[P]ython >reason???.) Previously Jython has had problems with "from .. import *" in function scope, and still has problems when used with the python -> java compiler: http://sourceforge.net/bugs/?func=detailbug&bug_id=122834&group_id=12867 Using global on an import name is currently ignored by Jython because the name assignment is done by the runtime, not the compiler.
regards, finn From thomas at xs4all.net Wed Jan 31 13:59:14 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 13:59:14 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <3a7809c0.14839067@smtp.worldonline.dk>; from bckfnn@worldonline.dk on Wed, Jan 31, 2001 at 12:49:22PM +0000 References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> Message-ID: <20010131135914.N962@xs4all.nl> On Wed, Jan 31, 2001 at 12:49:22PM +0000, Finn Bock wrote: > Using global on an import name is currently ignored by Jython because > the name assignment is done by the runtime, not the compiler. So it's impossible to do, in Jython, something like: def fillme(): global me import me but it is possible to do: def fillme(): global me import me as _me me = _me ? I have to say I don't like that; we're always claiming 'import' (and 'def' and 'class' for that matter) are 'just another way of writing assignment'. All these special cases break that. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
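For readers following the fillme() exchange, the two spellings can be run side by side. This is only a sketch against a current CPython, where "global" followed by "import" of the same name is accepted; the stdlib modules json and decimal stand in for the thread's placeholder module "me", and the names fillme_direct, fillme_workaround and decimal_mod are made up for illustration. It says nothing about what Jython does.

```python
# Sketch only: json and decimal are stand-ins for the thread's module "me".

def fillme_direct():
    # The import statement binds its name like an assignment, so the
    # preceding global declaration sends the binding to module scope.
    global json
    import json


def fillme_workaround():
    # The spelling that does not rely on "global" being honoured for
    # imports: import under a local name, then assign explicitly.
    global decimal_mod
    import decimal as _decimal
    decimal_mod = _decimal


fillme_direct()
fillme_workaround()

# Inspect the namespace the functions treat as global.
namespace = fillme_direct.__globals__
results = (
    "json" in namespace,         # bound by the import itself
    "decimal_mod" in namespace,  # bound by the explicit assignment
    "_decimal" in namespace,     # stayed local to the function
)
print(results)
```

Both forms end up creating a global binding here, which is the behaviour the "import is just another way of writing assignment" argument expects.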
From bckfnn at worldonline.dk Wed Jan 31 14:35:36 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Wed, 31 Jan 2001 13:35:36 GMT Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: <20010131135914.N962@xs4all.nl> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> <20010131135914.N962@xs4all.nl> Message-ID: <3a780eda.16144995@smtp.worldonline.dk> On Wed, 31 Jan 2001 13:59:14 +0100, you wrote: >On Wed, Jan 31, 2001 at 12:49:22PM +0000, Finn Bock wrote: > >> Using global on an import name is currently ignored by Jython because >> the name assignment is done by the runtime, not the compiler. > >So it's impossible to do, in Jython, something like: > >def fillme(): > global me > import me > >but it is possible to do: > >def fillme(): > global me > import me as _me > me = _me > >? Yes, only the second example will make a global variable. > I have to say I don't like that; we're always claiming 'import' (and >'def' and 'class' for that matter) are 'just another way of writing >assignment'. All these special cases break that. I don't like it either, I was only reporting what Jython currently does. The current design used by Jython does not lend itself directly towards a solution, but I don't see anything that makes it impossible to solve. regards, finn From mal at lemburg.com Wed Jan 31 15:34:19 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 15:34:19 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) References: Message-ID: <3A78226B.2E177EFE@lemburg.com> Michael Hudson wrote: > > In the interest of generating some numbers (and filling up my hard > drive), last night I wrote a script to build lots & lots of versions > of python (many of which turned out to be redundant - eg.
-O6 didn't > seem to do anything different to -O3 and pybench doesn't work with > 1.5.2), and then run pybench with them. Summarised results below; > first a key: > > src-n: this morning's CVS (with Jeremy's f_localsplus optimisation) > (only built this with -O3) > src: CVS from yesterday afternoon > src-obmalloc: CVS from yesterday afternoon with Vladimir's obmalloc > patch applied. More on this later... > Python-2.0: you can guess what this is. > > All runs are compared against Python-2.0-O2: > > Benchmark: src-n-O3 (rounds=10, warp=20) > Average round time: 49029.00 ms -0.86% > Benchmark: src (rounds=10, warp=20) > Average round time: 67141.00 ms +35.76% > Benchmark: src-O (rounds=10, warp=20) > Average round time: 50167.00 ms +1.44% > Benchmark: src-O2 (rounds=10, warp=20) > Average round time: 49641.00 ms +0.37% > Benchmark: src-O3 (rounds=10, warp=20) > Average round time: 49104.00 ms -0.71% > Benchmark: src-O6 (rounds=10, warp=20) > Average round time: 49131.00 ms -0.66% > Benchmark: src-obmalloc (rounds=10, warp=20) > Average round time: 63276.00 ms +27.94% > Benchmark: src-obmalloc-O (rounds=10, warp=20) > Average round time: 46927.00 ms -5.11% > Benchmark: src-obmalloc-O2 (rounds=10, warp=20) > Average round time: 46146.00 ms -6.69% > Benchmark: src-obmalloc-O3 (rounds=10, warp=20) > Average round time: 46456.00 ms -6.07% > Benchmark: src-obmalloc-O6 (rounds=10, warp=20) > Average round time: 46450.00 ms -6.08% > Benchmark: Python-2.0 (rounds=10, warp=20) > Average round time: 68933.00 ms +39.38% > Benchmark: Python-2.0-O (rounds=10, warp=20) > Average round time: 49542.00 ms +0.17% > Benchmark: Python-2.0-O3 (rounds=10, warp=20) > Average round time: 48262.00 ms -2.41% > Benchmark: Python-2.0-O6 (rounds=10, warp=20) > Average round time: 48273.00 ms -2.39% > > My conclusion? Python 2.1 is slower than Python 2.0, but not by > enough to care about. What compiler did you use and on which platform ? 
I have made similar experience with -On with n>3 compared to -O2 using pgcc (gcc optimized for PC processors). BTW, the Linux kernel uses "-Wall -Wstrict-prototypes -O3 -fomit-frame-pointer" as CFLAGS -- perhaps Python should too on Linux ?! Does anybody know about the effect of -fomit-frame-pointer ? Would it cause problems or produce code which is not compatible with code compiled without this flag ? > Interestingly, adding obmalloc speeds things up. Let's take a closer > look: > > $ python pybench.py -c src-obmalloc-O3 -s src-O3 > PYBENCH 0.7 > > Benchmark: src-O3 (rounds=10, warp=20) > > Tests: per run per oper. diff * > ------------------------------------------------------------------------ > BuiltinFunctionCalls: 843.35 ms 6.61 us +2.93% > BuiltinMethodLookup: 878.70 ms 1.67 us +0.56% > ConcatStrings: 1068.80 ms 7.13 us -1.22% > ConcatUnicode: 1373.70 ms 9.16 us -1.24% > CreateInstances: 1433.55 ms 34.13 us +9.06% > CreateStringsWithConcat: 1031.75 ms 5.16 us +10.95% > CreateUnicodeWithConcat: 1277.85 ms 6.39 us +3.14% > DictCreation: 1275.80 ms 8.51 us +44.22% > ForLoops: 1415.90 ms 141.59 us -0.64% > IfThenElse: 1152.70 ms 1.71 us -0.15% > ListSlicing: 397.40 ms 113.54 us -0.53% > NestedForLoops: 789.75 ms 2.26 us -0.37% > NormalClassAttribute: 935.15 ms 1.56 us -0.41% > NormalInstanceAttribute: 961.15 ms 1.60 us -0.60% > PythonFunctionCalls: 1079.65 ms 6.54 us -1.00% > PythonMethodCalls: 908.05 ms 12.11 us -0.88% > Recursion: 838.50 ms 67.08 us -0.00% > SecondImport: 741.20 ms 29.65 us +25.57% > SecondPackageImport: 744.25 ms 29.77 us +18.66% > SecondSubmoduleImport: 947.05 ms 37.88 us +25.60% > SimpleComplexArithmetic: 1129.40 ms 5.13 us +114.92% > SimpleDictManipulation: 1048.55 ms 3.50 us -0.00% > SimpleFloatArithmetic: 746.05 ms 1.36 us -2.75% > SimpleIntFloatArithmetic: 823.35 ms 1.25 us -0.37% > SimpleIntegerArithmetic: 823.40 ms 1.25 us -0.37% > SimpleListManipulation: 1004.70 ms 3.72 us +0.01% > SimpleLongArithmetic: 865.30 ms 5.24 us +100.65% > 
SmallLists: 1657.65 ms 6.50 us +6.63% > SmallTuples: 1143.95 ms 4.77 us +2.90% > SpecialClassAttribute: 949.00 ms 1.58 us -0.22% > SpecialInstanceAttribute: 1353.05 ms 2.26 us -0.73% > StringMappings: 1161.00 ms 9.21 us +7.30% > StringPredicates: 1069.65 ms 3.82 us -5.30% > StringSlicing: 846.30 ms 4.84 us +8.61% > TryExcept: 1590.40 ms 1.06 us -0.49% > TryRaiseExcept: 1104.65 ms 73.64 us +24.46% > TupleSlicing: 681.10 ms 6.49 us -3.13% > UnicodeMappings: 1021.70 ms 56.76 us +0.79% > UnicodePredicates: 1308.45 ms 5.82 us -4.79% > UnicodeProperties: 1148.45 ms 5.74 us +13.67% > UnicodeSlicing: 984.15 ms 5.62 us -0.51% > ------------------------------------------------------------------------ > Average round time: 49104.00 ms +5.70% > > *) measured against: src-obmalloc-O3 (rounds=10, warp=20) > > Words fail me slightly, but maybe some tuning of the memory allocation > of longs & complex numbers would be in order? AFAIR, Vladimir's malloc implementation favours small objects. All number objects (except longs) fall into this category. Perhaps we should think about adding his lib to the core ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 15:39:01 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 15:39:01 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) References: Message-ID: <3A782385.5B544CD5@lemburg.com> > In the interest of generating some numbers (and filling up my hard > drive), last night I wrote a script to build lots & lots of versions > of python (many of which turned out to be redundant - eg. -O6 didn't > seem to do anything different to -O3 and pybench doesn't work with > 1.5.2), and then run pybench with them. 
FYI, I've just updated the archive to also work under Python 1.5.x: http://www.lemburg.com/python/pybench-0.7.zip -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mwh21 at cam.ac.uk Wed Jan 31 16:52:23 2001 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 31 Jan 2001 15:52:23 +0000 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 31 Jan 2001 15:34:19 +0100" References: <3A78226B.2E177EFE@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > > My conclusion? Python 2.1 is slower than Python 2.0, but not by > > enough to care about. > > What compiler did you use and on which platform ? Argh, sorry; I meant to put this in! $ uname -a Linux atrus.jesus.cam.ac.uk 2.2.14-1.1.0 #1 Thu Jan 6 05:12:58 EST 2000 i686 unknown $ gcc --version 2.95.1 It's a Dell Dimension XPS D233 (a 233MHz PII) with a reasonably fast hard drive (two-year-old 10G IBM 7200rpm thingy) and quite a lot of RAM (192Mb). [snip] > AFAIR, Vladimir's malloc implementation favours small objects. > All number objects (except longs) fall into this category. Well, longs & complex numbers don't do any free list handling (like floats and ints do), so I see two conclusions: 1) Don't add obmalloc to the core, but do simple free list stuff for longs (might be tricky) and complex numbers (this should be a no-brainer). 2) Integrate obmalloc - then maybe we can ditch all of that icky freelist stuff. > Perhaps we should think about adding his lib to the core ?! Strikes me as the better solution. Can anyone try this on Windows? Seeing as Windows malloc reputedly sucks, maybe the differences would be bigger. Cheers, M. -- Our lecture theatre has just crashed. It will currently only silently display an unexplained line-drawing of a large dog accompanied by spookily flickering lights.
-- Dan Sheppard, ucam.chat (from Owen Dunn's summary of the year) From barry at digicool.com Wed Jan 31 17:42:28 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 11:42:28 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: Message-ID: <14968.16500.594486.613828@anthem.wooz.org> >>>>> "KY" == Ka-Ping Yee writes: KY> Could i get a number for this please? Looks like you beat Eric to PEP 234. :) I'll update PEP 0 and let you check in your txt file. I may want to do an editorial pass over it. -Barry From barry at digicool.com Wed Jan 31 17:50:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 11:50:10 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> Message-ID: <14968.16962.830739.920771@anthem.wooz.org> >>>>> "MZ" == Moshe Zadka writes: MZ> Basic response: I *love* the iter(), sq_iter and __iter__ MZ> parts. I tremble at seeing the rest. Why not add a method to MZ> dictionaries .iteritems() and do | for (k, v) in dict.iteritems(): | pass MZ> (dict.iteritems() would return an an iterator to the items) Moshe, I had exactly the same reaction and exactly the same idea. I'm a strong -1 on introducing new syntax for this when new methods can handle it in a much more readable way (IMO). Another idea would be to allow the iterator() method to take an argument: for key in dict.iterator() a.k.a. for key in dict.iterator(KEYS) and also for value in dict.iterator(VALUES) for key, value in dict.iterator(ITEMS) One problem is that the constants KEYS, VALUES, and ITEMS would either have to be defined some place, or you'd just use values like 0, 1, 2, which is less readable perhaps than just having iteratoritems(), iteratorkeys(), and iteratorvalues() methods. 
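The selector-argument idea can be sketched as a plain function, since the built-in dict cannot be given a new method from Python code. KEYS, VALUES, ITEMS and iterator() below are hypothetical names taken from the wording above, not an existing API, and the body simply leans on the keys(), values() and items() methods dicts already have.

```python
# Hypothetical selector constants; the values 0, 1, 2 are arbitrary.
KEYS, VALUES, ITEMS = 0, 1, 2


def iterator(mapping, what=KEYS):
    """Return an iterator over keys, values, or (key, value) pairs."""
    if what == KEYS:
        return iter(mapping.keys())
    if what == VALUES:
        return iter(mapping.values())
    if what == ITEMS:
        return iter(mapping.items())
    raise ValueError("unknown selector: %r" % (what,))


d = {"spam": 14, "eggs": 42}

# Sorted so the results do not depend on dictionary ordering.
keys = sorted(iterator(d))
values = sorted(iterator(d, VALUES))
items = sorted(iterator(d, ITEMS))

for key, value in iterator(d, ITEMS):
    print(key, value)
```

The readability concern raised above shows up directly: a call such as iterator(d, 2) would work just as well and says nothing about what is being iterated.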
Alternative spellings: itemsiter(), keysiter(), valsiter() itemsiterator(), keysiterator(), valuesiterator() iiterator(), kiterator(), viterator() ad-nauseum-ly y'rs, -Barry From skip at mojam.com Wed Jan 31 17:11:19 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 10:11:19 -0600 (CST) Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> Message-ID: <14968.14631.419491.440774@beluga.mojam.com> What stimulated this thread about making mutable objects (temporarily) immutable? Can someone give me an example where this is actually useful and can't be handled through some existing mechanism? I'm definitely with Fredrik on this one. Sounds like madness to me. I'm just guessing here, but since the most common need for immutable objects is as dictionary keys, I can envision having to test the lock state of a list or dict that someone wants to use as a key everywhere you would normally call has_key: if l.islocked() and d.has_key(l): ... If you want immutable dicts or lists in order to use them as dictionary keys, just serialize them first: survey_says = {"spam": 14, "eggs": 42} sl = marshal.dumps(survey_says) dict[sl] = "spam" Here's another pitfall I can envision. survey_says = {"spam": 14, "eggs": 42} survey_says.lock() dict[survey_says] = "Richard Dawson" survey_says.unlock() At this point can I safely iterate over the keys in the dictionary or not?
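A runnable version of the serialize-first idea, as a sketch on a current CPython: the name table replaces dict from the snippet above so the built-in is not shadowed, and the flag names are made up for illustration. It shows that a plain dict is rejected as a key, that the marshal.dumps() snapshot works as one, and that the snapshot no longer matches once the original dict changes.

```python
import marshal

survey_says = {"spam": 14, "eggs": 42}
table = {}

# A plain dict is mutable and unhashable, so it is rejected as a key.
try:
    table[survey_says] = "spam"
    direct_key_ok = True
except TypeError:
    direct_key_ok = False

# Serializing produces an immutable bytes snapshot, which is hashable.
snapshot = marshal.dumps(survey_says)
table[snapshot] = "spam"

# Changing the original afterwards does not disturb the stored key...
survey_says["eggs"] = 43
still_found = snapshot in table

# ...but the changed dict serializes to different bytes, so a fresh
# dump of it no longer finds the entry.
changed_found = marshal.dumps(survey_says) in table

print(direct_key_ok, still_found, changed_found)
```

One caveat with this approach: the serialized form reflects the dict's insertion order, so two dicts that compare equal are not guaranteed to produce the same key.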
Skip From skip at mojam.com Wed Jan 31 16:57:30 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 09:57:30 -0600 (CST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: References: <20010131015832.K962@xs4all.nl> Message-ID: <14968.13802.22823.702114@beluga.mojam.com> Ping> x is sequence-like if it provides __getitem__() but not keys() So why does this barf? >>> [].__getitem__ Traceback (most recent call last): File "", line 1, in ? AttributeError: __getitem__ (Obviously, lists *do* understand __getitem__ at some level. Why isn't it exposed in the method table?) Skip From fredrik at pythonware.com Wed Jan 31 18:19:44 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 31 Jan 2001 18:19:44 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <007301c08baa$02908220$e46940d5@hagrid> barry wrote: > Alternative spellings: > > itemsiter(), keysiter(), valsiter() > itemsiterator(), keysiterator(), valuesiterator() > iiterator(), kiterator(), viterator() shouldn't that be xitems, xkeys, xvalues? From mal at lemburg.com Wed Jan 31 18:21:02 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 18:21:02 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> Message-ID: <3A78497E.8BCF197E@lemburg.com> Skip Montanaro wrote: > > What stimulated this thread about making mutable objects (temporarily) > immutable? 
Can someone give me an example where this is actually useful and > can't be handled through some existing mechanism? I'm definitely with > Fredrik on this one. Sounds like madness to me. This thread is an offspring of the "for something in dict:" thread. The problem we face when iterating over mutable objects is that the underlying objects can change. By marking them read-only we can safely iterate over their contents. Another advantage of being able to mark mutable objects as read-only is that they may become usable as dictionary keys. Optimizations such as self-reorganizing read-only dictionaries would also become possible (e.g. attribute dictionaries which are read-only could calculate a second hash value to make the hashing perfect). > I'm just guessing here, but since the most common need for immutable objects > is as dictionary keys, I can envision having to test the lock state of a list > or dict that someone wants to use as a key everywhere you would normally > call has_key: > > if l.islocked() and d.has_key(l): > ... > > If you want immutable dicts or lists in order to use them as dictionary > keys, just serialize them first: > > survey_says = {"spam": 14, "eggs": 42} > sl = marshal.dumps(survey_says) > dict[sl] = "spam" Sure, and that's what .items(), .keys() and .values() do. The idea was to avoid the extra step of creating lists or tuples first. > Here's another pitfall I can envision. > > survey_says = {"spam": 14, "eggs": 42} > survey_says.lock() > dict[survey_says] = "Richard Dawson" > survey_says.unlock() > > At this point can I safely iterate over the keys in the dictionary or not? Tim already pointed out that we will need two different read-only states: a) temporary b) permanent For dictionaries to become usable as keys in another dictionary, they'd have to be marked permanently read-only.
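The two read-only states can be approximated with tools that exist in today's standard library. This is an assumption-laden sketch, not the mechanism proposed in this thread: types.MappingProxyType stands in loosely for a read-only view that does not freeze the underlying object, and a frozenset of the items stands in for a permanently read-only dictionary that can serve as a key. The variable names are made up for illustration.

```python
from types import MappingProxyType

survey_says = {"spam": 14, "eggs": 42}

# (a) Read-only view: writes through the proxy are refused, but the
# underlying dict is still mutable through its own name.
view = MappingProxyType(survey_says)
try:
    view["spam"] = 0
    view_writable = True
except TypeError:
    view_writable = False

# (b) Permanent snapshot: a frozenset of the items can never change,
# so it is hashable and usable as a dictionary key.
frozen = frozenset(survey_says.items())
table = {frozen: "Richard Dawson"}

# Mutating the original is seen by the view but not by the snapshot.
survey_says["spam"] = 15
seen_through_view = view["spam"]
lookup = table[frozenset({"spam": 14, "eggs": 42}.items())]

print(view_writable, seen_through_view, lookup)
```

Only the permanent form is safe as a key, which matches the distinction drawn above: the view can still change underneath whoever holds it.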
-- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at alum.mit.edu Wed Jan 31 05:35:58 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 30 Jan 2001 23:35:58 -0500 (EST) Subject: [Python-Dev] Re: from ... import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154) In-Reply-To: <3A77FD5C.DE8729DC@lemburg.com> References: <3A77FD5C.DE8729DC@lemburg.com> Message-ID: <14967.38446.700271.122029@localhost.localdomain> >>>>> "MAL" == M -A Lemburg writes: >> Modified Files: compile.c Log Message: Enforce two illegal import >> statements that were outlawed in the reference manual but not >> checked: Names bound by import statemants may not occur in global >> statements in the same scope. The from ... import * form may only >> occur in a module scope. >> >> I guess these changes could break code, but the reference manual >> warned about them. MAL> Jeremy, your code breaks all uses of "from package import MAL> submodule" inside packages. MAL> Try distutils for example or setup.py.... Quite aside from whether the changes should be preserved, I don't see how "from package import submodule" is affected. I ran setup.py without any problem; I wouldn't have been able to build Python otherwise. I wrote some simple test cases and didn't have any trouble with the form you describe. Can you provide a concrete example? It may be that something other than the changes mentioned above that is causing you problems. Jeremy From barry at digicool.com Wed Jan 31 18:20:24 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 31 Jan 2001 12:20:24 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> <007301c08baa$02908220$e46940d5@hagrid> Message-ID: <14968.18776.644453.903217@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> shouldn't that be xitems, xkeys, xvalues? Or iitems(), ikeys(), ivalues()? Personally, I don't much care. If we get consensus on the more important issue of going with methods instead of new syntax, I'm sure Guido will pick whatever method names appeal to him most.
-Barry From ping at lfw.org Wed Jan 31 18:14:15 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 31 Jan 2001 09:14:15 -0800 (PST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: On Wed, 31 Jan 2001, Skip Montanaro wrote: > Ping> x is sequence-like if it provides __getitem__() but not keys() > > So why does this barf? > > >>> [].__getitem__ I was describing how to tell if instances are sequence-like. Before we get to make that judgement, first we have to look at the C method table. So: x is sequence-like if it has tp_as_sequence; all instances have tp_as_sequence; an instance is sequence-like if it has __getitem__() but not keys() x is mapping-like if it has tp_as_mapping; all instances have tp_as_mapping; an instance is mapping-like if it has both __getitem__() and keys() The "in" operator is implemented this way. x customizes "in" if it has sq_contains; all instances have sq_contains; an instance customizes "in" if it has __contains__() If sq_contains is missing, or if an instance has no __contains__ method, we supply the default behaviour by comparing the operand to each member of x in turn. This default behaviour is implemented twice: once in PyObject_Contains, and once in instance_contains. So i proposed this same structure for sq_iter and __iter__. x customizes "for ... in x" if it has sq_iter; all instances have sq_iter; an instance customizes "in" if it has __iter__() If sq_iter is missing, or if an instance has no __iter__ method, we supply the default behaviour by calling PyObject_GetItem on x and incrementing the index until IndexError. -- ?!ng "The only `intuitive' interface is the nipple. After that, it's all learned." -- Bruce Ediger, on user interfaces From mal at lemburg.com Wed Jan 31 18:57:20 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 18:57:20 +0100 Subject: [Python-Dev] Re: from ... 
import * ([Python-checkins] CVS: python/dist/src/Python compile.c,2.153,2.154) References: <3A77FD5C.DE8729DC@lemburg.com> <14967.38446.700271.122029@localhost.localdomain> Message-ID: <3A785200.FFB37CAD@lemburg.com> Jeremy Hylton wrote: > > >>>>> "MAL" == M -A Lemburg writes: > > >> Modified Files: compile.c Log Message: Enforce two illegal import > >> statements that were outlawed in the reference manual but not > >> checked: Names bound by import statemants may not occur in global > >> statements in the same scope. The from ... import * form may only > >> occur in a module scope. > >> > >> I guess these changes could break code, but the reference manual > >> warned about them. > > MAL> Jeremy, your code breaks all uses of "from package import > MAL> submodule" inside packages. > > MAL> Try distutils for example or setup.py.... > > Quite aside from whether the changes should be preserved, I don't see > how "from package import submodule" is affected. I ran setup.py > without any problem; I wouldn't have been able to build Python > otherwise. I wrote some simple test cases and didn't have any trouble > with the form you describe. Perhaps you still had old .pyc files in your installation dir ? > Can you provide a concrete example? It may be that something other > than the changes mentioned above that is causing you problems. The distutils code is full of imports like these (and other code I'm running is too): distutils/cmd.py: def __init__ (self, dist): """Create and initialize a new Command object. Most importantly, invokes the 'initialize_options()' method, which is the real initializer and depends on the actual command being instantiated. """ # late import because of mutual dependence between these classes from distutils.dist import Distribution This is the report I got from Benjamin Collar: > I've gotten the newest CVS tarball, but setup.py is still not > working; this time with a different error. 
I will resubmit a bug on > sourceforge if that's the proper way to handle this. Here's the error: > > ./python ./setup.py build > Traceback (most recent call last): > File "./setup.py", line 12, in ? > from distutils.core import Extension, setup > File "/usr/src/python/dist/src/Lib/distutils/core.py", line 20, in ? > from distutils.cmd import Command > File "/usr/src/python/dist/src/Lib/distutils/cmd.py", line 15, in ? > from distutils import util, dir_util, file_util, archive_util, > dep_util > SyntaxError: 'from ... import *' may only occur in a module scope > make: *** [sharedmods] Error 1 -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Wed Jan 31 19:33:56 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 12:33:56 -0600 (CST) Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A78497E.8BCF197E@lemburg.com> References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> <3A78497E.8BCF197E@lemburg.com> Message-ID: <14968.23188.573257.392841@beluga.mojam.com> MAL> This thread is an offspring of the "for something in dict:" thread. MAL> The problem we face when iterating over mutable objects is that the MAL> underlying objects can change. By marking them read-only we can MAL> safely iterate over their contents. I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only. (And what about Andy Dustman's cool sqldict stuff?) 
If you can't extend this concept in a reasonable fashion to cover (most of) the other objects that smell like dictionaries, I think you'll just be adding needless complications for a feature that can't be used where it's really needed. I see no problem asking for the items() of an in-memory dictionary in order to get a predictable list to iterate over, but doing that for disk-based mappings would be next to impossible. So, I'm stuck iterating over something that can change out from under me. In the end, the programmer will still have to handle border cases specially. Besides, even if you *could* lock your disk-based mapping, are you really going to do that in situations where it's sharable (that's what databases are there for, after all)? I suspect you're going to keep the database mutable and work around any resulting problems. If you want to implement "for key in dict:", why not just have the VM call keys() under the covers and use that list? It would be no worse than the situation today where you call "for key in dict.keys():", and with the same caveats. If you're dumb enough to do that for an on-disk mapping object, well, you get what you asked for. Skip From esr at thyrsus.com Wed Jan 31 18:55:00 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 31 Jan 2001 12:55:00 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A78045F.7DB50871@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 01:26:07PM +0100 References: <3A78045F.7DB50871@lemburg.com> Message-ID: <20010131125500.C5151@thyrsus.com> M.-A. Lemburg : > Anyway, names really don't matter much, so how about: > > .mutable([flag]) -> integer > > If called without argument, returns 1/0 depending on whether > the object is mutable or not. When called with a flag argument, > sets the mutable state of the object to the value indicated > by flag and returns the previous flag state. I'll bear this in mind if things progress to the point where a PEP is indicated. -- Eric S.
Raymond From tim.one at home.com Wed Jan 31 20:49:34 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 14:49:34 -0500 Subject: [Python-Dev] WARNING: Changed build process for zlib on Windows In-Reply-To: Message-ID: [Mark Hammond] > ... > The new process is very simple, but may break some peoples build. > ... > The reason this _should_ not break your build is that your > _probably_ already have a "..\..\zlib-1.1.3" directory installed > in the right place so the header files can be located. Actually, it's certain to break the build for anyone who read PCbuild\readme.txt. But I *want* it to break: changing the directory name is a strong hint that they should download the zlib source code from the same place you did (and which is now explained in PCbuild\readme.txt, and mentioned in the 2.1a2 NEWS file). Other than that, worked first time, and-- even better --the second time too . From esr at thyrsus.com Wed Jan 31 18:53:16 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 31 Jan 2001 12:53:16 -0500 Subject: [Python-Dev] Making mutable objects readonly In-Reply-To: <3A77FE94.E5082136@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 01:01:24PM +0100 References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> Message-ID: <20010131125316.B5151@thyrsus.com> M.-A. Lemburg : > Eric, could you write a PEP for this ? Not yet. I'm about (at Guido's suggestion) to submit a revised ternary-select proposal. Let's process that first. -- Eric S. Raymond "Today, we need a nation of Minutemen, citizens who are not only prepared to take arms, but citizens who regard the preservation of freedom as the basic purpose of their daily life and who are willing to consciously work and sacrifice for that freedom." 
-- John F. Kennedy From tim.one at home.com Wed Jan 31 21:28:00 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 15:28:00 -0500 Subject: [Python-Dev] weak refs and jython In-Reply-To: <200101311234.NAA24584@core.inf.ethz.ch> Message-ID: [Samuele Pedroni] > I have read weak ref PEP, maybe too late. > I don't know if portability of code using weak refs between > python and jython was a goal or could be one, CPython generally doesn't want to do anything impossible for Jython, if it can help it. > and up to which extent actual impl. will correspond to the PEP. Don't care about that. > ... > AFAIK using java weak refs (which I think is a natural choice) I > see no way (at least no worth-the-effort way) to implement this > in jython. Java weak refs cannot be resurrected. Thanks for bringing this up! Fred is looking into it. From fdrake at acm.org Wed Jan 31 21:25:51 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 31 Jan 2001 15:25:51 -0500 (EST) Subject: [Python-Dev] weak refs and jython In-Reply-To: <200101311234.NAA24584@core.inf.ethz.ch> References: <200101311234.NAA24584@core.inf.ethz.ch> Message-ID: <14968.29903.183882.41485@cj42289-a.reston1.va.home.com> Samuele Pedroni writes: > AFAIK using java weak refs (which I think is a natural choice) I see > no way (at least no worth-the-effort way) to implement this in jython. > Java weak refs cannot be resurrected. This is certainly annoying. How about this: the callback receives the weak reference object or proxy which it was registered on as a parameter. Since the reference has already been cleared, there's no way to get the object back, so we don't need to get it from Java either. Would that be workable? (I'm adjusting my patch now.) -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From tim.one at home.com Wed Jan 31 21:56:52 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 15:56:52 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: [Ping] > x is sequence-like if it provides __getitem__() but not keys() [Skip] > So why does this barf? > > >>> [].__getitem__ > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: __getitem__ > > (Obviously, lists *do* understand __getitem__ at some level. Why > isn't it exposed in the method table?) The old type/class split: list is a type, and types spell their "method tables" in ways that have little in common with how classes do it. See PyObject_GetItem in abstract.c for gory details (e.g., dicts spell their version of getitem via ->tp_as_mapping->mp_subscript(...), while lists spell it ->tp_as_sequence->sq_item(...); neither has any binding to the attr "__getitem__"; instance objects fill in both the tp_as_mapping and tp_as_sequence slots, then map both the mp_subscript and sq_item slots to classobject.c's instance_item, which in turn looks up "__getitem__"). bet-you're-sorry-you-asked-ly y'rs - tim From tim.one at home.com Wed Jan 31 22:24:53 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 16:24:53 -0500 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: <3A78226B.2E177EFE@lemburg.com> Message-ID: [M.-A. Lemburg] > AFAIR, Vladimir's malloc implementation favours small objects. It favors the memory alloc/dealloc patterns Vlad recorded while running an instrumented Python. Which is mostly good news. The flip side is that it favors the specific programs he ran, and who knows whether those are "typical". OTOH, vendor mallocs favor the programs *they* ran, which probably didn't include Python at all . > ... > Perhaps we should think about adding his lib to the core ?! It's patch 101104 on SF. 
I pushed Vlad to push this for 2.0, but he wisely decided it was too big a change at the time. It's certainly too much a change to slam into 2.1 at this late stage too. There are many reasons to want this (e.g., list.append() calls realloc every time today, because, despite over-allocating, it has no idea how much storage *has* already been allocated; any malloc has to know this info under the covers, but there's no way for us to know that too unless we add another N bytes to every list object to record it, or use our own malloc which *can* tell us that info). list.append()-behavior-varies-wildly-across-platforms-today- when-the-list-gets-large-because-of-that-ly y'rs - tim From tim.one at home.com Wed Jan 31 22:49:31 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 16:49:31 -0500 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: <3A78002F.DC8F0582@lemburg.com> Message-ID: [Tim] >> Seems an unrelated topic: would "iterators for dictionaries" solve the >> supposed problem with iteration order? [MAL] > No, but it would solve the problem in a more elegant and > generalized way. I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the problem], but it would solve the problem ...". Can only assume we're switching topics within single sentences now . > Besides, it also allows writing code which is thread safe, since > the iterator can take special actions to assure that the dictionary > doesn't change during the iteration phase (see the other thread > about "making mutable objects readonly"). Sorry, but immutability has nothing to do with thread safety (the latter has to do with "doing a right thing" in the presence of multiple threads, to keep data structures internally consistent; raising an exception is never "a right thing" unless the user is violating the advertised semantics, and if mutation during iteration is such a violation, the presence or absence of multiple threads has nothing to do with that). 
IOW, perhaps, a critical section is an area of non-exceptional serialization, not a landmine that makes other threads *blow up* if they touch it. > ... > I don't remember the figures, but these micro optimizations That's plural, but I thought you were talking specifically about the mutable counter object. I don't know which, but the two statements don't jibe. > do speed up loops by a noticeable amount. Just compare the performance > of stock Python 1.5 against my patched version. No time now, but after 2.1 is out, sure, wrt it (not 1.5). From tim.one at home.com Wed Jan 31 23:10:12 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 17:10:12 -0500 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: Message-ID: [Michael Hudson] > ... > Can anyone try this on Windows? Seeing as windows malloc > reputedly sucks, maybe the differences would be bigger. No time now (pymalloc is a non-starter for 2.1). Was tried in the past on Windows. Helped significantly. Unclear how much was simply due to exploiting the global interpreter lock, though. "Windows" is also a multiheaded beast (e.g., NT has very different memory performance characteristics than 95). From tim.one at home.com Wed Jan 31 23:43:59 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 31 Jan 2001 17:43:59 -0500 Subject: generators (was RE: [Python-Dev] Re: Sets: elt in dict, lst.include) In-Reply-To: <20010130092454.D18319@glacier.fnational.com> Message-ID: [Neil Schemenauer] > What's the chances of getting generators into 2.2? Unknown. IMO it has more to do with generalizing the iteration protocol than with generators per se (a generator object that doesn't play nice with "for" is unpleasant to use; otoh, a generator object that can't be used divorced from "for" is frustrating too (like when comparing the fringes of two trees efficiently, which requires interleaving two distinct traversals, each naturally recursive on its own)).
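For readers who want the fringe comparison spelled out, here is a minimal sketch in present-day Python, whose generator syntax postdates this thread. It assumes trees are plain nested tuples with non-tuple leaves; the names fringe and same_fringe are made up for illustration and come from nobody's patch. Each traversal is written recursively on its own, and the comparison interleaves the two, stopping at the first mismatch.

```python
from itertools import zip_longest

# Unique marker used to detect that one fringe ran out before the other.
_MISSING = object()


def fringe(tree):
    """Yield the leaves of a nested-tuple tree, left to right."""
    if isinstance(tree, tuple):
        for child in tree:
            # Recurse into each subtree; leaves bubble up lazily.
            yield from fringe(child)
    else:
        yield tree


def same_fringe(a, b):
    """Return True if both trees have the same leaves in the same order."""
    pairs = zip_longest(fringe(a), fringe(b), fillvalue=_MISSING)
    # all() consumes the pairs lazily, so both traversals stop
    # as soon as the first differing pair of leaves is seen.
    return all(x == y for x, y in pairs)
```

Here same_fringe((1, (2, 3)), ((1, 2), 3)) is true even though the shapes differ, because only the left-to-right sequence of leaves is compared.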
> The implementation should not be hard. Didn't Steven Majewski have > something years ago? Yes, but Guido also sketched out a nearly complete implementation within the last year or so. > Why do we always get sidetracked on trying to figure out how to > do coroutines and continuations? Sorry, I've been failing to find a good answer to that question for a decade <0.4 wink>. I should note, though, that Guido's current notion of "generator" is stronger than Icon/CLU/Sather's (which are "strictly stack-like"), and requires machinery more elaborate than StevenM (or Guido) sketched before. > Generators would add real power to the language and are simple > enough that most users could benefit from them. Also, it should be > possible to design an interface that does not preclude the > addition of coroutines or continuations later. Agreed. > I'm not volunteering to champion the cause just yet. I just want > to know if there is some issue I'm missing. microthreads have an enthusiastic and possibly growing audience. That gets into (C) stacklessness, though, as do coroutines. I'm afraid that once you go beyond "simple" (Icon) generators, a whole world of other stuff gets pulled in. The key trick to implementing simple generators in current Python is simply to decline decrementing the frame's refcount upon a "suspend" (of course the full details are more involved than *just* that, but they mostly follow *from* just that). everything-is-the-enemy-of-something-ly y'rs - tim From skip at mojam.com Wed Jan 31 23:27:38 2001 From: skip at mojam.com (Skip Montanaro) Date: Wed, 31 Jan 2001 16:27:38 -0600 (CST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include In-Reply-To: References: <14968.13802.22823.702114@beluga.mojam.com> Message-ID: <14968.37210.886842.820413@beluga.mojam.com> >>>>> "Tim" == Tim Peters writes: >> (Obviously, lists *do* understand __getitem__ at some level. Why >> isn't it exposed in the method table?) 
Tim> The old type/class split: list is a type, and types spell their Tim> "method tables" in ways that have little in common with how classes Tim> do it. The problem that rolls around in the back of my mind from time-to-time is that since Python doesn't currently support interfaces, checking for specific methods seems to be the only reasonable way to determine if an object does what you want or not. What would break if we decided to simply add __getitem__ (and other sequence methods) to the list object's method table? Would they foul something up or would they simply sit around quietly waiting for hasattr to notice them? Skip From pedroni at inf.ethz.ch Wed Jan 31 23:29:37 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Wed, 31 Jan 2001 23:29:37 +0100 Subject: [Python-Dev] weak refs and jython References: <200101311234.NAA24584@core.inf.ethz.ch> <14968.29903.183882.41485@cj42289-a.reston1.va.home.com> Message-ID: <001f01c08bd5$4c9c9900$7c5821c0@newmexico> Hi. [Fred L. Drake, Jr.] > > Java weak refs cannot be resurrected. > > This is certainly annoying. > How about this: the callback receives the weak reference object or > proxy which it was registered on as a parameter. Since the reference > has already been cleared, there's no way to get the object back, so we > don't need to get it from Java either. > Would that be workable? (I'm adjusting my patch now.) Yes, it is workable: clearly we can implement weak refs only under java2 but this is not (really) an issue. We can register the refs in a java reference queue, and poll it lazily or through a low-priority thread in order to invoke the callbacks. -- Some remarks I have used java weak/soft refs to implement some of the internal tables of jython in order to avoid memory leaks, at least under java2. I imagine that the idea behind callbacks plus resurrection was to enable the construction of sophisticated caches.
My intuition is that these features are not present under java because they will interfere too much with gc and have a performance penalty. On the other hand java offers reference queues and soft references; the latter cover the common case of caches that should be cleared when there is little memory left. (Never tried them seriously, so I don't know if the actual impl is fair, or will just wait too long before starting to discard things => behavior like primitive gc). The main difference I see between the callbacks and queues approaches is that with queues it is left to the user to decide when to do the actual cleanup of his tables/caches, and handling queues internally has a "low" overhead. With callbacks what happens depends really on the collection times/patterns, and the overhead is related to call overhead and to how non-trivial the code the user puts in the callbacks is. Clearly general performance will not be easily predictable. (From a theoretical viewpoint one can more or less simulate queues with callbacks and the other way around). Resurrection makes little sense with queues, but I can easily see that the lack of both resurrection and soft refs limits what can be done with weak-like refs. Last thing: one of the things that is really missing in java refs features is that one cannot put conditions of the form: as long as A is not collected B should not be collected either. Clearly I'm referring to situations where one cannot modify the class of A in order to add a field, which is quite typical in java. This should not be a problem with python and its open/dynamic way-of-life. regards, Samuele Pedroni. From mal at lemburg.com Wed Jan 31 20:03:12 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 31 Jan 2001 20:03:12 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A756FF8.B7185FA2@lemburg.com> <200101291500.KAA11569@cj20424-a.reston1.va.home.com> <3A75B190.3FD2A883@lemburg.com> <200101291922.OAA13321@cj20424-a.reston1.va.home.com> <20010129150247.B10191@thyrsus.com> <200101300217.VAA21978@cj20424-a.reston1.va.home.com> <3A77FE94.E5082136@lemburg.com> <14968.14631.419491.440774@beluga.mojam.com> <3A78497E.8BCF197E@lemburg.com> <14968.23188.573257.392841@beluga.mojam.com> Message-ID: <3A786170.CD65B8A4@lemburg.com> Skip Montanaro wrote: > > MAL> This thread is an offspring of the "for something in dict:" thread. > MAL> The problem we face when iterating over mutable objects is that the > MAL> underlying objects can change. By marking them read-only we can > MAL> safely iterate over their contents. > > I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only. > (And what about Andy Dustman's cool sqldict stuff?) If you can't extend > this concept in a reasonable fashion to cover (most of) the other objects > that smell like dictionaries, I think you'll just be adding needless > complications for a feature than can't be used where it's really needed. We are currently only talking about Python dictionaries here, even though other objects could also benefit from this. > I see no problem asking for the items() of an in-memory dictionary in order > to get a predictable list to iterate over, but doing that for disk-based > mappings would be next to impossible. So, I'm stuck iterating over > something can can change out from under me. In the end, the programmer will > still have to handle border cases specially. Besides, even if you *could* > lock your disk-based mapping, are you really going to do that in situations > where its sharable (that's what databases they are there for, after all)? I > suspect you're going to keep the database mutable and work around any > resulting problems. 
> > If you want to implement "for key in dict:", why not just have the VM call > keys() under the covers and use that list? It would be no worse than the > situation today where you call "for key in dict.keys():", and with the same > caveats. If you're dumb enough to do that for an on-disk mapping object, > well, you get what you asked for. That's why iterators do a much better task here. In DB design these are usually called cursors which the allow moving inside large result sets. But this really is a different topic... Readonlyness could be put to some good use in optimizing data structure for which you know that they won't change anymore. Temporary readonlyness has the nice sideeffect of allowing low-level lock implementations and makes writing thread safe code easier to handle, because you can make assertions w/r to the immutability of an object during a certain period of time explicit in your code. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 21:36:54 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 21:36:54 +0100 Subject: [Python-Dev] Making mutable objects readonly References: <3A78045F.7DB50871@lemburg.com> <20010131125500.C5151@thyrsus.com> Message-ID: <3A787766.35453597@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > Anyway, names really don't matter much, so how about: > > > > .mutable([flag]) -> integer > > > > If called without argument, returns 1/0 depending on whether > > the object is mutable or not. When called with a flag argument, > > sets the mutable state of the object to the value indicated > > by flag and returns the previous flag state. > > I'll bear this in mind if things progress to the point where a PEP is > indicated. 
Great :) -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at digicool.com Wed Jan 31 17:23:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 11:23:37 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc NEWS,1.108,1.109 In-Reply-To: Your message of "Wed, 31 Jan 2001 13:35:36 GMT." <3a780eda.16144995@smtp.worldonline.dk> References: <20010130075515.X962@xs4all.nl> <200101301506.KAA25763@cj20424-a.reston1.va.home.com> <20010130165204.I962@xs4all.nl> <200101302042.PAA29301@cj20424-a.reston1.va.home.com> <3a7809c0.14839067@smtp.worldonline.dk> <20010131135914.N962@xs4all.nl> <3a780eda.16144995@smtp.worldonline.dk> Message-ID: <200101311623.LAA01774@cj20424-a.reston1.va.home.com> [Finn] > >> Using global on an import name is currently ignored by Jython because > >> the name assignment is done by the runtime, not the compiler. [Thomas] > >So it's impossible to do, in Jython, something like: > > > >def fillme(): > > global me > > import me > > > >but it is possible to do: > > > >def fillme(): > > global me > > import me as _me > > me = _me > > > >? [Finn again] > Yes, only the second example will make a global variable. > > > I have to say I don't like that; we're always claiming 'import' (and > >'def' and 'class' for that matter) are 'just another way of writing > >assignment'. All these special cases break that. > > I don't like it either, I was only reported what jython currently does. > The current design used by Jython does lend itself directly towards a > solution, but I don't see anything that makes it impossible to solve. Tentatively, I'd say that this should be documented as a Jython difference and Jython should strive to fix this. So I see no good reason to rule it out in CPython. That doesn't mean I like Thomas's example! 
It should probably be redesigned along the lines of

    def fillme():
        import me
        return me

    me = fillme()

to avoid needing side effects on globals. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jan 31 17:26:11 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 31 Jan 2001 11:26:11 -0500 Subject: [Python-Dev] The 2nd Korea Python Users Seminar Message-ID: <200101311626.LAA01799@cj20424-a.reston1.va.home.com> Wow...! Way to go, Christian! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 31 Jan 2001 22:46:06 +0900 From: "Changjune Kim" To: Subject: The 2nd Korea Python Users Seminar Dear Mr. Guido van Rossum, First of all, I can't thank you more for your great contribution to the presence of Python. It is not a mere computer programming language but a whole culture, I think. I am proud to tell you that we are having the 2nd Korea Python Users Seminar which is wide open to the public. There are already more than 400 people who registered ahead, and we expect a few more at the site. The seminar will be held in Seoul, South Korea on Feb 2. With the effort of Korea Python Users Group, there has been quite a boom or phenomenon for Python among developers in Korea. Several magazines are _competitively_ carrying regular articles about Python -- I'm one of the authors -- and there was an article even on a _normal_ newspaper, one of the major four big newspapers in Korea, which described the sprouting of Python in Korea and pointed its extreme easiness to learn. (moreover, it's the year of the snake in the 12 zodiac animals) The seminar is mainly about: Python 2.0, intro for newbies, Python coding style, ZOPE, internationalization of Zope for Korean, GUIs such as wxPython, PyQt, Internet programming in Python, Python with UML, Python C/API, XML with Python, and Stackless Python.
Christian Tismer is coming for SPC presentation with me, and Hostway CEO Lucas Roh will give a talk about how they are using Python, and one of the Python evangelists, Brian Lee, CTO of Linuxkorea will give a brief intro to Python and Python C/API. I'm so excited and happy to tell you this great news. If there is any message you want to give to Korea Python Users Group and the audience, it'd be great -- I could translate it and post it at the site for all the audience. Thank you again for your wonderful snake. Best regards, June from Korea. _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com ------- End of Forwarded Message From moshez at zadka.site.co.il Wed Jan 31 21:32:45 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 31 Jan 2001 22:32:45 +0200 (IST) Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <007301c08baa$02908220$e46940d5@hagrid> References: <007301c08baa$02908220$e46940d5@hagrid>, <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <20010131203245.E813BA83E@darjeeling.zadka.site.co.il> [Barry] > itemsiter(), keysiter(), valsiter() > itemsiterator(), keysiterator(), valuesiterator() > iiterator(), kiterator(), viterator() [/F] > shouldn't that be xitems, xkeys, xvalues? I'm so hoping I missed a there somewhere. Please, no more of the dreaded 'x'. thinking-of-ripping-x-from-my-keyboard-ly y'rs, Z. -- Moshe Zadka This is a signature anti-virus. Please stop the spread of signature viruses! 
Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6 From thomas at xs4all.net Wed Jan 31 22:00:33 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 22:00:33 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) In-Reply-To: <3A78226B.2E177EFE@lemburg.com>; from mal@lemburg.com on Wed, Jan 31, 2001 at 03:34:19PM +0100 References: <3A78226B.2E177EFE@lemburg.com> Message-ID: <20010131220033.O962@xs4all.nl> On Wed, Jan 31, 2001 at 03:34:19PM +0100, M.-A. Lemburg wrote: > I have made similar experience with -On with n>3 compared to -O2 > using pgcc (gcc optimized for PC processors). BTW, the Linux > kernel uses "-Wall -Wstrict-prototypes -O3 -fomit-frame-pointer" > as CFLAGS -- perhaps Python should too on Linux ?! Maybe, but the Linux kernel can be quite specific in what version of gcc you need, and knows in advance on what platform you are using it :) The stability and actual speedup of gcc's optimization options can and does vary across platforms. In the above example, -Wall and -Wstrict-prototypes are just warnings, and -O3 is the same as "-O2 -finline-functions". As for -fomit-frame-pointer.... > Does anybody know about the effect of -fomit-frame-pointer ? > Would it cause problems or produce code which is not compatible > with code compiled without this flag ? The effect of -fomit-frame-pointer is that the compilation of frame-pointer handling code is avoided. It doesn't have any effect on compatibility, since it doesn't matter that other parts/functions/libraries do have such code, but it does make debugging impossible (on most machines, in any case.) From GCC's info docs: -fomit-frame-pointer' Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. 
*It also makes debugging impossible on some machines.* On some machines, such as the Vax, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro `FRAME_POINTER_REQUIRED' controls whether a target machine supports this flag. *Note Registers::. Obviously, for the Linux kernel this is a very good thing, you don't debug the Linux kernel like a normal program anyway (contrary to some other UNIX kernels, I might add.) I believe -g turns off -fomit-frame-pointer itself, but the docs for -g or -fomit-frame-pointer don't mention it. One other thing I noted in the gcc docs is that gcc doesn't do loop unrolling even with -O3, though I thought it would at -O2. You need to add -funroll-loops to enable loop unrolling, and that might squeeze out some more performance. This only works for loops with a fixed repetition, though, so I'm not sure if it matters. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Wed Jan 31 20:14:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 31 Jan 2001 20:14:58 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include - really begs for a PEP In-Reply-To: <14968.16962.830739.920771@anthem.wooz.org>; from barry@digicool.com on Wed, Jan 31, 2001 at 11:50:10AM -0500 References: <20010131101449.B28C5A83E@darjeeling.zadka.site.co.il> <14968.16962.830739.920771@anthem.wooz.org> Message-ID: <20010131201457.I922@xs4all.nl> [ Trimming CC: line ] On Wed, Jan 31, 2001 at 11:50:10AM -0500, Barry A. Warsaw wrote: > Moshe, I had exactly the same reaction and exactly the same idea. I'm > a strong -1 on introducing new syntax for this when new methods can > handle it in a much more readable way (IMO). Same here.
I *might* like it if iterators were given a format string (or tuple object, or whatever) so they knew what the iterating code expected (so something like this:

    for x,y,z in obj

would translate into

    iterator(obj)("(x,y,z)")

or maybe just

    iterator(obj)((None,None,None))

or maybe even just

    iterator(obj)(3)  # that is, number of elements

or so) but I suspect it might be too cute (and obfuscated) for Python, especially if it was put to use to distinguish between 'for x:y in obj' and 'for x,y in obj'. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From sjoerd at oratrix.nl Wed Jan 31 21:05:06 2001 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Wed, 31 Jan 2001 21:05:06 +0100 Subject: [Python-Dev] python setup.py fails with illegal import (+ fix) Message-ID: <20010131200507.A106931E1AD@bireme.oratrix.nl> With the current CVS version, running python setup.py as part of the build process fails with a syntax error:

Traceback (most recent call last):
  File "../setup.py", line 12, in ?
    from distutils.core import Extension, setup
  File "/usr/people/sjoerd/src/python/Lib/distutils/core.py", line 20, in ?
    from distutils.cmd import Command
  File "/usr/people/sjoerd/src/python/Lib/distutils/cmd.py", line 15, in ?
    from distutils import util, dir_util, file_util, archive_util, dep_util
SyntaxError: 'from ... import *' may only occur in a module scope

The fix is to change the from ... import * that the compiler complains about:

Index: file_util.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/distutils/file_util.py,v
retrieving revision 1.7
diff -u -c -r1.7 file_util.py
*** file_util.py 2000/09/30 17:29:35 1.7
--- file_util.py 2001/01/31 20:01:56
***************
*** 106,112 ****
  # changing it (ie. it's not already a hard/soft link to src OR
  # (not update) and (src newer than dst).
! from stat import *
  from distutils.dep_util import newer
  if not os.path.isfile(src):
--- 106,112 ----
  # changing it (ie. it's not already a hard/soft link to src OR
  # (not update) and (src newer than dst).
! from stat import ST_ATIME, ST_MTIME, ST_MODE, S_IMODE
  from distutils.dep_util import newer
  if not os.path.isfile(src):

I didn't check this in because distutils is Greg Ward's baby. -- Sjoerd Mullender From mal at lemburg.com Wed Jan 31 23:24:43 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jan 2001 23:24:43 +0100 Subject: [Python-Dev] Benchmarking "fun" (was Re: Python 2.1 slower than 2.0) References: Message-ID: <3A7890AB.69B893F9@lemburg.com> Tim Peters wrote: > > [Michael Hudson] > > ... > > Can anyone try this on Windows? Seeing as windows malloc > > reputedly sucks, maybe the differences would be bigger. > > No time now (pymalloc is a non-starter for 2.1). Was tried in the past on > Windows. Helped significantly. Unclear how much was simply due to > exploiting the global interpreter lock, though. "Windows" is also a > multiheaded beast (e.g., NT has very different memory performance > characteristics than 95). We're still in alpha, no ? Adding pymalloc is not much of a deal since it fits nicely with the Python malloc macros and giving the package a nice spin by putting it into a Python alpha release would sure create more confidence in this nice piece of work. We can always take it out again before going into the beta phase. Or do we have a 2.1 feature freeze already ? -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jan 31 23:15:50 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 31 Jan 2001 23:15:50 +0100 Subject: [Python-Dev] Re: Sets: elt in dict, lst.include References: Message-ID: <3A788E96.AB823FAE@lemburg.com> Tim Peters wrote: > > [Tim] > >> Seems an unrelated topic: would "iterators for dictionaries" solve the > >> supposed problem with iteration order? > > [MAL] > > No, but it would solve the problem in a more elegant and > > generalized way. > > I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the > problem], but it would solve the problem ...". Can only assume we're > switching topics within single sentences now . Sorry, not my brightest day today... what I wanted to say is that iterators would solve the problem of defining "something" in "for something in dict" nicely. Since iterators can define the order in which a data structure is traversed, this would also do away with the second (supposed) problem. > > Besides, it also allows writing code which is thread safe, since > > the iterator can take special actions to assure that the dictionary > > doesn't change during the iteration phase (see the other thread > > about "making mutable objects readonly"). > > Sorry, but immutability has nothing to do with thread safety (the latter has > to do with "doing a right thing" in the presence of multiple threads, to > keep data structures internally consistent; raising an exception is never "a > right thing" unless the user is violating the advertised semantics, and if > mutation during iteration is such a violation, the presence or absence of > multiple threads has nothing to do with that). IOW, perhaps, a critical > section is an area of non-exceptional serialization, not a landmine that > makes other threads *blow up* if they touch it. Who said that an exception is raised ? The method I posted on the mutability thread allows querying the current state just like you would query the availability of a resource. > > ... 
> > I don't remember the figures, but these micro optimizations > > That's plural, but I thought you were talking specifically about the mutable > counter object. I don't know which, but the two statements don't jibe. The counter object patch is a micro-optimization and as such will only give you a gain of a few percent. What makes the difference is the sum of these micro optimizations. Here's the patch for Python 1.5 which includes the optimizations: http://www.lemburg.com/python/mxPython-1.5.patch.gz > > do speed up loops by a noticeable amount. Just compare the performance > > of stock Python 1.5 against my patched version. > > No time now, but after 2.1 is out, sure, wrt it (not 1.5). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
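
The ".mutable([flag]) -> integer" method quoted earlier in this thread (the one referred to above as "the method I posted on the mutability thread") is only described in words, so here is a rough sketch of those semantics in present-day Python. Everything beyond the method name and its described return values is an assumption made for illustration: the class name FlaggedDict, the ReadonlyError exception, and the choice to guard only item assignment and deletion (update(), pop() and friends are left unguarded to keep the sketch short). It is not taken from any posted patch.

```python
class ReadonlyError(TypeError):
    """Hypothetical exception for writes to a mapping marked read-only."""


class FlaggedDict(dict):
    """Hypothetical dict subclass sketching the proposed .mutable([flag])."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._mutable = 1  # dictionaries start out mutable

    def mutable(self, flag=None):
        """Without an argument, return 1/0 for the current state.

        With a flag, set the state and return the previous one.
        """
        previous = self._mutable
        if flag is not None:
            self._mutable = 1 if flag else 0
        return previous

    def __setitem__(self, key, value):
        if not self._mutable:
            raise ReadonlyError("mapping is currently read-only")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        if not self._mutable:
            raise ReadonlyError("mapping is currently read-only")
        super().__delitem__(key)
```

Because the setter returns the previous state, a loop can do old = d.mutable(0) before iterating and d.mutable(old) afterwards, which restores whatever state the caller had, and any other code can query d.mutable() without changing anything.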