From tim.one@comcast.net Sun Sep 1 08:04:44 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 01 Sep 2002 03:04:44 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8
In-Reply-To: <20020824183542.GA22248@glacier.arctrix.com>
Message-ID:

[Neil Schemenauer]
> ...
> For whatever reason, setting HAMBIAS to 1.0 seems to produce worse results.

It's remarkable. Graham's scheme is pasted together out of all sorts of things that shouldn't work, but this one seems the most mysterious. It has a huge effect in my 5x5 c.l.py test grid. Combining all unique msgs identified as false negative or false positive across all 20 test runs,

At HAMBIAS = 1.0
    total false negatives goes down by a factor of 2 (337 -> 166)
    total false positives goes up by a factor of 7.6 (23 -> 174)

and some of the false positives are just amazing -- David Ascher announcing a Python conference, Laura Creighton pontificating about the GPL, ... it's hard to fathom! One innocuous example:

"""
Hello,

I love all these speed debates but if speed were our only concern we would all be writing in assembly for all non internet based programs...!

Thank you,
Vincent A. Primavera

prob = 0.99918657946
prob('only') = 0.645419
prob('would') = 0.349237
prob('hello,') = 0.342435
prob('assembly') = 0.34891
prob('thank') = 0.819611
prob('these') = 0.677099
prob('all') = 0.709966
prob('you,') = 0.803672
prob('concern') = 0.225352
prob('our') = 0.951928
prob('internet') = 0.942274
prob('speed') = 0.305927
prob('but') = 0.229635
prob('love') = 0.736116
prob('non') = 0.885065
prob('writing') = 0.150994
"""

There's not a lot going on in that msg! *Perhaps* the primary effect of boosting HAMBIAS is to take common glue words (like 'these' and 'all') out of this uniquely "only look at smoking guns" scoring scheme altogether? I don't know what "sense" there is in letting 'these' vote in favor of spam, for example.
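For readers following along, here is a minimal sketch of where a ham bias enters a Graham-style per-word probability. It is not the code from GBayes.py: the function name, the parameter names, the clamping constants (0.01 and 0.99) and the minimum of 5 occurrences are assumptions taken from Graham's published description, in which the ham count is simply doubled.

```python
def word_spamprob(ham_count, spam_count, nham, nspam, hambias=2.0):
    """Graham-style spam probability for one word (illustrative sketch).

    ham_count / spam_count: times the word was seen in ham / spam.
    nham / nspam: number of ham / spam messages trained on (must be > 0).
    hambias: multiplier applied to the ham count (hypothetical name
    mirroring the HAMBIAS knob discussed in this thread).
    """
    g = hambias * ham_count
    b = float(spam_count)
    # Assumption from Graham's essay: words seen too rarely get no score.
    if g + b < 5:
        return None
    ham_ratio = min(1.0, g / nham)
    spam_ratio = min(1.0, b / nspam)
    prob = spam_ratio / (ham_ratio + spam_ratio)
    # Clamp so that no single word is ever fully decisive.
    return max(0.01, min(0.99, prob))


if __name__ == "__main__":
    for bias in (1.0, 2.0, 3.0):
        print(bias, word_spamprob(10, 10, 1000, 1000, hambias=bias))
```

For a word seen equally often in equally sized ham and spam corpora, raising the bias from 1.0 to 3.0 moves its score from 0.5 to 0.25 in this sketch, which is at least the direction of the false-positive versus false-negative tradeoff reported in this message.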
At HAMBIAS = 3.0
    total false negatives goes up by a factor of 2.08 (337 -> 702)
    total false positives goes down by a factor of 4.6 (23 -> 5)

Somebody else think about this. It's certainly the easiest knob to twiddle to make a false-positive versus false-negative rate tradeoff.

From sholden@holdenweb.com Sun Sep 1 10:14:25 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Sun, 1 Sep 2002 05:14:25 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
References: <15726.52313.734491.272985@gargle.gargle.HOWL> <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl> <15727.31272.80804.453415@gargle.gargle.HOWL> <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net> <15727.33074.324120.988215@gargle.gargle.HOWL> <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net> <15727.33451.698048.657655@slothrop.zope.com> <2m3csw5qu9.fsf@starship.python.net> <055301c2503a$e1cfea60$6300000a@holdenweb.com> <2mfzwwiaud.fsf@starship.python.net>
Message-ID: <003301c25197$f8522600$6300000a@holdenweb.com>

[Michael Hudson]
> "Steve Holden" writes:
>
> > > A bunch of 0.5% improvements add up. If there's not much cost in
> > > complexity, why not go for it?
> > >
> >
> > Yeah, right, we just need 200 of them and we're laughing. Computation in
> > infinitesimal time.
>
> Multiply up doesn't have the same ring to it, does it?
>

Indeed not. I try to keep my pedantry in control, but it escapes from time to time.
regards
-----------------------------------------------------------------------
Steve Holden                               http://www.holdenweb.com/
Python Web Programming                     pydish.holdenweb.com/pwp/
Previous .sig file retired to              www.homeforoldsigs.com
-----------------------------------------------------------------------

From skip@manatee.mojam.com Sun Sep 1 13:00:23 2002
From: skip@manatee.mojam.com (Skip Montanaro)
Date: Sun, 1 Sep 2002 07:00:23 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200209011200.g81C0NSH019331@manatee.mojam.com>

Bug/Patch Summary
-----------------

282 open / 2810 total bugs (+7)
119 open / 1676 total patches (+10)

New Bugs
--------

textwrap has problems wrapping hyphens (2002-08-17)
    http://python.org/sf/596434
Another dealloc stack killer (2002-08-25)
    http://python.org/sf/600007
Installing w/o admin generates key error (2002-08-27)
    http://python.org/sf/600952
bug in new execvpe (2002-08-27)
    http://python.org/sf/601077
weird header wrapping in email.Generator (2002-08-28)
    http://python.org/sf/601392
xmlrpclib ignores CDATA (2002-08-28)
    http://python.org/sf/601534
some int results that should be bool (2002-08-29)
    http://python.org/sf/601775
smtplib mishandles empty sender (2002-08-29)
    http://python.org/sf/602029
configure finds c++ w/o --with-cxx (2002-08-29)
    http://python.org/sf/602102
os.popen() negative error code IOError (2002-08-29)
    http://python.org/sf/602245
3rd parameter for Tkinter.scan_dragto (2002-08-30)
    http://python.org/sf/602259
Bgen should learn about booleans (2002-08-30)
    http://python.org/sf/602291
option for not writing .py[co] files (2002-08-30)
    http://python.org/sf/602345
Jaguar "install" does not overwrite (2002-08-30)
    http://python.org/sf/602398
non greedy match bug (2002-08-30)
    http://python.org/sf/602444
pydoc -g dumps core on Solaris 2.8 (2002-08-30)
    http://python.org/sf/602627
cgitb tracebacks not accessible (2002-08-31)
    http://python.org/sf/602893

New Patches
-----------

test_commands test fails under Cygwin
(2002-04-16)
    http://python.org/sf/544740
email: RFC 2231 parameters encoding (2002-08-26)
    http://python.org/sf/600096
IDLE [Open module]: import submodules (2002-08-26)
    http://python.org/sf/600152
Robustness tweak to httplib.py (2002-08-26)
    http://python.org/sf/600488
Refactoring of difflib.Differ (2002-08-27)
    http://python.org/sf/600984
build_ext forgets libraries par w MSVC (2002-08-28)
    http://python.org/sf/601314
obmalloc,structmodule: 64bit, big endian (2002-08-28)
    http://python.org/sf/601369
expose PYTHON_API_VERSION via sys (2002-08-28)
    http://python.org/sf/601456
replace_header method for Message class (2002-08-29)
    http://python.org/sf/601959
sys.path in user.py (2002-08-29)
    http://python.org/sf/602005
improper use of strncpy in getpath (2002-08-29)
    http://python.org/sf/602108
single shared ticker (2002-08-29)
    http://python.org/sf/602191

Closed Bugs
-----------

test_commands test fails under Cygwin (2002-04-16)
    http://python.org/sf/544740
Various Playstation 2 Linux Test Errors (2002-06-12)
    http://python.org/sf/567892
Core dump when using mmap. (2002-08-20)
    http://python.org/sf/597938
execfile() not show filename when IOErro (2002-08-23)
    http://python.org/sf/599163
SocketServer wrong about allow_reuse_add (2002-08-24)
    http://python.org/sf/599681
sub[n] not working as expected. (2002-08-24)
    http://python.org/sf/599757
httplib.connect broken in 2.1 branch (2002-08-25)
    http://python.org/sf/599838
NameError value is not the name error (2002-08-25)
    http://python.org/sf/599869

Closed Patches
--------------

"simplification" to ceval.c (2002-08-19)
    http://python.org/sf/597221
Failure building the documentation (2002-08-22)
    http://python.org/sf/598996

From martin@v.loewis.de Sun Sep 1 22:25:39 2002
From: martin@v.loewis.de (Martin v.
Loewis)
Date: 01 Sep 2002 23:25:39 +0200
Subject: [Python-Dev] mimetypes patch #554192
In-Reply-To: <3D5F9C2D.8010209@livinglogic.de>
References: <3D5BEBB8.7080904@livinglogic.de> <15707.61612.844119.819432@anthem.wooz.org> <3D5CE38D.9080905@livinglogic.de> <3D5F9C2D.8010209@livinglogic.de>
Message-ID:

Walter Dörwald writes:

> >>Even better would be, if we could assign priorities to the mappings,
> >>so that for e.g. image/jpeg the preferred extension is .jpeg.
> >>Then guess_type() and guess_extension() would return the preferred
> >>mimetype/extension.
> > Do you have a specific application for that in mind? It sounds like
> > overkill.
>
> I'm using a web mirror script which uses the extensions from
> guess_extension to save all downloaded resources, and I hate it
> when the HTML files are named .htm and JPEG images are named .jpe.

Then this is your preference - others might prefer jpg, just because their file system can deal better with that. If you can agree that this is your preference, you should put the preference mechanism into the application.

Maybe your preference can be expressed algorithmically? It might be that you always want the longest known extension (it is unlikely that you prefer "jpeg" over "jpg" just because that contains a vowel :-).

Regards,
Martin

From martin@v.loewis.de Sun Sep 1 22:31:26 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 01 Sep 2002 23:31:26 +0200
Subject: [Python-Dev] PyString_DecodeEscape and PEP293
In-Reply-To: <3D60EA3B.7030008@livinglogic.de>
References: <3D60EA3B.7030008@livinglogic.de>
Message-ID:

Walter Dörwald writes:

> A recent checkin added a function PyString_DecodeEscape()
> to stringobject.c. To make this function PEP293 compatible
> it would need access to unicode_decode_call_errorhandler
> which is defined static in unicodeobject.c. Does
> PyString_DecodeEscape() really need an errors argument?

What do you mean, "really need"?
The callers of this function pass the argument, in particular escape_decode. Is that "real"?

> If yes, we could either move it to unicodeobject.c

No. It has to do little with Unicode.

> or make unicode_decode_call_errorhandler externally visible.

I don't know this function. What does this have to do with Unicode?

> Another problem that I noticed is that string-escape can't
> be used for encoding Unicode objects:

That is a feature. string-escape has nothing to do with Unicode.

Regards,
Martin

From martin@v.loewis.de Sun Sep 1 22:22:29 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 01 Sep 2002 23:22:29 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To:
References:
Message-ID:

Matthias Urlichs writes:

> Linux and MacOSX use UTF-8 and should probably be treated as such,
> i.e. I want to open("äöü"), not open("äöü".encode("utf-8")).

What would be "äöü" in this context? Your message was encoded as Latin-1 - was that deliberate?

You could expect that open(u"äöü") works well; for the way you write it, somebody needs to know what encoding the string has.

Linux does *not* "use" UTF-8. On the file system API, it treats arbitrary byte sequences as-is, i.e. when you pass "äöü" as Latin-1, it will put those bytes on disk - if you later use "äöü" in UTF-8, Linux won't find the file.

Instead, the convention seems to be that file names are in the locale's encoding - which might be UTF-8, if you use a UTF-8 locale.

> Byte strings are perfectly OK if they have a common encoding (meaning
> UTF-8, in some accepted normal form).

Unfortunately, that precondition is false. There is no common encoding on Linux.

Regards,
Martin

From martin@v.loewis.de Sun Sep 1 22:57:32 2002
From: martin@v.loewis.de (Martin v.
Loewis)
Date: 01 Sep 2002 23:57:32 +0200
Subject: [Python-Dev] To commit or not to commit
In-Reply-To: <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net>
References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

Guido van Rossum writes:

> > Any objections against committing the patch?
>
> What do MvL and MAL say?

I'm still concerned about the massive amounts of C code, most of which could be expressed way more compact in Python code. Walter convinced me that this (the aspect that I picked in a discussion) does have a real performance impact for real data, so I guess I have to live with that.

Because of the size, I'm sure there are still bugs in it. I couldn't spot any by inspection, so I think the patch is ready to be installed.

Regards,
Martin

From tdelaney@avaya.com Sun Sep 1 23:53:39 2002
From: tdelaney@avaya.com (Delaney, Timothy)
Date: Mon, 2 Sep 2002 08:53:39 +1000
Subject: [Python-Dev] The first trustworthy GBayes results
Message-ID:

> From: Tim Peters [mailto:tim.one@comcast.net]
>
> Training GBayes is cheap, and the more you feed it the less need to do
> information-destroying transformations (like folding case or ignoring
> punctuation).

Speaking of which, I had a thought this morning (in the shower of course ;) about a slightly more intelligent tokeniser. Split on whitespace, then runs of punctuation at the end of "words" are split off as a separate word.

So:

    a.b.c -> 'a.b.c' (main use: keeps file extensions with filenames)
    A phrase. -> 'A', 'phrase', '.'
    WTF??? -> 'WTF', '???'
    >>> import module -> '>>>', 'import', 'module'

Might this be useful?
No code of course ;)

Tim Delaney

From drifty@bigfoot.com Sun Sep 1 23:57:53 2002
From: drifty@bigfoot.com (Brett Cannon)
Date: Sun, 1 Sep 2002 15:57:53 -0700 (PDT)
Subject: [Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01
Message-ID:

Yes, with Michael's permission, I am attempting to start up the Python-dev summaries again. Below is my attempt at summarizing the last half of August. It's longer than normal summaries, but that is because I bothered to include discussions on threads that were not directly relating to the Python core but are interesting nonetheless (e.g., the whole spambayes thread).

I am posting to Python-dev first before posting to c.l.py, c.l.py.a (also lwn.net and probably Slashdot) because I want to get the general okay from the list that I have done a good enough job to send this out; I don't want to have a summary that represents the goings-on here without the general populace (or just the BDFL since he can overrule =) being okay with it. I am also curious as to whether I should go into more or less detail, leave out the summaries that do not directly pertain to the Python core, etc.

So please read the summary and let me know if you are okay with it. If so I will try to do semi-monthly summaries from now on.

Oh, and I am on vacation right now and will be doing a lot of travelling in the next two months, so I can't guarantee summaries will be this quick to come out for a while. I will do them, though, even if they are a week late. =)

Oh, and if I do get the okay to do this, expect a lot of dumb questions from me in the future in terms of clarifying things. Just remember, it is for the good of the Python community. =)

=======================================

This is a summary of traffic on the python-dev mailing list between August 16, 2002 and September 1, 2002 (exclusive). It is intended to inform the wider Python community of ongoing developments.
To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion.

This is the first summary written by Brett Cannon.

Summaries are archived nowhere at the moment. =) They will be, though, so stay tuned for the URL in future summaries.

Posting distribution (with apologies to mbm, but thanks to mwh for the code)

Number of articles in summary: 585

Articles per day (August 2002):

    Fri 16: 71    Sat 17: 25    Sun 18: 12    Mon 19: 42
    Tue 20: 63    Wed 21: 84    Thu 22: 30    Fri 23: 21
    Sat 24: 39    Sun 25:  9    Mon 26: 47    Tue 27: 27
    Wed 28: 33    Thu 29: 41    Fri 30: 36    Sat 31:  5

================
Type Categories
================

This VERY long thread was sparked by Andrew Koenig asking if a discussion of making type categories more explicit had ever occurred (Andrew meant for category to mean "the set of all types that implement a particular marker interface"). As Andrew later pointed out, he was asking about "a way of making notions such as 'file-like object' more formal and/or automatic".
The discussion quickly started using the term interface to mean defining a way to specify that an object implemented certain methods (think of it in terms of Java's 'implements' mechanism). Once that was out of the way, the discussion took off.

Zope's implementation was pointed out (http://cvs.zope.org/Zope3/lib/python/Interface/) very quickly. PEP 245 (Python Interface Syntax) was also brought to the attention of the list.

The idea of using inheritance to handle interfaces was brought up. Guido said that he hasn't "given up the hope that inheritance and interfaces could use the same mechanisms. But Jim Fulton, based on years of experience in Zope, claims they really should be different" in terms of how interfaces should be handled in objects. Jeremy Hylton tried to channel Jim's opinion by pointing out that "We'd like to use interfaces to make fairly strong claims. If a class A implements an interface I, then we should be able to use an instance of A anywhere that an I is needed." But "the inheritance mechanism is too general" because if a class A implements interface I and then a class B, which does not implement I, subclasses class A, we end up with a class B that claims it has a certain interface which it doesn't actually have. Guido understood the point, but still thought inheritance could be used "if there was a way to "shut off" inheritance as far as isinstance() (or issubclass())" is concerned.

Guido asked the simple question, "Why do I keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in *most* cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger."

Samuele Pedroni asked that any implementation "allow also for referring to anonymous super-interfaces of an interface in terms of the interface plus a subset of its signatures, also e.g. FileLike and just 'write'.
[that means an interface can be thought to correspond to a set of (tag,signature) tuples, where tag identifies the interface, and one can also just consider subsets of it]".

The thread finally seems to have stopped (for now) with Guido saying he is mulling the whole thing over in the back of his head. This is a very sticky topic because of the number of design decisions required and how it might change the way people program in Python.

There was also a partial sub-thread in this whole discussion about multimethods; basically a way to do overloading of methods based on parameter signature. Most of the discussion was over syntax and how to handle resolution order. It then seemed to fall by the wayside when the main part of the thread took over again.

==============================
type categories -- an example
==============================

This thread was started when Andrew Koenig said that the reason he brought up his type category question was that he wanted a way to identify members of a type easily. He now had an example in a program he was writing where the type of the argument varied, and what needed to be done to the data changed accordingly.

Jeremy Hylton suggested the isinstance(obj, type(re.compile(''))) idiom. Andrew asked if this was guaranteed to work, to which Jeremy said no. I asked why this was not guaranteed, and Fredrik Lundh said because re.compile() is a factory function and it is possible that a future version could return a different object based on the pattern.

===============================================
Python build trouble with the new gcc/binutils
===============================================

Andrew Koenig said that he couldn't compile Python using the newest gcc (this was the day after the latest release hit servers). With help from Zack Weinberg of Code Sourcery (who also recently rewrote the tempfile module), the problem was tracked down to binutils 2.13.
being the culprit and was not Python's fault.

===================================
Last call: mortal interned strings
===================================

The patch python.org/sf/576101 removes the default immortality of interned strings. I believe it was in early August (possibly spilled over from late July) when Oren Tirosh proposed the idea and wrote the above-mentioned patch. There had been some discussion over whether any 3rd party code was reliant upon interned strings being immortal; none was found (MacPython was reliant upon it, but since it is under Python core control and could be changed, this was considered a moot point).

It has been checked in. With the patch, the way to make a string immortal is to call PyString_InternImmortal(); no code in the core uses this function.

=====================================
PEP 218 (sets); moving set.py to Lib
=====================================

Thanks to Greg Wilson (for writing the PEP), Alex Martelli (for writing the module initially), and Guido (for refactoring Alex's code), the stdlib has now gained a sets module. It has the notion of both mutable and immutable sets (the latter used when you have a set of sets). There was discussion about how sets should print (sorted or not; unsorted is the default but the option is there to print sorted) and what operators should be overloaded for working on sets (| and & were chosen).

The module is a beautiful chunk of code and I highly recommend reading its source.

===========================================
A few lessons from the tempfile.py rewrite
===========================================

Zack Weinberg, after rewriting the tempfile module, brought up three points: 1) lack of dummy threads, 2) lack of a pthread_once equivalent, and 3) lack of a way to skip tests from unittest.py via some built-in method.
Guido responded accordingly:

1) since some code uses the idiom of trying to import thread and catching the exception if it fails, Guido said he would be willing to accept a dummy_thread.py that would allow:

    try:
        import thread as _thread
    except ImportError:
        import dummy_thread as _thread

to work. No word on whether this is being written at the moment.

2) Guido said the method was, in his opinion, overkill. He said to "be Pythonic, live dangerously, accept the risk that a ^C can screw you. It can anyway. :-)".

And as for 3), Guido referred Zack to the PyUnit list and Steve Purcell since Python just tracks Steve's code (pyunit.sf.net). Guido's suggestion was to stick code that was reliant on some other code in a separate testing suite that is only run when the code it relies on is available.

===========================
Standard datetime objects?
===========================

Kevin Jacobs asked what stage the new datetime object was at. Guido said it is in python/nondist/sandbox/datetime/ in CVS, which also has comments pointing to a wiki containing the current work on it. Fred L. Drake, Jr. is working on the C re-implementation and Guido expects a checkin at any moment (hasn't happened as of this writing).

===================
PEP 269 versus 283
===================

Jonathan Riehl noticed that PEP 283 said PEP 269 was dead; not good considering he was close to having a patch for PEP 269 (pgen module to interface with the C version). Guido said he will revive the PEP. The patch has since been put on SF at python.org/sf/599331 .

==============================
What is a backport candidate?
==============================

Since Python 2.2 is going to be around for a long time, the question was brought up of what constitutes code that should be backported.
Guido made the following three points:

1) code trivial to backport should always be backported
2) code patching 2.3-only code should obviously not be backported
3) a patch that applies to 2.2 code only after changes falls in between; gradients of this exist.

So please, when submitting patches, mention whether you think the patch should be backported to the 2.2 tree and any possible dependencies it might have in a backport.

=================================
python/nondist/sandbox/spambayes
=================================

In response to Paul Graham's spam filter written using Bayes' Rule (Slashdot post on it is at http://developers.slashdot.org/article.pl?sid=02/08/16/1428238&tid=156), a thread spawned around this checkin of code that followed that paper's suggestions. This thread quickly jumped into discussions on data structures, Bayes' Rule, and a whole lot of talk about spam. Very interesting if spam filtering interests you.

Tim Peters has been leading the drive on this chunk of code (and thanks to an illness that befell him in late August, which he has since gotten over, he had a few days of major hacking on it; Tim showed he is a performance stats whore).

A very cool quote came out of this thread from Eric S. Raymond when discussing the spam filter he has been working on: "This is actually the first new program I've coded in C (rather than Python) in a good four years or so".

====================
Parsing vs. lexing.
====================

In response to a question by Aahz about what the differences were between a lexer, parser, and tokenizer, Eric Raymond posted a good overview of the differences.

Guido later commented in an email mentioning SPARK and about how Python's parser generator (pgen) works and why he wrote it. He also made some other comments on lexers.

Jeremy Hylton pointed out a "neat new paper about an old algorithm for recursive descent parsers with backtracking and unlimited lookahead" by Bryan Ford at http://www.brynosaurus.com/pub.html .
Alex Martelli pointed out that this discussion reminded him of "a long-ago interview with Borland's techies" in which they said they were able to make Borland PASCAL fit on a floppy while MS PASCAL took multiple floppies. Their trick was "we just did everything by the Dragon Book -- except that the parser is a hand-written recursive descent parser [Aho &c being adamant defenders of Yacc & the like], which buys us a lot".

Someone named Noah also emailed a discussion of lexers and parsers that pulled in Finite State Machines, Pushdown Automata, and Turing Machines.

Martin Sj?n says that Haskell's pattern matching and lazy evaluation make lexers easy (even a Recursive-Descent parser), but unfortunately Haskell does not play with other languages nicely. Haskell is where Python got its list comprehension idea.

=========================================
[Python-Dev] Fw: Security hole in rexec?
=========================================

It was brought to the attention of the list that deleting __builtins__ allowed a compromise in rexec. Guido pointed out that python.org/sf/577530 reports this. He also said don't trust rexec. A patch is going to be submitted to document the view that rexec is really not that safe.

=================
A `cogen' module
=================

Francois Pinard asked about Cartesian products using the new sets module. Guido didn't think people would in general need it. Francois quickly started this thread to discuss a cogen module to generate Cartesian products and other ways of operating on sets.

=================
Mersenne Twister
=================

Raymond Hettinger volunteered to implement the Mersenne Twister algorithm (one in Python exists at www.math.keio.ac.jp/~matumoto/emt.html). While discussing whether to implement it in C or Python, Guido noticed that random.Random re-implements whrandom.
Guido then came up with the idea of writing a base random class that is subclassed where .random() can be implemented; Tim Peters agreed and suggested more methods to subclass.

=================================
New PEP Format: reStructuredText
=================================

David Goodger and Barry Warsaw have now made reST a usable syntax for PEPs. Read the PEPs on the subject to learn more:

- PEP 12 -- Sample reStructuredText PEP Template (http://www.python.org/peps/pep-0012.html)
- PEP 258 -- Docutils Design Specification (http://www.python.org/peps/pep-0258.html)
- PEP 287 -- reStructuredText Docstring Format (http://www.python.org/peps/pep-0287.html)

====================================
tiny optimization in ceval mainloop
====================================

Jeremy Hylton noticed that in ceval there is a test of whether the ticker was 0 or if things_to_do was set to true (an explanation of the ticker, checkinterval, and the GIL follows below). Jeremy wondered if we could just drop the ticker to 0 when things_to_do is true. Jack Jansen, though, pointed out that clearing it is not guaranteed since there may be an interrupt routine when "we fiddle things_to_do".

Skip Montanaro then pointed out that since neither ticker nor things_to_do is fiddled with unless the GIL is held, they could be made globals instead of having each thread execute this test; he did a patch that implements this (python.org/sf/602191). Guido then said that if there wasn't a decent speed improvement, then no patch would be checked in. He then changed his mind when it was pointed out that it actually simplified the code. Skip tested anyway, though, and there is a speed improvement.

This also brought up whether the default value of 10 for checkinterval was reasonable. It was then agreed to bump it up to 100. Jack ran some code and said he noticed a definite improvement.

Python's version of threading is not like in C.
There is something called the GIL (Global Interpreter Lock) which any thread wishing to execute Python code or play with Python objects must hold. This means that when you have Python threads running (using the thread or threading module) they are usually all waiting in line to get the GIL. To decide when to release the GIL so that another thread can grab it, Python uses the ticker. This variable counts down to zero by being decremented every time a Python opcode is executed. The ticker's starting value after each release of the GIL is what sys.setcheckinterval() sets (originally defaulted to 10, now defaulted to 100).

To get a better understanding of threading under Python I recommend reading Aahz's tutorials on threading.

From tim.one@comcast.net Mon Sep 2 00:40:38 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 01 Sep 2002 19:40:38 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To:
Message-ID:

[Delaney, Timothy]
> Speaking of which, I had a thought this morning (in the shower of
> course ;) about a slightly more intelligent tokeniser.

"Intelligence" isn't necessarily helpful with a statistical scheme, and always makes it harder to adapt to other languages.

> Split on whitespace, then runs of punctuation at the end of "words" are
> split off as a separate word.

For example, "free!!" never appears in a ham msg in my corpora, but appears often in the spam samples. OTOH, plain "free" is a weak spam indicator on c.l.py, given the frequent supposedly on-topic arguments about free beer versus free speech, etc.

> a.b.c -> 'a.b.c' (main use: keeps file extensions with filenames)
>
> A phrase. -> 'A', 'phrase', '.'
>
> WTF??? -> 'WTF', '???'
>
> >>> import module -> '>>>', 'import', 'module'

The first and last are the same as just splitting on whitespace. The 2nd-last may lose the distinction between WTF??? and a solicitation to join the World Trade Federation; WTF isn't likely to make it into a list of smoking guns regardless.
Hard to guess about the 2nd.

The database isn't large enough to worry about reducing its size, btw -- the only gimmicks I care about are those that increase accuracy.

> Might this be useful? No code of course ;)

It takes about an hour to run and evaluate tests for one change. If you want to motivate me to try, supply a patch against timtest.py (in the sandbox), else I've already got far more ideas than time to test them properly. Anyone else want to test this one?

From tdelaney@avaya.com Mon Sep 2 01:04:39 2002
From: tdelaney@avaya.com (Delaney, Timothy)
Date: Mon, 2 Sep 2002 10:04:39 +1000
Subject: [Python-Dev] The first trustworthy GBayes results
Message-ID:

> From: Tim Peters [mailto:tim.one@comcast.net]
>
> For example, "free!!" never appears in a ham msg in my
> corpora, but
> appears often in the spam samples. OTOH, plain "free" is a weak spam
> indicator on c.l.py, given the frequent supposedly on-topic
> arguments about
> free beer versus free speech, etc.

I'd actually thought of this limitation, and how it could be avoided. This so-called "more intelligent" tokeniser would probably work best in a system which scored word pairs as well as single words.

For example:

    "I want free beer!!!"

would be split as

    'I' 'want' 'free' 'beer' '!!!'

This might then be scored as

    'I'          0.5
    'want'       0.5
    'free'       0.5
    'beer'       0.1   (beer is unlikely to be a spam indicator ;)
    '!!!'        0.9
    'I want'     0.3
    'want free'  0.99  (do you want free hot ...?)
    'free beer'  0.01  (free beer is never a spam indicator ;)
    'beer !!!'   0.5

Whether any weighting should be applied to single words or word pairs I don't know - my gut feeling is that they should be weighted the same, but guts are no replacement for empirical evidence.

I just brought CVS python down at home and tried compiling with MinGW (no success so far ...) but I'll have a look at the GBayes stuff sometime soon and see if the above helps at all. Unfortunately, I just started my work day ...
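A rough sketch of the tokeniser and word-pair idea described in this message, assuming "punctuation" means the ASCII characters in Python's string.punctuation. The function names are made up for illustration, and this is not code from the spambayes sandbox.

```python
import string


def tokenize(text):
    """Split on whitespace, then split a trailing run of punctuation
    off each chunk as its own token (hypothetical helper)."""
    tokens = []
    for chunk in text.split():
        core = chunk.rstrip(string.punctuation)
        if core and core != chunk:
            # e.g. 'beer!!!' -> 'beer', '!!!'
            tokens.append(core)
            tokens.append(chunk[len(core):])
        else:
            # No trailing punctuation, or the chunk is all punctuation
            # (e.g. '>>>'), so keep it whole.
            tokens.append(chunk)
    return tokens


def word_pairs(tokens):
    """Join each pair of adjacent tokens with a single space."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    toks = tokenize("I want free beer!!!")
    print(toks)
    print(word_pairs(toks))
```

A scorer built on this could look up both the single tokens and the pairs in the same probability table; whether that helps accuracy is exactly the open question here, and only testing against real corpora would settle it.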
Tim Delaney From tdelaney@avaya.com Mon Sep 2 01:38:10 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Mon, 2 Sep 2002 10:38:10 +1000 Subject: [Python-Dev] The first trustworthy GBayes results Message-ID: > From: Delaney, Timothy [mailto:tdelaney@avaya.com] > > Whether any weighting should be applied to single words or > word pairs I > don't know - my gut feeling is that they should be weighted > the same, but > guts are no replacement for empirical evidence. On second thought - if a word-pair appears, then the separate parts should not be checked as separate words. So, If I had scores: 'free' 0.1 'beer' 0.1 ('want', 'free',) 0.9 ('free', 'beer',) 0.01 ('free', '!!!',) 0.99 then the following phrases would match (case-folding) as: 'I want free beer!!!': ('want', 'free',) 0.9 ('free', 'beer',) 0.01 'Get *** for free!!!' ('free', '!!!',) 0.99 'I want free beer. Free the beer!!!' ('want', 'free',) 0.9 ('free', 'beer',) 0.01 'free' 0.1 'beer' 0.1 Damn I wish I was at home to try this out ... :( Tim Delaney From skip@pobox.com Mon Sep 2 03:29:09 2002 From: skip@pobox.com (Skip Montanaro) Date: Sun, 1 Sep 2002 21:29:09 -0500 Subject: [Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: References: Message-ID: <15730.52469.604124.730029@localhost.localdomain> Brett> I am posting to Python-dev first before posting to c.l.py, Brett> c.l.py.a ... because I want to get the general okay from the Brett> list... Looks good to me. The only trivial nit I would like to raise is that any URLs you embed in the text be true URLs. I'd also prefer they be encased in <...>, but that's slightly less important and generally only matters when URLs are immediately followed by punctuation. So, instead of Brett> Guido said he will revive the PEP. The patch has since been put Brett> on SF at python.org/sf/599331 . you'd have Brett> Guido said he will revive the PEP. The patch has since been put Brett> on SF at . 
The two changes make it much more likely that email readers will be able to successfully highlight such URLs correctly. Skip From skip@pobox.com Mon Sep 2 03:34:24 2002 From: skip@pobox.com (Skip Montanaro) Date: Sun, 1 Sep 2002 21:34:24 -0500 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: References: Message-ID: <15730.52784.407584.441515@localhost.localdomain> Tim> It takes about an hour to run and evaluate tests for one change. Tim> If you want to motivate me to try, supply a patch against Tim> timtest.py (in the sandbox), else I've already got far more ideas Tim> than time to test them properly. Anyone else want to test this Tim> one? Care to identify some of those ideas? Skip From tim.one@comcast.net Mon Sep 2 03:43:01 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 01 Sep 2002 22:43:01 -0400 Subject: [Python-Dev] spambayes status Message-ID: This is a multi-part message in MIME format. --Boundary_(ID_Uqs78No0Dj49zOTKCoTyzA) Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 7BIT I spent an enormous amount of time this weekend running tests against various changes -- a "1% inspiration, 99% perspiration" kind of thing. There are lots of words about the changes (both good and bad) in the comment blocks and checkin msgs. The biggest "conceptual" change is that I'm now using (but only using) the Subject and From lines from the headers (my earlier belief that the ham corpora Subject lines were too corrupted by Mailman decorations turned out to be wrong). Adding Subject lines gave a remarkably small improvement, btw. Most changes I tried either didn't matter, or hurt. Approximately 70 more blatant spams in the ham corpora were identified and replaced with (randomly selected) legitimate msgs. The f-p rate is too low now to measure changes with confidence. Best guess I can make from the evidence is that it's below 0.05% now. 
The false negative rate has improved more, and there's still plenty of those (so it's still easy to be confident about whether changes do or don't help that).

Across all 20 runs (each training on 4000 ham + about 2750 spam, then predicting against a different set with the same number of each), these are the false positive and negative rates now (percentages; note that 0.025% is a single message in the f-p column; a single msg in the f-n column is about 0.036%):

  f-p     f-n
0.000   1.236
0.000   1.164
0.050   1.454
0.000   1.599
0.025   1.527
0.025   1.236
0.050   1.163
0.025   1.309
0.025   1.891
0.000   1.418
0.075   1.745
0.050   1.708
0.025   1.491
0.000   0.836
0.050   1.091
0.025   1.309
0.025   1.491
0.000   1.127
0.025   1.309
0.050   1.636

The aggregate number of unique f-p across all runs is down to 8. The aggregate number of unique f-n across all runs is 336.

The 8 ham messages for which at least one run claimed it was spam are attached. Note that I finally removed the "If AOL were a car" spam from the good corpus; while it may or may not be amusing, it *was* automated bulk email, even to the extent of including large blocks of random characters at the end. The message consisting almost entirely of quoting a Nigerian scam message looks like it would be a "false positive" under any scheme worth using, but I left it in the good corpus (so it's still an f-p here), because it wasn't bulk email (the original msg was, but the reply was not).
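The arithmetic behind the "single message" remark can be checked with a short sketch. It assumes exactly 2750 spam per run (the message says "about 2750"), and the names `RATES` and `percent_to_count` are illustrative only.

```python
HAM_PER_RUN = 4000    # ham messages predicted against in each run
SPAM_PER_RUN = 2750   # "about 2750" spam messages; treated as exact here

# (f-p %, f-n %) for each of the 20 runs, copied from the table above.
RATES = [
    (0.000, 1.236), (0.000, 1.164), (0.050, 1.454), (0.000, 1.599),
    (0.025, 1.527), (0.025, 1.236), (0.050, 1.163), (0.025, 1.309),
    (0.025, 1.891), (0.000, 1.418), (0.075, 1.745), (0.050, 1.708),
    (0.025, 1.491), (0.000, 0.836), (0.050, 1.091), (0.025, 1.309),
    (0.025, 1.491), (0.000, 1.127), (0.025, 1.309), (0.050, 1.636),
]


def percent_to_count(percent, total):
    """Convert an error percentage back into a whole number of messages."""
    return round(percent / 100.0 * total)


# What one misclassified message is worth, in percent, in each column.
one_fp = 100.0 / HAM_PER_RUN
one_fn = 100.0 / SPAM_PER_RUN

# Unweighted means over the 20 runs.
mean_fp = sum(fp for fp, _ in RATES) / len(RATES)
mean_fn = sum(fn for _, fn in RATES) / len(RATES)
```

Under these assumptions `one_fp` is 0.025 and `one_fn` rounds to 0.036, matching the figures quoted in the message, and the worst f-p run (0.075%) corresponds to three ham messages.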
--Boundary_(ID_Uqs78No0Dj49zOTKCoTyzA) Content-type: application/x-zip-compressed; name=fp.zip Content-transfer-encoding: base64 Content-disposition: attachment; filename=fp.zip UEsDBBQAAAAIAOewIS0GzF+x8RsAABBiAAAGAAAAZnAudHh07FvrcttGlv6vKr1D2+vZSI4IoXEH x1YkW7KliWW7LDlOTSq11QSaJCxcGFxE0W+7z7AvsN9pXAhe5ImqkuxOlZ2KTaK7T5/7+fo0+PTp H/lnd+dUlOLwXCSHV7K0DrnuOp6rlXfl7s4sz0bsOdM13/dd09dN3fTN+vHed0GWzES6OPhuX03x fF/3usHbqIjKZsTnvtMNyEEiorgdMbnn+e1QOZW5bMnprudx123Hsipv1xiWYxnt80KmpdaOWJy7 TjsiUhEvio4HB/911HIZyOhWht1C03OcJRsivSk6PkxdB5Pt4FTGsx/aId4+jdJbWZQJeGkp+sut JlUsyuVe3UitiaJ97vku73goqtFnGZTD94tymqUb+83UY23jeZ6J8KD3dHfnQyPpkI3zLGG/cN/W uMs1W3P5r2wP4mTP0yyUls61MPmiFaVIQ5GHcTSWGiy8v7vDmj+jBSOGtWbzLJ+weVROmSyScsb2 zu6ihJmawdl/8N6qKGTczk+LaqDrumff4J/l4DjLWU1uEEdFebwk/Xf2SYYHjOvsXVAyAzZgujk0 raFpsIFuEZE10Yq4ugmD1NKitJQ5jL8hC9vr5qwPLVn6hesapNBc49dV2b+mJbb3Ps8GnuZr5mH3 ab9Wz9Xl9XtoYUnr4uTENh1HJ+GfbZf+aJv4Homv6+x7nW8VX3d909DZ3i/cNTSTa56nGSb/dR/M Lzf/PVp6k5VVUfN9eX3Cbi3N0RzG9jxf1zhzB9wZcCSE/Z5++pIyTzds5+SMa7pu+NYLR79HGj7U jVaaS1kUYiIHF6dD9kzXXanzgNvcDp+MRoZt+5b+RLq2x8ciOG5FvV+Go92dV1DKkD3+KQpuIsku ozguHrNnt+rrfyX09XjLqutseK9Ndneumqhk17mI0iidsJfIS4UsWJSyOlTp08cfVU6Vw+1iG0Oz M+LlxeXZ4CeZF1GWDhk0trvzMoNUaTm4XsxAoZR35eEsxnZ/X2o7mArsWj5/HBXwNc/2B/xxb2Eu 0mIs88FZGmQhuBwydxQhm/88eJ9HWR6ViyEz6evl1SUCuvf0bZYnIlZDGJD5EIoL8qzIxiV7V5Vx lt2ws7tZDmMxS1NuZtvENBZEiXz35mzI4P9hFciQvVj0VjfD7KeVZVcyDWmXnsYHIkyitKf33Z2z PM/yYkC2+frEnwcvpEyvqYwM78krrWyJSJd6NzQdDr7HdW64Orz6PdWHUKYByIyq+GZ35w1teI7c D/egJFhmwz4rufytkiv7/NBk8OdUMI4aAu+zotxKYNXN1Fz4WhHk0QgsPJuW5Wx4eLiWfdV3yHFI FKJ0nB32KB4dLN3lgRwX7c4tKxfIMa9lKnMRszAqgqogtTGiozI46nbr/ShDk1wkCcVGLNJJhZju MdLjQNsU+WNa/J8JXaUbYp/kwRQJ9n5WZtFM5vSwz8PhEZXd84jVaIa+XDCRMIod0gopbJFVDC6W JUAMoSihzIJBeWWbVYImq4hbEBejWFJWIS1TZsnaVKMx9lKkiphCJbTVdY1d6GM/84GHq2XSOqmR kZov4yFpS+cmZ4ZlM44QQPIUd2uPbRvheqYQixr5aiIlyq8gZ5LlxPqYsgpJSUJeNXPZG0w+YAon MsjL5nKEj/CWRtvz+Xwjrx9iZ4T3KhF2UhQVcl4gkY5rSNpNIFNixnlWFdgMeRe1bRpBax8IKbEz 
pMdRlU+m7Owcgp6eH7CogGkmMCSsF5Ler4KshCuD1N7bjF3909pn+MY6XEfYQEUAskkGzbKLDgyy kwqGotQKWylV1wo17N0dFDUgGRYIKBC4asFGUvlEHoKk0l0aldBfSEun4CrI0nGEpFRG2KPGjsQs lLu7Q/uLMKS8LCW0HNOGF2PW4lySQ1IWPWBhBiRTYqSENxwG2WxxSDEdZwVMBZ+gSp5VtUmwJeFr Je8slqIgFssqR5UrWZmxqsA2nySMmAMwFAH0CnEa5uB0I9ARcIF6w2IGchF5M8UAzMRCkSA/QAew jtKjaEjReAwL1F5DAyRjBIPPRF7i6xiDxAQUoE4A5HH/8qTzoD/rxyLDMmxj5VSkqnVzKoDL6kM4 aIO9HdszDa8/esSH0zJpzz0u97htrK6W3UnAxFmgg/T2ZKiiYeP4UK9KZXvkcE3Pc9zeMhVBG8sw oCJp24CKu60b8WGESn13z9gs2jzdNKu6AcAUy+ntpaJ8GxOZttTjyoDKCdsGVK7YKg8J1BvoUKGY BACD+PsYAYZgHWmyQN5+K+fFJM+qWTFkdLrVqIQ1Gb+H/mQai0AWLci7nlYAeS57JUcE8nR8HprO kHstyHuXT0QafVHOPGQXYSgzRP4HWUrKfv+TUpJKJfa013Cw5029yeQJv01jzzq+en1hOjaXYwBf otHw/Pb6vQIWyOyDcwUwHEPzDGB/HGOMh6E+UkAuRfh15AfopmuGzi0c9vhDsN/awveinA5Zij0f jcd3WlVp8OZHVUV/i6RoH4SZFsr2C80eSxniIFZUsVYOkOygPMxYDukbQ/3Po2hCM7VEhpGYi0Wx Qpe+lFWaaVH5qJD5LWD0SAuK+qgxqgoyVEGDtGDAtbyxIsqxLNTDzjiPkPYGSHUqG0JNuRzX0m74 1tCzXAqONYS7Alv/n0BlrhlsT94B+ERU4UR8P1z+C2DjClhIs0lV3AiyINBhWt0d5rOE/h+cv/t0 /U6j/LuyIJCVNo60KgxgrcPX7z++oXU3xeHKrFAmoohEqPLSoUqD2oxo/RCFzznn+n8i/T2vUc5/ l1EgiK0/ux7xQ9eydee+eiQVOiBE0ODfjU7RCDhxI2XO86yUG1PHsC1suDE7zOIY59D9zdSfUZxM Np4D1qZlvtisSWk0gT+JdGNgVgGDA3ZsDMBT4Az5YgtTCfJ5FNy3x8bzDCFeANpsDLQtuBdN0N9H cVMcwo9Jn+d+qtNGYpGQIgC/+vmumLYp7uzjakYCdp7Os/ymIAd8FCyA1ZDJivlIxrGaWS5m0wwx UeBLvXQl8TQ178NCpCh6Of45pr/G8SJNm7bH1wvfBzlGmkB4F83hFRkBBd3mlmE6jmZ4Fve1e0+z Xdn8IKlARLcomqxVKh7G9QFoGs3Y3sd8gqSy3xVE1/9z6peta5ZtGBo39Ac1L9ZXrlTrF/nPpaNZ pq8/eX2qGZZvIEKP141ztNIbMji7FHnTG9KHtkctsYFuE/FtlR2VEyjUcTVOveKfB3TgoeZQWed8 gQolj9utaAIOfZSW17lgZD7bsrDtCk22t8kWH4IzSgdtPj67ut5fY+8+kZq1tGQdB129PbvGWYk6 eIrVrjyuR8hmufThdu6/RbmkzpL5rbP0rbP0h3WWTubss5RfDnCAzqjFhNM+TvxjUhWO52+bSsrm Ec7dOJEDRtftA5qGUspG8GMcyhFf4e4O6djQ2UKijqsz9GCAdI/iwF5RdSB/FQFhmpjaMyyUtzLO ZgT8DnFsApDe3SGItFJOavDz+GQkkGp/FCMc6OmsdaM+HY+T+XSASKa6wxTeoN5EUufR3R2K/+HD SoymgfMj9vHD67O31+zk7Sl7+e7tq4tTfLs4eYMhGv1Q20c5VV1ktpegZvqrKMfkC5ZU+KfI4iiI 
Smq05QygIwrKrhkTSCZUw0jkYQEbQNHtIYHNp1Ewbbp/YkQ9lTIj4vDkGXVbykz17kZS9T8aGzLA kcVqrycVZQVjKnIwA5tmM0krofusikMiKWaobEGEJKyxer8U9MZgG+sbwq9qGIdiVkMnev6JYIVq 75zXmG13Z+/V5afzfRiF6DbupKmmW0vgNUGmVPW3QKKZolpNaRkvWIGMXc2YICFKcp/dnRxHJDln M5FKYoco19IoXoHwIjh7SgqpL1KjCWEEaiW1NMDkXFBr7IDJOxlU1HdTPbmZiEJl1YryPQmq7AcZ 2r7cjHaHdCyJ4qgUEHwKhFAQ70WpNHZa5ZRVaHLdeaUxfNjdCasyksWB0rq4VaRrs4wjMKC6bqpS 0YKPV08M88Dm7gH1APau55i3gOJyWTdhKbG9QtSzc7CaS/CuBDhiV4gqlEGE5TUhZ7LGxzQiCa+I v4KdNkCbjeNMULlt28EiUJC6aEwMqEdyi7JVw0tJ2ovZC4D9nq2ajJAgJSyY8iElXZ0ayub2iLqf MIhQSUIi1Nvd1HXikdq/02hexah4ypHwPKLO4TTLy6JN4ngya9uj5OtT0Uwk3yCwrFqEVLJDDShC uXRcZOAojqAdzIVQ0ZjskMsZKKvt5R0cWcWgkr8TCWw3hqfeIPTcdGGVKTGi+K/lnou0plWvXLKb zdNVlhvZFhB6Mi1ZIm4k3Hx3pyhBGkaqY5uKGDSHbFEVrZUQ7wE1DdiL1UhX7JM12zML6tq8UPoe RWGoLgKiW5zjqA8BPgvaQN31IgFQI/+APuS1RxB7CQHBpse/ZrTah5veb+3KNQsgUEYqGSmjNsZv MpmIRaPdMXy2UP6wTpnCTAVNjMhlTfe4pk4AhRZ3a1T2E3mJgk/aKBBLtUUa9VMDmZrXdS9cNZy7 9vqKLyPnwjGXCYbILlRGWmGxeUOl7jd3uaT2JqqFIqbDwqJ2/Dav1FWxl0iolV4HCNVbScTrFSFY nKgrAKIF9ZfUlSbr5lQ36H2JBXEZQ5IJyk9reJUHlbozdS3U9MNZexRtdQQHwoz5lIpFq21Ysfbc nqnAaW0fcgI1XXmKCCg8yTe7RVVddojtOPqiBOoIaexcxRImU88eoaNcUBUa5Q6xJPna+FQKERNk N5UniiktWpqynFbFgBn23zoc10qAyALXe6C7z1z9b7W6ES17jcrbulXsH7BmeYKowmEXasvoyuET YQYEfljfQdCWRdnk7IYRKjIgThPaHgI8SiitS5X2VRkm569v7xf1fYdqUCg9qvyD5aNqQWJGQRWj DMMBEwHwBhFgWhpWQYSMpNSg+F8S0ho48b6+O0kJ7SjnU/pUiid4RW8NFIzr4LcQ4zrVNBUHbjWK KzlDgVJBhbVz1JI6/TQK6zyB3B1GkgggZWFggTSIq1C20vVABWZJHAnTerexuAP2qpIRZfFWG8i0 sUqVpAja+UIFKAxCpVhhF3XBF9JVUlx0SbhzqEb6FwS8Goh00Dw7SyfwpD5GVDl/EySqq6DmsUwn UDwIK5R5RDdqh+qm0jAtxplr+y43TRW2vw/rLCdeyYBuxFAcS3WJGMu7A3Zxky2iA/ZGTLKiFubP buuZh9xyDd9Zf/vOcy3XcXzH6e6MqA4MH9/I9HHTZ7K7nlldbmT4w/rIfCrbFptBNwn2CrG6Z5iK RA6f3XQTuzltO+ykKrOk9z5bN2Eklz0/wze8jlWUYcRO2z+zuG171kqjcrmO3oHzl2/wrW3l2aZp mluYVo3OaVbSl2aua/nm8jZsfS6dPpqJpsEtfaPpd9Yj1W9sttdMpmP4Vve8fmOwGeKu7fjGQ5t+ MSJJC4pA9fjStJxpYrS8mdAScTendl+xyDUZVo/G1QCxCs3RjUaVRr1vM5MPMN1wMTnUomml4kg7 
+bi9LfijpK4gDH7cKPB3tQS7pl7nDuysJtvciVXygFnsH1VKjSCDGkEGH1o2+x5pTr/3wkuEEx5m T7ieJrn9hB9fnA4s37U9TSkhHKdBhH/k9muvewQXFdszdBN2wf8213x7v9cY6ysSKdgwOdmPM9My TQ4fYqtL2R532C+Kp1/3/4zepKOuyBz8pesP601urHxAQ407jm38e1xAUUeN828ttW8ttT/uZa1s TkgPp6Ou4rC2MDEqmqwpYUzVuPZAhwq7fOXqr7j0sw+5aXuusY4ObMPyfFu3dd7d2c0prtpi7Jie 3pVpeQc0nE7aQdf0XW+jyBNWWlZl17K411VldZJoSdMb991ANO6oehayZzvQewOsreS6Z+rd+zFA klq3EHL0AEsUTLuyatnL99wJmra0XNOw3XXx2rfgl8U7jm5k98a8Y7mu20GJROQ33Xs0nm8ZS4AC PbbPLcfU230eR9+Fjxtahu75Tsda0yNFuuoYN33eQy10pG6HbNOje/+HQQVCB4iOWCzqIkfPmmqn pV9q8BAUWjUCmBDdfWEy16JZcRv0MEX/SXtBlKTLh9r7d1fXZ6fbccPnTMbeKCoBHj7Po+AG6P7p 0zQrZiJ5+vQYB888qmoA8TBA8cu761/ZaadE9ia6hSceNh5Jh5NXSoUdhviTrgipmPrc1xz3oVeE qytX8M3Lufmbp1HhenKpWaZj2vrxNs13l4SXWXoA8MROZnkNpAx9aNpDy2GvL6+3oiDDonf/DRu7 8HtvCEVZjqJ7rgj7fAARcdfxTRMb9gmzvXXGlrmcWw2HL++7JtwQqr/kgcDFtnX6RdK/CXDxvuGW b7jlD8MtlwtGNZeaR+r1NOxav+orWFFW1OlhbTlsddbeIqluYJLdUpM2V42rOqtSr/O7kFGtVG10 Ed/Uv+0RbCznbFncMEXl5fl6XlbE6C3ked3okuwzcEr9xrSIm95bXXBVn1GVZcZO0oW6olwTBJWh //54V92pLVTiQExbJ7SXzBXD44j64OuvnVPbqulmYU5C7cGx0iMTCBWqZCo+1otWh+6o16X9FQjP OnRN7lvrAE93TIebjm1a3hJHrHY0KGSof7yYQstlC5o6JEWNufVnG10fiiVxsz6t1/SgXyh2QIfU 37V7AEFXmy039HJgKrfuqfZZH1Hh2jx0cNg2rSXv/e6KB6zbaaF+Kb0DaNxytkmnFPRFHHXcmrZ7 ryKDrJvm+cuX4Lby3MLlXCKUujfkNoYp0Nsx3bftTrJePujB1W0/2nQ009I8R7Pb32yG0zJHXdPW 7E4Y8Iv4HT/a/Je/2bSCH397o36zyf/xu3+zqSo7N1beaTKcoe627zStSUa/VASu4Jal+cTqV6Vi e+ptoSuK5jNVX5dM/UQ/G/hEN8Zvr/e38eEODQIZ7HvdWOVD7fp5OjJqvbQvJDGiz65kfhshp+3Z mo1zlA2vWNPTs9eW7rw5+eePR/fsav9vO1fb0zYShL8j8R9WUSOCDhy/xXZQkiNA6XEtHC2ge+lV lYmdZFsnjrxOKf31NzPrt8QmTa+0AokIlMSx17Oz65lnZ+eZPdVI77qABftW++Cwfdg3D4z2Eayj Do51vEht9c3WAfypjmOYdtvc98ap2avWSy/F5K9wjrKX8sljHfr6cv+uaxDp1LaqR3Krtg4LEuc1 S+b+AmCt0EGm+QrGY842XEv7/5sg+YCAoP4EBJ+A4L0BwWR/ET0yopuxjGddnWU3kSkRuIeLuihG rchI/KTglam2Na1cWMJp220HHE3mFmu1dFMr978eH2TwoJCavuS8cTf1k8SYpXMp2exd6TAmdGWR rvxwpxlHvYJTXvToElqWL+LT2bxcAqLDJ+VkeCR7BlV3HvuYYNLV7HI7Hv9UPhh7ZTlB+qqjcZmm 
Bat+vxu5Hg8LP5X8tI844U4oMXMngwRLmIqqLwEJWzs6PiQgce6vDSSOI77DdKsQIgA3Yu+p1h3F H0Y3SmHsZRmDt7raVjQbIKOB2djv8jsTgkIEpCl9uuoivWp7nSIXKzt5dvLDOokiGMsSs0b1YQCN uqJZjmIoWnuplEVl11nDUeDxtJryLale8Twt6jAyXr96/acOLbbUgqct98FExEdVOuzFPjSoEwjw 4sl+EA7cYAweqixZRReXJMvEOdNblmF8izgSpXz1NAmoLlFWDPJ9ZJ1E7iXREhx1MZ/ymP0eijFn HfEB3/c5pjjIUw4HK6JTRXoE7kR4ITth0sCQzV6CjSiuauqWBv+GrvRhZrfMCrEWSBvO0BkOfctW kmsNTVds23ZsH/DlTAbnlFEYjoL0coxnrluYIoNbc7HrigHn+blHXEDzXGb8c+LgbW5cCT/a7WMS LuDFeRw3NUWHa06mu29wkU7Q65tFLkwC+bAsjgJSz2nUacitHTnsoGJV39Pgzzo/Tcf+r91jHhD9 e/fgdg953cHpZ8q4EUg5HrpCMqLB5yLN+sIvYI4Cb2xpSLAujDcfxKIpG2z+RkFovN0FGNBdPHEu MFK9w8Y8Ft3dlqIyxBAcJOnilxism+ienL1/8/z8/eUfmCqMw9OF51x7WPuz6hO8fYK39wZv3xbN q4RxmLsFMyigjLuTNFUusZoS0qUp5luT5DjlyVLmF5xKeJLS2uWPlOcmr+OUaAsgFN8AM4Ej9OJx t6b9W6/1WOULz2QS/bEcUTFYF8/9bq3eGM6DAGNs2wKBcb3BhcejbVGjwJs8+Qw+9WQ7J6cvWIYC 2Rjw1cDv6kxEg27N/0K85eYNn3rhTXPIKY0vDvkA9PhhNiKhpcBaO2kPwWC13A/nJSUFQUnpAHTZ IHCF6NYm/nQ+kN5E1Hodl40jf7io01qv3pCfOk2312nC1bLnj6Lrac+zaabiPFuthnoDvC92F7u6 VvuPaQ5kmmitownBv/irVfF4NaDrd9uc6pdsYYW6JqFH7JXtwnNCuiE946o3s36bG5rC/g7nCUtC 5lrXKanaFwN3hiYXjDbyfhiSwXZY5Ma0nzR2p6xZVzY3dNnAhGO6DMItjFTEtNOTLoDZlthCdOay LXiow3Abvl7D8uAjhSmiWwLBe8QpQ6GK86O+QjngvO7FIH+3OS4Y44c+CVFCmgn3YoAf+jMne5ub 3ToZm+81uo9llHMj+/V+f93EPr5+6/oq61H9whbWMa0iewQSy5rY1cSqHo59WC3toHUjfhrSblny jaINyefT5WX9z4jTIgVBs1SnREEwTMw+1Kx8+1KmAGS7r46VF98FRQyJMpNu25qGlifBYZGzLK3O Ug0j4wrM3DytT1fzElnePM4yAVXbtE2tQC8gZkn6Y9vQHS3b3U1IR1mbdsvMxB/5UcTLQVx/HhVj oUmYNGF+lc4mtlN6umU6uciNaVUxlWLdlZad5xnGblZcC3ScacrzxYznzVjttmlnd0iJNsXSJPfF OgCHmCYKjmj2wWosAp82DWTyYAxLVzfg9L0qPfAFKfe0xjqjCfgJP9q/CaPAS8owTYPe2gmB5xf9 5vnF8Q6b3f7D0wyRKSV2wBI64f2JBeKBoRYS5jQsm0vFuPRyMa5L2Q12ACvyYP45SyrE5NVFZoI7 iT6Gz1zn0zNtX+YPagUl9Cpz8bDYsm5qioaBaN0qJNuVW0DygQoX6I4DH1u6vXw1axS6lS/NtTb2 z3QwIXC7IuEPR2viYjnD/fxmlWl5JdVlTf+iEqX56vLwh3Ae7q8ey7dxHlTHfDxVRJ5iak8xtXuM qcm0asyxez5HImxaTDNGdqk7cb/Ar8Ety/w47SCnRNOELI9hdndzg7Zy2DXxshOOL7Kwp+zqop8d 
aZxl5NfEp9GogJmSYXGqDjuUvHGUiGFRhOyOguGjT6G7awIPIhc2E3FbYSdxktp4jbVEOGUjyuCg CCc+RexhMUsVAsBxgGs5Zg3wL7/Kkq7UcupmqHdSN7Rhfgzz6AhRSGK7RcrDxrkF7VwjZRpLqwja DuHxbVoygIADHo+RZev50F2s6UvkkbRyiHSXEgrKmiqw1ohcBUsYuDDMymwMi+UZlQBrgvML3Oum uBX40+bGf1BLAQIUABQAAAAIAOewIS0GzF+x8RsAABBiAAAGAAAAAAAAAAEAIAC2gQAAAABmcC50 eHRQSwUGAAAAAAEAAQA0AAAAFRwAAAAA --Boundary_(ID_Uqs78No0Dj49zOTKCoTyzA)-- From tim.one@comcast.net Mon Sep 2 03:50:36 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 01 Sep 2002 22:50:36 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: <15730.52784.407584.441515@localhost.localdomain> Message-ID: > Tim> It takes about an hour to run and evaluate tests for one change. > Tim> If you want to motivate me to try, supply a patch against > Tim> timtest.py (in the sandbox), else I've already got far more ideas > Tim> than time to test them properly. Anyone else want to test this > Tim> one? [Skip Montanaro] > Care to identify some of those ideas? Nope, I'm puking sick of this topic now. Look for XXX comments in timtest.py for some of them. You can infer others from places where XXX comments aren't . The f-p rate can't be improved anymore (meaning that it's too low for me to measure an improvement if one were made). The f-n rate is still high, but adding more headers is likely the most effective way to cut f-n, and my testing corpora won't allow me to test that (the header lines are too damned different since my ham and spam came from entirely different sources). It's somebody else's turn now ... and thank Barry for the email pkg! It's been a joy to use. From oren-py-d@hishome.net Mon Sep 2 05:22:05 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 2 Sep 2002 00:22:05 -0400 Subject: [Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: References: Message-ID: <20020902042205.GA29553@hishome.net> Nice work! 
Some other threads you may want to include in your summary : The 'str' in 'string' feature: http://mail.python.org/pipermail/python-dev/2002-August/027354.html PEP 237 deprecation warnings and hex constants: http://mail.python.org/pipermail/python-dev/2002-August/027783.html PEP 277 - unicode filenames http://mail.python.org/pipermail/python-dev/2002-August/027651.html Oren From tim.one@comcast.net Mon Sep 2 07:54:35 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 02 Sep 2002 02:54:35 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: Message-ID: [Delaney, Timothy] > On second thought - if a word-pair appears, then the separate parts should > not be checked as separate words. > > So, If I had scores: > > 'free' 0.1 > 'beer' 0.1 > ('want', 'free',) 0.9 > ('free', 'beer',) 0.01 > ('free', '!!!',) 0.99 > > then the following phrases would match (case-folding) as: > > 'I want free beer!!!': > > ('want', 'free',) 0.9 > ('free', 'beer',) 0.01 > > 'Get *** for free!!!' > > ('free', '!!!',) 0.99 > > 'I want free beer. Free the beer!!!' > > ('want', 'free',) 0.9 > ('free', 'beer',) 0.01 > 'free' 0.1 > 'beer' 0.1 > > Damn I wish I was at home to try this out ... :( I'm going to say a lot of stuff here, and then shut up . I want to move on to other things, but there's an opportunity to pass on some darned good advice for those who can hear. Combining pairs of words is called "word bigrams". My intuition at the start was that it would do better. OTOH, my intuition also was that character n-grams for a relatively large n would do better still. The latter may be so for "foreign" languages, but for this particular task using Graham's scheme on the c.l.py tests, turns out they sucked. A comment block in timtest.py explains why. I didn't try word bigrams because the f-p rate is already supernaturally low, so there doesn't seem anything left to be gained there. 
This echoes what Graham sez on his web page: One idea that I haven't tried yet is to filter based on word pairs, or even triples, rather than individual words. This should yield a much sharper estimate of the probability. My comment with benefit of hindsight: it doesn't. Because the scoring scheme throws away everything except about a dozen extremes, the "probabilities" that come out are almost always very near 0 or very near 1; only very short or (or especially "and") very bland msgs come out in between. This outcome is largely independent of the tokenization scheme -- the scoring scheme forces it, provided only that the tokenization scheme produces stuff *some* of which *does* vary in frequency between spam and ham. For example, in my current database, the word "offers" has a probability of .96. If you based the probabilities on word pairs, you'd end up with "special offers" and "valuable offers" having probabilities of .99 and, say, "approach offers" (as in "this approach offers") having a probability of .1 or less. The theory is indeed appealing . The reason I haven't done this is that filtering based on individual words already works so well. Which is also the reason I didn't pursue it. But it does mean that there is room to tighten the filters if spam gets harder to detect. I expect it would also need a different scoring scheme then. OK, I ran a full test using word bigrams. It gets one strike against it at the start because the database size grows by a factor between 2 and 3. That's only justified if the results are better. 
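For readers who want to see what switching to word bigrams means at the token level, here is a small sketch. It assumes the words have already been split and case-folded, and it borrows the artificial 'BOM' start token described further down in this message; the function name `word_bigrams` is made up and is not the code in timtest.py.

```python
def word_bigrams(words, start_marker='BOM'):
    """Yield space-joined pairs of adjacent words.

    The first word is paired with an artificial start marker so that it
    has something to pair up with.
    """
    prev = start_marker
    for word in words:
        yield prev + ' ' + word
        prev = word


# Toy illustration (not a measurement) of why the database grows: the same
# text usually yields more distinct bigrams than distinct single words.
words = "free the beer and free the speech".split()
distinct_words = set(words)
distinct_bigrams = set(word_bigrams(words))
```

In this toy sentence there are 5 distinct words and 6 distinct bigrams; on a real corpus the gap is wider, which is consistent with the growth factor reported above, though this sketch does not reproduce it.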
Before-and-after f-p (false positive) percentages:

before  bigrams
 0.000    0.025
 0.000    0.025
 0.050    0.050
 0.000    0.025
 0.025    0.050
 0.025    0.100
 0.050    0.075
 0.025    0.025
 0.025    0.050
 0.000    0.025
 0.075    0.050
 0.050    0.000
 0.025    0.050
 0.000    0.025
 0.050    0.075
 0.025    0.025
 0.025    0.025
 0.000    0.000
 0.025    0.050
 0.050    0.025

Lost on 12 runs
Tied on  5 runs
Won  on  3 runs

total # of unique fps across all runs rose from 8 to 17

The f-n percentages on the same runs:

before  bigrams
 1.236    1.091
 1.164    1.091
 1.454    1.708
 1.599    1.563
 1.527    1.491
 1.236    1.127
 1.163    1.345
 1.309    1.309
 1.891    1.927
 1.418    1.382
 1.745    1.927
 1.708    1.963
 1.491    1.782
 0.836    0.800
 1.091    1.127
 1.309    1.309
 1.491    1.709
 1.127    1.018
 1.309    1.018
 1.636    1.672

Lost on 9 runs
Tied on 2 runs
Won  on 9 runs

total # of unique fns across all runs rose from 336 to 350

This doesn't need deep analysis: it costs more, and on the face of it either doesn't help, or helps so little it's not worth the cost.

Now I'll tell you in confidence that the way to make a scheme like this excellent is to keep your ego out of it and let the data *tell* you what works: getting the best test setup you can is the most important thing you can possibly do, which must include multiple training and test corpora (e.g., if I had used only one pair, I would have had a 3/20 chance of erroneously concluding that bigrams might help the f-p rate, when running across 20 pairs shows that they almost certainly do it harm; while I would have had an even chance of drawing a wrong conclusion -- in either direction -- about the effect on the f-n rate).

The second most important thing is to run a fat test all the way to the end before concluding anything.
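The lost/tied/won counts above can be recomputed directly from the paired columns. This is only a bookkeeping sketch; `tally`, `FP` and `FN` are illustrative names, and a run counts as "lost" when the bigram error rate is higher than the baseline.

```python
# (before %, bigrams %) per run, copied from the two tables above.
FP = [
    (0.000, 0.025), (0.000, 0.025), (0.050, 0.050), (0.000, 0.025),
    (0.025, 0.050), (0.025, 0.100), (0.050, 0.075), (0.025, 0.025),
    (0.025, 0.050), (0.000, 0.025), (0.075, 0.050), (0.050, 0.000),
    (0.025, 0.050), (0.000, 0.025), (0.050, 0.075), (0.025, 0.025),
    (0.025, 0.025), (0.000, 0.000), (0.025, 0.050), (0.050, 0.025),
]
FN = [
    (1.236, 1.091), (1.164, 1.091), (1.454, 1.708), (1.599, 1.563),
    (1.527, 1.491), (1.236, 1.127), (1.163, 1.345), (1.309, 1.309),
    (1.891, 1.927), (1.418, 1.382), (1.745, 1.927), (1.708, 1.963),
    (1.491, 1.782), (0.836, 0.800), (1.091, 1.127), (1.309, 1.309),
    (1.491, 1.709), (1.127, 1.018), (1.309, 1.018), (1.636, 1.672),
]


def tally(pairs):
    """Return (lost, tied, won) counts for the new scheme across all runs."""
    lost = sum(1 for before, after in pairs if after > before)
    tied = sum(1 for before, after in pairs if after == before)
    won = sum(1 for before, after in pairs if after < before)
    return lost, tied, won


# Chance that a single randomly chosen run would have shown bigrams
# improving the f-p rate.
fp_lost, fp_tied, fp_won = tally(FP)
chance_misleading_fp = fp_won / len(FP)
```

`tally(FP)` gives 12 lost, 5 tied, 3 won and `tally(FN)` gives 9 lost, 2 tied, 9 won, which is where the 3/20 figure in the paragraph above comes from.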
A subtler point is that you should never keep a change that doesn't *prove* itself a winner: neutral changes bloat your code with proven irrelevancies that will come back to make your life harder later, in part because they'll randomly interfere with future changes in ways that make it harder to recognize a significant change when you stumble into one.

Most things you try won't help -- indeed, many of them will deliver worse results. I dare say my intuition for this kind of classification task is better than most programmers' (in part because I had years of professional experience in a related field), and most of the things I tried I had to throw away. BFD -- then you try something else. When I find something that works I can rationalize it, but when I try something that doesn't, no amount of argument can change that the data said it sucked.

Two things about *this* task have fooled me repeatedly:

1. The "only look at smoking guns" nature of the scoring step makes many kinds of "on average" intuitions worthless: "on average" almost everything is thrown away! For example, you're not going to find bad results reported for n-grams (neither character- nor word-based) in the literature, because most scoring schemes throw much less away. Graham's scheme strikes me as brilliant in this specific respect: it's worth enduring the ego humiliation to get such a spectacularly low f-p rate from such simple and fast code.

2. Most mailing-list messages are much shorter than this one. This systematically frustrates "well, averaged over enough words" intuitions too.

Cute: In particular, word bigrams systematically hate conference announcements. The current word one-gram scheme hated them too, until I started folding case. Then their SCREAMING stopped acting against them. But they're still using the language of advertisement, and word bigrams can't help but notice that more strongly than individual words do.
Here from the TOOLS Europe '99 announcement:

prob('more information') = 0.916003
prob('web site') = 0.895518
prob('please write') = 0.99
prob('you wish') = 0.984494
prob('our web') = 0.985578
prob('visit our') = 0.99

Here from the XP2001 - FINAL CALL FOR PAPERS:

prob('web site:') = 0.926174
prob('receive this') = 0.945813
prob('you receive') = 0.987542
prob('most exciting') = 0.99
prob('alberta, canada') = 0.99
prob('e-mail to:') = 0.99

Here from the XP2002 - CALL FOR PRACTITIONER'S REPORTS ('BOM' is an artificial token I made up for "beginning of message", to give something for the first word in the message to pair up with):

prob('web site:') = 0.926174
prob('this announcement') = 0.94359
prob('receive this') = 0.945813
prob('forward this') = 0.99
prob('e-mail to:') = 0.99
prob('BOM *****') = 0.99
prob('you receive') = 0.987542

Here from the TOOLS Europe 2000 announcement:

prob('visit the') = 0.96
prob('you receive') = 0.967805
prob('accept our') = 0.99
prob('our apologies') = 0.99
prob('quality and') = 0.99
prob('receive more') = 0.99
prob('asia and') = 0.99

A vanilla f-p showing where bigrams can hurt was a short msg about setting up a Python user's group. Bigrams gave it large penalties for phrases like "fully functional" (most often seen in spams for bootleg software, but here applied to the proposed user group's web site -- and "web site" is also a strong spam indicator!). OTOH, the poster also said "Aahz rocks". As a bigram, that neither helped nor hurt (that 2-word phrase is unique in the corpus); but as an individual word, "Aahz" is a strong non-spam indicator on c.l.py (and will probably remain so until he starts spamming).
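To see why a handful of clues like the ones listed above is enough to push a message's score to nearly 1, here is a sketch of the combining rule from Paul Graham's essay as I understand it: keep only the clues furthest from 0.5 and combine them as if they were independent. This is not the code in GBayes.py; the name `graham_score` and the default cutoff of 15 clues are assumptions made for illustration (the message above only says "about a dozen extremes").

```python
def graham_score(probs, max_clues=15):
    """Combine per-token spam probabilities, Graham style.

    Assumes every probability lies strictly between 0 and 1 (the values
    listed above are capped at 0.99), so the denominator cannot be zero.
    """
    # "Only look at smoking guns": keep the clues furthest from neutral 0.5.
    extremes = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)
    extremes = extremes[:max_clues]

    spamminess = 1.0
    hamminess = 1.0
    for p in extremes:
        spamminess *= p
        hamminess *= 1.0 - p
    return spamminess / (spamminess + hamminess)


# The six bigram clues quoted from the TOOLS Europe '99 announcement.
tools_europe_99 = [0.916003, 0.895518, 0.99, 0.984494, 0.985578, 0.99]
score = graham_score(tools_europe_99)
```

With those six clues the combined score is well above 0.999, even though no ham-leaning clue was given a chance to vote. Two equally strong clues pointing in opposite directions, such as 0.9 and 0.1, cancel out to about 0.5, which is why only very short or very bland messages land in the middle.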
It did find one spam hiding in a ham corpus: """ NNTP-Posting-Host: 212.64.45.236 Newsgroups: comp.lang.python,comp.lang.rexx Date: Thu, 21 Oct 1999 10:18:52 -0700 Message-ID: <67821AB23987D311ADB100A0241979E5396955@news.ykm.com> From: znblrn@hetronet.com Subject: Rudolph The Rednose Hooters Here Lines: 4 Path: news!uunet!ffx.uu.net!newsfeed.fast.net!howland.erols.net!newsfeed.cwix.com! news.cfw.com!paxfeed.eni.net!DAIPUB.DataAssociatesInc..com Xref: news comp.lang.python:74468 comp.lang.rexx:31946 To: python-list@python.org THis IS it: The site where they talk about when you are 50 years old. http://huizen.dds.nl/~jansen20 """ there's-no-substitute-for-experiment-except-drugs-ly y'rs - tim From tdelaney@avaya.com Mon Sep 2 08:43:06 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Mon, 2 Sep 2002 17:43:06 +1000 Subject: [Python-Dev] The first trustworthy GBayes results Message-ID: > From: Tim Peters [mailto:tim.one@comcast.net] > > I'm going to say a lot of stuff here, and then shut up > . I want to > move on to other things, but there's an opportunity to pass > on some darned > good advice for those who can hear. Pretty darned good advice too ... but you won't object if I waste some time playing with this stuff anyway I hope. Only one way to accumulate experience after all ;) Personally, I considered that you were already well past the point of diminishing returns, and anything further was of academic interest to those who felt a desire to tinker ... (i.e. the hard work has been done, and everything else is just fun and games :) If enough people (or just one dedicated person) waste enough time, who knows what may come out. Hey - it worked for timsort didn't it ...? ;) Tim Delaney From mal@lemburg.com Mon Sep 2 09:02:27 2002 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 02 Sep 2002 10:02:27 +0200 Subject: [Python-Dev] To commit or not to commit References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D731B13.9090909@lemburg.com> Martin v. Loewis wrote: > Guido van Rossum writes: > > >>>Any objections against committing the patch? >> >>What do MvL and MAL say? > > Because of the size, I'm sure there are still bugs in it. I couldn't > spot any by inspection, so I think the patch is ready to be installed. +1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From tim.one@comcast.net Mon Sep 2 09:09:54 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 02 Sep 2002 04:09:54 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: Message-ID: [Delaney, Timothy] > Pretty darned good advice too ... but you won't object if I waste > some time playing with this stuff anyway I hope. Only one way to accumulate > experience after all ;) Not at all! Knock yourself out -- it's really a lot of fun, except when it gets so tedious you start punching the wall just to watch your knuckles bleed . > Personally, I considered that you were already well past the point of > diminishing returns, Not yet -- false positives are a horrible thing, and the false negative rate still lets a lot of spam through. Cutting the f-n rate, e.g., in half, would mean half as much spam to deal with; generalization left to the reader. > and anything further was of academic interest to those who felt a desire to > tinker ... 
The best hope for reducing f-n lies in exploiting more header lines than I can test with my mixed corpora, and there's *tons* of room for improvement there (note that the f-n rate is more than 20x greater than the f-p rate now). Anyone who wants to tackle that with tedious experiment should first pick Neil Schemenauer's brain: he had a good start on that early last week. > (i.e. the hard work has been done, and everything else is just fun and > games :) If enough people (or just one dedicated person) waste enough time, > who knows what may come out. Hey - it worked for timsort didn't it ...? ;) Indeed so, and it works for this too -- never underestimate the power of working yourself sick. If you also *write* about it, you can make everyone else ill too by proxy . sharing-the-pain-ly y'rs - tim From walter@livinglogic.de Mon Sep 2 12:21:22 2002 From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=) Date: Mon, 02 Sep 2002 13:21:22 +0200 Subject: [Python-Dev] PyString_DecodeEscape and PEP293 References: <3D60EA3B.7030008@livinglogic.de> Message-ID: <3D7349B2.8010706@livinglogic.de> Martin v. Loewis wrote: > Walter Dörwald writes: > > >>A recent checkin added a function PyString_DecodeEscape() >>to stringobject.c. To make this function PEP293 compatible >>it would need access to unicode_decode_call_errorhandler >>which is defined static in unicodeobject.c. Does >>PyString_DecodeEscape() really need an errors argument? > > > What do you mean, "really need"? The callers of this function pass the > argument, in particular escape_decode. Is that "real"? So does escape_decode need an errors argument. AFAICT escape_decode is used only in the context of reading pickles. Will there ever be a need to call escape_decode with anything other than errors="strict"? >>If yes, we could either move it to unicodeobject.c > > > No. It has to do little with Unicode. > > >>or make unicode_decode_call_errorhandler externally visible. > > > I don't know this function. 
It's a static function in unicodeobject.c in the PEP293 patch that does the complete error handling for decoding.

> What does this have to do with Unicode?

I expected that all codecs do unicode<->8bit coding/decoding; "string-escape" seems to be an exception.

>>Another problem that I noticed is that string-escape can't
>>be used for encoding Unicode objects:
>
>
> That is a feature. string-escape has nothing to do with Unicode.

So it doesn't need the new PEP293 error handling?

Bye,
Walter Dörwald

From walter@livinglogic.de Mon Sep 2 12:22:25 2002
From: walter@livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Mon, 02 Sep 2002 13:22:25 +0200
Subject: [Python-Dev] To commit or not to commit
References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> <3D731B13.9090909@lemburg.com>
Message-ID: <3D7349F1.4090100@livinglogic.de>

M.-A. Lemburg wrote:
> Martin v. Loewis wrote:
>
>> Guido van Rossum writes:
>>
>>>> Any objections against committing the patch?
>>>
>>> What do MvL and MAL say?
>>
>> Because of the size, I'm sure there are still bugs in it. I couldn't
>> spot any by inspection, so I think the patch is ready to be installed.
>
> +1.

OK, I'll check it in then.

Bye,
Walter Dörwald

From pinard@iro.umontreal.ca Mon Sep 2 13:02:55 2002
From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard)
Date: Mon, 02 Sep 2002 08:02:55 -0400
Subject: [Python-Dev] Re: The first trustworthy GBayes results
In-Reply-To: (Tim Peters's message of "Mon, 02 Sep 2002 02:54:35 -0400")
References:
Message-ID:

[Tim Peters]

[... extremely good work and stuff and comments, for a good while now ...]

Hi, Tim. I read your messages, witnessing your work and progress in that area, with great interest, and also saved them for later contemplation!
:-) Spam always annoyed me, as most of us, and despite many efforts I did, it is increasingly successful at traversing my filters -- so this idea of Graham or Bayesian filters is timely and welcome. Most previous filters I observed are based on various (random) tests or events (you surely know all this), and `procmail'-based filters, or even the popular SpamAssassin, are either very slow or at least slow. The tool I use since 1998 is much faster, especially after I rewrote it in Python!, it is also based on various tests or events. Your works concentrated on tuning the statistical formulas and lexical analysis, and building operational data from preset corpora. I'm sure all the knowledge gleaned there will make its way everywhere, and reach me. For a tiny share, I decided to experiment with day-to-day user aspects of using such a filter, and built a Gnus interface over Eric Raymond's Bogofilter. There are two functions to this program, one is about learning from messages known to be ham or spam, the other is about classification of incoming messages. By the way, if there are Gnus users among you, just ask me for the recipe... It goes pretty well for me, so far. The principle, put forward by Paul Graham, is to let the user have two delete commands: delete-as-ham or delete-as-spam. Eric pushed this idea a bit further by postponing learning until the user quits the mail reader, `mutt' in his case. As Gnus allows me to have many mailgroups and folders and shuffle between them, I postpone learning until the user switches mailgroups or quit, and only for the _final_ disposition of a message: that is, when a message is merely saved into another folder, the decision will be taken when leaving that other folder, and not the current one. Messages marked as "saved" are _not_ sent, so to avoid double learning. The fact is that ham messages are more likely to be postponed than spam, because ham is more often filed here and there. 
Even if many or most ham messages are deleted, this introduces a short term bias in the learning statistics by which the percentage of spam seems to be higher (in my case, 1157 messages have been learned in about three days, 20% of which were spam), but this percentage will later be lowered as filed messages get reprocessed. Another effect is that the delay itself in ham learning may have a slight effect on classification, but since both ham and spam are well represented, the effect is likely negligible. Tim's corpora are surely very clean, at least by now, while day-to-day learning may yield slightly tainted learning. In my case, when a thread does not interest me, I often kill all articles it contains in one command, without opening each of them to see if it would not be spam: the threading itself makes it unlikely. But nevertheless possible, you surely noticed that bad guys now fetch and re-use already published subjects as a way to get through. That means that if big corpora are thinkable in case of mailing lists having existed for a while, those are probably not very usable for individual users. GBayes, Bogofilter and others should ideally resist some amount of ham-tainted-as-spam or spam-tainted-as-ham at learning time. After adding Graham filtering as a supplementary method to my spam detection tool, I gladly observe that it successfully detects many spam messages which would otherwise fall in the cracks, so it really brings something to me. But I also see many spam cases (are they?) it does not detect and that it would hardly: one simple example is that _for me_, invalidly structured MIME is indicative of an un-interesting message, as interesting people know better! One particular problem I observed are Tim's messages themselves, which are undoubtedly very yummy ham messages, but discussing and quoting many spam inside them. Should these be registered as ham or spam? :-) Would not these defeat the learning to some extent? 
Where should Tim add his own messages in the corpora he uses, and what changes would result in `GBayes' effectiveness? -- François Pinard http://www.iro.umontreal.ca/~pinard From guido@python.org Mon Sep 2 15:01:45 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 02 Sep 2002 10:01:45 -0400 Subject: [Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: Your message of "Sun, 01 Sep 2002 21:29:09 CDT." <15730.52469.604124.730029@localhost.localdomain> References: <15730.52469.604124.730029@localhost.localdomain> Message-ID: <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> > Looks good to me. The only trivial nit I would like to raise is that any > URLs you embed in the text be true URLs. I'd also prefer they be encased in > <...>, but that's slightly less important and generally only matters when > URLs are immediately followed by punctuation. So, instead of > > Brett> Guido said he will revive the PEP. The patch has since been put > Brett> on SF at python.org/sf/599331 . > > you'd have > > Brett> Guido said he will revive the PEP. The patch has since been put > Brett> on SF at . > > The two changes make it much more likely that email readers will be able to > successfully highlight such URLs correctly. I think adding http:// alone should be sufficient. Despite all the official recommendations, I've always hated the <...> form. However, do keep a space after the URL if punctuation were to follow (which you already did). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Sep 2 15:06:05 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 02 Sep 2002 10:06:05 -0400 Subject: [Python-Dev] To commit or not to commit In-Reply-To: Your message of "Mon, 02 Sep 2002 10:02:27 +0200." 
<3D731B13.9090909@lemburg.com> References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> <3D731B13.9090909@lemburg.com> Message-ID: <200209021406.g82E65b30667@pcp02138704pcs.reston01.va.comcast.net> > >>>Any objections against committing the patch? > >> > >>What do MvL and MAL say? > > > > Because of the size, I'm sure there are still bugs in it. I couldn't > > spot any by inspection, so I think the patch is ready to be installed. > > +1. OK, anchors away then! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From pinard@iro.umontreal.ca Mon Sep 2 16:15:54 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: Mon, 02 Sep 2002 11:15:54 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Mon, 02 Sep 2002 10:01:45 -0400") References: <15730.52469.604124.730029@localhost.localdomain> <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> Message-ID: >> you'd have >> >> Brett> Guido said he will revive the PEP. The patch has since been put >> Brett> on SF at . >> >> The two changes make it much more likely that email readers will be able to >> successfully highlight such URLs correctly. > > I think adding http:// alone should be sufficient. Despite all the > official recommendations, I've always hated the <...> form. Gnus highlights correctly with the `http://', and adds clickability. The `<' and '>' are not needed. I do not know what other mail readers do. To get the same effects with email addresses, I often prefer using `mailto:' as a prefix over writing `<' and `>' around a quoted address in a message body, even if not fully systematic about this. In the message header itself, `<' and '>' are the proper way to go, of course. 
-- François Pinard http://www.iro.umontreal.ca/~pinard From tim.one@comcast.net Mon Sep 2 16:41:00 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 02 Sep 2002 11:41:00 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: Message-ID: I usually add <> to http thingies when I remember to. A couple people yelled at me, claiming their readers couldn't recognize http thingies otherwise. This seems particularly odd, since I almost always put them on their own line: http://www.python.org OTOH, *my* reader doesn't recognize them in the style, neither with nor without <>. From barry@python.org Mon Sep 2 16:48:47 2002 From: barry@python.org (Barry A. Warsaw) Date: Mon, 2 Sep 2002 11:48:47 -0400 Subject: [Python-Dev] To commit or not to commit References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> <3D731B13.9090909@lemburg.com> <3D7349F1.4090100@livinglogic.de> Message-ID: <15731.34911.231999.691324@anthem.wooz.org> >>>>> "WD" == Walter Dörwald writes: WD> OK, I'll check it in then. Does that mean it's time to mark PEP 293 as Final and move it to the Finished PEPs category in PEP 0? -Barry From walter@livinglogic.de Mon Sep 2 17:29:14 2002 From: walter@livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Mon, 02 Sep 2002 18:29:14 +0200 Subject: [Python-Dev] To commit or not to commit References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> <3D731B13.9090909@lemburg.com> <3D7349F1.4090100@livinglogic.de> <15731.34911.231999.691324@anthem.wooz.org> Message-ID: <3D7391DA.6010306@livinglogic.de> Barry A. Warsaw wrote: >>>>>>"WD" == Walter Dörwald writes: >>>>> > > WD> OK, I'll check it in then. > > Does that mean it's time to mark PEP 293 as Final and move it to the > Finished PEPs category in PEP 0? Guido already changed PEP 283, so: yes. 
Only a few cleanup tasks remain (Neals comments, LaTeX documentation for the rest of the C functions). Bye, Walter Dörwald From barry@python.org Mon Sep 2 17:48:47 2002 From: barry@python.org (Barry A. Warsaw) Date: Mon, 2 Sep 2002 12:48:47 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 References: Message-ID: <15731.38511.160332.641594@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> I usually add <> to http thingies when I remember to. A TP> couple people yelled at me, claiming their readers couldn't TP> recognize http thingies otherwise. This seems particularly TP> odd, since I almost always put them on their own line: TP> http://www.python.org As do I. TP> OTOH, *my* reader doesn't recognize them in the TP> TP> style, neither with nor without <>. Mine does too, but it's not the <> that is the distinguishing feature, AFAIK. The <> seem to be most useful for inline urls where trailing punctuation gets incorrectly attached to the url. -Barry From drifty@bigfoot.com Mon Sep 2 18:11:40 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 2 Sep 2002 10:11:40 -0700 (PDT) Subject: [Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > > you'd have > > > > Brett> Guido said he will revive the PEP. The patch has since been put > > Brett> on SF at . > > > > The two changes make it much more likely that email readers will be able to > > successfully highlight such URLs correctly. > > I think adding http:// alone should be sufficient. Despite all the > official recommendations, I've always hated the <...> form. However, > do keep a space after the URL if punctuation were to follow (which you > already did). > I think I will go with adding http:// to all addresses and putting them on their own line. -Brett From martin@v.loewis.de Mon Sep 2 21:31:56 2002 From: martin@v.loewis.de (Martin v. 
Loewis) Date: 02 Sep 2002 22:31:56 +0200 Subject: [Python-Dev] PyString_DecodeEscape and PEP293 In-Reply-To: <3D7349B2.8010706@livinglogic.de> References: <3D60EA3B.7030008@livinglogic.de> <3D7349B2.8010706@livinglogic.de> Message-ID: Walter Dörwald writes: > So does escape_decode need an errors argument. AFAICT > escape_decode is used only in the context of reading pickles. > Will there ever be a need to call escape_decode with anything > other than errors="strict"? It's a codec, so anybody is entitled to write "foo".decode("string-escape", "replace") if they chose to. If you are suggesting that this is not supported is only acceptable if you also suggest how it should fail. Silently ignoring the "replace" argument is not acceptable. > > What does this have to do with Unicode? > > I expected that all codecs to unicode<->8bit coding/decoding > "string-escape" seems to be an exception. That was my original expectation as well. By now, I have accepted things like >>> "foo".encode("base64") 'Zm9v\n' So codecs can do way more things than converting between unicode<->byte strings. Whether it is a good thing that they are that flexible is still open to debate, however, it was convenient for string-escape. > So it doesn't need the new PEP293 error handling? Probably not - just supporting "strict", "replace", "ignore", and failing for any other error handling would be sufficient. If you manage to make it fail for anything but "strict", that would be acceptable as well (IMO). Regards, Martin From tdelaney@avaya.com Tue Sep 3 00:25:19 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Tue, 3 Sep 2002 09:25:19 +1000 Subject: [Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01 Message-ID: > From: Brett Cannon [mailto:bac@OCF.Berkeley.EDU] > > I think I will go with adding http:// to all addresses and > putting them on their own line. May I suggest that this may be a good test document for reStructuredText? 
Especially if it is going to such places as slashdot ... Tim Delaney From skip@pobox.com Tue Sep 3 02:37:48 2002 From: skip@pobox.com (Skip Montanaro) Date: Mon, 2 Sep 2002 20:37:48 -0500 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: References: <15730.52469.604124.730029@localhost.localdomain> <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15732.4716.629172.615326@12-248-11-90.client.attbi.com> >> I think adding http:// alone should be sufficient. Despite all the >> official recommendations, I've always hated the <...> form. François> Gnus highlights correctly with the `http://', and adds François> clickability. The `<' and '>' are not needed. I use VM. It highlights correctly as long as the leading "http://" is there, provided the URL isn't followed immediately by punctuation which can occur in a URL. In Brett's summary he avoids that problem by adding a space between the URL and the ambiguous punctuation. I think that looks odder than the <...> notation. Skip From barry@python.org Tue Sep 3 05:16:32 2002 From: barry@python.org (Barry A. Warsaw) Date: Tue, 3 Sep 2002 00:16:32 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 References: <15730.52469.604124.730029@localhost.localdomain> <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> <15732.4716.629172.615326@12-248-11-90.client.attbi.com> Message-ID: <15732.14240.941982.728027@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: >> I think adding http:// alone should be sufficient. Despite all >> the official recommendations, I've always hated the <...> form. SM> In Brett's summary he avoids that problem by adding a space SM> between the URL and the ambiguous punctuation. I think that SM> looks odder than the <...> notation. I agree (and also use VM :), but putting the url on a separate line looks fine, unless that greatly increases the vertical whitespace. 
-Barry From akim@epita.fr Tue Sep 3 07:26:52 2002 From: akim@epita.fr (Akim Demaille) Date: 03 Sep 2002 08:26:52 +0200 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> <200207301622.g6UGMBl17143@odiug.zope.com> Message-ID: >>>>> "François" == François Pinard writes: François> [Akim Demaille] >> I'm not sure I completely understand the question here: if >> HAVE_CONFIG_H is specified, it means config.h is created. So if >> you use a config.h, why does it matter not to define HAVE_CONFIG_H? François> Hi, Akim. I hope life is still good to you! :-) Hi François! The new (scholar) year is starting now, so life is still good, but I'm a bit afraid of what it might be done in the near future :) François> In the beginnings of Autoconf, the `config.h' file did not François> exist. David MacKenzie added it as a way to reduce the François> `make' output clutter. Nowadays, I suspect almost all François> packages of at least moderate size uses it. Agreed. François> Our traditional `lib/' modules have to work in many François> packages, whether `config.h' has been created or not, this François> being decided on a per package basis, and that is why there François> is a conditional inclusion of `config.h' in each of these François> `lib/' modules. He took a good while before we got François> stabilised on the exact stanza of this inclusion (I François> especially remember the massive unilateral changes by Roland François> McGrath introducing the BROKEN_BROKET define, or something François> like that, and all the doing it later took to clean this François> out.) I understand. 
François> Python (the distribution, which is what is in question here) François> does not use any of our `lib/' things, it is not going to François> use them, and it is not going to provide new such modules, François> so the distribution includes `config.h' everywhere, by François> permanent choice, without any need to use `HAVE_CONFIG_H' to François> decide if that inclusion is needed or not. So, even François> `-DHAVE_CONFIG_H' is useless `make' clutter in this case, François> and that's why the Python packagers wanted to get rid of it. François> In fact, in practice `-DHAVE_CONFIG_H' is only needed for François> packages using those common `lib/' modules, but many François> packages do not. Now that Autoconf is used with projects François> who have a life outside GNU, this is less necessary. Guido François> found, and got me to remember, that `@DEFS@' is the culprit: François> people just do not have to use it in their hand-crafted François> Makefiles, which is the case for Python. For away-from-GNU François> packages using Automake, some Automake option might exist so François> `@DEFS@' does not get generated? The only goal here is to François> get a cleaner `make' output. I understand the goal, but much of the effort is devoted to having the thing work cleanly, not being beautiful. Another goal is to have it being easy to maintain, i.e., not having too much to document, too much to support, too much to test etc. So, although I don't know what the Automake team might think of this idea, I suspect they'll want to focus on other features :( From sholden@holdenweb.com Tue Sep 3 11:52:36 2002 From: sholden@holdenweb.com (Steve Holden) Date: Tue, 3 Sep 2002 06:52:36 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 References: Message-ID: <00cb01c25338$05d87120$6300000a@holdenweb.com> ----- > I usually add <> to http thingies when I remember to. 
A couple people > yelled at me, claiming their readers couldn't recognize http thingies > otherwise. This seems particularly odd, since I almost always put them on > their own line: > > http://www.python.org > > OTOH, *my* reader doesn't recognize them in the > > > > style, neither with nor without <>. > But it nevertheless sends out something that *it* will recognise as a URL. Both your references were correctly represented as hyperlinks in OE when I read your message! regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming pydish.holdenweb.com/pwp/ Previous .sig file retired to www.homeforoldsigs.com ----------------------------------------------------------------------- From greg@python.org Tue Sep 3 14:41:12 2002 From: greg@python.org (Greg Ward) Date: Tue, 3 Sep 2002 09:41:12 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: References: <20020828194248.GA16407@cthulhu.gerg.ca> Message-ID: <20020903134112.GC1227@cthulhu.gerg.ca> [Tim, last week] > What's an acceptable false positive rate? [my response] > Speaking as one of the people who reviews suspected spam for python.org > and rescues false positives, I would say that the more relevant figure > is: how much suspected spam do I have to review every morning? < 10 > messages would be peachy; right now it's around 5-20 messages per day. [Tim again] > I must be missing something. I would *hope* that you review *all* messages > claimed to be spam, in which case the number of msgs to be reviewed would, > in a perfectly accurate system, be equal to the number of spams received. Good lord, certainly not! Remember that Exim rejects a couple hundred messages a day that never get near SpamAssassin -- that's mostly Chinese/Korean junk that's rejected on the basis of 8-bit chars or banned charsets in the headers. 
Then, probably 50-75% of what SA gets its hands on scores >= 10.0, so it too is rejected at SMTP time. Only messages that score < 10 are accepted, and those that score >= 5.0 are set aside in /var/mail/spam for review. That's 10-30 messages/day. (I do occasionally scan Exim's reject log on mail.python.org to see what's getting rejected today -- Exim kindly logs the full headers of every message that is rejected after the DATA command. I usually make it to about 11am of a given day's logfile before my eyes glaze over from the endless stream of spam and viruses.) Note that we *used* to accept messages before passing them to SpamAssassin, so never rejected anything on the basis of its SA score. Back then, we saved and reviewed probably 50-70 messages/day. Very, very, very few (if any) false positives scored >= 10.0, which is why that's the threshold for SMTP-time rejection. > OTOH, the false positive rate doesn't have anything to do with the number of > spams received, it has to do with the number of non-spams received. Err, yeah, good point. I make a point of talking about "suspected spam", which is any message that scores between 5.0 and 10.0. IMHO, the true nature of those messages can only be determined by manual inspection. > Maybe you don't want this kind of approach at all. The classifier doesn't > have "gray areas" in practice: it tends to give probabilites near 1, or > near 0, and there's very little in between -- a msg either has a > preponderance of spam indicators, or a preponderance of non-spam indicators. That's a great improvement over SpamAssassin then: with SA, the grey area (IMHO) is scores from 3 to 10... which is why several python.org lists now have a little bit of Mailman configuration magic that makes MM set aside messages with an SA score >= 3 for list admin review. (It's probably worth getting the list admin to do a bit more work in order to avoid sending low-scoring spam to the list.) 
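[Editorial aside, not part of Greg's message: the score-based routing described above can be sketched in Python as follows. This is only an illustration under stated assumptions. The thresholds 5.0 and 10.0 are the ones quoted in the message; the function name route_by_score and the labels "accept", "review" and "reject" are hypothetical and do not correspond to any real Exim or SpamAssassin setting.]

```python
def route_by_score(score, review_at=5.0, reject_at=10.0):
    """Route a message by its spam score (hypothetical helper).

    Assumed policy, taken from the thresholds quoted in the message:
      score >= reject_at             -> refuse at SMTP time
      review_at <= score < reject_at -> set aside for manual review
      score < review_at              -> deliver normally
    """
    if score >= reject_at:
        return "reject"
    if score >= review_at:
        return "review"
    return "accept"


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    for s in (1.2, 5.0, 7.5, 10.0, 23.4):
        print(s, route_by_score(s))
```

[A stricter list policy, like the Mailman one mentioned above that holds anything scoring 3 or more, would correspond to calling route_by_score(score, review_at=3.0) in this sketch.]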
However, as long as "very little" != "nothing", we still need to worry a bit about that grey area. What do you think we should do with a message whose spam probability is between (say) 0.1 and 0.9? Send it on, reject it, or set it aside? Just how many messages fall in that grey area anyways? Greg -- Greg Ward http://www.gerg.ca/ MTV -- get off the air! -- Dead Kennedys From mcherm@destiny.com Tue Sep 3 15:11:45 2002 From: mcherm@destiny.com (Michael Chermside) Date: Tue, 03 Sep 2002 10:11:45 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib Message-ID: <3D74C321.7070103@destiny.com> >> Hmm, I intended to have s1.refresh() return a new object for use in >> s2 while leaving s1 alone (being immutable and all). Now, I wonder >> if that was the right thing to do. The answer lies in use cases for >> algorithms that need sets of sets. If anyone knows off the top of >> their head that would be great; otherwise, I seem to remember that >> some of that business was found in compiler algorithms and graph >> packages. > > Let's call YAGNI on this one. > Furthermore, what if I create a BIG set like this: s = ImmutableSet( range(2**x) ) Now, not only do I use lots of memory for s, I ALSO keep around lots of memory to preserve a temporary list which I never wanted to keep anyhow! -- Michael Chermside From walter@livinglogic.de Tue Sep 3 17:05:21 2002 From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=) Date: Tue, 03 Sep 2002 18:05:21 +0200 Subject: [Python-Dev] PyString_DecodeEscape and PEP293 References: <3D60EA3B.7030008@livinglogic.de> <3D7349B2.8010706@livinglogic.de> Message-ID: <3D74DDC1.7040609@livinglogic.de> Martin v. Loewis wrote: > Walter Dörwald writes: > > >>So does escape_decode need an errors argument. AFAICT >>escape_decode is used only in the context of reading pickles. >>Will there ever be a need to call escape_decode with anything >>other than errors="strict"? 
> > > It's a codec, so anybody is entitled to write > > "foo".decode("string-escape", "replace") > > if they chose to. If you are suggesting that this is not supported is > only acceptable if you also suggest how it should fail. Silently > ignoring the "replace" argument is not acceptable. I won't suggest that. Let's keep PyString_DecodeEscape as it is now. It should not be a problem for encoding, because encoding can't fail, so there is no need for using "xmlcharrefreplace" etc. as the error handling. Decoding can fail, but lets add custom error handling only when the need for it arises (which hopefully won't). > [...] >>So it doesn't need the new PEP293 error handling? > > Probably not - just supporting "strict", "replace", "ignore", and > failing for any other error handling would be sufficient. If you > manage to make it fail for anything but "strict", that would be > acceptable as well (IMO). OK, lets keep PyString_DecodeEscape as it is now (i.e. "strict", "ignore", "replace" implemented inline with no custom error handling). Bye, Walter Dörwald From tim.one@comcast.net Tue Sep 3 17:27:57 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 03 Sep 2002 12:27:57 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: <00cb01c25338$05d87120$6300000a@holdenweb.com> Message-ID: [Tim] > OTOH, *my* reader doesn't recognize them in the > > > > style, neither with nor without <>. [Steve Holden] > But it nevertheless sends out something that *it* will recognise as a URL. I think you're assuming I use Outlook Express. I don't; "my reader" is usually Outlook 2000. > Both your references were correctly represented as hyperlinks in OE when I > read your message! Yes, OE and Outlook differ in this repsect. 
From guido@python.org Tue Sep 3 17:53:45 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 03 Sep 2002 12:53:45 -0400 Subject: [Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: Your message of "Sun, 01 Sep 2002 15:57:53 PDT." References: Message-ID: <200209031653.g83GrjQ01929@odiug.zope.com> > Yes, with Michael's permission, I am attempting to start up the Python-dev > summaries again. Below is my attempt at summarizing the last half of > August. It's longer then normal summaries, but that is because I bothered > to include discussions on threads that were not directly relating to the > Python core but are interesting nonetheless (e.g., the whole spambayes > thread). > > I am posting to Python-dev first before posting to c.l.py, c.l.py.a (also > lwn.net and probably Slashdot) because I want to get the general okay from > the list that I have done a good enough of a job to send this out; I don't > want to have a summary that represents the going-ons here without the > general populace (or just the BDFL since he can overrule =) being okay > with it. I am also curious as to whether I should go into more or less > detail, leave out the summaries that do not directly pertain to the Python > core, etc. > > So please read the summary and let me know if you are okay with it. If so > I will try to do semi-monthly summaries from now on. Oh, and I am on > vacation right now and will be doing a lot of travelling in the next two > months, so I can't guarantee summaries will be this quick to come out for > a while. I will do them, though, even if they are a week late. =) > > Oh, and if I do get the okay to do this, expect a lot of dumb questions > from me in the future in terms of clarifying things. Just remember, it is > for the good of the Python community. =) Thanks, Brett. Minor comments ahead; but basically, go ahead -- don't let striving for perfection keep you from posting something good! 
> > ======================================= > > > This is a summary of traffic on the python-dev mailing list between August > 16, 2002 and September 1, 2002 (exclusive). It is intended to inform the > wider Python community of ongoing developments. To comment, just post to > python-list@python.org or comp.lang.python in the usual way. Give your > posting a meaningful subject line, and if it's about a PEP, include the > PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev > members are interested in seeing ideas discussed by the community, so > don't hesitate to take a stance on a PEP if you have an opinion. > > This is the first summary written by Brett Cannon. > Summaries are archived no where at the moment. =) They will be, though, > so stay tuned for the URL in future summaries. > > > > Posting distribution (with apologies to mbm, but thanks to mwh for the > code) > > Number of articles in summary: 585 > > 80 | [|] > | [|] > | [|] > | [|] > | [|] [|] > 60 | [|] [|] [|] > | [|] [|] [|] > | [|] [|] [|] > | [|] [|] [|] > | [|] [|] [|] [|] > 40 | [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] > 20 | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] > | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] > 0 +-071-025-012-042-063-084-030-021-039-009-047-027-033-041-036-005 > Fri 16| Sun 18| Tue 20| Thu 22| Sat 24| Mon 26| Wed 28| Fri 30| > Sat 17 Mon 19 Wed 21 Fri 23 Sun 25 Tue 27 Thu 29 Sat 31 I'm not sure I care about this diagram. It's also kind of hard to read. I would mind less if it was at the end of the summary. 
> > > ================ > Type Categories > ================ > This VERY long thread was sparked by Andrew Koenig asking if a discussion > of making type categories more explicit had ever occured (Andrew meant for > category to mean "the set of all types that implement a particular marker > interface"). As Andrew later pointed out, he was asking about "a way of > making notions such as 'file-like object' more formal and/or automatic". > The discussion quickly started using the term interface to mean defining a > way to specify that an object implemented certain methods (think of it in > terms of Java's 'implements' mechanism). Once that was out of the way, > the discussion took off. Zope's implementation was pointed out > (http://cvs.zope.org/Zope3/lib/python/Interface/) very quickly. PEP 245 > (Python Interface Syntax) was also brought to the attention of the list. > The idea of using inheritance to handle interfaces was brought up. Guido > said that he hasn't "given up the hope that inheritance and interfaces > could use the same mechanisms. But Jim Fulton, based on years of > experience in Zope, claims they really should be different" in terms of > how interfaces should be handled in objects. Jeremy Hylton tried to > channel Jim's opinion by pointing out that "We'd like to use interfaces to > make fairly strong claims. If a class A implements an interface I, then > we should be able to use an instance of A anywhere that an I is needed." > But "the inheritance mechanism is too general" because if a class A > implements interface I and then a class B, which does not implement I, > subclasses class A we end up with a class B that claims it has a certain > interface which it doesn't actually have. Guido understood the point, but > still thought inheritence could be used "if there was a way to "shut off" > inheritance as far as isinstance() (or issubclass())" is concerned. Guido > asked the simple question, "Why do keep arguing for inheritance? 
> (a) the
> need to deny inheritance from an interface, while essential, is relatively
> rare IMO, and in *most* cases the inheritance rules work just fine; (b)
> having two separate but similar mechanisms makes the language larger."
> Samuele Pedroni asked that any implementation "allow also for referring to
> anonymous super-interfaces of an interface in terms of the interface plus
> a subset of its signatures, also e.g. FileLike and just 'write'. [that
> means an interface can be thought to correspond to a set of
> (tag,signature) tuples, where tag identifies the interface, and one can
> also just consider subsets of it]". The thread finally seems to have
> stopped (for now) with Guido saying he is mulling the whole thing in the
> back of his head. This is a very sticky topic because of the number of
> design decisions required and how it might change the way people program
> in Python.

Please break up that paragraph into pieces shorter than 12 lines
each. :-)

> There was also a partial sub-thread in this whole discussion about
> multimethods; basically a way to do overloading of methods based on
> parameter signature. Most of the discussion was over syntax and
> how to handle resolution order. It then seemed to fall by the wayside when
> the main part of the thread took over again.
>
> ==============================
> type categories -- an example
> ==============================
> This thread was started when Andrew Koenig said that the reason he
> brought up his type category question was because he wanted a way
> to identify members of a type easily. He now had an example in a
> program he was writing where the type of the argument varied, and
> thus what needed to be done to the data changed accordingly. Jeremy
> Hylton suggested the isinstance(obj, type(re.compile(''))) idiom. Andrew
> asked if this was guaranteed to work, to which Jeremy said no.
> I asked why
> this was not guaranteed, and Fredrik Lundh said because re.compile() is
> a factory function and it is possible that a future version could return a
> different object based on the pattern.
>
> ===============================================
> Python build trouble with the new gcc/binutils
> ===============================================
> Andrew Koenig said that he couldn't compile Python using the newest gcc
> (this was the day after the latest release hit servers). With help from
> Zack Weinberg of Code Sourcery (who also recently rewrote the tempfile
> module), the problem was tracked down to binutils 2.13; it
> was not Python's fault.
>
> ===================================
> Last call: mortal interned strings
> ===================================
> The patch python.org/sf/576101 removes the default immortality of interned
> strings. I believe it was in early August (possibly spilled over from
> late July) when Oren Tirosh proposed the idea and wrote the above
> mentioned patch. There had been some discussion over whether any 3rd
> party code was reliant upon interned strings being immortal; none was
> found (MacPython was reliant upon it, but since it is under Python core
> control it was considered a moot point since it could be changed). It has
> been checked in. With the patch the way to make a string immortal is to
> call PyString_InternImmortal(); no code in the core uses this function.
>
> =====================================
> PEP 218 (sets); moving set.py to Lib
> =====================================
> Thanks to Greg Wilson (for writing the PEP), Alex Martelli (for writing
> the module initially), and Guido (for refactoring Alex's code) the stdlib

You might add Raymond Hettinger who wrote the docs and did significant
work on the code after me. Also Tim Peters who added some good
speedups.

> has now gained a sets module.
> It has both the notion of mutable and
> immutable sets (the latter used when you have a set of sets). There was
> discussion about how sets should print (sorted or not; unsorted is default
> but option is there to print sorted)

This option is no longer documented though. It may yet disappear.

> and what operators should be
> overloaded for working on sets (| and & were chosen). The module is a
> beautiful chunk of code and I highly recommend reading its source.

Thanks.

> ===========================================
> A few lessons from the tempfile.py rewrite
> ===========================================
> Zack Weinberg, after rewriting the tempfile module, brought up three
> points:
> 1) Lack of dummy threads, 2) lack of a pthread_once equivalent, and 3)
> lack of a way to skip tests from unittest.py via some built-in method.
> Guido responded accordingly: 1) since some code uses the idiom of trying
> to import thread and catching the exception if it fails, Guido said he
> would be willing to accept a dummy_thread.py that would allow:
>
>     try:
>         import thread as _thread
>     except ImportError:
>         import dummy_thread as _thread
>
> to work. No word on whether this is being written at the moment. 2)
> Guido said the method was, in his opinion, overkill. He said to "be
> Pythonic, live dangerously, accept the risk that a ^C can screw you. It
> can anyway. :-)". And as for 3) Guido referred Zack to the PyUnit list
> and Steve Purcell since Python just tracks Steve's code (pyunit.sf.net).
> Guido's suggestion was to stick code that was reliant on some other code
> in a separate testing suite that is only run when the code it relies on is
> available.
>
> ===========================
> Standard datetime objects?
> ===========================
> Kevin Jacobs asked what stage the new datetime object was at. Guido said
> it is in python/nondist/sandbox/datetime/ in CVS which also has comments
> pointing to a wiki containing the current work on it. Fred L. Drake, Jr.
> is working on the C re-implementation and Guido expects a checkin at any
> moment (hasn't happened as of this writing).

Has now, in the sandbox (more to come).

> ===================
> PEP 269 versus 283
> ===================
> Jonathan Riehl noticed that PEP 283 said PEP 269 was dead; not good
> considering he was close to having a patch for PEP 269 (pgen module to
> interface with the C version). Guido said he will revive the PEP. The
> patch has since been put on SF at python.org/sf/599331 .
>
> ==============================
> What is a backport candidate?
> ==============================
> Since Python 2.2 is going to be around for a long time, the question was
> brought up of what constitutes code that should be backported. Guido made
> the following three points:
>
> 1) code trivial to backport should always be backported
>
> 2) code patching 2.3 code should obviously not be backported
>
> 3) the 2.2 code requires changes to use the patch, but the patch applies;
> gradients of this exist.
>
> So please, when submitting patches, mention whether you think the patch
> should be backported to the 2.2 tree and any possible dependencies it
> might have in a backport.
>
> =================================
> python/nondist/sandbox/spambayes
> =================================
> In response to Paul Graham's spam filter written using Bayes' Rule
> (Slashdot post on it is at
> http://developers.slashdot.org/article.pl?sid=02/08/16/1428238&tid=156), a
> thread spawned around this checkin of code that followed that paper's
> suggestions. This thread quickly jumped into discussions on data
> structures, Bayes' Rule, and a whole lot of talk about spam. Very
> interesting if spam filtering interests you. Tim Peters has been leading
> the drive on this chunk of code (and thanks to the illness that befell
> him in late August, which he has subsequently gotten over, he had a few days
> of major hacking on it; Tim showed he is a performance stats whore
> ).
>
> A very cool quote came out of this thread from Eric S. Raymond when
> discussing the spam filter he has been working on: "This is actually the
> first new program I've coded in C (rather than
> Python) in a good four years or so".

(Several of us think even this didn't have to be coded in C after
all. :-)

> ====================
> Parsing vs. lexing.
> ====================
> In response to a question by Aahz about what the differences were between
> a lexer, parser, and tokenizer, Eric Raymond posted a good overview of the
> differences. Guido later commented in an email mentioning SPARK and about
> how Python's parser generator (pgen) works and why he wrote it. He also
> made some other comments on lexers. Jeremy Hylton pointed out a "neat new paper
> about an old algorithm for recursive descent parsers with backtracking and
> unlimited lookahead" by Bryan Ford at http://www.brynosaurus.com/pub.html
> . Alex Martelli pointed out that this discussion reminded him of "a
> long-ago interview with Borland's techies" in which they said they were
> able to make Borland PASCAL fit on a floppy while MS PASCAL took multiple
> floppies. Their trick was "we just did everything by the Dragon Book --
> except that the parser is a hand-written recursive descent parser [Aho &c
> being adamant defenders of Yacc & the like], which buys us a lot".
> Someone named Noah also posted a discussion of lexers and parsers that
> pulled in Finite State Machines, Push Down Automata, and Turing Machines.
>
> Martin Sj?n says that Haskell's pattern matching and lazy evaluation makes

Come on, you know his real name is Sjögren. :-)

> lexers easy (even a Recursive-Descent parser), but unfortunately Haskell
> does not play with other languages nicely. Haskell is where Python got
> its list comprehension idea.
>
> =========================================
> [Python-Dev] Fw: Security hole in rexec?
> =========================================
> It was brought to the attention of the list that deleting __builtins__
> allowed a compromise in rexec. Guido pointed out that
> python.org/sf/577530 reports this. He also said don't trust rexec.
>
> A patch is going to be submitted to document the view that rexec is really
> not that safe.

It was checked in.

> =================
> A `cogen' module
> =================
> Francois Pinard asked about Cartesian products using the new sets module.
> Guido didn't think people would in general need it. Francois quickly
> started this thread discussing a cogen module to generate Cartesian
> products and other ways of operating on sets.

Tim Peters quickly posted *his* elaborate state-of-the-art code, which
ended the discussion (as usual, posting code is a good way to stop
discussion :-).

> =================
> Mersenne Twister
> =================
> Raymond Hettinger volunteered to implement the Mersenne Twister algorithm
> (one in Python exists at www.math.keio.ac.jp/~matumoto/emt.html). While
> discussing whether to implement it in C or Python, Guido noticed that
> random.Random re-implements whrandom. Guido then came up with the idea of
> writing a base random class that is subclassed where .random() can be
> implemented; Tim Peters agreed and suggested more methods to subclass.
>
> =================================
> New PEP Format: reStructuredText
> =================================
> David Goodger and Barry Warsaw have now gotten reST as a usable syntax for
> PEPs.
> Read the PEPs on the subject to learn more:
>
> - PEP 12 -- Sample reStructuredText PEP Template
> (http://www.python.org/peps/pep-0012.html)
>
> - PEP 258 -- Docutils Design Specification
> (http://www.python.org/peps/pep-0258.html)
>
> - PEP 287 -- reStructuredText Docstring Format
> (http://www.python.org/peps/pep-0287.html)
>
> ====================================
> tiny optimization in ceval mainloop
> ====================================
> Jeremy Hylton noticed that in ceval there is a test of whether the
> ticker was 0 or if things_to_do was set to true (explanation of the
> ticker, checkinterval, and the GIL follow this paragraph). Jeremy
> wondered if we could just drop the ticker to 0 when things_to_do is true.
> Jack Jansen, though, pointed out that clearing it is not guaranteed since
> there may be an interrupt routine when "we fiddle things_to_do". Skip
> Montanaro then pointed out that since neither ticker nor things_to_do is
> fiddled with unless the GIL is held, instead of causing each thread to
> execute this test they could be made globals; he did a patch
> that implements this (python.org/sf/602191). Guido then said that if
> there wasn't a decent speed improvement, then no patch would be checked
> in. He then changed his mind when it was pointed out that it actually
> simplified the code. Skip tested anyway, though, and there is a speed
> improvement. This also brought up whether the default value of 10 for
> checkinterval was reasonable. It was then agreed to be bumped up to 100.
> Jack ran some code and said he noticed a definite improvement.
>
> Python's version of threading is not like in C. There is something called
> the GIL (Global Interpreter Lock) which any thread wishing to execute
> Python code or play with Python objects must hold. This means that when
> you have Python threads running (using the thread or threading module)
> they are usually all waiting in line to get the GIL.
> Now for Python to
> decide when to release the GIL for another thread to grab it, it uses the
> ticker. This variable counts down to zero by being decremented every time
> a Python opcode is executed (originally defaulted to 10, now defaulted to
> 100). The ticker's starting value after each release of the GIL is what
> sys.setcheckinterval() sets.
>
> To get a better understanding of threading under Python I recommend
> reading Aahz's tutorials on threading.
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev

All in all, please keep this up!!!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Tue Sep 3 18:53:36 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 03 Sep 2002 13:53:36 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: <20020903134112.GC1227@cthulhu.gerg.ca>
Message-ID:

[Tim again]
>> I must be missing something. I would *hope* that you review
>> *all* messages claimed to be spam, in which case the number of msgs
>> to be reviewed would, in a perfectly accurate system, be equal to the
>> number of spams received.

[Greg Ward]
> Good lord, certainly not! Remember that Exim rejects a couple hundred
> messages a day that never get near SpamAssassin -- that's mostly
> Chinese/Korean junk that's rejected on the basis of 8-bit chars or
> banned charsets in the headers. Then, probably 50-75% of what SA gets
> its hands on scores >= 10.0, so it too is rejected at SMTP time. Only
> messages that score < 10 are accepted, and those that score >= 5.0 are
> set aside in /var/mail/spam for review. That's 10-30 messages/day.
>
> (I do occasionally scan Exim's reject log on mail.python.org to see
> what's getting rejected today -- Exim kindly logs the full headers of
> every message that is rejected after the DATA command.
> I usually make
> it to about 11am of a given day's logfile before my eyes glaze over from
> the endless stream of spam and viruses.)

I get about 200 spams per day on my own email accounts, and look at all of
them. I don't look at the headers at all, I just look at the msgs in a
capable HTML-aware mail reader, as a matter of course while dealing with all
the day's email. It's rare that it takes more than a second to recognize a
spam by eyeball and hit the delete key. At about 200 per day, it's just now
reaching my "hmm, this is becoming a nuisance sometimes" threshold. Our
tolerance levels for manual review seem to differ by a factor of 100 or
more.

> Note that we *used* to accept messages before passing them to
> SpamAssassin, so never rejected anything on the basis of its SA score.
> Back then, we saved and reviewed probably 50-70 messages/day. Very,
> very, very few (if any) false positives scored >= 10.0, which is why
> that's the threshold for SMTP-time rejection.

I can tell you the mean false negative and false positive rates on what I've
been working on, and even measure their variance across both training and
prediction sets. (The fn rate is well under 2% now (adding in more headers
should improve that a lot), and the fp rate under 0.05% (but I doubt that
adding in more headers will improve this)). So long as we don't know the
rates for the scheme you're using now, there's no objective basis for
comparison.

...

>> Maybe you don't want this kind of approach at all. The classifier doesn't
>> have "gray areas" in practice: it tends to give probabilities near 1, or
>> near 0, and there's very little in between -- a msg either has a
>> preponderance of spam indicators, or a preponderance of non-spam
>> indicators.

> That's a great improvement over SpamAssassin then: with SA, the grey
> area (IMHO) is scores from 3 to 10...
> which is why several python.org
> lists now have a little bit of Mailman configuration magic that makes MM
> set aside messages with an SA score >= 3 for list admin review. (It's
> probably worth getting the list admin to do a bit more work in order to
> avoid sending low-scoring spam to the list.)
>
> However, as long as "very little" != "nothing", we still need to worry a
> bit about that grey area. What do you think we should do with a message
> whose spam probability is between (say) 0.1 and 0.9? Send it on, reject
> it, or set it aside?

Under Graham's scheme, send it on. It doesn't have grey areas in a useful
sense, because the scoring step only looks at a handful of extremes:
extremes in, extremes out, and when it's wrong it's *spectacularly* wrong
(e.g., the very rare (< 0.05%) false positives generally have "probabilities"
exceeding 0.99, and a false negative often has a "probability" less than
0.01).

> Just how many messages fall in that grey area anyways?

I can't get at my testing setup now and don't know the answer offhand. I'll
try to make time tonight to determine the answer. I guess the interesting
stats are what percent of hams have probs in (0.1, 0.9), and what percent of
spams. In general, it's only very brief messages that don't score near 0.0
or 1.0, so this *may* turn out to be the same thing as asking what
percentages of hams and spams are very brief.

Note too that adding the headers in *should* catch a lot more spam under
this scheme. But, even as is, and even if I strip all the HTML tags out of
spam, fewer than 1 spam in 50 scores less than 0.9. The ones that are
passed on now include all spams with empty bodies (a message with an empty
body scores 0.5).

From tismer@tismer.com Tue Sep 3 19:26:01 2002
From: tismer@tismer.com (Christian Tismer)
Date: Tue, 03 Sep 2002 20:26:01 +0200
Subject: [Python-Dev] Get rid of etype struct
Message-ID: <3D74FEB9.5060406@tismer.com>

This is a multi-part message in MIME format.
--------------080306050703000101060801
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Hi Guido,

I think I have a solution for this one, see the attached diff.

I did what you suggested:
Make the addressing of the members dependent on the metatype.

The etype struct has lost its members[1] field, to make
it easier to extend the structure.
Instead, the allocator always adds one to the size, to
have the sentinel in place.

I did not yet publish the etype structure, since I didn't
find a good name and place for it.

Testing was also not very thorough. I just checked that
types work from Python and that I can add __slots__ to them.

Will re-port this stuff to my Py2.2 Stackless base and try it
out as base type for my own C types.

It took me the whole day to understand how it must work,
and then just an hour to get it to work.
This is quite some stuff :-)

Can somebody please have a look, if there are subtle errors?

ciao - chris

----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=591586&group_id=5470

--------------080306050703000101060801
Content-Type: text/plain; charset=us-ascii; name="typeobject.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="typeobject.diff"

cvs -z9 diff -u dist/src/Objects/typeobject.c
Index: dist/src/Objects/typeobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/typeobject.c,v
retrieving revision 2.179
diff -u -r2.179 typeobject.c
--- dist/src/Objects/typeobject.c 16 Aug 2002 17:01:08 -0000 2.179
+++ dist/src/Objects/typeobject.c 3 Sep 2002 18:04:39 -0000
@@ -20,9 +20,12 @@
        see add_operators() below. */
     PyBufferProcs as_buffer;
     PyObject *name, *slots;
-    PyMemberDef members[1];
+    /* here are optional user slots, followed by the members.
*/
 } etype;
 
+#define GET_MEMBERS(etype) \
+    ((PyMemberDef *)(((char *)etype) + (etype)->type.ob_type->tp_basicsize)-1)
+
 static PyMemberDef type_members[] = {
     {"__basicsize__", T_INT, offsetof(PyTypeObject,tp_basicsize),READONLY},
     {"__itemsize__", T_INT, offsetof(PyTypeObject, tp_itemsize), READONLY},
@@ -213,7 +216,8 @@
 PyType_GenericAlloc(PyTypeObject *type, int nitems)
 {
     PyObject *obj;
-    const size_t size = _PyObject_VAR_SIZE(type, nitems);
+    const size_t size = _PyObject_VAR_SIZE(type, nitems+1);
+    /* note that we need to add one, for the sentinel */
 
     if (PyType_IS_GC(type))
         obj = _PyObject_GC_Malloc(size);
@@ -253,7 +257,7 @@
     PyMemberDef *mp;
 
     n = type->ob_size;
-    mp = ((etype *)type)->members;
+    mp = GET_MEMBERS((etype *)type);
     for (i = 0; i < n; i++, mp++) {
         if (mp->type == T_OBJECT_EX) {
             char *addr = (char *)self + mp->offset;
@@ -318,7 +322,7 @@
     PyMemberDef *mp;
 
     n = type->ob_size;
-    mp = ((etype *)type)->members;
+    mp = GET_MEMBERS((etype *)type);
     for (i = 0; i < n; i++, mp++) {
         if (mp->type == T_OBJECT_EX && !(mp->flags & READONLY)) {
             char *addr = (char *)self + mp->offset;
@@ -1125,7 +1129,8 @@
 
     /* Are slots allowed?
*/
     nslots = PyTuple_GET_SIZE(slots);
-    if (nslots > 0 && base->tp_itemsize != 0) {
+    if (nslots > 0 && base->tp_itemsize != 0 && !PyType_Check(base)) {
+        /* for the special case of meta types, allow slots */
         PyErr_Format(PyExc_TypeError,
                  "nonempty __slots__ "
                  "not supported for subtype of '%s'",
@@ -1334,7 +1339,7 @@
     }
 
     /* Add descriptors for custom slots from __slots__, or for __dict__ */
-    mp = et->members;
+    mp = GET_MEMBERS(et);
     slotoffset = base->tp_basicsize;
     if (slots != NULL) {
         for (i = 0; i < nslots; i++, mp++) {
@@ -1366,7 +1371,7 @@
     }
     type->tp_basicsize = slotoffset;
     type->tp_itemsize = base->tp_itemsize;
-    type->tp_members = et->members;
+    type->tp_members = GET_MEMBERS(et);
     type->tp_getset = subtype_getsets;
 
     /* Special case some slots */

*****CVS exited normally with code 1*****

--------------080306050703000101060801--

From nas@python.ca Tue Sep 3 19:34:47 2002
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 3 Sep 2002 11:34:47 -0700
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To:
References: <20020903134112.GC1227@cthulhu.gerg.ca>
Message-ID: <20020903183447.GA13310@glacier.arctrix.com>

Tim Peters wrote:
> Under Graham's scheme, send it on. It doesn't have grey areas in a useful
> sense, because the scoring step only looks at a handful of extremes:
> extremes in, extremes out, and when it's wrong it's *spectacularly* wrong
> (e.g., the very rare (< 0.05%) false positives generally have "probabilities"
> exceeding 0.99, and a false negative often has a "probability" less than
> 0.01).

I noticed that as well. When the classifier goes wrong it goes badly
wrong and using different thresholds would not help. It seems that
increasing the number of discriminators doesn't really help either. Too
bad because otherwise you could flag those messages for human
classification.

On the bright side, based on the number of mis-classified messages in my
corpus, it looks like a human would have a very hard time doing a better
job.
Perhaps all that is needed is a bypass mechanism for that small
fraction of non-spammers. That way if their initial message is rejected
they would still have some way of getting through.

Erik Naggum made an interesting comment. He said that spam should be
handled at the transport level. Greg's work on doing filtering at SMTP
time accomplishes this and makes a lot of sense. When a message is
rejected, the sending mail server is the one that has to deal with it.
In the case of spam, the sending server is often an open relay. Letting
it handle the bounces is sweet justice. :-)

I bring this up because "SMTP time filtering" makes a bypass mechanism
work much better. With a system like TMDA, confirmation notices usually
generate double-bounces. Instead, we could reject the message with a
5xx error that includes instructions on how to bypass the filter (e.g.
include a cookie in the body of the message).

Neil

From python@discworld.dyndns.org Tue Sep 3 19:39:14 2002
From: python@discworld.dyndns.org (Charles Cazabon)
Date: Tue, 3 Sep 2002 12:39:14 -0600
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: ; from tim.one@comcast.net on Tue, Sep 03, 2002 at 01:53:36PM -0400
References: <20020903134112.GC1227@cthulhu.gerg.ca>
Message-ID: <20020903123914.B30532@twoflower.internal.do>

Tim Peters wrote:
>
> Under Graham's scheme, send it on. It doesn't have grey areas in a useful
> sense, because the scoring step only looks at a handful of extremes:
> extremes in, extremes out, and when it's wrong it's *spectacularly* wrong
> (e.g., the very rare (< 0.05%) false positives generally have "probabilities"
> exceeding 0.99, and a false negative often has a "probability" less than
> 0.01).

I would love to see how the results would be affected by applying the scoring
scheme to the entire content of the message, instead of just the 15 (or 16 in
your case) most extreme samples.
By the way, you never said why you increased that
number by one; did it make that much difference?

Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://www.qcc.ca/~charlesc/software/
-----------------------------------------------------------------------

From guido@python.org Tue Sep 3 18:50:31 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 03 Sep 2002 13:50:31 -0400
Subject: [Python-Dev] Proposed Mixins for Wide Interfaces
In-Reply-To: Your message of "Sat, 31 Aug 2002 12:44:06 EDT." <001101c2510d$9fce0920$5f66accf@othello>
References: <001101c2510d$9fce0920$5f66accf@othello>
Message-ID: <200209031750.g83HoVq05812@odiug.zope.com>

> How about adding some mixins to simplify the
> implementation of some of the fatter interfaces?

Can you suggest implementations for these, to be absolutely clear what
you mean?

> class CompareMixin:
>     """
>     Given an __eq__ method in a subclass, adds a __ne__ method
>     Given __eq__ and __lt__, adds !=, <=, >, >=.
>     """

What if the "natural" thing to implement is __le__ instead of __lt__?
That's the case for sets. Or __gt__ (less likely)?

> class MappingMixin:
>     """
>     Given __setitem__, __getitem__, and keys,
>     implements values, items, update, get, setdefault, len,
>     iterkeys, iteritems, itervalues, has_key, and __contains__.
>
>     If __delitem__ is also supplied, implements clear, pop,
>     and popitem.
>
>     Takes advantage of __iter__ if supplied (recommended).

Does that mean that if you have __iter__, you don't use keys()? In
that case it should implement keys() out of __iter__. Maybe this
should be required.

>     Takes advantage of __contains__ or has_key if supplied
>     (recommended).
>     """

Let's standardize on __contains__, not has_key().
I guess you could provide __contains__ as follows:

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return 0
        else:
            return 1

I don't mind if there are some recursions amongst the various
implementations; if you don't supply the minimum, the implementation
will raise "RuntimeError: maximum recursion depth exceeded".

> The idea is to make it easier to implement these interfaces.
> Also, if the interfaces get expanded, the clients automatically
> updated.

A similar thing for sequences would be useful too, right?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Tue Sep 3 20:08:57 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 03 Sep 2002 15:08:57 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: <20020903123914.B30532@twoflower.internal.do>
Message-ID:

[Charles Cazabon]
> I would love to see how the results would be affected by applying
> the scoring scheme to the entire content of the message, instead of
> just the 15 (or 16 in your case) most extreme samples.

Then it would be close to a classic Bayesian classifier, and like any such
would need entirely different scoring code to avoid catastrophic
floating-point errors (right now an intermediate result can't become smaller
than 0.01**16 = 1e-32, so fp troubles are impossible; raise the exponent to
a measly 200 and you're already out of the range of IEEE double precision;
classic classifiers work in logarithm space instead for this reason). You
can read lots of papers on how those do; all evidence suggests they do worse
than this scheme on the spam versus non-spam task.

> By the way, you never said why you increased that number by one;

It's explained in the comment block preceding the MAX_DISCRIMINATORS
definition. BTW, in an unreported experiment I boosted MAX_DISCRIMINATORS
to 36. I don't recall what happened now, but it was a disaster for at least
one of the error rates.

> did it make that much difference?

Not on average.
It helped eliminate a narrow class of false positives,
where previously the first 15 extremes the classifier saw had 8 probs of .99
and 7 of .01. That works out to "spam". Making the # of classifiers even
instead allowed for graceful ties, which favor ham in this scheme.

All previous decisions "should be" revisited after each new change, though,
and in this particular case it could well be that stripping HTML tags out of
plain-text messages also addressed the same narrow issue but in a more
effective way (without some special gimmick, virtually every message
including so much as an example of HTML got scored as spam).

From guido@python.org Tue Sep 3 20:41:10 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 03 Sep 2002 15:41:10 -0400
Subject: [Python-Dev] Should KeyError use repr() on its argument?
Message-ID: <200209031941.g83JfAK07542@odiug.zope.com>

(SF bug 598451.)

The KeyError exception doesn't apply repr() to its argument. That's
annoying in cases like this:

    >>> a = {}
    >>> a['']
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    KeyError
    >>>

Should this be fixed? How? (I guess we could add a KeyError__str__
method to exceptions.c that applies repr().)

I've got a feeling this is a feature, but not a very useful one.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Tue Sep 3 20:54:48 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 03 Sep 2002 15:54:48 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: Your message of "Tue, 03 Sep 2002 11:34:47 PDT." <20020903183447.GA13310@glacier.arctrix.com>
References: <20020903134112.GC1227@cthulhu.gerg.ca> <20020903183447.GA13310@glacier.arctrix.com>
Message-ID: <200209031954.g83Jsmw07797@odiug.zope.com>

> Erik Naggum made an interesting comment. He said that spam should be
> handled at the transport level. Greg's work on doing filtering at SMTP
> time accomplishes this and makes a lot of sense.
> When a message is
> rejected, the sending mail server is the one that has to deal with it.
> In the case of spam, the sending server is often an open relay. Letting
> it handle the bounces is sweet justice. :-)

In the case of a false positive, it has the added advantage that at
least the poor sender, falsely accused of sending spam, gets a bounce
and may try again.

> I bring this up because "SMTP time filtering" makes a bypass mechanism
> work much better. With a system like TMDA, confirmation notices usually
> generate double-bounces. Instead, we could reject the message with a
> 5xx error that includes instructions on how to bypass the filter (e.g.
> include a cookie in the body of the message).

Do you still believe that TMDA is the only answer to spam?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Tue Sep 3 20:57:00 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 03 Sep 2002 15:57:00 -0400
Subject: [Python-Dev] Should KeyError use repr() on its argument?
In-Reply-To: Your message of "Tue, 03 Sep 2002 15:41:10 EDT." <200209031941.g83JfAK07542@odiug.zope.com>
References: <200209031941.g83JfAK07542@odiug.zope.com>
Message-ID: <200209031957.g83Jv0k07810@odiug.zope.com>

> The KeyError exception doesn't apply repr() to its argument. That's
> annoying in cases like this:
>
>     >>> a = {}
>     >>> a['']
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     KeyError
>     >>>
>
> Should this be fixed? How? (I guess we could add a KeyError__str__
> method to exceptions.c that applies repr().)
>
> I've got a feeling this is a feature, but not a very useful one.

I take it back. args[0] being the actual key that failed is a
feature. str() not using repr() on args[0] is a bug. I'll fix it.
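
To make the distinction above concrete, here is a minimal sketch of the two properties in question: args[0] holding the actual key that failed, and str() applying repr() to it. It assumes a present-day Python 3 interpreter rather than the 2.2/2.3 trees under discussion, and describe_missing_key is a hypothetical helper name used only for illustration.

```python
def describe_missing_key(mapping, key):
    """Look up a key that is expected to be absent.

    Returns a tuple (failed_key, message), where failed_key is
    exc.args[0] and message is str(exc).  Hypothetical helper,
    for illustration only.
    """
    try:
        mapping[key]
    except KeyError as exc:
        # args[0] is the key object itself; str() shows its repr(),
        # so an empty-string key is still visible in the message.
        return exc.args[0], str(exc)
    raise AssertionError("lookup unexpectedly succeeded")


failed_key, message = describe_missing_key({}, '')
print(repr(failed_key))   # the key object that failed
print(message)            # the text shown after "KeyError:" in a traceback
```

With an empty-string key the message is the two-character string of quotes rather than an empty string, which is what makes the traceback line readable.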
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pinard@iro.umontreal.ca Tue Sep 3 20:54:27 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: Tue, 03 Sep 2002 15:54:27 -0400
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
In-Reply-To: <200209031653.g83GrjQ01929@odiug.zope.com> (Guido van Rossum's message of "Tue, 03 Sep 2002 12:53:45 -0400")
References: <200209031653.g83GrjQ01929@odiug.zope.com>
Message-ID:

[Guido van Rossum]

>> 80 |                     [|]
>>    |                     [|]
>>    |                     [|]
>>    |                     [|]
>>    | [|]                 [|]
>> 60 | [|]             [|] [|]
>>    | [|]             [|] [|]
>>    | [|]             [|] [|]
>>    | [|]             [|] [|]
>>    | [|]             [|] [|]                 [|]
>> 40 | [|]         [|] [|] [|]                 [|]
>>    | [|]         [|] [|] [|]         [|]     [|]         [|]
>>    | [|]         [|] [|] [|]         [|]     [|]         [|] [|]
>>    | [|]         [|] [|] [|] [|]     [|]     [|]     [|] [|] [|]
>>    | [|]         [|] [|] [|] [|]     [|]     [|] [|] [|] [|] [|]
>> 20 | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
>>    | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
>>    | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
>>    | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
>>    | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
>>  0 +-071-025-012-042-063-084-030-021-039-009-047-027-033-041-036-005
>>      Fri 16| Sun 18| Tue 20| Thu 22| Sat 24| Mon 26| Wed 28| Fri 30|
>>          Sat 17  Mon 19  Wed 21  Fri 23  Sun 25  Tue 27  Thu 29  Sat 31
>
> [...] It's also kind of hard to read. [...]

True. But not so difficult to improve.
Adding a bit of simplicity yields:

   |                     84
80 |                     []
   |                     []
   |                     []
   | 71                  []
   | []              63  []
60 | []              []  []
   | []              []  []
   | []              []  []
   | []              []  []                  47
   | []          42  []  []                  []
40 | []          []  []  []          39      []          41
   | []          []  []  []          []      []          []  36
   | []          []  []  []  30      []      []      33  []  []
   | []          []  []  []  []      []      []  27  []  []  []
   | []  25      []  []  []  []  21  []      []  []  []  []  []
20 | []  []      []  []  []  []  []  []      []  []  []  []  []
   | []  []      []  []  []  []  []  []      []  []  []  []  []
   | []  []  12  []  []  []  []  []  []   9  []  []  []  []  []
   | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []   5
   | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []  []
 0 +----------------------------------------------------------------
     Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat
     16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31

--
François Pinard   http://www.iro.umontreal.ca/~pinard

From pinard@iro.umontreal.ca Tue Sep 3 20:57:50 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: Tue, 03 Sep 2002 15:57:50 -0400
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
In-Reply-To: <200209031653.g83GrjQ01929@odiug.zope.com> (Guido van Rossum's message of "Tue, 03 Sep 2002 12:53:45 -0400")
References: <200209031653.g83GrjQ01929@odiug.zope.com>
Message-ID:

[Guido van Rossum]

>> =================
>> A `cogen' module
>> =================
>> Francois Pinard asked about Cartesian products using the new sets module.
>> Guido didn't think people would in general need it. Francois quickly
>> started this thread of discussing a cogen module to generate Cartesian
>> products and other ways of operating on sets.
>
> Tim Peters quickly posted *his* elaborate state-of-the-art code, which
> ended the discussion (as usual, posting code is a good way to stop
> discussion :-).

I'll be back! (Not that I especially look like Arnold Schwarzenegger!)
--
François Pinard   http://www.iro.umontreal.ca/~pinard

From guido@python.org Tue Sep 3 21:18:03 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 03 Sep 2002 16:18:03 -0400
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
In-Reply-To: Your message of "Tue, 03 Sep 2002 15:54:27 EDT."
References: <200209031653.g83GrjQ01929@odiug.zope.com>
Message-ID: <200209032018.g83KI3q08343@odiug.zope.com>

> > [...] It's also kind of hard to read. [...]
>
> True. But not so difficult to improve. Adding a bit of simplicity yields:
>
>    |                     84
> 80 |                     []
>    |                     []
>    |                     []
>    | 71                  []
>    | []              63  []
> 60 | []              []  []
>    | []              []  []
>    | []              []  []
>    | []              []  []                  47
>    | []          42  []  []                  []
> 40 | []          []  []  []          39      []          41
>    | []          []  []  []          []      []          []  36
>    | []          []  []  []  30      []      []      33  []  []
>    | []          []  []  []  []      []      []  27  []  []  []
>    | []  25      []  []  []  []  21  []      []  []  []  []  []
> 20 | []  []      []  []  []  []  []  []      []  []  []  []  []
>    | []  []      []  []  []  []  []  []      []  []  []  []  []
>    | []  []  12  []  []  []  []  []  []   9  []  []  []  []  []
>    | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []   5
>    | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []  []
>  0 +----------------------------------------------------------------
>      Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat
>      16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31

Ooh, much better. Still, put this at the end instead of at the top of
the message. It's not *that* interesting.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Tue Sep 3 21:32:55 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 03 Sep 2002 16:32:55 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: <20020903183447.GA13310@glacier.arctrix.com>
Message-ID:

[Neil Schemenauer]
> I noticed that as well. When the classifier goes wrong it goes badly
> wrong and using different thresholds would not help. It seems that
> increasing the number of discriminators doesn't really help either.
> Too bad because otherwise you could flag those messages for human
> classification.

I think it's worse than just that: suppose any scheme says "OK, this is
spam, with probability 0.9995". If it's reporting accurate
probabilities, then another way to read that claim is "On average, one
time in 2000 this message actually isn't spam". In real life we have to
accept that there's no scheme with a 0% false positive rate-- not even
human review --short of the scheme that never calls anything spam.

Since deciding on the largest acceptable false positive rate is far more
a social than a technical issue, a group of nerds will do anything
rather than face it.

From David Abrahams" <200209031653.g83GrjQ01929@odiug.zope.com> <200209032018.g83KI3q08343@odiug.zope.com>
Message-ID: <17d001c2538d$f82650f0$1c86db41@boostconsulting.com>

Turn it sideways and it'll get smaller...

From: "Guido van Rossum"

> > > [...] It's also kind of hard to read. [...]
> >
> > True. But not so difficult to improve. Adding a bit of simplicity yields:
> >
> >    |                     84
> > 80 |                     []
> >    |                     []
> >    |                     []
> >    | 71                  []
> >    | []              63  []
> > 60 | []              []  []
> >    | []              []  []
> >    | []              []  []
> >    | []              []  []                  47
> >    | []          42  []  []                  []
> > 40 | []          []  []  []          39      []          41
> >    | []          []  []  []          []      []          []  36
> >    | []          []  []  []  30      []      []      33  []  []
> >    | []          []  []  []  []      []      []  27  []  []  []
> >    | []  25      []  []  []  []  21  []      []  []  []  []  []
> > 20 | []  []      []  []  []  []  []  []      []  []  []  []  []
> >    | []  []      []  []  []  []  []  []      []  []  []  []  []
> >    | []  []  12  []  []  []  []  []  []   9  []  []  []  []  []
> >    | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []   5
> >    | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []  []
> >  0 +----------------------------------------------------------------
> >      Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat
> >      16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
>
> Ooh, much better. Still, put this at the end instead of at the top of
> the message. It's not *that* interesting.
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev

From skip@pobox.com Tue Sep 3 22:39:01 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 3 Sep 2002 16:39:01 -0500
Subject: [Python-Dev] Two random and nearly unrelated ideas
Message-ID: <15733.11253.743055.864572@12-248-11-90.client.attbi.com>

While adding a blurb to Misc/NEWS about the change to the thread ticker
and check interval, it occurred to me that perhaps Misc/NEWS would
benefit from conversion to ReST format. You could pump an HTML version
out to the website periodically.

Second (also considered during the above edit), it would be nice to get
rid of the ticker altogether in systems with proper signal support. On
those platforms couldn't an alarm replace polling for the ticker? I know
signals are tricky devils, but it still seems it would be a win if you
could use it. You'd have to install a SIGALRM handler which would trip
periodically. It would also have to keep track of any alarm handler the
programmer installed.

Just for the heck of it I recompiled ceval.c with the (--_Py_Ticker < 0)
block ifdef'd out. Got a 1.7% increase in pystones over the now default
checkinterval == 100 situation.

Skip

From nas@python.ca Tue Sep 3 22:52:51 2002
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 3 Sep 2002 14:52:51 -0700
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To:
References: <20020903183447.GA13310@glacier.arctrix.com>
Message-ID: <20020903215251.GA14101@glacier.arctrix.com>

Tim Peters wrote:
> Since deciding on the largest acceptable false positive rate is far
> more a social than a technical issue, a group of nerds will do
> anything rather than face it.

I think we pretty much ran out of things to do. :-) Still, I think the
acceptable rate depends heavily on what happens to the rejects.
If they go to /dev/null then it would have to be very low. If there are
bounces and a way for the innocent victims to bypass the filter then I
consider 0.5% good enough for most situations. The major remaining
problem would be handling legitimate automated email. For mailing lists
that probably isn't an issue.

I'm probably not the guy to listen to about acceptable rates, though. I
currently use TMDA and therefore am a heartless bastard. :-)

  Neil

From jeremy@alum.mit.edu Tue Sep 3 22:53:46 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Tue, 3 Sep 2002 17:53:46 -0400
Subject: [Python-Dev] mysterious hangs in socket code
Message-ID: <15733.12138.568668.562013@slothrop.zope.com>

I've been running a small, multi-threaded program to retrieve web pages
today. The entire program appears to hang when I perform a slow DNS
operation, even though there is no application-level coordination
between the threads.

The motivation comes from http://www.python.org/sf/591349, but I ended
up writing a similar small test script, which I've attached.

When I run this program with Python 2.1, it produces a steady stream of
output -- urls and the time it took to load them. Most of the pages take
less than a second, but some take a very long time.

If I run this program with Python 2.2 or 2.3, it produces little bursts
of output, then pauses for a long time, then repeats.

I believe that the problem relates to DNS lookups, but not in a way I
fully understand. If I connect gdb to any of the threads while the
program is hung, it is always inside getaddrinfo().

My first realization was that the socketmodule stopped wrapping DNS
lookups in Py_BEGIN/END_ALLOW_THREADS calls when the IPv6 changes were
integrated. But if I restore these calls -- see
http://www.python.org/sf/604210 -- I don't see any change in behavior.
The program still hangs periodically.

One possibility is that the Linux getaddrinfo() is thread-safe, but only
by way of a lock that only allows one request to be outstanding at a
time.
Not sure what the other possibilities are, but the current behavior is
awful.

Jeremy

---------------------------------------------------------------------

import httplib
import Queue
import random
import sys
import threading
import time
import traceback
import urlparse

headers = {"Accept":
           "text/plain, text/html, image/jpeg, image/jpg, "
           "image/gif, image/png, */*"}

class URLThread(threading.Thread):

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue
        self._stopevent = threading.Event()

    def stop(self):
        self._stopevent.set()

    def run(self):
        while not self._stopevent.isSet():
            self.fetch()

    def fetch(self):
        url = self._queue.get()
        t0 = time.time()
        try:
            self._fetch(url)
        except:
            etype, value, tb = sys.exc_info()
            L = ["Error occurred fetching %s\n" % url,
                 "%s: %s\n" % (etype, value),
                 ]
            L += traceback.format_tb(tb)
            sys.stderr.write("".join(L))
        t1 = time.time()
        print url, round(t1 - t0, 2)

    def _fetch(self, url):
        parts = urlparse.urlparse(url)
        host = parts[1]
        path = parts[2]
        h = httplib.HTTPConnection(host)
        h.connect()
        h.request("GET", path, headers=headers)
        r = h.getresponse()
        r.read()
        h.close()

urls = """\
http://www.andersen.com/
http://www.google.com/
http://www.google.com/images/logo.gif
http://www.microsoft.com/
http://www.microsoft.com/homepage/gif/bnr-microsoft.gif
http://www.microsoft.com/homepage/gif/1ptrans.gif
http://www.microsoft.com/library/toolbar/images/curve.gif
http://www.yahoo.com/
http://www.sourceforge.net/
http://www.slashdot.org/
http://www.kuro5hin.org/
http://www.intel.com/
http://www.aol.com/
http://www.amazon.com/
http://www.cnn.com/
http://money.cnn.com/
http://www.expedia.com/
http://www.tripod.com/
http://www.hotmail.com/
http://www.angelfire.com/
http://www.excite.com/
http://www.verisign.com/
http://www.riaa.com/
http://www.enron.com/
http://www.securityspace.com/
http://www.directv.com/
http://www.att.com/
http://www.qwest.com/
http://www.covad.com/
http://www.sprint.com/
http://www.mci.com/
http://www.worldcom.com/
"""

urls = [u for u in urls.split("\n") if u]

REPEAT = 10
THREADS = 8

class RandomQueue:

    def __init__(self, L):
        self.list = L

    def get(self):
        return random.choice(self.list)

if __name__ == "__main__":
    urlq = RandomQueue(urls)

    sys.setcheckinterval(10)

    threads = []
    for i in range(THREADS):
        t = URLThread(urlq)
        t.start()
        threads.append(t)

    while 1:
        try:
            time.sleep(30)
        except:
            break

    print "Shutting down threads..."
    for t in threads:
        t.stop()
    for t in threads:
        t.join()

From drifty@bigfoot.com Wed Sep 4 00:00:52 2002
From: drifty@bigfoot.com (Brett Cannon)
Date: Tue, 3 Sep 2002 16:00:52 -0700 (PDT)
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
In-Reply-To: <200209032018.g83KI3q08343@odiug.zope.com>
Message-ID:

[Guido van Rossum]

> > > [...] It's also kind of hard to read. [...]
> >
> > True. But not so difficult to improve. Adding a bit of simplicity yields:
> >
> >    |                     84
> > 80 |                     []
> >    |                     []
> >    |                     []
> >    | 71                  []
> >    | []              63  []
> > 60 | []              []  []
> >    | []              []  []
> >    | []              []  []
> >    | []              []  []                  47
> >    | []          42  []  []                  []
> > 40 | []          []  []  []          39      []          41
> >    | []          []  []  []          []      []          []  36
> >    | []          []  []  []  30      []      []      33  []  []
> >    | []          []  []  []  []      []      []  27  []  []  []
> >    | []  25      []  []  []  []  21  []      []  []  []  []  []
> > 20 | []  []      []  []  []  []  []  []      []  []  []  []  []
> >    | []  []      []  []  []  []  []  []      []  []  []  []  []
> >    | []  []  12  []  []  []  []  []  []   9  []  []  []  []  []
> >    | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []   5
> >    | []  []  []  []  []  []  []  []  []  []  []  []  []  []  []  []
> >  0 +----------------------------------------------------------------
> >      Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat
> >      16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
>
> Ooh, much better. Still, put this at the end instead of at the top of
> the message. It's not *that* interesting.
>

How about I just get rid of it? It is only in there because Michael had
it in his summaries. Actually, the entire header (from the first line to
first summary) is there just because Michael had it there.
I personally am happy keeping the header as it is sans this count; I
know I had to read a lot of emails but I don't think anyone else cares.
=)

-Brett

From jason-exp-1031786493.04d3ca@mastaler.com Wed Sep 4 00:28:24 2002
From: jason-exp-1031786493.04d3ca@mastaler.com (jason-exp-1031786493.04d3ca@mastaler.com)
Date: Tue, 03 Sep 2002 17:28:24 -0600
Subject: [Python-Dev] Re: The first trustworthy GBayes results
References: <20020903134112.GC1227@cthulhu.gerg.ca> <20020903183447.GA13310@glacier.arctrix.com>
Message-ID:

Neil Schemenauer writes:

> I bring this up because "SMTP time filtering" makes a bypass
> mechanism work much better. With a system like TMDA, confirmation
> notices usually generate double-bounces. Instead, we could reject
> the message with a 5xx error that includes instructions on how to
> bypass the filter (e.g. include a cookie in the body of the
> message).

TMDA doesn't do this because it would make more work for the sender to
get his message delivered. Because TMDA stores the incoming messages in
a local queue, the sender just has to reply to a confirmation request,
and his original message gets delivered. As opposed to having to cut
and paste his message from the body of a bounce and then resend it.

So, not operating at the transport level saves your correspondents some
work at the expense of some bandwidth.

--
(http://tmda.net/)

From aahz@pythoncraft.com Wed Sep 4 00:49:01 2002
From: aahz@pythoncraft.com (Aahz)
Date: Tue, 3 Sep 2002 19:49:01 -0400
Subject: [Python-Dev] mysterious hangs in socket code
In-Reply-To: <15733.12138.568668.562013@slothrop.zope.com>
References: <15733.12138.568668.562013@slothrop.zope.com>
Message-ID: <20020903234901.GA29756@panix.com>

On Tue, Sep 03, 2002, Jeremy Hylton wrote:
>
> I've been running a small, multi-threaded program to retrieve web
> pages today. The entire program appears to hang when I perform a slow
> DNS operation, even though there is no application-level coordination
> between the threads.
gethostbyname() IIRC has frequently been non-reentrant. It might be
related.
--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/

From pinard@iro.umontreal.ca Wed Sep 4 01:31:50 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: Tue, 03 Sep 2002 20:31:50 -0400
Subject: [Python-Dev] Nit about `setdefault' documentation
Message-ID:

Quite a small nit. Reading:

---------------------------------------------------------------------->
>>> help({}.setdefault)
Help on built-in function setdefault:

setdefault(...)
    D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if not D.has_key(k)
----------------------------------------------------------------------<

I wonder if writing the last line as:

---------------------------------------------------------------------->
    D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
----------------------------------------------------------------------<

would not better represent current Python fashion. :-)

--
François Pinard   http://www.iro.umontreal.ca/~pinard

From sholden@holdenweb.com Wed Sep 4 01:31:52 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Tue, 3 Sep 2002 20:31:52 -0400
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
References: <200209031653.g83GrjQ01929@odiug.zope.com> <200209032018.g83KI3q08343@odiug.zope.com> <17d001c2538d$f82650f0$1c86db41@boostconsulting.com>
Message-ID: <008201c253aa$780144d0$6300000a@holdenweb.com>

[Guido]
> >
> > Ooh, much better. Still, put this at the end instead of at the top of
> > the message. It's not *that* interesting.
> >
[David]
> Turn it sideways and it'll get smaller...
>
... but no more interesting. Couldn't we just have a web page where this
statistic was available sliced and diced according to requirements? It
looks especially bad in my standard mailreader variable-pitch font.

The summary itself, however, looks excellent.
regards
-----------------------------------------------------------------------
Steve Holden                                  http://www.holdenweb.com/
Python Web Programming                        pydish.holdenweb.com/pwp/
Previous .sig file retired to                    www.homeforoldsigs.com
-----------------------------------------------------------------------

From tim.one@comcast.net Wed Sep 4 02:06:43 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 03 Sep 2002 21:06:43 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: <20020903134112.GC1227@cthulhu.gerg.ca>
Message-ID:

[Greg Ward]
> ...
> Just how many messages fall in that grey area anyways?

Heh. Here's the probability distribution for the 4000 ham messages in my
first test pair:

Ham distribution for this pair:
* = 67 items
 0.00  4000 ************************************************************
 2.50     0
 5.00     0
 7.50     0
10.00     0
12.50     0
15.00     0
17.50     0
20.00     0
22.50     0
25.00     0
27.50     0
30.00     0
32.50     0
35.00     0
37.50     0
40.00     0
42.50     0
45.00     0
47.50     0
50.00     0
52.50     0
55.00     0
57.50     0
60.00     0
62.50     0
65.00     0
67.50     0
70.00     0
72.50     0
75.00     0
77.50     0
80.00     0
82.50     0
85.00     0
87.50     0
90.00     0
92.50     0
95.00     0
97.50     0

That is, they *all* got a "probability score" less than 2.5% (0.025).

Here's the spam probability distribution across the same run:

Spam distribution for this pair:
* = 46 items
 0.00     5 *
 2.50     2 *
 5.00     1 *
 7.50     0
10.00     0
12.50     0
15.00     1 *
17.50     0
20.00     1 *
22.50     0
25.00     2 *
27.50     1 *
30.00     0
32.50     1 *
35.00     0
37.50     0
40.00     0
42.50     0
45.00     1 *
47.50     1 *
50.00     1 *
52.50     0
55.00     0
57.50     1 *
60.00     3 *
62.50     0
65.00     2 *
67.50     0
70.00     0
72.50     0
75.00     1 *
77.50     1 *
80.00     0
82.50     0
85.00     0
87.50     0
90.00     3 *
92.50     1 *
95.00     6 *
97.50  2715 ************************************************************

IOW, a spam usually scored at least 0.975 on this run, but some spams
scored under 0.025. There's very little "in the middle".

I've got 19 more sets like this if you care a lot.
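
The listings above are easy to reproduce. What follows is only a sketch that is consistent with the numbers shown, not the actual GBayes code: the function name histogram, the 40 buckets, and the 60-column bar width are assumptions read off the output (4000 items at "* = 67 items" and 2715 items at "* = 46 items" both come out to 60 stars). Scores are assumed to lie in [0.0, 1.0].

```python
def histogram(scores, nbuckets=40, width=60):
    """Bucket scores in [0.0, 1.0] and render one text line per bucket.

    Returns (counts, lines). The first line says how many items one
    star stands for; every non-empty bucket gets at least one star.
    """
    counts = [0] * nbuckets
    for s in scores:
        # A score of exactly 1.0 belongs in the last bucket.
        i = min(int(s * nbuckets), nbuckets - 1)
        counts[i] += 1
    # Items per star, rounded up so the tallest bar fits in `width`.
    per_star = max(1, (max(counts) + width - 1) // width)
    lines = ["* = %d items" % per_star]
    for i, n in enumerate(counts):
        stars = "*" * ((n + per_star - 1) // per_star)
        label = 100.0 * i / nbuckets   # lower bucket edge, as a percentage
        lines.append(("%5.2f %d %s" % (label, n, stars)).rstrip())
    return counts, lines

# 4000 ham messages that all score under 0.025, as in the first listing.
counts, lines = histogram([0.001] * 4000)
print("\n".join(lines[:3]))
```

Rounding the star count up rather than down is what makes a bucket holding a single message still show one star, which matters when the interesting cases are the handful of messages far from both ends.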
Here's the aggregate across all 20 runs (each msg is counted 4 times
here, once for each of the runs in which it served in the prediction set
against training on one of the 4 spam+ham collection pairs it doesn't
belong to):

Ham distribution for all runs:
* = 1333 items
 0.00 79938 ************************************************************
 2.50     8 *
 5.00     3 *
 7.50     0
10.00     3 *
12.50     1 *
15.00     3 *
17.50     1 *
20.00     1 *
22.50     0
25.00     0
27.50     0
30.00     1 *
32.50     4 *
35.00     2 *
37.50     0
40.00     2 *
42.50     0
45.00     1 *
47.50     1 *
50.00     1 *
52.50     0
55.00     0
57.50     0
60.00     0
62.50     1 *
65.00     0
67.50     0
70.00     2 *
72.50     0
75.00     1 *
77.50     1 *
80.00     0
82.50     0
85.00     1 *
87.50     1 *
90.00     0
92.50     1 *
95.00     1 *
97.50    21 *

Spam distribution for all runs:
* = 905 items
 0.00   215 *
 2.50    18 *
 5.00     8 *
 7.50    12 *
10.00     6 *
12.50     6 *
15.00    14 *
17.50     6 *
20.00    10 *
22.50     8 *
25.00     9 *
27.50     9 *
30.00     3 *
32.50     3 *
35.00     5 *
37.50     3 *
40.00     7 *
42.50    24 *
45.00     3 *
47.50    29 *
50.00    34 *
52.50     8 *
55.00     6 *
57.50    18 *
60.00    64 *
62.50    12 *
65.00     7 *
67.50     5 *
70.00     3 *
72.50     7 *
75.00     4 *
77.50    18 *
80.00    10 *
82.50    23 *
85.00    13 *
87.50    20 *
90.00    27 *
92.50    18 *
95.00    57 *
97.50 54256 ************************************************************

In percentage terms, very little lives outside the tips of the tail
ends. Note that calling the spam cutoff 0.975 instead of 0.90 would save
2 false positives, at the expense of letting an additional 27+18+57 =
102 spams go thru.

Here's the first example of a low-prob spam:

"""
Low prob spam!
0.0133104753792
Data/Spam/Set2/8007.txt
prob('from:email name:') = 0.0488301
prob('thanks,') = 0.0300188
prob('subject:Hey') = 0.99
prob('today') = 0.852792

Return-Path:
Delivered-To: bruce-spam@localhost
Received: (qmail 14409 invoked by alias); 6 Mar 2002 20:07:42 -0000
Delivered-To: spam@bruce-guenter.dyndns.org
Received: (qmail 14405 invoked from network); 6 Mar 2002 20:07:42 -0000
Received: from agamemnon.bfsmedia.com (204.83.201.2)
    by lorien.untroubled.org (192.168.1.3) with SMTP; 06 Mar 2002 20:07:42 -0000
Received: (qmail 13063 invoked by uid 500); 6 Mar 2002 20:02:05 -0000
Delivered-To: em-ca-spam@em.ca
Received: (qmail 13057 invoked by uid 502); 6 Mar 2002 20:02:05 -0000
Delivered-To: bfsmedia-goose.kennels@bfsmedia.com
Received: (qmail 13051 invoked from network); 6 Mar 2002 20:02:05 -0000
Received: from unknown (HELO smtp2.forserve.com) (63.170.11.221)
    by agamemnon.bfsmedia.com with SMTP; 6 Mar 2002 20:02:05 -0000
Date: Wed, 6 Mar 2002 15:12:41 -0500
Message-Id: <200203062012.g26KCfn08192@smtp2.forserve.com>
X-Mailer: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.1) Gecko/20010607
Reply-To:
From:
To:
Subject: Hey Fred
Content-Length: 95
Lines: 9

Fred,

It was nice to talk to you today

I will send the proposal tonight.

Thanks,
Heidi
"""

You figure it out. I suspect bfsmedia would have added a high spam score
if I looked at Received lines, but even several additional strong spam
indicators wouldn't be enough to nail this one. BTW, this msg shows up
many times in the spam corpora, varying the "Fred" and "Heidi" with
other male and female names; I assume this is a harvester that's trying
to provoke the recipient into replying.

Several others are damaged in ways such that the email pkg can't create
a msg out of them. I could easily enough add code to force such a msg to
be considered spam.

Some are wildly embarrassing failures:

"""
Low prob spam!
0.000102019995919
Data/Spam/Set3/681.txt
prob('common,') = 0.01
prob('definately') = 0.01
prob('logic') = 0.01
prob('hell,') = 0.01
prob('it".') = 0.01
prob('obvious.') = 0.01
prob('theory') = 0.01
prob('whilst') = 0.01
prob('earning') = 0.99
prob('same,') = 0.01
prob('$500,000') = 0.99
prob('"bull",') = 0.99
prob('year!!!') = 0.99
prob('internet!') = 0.99
prob('tv:') = 0.99
prob('*this') = 0.99

Return-Path:
Delivered-To: em-ca-bruceg@em.ca
Received: (qmail 25721 invoked from network); 17 Aug 2002 01:05:07 -0000
Received: from unknown (HELO 65.102.48.161) (65.102.48.161)
    by churchill.factcomp.com with SMTP; 17 Aug 2002 01:05:07 -0000
Received: from unknown (149.89.93.47)
    by rly-xr02.mx.aol.com with NNFMP; Aug, 17 2002 1:50:22 AM -0800
Received: from anther.webhostingtalk.com ([88.58.121.118])
    by da001d2020.lax-ca.osd.concentric.net with QMQP; Aug, 17 2002 12:40:13 AM -0700
Received: from 34.57.158.148 ([34.57.158.148])
    by rly-xr02.mx.aol.com with local; Aug, 17 2002 12:02:05 AM +0300
From: rnpyjohn
To: Undisclosed Recipients
Cc:
Subject: Please read this letter carefully, it works 100%
Sender: rnpyjohn
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Date: Sat, 17 Aug 2002 02:03:28 +0100
X-Mailer: The Bat! (v1.52f) Business
X-Priority: 1
Content-Length: 15985

*This is a one time mailing and this list will never be used again.*

Hi,

SEEN THIS MAIL BEFORE?, SICK OF FINDING IT IN YOUR INBOX? ME TOO, HONEST

I was exactly the same, till one day whilst i was complaining about how
tired i was of seeing
...
"""

The first 16 most extreme indicators are split 9 highly in favor of ham
(.01) and 7 highly in favor of spam (.99). If I hadn't folded case away
to let stinking conference announcements through, I expect it would have
latched on to the SCREAMING at the start instead of looking deeper.
Looking at the To: line probably would nail this one too, as
"Undisclosed Recipients" has two 0.99 spam indicators right there.
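
The arithmetic behind that score is short enough to check by hand. The sketch below assumes the combining rule from Graham's scheme, P / (P + Q) with P the product of the clue probabilities and Q the product of their complements. That assumption reproduces the 0.000102019995919 printed above for this message, but the function name graham_combine is made up here and this is not the GBayes source.

```python
def graham_combine(probs):
    """Combine per-clue spam probabilities as P / (P + Q), where P is the
    product of the p's and Q is the product of the (1 - p)'s."""
    p_spam = 1.0
    p_ham = 1.0
    for p in probs:
        p_spam *= p
        p_ham *= 1.0 - p
    return p_spam / (p_spam + p_ham)

# 9 strong ham clues (.01) against 7 strong spam clues (.99), as in the
# message above: the two surplus ham clues swamp everything else.
embarrassing = graham_combine([0.01] * 9 + [0.99] * 7)

# 8 spam clues against 7 ham clues: a single surplus clue decides it.
odd_split = graham_combine([0.99] * 8 + [0.01] * 7)

# An even split cancels out to a tie at one half.
tie = graham_combine([0.99] * 8 + [0.01] * 8)

print(embarrassing, odd_split, tie)
```

With only .01 and .99 in play, matched pairs cancel and the surplus clues alone set the score: two surplus ham clues give 1/(1 + 99**2) = 1/9802, one surplus spam clue gives 0.99, and an even count gives 0.5, which falls below a 0.90 spam cutoff and so counts as ham.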
Whatever, you *don't* want to look at msgs with a mix of just 0.99 and
0.01 thingies: it's not all that unusual to get such an extreme mix, in
spam or ham.

this-isn't-your-father's-idea-of-probability-ly y'rs - tim

From barry@python.org Wed Sep 4 02:35:27 2002
From: barry@python.org (Barry A. Warsaw)
Date: Tue, 3 Sep 2002 21:35:27 -0400
Subject: [Python-Dev] mysterious hangs in socket code
References: <15733.12138.568668.562013@slothrop.zope.com>
Message-ID: <15733.25439.461968.51583@anthem.wooz.org>

>>>>> "JH" == Jeremy Hylton writes:

    JH> I've been running a small, multi-threaded program to retrieve
    JH> web pages today. The entire program appears to hang when I
    JH> perform a slow DNS operation, even though there is no
    JH> application-level coordination between the threads.

Does strace'ing the program provide any clues? Also, if it's a DNS
thing, you should definitely try to run it on different networks (or at
least pointing to different DNS servers).

Ok, running it now as "strace python foo.py" (Py2.2.1) and I see similar
behavior. It seems to mostly be sitting in select() calls and
rt_sigsuspend() which I guess is a wrapper around sigsuspend().

When I use Python 2.1.3 I never see it sit in sigsuspend().

-Barry

From pinard@iro.umontreal.ca Wed Sep 4 02:39:44 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: Tue, 03 Sep 2002 21:39:44 -0400
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
In-Reply-To: <008201c253aa$780144d0$6300000a@holdenweb.com> ("Steve Holden"'s message of "Tue, 3 Sep 2002 20:31:52 -0400")
References: <200209031653.g83GrjQ01929@odiug.zope.com> <200209032018.g83KI3q08343@odiug.zope.com> <17d001c2538d$f82650f0$1c86db41@boostconsulting.com> <008201c253aa$780144d0$6300000a@holdenweb.com>
Message-ID:

[Steve Holden]

> It looks especially bad in my standard mailreader variable-pitch font.

Oh! You are touching a sensitive nerve!
:-)

There are many cases where people do ASCII art in messages, and I'm not
speaking of signatures here. People often insert ASCII tables or simple
explanatory drawings; these capabilities are useful enough not to be
dismissed. You should use fixed width fonts when receiving, and even
when sending email. (And people should limit their messages to 79
columns.) If something looks bad because of your variable-pitch fonts,
the problem is emphatically _not_ in the sent message, and does not
justify any alteration to the format of those messages.

Another example is the fact that many fonts nowadays decided to improve
over ASCII, and have an apostrophe which is not symmetrical to a grave
accent. By design and since ASCII 1, long ago, they should be
symmetrical. A few people push for everybody to stop `quoting' like
this. I strongly believe that for displaying ASCII text, people should
use ASCII fonts. If fonts are wrong, and even though many fonts are
wrong, this should not be seen as the sender's problem.

The push is sometimes accompanied with the suggestion of switching to
Unicode all over, as a way to avoid the problem. It is surely a good
idea, but we are not there yet. In the meantime, ASCII stays ASCII.

--
François Pinard   http://www.iro.umontreal.ca/~pinard

From fredrik@pythonware.com Wed Sep 4 06:41:42 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 4 Sep 2002 07:41:42 +0200
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
References: <200209031653.g83GrjQ01929@odiug.zope.com> <200209032018.g83KI3q08343@odiug.zope.com> <17d001c2538d$f82650f0$1c86db41@boostconsulting.com> <008201c253aa$780144d0$6300000a@holdenweb.com>
Message-ID: <005601c253d5$d0a63c50$ced241d5@hagrid>

François Pinard wrote:

> [Steve Holden]
>
> > It looks especially bad in my standard mailreader variable-pitch font.
>
> Oh! You are touching a sensitive nerve!
> :-)
>
> There are many cases where people do ASCII art in messages, and I'm not
> speaking of signatures here. People often insert ASCII tables or simple
> explanatory drawings; these capabilities are useful enough not to be
> dismissed. You should use fixed width fonts when receiving, and even when
> sending email.

loser.

if python really was all about "everything computers did when I learned
to use them will always be the best way to do it", it would probably
never have been invented. and this mailing list is about python.

From oren-py-d@hishome.net Wed Sep 4 10:49:47 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Wed, 4 Sep 2002 05:49:47 -0400
Subject: [Python-Dev] Two random and nearly unrelated ideas
In-Reply-To: <15733.11253.743055.864572@12-248-11-90.client.attbi.com>
References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com>
Message-ID: <20020904094947.GA56953@hishome.net>

On Tue, Sep 03, 2002 at 04:39:01PM -0500, Skip Montanaro wrote:
> Second (also considered during the above edit), it would be nice to get rid
> of the ticker altogether in systems with proper signal support. On those
> platforms couldn't an alarm replace polling for the ticker?

Not before all Python I/O calls are converted to be EINTR-safe.

After running into some problems with I/O interrupted by signals I tried
to fix it myself but it requires a lot of work in some of the hairiest
places in the Python codebase.

Oren

From fredrik@pythonware.com Wed Sep 4 12:22:26 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 4 Sep 2002 13:22:26 +0200
Subject: [Python-Dev] Two random and nearly unrelated ideas
References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net>
Message-ID: <001b01c25405$5a6da520$0900a8c0@spiff>

oren wrote:

> Not before all Python I/O calls are converted to be EINTR-safe.
>
> After running into some problems with I/O interrupted by signals I tried to
> fix it myself but it requires a lot of work in some of the hairiest places
> in the Python codebase.

sounds like a good topic for a "here's what I learned when trying
to fix this problem" PEP.

From guido@python.org Wed Sep 4 12:24:16 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 04 Sep 2002 07:24:16 -0400
Subject: [Python-Dev] Should KeyError use repr() on its argument?
In-Reply-To: Your message of "Tue, 03 Sep 2002 16:29:32 PDT."
References:
Message-ID: <200209041124.g84BOHY03377@pcp02138704pcs.reston01.va.comcast.net>

> > > The KeyError exception doesn't apply repr() to its argument. That's
> > > annoying in cases like this:
> > >
> > >     >>> a = {}
> > >     >>> a['']
> > >     Traceback (most recent call last):
> > >       File "<stdin>", line 1, in ?
> > >     KeyError
> > >     >>>
> > >
> > > Should this be fixed? How? (I guess we could add a KeyError__str__
> > > method to exceptions.c that applies repr().)
> > >
> > > I've got a feeling this is a feature, but not a very useful one.
> >
> > I take it back. args[0] being the actual key that failed is a
> > feature. str() not using repr() on args[0] is a bug. I'll fix it.
> >
>
> What is args[0]?

args is the name of the instance variable that most exceptions use to
store the arguments that were passed to them in the raise statement (or
equivalent C API). It is a tuple. Examples:

    >>> a = KeyError()
    >>> a.args
    ()
    >>> a = KeyError(1)
    >>> a.args
    (1,)
    >>> a = KeyError(1,2,3)
    >>> a.args
    (1, 2, 3)
    >>> try:
            {}['']
        except KeyError, k:
            print k.args

    ('',)
    >>>

> Are you saying that dicts use repr() instead of str() to
> get the key value when accessing?

No, I'm saying that str(KeyError('foo')) should return repr('foo')
rather than 'foo' as it does now. See current CVS.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Sep 4 12:44:32 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 07:44:32 -0400 Subject: [Python-Dev] Two random and nearly unrelated ideas In-Reply-To: Your message of "Wed, 04 Sep 2002 05:49:47 EDT." <20020904094947.GA56953@hishome.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> Message-ID: <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> > > Second (also considered during the above edit), it would be nice to get rid > > of the ticker altogether in systems with proper signal support. On those > > platforms couldn't an alarm replace polling for the ticker? > > Not before all all Python I/O calls are converted to be EINTR-safe. > > After running into some problems with I/O interrupted by signals I tried to > fix it myself but it requires a lot of work in some of the hairiest places > in the Python codebase. Signals: just say no. It is impossible to write correct code in the presence of signals. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Sep 4 12:49:15 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 07:49:15 -0400 Subject: [Python-Dev] mysterious hangs in socket code In-Reply-To: Your message of "Tue, 03 Sep 2002 17:53:46 EDT." <15733.12138.568668.562013@slothrop.zope.com> References: <15733.12138.568668.562013@slothrop.zope.com> Message-ID: <200209041149.g84BnFV05659@pcp02138704pcs.reston01.va.comcast.net> > One possibility is that the Linux getaddrinfo() is thread-safe, but > only by way of a lock that only allows one request to be outstanding > at a time. The next step should be to get the getaddrinfo() source code from glibc and see what it does. It's open source, hey. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Sep 4 12:51:10 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 07:51:10 -0400 Subject: [Python-Dev] Two random and nearly unrelated ideas In-Reply-To: Your message of "Tue, 03 Sep 2002 16:39:01 CDT." <15733.11253.743055.864572@12-248-11-90.client.attbi.com> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> Message-ID: <200209041151.g84BpAg05683@pcp02138704pcs.reston01.va.comcast.net> > While adding a blurb to Misc/NEWS about the change to the thread > ticker and check interval, it occurred to me that perhaps Misc/NEWS > would benefit from conversion to ReST format. You could pump an > HTML version out to the website periodically. Nice idea. How much additional mark-up would this add to quote the occasional reST meta-character? Can you convert a section for test and show me? > Second (also considered during the above edit), it would be nice to > get rid of the ticker altogether in systems with proper signal > support. On those platforms couldn't an alarm replace polling for > the ticker? I know signals are tricky devils, but it still seems it > would be a win if you could use it. You'd have to install a SIGALRM > handler which would trip periodically. It would also have to keep > track of any alarm handler the programmer installed. -1,000,000. --Guido van Rossum (home page: http://www.python.org/~guido/) From praveen.patil@silver-software.com Wed Sep 4 13:31:00 2002 From: praveen.patil@silver-software.com (Praveen Patil) Date: Wed, 4 Sep 2002 13:31:00 +0100 Subject: [Python-Dev] Please help in calling python fucntion from 'c' Message-ID: This is a multi-part message in MIME format. ------=_NextPart_000_0011_01C25417.4EC8F910 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Hi, I have written 'C' dll(MY_DLL.DLL) . I am importing 'C' dll in python file(example.py). 
I want to call python function from 'c' function. For your reference I have attached 'c' and python files to this mail. In my pc: python code is under the directory D:\test\example.py dll is under the directory C:\Program Files\Python\DLLs\MY_DLL.pyd Here are the steps I am following. step(1): I am calling 'C' function(RECEIVE_FROM_IL_S) from python. This 'C' function is existing imported dll(MY_DLL). step(2): I want to call python function(TestFunction) from 'C' function(RECEIVE_FROM_IL_S). Python code is(example.py) :- ---------------------------- import MY_DLL G_Logfile = None def TestFunction(): G_Logfile = open('Pytestfile.txt', 'w') G_Logfile.write("%s \n"%'I am writing python created text file') G_Logfile.close G_Logfile = None #end def TestFunction if __name__ == "__main__": MY_DLL.RECEIVE_FROM_IL_S(10,50) 'C' code is (MY_DLL.c) :- --------------------- #include #include #include PyObject* _wrap_RECEIVE_FROM_IL_S(PyObject *self, PyObject *args) { FILE* fp; PyObject* _resultobj; int i,j; if( !(PyArg_ParseTuple(args, "ii",&i,&j))) { return NULL; } fp= fopen("RECEIVE_IL_S.txt", "w"); fprintf(fp, "i=%d j=%d" , i,j); fclose(fp); /* Here I want to call python function(TestFunction). Please suggest me some solution*/ _resultobj = Py_None; return _resultobj; } static PyMethodDef MY_DLL_methods[] = { { "RECEIVE_FROM_IL_S", _wrap_RECEIVE_FROM_IL_S, METH_VARARGS }, { NULL , NULL} }; __declspec(dllexport) void __cdecl initMY_DLL(void) { Py_InitModule("MY_DLL",MY_DLL_methods); } Please anybody help me solving the problem. Cheers, Praveen. 
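Separately from the C-side question, the Python function in the message never closes its file: `G_Logfile.close` names the method without calling it. A minimal sketch of the Python side follows, assuming the intent is simply to write one line and close the file; `write_log` and its `path` parameter are hypothetical names, not part of the original code.

```python
def write_log(path="Pytestfile.txt"):
    """Write one line to `path` and close the file reliably."""
    # The with-statement closes the file even if write() raises, which
    # replaces the bare `G_Logfile.close` (no call parentheses) above.
    with open(path, "w") as logfile:
        logfile.write("%s \n" % "I am writing python created text file")
    return path
```

This leaves open how the C function obtains and calls the Python callable, which is the actual question being asked.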
From oren-py-d@hishome.net Wed Sep 4 13:46:46 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 4 Sep 2002 08:46:46 -0400 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020904124646.GA79746@hishome.net> On Wed, Sep 04, 2002 at 07:44:32AM -0400, Guido van Rossum wrote: > > > Second (also considered during the above edit), it would be nice to get rid > > > of the ticker altogether in systems with proper signal support. On those > > > platforms couldn't an alarm replace polling for the ticker? > > > > Not before all all Python I/O calls are converted to be EINTR-safe. > > > > After running into some problems with I/O interrupted by signals I tried to > > fix it myself but it requires a lot of work in some of the hairiest places > > in the Python codebase. > > Signals: just say no. It is impossible to write correct code in the > presence of signals. Wrapping all I/O calls with PyOS_ wrappers would be a good start. After that the wrappers can be modified to retry the call on EINTR. This should solve all the problems I have encountered with interference to Python code by signals. Any other problems I should be aware of?
Oren From guido@python.org Wed Sep 4 14:25:01 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 09:25:01 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Wed, 04 Sep 2002 08:46:46 EDT." <20020904124646.GA79746@hishome.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> Message-ID: <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> > > Signals: just say no. It is impossible to write correct code in the > > presence of signals. > > Wrapping all I/O calls with PyOS_ wrappers would be a good start. And what should those wrappers do? > After that the wrappers can be modified to retry the call on EINTR. But that's not always what you want to happen! E.g. if an app is blocked on a read and uses an alarm to bail out of the read. > This should solve all the problems I have encountered with > interference to Python code by signals. Any other problems I should > be aware of? There's no way to sufficiently test a program that uses signals. The signal handler cannot touch *any* data, which makes it pretty useless. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Sep 4 15:45:51 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 4 Sep 2002 09:45:51 -0500 Subject: [Python-Dev] Two random and nearly unrelated ideas In-Reply-To: <20020904094947.GA56953@hishome.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> Message-ID: <15734.7327.163001.51042@12-248-11-90.client.attbi.com> >> On those platforms couldn't an alarm replace polling for the ticker? Oren> Not before all all Python I/O calls are converted to be Oren> EINTR-safe. Ah, yes. Thanks for pointing out that little stumbling block... 
Skip From oren-py-d@hishome.net Wed Sep 4 17:01:43 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 4 Sep 2002 12:01:43 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020904160143.GA1483@hishome.net> On Wed, Sep 04, 2002 at 09:25:01AM -0400, Guido van Rossum wrote: > > After that the wrappers can be modified to retry the call on EINTR. > > But that's not always what you want to happen! E.g. if an app is > blocked on a read and uses an alarm to bail out of the read. If I use a module that spawns an external process and uses SIGCHLD to be informed of its termination why should my innocent code that just reads lines from a file suddenly break? In C I can at least restart the operation after an EINTR but file.readline cannot even be properly restarted because the buffering and file position is all messed up. The example you gave of bailing out of a read with a signal can be done using other techniques such as non-blocking I/O (which is, IMHO, a much cleaner way to do it). Getting an notification of a child process terminating or other asynchronous events can only be done using signals and is currently dangerous because it will break code using I/O. > > interference to Python code by signals. Any other problems I should > > be aware of? > > There's no way to sufficiently test a program that uses signals. The > signal handler cannot touch *any* data, which makes it pretty useless. In order to be useful a signal handler needs to be able to set one bit. The next time the ticker expires this bit will be checked. 
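The "set one bit, check it later" pattern can be sketched at the Python level as follows. This is only an illustration of the idea, not the interpreter's internal mechanism; `note_signal`, `take_pending` and `install` are hypothetical names.

```python
import signal

# The "one bit": a single flag the handler sets and the main loop clears.
_pending = {"flag": False}

def note_signal(signum, frame):
    # Do no real work in the handler; only record that a signal arrived.
    _pending["flag"] = True

def take_pending():
    """Return True if a signal arrived since the last call, clearing the flag."""
    if _pending["flag"]:
        _pending["flag"] = False
        return True
    return False

def install(signum):
    # Register the flag-setting handler for the given signal number.
    signal.signal(signum, note_signal)
```

A main loop would call `take_pending()` at a point where it is safe to act, for example between requests, so the real work happens outside the handler.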
If an I/O operation was interrupted the Python signal handler can be executed immediately from the wrapper. When it returns the wrapper will resume the interrupted operation. Oren I/O, I/O, it's off to work we go... The seven dwarfs From oren-py-d@hishome.net Wed Sep 4 19:51:31 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 4 Sep 2002 21:51:31 +0300 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: ; from bac@OCF.Berkeley.EDU on Wed, Sep 04, 2002 at 11:04:27AM -0700 References: <20020904094947.GA56953@hishome.net> Message-ID: <20020904215131.A12898@hishome.net> On Wed, Sep 04, 2002 at 11:04:27AM -0700, Brett Cannon wrote: > [Oren Tirosh] > > > > > > Not before all all Python I/O calls are converted to be EINTR-safe. > > what is EINTER-safe? When an I/O operation is interrupted by an unmasked signal it returns with errno==EINTR. The state of the file is not affected and repeating the operation should recover and continue with no loss of data. Here is an EINTR-safe version of read: ssize_t safe_read(int fd, void *buf, size_t count) { ssize_t result; do { result = read(fd, buf, count); } while (result == -1 && errno == EINTR); return result; } When exposing the C I/O calls to Python you can either: 1. Use EINTR-safe I/O and hide this from the user. 2. Pass on EINTR to the user. Python currently does #2 with a big caveat - the internal buffering of functions like file.read or file.readline is messed up and cannot be cleanly restarted. This makes signals unusable for delivery of asynchronous events in the background without affecting the state of the main program. Oren From guido@python.org Wed Sep 4 20:10:15 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 15:10:15 -0400 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Wed, 04 Sep 2002 21:51:31 +0300." 
<20020904215131.A12898@hishome.net> References: <20020904094947.GA56953@hishome.net> <20020904215131.A12898@hishome.net> Message-ID: <200209041910.g84JAGR08004@pcp02138704pcs.reston01.va.comcast.net> From guido@python.org Wed Sep 4 20:16:25 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 15:16:25 -0400 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Wed, 04 Sep 2002 21:51:31 +0300." <20020904215131.A12898@hishome.net> References: <20020904094947.GA56953@hishome.net> <20020904215131.A12898@hishome.net> Message-ID: <200209041916.g84JGPd08031@pcp02138704pcs.reston01.va.comcast.net> > > what is EINTER-safe? > > When an I/O operation is interrupted by an unmasked signal it returns > with errno==EINTR. The state of the file is not affected and repeating > the operation should recover and continue with no loss of data. What if the operation is a select() call? Is restarting the right thing? How to take into account the consumed portion of the timeout, if given? > Here is an EINTR-safe version of read: > > ssize_t safe_read(int fd, void *buf, size_t count) { > ssize_t result; > do { > result = read(fd, buf, count); > } while (result == -1 && errno == EINTR); > return result; > } > > When exposing the C I/O calls to Python you can either: > > 1. Use EINTR-safe I/O and hide this from the user. > 2. Pass on EINTR to the user. > > Python currently does #2 with a big caveat - the internal buffering > of functions like file.read or file.readline is messed up and cannot be > cleanly restarted. This makes signals unusable for delivery of asynchronous > events in the background without affecting the state of the main program. Can you point to a place in the code where this is happening? Or is this a stdio problem? I believe that calls like fgets() and getchar() don't lose data, but maybe I misunderstand your observation. 
As I said before, I'm very skeptical that making the I/O ops EINTR-safe would be enough to allow the use of signals as suggested by Skip, but that might still be useful for other purposes, *if* we can decide when to honor EINTR and when not. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Wed Sep 4 20:22:47 2002 From: nas@python.ca (Neil Schemenauer) Date: Wed, 4 Sep 2002 12:22:47 -0700 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209041916.g84JGPd08031@pcp02138704pcs.reston01.va.comcast.net> References: <20020904094947.GA56953@hishome.net> <20020904215131.A12898@hishome.net> <200209041916.g84JGPd08031@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020904192247.GA16797@glacier.arctrix.com> Guido van Rossum wrote: > What if the operation is a select() call? Is restarting the right > thing? How to take into account the consumed portion of the timeout, > if given? I think you would not restart select(). It's only a hint anyhow. Neil From oren-py-d@hishome.net Wed Sep 4 21:07:09 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 4 Sep 2002 23:07:09 +0300 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209041916.g84JGPd08031@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Wed, Sep 04, 2002 at 03:16:25PM -0400 References: <20020904094947.GA56953@hishome.net> <20020904215131.A12898@hishome.net> <200209041916.g84JGPd08031@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020904230709.A24623@hishome.net> On Wed, Sep 04, 2002 at 03:16:25PM -0400, Guido van Rossum wrote: > > When an I/O operation is interrupted by an unmasked signal it returns > > with errno==EINTR. The state of the file is not affected and repeating > > the operation should recover and continue with no loss of data. > > What if the operation is a select() call? Is restarting the right > thing?
How to take into account the consumed portion of the timeout, > if given? Some versions of select update the timeout structure to the remainder if they are interrupted by a signal. It's probably not a good idea to rely on this so gettimeofday could be used to calculate the remainder. > Or is this a stdio problem? I believe that calls like fgets() and > getchar() don't lose data, but maybe I misunderstand your observation. This is not the point - even if Python I/O calls were fully restartable would you actually expect people to check for EINTR and restart for *every* I/O operation in the program just in case some module happens to use signals? Instead of

    for line in file:
        do_something_with(line)

we would need to write

    while 1:
        try:
            line = file.next()
        except IOError, exc:
            if exc.errno == errno.EINTR:
                continue
            else:
                raise
        except StopIteration:
            break
        do_something_with(line)

> As I said before, I'm very skeptical that making the I/O ops > EINTR-safe would be enough to allow the use of signals as suggested by > Skip If it's good enough for other purposes it should be good enough for Skip's proposal, too. > Skip, but that might still be useful for other purposes, *if* we can > decide when to honor EINTR and when not. Only low-level functions like os.read and os.write that map directly to stdio functions should ever return EINTR. To make Python signal-safe all other calls that can return EINTR should have a retry loop. On EINTR they should check if there are things to do and if so grab the GIL, make pending calls, release the GIL and retry the operation (unless an exception has been raised by the signal handler, of course).
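The retry loop proposed here can be sketched in Python as a small wrapper. It uses Python 3 syntax, and `retry_on_eintr` is a hypothetical helper, not an existing API. The sketch leaves out the "make pending calls" step and only shows the retry-unless-another-error shape.

```python
import errno

def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), retrying while it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                # Any other error is a real failure; pass it on.
                raise
            # Interrupted by a signal before completing: try again.
```

Usage would look like `data = retry_on_eintr(os.read, fd, 4096)`. If a Python-level signal handler raises its own exception, that exception is not an EINTR error and so ends the loop, which matches the "unless an exception has been raised" condition. Later Python versions (3.5 and up, per PEP 475) perform this retry inside the standard library, so the wrapper is mainly illustrative today.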
This way I could finally write a Python daemon that reloads its configuration files on getting the customary SIGHUP :-) Oren From guido@python.org Wed Sep 4 21:05:22 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 16:05:22 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Wed, 04 Sep 2002 12:01:43 EDT." <20020904160143.GA1483@hishome.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> <20020904160143.GA1483@hishome.net> Message-ID: <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> > If I use a module that spawns an external process and uses SIGCHLD to be > informed of its termination why should my innocent code that just reads > lines from a file suddenly break? In C I can at least restart the > operation after an EINTR but file.readline cannot even be properly > restarted because the buffering and file position is all messed up. I have never understood why a child dying should send a signal. You can poll for the child with waitpid() instead. But if you have a suggestion for how to fix this particular issue, I'd be happy to look it over, since this *is* something some people do. > The example you gave of bailing out of a read with a signal can be done > using other techniques such as non-blocking I/O (which is, IMHO, a much > cleaner way to do it). Yes. > Getting an notification of a child process terminating or other > asynchronous events can only be done using signals and is currently > dangerous because it will break code using I/O. See above. I see half your point; people wanting this tend to use signals and it causes breakage. > > > interference to Python code by signals. Any other problems I should > > > be aware of? 
> > > > There's no way to sufficiently test a program that uses signals. The > > signal handler cannot touch *any* data, which makes it pretty useless. > > In order to be useful a signal handler needs to be able to set one bit. > The next time the ticker expires this bit will be checked. OK. > If an I/O operation was interrupted the Python signal handler can be > executed immediately from the wrapper. When it returns the wrapper > will resume the interrupted operation. Is calling the Python signal handler from the wrapper always safe? What if the Python signal handler e.g. closes the file or reads from it? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Sep 4 21:24:04 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 16:24:04 -0400 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Wed, 04 Sep 2002 23:07:09 +0300." <20020904230709.A24623@hishome.net> References: <20020904094947.GA56953@hishome.net> <20020904215131.A12898@hishome.net> <200209041916.g84JGPd08031@pcp02138704pcs.reston01.va.comcast.net> <20020904230709.A24623@hishome.net> Message-ID: <200209042024.g84KO4G08242@pcp02138704pcs.reston01.va.comcast.net> > > What if the operation is a select() call? Is restarting the right > > thing? How to take into account the consumed portion of the timeout, > > if given? > > Some versions of select update the timeout structure to the remainder if > they are interrupted by a signal. It's probably not a good idea to rely > on this so gettimeofday could be used to calculate the remainder. I like Neil's suggestion: simply return. The timeout is a hint. > > Or is this a stdio problem? I believe that calls like fgets() and > > getchar() don't lose data, but maybe I misunderstand your observation. 
> > This is not the point - even if Python I/O calls were fully restartable > would you actually expect people to check for EINTR and restart for > *every* I/O operation in the program just in case some module happens to > use signals? > > Instead of > > for line in file: > do_something_with(line) > > we would need to write > > while 1: > try: > line = file.next() > except IOError, exc: > if exc.errno == errno.EINTR: > continue > else: > raise > except StopIteration: > break > do_something_with(line) OK, but you're changing your tune here. I agree that this is bad, but I still don't believe (or understand) your previous remark about readline losing track of buffering. But let's forget about this, I trust that you really meant what you showed here. > > As I said before, I'm very skeptical that making the I/O ops > > EINTR-safe would be enough to allow the use of signals as > > suggested by Skip > > If it's good enough for other purposes it should be good enough for > Skip's proposal, too. Well, it has to be *perfect* for Skip's proposal, since it means we'd be generating signals probably at a rate of 100 per second. > > Skip, but that might still be useful for other purposes, *if* we can > > decide when to honor EINTR and when not. > > Only low-level functions like os.read and os.write that map directly > to stdio functions should ever return EINTR. Um, os.read/write are the ones that *don't* map to stdio. Maybe you meant "that map directly to file descriptors"? But I doubt this would be acceptable -- if we were generating 100 signals per second, os.read/write become much harder to use if they could raise EINTR (currently they only raise EINTR if the app uses signal handlers, which isn't that common). > To make Python signal-safe all other calls that can return EINTR > should have a retry loop. 
On EINTR they should check if there are > things to do and if so grab the GIL, make pending calls, release the > GIL and retry the operation (unless an exception has been raised by > the signal handler, of course). > > This way I could finally write a Python daemon that reloads its > configuration files on getting the customary SIGHUP :-) If you really want that, maybe you could see if you can produce a working design and patch? Even if it's not perfect enough to use signals to replace the ticker, people who like to use signals would probably be happy. --Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@oratrix.com Wed Sep 4 21:45:30 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Wed, 4 Sep 2002 13:45:30 -0700 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <20020904215131.A12898@hishome.net> Message-ID: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> On woensdag, sep 4, 2002, at 11:51 US/Pacific, Oren Tirosh wrote: > When an I/O operation is interrupted by an unmasked signal it returns > with errno==EINTR. The state of the file is not affected and repeating > the operation should recover and continue with no loss of data. > I'm not sure about modern unixen (it's been a long time since I was interested in such lowlevel details) but historically this has been one complete mess. Aside from some unix variations that basically didn't do restart at all there have always been problems with signal restart semantics. For sockets and various devices (raw ttys, I think) you could definitely lose data. Hmm, and when I think of it I don't think it's even possible to restart safely. What if I do a read() on a socket, and I request more bytes than the available physical memory (but less than VM, of course)? The kernel simply doesn't have anywhere to store the bytes other than my buffer, and if it has to return EINTR then >POOF< these bytes are gone forever. 
From guido@python.org Wed Sep 4 21:48:11 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 16:48:11 -0400 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Wed, 04 Sep 2002 13:45:30 PDT." <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> Message-ID: <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> [Jack] > Hmm, and when I think of it I don't think it's even possible to restart > safely. What if I do a read() on a socket, and I request more bytes > than the available physical memory (but less than VM, of course)? The > kernel simply doesn't have anywhere to store the bytes other than my > buffer, and if it has to return EINTR then >POOF< these bytes are gone > forever. I think that if any bytes have already been copied into your buffer, you don't get an EINTR, you get a short read. --Guido van Rossum (home page: http://www.python.org/~guido/) From walter@livinglogic.de Wed Sep 4 22:21:40 2002 From: walter@livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Wed, 04 Sep 2002 23:21:40 +0200 Subject: [Python-Dev] mimetypes patch #554192 References: <3D5BEBB8.7080904@livinglogic.de> <15707.61612.844119.819432@anthem.wooz.org> <3D5CE38D.9080905@livinglogic.de> <3D5F9C2D.8010209@livinglogic.de> Message-ID: <3D767964.4090405@livinglogic.de> Martin v. Loewis wrote: > Walter Dörwald writes: > > >>>>Even better would be, if we could assign priorities to the mappings, >>>>so that for e.g. image/jpeg the preferred extension is .jpeg. >>>>Then guess_type() and guess_extension() would return the preferred >>>>mimetype/extension. >>> >>>Do you have a specific application for that in mind? It sounds like >>>overkill. 
>> >>I'm using a web mirror script which uses the extensions from >>guess_extension to save all downloaded resources, and I hate it >>when the HTML files are named .htm and JPEG images are named .jpe. > > Then this is your preference - others might prefer jpg, just because > their file system can deal better with that. If you can agree that > this is your preference, you should put the preference mechanism into > the application. Agreed, other applications might have other priorities. > Maybe your preference can be expressed algorithmically? It might be > that you always want the longest known extension (it is unlikely that > you prefer "jpeg" over "jpg" just because that contains a vowel :-). I guess it's "longest one" or "the one most unencumbered by filesystem limitations". OK, so lets drop the priority idea. What do we do with the patch as it is now? Bye, Walter Dörwald From pinard@iro.umontreal.ca Wed Sep 4 22:21:44 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: Wed, 04 Sep 2002 17:21:44 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Wed, 04 Sep 2002 16:48:11 -0400") References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > [Jack] >> Hmm, and when I think of it I don't think it's even possible to restart >> safely. What if I do a read() on a socket, and I request more bytes >> than the available physical memory (but less than VM, of course)? The >> kernel simply doesn't have anywhere to store the bytes other than my >> buffer, and if it has to return EINTR then >POOF< these bytes are gone >> forever. > > I think that if any bytes have already been copied into your buffer, > you don't get an EINTR, you get a short read. 
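The short-read behaviour described in the quoted reply means a caller that needs a fixed number of bytes has to loop until it has them. A minimal sketch, assuming a blocking file descriptor; `read_exactly` is a hypothetical helper name.

```python
import os

def read_exactly(fd, count):
    """Read up to `count` bytes from fd, looping over short reads.

    Returns fewer than `count` bytes only if end-of-file is reached.
    """
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:
            # An empty result means end-of-file: stop with what we have.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Because bytes already copied are returned as a short read rather than discarded, nothing is lost; the caller simply asks again for the remainder.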
I'm not fully familiar with all the details of this problem, though it has surely been in the air for quite a long time now (I might have first heard of it while Taylor UUCP was being developed). It might be dependent on the underlying system. If I'm not mistaken, it was Ian Taylor who introduced the following Autoconf macro:

 - Macro: AC_SYS_RESTARTABLE_SYSCALLS
     If the system automatically restarts a system call that is
     interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'.

In GNU file utilities (now merged within the new GNU coreutils), Jim Meyering uses restart wrappers for many I/O functions, so the idea of wrappers has been maturing for a while, and is used in basic, heavily used programs. However, I did not look at such wrappers recently. Python might probably wrap calls when these are restartable, or transmit the error upwards for systems where calls are not restartable.

--
François Pinard http://www.iro.umontreal.ca/~pinard

From python@rcn.com Wed Sep 4 22:40:34 2002
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 4 Sep 2002 17:40:34 -0400
Subject: [Python-Dev] Proposed Mixins for Wide Interfaces
References: <001101c2510d$9fce0920$5f66accf@othello> <200209031750.g83HoVq05812@odiug.zope.com>
Message-ID: <001801c2545b$b43aba60$e8ea7ad1@othello>

[RH]
> > How about adding some mixins to simplify the
> > implementation of some of the fatter interfaces?

[GvR]
> Can you suggest implementations for these, to be absolutely clear what
> you mean?
-- snip --
> What if the "natural" thing to implement is __le__ instead of __lt__?
> That's the case for sets. Or __gt__ (less likely)?

Yes. Here is some code
---------------------------

class CompareMixin:
    """
    Given an __eq__ method in a subclass, adds a __ne__ method
    Given __eq__ and __lt__, adds !=, <=, >, >=.
    If supplied, takes advantage of __le__ for speed.
    """
    def __eq__(self, other):
        raise NotImplementedError
    def __ne__(self, other):
        return not (self == other)
    def __lt__(self, other):
        raise NotImplementedError
    def __le__(self, other):
        # Default built from __lt__ and __eq__; override for speed.
        return self < other or self == other
    def __gt__(self, other):
        return not (self <= other)
    def __ge__(self, other):
        return not (self < other)

## Example from sets
import mixins

class BaseSet(object, mixins.CompareMixin):
    """Common base class for mutable and immutable sets."""

    __slots__ = ['_data']

    # . . .

    def issubset(self, other):
        """Report whether another set contains this set."""
        self._binary_sanity_check(other)
        if len(self) > len(other):  # Fast check for obvious cases
            return False
        otherdata = other._data
        for elt in self:
            if elt not in otherdata:
                return False
        return True

    def __eq__(self, other):
        self._binary_sanity_check(other)
        return self._data == other._data

    def __lt__(self, other):
        self._binary_sanity_check(other)
        return len(self) < len(other) and self.issubset(other)

    __le__ = issubset   # optional, but recommended for speed.

# Example where gt is the most natural implementation
class Anyhoo(CompareMixin):
    __eq__ = someBigEqualityTest
    __gt__ = someBigComplexOrderingFunction
    def __lt__(self, other):
        return not(self>other or self==other)

[RH]
> > class MappingMixin:
> >     """
> >     Given __setitem__, __getitem__, and keys,
> >     implements values, items, update, get, setdefault, len,
> >     iterkeys, iteritems, itervalues, has_key, and __contains__.
> >
> >     If __delitem__ is also supplied, implements clear, pop,
> >     and popitem.
> >
> >     Takes advantage of __iter__ if supplied (recommended).

[GvR]
> Does that mean that if you have __iter__, you don't use keys()? In
> that case it should implement keys() out of __iter__. Maybe this
> should be required.

Not really. keys() is always required. If __iter__ is supplied, then things like iterkeys(), iteritems(), and itervalues() get computed from __iter__ rather than keys().
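A minimal sketch of how such a mapping mixin could be built from `__getitem__`, `__setitem__` and `keys()`, written for present-day Python 3 (so the iter*/has_key methods are left out). `MappingMixinSketch` and `TinyStore` are hypothetical names invented for illustration, not code from the proposed module. Today `collections.abc.MutableMapping` offers a similar service, though it builds on `__iter__` and `__len__` rather than `keys()`.

```python
class MappingMixinSketch:
    """Hypothetical mixin: a subclass supplies __getitem__, __setitem__, keys()."""

    def __iter__(self):
        # Falls back on keys(); a subclass may override __iter__ with
        # something cheaper and the methods below will pick it up.
        return iter(self.keys())

    def __len__(self):
        return len(self.keys())

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def values(self):
        return [self[k] for k in self]

    def items(self):
        return [(k, self[k]) for k in self]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def setdefault(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            self[key] = default
            return default

    def update(self, other):
        for k in other.keys():
            self[k] = other[k]


class TinyStore(MappingMixinSketch):
    """Hypothetical narrow interface standing in for a database-style object."""

    def __init__(self):
        self._rows = []  # list of (key, value) pairs

    def __getitem__(self, key):
        for k, v in self._rows:
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        for i, (k, _) in enumerate(self._rows):
            if k == key:
                self._rows[i] = (key, value)
                return
        self._rows.append((key, value))

    def keys(self):
        return [k for k, _ in self._rows]
```

With only the three required methods written by hand, `TinyStore` also answers `in`, `len()`, `get()`, `items()` and so on, which is the widening effect described above.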
My thought on using keys() as part of the minimum specification is that database style interfaces always supply some type of list method. For instance, shelve can be instantly widened with the mixin, no other coding is required. OTOH, I'm not glued to the idea of using keys() as part of the minimum spec. [RH] > > Takes advantage of __contains__ or has_key if supplied > > (recommended). > > """ [GvR] > Let's standardize on __contains__, not has_key(). I guess you could > provide __contains__ as follows: Makes sense. [RH] > > The idea is to make it easier to implement these interfaces. > > Also, if the interfaces get expanded, the clients automatically > > updated. [GvR] > A similar thing for sequences would be useful too, right? Hmm, listing and concatenation beget repetition; len() and __getitem__() beget slicing. iteration and __cmp__ beget min(), max() For mutable sequences, supplying __setitem__ begets appending, extending, and slice assignment. Supplying __delitem__ begets pop(), remove() and slice deletion. For overachivers, the above are all that are needed for sort(), reverse(), index(), insert(), and count() Would you like me to create a mixin module and put it in the sandbox? Raymond Hettinger From pinard@iro.umontreal.ca Wed Sep 4 23:25:24 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: Wed, 04 Sep 2002 18:25:24 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: <005601c253d5$d0a63c50$ced241d5@hagrid> ("Fredrik Lundh"'s message of "Wed, 4 Sep 2002 07:41:42 +0200") References: <200209031653.g83GrjQ01929@odiug.zope.com> <200209032018.g83KI3q08343@odiug.zope.com> <17d001c2538d$f82650f0$1c86db41@boostconsulting.com> <008201c253aa$780144d0$6300000a@holdenweb.com> <005601c253d5$d0a63c50$ced241d5@hagrid> Message-ID: [Fredrik Lundh] > [...] and this mailing list is about python. Why did you reply to the mailing list, then? 
:-) > François Pinard wrote: >> [Steve Holden] >> > It looks especially bad in my standard mailreader variable-pitch font. >> [...] People often insert ASCII tables or simple explicative drawings, >> these capabilities are useful enough for not being dismissed. You should >> use fixed width fonts [...] > > loser. > > if python really was all about "everything computers did when I learned to > use them will always be the best way to do it", it would probably never have > been invented. Python did not build its success by trying to convince people that every else is wrong. It rather offered an environment in which participants happily considered they were gaining a lot. If someone breaks its screen appearance through selection of inappropriate fonts, he might gain some pleasure indeed while loosing the ability to read many existing messages. That's really his choice and preferences, he has to live with the drawbacks, without trying to convince senders that they are all wrong. Considering others as losers does not efficiently trigger progress. -- François Pinard http://www.iro.umontreal.ca/~pinard From hu.peress@mail.mcgill.ca Sat Sep 7 23:30:59 2002 From: hu.peress@mail.mcgill.ca (Hunter Peress) Date: 07 Sep 2002 17:30:59 -0500 Subject: [Python-Dev] Call for clarity Message-ID: <1031437860.636.29.camel@HillCountryPeress> I've been using python for a good few months. And im really bothered by some aspects of the documentation. I think that there should be a clear effort to provide API style information, rather than the mixed state that things currently are. There are tools for C++/Java...that are part of the official distributions that provide API style docs. Here's what gets me: when u look up something in pydoc, you have no idea what it returns/expects in terms of types. Now, since python is not an explicitely typed language, I ask rhetorically, how can u have good docs that tell u the return/input types without making the language explicitely typed? 
Make the documenation system explictely typed. The clarification needs to happen somewhere along the lines, and I really think that the world would rather not have it happening at runtime. This could clear up a lot of confusion and further python's effectiveness. -Hunter. From bkc@murkworks.com Wed Sep 4 23:39:01 2002 From: bkc@murkworks.com (Brad Clements) Date: Wed, 04 Sep 2002 18:39:01 -0400 Subject: [Python-Dev] Getting started with GBayes testing Message-ID: <3D7653AD.14352.14F391B6@localhost> Hi, I'm interested in contributing to GBayes .. I'm thinking of trying word stemming and adding other types of token indicators. How can I contribute? Btw, I have been saving up my spam for a year or so.. I have about 31,238 spam messages saved up now. These are categorized as spam based on my reading of the subject, or examining the body when in doubt. There are probably 10% dups in the corpus. Some of them have viruses, likely klez. I'd like to replicate Tim's test rig so I can compare my results with existing ones. My spam isn't in mbox format, but I can convert it.. I'm particularly intersted in how to allow html only messages (reduce false positives). I'm getting a lot of personal mail in that format, unfortunately. Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From martin@v.loewis.de Wed Sep 4 23:56:30 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 05 Sep 2002 00:56:30 +0200 Subject: [Python-Dev] Call for clarity In-Reply-To: <1031437860.636.29.camel@HillCountryPeress> References: <1031437860.636.29.camel@HillCountryPeress> Message-ID: Hunter Peress writes: > This could clear up a lot of confusion and further python's > effectiveness. It's not clear, to me, from reading your message, what kind of change you are requesting (that you are requesting a change, rather than offering help, or asking for advice, appears to be clear). 
Could you kindly provide a small patch that gives an idea of what you would like to see changed, and how? TIA, Martin From drifty@bigfoot.com Thu Sep 5 00:25:29 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Wed, 4 Sep 2002 16:25:29 -0700 (PDT) Subject: [Python-Dev] Proposed Mixins for Wide Interfaces In-Reply-To: <001801c2545b$b43aba60$e8ea7ad1@othello> Message-ID: [Raymond Hettinger] > [RH] > > > How about adding some mixins to simplify the > > > implementation of some of the fatter interfaces? > This is a spur-of-the-moment thought, so it might not be a reasonable comment, but do we care that all of these methods will show up in when using dir() or any other introspective check? While I think the idea is great, it might give this sense that they are really, truly implemented for the class instead of reliant on the other implementations; the side effects of changing one of the required methods might have unexpected consequences for the user. But since I think this is a great idea, I don't want to see it disappear because of this; I guess I better solve my issue. =) Perhaps we can just make sure that this gets documented in both the API and in the doc strings saying that it is from the mixin and what methods it is dependent upon. That should be enough to squash my worry. And yes, if this gets into the core and Raymond does not want to do it, I will help with the doc patches. -Brett C. From hu.peress@mail.mcgill.ca Sun Sep 8 00:47:43 2002 From: hu.peress@mail.mcgill.ca (Hunter Peress) Date: 07 Sep 2002 18:47:43 -0500 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: References: <1031437860.636.29.camel@HillCountryPeress> Message-ID: <1031442464.644.68.camel@HillCountryPeress> Ok heres some more detail. I have no idea how pydoc works right now. I assume you call some program on a python file, and it simply looks for all """ """. 
It seems to do SOME lexical/scoping analysis of where to look for """ """, and consequently, how to display that information in the final doc form; but I'm asking for more.

As I said, python methods/functions are not explicitly typed. So what I propose is this:

When the pydoc generator comes across a function/method, there should remain a normal """ """ area for any comments. I'm asking now that, when the generator sees it's in a method/function, it does a NEW check for a set of docs that document the type of each input argument, and the output.

EG (theoretical, and off the top of my head):
in a file you have a function:

def something(a,b,c="lalal"):
    """This will find its way into the pydocs because its a comment"""
    ##Here is the new stuff Im proposing
    ##note, a clearer syntax can surely be devised.
    """file"""     #documents the type of the first arg
    """string"""   # "" second
    """list"""     # "" third
    """string"""   #documents the return type.

Then the pydoc generator will do a check on the # arguments to the func/meth, and verify that the correct number of these new comments (which only supply the type) are provided. I do think that it would help to actually enforce this. I think it's fine that docs NOT be generated if they don't supply this information. This provides for better docs and shouldn't get that many complaints.

Then: If the docs are generated into webpages, links to the known types that are checked are provided. And if the docs are going into shell format then I don't know if links are necessary.

There are lots of cases and issues that I haven't discussed for this proposed implementation. So I would like to continue this thread for the purposes of detailing this idea further.

> > This could clear up a lot of confusion and further python's
> > effectiveness.
As we know, python is not an explicitely typed language, but enforcing some level of typing at the documentation level will see a lot of people falling into line (depending on how rigidly its enforced, and i do suggest a pretty rigid level). I have no patch ATM because I tend to design software before writing it, and im looking for support from the developers first. PS whats TIA mean? On Wed, 2002-09-04 at 17:56, Martin v. Loewis wrote: > Hunter Peress writes: > > > This could clear up a lot of confusion and further python's > > effectiveness. > > It's not clear, to me, from reading your message, what kind of change > you are requesting (that you are requesting a change, rather than > offering help, or asking for advice, appears to be clear). > > Could you kindly provide a small patch that gives an idea of what you > would like to see changed, and how? > > TIA, > Martin > > From python@rcn.com Thu Sep 5 01:19:04 2002 From: python@rcn.com (Raymond Hettinger) Date: Wed, 4 Sep 2002 20:19:04 -0400 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> Message-ID: <003d01c25471$d83fe960$2fd8accf@othello> From: "Hunter Peress" > def something(a,b,c="lalal"): > """This will find its way into the pydocs because its a comment""" > ##Here is the new stuff Im proposing > ##note, a clearer sytnax can surely be devised. > """file""" #documents the type of the first arg > """string""" # "" second > """list""" # "" third > """string""" #documents the return type. > > Then the pydoc generator will do a check on the # arguments to the > func/meth, verify that the correct amount of these new comments (which > only supply the type) are provided. I do think that it would help to > actually enforce this. I think its fine that doc's NOT be generated if > they don't supply this information. This provides for better docs and > shouldnt get that many complaints. 
Thanks for the clarification. I see what you're trying to do; however, I think that any gains are more than offset by the new level of complexity and lengthier code. The current docs make a pretty good effort at describing what is needed for each argument. At the same time, they allow flexibility for dynamic arguments that share a similar interface (such as substituting a StringIO object for a File object. In your example, the docs strings could be made clear using existing tools: def something(file, promptstring, optionlist): """Returns a string extracted from the file for any line matching the promptstring. The optionlist can include any of the following: IGNORECASE, VERBOSE. MULTILINE, or ADDLINENUMBER.""" I can't see that a tool like you described would add any more clarity than the above docstring. > PS whats TIA mean? "Thanks In Advance" Do you have any examples of current python docstrings that are not clear enough? Raymond Hettinger From greg@cosc.canterbury.ac.nz Thu Sep 5 01:30:40 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 05 Sep 2002 12:30:40 +1200 (NZST) Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Message-ID: <200209050030.g850UeI2026648@kuku.cosc.canterbury.ac.nz> pinard@iro.umontreal.ca: > - Macro: AC_SYS_RESTARTABLE_SYSCALLS > If the system automatically restarts a system call that is > interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'. > > Python might probably wrap calls when > these are restartable, or transmit the error upwards for systems where calls > are not restartable. I think that macro means that you *don't* have to use a wrapper to restart syscalls, because it happens automatically. So if it's not defined it means you have to restart them manually, not that they can't be restarted at all. 
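A minimal sketch of what restarting by hand looks like, written in present-day Python rather than C. The helper `retry_on_eintr` and the `make_flaky_read` stand-in are hypothetical names invented for illustration; only `errno.EINTR` and `OSError` come from the standard library. Since Python 3.5 (PEP 475) the interpreter already retries most interrupted system calls itself, so this is an illustration of the idea rather than something new code needs.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs) again for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            # Only an interrupted call is retried; every other error is
            # passed upward to the caller unchanged.
            if exc.errno != errno.EINTR:
                raise


def make_flaky_read(interruptions, payload):
    """Hypothetical stand-in for a system call that is interrupted a few times."""
    state = {"left": interruptions}

    def flaky_read():
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError(errno.EINTR, "Interrupted system call")
        return payload

    return flaky_read


# Two simulated interruptions, then the call goes through.
data = retry_on_eintr(make_flaky_read(2, b"data"))
```

Where the system restarts calls automatically, the loop never runs a second time; where it does not, the loop is the manual restart.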
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Thu Sep 5 01:24:29 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 20:24:29 -0400 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: Your message of "Wed, 04 Sep 2002 18:39:01 EDT." <3D7653AD.14352.14F391B6@localhost> References: <3D7653AD.14352.14F391B6@localhost> Message-ID: <200209050024.g850OTd08824@pcp02138704pcs.reston01.va.comcast.net> > I'm interested in contributing to GBayes .. > > I'm thinking of trying word stemming and adding other types of token > indicators. How can I contribute? Pretty soon, a SF propject will be created (Barry has already gotten the request in). We'll gladly add you to the list of developers. > Btw, I have been saving up my spam for a year or so.. I have about > 31,238 spam messages saved up now. These are categorized as spam > based on my reading of the subject, or examining the body when in > doubt. There are probably 10% dups in the corpus. Some of them have > viruses, likely klez. Cool. > I'd like to replicate Tim's test rig so I can compare my results > with existing ones. My spam isn't in mbox format, but I can convert > it.. If you can't wait for the SF project, you can find all the code in the Python CVS tree: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/spambayes/ > I'm particularly intersted in how to allow html only messages > (reduce false positives). I'm getting a lot of personal mail in > that format, unfortunately. You train it with an equal number of spam and non-spam ("ham") that you received. Just make sure the ham training messages contain enough representatives of the html-only mail. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Thu Sep 5 01:36:18 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 05 Sep 2002 12:36:18 +1200 (NZST) Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> Message-ID: <200209050036.g850aIkU026656@kuku.cosc.canterbury.ac.nz> Jack Jansen : > Aside from some unix variations that basically didn't do restart at all > there have always been problems with signal restart semantics. For > sockets and various devices (raw ttys, I think) you could definitely > lose data. Sockets? Are you sure? I find it unlikely that such a severe problem could persist in many Unix variants for so long. I've never heard of any mention of such a thing. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Sep 5 01:38:09 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 05 Sep 2002 12:38:09 +1200 (NZST) Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209050038.g850c9mY026662@kuku.cosc.canterbury.ac.nz> Guido van Rossum : > I have never understood why a child dying should send a signal. You > can poll for the child with waitpid() instead. Because child termination might not be the only thing you want to wait for. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Thu Sep 5 01:32:21 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 04 Sep 2002 20:32:21 -0400 Subject: [Python-Dev] Proposed Mixins for Wide Interfaces In-Reply-To: Your message of "Wed, 04 Sep 2002 16:25:29 PDT." References: Message-ID: <200209050032.g850WLG08875@pcp02138704pcs.reston01.va.comcast.net> > This is a spur-of-the-moment thought, so it might not be a > reasonable comment, but do we care that all of these methods will > show up in when using dir() or any other introspective check? While > I think the idea is great, it might give this sense that they are > really, truly implemented for the class instead of reliant on the > other implementations; the side effects of changing one of the > required methods might have unexpected consequences for the user. dir() *intends* to show methods regardless of whether they are implemented in the class or in a base class. So this doesn't sound like a valid objection. Pydoc shows inherited methods separately. --Guido van Rossum (home page: http://www.python.org/~guido/) From whisper@oz.net Thu Sep 5 01:48:07 2002 From: whisper@oz.net (David LeBlanc) Date: Wed, 4 Sep 2002 17:48:07 -0700 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: <200209050024.g850OTd08824@pcp02138704pcs.reston01.va.comcast.net> Message-ID: I would like to be in on that project too please. David LeBlanc Seattle, WA USA > -----Original Message----- > From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On > Behalf Of Guido van Rossum > Sent: Wednesday, September 04, 2002 17:24 > To: bkc@murkworks.com > Cc: python-dev@python.org > Subject: Re: [Python-Dev] Getting started with GBayes testing > > > > I'm interested in contributing to GBayes .. > > > > I'm thinking of trying word stemming and adding other types of token > > indicators. How can I contribute? 
> > Pretty soon, a SF propject will be created (Barry has already gotten > the request in). We'll gladly add you to the list of developers. > > > Btw, I have been saving up my spam for a year or so.. I have about > > 31,238 spam messages saved up now. These are categorized as spam > > based on my reading of the subject, or examining the body when in > > doubt. There are probably 10% dups in the corpus. Some of them have > > viruses, likely klez. > > Cool. > > > I'd like to replicate Tim's test rig so I can compare my results > > with existing ones. My spam isn't in mbox format, but I can convert > > it.. > > If you can't wait for the SF project, you can find all the code in the > Python CVS tree: > > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondi > st/sandbox/spambayes/ > > > I'm particularly intersted in how to allow html only messages > > (reduce false positives). I'm getting a lot of personal mail in > > that format, unfortunately. > > You train it with an equal number of spam and non-spam ("ham") that > you received. Just make sure the ham training messages contain enough > representatives of the html-only mail. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From barry@python.org Thu Sep 5 01:48:48 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 4 Sep 2002 20:48:48 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes classifier.py,1.8,1.9 References: <004301c25472$78f62f40$2fd8accf@othello> Message-ID: <15734.43504.641800.957590@anthem.wooz.org> >>>>> "RH" == Raymond Hettinger writes: >> A now-rare pure win, changing spamprob() to work harder to find >> more evidence when competing 0.01 and 0.99 clues appear RH> I hope these victories make it back to the world outside of RH> Python (assuming there is one). 
The world needs good spam RH> filters. Indeed, I too hope they will. I just got approved for a SF project called "spambayes" and plan to move the code there. I'll try to coordinate that with Tim, and then make a more detailed announcement tomorrow. -Barry From tim.one@comcast.net Thu Sep 5 01:52:14 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 04 Sep 2002 20:52:14 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: Message-ID: [Tim] > ... > The first 16 most extreme indicators are split 9 highly in favor of ham > (.01) and 7 highly in favor of spam (.99). If I hadn't folded > case away to let stinking conference announcements through , I > expect it would have latched on to the SCREAMING at the start instead of > looking deeper. Looking at the To: line probably would nail this one too, > as "Undisclosed Recipients" has two 0.99 spam indicators right there. > > Whatever, you *don't* want to look at msgs with a mix of just > 0.99 and 0.01 thingies: it's not all that unusual to get such an > extreme mix, in spam or ham. I should have added that it usually gets the right result when this happens. It's the exceptions to that rule that are mondo embarrassing, because it's making a mistake then while sitting on a mountain of strong evidence (albeit pointing as extremely as possible in both directions at once ). "A problem" is that when a MIN_SPAMPROB and MAX_SPAMPROB clue both appear, the math is such that they cancel out exactly. It's *almost* as if neither existed, but not quite: they also keep two lower-probability words *out* of the computation (only a grand total of the MAX_DISCRIMINATORS most extreme clues are retained). So I changed spamprob() to keep accepting more clues when MIN/MAX cancellations are inevitable, and to use the best of those in lieu of the cancelling extremes. 
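The exact cancellation is easy to check numerically. The sketch below assumes Graham's combining rule, P = prod(p) / (prod(p) + prod(1 - p)); the function name `combine` and the sample probabilities are made up for illustration and are not taken from classifier.py.

```python
from math import isclose, prod


def combine(probs):
    """Graham-style combination of per-word spam probabilities (assumed rule)."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


clues = [0.80, 0.70, 0.35]  # made-up mid-strength clues
with_pair = combine(clues + [0.01, 0.99])
without_pair = combine(clues)

# The 0.01/0.99 pair multiplies both products by the same factor
# (0.01 * 0.99), so it drops out of the ratio.
print(isclose(with_pair, without_pair))
```

The pair contributes nothing to the score, yet it still occupies two of the limited clue slots, which is the side effect described above.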
This turned out to be a pure win:

false positive percentages
    0.000  0.000  tied
    0.000  0.000  tied
    0.050  0.050  tied
    0.000  0.000  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.050  0.050  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.075  0.075  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.075  0.025  won
    0.025  0.025  tied
    0.025  0.025  tied
    0.000  0.000  tied
    0.025  0.025  tied
    0.050  0.050  tied

won   1 times
tied 19 times
lost  0 times

total unique fp went from 9 to 7

false negative percentages
    0.909  0.764  won
    0.800  0.691  won
    1.091  0.981  won
    1.381  1.309  won
    1.491  1.418  won
    1.055  0.873  won
    0.945  0.800  won
    1.236  1.163  won
    1.564  1.491  won
    1.200  1.200  tied
    1.454  1.381  won
    1.599  1.454  won
    1.236  1.164  won
    0.800  0.655  won
    0.836  0.655  won
    1.236  1.163  won
    1.236  1.200  won
    1.055  0.982  won
    1.127  0.982  won
    1.381  1.236  won

won  19 times
tied  1 times
lost  0 times

total unique fn went from 284 to 260

From sholden@holdenweb.com Thu Sep 5 01:55:59 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Wed, 4 Sep 2002 20:55:59 -0400
Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01
References: <200209031653.g83GrjQ01929@odiug.zope.com><200209032018.g83KI3q08343@odiug.zope.com><17d001c2538d$f82650f0$1c86db41@boostconsulting.com><008201c253aa$780144d0$6300000a@holdenweb.com><005601c253d5$d0a63c50$ced241d5@hagrid>
Message-ID: <006901c25477$01b9cb30$6300000a@holdenweb.com>

[François Pinard]
> [Fredrik Lundh]
>
> > [...] and this mailing list is about python.
>
> Why did you reply to the mailing list, then? :-)
>
The effbot is a law unto itself :-)

> > François Pinard wrote:
> >> [Steve Holden]
>
> >> > It looks especially bad in my standard mailreader variable-pitch font.
> [...]
>
> Python did not build its success by trying to convince people that every else
> is wrong. It rather offered an environment in which participants happily
> considered they were gaining a lot.
>
erm, ...

> If someone breaks its screen appearance through selection of inappropriate
> fonts, he might gain some pleasure indeed while loosing the ability to read
> many existing messages. That's really his choice and preferences, he has to
> live with the drawbacks, without trying to convince senders that they are all
> wrong. Considering others as losers does not efficiently trigger progress.
>
I don't really consider """It looks especially bad in my standard mailreader variable-pitch font""" to be sufficiently evangelical to deserve this rebuke, but then I didn't really consider your rebuke deserved either, so I guess we should just terminate this thread now.

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                       pydish.holdenweb.com/pwp/
Previous .sig file retired to                 www.homeforoldsigs.com
-----------------------------------------------------------------------

From oren-py-d@hishome.net Thu Sep 5 06:27:37 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Thu, 5 Sep 2002 08:27:37 +0300
Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas)
In-Reply-To: <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Wed, Sep 04, 2002 at 04:48:11PM -0400
References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020905082737.A31267@hishome.net>

On Wed, Sep 04, 2002 at 04:48:11PM -0400, Guido van Rossum wrote:
> [Jack]
> > Hmm, and when I think of it I don't think it's even possible to restart
> > safely. What if I do a read() on a socket, and I request more bytes
> > than the available physical memory (but less than VM, of course)? The
> > kernel simply doesn't have anywhere to store the bytes other than my
> > buffer, and if it has to return EINTR then >POOF< these bytes are gone
> > forever.
> > I think that if any bytes have already been copied into your buffer, > you don't get an EINTR, you get a short read. >From read(2) man page: EINTR The call was interrupted by a signal before any data was read. Same applies to write, recv, fcntl with locks, semop, etc. They're all designed to be restartable. The keyword in all cases is "before". Oren From goodger@users.sourceforge.net Thu Sep 5 03:44:23 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Wed, 04 Sep 2002 22:44:23 -0400 Subject: [Python-Dev] Misc/NEWS (was: Two random and nearly unrelated ideas) In-Reply-To: <200209041151.g84BpAg05683@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Skip] >> While adding a blurb to Misc/NEWS about the change to the thread >> ticker and check interval, it occurred to me that perhaps Misc/NEWS >> would benefit from conversion to ReST format. You could pump an >> HTML version out to the website periodically. I have the Docutils site auto-regenerated via a small cron script. Any time any of the source text files change, within an hour the site reflects the change. It makes site maintenance easy. (BTW, Skip, thanks for the bug report. I'll be looking into it ASAP.) [Guido] > Nice idea. How much additional mark-up would this add to quote the > occasional reST meta-character? Very little, depending on the desired effect. The extreme case would be if you want to mark up everything possible. The result may look too busy in the source text form though, especially because there are so many Python identifiers, expressions, code snippets, and file names that *could* be marked up. It's a trade-off. The nice thing is that Misc/NEWS is already almost valid reStrucuturedText (which shouldn't be surprising, since reStrucuturedText is based on common usage). In fact, most (if not all) of the standalone text files are almost there: README, PLAN.txt, etc. It wouldn't be much work to bring them up to spec. 
Here are the areas of Misc/NEWS that would require editing:

* Sections: The two-line titles aren't supported. Either they should be combined into one line, or the "Release date" line should become part of the section body. Either::

      What's New in Python 2.2 final?  Release date: 21-Dec-2001
      ==========================================================

  or::

      What's New in Python 2.2 final?
      ===============================

      Release date: 21-Dec-2001

* Subsections (like "Core and builtins", "Library", "Extension modules", etc.): These could be made into true subsections by underlining them with dashes (and changing to title case)::

      Core and Builtins
      -----------------

  I notice that there are many headers for empty subsections (such as "Tools/Demos" and "Build" in "What's New in Python 2.2 final?"). Should they be removed?

* Inline literals (filenames, identifiers, expressions and code snippets): Surround with double-backquotes to get monospaced, uninterpreted text (like HTML TT tags). There are so many of these that it may be best to be selective.

* Literal blocks: Example code should be indented and prefaced with double-colons ("::" at the end of the preceding paragraph). Doctest blocks (interactive sessions, begin with ">>> " and end with a blank line) don't need this, although it wouldn't hurt.

> Can you convert a section for test and show me?

I'll be happy to help.

Hmm. Looking at the 2.2.1 Misc/NEWS file, I see sections for 2.2.1 final, 2.2.1c2, etc., but they're missing from the CVS Misc/NEWS file. Is this normal because of separate development branches or is something amiss?

Following is a converted section from the current Misc/NEWS. Minimally marked up:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
What's New in Python 2.3 alpha 1?
================================= XXX Release date: DD-MMM-2002 XXX Type/class unification and new-style classes -------------------------------------------- - Assignment to __class__ is disallowed if either the old and the new class is a statically allocated type object (such as defined by an extension module). This prevents anomalies like ``2 .__class__ = bool``. - New-style object creation and deallocation have been sped up significantly; they are now faster than classic instance creation and deallocation. - The __slots__ variable can now mention "private" names, and the right thing will happen (e.g. ``__slots__ = ["__foo"]``). - The built-ins slice() and buffer() are now callable types. The types classobj (formerly class), code, function, instance, and instancemethod (formerly instance-method), which have no built-in names but are accessible through the types module, are now also callable. The type dict-proxy is renamed to dictproxy. - Cycles going through the __class__ link of a new-style instance are now detected by the garbage collector. - Classes using __slots__ are now properly garbage collected. [SF bug 519621] - Tightened the __slots__ rules: a slot name must be a valid Python identifier. - The constructor for the module type now requires a name argument and takes an optional docstring argument. Previously, this constructor ignored its arguments. As a consequence, deriving a class from a module (not from the module type) is now illegal; previously this created an unnamed module, just like invoking the module type did. [SF bug 563060] - A new type object, 'basestring', is added. This is a common base type for 'str' and 'unicode', and can be used instead of ``types.StringTypes``, e.g. to test whether something is "a string": ``isinstance(x, basestring)`` is True for Unicode and 8-bit strings. This is an abstract base class and cannot be instantiated directly. 
- Changed new-style class instantiation so that when C's __new__ method returns something that's not a C instance, its __init__ is not called. [SF bug #537450] - Fixed super() to work correctly with class methods. [SF bug #535444] - If you try to pickle an instance of a class that has __slots__ but doesn't define or override __getstate__, a TypeError is now raised. This is done by adding a bozo __getstate__ to the class that always raises TypeError. (Before, this would appear to be pickled, but the state of the slots would be lost.) <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Maximally marked up: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's New in Python 2.3 alpha 1? ================================= XXX Release date: DD-MMM-2002 XXX Type/class unification and new-style classes -------------------------------------------- - Assignment to ``__class__`` is disallowed if either the old and the new class is a statically allocated type object (such as defined by an extension module). This prevents anomalies like ``2 .__class__ = bool``. - New-style object creation and deallocation have been sped up significantly; they are now faster than classic instance creation and deallocation. - The ``__slots__`` variable can now mention "private" names, and the right thing will happen (e.g. ``__slots__ = ["__foo"]``). - The built-ins ``slice()`` and ``buffer()`` are now callable types. The types classobj (formerly class), code, function, instance, and instancemethod (formerly instance-method), which have no built-in names but are accessible through the ``types`` module, are now also callable. The type dict-proxy is renamed to dictproxy. - Cycles going through the ``__class__`` link of a new-style instance are now detected by the garbage collector. - Classes using ``__slots__`` are now properly garbage collected. [SF bug 519621] - Tightened the ``__slots__`` rules: a slot name must be a valid Python identifier. 
- The constructor for the module type now requires a name argument and takes an optional docstring argument. Previously, this constructor ignored its arguments. As a consequence, deriving a class from a module (not from the module type) is now illegal; previously this created an unnamed module, just like invoking the module type did. [SF bug 563060] - A new type object, ``basestring``, is added. This is a common base type for ``str`` and ``unicode``, and can be used instead of ``types.StringTypes``, e.g. to test whether something is "a string": ``isinstance(x, basestring)`` is ``True`` for Unicode and 8-bit strings. This is an abstract base class and cannot be instantiated directly. - Changed new-style class instantiation so that when C's ``__new__`` method returns something that's not a C instance, its ``__init__`` is not called. [SF bug #537450] - Fixed ``super()`` to work correctly with class methods. [SF bug #535444] - If you try to pickle an instance of a class that has ``__slots__`` but doesn't define or override ``__getstate__``, a ``TypeError`` is now raised. This is done by adding a bozo ``__getstate__`` to the class that always raises ``TypeError``. (Before, this would appear to be pickled, but the state of the slots would be lost.) <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< -- David Goodger Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ From mwh@python.net Thu Sep 5 10:34:34 2002 From: mwh@python.net (Michael Hudson) Date: 05 Sep 2002 10:34:34 +0100 Subject: [Python-Dev] Please help in calling python fucntion from 'c' In-Reply-To: "Praveen Patil"'s message of "Wed, 4 Sep 2002 13:31:00 +0100" References: Message-ID: <2m8z2gok45.fsf@starship.python.net> "Praveen Patil" writes: > This is a multi-part message in MIME format. 
> ------=_NextPart_000_0011_01C25417.4EC8F910 > Content-Type: text/plain; > charset="iso-8859-1" > Content-Transfer-Encoding: 7bit > > Hi, > > I have written 'C' dll(MY_DLL.DLL) . I am importing 'C' dll in python > file(example.py). > I want to call python function from 'c' function. > For your reference I have attached 'c' and python files to this mail. > In my pc: > python code is under the directory D:\test\example.py > dll is under the directory C:\Program Files\Python\DLLs\MY_DLL.pyd > > Here are the steps I am following. > > step(1): I am calling 'C' function(RECEIVE_FROM_IL_S) from python. > This 'C' function is existing imported dll(MY_DLL). > step(2): I want to call python function(TestFunction) from 'C' > function(RECEIVE_FROM_IL_S). > > > Python code is(example.py) :- > ---------------------------- > import MY_DLL > > G_Logfile = None > > def TestFunction(): > G_Logfile = open('Pytestfile.txt', 'w') > G_Logfile.write("%s \n"%'I am writing python created text file') > G_Logfile.close > G_Logfile = None > #end def TestFunction > > if __name__ == "__main__": > > MY_DLL.RECEIVE_FROM_IL_S(10,50) > > > 'C' code is (MY_DLL.c) :- > --------------------- > #include > #include > #include > > PyObject* _wrap_RECEIVE_FROM_IL_S(PyObject *self, PyObject *args) > { > FILE* fp; > PyObject* _resultobj; > int i,j; > > if( !(PyArg_ParseTuple(args, "ii",&i,&j))) > { > return NULL; > } > fp= fopen("RECEIVE_IL_S.txt", "w"); > fprintf(fp, "i=%d j=%d" , i,j); > fclose(fp); > > /* Here I want to call python function(TestFunction). Please suggest me > some solution*/ > > _resultobj = Py_None; > return _resultobj; > } > > > static PyMethodDef MY_DLL_methods[] = { > { "RECEIVE_FROM_IL_S", _wrap_RECEIVE_FROM_IL_S, METH_VARARGS }, > { NULL , NULL} > }; > > __declspec(dllexport) void __cdecl initMY_DLL(void) > { > Py_InitModule("MY_DLL",MY_DLL_methods); > } > > > Please anybody help me solving the problem. > > > Cheers, > > Praveen. 
> > ------=_NextPart_000_0011_01C25417.4EC8F910-- > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev -- From mwh@python.net Thu Sep 5 10:33:04 2002 From: mwh@python.net (Michael Hudson) Date: 05 Sep 2002 10:33:04 +0100 Subject: [Python-Dev] Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Oren Tirosh's message of "Wed, 4 Sep 2002 08:46:46 -0400" References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> Message-ID: <2mbs7cok6n.fsf@starship.python.net> Oren Tirosh writes: > Any other problems I should be aware of? Wildly unpredictable x-platform behaviour in the presence of threads. M. -- Indeed, when I design my killer language, the identifiers "foo" and "bar" will be reserved words, never used, and not even mentioned in the reference manual. Any program using one will simply dump core without comment. Multitudes will rejoice.
-- Tim Peters, 29 Apr 1998 From mgilfix@eecs.tufts.edu Thu Sep 5 05:45:03 2002 From: mgilfix@eecs.tufts.edu (Michael Gilfix) Date: Thu, 5 Sep 2002 00:45:03 -0400 Subject: [apug] Re: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: <1031451760.644.97.camel@HillCountryPeress>; from hu.peress@mail.mcgill.ca on Sat, Sep 07, 2002 at 09:22:40PM -0500 References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> Message-ID: <20020905004503.A9680@eecs.tufts.edu> While I understand what you're trying to do here (and think it would be quite nice), I'm not sure how you're going to accomplish it. How will parsing Python using a syntax-tree help? It's not going to tell you what the function does in all cases or the various types it could handle. Perhaps you could make educated guesses by looking at the types of operations on the objects (a 'has_key' is a sure indicator of a hash), but that would be sketchy at best. For a ready example, imagine having a module that contains useful helper functions. How are you going to identify the type requirements of those functions if you don't have context? How can you be sure that you've covered all contexts (including conversions)? Such is the nature of dynamic languages. It's very hard to do what you'd like to do here. -- Mike On Sat, Sep 07 @ 21:22, Hunter Peress wrote: > I think its easier to enforce this from the level i describe, than have > guido saying "ok guys please be more explicit in your documentation". I > mean, both of those documents above are somewhat explicit, but they are > not COMPLETE. > > Could you provide me with some linkage on parsing python (from a > compilation/ syntax-tree analysis POV). SO that i can get to work on > writing a patch for the pydoc generation program.
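As a rough illustration of what a syntax tree can and cannot tell a documentation tool, here is a minimal sketch. It uses the modern standard-library `ast` module, which postdates this thread, and the helper name `describe_functions` is made up for the example. It recovers argument names and the number of defaults, but nothing about the types the function accepts, which is the limitation described above.

```python
import ast

SOURCE = '''
def something(a, b, c="lalal"):
    """Example function from the thread."""
    return a
'''

def describe_functions(source):
    """Map each function name to (argument names, number of default values)."""
    tree = ast.parse(source)
    info = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names = [arg.arg for arg in node.args.args]
            info[node.name] = (names, len(node.args.defaults))
    return info

print(describe_functions(SOURCE))
```

A strict documentation checker could compare the argument count found this way against the number of type descriptions the author supplied.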
-- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html" From hu.peress@mail.mcgill.ca Sun Sep 8 08:11:32 2002 From: hu.peress@mail.mcgill.ca (Hunter Peress) Date: 08 Sep 2002 02:11:32 -0500 Subject: [apug] Re: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: <20020905004503.A9680@eecs.tufts.edu> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <20020905004503.A9680@eecs.tufts.edu> Message-ID: <1031469093.644.196.camel@HillCountryPeress>

Actually all of the thinking I did WAS taking into account the "dynamic" nature of Python. But it's not like the actual code is being rewritten fast enough to make this unfeasible or unnecessary. I'm glad to get all of this feedback as it's helping me formulate, and further specify my plans (or eventually healthily debunk them (as the past 3 responders have helped do)).

Instead of just thinking: "arguments are not explicitly anything, therefore it makes no sense to even attempt to document them explicitly", I think this: simply add the capability for multiple definitions per argument. E.g., going back to my original sample, here is an updated version:

    def something(a,b,c="lalal"):
        """This will find its way into the pydocs because it's a comment"""
        ##Here is the new stuff I'm proposing
        ##note, a clearer syntax can surely be devised.
        """file,socket"""  #documents the type(s) of the first arg
        """string,list"""  # "" second
        """list,hash"""    # "" third
        """string,hash"""  #documents the return type(s).

That's quite a simple solution, and still provides worlds better exactness and clarity than the current system allows.

Onto more of your concerns: On Wed, 2002-09-04 at 23:45, Michael Gilfix wrote: > While I understand what you're trying to do here (and think it would > be quite nice), I'm not sure how you're going to accomplish it.
How > will parsing python using a syntax-tree help? It's not going to tell > you what the function does in all cases or the various types it could > handle.Perhaps you could make educated guesses by looking at the > types of operations on the objects (a 'has_key' is a sure indicator of > a hash), but that would be sketchy at best. Actually I wasnt suggesting this AT ALL wrt intelligent guesses, and for now this proposal leans away from it. Rather there are only 2 simple things that I wanted to obtain from the parse-tree: the number of arguments, and if possible to see if there Assume for now that my whole proposal will simply be another option (instead of the default) to the pydoc-generator program. If invoked, it will fail (if the super strict option is specified) if you don't supply definitions for number of args for a given method. This brings up your "dynamic" language issue again. When u have lots of args being used as different things, my program then introduces another level of complexity to deciphering the docs in a meaningful way. Eg: a sample output of this program based on my example: ------------output----------------- method: something(a (file,socket),b (string,list),c="lalal" (list,hash)) return type :string,hash This will find its way into the pydocs because its a comment ----------------------------------- Now in html format it would be even nicer as there will be links to the types listed. And now looking at it, I think its much clearer than nothing at al. Of course there is going to be that type of code where u have no need of documenting every method because their names are self explantory, and such explicit documentation isnt necessary, thats not what this is really intended for. If the specific argument arises that "since python is a dynamic language your approach doesnt make sense" say, then I have to respond: an attempt at specifiying things is FAR better than nothing, and moreover, this is only my first attempt. 
Allowing it to become a part of the generator as an option will open it up to user input, and hence improvement, AND! *** it might just turn out that a "dynamic" approach will be necessary to document a "dynamic" language. *** So im still looking for more design tips, and a place where I could find out how to get into the meat of the python parser, but i think the "http://python.org/doc/2.2/lib/module-parser.html" is probably what I'll be using. > > For a ready example, imagine having a module that contains useful > helper functions. How are you going to identify the type requirements > of those functions if you don't have context? How can you be sure that > you've convered all contexts (including conversions). > > Such is the nature of dynamic languages. It's very hard to do > what you'd like to do here. > > -- Mike > > On Sat, Sep 07 @ 21:22, Hunter Peress wrote: > > I think its easier to enforce this from the level i describe, than have > > guido saying "ok guys please be more explicit in your documentation". I > > mean, both of those documents above are somewhat explicit, but they are > > not COMPLETE. > > > > Could you provide me with some linkage on parsing python (from a > > compilation/ syntax-tree analysis POV). SO that i can get to work on > > writing a patch for the pydoc generation program. > > -- > Michael Gilfix > mgilfix@eecs.tufts.edu > > For my gpg public key: > http://www.eecs.tufts.edu/~mgilfix/contact.html" > From mal@egenix.com Thu Sep 5 10:14:06 2002 From: mal@egenix.com (M.-A. 
Lemburg) Date: Thu, 05 Sep 2002 11:14:06 +0200 Subject: [Python-Dev] utf8 issue References: <200208232105.g7NL5RE16863@pcp02138704pcs.reston01.va.comcast.net> <2mznv9c1k4.fsf@starship.python.net> <200208261405.g7QE5Of05199@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D77205E.8080103@lemburg.com> Guido van Rossum wrote: >>Guido van Rossum writes: >> >> >>>This might beling on SF, except it's already been solved in Python >>>2.3, and I need guidance about what to do for Python 2.2.2. >>> >>>In 2.2.1, a lone surrogate encoded into utf8 gives an utf8 string that >>>cannot be decode back. In 2.3, this is fixed. Should this be fixed >>>in 2.2.2 as well? >> >>I think this was discussed really quite a long time ago, like six >>months or so. >> >> >>>I'm asking because it caused problems with reading .pyc files: if >>>there's a Unicode literal containing a lone surrogate, reading the >>>.pyc file causes an exception: >>> >>>UnicodeError: UTF-8 decoding error: unexpected code byte >>> >>>It looks like revision 2.128 fixed this for 2.3, but that patch >>>doesn't cleanly apply to the 2.2 maintenance branch. Can someone >>>help? >> >>I think the reason this didn't get fixed in 2.2.1 is that it >>necessitates bumping MAGIC. >> >>I can probably dig up more references if you want. > > > Please do. Bumping MAGIC is a no-no between dot releases. But I > don't understand why that is necessary? It would be necessary since marshal uses UTF-8 for storing Unicode literals. Even though it's highly unlikely that the problem cases are used in Python Unicode literals, there's a tiny chance. Without the MAGIC change this could result in PYC files failing to load. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... 
Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Thu Sep 5 14:51:49 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 09:51:49 -0400 Subject: [Python-Dev] utf8 issue In-Reply-To: Your message of "Thu, 05 Sep 2002 11:14:06 +0200." <3D77205E.8080103@lemburg.com> References: <200208232105.g7NL5RE16863@pcp02138704pcs.reston01.va.comcast.net> <2mznv9c1k4.fsf@starship.python.net> <200208261405.g7QE5Of05199@pcp02138704pcs.reston01.va.comcast.net> <3D77205E.8080103@lemburg.com> Message-ID: <200209051351.g85Dpnk12649@odiug.zope.com> > > Please do. Bumping MAGIC is a no-no between dot releases. But I > > don't understand why that is necessary? > > It would be necessary since marshal uses UTF-8 for storing > Unicode literals. Do you mean that in 2.2 it doesn't? > Even though it's highly unlikely that the problem cases are used in > Python Unicode literals, there's a tiny chance. Without the MAGIC > change this could result in PYC files failing to load. Ha. You may have missed the start of this thread, but the whole problem was that a PYC file *did* fail to load! (The .py file had a lone surrogate in it.) So I'm not sure this argument holds much water. Can someone please explain what change would be necessary to what part of the code to prevent a lone surrogate in a string literal from creating a PYC file from blowing up? 
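For readers unfamiliar with the failure mode being discussed, here is a minimal sketch of it in present-day Python 3, not in the 2.2 or 2.3 code under discussion. It assumes current CPython behaviour: the strict UTF-8 codec rejects a lone surrogate, the `surrogatepass` error handler lets it through, and `marshal` round-trips such strings. The helper name `strict_encode` is made up for the example.

```python
import marshal

lone = "\ud800"  # a lone (unpaired) high surrogate

def strict_encode(text):
    """Encode with the strict UTF-8 codec; return the error name on failure."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return type(exc).__name__

# 'surrogatepass' encodes and decodes the surrogate instead of raising.
encoded = lone.encode("utf-8", "surrogatepass")
decoded = encoded.decode("utf-8", "surrogatepass")

# marshal (the format behind .pyc files) must round-trip such literals.
via_marshal = marshal.loads(marshal.dumps(lone))

print(strict_encode(lone), encoded, decoded == lone, via_marshal == lone)
```

The asymmetry in the thread is the same one shown here: an encoder that emits bytes the matching decoder refuses to read back.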
--Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Thu Sep 5 05:54:14 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Thu, 5 Sep 2002 00:54:14 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> <20020904160143.GA1483@hishome.net> <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020905045414.GA26104@hishome.net> On Wed, Sep 04, 2002 at 04:05:22PM -0400, Guido van Rossum wrote: > > If I use a module that spawns an external process and uses SIGCHLD to be > > informed of its termination why should my innocent code that just reads > > lines from a file suddenly break? In C I can at least restart the > > operation after an EINTR but file.readline cannot even be properly > > restarted because the buffering and file position is all messed up. > > I have never understood why a child dying should send a signal. You > can poll for the child with waitpid() instead. You're assuming too much about the structure of the program using child processes. The code that starts the child process may not be in control of the Python program counter by the time it ends. It's useful to be able to leave a signal handler to clean up the zombie process by waitpid(). > But if you have a suggestion for how to fix this particular issue, I'd > be happy to look it over, since this *is* something some people do. Of course people do it - it's documented and it works. Signal handling may have had some historical problems on some Unixes but I've never had any problem with it under Linux. 
My previous messages more or less outline my suggestion. I'll write a better summary. > > Getting an notification of a child process terminating or other > > asynchronous events can only be done using signals and is currently > > dangerous because it will break code using I/O. > > See above. I see half your point; people wanting this tend to use > signals and it causes breakage. Polling is not what I'd call "getting notification of asynchronous events". If it causes breakage it could be because people either use it incorrectly or the signal support on the underlying system is broken. In Linux it isn't broken. If it's broken on other Python platforms I don't see why it shouldn't be well-supported on the platforms that aren't. Has anyone here actually tried to use signal.signal? > > > > interference to Python code by signals. Any other problems I should > > > > be aware of? > > > > > > There's no way to sufficiently test a program that uses signals. The > > > signal handler cannot touch *any* data, which makes it pretty useless. > > > > In order to be useful a signal handler needs to be able to set one bit. > > The next time the ticker expires this bit will be checked. > > OK. > > > If an I/O operation was interrupted the Python signal handler can be > > executed immediately from the wrapper. When it returns the wrapper > > will resume the interrupted operation. > > Is calling the Python signal handler from the wrapper always safe? > What if the Python signal handler e.g. closes the file or reads from > it? Code in signal handlers is executed at some arbitrary point in the program and the programmer should be aware of this and only do simple things like setting a flag or appending to a list.
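The "only append to a list" discipline can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: it is Unix-only, it uses SIGUSR1 sent to the process itself as a stand-in for an asynchronous event such as SIGCHLD, and the names `pending`, `note_signal` and `drain_pending` are invented for the example.

```python
import os
import signal
import time

pending = []  # signal numbers recorded by the handler

def note_signal(signum, frame):
    """Do the bare minimum in the handler: record that the signal arrived."""
    pending.append(signum)

def drain_pending():
    """Called from ordinary code: process whatever the handler recorded."""
    handled = []
    while pending:
        handled.append(pending.pop(0))
    return handled

signal.signal(signal.SIGUSR1, note_signal)
os.kill(os.getpid(), signal.SIGUSR1)  # stand-in for an asynchronous event

# Give the interpreter a moment to run the Python-level handler.
deadline = time.time() + 2.0
while not pending and time.time() < deadline:
    time.sleep(0.01)

handled = drain_pending()
print(handled == [signal.SIGUSR1])
```

All real work, such as calling waitpid() for a finished child, would happen in the code that drains the list rather than in the handler itself.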
Oren From bkc@murkworks.com Thu Sep 5 15:13:50 2002 From: bkc@murkworks.com (Brad Clements) Date: Thu, 05 Sep 2002 10:13:50 -0400 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: <200209050024.g850OTd08824@pcp02138704pcs.reston01.va.comcast.net> References: Your message of "Wed, 04 Sep 2002 18:39:01 EDT." <3D7653AD.14352.14F391B6@localhost> Message-ID: <3D772EC2.30217.184B6C78@localhost> On 4 Sep 2002 at 20:24, Guido van Rossum wrote: > Pretty soon, a SF propject will be created (Barry has already gotten > the request in). We'll gladly add you to the list of developers. I look forward to it. > > I'm particularly intersted in how to allow html only messages > > (reduce false positives). I'm getting a lot of personal mail in > > that format, unfortunately. > > You train it with an equal number of spam and non-spam ("ham") that > you received. Just make sure the ham training messages contain enough > representatives of the html-only mail. This is one way to do it, but I was planning on experimenting with tokenizer methods that strip out HTML tags, leaving only the text. My feeling is that the presentation of "the message" is independent of the message itself, so if I get a message in Text, HTML, RTF only the actual content is important, not the markup method. Though I suppose using lots of red and large fonts might be an indicator of spam, the text of the message should still suffice. Tim's comments in timtest.py hint that stripping tags isn't a catastrophe for f-n's, but he's not planning on doing that for use on technical lists. I would like to pursue general client-side filtering of spam, so I do need to contend with that. btw, Tim's comment: > # So if a message is multipart/alternative with both text/plain and text/html > # branches, we ignore the latter, else newbies would never get a message > # through. 
If a message is just HTML, it has virtually no chance of getting > # through Tells me (spammer hat on) that I can send message with a non-spammish text only part, and a spam html part since most "non-techie" email client users automatically display the html version when available, however Tim's implementation will ignore it. Most "average users" never even see the text-only part of multipart messages. In Tim's application, that's okay since he's going to use the text-only part anyway. But for my purposes, I need to consider both portions. So it's simpler for me to strip html and combine that text with the text-only part and then "test" the combined parts. Well these are just musings, I'll be looking for the SF project. -Brad Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From Anthony Baxter Thu Sep 5 15:28:25 2002 From: Anthony Baxter (Anthony Baxter) Date: Fri, 06 Sep 2002 00:28:25 +1000 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: <3D772EC2.30217.184B6C78@localhost> Message-ID: <200209051428.g85ESPR24749@localhost.localdomain> >>> "Brad Clements" wrote > This is one way to do it, but I was planning on experimenting with tokenizer methods > that strip out HTML tags, leaving only the text. The set I'm working with, I found I needed to strip out everything but for src="" and href="" attributes of tags. Too much goodness in them for the system to get it's teeth into. > Tells me (spammer hat on) that I can send message with a non-spammish text > only part, and a spam html part since most "non-techie" email client users > automatically display the html version when available, however Tim's > implementation will ignore it. I've actually got a bunch of spam like that. The text/plain is something like **This is a HTML message** and nothing else. Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
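The approach discussed in this exchange (strip the markup, keep the text, keep the `href` and `src` values, and score the plain-text and HTML parts together) might look roughly like the sketch below. It is not the timtest.py tokenizer; the class `TextAndLinks`, the function `tokenize` and the sample message are assumptions made for illustration, and it uses the Python 3 `html.parser` module.

```python
from html.parser import HTMLParser

class TextAndLinks(HTMLParser):
    """Collect visible text plus the values of href/src attributes."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Drop the tag itself but keep link targets, which carry strong clues.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.split())

def tokenize(plain_text, html_text):
    """Combine tokens from the text/plain part with those from the HTML part."""
    parser = TextAndLinks()
    parser.feed(html_text)
    parser.close()
    return plain_text.split() + parser.words + parser.links

tokens = tokenize(
    "This is a HTML message",
    '<html><body><font color="red">Buy now</font> '
    '<a href="http://example.com/offer">click</a></body></html>',
)
print(tokens)
```

Because both parts feed one token list, an innocuous text/plain part can no longer hide a spammy HTML part from the classifier.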
From guido@python.org Thu Sep 5 15:33:34 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 10:33:34 -0400 Subject: [Python-Dev] Proposed Mixins for Wide Interfaces In-Reply-To: Your message of "Wed, 04 Sep 2002 17:40:34 EDT." <001801c2545b$b43aba60$e8ea7ad1@othello> References: <001101c2510d$9fce0920$5f66accf@othello> <200209031750.g83HoVq05812@odiug.zope.com> <001801c2545b$b43aba60$e8ea7ad1@othello> Message-ID: <200209051433.g85EXY612883@odiug.zope.com> > [RH] > > > How about adding some mixins to simplify the > > > implementation of some of the fatter interfaces? On second thought, I don't think there's enough here to warrant putting this in the standard library. E.g. the example from BaseSet actually strikes me as indirect: because <= is the natural operation to provide for sets, hanging everything off __lt__ looks forced. Maybe this could go into the Demo directory or in some example or HOWTO. We'll revise this issue when we are going to introduce a standard type or interface hierarchy (not for Python 2.3). --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Thu Sep 5 15:43:50 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 5 Sep 2002 10:43:50 -0400 Subject: [Python-Dev] Proposed Mixins for Wide Interfaces References: <001101c2510d$9fce0920$5f66accf@othello> <200209031750.g83HoVq05812@odiug.zope.com> <001801c2545b$b43aba60$e8ea7ad1@othello> <200209051433.g85EXY612883@odiug.zope.com> Message-ID: <003901c254ea$a752d820$f6eb7ad1@othello> > > [RH] > > > > How about adding some mixins to simplify the > > > > implementation of some of the fatter interfaces? [GvR] > On second thought, I don't think there's enough here to warrant > putting this in the standard library. E.g. the example from BaseSet > actually strikes me as indirect: because <= is the natural operation > to provide for sets, hanging everything off __lt__ looks forced. Agreed. How about the MappingMixin and SequenceMixin? 
These both provide much more meat and have more natural attach points (getitem, setitem, delitem). From guido@python.org Thu Sep 5 15:53:08 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 10:53:08 -0400 Subject: [Python-Dev] Proposed Mixins for Wide Interfaces In-Reply-To: Your message of "Thu, 05 Sep 2002 10:43:50 EDT." <003901c254ea$a752d820$f6eb7ad1@othello> References: <001101c2510d$9fce0920$5f66accf@othello> <200209031750.g83HoVq05812@odiug.zope.com> <001801c2545b$b43aba60$e8ea7ad1@othello> <200209051433.g85EXY612883@odiug.zope.com> <003901c254ea$a752d820$f6eb7ad1@othello> Message-ID: <200209051453.g85Er8j12983@odiug.zope.com> > How about the MappingMixin and SequenceMixin? These both > provide much more meat and have more natural attach points > (getitem, setitem, delitem). I'd much rather have a howto that explains all the issues. This stuff is vastly underdocumented. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Sep 5 16:01:14 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 11:01:14 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Thu, 05 Sep 2002 00:54:14 EDT." <20020905045414.GA26104@hishome.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> <20020904160143.GA1483@hishome.net> <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> <20020905045414.GA26104@hishome.net> Message-ID: <200209051501.g85F1EY13017@odiug.zope.com> > > I have never understood why a child dying should send a signal. > > You can poll for the child with waitpid() instead. > > You're assuming too much about the structure of the program using > child processes. 
The code that starts the child process may not be > in control of the Python program counter by the time it ends. It's > useful to be able to leave a signal handler to clean up the zombie > process by waitpid(). I admit that I hate signals so badly that whenever I needed to wait for a child to finish I would always structure the program around this need (even when coding in C). > > But if you have a suggestion for how to fix this particular issue, I'd > > be happy to look it over, since this *is* something some people do. > > Of course people do it - it's documented and it works. Barely. This thread started when you pointed out the problems with using signals. I've always been reluctant about the fact that we had a signal module at all -- it's not portable (no non-Unix system supports it well), doesn't interact well with threads, etc., etc.; however, C programmers have demanded some sort of signal support and I caved in long ago when someone contributed a reasonable approach. I don't regret it like lambda, but I think it should only be used by people who really know about the caveats. > > See above. I see half your point; people wanting this tend to use > > signals and it causes breakage. > > Polling is not what I'd call "getting notification of asynchronous events". > If it causes breakage it could be because people either use it incorrectly > or the signal support on the underlying system is broken. In Linux it isn't > broken. If it's broken on other Python platforms I don't see why it > shouldn't be well-supported on the platforms that aren't. I meant in Python. The I/O problems make signals hard to use. > Has anyone here actually tried to use signal.signal ? Yes. > Code in signal handlers is executed at some arbitrary point in the > program and the programmer should be aware of this and only do so > simple things like setting a flag or appending to a list. Unfortunately the mechanism doesn't enforce this. 
I wish we could invent a Python signal API that only lets you do one of these simple things. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Thu Sep 5 02:34:21 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 04 Sep 2002 21:34:21 -0400 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: <3D7653AD.14352.14F391B6@localhost> Message-ID: Guido addressed most points, so I'll just cover a few: [Brad Clements] > ... > I'd like to replicate Tim's test rig so I can compare my results > with existing ones. My spam isn't in mbox format, but I can convert it. Mine isn't either . Barry gave me mboxes, but the spam corpus I got off the web had one spam per file, and it only took two days of extreme pain to realize that one msg per file is enormously easier to work with when testing: you want to split these at random into random collections, you may need to replace some at random when testing reveals spam mistakenly called ham (and vice versa), etc -- even pasting examples into email is much easier when it's one msg per file (and the test driver makes it easy to print a msg's file path). My test driver and tokenizer are checked in (timtest.py), and also a little utility or two. The directory structure under my spambayes directory looks like so: Data/ Spam/ Set1/ (contains 2750 spam .txt files) Set2/ "" Set3/ "" Set4/ "" Set5/ "" Ham/ Set1/ (contains 4000 ham .txt files) Set2/ "" Set3/ "" Set4/ "" Set5/ "" reservoir/ (contains "backup ham") If you use the same names and structure, huge mounds of the tedious testing code will work as-is. The more Set directories the merrier, although you'll hit a point of diminishing returns if you exceed 10. The "reservoir" directory contains a few thousand other random hams. When a ham is found that's really spam, I delete it, and then the rebal.py utility moves in a message at random from the reservoir to replace it. 
If I had it to do over again, I think I'd move such spam into a Spam set (chosen at random), instead of deleting it. > I'm particularly interested in how to allow html only messages > (reduce false positives). I'm getting a lot of personal mail in that > format, unfortunately. It will learn about that -- not a problem. It's a problem in *my* tests because HTML mail is so strongly hated on tech lists, but newbies use it there anyway, and it would be horrid to block newbies just because they're normal people who enjoy creating visually attractive messages <0.9 wink>. Read the "What about HTML?" section in timtest.py. You may also wish to remove the guard from if part.get_content_type() == "text/plain": text = html_re.sub(' ', text) in tokenize(). Once you have a good test setup, you can try it both ways, and the data will tell you which way works best for your normal mix. Details of runs both ways on my c.l.py corpora are given in the "What about HTML?" section mentioned before, and even there stripping HTML decorations out of HTML-only messages had an insignificant effect on the f-p rate. It increased the f-n rate, though, and precisely because HTML messages are so very rare on c.l.py that they're *almost* certainly spam. From python@rcn.com Thu Sep 5 16:43:20 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 5 Sep 2002 11:43:20 -0400 Subject: [Python-Dev] GBayes design Message-ID: <002b01c254f2$f6c7c020$71b53bd0@othello> Is it too late to challenge a core design decision? Instead of multiplying probabilities, use fuzzy logic methods. Classify the indicators into damning, strong, weak, neutral, ... After counting the number of indicators in each class, make a spam/ham decision that can be easily tweaked. This would make it easy to implement variations of Tim's recent clear win, where additional indicators are gathered until the balance shifts sharply to one side. Some other advantages are: -- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... 
) -- avoids mathematical issues with indicators not being independent -- allows the addition of non-token based indicators. for instance, a preponderance of caps would be a weak indicator. the presence of caps separated by spaces would be a strong indicator. -- the decision logic would be more intuitive -- avoids the issue of having equal amounts of spam and ham in the sample The core concept would stay the same -- it's really just a shift from continuous to discrete. of-course-this-is-entirely-outside-my-fields-of-knowledge-ly yours, Raymond Hettinger From mgilfix@eecs.tufts.edu Thu Sep 5 18:23:05 2002 From: mgilfix@eecs.tufts.edu (Michael Gilfix) Date: Thu, 5 Sep 2002 13:23:05 -0400 Subject: [apug] Re: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: <1031469093.644.196.camel@HillCountryPeress>; from hu.peress@mail.mcgill.ca on Sun, Sep 08, 2002 at 02:11:32AM -0500 References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <20020905004503.A9680@eecs.tufts.edu> <1031469093.644.196.camel@HillCountryPeress> Message-ID: <20020905132305.A19681@eecs.tufts.edu> Ok. I think I understand better what you're trying to accomplish. I got the impression earlier (and I think others did as well) that you were hoping to have pydoc automatically label types on the function call. A new convention might very well be welcomed. You might want to post a couple of examples and the corresponding documentation for feedback here before you start the hard work on the patch :) More below... On Sun, Sep 08 @ 02:11, Hunter Peress wrote: > Actually all of the thinking i did WAS taking into account the "dynamic" > nature of python. > > But its not like the actual code is being rewritten fast enough to make > this unfeasible or unneccesary. 
> > I'm glad to get all of this feedback as it's helping me formulate, and > further specify my plans (or eventually healthily debunk them (as the > past 3 responders have helped do)). > > Instead of just thinking: > > "arguments are not explicitly anything, therefore it makes no sense to > even attempt to document them explicitly". > > I think this: simply add the capability for multiple definitions per > each argument. eg going back to my original sample here is an updated > version: > > def something(a,b,c="lalal"): > """This will find its way into the pydocs because its a comment""" > ##Here is the new stuff I'm proposing > ##note, a clearer syntax can surely be devised. > """file,socket""" #documents the type(s) of the first arg > """string,list""" # "" second > """list,hash""" # "" third > """string,hash""" #documents the return type(s). > > That's quite a simple solution, and still provides worlds better > exactness and clarity than the current system allows. > > Onto more of your concerns: > On Wed, 2002-09-04 at 23:45, Michael Gilfix wrote: > > While I understand what you're trying to do here (and think it would > > be quite nice), I'm not sure how you're going to accomplish it. How > > will parsing python using a syntax-tree help? It's not going to tell > > you what the function does in all cases or the various types it could > > handle. Perhaps you could make educated guesses by looking at the > > types of operations on the objects (a 'has_key' is a sure indicator of > > a hash), but that would be sketchy at best. > Actually I wasn't suggesting this AT ALL wrt intelligent guesses, and for > now this proposal leans away from it. > > Rather there are only 2 simple things that I wanted to obtain from the > parse-tree: the number of arguments, and if possible to see if there Agreed now that things are clearer. > Assume for now that my whole proposal will simply be another option > (instead of the default) to the pydoc-generator program. 
If invoked, it > will fail (if the super strict option is specified) if you don't supply > definitions for number of args for a given method. > > This brings up your "dynamic" language issue again. > > When u have lots of args being used as different things, my program then > introduces another level of complexity to deciphering the docs in a > meaningful way. > > Eg: a sample output of this program based on my example: > > ------------output----------------- > method: something(a (file,socket),b (string,list),c="lalal" > (list,hash)) > return type :string,hash > > This will find its way into the pydocs because its a comment > ----------------------------------- > > Now in html format it would be even nicer as there will be links to > the types listed. I agree. I know that I'd welcome an extra added option to enable some extra pydoc functionality. Developing a schema is tricky though and you should probably engage in some more debate first :) > And now looking at it, I think it's much clearer than nothing at all. > > Of course there is going to be that type of code where u have no need of > documenting every method because their names are self-explanatory, and > such explicit documentation isn't necessary, that's not what this is > really intended for. > > If the specific argument arises that "since python is a dynamic language > your approach doesn't make sense" say, then I have to respond: Of course it applies. Because of my misunderstanding, I was under the impression that you wanted to generate the equivalent of function calls, not develop a scheme like javadoc. The dynamic nature of Python means that such specifications become even more important as project sizes increase. > an attempt at specifying things is FAR better than nothing, and > moreover, this is only my first attempt. Allowing it to become a part of > the generator as an option will open it up to user input, and hence > improvement, AND! 
> > *** > it might just turn out that a "dynamic" approach will be necessary to > document a "dynamic" language. > *** > > So I'm still looking for more design tips, and a place where I could find > out how to get into the meat of the python parser, but I think the > "http://python.org/doc/2.2/lib/module-parser.html" is probably what I'll > be using. Shouldn't there be code in the existing pydoc to do much of what you want for you? It seems like it might be nice to re-engineer pydoc to take some handlers that allow you to do further customization after it's done its thing. That way, we can add extensions into the existing code and all that integration stuff might be a little easier. Good luck n' keep us posted :) -- Mike -- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html From spambayes@python.org Thu Sep 5 18:57:17 2002 From: spambayes@python.org (Tim Peters) Date: Thu, 05 Sep 2002 13:57:17 -0400 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: <3D772EC2.30217.184B6C78@localhost> Message-ID: [Followups directed to spambayes@python.org http://mail.python.org/mailman-21/listinfo/spambayes ] [Brad Clements] > ... > My feeling is that the presentation of "the message" is independent of the > message itself, so if I get a message in Text, HTML, RTF only the actual > content is important, not the markup method. Everything's A Clue. Everything that gets ignored partly blinds the classifier, so the question isn't whether there's a difference, it's how much of a difference it makes. > Though I suppose using lots of red and large fonts might be an > indicator of spam, the text of the message should still suffice. Indeed, Graham reported that the hex color code for bright red was one of the strongest spam indicators in his database. > Tim's comments in timtest.py hint that stripping tags isn't a > catastrophe for f-n's, but he's not planning on doing that for use on > technical lists. 
When HTML-only email is a 99.99% spam indicator on a tech list, it would be crazy to ignore that clue. But note that the comments *also* say I'd be delighted to remove HTML tags even there if some other way of slashing the f-n rate is proven to work (and most people who have tried it say that mining more header lines does do it -- but then I haven't seen anything from them about how they do when they ignore the header lines. I was happy to ignore header lines in order to get *some* kind of handle on how well could be done on "pure content", and turned out that works remarkably well). >> # So if a message is multipart/alternative with both text/plain >> # and text/html branches, we ignore the latter, else newbies would never >> # get a message through. If a message is just HTML, it has virtually no >> # chance of getting through > Tells me (spammer hat on) that I can send message with a > non-spammish text only part, and a spam html part since most > "non-techie" email client users automatically display the html > version when available, however Tim's implementation will ignore it. Sure. It *certainly* isn't a problem on my test data (as witnessed by the measured error rates). If the nature of the world changes, the code has to adapt along with it. But 90% of the spam I receive (and I get a lot) is still trivial to recognize from a mere glance at the subject line, and I don't buy that spammers are a class of ubergeek with formidable skill. Response rates are a percentage game, and more so than anti-spammers I expect spammers are keen to go for high-percentage wins at the expense of esoterica. > Most "average users" never even see the text-only part of > multipart messages. In Tim's application, that's okay since he's going > to use the text-only part anyway. But for my purposes, I need to consider > both portions. So it's simpler for me to strip html and combine that text > with the text-only part and then "test" the combined parts. 
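The strip-and-combine approach described in the quoted paragraph above might look roughly like the following. This is only a sketch under stated assumptions: it uses the modern standard-library `email` package rather than the 2002 code, `combined_text`, `TAG_RE` and `SAMPLE` are hypothetical names, and the regex is a crude stand-in, not the `html_re` from timtest.py.

```python
import re
from email import message_from_string

# Crude tag stripper (an assumption for illustration only).
TAG_RE = re.compile(r"<[^>]*>")

# Hypothetical two-branch message: a text/plain part and a text/html part.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello plain\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<b>buy</b> now\n"
    "--XX--\n"
)


def combined_text(msg):
    """Join every text/plain part with the tag-stripped text/html parts."""
    chunks = []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "ascii"
        text = payload.decode(charset, "replace")
        if ctype == "text/html":
            text = TAG_RE.sub(" ", text)  # drop the markup, keep the words
        chunks.append(text)
    return "\n".join(chunks)


if __name__ == "__main__":
    print(combined_text(message_from_string(SAMPLE)))
```

The combined string would then be tokenized as a single body; whether that beats scoring the parts separately is something only a test run on real mail can show.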
Not unreasonable , but testing remains the only way to decide. It's rare you can out-think a fraction of a percent! From oren-py-d@hishome.net Thu Sep 5 10:30:02 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Thu, 5 Sep 2002 05:30:02 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020905093002.GA61136@hishome.net> On Wed, Sep 04, 2002 at 05:21:44PM -0400, François Pinard wrote: > I'm not fully familiar with all the details of this problem, it surely has > been in the air for quite a long time now (I might have first heard of it > while Taylor UUCP was being developed). It might be dependent on the > underlying system. If I'm not mistaken, this is Ian Taylor who introduced the > following Autoconf macro: > > > - Macro: AC_SYS_RESTARTABLE_SYSCALLS > If the system automatically restarts a system call that is > interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'. The name of this macro is misleading. It doesn't check whether system calls are restartABLE but whether they are restartED automatically by libc. It forks a subprocess that sends a signal to the parent. The parent waits for the child and checks if the wait() was interrupted. If this macro is defined you will never get EINTR so there's no need to worry about this. If it isn't defined you need to restart system calls yourself. If a platform really has interruptible I/O calls that cannot be continued or restarted without data loss there is no way to use signal handlers on that system. I doubt that such totally broken platforms are common these days. > In GNU file utilities (now merged within the new GNU coreutils), Jim Meyering > uses restart wrappers for many I/O functions, so the idea of wrappers has been > maturing for a while, and is used in basic, heavily used programs. 
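The restart-wrapper idea in the quoted text above can be sketched in Python as follows. This is an illustration only, not the coreutils code and not anything in the Python core; `retry_on_eintr` is a hypothetical helper name, and the sketch assumes a Python where an interrupted call raises `OSError` with `errno.EINTR`.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), repeating the call while it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # any other error is passed through unchanged
            # EINTR: a signal interrupted the call, so simply try again
```

A call site would read something like `data = retry_on_eintr(os.read, fd, 4096)`. Note that a wrapper of this kind only helps for calls that are safe to repeat; it does nothing for a call that was interrupted after transferring part of its data.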
I'll check the sources. Oren From spambayes@python.org Thu Sep 5 18:39:36 2002 From: spambayes@python.org (Tim Peters) Date: Thu, 05 Sep 2002 13:39:36 -0400 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: <200209051428.g85ESPR24749@localhost.localdomain> Message-ID: [Followups directed to spambayes@python.org http://mail.python.org/mailman-21/listinfo/spambayes ] [Anthony Baxter] > ... > I've actually got a bunch of spam like that. The text/plain is something > like > > **This is a HTML message** > > and nothing else. Are you sure that's in a text/plain MIME section? I've seen that many times myself, but it's always been in the prologue (*between* MIME sections -- so it's something a non-MIME aware reader will show you). From spambayes@python.org Thu Sep 5 19:30:03 2002 From: spambayes@python.org (Tim Peters) Date: Thu, 05 Sep 2002 14:30:03 -0400 Subject: [Python-Dev] GBayes design In-Reply-To: <002b01c254f2$f6c7c020$71b53bd0@othello> Message-ID: [Followups directed to spambayes@python.org http://mail.python.org/mailman-21/listinfo/spambayes ] [Raymond Hettinger] > Is it too late to challenge a core design decision? Never too late, but somebody has to do real work to prove that a change is justified. Plausible ideas are cheaper than dirt, alas. > Instead of multiplying probablities, use fuzzy logic methods. > Classify the indicators into damning, strong, weak, neautral, ... Think about how that differs from 0.99, 0.80, 0.20 and 0.50. Does it? > After counting the number of indicators in each class, make > a spam/ham decision that can be easily tweaked. This would > make it easy to implement variations of Tim's recent clear > win, where additional indicators are gathered until the > balance shifts sharply to one side. > > Some other advantages are: > -- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... ) I've seen people see the current prob("TV") = 0.99 style cold and pick it up at once. 
With character n-grams I think it's frustrating, but word-like tokenization gives easily recognized clues. > -- avoids mathematical issues with indicators not being independent How do you know this? > -- allows the addition of non-token based indicators. for instance, > a preponderance of caps would be a weak indicator. the presence > of caps separated by spaces would be a strong indicator. As far as the current classifier is concerned, "a token" is any Python object usable as a dict key. There are already several ways in which the current tokenization scheme in timtest.py uses strings to *represent* non-textual indicators. For example, if the headers lack an Organization line, a 'bool:noorg' "token" is generated. For large blobs of text that get skipped, a token is generated that records both the first character in that blob and the number of bytes skipped (chopped to the nearest multiple of 10). And so on -- you can inject anything you like into the scheme, including stuff like "number of caps separated by spaces: more than 10" (BTW, I happen to know that this particular "clue" acts to block relevant conference announcements, not just spam) I got some interesting results by injecting a crude characters/word statistic: yield "cpw:%.1g" % (float(len(text)) / len(text.split())) There are certain values of that statistic that turned out to be killer-strong spam indicators, but there's a potential problem I've mentioned before: if you have an unbounded number of free parameters you can fiddle, you can train a system to fit any given dataset exactly. That's in part why replication of results by others is necessary to make schemes like this superb (I can only make one merely excellent on my own ). > -- the decision logic would be more intuitive > -- avoids the issue of having equal amounts of spam and ham in > the sample It's not clear that this matters; some results of preliminary experiments are written up in the code comments. 
The way Graham computes P(Spam | Word) is via ratios, *as if* there were an equal number of each; and that's consistent with the other bogus equality assumption in the scorer. I haven't yet changed all these guys at the same time to take P(Spam) and P(Ham) into account. BTW, note that all the results I've reported had a ham/spam training ratio of 4000/2750. I left that non-unity on purpose. > The core concept would stay the same -- it's really just a shift from > continuous to discrete. Let us know how it turns out . From barry@zope.com Thu Sep 5 19:06:59 2002 From: barry@zope.com (Barry A. Warsaw) Date: Thu, 5 Sep 2002 14:06:59 -0400 Subject: [Python-Dev] New `spambayes' project on SourceForge Message-ID: <15735.40259.117828.402419@anthem.wooz.org> There's been a ton of press about applying Bayesian classifiers to spam detection lately, spurred on by Paul Graham's recent paper "A Plan for Spam" http://www.paulgraham.com/spam.html Tim Peters has done an incredible amount of work on our Python implementation of this idea. Some of the reasons why I think Tim's work is so cool is that he's brought along his deep knowledge of speech recognition's related issues, and his obsessive devotion to reducing the amount of spam I ultimately have to delete . In order to encourage more participation from the wider open source community, we've moved the code from a backwater of the Python cvs tree to its own project on SourceForge. The hope is that more people will be able to contribute to ideas, testing, and integration of the basic algorithms with other systems such as mail daemons, mailing list managers, and mail clients. The project is called "spambayes" (for lack of creativity on our part :) and is hosted here: http://sf.net/projects/spambayes If you're interested in becoming a developer on the project, let me know. Otherwise you can of course get anonymous checkouts of the code. There are also two mailing lists related to the spambayes project. 
The first is a general discussion list: http://mail.python.org/mailman-21/listinfo/spambayes and the other is a list for cvs checkin message notices: http://mail.python.org/mailman-21/listinfo/spambayes-checkins Feel free to join those lists (and help be a guinea pig for Mailman 2.1 :). Enjoy, -Barry PS to Python-devers: the code has been removed from nondist/sandbox/spambayes, so you won't be able to hack on it there. Also, please move discussion about this from python-dev@python.org to spambayes@python.org From nas@python.ca Thu Sep 5 19:52:28 2002 From: nas@python.ca (Neil Schemenauer) Date: Thu, 5 Sep 2002 11:52:28 -0700 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <20020905093002.GA61136@hishome.net> References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> <20020905093002.GA61136@hishome.net> Message-ID: <20020905185228.GA19726@glacier.arctrix.com> Oren Tirosh wrote: > If this macro is defined you will never get EINTR so there's no need to > worry about this. If it isn't defined you need to restart system calls > yourself. I don't think that is correct. Only certain system calls will be restarted (for BSD 4.2 it's ioctl, read, readv, write, writev, wait, and waitpid). I think the set of system calls restarted varies depending on the OS. Signals are a gigantic mess. I'm starting to doubt that you realize the extent of the brain damage. While I would be pleased if there was some way Python could hide the mess, I'm not convinced it is possible. Neil From guido@python.org Thu Sep 5 19:19:02 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 14:19:02 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Thu, 05 Sep 2002 05:30:02 EDT." 
<20020905093002.GA61136@hishome.net> References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> <20020905093002.GA61136@hishome.net> Message-ID: <200209051819.g85IJ2113867@odiug.zope.com> > > - Macro: AC_SYS_RESTARTABLE_SYSCALLS > > If the system automatically restarts a system call that is > > interrupted by a signal, define `HAVE_RESTARTABLE_SYSCALLS'. > > The name of this macro is misleading. It doesn't check whether system calls > are restartABLE but whether they are restartED automatically by libc. It > forks a subprocess that sends a signal to the parent. The parent waits for > the child and checks if the wait() was interrupted. > > If this macro is defined you will never get EINTR so there's no need to > worry about this. If it isn't defined you need to restart system calls > yourself. This was a feature introduced by BSD Unix in a distant past, as a change from v7 Unix (which had only the EINTR behavior). For b/w compatibility, BSD had a system call to disable the restart feature. I'm guessing that over the years the feature has been found less than helpful, so POSIX defaults to off. POSIX sigaction() has a flag SA_RESTART to enable restarting. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Sep 5 20:15:54 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 15:15:54 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Thu, 05 Sep 2002 11:52:28 PDT." <20020905185228.GA19726@glacier.arctrix.com> References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> <20020905093002.GA61136@hishome.net> <20020905185228.GA19726@glacier.arctrix.com> Message-ID: <200209051915.g85JFsR14171@odiug.zope.com> > Signals are a gigantic mess. 
I'm starting to doubt that you realize the > extent of the brain damage. While I would be pleased if there was some > way Python could hide the mess, I'm not convinced it is possible. Thanks for the support Neil. That's exactly how I think about it. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Thu Sep 5 15:57:45 2002 From: skip@pobox.com (Skip Montanaro) Date: Thu, 5 Sep 2002 09:57:45 -0500 Subject: [Python-Dev] Getting started with GBayes testing In-Reply-To: <3D772EC2.30217.184B6C78@localhost> References: <3D7653AD.14352.14F391B6@localhost> <3D772EC2.30217.184B6C78@localhost> Message-ID: <15735.28905.730200.821228@12-248-11-90.client.attbi.com> Brad> My feeling is that the presentation of "the message" is Brad> independent of the message itself, so if I get a message in Text, Brad> HTML, RTF only the actual content is important, not the markup Brad> method. Though I suppose using lots of red and large fonts might Brad> be an indicator of spam, the text of the message should still Brad> suffice. You might be surprised. In Paul Graham's "A Plan for Spam" he writes: I don't know why I avoided trying the statistical approach for so long. I think it was because I got addicted to trying to identify spam features myself, as if I were playing some kind of competitive game with the spammers. (Nonhackers don't often realize this, but most hackers are very competitive.) When I did try statistical analysis, I found immediately that it was much cleverer than I had been. It discovered, of course, that terms like "virtumundo" and "teens" were good indicators of spam. But it also discovered that "per" and "FL" and "ff0000" are good indicators of spam. In fact, "ff0000" (html for bright red) turns out to be as good an indicator of spam as any pornographic term. As Tim has pointed out several times, intuition and hunches about this stuff often turn out to be incorrect. 
Skip From jason-exp-1031947065.5eb24b@mastaler.com Thu Sep 5 21:01:37 2002 From: jason-exp-1031947065.5eb24b@mastaler.com (jason-exp-1031947065.5eb24b@mastaler.com) Date: Thu, 05 Sep 2002 14:01:37 -0600 Subject: [Python-Dev] Re: New `spambayes' project on SourceForge References: <15735.40259.117828.402419@anthem.wooz.org> Message-ID: barry@zope.com (Barry A. Warsaw) writes: > There are also two mailing lists related to the spambayes project. > The first is a general discussion list: > > http://mail.python.org/mailman-21/listinfo/spambayes > > and the other is a list for cvs checkin message notices: > > http://mail.python.org/mailman-21/listinfo/spambayes-checkins These lists have now been added to Gmane (http://gmane.org) as well: spambayes@python.org <==> news://news.gmane.org/gmane.mail.spam.spambayes.general spambayes-checkins@python.org <==> news://news.gmane.org/gmane.mail.spam.spambayes.cvs -- (http://tmda.net/) From oren-py-d@hishome.net Thu Sep 5 21:27:16 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Thu, 5 Sep 2002 23:27:16 +0300 Subject: [Python-Dev] Re: Signal-resistant code In-Reply-To: <200209051501.g85F1EY13017@odiug.zope.com>; from guido@python.org on Thu, Sep 05, 2002 at 11:01:14AM -0400 References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> <20020904160143.GA1483@hishome.net> <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> <20020905045414.GA26104@hishome.net> <200209051501.g85F1EY13017@odiug.zope.com> Message-ID: <20020905232716.A8225@hishome.net> On Thu, Sep 05, 2002 at 11:01:14AM -0400, Guido van Rossum wrote: > > > I have never understood why a child dying should send a signal. > > > You can poll for the child with waitpid() instead. 
> > > > You're assuming too much about the structure of the program using > > child processes. The code that starts the child process may not be > > in control of the Python program counter by the time it ends. It's > > useful to be able to leave a signal handler to clean up the zombie > > process by waitpid(). > > I admit that I hate signals so badly that whenever I needed to wait > for a child to finish I would always structure the program around this > need (even when coding in C). Ummm... if you really hate signals that much perhaps you should step aside from this particular discussion? Naturally, you will get to pronounce on the results that come out of it (if any ;-) Westley: No, no. We have already succeeded. I mean, what are the three terrors of the fire swamp? One, the flame spurt - no problem - there's a popping sound preceding each. We can avoid that. Two, the lightning sand which you were clever enough to discover what that looks like, so in the future we can avoid that too. (from "The Princess Bride" by William Goldman) So what are the three problems of signals? One - what calls are allowed by the platform inside a signal handler. No problem. Nobody suggested actually executing Python code inside a signal handler so we don't need to be worried about user code. The C handler doesn't call anything unusual, just sets flags. This should work on all platforms. Two - Interruptible system calls. If all Python I/O calls are wrapped inside restarting wrappers this should be solved. If the system's libc wraps them it can be disabled by SA_RESTART (posix) or siginterrupt (BSD). On some systems read and recv return a short buffer instead of EINTR. This can be safely ignored because it only happens for pipes and sockets where this is a valid result. AFAIR it's guaranteed not to happen on regular files so we won't be tricked into believing they reached EOF. Are there any systems where system calls are interruptible but not restartable in any way without data loss? 
Three - Threads playing "who gets the signal". The Python signal module has a hack that appears to work on all relevant platforms - ignore the signal if getpid() isn't the main thread's PID. Oren Buttercup: Westley, what about the R.O.U.S.'s? Westley: Rodents Of Unusual Size? I don't think they exist. ... From stephen@ixokai.net Thu Sep 5 21:27:47 2002 From: stephen@ixokai.net (Stephen Hansen) Date: 05 Sep 2002 13:27:47 -0700 Subject: [Python-Dev] SF patch#555779, "import user" and Apache... *humble* Message-ID: <1031257667.16739.5.camel@jeremy> *cough* So. Hi. Python-Gods. Um. So. Anyways. *embarrassed* I submitted a really tiny patch to SF awhile back, #555779, which would make "import user" actually useful in a certain specific CGI situation. The BDFL seemed to have no problems and said anyone could commit it.. no one has. :) Now, i'm not impatient at all, it's already patched into all the machines i'm working on... however, i'm just sending this little reminder in the hopes that it won't be forgotten until after 2.3 comes out. :) I don't want to re-patch everything again later, i've got quite a few machines currently using it. :) *cough* So. Yes. Well. Thank you for your time. :) *runs away* --Stephen From mal@egenix.com Thu Sep 5 18:19:57 2002 From: mal@egenix.com (M.-A. Lemburg) Date: Thu, 05 Sep 2002 19:19:57 +0200 Subject: [Python-Dev] GBayes design References: <002b01c254f2$f6c7c020$71b53bd0@othello> Message-ID: <3D77923D.8060108@lemburg.com> Raymond Hettinger wrote: > Is it too late to challenge a core design decision? > > Instead of multiplying probabilities, use fuzzy logic methods. > Classify the indicators into damning, strong, weak, neutral, ... > > After counting the number of indicators in each class, make > a spam/ham decision that can be easily tweaked. This would > make it easy to implement variations of Tim's recent clear > win, where additional indicators are gathered until the > balance shifts sharply to one side. 
> > Some other advantages are: > -- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... ) > -- avoids mathematical issues with indicators not being independent > -- allows the addition of non-token based indicators. for instance, > a preponderance of caps would be a weak indicator. the presence > of caps separated by spaces would be a strong indicator. > -- the decision logic would be more intuitive > -- avoids the issue of having equal amounts of spam and ham in > the sample > > The core concept would stay the same -- it's really just a shift from > continuous to discrete. Hmm, there's nothing discrete about fuzzy logic (ok, this claim is 0.65% true ;-) The problem is more about multi-dimensional optimization where you are interested in distilling several different inputs into one value. A weighted average is the simplest form to use here and there are various multi-dimensional optimization algorithms around to aid in finding the "optimal" weights. Another approach would be using a shallow neural network. The only "problem" with these is that Tim generates a variable number of inputs, AFAICT, so that you'd have to use some preprocessing to make the number of inputs constant. Would make a nice internship project, I guess :-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... 
Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From hu.peress@mail.mcgill.ca Sun Sep 8 03:22:40 2002 From: hu.peress@mail.mcgill.ca (Hunter Peress) Date: 07 Sep 2002 21:22:40 -0500 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: <003d01c25471$d83fe960$2fd8accf@othello> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> Message-ID: <1031451760.644.97.camel@HillCountryPeress> On Wed, 2002-09-04 at 19:19, Raymond Hettinger wrote: > From: "Hunter Peress" > > > def something(a,b,c="lalal"): > > """This will find its way into the pydocs because it's a comment""" > > ##Here is the new stuff I'm proposing > > ##note, a clearer syntax can surely be devised. > > """file""" #documents the type of the first arg > > """string""" # "" second > > """list""" # "" third > > """string""" #documents the return type. > > > > Then the pydoc generator will do a check on the # arguments to the > > func/meth, verify that the correct number of these new comments (which > > only supply the type) are provided. I do think that it would help to > > actually enforce this. I think it's fine that docs NOT be generated if > > they don't supply this information. This provides for better docs and > > shouldn't get that many complaints. > > Thanks for the clarification. I see what you're trying to do; > however, I think that any gains are more than offset by the new > level of complexity and lengthier code. > > The current docs make a pretty good effort at describing what is > needed for each argument. At the same time, they allow flexibility > for dynamic arguments that share a similar interface (such as > substituting a StringIO object for a File object). 
> > In your example, the doc strings could be made clear > using existing tools: > > def something(file, promptstring, optionlist): > """Returns a string extracted from the file > for any line matching the promptstring. > The optionlist can include any of the > following: IGNORECASE, VERBOSE, > MULTILINE, or ADDLINENUMBER.""" > > I can't see that a tool like the one you described would add any > more clarity than the above docstring. > > > PS what's TIA mean? > > "Thanks In Advance" > > Do you have any examples of current python docstrings that are > not clear enough? This was the impetus behind my whole thinking here. I need not search far. example 1) pydoc os.fork Python Library Documentation: built-in function fork in os fork(...) fork() -> pid Fork a child process. Return 0 to child process and PID of child to parent process. example 2) pydoc string.index Python Library Documentation: function index in string index(s, *args) index(s, sub [,start [,end]]) -> int Like find but raises ValueError when the substring is not found. From these two, I have no idea what BOTH the input and return types are. I found those examples in 10 seconds (literally). The state of the python documentation is caca. And your complacency is a cause for concern. I think it's easier to enforce this from the level I describe than to have guido saying "ok guys please be more explicit in your documentation". I mean, both of those documents above are somewhat explicit, but they are not COMPLETE. Could you provide me with some links on parsing python (from a compilation/ syntax-tree analysis POV), so that I can get to work on writing a patch for the pydoc generation program. > > > Raymond Hettinger > > From guido@python.org Thu Sep 5 21:46:27 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 16:46:27 -0400 Subject: [Python-Dev] Re: Signal-resistant code In-Reply-To: Your message of "Thu, 05 Sep 2002 23:27:16 +0300." 
<20020905232716.A8225@hishome.net> References: <15733.11253.743055.864572@12-248-11-90.client.attbi.com> <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> <20020904160143.GA1483@hishome.net> <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> <20020905045414.GA26104@hishome.net> <200209051501.g85F1EY13017@odiug.zope.com> <20020905232716.A8225@hishome.net> Message-ID: <200209052046.g85KkR714802@odiug.zope.com> > > I admit that I hate signals so badly that whenever I needed to wait > > for a child to finish I would always structure the program around this > > need (even when coding in C). > > Ummm... if you really hate signals that much perhaps you to step aside > from this particular discussion? Naturally, you will get to pronounce on > the results that come out of it (if any ;-) Why? I don't think hating signals disqualifies me from understanding their problems. > So what are the three problems of signals? > > One - what calls are allowed by the platform inside a signal handler. > No problem. Nobody suggested actually executing Python code inside a > signal handler so we don't need to be worried about user code. The C > handler doesn't call anything unusual, just sets flags. This should work > on all platforms. > > Two - Interruptible system calls. If all Python I/O calls are wrapped > inside restarting wrappers this should be solved. I asked what the Python code called by the wrapper when a signal arrives is allowed to do (e.g. close the file?). If you replied to that, I missed it. > If the system's libc wraps them it can be disabled by SA_RESTART > (posix) or siginterrupt (BSD). On some systems read and recv return > a short buffer instead of EINTR. This latter sentence shows that you don't understand signals, or you're being very sloppy. 
You get *either* a short buffer *or* EINTR depending on whether some data was already transferred to user space. > This can be safely ignored because it only happens for pipes and > sockets where this is a valid result. AFAIR it's guaranteed not to > happen on regular files so we won't be tricked into believing they > reached EOF. I don't believe that a short read on a regular file can be used reliably to infer EOF anyway. The file could be growing while we read. > Are there any systems where system calls are interruptible but not > restartable in any way without data loss? Not AFAIK. > Three - Threads playing "who gets the signal". The Python signal module > has a hack that appears to work on all relevant platforms - ignore the > signal if getpid() isn't the main thread's PID. Doesn't that make signals unreliable? What if thread 4 has forked a child, and the child exits? Won't the SIGCHLD be sent to thread 4? AFAIK there's no standard for this, or if there is, not all systems comply. --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Thu Sep 5 21:45:52 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Thu, 5 Sep 2002 16:45:52 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <20020905185228.GA19726@glacier.arctrix.com> References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> <20020905093002.GA61136@hishome.net> <20020905185228.GA19726@glacier.arctrix.com> Message-ID: <20020905204552.GA51795@hishome.net> On Thu, Sep 05, 2002 at 11:52:28AM -0700, Neil Schemenauer wrote: > > Signals are a gigantic mess. I'm starting to doubt that you realize the > extent of the brain damage. While I would be pleased if there was some > way Python could hide the mess, I'm not convinced it is possible. > > Neil Ah... I can almost hear the pain, frustration and despair in your voice. 
Obviously Guido and you got burned by this. I know other old-time Unix hackers with the same attitude. From my experience signals on Linux work just fine - I don't carry any signal scars. I can show off my Oracle scars, though. They're really gnarly. I can't hear that name mentioned without turning completely irrational about it. Certain embedded software and hardware makers also make me want to scream. Oren From guido@python.org Thu Sep 5 21:51:57 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 05 Sep 2002 16:51:57 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Thu, 05 Sep 2002 16:45:52 EDT." <20020905204552.GA51795@hishome.net> References: <3FE2540C-C047-11D6-89C6-000A27B19B96@oratrix.com> <200209042048.g84KmCK08365@pcp02138704pcs.reston01.va.comcast.net> <20020905093002.GA61136@hishome.net> <20020905185228.GA19726@glacier.arctrix.com> <20020905204552.GA51795@hishome.net> Message-ID: <200209052059.g85Kxop14949@odiug.zope.com> > From my experience signals on Linux work just fine - I don't carry > any signal scars. That just shows you haven't written enough signal code. :-) Seriously, let's please not confuse Linux with portable. The issues here are about the cross-platform viability of your suggested approach. If you've only used signals on Linux, maybe you should withdraw yourself on account of lack of experience with the real issues. --Guido van Rossum (home page: http://www.python.org/~guido/) From paul-python@svensson.org Thu Sep 5 22:08:54 2002 From: paul-python@svensson.org (Paul Svensson) Date: Thu, 5 Sep 2002 17:08:54 -0400 (EDT) Subject: [Python-Dev] Re: Signal-resistant code In-Reply-To: <20020905232716.A8225@hishome.net> Message-ID: On Thu, 5 Sep 2002, Oren Tirosh wrote: >So what are the three problems of signals? >Two - Interruptible system calls. If all Python I/O calls are wrapped >inside restarting wrappers this should be solved. 
If the system's libc >wraps them it can be disabled by SA_RESTART (posix) or siginterrupt (BSD). >On some systems read and recv return a short buffer instead of EINTR. This >can be safely ignored because it only happens for pipes and sockets where >this is a valid result. AFAIR it's guaranteed not to happen on regular >files so we won't be tricked into believing they reached EOF. Are there >any systems where system calls are interruptible but not restartable >in any way without data loss? I don't see any guarantee against short reads in my documentation (Linux, HP-UX); indeed both state explicitly that only a 0 return from read() indicates EOF. /Paul From neal@metaslash.com Thu Sep 5 22:10:10 2002 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 05 Sep 2002 17:10:10 -0400 Subject: [Python-Dev] SF patch#555779, "import user" and Apache... *humble* References: <1031257667.16739.5.camel@jeremy> Message-ID: <3D77C832.FD342FCD@metaslash.com> Stephen Hansen wrote: > > I submitted a really tiny patch to SF awhile back, #555779, which would > make "import user" actually useful in a certain specific CGI situation. > The BDFL seemed to have no problems and said anyone could commit it.. Done. Neal From python@rcn.com Thu Sep 5 21:49:21 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 5 Sep 2002 16:49:21 -0400 Subject: [Python-Dev] SF patch#555779, "import user" and Apache... *humble* References: <1031257667.16739.5.camel@jeremy> Message-ID: <006801c2551d$b6e0e600$3961accf@othello> I'll check it in for you when I get back from class this evening. Raymond Hettinger BTW, no need for humility around here. ----- Original Message ----- From: "Stephen Hansen" To: Sent: Thursday, September 05, 2002 4:27 PM Subject: [Python-Dev] SF patch#555779, "import user" and Apache... *humble* > *cough* > > So. Hi. Python-Gods. Um. So. Anyways. 
*embarassed* > > I submitted a really tiny patch to SF awhile back, #555779, which would > make "import user" actually useful in a certain specific CGI situation. > The BDFL seemed to have no problems and said anyone could commit it.. no > one has. :) Now, i'm not impatient at all, its already patched into all > the machines i'm working on... however, i'm just sending this little > reminder in the hopes that it won't be forgotten until after 2.3 comes > out. :) I don't want to re-patch everything again later, i've got quite > a few machines currently using it. :) > > *cough* So. Yes. Well. Thank you for your time. :) > > *runs away* > > --Stephen > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > From fredrik@pythonware.com Thu Sep 5 23:06:22 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 6 Sep 2002 00:06:22 +0200 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) References: <1031437860.636.29.camel@HillCountryPeress><1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> Message-ID: <004701c25528$7c8b4530$ced241d5@hagrid> hunter wrote: > I need not search far. > example 1) pydoc os.fork > Python Library Documentation: built-in function fork in os > fork(...) > fork() -> pid > Fork a child process. > > Return 0 to child process and PID of child to parent process. why do you care about the type of a PID object? in most cases, all you need to know is that a PID isn't 0, which is exactly what the documentation says. and if you know what a PID is, you already know what type it is... > example2) pydoc string.index > Python Library Documentation: function index in string > index(s, *args) > index(s, sub [,start [,end]]) -> int > > Like find but raises ValueError when the substring is not found. 
> > From these two, I have no idea what BOTH the input and return > types are. the index documentation refers to the documentation for "find", which tells you that: >>> help(string.find) Help on function find in module string: find(s, *args) find(s, sub [,start [,end]]) -> in Return the lowest index in s where substring sub is found, such that sub is contained within s[start,end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure. which, given that you know how indexes and slices work in python, is all you need to know. > I found those examples in 10 seconds (literally). The state of the > python documentation is caca. how long have you been using Python? From oren-py-d@hishome.net Thu Sep 5 23:23:30 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Fri, 6 Sep 2002 01:23:30 +0300 Subject: [Python-Dev] Re: Signal-resistant code In-Reply-To: <200209052046.g85KkR714802@odiug.zope.com>; from guido@python.org on Thu, Sep 05, 2002 at 04:46:27PM -0400 References: <20020904094947.GA56953@hishome.net> <200209041144.g84BiXZ05244@pcp02138704pcs.reston01.va.comcast.net> <20020904124646.GA79746@hishome.net> <200209041325.g84DP1o06695@pcp02138704pcs.reston01.va.comcast.net> <20020904160143.GA1483@hishome.net> <200209042005.g84K5Ms08177@pcp02138704pcs.reston01.va.comcast.net> <20020905045414.GA26104@hishome.net> <200209051501.g85F1EY13017@odiug.zope.com> <20020905232716.A8225@hishome.net> <200209052046.g85KkR714802@odiug.zope.com> Message-ID: <20020906012330.A10575@hishome.net> On Thu, Sep 05, 2002 at 04:46:27PM -0400, Guido van Rossum wrote: > > > I admit that I hate signals so badly that whenever I needed to wait > > > for a child to finish I would always structure the program around this > > > need (even when coding in C). > > > > Ummm... if you really hate signals that much perhaps you to step aside > > from this particular discussion? Naturally, you will get to pronounce on > > the results that come out of it (if any ;-) > > Why? 
I don't think hating signals disqualifies me from understanding > their problems. In the past I have disqualified myself from making technical decisions on issues where I have been burned and knew that my opinion would not be a calm, rational one. > I asked what the Python code called by the wrapper when a signal > arrives is allowed to do (e.g. close the file?). If you replied to > that, I missed it. Anything that a Python thread is allowed to do without grabbing a lock, i.e. anything that involves only exclusive resources or atomic Python operations on shared resources like setting a variable (but not read-modify-write). A signal handler is also allowed to raise an exception that will get delivered to the main thread. > > If the system's libc wraps them it can be disabled by SA_RESTART > > (posix) or siginterrupt (BSD). On some systems read and recv return > > a short buffer instead of EINTR. > > This latter sentence shows that you don't understand signals, or > you're being very sloppy. You get *either* a short buffer *or* EINTR > depending on whether some data was already transferred to user space. Did I say that you get *both* a short buffer *and* EINTR? What I meant is that it's really quite simple - if errno==EINTR I retry, and if I get a short buffer I continue from whatever I got and ask for the remainder. This should work regardless of the differences in behavior between different systems, sockets and files, etc. > > This can be safely ignored because it only happens for pipes and > > sockets where this is a valid result. AFAIR it's guaranteed not to > > happen on regular files so we won't be tricked into believing they > > reached EOF. > > I don't believe that a short read on a regular file can be used > reliably to infer EOF anyway. The file could be growing while we read. You're right, only a zero result on read should be interpreted as EOF, not a short result. I got confused by fread where a short read does mark an end of file condition. 
I don't see how the growing file case is relevant, though. > > Three - Threads playing "who gets the signal". The Python signal module > > has a hack that appears to work on all relevant platforms - ignore the > > signal if getpid() isn't the main thread's PID. > > Doesn't that make signals unreliable? What if thread 4 has forked a > child, and the child exits? Won't the SIGCHLD be sent to thread 4? > AFAIK there's no standard for this, or if there is, not all systems > comply. I've never actually tried this one. I just went by the comments in signalmodule.c which claim that this works for all cases of how different implementations deliver signals to threads. I guess that was a bit hasty. Oren From list-python@ccraig.org Fri Sep 6 07:20:51 2002 From: list-python@ccraig.org (Christopher A. Craig) Date: 06 Sep 2002 02:20:51 -0400 Subject: [Python-Dev] Documentation inconsistency in re Message-ID: From the Library Reference (2.2.1): \b Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals. Now reality: Python 2.2.1 (#2, Apr 22 2002, 17:53:10) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import re >>> t = re.compile(r'\bbag\b') >>> t.search('test bag') <_sre.SRE_Match object at 0x812aad0> >>> t.search('test+bag') <_sre.SRE_Match object at 0x815d528> >>> t.search('test_bag') >>> [ chr(i) for i in xrange(256) if not t.search('test' + chr(i) + 'bag') ] ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'] >>> So the implementation appears to define a word as a sequence of alphanumeric characters or underscores, which means either the documentation, or the library is wrong. Now it happens that this was found while a friend of mine and I were looking to get the exact behavior that is implemented, so I'd prefer it if the documentation were updated to meet the implementation <.8 wink>. -- Christopher A. Craig I develop for Linux for a living, I used to develop for DOS. Going from DOS to Linux is like trading a glider for an F117. - Lawrence Foard From fredrik@pythonware.com Fri Sep 6 07:47:12 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 6 Sep 2002 08:47:12 +0200 Subject: [Python-Dev] Documentation inconsistency in re References: Message-ID: <00e401c25571$3e6bc1f0$ced241d5@hagrid> Christopher A. Craig wrote: > >From the Library Reference (2.2.1): > > \b Matches the empty string, but only at the beginning or end of a > word. A word is defined as a sequence of alphanumeric characters, so > the end of a word is indicated by whitespace or a non-alphanumeric > character. Inside a character range, \b represents the backspace > character, for compatibility with Python's string literals. as you suspected, the documentation is flawed: \b is defined in terms of \w and \W. From mal@lemburg.com Fri Sep 6 08:55:13 2002 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 06 Sep 2002 09:55:13 +0200 Subject: [Python-Dev] utf8 issue References: <200208232105.g7NL5RE16863@pcp02138704pcs.reston01.va.comcast.net> <2mznv9c1k4.fsf@starship.python.net> <200208261405.g7QE5Of05199@pcp02138704pcs.reston01.va.comcast.net> <3D77205E.8080103@lemburg.com> <200209051351.g85Dpnk12649@odiug.zope.com> Message-ID: <3D785F61.1090301@lemburg.com> Guido van Rossum wrote: >>>Please do. Bumping MAGIC is a no-no between dot releases. But I >>>don't understand why that is necessary? >> >>It would be necessary since marshal uses UTF-8 for storing >>Unicode literals. > > > Do you mean that in 2.2 it doesn't? Marshal uses it since 1.6. The point is that the fix to the lone surrogate problem resulted in a change of the UTF codec output. PYCs from unpatched and patched versions wouldn't interop if they use lone surrogates in Unicode literals. We usually bump the PYC magic in such a case, to avoid these issues. Since it's not possible for a patch level release, we have two choices: 1. leave things as they are 2. apply the fix and live with the consequences of having to regenerate PYCs by hand Just to give an example of the problem: Python 2.2: ------------- u'\ud800'.encode('utf-8') == '\xa0\x80' >>> unicode('\xa0\x80', 'utf-8') Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: unexpected code byte >>> unicode('\xed\xa0\x80', 'utf-8') Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: illegal encoding Current CVS Python: --------------------- u'\ud800'.encode('utf-8') == '\xed\xa0\x80' >>> unicode('\xed\xa0\x80', 'utf-8') u'\ud800' >>Even though it's highly unlikely that the problem cases are used in >>Python Unicode literals, there's a tiny chance. Without the MAGIC >>change this could result in PYC files failing to load. > > > Ha. You may have missed the start of this thread, but the whole > problem was that a PYC file *did* fail to load! 
(The .py file had a > lone surrogate in it.) So I'm not sure this argument holds much > water. Interesting. I wouldn't have expected that. > Can someone please explain what change would be necessary to what part > of the code to prevent a lone surrogate in a string literal from > causing a PYC file to blow up? One possibility would be to: 1. change the UTF-8 encoder in Python 2.2 to produce correct output 2. let the UTF-8 decoder in Python 2.2 accept the correct output *and* the malformed output I am not sure whether 2. would introduce a security problem. Perhaps there is a way to restrict the work-around so that we don't run into UTF-8 encoding attack problems. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Fri Sep 6 15:06:21 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 06 Sep 2002 10:06:21 -0400 Subject: [Python-Dev] utf8 issue In-Reply-To: Your message of "Fri, 06 Sep 2002 09:55:13 +0200." <3D785F61.1090301@lemburg.com> References: <200208232105.g7NL5RE16863@pcp02138704pcs.reston01.va.comcast.net> <2mznv9c1k4.fsf@starship.python.net> <200208261405.g7QE5Of05199@pcp02138704pcs.reston01.va.comcast.net> <3D77205E.8080103@lemburg.com> <200209051351.g85Dpnk12649@odiug.zope.com> <3D785F61.1090301@lemburg.com> Message-ID: <200209061406.g86E6Lu14230@pcp02138704pcs.reston01.va.comcast.net> [MAL, on UTF-8 for unicode] > Marshal has used it since 1.6. The point is that the fix to the > lone surrogate problem resulted in a change of the UTF codec > output. PYCs from unpatched and patched versions wouldn't > interop if they use lone surrogates in Unicode literals. We > usually bump the PYC magic in such a case, to avoid these > issues. 
Since it's not possible for a patch level release, > we have two choices: > > 1. leave things as they are > > 2. apply the fix and live with the consequences of having > to regenerate PYCs by hand [but then later] > One possibility would be to: > > 1. change the UTF-8 encoder in Python 2.2 to produce correct > output > > 2. let the UTF-8 decoder in Python 2.2 accept the correct > output *and* the malformed output This sounds like the right solution. I hope you can produce a patch against the release22-maint branch. > I am not sure whether 2. would introduce a security problem. > Perhaps there is a way to restrict the work-around so that > we don't run into UTF-8 encoding attack problems. I don't see what this vulnerability (if it is one) adds to the already laughable security of marshal and .pyc files. If someone you don't trust can write your .pyc files, they can cause your interpreter to crash by inserting bogus bytecode. So I'd say this is a non-issue. --Guido van Rossum (home page: http://www.python.org/~guido/) From loewis@informatik.hu-berlin.de Fri Sep 6 15:12:25 2002 From: loewis@informatik.hu-berlin.de (Martin v. Löwis) Date: 06 Sep 2002 16:12:25 +0200 Subject: [Python-Dev] Subsecond time stamps Message-ID: A number of systems provide subsecond time stamp resolution for files. In particular: - NFS v3 has nanosecond time stamps. - Solaris 9 has nanosecond time stamps in stat(2), and microsecond time stamps in utimes(2). In addition, they have microsecond time stamps on ufs. It appears that other Unices have also extended stat(2), as has OS X. - NTFS has 100ns resolution for time stamps. I'd like to expose at least the stat extensions to Python. Adding new fields to stat_result is easy enough, but there are a number of alternatives: A. Add an additional field to hold the nanoseconds, i.e. st_mtimensec, st_atimensec, st_ctimensec. This is the BSD Posix extension. B. Follow the Unix API (Solaris and others). 
They define a struct timespec_t { time_t tv_sec; unsigned long tv_nsec; }; and fields st_mtim, st_ctim, st_atim of timespec_t. For compatibility, they #define st_mtime st_mtim.tv_sec So to get at the seconds, you can write either st_mtim.tv_sec, or st_mtime. For the nanoseconds, you write st_mtim.tv_nsec. This requires adding a new type. C. Make st_mtime a floating point number. This won't offer nanosecond resolution, as C doubles are not dense enough. What do you think? Regards, Martin From paul-python@svensson.org Fri Sep 6 15:31:25 2002 From: paul-python@svensson.org (Paul Svensson) Date: Fri, 6 Sep 2002 10:31:25 -0400 (EDT) Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Message-ID: On 6 Sep 2002, Martin v. Löwis wrote: >A number of systems provide subsecond time stamp resolution for >files. In particular: > >- NFS v3 has nanosecond time stamps. > >- Solaris 9 has nanosecond time stamps in stat(2), and microsecond > time stamps in utimes(2). In addition, they have microsecond time > stamps on ufs. It appears that other Unices have also extended > stat(2), as does OS X. > >- NTFS has 100ns resolution for time stamps. (---) >C. Make st_mtime a floating point number. This won't offer nanosecond > resolution, as C doubles are not dense enough. This seems to me the most Pythonic way. Are C doubles dense enough to offer 100 ns resolution? /Paul From skip@pobox.com Fri Sep 6 15:39:02 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 6 Sep 2002 09:39:02 -0500 Subject: [Python-Dev] Documentation inconsistency in re In-Reply-To: References: Message-ID: <15736.48646.910216.93578@12-248-11-90.client.attbi.com> Christopher> So the implementation appears to define a word as a Christopher> sequence of alphanumeric characters or underscores, which Christopher> means either the documentation, or the library is wrong. Documentation has been fixed. 
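For anyone wanting to re-check the \b behaviour reported in this thread, here is a minimal sketch. It assumes a current Python 3 interpreter rather than the 2.2.1 used in the original session, and it scans only the ASCII range instead of all 256 byte values; the helper name blocks_boundary is made up for illustration.

```python
import re
import string

# Same pattern as in the report: 'bag' with a word boundary on each side.
pattern = re.compile(r'\bbag\b')


def blocks_boundary(ch):
    """Return True if putting `ch` between 'test' and 'bag' prevents a match."""
    return pattern.search('test' + ch + 'bag') is None


# Collect every ASCII character that suppresses the boundary before 'bag'.
blockers = ''.join(chr(i) for i in range(128) if blocks_boundary(chr(i)))

# \b is defined in terms of \w, so the blockers should be exactly the
# ASCII word characters: digits, letters and the underscore.
word_chars = (string.digits + string.ascii_uppercase + '_'
              + string.ascii_lowercase)
print(blockers == word_chars)
```

The underscore shows up among the blockers because \w includes it, which is the point on which the old documentation and the implementation disagreed.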
Skip From erik@pythonware.com Fri Sep 6 15:44:16 2002 From: erik@pythonware.com (erik heneryd) Date: Fri, 06 Sep 2002 16:44:16 +0200 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> Message-ID: <3D78BF40.1030609@pythonware.com> Hunter Peress wrote: >example 1) pydoc os.fork >Python Library Documentation: built-in function fork in os >fork(...) > fork() -> pid > Fork a child process. > > Return 0 to child process and PID of child to parent process. > > my only objection is that the case where fork fails isn't documented. with a c background one expects a negative number, when in fact an exception is raised... erik From guido@python.org Fri Sep 6 15:41:54 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 06 Sep 2002 10:41:54 -0400 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: Your message of "Fri, 06 Sep 2002 16:44:16 +0200." <3D78BF40.1030609@pythonware.com> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <3D78BF40.1030609@pythonware.com> Message-ID: <200209061441.g86EfsV14529@pcp02138704pcs.reston01.va.comcast.net> > my only objection is that the case where fork fails isn't documented. > with a C background one expects a negative number, when in fact an > exception is raised... Ah jeez. Even with only half a day of Python you should've figured out that Python nearly always raises an exception where the corresponding C code returns an error value. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Fri Sep 6 16:03:06 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 6 Sep 2002 17:03:06 +0200 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <3D78BF40.1030609@pythonware.com> <200209061441.g86EfsV14529@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <00f201c255b6$82458c40$0900a8c0@spiff> guido wrote: > > my only objection is that the case where fork fails isn't documented. > > with a C background one expects a negative number, when in fact an > > exception is raised... > > Ah jeez. Even with only half a day of Python you should've figured > out that Python nearly always raises an exception where the > corresponding C code returns an error value. otoh, it doesn't hurt to spell it out for functions like fork which almost always succeeds... (can you write a portable test that is guaranteed to raise an exception, and does that without locking up the system?) From Jack.Jansen@oratrix.com Fri Sep 6 16:09:16 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 6 Sep 2002 17:09:16 +0200 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209051501.g85F1EY13017@odiug.zope.com> Message-ID: <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> On Thursday, September 5, 2002, at 05:01, Guido van Rossum wrote: >> Code in signal handlers is executed at some arbitrary point in the >> program and the programmer should be aware of this and only do so >> simple things like setting a flag or appending to a list. > > Unfortunately the mechanism doesn't enforce this. I wish we could > invent a Python signal API that only lets you do one of these simple > things.
Could we connect signals to semaphores or locks or something like that? That would allow you to do the two things that i think are worth doing in a signal handler: setting a flag and/or making some other part of the code wake up. Only problem is that for completeness you would really want to wire up select-like functionality too, so that you could really have a single waiting mechanism. -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From md9ms@mdstud.chalmers.se Fri Sep 6 16:10:21 2002 From: md9ms@mdstud.chalmers.se (Martin Sjögren) Date: 06 Sep 2002 17:10:21 +0200 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: <200209061441.g86EfsV14529@pcp02138704pcs.reston01.va.comcast.net> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <3D78BF40.1030609@pythonware.com> <200209061441.g86EfsV14529@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1031325022.587.1.camel@winterfell> On Fri 2002-09-06 at 16.41, Guido van Rossum wrote: > > my only objection is that the case where fork fails isn't documented. > > with a C background one expects a negative number, when in fact an > > exception is raised... > > Ah jeez. Even with only half a day of Python you should've figured > out that Python nearly always raises an exception where the > corresponding C code returns an error value. It would, however, be extremely useful if the documentation spelled out *which* exceptions can be raised! Kind of hard to write a decent try/except clause if you don't know what to expect.
Regards, Martin From guido@python.org Fri Sep 6 16:12:14 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 06 Sep 2002 11:12:14 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Fri, 06 Sep 2002 17:09:16 +0200." <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> References: <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> Message-ID: <200209061512.g86FCF314849@pcp02138704pcs.reston01.va.comcast.net> > Could we connect signals to semaphores or locks or something > like that? That would allow you to do the two things that i > think are worth doing in a signal handler: setting a flag and/or > making some other part of the code wake up. But that mixes signals with threads, which is even more poorly standardized than signals in general. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Sep 6 16:13:22 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 06 Sep 2002 11:13:22 -0400 Subject: [Python-Dev] Call for clarity ( clarification ;-) ) In-Reply-To: Your message of "Fri, 06 Sep 2002 17:10:21 +0200."
<1031325022.587.1.camel@winterfell> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <3D78BF40.1030609@pythonware.com> <200209061441.g86EfsV14529@pcp02138704pcs.reston01.va.comcast.net> <1031325022.587.1.camel@winterfell> Message-ID: <200209061513.g86FDXi14877@pcp02138704pcs.reston01.va.comcast.net> > It would, however, be extremely useful if the documentation spelled out > *which* exceptions can be raised! Kind of hard to write a decent > try/except clause if you don't know what to expect. Yes, *this* is a deficiency in the Python docs that ought to be fixed. It's a lot of work though, and it's not always clear what to document (e.g. *everything* can raise MemoryError -- so it's not useful to mention that everywhere). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Sep 6 16:30:40 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 06 Sep 2002 11:30:40 -0400 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Your message of "Fri, 06 Sep 2002 16:12:25 +0200." References: Message-ID: <200209061530.g86FUeq15029@pcp02138704pcs.reston01.va.comcast.net> > C. Make st_mtime a floating point number. This won't offer nanosecond > resolution, as C doubles are not dense enough. This is the most Pythonic approach. --Guido van Rossum (home page: http://www.python.org/~guido/) From neal@metaslash.com Fri Sep 6 16:36:40 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 06 Sep 2002 11:36:40 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) References: <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> <200209061512.g86FCF314849@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D78CB88.9E642F78@metaslash.com> Guido van Rossum wrote: > > > Could we connect signals to semaphores or locks or something > > like that? 
That would allow you to do the two things that i > > think are worth doing in a signal handler: setting a flag and/or > > making some other part of the code wake up. > > But that mixes signals with threads, which is even more poorly > standardized than signals in general. Python can open a pipe to itself. When a signal arrives, write a character on the pipe in addition to setting a flag. Then select() on the pipe. I doubt this is worth the effort, though. Neal From martin@v.loewis.de Fri Sep 6 16:40:51 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 06 Sep 2002 17:40:51 +0200 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: References: Message-ID: Paul Svensson writes: > This seems to me the most Pythonic way. > Are C doubles dense enough to offer 100 ns resolution ? It looks like they are: >>> time.time() 1031326478.373606 >>> 1031326478 + 1e-6 1031326478.000001 >>> 1031326478 + 1e-7 1031326478.0000001 >>> 1031326478 + 1e-8 1031326478.0 but only just so: >>> 1031326478 + 2e-7 1031326478.0000002 >>> 1031326478 + 3e-7 1031326478.0000004 >>> 1031326478 + 4e-7 1031326478.0000004 I admit that this looks tempting, but I'm worried about applications that break because they expect time stamps in struct stat to be integers. Regards, Martin From guido@python.org Fri Sep 6 16:42:33 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 06 Sep 2002 11:42:33 -0400 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Your message of "Fri, 06 Sep 2002 17:40:51 +0200." References: Message-ID: <200209061542.g86FgXt15105@pcp02138704pcs.reston01.va.comcast.net> > > This seems to me the most Pythonic way. > > I admit that this looks tempting, but I'm worried about applications > that break because they expect time stamps in struct stat to be > integers. Hm, so maybe new field names is still the way to go. E.g. st_mtime gives an int, st_mtimef gives a float. The tuple version only gives the int. 
If the system doesn't support subsecond resolution, the st_mtimef field still exists but is an int (no point allocating a float and converting the int). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Fri Sep 6 16:50:45 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 06 Sep 2002 11:50:45 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <3D78CB88.9E642F78@metaslash.com> Message-ID: [Neal Norwitz] > Python can open a pipe to itself. When a signal arrives, write > a character on the pipe in addition to setting a flag. > Then select() on the pipe. Of course you meant to say it should do WaitForSingleObject(), so that this scheme is portable . > I doubt this is worth the effort, though. Few things are. From tim.one@comcast.net Fri Sep 6 17:01:43 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 06 Sep 2002 12:01:43 -0400 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Message-ID: [Paul Svensson] > Are C doubles dense enough to offer 100 ns resolution ? The question can't be answered unless you also specify how many years you want to cover. It takes about 25 bits to distinguish a year's worth of seconds, and an IEEE double has 53 bits to play with. So if you were only interested in representing one year, you've got about 28 bits left to play with. If you want to cover an N-year span, you've got about 28 - log2(N) bits to play with. It takes a bit over 23 bits to distinguish the number of 100 ns slices in a second, so N has to be small enough that 5 - log2(N) doesn't go negative. So if you count the start of the epoch at 1970, you've just created a year 2003 problem . 
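The bit counting above can be checked numerically. The following is a minimal sketch, assuming IEEE 754 doubles with 53 significant bits; double_spacing is a hypothetical helper name, and the sample timestamp is the one from the interactive session quoted earlier in the thread.

```python
import math


def double_spacing(t):
    """Gap between positive double t and the next larger representable double."""
    _mantissa, exponent = math.frexp(t)     # t == mantissa * 2**exponent, 0.5 <= mantissa < 1
    return math.ldexp(1.0, exponent - 53)   # 53 significant bits in an IEEE double


SECONDS_PER_YEAR = 365.25 * 24 * 3600

bits_per_year = math.log2(SECONDS_PER_YEAR)   # bits needed to count a year of seconds
bits_per_100ns = math.log2(1e7)               # bits needed to count 100 ns slices of a second

stamp = 1031326478.0          # a time.time() value from September 2002
gap = double_spacing(stamp)   # 2**-23 seconds, roughly 119 ns

# Adding 3e-7 or 4e-7 lands on the same double, as in the quoted session.
collides = (stamp + 3e-7) == (stamp + 4e-7)
```

At this timestamp adjacent doubles are about 119 ns apart, so microsecond stamps survive the conversion while 100 ns stamps sit right at the edge; the gap doubles once the seconds count passes 2**30.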
From oren-py-d@hishome.net Fri Sep 6 17:54:49 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Fri, 6 Sep 2002 19:54:49 +0300 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com>; from Jack.Jansen@oratrix.com on Fri, Sep 06, 2002 at 05:09:16PM +0200 References: <200209051501.g85F1EY13017@odiug.zope.com> <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> Message-ID: <20020906195449.A23347@hishome.net> On Fri, Sep 06, 2002 at 05:09:16PM +0200, Jack Jansen wrote: > Could we connect signals to semaphores or locks or something > like that? That would allow you to do the two things that i > think are worth doing in a signal handler: setting a flag and/or > making some other part of the code wake up. Signal handlers and locks don't mix well. A signal handler can't grab a lock. The signal handler can't wait for the lock to be released because it has interrupted the code holding it. The traditional way this has been handled is with a global "interrupt enable" flag. Just like the good old days of 8 bit micros and DOS when any application could clear the interrupt flag :-) If Queue.Queue sets up a signal critical section as well as getting the queue lock a signal could write to a Queue and wake up a thread waiting on the other end. > Only problem is that for completeness you would really want to > wire up select-like functionality too, so that you could really > have a single waiting mechanism. If the program uses select as the central dispatcher you can set up a pipe. The signal handler writes to one end and the other end is listed in the select socket map. It's a simple way to handle an occasional event like a child process dying or a SIGHUP telling you to reload the configuration file. Do you want to use signals for more intensive tasks like asynchronous I/O? 
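The pipe approach described above can be sketched in a few lines. This is a minimal illustration, assuming a POSIX system and a current Python 3; make_wakeup_pipe and wait_for_signal are hypothetical names, and SIGUSR1 in the usage note is just a convenient signal to demonstrate with.

```python
import os
import select
import signal


def make_wakeup_pipe(signum):
    """Install a handler for signum that writes one byte to a pipe.

    Returns the read end, which can be added to a select() set.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)   # never let the handler block on a full pipe

    def handler(_signum, _frame):
        try:
            os.write(write_fd, b"x")
        except BlockingIOError:
            pass   # pipe is full, so a wakeup is already pending

    signal.signal(signum, handler)
    return read_fd


def wait_for_signal(read_fd, timeout):
    """Return True if a signal arrived within timeout seconds, else False."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return False
    os.read(read_fd, 512)   # drain pending wakeup bytes
    return True
```

A dispatcher would call make_wakeup_pipe(signal.SIGUSR1) once and list the returned descriptor alongside its sockets. Later Python versions also provide signal.set_wakeup_fd(), which performs the write at the C level.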
Oren From zack@codesourcery.com Fri Sep 6 18:28:03 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 6 Sep 2002 10:28:03 -0700 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <20020906195449.A23347@hishome.net> References: <200209051501.g85F1EY13017@odiug.zope.com> <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> <20020906195449.A23347@hishome.net> Message-ID: <20020906172803.GP6886@codesourcery.com> On Fri, Sep 06, 2002 at 07:54:49PM +0300, Oren Tirosh wrote: > Signal handlers and locks don't mix well. A signal handler can't grab a > lock. The signal handler can't wait for the lock to be released because > it has interrupted the code holding it. The traditional way this has been > handled is with a global "interrupt enable" flag. Just like the good old > days of 8 bit micros and DOS when any application could clear the > interrupt flag :-) > > If Queue.Queue sets up a signal critical section as well as getting the > queue lock a signal could write to a Queue and wake up a thread waiting > on the other end. Would this be an appropriate place to complain about how KeyboardInterrupt won't wake up a thread stuck waiting on a Queue? zw From guido@python.org Fri Sep 6 18:53:22 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 06 Sep 2002 13:53:22 -0400 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: Your message of "Fri, 06 Sep 2002 10:28:03 PDT." <20020906172803.GP6886@codesourcery.com> References: <200209051501.g85F1EY13017@odiug.zope.com> <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> <20020906195449.A23347@hishome.net> <20020906172803.GP6886@codesourcery.com> Message-ID: <200209061753.g86HrMx15903@pcp02138704pcs.reston01.va.comcast.net> > Would this be an appropriate place to complain about how > KeyboardInterrupt won't wake up a thread stuck waiting on a Queue? 
No, unless you have a real proposal on how to fix it (not just a vague idea -- we've all had those, and they don't work). Working code or shut up. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From zack@codesourcery.com Fri Sep 6 21:52:31 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 6 Sep 2002 13:52:31 -0700 Subject: [Python-Dev] Re: Signal-resistant code (was: Two random and nearly unrelated ideas) In-Reply-To: <200209061753.g86HrMx15903@pcp02138704pcs.reston01.va.comcast.net> References: <200209051501.g85F1EY13017@odiug.zope.com> <9CA26C2C-C1AA-11D6-8D51-003065517236@oratrix.com> <20020906195449.A23347@hishome.net> <20020906172803.GP6886@codesourcery.com> <200209061753.g86HrMx15903@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020906205231.GQ6886@codesourcery.com> On Fri, Sep 06, 2002 at 01:53:22PM -0400, Guido van Rossum wrote: > > Would this be an appropriate place to complain about how > > KeyboardInterrupt won't wake up a thread stuck waiting on a Queue? > > No, unless you have a real proposal on how to fix it (not just a vague > idea -- we've all had those, and they don't work). Working code or > shut up. :-) Fair enough. The underlying problem is that KeyboardInterrupt does not abort acquire() called on a thread lock. This is only noticeable when it was the main thread that called acquire -- if it's some other thread, the KeyboardInterrupt will still be delivered to the main thread. 
Compare the behavior of these two test programs:

-- test1.py --
import time, thread
lock = thread.allocate_lock()
lock.acquire()

def child_thread():
    print "Acquiring lock"
    lock.acquire()
    print "Have lock (can't happen)"
    lock.release()

thread.start_new_thread(child_thread, ())
print "Hit ^C now"
time.sleep(3600)

-- test2.py --
import time, thread
lock = thread.allocate_lock()

def child_thread():
    print "Acquiring lock"
    lock.acquire()
    print "Have lock"
    time.sleep(3600)
    lock.release()

thread.start_new_thread(child_thread, ())
time.sleep(1)   # give child a chance to acquire lock
print "Hit ^C now"
lock.acquire()

I'm going to look only at the pthread-based thread support; presumably similar changes to the ones I will propose, need to be made to the others. There are two cases of PyThread_acquire_lock in thread_pthread.h: using semaphores, and using condition variables. Let's look at the condition variable one first:

    /* mut must be locked by me -- part of the condition
     * protocol */
    status = pthread_mutex_lock( &thelock->mut );
    CHECK_STATUS("pthread_mutex_lock[2]");
    while ( thelock->locked ) {
        status = pthread_cond_wait(&thelock->lock_released,
                                   &thelock->mut);
        CHECK_STATUS("pthread_cond_wait");
    }
    thelock->locked = 1;
    status = pthread_mutex_unlock( &thelock->mut );

Naively, we'd like to shove a check of PyOS_InterruptOccurred in that loop so we can bail out if it's true. It is part of the spec for pthread_cond_wait that any signal which is handled (as SIGINT is) will not interrupt its execution. So in order to get a chance to check for interrupts we need to change this to a repeated timed wait, like so:

    while ( thelock->locked && !interrupted ) {
        timeout.tv_sec = time(0) + 1;
        status = pthread_cond_timedwait(&thelock->lock_released,
                                        &thelock->mut,
                                        &timeout);
        if (status != ETIMEDOUT)
            CHECK_STATUS("pthread_cond_wait");
        interrupted = PyOS_InterruptOccurred();
    }
    thelock->locked = 1;
    status = pthread_mutex_unlock( &thelock->mut );

Then we do a bit of fiddling in the return path to reset the interrupt flag and make sure the caller sees a failure.

In the semaphore case, life is theoretically simpler: there is no mutex, and sem_wait is interrupted by a handled signal, assuming SA_RESTART was not set for that signal (which it isn't, in Python).

    do {
        if (waitflag)
            status = fix_status(sem_wait(thelock));
        else
            status = fix_status(sem_trywait(thelock));
    } while (status == EINTR); /* Retry if interrupted by a signal */

becomes

    do {
        if (waitflag)
            status = fix_status(sem_wait(thelock));
        else
            status = fix_status(sem_trywait(thelock));
        if (status == EINTR && PyOS_InterruptOccurred())
            goto interrupted;
    } while (status == EINTR); /* Retry if interrupted by a signal */
    ...
 interrupted:
    PyErr_SetInterrupt();
    dprintf(("PyThread_acquire_lock(%p, %d) interrupted by user\n",
             lock, waitflag));
    return 0;

However, the Linux semaphore implementation is buggy and will not actually return EINTR from sem_wait, ever. I'll take this up with the libc maintainers; at the Python level, the thing to do is assume it works. Hence, the appended patch. (While I was at it I fixed CHECK_STATUS so that it actually prints the relevant system error, instead of whatever junk happens to be in errno.)
zw

===================================================================
Index: thread_pthread.h
--- thread_pthread.h 17 Mar 2002 17:19:00 -0000 2.40
+++ thread_pthread.h 6 Sep 2002 20:51:31 -0000
@@ -128,7 +128,12 @@ typedef struct {
     pthread_mutex_t mut;
 } pthread_lock;
 
-#define CHECK_STATUS(name) if (status != 0) { perror(name); error = 1; }
+#define CHECK_STATUS(name) do { \
+    if (status != 0) { \
+        fprintf(stderr, "%s: %s\n", name, strerror(status)); \
+        error = 1; \
+    } \
+} while (0)
 
 /*
  * Initialization.
@@ -387,6 +392,8 @@ PyThread_acquire_lock(PyThread_type_lock
             status = fix_status(sem_wait(thelock));
         else
             status = fix_status(sem_trywait(thelock));
+        if (status == EINTR && PyOS_InterruptOccurred())
+            goto interrupted;
     } while (status == EINTR); /* Retry if interrupted by a signal */
 
     if (waitflag) {
@@ -399,6 +406,12 @@ PyThread_acquire_lock(PyThread_type_lock
 
     dprintf(("PyThread_acquire_lock(%p, %d) -> %d\n", lock, waitflag, success));
     return success;
+
+ interrupted:
+    PyErr_SetInterrupt();
+    dprintf(("PyThread_acquire_lock(%p, %d) interrupted by user\n",
+             lock, waitflag));
+    return 0;
 }
 
 void
@@ -472,8 +485,10 @@ int
 PyThread_acquire_lock(PyThread_type_lock lock, int waitflag)
 {
     int success;
+    int interrupted = 0;
     pthread_lock *thelock = (pthread_lock *)lock;
     int status, error = 0;
+    struct timespec timeout;
 
     dprintf(("PyThread_acquire_lock(%p, %d) called\n", lock, waitflag));
 
@@ -491,10 +506,15 @@ PyThread_acquire_lock(PyThread_type_lock
          * protocol */
         status = pthread_mutex_lock( &thelock->mut );
         CHECK_STATUS("pthread_mutex_lock[2]");
-        while ( thelock->locked ) {
-            status = pthread_cond_wait(&thelock->lock_released,
-                                       &thelock->mut);
-            CHECK_STATUS("pthread_cond_wait");
+        timeout.tv_nsec = 0;
+        while ( thelock->locked && !interrupted ) {
+            timeout.tv_sec = time(0) + 1;
+            status = pthread_cond_timedwait(&thelock->lock_released,
+                                            &thelock->mut,
+                                            &timeout);
+            if (status != ETIMEDOUT)
+                CHECK_STATUS("pthread_cond_wait");
+            interrupted = PyOS_InterruptOccurred();
         }
         thelock->locked = 1;
         status = pthread_mutex_unlock( &thelock->mut );
@@ -502,6 +522,10 @@ PyThread_acquire_lock(PyThread_type_lock
         success = 1;
     }
     if (error) success = 0;
+    if (interrupted) {
+        PyErr_SetInterrupt();
+        success = 0;
+    }
     dprintf(("PyThread_acquire_lock(%p, %d) -> %d\n", lock, waitflag, success));
     return success;
 }

From martin@v.loewis.de Sat Sep 7 08:35:26 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 07 Sep 2002 09:35:26 +0200 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: <200209061542.g86FgXt15105@pcp02138704pcs.reston01.va.comcast.net> References: <200209061542.g86FgXt15105@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > Hm, so maybe new field names is still the way to go. E.g. st_mtime > gives an int, st_mtimef gives a float. The tuple version only gives > the int. If the system doesn't support subsecond resolution, the > st_mtimef field still exists but is an int (no point allocating a > float and converting the int). OTOH, I just found that the time values are already floats on the Mac. Did the change in return value for time.time() cause any problems at the time it was made? Regards, Martin From aahz@pythoncraft.com Sat Sep 7 22:44:09 2002 From: aahz@pythoncraft.com (Aahz) Date: Sat, 7 Sep 2002 17:44:09 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: References: <15730.52469.604124.730029@localhost.localdomain> <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020907214409.GA1939@panix.com> On Mon, Sep 02, 2002, François Pinard wrote: > > To get the same effects with email addresses, I often prefer using > `mailto:' as a prefix over writing `<' and `>' around a quoted address > in a message body, even if not fully systematic about this. In the > message header itself, `<' and '>' are the proper way to go, of > course. Ewww. I hate "mailto:" because it interferes with cut'n'paste.
-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From Jack.Jansen@oratrix.com Sat Sep 7 23:11:36 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sun, 8 Sep 2002 00:11:36 +0200 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Message-ID: On Saturday, September 7, 2002, at 09:35, Martin v. Loewis wrote: > Guido van Rossum writes: > >> Hm, so maybe new field names is still the way to go. E.g. st_mtime >> gives an int, st_mtimef gives a float. The tuple version only gives >> the int. If the system doesn't support subsecond resolution, the >> st_mtimef field still exists but is an int (no point allocating a >> float and converting the int). > > OTOH, I just found that the time values are already floats on the > Mac. Did the change in return value for time.time() cause any problems > at the time it was made? It's been causing me headaches in the form of failing test suites about once a year:-) But if I break down the time problems I have on the Mac (100% of which are due to people having a completely unix-centric idea of what a timestamp is) I would say 90% are due to the Mac epoch being in 1904 instead of in 1970, 9% are due to mac timestamps being localtime instead of GMT and only 1% are due to the timestamps being floats. And the latter are the easiest to fix, too. The localtime/gmt issues are the hardest, especially because of DST. My preference would be that st_mtime and all other such values are defined to be cookies (sort of similar to lseek values). You would then invoke one of the mythical Python datetime routines to convert the cookie into something guaranteed to be of your liking. (and this specific datetime routine would be platform dependent). If you use the cookie as-is you have a good chance of it working, but you're living dangerously (an analogy would be opening a binary file without "rb"). But this isn't very friendly for backwards compatibility...
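The epoch part of such a conversion is simple arithmetic. The following is a minimal sketch, assuming the classic Mac epoch of 1904-01-01 and the Unix epoch of 1970-01-01; mac_to_unix and unix_to_mac are hypothetical names, and the sketch deliberately ignores the localtime/GMT and DST adjustments described above as the hard part.

```python
from datetime import datetime

MAC_EPOCH = datetime(1904, 1, 1)
UNIX_EPOCH = datetime(1970, 1, 1)

# Seconds between the two epochs: 24107 days of 86400 seconds each.
EPOCH_DELTA = int((UNIX_EPOCH - MAC_EPOCH).total_seconds())


def mac_to_unix(mac_seconds):
    """Shift a 1904-based timestamp to a 1970-based one (no timezone handling)."""
    return mac_seconds - EPOCH_DELTA


def unix_to_mac(unix_seconds):
    """Inverse of mac_to_unix (no timezone handling)."""
    return unix_seconds + EPOCH_DELTA
```

Both functions accept ints or floats, so they work unchanged whether the platform reports whole seconds or fractional ones.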
-- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From pinard@iro.umontreal.ca Sat Sep 7 23:50:20 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: Sat, 07 Sep 2002 18:50:20 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: <20020907214409.GA1939@panix.com> (Aahz's message of "Sat, 7 Sep 2002 17:44:09 -0400") References: <15730.52469.604124.730029@localhost.localdomain> <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> <20020907214409.GA1939@panix.com> Message-ID: [Aahz] > Ewww. I hate "mailto:" because it interferes with cut'n'paste. I read that you cannot cut and paste a string preceded by `mailto:'? Is that what you meant? What is this interference you mention? What I like in `mailto:' for text or message bodies, is that my editor and mail user agent highlight it and make it clickable. I would be tempted to guess that other editors do this too, but the truth is that I do not know. Maybe we should not let the strengths and drawbacks of the various editors we use drive us into religious feelings for or against a specific markup. Yet, such comparisons let us have an overall feeling on the usefulness of a particular approach. As long as we resist editor wars, it may be useful. If reStructuredText is going to gain popularity in the Python developers community, maybe we should bet in that direction, and prefer the conventions it proposes for Python-dev summaries and other simple documents. The bet to be taken, here, is that our editors and tools would eventually better support reST, or be supplemented with a dependable set of programs to do so. On the other hand, it seems that not everybody is comfortable with reST yet; this might be a problem if there is strong resistance.
For one, I rather liked what I saw so far, and without knowing how much time or effort it would take before I use reST fluently, I would probably be happy to share the bet! -- François Pinard http://www.iro.umontreal.ca/~pinard From guido@python.org Sun Sep 8 00:24:54 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 07 Sep 2002 19:24:54 -0400 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Your message of "Sun, 08 Sep 2002 00:11:36 +0200." References: Message-ID: <200209072324.g87NOsG15613@pcp02138704pcs.reston01.va.comcast.net> > >> Hm, so maybe new field names is still the way to go. E.g. st_mtime > >> gives an int, st_mtimef gives a float. The tuple version only gives > >> the int. If the system doesn't support subsecond resolution, the > >> st_mtimef field still exists but is an int (no point allocating a > >> float and converting the int). > > > > OTOH, I just found that the time values are already floats on the > > Mac. Did the change in return value for time.time() cause any problems > > at the time it was made? > > It's been causing me headaches in the form of failing test > suites about once a year:-) But if I break down the time > problems I have on the Mac (100% of which are due to people > having a completely unix-centric idea of what a timestamp is) I > would say 90% are due to the Mac epoch being in 1904 in stead of > in 1970, 9% are due to mac timestamps being localtime in stead > of GMT and only 1% are due to the timestamps being floats. And > the latter are the easiest to fix, too. The localtime/gmt issues > are the hardest, especially because of DST. I'm not sure if this can be used as an argument for making st_mtime and friends floats and be done with it. I wish it could be, because in the long run that's a much nicer API than adding new fields. > My preference would be that st_mtime and all other such values > are defined to be cookies (sort of similar to lseek values). 
You > would then invoke one of the mythical Python datetime routines > to convert the cookie into something guaranteed to be of your > liking. (and this specific datetime routine would be platform > dependent). If you use the cookie as-is you have a good chance > of it working, but you're living dangerously (an analogy would > be opening a binary file without "rb"). But this isn't very > friendly for backwards compatibility... There's at least one place I know of in Python that assumes the epoch being 1970: calendar.timegm() -- note the line "EPOCH = 1970" right in front of it. :-) Would it make sense if the portable Python APIs translated everything to an epoch of 1970 and UTC? That's what the Windows C library does. Very helpful. (Or is this a problem that's going to disappear with MacOS X? I presume it uses UTC and I hope its epoch is 1970?) --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Sun Sep 8 05:02:24 2002 From: aahz@pythoncraft.com (Aahz) Date: Sun, 8 Sep 2002 00:02:24 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: References: <15730.52469.604124.730029@localhost.localdomain> <200209021401.g82E1k030628@pcp02138704pcs.reston01.va.comcast.net> <20020907214409.GA1939@panix.com> Message-ID: <20020908040224.GA27302@panix.com> On Sat, Sep 07, 2002, François Pinard wrote: > [Aahz] >> >> Ewww. I hate "mailto:" because it interferes with cut'n'paste. > > I read that you cannot cut and paste a string preceded by `mailto:'? Is it > what you meant? What is this interference you mention? xterm does a nifty job usually of figuring out what to highlight when I double-click on a word. It fails with mailto: because normally when I cut'n'paste an address, I *don't* want to include the "mailto:" portion. 
-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From skip@manatee.mojam.com Sun Sep 8 13:00:23 2002 From: skip@manatee.mojam.com (Skip Montanaro) Date: Sun, 8 Sep 2002 07:00:23 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200209081200.g88C0N5Z008526@manatee.mojam.com> Bug/Patch Summary ----------------- 278 open / 2830 total bugs (-6) 115 open / 1686 total patches (-4) New Bugs -------- setting file buffer size is unreliable (2002-09-02) http://python.org/sf/603724 spurious SyntaxWarning (2002-09-03) http://python.org/sf/604036 time.struct_time undocumented (2002-09-03) http://python.org/sf/604128 long list in Pythonwin -> weird text (2002-09-03) http://python.org/sf/604387 faster [None]*n or []*n (2002-09-04) http://python.org/sf/604716 pre bug (2002-09-04) http://python.org/sf/604803 python-mode.el replaces function on f1 (2002-09-06) http://python.org/sf/605818 python-mode kills arrow in gdb (gud.el) (2002-09-08) http://python.org/sf/606250 elisp: doesn't recognize comment-syntax (2002-09-08) http://python.org/sf/606251 py-electric-colon & delete-selection-mod (2002-09-08) http://python.org/sf/606254 New Patches ----------- ccompiler argument checking too strict (2002-09-02) http://python.org/sf/603831 release GIL around getaddrinfo() (2002-09-03) http://python.org/sf/604210 For Bug [ 490168 ] shutil.copy(path, pat (2002-09-04) http://python.org/sf/604600 nntplib: group descriptions and RFC2980 (2002-09-05) http://python.org/sf/605370 Tweaks to calls to AH/Help (2002-09-07) http://python.org/sf/606067 fast dictionary lookup by name (2002-09-07) http://python.org/sf/606098 Mac OS X keydefs (2002-09-07) http://python.org/sf/606132 install_IDLE target in Mac/OSX/Makefile (2002-09-07) http://python.org/sf/606134 Closed Bugs ----------- Unicode in sys.path not supported (2001-10-30) http://python.org/sf/476326 PDB single steps list comprehensions (2002-02-28) 
http://python.org/sf/523995 surprise overriding __radd__ in subclass of complex (2002-03-18) http://python.org/sf/531355 import user doesn't work with CGIs (2002-05-14) http://python.org/sf/555779 whatsnew explains noargs incorrectly (2002-06-11) http://python.org/sf/567607 Invalid mmap crashes Python interpreter (2002-07-24) http://python.org/sf/585792 spawn*() doesn't handle errors well (2002-08-20) http://python.org/sf/597795 The KeyError message doesn't use repr on the key value reported (2002-08-21) http://python.org/sf/598451 Method resolution order in Py 2.2 - 2.3 (2002-08-23) http://python.org/sf/599452 bug in new execvpe (2002-08-27) http://python.org/sf/601077 xmlrpclib ignores CDATA (2002-08-28) http://python.org/sf/601534 some int results that should be bool (2002-08-29) http://python.org/sf/601775 smtplib mishandles empty sender (2002-08-29) http://python.org/sf/602029 configure finds c++ w/o --with-cxx (2002-08-29) http://python.org/sf/602102 Closed Patches -------------- unicode encoding error callbacks (2001-06-12) http://python.org/sf/432401 Pure Python strptime() (PEP 42) (2001-10-23) http://python.org/sf/474274 mimetypes: all extensions for a type (2002-05-09) http://python.org/sf/554192 socketmodule.[ch] downgrade (2002-08-09) http://python.org/sf/593069 email: RFC 2231 parameters encoding (2002-08-26) http://python.org/sf/600096 IDLE [Open module]: import submodules (2002-08-26) http://python.org/sf/600152 Robustness tweak to httplib.py (2002-08-26) http://python.org/sf/600488 obmalloc,structmodule: 64bit, big endian (2002-08-28) http://python.org/sf/601369 expose PYTHON_API_VERSION via sys (2002-08-28) http://python.org/sf/601456 replace_header method for Message class (2002-08-29) http://python.org/sf/601959 sys.path in user.py (2002-08-29) http://python.org/sf/602005 single shared ticker (2002-08-29) http://python.org/sf/602191 From Jack.Jansen@oratrix.com Sun Sep 8 22:51:59 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sun, 8 Sep 
2002 23:51:59 +0200 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: <200209072324.g87NOsG15613@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <33811F04-C375-11D6-9BF8-003065517236@oratrix.com> On Sunday, September 8, 2002, at 01:24, Guido van Rossum wrote: > Would it make sense if the portable Python APIs translated everything > to an epoch of 1970 and UTC? That's what the Windows C library does. > Very helpful. (Or is this a problem that's going to disappear with > MacOS X? I presume it uses UTC and I hope its epoch is 1970?) On MacOSX (if you use unix-based Python, not if you use old MacPython) the problem is gone. At least, if you ignore the timestamps returned by mac-specific filesystem routines, but I think we can do that safely. Changing the APIs to return unix-style timestamps is what the GUSI unix-compatible socket and I/O library used by MacPython did originally, but I had to rip it out. The problem was that GUSI did provide all the unix system calls, but not the other library routines that handled timestamps. So these were provided by the Metrowerks C library, which assumes localtime. So ctime() and gmtime() and all its friends did the wrong thing, and I didn't cherish the idea of finding replacements for them. If your suggestion is that every timestamp goes through a conversion routine before being passed from C to Python and through a reverse conversion when it goes from Python to C: yes, that would definitely make sense. -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From pinard@iro.umontreal.ca Mon Sep 9 14:59:10 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: Mon, 09 Sep 2002 09:59:10 -0400 Subject: [Python-Dev] Codecs lookup order Message-ID: Hi, people.
Happily playing with codecs (using Python 2.2.1), I found out that one should be careful about _not_ naming a module after the encoding name, when closely following the documentation in the Library Reference manual. Here is what I guess is happening. `codecs.register()' appends the search function from the new codec module at the end of the existing search functions. `codecs.lookup()' tries the search functions in the same order in which they were declared. Consequently, `encodings.lookup()' is tried first. If the encoding does not exist in the cache, `encodings.lookup()' tries to import a module by the name of the encoding, slightly transformed, and will indeed import the new user codec module, because that module has the name of the encoding, and is on the module search path. But now, `encodings.lookup()' expects a `getregentry' function in that module, does not find it, and raises a CodecRegistryError, not leaving a chance to subsequent codec search functions to be used. On the user side, merely renaming the user module holding the new codec solves the problem. I'm not sure what should best be done. The documentation might be modified to explain the limitation, so other users do not trip up on it. `encodings.lookup()' might merely return None in case `getregentry' is not defined in the imported module, or else, it could make sure that it imports modules exclusively from within the `encodings' package. The best and simplest might be to look up the codec search functions in reverse order of their registration. `encodings.lookup()' would be called last instead of first. It would be easier for the user to override an encoding bundled with the Python distribution, if there is a need to do so. Because the Python Library Reference does not specify yet in which order codec search functions are tried, the order is not frozen yet and it might be easier to change it.
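[Editorial sketch, not part of the original message.] The registration order described above can be observed with a small experiment. This is only a sketch under assumptions: it targets a current Python 3 rather than the 2.2.1 discussed in the thread, and `make_search' and the codec name `demo_codec_xyz' are hypothetical names made up for the illustration. It registers two search functions and records the order in which `codecs.lookup()' consults them.

```python
import codecs

# Record every (label, encoding name) pair that a search function is asked about.
calls = []

def make_search(label):
    """Build a search function (hypothetical helper) that logs its calls."""
    def search(name):
        calls.append((label, name))
        # Only the function labelled "second" knows the made-up codec.
        if label == "second" and name == "demo_codec_xyz":
            utf8 = codecs.lookup("utf-8")
            # Reuse UTF-8's encode/decode under the made-up name.
            return codecs.CodecInfo(name=name,
                                    encode=utf8.encode,
                                    decode=utf8.decode)
        return None  # "not mine", so later search functions get a chance
    return search

codecs.register(make_search("first"))
codecs.register(make_search("second"))

info = codecs.lookup("demo_codec_xyz")

# Keep only the calls made for the made-up codec name.
order = [label for label, name in calls if name == "demo_codec_xyz"]
encoded = codecs.encode("abc", "demo_codec_xyz")
```

Here the user search functions are consulted in the order they were registered, after the bundled `encodings' search function, which is the ordering the message above proposes to reverse.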
-- François Pinard http://www.iro.umontreal.ca/~pinard From rledwith@cas.org Mon Sep 9 17:34:18 2002 From: rledwith@cas.org (ledwith@cas.org) Date: Mon, 9 Sep 2002 12:34:18 -0400 (EDT) Subject: [Python-Dev] 64-bit process optimization 1 Message-ID: <20020909123418.AAB25999@cas.org> Hello, This is my first post to Python-Dev. As requested by the list manager I am supplying a brief personal introduction before getting to the topic of this message: I am a Senior Research Scientist at CAS, a branch of the American Chemical Society. I have used Python as my programming language of choice for the last four years. I typically work with large collections of text documents performing analyses of text, computer indexing of text, and information retrieval. I use Python as (1) a general purpose programming language, and (2) a high-level programming language to invoke high-performance C and C++ modules (including Numeric). If I examine my programs by data structures, I would find that they contain mostly: 1. Very large dictionaries using tuples and strings as keys. Guido's essay on Implementing Graphs was the inspiration for my using dictionaries to create very large directed acyclic graphs. 2. Specialized C++ objects to represent inverted lists. 3. Numeric objects for representing vectors and tables of floating point values. My primary computing platforms are four dedicated Sun servers, containing 30 processors, 88GB of RAM and 2TB of DASD. Most of the programs I write require between 1 hour and 27 days to complete. (Obviously, I am an atypical Python user!) During the last three months, I have been forced to migrate from 32-bit python processes to 64-bit processes due to the large number of data points I am analyzing within a single program run. It is my experiences while migrating from 32-bit to 64-bit code that prompted this message. 
It is with some trepidation that as the subject of my first posting I am suggesting that Python 2.3 should use a different layout of all Python objects than is defined in Python 2.2.1. Specifically, I have found that changing lines 63-74 of Include/object.h from: #ifdef Py_TRACE_REFS #define PyObject_HEAD \ struct _object *_ob_next, *_ob_prev; \ int ob_refcnt; \ struct _typeobject *ob_type; #define PyObject_HEAD_INIT(type) 0, 0, 1, type, #else /* !Py_TRACE_REFS */ #define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; #define PyObject_HEAD_INIT(type) 1, type, #endif /* !Py_TRACE_REFS */ to: #ifdef Py_TRACE_REFS #define PyObject_HEAD \ struct _object *_ob_next, *_ob_prev; \ struct _typeobject *ob_type; \ int ob_refcnt; #define PyObject_HEAD_INIT(type) 0, 0, type, 1, #else /* !Py_TRACE_REFS */ #define PyObject_HEAD \ struct _typeobject *ob_type; \ int ob_refcnt; #define PyObject_HEAD_INIT(type) type, 1, #endif /* !Py_TRACE_REFS */ significantly improved the performance of my 64-bit processes. Basically, I have just changed the order of the items in PyObject and PyVarObject to avoid gaps due to an "int" being a 4-byte long and aligned type, while "long" and pointers are 8-byte long and aligned types (on 64-bit platforms that conform to the LP64 guideline). For the ILP32 guideline, such as Intel x86 and AMD CPUs, this should have no effect. On the Sun platform on which I live, the changes work for both ILP32 and LP64. For the very large programs I run, the modification saved me 40% execution time. This was probably due to the increased number of Python objects that would fit into the L2 cache, so I don't believe that others would necessarily see as large a difference with this coding change. Please consider this change for inclusion in the upcoming Python release.
- Bob From aahz@pythoncraft.com Mon Sep 9 18:03:02 2002 From: aahz@pythoncraft.com (Aahz) Date: Mon, 9 Sep 2002 13:03:02 -0400 Subject: [Python-Dev] 64-bit process optimization 1 In-Reply-To: <20020909123418.AAB25999@cas.org> References: <20020909123418.AAB25999@cas.org> Message-ID: <20020909170301.GA8457@panix.com> Without commenting on the merits of your proposal, I can tell you that it'll get lost unless you file a bug report on SourceForge. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From guido@python.org Mon Sep 9 18:55:21 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 09 Sep 2002 13:55:21 -0400 Subject: [Python-Dev] 64-bit process optimization 1 In-Reply-To: Your message of "Mon, 09 Sep 2002 12:34:18 EDT." <20020909123418.AAB25999@cas.org> References: <20020909123418.AAB25999@cas.org> Message-ID: <200209091755.g89HtLV30441@pcp02138704pcs.reston01.va.comcast.net> > I am suggesting that Python 2.3 should use a different layout of > all Python objects than is defined in Python 2.2.1. > Specifically, I have found that changing lines 63-74 of > Include/object.h from: > > #ifdef Py_TRACE_REFS > #define PyObject_HEAD \ > struct _object *_ob_next, *_ob_prev; \ > int ob_refcnt; \ > struct _typeobject *ob_type; > #define PyObject_HEAD_INIT(type) 0, 0, 1, type, > #else /* !Py_TRACE_REFS */ > #define PyObject_HEAD \ > int ob_refcnt; \ > struct _typeobject *ob_type; > #define PyObject_HEAD_INIT(type) 1, type, > #endif /* !Py_TRACE_REFS */ > > to: > > #ifdef Py_TRACE_REFS > #define PyObject_HEAD \ > struct _object *_ob_next, *_ob_prev; \ > struct _typeobject *ob_type; \ > int ob_refcnt; > #define PyObject_HEAD_INIT(type) 0, 0, type, 1, > #else /* !Py_TRACE_REFS */ > #define PyObject_HEAD \ > struct _typeobject *ob_type; \ > int ob_refcnt; > #define PyObject_HEAD_INIT(type) type, 1, > #endif /* !Py_TRACE_REFS */ > > significantly improved the performance of my 64-bit processes. 
> > Basically, I have just changed the order of the items in > PyObject and PyVarObject to avoid gaps due to an "int" being a > 4-byte long and aligned type, while "long" and pointers are > 8-byte long and aligned types (on 64-bit platforms that conform > to the LP64 guideline). For the ILP32 guideline, such as Intel > x86 and AMD CPUs, this should have no effect. On the Sun > platform on which I live, the changes work for both ILP32 and > LP64. For the very large programs I run, the modification saved > me 40% execution time. This was probably due to the increased > number of Python objects that would fit into the L2 cache, so I > don't believe that others would necessarily see as large a > difference with this coding change. Interesting! I can see why this makes sense. Strings, lists and tuples all have an int (ob_size) directly following the standard HEAD, and after that something that requires pointer alignment, so that these object types would all save 8 bytes! To wit: string int refcnt, ptr type, int size, long hash, ... ^gap ^gap list int refcnt, ptr type, int size, ptr item* ^gap ^gap tuple int refcnt, ptr type, int size, ptr item[] ^gap ^gap By swapping the first two fields, these gaps would all disappear. The dict object doesn't use ob_size, but starts with an odd number of ints, so the same reasoning shows it would also save 8 bytes. I don't have access to a 64-bit platform to experiment with this. Unfortunately, one problem is binary compatibility. We try to make it possible to link newer Python versions with extension modules (like Numeric, which you use) compiled for older versions. This requires that the binary lay-out of objects remains the same, and swapping ob_refcnt and ob_type would cause immediate crashes in this case. It may be that there are other reasons why binary incompatibilities exist between 2.2 and 2.3 that make this impractical, so perhaps I'm being too conservative here.
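[Editorial sketch, not part of the original message.] The 8-byte saving in the gap diagram above can be checked with a little arithmetic. The following is a sketch under assumptions: it models the usual C layout rule (each field is placed at the next multiple of its alignment, and the total is rounded up to the largest alignment), it hard-codes LP64 sizes (4-byte int, 8-byte pointer), and `struct_size' is a hypothetical helper, not anything from the Python sources.

```python
def struct_size(fields):
    """Return the size of a C-like struct.

    fields is a list of (size, alignment) pairs in declaration order.
    """
    offset = 0
    max_align = 1
    for size, align in fields:
        # Pad up to the next multiple of this field's alignment.
        offset = (offset + align - 1) // align * align
        offset += size
        max_align = max(max_align, align)
    # Round the whole struct up to its strictest alignment.
    return (offset + max_align - 1) // max_align * max_align

# Assumed LP64 sizes and alignments.
INT = (4, 4)
PTR = (8, 8)

# List-like header: ob_refcnt, ob_type, ob_size, then a pointer.
old_layout = [INT, PTR, INT, PTR]
# Proposed order: ob_type, ob_refcnt, ob_size, then a pointer.
new_layout = [PTR, INT, INT, PTR]

old_size = struct_size(old_layout)
new_size = struct_size(new_layout)
saved = old_size - new_size
```

Under these assumed sizes the original order pads after each int, while the swapped order packs the two ints together into one 8-byte slot, which is where the saving comes from.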
Another issue is that at least theoretically, on a 64-bit platform, there could be more than 2 billion references to a particular object. E.g. if you have enough memory, the following allocates 3 lists each containing a billion references to None, causing the reference count of None to go negative: A = [] for i in range(3): A.append([None]*1000000000) So perhaps the refcnt should have been a long in the first place. A similar argument may hold for the length of e.g. strings and lists: one could wish to have a list of more than 2 billion elements, or a string containing more than 2 gigabytes (that much RAM is easily found on the larger 64-bit servers, I believe). Opinions? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Sep 9 19:16:18 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 09 Sep 2002 14:16:18 -0400 Subject: [Python-Dev] 64-bit process optimization 1 In-Reply-To: <200209091755.g89HtLV30441@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido] > ... > So perhaps the refcnt should have been a long in the first place. We agreed to that years ago, but never bothered to change it. In fact, you used to tell people it *was* a long until I beat that out of you . Do note that a long is still only 4 bytes on Win64. The type we really want here is what pyport.h calls Py_intptr_t (a Python spelling of the appropriate C99 type; C99 introduced ways to say what you really mean in these cases). > A similar argument may hold for the length of e.g. strings and lists: > one could wish to have a list of more than 2 billion elements, or a > string containing more than 2 gigabytes (that much RAM is easily found > on the larger 64-bit servers, I believe). > > Opinions? Those are more naturally addressed by size_t, since strlen and malloc are constrained to that type. 
I generally declare string-slinging code as using size_t vars now, and endure the pain of casting back and forth to int to talk with Python's idea of a string size. Whether it's worth the pain to change this stuff depends on whether we think 64-bit boxes are just another passing fad like the Internet . From python-dev@liveevil.com Mon Sep 9 20:42:56 2002 From: python-dev@liveevil.com (john spurling) Date: Mon, 9 Sep 2002 12:42:56 -0700 Subject: [Python-Dev] raw headers in rfc822.Message Message-ID: <20020909194256.GA13424@c7c8.colobox.com> greetings, since the raw headers don't seem to be available in an rfc822.Message, i added a quick two line hack to populate a rawheaders member. attached is a patch to rfc822.py from the python 2.2.1 distribution. if you don't like my two line hack, consider this a request to provide the raw headers in some way in an rfc822.Message. thanks, john spurling -- "nothing brings people together like doom." --sarah vowell From python-dev@liveevil.com Mon Sep 9 20:52:34 2002 From: python-dev@liveevil.com (john spurling) Date: Mon, 9 Sep 2002 12:52:34 -0700 Subject: [Python-Dev] Re: raw headers in rfc822.Message Message-ID: <20020909195234.GA18807@c7c8.colobox.com> --OXfL5xGRrasGEqWY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline maybe it would help if i actually attached the diff... 
--OXfL5xGRrasGEqWY Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="rfc822.diff" 139d138 < self.rawheaders = '' 158,159d156 < # Add line to the raw input < self.rawheaders += line --OXfL5xGRrasGEqWY-- From aahz@pythoncraft.com Mon Sep 9 20:52:08 2002 From: aahz@pythoncraft.com (Aahz) Date: Mon, 9 Sep 2002 15:52:08 -0400 Subject: [Python-Dev] raw headers in rfc822.Message In-Reply-To: <20020909194256.GA13424@c7c8.colobox.com> References: <20020909194256.GA13424@c7c8.colobox.com> Message-ID: <20020909195208.GA1662@panix.com> On Mon, Sep 09, 2002, john spurling wrote: > > since the raw headers don't seem to be available in an rfc822.Message, > i added a quick two line hack to populate a rawheaders member. > attached is a patch to rfc822.py from the python 2.2.1 > distribution. File a bug report on SourceForge. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From zack@codesourcery.com Mon Sep 9 21:09:34 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Mon, 9 Sep 2002 13:09:34 -0700 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: <20020909195234.GA18807@c7c8.colobox.com> References: <20020909195234.GA18807@c7c8.colobox.com> Message-ID: <20020909200934.GA17001@codesourcery.com> On Mon, Sep 09, 2002 at 12:52:34PM -0700, john spurling wrote: > maybe it would help if i actually attached the diff... > > 139d138 > < self.rawheaders = '' > 158,159d156 > < # Add line to the raw input > < self.rawheaders += line You've generated this patch backward, and in a format which makes it useless to us. Please regenerate it with diff -c or diff -u (either is acceptable) and put the newer file _second_ on the command line: diff -u OLD_FILE NEW_FILE. zw From barry@python.org Mon Sep 9 21:11:46 2002 From: barry@python.org (Barry A. 
Warsaw) Date: Mon, 9 Sep 2002 16:11:46 -0400 Subject: [Python-Dev] raw headers in rfc822.Message References: <20020909194256.GA13424@c7c8.colobox.com> Message-ID: <15741.130.736249.914221@anthem.wooz.org> >>>>> "js" == john spurling writes: js> since the raw headers don't seem to be available in an js> rfc822.Message, i added a quick two line hack to populate a js> rawheaders member. attached is a patch to rfc822.py from the js> python 2.2.1 distribution. js> if you don't like my two line hack, consider this a request to js> provide the raw headers in some way in an rfc822.Message. Why not just use email.Message.Message? You can get the original headers from it, and the email package tries really hard to produce output identical to the input. -Barry From barry@barrys-emacs.org Mon Sep 9 23:21:45 2002 From: barry@barrys-emacs.org (Barry Scott) Date: Mon, 9 Sep 2002 23:21:45 +0100 Subject: [Python-Dev] Re: Python-dev summary for 2002-08-15 - 2002-09-01 In-Reply-To: <20020908040224.GA27302@panix.com> Message-ID: <002001c2584f$480a4b10$070210ac@LAPDANCE> > xterm does a nifty job usually of figuring out what to highlight when I > double-click on a word. It fails with mailto: because normally when I > cut'n'paste an address, I *don't* want to include the "mailto:" portion. You can configure xterm to treat : as punctuation and not a word char. See man xterm. BArry From bsder@mail.allcaps.org Mon Sep 9 23:21:49 2002 From: bsder@mail.allcaps.org (Andrew P. Lentvorski) Date: Mon, 9 Sep 2002 15:21:49 -0700 (PDT) Subject: [Python-Dev] Subsecond time stamps In-Reply-To: <200209061530.g86FUeq15029@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020909150038.W79275-100000@mail.allcaps.org> On Fri, 6 Sep 2002, Guido van Rossum wrote: > > C. Make st_mtime a floating point number. This won't offer nanosecond > > resolution, as C doubles are not dense enough. > > This is the most Pythonic approach. 
-1 This then locks Python into a specific bit-description notion of a double in order to get the appropriate number of significant digits to describe time sufficiently. Embedded/portable processors may not support the notion of an IEEE double. In addition, timers get increasingly dense as computers get faster. Thus, doubles may work for nanoseconds, but will not be sufficient for picoseconds. If the goal is a field which never has to be changed to support any amount of time, the value should be "infinite precision". At that point, a Python Long used in some tuple representation of fixed-point arithmetic springs to mind. ie. (, ) -a From martin@v.loewis.de Mon Sep 9 23:26:55 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 10 Sep 2002 00:26:55 +0200 Subject: [Python-Dev] Codecs lookup order In-Reply-To: References: Message-ID: pinard@iro.umontreal.ca (François Pinard) writes: > I'm not sure what should best be done. The documentation might be > modified to explain the limitation, so other users do not trip up on > it. `encoding.lookup()' might merely return None in case > `getregentry' is not defined in the imported module, or else, it > could make sure that it imports modules exclusively from within the > `encodings' package. This is what Python 2.3, and Python 2.2.2 will do. Regards, Martin From martin@v.loewis.de Mon Sep 9 23:33:20 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 10 Sep 2002 00:33:20 +0200 Subject: [Python-Dev] 64-bit process optimization 1 In-Reply-To: <200209091755.g89HtLV30441@pcp02138704pcs.reston01.va.comcast.net> References: <20020909123418.AAB25999@cas.org> <200209091755.g89HtLV30441@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > So perhaps the refcnt should have been a long in the first place. A > similar argument may hold for the length of e.g.
strings and lists: > one could wish to have a list of more than 2 billion elements, or a > string containing more than 2 gigabytes (that much RAM is easily found > on the larger 64-bit servers, I believe). > > Opinions? I agree with that position, and Tim's, that those fields should widen to 64 bits on a 64-bit system. I disagree that size_t is suitable for ob_size, since some types put negative values into ob_size. The signed version of that, ssize_t, is not universally available, so we'd need to add Py_ssize_t. Regards, Martin From aahz@pythoncraft.com Tue Sep 10 00:07:13 2002 From: aahz@pythoncraft.com (Aahz) Date: Mon, 9 Sep 2002 19:07:13 -0400 Subject: [Python-Dev] Cut'n'paste In-Reply-To: <002001c2584f$480a4b10$070210ac@LAPDANCE> References: <20020908040224.GA27302@panix.com> <002001c2584f$480a4b10$070210ac@LAPDANCE> Message-ID: <20020909230713.GA5338@panix.com> On Mon, Sep 09, 2002, Barry Scott wrote: >Aahz: >> >> xterm does a nifty job usually of figuring out what to highlight when I >> double-click on a word. It fails with mailto: because normally when I >> cut'n'paste an address, I *don't* want to include the "mailto:" portion. > > You can configure xterm to treat : as punctuation and not a word > char. See man xterm. Then it would fail with regular URLs. You can't win. ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From guido@python.org Tue Sep 10 00:06:30 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 09 Sep 2002 19:06:30 -0400 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Your message of "Mon, 09 Sep 2002 15:21:49 PDT." <20020909150038.W79275-100000@mail.allcaps.org> References: <20020909150038.W79275-100000@mail.allcaps.org> Message-ID: <200209092306.g89N6V806944@pcp02138704pcs.reston01.va.comcast.net> > > > C. Make st_mtime a floating point number. This won't offer nanosecond > > > resolution, as C doubles are not dense enough. 
> > > > This is the most Pythonic approach. > > -1 > > This then locks Python into a specific bit-description notion of a double > in order to get the appropriate number of significant digits to describe > time sufficiently. Embedded/portable processors may not support the > notion of an IEEE double. > > In addition, timers get increasingly dense as computers get faster. Thus, > doubles may work for nanoseconds, but will not be sufficient for > picoseconds. > > If the goal is a field which never has to be changed to support any amount > of time, the value should be "infinite precision". At that point, a > Python Long used in some tuple representation of fixed-point arithmetic > springs to mind. ie. (, ) I'm sorry, but I really don't see the point of wanting to record file mtimes all the way up to nanosecond precision. What would it mean? Most clocks are off by a few seconds at least anyway. Python has represented time as Python floats (implemented as C doubles) all its life long and it has served us well. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Tue Sep 10 00:34:12 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 10 Sep 2002 01:34:12 +0200 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: <20020909150038.W79275-100000@mail.allcaps.org> References: <20020909150038.W79275-100000@mail.allcaps.org> Message-ID: "Andrew P. Lentvorski" writes: > This then locks Python into a specific bit-description notion of a double > in order to get the appropriate number of significant digits to describe > time sufficiently. Embedded/portable processors may not support the > notion of an IEEE double. That's not true. Suppose you have two fields, tv_sec and tv_nsec. Then the resulting float expression is tv_sec + 1e-9 * tv_nsec; This expression works on all systems that support floating point numbers - be it IEEE or not. > In addition, timers get increasingly dense as computers get faster.
> Thus, doubles may work for nanoseconds, but will not be sufficient > for picoseconds. At the same time, floating point numbers get increasingly more accurate as computer registers widen. In a 64-bit float, you can just barely express 1e-7s (if you base the era at 1970); with a 128-bit float, you can express 1e-20s easily. > If the goal is a field which never has to be changed to support any amount > of time, the value should be "infinite precision". No, just using floating point numbers is sufficient. Notice that time.time() also returns a floating point number. > At that point, a Python Long used in some tuple representation of > fixed-point arithmetic springs to mind. ie. (, fractional point>) Yes, when/if Python gets rational numbers, or decimal fixed-or-floating point numbers, those data types might represent the value that the system reports more accurately. At that time, there will be a transition plan to introduce those numbers at all places where it is reasonable, with as little impact on applications as possible. Regards, Martin From brian@sweetapp.com Tue Sep 10 00:55:15 2002 From: brian@sweetapp.com (Brian Quinlan) Date: Mon, 09 Sep 2002 16:55:15 -0700 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Message-ID: <01b501c2585c$584b4a80$df7e4e18@brianspiv1700> MvL wrote: > That's not true. Support you have two fields, tv_sec and tv_nsec. Then > the resulting float expression is > > tv_sec + 1e-9 * tv_nsec; > > This expression works on all systems that support floating point > numbers - be it IEEE or not. Don't you have to truncate tv_sec for that to work? i.e. Truncate(tv_sec, 9) + 1e-9 * tv_nsec Cheers, Brian From drifty@bigfoot.com Tue Sep 10 01:25:56 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 9 Sep 2002 17:25:56 -0700 (PDT) Subject: [Python-Dev] Cut'n'paste In-Reply-To: <20020909230713.GA5338@panix.com> Message-ID: [Aahz] > > You can configure xterm to treat : as punctuation and not a word > > char. See man xterm.
> > Then it would fail with regular URLs. You can't win. ;-) I am now officially ignoring any more comments on how to format URLs and email addresses in the summary. Aahz is right, "You can't win" and thus I am not going to bother to try to please everyone. I will just do it the way I feel like it and if someone doesn't like it they can just reformat the code with a regex to make themselves happy. Now I know how Guido must have felt with everyone and their mother throwing in their opinion about booleans. =) -Brett From tim.one@comcast.net Tue Sep 10 02:21:11 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 09 Sep 2002 21:21:11 -0400 Subject: [Python-Dev] Cut'n'paste In-Reply-To: Message-ID: [Brett Cannon] > I am now officially ignoring any more comments on how to format URLs > and email addresses in the summary. What?! I didn't get around to insisting that you use XML for this, with one UTF8-encoded character per element thingie. > Aahz is right, "You can't win" and thus I am not going to bother to try > to please everyone. I will just do it the way I feel like it and if > someone doesn't like it they can just reformat the code with a regex > to make themselves happy. Such a small-minded attitude, Brett. OTOH, it may preserve a bit of your life for something enjoyable. > Now I know how Guido must have felt with everyone and their mother > throwing in their opinion about booleans. =) Not until you're accused of destroying all that's good about Python, going out of your way to make it impossible to teach programming, and most likely breaking every important Python program ever written. It will take several years for you to earn that level of abuse . 
no-good-deed-goes-unpunished-ly y'rs - tim From python@rcn.com Tue Sep 10 04:07:11 2002 From: python@rcn.com (Raymond Hettinger) Date: Mon, 9 Sep 2002 23:07:11 -0400 Subject: [Python-Dev] Cut'n'paste References: Message-ID: <001501c25877$29c19460$a661accf@othello> From: "Brett Cannon" > I am now officially ignoring any more comments on how to format URLs and > email addresses in the summary. Aahz is right, "You can't win" and thus I > am not going to bother to try to please everyone. I will just do it the > way I feel like it and if someone doesn't like it they can just reformat > the code with a regex to make themselves happy. > > Now I know how Guido must have felt with everyone and their mother > throwing in their opinion about booleans. =) BTW, my mother would have wanted spaces as delimiters. Raymond Hettinger From martin@v.loewis.de Tue Sep 10 07:30:02 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 10 Sep 2002 08:30:02 +0200 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: <01b501c2585c$584b4a80$df7e4e18@brianspiv1700> References: <01b501c2585c$584b4a80$df7e4e18@brianspiv1700> Message-ID: Brian Quinlan writes: > > tv_sec + 1e-9 * tv_nsec; > > [...] > Don't you have to truncate tv_sec for that to work? i.e. > > Truncate(tv_sec, 9) + 1e-9 * tv_nsec What is Truncate, and why would I need it? Regards, Martin From brian@sweetapp.com Tue Sep 10 07:50:09 2002 From: brian@sweetapp.com (Brian Quinlan) Date: Mon, 09 Sep 2002 23:50:09 -0700 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: Message-ID: <01cd01c25896$4e301d70$df7e4e18@brianspiv1700> > What is Truncate, and why would I need it? You wouldn't need it because I misunderstood the problem. Sorry. 
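[Editorial sketch, not part of the original message.] The `tv_sec + 1e-9 * tv_nsec' expression from this subthread, and the remark that a 64-bit float "can just barely express 1e-7s" for times counted from 1970, can be tried out directly. This is a sketch under assumptions: it needs Python 3.9 or later for `math.ulp', it assumes Python floats are IEEE 754 doubles, and the timestamp value is merely an illustrative number of seconds from roughly September 2002.

```python
import math

# Illustrative values (assumed): seconds since 1970, roughly September 2002,
# plus a single nanosecond.
tv_sec = 1_031_529_600
tv_nsec = 1

# The float expression discussed in the thread.
stamp = tv_sec + 1e-9 * tv_nsec

# Gap between adjacent doubles near this timestamp, in seconds.
spacing = math.ulp(float(tv_sec))

# True if the nanosecond vanished when the two fields were combined.
nanosecond_lost = (stamp == float(tv_sec))
```

For timestamps of this magnitude the spacing works out to 2**-23 seconds, about 1.2e-7, so the expression is always computable but differences much below a tenth of a microsecond are rounded away.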
Cheers, Brian From fredrik@pythonware.com Tue Sep 10 09:30:37 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 10 Sep 2002 10:30:37 +0200 Subject: [Python-Dev] 64-bit process optimization 1 References: <20020909123418.AAB25999@cas.org> <200209091755.g89HtLV30441@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <001501c258a4$6f889ca0$ced241d5@hagrid> guido wrote: > Unfortunately, one problem is binary compatibility. We try to make it > possible to link newer Python versions with extension modules (like > Numeric, which you use) compiled for older versions. This requires > that the binary lay-out of objects remains the same, and swapping > ob_refcnt and ob_type would cause immediate crashes in this case. a compromise could be to make the swap in 2.3, but only on 64-bit platforms. it's obvious that most people are stuck on 32-bit platforms today, and I think it's safe to say that users on 64-bit platforms might be a bit more willing to build everything they need on their local platform. another alternative would be to make it a configuration option, with a platform-dependent default. From Anthony Baxter Tue Sep 10 11:06:31 2002 From: Anthony Baxter (Anthony Baxter) Date: Tue, 10 Sep 2002 20:06:31 +1000 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: <200209092306.g89N6V806944@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209101006.g8AA6Vb28742@localhost.localdomain> >>> Guido van Rossum wrote > I'm sorry, but I really don't see the point of wanting to record file > mtimes all the way up to nanosecond precision. What would it mean? > Most clocks are off by a few seconds at least anyway. Not only that, but if you're that precise, are you measuring the time when the modification started, the time when it started hitting the disks, when the write on the disk completed, when the O/S signalled to the application that the modification was complete... questions questions..
.:) From guido@python.org Tue Sep 10 14:54:58 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 10 Sep 2002 09:54:58 -0400 Subject: [Python-Dev] 64-bit process optimization 1 In-Reply-To: Your message of "Tue, 10 Sep 2002 10:30:37 +0200." <001501c258a4$6f889ca0$ced241d5@hagrid> References: <20020909123418.AAB25999@cas.org> <200209091755.g89HtLV30441@pcp02138704pcs.reston01.va.comcast.net> <001501c258a4$6f889ca0$ced241d5@hagrid> Message-ID: <200209101354.g8ADswV23058@odiug.zope.com> > > Unfortunately, one problem is binary compatibility. We try to make it > > possible to link newer Python versions with extension modules (like > > Numeric, which you use) compiled for older versions. This requires > > that the binary lay-out of objects remains the same, and swapping > > ob_refcnt and ob_type would cause immediate crashes in this case. > > a compromise could be to make the swap in 2.3, but only > on 64-bit platforms. > > it's obvious that most people are stuck on 32-bit platforms > today, and I think it's safe to say that users on 64-bit plat- > forms might be a bit more willing to build everything they > need on their local platform. > > another alternative would be to make it a configuration option, > with a platform-dependent default. I like all of that. Maybe it should also be a config option whether refcount, sizes etc. should be 32 or 64 bit quantities on 64 bit platforms. --Guido van Rossum (home page: http://www.python.org/~guido/) From mcherm@destiny.com Tue Sep 10 15:18:38 2002 From: mcherm@destiny.com (Michael Chermside) Date: Tue, 10 Sep 2002 10:18:38 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message Message-ID: <3D7DFF3E.3030200@destiny.com> Zack Weinberg writes: > You've generated this patch backward, and in a format which makes it > useless to us. Please regenerate it with diff -c or diff -u (either > is acceptable) and put the newer file _second_ on the command line: > diff -u OLD_FILE NEW_FILE. 
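For readers without a Unix `diff` at hand, the direction rule quoted above (old file first, new file second) can be sketched with Python's standard `difflib` module. This is only an illustration, not part of the patch guidelines; the function name `make_patch` and the `.orig` suffix are assumptions made for the example.

```python
import difflib


def make_patch(old_text, new_text, filename):
    """Return a unified diff that turns old_text into new_text.

    The OLD version is passed first and the NEW version second, which is
    the same order as `diff -u OLD_FILE NEW_FILE`.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=filename + ".orig",  # hypothetical name for the old copy
        tofile=filename,
    )
    return "".join(diff)


patch = make_patch("a\nb\n", "a\nc\n", "example.py")
```

In a forward diff, removed lines (prefixed with `-`) come from the old file and added lines (prefixed with `+`) come from the new one. Swapping the two arguments produces the backward patch that Zack objected to.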
It wasn't all that long ago that I submitted my first patch (of documentation, not code) to SourceForge. It took me > 20 minutes of careful web searching to figure out the desired way of submitting files and the correct way to generate that. And I still wasn't 100% sure I was generating the diff in the correct direction. Couldn't Zack's comment be added to the directions found at https://sourceforge.net/tracker/?func=add&group_id=5470&atid=305470 so that anyone submitting a patch would see how to do it. (But of course that wouldn't have helped THIS person, who didn't use sourceforge... :-( ) -- Michael Chermside From guido@python.org Tue Sep 10 15:26:48 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 10 Sep 2002 10:26:48 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: Your message of "Tue, 10 Sep 2002 10:18:38 EDT." <3D7DFF3E.3030200@destiny.com> References: <3D7DFF3E.3030200@destiny.com> Message-ID: <200209101426.g8AEQm023271@odiug.zope.com> > Zack Weinberg writes: > > You've generated this patch backward, and in a format which makes it > > useless to us. Please regenerate it with diff -c or diff -u (either > > is acceptable) and put the newer file _second_ on the command line: > > diff -u OLD_FILE NEW_FILE. [Michael Chermside] > It wasn't all that long ago that I submitted my first patch (of > documentation, not code) to SourceForge. It took me > 20 minutes of > careful web searching to figure out the desired way of submitting files > and the correct way to generate that. And I still wasn't 100% sure I was > generating the diff in the correct direction. > > Couldn't Zack's comment be added to the directions found at > https://sourceforge.net/tracker/?func=add&group_id=5470&atid=305470 > so that anyone submitting a patch would see how to do it. I guess we're assuming that even people who aren't familiar with SourceForge are familiar with diff. Is that not a reasonable assumption any more? 
There's also the developer FAQ, which has careful instructions for patch generation at http://www.python.org/dev/devfaq.html#patches and in addition points to http://www.python.org/patches/ which has everything you need (except the hint about forward diffs; I'll add that). --Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@cwi.nl Tue Sep 10 15:39:37 2002 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Tue, 10 Sep 2002 16:39:37 +0200 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: <200209101426.g8AEQm023271@odiug.zope.com> Message-ID: <2167CDEB-C4CB-11D6-911E-0030655234CE@cwi.nl> On Tuesday, September 10, 2002, at 04:26 , Guido van Rossum wrote: >> Couldn't Zack's comment be added to the directions found at >> https://sourceforge.net/tracker/?func=add&group_id=5470&atid=305470 >> so that anyone submitting a patch would see how to do it. > > I guess we're assuming that even people who aren't familiar with > SourceForge are familiar with diff. Is that not a reasonable > assumption any more? Not cross-platform. I've had patches for MacPython in rather outlandish diff-like formats, so a note that tells people to use the unix diff program wouldn't hurt. -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From guido@python.org Tue Sep 10 15:41:40 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 10 Sep 2002 10:41:40 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: Your message of "Tue, 10 Sep 2002 16:39:37 +0200." <2167CDEB-C4CB-11D6-911E-0030655234CE@cwi.nl> References: <2167CDEB-C4CB-11D6-911E-0030655234CE@cwi.nl> Message-ID: <200209101441.g8AEfeW23387@odiug.zope.com> > > I guess we're assuming that even people who aren't familiar with > > SourceForge are familiar with diff. Is that not a reasonable > > assumption any more? > > Not cross-platform.
I've had patches for MacPython in rather > outlandish diff-like formats, so a note that tells people to use the > unix diff program wouldn't hurt. But what good does a reference to "the unix diff program" do a Mac developer? --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Tue Sep 10 15:48:19 2002 From: aahz@pythoncraft.com (Aahz) Date: Tue, 10 Sep 2002 10:48:19 -0400 Subject: [Python-Dev] Writing patches In-Reply-To: <200209101426.g8AEQm023271@odiug.zope.com> References: <3D7DFF3E.3030200@destiny.com> <200209101426.g8AEQm023271@odiug.zope.com> Message-ID: <20020910144818.GA13037@panix.com> On Tue, Sep 10, 2002, Guido van Rossum wrote: > > There's also the developer FAQ, which has carefull instructions for > patch generation at > > http://www.python.org/dev/devfaq.html#patches > > and in addition points to http://www.python.org/patches/ which has > everything you need (except the hint about forward diffs; I'll add > that). Perhaps the "patches" link at http://www.python.org/ should point at either DevFAQ#patches or the patches page. (That was my original intention in not linking directly to SF -- you're the one who added the direct links.) The question IMO is whether those links are for the benefit of core developers or newbies. I'm +1 on the latter. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From skip@pobox.com Tue Sep 10 15:52:53 2002 From: skip@pobox.com (Skip Montanaro) Date: Tue, 10 Sep 2002 09:52:53 -0500 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: <3D7DFF3E.3030200@destiny.com> References: <3D7DFF3E.3030200@destiny.com> Message-ID: <15742.1861.180590.431080@12-248-11-90.client.attbi.com> Michael> Couldn't Zack's comment be added to the directions found at Michael> https://sourceforge.net/tracker/?func=add&group_id=5470&atid=305470 Michael> so that anyone submitting a patch would see how to do it. 
On that page there's a link entitled "See our hints on how to create a patch." This links to http://www.python.org/patches/ which has, I think, the required details. -- Skip Montanaro skip@pobox.com consulting: http://manatee.mojam.com/~skip/resume.html From guido@python.org Tue Sep 10 15:59:13 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 10 Sep 2002 10:59:13 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: Your message of "Tue, 10 Sep 2002 09:52:53 CDT." <15742.1861.180590.431080@12-248-11-90.client.attbi.com> References: <3D7DFF3E.3030200@destiny.com> <15742.1861.180590.431080@12-248-11-90.client.attbi.com> Message-ID: <200209101459.g8AExD323473@odiug.zope.com> > Michael> Couldn't Zack's comment be added to the directions found at > Michael> https://sourceforge.net/tracker/?func=add&group_id=5470&atid=305470 > Michael> so that anyone submitting a patch would see how to do it. > > On that page there's a link entitled "See our hints on how to create a > patch." This links to > > http://www.python.org/patches/ > > which has, I think, the required details. I added that link a few minutes ago. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mcherm@destiny.com Tue Sep 10 16:02:23 2002 From: mcherm@destiny.com (Michael Chermside) Date: Tue, 10 Sep 2002 11:02:23 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message Message-ID: <3D7E097F.7000003@destiny.com> >> On that page there's a link entitled "See our hints on how to create a >> patch." This links to >> >> http://www.python.org/patches/ >> >> which has, I think, the required details. > > I added that link a few minutes ago. :-) > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think that's a great fix. Thanks! 
-- Michael Chermside From thomas@xs4all.net Tue Sep 10 16:12:53 2002 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 10 Sep 2002 17:12:53 +0200 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Mac/Include getapplbycreator.h,1.3,1.4 macdefs.h,1.11,1.12 macglue.h,1.61,1.62 pythonresources.h,1.27,1.28 In-Reply-To: References: Message-ID: <20020910151252.GA830@xs4all.nl> On Tue, Sep 10, 2002 at 05:32:49AM -0700, jackjansen@users.sourceforge.net wrote: > Modified Files: > getapplbycreator.h macdefs.h macglue.h pythonresources.h > Log Message: > Added include guards and C++ extern "C" {} constructs. Partial fix for #607253. > Bugfix candidate. [..] > *** getapplbycreator.h 19 May 2001 12:32:39 -0000 1.3 > --- getapplbycreator.h 10 Sep 2002 12:32:47 -0000 1.4 [..] > ******************************************************************/ > + #ifndef Py_GETAPPLBYCREATOR_H > + #define Py_GETALLPBYCREATOR_H This looks suspiciously like a bug. If you really do intend to #define something different than you just checked against, you should add a comment stating that this really isn't a typo of a very common idiom :) I'm-not-dead--I-feel-fine-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From pinard@iro.umontreal.ca Tue Sep 10 16:26:56 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: Tue, 10 Sep 2002 11:26:56 -0400 Subject: [Python-Dev] Re: Codecs lookup order In-Reply-To: (martin@v.loewis.de's message of "10 Sep 2002 00:26:55 +0200") References: Message-ID: [Martin v. Loewis] > pinard@iro.umontreal.ca (Fran.ois Pinard) writes: >> I'm not sure what should best be done. 1) The documentation might be >> modified to explain the limitation, so other users do not trip up on it. 
>> 2) `encoding.lookup()' might merely return None in case `getregentry' is >> not defined in the imported module, or else, 3) it could make sure that it >> imports modules exclusively from within the `encodings' package. > This is what Python 2.3, and Python 2.2.2 will do. Hi, Martin. I added "1)", "2)" and "3)" in the original text for clarity. Will Python 2.2.2 and 2.3 do "3)", or all of "1)", "2)" and "3)"? If the codec search order is not changed, how one proceeds if s/he wants to override a bundled codec, with a provided other with the same encoding name? -- François Pinard http://www.iro.umontreal.ca/~pinard From xscottg@yahoo.com Tue Sep 10 17:15:04 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 10 Sep 2002 09:15:04 -0700 (PDT) Subject: [Python-Dev] 64-bit process optimization 1 In-Reply-To: <200209101354.g8ADswV23058@odiug.zope.com> Message-ID: <20020910161504.10967.qmail@web40110.mail.yahoo.com> --- Guido wrote: > > > > a compromise could be to make the swap in 2.3, but only > > on 64-bit platforms. > > > > it's obvious that most people are stuck on 32-bit platforms > > today, and I think it's safe to say that users on 64-bit plat- > > forms might be a bit more willing to build everything they > > need on their local platform. > > > > another alternative would be to make it a configuration option, > > with a platform-dependent default. > > I like all of that. Maybe it should also be a config option whether > refcount, sizes etc. should be 32 or 64 bit quantities on 64 bit > platforms. > +1 from this 64 bit user. __________________________________________________ Yahoo! - We Remember 9-11: A tribute to the more than 3,000 lives lost http://dir.remember.yahoo.com/tribute From barry@python.org Tue Sep 10 18:00:32 2002 From: barry@python.org (Barry A. 
Warsaw) Date: Tue, 10 Sep 2002 13:00:32 -0400 Subject: [Python-Dev] The first trustworthy GBayes results References: <15726.13053.111171.335483@12-248-11-90.client.attbi.com> <200208291631.g7TGVgd28718@localhost.localdomain> Message-ID: <15742.9520.698662.836695@anthem.wooz.org> >>>>> "AB" == Anthony Baxter writes: >> Skip Montanaro wrote >> One thing worth noting before everybody starts using it to >> massage their mailboxes is that the email package contains a >> bug which causes it to occasionally delete whitespace when >> reformatting headers. BTW, I fixed Greg's problem but not Skip's. I'm still looking at this one... AB> There's one other known problem - seriously misformatted MIME AB> (as seen in spam, and email from Microsoft Entourage) causes AB> the email package to barf out. I plan, at some point, to try AB> and make a "if it fails, just leave the body as one chunk of AB> text" mode, but it's a long long way down my list of AB> priorities. I just checked this into cvs. -Barry From martin@v.loewis.de Tue Sep 10 19:25:16 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 10 Sep 2002 20:25:16 +0200 Subject: [Python-Dev] Subsecond time stamps In-Reply-To: <200209101006.g8AA6Vb28742@localhost.localdomain> References: <200209101006.g8AA6Vb28742@localhost.localdomain> Message-ID: Anthony Baxter writes: > Not only that, but if you're that precise, are you measuring the time > when the modification started, the time when it started hitting the > disks, when the write on the disk completed, when the O/S signalled > to the application that the modification was complete... questions > questions.. .:) For Python, these questions are easy to answer: We just report to the application what the system reports to us. It is the file system implementor's job to define the notion of modification time. Regards, Martin From martin@v.loewis.de Tue Sep 10 19:26:06 2002 From: martin@v.loewis.de (Martin v.
Loewis) Date: 10 Sep 2002 20:26:06 +0200 Subject: [Python-Dev] Re: Codecs lookup order In-Reply-To: References: Message-ID: pinard@iro.umontreal.ca (François Pinard) writes: > >> I'm not sure what should best be done. 1) The documentation might be > >> modified to explain the limitation, so other users do not trip up on it. > >> 2) `encoding.lookup()' might merely return None in case `getregentry' is > >> not defined in the imported module, or else, 3) it could make sure that it > >> imports modules exclusively from within the `encodings' package. > > > This is what Python 2.3, and Python 2.2.2 will do. > > Hi, Martin. > > I added "1)", "2)" and "3)" in the original text for clarity. Will Python > 2.2.2 and 2.3 do "3)", or all of "1)", "2)" and "3)"? Oops, it's 2) that Python 2.3 will do. Regards, Martin From barry@barrys-emacs.org Tue Sep 10 20:17:26 2002 From: barry@barrys-emacs.org (Barry Scott) Date: Tue, 10 Sep 2002 20:17:26 +0100 Subject: [Python-Dev] Cut'n'paste In-Reply-To: <20020909230713.GA5338@panix.com> Message-ID: <000001c258fe$b326d800$070210ac@LAPDANCE> You double click and drag to highlight two words. I have to be missing the problem here. This is all basic GUI usage and nothing to do with whatever it is that's output URI. BArry > -----Original Message----- > From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On > Behalf Of Aahz > Sent: 10 September 2002 00:07 > To: barry@barrys-emacs.org > Cc: python-dev@python.org > Subject: [Python-Dev] Cut'n'paste > > > On Mon, Sep 09, 2002, Barry Scott wrote: > >Aahz: > >> > >> xterm does a nifty job usually of figuring out what to highlight when I > >> double-click on a word. It fails with mailto: because normally when I > >> cut'n'paste an address, I *don't* want to include the > "mailto:" portion. > > > > You can configure xterm to treat : as punctuation and not a word > > char. See man xterm. > > Then it would fail with regular URLs. You can't win.
;-) > -- > Aahz (aahz@pythoncraft.com) <*> > http://www.pythoncraft.com/ > > Project Vote Smart: http://www.vote-smart.org/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > From Jack.Jansen@oratrix.com Tue Sep 10 21:03:05 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Tue, 10 Sep 2002 22:03:05 +0200 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: <200209101441.g8AEfeW23387@odiug.zope.com> Message-ID: <51D8ADCF-C4F8-11D6-88B2-003065517236@oratrix.com> On dinsdag, september 10, 2002, at 04:41 , Guido van Rossum wrote: >>> I guess we're assuming that even people who aren't familiar with >>> SourceForge are familiar with diff. Is that not a reasonable >>> assumption any more? >> >> Not cross-platform. I've had patches for MacPython in rather >> outlandish diff-like formats, so a note that tells people to use the >> unix diff program wouldn't hurt. > > But what good does a reference to "the unix diff program" do a Mac > developer? At the very least they won't send me MPW diffs. At best they fire up OSX and use the One True Diff:-) -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From Jack.Jansen@oratrix.com Tue Sep 10 22:05:42 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Tue, 10 Sep 2002 23:05:42 +0200 Subject: [Python-Dev] Weeding out obsolete modules and Demos Message-ID: <10D997B8-C501-11D6-88B2-003065517236@oratrix.com> Folks, how about going over the various demos, and see which ones have really lost their usefulness? I happened to come across Demo/sgi/audio (works only on SGI 4D/35 machines, which went out of production about 12 years ago), sv and video (works on Indigo's with the Starter Video board, last seen about 8 years ago). And there's the svmodule.c (yup, same board). There are probably Indigo's still alive (4D35's? 
I doubt it, I can still remember the noise it made:-), but I wonder whether anyone in their right mind is still using the SV board. The forms/fl stuff still technically works on newer SGI's, but we might also wonder how useful they still are. And this is for SGI only, there's probably a lot more dead wood out there, -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From guido@python.org Tue Sep 10 22:19:01 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 10 Sep 2002 17:19:01 -0400 Subject: [Python-Dev] Weeding out obsolete modules and Demos In-Reply-To: Your message of "Tue, 10 Sep 2002 23:05:42 +0200." <10D997B8-C501-11D6-88B2-003065517236@oratrix.com> References: <10D997B8-C501-11D6-88B2-003065517236@oratrix.com> Message-ID: <200209102119.g8ALJ1h29280@odiug.zope.com> > how about going over the various demos, and see which ones have > really lost their usefulness? Yeah! > I happened to come across Demo/sgi/audio (works only on SGI > 4D/35 machines, which went out of production about 12 years > ago), sv and video (works on Indigo's with the Starter Video > board, last seen about 8 years ago). And there's the svmodule.c > (yup, same board). > There are probably Indigo's still alive (4D35's? I doubt it, I > can still remember the noise it made:-), but I wonder whether > anyone in their right mind is still using the SV board. > > The forms/fl stuff still technically works on newer SGI's, but > we might also wonder how useful they still are. > > And this is for SGI only, there's probably a lot more dead wood > out there, I haven't seen or heard an SGI machine for years. If you think those SGI demos have lost their usefulness, please use your CVS powers to delete them! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@bigfoot.com Wed Sep 11 01:07:58 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Tue, 10 Sep 2002 17:07:58 -0700 (PDT) Subject: [Python-Dev] utf-8 issue thread question Message-ID: So here is the summary question for this thread: what exactly is a surrogate? I think I get it (from reading an i18n email from MAL on the i18n list), but I am not confident enough to stick it in the summary as of yet. The following is my current rough summary explanation for what a surrogate is. Can someone please correct it as needed? """ In Unicode, a surrogate is when you encode from a higher bit total encoding (such as utf-16) into a smaller bit total encoding by representing the character as several more bit chunks (such as two utf-8 chunks). The following line is an example: >>> u'\ud800'.encode('utf-8') == '\xed\xa0\x80' Notice how the initial Unicode character ends up being encoded as three characters in utf-8. """ Also, anyone know of some good Unicode tutorials, explanations, etc. on the web, in book form, whatever? Most of the threads that I don't totally comprehend are Unicode related and I would like to keep my brain-dead questions to a minimum. Don't want my reputation to go down the drain. =) -Brett From fredrik@pythonware.com Wed Sep 11 01:24:53 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 11 Sep 2002 02:24:53 +0200 Subject: [Python-Dev] utf-8 issue thread question References: Message-ID: <004a01c25929$a6fb3f50$0900a8c0@spiff> Brett Cannon wrote: > The following is my current rough summary explanation for what a surrogate > is. Can someone please correct it as needed? needed, indeed.
it's 2.30 am over here, so I'm not going to try to explain this myself, but some random googling brought up this page: http://216.239.37.100/search?q=cache:Dk12BZNt6skC:uk.geocities.com/BabelStone1357/Software/surrogates.html The code points U+D800 through U+DB7F are reserved as High Surrogates, and the code points U+DC00 through U+DFFF are reserved as Low Surrogates. Each code point in [the full 20-bit unicode character space] maps to a pair of 16-bit code points comprising a High Surrogate followed by a Low Surrogate. Thus, for example, the Gothic letter AHSA has the UTF-32 value of U+10330, which maps to the surrogate pair U+D800 and U+DF30. That is to say, in the 16-bit encoding of Unicode (UTF-16), the Gothic letter AHSA is represented by two consecutive 16-bit code points (U+D800 and U+DF30), whereas in the 32-bit encoding of Unicode (UTF-32), the same letter is represented by a single 32-bit value (U+10330). From whisper@oz.net Wed Sep 11 01:40:56 2002 From: whisper@oz.net (David LeBlanc) Date: Tue, 10 Sep 2002 17:40:56 -0700 Subject: [Python-Dev] Weeding out obsolete modules and Demos In-Reply-To: <200209102119.g8ALJ1h29280@odiug.zope.com> Message-ID: > I haven't seen or heard an SGI machine for years. If you think those > SGI demos have lost their usefulness, please use your CVS powers to > delete them! > > --Guido van Rossum (home page: http://www.python.org/~guido/) > Um... maybe just move them to the not-shipped side of things at first in case there are hold-outs out there still clinging to their stone axes?
;) Dave LeBlanc Seattle, WA USA From drifty@bigfoot.com Wed Sep 11 01:40:00 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Tue, 10 Sep 2002 17:40:00 -0700 (PDT) Subject: [Python-Dev] utf-8 issue thread question In-Reply-To: <004a01c25929$a6fb3f50$0900a8c0@spiff> Message-ID: [Fredrik Lundh] > Brett Cannon wrote: > > it's 2.30 am over here, so I'm not going to try to explain this myself, > but some random googling brought up this page: > > http://216.239.37.100/search?q=cache:Dk12BZNt6skC:uk.geocities.com/BabelStone1357/Software/surrogates.html > > The code points U+D800 through U+DB7F are reserved as High Surrogates, > and the code points U+DC00 through U+DFFF are reserved as Low Surrogates. > Each code point in [the full 20-bit unicode character space] maps to a pair of > 16-bit code points comprising a High Surrogate followed by a Low Surrogate. > Thus, for example, the Gothic letter AHSA has the UTF-32 value of U+10330, > which maps to the surrogate pair U+D800 and U+DF30. That is to say, in the > 16-bit encoding of Unicode (UTF-16), the Gothic letter AHSA is represented > by two consecutive 16-bit code points (U+D800 and U+DF30), whereas in the > 32-bit encoding of Unicode (UTF-32), the same letter is represented by a > single 32-bit value (U+10330). > > > So with that explanation, here is the current rewrite: """ In Unicode, a surrogate pair is when you create the representation of a character by using two values. So, for instance, UTF-32 can cover the entire Unicode space (Unicode is 20 bits), but UTF-16 can't. To solve the issue a character can be represented as a pair of UTF-16 values. The problem in Python 2.2.1 is that when there is only a lone surrogate (instead of there being a pair of values), the encoder for UTF-8 messes up and leaves off a UTF-8 value. 
The following line is an example: >>> u'\ud800'.encode('utf-8') '\xa0\x80' #In Python 2.2.1 '\xed\xa0\x80' #In Python 2.3a0 Notice how in Python 2.3a0 the extra value is inserted so as to make the representation a complete Unicode character instead of only encoding the half of the surrogate pair that the encode was given. """ How is that? -Brett From guido@python.org Wed Sep 11 01:39:14 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 10 Sep 2002 20:39:14 -0400 Subject: [Python-Dev] utf-8 issue thread question In-Reply-To: Your message of "Tue, 10 Sep 2002 17:07:58 PDT." References: Message-ID: <200209110039.g8B0dEQ09916@pcp02138704pcs.reston01.va.comcast.net> > So here is the summary question for this thread: what exactly is a > surrogate? Unicode surrogates are used specifically to encode Unicode characters with values >= 2**16 as two 16-bit code points. The Unicode standard has conveniently reserved two ranges for these (see /F's post). The first (high) surrogate encodes the high 10 bits, the second (low) surrogate encodes the low 10 bits. For redundancy, the top bit pattern is different for high and low surrogates. One thing to watch out for: I believe that the bit pattern that's encoded is not the bit pattern of the full unicode character, but 2**16 less. This allows one to encode 2**16 more characters, at the cost of some extra complexity. > I think I get it (from reading a l18n email from MAL on the > l18n list), but I am not confident enough to stick in the summary as of > yet. > > The following is my current rough summary explanation for what a surrogate > is. Can someone please correct it as needed? > > """ > In Unicode, a surrogate is when you encode from a higher bit total > encoding (such as utf-16) into a smaller bit total encoding by > representing the character as several more bit chunks (such as two utf-8 > chunks). 
The following line is an example: > > >>> u'\ud800'.encode('utf-8') == '\xed\xa0\x80' > > Notice how the initial Unicode character ends up being encoded as three > characters in utf-8. > """ No, the UTF8 encoding is not called surrogate. Only 16-bit values are surrogates. In this example, \ud800 is a high surrogate that's not followed by a low surrogate. The UTF-8 encoder could do two things with this: encode the bit pattern, or throw an error. Note that when the UTF-8 encoder sees a *pair* of surrogates (a high surrogate followed by a low surrogate), it is supposed to extract the single unicode character from them, and encode that. The UTF-8 decoder must in turn create a surrogate pair when decoding to 16-bit Unicode (as opposed to when decoding to 32-bit Unicode, when it should not generate surrogates). Note that there are various problems with this. Surrogates are illegal in 32-bit Unicode, but of course you cannot really prevent them from occurring. What should that mean? > Also, anyone know of some good Unicode tutorials, explanations, > etc. on the web, in book form, whatever? Most of the threads that I > don't totally comprehend are Unicode related and I would like to > minimize my brain-dead questions to a minimum. Don't want my > reputation to go down the drain. =) I think the Unicode consortium website, www.unicode.org, has lots of good stuff, including the complete standard online. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Wed Sep 11 04:39:52 2002 From: python@rcn.com (Raymond Hettinger) Date: Tue, 10 Sep 2002 23:39:52 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message References: <3D7DFF3E.3030200@destiny.com> <200209101426.g8AEQm023271@odiug.zope.com> Message-ID: <003201c25944$e4225a60$69d8accf@othello> From: "Guido van Rossum" > I guess we're assuming that even people who aren't familiar with > SourceForge are familiar with diff. Is that not a reasonable > assumption any more? 
> > There's also the developer FAQ, which has careful instructions for > patch generation at > > http://www.python.org/dev/devfaq.html#patches > > and in addition points to http://www.python.org/patches/ which has > everything you need (except the hint about forward diffs; I'll add > that). FAQs and pointers be darned; there is only one way to developerhood and that is through the school of hard knocks. - Learn to use CVS (which, of course, entails SSH and such). - Use Google (a lot), or risk proposing that which has already been decided, researched, or discussed ad nauseam. - Submit a patch. Find out that it was the wrong diff format or keyed off an older, non-current version of the file. Then find out that your editor's tabbing and spacing confounds somebody's life, somewhere. Oh, did you forget to run the regression tests? Did your tests run fine, but you didn't run them in debug mode? Perhaps your Windows machine skips the test for the library you modified. Break working code and suffer public flogging. - Read every PEP and make sure your patch style has no deviations (unless, of course, your C and Python coding style already matched the PEPs). - BTW, did you submit unittests and docs with your patch? Did you make appropriate adjustments to the makefiles, and every other reference to your work? And appropriate announcements in Misc/NEWS? - Surely, you've learned TeX and its many Python-specific macros (forward slash or backslash, verbatim or code?) How many characters were on your longest line (72, 78, hopefully not more)? - When learning a guitar, it helps to develop calluses on the fingers. Writing a PEP is the fastest way to develop the calluses; contradicting Guido is the second fastest way; submitting a great idea is third fastest (bad ideas either get ignored or are slammed so quickly that the scar tissue doesn't have time to develop). - Experience the politics of bug resolution. If a developer proposed it, then it should not be dismissed lightly.
If someone had a grandiose scheme in mind when they submitted the report, be prepared for wrath when you apply a simple solution. Realize that, in some cases, someone, somewhere is relying on the undocumented buggy behavior and your fixing it is breaking their code. - And, my all-time favorite, do everything right (formatting, procedure, profiling, testing, etc.) and watch the Timbot come along five minutes later and improve your code, making it faster, clearer, more conformant, more elegant, and also making it gel neatly with the vagaries of memory allocation, cache performance, and compilers you've never heard of. Raymond Hettinger Oh, and did I mention that native speakers of ASCII will never be able to master Unicode like a native? From drifty@bigfoot.com Wed Sep 11 05:36:30 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Tue, 10 Sep 2002 21:36:30 -0700 (PDT) Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: <003201c25944$e4225a60$69d8accf@othello> Message-ID: [Raymond Hettinger] Kudos to Raymond on this email. Great stuff. I know I have had my growing pains with learning how to do everything correctly, so I really appreciate his points. -Brett From martin@v.loewis.de Wed Sep 11 07:23:17 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 11 Sep 2002 08:23:17 +0200 Subject: [Python-Dev] utf-8 issue thread question In-Reply-To: <200209110039.g8B0dEQ09916@pcp02138704pcs.reston01.va.comcast.net> References: <200209110039.g8B0dEQ09916@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > One thing to watch out for: I believe that the bit pattern that's > encoded is not the bit pattern of the full unicode character, but > 2**16 less. This allows one to encode 2**16 more characters, at the > cost of some extra complexity. Correct. That allows encoding a total of 17 planes in Unicode, a plane being 2**16 characters. Therefore, saying that Unicode is 20 bits is somewhat imprecise - it's better to say that it is 21 bits.
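The arithmetic discussed in this thread can be written out as a short Python sketch. It is an illustration under the usual UTF-16 rules, not code from Python's codecs, and the function names are made up for the example. It subtracts 2**16 before splitting the value into two 10-bit halves, as Guido describes, and it reproduces both the Gothic AHSA example (U+10330 maps to U+D800, U+DF30) and the three-byte UTF-8 form of the lone surrogate U+D800.

```python
# High surrogates span U+D800..U+DBFF; low surrogates span U+DC00..U+DFFF.
HIGH_START, LOW_START = 0xD800, 0xDC00


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("surrogates only encode U+10000..U+10FFFF")
    offset = code_point - 0x10000        # the "2**16 less" value, 20 bits
    high = HIGH_START + (offset >> 10)   # top 10 bits
    low = LOW_START + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def utf8_three_bytes(code_point):
    """Apply the plain 3-byte UTF-8 bit layout to a 16-bit value.

    No surrogate check is made here, so a lone surrogate is encoded as-is.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("3-byte UTF-8 covers U+0800..U+FFFF")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx
        0x80 | (code_point & 0x3F),         # 10xxxxxx
    ])


ahsa_pair = to_surrogate_pair(0x10330)
lone_surrogate_utf8 = utf8_three_bytes(0xD800)
```

The largest value a pair can reach is U+10FFFF, which is 17 * 2**16 - 1 and needs 21 bits, matching Martin's count of 17 planes. Current Python 3 releases refuse to encode a lone surrogate to UTF-8 by default; passing the `surrogatepass` error handler yields the same three bytes that `utf8_three_bytes` produces.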
Regards, Martin From sjoerd@acm.org Wed Sep 11 09:44:03 2002 From: sjoerd@acm.org (Sjoerd Mullender) Date: Wed, 11 Sep 2002 10:44:03 +0200 Subject: [Python-Dev] Weeding out obsolete modules and Demos In-Reply-To: <200209102119.g8ALJ1h29280@odiug.zope.com> References: <10D997B8-C501-11D6-88B2-003065517236@oratrix.com> <200209102119.g8ALJ1h29280@odiug.zope.com> Message-ID: <200209110844.g8B8i3D14336@indus.ins.cwi.nl> I would not be opposed to deleting the whole Demo/sgi tree. I don't have an SGI workstation anymore, but I did have an SGI O2 until recently. I think the audio (and al) stuff and the gl stuff probably still work (I used mclock until recently). I think we can definitely get rid of the video directory (CMIF video format, remember that?). I'm not sure whether sv still compiles on modern SGI's. The cd module also still works. Having said this, I'm not sure there is still much value in keeping the demos. The modules in the Modules directory is another matter. Until recently I have used cd and al. I think cl might still work, but I'm not sure. I don't think sv works on anything other than Indigo's with a Starter Video board. gl (and I think also fm) still works. sgi also still works, but I'm not sure how useful it still is. It just defines functions nap and _getpty. rgbimg is for reading SGI RGB images, but is portable. Although one must ask whether it has a place in the standard library, since there is no similar level of support for more popular image formats. On Tue, Sep 10 2002 Guido van Rossum wrote: > > how about going over the various demos, and see which ones have > > really lost their usefulness? > > Yeah! > > > I happened to come across Demo/sgi/audio (works only on SGI > > 4D/35 machines, which went out of production about 12 years > > ago), sv and video (works on Indigo's with the Starter Video > > board, last seen about 8 years ago). And there's the svmodule.c > > (yup, same board). > > There are probably Indigo's still alive (4D35's? 
I doubt it, I > > can still remember the noise it made:-), but I wonder whether > > anyone in their right mind is still using the SV board. > > > > The forms/fl stuff still technically works on newer SGI's, but > > we might also wonder how useful they still are. > > > > And this is for SGI only, there's probably a lot more dead wood > > out there, > > I haven't seen or heard an SGI machine for years. If you think those > SGI demos have lost their usefulness, please use your CVS powers to > delete them! > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > -- Sjoerd Mullender From mcherm@destiny.com Wed Sep 11 14:02:28 2002 From: mcherm@destiny.com (Michael Chermside) Date: Wed, 11 Sep 2002 09:02:28 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message Message-ID: <3D7F3EE4.1050009@destiny.com> Raymond Hettinger writes: > FAQs and pointers be darned; there is only one way to > developerhood and that is through the school of hard knocks. > > [Terrific course catalog for said school elided] Raymond, I had to laugh at your "course catalog"... very funny but true, all of it very true. It made it into my long-term list of bookmarks not to lose. But it's worth noting that, although the school of hard knocks may be required by life itself (and the nature of programming), anytime that we can just add a link to a web page and perhaps allow someone to skip a course, it's a win all around. Anyway, thanks for bringing some humor into my morning. -- Michael Chermside From guido@python.org Wed Sep 11 15:52:30 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 11 Sep 2002 10:52:30 -0400 Subject: [Python-Dev] Re: raw headers in rfc822.Message In-Reply-To: Your message of "Wed, 11 Sep 2002 09:02:28 EDT." 
<3D7F3EE4.1050009@destiny.com> References: <3D7F3EE4.1050009@destiny.com> Message-ID: <200209111452.g8BEqUe07618@odiug.zope.com> > Raymond Hettinger writes: > > FAQs and pointers be darned; there is only one way to > > developerhood and that is through the school of hard knocks. > > > > [Terrific course catalog for said school elided] > > Raymond, I had to laugh at your "course catalog"... very funny but true, > all of it very true. It made it into my long-term list of bookmarks not > to lose. Better yet, it made the Developer FAQ. :-) > But it's worth noting that, although the school of hard knocks may be > required by life itself (and the nature of programming), anytime that we > can just add a link to a web page and perhaps allow someone to skip a > course, it's a win all around. Got specific text you'd like us to add to a specific page? Send it to webmaster! --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Wed Sep 11 18:15:01 2002 From: mwh@python.net (Michael Hudson) Date: 11 Sep 2002 18:15:01 +0100 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules _hotshot.c,1.26,1.27 In-Reply-To: "Raymond Hettinger"'s message of "Wed, 11 Sep 2002 12:56:36 -0400" References: <000901c259b4$318ceda0$d4e97ad1@othello> Message-ID: <2md6rk30tm.fsf@starship.python.net> "Raymond Hettinger" writes: > Houston, we have a problem: > > C:\py23\Modules\_hotshot.c(891) : error C2198: 'pack_lineno_tdelta' : too few actual parameters > C:\py23\Modules\_hotshot.c(892) : error C2059: syntax error : ')' Yikes! Fixed. I was sure I checked everything before checking in... Cheers, M. -- Imagine if every Thursday your shoes exploded if you tied them the usual way. This happens to us all the time with computers, and nobody thinks of complaining. 
-- Jeff Raskin

From hbl@st-andrews.ac.uk Wed Sep 11 19:18:02 2002
From: hbl@st-andrews.ac.uk (Hamish Lawson)
Date: Wed, 11 Sep 2002 19:18:02 +0100
Subject: [Python-Dev] Patch to make cgi.FieldStorage iterate over its keys
Message-ID: <5.1.1.6.0.20020911190503.035e7590@spey.st-andrews.ac.uk>

Below is a patch to make cgi.FieldStorage iterate over its keys, allowing it to behave like any other dictionary in this kind of construct:

    form = cgi.FieldStorage()
    for key in form:
        do something ...

Hamish Lawson

---
Compare: (<)E:\Python22\Lib\cgi.py (34894 bytes)
   with: (>)E:\temp\cgi.py (34955 bytes)

524a524,526
>     def __iter__(self):
>         return iter(self.keys())
>

From guido@python.org Wed Sep 11 19:23:56 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 11 Sep 2002 14:23:56 -0400
Subject: [Python-Dev] Patch to make cgi.FieldStorage iterate over its keys
In-Reply-To: Your message of "Wed, 11 Sep 2002 19:18:02 BST." <5.1.1.6.0.20020911190503.035e7590@spey.st-andrews.ac.uk>
References: <5.1.1.6.0.20020911190503.035e7590@spey.st-andrews.ac.uk>
Message-ID: <200209111823.g8BINuP22410@odiug.zope.com>

> Below is a patch to make cgi.FieldStorage iterate over its keys, allowing
> it to behave like any other dictionary in this kind of construct:
>
>     form = cgi.FieldStorage()
>     for key in form:
>         do something ...

Thanks. I've applied this. Points subtracted though for (1) sending a patch to python-dev instead of using SourceForge and (2) sending a plain diff instead of a context diff. For that, your name won't be added to the list of contributors. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mats@laplaza.org Thu Sep 12 18:10:36 2002
From: mats@laplaza.org (Mats Wichmann)
Date: Thu, 12 Sep 2002 11:10:36 -0600
Subject: [Python-Dev] Re: 64-bit process optimization 1
In-Reply-To: <20020909233501.11696.35777.Mailman@mail.python.org>
Message-ID: <5.1.0.14.1.20020912105340.00aae7f0@204.151.72.2>

>So perhaps the refcnt should have been a long in the first place.
A >similar argument may hold for the length of e.g. strings and lists: >one could wish to have a list of more than 2 billion elements, or a >string containing more than 2 gigabytes (that much RAM is easily found >on the larger 64-bit servers, I believe). > >Opinions? If you change to longs it seems the reported performance increase goes away, which would seem to eliminate one of the motivations for accepting the pain of a binary incompatibility. Leaving just "getting it right". Mats From guido@python.org Thu Sep 12 18:19:54 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 12 Sep 2002 13:19:54 -0400 Subject: [Python-Dev] Re: 64-bit process optimization 1 In-Reply-To: Your message of "Thu, 12 Sep 2002 11:10:36 MDT." <5.1.0.14.1.20020912105340.00aae7f0@204.151.72.2> References: <5.1.0.14.1.20020912105340.00aae7f0@204.151.72.2> Message-ID: <200209121719.g8CHJs729947@odiug.zope.com> > >So perhaps the refcnt should have been a long in the first place. A > >similar argument may hold for the length of e.g. strings and lists: > >one could wish to have a list of more than 2 billion elements, or a > >string containing more than 2 gigabytes (that much RAM is easily found > >on the larger 64-bit servers, I believe). > > > >Opinions? > > If you change to longs it seems the reported > performance increase goes away, which would > seem to eliminate one of the motivations for > accepting the pain of a binary incompatibility. > > Leaving just "getting it right". Yup. That's why I think it might have to be a 3-valued config option, relevant for 64-bit machines only: "compat", "optimal", or "right". 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Thu Sep 12 19:02:47 2002 From: tismer@tismer.com (Christian Tismer) Date: Thu, 12 Sep 2002 20:02:47 +0200 Subject: [Python-Dev] flextype.c -- extended type system Message-ID: <3D80D6C7.9040201@tismer.com> Hi Guido, py-dev, preface: -------- a week ago or so, I sent a patch to Guido that removes the "etype" struct. This is a hidden structure that extends types when they are allocated on the heap. One restriction with this type was that types could not be extended by metatypes for some internal reason. I fixed this. Now meta-types can define extra slots for types. the point: ---------- I wasn't really after slots in types, but I wanted to have a type that can be extended as the user likes to. Using the re-worked etype (now named PyHeapType_Type), I created a new meta-type with some cool new features which give you C++ - like virtual methods and inheritance. The new dynamic type (PyFlexType_Type) allows to clone any existing type, and thereby to pass a virtual method table which will be bound into the type. It is a bit like slots and slot definitions, but the VMT definition is written like a PyMethodDef list (in fact, I have PyCMethodDef), and the created virtual function entries are spelled explicitly in the type structure. Structure of a VMT definition: typedef struct _pycmethoddef { char *name; /* name to lookup in __dict__ */ PyCFunction match; /* to be found if non-overridden */ void *fast; /* native C call */ void *wrap; /* wrapped call into Python */ int offset; /* slot offset in heap type */ } PyCMethodDef; At creation time of a new flextype, all VMT entries in the accumulated bases (accessed via the MRT) are scanned from oldest to newest, and the new type's methods are retrieved by the "name" entry. Then it is checked whether the method descriptor still points to the original PyCFunction entry (the "match" field). 
If it is still original, the native C call (field "fast") is inserted into the VMT, otherwise the wrapped Python callback (field "wrap") is inserted. As a result, it is now very cheap to use overridable small methods in your C implementations, since it nearly comes to no cost if the method isn't overridden. It is also possible to have private methods, in the sense that you can use inheritance between your flextypes without publishing every virtual method to Python at all. Here an example of my Stackless type system, where I made my channel interface overridable: (channelobject.h) """ #define CHANNEL_SEND_HEAD(func) \ int func (PyChannelObject *self, PyObject *arg) #define CHANNEL_SEND_EXCEPTION_HEAD(func) \ int func (PyChannelObject *self, PyObject *klass, PyObject *value) #define CHANNEL_RECEIVE_HEAD(func) \ PyObject * func (PyChannelObject *self) typedef struct _pychannel_heaptype { PyFlexTypeObject type; /* the fast callbacks */ CHANNEL_SEND_HEAD( (*send) ); CHANNEL_SEND_EXCEPTION_HEAD( (*send_exception) ); CHANNEL_RECEIVE_HEAD( (*receive) ); } PyChannel_HeapType; int init_channeltype(void); """ Here the VMT definition of channelobject.c: """ static PyCMethodDef channel_cmethods[] = { CMETHOD_PUBLIC_ENTRY(PyChannel_HeapType, channel, send), CMETHOD_PUBLIC_ENTRY(PyChannel_HeapType, channel, send_exception), CMETHOD_PUBLIC_ENTRY(PyChannel_HeapType, channel, receive), {NULL} /* sentinel */ }; """ where the CMETHOD_PUBLIC_ENTRY macro looks like this: /* * a public entry defines * - the function name "name" * - the PyCFunction class_name seen from Python, * - the fast function impl_class_name implements the method for C * - the wrapper function wrap_class_name that calls back into a Python override. 
*/
#define CMETHOD_PUBLIC_ENTRY(type, prefix, name) \
    {#name, (PyCFunction)prefix##_##name, &impl_##prefix##_##name, &wrap_##prefix##_##name, \
    offsetof(type, name)}

So basically three functions are involved in a virtual method: the PyCFunction, the C implementation and a wrapper. Normally, the PyCFunction and the implementation can be identical, but usually my C interface looks slightly different from the Python interface, for convenience.

Here is an excerpt from channel_send:

"""
int
PyChannel_Send(PyChannelObject *self, PyObject *arg)
{
    PyChannel_HeapType *t = (PyChannel_HeapType *) self->ob_type;
    return t->send(self, arg);
}

static CHANNEL_SEND_HEAD(impl_channel_send)
{
    PyThreadState *ts = PyThreadState_GET();
    PyTaskletObject *sender, *receiver;

    .... implementation skipped ....
}

static CHANNEL_SEND_HEAD(wrap_channel_send)
{
    PyObject * ret = PyObject_CallMethod((PyObject *) self, "send", "(O)", arg);
    return slp_return_wrapper(ret);
}

static PyObject *
channel_send(PyObject *myself, PyObject *arg)
{
    if (impl_channel_send((PyChannelObject*)myself, arg))
        return NULL;
    Py_INCREF(Py_None);
    return Py_None;
}
"""

end of story.

Summary:
--------

Overridable methods have always been present in Python, via the built-in method slots. My extension methods give the same functionality to the user, at maximum possible speed (only templates can be faster). The benefit is that users can use much more flexibility in C modules than before, without fear of speed loss. I believe that virtual methods will be used more often, since it is cheap, flexible and compatible with Python.

Please let me know if there is interest in using this technique in the Python core. I'm also not sure how to show the complete thing, since it is partially a patch to the existing type implementation (concerning the etype), partially a new C module flextype.c, and the rest is part of Stackless. Does it make sense (would somebody look at it) if I create a little demo application or something?
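
The slot-filling rule described above (use the fast C entry while the method is still the original, otherwise use the wrapper that calls back into Python) can be modelled in a few lines of present-day Python. This is only a rough sketch under assumptions, not the flextype.c implementation: CMethodDef and build_vmt are hypothetical names, the functions stand in for C code, and a single definition list is used instead of scanning every base along the MRO.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CMethodDef:
    name: str        # name to look up on the class
    match: Callable  # the original method; found if nobody overrode it
    fast: Callable   # stand-in for the native C implementation
    wrap: Callable   # stand-in for the wrapper that calls back into Python


def build_vmt(cls, defs):
    """Choose, per entry, the fast path or the wrapper for this class."""
    vmt = {}
    for d in defs:
        current = getattr(cls, d.name)
        # Still the original method: the call can go straight to the fast path.
        vmt[d.name] = d.fast if current is d.match else d.wrap
    return vmt


def impl_send(self, arg):
    return ("fast", arg)


def wrap_send(self, arg):
    # Dispatch through normal attribute lookup so an override is honoured.
    return ("wrapped", self.send(arg))


class Channel:
    def send(self, arg):
        return impl_send(self, arg)


class LoggingChannel(Channel):
    def send(self, arg):
        return ("override", arg)


SEND = CMethodDef("send", Channel.send, impl_send, wrap_send)
```

With these definitions, build_vmt(Channel, [SEND]) selects impl_send, while build_vmt(LoggingChannel, [SEND]) selects wrap_send, so the cost of the Python call is paid only by classes that actually override the method.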
cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From barry@barrys-emacs.org Thu Sep 12 23:12:37 2002 From: barry@barrys-emacs.org (Barry Scott) Date: Thu, 12 Sep 2002 23:12:37 +0100 Subject: [Python-Dev] Feedback request on popen2.py Unix fix Message-ID: <000101c25aa9$8094e650$070210ac@LAPDANCE> I have logged a bug against python 2.2.1 with a fix. [ 608635 ] Unix popen does not return exit status Attached to the bug report is a proposed fix for popen2.py. I'd appreciate feedback on the validity of the changes. Barry From aahz@pythoncraft.com Fri Sep 13 02:23:06 2002 From: aahz@pythoncraft.com (Aahz) Date: Thu, 12 Sep 2002 21:23:06 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020913012306.GA20221@panix.com> On Sat, Aug 24, 2002, Guido van Rossum wrote: > > Why do keep arguing for inheritance? (a) the need to deny inheritance > from an interface, while essential, is relatively rare IMO, and in > *most* cases the inheritance rules work just fine; (b) having two > separate but similar mechanisms makes the language larger. > > For example, if we ever are going to add argument type declarations to > Python, it will probably look like this: > > def foo(a: classA, b: classB): > ...body... 
I'm curious, and I don't recall having seen anything about this: why wouldn't we simply use attributes to hold this information, like __slots__? After all, attributes get inherited, too, and there's no need to pretzel the syntax. Using attributes IMO would make it easier to handle the case where derived classes need to mangle type and interface declarations. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From David Abrahams" <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> Message-ID: <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> From: "Aahz" > On Sat, Aug 24, 2002, Guido van Rossum wrote: > > > > Why do keep arguing for inheritance? (a) the need to deny inheritance > > from an interface, while essential, is relatively rare IMO, and in > > *most* cases the inheritance rules work just fine; (b) having two > > separate but similar mechanisms makes the language larger. > > > > For example, if we ever are going to add argument type declarations to > > Python, it will probably look like this: > > > > def foo(a: classA, b: classB): > > ...body... > > I'm curious, and I don't recall having seen anything about this: why > wouldn't we simply use attributes to hold this information, like > __slots__? After all, attributes get inherited, too, and there's no > need to pretzel the syntax. Using attributes IMO would make it easier > to handle the case where derived classes need to mangle type and > interface declarations. A few weeks ago I realized there was reason in principle that declaring a class satisfies an interface shouldn't just amount to adding the interface to the class' __bases__ (as Guido has been suggesting all along). Why not? Am we missing somethings? 
-Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From guido@python.org Fri Sep 13 05:37:57 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 13 Sep 2002 00:37:57 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Thu, 12 Sep 2002 21:23:06 EDT." <20020913012306.GA20221@panix.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> Message-ID: <200209130437.g8D4bvi13109@pcp02138704pcs.reston01.va.comcast.net> > > Why do keep arguing for inheritance? (a) the need to deny inheritance > > from an interface, while essential, is relatively rare IMO, and in > > *most* cases the inheritance rules work just fine; (b) having two > > separate but similar mechanisms makes the language larger. > > > > For example, if we ever are going to add argument type declarations to > > Python, it will probably look like this: > > > > def foo(a: classA, b: classB): > > ...body... > > I'm curious, and I don't recall having seen anything about this: why > wouldn't we simply use attributes to hold this information, like > __slots__? After all, attributes get inherited, too, and there's no > need to pretzel the syntax. Using attributes IMO would make it easier > to handle the case where derived classes need to mangle type and > interface declarations. That's exactly what Zope does with the __inherits__ attribute. But it's got limitations: there's only one __inherits__ attribute, so it isn't automatically merged properly on multiple inheritance, and adding one new interface to it means you have to copy or reference the base class __inherits__ attribute. Also, __slots__ is provisional. 
The plan is for this to eventually get nicer syntax (when I get over my fear of adding new keywords :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Sep 13 05:42:31 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 13 Sep 2002 00:42:31 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Thu, 12 Sep 2002 21:26:38 EDT." <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> Message-ID: <200209130442.g8D4gVH13129@pcp02138704pcs.reston01.va.comcast.net> > A few weeks ago I realized there was reason in principle that ^^^^^^^^^^ Did you mean "was no reason"??? > declaring a class satisfies an interface shouldn't just amount to > adding the interface to the class' __bases__ (as Guido has been > suggesting all along). > > Why not? Am we missing somethings? We'd need a trick to deny an interface that would be inherited by default. Something like private inheritance. There's also the ambiguity of inheriting from a single interface: does that create a sub-interface or an implementation of the interface? Of course with your C++ hat on you probably don't care. On Mondays, Wednesdays, Fridays and alternating Sundays I don't care either. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From dave@boost-consulting.com Fri Sep 13 05:48:13 2002 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 13 Sep 2002 00:48:13 -0400 Subject: [Python-Dev] type categories References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> <200209130442.g8D4gVH13129@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <0c0001c25ae0$c6298860$6401a8c0@boostconsulting.com> From: "Guido van Rossum" > > A few weeks ago I realized there was reason in principle that > ^^^^^^^^^^ > Did you mean "was no reason"??? > > > declaring a class satisfies an interface shouldn't just amount to > > adding the interface to the class' __bases__ (as Guido has been > > suggesting all along). > > > > Why not? Am we missing somethings? > > We'd need a trick to deny an interface that would be inherited by > default. Something like private inheritance. I think it's more than that. You might need to "uninherit": Say Interface A begets class B which begets class C. What if C doesn't fulfill A? > There's also the ambiguity of inheriting from a single interface: does > that create a sub-interface or an implementation of the interface? > Of course with your C++ hat on you probably don't care. On Mondays, > Wednesdays, Fridays and alternating Sundays I don't care either. With my C++ hat on I can't even imagine this. In C++ we don't express interfaces in code: they're written down as "concepts" in the some documentation somewhere (no, I don't think an abstract class in C++ is a good analogy for these Python interfaces). 
-Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From guido@python.org Fri Sep 13 06:08:22 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 13 Sep 2002 01:08:22 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Fri, 13 Sep 2002 00:48:13 EDT." <0c0001c25ae0$c6298860$6401a8c0@boostconsulting.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> <200209130442.g8D4gVH13129@pcp02138704pcs.reston01.va.comcast.net> <0c0001c25ae0$c6298860$6401a8c0@boostconsulting.com> Message-ID: <200209130508.g8D58NH13288@pcp02138704pcs.reston01.va.comcast.net> > > > A few weeks ago I realized there was reason in principle that > > ^^^^^^^^^^ > > Did you mean "was no reason"??? So did you? > > > declaring a class satisfies an interface shouldn't just amount to > > > adding the interface to the class' __bases__ (as Guido has been > > > suggesting all along). > > > > > > Why not? Am we missing somethings? > > > > We'd need a trick to deny an interface that would be inherited by > > default. Something like private inheritance. > > I think it's more than that. You might need to "uninherit": Say > Interface A begets class B which begets class C. What if C doesn't > fulfill A? Sorry, I meant to include that case. How do you do that in C++? Inherit privately from B and publicly from A, and making A virtual base everywhere? > > There's also the ambiguity of inheriting from a single interface: does > > that create a sub-interface or an implementation of the interface? > > Of course with your C++ hat on you probably don't care. 
On Mondays, > > Wednesdays, Fridays and alternating Sundays I don't care either. > > With my C++ hat on I can't even imagine this. In C++ we don't > express interfaces in code: they're written down as "concepts" in > the some documentation somewhere (no, I don't think an abstract > class in C++ is a good analogy for these Python interfaces). What's the difference between an abstract class and an interface in C++? --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Fri Sep 13 08:12:55 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 13 Sep 2002 03:12:55 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200209130437.g8D4bvi13109@pcp02138704pcs.reston01.va.comcast.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <200209130437.g8D4bvi13109@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020913071255.GA13052@panix.com> On Fri, Sep 13, 2002, Guido van Rossum wrote: >Aahz: >> >> I'm curious, and I don't recall having seen anything about this: why >> wouldn't we simply use attributes to hold this information, like >> __slots__? After all, attributes get inherited, too, and there's no >> need to pretzel the syntax. Using attributes IMO would make it easier >> to handle the case where derived classes need to mangle type and >> interface declarations. > > That's exactly what Zope does with the __inherits__ attribute. > > But it's got limitations: there's only one __inherits__ attribute, so > it isn't automatically merged properly on multiple inheritance, and > adding one new interface to it means you have to copy or reference the > base class __inherits__ attribute. Isn't that what metaclasses are for? 
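
A minimal sketch of what such a metaclass might look like, written in present-day Python syntax rather than that of 2002. The names MergingMeta, interfaces and all_interfaces are invented for illustration and are not Zope's API; the sketch only shows that per-class declarations can be merged automatically under multiple inheritance.

```python
class MergingMeta(type):
    """Union each class's own 'interfaces' tuple along its MRO."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        merged = []
        for klass in cls.__mro__:
            # vars() sees only what klass declared itself, not inherited values.
            for iface in vars(klass).get("interfaces", ()):
                if iface not in merged:
                    merged.append(iface)
        cls.all_interfaces = tuple(merged)


class Readable(metaclass=MergingMeta):
    interfaces = ("IRead",)


class Writable(metaclass=MergingMeta):
    interfaces = ("IWrite",)


class File(Readable, Writable):
    # Adds one new interface without copying the base class declarations.
    interfaces = ("ISeek",)
```

Here File.all_interfaces collects the declarations of File, Readable and Writable in MRO order, which addresses the merging limitation, though not the question of how a subclass would deny an interface.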
-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From dave@boost-consulting.com Fri Sep 13 12:57:21 2002 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 13 Sep 2002 07:57:21 -0400 Subject: [Python-Dev] type categories References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> <200209130442.g8D4gVH13129@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <0c2901c25b1d$4660bf80$6401a8c0@boostconsulting.com> From: "Guido van Rossum" > > A few weeks ago I realized there was reason in principle that > ^^^^^^^^^^ > Did you mean "was no reason"??? Oh. Yup. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From David Abrahams" <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> <200209130442.g8D4gVH13129@pcp02138704pcs.reston01.va.comcast.net> <0c0001c25ae0$c6298860$6401a8c0@boostconsulting.com> <200209130508.g8D58NH13288@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <0c4801c25b24$4c5822f0$6401a8c0@boostconsulting.com> From: "Guido van Rossum" > > > We'd need a trick to deny an interface that would be inherited by > > > default. Something like private inheritance. > > > > I think it's more than that. You might need to "uninherit": Say > > Interface A begets class B which begets class C. What if C doesn't > > fulfill A? 
>
> Sorry, I meant to include that case. How do you do that in C++?

We don't use inheritance for this kind of interface. When we're making a Java-style interface, sure, inheritance works fine in C++. However, because of Python's dynamic, generic nature what we've been calling an interface for Python is much more like a "concept", which has no direct expression in code:

http://www.boost.org/more/generic_programming.html#concept

Actually if you read a little further on down the page (the Traits and Tag Dispatching sections), you'll see that it's possible to create an expression in code of a concept in C++. Usually you want to do that when concepts form a refinement hierarchy (e.g. bidirectional_iterator refines forward_iterator) which may or may not correspond to inheritance relationships.

> Inherit privately from B and publicly from A, and making A virtual
> base everywhere?

I guess you /could/ do that. I don't think anyone does, though ;-)

I was going to say that it seems to me if you can dynamically inject base classes in Python there's no problem using inheritance to do this sort of labelling. However, on third thought, maybe there is a problem. Suppose you have an inheritance chain A->B->C...->Z and I come along later to say that A fulfills interface II and add II to A's bases. Which of A's subclasses also fulfill II? I might not know. I might not even know about them. For this, maybe you'd need a way to express inheritance that goes just "one level deep" (i.e. A inherits II publicly, but nothing else does). And that might just screw with the notion of inheritance enough that you want a separate parallel mechanism.

So I guess I'm back to where I was before. Inheritance probably doesn't work out too well for expressing "satisfies interface".
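
The problem described above can be shown in a few lines of present-day Python. This is a sketch under assumptions: II is declared as a base of A up front rather than injected later, and SATISFIES and declared are hypothetical names for the "separate parallel mechanism", not an existing API.

```python
class II:
    """Marker base class standing in for an interface."""


class A(II):          # "A fulfills II", expressed through inheritance
    def read(self):
        return "data"


class B(A):
    pass


class C(B):
    read = None       # C deliberately drops read(), so it no longer fulfills II


# Inheritance labels every subclass, whether or not it still conforms.
labelled = [cls.__name__ for cls in (A, B, C) if issubclass(cls, II)]

# A separate, non-inherited registry goes just "one level deep".
SATISFIES = {A: {II}}


def declared(cls, iface):
    """True only if cls itself was registered as satisfying iface."""
    return iface in SATISFIES.get(cls, set())
```

The inheritance check reports A, B and C alike, while the registry answers only for the class that was explicitly declared, leaving B and C to be decided separately.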
----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From jeremy@alum.mit.edu Fri Sep 13 15:14:49 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 13 Sep 2002 10:14:49 -0400 Subject: [Python-Dev] type categories In-Reply-To: <0c4801c25b24$4c5822f0$6401a8c0@boostconsulting.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> <200209130442.g8D4gVH13129@pcp02138704pcs.reston01.va.comcast.net> <0c0001c25ae0$c6298860$6401a8c0@boostconsulting.com> <200209130508.g8D58NH13288@pcp02138704pcs.reston01.va.comcast.net> <0c4801c25b24$4c5822f0$6401a8c0@boostconsulting.com> Message-ID: <15745.62169.148898.620458@slothrop.zope.com> >>>>> "DA" == David Abrahams writes: DA> I was going to say that is seems to me if you can dynamically DA> inject base classes in Python there's no problem using DA> inheritance to do this sort of labelling. However, on third DA> though, maybe there is a problem. Suppose you have an DA> inheritance chain A->B->C...->Z and I come a long later to say DA> that A fulfills interface II and add II to A's bases. Which of DA> A's subclasses also fulfill II. I might not know. I might not DA> even know about them. For this, maybe you'd need a way to DA> express inheritance that goes just "one level deep" (i.e. A DA> inherits II publicly, but nothing else does). And that might DA> just screw with the notion of inheritance enough that you want a DA> separate parallel mechanism. DA> So I guess I'm back to where I was before. Inheritance probably DA> doesn't work out too well for expressing "satisfies interface". 
I had similar third thoughts a couple of weeks ago :-). So I guess I agree with you. Jeremy From barry@python.org Fri Sep 13 15:38:03 2002 From: barry@python.org (Barry A. Warsaw) Date: Fri, 13 Sep 2002 10:38:03 -0400 Subject: [Python-Dev] type categories References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020913012306.GA20221@panix.com> <0b0501c25ac4$a5ffcd90$6401a8c0@boostconsulting.com> <200209130442.g8D4gVH13129@pcp02138704pcs.reston01.va.comcast.net> <0c0001c25ae0$c6298860$6401a8c0@boostconsulting.com> <200209130508.g8D58NH13288@pcp02138704pcs.reston01.va.comcast.net> <0c4801c25b24$4c5822f0$6401a8c0@boostconsulting.com> <15745.62169.148898.620458@slothrop.zope.com> Message-ID: <15745.63563.522111.553274@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: >>>>> "DA" == David Abrahams writes: DA> I was going to say that is seems to me if you can dynamically DA> inject base classes in Python there's no problem using DA> inheritance to do this sort of labelling. However, on third DA> though, maybe there is a problem. Suppose you have an DA> inheritance chain A->B->C...->Z and I come a long later to say DA> that A fulfills interface II and add II to A's bases. Which of DA> A's subclasses also fulfill II. I might not know. I might not DA> even know about them. For this, maybe you'd need a way to DA> express inheritance that goes just "one level deep" (i.e. A DA> inherits II publicly, but nothing else does). And that might DA> just screw with the notion of inheritance enough that you want DA> a separate parallel mechanism. DA> So I guess I'm back to where I was before. Inheritance DA> probably doesn't work out too well for expressing "satisfies DA> interface". JH> I had similar third thoughts a couple of weeks ago :-). 
So I JH> guess I agree with you. I tend to agree as well. But to play devil's advocate for a moment: I think Guido said that inheritance won't be the only way to spell conforms-to, but it'll be the predominately common way. So you'd definitely need a way to spell that outside of inheritance as your example clearly shows. Which means that any conformsto() function will have to be more complicated because it'll need to check both mechanisms. Is that a worthwhile price to pay to allow conforms-to-by-inheritance? What I don't like about the inheritance mechanism is that the syntax isn't explicit. I look at a class definition and I don't really know what's a base class for implementation purposes and what's an interface assertion. It might even be difficult if I had the source code for all the classes in the base class list if there's little except convention to syntactically distinguish between a class definition and an interface definition (no keyword, but just a stylized bunch of defs). I think it's going to be important to know what's an interface and what's a base class. Naming conventions (IThingie) can help but aren't enforced. -Barry From drifty@bigfoot.com Sun Sep 15 06:49:00 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Sat, 14 Sep 2002 22:49:00 -0700 (PDT) Subject: [Python-Dev] flextype.c -- extended type system In-Reply-To: <3D80D6C7.9040201@tismer.com> Message-ID: [Christian Tismer] > Hi Guido, py-dev, > > preface: > -------- > a week ago or so, I sent a patch to Guido that removes > the "etype" struct. This is a hidden structure that > extends types when they are allocated on the heap. > One restriction with this type was that types could > not be extended by metatypes for some internal reason. > I fixed this. Now meta-types can define extra slots > for types. > I have never written a type or object in C, so bear with my newbie questions. Are you saying, Chris, that before you could not inherit a type written in C and override a method? 
Is this only in regards to the magic method slots or just any method? From what I gather in your email, it seems like you came up with proper overriding inheritance in C for methods defined in a type. So does this mean you can now override the __contains__ magic slot in C code through some inherited type and this was not doable before? Perhaps an example of something from the Python core that was not possible before would solidify this for me. -Brett From skip@manatee.mojam.com Sun Sep 15 13:00:16 2002 From: skip@manatee.mojam.com (Skip Montanaro) Date: Sun, 15 Sep 2002 07:00:16 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200209151200.g8FC0Gux000552@manatee.mojam.com> Bug/Patch Summary ----------------- 281 open / 2853 total bugs (+1) 108 open / 1690 total patches (-7) New Bugs -------- PyString_AsString underdocumented (2002-09-08) http://python.org/sf/606463 defining away __attribute__ is not good (2002-09-08) http://python.org/sf/606493 xml.sax second time file loading problem (2002-09-09) http://python.org/sf/606692 header file problems (2002-09-10) http://python.org/sf/607253 IDE should have "open recent" menu (2002-09-11) http://python.org/sf/607810 IDE look and feel (2002-09-11) http://python.org/sf/607814 IDE Preferences (2002-09-11) http://python.org/sf/607816 IDE output window (2002-09-11) http://python.org/sf/607821 Implied __init__.py not copied (2002-09-11) http://python.org/sf/608033 IDE - Breakpoints don't stick to lines (2002-09-11) http://python.org/sf/608085 gethostbyname("LOCALHOST") fails (2002-09-12) http://python.org/sf/608584 Problems in IDLE Browsers & Viewers (2002-09-12) http://python.org/sf/608595 Unix popen does not return exit status (2002-09-12) http://python.org/sf/608635 test_b1.py, disabling of list test (2002-09-13) http://python.org/sf/609041 cPickle.BadPickleGet is a string (2002-09-13) http://python.org/sf/609164 New Patches ----------- Enhanced file constructor (2002-09-11) 
http://python.org/sf/608182 configure on Irix (sockets, posix) (2002-09-13) http://python.org/sf/608999 Closed Bugs ----------- 64-bit zip problems (2001-08-19) http://python.org/sf/453208 mmap bus error on linux (2001-09-19) http://python.org/sf/462783 shutil.copy(path, path) deletes contents (2001-12-07) http://python.org/sf/490168 IDLE doesn't save 8bit files (2002-04-18) http://python.org/sf/545600 del __builtins__ breaks out of rexec (2002-07-04) http://python.org/sf/577530 Docs unclear about cleanup. (2002-07-05) http://python.org/sf/577793 Get rid of FutureWarnings in Carbon (2002-08-15) http://python.org/sf/595763 import cycle in distutils (2002-08-19) http://python.org/sf/597604 Python not handling cText (2002-08-22) http://python.org/sf/598981 weird header wrapping in email.Generator (2002-08-28) http://python.org/sf/601392 spurious SyntaxWarning (2002-09-03) http://python.org/sf/604036 pre bug (2002-09-04) http://python.org/sf/604803 Closed Patches -------------- patch for bug 462783 mmap bus error (2002-03-28) http://python.org/sf/536578 OpenBSD updates for build process (2002-05-10) http://python.org/sf/554718 THREAD_STACK_SIZE for 2.1 (2002-05-11) http://python.org/sf/554841 Remove import string in Tools/ directory (2002-06-21) http://python.org/sf/572113 types.BoolType (2002-08-02) http://python.org/sf/590119 improper use of strncpy in getpath (2002-08-29) http://python.org/sf/602108 For Bug [ 490168 ] shutil.copy(path, pat (2002-09-04) http://python.org/sf/604600 Tweaks to calls to AH/Help (2002-09-07) http://python.org/sf/606067 install_IDLE target in Mac/OSX/Makefile (2002-09-07) http://python.org/sf/606134 From tismer@tismer.com Sun Sep 15 14:04:50 2002 From: tismer@tismer.com (Christian Tismer) Date: Sun, 15 Sep 2002 15:04:50 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: Message-ID: <3D848572.3020408@tismer.com> Brett Cannon wrote: > [Christian Tismer] > > >>Hi Guido, py-dev, >> >>preface: >>-------- >>a week ago or 
so, I sent a patch to Guido that removes >>the "etype" struct. This is a hidden structure that >>extends types when they are allocated on the heap. >>One restriction with this type was that types could >>not be extended by metatypes for some internal reason. >>I fixed this. Now meta-types can define extra slots >>for types. > > I have never written a type or object in C, so bear with my newbie > questions. Are you saying, Chris, that before you could not inherit a > type written in C and override a method? Is this only in regards to the > magic method slots or just any method? Sure you could. There just was no general interface to it. The magic method slots are already easy to override, assuming that you always call these via the type slots and don't call them directly. For your own, non-magic methods, there was no support yet. Sure, you could override your methods, but you needed extra machinery to keep track of the methods, to find out which to call when, and so on. The proper way to store extra info about methods is to put this info into the type object itself. This was not possible before my patch. You could help yourself by extending some of the existing method tables, but this is hackish. With my flextype stuff, you explicitly extend your type object with extra function pointers. Then you provide a table with your implementation and wrapper functions, and inheritance works on its own. That's what I was after. > From what I gather in your email, it seems like you came up with proper > overriding inheritence in C for methods defined in a type. So does this > means you can now override the __contains__ magic slot in C code through > some inherited type and this was not doable before? Perhaps an example of > something from the Python core that was not possible before would solidify > this for me. I didn't care about the magic slots at all. I think they don't need to be changed, but I will have a look at it. 
The difference with my dynamic methods is that the method tables are filled once, at the time when your type/class is created. After that, there is no longer any lookup necessary. Method calls which are not overridden are called with maximum possible speed. In order to support changes to the underlying classes *after* type creation, I will provide an extra type method that allows you to "re-bind" explicitly. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From drifty@bigfoot.com Sun Sep 15 20:33:24 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Sun, 15 Sep 2002 12:33:24 -0700 (PDT) Subject: [Python-Dev] flextype.c -- extended type system In-Reply-To: <3D848572.3020408@tismer.com> Message-ID: [Christian Tismer] > For your own, non-magic methods, there was not support, yet. > Sure, you could override your methods, but you needed > extra machinery to keep track of the methods, to find out > which to call when, and so on. > The proper way to store extra info about methods is to > put this info into the type object itself. This was not > possible before my patch. You could help yourself my > extending some of the existing method tables, but this > is hackish. > That sounds great. Anything to make coding C extensions easier. > I didn't care of the magic slots at all. I think they don't need > to be changed, but I will have a look at it. Part of the reason I asked about the magic slots is that I personally think it would be great if you didn't have to use the specific struct slots for magic slots but instead were called based on their name in Python. 
That way you would not have to view Include/object.h every time you wanted to use one of the magic methods; you could just add it just like any other method and just give it a Python name that matched its magic method name. The obvious drawback is you would lose compiler checking that the arguments were correct for the method. But wouldn't this simplify keeping binary-compatibility if it was used since the struct would be pruned down significantly? I don't know how much of a stumbling block this all is for newbies, but I know when I looked at extending sre's pattern objects to add a __contains__ method it took me a little while to find where the slot was and what all the macros were for. But that might also be because I didn't read the C extension docs and just dove in. =) -Brett From guido@python.org Sun Sep 15 20:43:38 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 15 Sep 2002 15:43:38 -0400 Subject: [Python-Dev] flextype.c -- extended type system In-Reply-To: Your message of "Sun, 15 Sep 2002 12:33:24 PDT." References: Message-ID: <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net> > [Christian Tismer] > > > > For your own, non-magic methods, there was not support, yet. > > Sure, you could override your methods, but you needed > > extra machinery to keep track of the methods, to find out > > which to call when, and so on. > > The proper way to store extra info about methods is to > > put this info into the type object itself. This was not > > possible before my patch. You could help yourself my > > extending some of the existing method tables, but this > > is hackish. [Brett Cannon] > That sounds great. Anything to make coding C extensions easier. Brett, may I politely suggest that you try writing C extensions first before claiming it needs to be made easier? Christian's additions (as far as I understand them :-) are mostly intended for very esoteric situations. > > I didn't care of the magic slots at all. 
I think they don't need > > to be changed, but I will have a look at it. > > > Part of the reason I asked about the magic slots is that I > personally think it would be great if you didn't have to use the > specific struct slots for magic slots but instead were called based > on their name in Python. That way you would not have to view > Include/object.h every time you wanted to use one of the magic > methods; you could just add it just like any other method and just > give it a Python name that matched its magic method name. The > obvious drawback is you would lose compiler checking that the > arguments were correct for the method. But wouldn't this simplify > keeping binary-compatibility if it was used since the struct would > be pruned down significantly? Alas, it would cause a major slowdown if this was the only way to provide heavily-used operations like __add__ and __getitem__. Most of the machinery to allow this probably already exists, but I wouldn't recommend using it. Also, you'd have to provide two implementations for binary operators, e.g. __add__ and __radd__. > I don't know how much of a stumbling block this all is for newbies, > but I know when I looked at extending sre's pattern objects to add a > __contains__ method it took me a little while to find where the slot > was and what all the macros were for. But that might also be > because I didn't read the C extension docs and just dove in. =) You could've picked a simpler extension to try to modify. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@bigfoot.com Mon Sep 16 05:31:53 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Sun, 15 Sep 2002 21:31:53 -0700 (PDT) Subject: [Python-Dev] Python-dev summary for 2002-09-01 to 2002-09-15 Message-ID: Since posting here first before posting to c.l.py worked out rather nicely last time, I am going to do it again. Basically everyone gets 24 hours to reply with corrections for this summary. 
Now please note that the links that point to where I am going to keep the summaries are not up yet (but they will be by the time I post to c.l.py). Enjoy. ============================== This is a summary of traffic on the `python-dev mailing list`_ between September 01, 2002 and September 15, 2002 (exclusive). It is intended to inform the wider Python community of ongoing developments on the list; everything from new features of the language to how to handle discovered bugs that might affect the general Python programmer. To comment on anything mentioned here, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP (or anything else for that matter) if you have an opinion. :: This is the second summary written by Brett Cannon (hopefully my sophomoric performance will be better than most sophomore music albums). Summaries by me (2002-09-15 to ... when I burn out) are archived at http://www.ocf.berkeley.edu/~bac/python-dev/summaries/index.php . You can find summaries by Michael Hudson (2002-02-01 to 2002-07-05) at http://starship.python.net/crew/mwh/summaries/index.html . Summaries by A.M. Kuchling (2000-12-01 to 2001-01-31) are at http://www.amk.ca/python/dev/ . Please note that this summary is written using reST_ which can be found at http://docutils.sourceforge.net/rst.html . If there is some markup in the summary that seems odd, chances are it is because of reST. Also, I am considering keeping a list of the names by which people are often referred to in emails. This would serve a dual purpose: it gives people who read emails from the list a reference for figuring out who is who, and it makes the summaries easier for me because I reference these people in my head by their nicknames. 
=) Any comments on that idea are appreciated. .. _python-dev mailing list: http://mail.python.org/mailman/listinfo/python-dev ============================ `To commit or not commit`_ ============================ Walter Dorwald asked if there were "any objections against committing the patch" for implementing `PEP 293`_ (Codec Error Handling Callbacks). Guido asked what Martin V. Lowis and M.A. Lemburg had to say about it. MAL responded that he was +1 on the patch. Martin was "concerned about the massive amounts of C code, most of which could be expressed way more compact in Python code", but "Walter convinced [MvL] that this does have a real performance impact for real data" so he would live with it. In the end he gave it his vote. Walter said he would check it in (and he has). The PEP has now been moved to the finished PEP list. .. _To commit or not commit: http://mail.python.org/pipermail/python-dev/2002-September/028502.html .. _PEP 293: http://www.python.org/peps/pep-0293.html ======================================= `Proposed Mixins for Wide Interfaces`_ ======================================= Raymond Hettinger suggested adding mixin classes that automatically implement magic methods when certain basic magic methods were already implemented (e.g., "given an __eq__ method in a subclass, adds a __ne__ method"). David Abrahams said that he thought "these are a great idea, *in the context of* an understanding of what we want interfaces to be, say, and do." Guido brought up some points about the initial suggestions Raymond made. He then said that he thought that there wasn't "enough here to warrant putting this into the standard library"; the issue will be revisited when a standard type or interface hierarchy is added to Python (not in 2.3). .. 
_Proposed Mixins for Wide Interfaces: http://mail.python.org/pipermail/python-dev/2002-September/028543.html =================================== `mysterious hangs in socket code`_ =================================== Jeremy Hylton wrote some threaded code to fetch some web pages that hung when performing a slow DNS operation. Apparently, in Python 2.1 "it produces a steady stream of output -- urls and the time it took to load them". In Python 2.2 and 2.3, though, "it produces little bursts of output, then pauses for a long time, then repeats". Jeremy guessed that it *might* have something to do with Linux's getaddrinfo() being thread-safe by allowing only a single lookup at a time. Aahz said that "gethostbyname() IIRC has frequently been non-reentrant". Skip ran the code in question under strace and said that "it seems mostly to be sitting in select() calls and rt_sigsuspend() which [Skip] guess[es] is a wrapper around sigsuspend()." .. _mysterious hangs in socket code: http://mail.python.org/pipermail/python-dev/2002-September/028555.html ======================================== `Two random and nearly unrelated ideas`_ ======================================== Skip Montanaro had two ideas; one was to make the info in `Misc/NEWS`_ (which is a summary of what has been changed in Python for each release) more accessible, and the other was "to get rid of the ticker altogether in systems with proper signal support" (see the `2002-08-16 - 2002-09-01 summary`_ for an explanation of what the ticker is). That would get rid of the polling of the ticker and thus reduce the overhead on threads. For the first idea, Guido asked Skip to try seeing what it would look like with reST_ markup and what the resulting page would look like. In response to the second idea, Oren Tirosh said it couldn't be done until "all Python I/O calls are converted to be EINTR-safe" (EINTR-safe means being able to handle the EINTR error, which is what is raised "When an I/O operation is interrupted by an unmasked signal"). 
That "requires a lot of work in some of the hairiest places in the Python codebase." Fredrik Lundh said that this "sounds like a good topic for a "here's what I learned when trying to fix this problem" PEP". This is most likely in reference to Skip writing the patch to make the ticker global instead of a per-thread issue. Guido said, in terms of signals, to "just say no"; "it is impossible to write correct code in the presence of signals". Guido, in a later email, gave this whole idea a vote of -1,000,000; so it ain't ever going to happen. Some discussion on signals ensued, but Guido never budged from his position. Oren pointed out that if some C code used signals and people didn't handle it in their Python code by checking whether an IOError was caused by EINTR, the interrupted call would not be restarted, even though there was no reason for it to have stopped. The check looks like this in Oren's code (the call inside ``try`` was elided in the source)::

    while 1:
        try:
            # (I/O call elided in the source)
        except IOError, exc:
            if exc.errno == errno.EINTR:
                continue
            else:
                raise

Oren said that Python could add the loop in the C code of the core where EINTR might be raised ("Only low-level functions like os.read_ and os.write_ that map directly to stdio functions should ever return EINTR"). The proposed idea was to wrap the functions that might raise it and that can be re-entered safely. .. _Two random and nearly unrelated ideas: http://mail.python.org/pipermail/python-dev/2002-September/028555.html .. _Misc/NEWS: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Misc/NEWS .. _2002-08-16 - 2002-09-01 summary: http://www.ocf.berkeley.edu/~bac/python-dev/summaries/2002-08-16--2002-09-01.html .. _reST: http://docutils.sourceforge.net/rst.html .. _os.read: .. 
_os.write: http://www.python.org/dev/doc/devel/lib/os-fd-ops.html ================================================ `Should KeyError use repr() on its arguments?`_ ================================================ Originally, when a KeyError was raised with an object passed in as its argument (such as ``KeyError("there is no spoon")``, where ``there is no spoon`` is the argument bound to ``.args``), calling str() on the exception just returned str() of that argument. Now it returns repr() of the argument, so the key is shown quoted. Much better. =) .. _Should KeyError use repr() on its arguments?: http://mail.python.org/pipermail/python-dev/2002-September/028545.html ========================================== `New 'spambayes' project on SourceForge`_ ========================================== Thanks to great work done by Tim Peters and several other contributors, Barry Warsaw started an SF project to host the spambayes code. It can be found at http://sf.net/projects/spambayes . There are two mailing lists: http://mail.python.org/mailman-21/listinfo/spambayes and http://mail.python.org/mailman-21/listinfo/spambaye-checkins (yes, that is Mailman 2.1, and yes, you will "help be a guinea pig for Mailman 2.1"). .. _New 'spambayes' project on SourceForge: http://mail.python.org/pipermail/python-dev/2002-September/028626.html ========================= `Subsecond time stamps`_ ========================= Martin V. Lowis wanted to introduce subsecond timestamps on platforms that supported them. He suggested adding another field to stat, creating a new type, or making st_mtime a floating point number. The first option is easy, the second has the usual problems of defining a new type, and the third does not guarantee enough accuracy. Paul Svensson and Guido said that the last option (turning st_mtime into a float) was the most Pythonic. MvL agreed, but worried about breaking code that expected an int. 
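To put a rough number on the accuracy concern with a float st_mtime: a double has a fixed number of significand bits, so the gap between adjacent representable values grows with the size of the timestamp. The sketch below is not from the thread; `float_resolution` is a hypothetical helper name, the two sample values are my own choice, and `math.ulp` needs Python 3.9 or later.

```python
import math


def float_resolution(seconds):
    """Gap between `seconds` and the next larger double, in seconds."""
    return math.ulp(float(seconds))


ONE_YEAR = 365 * 86400   # seconds in a non-leap year
EPOCH_2002 = 1.03e9      # roughly September 2002, counted from 1970

for label, value in [("one year", ONE_YEAR), ("since 1970", EPOCH_2002)]:
    # Report the spacing in nanoseconds for each magnitude.
    print("%-10s %.1f ns" % (label, float_resolution(value) * 1e9))
```

For a span of one year the spacing is a few nanoseconds, while for a count of seconds since 1970 it is a little over 100 nanoseconds, so how much subsecond detail survives depends on the epoch as well as on the platform.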
Guido then suggested that maybe the new field is the way to go; define something like st_mtimef that will contain the float if available or contain an int otherwise. Tim Peters also weighed in with his `IEEE 754`_ voodoo about how a float can hold enough info to be accurate to within 100 nanoseconds if you only span a single year; issues start to come up once you try to go past a year's worth of seconds. But then MvL discovered that st_mtime was already a float on the Mac; had that caused issues? Jack Jansen of course chimed in on this by saying that it caused him a headache about once a year in the form of a failing test (other issues caused by timestamps are the Classic Macs having the epoch at 1904 and not using UTC time). He said he would prefer to see the timestamp as a cookie that was passed into a function that spit out "something guaranteed to be of your liking". To address the other issues that Jack mentioned, Guido suggested that all timestamps be converted to UTC time with the epoch at 1970. MvL put up `SF patch 606592`_, which makes all the relevant changes to have timestamps return floats; it has already been closed. .. _Subsecond time stamps: http://mail.python.org/pipermail/python-dev/2002-September/028648.html .. _IEEE 754: http://grouper.ieee.org/groups/754/ .. _SF patch 606592: http://www.python.org/sf/606592 ================================= `64-bit process optimization 1`_ ================================= Bob Ledwith posted a simple patch for `Include/object.h`_ that changed the order of certain parts of the PyObject_HEAD macros, affecting PyObject and PyVarObject. This was for a 64-bit platform performance boost (40% for large data sets according to Bob). The reordering eliminated some padding in the struct and allowed more Python objects to fit in the L2 cache, or at least that is what Bob thinks is going on. Guido pointed out that this would save 8 bytes per object; he thought all of this was "Interesting!". 
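The padding effect can be reproduced with ctypes. This is only a sketch: the summary does not spell out the exact reordering, so the two layouts below (an ``int`` reference count before the type pointer, versus the pointer first) and the class names are assumptions made for illustration.

```python
import ctypes


class OldVarHead(ctypes.Structure):
    # Assumed original order: 4-byte int, 8-byte pointer, 4-byte int.
    _fields_ = [
        ("ob_refcnt", ctypes.c_int),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_int),
    ]


class ReorderedVarHead(ctypes.Structure):
    # Assumed reordering: the pointer first, then the two ints side by side.
    _fields_ = [
        ("ob_type", ctypes.c_void_p),
        ("ob_refcnt", ctypes.c_int),
        ("ob_size", ctypes.c_int),
    ]


old_size = ctypes.sizeof(OldVarHead)
new_size = ctypes.sizeof(ReorderedVarHead)
print(old_size, new_size, old_size - new_size)
```

On a platform with 8-byte pointers and 4-byte ints, the first layout needs padding after each int to keep the pointer and the struct aligned, which is where the 8 bytes go; with 4-byte pointers the two layouts come out the same size.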
But alas, using this patch would break binary compatibility. Guido was not sure, though, whether it had already been broken between Python 2.2 and 2.3 and thus he might be "being too conservative here" in terms of saying that it should be held back for now. A problem Guido pointed out for 64-bit systems is that, as things stand now, the reference count for an object could theoretically overflow and go negative with enough references. Guido then suggested that perhaps refcnt (struct item that holds the reference count) should be a ``long``. And while dealing with that, Guido suggested that anything that stores a length should store that number in a ``long``. Chime in Tim Peters. He pointed out that it was agreed upon years ago to move refcnt to ``long`` but no one had bothered to do it. Heck, even Guido thought for a long time that it was a long when it wasn't; it required Tim to "beat that out of [Guido]" to stop him from saying that it was a ``long``. He then pointed out that a ``long`` is still only 4 bytes on Win64; what was really desired was for it to be ``Py_intptr_t`` which is the Python way for spelling the C99 type that we wanted. Apparently C99 has a way to specify that things be a specific byte length (now if everyone just had a C99 compiler we wouldn't need these macros; oh, to dream...). Tim also pointed out that what we wanted was for the type that held a length argument to be size_t since that is what strlen() and malloc() are restricted by. He said that he writes all of his "string-slinging code as using size_t vars now". Tim pointed out that the issue then became "Whether it's worth the pain to change this stuff" which "depends on whether we think 64-bit boxes are just another passing fad like the Internet". =) Martin V. Lowis agreed with the changing of refcnt to a long but had reservations about using size_t for the length field (ob_size). He pointed out that some objects put negative values into that field. 
Fredrik suggested that the proposed changes be the default on 64-bit systems, since people on those systems are more likely to be willing to recompile than people on 32-bit systems. He also suggested making it a compiler option. Guido thought it was a good idea. But then Mats Wichmann discovered that the switch to long killed the performance boost. So Guido re-iterated that he thinks it should be a compiler option only on 64-bit systems; have "compat", "optimal", and "right" compiler options. .. _64-bit process optimization 1: http://mail.python.org/pipermail/python-dev/2002-September/028677.html .. _Include/object.h: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Include/object.h ========================================== `Weeding out obsolete modules and Demos`_ ========================================== Jack Jansen noticed that there were demos for some of the SGI-specific modules that used severely outdated systems and hardware (stuff discontinued 8 to 12 years ago). Guido gave the go-ahead to yank them from CVS. So the demos are now history. .. _Weeding out obsolete modules and Demos: http://mail.python.org/pipermail/python-dev/2002-September/028718.html ============== `utf8 issue`_ ============== (This thread actually started in August) There was a bug in Python 2.2 that raised a UnicodeError when trying to decode a lone surrogate (an explanation of surrogates follows this summary). This caused issues in importing .pyc files that contained a lone surrogate because marshal_ (which is what is used to create .pyc files) encodes Unicode_ literals in UTF-8. This has all been fixed in Python 2.3, but Guido was wondering how to backport this for Python 2.2.2. The option of bumping the magic number for .pyc files was raised and instantly thrown out by Guido; "Bumping MAGIC is a no-no between dot releases". So M.A. Lemburg suggested either fixing the Unicode encoder or changing the Unicode decoder to handle the malformed Unicode. 
MAL wasn't sure, though, if some security issue would be raised by the
latter option. Guido said go for the latter and didn't see any possible
security issue since "If someone you don't trust can write your .pyc
files, they can cause your interpreter to crash by inserting bogus
bytecode".

Explanation of lone surrogates:

In Unicode, a surrogate pair is when you create the representation of a
character by using two values. So, for instance, UTF-32 can cover the
entire Unicode space (since Unicode is 20 bits, although MvL says it is
really more like 21 bits), but UTF-16 can't. To solve the issue for an
encoding that cannot cover all possible characters in a single value, a
character can be represented as a pair of UTF-16 values. The high
surrogate covers the high 10 bits while the low surrogate covers the
lower 10 bits. High and low surrogates can never be the same since they
are defined by a range of possible values and those ranges do not
overlap. So with the proper high and low surrogate paired together you
can make any Unicode character that does not fit in a single 16-bit
value.

The problem in Python 2.2.1 is that when there is only a lone surrogate
(instead of there being a pair of surrogates), the encoder for UTF-8
messes up and leaves off the first byte of the UTF-8 sequence. The
following line is an example:

    >>> u'\ud800'.encode('utf-8')
    '\xa0\x80'      #In Python 2.2.1
    '\xed\xa0\x80'  #In Python 2.3a0

Notice how Python 2.3a0 emits the leading ``\xed`` byte, so the output
is a complete three-byte UTF-8 sequence for the lone surrogate that the
encoder was given instead of a truncated one. You can read
http://216.239.37.100/search?q=cache:Dk12BZNt6skC:uk.geocities.com/BabelStone1357/Software/surrogates.html
for more info. Thanks go to Frederik for the link and Guido for some
clarification.

.. _utf8 issue: http://mail.python.org/pipermail/python-dev/2002-August/028254.html
.. _marshal: http://www.python.org/dev/doc/devel/lib/module-marshal.html
.. _Unicode: http://www.unicode.org/

=====================================
`Documentation inconsistency in re`_
=====================================

Christopher Craig noticed that the re_ module's documentation for the
``\b`` metacharacter was incorrect; it said that "the end of a word is
indicated by whitespace or a non-alphanumeric character". That would
indicate that an underscore would be the end of a word, which turns out
to be false. Frederik said that "``\b`` is defined in terms of ``\w``
and ``\W``" and thus treats the underscore as an alphanumeric character.
The documentation has been fixed.

.. _Documentation inconsistency in re: http://mail.python.org/pipermail/python-dev/2002-September/028644.html
.. _re: http://www.python.org/dev/doc/devel/lib/module-re.html

=======================
`Codecs lookup order`_
=======================

Francois Pinard discovered that for the codecs_ module "one should be
careful about **not** [altered emphasis] naming a module after the
encoding name, when closely following the documentation in the Library
Reference manual". This is because the codecs module first searches the
registry of codecs, then searches for a module with the same name and
uses that module. The issue comes up when the module does not contain a
function named getregentry(); "\`encodings.lookup()\` expects a
\`getregentry\` function in that module, does not find it, and raises a
CodecRegistryError, not leaving a chance to subsequent codec search
functions to be used". M.A. Lemburg said that this has been fixed in
Python 2.3 and will be in 2.2.2 by having encodings.lookup() return None
if getregentry() is not found, thus allowing the search to continue.

.. _Codecs lookup order: http://mail.python.org/pipermail/python-dev/2002-September/028676.html
.. _codecs: http://www.python.org/dev/doc/devel/lib/module-codecs.html

=================================
`raw headers in rfc822.Message`_
=================================

John Spurling provided a two-line hack to keep the raw headers in an
rfc822.Message_ . Barry responded that email.Message.Message_ keeps the
raw headers around.

But the reason I am summarizing this is that the thread quickly turned
to how to properly generate a patch. Patches should be generated using
UNIX diff, either the -c or -u option with preference for -c (using cvs
diff -c is even better; it puts the version of the file you are diffing
with in the output); Mac folk can send MPW diffs, but UNIX diff is the
definite preference. Always put the files in the order
`diff -c OLD_FILE NEW_FILE` .

And always post the patches_ to SourceForge_! Getting random patches,
no matter how small, on the list is annoying (at least to me) because
the point of the list is to discuss the design and implementation of
Python, not to patch Python. SF is used so that Python-dev does not
need to be bothered with mundane problems like applying patches (and to
annoy Aahz with SF's UI sucking in Lynx_ =). So please, for my sake and
everyone else on Python-dev, use SF!

For a funny email from Raymond Hettinger about developing for Python
read
http://mail.python.org/pipermail/python-dev/2002-September/028725.html .

.. _raw headers in rfc822.Message: http://mail.python.org/pipermail/python-dev/2002-September/028682.html
.. _rfc822.Message: http://www.python.org/dev/doc/devel/lib/message-objects.html
.. _email.Message.Message: http://www.python.org/dev/doc/devel/lib/module-email.Message.html
.. _patches: http://sourceforge.net/patch/?group_id=5470
.. _SourceForge: http://www.sourceforge.net/
.. _Lynx: http://lynx.browser.org/

===================
`type categories`_
===================

Yes, the `same thread`_ from the `last summary`_ is back. This thread
has become the bane of my summarizing existence.
=)

Aahz asked "why wouldn't we simply use attributes to hold" interfaces
that a class implemented (think of __slots__). David Abrahams then
brought up the idea of just adding interfaces to the __class__
attribute.

Guido then chimed in on the attributes idea. He pointed out that this
is how Zope does it, using the __inherits__ attribute. The limitation
is that "it isn't automatically merged properly on multiple inheritance,
and adding one new interface to it means you have to copy or reference
the base class __inherits__ attribute". And as for David's idea of just
adding to __class__, that doesn't work because there is no way to limit
the interface; you need "Something like private inheritance" for when an
interface is broken by some inherited class.

David subsequently raised another problem with using inheritance for
interfaces: being able to disinherit an interface that is inherited by
default but is not valid. David then brought up the issue that Python
is so dynamic that, if you used __class__ as he suggested, you could
inject an interface through black magic code. If the injected interface
didn't work because of the inheritance chain, then you have a problem.

Barry Warsaw brought in his objections. He tried playing Devil's
Advocate by saying that Guido had said that inheritance would not be the
only way to handle interfaces, but that it would be the predominant way.
But this duality would complicate any conformsto()-like function since
it would have to handle two different ways for a class to get an
interface. Barry then brought up the objection that he didn't like the
idea of using straight inheritance because he wanted a syntactic way to
separate out interfaces.

As a side note, Guido pointed out that __slots__ is provisional; nicer
syntax will eventually surface when Guido gets over his "fear of adding
new keywords".

.. _type categories: http://mail.python.org/pipermail/python-dev/2002-September/028738.html
.. _same thread: http://www.ocf.berkeley.edu/~bac/python-dev/summaries/2002-08-16--2002-09-01.html#type-categories
.. _last summary: http://www.ocf.berkeley.edu/~bac/python-dev/summaries/2002-08-16--2002-09-01.html

=======================================
`flextype.c -- extended type system`_
=======================================

Christian Tismer has come up with a replacement for the etype which is
"a hidden structure that extends types when they are allocated on the
heap" (you can find it in `Objects/typeobject.c`_ in the CVS_). There
is a limitation with the etype where it could not be extended by
metatypes. Well, Chris worked his magic and came up with a new flextype
that allows overriding of methods. So with Christian's code you would
be able to override methods in a type without having to hack something
together to handle the overriding correctly; it would be handled
automatically.

Through some clarification from Christian and Guido, it was pointed out
to me (as of this moment I am the only one to make any noise on this
thread, and it was for this summary) that this simplifies an esoteric
issue; note the use of the word "metatype" above. This is
type/metatype black magic hacking. Spiffy, but something most of us
"normal" folk will not have to worry about.

.. _flextype.c -- extended type system: http://mail.python.org/pipermail/python-dev/2002-September/028736.html
.. _Objects/typeobject.c: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/typeobject.c
.. _CVS: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/#dirlist

From goodger@users.sourceforge.net Mon Sep 16 06:45:27 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Mon, 16 Sep 2002 01:45:27 -0400
Subject: [Python-Dev] Python-dev summary for 2002-09-01 to 2002-09-15
In-Reply-To:
Message-ID:

Brett Cannon wrote:
> Please note that this summary is written using reST_ which can be found at
> http://docutils.sourceforge.net/rst.html .
If there is some markup in the > summary that seems odd, chances are it is because of reST. Please don't blame the markup! By the time people see it, it's been mutilated by mailers to the point where it's unrecognizable. Like Python, leading whitespace is significant in reStructuredText. As the author, please take steps to prevent your document's mutilation. There are some serious problems in the text I received, probably due to emailer handling of the text. Specifically, line wrapping gets screwed up if lines are longer than 76 or 78 characters, and indentation goes out the window. I always limit files to 70 characters per line to prevent this. I haven't had a chance to look through it thoroughly (gotta get some sleep), but I noticed you used a literal block for your author's intro, beginning "This is the second summary". I think a block quote would be better; just drop the "::" and fix the indentation (which was totally wacky). If you send me the original as an attachment (gzipped would be best), I'll be happy to take a look and give a detailed critique. -- David Goodger Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ From mal@lemburg.com Mon Sep 16 10:10:05 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Sep 2002 11:10:05 +0200 Subject: [Python-Dev] Re: Automatic flex interface for Python? References: Message-ID: <3D859FED.8020609@lemburg.com> Tim Peters wrote: > [Gordon McMillan] > >>mxTextTools lets (encourages?) you to break all >>the rules about lex -> parse. If you can (& want to) >>put a good deal of the "parse" stuff into the scanning >>rules, you can get a speed advantage. You're also >>not constrained by the rules of BNF, if you choose >>to see that as an advantage :-). 
>> >>My one successful use of mxTextTools came after >>using SPARK to figure out what I actually needed >>in my AST, and realizing that the ambiguities in the >>grammar didn't matter in practice, so I could produce >>an almost-AST directly. > > > I don't expect anyone will have much luck writing a fast lexer using > mxTextTools *or* Python's regexp package unless they know quite a bit about > how each works under the covers, and about how fast lexing is accomplished > by DFAs. If you know both, you can build a DFA by hand and painfully > instruct mxTextTools in the details of its construction, and get a very fast > tokenizer (compared to what's possible with re), regardless of the number of > token classes or the complexity of their definitions. Writing to > mxTextTools directly is a lot like writing in an assembly language for a > character-matching machine, with all the pains and potential joys that > implies. If I were Eric, I'd use Flex . FYI, there are a few meta languages to make life easier for mxTextTools like e.g. Mike Fletcher's SimpleParse. The upcoming version 2.1 will also support Unicode and allows text jump targets which boosts readability of the tag tables a lot and makes hand-writing the tables much easier. The beta of 2.1 is available to the subscribers of the egenix-users mailing list. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... 
Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From thomas@xs4all.net Mon Sep 16 11:59:05 2002 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 16 Sep 2002 12:59:05 +0200 Subject: [Python-Dev] Python-dev summary for 2002-09-01 to 2002-09-15 In-Reply-To: References: Message-ID: <20020916105905.GE797@xs4all.nl> On Sun, Sep 15, 2002 at 09:31:53PM -0700, Brett Cannon wrote: > Skip Montanaro had two ideas; one was to make the info in `Misc/NEWS`_ I suspect there's a "using reST" or "in reST" missing here. > (which is a summary of what has been changed in Python for each release) > and "to get rid of the ticker altogether in systems with proper signal > support" (see the `2002-08-16 - 2002-09-01 summary`_ for an explanation of > what the ticker is). That would get rid of the polling of the ticker and > thus reduce the overhead on threads. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tismer@tismer.com Mon Sep 16 13:05:00 2002 From: tismer@tismer.com (Christian Tismer) Date: Mon, 16 Sep 2002 14:05:00 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D85C8EC.50007@tismer.com> Guido van Rossum wrote: ... > Christian's additions (as far as I understand them :-) are mostly > intended for very esoteric situations. My additions support a subset of C++ virtual methods. How is that esoteric? ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From tismer@tismer.com Mon Sep 16 13:12:16 2002 From: tismer@tismer.com (Christian Tismer) Date: Mon, 16 Sep 2002 14:12:16 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: Message-ID: <3D85CAA0.6070603@tismer.com> Brett Cannon wrote: ... > Part of the reason I asked about the magic slots is that I personally > think it would be great if you didn't have to use the specific struct > slots for magic slots but instead were called based on their name in > Python. That way you would not have to view Include/object.h every time > you wanted to use one of the magic methods; you could just add it just > like any other method and just give it a Python name that matched its > magic method name. The obvious drawback is you would lose compiler > checking that the arguments were correct for the method. No, vice versa. I *could* support any magic slot and put it into the extended type object with a Python name. And even better, this version could have full type checking, as my other methods have as well! This could go far bejond what we have now. My system is explicit as types: You repeat the whole function argument list in the new gown slot. This is as type safe as can be. Esoterically y'rs - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer@tismer.com Mon Sep 16 13:26:36 2002 From: tismer@tismer.com (Christian Tismer) Date: Mon, 16 Sep 2002 14:26:36 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: Message-ID: <3D85CDFC.7000004@tismer.com> Brett Cannon wrote: ... 
> Part of the reason I asked about the magic slots is that I personally > think it would be great if you didn't have to use the specific struct > slots for magic slots but instead were called based on their name in > Python. That way you would not have to view Include/object.h every time > you wanted to use one of the magic methods; you could just add it just > like any other method and just give it a Python name that matched its > magic method name. The obvious drawback is you would lose compiler > checking that the arguments were correct for the method. No, vice versa. I *could* support any magic slot and put it into the extended type object with a Python name. And even better, this version could have full type checking, as my other methods have as well! This could go far beyond what we have now. My system is explicit at types: You repeat the whole function argument list in the newly grown slot. This is as type safe as can be. Esoterically y'rs - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From pinard@iro.umontreal.ca Mon Sep 16 15:13:57 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: Mon, 16 Sep 2002 10:13:57 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-09-01 to 2002-09-15 In-Reply-To: (David Goodger's message of "Mon, 16 Sep 2002 01:45:27 -0400") References: Message-ID: [David Goodger] > Please don't blame the markup! By the time people see it, it's been > mutilated by mailers to the point where it's unrecognizable. [...] As the > author, please take steps to prevent your document's mutilation. The message seems adequately formatted, as delivered here. 
This is a recurring problem, deciding how far maintainers or writers should keep in mind various broken software of the recipients. There is an equilibrium to reach, but the pressure is often undue on the authors, as recipients want them to take care of everything bad they see. My guess is that everybody has his/her share in that adventure. As long as the author does well, and he did fairly well here, most recipients problems have to be addressed by recipients. > Specifically, line wrapping gets screwed up if lines are longer than 76 or > 78 characters, and indentation goes out the window. I always limit files to > 70 characters per line to prevent this. The 79 or 80 character limit is still a reasonable convention and a good goal. Some long URLs just do not fit within that space, they are not easily broken. Some people use lower limits, as an aid for recipients later quoting the original text, yet the proper refilling of quotes (and proper quotation) is really the job of those who reply. It goes a bit far that people limit themselves to 70 characters per line, if because randomly broken software. -- François Pinard http://www.iro.umontreal.ca/~pinard From praveen.patil@silver-software.com Mon Sep 16 16:01:45 2002 From: praveen.patil@silver-software.com (Praveen Patil) Date: Mon, 16 Sep 2002 16:01:45 +0100 Subject: [Python-Dev] Please help solving the problem In-Reply-To: Message-ID: Hi , Please help me in solving the problem below. step 1: I have written three dlls : a.dll , b.dll , c.dll. a.dll contains funct_A(); b.dll contains funct_B(); c.dll contains funct_C(); step 2: I am copying a.dll to directory C:\Program Files\Python\DLLs and renaming as a.pyd similarly I am copying b.dll to directory C:\Program Files\Python\DLLs. 
I am not renaming as b.pyd I am copying c.dll to directory C:\Program Files\Python\DLLs and renaming as c.pyd So my C:\Program Files\Python\DLLs directory contain a.pyd , b.dll , c.pyd step 3: a)Python function func_pyA() calls funct_A() b)funct_A() call funct_B() c)funct_B() call funct_C() d)funct_C() call python fuction func_pyC() step 4: I am importing a.pyd and c.pyd in python program. import a import c step 5: I am having problem in importing 'a' because 'a' need to load b.dll and c.dll. But I copied c.dll as c.pyd. Please suggest me some solution. here is my code : 1)a.c (a.dll) ---------- void func_A(); 2)b.c (b.dll) ----------- void func_B(); 3)c.c( c.dll) ----------- void func_C(); 4) example.py --------- import a import c G_Logfile = None def TestFunction(): G_Logfile = open('Pytestfile.txt', 'w') G_Logfile.write("%s \n"%'I am writing python created text file') G_Logfile.close G_Logfile = None if __name__ == "__main__": a.func_A(); ..... ..... Please help me in solving the problem. Cheers, Praveen. [ The information contained in this e-mail is confidential and is intended for the named recipient only. If you are not the named recipient, please notify us by telephone on +44 (0)1249 442 430 immediately, destroy the message and delete it from your computer. Silver Software has taken every reasonable precaution to ensure that any attachment to this e-mail has been checked for viruses. However, we cannot accept liability for any damage sustained as a result of any such software viruses and advise you to carry out your own virus check before opening any attachment. Furthermore, we do not accept responsibility for any change made to this message after it was sent by the sender.] 
From aahz@pythoncraft.com Mon Sep 16 16:14:19 2002 From: aahz@pythoncraft.com (Aahz) Date: Mon, 16 Sep 2002 11:14:19 -0400 Subject: [Python-Dev] Please help solving the problem In-Reply-To: References: Message-ID: <20020916151418.GA9134@panix.com> Please post this question to comp.lang.python; python-dev is only for discussion for development of the Python project itself. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From martin@v.loewis.de Mon Sep 16 16:50:46 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 16 Sep 2002 17:50:46 +0200 Subject: [Python-Dev] flextype.c -- extended type system In-Reply-To: <3D85C8EC.50007@tismer.com> References: <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net> <3D85C8EC.50007@tismer.com> Message-ID: Christian Tismer writes: > > Christian's additions (as far as I understand them :-) are mostly > > intended for very esoteric situations. > > My additions support a subset of C++ virtual methods. > How is that esoteric? Why would an extension writer ever want to do this? "Normal" extension types either wrap some C type, so you don't have inheritance at all, or some C++ type, in which case a single type method can wrap arbitrary virtual methods (since the VMT is done in C++). A real-world example would help. Regards, Martin From thomas.heller@ion-tof.com Mon Sep 16 19:46:06 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 16 Sep 2002 20:46:06 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net><3D85C8EC.50007@tismer.com> Message-ID: <08a701c25db1$50aa0730$e000a8c0@thomasnotebook> > > My additions support a subset of C++ virtual methods. > > How is that esoteric? > > Why would an extension writer ever want to do this? 
"Normal" extension > types either wrap some C type, so you don't have inheritance at all, > or some C++ type, in which case a single type method can wrap > arbitrary virtual methods (since the VMT is done in C++). I'm still in favor of a 'clean' method to add additional C accessible structure fields to types. Currently I'm attaching them to the the type's dict, as I reported before. As I understand it, Christian's first patch allows this. Thomas From David Abrahams" <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net><3D85C8EC.50007@tismer.com> Message-ID: <056601c25db3$6aa6a330$6701a8c0@boostconsulting.com> From: "Martin v. Loewis" > Christian Tismer writes: > > > > Christian's additions (as far as I understand them :-) are mostly > > > intended for very esoteric situations. > > > > My additions support a subset of C++ virtual methods. > > How is that esoteric? > > Why would an extension writer ever want to do this? "Normal" extension > types either wrap some C type, so you don't have inheritance at all, > or some C++ type, in which case a single type method can wrap > arbitrary virtual methods (since the VMT is done in C++). > > A real-world example would help. Well, I want to do something like this, and I think it's for a fairly simple reason. All of my (dynamically-generated) extension classes need a piece of data which tells them how much extra data to allocate in the variable-sized area of their instances. This is an implementation detail which I don't want to expose to users. Right now I have to stick it in the class' __dict__, which not only means that it's exposed, but that users can change it at will. It also costs me an extra lookup every time an instance of the extension class is allocated. It would be much nicer if I could get a little data area in the type object where I could stick this value, but right now there's no place to put it. Chris' patch allows me to handle the issue much more naturally. 
It doesn't seem esoteric to add information to a type which doesn't live it its __dict__. Not being able to do so makes types very different from other objects. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com Of course, that makes it esoteric by its very definition ;-) From drifty@bigfoot.com Mon Sep 16 20:12:22 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 16 Sep 2002 12:12:22 -0700 (PDT) Subject: [Python-Dev] Python-dev summary for 2002-09-01 to 2002-09-15 In-Reply-To: Message-ID: [David Goodger] > Brett Cannon wrote: > > Please note that this summary is written using reST_ which can be found at > > http://docutils.sourceforge.net/rst.html . If there is some markup in the > > summary that seems odd, chances are it is because of reST. > > Please don't blame the markup! By the time people see it, it's been > mutilated by mailers to the point where it's unrecognizable. Like Python, > leading whitespace is significant in reStructuredText. As the author, > please take steps to prevent your document's mutilation. > OK, I will mention that it might be reformatted in a strange way by their reader as well, but I am going to leave in the mention of reST. People are going to not necessarily understand why I have :: after a paragraph. > There are some serious problems in the text I received, probably due to > emailer handling of the text. Specifically, line wrapping gets screwed up > if lines are longer than 76 or 78 characters, and indentation goes out the > window. I always limit files to 70 characters per line to prevent this. > I am willing to guarantee that is the problem. I ran the summary through tools/html.py and everything turned out well. Problem is that wrapping at 70 characters will be a pain for me. 
I can try to use textwrap from Python 2.3 to do it for me, but unless I discover a setting in my editor (BBEdit Lite; Vim was driving me nuts for straight text editing), I don't know if my sanity is going to allow for this request. I am willing, though, to put in a line saying that various email and newsgroup readers might reformat the code and that if you want the original to run through reST code yourself, get it at from my summary repository. > I haven't had a chance to look through it thoroughly (gotta get some sleep), > but I noticed you used a literal block for your author's intro, beginning > "This is the second summary". I think a block quote would be better; just > drop the "::" and fix the indentation (which was totally wacky). > I was playing with that just before sending it out. You answered my personal email about it already, so that will be fixed before the summary goes out. > If you send me the original as an attachment (gzipped would be best), I'll > be happy to take a look and give a detailed critique. > OK. -Brett From drifty@bigfoot.com Mon Sep 16 20:14:38 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 16 Sep 2002 12:14:38 -0700 (PDT) Subject: [Python-Dev] Python-dev summary for 2002-09-01 to 2002-09-15 In-Reply-To: <20020916105905.GE797@xs4all.nl> Message-ID: [Thomas Wouters] > On Sun, Sep 15, 2002 at 09:31:53PM -0700, Brett Cannon wrote: > > > Skip Montanaro had two ideas; one was to make the info in `Misc/NEWS`_ > > I suspect there's a "using reST" or "in reST" missing here. > Yep. Thanks. 
-Brett From thomas.heller@ion-tof.com Mon Sep 16 20:24:54 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 16 Sep 2002 21:24:54 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net><3D85C8EC.50007@tismer.com> <056601c25db3$6aa6a330$6701a8c0@boostconsulting.com> Message-ID: <092701c25db6$bc9508f0$e000a8c0@thomasnotebook> From: "David Abrahams" > All of my (dynamically-generated) extension classes need a piece of data > which tells them how much extra data to allocate in the variable-sized area > of their instances. This is an implementation detail which I don't want to Not so different from what I need... > expose to users. Right now I have to stick it in the class' __dict__, which > not only means that it's exposed, but that users can change it at will. It > also costs me an extra lookup every time an instance of the extension class > is allocated. It would be much nicer if I could get a little data area in > the type object where I could stick this value, but right now there's no > place to put it. You can (but you probably know this already) replace the type's tp_dict by a custom subclass of PyDict_Object, which adds additional fields. > Chris' patch allows me to handle the issue much more naturally. It doesn't > seem esoteric to add information to a type which doesn't live it its > __dict__. Not being able to do so makes types very different from other > objects. Actually this is not specific to types - it is for all variable size objects. 
Thomas From David Abrahams" <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net><3D85C8EC.50007@tismer.com> <056601c25db3$6aa6a330$6701a8c0@boostconsulting.com> <092701c25db6$bc9508f0$e000a8c0@thomasnotebook> Message-ID: <05a301c25db5$b3161db0$6701a8c0@boostconsulting.com> From: "Thomas Heller" > > You can (but you probably know this already) replace the type's tp_dict > by a custom subclass of PyDict_Object, which adds additional fields. I probably knew that once. Thanks for reminding me. When I have time... ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From mcolli@SyscomCipher.com.ar Mon Sep 16 20:24:45 2002 From: mcolli@SyscomCipher.com.ar (mcolli@SyscomCipher.com.ar) Date: Mon, 16 Sep 2002 16:24:45 -0300 Subject: [Python-Dev] looking for Python programmers Message-ID: Hi, I would like to know how can I do to get information about Python programmers that could be interested to work in a Zope/Python project in Buenos Aires, Argentina. Is this the right address to contact? Many thanks and regards Mariela From drifty@bigfoot.com Mon Sep 16 20:34:09 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 16 Sep 2002 12:34:09 -0700 (PDT) Subject: [Python-Dev] looking for Python programmers In-Reply-To: Message-ID: [mcolli@SyscomCipher.com.ar] > Hi, > > I would like to know how can I do to get information about Python > programmers that could be interested to work in a Zope/Python project in > Buenos Aires, Argentina. > > Is this the right address to contact? > No. This address is used to discuss the development of Python. Your search would be better performed either on comp.lang.python and comp.lang.python.announce . -Brett C. 
From just@letterror.com Mon Sep 16 22:31:05 2002 From: just@letterror.com (Just van Rossum) Date: Mon, 16 Sep 2002 23:31:05 +0200 Subject: [Python-Dev] SystemError: unknown opcode Message-ID: After building Python from CVS (it's been a while) I get this error: Python 2.3a0 (#43, Sep 16 2002, 22:47:33) [GCC 2.95.2 19991024 (release)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import fontTools XXX lineno: 1, opcode: 127 Traceback (most recent call last): File "", line 1, in ? File "/Users/just/code/fonttools/Lib/fontTools/__init__.py", line 1, in ? version = "2.0b2" SystemError: unknown opcode >>> Does this mean the .pyc magic number needs to be changed? Or is it simply the risk of using CVS Python? ;-) Just From martin@v.loewis.de Mon Sep 16 22:41:08 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 16 Sep 2002 23:41:08 +0200 Subject: [Python-Dev] looking for Python programmers In-Reply-To: References: Message-ID: Brett Cannon writes: > > I would like to know how can I do to get information about Python > > programmers that could be interested to work in a Zope/Python project in > > Buenos Aires, Argentina. [...] > No. This address is used to discuss the development of Python. Your > search would be better performed either on comp.lang.python and > comp.lang.python.announce . Actually, I think the Python Job Board (http://python.org/Jobs.html) is the right forum for posting intents to hire. Regards, Martin From guido@python.org Mon Sep 16 22:43:29 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 16 Sep 2002 17:43:29 -0400 Subject: [Python-Dev] SystemError: unknown opcode In-Reply-To: Your message of "Mon, 16 Sep 2002 23:31:05 +0200." References: Message-ID: <200209162143.g8GLhTa28570@pcp02138704pcs.reston01.va.comcast.net> > After building Python from CVS (it's been a while) I get this error: ^^^^^^^^^^^^^^^^^ This is key. 
> Python 2.3a0 (#43, Sep 16 2002, 22:47:33) > [GCC 2.95.2 19991024 (release)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import fontTools > XXX lineno: 1, opcode: 127 > Traceback (most recent call last): > File "", line 1, in ? > File "/Users/just/code/fonttools/Lib/fontTools/__init__.py", line 1, in ? > version = "2.0b2" > SystemError: unknown opcode > >>> > > Does this mean the .pyc magic number needs to be changed? Or is it > simply the risk of using CVS Python? ;-) The magic number was changed several times due to the SET_LINENO changes. At the end I changed it *back* to what we changed it to after another change earlier during the 2.3 cycle. You may be the only unlucky guy who missed both rounds of SET_LINENO changes. Remove all your .pyc/.pyo files and be done with it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Sep 16 23:21:31 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 16 Sep 2002 18:21:31 -0400 Subject: [Python-Dev] Moratorium on changes to IDLE Message-ID: <200209162221.g8GMLVY30469@pcp02138704pcs.reston01.va.comcast.net> I'd like to put a stop to all changes to the version of IDLE in the Python source tree (Tools/idle/* -- let's call it Python-idle). The current crop of changes are being merged into Idlefork, the separate SF project where a new IDLE version is being cooked. I hope that Idlefork will be ready to be merged back into Python before we release Python 2.3, and that will be easiest if we can simply abandon the existing Python-idle code and copy the latest Idlefork in its place. Any changes made to the Python-idle code will be lost at that point. If you have a bug, fix or feature for IDLE, please suggest it on the idle-dev mailing list or on Idlefork's SF bug/patch managers! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Sep 17 00:23:15 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 16 Sep 2002 19:23:15 -0400 Subject: [Python-Dev] Python-dev summary for 2002-09-01 to 2002-09-15 In-Reply-To: Your message of "Sun, 15 Sep 2002 21:31:53 PDT." References: Message-ID: <200209162323.g8GNNFa30696@pcp02138704pcs.reston01.va.comcast.net> > Jack Jansen noticed that there are demos for some of the SGI-specific modules > that use severely outdated systems and hardware (stuff discontinued 8 to > 12 years ago). Guido gave the go-ahead to yank them from CVS. So the > demos are now history. Wish they were! Nobody ripped them out. Sjoerd Mullender gave some more feedback (some of the code still works) but in the end nobody did anything. I still hope it'll happen though. > Guido said go for the latter and didn't see any possible security issue > since "If someone you don't trust can write your .pyc files, they can > cause your interpreter to crash by inserting bogus bytecode". Another issue where I fear the action item is still in somebody's corner. --Guido van Rossum (home page: http://www.python.org/~guido/) From goodger@users.sourceforge.net Tue Sep 17 01:53:49 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Mon, 16 Sep 2002 20:53:49 -0400 Subject: [Python-Dev] Re: Python-dev summary for 2002-09-01 to 2002-09-15 In-Reply-To: Message-ID: [David Goodger] >> Please don't blame the markup! By the time people see it, it's >> been mutilated by mailers to the point where it's unrecognizable. >> [...] As the author, please take steps to prevent your document's >> mutilation. [François Pinard] > The message seems adequately formatted, as delivered here. I'm seeing subtle things that you probably won't notice if you're not used to writing reStructuredText. (Which is as it should be -- easy to read, even though some care must be taken in the writing.)
Specifically, the paragraph beginning "This is the second summary" has very strange indentation. Every other line is indented by one space (tab?). The simplest explanation for this is that the whole thing was supposed to be indented, but the lines were very long. > This is a recurring problem, deciding how far maintainers or writers > should keep in mind various broken software of the recipients. I think the problem is earlier than my mail client. The text I received in my mailbox is identical (indentation is equally wonky) to that on the web in the Python-dev archive: http://mail.python.org/pipermail/python-dev/2002-September/028754.html. > There is an equilibrium to reach, but the pressure is often undue on > the authors, as recipients want them to take care of everything bad > they see. In this case, it's a document posted to mailing lists. I don't think it's too much to ask that mailer line wrapping be allowed for. I agree that long URLs must remain, but there shouldn't be a problem for ordinary text. -- David Goodger Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ From fg@nuxeo.com Tue Sep 17 02:00:09 2002 From: fg@nuxeo.com (Florent Guillaume) Date: 17 Sep 2002 03:00:09 +0200 Subject: [Python-Dev] Unicode regexp problem Message-ID: <1032224410.13730.9.camel@twin.in.efge.org> I've got the following problem, in python 2.1, 2.2 and 2.3a0 (Debian): >>> import re >>> re.compile(r'\w+', re.U).sub('X', u'hello caf\xe9') u'X X' >>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXXX' >>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9') u'XXXXX XXX\xe9' The first two results are ok, but the third is not. Thanks, Florent PS: I'd appreciate a Cc on answers.
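For comparison, here is a small sketch of the result the report expects. It assumes a current Python 3 interpreter (where str patterns are already Unicode-aware) rather than the 2.x versions listed above, and the helper name mask_words is made up for illustration:

```python
import re


def mask_words(pattern, text):
    """Replace every match of pattern in text with 'X' (hypothetical helper)."""
    # re.UNICODE is the default for str patterns in Python 3; it is spelled
    # out here only to mirror the re.U flag used in the session above.
    return re.compile(pattern, re.UNICODE).sub('X', text)


text = 'hello caf\xe9'
print(mask_words(r'\w+', text))    # one X per word
print(mask_words(r'\w{1}', text))  # one X per word character
print(mask_words(r'\w', text))     # should match the previous line
```

In this sketch all three patterns treat the accented character as a word character, which is what the third call in the session was expected to do.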
-- Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com From aahz@pythoncraft.com Tue Sep 17 02:05:19 2002 From: aahz@pythoncraft.com (Aahz) Date: Mon, 16 Sep 2002 21:05:19 -0400 Subject: [Python-Dev] Unicode regexp problem In-Reply-To: <1032224410.13730.9.camel@twin.in.efge.org> References: <1032224410.13730.9.camel@twin.in.efge.org> Message-ID: <20020917010519.GA9969@panix.com> On Tue, Sep 17, 2002, Florent Guillaume wrote: > > I've got the following problem, in python 2.1, 2.2 and 2.3a0 (Debian): > > >>> import re > >>> re.compile(r'\w+', re.U).sub('X', u'hello caf\xe9') > u'X X' > >>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9') > u'XXXXX XXXX' > >>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9') > u'XXXXX XXX\xe9' > > The first two results are ok, but the third is not. python-dev is the wrong forum for bug reports, unless a) it's *only* in the CVS tree and b) you know you need advice for fixing it (and are planning to help fix) In any case, you should write a bug report on SourceForge first (unless you're posting to c.l.python to check whether it is a bug). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From martin@v.loewis.de Tue Sep 17 06:25:46 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 17 Sep 2002 07:25:46 +0200 Subject: [Python-Dev] Moratorium on changes to IDLE In-Reply-To: <200209162221.g8GMLVY30469@pcp02138704pcs.reston01.va.comcast.net> References: <200209162221.g8GMLVY30469@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > I'd like to put a stop to all changes to the version of IDLE in the > Python source tree (Tools/idle/* -- let's call it Python-idle). The > current crop of changes are being merged into Idlefork, the separate > SF project where a new IDLE version is being cooked. Does Idlefork also require CVS Python? 
Regards, Martin From guido@python.org Tue Sep 17 06:28:19 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 17 Sep 2002 01:28:19 -0400 Subject: [Python-Dev] Moratorium on changes to IDLE In-Reply-To: Your message of "Tue, 17 Sep 2002 07:25:46 +0200." References: <200209162221.g8GMLVY30469@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209170528.g8H5SJA04609@pcp02138704pcs.reston01.va.comcast.net> > > I'd like to put a stop to all changes to the version of IDLE in the > > Python source tree (Tools/idle/* -- let's call it Python-idle). The > > current crop of changes are being merged into Idlefork, the separate > > SF project where a new IDLE version is being cooked. > > Does Idlefork also require CVS Python? Not yet, AFAIK, it requires 2.2. A very small number of changes to Python-idle could not be merged for that reason (e.g. mkstemp). I'd like to keep Idlefork working with 2.2 so there's a reasonable potential user base for an Idlefork release. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Tue Sep 17 10:44:38 2002 From: tismer@tismer.com (Christian Tismer) Date: Tue, 17 Sep 2002 11:44:38 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net><3D85C8EC.50007@tismer.com> <056601c25db3$6aa6a330$6701a8c0@boostconsulting.com> Message-ID: <3D86F986.7000305@tismer.com> David Abrahams wrote: > Chris' patch allows me to handle the issue much more naturally. It doesn't > seem esoteric to add information to a type which doesn't live it its > __dict__. Not being able to do so makes types very different from other > objects. > > > ----------------------------------------------------------- > David Abrahams * Boost Consulting > dave@boost-consulting.com * http://www.boost-consulting.com > > Of course, that makes it esoteric by its very definition ;-) Hee hee :-)) -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! 
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer@tismer.com Tue Sep 17 10:56:34 2002 From: tismer@tismer.com (Christian Tismer) Date: Tue, 17 Sep 2002 11:56:34 +0200 Subject: [Python-Dev] flextype.c -- extended type system References: <200209151943.g8FJhc809943@pcp02138704pcs.reston01.va.comcast.net><3D85C8EC.50007@tismer.com> <08a701c25db1$50aa0730$e000a8c0@thomasnotebook> Message-ID: <3D86FC52.9010505@tismer.com> Thomas Heller wrote: >>>My additions support a subset of C++ virtual methods. >>>How is that esoteric? >> >>Why would an extension writer ever want to do this? "Normal" extension >>types either wrap some C type, so you don't have inheritance at all, >>or some C++ type, in which case a single type method can wrap >>arbitrary virtual methods (since the VMT is done in C++). > > > I'm still in favor of a 'clean' method to add additional > C accessible structure fields to types. Currently I'm > attaching them to the type's dict, as I reported before. > > As I understand it, Christian's first patch allows this. Please let me know when you're actually going to use it. I know there is a bug in the 2.3 patch. For Stackless, I'm still hacking against 2.2.1, and the patch has been extended in several ways as well: I removed the assumption that objects generated from heap types need always to be GC objects. This was probably decided with too much classes in mind, but now this feature also makes sense to simple types where you might want to avoid GC for space or other reasons. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido@python.org Tue Sep 17 21:24:18 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 17 Sep 2002 16:24:18 -0400 Subject: [Python-Dev] Weeding out obsolete modules and Demos In-Reply-To: Your message of "Wed, 11 Sep 2002 10:44:03 +0200." <200209110844.g8B8i3D14336@indus.ins.cwi.nl> References: <10D997B8-C501-11D6-88B2-003065517236@oratrix.com> <200209102119.g8ALJ1h29280@odiug.zope.com> <200209110844.g8B8i3D14336@indus.ins.cwi.nl> Message-ID: <200209172024.g8HKOIo08287@odiug.zope.com> > I would not be opposed to deleting the whole Demo/sgi tree. OK, I've done this. I've saved a private copy of the mclock code for nostalgic purposes; I may yet rewrite using Tkinter. :-) > I don't have an SGI workstation anymore, but I did have an SGI O2 > until recently. I think the audio (and al) stuff and the gl stuff > probably still work (I used mclock until recently). I think we can > definitely get rid of the video directory (CMIF video format, remember > that?). I'm not sure whether sv still compiles on modern SGI's. The > cd module also still works. > > Having said this, I'm not sure there is still much value in keeping > the demos. The modules in the Modules directory is another matter. > Until recently I have used cd and al. I think cl might still work, > but I'm not sure. I don't think sv works on anything other than > Indigo's with a Starter Video board. > gl (and I think also fm) still works. sgi also still works, but I'm > not sure how useful it still is. It just defines functions nap and > _getpty. > rgbimg is for reading SGI RGB images, but is portable. 
Although one > must ask whether it has a place in the standard library, since there > is no similar level of support for more popular image formats. OK, I won't touch the SGI specific code in Lib and Modules. --Guido van Rossum (home page: http://www.python.org/~guido/) From kbk@shore.net Wed Sep 18 00:59:57 2002 From: kbk@shore.net (Kurt B. Kaiser) Date: 17 Sep 2002 19:59:57 -0400 Subject: [Python-Dev] Idle Development Message-ID: Guido posted a moratorium on further Python Idle development, with the intention that work be shifted to Idlefork. I'd like to extend an invitation to any interested developers with Python CVS access to join the Idlefork project and continue in their idle ways. Send me an email and I'll set you up with Idlefork access. KBK From kbk@shore.net Wed Sep 18 18:55:52 2002 From: kbk@shore.net (Kurt B. Kaiser) Date: 18 Sep 2002 13:55:52 -0400 Subject: [Python-Dev] ANNOUNCE -- Python-idle to Idlefork Merge Completed Message-ID: Python-idle has been merged into Idlefork as of 13 Sep 2002. I believe there are no Python-idle check-ins after that date, and since there will not be any future check-ins this should be the final merge from Python-idle. The Idlefork CVS is again open for business! Please submit bugs and patches to the Idlefork Tracker and any comments to the Idle-dev list. KBK From praveen.patil@silver-software.com Thu Sep 19 14:39:43 2002 From: praveen.patil@silver-software.com (Praveen Patil) Date: Thu, 19 Sep 2002 14:39:43 +0100 Subject: [Python-Dev] How pass array from C to python function Message-ID: Hi , I have problem in passing array to python function. Please help me passing array to python function.
Here is my 'C' program
-----------------------
void RECEIVE_IL_STATE_S( int Instance , int vital_data[5])
{
    PyObject*arglist;
    PyObject* ret;
    PyObject* mylist;
    int count;

    mylist = PyList_New(5);
    for (count=0; count< 5; count++)
    {
        myint = PyInt_FromLong(vital_data[count]);
        PyList_Append(mylist,myint);
    }
    arglist = Py_BuildValue("O", mylist);
    ret = PyEval_CallObject(my_callback , arglist);
    Py_DECREF(arglist);
    Py_DECREF(ret);
}

Here is my Python program
-------------------------
G_Logfile = None
def TestFunction(a):
    G_Logfile = open('Pytestfile.txt', 'w')
    G_Logfile.write("%d \n"% a[0])
    G_Logfile.write("%d \n"% a[1])
    G_Logfile.close

Cheers, Praveen.
[ The information contained in this e-mail is confidential and is intended for the named recipient only. If you are not the named recipient, please notify us by telephone on +44 (0)1249 442 430 immediately, destroy the message and delete it from your computer. Silver Software has taken every reasonable precaution to ensure that any attachment to this e-mail has been checked for viruses. However, we cannot accept liability for any damage sustained as a result of any such software viruses and advise you to carry out your own virus check before opening any attachment. Furthermore, we do not accept responsibility for any change made to this message after it was sent by the sender.] From thomas.heller@ion-tof.com Thu Sep 19 15:28:33 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 19 Sep 2002 16:28:33 +0200 Subject: [Python-Dev] CVS hosed? Message-ID: <3D89DF11.4070106@ion-tof.com> I cannot reach python's CVS repository. Should I look on my side for a problem, or is it the same for other developers? Thanks, Thomas From guido@python.org Thu Sep 19 15:31:08 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 19 Sep 2002 10:31:08 -0400 Subject: [Python-Dev] CVS hosed? In-Reply-To: Your message of "Thu, 19 Sep 2002 16:28:33 +0200."
<3D89DF11.4070106@ion-tof.com> References: <3D89DF11.4070106@ion-tof.com> Message-ID: <200209191431.g8JEV8b03080@pcp02138704pcs.reston01.va.comcast.net> > I cannot reach python's CVS repository. > Should I look on my side for a problem, or is it the same for other > developers? Same here. I'll submit a support request; the SF status page says everything's online. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Thu Sep 19 15:33:20 2002 From: aahz@pythoncraft.com (Aahz) Date: Thu, 19 Sep 2002 10:33:20 -0400 Subject: [Python-Dev] How pass array from C to python function In-Reply-To: References: Message-ID: <20020919143320.GA11594@panix.com> On Thu, Sep 19, 2002, Praveen Patil wrote: > > I have problem in passing array to python function. > Please help me passing array to python function. python-dev is not for general questions about Python programming. Please post your question to comp.lang.python. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From sjoerd@acm.org Thu Sep 19 15:42:05 2002 From: sjoerd@acm.org (Sjoerd Mullender) Date: Thu, 19 Sep 2002 16:42:05 +0200 Subject: [Python-Dev] CVS hosed? In-Reply-To: <200209191431.g8JEV8b03080@pcp02138704pcs.reston01.va.comcast.net> References: <3D89DF11.4070106@ion-tof.com> <200209191431.g8JEV8b03080@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209191442.g8JEg5Z25259@indus.ins.cwi.nl> There are already half a dozen such support requests. Lots of people are bothered by this in lots of projects. On Thu, Sep 19 2002 Guido van Rossum wrote: > > I cannot reach python's CVS repository. > > Should I look on my side for a problem, or is it the same for other > > developers? > > Same here. I'll submit a support request; the SF status page says > everything's online. 
> > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > -- Sjoerd Mullender From barry@python.org Thu Sep 19 15:40:55 2002 From: barry@python.org (Barry A. Warsaw) Date: Thu, 19 Sep 2002 10:40:55 -0400 Subject: [Python-Dev] CVS hosed? References: <3D89DF11.4070106@ion-tof.com> Message-ID: <15753.57847.13384.639206@anthem.wooz.org> >>>>> "TH" == Thomas Heller writes: TH> I cannot reach python's CVS repository. Should I look on my TH> side for a problem, or is it the same for other developers? All of SF's CVS has been hosed for a while this morning. I suspect it'll eventually come back . -Barry From mats@laplaza.org Thu Sep 19 18:26:38 2002 From: mats@laplaza.org (Mats Wichmann) Date: Thu, 19 Sep 2002 11:26:38 -0600 Subject: [Python-Dev] Re: mysterious hangs in socket code In-Reply-To: <20020904160006.20673.57240.Mailman@mail.python.org> Message-ID: <5.1.0.14.1.20020919112055.01ef1828@204.151.72.2> >> One possibility is that the Linux getaddrinfo() is thread-safe, but >> only by way of a lock that only allows one request to be outstanding >> at a time. > >The next step should be to get the getaddrinfo() source code from >glibc and see what it does. It's open source, hey. :-) I can dig around a bit, but I have to figure out what I'm looking for. On the failure platform, are we sure Python is using the native getaddrinfo, not the Python-supplied one? I've had some fun (not) with the latter; for working on an LSB-conforming version of Python, I can't let it use the glibc version of getaddrinfo because it's not in the spec (will be in the next version); but the Python addrinfo.h header has some fields in different order than the Linux one, and it managed to call the Linux one anyway. The result of that was not subtle, however :-) so I don't think that's the problem that started this thread. 
I do know the Linux (or rather, glibc) getaddrinfo doesn't get reentrancy through magic; it calls gethostbyname_r and gethostbyaddr_r. (Note the Python emulation getaddrinfo just calls the straight gethostbyname and gethostbyaddr routines and so is likely not to be reentrant). From martin@v.loewis.de Thu Sep 19 19:33:23 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 19 Sep 2002 20:33:23 +0200 Subject: [Python-Dev] Re: mysterious hangs in socket code In-Reply-To: <5.1.0.14.1.20020919112055.01ef1828@204.151.72.2> References: <5.1.0.14.1.20020919112055.01ef1828@204.151.72.2> Message-ID: Mats Wichmann writes: > >> One possibility is that the Linux getaddrinfo() is thread-safe, but > >> only by way of a lock that only allows one request to be outstanding > >> at a time. > > > >The next step should be to get the getaddrinfo() source code from > >glibc and see what it does. It's open source, hey. :-) > > I can dig around a bit, but I have to figure out what > I'm looking for. I think that part is already settled: getaddrinfo, on Linux, is thread-safe. > On the failure platform, are we sure Python is using > the native getaddrinfo, not the Python-supplied one? Correct. I think the remaining question is: Even if the GIL is released around getaddrinfo - why is the performance of Jeremy's test script still that bad? Regards, Martin From guido@python.org Thu Sep 19 19:40:38 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 19 Sep 2002 14:40:38 -0400 Subject: [Python-Dev] Re: mysterious hangs in socket code In-Reply-To: Your message of "Thu, 19 Sep 2002 20:33:23 +0200." References: <5.1.0.14.1.20020919112055.01ef1828@204.151.72.2> Message-ID: <200209191840.g8JIecF02462@odiug.zope.com> > Mats Wichmann writes: > > > >> One possibility is that the Linux getaddrinfo() is thread-safe, but > > >> only by way of a lock that only allows one request to be outstanding > > >> at a time.
> > > > > >The next step should be to get the getaddrinfo() source code from > > >glibc and see what it does. It's open source, hey. :-) > > > > I can dig around a bit, but I have to figure out what > > I'm looking for. [MvL] > I think that part is already settled: getaddrinfo, on Linux, is > thread-safe. > > > On the failure platform, are we sure Python is using > > the native getaddrinfo, not the Python-supplied one? > > Correct. > > I think the remaining question is: Even if the GIL is released around > getaddrinfo - why is the performance of Jeremy's test script still > that bad? I tried to read the glibc getaddrinfo() source, but it looks like it would be a term project... It could be that it's just doing a lot more interaction with a DNS server. I believe that Jeremy suspects that the test program isn't just slow, but that one slow thread actually blocks all other threads from making progress. If that's the case (we don't know for sure), we're looking for a bottleneck in the getaddrinfo() code that somehow holds a resource needed by all threads calling getaddrinfo(). --Guido van Rossum (home page: http://www.python.org/~guido/) From srinivas.rao.kollipara@ai.ag Fri Sep 20 09:28:08 2002 From: srinivas.rao.kollipara@ai.ag (Srinivas Rao Kollipara) Date: Fri, 20 Sep 2002 10:28:08 +0200 Subject: [Python-Dev] information needed Message-ID: <6747D50F1AE5D511A02E009027CA36D137B977@exchange.mucs.ai.ag> Hi, I have a small python code, is there any tool which can convert the python code to oracle plsql code. Thanks kolli From drifty@bigfoot.com Fri Sep 20 10:10:55 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Fri, 20 Sep 2002 02:10:55 -0700 (PDT) Subject: [Python-Dev] information needed In-Reply-To: <6747D50F1AE5D511A02E009027CA36D137B977@exchange.mucs.ai.ag> Message-ID: [Srinivas Rao Kollipara] > Hi, > I have a small python code, is there any tool which can convert the python > code to oracle plsql code. 
> > Thanks > > kolli > The Python-dev list is used to discuss the development of Python. General questions about Python such as this are best served by being posted to comp.lang.python . -Brett C. From skip@pobox.com Fri Sep 20 15:47:05 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 20 Sep 2002 09:47:05 -0500 Subject: [Python-Dev] ReST-ing Misc/NEWS Message-ID: <15755.13545.743552.357262@12-248-11-90.client.attbi.com> I just checked in a ReST-ified version of Misc/NEWS. While the total number of changes was fairly large, the number of different types of changes made were quite small. The overwhelming majority of changes involved properly highlighting section and subsection headers. I'll review the changes here so that people who modify this file in the future will be able to easily adapt to the new format. First, to process Misc/NEWS using ReST, you'll need the latest docutils snapshot: http://docutils.sf.net/docutils-snapshot.tgz David Goodger made a change to the allowable structure of internal references which simplified my job significantly. The changes required fell into the following categories: * The top-level "What's New" section headers changed to: What's New in Python 2.3 alpha 1? ================================= *XXX Release date: DD-MMM-2002 XXX* * Subsections are underlined with a single row of hyphens: Type/class unification and new-style classes -------------------------------------------- * Places where "balanced" single quotes were used were switched to use just apostrophes (`string' -> 'string'). * In a few places asterisks needed to be escaped which would otherwise have been interpreted as beginning blocks of italic or bold text, e.g.: - The type of tp_free has been changed from "void (*)(PyObject \*)" to "void (*)(void \*)". Note that only the asterisks preceded by whitespace needed to be escaped. * One instance of a word ending with an underscore needed to be quoted ("PyCmp_" became "``PyCmp_``"). 
* One table was converted to ReST form (search Misc/NEWS for "New codecs" for this example). * A few places where chunks of code or indented text were displayed needed to be properly introduced (preceding paragraph terminated by "::" and the chunk of code or text indented w.r.t. the paragraph). For example: - Note that PyLong_AsDouble can fail! This has always been true, but no callers checked for it. It's more likely to fail now, because overflow errors are properly detected now. The proper way to check: :: double x = PyLong_AsDouble(some_long_object); if (x == -1.0 && PyErr_Occurred()) { /* The conversion failed. */ } Not yet addressed is whether to automatically convert Misc/NEWS to other formats (such as HTML). I assume an automatic conversion to HTML is in the cards, with the output made available on the python.org website. Skip From mats@laplaza.org Fri Sep 20 17:12:33 2002 From: mats@laplaza.org (Mats Wichmann) Date: Fri, 20 Sep 2002 10:12:33 -0600 Subject: [Python-Dev] Re: Re: mysterious hangs in socket code In-Reply-To: <20020920160008.21719.12166.Mailman@mail.python.org> Message-ID: <5.1.0.14.1.20020920100536.027f1158@204.151.72.2> Martin: >I think that part is already settled: getaddrinfo, on Linux, is >thread-safe. The latest Posix/Single UNIX spec in fact require getaddrinfo (and getnameinfo) to be thread-safe. and Guido: >I tried to read the glibc getaddrinfo() source, but it looks like it >would be a term project... It could be that it's just doing a lot >more interaction with a DNS server. > >I believe that Jeremy suspects that the test program isn't just slow, >but that one slow thread actually blocks all other threads from making >progress. If that's the case (we don't know for sure), we're looking >for a bottleneck in the getaddrinfo() code that somehow holds a >resource needed by all threads calling getaddrinfo(). Gives me a headache, too, especially once it vectors off into the glibc nss code. 
These routines (__gethostbyname2_r is the likely suspect, __gethostbyaddr_r and __getservbyname_r might also get called) do have paths that could twiddle a glibc internal lock so it's not impossible there's an issue here, although at something less than a term-paper-depth look it doesn't SEEM like it should ever be able to block for very long. From guido@python.org Fri Sep 20 17:16:38 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 20 Sep 2002 12:16:38 -0400 Subject: [Python-Dev] Re: Re: mysterious hangs in socket code In-Reply-To: Your message of "Fri, 20 Sep 2002 10:12:33 MDT." <5.1.0.14.1.20020920100536.027f1158@204.151.72.2> References: <5.1.0.14.1.20020920100536.027f1158@204.151.72.2> Message-ID: <200209201616.g8KGGdj09850@pcp02138704pcs.reston01.va.comcast.net> > Gives me a headache, too, especially once it vectors off into > the glibc nss code. These routines (__gethostbyname2_r is the > likely suspect, __gethostbyaddr_r and __getservbyname_r might > also get called) do have paths that could twiddle a glibc internal > lock so it's not impossible there's an issue here, although at > something less than a term-paper-depth look it doesn't SEEM like > it should ever be able to block for very long. Are there any tools for observing the locking calls made? Maybe just an strace would help. --Guido van Rossum (home page: http://www.python.org/~guido/) From cgw@alum.mit.edu Fri Sep 20 17:37:15 2002 From: cgw@alum.mit.edu (Charles G Waldman) Date: Fri, 20 Sep 2002 09:37:15 -0700 Subject: [Python-Dev] Puzzling behavior when subclassing from float Message-ID: <15755.20155.992326.621736@dragonfly.sportsdatabase.com> Apologies in advance if this is the wrong forum for this question. I'm trying to understand some puzzling behavior related to subclassing from built-in types.
I'm running Python 2.2.1.

If I subclass from "dict", it seems that the base class constructor is not being called, which is just as I would expect:

    class D(dict):
        def __init__(self, spam, eggs):
            print "spam=",spam, "eggs=", eggs

    >>> d = D(1,2)
    spam= 1 eggs= 2
    >>> print d
    {}

But if I subclass from "float", some magic is happening, which I don't quite understand - it seems that the base class constructor *is* called:

    class F(float):
        def __init__(self, spam, eggs):
            print "spam=",spam, "eggs=", eggs

    >>> f = F(1,2)
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: float() takes at most 1 argument (2 given)

If I modify the constructor to only take a single argument, I get the following:

    class F(float):
        def __init__(self, spam):
            print "spam=",spam

    >>> f = F(3.14)
    spam= 3.14
    >>> print f
    3.14
    >>> print f*2
    6.28

How is the value "3.14" getting associated with f? Apparently the base class constructor is called. How come this is happening?

From guido@python.org Fri Sep 20 17:42:14 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 20 Sep 2002 12:42:14 -0400 Subject: [Python-Dev] Puzzling behavior when subclassing from float In-Reply-To: Your message of "Fri, 20 Sep 2002 09:37:15 PDT." <15755.20155.992326.621736@dragonfly.sportsdatabase.com> References: <15755.20155.992326.621736@dragonfly.sportsdatabase.com> Message-ID: <200209201642.g8KGgEM10013@pcp02138704pcs.reston01.va.comcast.net> > Apologies in advance if this is the wrong forum for this question. It is, but because you're you, I don't mind. You're missing that besides __init__, new-style classes also have a lower-level constructor, __new__. This is called before __init__. For immutable objects, __new__ is where the action is.
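To make the __new__ point concrete, here is a minimal sketch. It assumes a current Python 3 interpreter rather than the 2.2.1 used in the question, and the class names F and G and the attribute names are purely illustrative. Because float is immutable, the value is fixed by float.__new__ before __init__ runs; a subclass that wants extra constructor arguments has to override __new__ and hand only the numeric value to the base type:

```python
class F(float):
    """Mirrors the class from the question: only __init__ is overridden."""

    def __init__(self, spam):
        # By the time this runs, float.__new__ has already stored the value.
        self.seen_in_init = spam


class G(float):
    """Hypothetical variant that accepts an extra argument via __new__."""

    def __new__(cls, value, eggs):
        # Pass only the numeric value on to the immutable base type.
        return super().__new__(cls, value)

    def __init__(self, value, eggs):
        # Extra state can still be attached in __init__.
        self.eggs = eggs


f = F(3.14)
g = G(2.5, "eggs")

print(f, f * 2)   # the value came from float.__new__, not from __init__
print(g, g.eggs)
```

With only __init__ overridden, F(1, 2) still fails with a TypeError, because both arguments reach float.__new__ first; that is consistent with the traceback in the question.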
Read about it in http://www.python.org/2.2.1/descrintro.html --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Fri Sep 20 18:02:55 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 20 Sep 2002 13:02:55 -0400 Subject: [Python-Dev] Re: Re: mysterious hangs in socket code In-Reply-To: <5.1.0.14.1.20020920100536.027f1158@204.151.72.2> References: <20020920160008.21719.12166.Mailman@mail.python.org> <5.1.0.14.1.20020920100536.027f1158@204.151.72.2> Message-ID: <20020920170255.GA13783@panix.com> On Fri, Sep 20, 2002, Mats Wichmann wrote: > Martin: >> >>I think that part is already settled: getaddrinfo, on Linux, is >>thread-safe. > > The latest Posix/Single UNIX spec in fact require getaddrinfo > (and getnameinfo) to be thread-safe. Thread-safe or thread-hot? E.g., Python in the absence of releasing the GIL is thread-safe but not thread-hot. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From ark@research.att.com Fri Sep 20 19:54:24 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 20 Sep 2002 14:54:24 -0400 (EDT) Subject: [Python-Dev] Installation fails under Solaris 2.7 or 2.8 with binutils 2.13 Message-ID: <200209201854.g8KIsO208760@europa.research.att.com> Nick Clifton at Red Hat has been kind enough to figure out for me why I had been unable to install Python on Solaris 2.7 or 2.8 when using binutils 2.13. The problem turns out to be a change in default options with 2.13 that affects dynamic linking. I'm mentioning the problem here in the hope that someone will pick up the patch I've put in with the bug report on Sourceforge (http://sf.net/tracker/?func=detail&aid=596422&group_id=5470&atid=105470) and include it as part of the next Python release. From guido@python.org Fri Sep 20 22:26:30 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 20 Sep 2002 17:26:30 -0400 Subject: [Python-Dev] ATTENTION! 
Releasing Python 2.2.2 in a few weeks Message-ID: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> I'd like to release something called Python 2.2.2 in a few weeks (say, around Oct 8; I like Tuesday release dates). PythonLabs has no time to do a thorough search for all backport/bugfix candidates in the trunk; if you want to help, the best thing you can do is take your favorite set of modules or core files and systematically backport anything that's clearly a bugfix and backports easily. Or you could simply make sure that your favorite bugfix is backported. I know Laura Creighton volunteered to help on behalf of the PBF, but I don't know how long she'll take, and she can surely use help. OTOH, if nobody has time, I think it's fine to release what we have in CVS on the 2.2 maintenance branch (the branch is named release22-maint). Why release now? It's been a while! It'll be almost 6 months since 2.2.1 was released. There have been a few important bugfixes (e.g. a crash with ExtensionClasses on Solaris) that have bugged real-world users. Why release what we've got? Frankly, I expect that nobody has the time to backport everything that could reasonably be backported, so if we wait for that to happen, we'll never have another release. What we've got is definitely a lot better than 2.2.1. What about Python-in-a-tie? Maybe Laura can shed light on the PBF's schedule for that; I expect it'll be much longer in the making than the planned 2.2.2 release. What about Python 2.3? Alpha by the end of 2002 is the best I can promise. What can you do? Here's a brief treatise on backporting bugs that I sent to Laura earlier: Basically, someone does the tedious part of triage, which means going over *every* 2.3 checkin message (with quick access to the corresponding diffs) and sorting them into: - already applied - trivial reject (e.g. new feature or fix for a bug introduced in 2.3) - trivial accept (pure bugfix that applies cleanly to 2.2) - messy (e.g. 
unclear whether it's a bugfix or a feature even after staring at the source, bugfixes that affect binary compatibility, bugfixes that can only be applied with much code wrangling due to other changes in the code at the same place, etc.) Feel free to compile a list of "messy" ones and send it to python-dev. It doesn't have to be all at once -- for big messy ones a separate python-dev discussion may be appropriate. I think it's best not use the SF trackers to suggest bugs to be backported -- this would merely be confusing, and it's a pretty heavy communication mechanism. If you want to help but don't have checkin permission, find someone who does and work with them -- or we can give you checkin permission (depending on your reputation). --Guido van Rossum (home page: http://www.python.org/~guido/) From mclay@nist.gov Fri Sep 20 23:20:47 2002 From: mclay@nist.gov (Michael McLay) Date: Fri, 20 Sep 2002 18:20:47 -0400 Subject: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209201820.47302.mclay@nist.gov> On Friday 20 September 2002 05:26 pm, Guido van Rossum wrote: > > I think it's best not use the SF trackers to suggest bugs to be > backported -- this would merely be confusing, and it's a pretty heavy > communication mechanism. If you want to help but don't have checkin > permission, find someone who does and work with them -- or we can give > you checkin permission (depending on your reputation). Perhaps Wiki pages would be a good mechanism for collaboration on the classification of patches. Create one page for each classification type and then use the patch names and title as section titles within the page. From guido@python.org Fri Sep 20 23:36:25 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 20 Sep 2002 18:36:25 -0400 Subject: [Python-Dev] ATTENTION! 
Releasing Python 2.2.2 in a few weeks In-Reply-To: Your message of "Fri, 20 Sep 2002 18:20:47 EDT." <200209201820.47302.mclay@nist.gov> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209201820.47302.mclay@nist.gov> Message-ID: <200209202236.g8KMaP124837@pcp02138704pcs.reston01.va.comcast.net> > > I think it's best not use the SF trackers to suggest bugs to be > > backported -- this would merely be confusing, and it's a pretty heavy > > communication mechanism. If you want to help but don't have checkin > > permission, find someone who does and work with them -- or we can give > > you checkin permission (depending on your reputation). > > Perhaps Wiki pages would be a good mechanism for collaboration on the > classification of patches. Create one page for each classification type and > then use the patch names and title as section titles within the page. Excellent! Let's start a 2.2.2 wiki as soon as there's anything to discuss. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Fri Sep 20 23:38:18 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 21 Sep 2002 00:38:18 +0200 Subject: [Python-Dev] Re: Re: mysterious hangs in socket code In-Reply-To: <5.1.0.14.1.20020920100536.027f1158@204.151.72.2> References: <5.1.0.14.1.20020920100536.027f1158@204.151.72.2> Message-ID: Mats Wichmann writes: > >I think that part is already settled: getaddrinfo, on Linux, is > >thread-safe. > > The latest Posix/Single UNIX spec in fact require getaddrinfo > (and getnameinfo) to be thread-safe. Yes, but this doesn't stop *BSD from providing a getaddrinfo implementation that is not thread-safe - they merely document that limitation on the man page. > Gives me a headache, too, especially once it vectors off into > the glibc nss code. 
These routines (__gethostbyname2_r is the > likely suspect, __gethostbyaddr_r and __getservbyname_r might > also get called) do have paths that could twiddle a glibc internal > lock so it's not impossible there's an issue here, although at > something less than a term-paper-depth look it doesn't SEEM like > it should ever be able to block for very long. I'd recommend using strace for further analysis, perhaps with its -r option. Regards, Martin From skip@pobox.com Fri Sep 20 23:52:13 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 20 Sep 2002 17:52:13 -0500 Subject: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: <200209202236.g8KMaP124837@pcp02138704pcs.reston01.va.comcast.net> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209201820.47302.mclay@nist.gov> <200209202236.g8KMaP124837@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15755.42653.874623.68250@12-248-11-90.client.attbi.com> >> Perhaps Wiki pages would be a good mechanism for collaboration on the >> classification of patches. Create one page for each classification >> type and then use the patch names and title as section titles within >> the page. Guido> Excellent! Let's start a 2.2.2 wiki as soon as there's anything to Guido> discuss. It will only take me a couple minutes to create a wiki on the mojam.com server. Michael can be the first editor. ;-) Skip From exarkun@meson.dyndns.org Sat Sep 21 06:50:11 2002 From: exarkun@meson.dyndns.org (Jp Calderone) Date: Sat, 21 Sep 2002 01:50:11 -0400 Subject: [Python-Dev] Built-in functions, kw args Message-ID: <20020921055011.GA1555@meson.dyndns.org> --Q68bSM7Ycu6FN28Q Content-Type: text/plain; charset=us-ascii Content-Disposition: inline

>>> ''.split(maxsplit=10)
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: split() takes no keyword arguments

/usr/src/Python-2.2.1/Objects$ egrep "PyArg_ParseTuple[^A]" *.c | wc -l
73
/usr/src/Python-2.2.1/Objects$ egrep PyArg_ParseTupleAndKeywords *.c | wc -l
16

It looks like the usage of ParseTuple vs ParseTupleAndKeywords is just whatever the author felt like using at the time (to me, anyway). For the sake of consistency at least (and convenience to boot), might it be nice to use PyArg_ParseTupleAndKeywords in more places -- I hesitate to say everywhere, but at least in the places it makes sense? Is there a reason not to do this? Would a patch be accepted that made it so? Jp -- 1:00am up 123 days, 1:54, 3 users, load average: 0.47, 0.54, 0.53 --Q68bSM7Ycu6FN28Q Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.7 (GNU/Linux) iD8DBQE9jAiSedcO2BJA+4YRAgiVAKDNB567/2y7GKZwyCHHv3Ew76ltnwCaAnsM 7GDjNNU3ucBGFwA8Rgtqx7A= =WdYD -----END PGP SIGNATURE----- --Q68bSM7Ycu6FN28Q-- From martin@v.loewis.de Sat Sep 21 11:37:04 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 21 Sep 2002 12:37:04 +0200 Subject: [Python-Dev] Built-in functions, kw args In-Reply-To: <20020921055011.GA1555@meson.dyndns.org> References: <20020921055011.GA1555@meson.dyndns.org> Message-ID: Jp Calderone writes: > It looks like the usage of ParseTuple vs ParseTupleAndKeywords is just > whatever the author felt like using at the time (to me, anyway). For the > sake of consistency at least (and convenience to boot), might it be nice to > use PyArg_ParseTupleAndKeywords in more places -- I hesitate to say > everywhere, but at least in the places it makes sense? Is there a reason > not to do this? Would a patch be accepted that made it so? I would require more precise criteria than "in the places it makes sense".
For example, if the documentation suggests that some operation has a keyword argument, this could be used as an indication that the implementation should follow. Notice that the parameter names get cast into stone that way. Regards, Martin From guido@python.org Sat Sep 21 12:37:47 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 21 Sep 2002 07:37:47 -0400 Subject: [Python-Dev] Built-in functions, kw args In-Reply-To: Your message of "Sat, 21 Sep 2002 01:50:11 EDT." <20020921055011.GA1555@meson.dyndns.org> References: <20020921055011.GA1555@meson.dyndns.org> Message-ID: <200209211137.g8LBbm225981@pcp02138704pcs.reston01.va.comcast.net> > It looks like the usage of ParseTuple vs ParseTupleAndKeywords is > just whatever the author felt like using at the time (to me, > anyway). Almost. Historically, ParseTupleAndKeywords didn't exist for many years. Also, it's much more painful to use. And it's slower. It's likely that the few occurrences you found were almost all the new class constructors, which pretty much require it. > For the sake of consistency at least (and convenience to boot), > might it be nice to use PyArg_ParseTupleAndKeywords in more places > -- I hesitate to say everywhere, but at least in the places it makes > sense? Is there a reason not to do this? Would a patch be accepted > that made it so? It would be a *humungous* patch, and it would take forever to verify that it was 100% correct. And you haven't even looked in the Modules directory. Another issue is to decide on the argument names. IOW I'm hoping you'll forget it. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Sat Sep 21 15:50:22 2002 From: skip@pobox.com (Skip Montanaro) Date: Sat, 21 Sep 2002 09:50:22 -0500 Subject: [Python-Dev] Python for OpenVMS Message-ID: <15756.34606.540052.330650@12-248-11-90.client.attbi.com> FYI, Jean-François Piéronne contacted me a few days ago with news that he has 2.1.3 running under OpenVMS (I updated question 7.4 of the FAQ to refer to his work). He now has the CVS tree and is working on a port of 2.3. Skip From skip@manatee.mojam.com Sun Sep 22 13:00:20 2002 From: skip@manatee.mojam.com (Skip Montanaro) Date: Sun, 22 Sep 2002 07:00:20 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200209221200.g8MC0KYQ021465@manatee.mojam.com>

Bug/Patch Summary
-----------------

283 open / 2867 total bugs (no change)
106 open / 1698 total patches (-2)

New Bugs
--------

Mac IDE Browser / ListManager issue (2002-09-16) http://python.org/sf/610149
unicode alphanumeric regexp bug (2002-09-16) http://python.org/sf/610299
SO name is too short! Python 2.2.1 (2002-09-16) http://python.org/sf/610332
linuxaudiodev not documented (2002-09-17) http://python.org/sf/610401
mhlib does not obey MHCONTEXT env var (2002-09-17) http://python.org/sf/610556
Numpy doesn't build for Python.framework (2002-09-17) http://python.org/sf/610730
Lone surrogates cause bad .pyc files (2002-09-17) http://python.org/sf/610783
SMTP.login() uses invalid base64 enc. (2002-09-18) http://python.org/sf/611052
Cannot compile escaped unicode character (2002-09-20) http://python.org/sf/612074
2 bugs in turtle.py (2002-09-21) http://python.org/sf/612595

New Patches
-----------

Updated .spec file for 2.2 series.
(2002-09-18) http://python.org/sf/611191
select problems on Windows (2002-09-19) http://python.org/sf/611464
zipfile.py reads archives with comments (2002-09-19) http://python.org/sf/611760
quietly select between 'less' and 'more' (2002-09-20) http://python.org/sf/612111
"Bare" text tag_configure in Tkinter (2002-09-21) http://python.org/sf/612602
Allow more Unicode on sys.stdout (2002-09-21) http://python.org/sf/612627

Closed Bugs
-----------

bugs in Tix.py ListNoteBook PanedWindow (2001-11-23) http://python.org/sf/484994
multifile different in 2.2 from 2.1.1 (2002-02-07) http://python.org/sf/514676
Sig11 in cPickle (stack overflow) (2002-07-01) http://python.org/sf/576084
Empty genindex.html pages (2002-07-26) http://python.org/sf/586926
defining away __attribute__ is not good (2002-09-08) http://python.org/sf/606493
Problems in IDLE Browsers & Viewers (2002-09-12) http://python.org/sf/608595
test_b1.py, disabling of list test (2002-09-13) http://python.org/sf/609041
cPickle.BadPickleGet is a string (2002-09-13) http://python.org/sf/609164

Closed Patches
--------------

Reimplementation of multifile.py (2001-04-04) http://python.org/sf/413766
MSVC Preprocessor (2001-07-15) http://python.org/sf/441528
Setup and distutils changes. (2001-08-21) http://python.org/sf/454041
Extension to Calltips / Show attributes (2002-03-03) http://python.org/sf/525109
PEP 4 update: deprecations (2002-03-18) http://python.org/sf/531491
Support PyChecker in IDLE (2002-04-03) http://python.org/sf/539043
Mac OS X keydefs (2002-09-07) http://python.org/sf/606132
configure on Irix (sockets, posix) (2002-09-13) http://python.org/sf/608999

From lac@strakt.com Mon Sep 23 11:01:20 2002 From: lac@strakt.com (Laura Creighton) Date: Mon, 23 Sep 2002 12:01:20 +0200 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Message from Guido van Rossum of "Fri, 20 Sep 2002 17:26:30 EDT."
<200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> > I'd like to release something called Python 2.2.2 in a few weeks (say, > around Oct 8; I like Tuesday release dates). > > PythonLabs has no time to do a thorough search for all backport/bugfix > candidates in the trunk; if you want to help, the best thing you can > do is take your favorite set of modules or core files and > systematically backport anything that's clearly a bugfix and backports > easily. Or you could simply make sure that your favorite bugfix is > backported. Ah, before you start systematically backporting, make sure you _announce_ that you are about to do this backporting. Otherwise you will find that your favourite modules are also somebody else's favourite modules and you have both wasted time and made the eventual merge job harder. This is particularly true of people who need to make an idiom change throughout the entire code base -- they will need to modify files which aren't of any particular interest to you. If somebody is in the middle of working on those files in particular when you come through with your idiom change it is very easy for them to overwrite your change, either because it happened in a different part of the file, one that does not have their attention, or because they did not notice that, in addition to the massive refactoring which they got right as part of a backport, they also have to change an idiom. This sort of thing drives release managers nuts. Note: 'I mentioned it on the wiki someplace' is not good enough. Busy people want to get their bugfixes out and in quickly, not participate in a community or create content.
Thus the loose structure of a wiki, which is its strength in community building and in building broad-based participation, becomes its downfall when you actually want to quickly know what you have to do, get in, and then get out. A page where one lists such things seems a fine compromise, as long as everybody is aware that some people won't update it, or update it properly, anyway. I've been reading the 2.2 cvs over the last week, trying to get my brain around which changes go with which bugs and which features. I have found some patches where what I think happened is that in addition to adding some feature, while somebody was there they decided to fix a little unrelated ugliness at the same time. Now the task is deciding if that ugliness is also a bug. > I know Laura Creighton volunteered to help on behalf of the PBF, but I > don't know how long she'll take, and she can surely use help. OTOH, > if nobody has time, I think it's fine to release what we have in CVS > on the 2.2 maintenance branch (the branch is named release22-maint). I certainly can use help. But given your current plan to release in a few weeks, I wonder if my task might be better changed from tracing all the changes from the last release to 2.2 maintenance, to starting from 2.2 maint and seeing if there is something we _don't_ want in, which needs removing. I still don't have a good enough perspective to judge this. > > Why release now? It's been a while! It'll be almost 6 months since > 2.2.1 was released. There have been a few important bugfixes (e.g. a > crash with ExtensionClasses on Solaris) that have bugged real-world > users. Ah, that doesn't exactly explain why now -- that is in a few weeks -- rather than in one month's time, or even now, that is this morning.
The only problem I see on my end is if I decide to proceed starting with 2.2.2 as the base release for Python-in-a-Tie and then swarms of people show up saying, well, actually I was half way through something when 2.2.2 came out, so for PIT you have to either remove stuff or add . Is now a quiet time, or do people expect a lot of that to occur? There is nothing like announcing an impending release to get a lot of code out from the woodwork -- so I guess we will find out. > > Why release what we've got? Frankly, I expect that nobody has the > time to backport everything that could reasonably be backported, so if > we wait for that to happen, we'll never have another release. What > we've got is definitely a lot better than 2.2.1. > > What about Python-in-a-tie? Maybe Laura can shed light on the PBF's > schedule for that; I expect it'll be much longer in the making than > the planned 2.2.2 release. It has to be, by definition. We need the Python for people to test their extension modules against before we can package up the extension modules. > > What about Python 2.3? Alpha by the end of 2002 is the best I can > promise. > > What can you do? Here's a brief treatise on backporting bugs that I > sent to Laura earlier: > > Basically, someone does the tedious part of triage, which means > going over *every* 2.3 checkin message (with quick access to the > corresponding diffs) and sorting them into: > > - already applied > > - trivial reject (e.g. new feature or fix for a bug introduced in > 2.3) > > - trivial accept (pure bugfix that applies cleanly to 2.2) > > - messy (e.g. unclear whether it's a bugfix or a feature even > after staring at the source, bugfixes that affect binary > compatibility, bugfixes that can only be applied with much code > wrangling due to other changes in the code at the same place, > etc.) > > Feel free to compile a list of "messy" ones and send it to > python-dev.
It doesn't have to be all at once -- for big messy > ones a separate python-dev discussion may be appropriate. > > I think it's best not use the SF trackers to suggest bugs to be > backported -- this would merely be confusing, and it's a pretty heavy > communication mechanism. If you want to help but don't have checkin > permission, find someone who does and work with them -- or we can give > you checkin permission (depending on your reputation). > > --Guido van Rossum (home page: http://www.python.org/~guido/) You also pointed me at Tools/scripts/logmerge.py, which I thought I would mention in case anybody reading here isn't familiar with it. It sorts the messages produced by cvs log by date and time, rather than by file. Really useful. Thank you. Laura Creighton From guido@python.org Mon Sep 23 13:07:08 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 08:07:08 -0400 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Your message of "Mon, 23 Sep 2002 12:01:20 +0200." <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> Message-ID: <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> [Laura] > Ah, before you start systematically backporting, make sure you > _announce_ that you are about to do this backporting. Otherwise you > will find that your favourite modules are also somebody else's > favourite modules and you have both wasted time and made the eventual > merge job harder. This is particularly true of people who need to > make an idiom change throughout the entire code base -- they will need > to modify files which aren't of any particular interest to you.
If > somebody is in the middle of working on those files in particular when > you come through with your idiom change it is very easy for them to > overwrite your change, either because it happened in a different part of > the file, one that does not have their attention, or because > they did not notice that, in addition to the massive refactoring > which they got right as part of a backport, they also have to change > an idiom. This sort of thing drives release managers nuts. Yes, but for 2.2.2 (and in general for maintenance branches) I would never try to do any global idiom changes. Those are only for the trunk (and even then I usually frown upon them -- in our experience here, these usually introduce a few bugs in rarely-used code that don't get found out until 6-12 months later). > Note: 'I mentioned it on the wiki someplace' is not good enough. Busy > people want to get their bugfixes out and in quickly, not participate > in a community or create content. Thus the loose structure of a wiki, > which is its strength in community building and in building > broad-based participation, becomes its downfall when you actually want > to quickly know what you have to do, get in, and then get out. A page > where one lists such things seems a fine compromise, as long as everybody > is aware that some people won't update it, or update it properly, anyway. OTOH I don't want this to generate tons of messages to python-dev with little more content than announcements that C is going to look at module Y. > I've been reading the 2.2 cvs over the last week, trying to get my > brain around which changes go with which bugs and which features. I > have found some patches where what I think happened is that in > addition to adding some feature, while somebody was there they > decided to fix a little unrelated ugliness at the same time. Now > the task is deciding if that ugliness is also a bug. And then you still need to decide whether you want that bug fixed in 2.2.
How thoroughly fixed does 2.2 need to be? > > I know Laura Creighton volunteered to help on behalf of the PBF, but I > > don't know how long she'll take, and she can surely use help. OTOH, > > if nobody has time, I think it's fine to release what we have in CVS > > on the 2.2 maintenance branch (the branch is named release22-maint). > > I certainly can use help. > > But given your current plan to release in a few weeks, I wonder if my > task might be better changed from tracing all the changes from the > last release to 2.2 maintenance, to starting from 2.2 maint and > seeing if there is something we _don't_ want in, which needs removing. > I still don't have a good enough perspective to judge this. I very much doubt that anything would have crept into 2.2 cvs that we don't want. Or are you talking from the PBF POV and could they be more conservative for Py-tie than we've been with 2.2.2? > > Why release now? It's been a while! It'll be almost 6 months since > > 2.2.1 was released. There have been a few important bugfixes (e.g. a > > crash with ExtensionClasses on Solaris) that have bugged real-world > > users. > > Ah, that doesn't exactly explain why now -- that is in a few weeks -- > rather than in one month's time, or even now, that is this morning. I meant "why stop procrastinating". :-) The ~two-week period was chosen to give people enough notice but not be so far in the future that procrastinators will say to themselves "I'll think about it closer to the release." Two weeks seems just about right based on my experience in this group. > The only problem I see on my end is if I decide to proceed starting > with 2.2.2 as the base release for Python-in-a-Tie and then swarms > of people show up saying, well, actually I was half way through something > when 2.2.2 came out, so for PIT you have to either remove > stuff or add . Is now a quiet time, or do people expect > a lot of that to occur?
There is nothing like announcing an impending > release to get a lot of code out from the woodwork -- so I guess we > will find out. Halfway through with what? I would expect that checkins would be complete sets. And if someone just *has* to check in stuff that requires some more work, they should let us know so we can hold up the release for them or make it a priority to fix it (by backing out or finishing those changes). Given the nature of most changes that need to be backported this is unlikely -- almost all of them are small fixes to one file. > > Why release what we've got? Frankly, I expect that nobody has the > > time to backport everything that could reasonably be backported, so if > > we wait for that to happen, we'll never have another release. What > > we've got is definitely a lot better than 2.2.1. > > > > What about Python-in-a-tie? Maybe Laura can shed light on the PBF's > > schedule for that; I expect it'll be much longer in the making than > > the planned 2.2.2 release. > > It has to be, by definition. We need the Python for people to test > their extension modules against before we can package up the > extension modules. Can you tell us more here about the Py-tie plans? I know nothing about it except that it'll be based on Python 2.2; I think it would be helpful for the developer community to know what the long-term Py-tie plans are. > > What about Python 2.3? Alpha by the end of 2002 is the best I can > > promise. > > > > What can you do? Here's a brief treatise on backporting bugs that I > > sent to Laura earlier: > > > > Basically, someone does the tedious part of triage, which means > > going over *every* 2.3 checkin message (with quick access to the > > corresponding diffs) and sorting them into: > > > > - already applied > > > > - trivial reject (e.g. new feature or fix for a bug introduced in > > 2.3) > > > > - trivial accept (pure bugfix that applies cleanly to 2.2) > > > > - messy (e.g. 
unclear whether it's a bugfix or a feature even > > after staring at the source, bugfixes that affect binary > > compatibility, bugfixes that can only be applied with much code > > wrangling due to other changes in the code at the same place, > > etc.) > > > > Feel free to compile a list of "messy" ones and send it to > > python-dev. It doesn't have to be all at once -- for big messy > > ones a separate python-dev discussion may be appropriate. > > > > I think it's best not use the SF trackers to suggest bugs to be > > backported -- this would merely be confusing, and it's a pretty heavy > > communication mechanism. If you want to help but don't have checkin > > permission, find someone who does and work with them -- or we can give > > you checkin permission (depending on your reputation). > > You also pointed me at Tools/scripts/logmerge.py , which I thought > I would mention in case anybody reading here isn't familiar with it. > It sorts the messages produced by cvs log by date and time, rather > than by file. really useful. Thank you. You're welcome. (At times I've wanted an addition to logmerge that would restrict it to a certain branch; but I've not wanted it enough to implement it. I think you'd have to mine the "tags" output from cvs log for each file to know the branch point and then act accordingly for the revisions of that file.) --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Mon Sep 23 14:00:19 2002 From: mwh@python.net (Michael Hudson) Date: 23 Sep 2002 14:00:19 +0100 Subject: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Guido van Rossum's message of "Fri, 20 Sep 2002 17:26:30 -0400" References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2m1y7kyi70.fsf@starship.python.net> Guido van Rossum writes: > I'd like to release something called Python 2.2.2 in a few weeks (say, > around Oct 8; I like Tuesday release dates). Cool. 
I have a mailbox containing about 50 checkins that I thought deserved backporting at some point. I'll try to grind through that by the end of the week (i.e. by 28/9). There's no way I'm doing the "pore over logs" duty this time. Cheers, M. -- [1] If you're lost in the woods, just bury some fibre in the ground carrying data. Fairly soon a JCB will be along to cut it for you - follow the JCB back to civilsation/hitch a lift. -- Simon Burr, cam.misc From David Abrahams" <2m1y7kyi70.fsf@starship.python.net> Message-ID: <0a0201c26300$924fd350$6701a8c0@boostconsulting.com> > Guido van Rossum writes: > > > I'd like to release something called Python 2.2.2 in a few weeks (say, > > around Oct 8; I like Tuesday release dates). I've been planning to release Boost.Python v2 around the same time. Is there any chance we can coordinate this so that we Boost.Python people can test against all of the backported changes before either of these products "goes final"? ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From guido@python.org Mon Sep 23 14:19:29 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 09:19:29 -0400 Subject: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Your message of "Mon, 23 Sep 2002 14:00:19 BST." <2m1y7kyi70.fsf@starship.python.net> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <2m1y7kyi70.fsf@starship.python.net> Message-ID: <200209231319.g8NDJTs06610@pcp02138704pcs.reston01.va.comcast.net> > I have a mailbox containing about 50 checkins that I thought deserved > backporting at some point. I'll try to grind through that by the end > of the week (i.e. by 28/9). Great! If you run out of time, you can mail me that mailbox. > There's no way I'm doing the "pore over logs" duty this time. And nobody expects you to. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Sep 23 14:22:14 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 09:22:14 -0400 Subject: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Your message of "Mon, 23 Sep 2002 08:54:05 EDT." <0a0201c26300$924fd350$6701a8c0@boostconsulting.com> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <2m1y7kyi70.fsf@starship.python.net> <0a0201c26300$924fd350$6701a8c0@boostconsulting.com> Message-ID: <200209231322.g8NDMGX06641@pcp02138704pcs.reston01.va.comcast.net> > > > I'd like to release something called Python 2.2.2 in a few weeks (say, > > > around Oct 8; I like Tuesday release dates). > > I've been planning to release Boost.Python v2 around the same > time. Is there any chance we can coordinate this so that we > Boost.Python people can test against all of the backported changes > before either of these products "goes final"? If you check out the release22-maint branch of Python from CVS and subscribe to the python-checkins list (http://mail.python.org/mailman/listinfo/python-checkins) you should be able to track the work leading up to 2.2.2 pretty closely. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Mon Sep 23 14:50:49 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 23 Sep 2002 09:50:49 -0400 (EDT) Subject: [Python-Dev] -zcombreloc Message-ID: <200209231350.g8NDonx28099@europa.research.att.com> I now believe, after discussion with the binutils developers, that the problem is that -zcombreloc just plain doesn't work in any release of binutils. One implication of this fact is that turning on -znocombreloc when building python is insufficient to make it work if you built gcc with -zcombreloc, because that build will write dynamic libraries that the Sun loader cannot handle no matter what. 
So I think you are right -- the correct fix is to warn people that they should not use binutils 2.13 on Solaris, period. I believe that they will fix the problem in 2.13.1 and will let you know.

Meanwhile, might I suggest including the following test somewhere in the build procedure? If it fails, I believe it will not be possible to build Python successfully, so one might as well find out about it early....

If you run this and the output includes "core dumped", it failed :-)

-------------------------cut here----------------------
#! /bin/sh

mkdir /tmp/t.$$ || exit 3
cd /tmp/t.$$ || exit 3

cat >main.c <<'EOF'
#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *handle, *sym;
    char *error;

    puts("calling dlopen");
    handle = dlopen("./dyn.so", RTLD_NOW);
    if (!handle) {
        printf("%s\n", dlerror());
        return 1;
    }

    puts("calling dlsym");
    sym = dlsym(handle, "sym");
    if ((error = dlerror()) != 0) {
        printf("%s\n", error);
        return 1;
    }

    puts("calling sym");
    ((void (*)(void))sym)();
    puts("done");
    return 0;
}
EOF

cat >dyn.c <<'EOF'
#include <stdio.h>

void sym(void)
{
    puts("in sym");
}
EOF

[ -n "$SHFLAGS" ] || SHFLAGS="-fPIC -shared"
[ -n "$CC" ] || CC=gcc

set -x
$CC $CFLAGS $SHFLAGS dyn.c -o dyn.so
$CC $CFLAGS main.c -o main -ldl
./main || exit $?
cd /tmp
rm -rf t.$$

From guido@python.org Mon Sep 23 15:03:42 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 23 Sep 2002 10:03:42 -0400
Subject: [Python-Dev] -zcombreloc
In-Reply-To: Your message of "Mon, 23 Sep 2002 09:50:49 EDT." <200209231350.g8NDonx28099@europa.research.att.com>
References: <200209231350.g8NDonx28099@europa.research.att.com>
Message-ID: <200209231403.g8NE3hh06996@pcp02138704pcs.reston01.va.comcast.net>

> Meanwhile, might I suggest including the following test somewhere
> in the build procedure? If it fails, I believe it will not be
> possible to build Python successfully, so one might as well find
> out about it early....
>
> If you run this and the output includes "core dumped", it failed :-)

I hope someone does this.
In the mean time, I've added a warning about binutils 2.13 to the README file (and also to the 2.2.2 branch README file). --Guido van Rossum (home page: http://www.python.org/~guido/) From lac@strakt.com Mon Sep 23 15:42:34 2002 From: lac@strakt.com (Laura Creighton) Date: Mon, 23 Sep 2002 16:42:34 +0200 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Message from Guido van Rossum of "Mon, 23 Sep 2002 08:07:08 EDT." <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> > > > > But given your current plan to release in a few weeks, I wonder if my > > task might be better changed from tracing all the changes from the > > last release to 2.2 maintenance, to starting from 2.2 maint and > > seeing if there is something we _don't_ want in, which needs removing. > > I still don't have a good enough perspective to judge this. > > I very much doubt that anything would have crept into 2.2 cvs that we > don't want. Or are you talking from the PBF POV and could they be > more conservative for Py-tie than we've been with 2.2.2? I sure don't think so. But it looks now to me as if the new plan of attack for making a PyTie release is to start with 2.2.2 and then see what absolutely needs to be added to that, as opposed to my old approach, which was to trace from 2.2.1 until 2.3 labelling things that _shouldn't_ be included. Does this make sense, or was my old plan of attack better? > Can you tell us more here about the Py-tie plans? I know nothing > about it except that it'll be based on Python 2.2; I think it would be > helpful for the developer community to know what the long-term Py-tie > plans are. 
If there is consensus that making Py-Tie out of 2.2.2 plus a list of things that have to be added/fixed, then the thing to do is to start the Snake Farm testing 2.2maints. Then we need to get a list of what software that isn't part of the standard library should be included in PyTie. Then we have to test that against the PyTie candidate. We're working on a way to add that to the snakefarm builds now. The long term plans are to fix serious bugs in the release if they should be discovered, not only in Python but in any third party modules. Also we are working on how to license the whole thing, given that every extra bit has its own particular license. Laura From guido@python.org Mon Sep 23 16:07:35 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 11:07:35 -0400 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Your message of "Mon, 23 Sep 2002 16:42:34 +0200." <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> Message-ID: <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net> > > > But given your current plan to release in a few weeks, I wonder if my > > > task might be better changed from tracing all the changes from the > > > last release to 2.2 maintenance, to starting from 2.2 maint and > > > seeing if there is something we _don't_ want in, which needs removing. > > > I still don't have a good enough perspective to judge this. > > > > I very much doubt that anything would have crept into 2.2 cvs that we > > don't want. Or are you talking from the PBF POV and could they be > > more conservative for Py-tie than we've been with 2.2.2? > > I sure don't think so. 
But it looks now to me as if the new plan of > attack for making a PyTie release is to start with 2.2.2 and then see > what absolutely needs to be added to that, as opposed to my old approach, > which was to trace from 2.2.1 until 2.3 labelling things that _shouldn't_ > be included. Does this make sense, or was my old plan of attack better? I would never have suggested labeling things that *shouldn't* be included; it's better to label things that *should* be included. Whether your criteria for inclusion is "absolutely must have" or "would be nice" depends on how much time you have and what the PBF's real goal is. I would suggest that if your primary goal is stability, being conservative is probably right; everything that's not very clearly a pure bugfix should be frowned upon. > > Can you tell us more here about the Py-tie plans? I know nothing > > about it except that it'll be based on Python 2.2; I think it would be > > helpful for the developer community to know what the long-term Py-tie > > plans are. > > If there is consensus that making Py-Tie out of 2.2.2 plus a list > of things that have to be added/fixed, then the thing to do is to > start the Snake Farm testing 2.2maints. Then we need to get a list > of what software that isn't part of the standard library should be > included in PyTie. Then we have to test that against the PyTie candidate. > We're working on a way to add that to the snakefarm builds now. The > long term plans are to fix serious bugs in the release if they should be > discovered, not only in Python but in any third party modules. Also > we are working on how to license the whole thing, given that every > extra bit has its own particular license. Thanks. In addition, I was hoping to hear about your timeline (when do you expect to release PyTie?) and a hint on the 3rd party packages you're thinking of adding. Also a list of target platforms for which PyTie must absolutely work. (Note e.g. 
that we just discovered a problem with Solaris and the latest version of binutils (2.13), which seems to be used by the latest GCC version (3.2 IIRC) but is also separately downloadable. The bug is in binutils 2.13. Is this *combination* (Solaris + binutils 2.13) a target platform? If so, you might want to use a different approach than we plan to do for Python 2.3 and 2.2.2 (which is merely to bail out if a certain test dumps core during configuration).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From ark@research.att.com Mon Sep 23 16:30:42 2002
From: ark@research.att.com (Andrew Koenig)
Date: Mon, 23 Sep 2002 11:30:42 -0400 (EDT)
Subject: [Python-Dev] binutils/solaris -- one more thing
Message-ID: <200209231530.g8NFUgu28520@europa.research.att.com>

Assuming that the binutils developers do conclude that the -zcombreloc
problem is to be fixed, not worked around (as I think they will),
there is still one more binutils-related build problem that I
encountered with Solaris: At binutils 2.12, the output from "ld -V"
changed in a way that invalidated the previous way of testing for
the presence of dynamic linking.

Someone--I forget who--gave me a patch that solved the problem;
I believe that this patch is necessary to build Python on Solaris
with binutils 2.12 or later. Can I ask someone to check whether
it is already part of 2.2.2?

--------
*** configure.in 2002-09-23 10:07:42.559545843 -0400
--- configure.in.new 2002-09-23 10:08:32.944415830 -0400
***************
*** 889,895 ****
      fi;;
  SunOS/5*) case $CC in
      *gcc*)
!         if $CC -Xlinker -V 2>&1 | grep BFD >/dev/null
          then
              LINKFORSHARED="-Xlinker --export-dynamic"
          fi;;
--- 889,895 ----
      fi;;
  SunOS/5*) case $CC in
      *gcc*)
!         if $CC -Xlinker --help 2>&1 | grep export-dynamic >/dev/null
          then
              LINKFORSHARED="-Xlinker --export-dynamic"
          fi;;

From guido@python.org Mon Sep 23 16:33:04 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 23 Sep 2002 11:33:04 -0400
Subject: [Python-Dev] binutils/solaris -- one more thing
In-Reply-To: Your message of "Mon, 23 Sep 2002 11:30:42 EDT." <200209231530.g8NFUgu28520@europa.research.att.com>
References: <200209231530.g8NFUgu28520@europa.research.att.com>
Message-ID: <200209231533.g8NFX4I10033@pcp02138704pcs.reston01.va.comcast.net>

> Assuming that the binutils developers do conclude that the -zcombreloc
> problem is to be fixed, not worked around (as I think they will),
> there is still one more binutils-related build problem that I
> encountered with Solaris: At binutils 2.12, the output from "ld -V"
> changed in a way that invalidated the previous way of testing for
> the presence of dynamic linking.
>
> Someone--I forget who--gave me a patch that solved the problem;
> I believe that this patch is necessary to build Python on Solaris
> with binutils 2.12 or later. Can I ask someone to check whether
> it is already part of 2.2.2?
>
>
> --------
> *** configure.in 2002-09-23 10:07:42.559545843 -0400
> --- configure.in.new 2002-09-23 10:08:32.944415830 -0400
> ***************
> *** 889,895 ****
>       fi;;
>   SunOS/5*) case $CC in
>       *gcc*)
> !         if $CC -Xlinker -V 2>&1 | grep BFD >/dev/null
>           then
>               LINKFORSHARED="-Xlinker --export-dynamic"
>           fi;;
> --- 889,895 ----
>       fi;;
>   SunOS/5*) case $CC in
>       *gcc*)
> !         if $CC -Xlinker --help 2>&1 | grep export-dynamic >/dev/null
>           then
>               LINKFORSHARED="-Xlinker --export-dynamic"
>           fi;;

But what if this code is used with a version of binutils prior to 2.12?
--Guido van Rossum (home page: http://www.python.org/~guido/) From hu.peress@mail.mcgill.ca Mon Sep 23 16:38:13 2002 From: hu.peress@mail.mcgill.ca (Hunter Peress) Date: 23 Sep 2002 10:38:13 -0500 Subject: [Python-Dev] os.wait unweirding In-Reply-To: <004701c25528$7c8b4530$ced241d5@hagrid> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <004701c25528$7c8b4530$ced241d5@hagrid> Message-ID: <1032795494.16226.478.camel@HillCountryPeress> for i in a: print os.spawnv(os.P_NOWAIT,scr,["",str(i)]) for i in a: os.wait() you have to do the second loop in order to wait for all children that u spawned off. I think that os.wait() without any arguments should wait for all chilren, not wait for the earliest executed child. On Thu, 2002-09-05 at 17:06, Fredrik Lundh wrote: > hunter wrote: > > > I need not search far. > > example 1) pydoc os.fork > > Python Library Documentation: built-in function fork in os > > fork(...) > > fork() -> pid > > Fork a child process. > > > > Return 0 to child process and PID of child to parent process. > > why do you care about the type of a PID object? in most > cases, all you need to know is that a PID isn't 0, which is > exactly what the documentation says. > > and if you know what a PID is, you already know what type > it is... > > > example2) pydoc string.index > > Python Library Documentation: function index in string > > index(s, *args) > > index(s, sub [,start [,end]]) -> int > > > > Like find but raises ValueError when the substring is not found. > > > > From these two, I have no idea what BOTH the input and return > > types are. 
> > the index documentation refers to the documentation > for "find", which tells you that: > > >>> help(string.find) > Help on function find in module string: > > find(s, *args) > find(s, sub [,start [,end]]) -> in > > Return the lowest index in s where substring sub is found, > such that sub is contained within s[start,end]. Optional > arguments start and end are interpreted as in slice notation. > > Return -1 on failure. > > which, given that you know how indexes and slices work in > python, is all you need to know. > > > I found those examples in 10 seconds (literally). The state of the > > python documentation is caca. > > how long have you been using Python? > > > > From ark@research.att.com Mon Sep 23 16:40:11 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 23 Sep 2002 11:40:11 -0400 (EDT) Subject: [Python-Dev] binutils/solaris -- one more thing In-Reply-To: <200209231533.g8NFX4I10033@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Mon, 23 Sep 2002 11:33:04 -0400) References: <200209231530.g8NFUgu28520@europa.research.att.com> <200209231533.g8NFX4I10033@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209231540.g8NFeBU28773@europa.research.att.com> Guido> But what if this code is used with a version of binutils prior to Guido> 2.12? It should still work -- it asks ld for a list of options that it supports and looks for "export-dynamic" in the list. If the list of supported options doesn't contain "export-dynamic", then the build procedure had better not supply "export-dynamic" as an option, had it? :-) In other words, I believe that the patch replaces a test that works only for 2.11 and earlier with a slightly more elaborate test that works for all versions. 
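
As a rough illustration of the point above (probing for the feature itself is sturdier than matching a version banner), here is a minimal Python sketch. It is not the configure test; the function names are hypothetical, and the sample strings are modeled on "ld -V" and "--help" output quoted in this thread, with whitespace and line breaks approximated.

```python
def old_bfd_check(version_text):
    """Pre-patch style test: look for the string 'BFD' in 'ld -V' output."""
    return "BFD" in version_text


def linker_supports(help_text, option="export-dynamic"):
    """Patched style test: look for the option name in the linker's --help text."""
    return option in help_text


# Sample outputs modeled on those quoted in this thread (layout approximated).
LD_V_2_11 = "GNU ld version 2.11.90.0.8 (with BFD 2.11.90.0.8)"
LD_V_2_12 = "GNU ld version 2.12.1\n  Supported emulations:\n   elf32_sparc\n   elf64_sparc"
LD_HELP = "  -E, --export-dynamic        Export all dynamic symbols"

if __name__ == "__main__":
    for label, version_text in (("2.11", LD_V_2_11), ("2.12", LD_V_2_12)):
        print(label,
              "banner test:", old_bfd_check(version_text),
              "feature test:", linker_supports(LD_HELP))
```

The banner test depends on wording that the linker is free to change between releases, while the feature test only passes the option when the linker itself lists it.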
From neal@metaslash.com Mon Sep 23 16:41:13 2002 From: neal@metaslash.com (Neal Norwitz) Date: Mon, 23 Sep 2002 11:41:13 -0400 Subject: [Python-Dev] binutils/solaris -- one more thing References: <200209231530.g8NFUgu28520@europa.research.att.com> <200209231533.g8NFX4I10033@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D8F3619.19FB7B80@metaslash.com> Guido van Rossum wrote: > > > Assuming that the binutils developers do conclude that the -zcombreloc > > problem is to be fixed, not worked around (as I think they will), > > there is still one more binutils-related build problem that I > > encountered with Solaris: At binutils 2.12, the output from "ld -V" > > changed in a way that invalidated the previous way of testing for > > the presence of dynamic linking. > > > > Someone--I forget who--gave me a patch that solved the problem; > > I believe that this patch is necessary to build Python on Solaris > > with binutils 2.12 or later. Can I ask someone to check whether > > it is already part of 2.2.2? I believe it was Martin that provided the patch. And this patch is in 2.2.2. [Guido] > But what if this code is used with a version of binutils prior to > 2.12? 
On Linux (but I think it's the same on Solaris): [neal@epoch src]$ ld -V GNU ld version 2.11.90.0.8 (with BFD 2.11.90.0.8) [neal@epoch src]$ gcc -Xlinker --help 2>&1 | grep export-dynamic -E, --export-dynamic Export all dynamic symbols Neal From ark@research.att.com Mon Sep 23 16:44:21 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 23 Sep 2002 11:44:21 -0400 (EDT) Subject: [Python-Dev] binutils/solaris -- one more thing In-Reply-To: <3D8F3619.19FB7B80@metaslash.com> (message from Neal Norwitz on Mon, 23 Sep 2002 11:41:13 -0400) References: <200209231530.g8NFUgu28520@europa.research.att.com> <200209231533.g8NFX4I10033@pcp02138704pcs.reston01.va.comcast.net> <3D8F3619.19FB7B80@metaslash.com> Message-ID: <200209231544.g8NFiLI28812@europa.research.att.com> Neal> I believe it was Martin that provided the patch. Neal> And this patch is in 2.2.2. Thank you! Neal> On Linux (but I think it's the same on Solaris): Neal> [neal@epoch src]$ ld -V Neal> GNU ld version 2.11.90.0.8 (with BFD 2.11.90.0.8) Neal> [neal@epoch src]$ gcc -Xlinker --help 2>&1 | grep export-dynamic Neal> -E, --export-dynamic Export all dynamic symbols And that's the reason for the patch: [europa] ld -V GNU ld version 2.12.1 Supported emulations: elf32_sparc elf64_sparc [europa] gcc -Xlinker --help 2>&1 | grep export-dynamic -E, --export-dynamic Export all dynamic symbols From guido@python.org Mon Sep 23 16:47:13 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 11:47:13 -0400 Subject: [Python-Dev] binutils/solaris -- one more thing In-Reply-To: Your message of "Mon, 23 Sep 2002 11:30:42 EDT." <200209231530.g8NFUgu28520@europa.research.att.com> References: <200209231530.g8NFUgu28520@europa.research.att.com> Message-ID: <200209231547.g8NFlDv10207@pcp02138704pcs.reston01.va.comcast.net> > Someone--I forget who--gave me a patch that solved the problem; > I believe that this patch is necessary to build Python on Solaris > with binutils 2.12 or later. 
Can I ask someone to check whether > it is already part of 2.2.2? Duh. It's already in CVS for 2.2.2 and 2.3. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Mon Sep 23 16:51:57 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 23 Sep 2002 11:51:57 -0400 (EDT) Subject: [Python-Dev] binutils/solaris -- one more thing In-Reply-To: <200209231547.g8NFlDv10207@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Mon, 23 Sep 2002 11:47:13 -0400) References: <200209231530.g8NFUgu28520@europa.research.att.com> <200209231547.g8NFlDv10207@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209231551.g8NFpvm28837@europa.research.att.com> Guido> Duh. It's already in CVS for 2.2.2 and 2.3. Thanks! From guido@python.org Mon Sep 23 16:42:09 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 11:42:09 -0400 Subject: [Python-Dev] os.wait unweirding In-Reply-To: Your message of "Mon, 23 Sep 2002 10:38:13 CDT." <1032795494.16226.478.camel@HillCountryPeress> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <004701c25528$7c8b4530$ced241d5@hagrid> <1032795494.16226.478.camel@HillCountryPeress> Message-ID: <200209231542.g8NFg9710148@pcp02138704pcs.reston01.va.comcast.net> > for i in a: > print os.spawnv(os.P_NOWAIT,scr,["",str(i)]) > > for i in a: > os.wait() > > you have to do the second loop in order to wait for all children that u > spawned off. I think that os.wait() without any arguments should wait > for all chilren, not wait for the earliest executed child. Go talk to the designers of Unix and the POSIX standard committee. --Guido van Rossum (home page: http://www.python.org/~guido/) From hu.peress@mail.mcgill.ca Mon Sep 23 17:00:12 2002 From: hu.peress@mail.mcgill.ca (Hunter Peress) Date: 23 Sep 2002 11:00:12 -0500 Subject: [Python-Dev] os.wait unweirding. 
impetus In-Reply-To: <200209231542.g8NFg9710148@pcp02138704pcs.reston01.va.comcast.net> References: <1031437860.636.29.camel@HillCountryPeress> <1031442464.644.68.camel@HillCountryPeress> <003d01c25471$d83fe960$2fd8accf@othello> <1031451760.644.97.camel@HillCountryPeress> <004701c25528$7c8b4530$ced241d5@hagrid> <1032795494.16226.478.camel@HillCountryPeress> <200209231542.g8NFg9710148@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1032796812.16226.506.camel@HillCountryPeress> The idea came from bash. where the wait command will wait for all child processes. Its clearly doable. On Mon, 2002-09-23 at 10:42, Guido van Rossum wrote: > > for i in a: > > print os.spawnv(os.P_NOWAIT,scr,["",str(i)]) > > > > for i in a: > > os.wait() > > > > you have to do the second loop in order to wait for all children that u > > spawned off. I think that os.wait() without any arguments should wait > > for all chilren, not wait for the earliest executed child. > > Go talk to the designers of Unix and the POSIX standard committee. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > From lac@strakt.com Mon Sep 23 17:05:13 2002 From: lac@strakt.com (Laura Creighton) Date: Mon, 23 Sep 2002 18:05:13 +0200 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Message from Guido van Rossum of "Mon, 23 Sep 2002 11:07:35 EDT." <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com> > > Thanks. In addition, I was hoping to hear about your timeline (when > do you expect to release PyTie?) 
and a hint on the 3rd party packages > you're thinking of adding. Also a list of target platforms for which > PyTie must absolutely work. (Note e.g. that we just discovered a > problem with Solaris and the latest version of binutils (2.13), which > seems to be used by the latest GCC version (3.2 IIRC) but is also > separately downloadable. The bug is in binutils 2.13. Is this > *combination* (Solaris + binutils 2.13) a target platform? If so, you > might want to use a different approach than we plan to do for Python > 2.3 and 2.2.2 (which is merely to bail out if a certain test dumps > core during configuration). > > --Guido van Rossum (home page: http://www.python.org/~guido/) Do you know when a fixed binutils is due? This may explain one bug report I have right here at Strakt. Solaris is the preferred platform of our beta-testing customer, Chalmers, so I would like PyTie to run on as many Solaris-including hardware and software platforms as possible. I'll bring this up in a meeting. Right now are you advising people not to use GCC 3.2 or binutils 2.13? or do you have other advice for them which you can steer me towards? Laura From ark@research.att.com Mon Sep 23 17:24:45 2002 From: ark@research.att.com (Andrew Koenig) Date: 23 Sep 2002 12:24:45 -0400 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net> <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com> Message-ID: Laura> Do you know when a fixed binutils is due? This may explain one Laura> bug report I have right here at Strakt. 
Solaris is the Laura> preferred platform of our beta-testing customer, Chalmers, so I Laura> would like PyTie to run on as many Solaris-including hardware Laura> and software platforms as possible. I'll bring this up in a Laura> meeting. Right now are you advising people not to use GCC 3.2 Laura> or binutils 2.13? or do you have other advice for them which Laura> you can steer me towards? On my machine, gcc 3.2 works just fine -- it's binutils 2.13 that is the culprit. Use 2.12.1 instead (but be sure to install the configure.in patch I posted earlier today). I hope to get a patch from the binutils developers today for testing; if it works, I expect that the patch will be in binutils 2.13.1, which I understand is to be released shortly. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From martin@v.loewis.de Mon Sep 23 17:43:08 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 23 Sep 2002 18:43:08 +0200 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net> <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com> Message-ID: Laura Creighton writes: > Do you know when a fixed binutils is due? The bug hasn't been acknowledged by binutils maintainers yet; gcc maintainers report many problems with binutils, but have not identified any specific problem. So, unless somebody looks down into the details and studies the resulting binaries, it may be a matter of months for a fix to appear. Until then, binutils 2.13 should be avoided on Solaris. > Right now are you advising people not to use GCC 3.2 or binutils > 2.13? 
or do you have other advice for them which you can steer me > towards? gcc 3.2 is fine, binutils 2.13 is not - use 2.12 or the system tools instead. Regards, Martin From martin@v.loewis.de Mon Sep 23 17:46:39 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 23 Sep 2002 18:46:39 +0200 Subject: [Python-Dev] binutils/solaris -- one more thing In-Reply-To: <3D8F3619.19FB7B80@metaslash.com> References: <200209231530.g8NFUgu28520@europa.research.att.com> <200209231533.g8NFX4I10033@pcp02138704pcs.reston01.va.comcast.net> <3D8F3619.19FB7B80@metaslash.com> Message-ID: Neal Norwitz writes: > I believe it was Martin that provided the patch. > And this patch is in 2.2.2. All correct. Regards, Martin From mats@laplaza.org Mon Sep 23 17:43:41 2002 From: mats@laplaza.org (Mats Wichmann) Date: Mon, 23 Sep 2002 10:43:41 -0600 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: <20020923151002.31927.90702.Mailman@mail.python.org> Message-ID: <5.1.0.14.1.20020923103904.027d4298@204.151.72.2> >Thanks. In addition, I was hoping to hear about your timeline (when >do you expect to release PyTie?) and a hint on the 3rd party packages >you're thinking of adding. Also a list of target platforms for which >PyTie must absolutely work. (Note e.g. that we just discovered a >problem with Solaris and the latest version of binutils (2.13), which >seems to be used by the latest GCC version (3.2 IIRC) but is also >separately downloadable. The bug is in binutils 2.13. Is this >*combination* (Solaris + binutils 2.13) a target platform? If so, you >might want to use a different approach than we plan to do for Python >2.3 and 2.2.2 (which is merely to bail out if a certain test dumps >core during configuration). gcc 3.2 requires binutils 2.12 or better; it doesn't have a specific requirement on 2.13. However the sunfreeware bundle may bump bintuils to 2.13 (I haven't checked what's going on there for a long time). 
Sadly, the 2.13 release message indicates the purpose is (only) to support a new platform and says nothing about clever little modifications behind the scenes, like changing default linker options so that relocation tables are built differently (sigh). From ark@research.att.com Mon Sep 23 18:01:12 2002 From: ark@research.att.com (Andrew Koenig) Date: 23 Sep 2002 13:01:12 -0400 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net> <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com> Message-ID: Martin> So, unless somebody looks down into the details and studies Martin> the resulting binaries, it may be a matter of months for a fix Martin> to appear. Until then, binutils 2.13 should be avoided on Martin> Solaris. Also, note that if you already have binutils 2.13, it is not enough just to reinstall 2.12; you have to rebuild gcc also. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From bkc@murkworks.com Mon Sep 23 18:29:20 2002 From: bkc@murkworks.com (Brad Clements) Date: Mon, 23 Sep 2002 13:29:20 -0400 Subject: [Python-Dev] Need advice: cloning python cvs for CE project Message-ID: <3D8F173B.11372.4250401E@localhost> There are now 4 different people working on the "python ce" project. We've been passing around build tree .zips and it's getting out of hand. I think we should setup our own CVS somewhere so we can work out all the CE kinks before submitting patches to the core Python CVS. Is it possible to maintain a single working directory that can be checked into two different CVS systems? I really have no idea what the proper way is to do this.. 
Looking for recommendations, including "don't do that, do this instead". Thanks Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From guido@python.org Mon Sep 23 18:41:14 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 13:41:14 -0400 Subject: [Python-Dev] PEP 282 Implementation In-Reply-To: Your message of "Wed, 28 Aug 2002 11:45:11 BST." <006601c24e7f$fcff5440$652b6992@alpha> References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> Message-ID: <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> I'm sorry that this seems to be a thread with one message per month! I'll try to be more responsive from now on, the big Zope projects that were keeping me busy have given me some slack time. > > In general the code looks good. Only one style nits: I prefer > > docstrings that have a one-line summary, then a blank line, and then a > > longer description. > > I will update the docstrings as per your feedback. Great! (When can we see a new release on http://www.red-dove.com/python_logging.html ?) > > There's a lot of code there! Should it perhaps be broken up into > > different modules? Perhaps it should become a logging *package* with > > submodules that define the various filters and handlers. > > How strongly do you feel about this? I did think about doing this > and in fact the first implementation of the module was as a > package. I found this a little more cumbersome than the single-file > solution, and reimplemented as logging.py. The module is a little on > the large side but the single-file organization makes it a little > easier to use. I would feel much less strongly about this if several of the additional things could be moved to separate files without making it a package. > > - Why does the FileHandler open the file with mode "a+" (and later > > with "w+")? 
The "+" makes the file readable, but I see no > > reason to read it. Am I missing? > > No, you're right - using "a" and "w" should work. I'll change the > code to lose the "+". OK. > > - setRollover(): the explanation isn't 100% clear. I *think* that > > you always write to "app.log", and when that's full, you rename > > it to app.log.1, and app.log.1 gets renamed to app.log.2, and so > > on, and then you start writing to a new app.log, right? > > Yes. The original implementation was different - it just closed the > current file and opened a new file app.log.n. The current > implementation is slightly slower due to the need to rename several > files, but the user can tell more easily which the latest log file > is. I will update the setRollover() docstring to indicate more > clearly how it works; I'm assuming that the current algorithm is > deemed good enough. Yes, this seems how log rotation is generally done. (Please remove the commented-out old code.) > > - class SocketHandler: why set yourself up for buffer overflow by > > using only 2 bytes for the packet size? You can use the struct > > module to encode/decode this, BTW. I also wonder what the > > application for this is, BTW. > > I agree about the 2-byte limit. I can change it to use struct and an > integer length. The application for encoding the length is simply to > allow a socket-based server to handle multiple events sent by > SocketHandler, in the event that the connection is kept open as long > as possible and not shut down after every event. OK, please change it to a 4-byte length header. I understand why you need the length header; I'm just curious about the need for a socket server. > > - method send(): in Python 2.2 and later, you can use the > > sendall() socket method which takes care of this loop for you. > > OK. I can update the code to use this in the case of 2.2 and later. Especially since this is slated to go into 2.3 only. 
:-) > > - class DatagramHandler, method send(): I don't think UDP handles > > fragmented packets very well -- if you have to break the packet up, > > there's no guarantee that the receiver will see the parts in order > > (or even all of them). > > You're absolutely right - I wasn't thinking clearly enough about how > UDP actually works. I will replace the loop with a single sendto() > call. The length header might still be useful just to be format-compatible with the TCP variant though. > > - fileConfig(): Is there documentation for the configuration file? > > There is some documentation in the python_logging.html file which is > part of the distribution and also on the Web at > http://www.red-dove.com/python_logging.html - it's in the form of > comments in an annotated logconf.ini. I have not polished the > documentation in this area as I'm not sure how much of the > configuration stuff should be in the logging module itself. Feedback > I've had indicates that at least some people object moderately > strongly to having a particular configuration design forced on > them. I'd appreciate views on this. This is an example of something that I'd like to see relegated to a separate file. It really looks like fileConfig(), listen() and stopListening() are a separate feature bundle that looks like it is a specific example application rather than a core feature of the logging module. It certainly doesn't appear in PEP 282. Maybe the socket handler classes belong in the same category. Of course, the same can be said about all Handler subclasses except StreamHandler. Only StreamHandler is referenced by basicConfig(). Perhaps these should all (except StreamHandler) be moved to separate files? This sounds like a reason to make it a package. The main logging code could be in the __init__.py file -- there's no rule that says __init__.py should be empty or short! PS. In your comments you seem fond of the word "needful". 
I've rarely heard that word -- perhaps it is archaic or common only in India? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Sep 23 18:44:20 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 13:44:20 -0400 Subject: [Python-Dev] Need advice: cloning python cvs for CE project In-Reply-To: Your message of "Mon, 23 Sep 2002 13:29:20 EDT." <3D8F173B.11372.4250401E@localhost> References: <3D8F173B.11372.4250401E@localhost> Message-ID: <200209231744.g8NHiKB11123@pcp02138704pcs.reston01.va.comcast.net> > There are now 4 different people working on the "python ce" > project. We've been passing around build tree .zips and it's getting > out of hand. > > I think we should setup our own CVS somewhere so we can work out all > the CE kinks before submitting patches to the core Python CVS. > > Is it possible to maintain a single working directory that can be > checked into two different CVS systems? > > I really have no idea what the proper way is to do this.. Looking > for recommendations, including "don't do that, do this instead". Perhaps you can work on a branch of the standard Python CVS tree? If you're all Python developers (or can be sworn in easily) that would work. Otherwise you could set up your own SF project "pythonce" and do a vendor branch checkin of Python. I've never used vendor branches myself, but Kurt Kaiser uses them in the idlefork CVS, which deals with a similar issue. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Mon Sep 23 18:48:24 2002 From: skip@pobox.com (Skip Montanaro) Date: Mon, 23 Sep 2002 12:48:24 -0500 Subject: [Python-Dev] Re: ATTENTION! 
Releasing Python 2.2.2 in a few weeks
In-Reply-To:
References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net>
 <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com>
 <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net>
 <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com>
 <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net>
 <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com>
Message-ID: <15759.21480.428390.593910@12-248-11-90.client.attbi.com>

Martin> Laura Creighton writes:
>> Do you know when a fixed binutils is due?

Martin> The bug hasn't been acknowledged by binutils maintainers yet;
Martin> gcc maintainers report many problems with binutils, but have not
Martin> identified any specific problem.

Martin> So, unless somebody looks down into the details and studies the
Martin> resulting binaries, it may be a matter of months for a fix to
Martin> appear. Until then, binutils 2.13 should be avoided on Solaris.

Perhaps on Solaris the Python configure script should detect the presence of
binutils 2.13 and barf if it's found? Something like

    if [ `uname` = 'SunOS' ] ; then
        v=`as --version 2>/dev/null \
           | head -1 \
           | sed -e 's/.* \([^.]*\.[^.]*\.[^.]*\).*/\1/'`
        if [ $? -eq 0 ] ; then
            # got the gnu version of as - Sun as doesn't grok --version
            if [ "$v" = '2.13.0' ] ; then
                barf
            fi
        fi
    fi

seems like it should come close to working.

Skip

From ark@research.att.com Mon Sep 23 18:58:28 2002
From: ark@research.att.com (Andrew Koenig)
Date: 23 Sep 2002 13:58:28 -0400
Subject: [Python-Dev] Re: ATTENTION!
Releasing Python 2.2.2 in a few weeks In-Reply-To: <15759.21480.428390.593910@12-248-11-90.client.attbi.com> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <200209231001.g8NA1Ko9011212@ratthing-b246.strakt.com> <200209231207.g8NC78C06322@pcp02138704pcs.reston01.va.comcast.net> <200209231442.g8NEgYo9012614@ratthing-b246.strakt.com> <200209231507.g8NF7ZU09833@pcp02138704pcs.reston01.va.comcast.net> <200209231605.g8NG5Do9012985@ratthing-b246.strakt.com> <15759.21480.428390.593910@12-248-11-90.client.attbi.com> Message-ID: Skip> Perhaps on Solaris the Python configure script should detect the Skip> presence of binutils 2.13 and barf if it's found? I've already suggested a slightly different test, that has the advantage of allowing a patched 2.13 (and of detecting a broken 2.13.1 should it still be broken). -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From guido@python.org Mon Sep 23 22:30:42 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 17:30:42 -0400 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Your message of "Mon, 23 Sep 2002 13:00:47 EDT." Message-ID: <200209232130.g8NLUgi19325@pcp02138704pcs.reston01.va.comcast.net> Skip, where's your 2.2.2 Wiki? (Or should we just pick a page name in the moinmoin on python.org?) I've backported the following items to 2.2.2, most of which were my responsibility and/or 64-bit issues needed for the snake farm: ---------------------------------------------------------------------- Modified Files: Tag: release22-maint regrtest.py Log Message: Backport 1.96 from trunk (because I want Xenofarm to test 2.2.2): Add a bunch of sys.stdout.flush() calls that will hopefully improve the usability of the output of the Xenofarm builds. 
---------------------------------------------------------------------- Modified Files: Tag: release22-maint unicodeobject.c Log Message: Backport 2.166 from trunk: Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do the wrong thing for a unicode subclass when there were zero string replacements. The example given in the SF bug report was only one way to trigger this; replacing a string of length >= 2 that's not found is another. The code would actually write outside allocated memory if replacement string was longer than the search string. ---------------------------------------------------------------------- Modified Files: Tag: release22-maint test_unicode.py Log Message: Backport 1.56 and 1.68 from trunk: 1.56: Apply diff3.txt from SF patch http://www.python.org/sf/536241 If a str or unicode method returns the original object, make sure that for str and unicode subclasses the original will not be returned. This should prevent SF bug http://www.python.org/sf/460020 from reappearing. 1.68: Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do the wrong thing for a unicode subclass when there were zero string replacements. The example given in the SF bug report was only one way to trigger this; replacing a string of length >= 2 that's not found is another. The code would actually write outside allocated memory if replacement string was longer than the search string. ---------------------------------------------------------------------- Modified Files: Tag: release22-maint structmodule.c Log Message: Backport 2.57 from trunk: (Most of) SF patch 601369 (Christos Georgiou): obmalloc,structmodule: 64bit, big endian (issue 2 only). This adds a bunch of memcpy calls via a temporary variable to avoid alignment errors. That's needed for some platforms. ---------------------------------------------------------------------- Modified Files: Tag: release22-maint test_b1.py Log Message: Backport 1.51 and 1.54 from trunk. 
1.51: Bug #556025: list(xrange(1e9)) --> seg fault Close the bug report again -- this time for Cygwin due to a newlib bug. See the following for the details: http://sources.redhat.com/ml/newlib/2002/msg00369.html Note that this commit is only a documentation (i.e., comment) change. 1.54: The list(xrange(sys.maxint / 4)) test blew up on 64-bit platforms. Because ob_size is a 32-bit int but sys.maxint is LONG_MAX which is a 64-bit value, there's no way to make this test succeed on a 64-bit platform. So just skip it when sys.maxint isn't 0x7fffffff. ---------------------------------------------------------------------- Modified Files: Tag: release22-maint intobject.c Log Message: Backport 2.93 from trunk: Insert an overflow check when the sequence repetition count is outside the range of ints. The old code would pass random truncated bits to sq_repeat() on a 64-bit machine. ---------------------------------------------------------------------- Modified Files: Tag: release22-maint unicodeobject.c stringobject.c Log Message: Backport from trunk: unicodeobject.c 2.169 stringobject.c 2.189 Fix warnings on 64-bit platforms about casts from pointers to ints. Two of these were real bugs. ---------------------------------------------------------------------- Modified Files: Tag: release22-maint exceptions.c Log Message: Backported 1.39 and 1.40 from trunk: 1.39: Fix SF bug 610610 (reported by Martijn Pieters, diagnosed by Neal Norwitz). The switch in Exception__str__ didn't clear the error if PySequence_Size() raised an exception. Added a case -1 which clears the error and falls through to the default case. 1.40: Two more cases of switch(PySequence_Size()) without checking for case -1. (Same problem as last checkin for SF bug 610610) Need to clear the error and proceed. ---------------------------------------------------------------------- Note that I've been careful to vary the formatting of my log messages a bit. :-) Michael Hudson backported a bunch of things too. 
I notice a test suite failure with rfc822 as a result of these.
Michael, did you run the test suite?

FAILED (errors=1)
Traceback (most recent call last):
  File "../Lib/test/test_rfc822.py", line 211, in ?
    test_main()
  File "../Lib/test/test_rfc822.py", line 207, in test_main
    test_support.run_unittest(MessageTestCase)
  File "../Lib/test/test_support.py", line 180, in run_unittest
    run_suite(unittest.makeSuite(testclass), testclass)
  File "../Lib/test/test_support.py", line 175, in run_suite
    raise TestFailed(err)
test_support.TestFailed: Traceback (most recent call last):
  File "../Lib/test/test_rfc822.py", line 199, in test_parseaddr
    eq(rfc822.parseaddr('<>'), ('', ''))
  File "/home/guido/branch-2.2/Lib/rfc822.py", line 491, in parseaddr
    list = a.addresslist
AttributeError: AddrlistClass instance has no attribute 'addresslist'

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@pobox.com Mon Sep 23 22:49:46 2002
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 23 Sep 2002 16:49:46 -0500
Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks
In-Reply-To: <200209232130.g8NLUgi19325@pcp02138704pcs.reston01.va.comcast.net>
References: <200209232130.g8NLUgi19325@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15759.35962.479042.152695@12-248-11-90.client.attbi.com>

Guido> Skip, where's your 2.2.2 Wiki? (Or should we just pick a page
Guido> name in the moinmoin on python.org?)

Ain't been created yet. This is the first response I got to my offer to
create one. Just a sec... Okay, it's at

    http://manatee.mojam.com/py222wiki/

and is completely untarnished by (virtual) human hands.

Guido> I've backported the following items to 2.2.2, most of which were my
Guido> responsibility and/or 64-bit issues needed for the snake farm:
...

I have to get off to a soccer game. If nobody beats me to it I'll try to
update the wiki later tonight or first thing tomorrow.
Skip From martin@v.loewis.de Mon Sep 23 23:02:40 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 24 Sep 2002 00:02:40 +0200 Subject: [Python-Dev] Need advice: cloning python cvs for CE project In-Reply-To: <200209231744.g8NHiKB11123@pcp02138704pcs.reston01.va.comcast.net> References: <3D8F173B.11372.4250401E@localhost> <200209231744.g8NHiKB11123@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > Otherwise you could set up your own SF project "pythonce" and do a > vendor branch checkin of Python. I've never used vendor branches > myself, but Kurt Kaiser uses them in the idlefork CVS, which deals > with a similar issue. I would recommend this strategy. Supposedly, you will rarely need to perform imports from PythonLabs Python: the CE changes should be largely independent of how PythonLabs Python develops. Imports might be needed only when a chunk of your changes is accepted into CVS Python. You don't need a separate SF project, perhaps: A CVS module on the python project might be sufficient. We can ask SF to remove the tree when/if CE incorporation is complete. Regards, Martin From bkc@murkworks.com Mon Sep 23 23:10:59 2002 From: bkc@murkworks.com (Brad Clements) Date: Mon, 23 Sep 2002 18:10:59 -0400 Subject: [Python-Dev] Need advice: cloning python cvs for CE project In-Reply-To: References: <200209231744.g8NHiKB11123@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D8F593D.21953.43521B2B@localhost> On 24 Sep 2002 at 0:02, Martin v. Loewis wrote: > Guido van Rossum writes: > > > Otherwise you could set up your own SF project "pythonce" and do a > > vendor branch checkin of Python. I've never used vendor branches > > myself, but Kurt Kaiser uses them in the idlefork CVS, which deals > > with a similar issue. > > I would recommend this strategy. Supposedly, you will rarely need to > perform imports from PythonLabs Python: the CE changes should be > largely independent of how PythonLabs Python develops. Agreed. 
We'd snapshot from the core to the CE working CVS periodically. We
cannot keep up with the rate of core changes until we get our act
together. In the end, we shouldn't really have anything outside the
core if we "do it right".

> You don't need a separate SF project, perhaps: A CVS module on the
> python project might be sufficient. We can ask SF to remove the tree
> when/if CE incorporation is complete.

Can someone who understands the mechanics of how this works explain it?

I'm not skilled enough in CVS to visualize the process of importing
from the core, while still being able to track commits and updates
from the CE tree.

Also, none of the developers have core CVS access now, so I do not
think a branch would be appropriate.

Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
AOL-IM: BKClements

From greg@cosc.canterbury.ac.nz Mon Sep 23 23:20:08 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 24 Sep 2002 10:20:08 +1200 (NZST)
Subject: [Python-Dev] os.wait unweirding
In-Reply-To: <1032795494.16226.478.camel@HillCountryPeress>
Message-ID: <200209232220.g8NMK8d16141@oma.cosc.canterbury.ac.nz>

Hunter Peress:

> I think that os.wait() without any arguments should wait
> for all children, not wait for the earliest executed child.

Actually, it waits for *any* one child to exit, not necessarily
the first one spawned.

In any case, the functions in the os module are supposed to
be direct wrappers of the platform's system calls.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From martin@v.loewis.de Tue Sep 24 00:02:40 2002
From: martin@v.loewis.de (Martin v.
Loewis)
Date: 24 Sep 2002 01:02:40 +0200
Subject: [Python-Dev] Need advice: cloning python cvs for CE project
In-Reply-To: <3D8F593D.21953.43521B2B@localhost>
References: <200209231744.g8NHiKB11123@pcp02138704pcs.reston01.va.comcast.net>
 <3D8F593D.21953.43521B2B@localhost>
Message-ID:

"Brad Clements" writes:

> Can someone who understands the mechanics of how this works explain it?

1. Export a "blank tree"

   cvs -d :pserver:anonymous@cvs.python.sourceforge.net:/cvsroot/python export -r HEAD python

2. Import it into a fresh repository

   cvs -d :ext:developername@cvs.python.sourceforge.net:/cvsroot/python import pythonce Pythonlabs cvs_from_020924

3. Make a sandbox for your module

   cvs -d :pserver:anonymous@cvs.python.sourceforge.net:/cvsroot/python checkout -d MyPythonCE pythonce

Then, whenever you incorporate another Pythonlabs snapshot, import it
again. cvs will tell you the command to join your tree with the
imported tree when the import is complete.

HTH,
Martin

From vinay_sajip@red-dove.com Tue Sep 24 00:04:15 2002
From: vinay_sajip@red-dove.com (Vinay Sajip)
Date: Tue, 24 Sep 2002 00:04:15 +0100
Subject: [Python-Dev] PEP 282 Implementation
References: <00e001c2261d$19bfc320$652b6992@alpha>
 <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net>
 <006601c24e7f$fcff5440$652b6992@alpha>
 <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <000e01c26355$8b2dcdc0$652b6992@alpha>

Guido van Rossum wrote:

> I'm sorry that this seems to be a thread with one message per month!
> I'll try to be more responsive from now on, the big Zope projects that
> were keeping me busy have given me some slack time.

Great.

>> I will update the docstrings as per your feedback.
>
> Great! (When can we see a new release on
> http://www.red-dove.com/python_logging.html
> ?)

I was waiting for your feedback about the packaging - the docstrings have
been changed but I wanted to roll everything into the next release.
Speaking of which...
> I would feel much less strongly about this if several of the > additional things could be moved to separate files without making it a > package. > [stuff snipped] > This is an example of something that I'd like to see relegated to a > separate file. It really looks like fileConfig(), listen() and > stopListening() are a separate feature bundle that looks like it is > a specific example application rather than a core feature of the > logging module. It certainly doesn't appear in PEP 282. Maybe the > socket handler classes belong in the same category. > > Of course, the same can be said about all Handler subclasses except > StreamHandler. Only StreamHandler is referenced by basicConfig(). > Perhaps these should all (except StreamHandler) be moved to separate > files? This sounds like a reason to make it a package. The main > logging code could be in the __init__.py file -- there's no rule that > says __init__.py should be empty or short! How about this suggestion? We could leave the core code in the existing module, "logging". This would include a minimal set of handlers, and all the Filters, and I think StreamHandler and FileHandler should be in here. All other handlers would live in "logging.handlers". As for configuration - basicConfig() could live in "logging" and any other configuration code in "logging.config". If the above seems a good idea, please let me know and I'll refactor accordingly - then the next release will (hopefully) be in the next 2-3 weeks. > PS. In your comments you seem fond of the word "needful". I've rarely > heard that word -- perhaps it is archaic or common only in India? I only found 2 uses of "needful" - in BufferingHandler and ConfigStreamHandler. It's the whole phrase "do the needful", which I think is peculiar to England but has its share of users on the subcontinent :-) Regards Vinay Sajip From rwgk@yahoo.com Tue Sep 24 00:52:24 2002 From: rwgk@yahoo.com (Ralf W. 
Grosse-Kunstleve)
Date: Mon, 23 Sep 2002 16:52:24 -0700 (PDT)
Subject: [C++-sig] Re: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks
In-Reply-To: <200209231322.g8NDMGX06641@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020923235224.76315.qmail@web20208.mail.yahoo.com>

--- Guido van Rossum wrote:
> If you check out the release22-maint branch of Python from CVS and
> subscribe to the python-checkins list
> (http://mail.python.org/mailman/listinfo/python-checkins) you should
> be able to track the work leading up to 2.2.2 pretty closely.

Apparently the bug report

https://sourceforge.net/tracker/?func=detail&atid=105470&aid=607253&group_id=5470

has not yet led to any changes in the release22-maint branch. The
worst problem is the missing extern "C" in descrobject.h and iterobject.h.
This is compounded by missing include guards. We struggled quite a bit
to find a workaround for Boost.Python.

It will also be helpful if include guards are added to pymactoolbox.h.

Ralf

__________________________________________________
Do you Yahoo!?
New DSL Internet Access from SBC & Yahoo!
http://sbc.yahoo.com

From tim.one@comcast.net Tue Sep 24 01:13:25 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 23 Sep 2002 20:13:25 -0400
Subject: [C++-sig] Re: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks
In-Reply-To: <20020923235224.76315.qmail@web20208.mail.yahoo.com>
Message-ID:

[Ralf W. Grosse-Kunstleve]
> Apparently the bug report
>
> https://sourceforge.net/tracker/?func=detail&atid=105470&aid=607253&group_id=5470
>
> has not yet led to any changes in the release22-maint branch. The
> worst problem is the missing extern "C" in descrobject.h and iterobject.h.
> This is compounded by missing include guards. We struggled quite a bit
> to find a workaround for Boost.Python.
>
> It will also be helpful if include guards are added to pymactoolbox.h.
The odds of something like this getting fixed to your satisfaction (not to
mention at all) greatly increase if you submit a patch. Looks to me
like what you want to do is both correct and harmless, but I'm (speaking as
a generic Python developer) not going to be able to make time to test it in
the context you're concerned about. OTOH, if there were a patch that you
knew worked for *you*, cool, I could apply it and just make sure it didn't
break anything for me (speaking as a etc).

From guido@python.org Tue Sep 24 01:53:44 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 23 Sep 2002 20:53:44 -0400
Subject: [C++-sig] Re: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks
In-Reply-To: Your message of "Mon, 23 Sep 2002 16:52:24 PDT."
 <20020923235224.76315.qmail@web20208.mail.yahoo.com>
References: <20020923235224.76315.qmail@web20208.mail.yahoo.com>
Message-ID: <200209240053.g8O0riW20237@pcp02138704pcs.reston01.va.comcast.net>

> > If you check out the release22-maint branch of Python from CVS and
> > subscribe to the python-checkins list
> > (http://mail.python.org/mailman/listinfo/python-checkins) you should
> > be able to track the work leading up to 2.2.2 pretty closely.
>
> Apparently the bug report
>
> https://sourceforge.net/tracker/?func=detail&atid=105470&aid=607253&group_id=5470
>
> has not yet led to any changes in the release22-maint branch. The
> worst problem is the missing extern "C" in descrobject.h and iterobject.h.
> This is compounded by missing include guards. We struggled quite a bit
> to find a workaround for Boost.Python.

Please submit patches. Not being a C++ user myself I find it hard to
guess exactly what needs to be done based upon your terse description.

> It will also be helpful if include guards are added to pymactoolbox.h.

I suppose you mean in the 2.2 branch? Jack added them two weeks ago,
according to the bug report.
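
For readers who, like me, do not write C++: the kind of header change
being requested can be sketched roughly as follows. This is only an
illustration of the usual include-guard plus extern "C" pattern, not
the actual patch; the guard macro Py_EXAMPLE_H and the function
example_add are hypothetical names invented for the sketch.

```c
/* Hypothetical header contents, for illustration only. */
#ifndef Py_EXAMPLE_H            /* include guard: a second #include is a no-op */
#define Py_EXAMPLE_H
#ifdef __cplusplus
extern "C" {                    /* C++ compilers see the declarations with C linkage */
#endif

int example_add(int a, int b);  /* hypothetical declaration */

#ifdef __cplusplus
}
#endif
#endif /* !Py_EXAMPLE_H */

/* Hypothetical implementation, compiled as plain C. */
int example_add(int a, int b)
{
    return a + b;
}
```

Without the extern "C" wrapper a C++ compiler would look for a mangled
symbol name at link time, and without the guard a header included twice
would produce duplicate declarations or definitions.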
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Sep 24 02:12:34 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 23 Sep 2002 21:12:34 -0400 Subject: [Python-Dev] PEP 282 Implementation In-Reply-To: Your message of "Tue, 24 Sep 2002 00:04:15 BST." <000e01c26355$8b2dcdc0$652b6992@alpha> References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> <000e01c26355$8b2dcdc0$652b6992@alpha> Message-ID: <200209240112.g8O1CYE20440@pcp02138704pcs.reston01.va.comcast.net> > > I would feel much less strongly about this if several of the > > additional things could be moved to separate files without making it a > > package. > > > [stuff snipped] > > > This is an example of something that I'd like to see relegated to a > > separate file. It really looks like fileConfig(), listen() and > > stopListening() are a separate feature bundle that looks like it is > > a specific example application rather than a core feature of the > > logging module. It certainly doesn't appear in PEP 282. Maybe the > > socket handler classes belong in the same category. > > > > Of course, the same can be said about all Handler subclasses except > > StreamHandler. Only StreamHandler is referenced by basicConfig(). > > Perhaps these should all (except StreamHandler) be moved to separate > > files? This sounds like a reason to make it a package. The main > > logging code could be in the __init__.py file -- there's no rule that > > says __init__.py should be empty or short! > > How about this suggestion? We could leave the core code in the > existing module, "logging". This would include a minimal set of > handlers, and all the Filters, and I think StreamHandler and > FileHandler should be in here. All other handlers would live in > "logging.handlers". 
As for configuration - basicConfig() could live > in "logging" and any other configuration code in "logging.config". Sounds good to me. I hope that whoever felt strongly about this (Martin von Loewis?) agrees. > If the above seems a good idea, please let me know and I'll refactor > accordingly - then the next release will (hopefully) be in the next > 2-3 weeks. > > > PS. In your comments you seem fond of the word "needful". I've rarely > > heard that word -- perhaps it is archaic or common only in India? > > I only found 2 uses of "needful" - in BufferingHandler and > ConfigStreamHandler. It's the whole phrase "do the needful", which I > think is peculiar to England but has its share of users on the > subcontinent :-) Oh well. Shows how Americanized I am, despite my thoroughly European upbringing, after 7 years here. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From chrism@zope.com Tue Sep 24 04:22:01 2002 From: chrism@zope.com (Chris McDonough) Date: Mon, 23 Sep 2002 23:22:01 -0400 Subject: [Python-Dev] PEP 282 Implementation References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <01aa01c26379$8d6def60$4901000a@dorothy> > > > - setRollover(): the explanation isn't 100% clear. I *think* that > > > you always write to "app.log", and when that's full, you rename > > > it to app.log.1, and app.log.1 gets renamed to app.log.2, and so > > > on, and then you start writing to a new app.log, right? > > > > Yes. The original implementation was different - it just closed the > > current file and opened a new file app.log.n. The current > > implementation is slightly slower due to the need to rename several > > files, but the user can tell more easily which the latest log file > > is. 
I will update the setRollover() docstring to indicate more > > clearly how it works; I'm assuming that the current algorithm is > > deemed good enough. > > Yes, this seems how log rotation is generally done. (Please remove > the commented-out old code.) It would be helpful for the FileHandler class to define a method which just closes and reopens the current logfile (instead of actually rotating a set like-named logfiles). This would allow logfile rotation to be performed by a separate process (e.g. RedHat's logrotate). Sometimes it's better (and even necessary) to be able to use system-provided log rotation facilities instead of relying on the native rotation facilities. - C From mhammond@skippinet.com.au Tue Sep 24 05:16:07 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 24 Sep 2002 14:16:07 +1000 Subject: [Python-Dev] Need advice: cloning python cvs for CE project In-Reply-To: <3D8F593D.21953.43521B2B@localhost> Message-ID: FWIW, a breakin at my house means my insurance company is funding a sparkling new CE machine for me - which means PythonCE should be able to work again for me soon :) My Linux box (laptop) was taken at the same time, and it seems the replacement will be significantly faster than my desktop. Gotta love insurance ;) > Also, none of the developers have core CVS access now, so I do > not think a branch > would be appropriate. I havent been up to date with the latest PythonCE work, but I believe there are two key issues: 1) Fairly simple patches to random files that allow CE to compile. These are generally fairly transparent, and generally just #ifdef out certain features. 2) Larger patches or new source files that involve significant code - often a re-implementation of something missing from CE that Python really likes to have, or converting everything to and from Unicode for the CE API. I believe (1) could be handled using the source forge patch manager, as patches to the core. 
Depending on how much this reduces the size of the patch, the best way to handle (2) could be determined later. I'm happy to help steer some of this through, and as I said above, I should actually be in a position to build and test PythonCE again soon. I've got a few busy weeks ahead of me, but after that will have some Python time back. Mark. From martin@v.loewis.de Tue Sep 24 05:49:24 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 24 Sep 2002 06:49:24 +0200 Subject: [Python-Dev] PEP 282 Implementation In-Reply-To: <200209240112.g8O1CYE20440@pcp02138704pcs.reston01.va.comcast.net> References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> <000e01c26355$8b2dcdc0$652b6992@alpha> <200209240112.g8O1CYE20440@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > Sounds good to me. I hope that whoever felt strongly about this > (Martin von Loewis?) agrees. I don't think I ever voiced an opinion on logging. Regards, Martin From vinay_sajip@red-dove.com Tue Sep 24 08:56:33 2002 From: vinay_sajip@red-dove.com (Vinay Sajip) Date: Tue, 24 Sep 2002 08:56:33 +0100 Subject: [Python-Dev] PEP 282 Implementation References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> <01aa01c26379$8d6def60$4901000a@dorothy> Message-ID: <003901c2639f$e7e21ae0$652b6992@alpha> Chris McDonough wrote: > It would be helpful for the FileHandler class to define a method > which just closes and reopens the current logfile (instead of > actually rotating a set like-named logfiles). This would allow > logfile rotation to be performed by a separate process (e.g. > RedHat's logrotate). 
Sometimes it's better (and even necessary) to > be able to use system-provided log rotation facilities instead of > relying on the native rotation facilities. I'm not sure whether this should be in the core functionality. I presume you don't mean an atomic "close and reopen" operation - rather, are you suggesting close the file, maybe rename it at the application level, then reopen? If so, then it's best handled entirely in the application level, through a subclass of FileHandler. This allows each application to consider issues such as what to do with events that occur between close and reopen (e.g. if multiple threads are running). Regards Vinay From mwh@python.net Tue Sep 24 10:14:47 2002 From: mwh@python.net (Michael Hudson) Date: Tue, 24 Sep 2002 10:14:47 +0100 (BST) Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: <200209232130.g8NLUgi19325@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Mon, 23 Sep 2002, Guido van Rossum wrote: > Michael Hudson backported a bunch of things too. I notice a test > suite failure with rfc822 as a result of these. Michael, did you run > the test suite? No, I'd just got started when the network started acting up. I'll get to it. Cheers, M. From mwh@python.net Tue Sep 24 10:24:04 2002 From: mwh@python.net (Michael Hudson) Date: 24 Sep 2002 10:24:04 +0100 Subject: [Python-Dev] PEP 282 Implementation In-Reply-To: "Vinay Sajip"'s message of "Tue, 24 Sep 2002 00:04:15 +0100" References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> <000e01c26355$8b2dcdc0$652b6992@alpha> Message-ID: <2md6r3oi4r.fsf@starship.python.net> "Vinay Sajip" writes: > I only found 2 uses of "needful" - in BufferingHandler and > ConfigStreamHandler. 
It's the whole phrase "do the needful", which I > think is peculiar to England but has its share of users on the > subcontinent :-) I've never heard it before, having lived my whole life in England... The meaning's pretty obvious, though. Cheers, M. -- I'll write on my monitor fifty times 'I must not post self-indulgent wibble nobody is interested in to ucam.chat just because I'm bored and I can't find the bug I'm supposed to fix'. -- Steve Kitson, ucam.chat From rengelin@strw.leidenuniv.nl Tue Sep 24 12:10:33 2002 From: rengelin@strw.leidenuniv.nl (Roeland Rengelink) Date: Tue, 24 Sep 2002 13:10:33 +0200 Subject: [Python-Dev] bug 576990 Message-ID: <3D904828.766779BE@strw.leidenuniv.nl> Hi, In early July I submitted a bug report ( http://www.python.org/sf/576990 ). Although Raymond Hettinger briefly looked at it (closed it, and then re-opened it), there's presently no assignee for the bug. I am certainly willing to do the work myself, but before doing so, I'd like to be sure that I understand the non-response correctly. I see several possibilities: 1. This is not a bug but somebody forgot to tell me. 2. This is completely trivial to solve, but everybody overlooked it. 3. This is a small bug, only seen in a marginal corner case that is of no particular interest to anyone, so there is no reason (and certainly no time) for anybody to respond and/or solve this. 4. This is a mildly interesting, but relatively obscure bug, that might be straightforward to solve if somebody had the spare time. (what spare time?) 5. This is clearly a profound and interesting bug, but solving this seems to involve cans of worms, ten-foot poles, and a re-write of the core. I suspect that in this case the answer lies somewhere between 3 and 4. I just want to make sure that this is not a type 1, 2 or 5 bug. ... Ok. So this is actually a blatant attempt to get someone to look at this again before 2.2.2 goes out the door.
On the other hand, I really am willing to do the work (clarify the report, give more use-cases, explain the reasoning behind the patch, implement alternative solutions). Thanks, Roeland Rengelink From guido@python.org Tue Sep 24 13:10:21 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 08:10:21 -0400 Subject: [Python-Dev] PEP 282 Implementation In-Reply-To: Your message of "Mon, 23 Sep 2002 23:22:01 EDT." <01aa01c26379$8d6def60$4901000a@dorothy> References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> <01aa01c26379$8d6def60$4901000a@dorothy> Message-ID: <200209241210.g8OCALQ22479@pcp02138704pcs.reston01.va.comcast.net> > It would be helpful for the FileHandler class to define a method > which just closes and reopens the current logfile (instead of > actually rotating a set like-named logfiles). This would allow > logfile rotation to be performed by a separate process (e.g. > RedHat's logrotate). Sometimes it's better (and even necessary) to > be able to use system-provided log rotation facilities instead of > relying on the native rotation facilities. Maybe this could be a different Handler subclass? I have to admit that I find log rotation borderline functionality for the logging module. Perhaps Chris' suggestion is sufficient. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Sep 24 13:30:04 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 08:30:04 -0400 Subject: [Python-Dev] "do the needful" In-Reply-To: Your message of "Tue, 24 Sep 2002 10:24:04 BST." 
<2md6r3oi4r.fsf@starship.python.net> References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> <000e01c26355$8b2dcdc0$652b6992@alpha> <2md6r3oi4r.fsf@starship.python.net> Message-ID: <200209241230.g8OCU4G22577@pcp02138704pcs.reston01.va.comcast.net> A Google search on "do the needful" suggests that the phrase is indeed popular on the subcontinent: associated with top hits are names like Ibrahim Hunkunti, Lord Sri Krishna, India's National Newspaper, an astrology site run by Mr. Harsh Nigram, Pakistan, Nepal, ... You get the picture. I'm fine if Vinay leaves it in! It definitely sounded funny to me, but it's not broken English -- it's cultural diversity. --Guido van Rossum (home page: http://www.python.org/~guido/) From niemeyer@conectiva.com Tue Sep 24 13:52:33 2002 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Tue, 24 Sep 2002 09:52:33 -0300 Subject: [Python-Dev] Re: ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: <200209232130.g8NLUgi19325@pcp02138704pcs.reston01.va.comcast.net> References: <200209232130.g8NLUgi19325@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020924095233.A22181@ibook.distro.conectiva> > Skip, where's your 2.2.2 Wiki? (Or should we just pick a page name in > the moinmoin on python.org?) (unashamed plug ahead) Btw, if you're using MoinMoin extensively (as we have been), you may want to check a small script I've written (editmoin.py) to allow editing of moin pages with your preferred editor, and also a syntax highlighting file for vim: http://moin.conectiva.com.br/EditMoin -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From thomas.heller@ion-tof.com Tue Sep 24 14:24:41 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 24 Sep 2002 15:24:41 +0200 Subject: [Python-Dev] Assign to errno allowed?
Message-ID: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> I'm trying to fix selectmodule.c on Windows (it raises bogus exceptions, because select() on Windows does not set errno). The first patch I had was this: if (n < 0) { + #ifdef MS_WINDOWS + PyErr_SetExcFromWindowsErr(SelectError, WSAGetLastError()); + #else PyErr_SetFromErrno(SelectError); + #endif } else if (n == 0) { /* optimization */ but PyErr_SetExcFromWindowsErr is not present in the 2.2 maintainance branch. An easier fix would be this one, but I wonder if it is allowed/good style to set 'errno': *** 274,279 **** --- 274,282 ---- Py_END_ALLOW_THREADS if (n < 0) { + #ifdef MS_WINDOWS + errno = WSAGetLastError(); + #endif PyErr_SetFromErrno(SelectError); } else if (n == 0) { Thomas From skip@pobox.com Tue Sep 24 14:41:28 2002 From: skip@pobox.com (Skip Montanaro) Date: Tue, 24 Sep 2002 08:41:28 -0500 Subject: [Python-Dev] Python 2.2.2 Wiki Message-ID: <15760.27528.301965.340841@12-248-11-90.client.attbi.com> For folks interested in 2.2.2, I created an empty Wiki available at http://manatee.mojam.com/py222wiki/ I did *nothing* other than set it up. I don't know how people want to use the Wiki. Feel free to organize it any way you want, or give me some clues and I'll take a crack at a top-level structure. Michael McLay had suggested: >> Perhaps Wiki pages would be a good mechanism for collaboration on the >> classification of patches. Create one page for each classification >> type and then use the patch names and title as section titles within >> the page. What are the classification types he referred to? Skip From mhammond@skippinet.com.au Tue Sep 24 14:49:05 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 24 Sep 2002 23:49:05 +1000 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> Message-ID: > but PyErr_SetExcFromWindowsErr is not present in the 2.2 > maintainance branch. 
An easier fix would be this one, but > I wonder if it is allowed/good style to set 'errno': > > *** 274,279 **** > --- 274,282 ---- > Py_END_ALLOW_THREADS > > if (n < 0) { > + #ifdef MS_WINDOWS > + errno = WSAGetLastError(); > + #endif > PyErr_SetFromErrno(SelectError); > } > else if (n == 0) { Well, I'd agree it's not good style - therefore it deserves a comment. I'd say with a reasonable comment you should just go for it. BDFL-pronouncement-not-withstanding ly, Mark. From mal@lemburg.com Tue Sep 24 15:00:38 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 24 Sep 2002 16:00:38 +0200 Subject: [Python-Dev] Assign to errno allowed? References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> Message-ID: <3D907006.504@lemburg.com> Thomas Heller wrote: > I'm trying to fix selectmodule.c on Windows (it raises > bogus exceptions, because select() on Windows does not > set errno). > The first patch I had was this: > > if (n < 0) { > + #ifdef MS_WINDOWS > + PyErr_SetExcFromWindowsErr(SelectError, WSAGetLastError()); > + #else > PyErr_SetFromErrno(SelectError); > + #endif > } > else if (n == 0) { > /* optimization */ > > but PyErr_SetExcFromWindowsErr is not present in the 2.2 > maintainance branch. An easier fix would be this one, but > I wonder if it is allowed/good style to set 'errno': > > *** 274,279 **** > --- 274,282 ---- > Py_END_ALLOW_THREADS > > if (n < 0) { > + #ifdef MS_WINDOWS > + errno = WSAGetLastError(); > + #endif > PyErr_SetFromErrno(SelectError); > } > else if (n == 0) { Here's what the man page has to say: NAME errno - number of last error SYNOPSIS #include <errno.h> extern int errno; DESCRIPTION The integer errno is set by system calls (and some library functions) to indicate what went wrong. Its value is significant only when the call returned an error (usually -1), and a library function that does succeed is allowed to change errno. Sometimes, when -1 is also a legal return value one has to zero errno before the call in order to detect possible errors.
errno is defined by the ISO C standard to be a modifiable lvalue of type int, and must not be explicitly declared; errno may be a macro. errno is thread-local; setting it in one thread does not affect its value in any other thread. Valid error numbers are all non-zero; errno is never set to zero by any library function. All the error names specified by POSIX.1 must have distinct values. ... Setting errno is allowed; in fact, it is required to set it to 0 sometimes in order to narrow down the location of an error (in a sequence of C library calls). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Tue Sep 24 15:16:39 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 10:16:39 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Tue, 24 Sep 2002 15:24:41 +0200." <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> Message-ID: <200209241416.g8OEGd902355@odiug.zope.com> > I'm trying to fix selectmodule.c on Windows (it raises > bogus exceptions, because select() on Windows does not > set errno). Are you *sure* about that? > The first patch I had was this: [...] > but PyErr_SetExcFromWindowsErr is not present in the 2.2 > maintainance branch. An easier fix would be this one, but > I wonder if it is allowed/good style to set 'errno': Yes, assignment to errno is fine. 
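A side note for readers following this from the Python level: as far as I understand it, whatever value the C code leaves in errno before calling PyErr_SetFromErrno() is the number that ends up in the raised exception, which is presumably why an unset errno shows up as a meaningless code. The sketch below only illustrates that round trip on a POSIX system with a current Python 3, where select.error is an alias of OSError; the helper names select_error_code and closed_descriptor are made up for the illustration and are not part of any patch discussed here.

```python
import errno
import os
import select


def select_error_code(fd):
    """Return the errno that select() reports for fd, or None if it succeeds."""
    try:
        # Zero timeout: poll once and return immediately.
        select.select([fd], [], [], 0)
    except OSError as exc:
        # The number set by the failing C call surfaces here.
        return exc.errno
    return None


def closed_descriptor():
    """Return a file descriptor number that has just been closed."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    os.close(write_end)
    return read_end


if __name__ == "__main__":
    code = select_error_code(closed_descriptor())
    print(code, errno.errorcode.get(code))
```

On a POSIX system the closed descriptor is expected to be reported as EBADF; on Windows the numbers would be Winsock codes instead, so the example is not portable as written.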
--Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Tue Sep 24 15:14:05 2002 From: ark@research.att.com (Andrew Koenig) Date: 24 Sep 2002 10:14:05 -0400 Subject: [Python-Dev] -zcombreloc In-Reply-To: <200209231350.g8NDonx28099@europa.research.att.com> References: <200209231350.g8NDonx28099@europa.research.att.com> Message-ID: ark> So I think you are right -- the correct fix is to warn people that ark> they should not use binutils 2.13 on Solaris, period. I believe that ark> they will fix the problem in 2.13.1 and will let you know. I received a fix this morning from one of the binutils developers and am testing it now. The good news is that the Python build has gotten further than it did last time, so I'm going to try rebuilding gcc with the fixed binutils 2.13 and then rebuilding Python. That process takes about 16 hours of cpu time, so don't expect to hear from me until tomorrow at the earliest. The bad news is that the fix is specific to Solaris, which means that installing it breaks the linker for Sparc/Linux. They are now trying to figure out how to fix it in a way that does not break Linux; obviously, they're not going to put it in a release until they have one that works on both operating systems. More news as I get it. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From guido@python.org Tue Sep 24 15:18:19 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 10:18:19 -0400 Subject: [Python-Dev] Python 2.2.2 Wiki In-Reply-To: Your message of "Tue, 24 Sep 2002 08:41:28 CDT." <15760.27528.301965.340841@12-248-11-90.client.attbi.com> References: <15760.27528.301965.340841@12-248-11-90.client.attbi.com> Message-ID: <200209241418.g8OEIJ502369@odiug.zope.com> > http://manatee.mojam.com/py222wiki/ > > I did *nothing* other than set it up. I don't know how people want to use > the Wiki. 
Feel free to organize it any way you want, or give me some clues > and I'll take a crack at a top-level structure. > > Michael McLay had suggested: > > >> Perhaps Wiki pages would be a good mechanism for collaboration on the > >> classification of patches. Create one page for each classification > >> type and then use the patch names and title as section titles within > >> the page. > > What are the classification types he referred to? I suppose he was referring to this (from my initial 2.2.2 post last Friday): Basically, someone does the tedious part of triage, which means going over *every* 2.3 checkin message (with quick access to the corresponding diffs) and sorting them into: - already applied - trivial reject (e.g. new feature or fix for a bug introduced in 2.3) - trivial accept (pure bugfix that applies cleanly to 2.2) - messy (e.g. unclear whether it's a bugfix or a feature even after staring at the source, bugfixes that affect binary compatibility, bugfixes that can only be applied with much code wrangling due to other changes in the code at the same place, etc.) Feel free to compile a list of "messy" ones and send it to python-dev. It doesn't have to be all at once -- for big messy ones a separate python-dev discussion may be appropriate. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Tue Sep 24 15:19:38 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 24 Sep 2002 16:19:38 +0200 Subject: [Python-Dev] Assign to errno allowed? References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> Message-ID: <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> > > I'm trying to fix selectmodule.c on Windows (it raises > > bogus exceptions, because select() on Windows does not > > set errno). > > Are you *sure* about that? > Yes. 
MSDN: The select function returns the total number of socket handles that are ready and contained in the fd_set structures, zero if the time limit expired, or SOCKET_ERROR if an error occurred. If the return value is SOCKET_ERROR, WSAGetLastError can be used to retrieve a specific error code. Thomas From martin@v.loewis.de Tue Sep 24 15:32:45 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 24 Sep 2002 16:32:45 +0200 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> Message-ID: "Thomas Heller" writes: > > Are you *sure* about that? [...} > The select function returns the total number of socket handles that > are ready and contained in the fd_set structures, zero if the time > limit expired, or SOCKET_ERROR if an error occurred. If the return > value is SOCKET_ERROR, WSAGetLastError can be used to retrieve a > specific error code. This is a strong indication, but not enough for certainty. It does not mention errno at all. Regards, Martin From thomas.heller@ion-tof.com Tue Sep 24 15:47:17 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 24 Sep 2002 16:47:17 +0200 Subject: [Python-Dev] Assign to errno allowed? References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook><200209241416.g8OEGd902355@odiug.zope.com><000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> Message-ID: <000b01c263d9$4780ed80$e000a8c0@thomasnotebook> From: "Martin v. Loewis" > "Thomas Heller" writes: > > > > Are you *sure* about that? > [...} > > > The select function returns the total number of socket handles that > > are ready and contained in the fd_set structures, zero if the time > > limit expired, or SOCKET_ERROR if an error occurred. If the return > > value is SOCKET_ERROR, WSAGetLastError can be used to retrieve a > > specific error code. 
> > This is a strong indication, but not enough for certainty. It does not > mention errno at all. > Yes. Here's an experiment (unpatched python): Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import select >>> select.select([], [], [], 10) Traceback (most recent call last): File "<stdin>", line 1, in ? select.error: (0, 'Error') >>> Patched python: Python 2.3a0 (#29, Sep 19 2002, 12:38:34) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import select >>> select.select([], [], [], 10) Traceback (most recent call last): File "<stdin>", line 1, in ? select.error: (10093, 'Either the application has not called WSAStartup, or WSAStartup failed') >>> import socket >>> select.select([], [], [], 10) Traceback (most recent call last): File "<stdin>", line 1, in ? select.error: (10022, 'An invalid argument was supplied') >>> Thomas From thomas.heller@ion-tof.com Tue Sep 24 15:51:48 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 24 Sep 2002 16:51:48 +0200 Subject: [Python-Dev] Assign to errno allowed? References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook><200209241416.g8OEGd902355@odiug.zope.com><000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> Message-ID: <003901c263d9$e8ab03d0$e000a8c0@thomasnotebook> From: "Martin v. Loewis" > "Thomas Heller" writes: > > > > Are you *sure* about that? > [...} > > > The select function returns the total number of socket handles that > > are ready and contained in the fd_set structures, zero if the time > > limit expired, or SOCKET_ERROR if an error occurred. If the return > > value is SOCKET_ERROR, WSAGetLastError can be used to retrieve a > > specific error code. > > This is a strong indication, but not enough for certainty. It does not > mention errno at all.
Before we dive into philosophical discussions about what this sentence says, my interpretation would be: If select() returns SOCKET_ERROR, you *should* call WSAGetLastError() to get "details about the problem". Thomas From skip@pobox.com Tue Sep 24 15:52:37 2002 From: skip@pobox.com (Skip Montanaro) Date: Tue, 24 Sep 2002 09:52:37 -0500 Subject: [Python-Dev] Python 2.2.2 Wiki In-Reply-To: <200209241418.g8OEIJ502369@odiug.zope.com> References: <15760.27528.301965.340841@12-248-11-90.client.attbi.com> <200209241418.g8OEIJ502369@odiug.zope.com> Message-ID: <15760.31797.295764.509527@12-248-11-90.client.attbi.com> >> What are the classification types he referred to? Guido> I suppose he was referring to this (from my initial 2.2.2 post Guido> last Friday): ... Okay, I created blank WikiNames for those categories. Kevin Jacobs added a bunch of bugs to the front page. Now would probably be a good time for people to jump in and review those bugs, then move them to the appropriate classification page. Skip From guido@python.org Tue Sep 24 15:56:26 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 10:56:26 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Tue, 24 Sep 2002 16:19:38 +0200." <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> Message-ID: <200209241456.g8OEuQe11641@odiug.zope.com> > > > I'm trying to fix selectmodule.c on Windows (it raises > > > bogus exceptions, because select() on Windows does not > > > set errno). > > > > Are you *sure* about that? > > Yes. > > MSDN: > > The select function returns the total number of socket handles that > are ready and contained in the fd_set structures, zero if the time > limit expired, or SOCKET_ERROR if an error occurred. If the return > value is SOCKET_ERROR, WSAGetLastError can be used to retrieve a > specific error code. 
Argh! So select() has never returned proper return values on Windows. :-( Thanks for fixing this. Are you gonna fix it in 2.2.2 as well as 2.3? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Sep 24 16:01:35 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 11:01:35 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Tue, 24 Sep 2002 16:47:17 +0200." <000b01c263d9$4780ed80$e000a8c0@thomasnotebook> References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> <000b01c263d9$4780ed80$e000a8c0@thomasnotebook> Message-ID: <200209241501.g8OF1ZN12148@odiug.zope.com> > Patched python: > > Python 2.3a0 (#29, Sep 19 2002, 12:38:34) [MSC 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> import select > >>> select.select([], [], [], 10) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > select.error: (10093, 'Either the application has not called WSAStartup, or WSAStartup failed') > >>> import socket > >>> select.select([], [], [], 10) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > select.error: (10022, 'An invalid argument was supplied') > >>> Hm... I can confirm this on my Win98SE box. But questions pop up: Why is the error different the first time? And why is this an error at all? On Linux, this is not an error. (In fact, time.sleep() uses this to sleep using subsecond precision.) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Tue Sep 24 16:15:04 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 24 Sep 2002 17:15:04 +0200 Subject: [Python-Dev] Assign to errno allowed?
References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> <000b01c263d9$4780ed80$e000a8c0@thomasnotebook> <200209241501.g8OF1ZN12148@odiug.zope.com> Message-ID: <005101c263dd$2908a5b0$e000a8c0@thomasnotebook> > > Patched python: > > > > Python 2.3a0 (#29, Sep 19 2002, 12:38:34) [MSC 32 bit (Intel)] on win32 > > Type "help", "copyright", "credits" or "license" for more information. > > >>> import select > > >>> select.select([], [], [], 10) > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > select.error: (10093, 'Either the application has not called WSAStartup, or WSAStartup failed') > > >>> import socket > > >>> select.select([], [], [], 10) > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > select.error: (10022, 'An invalid argument was supplied') > > >>> > > Hm... I can confirm this on my Win98SE box. But questions pop up: > > Why is the error different the first time? And why is this an > error at all? The winsock library is not initialized the first time - it seems that socketmodule calls WSAStartup(), but I haven't looked at this in detail. Also I think it's not worth to fix it, there's no use for select() on windows if you don't use sockets - you have to supply at least one socket descriptor (that's the cause for the second error above). Although it could be argued whether it makes sense to simulate a Linux-compatible select for Windows. > On Linux, this is not an error. (In fact, time.sleep() > uses this to sleep using subsecond precision.) From my early Unix (actually Minix) experiments I remember that select(3) was the only possibility to do subsecond delays in Unix. Is this still the same today? > > --Guido van Rossum (home page: http://www.python.org/~guido/) > And yes, I will fix it in 2.3 and backport it to 2.2.2.
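To make the platform difference described above concrete, here is a minimal sketch of the "select() with no descriptors as a subsecond sleep" idiom, written for a current Python 3. The function name subsecond_sleep is hypothetical, and the Windows branch is an assumption based on the 10022 error reported in this thread (Winsock wants at least one socket), not a description of how time.sleep() itself is implemented.

```python
import select
import sys
import time


def subsecond_sleep(seconds):
    """Sleep for a possibly fractional number of seconds."""
    if sys.platform == "win32":
        # Winsock's select() rejects a call without any sockets
        # (error 10022 in the thread), so fall back to time.sleep().
        time.sleep(seconds)
    else:
        # On POSIX systems, select() with three empty sets simply
        # waits until the timeout expires.
        select.select([], [], [], seconds)


if __name__ == "__main__":
    start = time.monotonic()
    subsecond_sleep(0.05)
    print("slept for about %.3f seconds" % (time.monotonic() - start))
```

The explicit platform check keeps the sketch from hitting the "invalid argument" error on Windows, at the cost of hiding the very behaviour being discussed.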
Thomas From mwh@python.net Tue Sep 24 16:21:05 2002 From: mwh@python.net (Michael Hudson) Date: 24 Sep 2002 16:21:05 +0100 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Tools/freeze extensions_win32.ini,1.5,1.6 In-Reply-To: mhammond@users.sourceforge.net's message of "Thu, 27 Jun 2002 18:13:04 -0700" References: Message-ID: <2mwupbmn1a.fsf@starship.python.net> mhammond@users.sourceforge.net writes: > Update of /cvsroot/python/python/dist/src/Tools/freeze > In directory usw-pr-cvs1:/tmp/cvs-serv26120 > > Modified Files: > extensions_win32.ini > Log Message: > Patch 574531/Bug 574570 - allow freeze on windows to use the _winreg > extension. Is this a bugfix? Cheers, M. -- This is an off-the-top-of-the-head-and-not-quite-sober suggestion, so is probably technically laughable. I'll see how embarrassed I feel tomorrow morning. -- Patrick Gosling, ucam.comp.misc From bkc@murkworks.com Tue Sep 24 16:22:18 2002 From: bkc@murkworks.com (Brad Clements) Date: Tue, 24 Sep 2002 11:22:18 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <200209241416.g8OEGd902355@odiug.zope.com> References: Your message of "Tue, 24 Sep 2002 15:24:41 +0200." <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> Message-ID: <3D904AF0.23984.4702510D@localhost> On 24 Sep 2002 at 10:16, Guido van Rossum wrote: > Yes, assignment to errno is fine. > Please see patch 505846. I haven't supplied this patch in proper form yet, but this discussion relates to the patch. I would like to remind folks that on some platforms, one cannot just use "errno = 0". On those platforms calling a function is required to set errno. The point of patch 505846 is to "standardize" the "errno = " function, and secondarily provide a way to "get" the errno. This is done in pyport.h and "all modules" that use or set errno. (not as many as you might think) It's an ugly patch that requires a lot of changes to the core.
I'm willing to make all the changes to the core as needed, once we figure out the best way to handle this issue is. In fact, it's this patch that is the principal cause of the "fork python ce" thread also recently discussed in this forum. See "Need advice: cloning python cvs for CE project" Windows CE doesn't allow setting errno. Neither does NetWare (CLIB). Is it worthwhile to discuss patch 505846 some more in this thread? Perhaps those who haven't read the comments on the patch have a clever solution? Or should I just clean up my patch, resubmit it and move on? I agree with Mark's post about keeping CE changes in the core. I'd rather do that. I submitted patch 505846 incorrectly and need to fix it.. But after it's submitted and if accepted, core developers would need to use Py_SetErrno instead of "errno = " And for extension developers. Using the macro would be nice, but it's less of an issue since CE and NetWare ports have to be done "by hand" anyway for these modules, we can make those changes as they're encountered. So .. discuss this, look for better insight, or resubmit the patch and move on? Thanks Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From guido@python.org Tue Sep 24 16:27:46 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 11:27:46 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Tue, 24 Sep 2002 17:15:04 +0200." <005101c263dd$2908a5b0$e000a8c0@thomasnotebook> References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> <000b01c263d9$4780ed80$e000a8c0@thomasnotebook> <200209241501.g8OF1ZN12148@odiug.zope.com> <005101c263dd$2908a5b0$e000a8c0@thomasnotebook> Message-ID: <200209241527.g8OFRkZ12485@odiug.zope.com> > > Why is the error different the first time? And why is this an > > error at all? 
> > The winsock library is not initialized the first time - it seems > that socketmodule calls WSAStartup(), but I haven't looked at > this in detail. Oh well, that makes some sense. > Also I think it's not worth to fix it, there's no use for select() > on windows if you don't use sockets - you have to supply at least > one socket descriptor (that's the cause for the second error above). OK. > Although it could be argued whether it makes sense to simulate > a Linux-compatible select for Windows. Nah, it's been like this for a decade. > > On Linux, this is not an error. (In fact, time.sleep() > > uses this to sleep using subsecond precision.) > > From my early Unix (actually Minix) experiments I remember > that select(3) was the only possibility to do subsecond delays > in Unix. Is this still the same today? Probably. HAVE_SELECT is the first thing tested in floatsleep(). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Sep 24 16:37:49 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 11:37:49 -0400 Subject: [Python-Dev] PEP 282 Implementation In-Reply-To: Your message of "Tue, 24 Sep 2002 08:56:33 BST." <003901c2639f$e7e21ae0$652b6992@alpha> References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> <006601c24e7f$fcff5440$652b6992@alpha> <200209231741.g8NHfEl11092@pcp02138704pcs.reston01.va.comcast.net> <01aa01c26379$8d6def60$4901000a@dorothy> <003901c2639f$e7e21ae0$652b6992@alpha> Message-ID: <200209241537.g8OFbnc13228@odiug.zope.com> > Chris McDonough wrote: > > It would be helpful for the FileHandler class to define a method > > which just closes and reopens the current logfile (instead of > > actually rotating a set like-named logfiles). This would allow > > logfile rotation to be performed by a separate process (e.g. > > RedHat's logrotate). 
Sometimes it's better (and even necessary) to > > be able to use system-provided log rotation facilities instead of > > relying on the native rotation facilities. > > I'm not sure whether this should be in the core functionality. I > presume you don't mean an atomic "close and reopen" operation - > rather, are you suggesting close the file, maybe rename it at the > application level, then reopen? If so, then it's best handled > entirely in the application level, through a subclass of > FileHandler. This allows each application to consider issues such as > what to do with events that occur between close and reopen (e.g. if > multiple threads are running). No, this is using Unix functionality where once you have opened a file, if the file is renamed, you can continue to write to it and you will be writing to the renamed file. IOW the open file is connected to the inode, not the filename. Typically an application catches SIGHUP (though that has its share of problems!) and in response simply closes and reopens the file, using the original filename. The sysadmin uses this as follows: mv foo.log foo.log.1 kill -HUP `cat foo.pid` Having looked at it again, I think that this is definitely better than doing log rotation in the FileHandler. The rotation code in the log handler currently calls tell() after each record is emitted. This is expensive, and not needed if you use an external process to watch over the log files and rotate them. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Tue Sep 24 16:35:00 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 24 Sep 2002 17:35:00 +0200 Subject: [Python-Dev] Assign to errno allowed? 
References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> <000b01c263d9$4780ed80$e000a8c0@thomasnotebook> <200209241501.g8OF1ZN12148@odiug.zope.com> <005101c263dd$2908a5b0$e000a8c0@thomasnotebook> <200209241527.g8OFRkZ12485@odiug.zope.com>
Message-ID: <008c01c263df$f2296090$e000a8c0@thomasnotebook>

> > Also I think it's not worth to fix it, there's no use for select()
> > on windows if you don't use sockets - you have to supply at least
> > one socket descriptor (that's the cause for the second error above).
>
> OK.
>
> > Although it could be argued whether it makes sense to simulate
> > a Linux-compatible select for Windows.
>
> Nah, it's been like this for a decade.

In the current form, it breaks asyncore - this is what
I wanted to fix in the first place.
asyncore contains this code snippet in the poll() function:

    try:
        r,w,e = select.select (r,w,e, timeout)
    except select.error, err:
        if err[0] != EINTR:
            raise
        r = []; w = []; e = []

This will fail on Windows if all of r,w,e are empty.
Even if there are active sockets, it may be that this
code is executed with all three lists empty.

How can this be fixed?

I have an SF item at http://www.python.org/sf/611464 discussing this.

Thomas

From guido@python.org Tue Sep 24 17:02:36 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 24 Sep 2002 12:02:36 -0400
Subject: [Python-Dev] Assign to errno allowed?
In-Reply-To: Your message of "Tue, 24 Sep 2002 17:35:00 +0200."
<008c01c263df$f2296090$e000a8c0@thomasnotebook>
References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <200209241416.g8OEGd902355@odiug.zope.com> <000b01c263d5$6a8721e0$e000a8c0@thomasnotebook> <000b01c263d9$4780ed80$e000a8c0@thomasnotebook> <200209241501.g8OF1ZN12148@odiug.zope.com> <005101c263dd$2908a5b0$e000a8c0@thomasnotebook> <200209241527.g8OFRkZ12485@odiug.zope.com> <008c01c263df$f2296090$e000a8c0@thomasnotebook>
Message-ID: <200209241602.g8OG2am13367@odiug.zope.com>

> > > Although it could be argued whether it makes sense to simulate
> > > a Linux-compatible select for Windows.
> >
> > Nah, it's been like this for a decade.
>
> In the current form, it breaks asyncore - this is what
> I wanted to fix in the first place.
> asyncore contains this code snippet in the poll() function:
>
>     try:
>         r,w,e = select.select (r,w,e, timeout)
>     except select.error, err:
>         if err[0] != EINTR:
>             raise
>         r = []; w = []; e = []
>
> This will fail on Windows if all of r,w,e are empty.

Aargh!!! Apparently asyncore has never worked properly on Windows.
Note that it also doesn't check for the Windows error codes on
connect().

> Even if there are active sockets, it may be that this
> code is executed with all three lists empty.

Yes.

> How can this be fixed?

Change poll() in asyncore.py to use this:

    if [] == r == w == e:
        time.sleep(timeout)
    else:
        try:
            r, w, e = select.select(r, w, e, timeout)
        except select.error, err:
            ...etc...

> I have an SF item at http://www.python.org/sf/611464 discussing this.

The conclusion there seems that select() should be fixed, but then
goes on to say that there's no easy way to make it interruptible.
Since we don't try to hide the differences between select on Windows
and on Unix in other areas (on Windows you can only select on sockets)
I'm not sure it's worth trying to fix select if you lose
interruptability; fixing asyncore instead is easy enough, and I don't
think this is going to bite too many other applications.
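
For readers on a current Python, the guard suggested above might be
sketched roughly as follows. This is only an illustration under stated
assumptions: poll_once is a hypothetical helper name, not the asyncore
code, and it uses Python 3 syntax rather than the 2.2-era syntax quoted
in the thread.

```python
import select
import time


def poll_once(readers, writers, excepts, timeout=0.0):
    """Wait up to `timeout` seconds and return the (r, w, e) ready lists.

    Hypothetical helper for illustration; not part of asyncore.
    """
    if not (readers or writers or excepts):
        # Nothing to watch. On Windows select() rejects three empty
        # lists, so just sleep for the same duration instead.
        time.sleep(timeout)
        return [], [], []
    try:
        r, w, e = select.select(readers, writers, excepts, timeout)
    except InterruptedError:
        # EINTR: report "nothing ready", as the quoted snippet does.
        return [], [], []
    return r, w, e
```

The empty-list check comes first so that select() is never reached
with nothing to wait on; the EINTR branch mirrors the quoted snippet,
although recent Python versions usually retry the call themselves.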
--Guido van Rossum (home page: http://www.python.org/~guido/) From barry@barrys-emacs.org Tue Sep 24 17:09:26 2002 From: barry@barrys-emacs.org (Barry Scott) Date: Tue, 24 Sep 2002 17:09:26 +0100 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <008c01c263df$f2296090$e000a8c0@thomasnotebook> Message-ID: <000001c263e4$c1181730$060210ac@private> select on windows is very limited. It is only allowed to be called with socket handles. You cannot use C RTL fd with it or another sort of handle. Because its part of winsock and not part of the C RTL so it cannot mess with errno itself. > HAVE_SELECT is the first thing tested in floatsleep(). HAVE_SELECT should probably be undefined on windows. With the expection that the sockets module for windows can use it. BArry From tim.one@comcast.net Tue Sep 24 17:31:54 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 24 Sep 2002 12:31:54 -0400 Subject: [Python-Dev] Assign to errno allowed? Message-ID: [Brad Clements] > ... > I would like to remind folks that on some platforms, one cannot > just use "errno = 0". On those platforms calling a function is > required to set errno. Except that errno=0 works fine on any platform with a standard- conforming C implementation, and this isn't even a fuzzy POSIX issue -- it's a requirement of the C standard. We can define piles of macros instead, but I expect that, as the years drag on, people will "forget" to use them. From gmcm@hypernet.com Tue Sep 24 17:27:42 2002 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 24 Sep 2002 12:27:42 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <200209241501.g8OF1ZN12148@odiug.zope.com> References: Your message of "Tue, 24 Sep 2002 16:47:17 +0200." 
<000b01c263d9$4780ed80$e000a8c0@thomasnotebook> Message-ID: <3D905A3E.29317.245048AE@localhost> While we're discussing the non-conformance of Window's select, these 2 errors: > > select.error: (10093, 'Either the application has not called > > WSAStartup, or WSAStartup failed') > > select.error: (10022, 'An invalid argument was supplied') are about the only errors you'll get from select on Windows. Where select would return a socket in the errors list on *nix, on Windows it will come out as readable / writeable, and it's the socket send / rcv that will find out what the problem is. Each version of winsock gets a bit better, but (for example), selecting for write in Win9x-era winsock is essentially a busy-wait. You'll get the socket back immediately, go to write and get the Window's EWOULDBLOCK error. but-heck-it-multitasks-ly-y'rs -- Gordon http://www.mcmillan-inc.com/ From guido@python.org Tue Sep 24 17:23:16 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 12:23:16 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Tue, 24 Sep 2002 17:09:26 BST." <000001c263e4$c1181730$060210ac@private> References: <000001c263e4$c1181730$060210ac@private> Message-ID: <200209241623.g8OGNGG13479@odiug.zope.com> > select on windows is very limited. It is only allowed to be called > with socket handles. You cannot use C RTL fd with it or another sort > of handle. > > Because its part of winsock and not part of the C RTL so it cannot mess > with errno itself. Yes I know. > > HAVE_SELECT is the first thing tested in floatsleep(). > > HAVE_SELECT should probably be undefined on windows. With the expection > that the sockets module for windows can use it. Sorry, floatsleep() on Windows doesn't ever get to testing HAVE_SELECT. So no worry, that part at least works. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Sep 24 17:21:13 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 12:21:13 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Tue, 24 Sep 2002 11:22:18 EDT." <3D904AF0.23984.4702510D@localhost> References: "Your message of Tue, 24 Sep 2002 15:24:41 +0200." <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <3D904AF0.23984.4702510D@localhost> Message-ID: <200209241621.g8OGLD113452@odiug.zope.com> > > Yes, assignment to errno is fine. > > Please see patch 505846. > > I haven't supplied this patch in proper form yet, but this > discussion relates to the patch. > > I would like to remind folks that on some platforms, one cannot just > use "errno = 0". On those platforms calling a function is required > to set errno. Shucks. That's in violation of the ISO C standard. > The point of patch 505846 is to "standardized" the "errno = " > function, and secondarily provide a way to "get" the errno. This is > done in pyport.h and "all modules" that use or set errno. (not as > many as you might think) Why also provide an alternative way to get it? Sure you can *get* it even on Win/CE? > It's an ugly patch, requires a lot of changes to the core. I'm > willing to make all the changes to the core as needed, once we > figure out the best way to handle this issue is. I have a strong urge to tell you to start porting Linux to your CE hardware rather than bothering with Win/CE. Or buy an iPAQ for which Linux is already available. > In fact, it's this patch that is the principal cause of the "fork > python ce" thread also recently discussed in this forum. See "Need > advice: cloning python cvs for CE project" I've given all the advice I have time for. > Windows CE doesn't allow setting errno. Neither does NetWare (CLIB). Sigh. > Is it worthwhile to discuss patch 505846 some more in this thread? 
> Perhaps those who haven't read the comments on the patch have a > clever solution? > > Or should I just clean up my patch, resubmit it and move on? > > I agree with Mark's post about keeping CE changes in the core. I'd > rather do that. I submitted patch 505846 incorrectly and need to fix > it.. But after it's submitted and if accepted, core developers would > need to use Py_SetErrno instead of "errno = " Except in extensions that don't have a snowball in hell's chance of working on Win/CE, of course. > And for extension developers. Using the macro would be nice, but > it's less of an issue since CE and NetWare ports have to be done "by > hand" anyway for these modules, we can make those changes as they're > encountered. > > So .. discuss this, look for better insight, or resubmit the patch > and move on? As I said, I have a very strong urge to tell you to go away. But I won't. But I really don't like the idea of coding around this particular platform's quirks. --Guido van Rossum (home page: http://www.python.org/~guido/) From bkc@murkworks.com Tue Sep 24 17:43:13 2002 From: bkc@murkworks.com (Brad Clements) Date: Tue, 24 Sep 2002 12:43:13 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Message-ID: <3D905DE7.6823.474C66D7@localhost> On 24 Sep 2002 at 12:31, Tim Peters wrote: > > I would like to remind folks that on some platforms, one cannot > > just use "errno = 0". On those platforms calling a function is > > required to set errno. > > Except that errno=0 works fine on any platform with a standard- > conforming C implementation, and this isn't even a fuzzy POSIX issue -- > it's a requirement of the C standard. We can define piles of macros > instead, but I expect that, as the years drag on, people will "forget" > to use them. I don't argue the point that "this stinks". What are our choices: 1. 
CE port will always be a distinct branch/other cvs, and require gobs of work by "CE porters" for every new core release, changing all the "errno" statements 2. CE changes will always be a pain in the butt for core developers forced to remember Py_SetErrno() 3. No CE port. (oh, also put "NetWare" in there wherever you see the word CE .. and I suspect some other embedded operating systems running on MMU-less processors that can't virtualize errno and don't have TLS) Surprisingly, there aren't that many modules that reference errno directly. In fact, the math stuff is the worst offender .. ;-) Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From martin@v.loewis.de Tue Sep 24 15:30:24 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 24 Sep 2002 16:30:24 +0200 Subject: [Python-Dev] bug 576990 In-Reply-To: <3D904828.766779BE@strw.leidenuniv.nl> References: <3D904828.766779BE@strw.leidenuniv.nl> Message-ID: Roeland Rengelink writes: > 5. This is clearly a profound and interesting bug, but solving this > seems to involve cans of worms, ten-foot poles, and a re-write of the > core. To me, it sounds like this. This has been changed forth and back, and in every state, somebody is unhappy. Regards, Martin From guido@python.org Tue Sep 24 18:44:14 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 13:44:14 -0400 Subject: [Python-Dev] bug 576990 In-Reply-To: Your message of "Tue, 24 Sep 2002 16:30:24 +0200." References: <3D904828.766779BE@strw.leidenuniv.nl> Message-ID: <200209241744.g8OHiEk28139@odiug.zope.com> > Roeland Rengelink writes: > > > 5. This is clearly a profound and interesting bug, but solving this > > seems to involve cans of worms, ten-foot poles, and a re-write of the > > core. [Martin] > To me, it sounds like this. This has been changed forth and back, and > in every state, somebody is unhappy. Yes, it's very messy, see my comments to the SF bug entry. 
I see no fix that doesn't break something else.

Note that this "worked" in the initial 2.2 release only when the
subclass didn't have a docstring of its own:

    >>> class P(property):
    ...     "This is class P"
    ...
    >>> p = P(None, None, None, "this is property p")
    >>> p.__doc__
    'This is class P'
    >>>

The best workaround I can see that works everywhere is:

    class P(property):
        "class P's docstring"
        __doc__ = property.__dict__['__doc__']

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik@pythonware.com Tue Sep 24 19:46:23 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 24 Sep 2002 20:46:23 +0200
Subject: [Python-Dev] Assign to errno allowed?
References: <3D905DE7.6823.474C66D7@localhost>
Message-ID: <004101c263fa$b1c88ec0$ced241d5@hagrid>

brad wrote:

> (oh, also put "NetWare" in there wherever you see the word CE .. and I
> suspect some other embedded operating systems running on MMU-less
> processors that can't virtualize errno and don't have TLS)

that's why the specification says that "errno" might be a macro,
and why many platforms define that macro to be something like:

    #define errno (*_errno())

From guido@python.org Tue Sep 24 19:51:49 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 24 Sep 2002 14:51:49 -0400
Subject: [Python-Dev] Assign to errno allowed?
In-Reply-To: Your message of "Tue, 24 Sep 2002 12:43:13 EDT." <3D905DE7.6823.474C66D7@localhost>
References: <3D905DE7.6823.474C66D7@localhost>
Message-ID: <200209241851.g8OIpn328512@odiug.zope.com>

> What are our choices:
>
> 1. CE port will always be a distinct branch/other cvs, and require
> gobs of work by "CE porters" for every new core release, changing
> all the "errno" statements
>
> 2. CE changes will always be a pain in the butt for core developers
> forced to remember Py_SetErrno()
>
> 3. No CE port.
>
> (oh, also put "NetWare" in there wherever you see the word CE ..
and > I suspect some other embedded operating systems running on MMU-less > processors that can't virtualize errno and don't have TLS) > > Surprisingly, there aren't that many modules that reference errno directly. > > In fact, the math stuff is the worst offender .. ;-) I'm strongly against 2. 5 years from now, CE and NetWare and their limitations will only be a vague memory, but this convention will still cripple the Python source code. --Guido van Rossum (home page: http://www.python.org/~guido/) From bkc@murkworks.com Tue Sep 24 20:04:40 2002 From: bkc@murkworks.com (Brad Clements) Date: Tue, 24 Sep 2002 15:04:40 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <004101c263fa$b1c88ec0$ced241d5@hagrid> Message-ID: <3D907F0D.19031.47CDE6D8@localhost> On 24 Sep 2002 at 20:46, Fredrik Lundh wrote: > brad wrote: > > > (oh, also put "NetWare" in there wherever you see the word CE .. and I > > suspect some other embedded operating systems running on MMU-less > > processors that can't virtualize errno and don't have TLS) > > that's why the specification says that "errno" might be a macro, > and why many platforms define that macro to be something like: > > #define errno (*_errno()) Yes, but still on CE you cannot get a pointer to the thread specific errno. You cannot take it's address. So .. errno is #define errno GetLastError() I wish this were not true. Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From barry@barrys-emacs.org Tue Sep 24 20:10:31 2002 From: barry@barrys-emacs.org (Barry Scott) Date: Tue, 24 Sep 2002 20:10:31 +0100 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> Message-ID: <000a01c263fe$0d796de0$060210ac@private> Windows CE prevents assignment to errno... There would be a solution if you compiled all the code as C++. (Assuming that C++ reserved words are not used in the python code.) 
Inject the following definitions:

class ErrnoHack
{
public:
    operator int();               // return errno value
    ErrnoHack &operator=( int );  // assign to errno
};

ErrnoHack ErrnoObject;

#define errno ErrnoObject

and you can then write

    errno = 0;

BArry

From bkc@murkworks.com Tue Sep 24 20:31:23 2002
From: bkc@murkworks.com (Brad Clements)
Date: Tue, 24 Sep 2002 15:31:23 -0400
Subject: [Python-Dev] Assign to errno allowed?
In-Reply-To: <200209241851.g8OIpn328512@odiug.zope.com>
References: Your message of "Tue, 24 Sep 2002 12:43:13 EDT." <3D905DE7.6823.474C66D7@localhost>
Message-ID: <3D908550.9509.47E65C8B@localhost>

On 24 Sep 2002 at 14:51, Guido van Rossum wrote:

> I'm strongly against 2. 5 years from now, CE and NetWare and their
> limitations will only be a vague memory, but this convention will
> still cripple the Python source code.

No arguments about CE, anyway .. (noting that NetWare has reached its
ten year anniversary ;-)

I guess then the best solution is a distinct CVS for "the port to
oddball platforms"

or, the other option is lots of #ifdefs in the code.

The original reason I proposed the macro idea was to eliminate
multiple nested #ifdefs..

For example, I had trouble figuring out nested #ifdefs in
posixmodule, as generated by Mark in his CE port.. It's awful.

By switching to macros for "errno =" I was able to clean up a lot of
the #ifdefs

If changes for CE (and other errno-less OS's) are to be kept in the
core, then we'll either have (from Modules/cpickle.c)

#ifndef WINDOWSCE
    errno = 0;
#else
    SetLastError(0);
#endif
    l = strtol(s, &endptr, 0);

#ifndef WINDOWSCE
    if (errno || (*endptr != '\n') || (endptr[1] != '\0')) {
#else
    if (GetLastError() || (*endptr != '\n') || (endptr[1] != '\0')) {
#endif
        /* Hm, maybe we've got something long. Let's try reading
           it as a Python long object. */
#ifndef WINDOWSCE
        errno = 0;
#else
        SetLastError(0);
#endif

--- Or ---

    Py_SetErrno(0);
    l = strtol(s, &endptr, 0);

    if (Py_GetErrno() || (*endptr != '\n') || (endptr[1] != '\0')) {
        /* Hm, maybe we've got something long. Let's try reading
           it as a Python long object. */
        Py_SetErrno(0);

Keeping in mind that adding NetWare or (other embedded OS or BIOS
that wants to play) then the #ifdef version gets much worse. The
macro version doesn't change.

There are approximately 140 references to errno in the Modules
directory alone. For the Alpha port of Python 2.2 to CE I changed
every one of them (at least for any module that could run on CE,
which is just about everything that runs on Win32)

I've already said that I've made these changes and am willing to
make them all again. Once they're in there, how difficult will it
be to keep them?

New code that uses errno will fail on subsequent builds on these
errno-less platforms, but there will only be a handful of changes
needed, rather than hundreds on every release of the core.

So .. developers who just write "errno = " in the future won't be
penalized, rather the porters to errno-less platforms will just have
to convert the expression to macro mode.

And that conversion process isn't a hardship if it only has to be
done once for any given line of code on any given core release.. But
if we (porters) have to change 140 references every single time
there's a release.. I could see enthusiasm fading away faster than
those errno-less platforms ;-)

Clearly you guys know what's best better than I do. My line of
reasoning for errno macros was:

1. on most platforms the macros compile away to what you'd write in
   C anyway

2. I'd generate and submit all the initial patches to the core to
   switch errno references to macros, leaving the burden of review
   and checkin to the core team (sorry) but not the burden of
   finding and changing all errno references

3.
But once the patches are in-place, future releases wouldn't require nearly as much effort for re-port's to errno-less platforms because only a few lines would need to be "fixed up" to use macros, and only if the changed code didn't use the macros in the first place. 4. Not using the macros in core or extension source isn't an issue for any platform, except errno-less OS's.. at which time that code gets macro'ized at the time of the port. 5. What this does is reduces effort for future ports to crippled systems, at the expense of many initial changes, whose subsequent maintenence shouldn't (hopefully) be a burden, since ports to crippled systems would maintain the changes. Though I do agree, a future mismash of macro's and direct errno references in the core will be ugly and confusing if that occurs. (sorry this is so long, just want to clearly state my case if I have not already done so) Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From guido@python.org Wed Sep 25 02:29:19 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 24 Sep 2002 21:29:19 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Tue, 24 Sep 2002 15:31:23 EDT." <3D908550.9509.47E65C8B@localhost> References: "Your message of Tue, 24 Sep 2002 12:43:13 EDT." <3D905DE7.6823.474C66D7@localhost> <3D908550.9509.47E65C8B@localhost> Message-ID: <200209250129.g8P1TJp26117@pcp02138704pcs.reston01.va.comcast.net> [Brad, would you mind limiting your messages to lines of about 72 characters at most?] > I guess then the best solution is a distinct CVS for "the port to > oddball platforms" > > or, the other option is lots of #ifdefs in the code. > > The original reason I proposed the macro idea was to eliminate > multiple nested #ifdefs.. > > For example, I had trouble figuring out nested #ifdefs in > posixmodule, as generated by Mark in his CE port.. It's awful. 
>
> By switching to macros for "errno =" I was able to clean up a lot of
> the #ifdefs

Absolutely. *If* you want to tackle this, macros for setting errno
are the way to go.

> If changes for CE (and other errno-less OS's) are to be kept in the
> core, then we'll either have (from Modules/cpickle.c)
>
> #ifndef WINDOWSCE
>     errno = 0;
> #else
>     SetLastError(0);
> #endif
>     l = strtol(s, &endptr, 0);
>
> #ifndef WINDOWSCE
>     if (errno || (*endptr != '\n') || (endptr[1] != '\0')) {
> #else
>     if (GetLastError() || (*endptr != '\n') || (endptr[1] != '\0')) {
> #endif
>         /* Hm, maybe we've got something long. Let's try reading
>            it as a Python long object. */
> #ifndef WINDOWSCE
>         errno = 0;
> #else
>         SetLastError(0);
> #endif
>
> --- Or ---
>
>     Py_SetErrno(0);
>     l = strtol(s, &endptr, 0);
>
>     if (Py_GetErrno() || (*endptr != '\n') || (endptr[1] != '\0')) {
>         /* Hm, maybe we've got something long. Let's try reading
>            it as a Python long object. */
>         Py_SetErrno(0);

Question. You showed that errno was #defined as a call to the right
function. Why don't you leave *getting* errno alone? You talk of
100s of places using errno. But how many places *set* errno?

> Keeping in mind that adding NetWare or (other embedded OS or BIOS
> that wants to play) then the #ifdef version gets much worse. The
> macro version doesn't change.

Brad, nobody said they preferred the #ifdef version over the macro!
The question is simply whether to use the macro or keep using errno,
in tune with Standard C.

> There are approximately 140 references to errno in the Modules
> directory alone. For the Alpha port of Python 2.2 to CE I changed
> every one of them (at least for any module that could run on CE,
> which is just about everything that runs on Win32)
>
> I've already said that I've made these changes and am willing to
> make them all again. Once they're in there, how difficult will it
> be to keep them?
Experience shows that each new release will be broken for your platform anyway unless you actively maintain it as we gear up for a release. Even between alpha or beta releases the code base is likely to change in some subtle way that breaks your release. So you'll have to chase down new uses of errno assignment leading up to each release. > New code that uses errno will fail on subsequent builds on these > errno-less platforms, but there will only be a handful of changes > needed, rather than hundreds on every release of the core. If you maintain a branch that uses the errno macro, you could merge the trunk into that branch each time you feel like synching up with the trunk. That's a mostly mechanical process, certainly less than fixing 100s of errno uses manually each time. Or you could simply maintain a patch in the form of a context diff that patches the 100s of places using errno -- assuming this is mostly in stable code, you'd only have to fix up a handful of new occurrences and places where the patch context has gotten out of sync. > So .. developers who just write "errno = " in the future won't be > penalized, rather the porters to errno-less platforms will just have > to convert the expression to macro mode. They *would* be penalized, because you have to fix their code, and then they have to test it again, etc. It's yet one more thing to worry about. > And that conversion process isn't a hardship if it only has to be > done once for any given line of code on any given core release.. But > if we (porters) have to change 140 references every single time > there's a release.. I could see enthusiasm fading away faster than > those errno-less platforms ;-) > > Clearly you guys know what's best better than I do. My line of > reasoning for errno macros was: > > 1. on most platforms the macros compile away to what you'd write in > C anyway > > 2. 
I'd generate and submit all the initial patches to the core to > switch errno references to macros, leaving the burden of review > and checkin to the core team (sorry) but not the burden of > finding and changing all errno references That's another problem. Whenever there's a massive "peephole" change like this, there are always a few places that are broken but that no reviewer notices and that don't happen to be tested by the test suite. (After all, errno is only consulted when an error occurs, and some errors are darn hard to provoke.) > 3. But once the patches are in-place, future releases wouldn't > require nearly as much effort for re-port's to errno-less > platforms because only a few lines would need to be "fixed up" to > use macros, and only if the changed code didn't use the macros in > the first place. > > 4. Not using the macros in core or extension source isn't an issue > for any platform, except errno-less OS's.. at which time that > code gets macro'ized at the time of the port. > > 5. What this does is reduces effort for future ports to crippled > systems, at the expense of many initial changes, whose subsequent > maintenence shouldn't (hopefully) be a burden, since ports to > crippled systems would maintain the changes. > > Though I do agree, a future mismash of macro's and direct errno > references in the core will be ugly and confusing if that occurs. > > (sorry this is so long, just want to clearly state my case if I have > not already done so) I totally understand your case. I just don't like having to avoid something that's legal according to the C Standard because of backward platforms. Sure, there were platform-specific changes for many other minority platforms. But none of then AFAIK required us to change something that's Standard C -- these changes were usually about system calls or filename conventions. And we bend over for Win32 because it's the dominant platform. 
For handhelds, I expect that WinCE will be replaced by something less broken soon. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Wed Sep 25 11:21:46 2002 From: mwh@python.net (Michael Hudson) Date: 25 Sep 2002 11:21:46 +0100 Subject: [Python-Dev] httplib.py vs 222 Message-ID: <2mvg4ul685.fsf@starship.python.net> I'm tempted to just dump the trunk's version of httplib.py onto the release branch. Poring over logs reveals that nearly every checkin is marked as a bugfix candidate (occasionally with a question mark). Certainly it seems it would be easier to work the checkins that aren't bugfixes out of the trunk than work those that are into the branch, if you see what I mean. But I've never used httplib, so I thought I should ask here first (and Cc: the people responsible for most of the changes). Comments? Cheers, M. -- I've even been known to get Marmite *near* my mouth -- but never actually in it yet. Vegamite is right out. UnicodeError: ASCII unpalatable error: vegamite found, ham expected -- Tim Peters, comp.lang.python From rengelin@strw.leidenuniv.nl Wed Sep 25 11:46:15 2002 From: rengelin@strw.leidenuniv.nl (Roeland Rengelink) Date: Wed, 25 Sep 2002 12:46:15 +0200 Subject: [Python-Dev] bug 576990 References: <3D904828.766779BE@strw.leidenuniv.nl> <200209241744.g8OHiEk28139@odiug.zope.com> Message-ID: <3D9193F7.4976E479@strw.leidenuniv.nl> Guido van Rossum wrote: > > > Roeland Rengelink writes: > > > > > 5. This is clearly a profound and interesting bug, but solving this > > > seems to involve cans of worms, ten-foot poles, and a re-write of the > > > core. > > [Martin] > > To me, it sounds like this. This has been changed forth and back, and > > in every state, somebody is unhappy. > > Yes, it's very messy, see my comments to the SF bug entry. I see no > fix that doesn't break something else.
>
> Note that this "worked" in the initial 2.2 release only when the
> subclass didn't have a docstring of its own:
>
>   >>> class P(property):
>   ...     "This is class P"
>   ...
>   >>> p = P(None, None, None, "this is property p")
>   >>> p.__doc__
>   'This is class P'
>   >>>
>
> The best workaround I can see that works everywhere is:
>
>   class P(property):
>       "class P's docstring"
>       __doc__ = property.__dict__['__doc__']
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)

Thanks for the response and thanks for the workaround. It does solve my immediate problem, and I can live with losing "class P's docstring" in pydoc.

I wish I could do more to help though,

Roeland

From bkc@murkworks.com Wed Sep 25 15:20:21 2002
From: bkc@murkworks.com (Brad Clements)
Date: Wed, 25 Sep 2002 10:20:21 -0400
Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed?
In-Reply-To: <000a01c263fe$0d796de0$060210ac@private>
References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook>
Message-ID: <3D918DE6.582.3CFA8D6@localhost>

On 24 Sep 2002 at 20:10, Barry Scott wrote:

> Windows CE prevents assignment to errno...
>
> There would be a solution if you compiled all the code as C++.
> (Assuming that C++ reserved words are not used in the python code.)

Oh that's the other problem I ran into.

The core .c has a lot of

    goto finally;


finally:

In C mode, this shouldn't matter. But MS's EVT compiler considers "finally" to be a reserved word. I had to change all of these too.
(I used local_finally or some such thing) Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From jeremy@alum.mit.edu Wed Sep 25 15:44:10 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 25 Sep 2002 10:44:10 -0400 Subject: [Python-Dev] Re: httplib.py vs 222 In-Reply-To: <2mvg4ul685.fsf@starship.python.net> References: <2mvg4ul685.fsf@starship.python.net> Message-ID: <15761.52154.3349.23866@slothrop.zope.com> >>>>> "MH" == Michael Hudson writes: MH> I'm tempted to just dump the trunk's version of httplib.py onto MH> the release branch. Poring over logs reveals that nearly every MH> checkin is marked as a bugfix candidate (occasionally with a MH> question mark). MH> Certainly it seems ot would be easier to work the checkins that MH> aren't bugfixes out of the trunk than work those that are into MH> the branch, if you see what I mean. MH> But I've never used httplib, so I thought I should ask here MH> first (and Cc: the people responsible for most of the changes). MH> Comments? I think it makes sense to make httplib identical. The changes to httplib have all been intended to make it more robust. My one worry is that a set of changes I made may have broken pipelined https requests in order to fix a different set of bugs. I had intended to check whether pipelined https requests actually worked in 2.2.1. Jeremy From mwh@python.net Wed Sep 25 16:14:25 2002 From: mwh@python.net (Michael Hudson) Date: 25 Sep 2002 16:14:25 +0100 Subject: [Python-Dev] Re: httplib.py vs 222 In-Reply-To: Jeremy Hylton's message of "Wed, 25 Sep 2002 10:44:10 -0400" References: <2mvg4ul685.fsf@starship.python.net> <15761.52154.3349.23866@slothrop.zope.com> Message-ID: <2madm6qey6.fsf@starship.python.net> Jeremy Hylton writes: > >>>>> "MH" == Michael Hudson writes: > > MH> I'm tempted to just dump the trunk's version of httplib.py onto > MH> the release branch. 
Poring over logs reveals that nearly every > MH> checkin is marked as a bugfix candidate (occasionally with a > MH> question mark). > > MH> Certainly it seems it would be easier to work the checkins that > MH> aren't bugfixes out of the trunk than work those that are into > MH> the branch, if you see what I mean. > > MH> But I've never used httplib, so I thought I should ask here > MH> first (and Cc: the people responsible for most of the changes). > > MH> Comments? > > I think it makes sense to make httplib identical. The changes to > httplib have all been intended to make it more robust. > > My one worry is that a set of changes I made may have broken pipelined > https requests in order to fix a different set of bugs. I had > intended to check whether pipelined https requests actually worked in > 2.2.1. Ooh! Delegation! This is now your problem :) If you (or someone else) don't get to it before 2.2.2's release date, I'll just dump the trunk's version into the branch. Cheers, M. -- Premature optimization is the root of all evil. -- Donald E. Knuth, Structured Programming with goto Statements From guido@python.org Wed Sep 25 16:32:00 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 25 Sep 2002 11:32:00 -0400 Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Wed, 25 Sep 2002 10:20:21 EDT." <3D918DE6.582.3CFA8D6@localhost> References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <3D918DE6.582.3CFA8D6@localhost> Message-ID: <200209251532.g8PFW0O02531@odiug.zope.com> > Oh that's the other problem I ran into. > > The core .c has a lot of > > goto finally; > > > finally: > > In C mode, this shouldn't matter. But MS's EVT compiler considers > "finally" to be a reserved word. I had to change all of these > too. (I used local_finally or some such thing) This compiler seems to fly in the face of the C std whenever it can. What are they trying to accomplish?
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Sep 25 16:31:29 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 25 Sep 2002 10:31:29 -0500 Subject: [Python-Dev] Re: httplib.py vs 222 In-Reply-To: <2mvg4ul685.fsf@starship.python.net> References: <2mvg4ul685.fsf@starship.python.net> Message-ID: <15761.54993.969097.285258@12-248-11-90.client.attbi.com> mwh> I'm tempted to just dump the trunk's version of httplib.py onto the mwh> release branch. Poring over logs reveals that nearly every checkin mwh> is marked as a bugfix candidate (occasionally with a question mwh> mark). ... mwh> But I've never used httplib, so I thought I should ask here first mwh> (and Cc: the people responsible for most of the changes). The one change I applied to httplib since the 2.2 release was to fix a problem with invalid urls. If a colon follows the server name but is followed by a non-numeric string or no string at all before the start of the path, an InvalidURL exception is raised. This is definitely a bugfix candidate. Jeremy's hand has been on that module much more heavily. I think it should probably be his call. Skip From bkc@murkworks.com Wed Sep 25 16:35:56 2002 From: bkc@murkworks.com (Brad Clements) Date: Wed, 25 Sep 2002 11:35:56 -0400 Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: <200209251532.g8PFW0O02531@odiug.zope.com> References: Your message of "Wed, 25 Sep 2002 10:20:21 EDT." <3D918DE6.582.3CFA8D6@localhost> Message-ID: <3D919F9D.20505.414DCD1@localhost> On 25 Sep 2002 at 11:32, Guido van Rossum wrote: > > In C mode, this shouldn't matter. But MS's EVT compiler considers > > "finally" to be a reserved word. I had to change all of these > > too. (I used local_finally or some such thing) > > This compiler seems to fly in the face of the C std whenever it can. > What are they trying to accomplish? World domination, what else? 
-- I'm not sure, but I think my latest version of Metrowerks has the same issue. In any case, using "finally" eliminates the possibility of compiling the core as C++, regardless of the compiler being used. I thought I remember seeing a thread on the subject of C++ compilation somewhere. So, the suggested Errno class hack would work, except I still have to change all the finally's. I suspect there are other issues with C++ compilation that I am not aware of. I'm not proposing anything specific here, just rambling. I need to answer your other post. Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From ark@research.att.com Wed Sep 25 16:38:43 2002 From: ark@research.att.com (Andrew Koenig) Date: 25 Sep 2002 11:38:43 -0400 Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: <200209251532.g8PFW0O02531@odiug.zope.com> References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook> <3D918DE6.582.3CFA8D6@localhost> <200209251532.g8PFW0O02531@odiug.zope.com> Message-ID: Guido> This compiler seems to fly in the face of the C std whenever it can. Guido> What are they trying to accomplish? To extend the language in ways that lock customers into their platform. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From mwh@python.net Wed Sep 25 16:41:44 2002 From: mwh@python.net (Michael Hudson) Date: Wed, 25 Sep 2002 16:41:44 +0100 (BST) Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: <3D919F9D.20505.414DCD1@localhost> Message-ID: On Wed, 25 Sep 2002, Brad Clements wrote: > In any case, using "finally" eliminates the possibility of compiling the > core as C++, regardless of the compiler being used. Not in this universe. I think there are already bigger barriers in the way of compiling the Python source as C# or Java... Cheers, M. 
From bkc@murkworks.com Wed Sep 25 16:44:11 2002 From: bkc@murkworks.com (Brad Clements) Date: Wed, 25 Sep 2002 11:44:11 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <200209250129.g8P1TJp26117@pcp02138704pcs.reston01.va.comcast.net> References: Your message of "Tue, 24 Sep 2002 15:31:23 EDT." <3D908550.9509.47E65C8B@localhost> Message-ID: <3D91A18C.28183.41C688A@localhost> On 24 Sep 2002 at 21:29, Guido van Rossum wrote: > Question. You showed that errno was #defined as a call to the right > function. Why don't you leave *getting* errno alone? Sorry I forgot to clarify that part. Windows CE 1 and 2 have "errno", but in CE 3.0 they improved the OS by eliminating errno and replacing it with GetLastError() I suspect this was done to allow CE to be embedded on new processor types that would not otherwise be supported. > You talk of 100s of places using errno. But how many places *set* > errno? In the modules dir, grep shows: File cmathmodule.c: Py_SetErrno(0); File cPickle.c: Py_SetErrno(0); Py_SetErrno(0); Py_SetErrno(0); File mathmodule.c: Py_SetErrno(0); Py_SetErrno(0); Py_SetErrno(0); Py_SetErrno(0); Py_SetErrno(0); But alas for the GetLastError() issue, this wouldn't be so bad. > If you maintain a branch that uses the errno macro, you could merge > the trunk into that branch each time you feel like synching up with > the trunk. That's a mostly mechanical process, certainly less than > fixing 100s of errno uses manually each time. I agree. I think this will be the best way to go. > That's another problem. Whenever there's a massive "peephole" change > like this, there are always a few places that are broken but that no > reviewer notices and that don't happen to be tested by the test suite. > (After all, errno is only consulted when an error occurs, and some errors > are darn hard to provoke.) I hadn't considered that aspect of the issue. -- Seems then that creating a new SF project to hold a "derivative work?" 
of the core is the best way to go, but the only difference in this work is a) using macros for errno b) changing "finally" labels to something else. Anyone have a good suggestion for the name of this proposed project? (or, would it be a branch of the core? I'm sorry, I'm still a cvs virgin) I don't think making it CE specific is correct, since it would also be used for NetWare. (oh, did I say 10 years for NetWare? I meant 19) Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From ark@research.att.com Wed Sep 25 16:44:15 2002 From: ark@research.att.com (Andrew Koenig) Date: 25 Sep 2002 11:44:15 -0400 Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: <3D919F9D.20505.414DCD1@localhost> References: <3D918DE6.582.3CFA8D6@localhost> <3D919F9D.20505.414DCD1@localhost> Message-ID: Brad> In any case, using "finally" eliminates the possibility of Brad> compiling the core as C++, regardless of the compiler being Brad> used. "finally" is not a keyword in standard C++. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From bkc@murkworks.com Wed Sep 25 16:45:39 2002 From: bkc@murkworks.com (Brad Clements) Date: Wed, 25 Sep 2002 11:45:39 -0400 Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: References: <3D919F9D.20505.414DCD1@localhost> Message-ID: <3D91A1E4.7608.41DC28B@localhost> On 25 Sep 2002 at 16:41, Michael Hudson wrote: > On Wed, 25 Sep 2002, Brad Clements wrote: > > > In any case, using "finally" eliminates the possibility of compiling the > > core as C++, regardless of the compiler being used. > > Not in this universe. I think there are already bigger barriers in the way > of compiling the Python source as C# or Java... Uh, how'd we go from C++ to C#/Java ? 
Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From pedronis@bluewin.ch Wed Sep 25 16:36:47 2002 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Wed, 25 Sep 2002 17:36:47 +0200 Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? References: <3D919F9D.20505.414DCD1@localhost> <3D91A1E4.7608.41DC28B@localhost> Message-ID: <003501c264a9$5c46f540$6d94fea9@newmexico> From: Brad Clements > On 25 Sep 2002 at 16:41, Michael Hudson wrote: > > > On Wed, 25 Sep 2002, Brad Clements wrote: > > > > > In any case, using "finally" eliminates the possibility of compiling the > > > core as C++, regardless of the compiler being used. > > > > Not in this universe. I think there are already bigger barriers in the way > > of compiling the Python source as C# or Java... > > Uh, how'd we go from C++ to C#/Java ? C++ to C#, I think first passing through managed C++, one of the latest MS inventions . regards. From bkc@murkworks.com Wed Sep 25 16:50:30 2002 From: bkc@murkworks.com (Brad Clements) Date: Wed, 25 Sep 2002 11:50:30 -0400 Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: <200209251537.LAA00118@anvil.murkworks.com> References: <3D91A1E4.7608.41DC28B@localhost> Message-ID: <3D91A307.29702.42231ED@localhost> On 25 Sep 2002 at 17:48, Alex Martelli wrote: > On Wednesday 25 September 2002 05:45 pm, you wrote: > > On 25 Sep 2002 at 16:41, Michael Hudson wrote: > > > On Wed, 25 Sep 2002, Brad Clements wrote: > > > > In any case, using "finally" eliminates the possibility of compiling > > > > the core as C++, regardless of the compiler being used. > > > > > > Not in this universe. I think there are already bigger barriers in the > > > way of compiling the Python source as C# or Java... > > > > Uh, how'd we go from C++ to C#/Java ? > > "finally" is a reserved word in Java and C# (and Python:-), but not in C++. 
> > > Alex Thanks for the clarification. This is what I get for using tools from the evil empire -- I lose my perspective of standards! ;-) Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements From guido@python.org Wed Sep 25 17:16:42 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 25 Sep 2002 12:16:42 -0400 Subject: [Python-Dev] Re: httplib.py vs 222 In-Reply-To: Your message of "Wed, 25 Sep 2002 16:14:25 BST." <2madm6qey6.fsf@starship.python.net> References: <2mvg4ul685.fsf@starship.python.net> <15761.52154.3349.23866@slothrop.zope.com> <2madm6qey6.fsf@starship.python.net> Message-ID: <200209251616.g8PGGgM11638@odiug.zope.com> > If you (or someone else) don't get to it before 2.2.2's release date, > I'll just dump the trunk's version into the branch. Why don't you do that now, so we don't forget that part. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Sep 25 17:32:29 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 25 Sep 2002 12:32:29 -0400 Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: Your message of "Wed, 25 Sep 2002 11:44:11 EDT." <3D91A18C.28183.41C688A@localhost> References: "Your message of Tue, 24 Sep 2002 15:31:23 EDT." <3D908550.9509.47E65C8B@localhost> <3D91A18C.28183.41C688A@localhost> Message-ID: <200209251632.g8PGWT811833@odiug.zope.com> > > Question. You showed that errno was #defined as a call to the right > > function. Why don't you leave *getting* errno alone? > > Sorry I forgot to clarify that part. > > Windows CE 1 and 2 have "errno", but in CE 3.0 they improved > the OS by eliminating errno and replacing it with > GetLastError() > > I suspect this was done to allow CE to be embedded on new > processor types that would not otherwise be supported. > > > > You talk of 100s of places using errno. But how many places *set* > > errno? 
>
> In the modules dir, grep shows:
>
> File cmathmodule.c:
>     Py_SetErrno(0);
> File cPickle.c:
>     Py_SetErrno(0);
>     Py_SetErrno(0);
>     Py_SetErrno(0);
> File mathmodule.c:
>     Py_SetErrno(0);
>     Py_SetErrno(0);
>     Py_SetErrno(0);
>     Py_SetErrno(0);
>     Py_SetErrno(0);
>
> But alas for the GetLastError() issue, this wouldn't be so bad.

Well, *that* is easily solved in pyport.h:

#ifdef ...WINCE...
#ifndef errno
#define errno GetLastError()
#endif
#endif

Much better than changing every use of errno, isn't it? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mwh@python.net Wed Sep 25 18:01:14 2002
From: mwh@python.net (Michael Hudson)
Date: 25 Sep 2002 18:01:14 +0100
Subject: [Python-Dev] Re: httplib.py vs 222
References: <2mvg4ul685.fsf@starship.python.net> <15761.52154.3349.23866@slothrop.zope.com> <2madm6qey6.fsf@starship.python.net> <200209251616.g8PGGgM11638@odiug.zope.com>
Message-ID: <2mznu62ecl.fsf@starship.python.net>

Guido van Rossum writes:

> > If you (or someone else) don't get to it before 2.2.2's release date,
> > I'll just dump the trunk's version into the branch.
>
> Why don't you do that now, so we don't forget that part.

Sure. It seems Jeremy had already backported quite a pile of these fixes back in July.

Cheers,
M.

--
  There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
                                                -- C. A. R. Hoare

From tim@multitalents.net Wed Sep 25 21:09:07 2002
From: tim@multitalents.net (Tim Rice)
Date: Wed, 25 Sep 2002 13:09:07 -0700 (PDT)
Subject: [Python-Dev] building 2.2.2 on SCO Open Server
Message-ID:

I'm trying to get the release22-maint branch to build on SCO Open Server 5. When setup.py fails to import an extension but the .c file compiles, how do you track down why it failed?

I.e.
(lines formatted for readability)

case $MAKEFLAGS in \
*-s*) CC='cc' LDSHARED='cc -G -Kpic -Ki486 -belf -Wl,-Bexport' \
    OPT='-DNDEBUG -O -Ki486 -DSCO5' ./python \
    -E /opt/src/utils/python/python-2.2.2/src/setup.py -q build;; \
*) CC='cc' LDSHARED='cc -G -Kpic -Ki486 -belf -Wl,-Bexport' \
    OPT='-DNDEBUG -O -Ki486 -DSCO5' ./python \
    -E /opt/src/utils/python/python-2.2.2/src/setup.py build;; \
esac
running build
running build_ext
building 'struct' extension
[snip]
building 'pwd' extension
cc -DNDEBUG -O -Ki486 -DSCO5 -Kpic -dy -Bdynamic -I. \
    -I/opt/src/utils/python/python-2.2.2/src/./Include \
    -I/usr/local/include -IInclude/ \
    -c /opt/src/utils/python/python-2.2.2/src/Modules/pwdmodule.c \
    -o build/temp.sco_sv-3.2-i386-2.2/pwdmodule.o
cc -G -Kpic -Ki486 -belf -Wl,-Bexport \
    build/temp.sco_sv-3.2-i386-2.2/pwdmodule.o -L/usr/local/lib \
    -o build/lib.sco_sv-3.2-i386-2.2/pwd.so
WARNING: removing "pwd" since importing it failed

--
Tim Rice Multitalents (707) 887-1469
tim@multitalents.net

From martin@v.loewis.de Wed Sep 25 21:19:56 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 25 Sep 2002 22:19:56 +0200
Subject: [Python-Dev] building 2.2.2 on SCO Open Server
In-Reply-To:
References:
Message-ID:

Tim Rice writes:

> I'm trying to get the release22-maint branch to build on
> SCO Open Server 5. When setup.py fails to import an extension but
> the .c file compiles, how do you track down why it failed?

You invoke the compilation commands manually (as printed), then start an interactive session and try to import the module.

Regards,
Martin

From sholden@holdenweb.com Wed Sep 25 21:22:57 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Wed, 25 Sep 2002 16:22:57 -0400
Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed?
References: <000d01c263cd$bd76cc00$e000a8c0@thomasnotebook><3D918DE6.582.3CFA8D6@localhost><200209251532.g8PFW0O02531@odiug.zope.com> Message-ID: <01b801c264d1$586260e0$6300000a@holdenweb.com> ----- Original Message ----- From: "Andrew Koenig" To: "Guido van Rossum" Cc: ; Sent: Wednesday, September 25, 2002 11:38 AM Subject: Re: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? > Guido> This compiler seems to fly in the face of the C std whenever it can. > Guido> What are they trying to accomplish? > > To extend the language in ways that lock customers into their platform. > The infamous "hijack an open standard by adding proprietary extensions" philosophy, first really publicised by the Halloween documents. regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ Previous .sig file retired to www.homeforoldsigs.com ----------------------------------------------------------------------- From tim@multitalents.net Wed Sep 25 21:46:28 2002 From: tim@multitalents.net (Tim Rice) Date: Wed, 25 Sep 2002 13:46:28 -0700 (PDT) Subject: [Python-Dev] building 2.2.2 on SCO Open Server In-Reply-To: Message-ID: On 25 Sep 2002, Martin v. Loewis wrote: > Tim Rice writes: > > > I'm trying to get the release22-maint branch to build on > > SCO Open Server 5. When setup.py fails to import an extension but > > the .c file compiles, how do you track down why it failed? > > You invoke the compilation commands manually (as printed), then start > an interactive session and try to import the module. > > Regards, > Martin > Thanks. I was sure I had tried that before. Oh well. It does tell me what the problem is. Now I have to track down why it can't find setpwent(). The man page says it's in libc.
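
[A minimal sketch of Martin's "import it by hand and read the error" step, written for present-day Python rather than the 2.2 of this thread. The helper name import_failure_report is made up for illustration; importlib and traceback are standard-library modules. For a compiled extension the traceback text normally carries the dynamic linker's complaint, such as an unresolved symbol.]

```python
import importlib
import traceback


def import_failure_report(module_name):
    """Try to import module_name; return None on success, else the traceback text.

    Hypothetical helper, not part of setup.py or distutils.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        # ImportError also covers shared objects the loader cannot resolve.
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    report = import_failure_report("pwd")
    print(report if report is not None else "pwd imported fine")
```

Run it with the freshly built interpreter from the build directory so that it picks up the same extension modules setup.py just tried to import.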
--
Tim Rice Multitalents (707) 887-1469
tim@multitalents.net

From pinard@iro.umontreal.ca Wed Sep 25 22:07:13 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: Wed, 25 Sep 2002 17:07:13 -0400
Subject: [Python-Dev] sorted()
Message-ID:

Hi, Guido, and people.

It recurrently happens that newcomers on the Python mailing list are surprised that list.sort() does not return the sorted list as value. I quite understand and agree that this is a good thing, because sorting is done in place, and Python programmers should stay aware and alert of this fact.

Yet, I often see myself writing things like:

    keys = messages.keys()
    keys.sort()
    for key in keys:
        DO_SOMETHING

This is not difficult to write, only slightly annoying. Writing:

    def sorted(list):
        list = list[:]
        list.sort()
        return list

with the goal of simplifying the first excerpt into:

    for key in sorted(messages.keys()):
        DO_SOMETHING

is not really worth it for small programs. But in larger programs, where one often loops over the sorted elements of a list, it might become reasonable to write this extra definition. My feeling is that the idiom is common enough to be worth a list method, so the above could be written instead:

    for key in messages.keys().sorted():
        DO_SOMETHING

I immediately see an advantage and a drawback. The drawback is that users might confuse `.sort()' with `.sorted()', however we decide to spell `sorted', so the existence of both may be some kind of trap. The advantage is that the `.sorted()' method fits well within how Python has evolved recently, offering more concise and legible writings for frequent idioms.

Tim invested a lot of courageous efforts so Python `sort' becomes speedier. A `.sorted()' method requires separate space to hold the result, using the same size as the original, and that guaranteed extra-space may eventually be put to good use for speeding up the sorting even more.
The constraint of a sort being in-place has indeed a cost, and deep down, we agree that this constraint is artificial in contexts where `.sorted()' is really what the user needs.

--
François Pinard http://www.iro.umontreal.ca/~pinard

From nas@python.ca Wed Sep 25 22:51:25 2002
From: nas@python.ca (Neil Schemenauer)
Date: Wed, 25 Sep 2002 14:51:25 -0700
Subject: [Python-Dev] No __mod__ on str?
Message-ID: <20020925215125.GA14922@glacier.arctrix.com>

Here's the code for PyNumber_Remainder:

    PyObject *
    PyNumber_Remainder(PyObject *v, PyObject *w)
    {
            if (PyString_Check(v))
                    return PyString_Format(v, w);
    #ifdef Py_USING_UNICODE
            else if (PyUnicode_Check(v))
                    return PyUnicode_Format(v, w);
    #endif
            return binary_op(v, w, NB_SLOT(nb_remainder), "%");
    }

Is there any good reason why str.__mod__ != PyString_Format? I want to make a subclass of str that overrides the format operator. I guess one side effect would be that PyNumber_Check(astring) would start returning true.

Should I file a bug saying "can't override __mod__ on str and unicode subclasses"? I guess the fix would be to check for nb_remainder first and then fallback to PyString_Format or PyUnicode_Format.

  Neil

From guido@python.org Thu Sep 26 00:46:05 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 25 Sep 2002 19:46:05 -0400
Subject: [Python-Dev] No __mod__ on str?
In-Reply-To: Your message of "Wed, 25 Sep 2002 14:51:25 PDT." <20020925215125.GA14922@glacier.arctrix.com>
References: <20020925215125.GA14922@glacier.arctrix.com>
Message-ID: <200209252346.g8PNk5q29231@pcp02138704pcs.reston01.va.comcast.net>

> Here's the code for PyNumber_Remainder:
>
>     PyObject *
>     PyNumber_Remainder(PyObject *v, PyObject *w)
>     {
>             if (PyString_Check(v))
>                     return PyString_Format(v, w);
>     #ifdef Py_USING_UNICODE
>             else if (PyUnicode_Check(v))
>                     return PyUnicode_Format(v, w);
>     #endif
>             return binary_op(v, w, NB_SLOT(nb_remainder), "%");
>     }
>
> Is there any good reason why str.__mod__ != PyString_Format?
I want to > make a subclass of str that overrides the format operator. I guess one > side effect would be that PyNumber_Check(astring) would start returning > true. Good catch. I think this is a relic from before str and unicode were subclassable. > Should I file a bug saying "can't override __mod__ on str and unicode > subclasses"? I guess the fix would be to check for nb_remainder first > and then fallback to PyString_Format or PyUnicode_Format. Yes please. If you can provide a fix, make it a patch. Anyway assign it to me. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Thu Sep 26 01:44:15 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 26 Sep 2002 12:44:15 +1200 (NZST) Subject: Reserved keywords in source: was RE: [Python-Dev] Assign to errno allowed? In-Reply-To: <3D919F9D.20505.414DCD1@localhost> Message-ID: <200209260044.g8Q0iF306159@oma.cosc.canterbury.ac.nz> Brad Clements : > I'm not sure, but I think my latest version of Metrowerks has the > same issue. Doesn't Metrowerks have a "strict ANSI" switch that turns off all the language extensions? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Sep 26 01:56:47 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 26 Sep 2002 12:56:47 +1200 (NZST) Subject: [Python-Dev] sorted() In-Reply-To: Message-ID: <200209260056.g8Q0ulp06182@oma.cosc.canterbury.ac.nz> pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard): > The advantage is that the `.sorted()' method fits well within how > Python has evolved recently, offering more concise and legible > writings for frequent idioms. 
I prefer the idea of making sorted() a separate function, because it can then be made to work on any sequence that can be copied and has a sort() method. To support specialised non-in-place sorting algorithms, it could check whether its argument has a sorted() method, and if not, fall back on the general implementation. This seems more Pythonic to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Sep 26 01:59:48 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 26 Sep 2002 12:59:48 +1200 (NZST) Subject: [Python-Dev] Assign to errno allowed? In-Reply-To: <200209251632.g8PGWT811833@odiug.zope.com> Message-ID: <200209260059.g8Q0xmL06185@oma.cosc.canterbury.ac.nz> > #ifdef ...WINCE... ^^^^^ I wonder if Microsoft foresaw that abbreviation when they chose the name CE... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tdelaney@avaya.com Thu Sep 26 02:13:27 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Thu, 26 Sep 2002 11:13:27 +1000 Subject: [Python-Dev] sorted() Message-ID: > From: Greg Ewing [mailto:greg@cosc.canterbury.ac.nz] > > pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard): > > > The advantage is that the `.sorted()' method fits well within how > > Python has evolved recently, offering more concise and legible > > writings for frequent idioms. > > To support specialised non-in-place sorting algorithms, > it could check whether its argument has a sorted() > method, and if not, fall back on the general implementation. 
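
[A minimal sketch of the fallback Greg describes, in present-day Python. The names sorted_copy and DescendingBag are invented for illustration; the function is deliberately not called sorted so that it does not shadow the builtin that exists in current Python. It assumes the "general implementation" means copying into a list and calling sort() on the copy.]

```python
def sorted_copy(seq):
    """Return a sorted copy of seq without modifying seq.

    If the object offers its own sorted() method, defer to it;
    otherwise copy into a list and sort that copy in place.
    """
    own_sorted = getattr(seq, "sorted", None)
    if callable(own_sorted):
        return own_sorted()
    result = list(seq)
    result.sort()
    return result


class DescendingBag:
    """Toy container with a specialised, non-in-place sorted() method."""

    def __init__(self, items):
        self.items = list(items)

    def sorted(self):
        # Specialised ordering chosen by the container itself.
        return sorted(self.items, reverse=True)


if __name__ == "__main__":
    data = [3, 1, 2]
    print(sorted_copy(data), data)
    print(sorted_copy(DescendingBag([1, 3, 2])))
```

The getattr() lookup keeps the function usable on plain lists and tuples, which have no sorted() method and therefore take the copy-then-sort path.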
Hmm - this actually suggests a couple more magic methods: __sort__ __isort__ corresponding to "sort a copy" and "sort in-place". Defining the rules for how these would be called requires a bit more thought however. Do you want a sort() function to prefer __sort__ or __isort__?

    def sort(seq, in_place=1):
        if in_place:
            # No fallback: raises AttributeError if seq cannot sort in place.
            return seq.__isort__()
        try:
            return seq.__sort__()
        except AttributeError:
            # No __sort__ method: fall through and sort a list copy.
            pass
        seq = list(seq)
        seq.sort()
        return seq

So - if an in-place sort is specified, try to do one, throwing an exception if it's not possible. Otherwise sort a copy. This would allow a generic mechanism for objects to sort copies of themselves, rather than blindly changing them to a list. Would two methods be better for in-place and copy sort? Tim Delaney From David Abrahams" Message-ID: <124601c26506$91150c00$6701a8c0@boostconsulting.com> From: "Greg Ewing" > Brad Clements : > > > I'm not sure, but I think my latest version of Metrowerks has the > > same issue. > > Doesn't Metrowerks have a "strict ANSI" switch that turns > off all the language extensions? Yes. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From tim@multitalents.net Thu Sep 26 16:52:01 2002 From: tim@multitalents.net (Tim Rice) Date: Thu, 26 Sep 2002 08:52:01 -0700 (PDT) Subject: [Python-Dev] Lib/plat-xxxx directories Message-ID: Can someone enlighten me as to the purpose of the Lib/plat-xxxx directories. What goes in there? Why? I'm trying to figure out if I should be creating one for SCO Open Server BTW. Someone with CVS write access should probably

    cd Lib
    ln -s plat-unixware7 plat-openunix8
    cvs add plat-openunix8

OpenUNIX 8.0.0 is really UnixWare 7.1.2 underneath.
-- Tim Rice Multitalents (707) 887-1469 tim@multitalents.net From guido@python.org Thu Sep 26 17:05:30 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 26 Sep 2002 12:05:30 -0400 Subject: [Python-Dev] Lib/plat-xxxx directories In-Reply-To: Your message of "Thu, 26 Sep 2002 08:52:01 PDT." References: Message-ID: <200209261605.g8QG5UP29963@odiug.zope.com> > Can someone enlighten me as to the purpose of the Lib/plat-xxxx > directories. What goes in there? Why? They're for platform-specific modules. In most cases, the only platform-specific modules are collections of system constants generated by Tools/scripts/h2py.py. For an example, see the regen script in plat-linux2. (It assumes you've set up an alias "h2py" for the script.) I wouldn't bother unless you have an actual use in mind. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim@multitalents.net Thu Sep 26 17:09:07 2002 From: tim@multitalents.net (Tim Rice) Date: Thu, 26 Sep 2002 09:09:07 -0700 (PDT) Subject: [Python-Dev] Lib/plat-xxxx directories In-Reply-To: <200209261605.g8QG5UP29963@odiug.zope.com> Message-ID: On Thu, 26 Sep 2002, Guido van Rossum wrote: > > Can someone enlighten me as to the purpose of the Lib/plat-xxxx > > directories. What goes in there? Why? > > They're for platform-specific modules. In most cases, the only > platform-specific modules are collections of system constants > generated by Tools/scripts/h2py.py. For an example, see the regen > script in plat-linux2. (It assumes you've set up an alias "h2py" for > the script.) I wouldn't bother unless you have an actual use in mind. OK, Thanks. I'll bundle up my patches and post them to the patch manager. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- Tim Rice Multitalents (707) 887-1469 tim@multitalents.net From guido@python.org Thu Sep 26 20:13:15 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 26 Sep 2002 15:13:15 -0400 Subject: [Python-Dev] How to add an encoding alias? 
Message-ID: <200209261913.g8QJDFe01575@odiug.zope.com> In the spambayes project we encountered some mail samples that use an encoding name ('ansi-x3-4-1968') that's not in encodings/aliases.py. (At least not until I added it to CVS yesterday.) I'd like the spambayes code base to be compatible with Python 2.2.1, so I like to add this one to the list of aliases. Is there an official API to add an alias, or do I just have to write import encodings.aliases encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii' ??? (BTW, there's an alias 'ansi_x3.4_1986' for ASCII. Was the ASCII standard renewed in 1986, or is that simply because there are encoding designators out there in real life that contain a typo?) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Sep 26 20:41:59 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 26 Sep 2002 21:41:59 +0200 Subject: [Python-Dev] How to add an encoding alias? References: <200209261913.g8QJDFe01575@odiug.zope.com> Message-ID: <3D936307.1040709@lemburg.com> Guido van Rossum wrote: > In the spambayes project we encountered some mail samples that use an > encoding name ('ansi-x3-4-1968') that's not in encodings/aliases.py. > (At least not until I added it to CVS yesterday.) > > I'd like the spambayes code base to be compatible with Python 2.2.1, > so I like to add this one to the list of aliases. > > Is there an official API to add an alias, or do I just have to write > > import encodings.aliases > encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii' > > ??? There's no other API to do this and since new features are not allowed in 2.2.x that's the only way to go unless you register your own lookup function which knows about the extra alias. > (BTW, there's an alias 'ansi_x3.4_1986' for ASCII. Was the ASCII > standard renewed in 1986, or is that simply because there are encoding > designators out there in real life that contain a typo?) 
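For readers on a current Python 3, the "register your own lookup function" route mentioned in the reply above can be sketched with `codecs.register`. The alias `x-example-ascii` and the `EXTRA_ALIASES` table are made up for illustration (the alias from this thread is already known to modern Pythons), and the search function normalises hyphens and spaces itself because the exact form of the name it receives differs between Python versions.

```python
import codecs

# Hypothetical extra aliases, keyed by normalised name (lower case, underscores).
EXTRA_ALIASES = {"x_example_ascii": "ascii"}


def extra_alias_search(name):
    """Codec search function: resolve the extra aliases, ignore everything else."""
    key = name.lower().replace("-", "_").replace(" ", "_")
    target = EXTRA_ALIASES.get(key)
    if target is None:
        # Returning None lets the registry try the remaining search functions.
        return None
    return codecs.lookup(target)


codecs.register(extra_alias_search)

print("abc".encode("X-Example-ASCII"))
print(codecs.lookup("x-example-ascii").name)
```

Unlike assigning into `encodings.aliases.aliases`, this leaves the standard alias table untouched; the cost is one more function in the lookup chain for names the standard search function does not recognise.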
That was one of the official names for ASCII: http://www.archivists.org/catalog/stds99/chapter7.html#x3_4 More details on the history of ASCII can be found at the top of that page. The original version X3.4 was approved in 1968, so it's not a typo. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From mal@lemburg.com Thu Sep 26 20:43:37 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 26 Sep 2002 21:43:37 +0200 Subject: [Python-Dev] How to add an encoding alias? References: <200209261913.g8QJDFe01575@odiug.zope.com> Message-ID: <3D936369.3000908@lemburg.com> Guido van Rossum wrote: > import encodings.aliases > encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii' In order for the lookup to work, you have to replace hyphens with underscores; see the top of aliases.py. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Thu Sep 26 21:00:15 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 26 Sep 2002 16:00:15 -0400 Subject: [Python-Dev] How to add an encoding alias? In-Reply-To: Your message of "Thu, 26 Sep 2002 21:41:59 +0200." <3D936307.1040709@lemburg.com> References: <200209261913.g8QJDFe01575@odiug.zope.com> <3D936307.1040709@lemburg.com> Message-ID: <200209262000.g8QK0Fl01925@odiug.zope.com> > > I'd like the spambayes code base to be compatible with Python 2.2.1, > > so I like to add this one to the list of aliases. 
> > > > Is there an official API to add an alias, or do I just have to write > > > > import encodings.aliases > > encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii' > > > > ??? > > There's no other API to do this and since new features are > not allowed in 2.2.x that's the only way to go unless you register > your own lookup function which knows about the extra alias. Thanks, I'll do that. > > (BTW, there's an alias 'ansi_x3.4_1986' for ASCII. Was the ASCII > > standard renewed in 1986, or is that simply because there are encoding > > designators out there in real life that contain a typo?) > > That was one of the official names for ASCII: > > http://www.archivists.org/catalog/stds99/chapter7.html#x3_4 > > More details on the history of ASCII can be found at the > top of that page. The original version X3.4 was approved > in 1968, so it's not a typo. Wow. Cute. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Sep 26 21:03:05 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 26 Sep 2002 16:03:05 -0400 Subject: [Python-Dev] How to add an encoding alias? In-Reply-To: Your message of "Thu, 26 Sep 2002 21:43:37 +0200." <3D936369.3000908@lemburg.com> References: <200209261913.g8QJDFe01575@odiug.zope.com> <3D936369.3000908@lemburg.com> Message-ID: <200209262003.g8QK36P01952@odiug.zope.com> > Guido van Rossum wrote: > > import encodings.aliases > > encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii' > > In order for the lookup to work, you have to replace hyphens > with underscores; see the top of aliases.py. Good catch! Then my "fix" to aliases.py was also wrong. Would it make sense to change the lookup function to convert *all* punctuation to underscores before doing the lookup? (Then this one would actually have worked...) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Sep 26 21:14:28 2002 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Thu, 26 Sep 2002 22:14:28 +0200 Subject: [Python-Dev] How to add an encoding alias? References: <200209261913.g8QJDFe01575@odiug.zope.com> <3D936369.3000908@lemburg.com> <200209262003.g8QK36P01952@odiug.zope.com> Message-ID: <3D936AA4.1080302@lemburg.com> Guido van Rossum wrote: >>Guido van Rossum wrote: >> >>> import encodings.aliases >>> encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii' >> >>In order for the lookup to work, you have to replace hyphens >>with underscores; see the top of aliases.py. > > > Good catch! Then my "fix" to aliases.py was also wrong. > > Would it make sense to change the lookup function to convert *all* > punctuation to underscores before doing the lookup? (Then this one > would actually have worked...) Codecs must currently use names as defined by the search function in the encodings package: Codec modules must have names corresponding to standard lower-case encoding names with hyphens mapped to underscores, e.g. 'utf-8' is implemented by the module 'utf_8.py'. We could extend this to: Codec modules must have names corresponding to standard lower-case encoding names with all non-alphanumeric characters mapped to underscores, e.g. 'utf-8' is implemented by the module 'utf_8.py' and 'ISO 639:1988' would be implemented as module 'iso_639_1988'. Note that the aliasing dictionary is consulted *after* having applied this mapping. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Thu Sep 26 21:27:47 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 26 Sep 2002 16:27:47 -0400 Subject: [Python-Dev] How to add an encoding alias? In-Reply-To: Your message of "Thu, 26 Sep 2002 22:14:28 +0200."
<3D936AA4.1080302@lemburg.com> References: <200209261913.g8QJDFe01575@odiug.zope.com> <3D936369.3000908@lemburg.com> <200209262003.g8QK36P01952@odiug.zope.com> <3D936AA4.1080302@lemburg.com> Message-ID: <200209262027.g8QKRlO02176@odiug.zope.com> > > Would it make sense to change the lookup function to convert *all* > > punctuation to underscores before doing the lookup? (Then this one > > would actually have worked...) > > Codecs must currently use names as defined by the search function in the > encodings package: > > Codec modules must have names corresponding to standard lower-case > encoding names with hyphens mapped to underscores, e.g. 'utf-8' is > implemented by the module 'utf_8.py'. > > We could extend this to: > > Codec modules must have names corresponding to standard lower-case > encoding names with all non-alphanumeric characters mapped to > underscores, e.g. 'utf-8' is implemented by the module 'utf_8.py' > and 'ISO 639:1988' would be implemented as module 'iso_639_1988'. > > Note that the aliasing dictionary is consulted *after* > having applied this mapping. +1; +1 on backport to 2.2.2 also. Note that this requires some changes to the dict in aliases.py. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack_diederich@email.com Fri Sep 27 02:41:45 2002 From: jack_diederich@email.com (Jack Diederich) Date: Thu, 26 Sep 2002 20:41:45 -0500 Subject: [Python-Dev] Sets/Combinitorics, pointer to an implementation Message-ID: <20020927014145.28754.qmail@email.com> I was reading the summary of python-dev for August and saw the exchange about Combinatorics. I emailed a few people in the discussion (guido, et al) who suggested I post to python-dev directly instead. http://probstat.sourceforge.net "Probability and Statistics Utils for Python" Running Code I released in early August that does Combinations, Permutations, and Cartesian Products over python lists. It has python object wrappers for fast C algorithms.
It supports iterating over objects, slices, len(), and random access. It was 10+ times faster than the same algos done in python, but I wrote those without generators and I haven't done a benchmark since. It uses standard algos, largely from the GNU Scientific Library. It lazily produces the cycles in lexicographic order, so the memory it consumes is about twice the size of a shallow copy of the list. A slice of a Permutation object is another Permutation object with smaller internal start/end bounds. I could write more about the implementation, but I'll save my keystrokes unless people actually want to know. A Power class that lazily evaluates the output would be easy to add; here is one:

    from __future__ import generators
    from probstat import Combination

    class Power:
        def __init__(self, set):
            self.__set = set

        def __iter__(self):
            self.__iter = self.setup_iter()
            return self  # __iter__ must return the iterator object

        def setup_iter(self):
            for i in range(len(self.__set) + 1):
                if i == 0:
                    # Combination() doesn't allow N choose zero
                    yield []
                else:
                    for combo in Combination(self.__set, i):
                        yield combo

        def next(self):
            return self.__iter.next()

Enjoy, -jack Errata: I tried using the Cartesian class to mimic nested for() loops. It is 3 times slower than doing depth 3 nested for loops (i,j,k) in python. That's probably the overhead of the Cartesian class new'ing a tuple and unpacking it for each iteration of the loop. From mal@lemburg.com Fri Sep 27 10:25:15 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 27 Sep 2002 11:25:15 +0200 Subject: [Python-Dev] User extendable literal modifiers ?! Message-ID: <3D9423FB.9070303@lemburg.com> As you might have noticed, I have wrapped several parts of the GNU Multi-Precision (GMP) library in form of Python types in mxNumber.
Since these are numbers, it would be convenient if there were some way to create them in form of literals, much like 123L creates longs instead of integers or u"abc" gives you Unicode instead of an 8-bit string. I was wondering whether it would be worth adding something like a registry of literal modifiers to Python, so that extensions can register new modifiers with the compiler, e.g. sitecustomize.py: def create_I_literal(literal_string): return 'mx.Number.Integer(%s)' % literal_string sys.register_numberlitmod('I', create_I_literal) test.py: x = 123I * 456I print x, 234I -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From gerhard.haering@opus-gmbh.net Fri Sep 27 11:33:11 2002 From: gerhard.haering@opus-gmbh.net (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Fri, 27 Sep 2002 10:33:11 +0000 (UTC) Subject: [Python-Dev] Re: User extendable literal modifiers ?! References: <3D9423FB.9070303@lemburg.com> Message-ID: In article <3D9423FB.9070303@lemburg.com>, M.-A. Lemburg wrote: > [mxNumber] > I was wondering whether it would be worth adding something > like a registry of literal modifiers to Python, Especially for this purpose, that would be great. And have potential for misuse, too. Just like, say, operator overloading. But in the context of Python, I didn't see any misuse of operator overloading, yet. > [...] so that > extensions can register new modifiers with the compiler, > e.g. > > sitecustomize.py: > def create_I_literal(literal_string): > return 'mx.Number.Integer(%s)' % literal_string > sys.register_numberlitmod('I', create_I_literal) A single literal, however, doesn't (easily) allow you to give precision and scale arguments to your decimal literal. 
That's of course easy if you can declare your variable, which you can't in Python. So we're back to constructors/factory functions here, right? -- Gerhard From gmccaughan@synaptics-uk.com Fri Sep 27 12:03:17 2002 From: gmccaughan@synaptics-uk.com (Gareth McCaughan) Date: Fri, 27 Sep 2002 12:03:17 +0100 (BST) Subject: [Python-Dev] Re: User extendable literal modifiers ?! Message-ID: <200209271104.MAA27895@synaptics-uk.com> Marc-Andre Lemburg wrote: > Since these are numbers, it would be convenient if there were > some way to create them in form of literals, much like 123L > creates longs instead of integers or u"abc" gives you Unicode > instead of an 8-bit string. > > I was wondering whether it would be worth adding something > like a registry of literal modifiers to Python, so that > extensions can register new modifiers with the compiler, > e.g. > > sitecustomize.py: > def create_I_literal(literal_string): > return 'mx.Number.Integer(%s)' % literal_string > sys.register_numberlitmod('I', create_I_literal) > > test.py: > x = 123I * 456I > print x, 234I Too limiting. You'd only be able to do this for numbers, and it doesn't seem worth the pain just for numbers. Better would be user-definable *prefixes*. Common Lisp, for instance, makes it easy to customize the reader to recognize tokens of the form . So you can arrange that #Q123,234,456:a(b)c turns into, erm, something terribly useful :-). Some of these characters are already taken for things like arrays [#(1 2 3), #2((1 2) (3 4))], "logical pathnames" (lightly abstracted filenames) [#"foo/bar/baz"], bit vectors [#*0001101011001], and so on. As perceptive readers will have noticed, you can splice a number between "#" and the magic character for special effects. Python could do something similar, though obviously "#" isn't a suitable character :-). Letting the user hijack the reader as completely as can be done in CL would probably be un-Pythonic, too. Here's a strawman suggestion. 
For any character "x" in some set I can't be bothered to specify, the Python tokenizer/parser will subject input of the form $x<string-literal> to special processing. The string-literal can be formed using any of {',",''',"""}. When I say "tokenizer/parser", I mean: the tokenizer will produce a special token encoding the character "x" and the contents of the string-literal. The parser will perform "special processing" in an attempt to turn it into a more normal token. The default "special processing" is to raise a SyntaxError. The user can define the special processing appropriate for a particular character "x" by making a function that interprets the string and feeding it to sys.register_dollar_handler. (In fact, anything callable will do.) The function will be passed two arguments: the character "x" and the string. Its return value will replace the $x"..." combination in the token stream, as a literal token. If an exception other than a SyntaxError is raised and not caught in the handler function then it will be silently replaced by a SyntaxError whose parameter has the form "ill-formed <xxx> literal". The value of "xxx" is defined when registering the handler. Handler functions are permitted to call "eval". Example:

    >>> def handle_rational(char, s):
    ...     assert char == 'r'
    ...     components = s.split('/')
    ...     numerator, denominator = map(int, components)
    ...     return Rational(numerator, denominator)
    ...
    >>> sys.register_dollar_handler('r', handle_rational, 'rational')
    >>> print $r"1/2" + $r"3/4"
    $r"5/4"
    >>> print $r"12345"
      File "<stdin>", line 1
        print $r"12345"
              ^
    SyntaxError: ill-formed rational literal
    >>>

Alternatively:

    >>> class Rational:
    ...     def __init__(self, x, y):
    ...         if isinstance(x, str):
    ...             x,y = map(int, y.split("/"))
    ...         self._numerator, self._denominator = x,y
    ...     [etc]
    ...
    >>> sys.register_dollar_handler('r', Rational, 'rational')

Some dollar-syntax characters may be handled by Python itself or the standard library, or may be reserved for their use.
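The tokenizer hook itself is only a proposal, but the handler rules described so far can be emulated at run time in ordinary Python 3. In this sketch `register_dollar_handler` and `dollar` are hypothetical module-level stand-ins (nothing is added to `sys`), and `fractions.Fraction` takes the place of the `Rational` class used in the example.

```python
from fractions import Fraction

_handlers = {}  # maps prefix character -> (handler, kind name)


def register_dollar_handler(char, handler, kind):
    """Hypothetical stand-in for the proposed sys.register_dollar_handler."""
    _handlers[char] = (handler, kind)


def dollar(char, text):
    """Emulate $x"text": call the handler registered for char."""
    if char not in _handlers:
        # Default "special processing": reject the literal.
        raise SyntaxError("no handler registered for $%s" % char)
    handler, kind = _handlers[char]
    try:
        # The handler receives the character and the string contents.
        return handler(char, text)
    except SyntaxError:
        raise
    except Exception:
        # Any other failure is reported as an ill-formed literal.
        raise SyntaxError("ill-formed %s literal" % kind) from None


def handle_rational(char, s):
    assert char == "r"
    numerator, denominator = map(int, s.split("/"))
    return Fraction(numerator, denominator)


register_dollar_handler("r", handle_rational, "rational")

print(dollar("r", "1/2") + dollar("r", "3/4"))   # prints 5/4
try:
    dollar("r", "12345")
except SyntaxError as exc:
    print(exc)   # prints: ill-formed rational literal
```

The emulation converts the string on every call, whereas the proposal would do the conversion once, when the source is tokenized.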
It is possible for users to override them, but this should be considered bad practice. Registering a handler when one is already in place will produce a warning. To un-register a handler, pass None instead of the handler function.

Possible applications:
  - Rational numbers. $r"123/234"
  - Regular expressions. $/"foo.*bar"
  - Dates and times. $t"2002-09-27 11:38"
  - Hostnames and ports. $h"www.google.com:80"

Questions:
  - Is this insane?
  - Is "$" the best character?
  - Should there be a way to return tokens other than literal ones? For instance, identifiers or keywords?
  - Is the behaviour with exceptions correct?

-- g From mal@lemburg.com Fri Sep 27 12:17:55 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 27 Sep 2002 13:17:55 +0200 Subject: [Python-Dev] Re: User extendable literal modifiers ?! References: <3D9423FB.9070303@lemburg.com> Message-ID: <3D943E63.30104@lemburg.com> Gerhard Häring wrote: > In article <3D9423FB.9070303@lemburg.com>, M.-A. Lemburg wrote: > >>[mxNumber] >>I was wondering whether it would be worth adding something >>like a registry of literal modifiers to Python, > > > Especially for this purpose, that would be great. And have potential for > misuse, too. Just like, say, operator overloading. But in the context of > Python, I didn't see any misuse of operator overloading, yet. I was thinking of giving the current concept of literal modifiers a more general scope. Of course, this can be misused, but then we could e.g. put certain constraints on the possible modifiers, say only allow a predefined number of modifiers and then have the compiler at compile time or the interpreter at run-time apply the necessary logic to the literal to turn it into an object. We currently have 'u', 'r', 'U', 'R' as modifiers for strings (prefixes) and 'l', 'L', 'j', 'J' for numbers (postfix). >>[...] so that >>extensions can register new modifiers with the compiler, >>e.g.
>> >>sitecustomize.py: >>def create_I_literal(literal_string): >> return 'mx.Number.Integer(%s)' % literal_string >>sys.register_numberlitmod('I', create_I_literal) > > > A single literal, however, doesn't (easily) allow you to give precision and > scale arguments to your decimal literal. That's of course easy if you can > declare your variable, which you can't in Python. So we're back to > constructors/factory functions here, right? Not really, since mxNumber Integers have arbitrary precision, so scale and precision are not needed. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From aleax@aleax.it Fri Sep 27 12:53:28 2002 From: aleax@aleax.it (Alex Martelli) Date: Fri, 27 Sep 2002 13:53:28 +0200 Subject: [Python-Dev] Re: User extendable literal modifiers ?! In-Reply-To: <200209271104.MAA27895@synaptics-uk.com> References: <200209271104.MAA27895@synaptics-uk.com> Message-ID: On Friday 27 September 2002 01:03 pm, Gareth McCaughan wrote: ... > Better would be user-definable *prefixes*. Yes -- nice idea. > Its return value will replace the $x"..." combination in > the token stream, as a literal token. Why just one token, and why just literal. Returning an arbitrary sequence of tokens seems more natural. This would allow e.g. Tim Berners-Lee to have basically what he wants (and asked for in his talk at IPC10) in terms of extended syntax for graphs, just with some $x in front. I had a similar idea right after Tim's talk, but could not articulate it clearly enough in a chat with Guido right afterwards, and later I didn't follow through with it.
It seems to me that your proposal is detailed and precise enough (while my idea was rather vague) and that, by returning an arbitrary sequence of tokens, it will let Tim embed whatever funky syntax it requires. This power is also the downside of the whole idea of course -- no guarantee that somebody can't use this mechanism to produce highly obfuscated programs. But I think that such a somebody could already obfuscate quite effectively in other ways, and the risk of abuse shouldn't stop this interesting proposal. > ... return Rational(numerator, denominator) Hmmm, how would this "return a literal token"? It returns an instance of Rational -- how does the parser treat this instance as a literal token? I thought this use would have to return the sequence of tokens for identifier 'Rational', open parenthesis, literal (value of) numerator, comma, literal (value of) denominator, closed parenthesis -- which in turn is why I thought of an arbitrary sequence of tokens. If a single instance of any arbitrary class may be returned and get treated as a literal token by the parser, then that's much better (maybe I don't know Python's parser well enough, but I don't clearly see how that would be done). > - Is this insane? Hope not, since I like it. > - Is "$" the best character? Among the few available ones, I think I slightly prefer "@" for this use, but there's little to choose IMHO. Alex From pedronis@bluewin.ch Fri Sep 27 13:29:49 2002 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 27 Sep 2002 14:29:49 +0200 Subject: [Python-Dev] Re: User extendable literal modifiers ?! References: <200209271104.MAA27895@synaptics-uk.com> Message-ID: <007f01c26621$92d3c060$6d94fea9@newmexico> From: Alex Martelli > I thought this use would have to return the sequence of > tokens for identifier 'Rational', open parenthesis, literal > (value of) numerator, comma, literal (value of) denominator, > closed parenthesis -- which in turn is why I thought of an > arbitrary sequence of tokens. 
If a single instance of any > arbitrary class may be returned and get treated as a > literal token by the parser, then that's much better Indeed, because otherwise $r"123/234" = literal transformation => Rational(123,234) would require Rational to be installed in the builtins, or some kind of implicit import (ugly), or people would have to remember to put an explicit from ... import Rational in all modules that use $r; one import per program just to register $r would not be enough. regards From gmccaughan@synaptics-uk.com Fri Sep 27 13:58:42 2002 From: gmccaughan@synaptics-uk.com (Gareth McCaughan) Date: Fri, 27 Sep 2002 13:58:42 +0100 (BST) Subject: [Python-Dev] Re[3]: User extendable literal modifiers ?! In-Reply-To: <200209271152.MAA27946@synaptics-uk.com> References: <200209271104.MAA27895@synaptics-uk.com> <200209271152.MAA27946@synaptics-uk.com> Message-ID: <200209271259.NAA28049@synaptics-uk.com> > > Its return value will replace the $x"..." combination in > > the token stream, as a literal token. > > Why just one token, and why just literal. Returning an > arbitrary sequence of tokens seems more natural. This > would allow e.g. Tim Berners-Lee to have basically what > he wants (and asked for in his talk at IPC10) in terms of > extended syntax for graphs, just with some $x in front. 1. I wasn't sure how easy it would be to return an arbitrary sequence of tokens. 2. I wasn't sure how appropriate it was to make users understand the internals of the parser in that way. Transforming a magic token into a literal Python object is easy to understand. Transforming it into an arbitrary sequence of tokens is more powerful but harder to understand. (And harder to claim as analogous with u"...", 123L, etc., though I'm not sure that matters.) > I had a similar idea right after Tim's talk, but could not > articulate it clearly enough in a chat with Guido right > afterwards, and later I didn't follow through with it.
It > seems to me that your proposal is detailed and precise > enough (while my idea was rather vague) and that, by > returning an arbitrary sequence of tokens, it will let > Tim embed whatever funky syntax it requires. If we want to be able to generate arbitrary sequences of tokens, I think I'd prefer a more flexible input syntax. > This power is also the downside of the whole idea of > course -- no guarantee that somebody can't use this > mechanism to produce highly obfuscated programs. > But I think that such a somebody could already > obfuscate quite effectively in other ways, and the > risk of abuse shouldn't stop this interesting proposal. I am inclined to agree. > > ... return Rational(numerator, denominator) > > Hmmm, how would this "return a literal token"? It returns > an instance of Rational -- how does the parser treat this > instance as a literal token? > > I thought this use would have to return the sequence of > tokens for identifier 'Rational', open parenthesis, literal > (value of) numerator, comma, literal (value of) denominator, > closed parenthesis -- which in turn is why I thought of an > arbitrary sequence of tokens. If a single instance of any > arbitrary class may be returned and get treated as a > literal token by the parser, then that's much better (maybe > I don't know Python's parser well enough, but I don't > clearly see how that would be done). I don't know Python's parser well enough either :-). However: it can accept NUMBER and STRING tokens. As far as the grammar is concerned, they are exactly the same (except that multiple STRING tokens are implicitly concatenated). As far as everything else is concerned, they are very nearly exactly the same. We could have a LITERAL token, treated in the same sort of way as NUMBER and STRING. That was what I was intending; certainly not returning the token-sequence , <(>, , <,>, , <)> ! > > - Is this insane? > > Hope not, since I like it. Hmm. 
The other proposal I know you and I both like is the adaptation protocol. This is not necessarily a good omen. :-) > > - Is "$" the best character? > > Among the few available ones, I think I slightly prefer "@" > for this use, but there's little to choose IMHO. Curiously, "@" was the first option I thought of for this. I didn't have any very concrete reason for switching to "$". -- g From fredrik@pythonware.com Fri Sep 27 14:21:45 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 27 Sep 2002 15:21:45 +0200 Subject: [Python-Dev] Re: User extendable literal modifiers ?! References: <200209271104.MAA27895@synaptics-uk.com> Message-ID: <03fd01c26628$d4d6ff20$0900a8c0@spiff> alex wrote: > If a single instance of any arbitrary class may be returned and > get treated as a literal token by the parser, then that's much > better how do you marshal the resulting byte code? From jepler@unpythonic.net Fri Sep 27 14:43:38 2002 From: jepler@unpythonic.net (Jeff Epler) Date: Fri, 27 Sep 2002 08:43:38 -0500 Subject: [Python-Dev] Re: User extendable literal modifiers ?! In-Reply-To: <200209271104.MAA27895@synaptics-uk.com> References: <200209271104.MAA27895@synaptics-uk.com> Message-ID: <20020927134328.GA5941@unpythonic.net> On Fri, Sep 27, 2002 at 12:03:17PM +0100, Gareth McCaughan wrote: > Possible applications: > > - Rational numbers. $r"123/234" > - Regular expressions. $/"foo.*bar" > - Dates and times. $t"2002-09-27 11:38" > - Hostnames and ports. $h"www.google.com:80" Of course, if you have no shame , each of these but $/ can be written with today's syntax in no more characters, placing the type identifier first and then an arbitrary, existing operator second: r+"123/234" This, in turn, saves only one character over r("123/234") Here's an example I wrote for work: class Dimension: ... class DimensionMaker: def __call__(self, v): return Dimension(v) def __add__(self, v): return Dimension(v) D = DimensionMaker() I don't know if we'll ultimately judge the D+"..." 
syntax justified, given that it feels yucky and saves only one character. Note that we're also treading very close to allowing function calls without parens, if we allow an arbitrary identifier before string literals. What actually happens if you write trailer: test | '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME in the grammar and change the compiler accordingly? I guess the problem becomes that '(' could be the beginning of a testlist from inside atom, but if you could arrange for '(' here to always start an arglist instead, or invent a new production "altpower" trailer: altpower | '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME altpower: altatom trailer* altatom: NAME | NUMBER | STRING+ Now, a x.y.z()[:] becomes legal syntax (and would be a call to 'a' with one arg, x.y.z()[:]) Likewise, D"123/234" becomes legal, and is equivalent to D("123/234"). you have a problem with anything now recognized as a prefix of a string, so r"123/234" can't work as $r"123/234" is proposed to work. Of course, you could make R 123/234 work, since that'd be (R 123)/234 which would be R(123)/234. Personally, I think all of this is pretty ugly. Jeff From mal@lemburg.com Fri Sep 27 14:47:58 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 27 Sep 2002 15:47:58 +0200 Subject: [Python-Dev] Re: User extendable literal modifiers ?! References: <200209271104.MAA27895@synaptics-uk.com> <007f01c26621$92d3c060$6d94fea9@newmexico> Message-ID: <3D94618E.5070509@lemburg.com> Samuele Pedroni wrote: > From: Alex Martelli > >>I thought this use would have to return the sequence of >>tokens for identifier 'Rational', open parenthesis, literal >>(value of) numerator, comma, literal (value of) denominator, >>closed parenthesis -- which in turn is why I thought of an >>arbitrary sequence of tokens. 
If a single instance of any >>arbitrary class may be returned and get treated as a >>literal token by the parser, then that's much better > > > indeed, because then otherwise > > $r"123/234" = literal transformation => Rational(123,234) > > would require Rational to be installed in the builtins, or some kind of > implicit import (ugly) or people would have to rember to put an explicit from > ... import Rational in all modules that use $r, one import per program just to > register $r would not be enough. These are implementation details, e.g. if Python would provide a way to register new modifiers, these would only start working after having been registered. Let's say that a user wants 123I to map to mx.Number.Integer(123), then he'd have to make sure that mx.Number is imported in sitecustomize.py to have Python load modules containing the 123I literal using the registered object constructor for that literal modifier. Otherwise, the compiler or module loader would fail. There should not be any magic imports going on behind the scenes. Note that the whole point of the idea is to simplify using really basic types. Anything more complicated than a single character modifier would fail to meet this requirement. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From martin@v.loewis.de Fri Sep 27 14:55:48 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 27 Sep 2002 15:55:48 +0200 Subject: [Python-Dev] User extendable literal modifiers ?! In-Reply-To: <3D9423FB.9070303@lemburg.com> References: <3D9423FB.9070303@lemburg.com> Message-ID: "M.-A. 
Lemburg" writes: > Since these are numbers, it would be convenient if there were > some way to create them in form of literals, much like 123L > creates longs instead of integers or u"abc" gives you Unicode > instead of an 8-bit string. How would you marshal them? Curious, Martin From pedronis@bluewin.ch Fri Sep 27 14:45:57 2002 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 27 Sep 2002 15:45:57 +0200 Subject: R: [Python-Dev] Re: User extendable literal modifiers ?! References: <200209271104.MAA27895@synaptics-uk.com> <007f01c26621$92d3c060$6d94fea9@newmexico> <3D94618E.5070509@lemburg.com> Message-ID: <022e01c2662c$354c9240$6d94fea9@newmexico> From: M.-A. Lemburg > Samuele Pedroni wrote: > > From: Alex Martelli > > > >>I thought this use would have to return the sequence of > >>tokens for identifier 'Rational', open parenthesis, literal > >>(value of) numerator, comma, literal (value of) denominator, > >>closed parenthesis -- which in turn is why I thought of an > >>arbitrary sequence of tokens. If a single instance of any > >>arbitrary class may be returned and get treated as a > >>literal token by the parser, then that's much better > > > > > > indeed, because then otherwise > > > > $r"123/234" = literal transformation => Rational(123,234) > > > > would require Rational to be installed in the builtins, or some kind of > > implicit import (ugly) or people would have to rember to put an explicit from > > ... import Rational in all modules that use $r, one import per program just to > > register $r would not be enough. > > These are implementation details, e.g. if Python would > provide a way to register new modifiers, these would only > start working after having been registered. > > Let's say that a user wants 123I to map to mx.Number.Integer(123), > then he'd have to make sure that mx.Number is imported in > sitecustomize.py to have Python load modules containing the > 123I literal using the registered object constructor for that > literal modifier. 
Otherwise, the compiler or module loader > would fail. There should not be any magic imports going on > behind the scenes. yes but that' my point, I simply pointed out that the strategy that simply re-interprets $r through a lexical transformation does not work. regards. From guido@python.org Fri Sep 27 15:08:02 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 10:08:02 -0400 Subject: [Python-Dev] Re: User extendable literal modifiers ?! In-Reply-To: Your message of "Fri, 27 Sep 2002 15:47:58 +0200." <3D94618E.5070509@lemburg.com> References: <200209271104.MAA27895@synaptics-uk.com> <007f01c26621$92d3c060$6d94fea9@newmexico> <3D94618E.5070509@lemburg.com> Message-ID: <200209271408.g8RE82m05291@pcp02138704pcs.reston01.va.comcast.net> Given all the discussion, this will need a PEP first. I'd suggest Marc-Andre and Alex as co-authors, but that's up to you. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Sep 27 15:30:35 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 27 Sep 2002 16:30:35 +0200 Subject: [Python-Dev] User extendable literal modifiers ?! References: <3D9423FB.9070303@lemburg.com> Message-ID: <3D946B8B.3080303@lemburg.com> Martin v. Loewis wrote: > "M.-A. Lemburg" writes: > >>Since these are numbers, it would be convenient if there were >>some way to create them in form of literals, much like 123L >>creates longs instead of integers or u"abc" gives you Unicode >>instead of an 8-bit string. > > > How would you marshal them? Using a new marshal token which only stores the modifier together with the literal as string. marshal.load() would then restore the object by looking up the constructor in the modifier registry and calling it with the string argument. But that's just an implementation detail. What's more important is whether this ideas raises interest or not. I'm not sure myself whether it's a good idea and that's why I posted the idea here. 
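The marshalling scheme described above can be sketched in present-day Python. This is only an illustration under assumptions: MODIFIER_REGISTRY, register_modifier, dump_literal and load_literal are hypothetical names (no such hooks exist in Python or in the marshal module), and fractions.Fraction stands in for a rational type. The point it tries to show is that only the modifier and the literal's text are stored, and the object is rebuilt through the registry at load time.

```python
import marshal
from fractions import Fraction

# Hypothetical registry mapping a one-character modifier to a constructor.
MODIFIER_REGISTRY = {}


def register_modifier(modifier, constructor):
    """Register the callable used to rebuild literals carrying `modifier`."""
    MODIFIER_REGISTRY[modifier] = constructor


def dump_literal(modifier, text):
    """Serialize only the modifier and the literal's source text."""
    return marshal.dumps((modifier, text))


def load_literal(data):
    """Rebuild the object by looking up the constructor in the registry."""
    modifier, text = marshal.loads(data)
    if modifier not in MODIFIER_REGISTRY:
        raise LookupError("no constructor registered for modifier %r" % modifier)
    return MODIFIER_REGISTRY[modifier](text)


register_modifier("r", Fraction)      # stand-in for a Rational type
data = dump_literal("r", "123/234")   # bytes holding just ("r", "123/234")
value = load_literal(data)            # Fraction built from the string "123/234"
```

In this sketch, loading fails with LookupError if the modifier was never registered, which mirrors the requirement that registration has to happen before any module using the literal is loaded.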
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Fri Sep 27 16:04:36 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 11:04:36 -0400 Subject: [Python-Dev] User extendable literal modifiers ?! In-Reply-To: Your message of "Fri, 27 Sep 2002 16:30:35 +0200." <3D946B8B.3080303@lemburg.com> References: <3D9423FB.9070303@lemburg.com> <3D946B8B.3080303@lemburg.com> Message-ID: <200209271504.g8RF4at05601@pcp02138704pcs.reston01.va.comcast.net> > What's more important is whether this ideas raises interest or > not. I'm not sure myself whether it's a good idea and that's why I > posted the idea here. There are lots of possibilities for overgeneralization here. E.g. most of the examples of the $x"..." syntax are just as easily done using a function call, either passing a string or a few numbers. One danger of new notations is that it could be much harder to find out what it means if you're not familiar with a program. If you see a call to Frobozz(1, 2), it usually isn't hard to find the definition of Frobozz -- at the worst, it's hidden in an "import *", and that's one reason to avoid those. But if you see $f"1 2" in a file, you may have to grep all code that is imported by the program containing that file for calls to sys.register. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Sep 27 16:19:26 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 11:19:26 -0400 Subject: [Python-Dev] sorted() In-Reply-To: Your message of "Wed, 25 Sep 2002 17:07:13 EDT." 
References: Message-ID: <200209271519.g8RFJQK05777@pcp02138704pcs.reston01.va.comcast.net> Since François is probably waiting for a pronouncement for me, let me say that I think this is a problem that should not be addressed by changes to the language, builtins or library. A sorted() method for lists would require a copy. François argues that the extra space could be used by the sorting algorithm. But if the requirement is that the original array must not be shuffled at all, I expect that there's no way you can make use of the extra space: you have to make a copy of the whole list first, which then gets shuffled in various ways. I suppose it would be possible to write a sorting algorithm that made some use of the availability of an output array, but rewriting the sort code once again so that you can avoid writing a three line function doesn't seem a good trade-off. More generalized solutions seem overkill: I've not seen demand for sorting other container types (except for list subclasses). The argument against making sort() return self (while sorting in-place) still holds, and this argument also means that having a sorted() that sorts in-place is a bad idea. You could consider adding a "sort" option to keys(), values() and items(), but that doesn't solve other similar cases. I think you'll just have to live with it. Or you can create a dict subclass that sorts its keys. --Guido van Rossum (home page: http://www.python.org/~guido/) From dave@boost-consulting.com Fri Sep 27 15:45:22 2002 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 27 Sep 2002 10:45:22 -0400 Subject: [Python-Dev] Keyword for first argument of methods? Message-ID: <049f01c26634$afbda890$6501a8c0@boostconsulting.com> Hi, When implementing keyword argument support for Boost.Python, I noticed the following. I'm sure it's not worth a lot of effort to change this behavior, but I thought someone might like to know: >>> class X: ... def foo(self, y): print y ... 
>>> X.foo(y = 1, self = X()) Traceback (most recent call last): File "", line 1, in ? TypeError: unbound method foo() must be called with X instance as first argument (got nothing instead) -Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From skip@pobox.com Fri Sep 27 16:15:44 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 27 Sep 2002 10:15:44 -0500 Subject: [Python-Dev] User extendable literal modifiers ?! In-Reply-To: <200209271504.g8RF4at05601@pcp02138704pcs.reston01.va.comcast.net> References: <3D9423FB.9070303@lemburg.com> <3D946B8B.3080303@lemburg.com> <200209271504.g8RF4at05601@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15764.30240.265628.247675@12-248-11-90.client.attbi.com> Guido> But if you see $f"1 2" in a file, you may have to grep all code Guido> that is imported by the program containing that file for calls to Guido> sys.register. Even worse, if you happen to see it in isolation (a module disconnected from the program it was written for), you might have no way to find out what the $f prefix means. Skip From guido@python.org Fri Sep 27 16:32:57 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 11:32:57 -0400 Subject: [Python-Dev] Keyword for first argument of methods? In-Reply-To: Your message of "Fri, 27 Sep 2002 10:45:22 EDT." <049f01c26634$afbda890$6501a8c0@boostconsulting.com> References: <049f01c26634$afbda890$6501a8c0@boostconsulting.com> Message-ID: <200209271532.g8RFWv405872@pcp02138704pcs.reston01.va.comcast.net> > When implementing keyword argument support for Boost.Python, I > noticed the following. I'm sure it's not worth a lot of effort to > change this behavior, but I thought someone might like to know: > > >>> class X: > ... def foo(self, y): print y > ... > >>> X.foo(y = 1, self = X()) > Traceback (most recent call last): > File "", line 1, in ? 
> TypeError: unbound method foo() must be called with X instance as first > argument (got nothing instead) You can't pass in self to an unbound method as a keyword argument -- it has to be the first positional argument. The unbound method __call__ implementation contains a check that ensures that the 'self' argument is an instance of the class, but when this check is made, it cannot assume that the 'self' argument is actually called 'self' -- that's only a naming convention. It also doesn't know (in general) the name of the first argument to the underlying function, since the function can be an arbitrary callable -- there's no standard introspection interface for callables to find out the argument names. As you said, I see no reason to try to work harder in the case that the underlying callable supports introspection using obj.func_code.co_{argcount,varnames}. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Fri Sep 27 16:26:42 2002 From: ark@research.att.com (Andrew Koenig) Date: 27 Sep 2002 11:26:42 -0400 Subject: [Python-Dev] Keyword for first argument of methods? In-Reply-To: <049f01c26634$afbda890$6501a8c0@boostconsulting.com> References: <049f01c26634$afbda890$6501a8c0@boostconsulting.com> Message-ID: Dave> When implementing keyword argument support for Boost.Python, I noticed the Dave> following. I'm sure it's not worth a lot of effort to change this behavior, Dave> but I thought someone might like to know: Dave> class X: Dave> ... def foo(self, y): print y Dave> ... Dave> X.foo(y = 1, self = X()) Dave> Traceback (most recent call last): Dave> File "", line 1, in ? 
Dave> TypeError: unbound method foo() must be called with X instance as first Dave> argument (got nothing instead) Perhaps more interesting: >>> X.foo(X(), 1) 1 >>> X.foo(self = X(), y = 1) TypeError: unbound method foo() must be called with X instance as first argument (got nothing instead) -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From dave@boost-consulting.com Fri Sep 27 16:01:54 2002 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 27 Sep 2002 11:01:54 -0400 Subject: [Python-Dev] Keyword for first argument of methods? References: <049f01c26634$afbda890$6501a8c0@boostconsulting.com> Message-ID: <04c801c26636$d24eec50$6501a8c0@boostconsulting.com> From: "Andrew Koenig" > Dave> When implementing keyword argument support for Boost.Python, I noticed the > Dave> following. I'm sure it's not worth a lot of effort to change this behavior, > Dave> but I thought someone might like to know: > > Dave> class X: > Dave> ... def foo(self, y): print y > Dave> ... > Dave> X.foo(y = 1, self = X()) > Dave> Traceback (most recent call last): > Dave> File "", line 1, in ? > Dave> TypeError: unbound method foo() must be called with X instance as first > Dave> argument (got nothing instead) > > Perhaps more interesting: > > >>> X.foo(X(), 1) > 1 > >>> X.foo(self = X(), y = 1) > TypeError: unbound method foo() must be called with X instance as first argument (got nothing instead) Given my post, that behavior falls out of the (nicely documented) rules for how functions are called, so it's unsurprising if you read the docs. I wonder if that makes any difference in the real world ;-) ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From ark@research.att.com Fri Sep 27 16:31:33 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 27 Sep 2002 11:31:33 -0400 (EDT) Subject: [Python-Dev] Keyword for first argument of methods? 
In-Reply-To: <04c801c26636$d24eec50$6501a8c0@boostconsulting.com> (dave@boost-consulting.com) References: <049f01c26634$afbda890$6501a8c0@boostconsulting.com> <04c801c26636$d24eec50$6501a8c0@boostconsulting.com> Message-ID: <200209271531.g8RFVXY07106@europa.research.att.com> Dave> Given my post, that behavior falls out of the (nicely Dave> documented) rules for how functions are called, so it's Dave> unsurprising if you read the docs. I wonder if that makes any Dave> difference in the real world ;-) Probably not. From haering_python@gmx.de Fri Sep 27 01:42:54 2002 From: haering_python@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Fri, 27 Sep 2002 02:42:54 +0200 Subject: [Python-Dev] Strange bug only happens with Python 2.2 Message-ID: <20020927004254.GA2069@lilith.ghaering.test> This is somewhat off-topic, but I'm hoping maybe someone can give a hint why this only happens on Python 2.2.1. Ok, here's the story: I've had a bug report against our pyPgSQL database interface package that retrieving Large Objects doesn't work with Python 2.2.1. The reproducible traceback we get is: Traceback (most recent call last): File "p.py", line 20, in ? res = cs.fetchone() File "pyPgSQL/PgSQL.py", line 2672, in fetchone return self.__fetchOneRow() File "pyPgSQL/PgSQL.py", line 2281, in __fetchOneRow for _i in range(self.res.nfields): AttributeError: 'str' object has no attribute '__bases__' This traceback is quite obviously bogus, as self.res.nfields is a Python int and no strings are involved here whatsoever. After some debugging, I found that something very strange happens in a function call that happens in this for loop. Inside the for loop, a function typecast is called, which has this code within: if isinstance(value, PgBytea) or type(value) is PgLargeObjectType: This code is causing the problems which result in the bogus traceback later on. Now in my case, 'value' is of type PgLargeObjectType, which is a custom type from our extension module. PgBytea is a Python class. 
Now comes the first very strange observation: Swapping the checks, so that the 'type(value) is PgLargeObjectType' check comes first makes the problem go away. So my conclusion is that there's some problem with isinstance and my custom extension type. The second strange thing is that this only happens on Python 2.2.1 (Linux, FreeBSD, Windows), but _not_ on Python 2.1.3 or Python 2.3-CVS. Oh, the problem isn't tied to isinstance(value, PgBytea). Any isinstance check causes it later on. Of course I'm suspecting that there's some problem with the extension type. Looks like some internal interpreter data gets corrupted. No idea how to debug that, too. Does anybody have any tips where to look or how to debug this further? -- Gerhard From mwh@python.net Fri Sep 27 16:39:42 2002 From: mwh@python.net (Michael Hudson) Date: Fri, 27 Sep 2002 16:39:42 +0100 (BST) Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: <20020927004254.GA2069@lilith.ghaering.test> Message-ID: On Fri, 27 Sep 2002, Gerhard Häring wrote: > This is somewhat off-topic, but I'm hoping maybe someone can give a hint > why this only happens on Python 2.2.1. Guessing, but the (Jeremy's?) changes I recently backported to classobject.c on the release22-maint branch might relate to this. Can you try with a 222 build? > Ok, here's the story: > > I've had a bug report against our pyPgSQL database interface package that > retrieving Large Objects doesn't work with Python 2.2.1. The reproducible > traceback we get is: > > Traceback (most recent call last): > File "p.py", line 20, in ? > res = cs.fetchone() > File "pyPgSQL/PgSQL.py", line 2672, in fetchone > return self.__fetchOneRow() > File "pyPgSQL/PgSQL.py", line 2281, in __fetchOneRow > for _i in range(self.res.nfields): > AttributeError: 'str' object has no attribute '__bases__' > > This traceback is quite obviously bogus, as self.res.nfields is a Python > int and no strings are involved here whatsoever.
After some debugging, I > found that something very strange happens in a function call that > happens in this for loop. Inside the for loop, a function typecast is > called, which has this code within: > > if isinstance(value, PgBytea) or type(value) is PgLargeObjectType: > > This code is causing the problems which result in the bogus traceback > later on. So something's setting an exception and not letting the interpreter know. > Now in my case, 'value' is of type PgLargeObjectType, which is a custom > type from our extension module. PgBytea is a Python class. > > Now comes the first very strange observation: Swapping the checks, so > that the 'type(value) is PgLargeObjectType' check comes first makes the > problem go away. So my conclusion is that there's some problem with > isinstance and my custom extension type. > > The second strange thing is that this only happens on Python 2.2.1 > (Linux, FreeBSD, Windows), but _not_ on Python 2.1.3 or Python 2.3-CVS. This is no surprise. > Oh, the problem isn't tied to isinstance(value, PgBytea). Any isinstance > check causes it later on. Huh? > Of course I'm suspecting that there's some problem with the extension > type. Looks like some internal interpreter data gets corrupted. No idea > how to debug that, too. > > Does anybody have any tips where to look or how to debug this further? Try a release22-maint build? Cheers, M. From guido@python.org Fri Sep 27 17:04:01 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 12:04:01 -0400 Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: Your message of "Fri, 27 Sep 2002 02:42:54 +0200." <20020927004254.GA2069@lilith.ghaering.test> References: <20020927004254.GA2069@lilith.ghaering.test> Message-ID: <200209271604.g8RG41M06049@pcp02138704pcs.reston01.va.comcast.net> > This is somewhat off-topic, but I'm hoping maybe someone can give a hint > why this only happens on Python 2.2.1.
> > Ok, here's the story: > > I've had a bug report against our pyPgSQL database interface package that > retrieving Large Objects doesn't work with Python 2.2.1. The reproducible > traceback we get is: > > Traceback (most recent call last): > File "p.py", line 20, in ? > res = cs.fetchone() > File "pyPgSQL/PgSQL.py", line 2672, in fetchone > return self.__fetchOneRow() > File "pyPgSQL/PgSQL.py", line 2281, in __fetchOneRow > for _i in range(self.res.nfields): > AttributeError: 'str' object has no attribute '__bases__' > > This traceback is quite obviously bogus, as self.res.nfields is a Python > int and no strings are involved here whatsoever. After some debugging, I > found that something very strange happens in a function call that > happens in this for loop. Inside the for loop, a function typecast is > called, which has this code within: > > if isinstance(value, PgBytea) or type(value) is PgLargeObjectType: > > This code is causing the problems which result in the bogus traceback > later on. > > Now in my case, 'value' is of type PgLargeObjectType, which is a custom > type from our extension module. PgBytea is a Python class. > > Now comes the first very strange observation: Swapping the checks, so > that the 'type(value) is PgLargeObjectType' check comes first makes the > problem go away. So my conclusion is that there's some problem with > isinstance and my custom extension type. > > The second strange thing is that this only happens on Python 2.2.1 > (Linux, FreeBSD, Windows), but _not_ on Python 2.1.3 or Python 2.3-CVS. > > Oh, the problem isn't tied to isinstance(value, PgBytea). Any isinstance > check causes it later on. > > Of course I'm suspecting that there's some problem with the extension > type. Looks like some internal interpreter data gets corrupted. No idea > how to debug that, too. > > Does anybody have any tips where to look or how to debug this further? 
Probably some C code receives an exception and decides to go a different path (rather than propagating the exception), but forgets to call PyErr_Clear(). If you call some other code that raises an exception or calls PyErr_Clear(), the spurious exception is gone; but if you call some other code that *tests* for an exception (usually with PyErr_Occurred() or PyErr_ExceptionMatches()), that code may raise the bogus exception at an unexpected place. So I'd look in your extension for places where it tests for an exception and decides to ignore it but forgets to clear it. It's also possible that this occurs in the Python code (have you tried the 2.2.2 CVS? Use "cvs update -r release22-maint") but if I had to bet, I'd bet on your SQL extension. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From haering_python@gmx.de Fri Sep 27 17:23:13 2002 From: haering_python@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Fri, 27 Sep 2002 18:23:13 +0200 Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: References: <20020927004254.GA2069@lilith.ghaering.test> Message-ID: <20020927162313.GA6854@lilith.ghaering.test> * Michael Hudson [2002-09-27 16:39 +0100]: > On Fri, 27 Sep 2002, Gerhard Häring wrote: > > > This is somewhat off-topic, but I'm hoping maybe someone can give a hint > > why this only happens on Python 2.2.1. > > Guessing, but the (Jeremy's?) changes I recently backported to > classobject.c on the release22-maint branch might relate to this. > > Can you try with a 222 build? Yep. The problem goes away with release22-maint :-) > > Ok, here's the story: > > [bogus traceback, caused by:] > > if isinstance(value, PgBytea) or type(value) is PgLargeObjectType: > > So something's setting an exception and not letting the interpreter know. > > Oh, the problem isn't tied to isinstance(value, PgBytea). Any isinstance > > check causes it later on. > > Huh? To clarify, any isinstance(value, x), where x is a Python class, causes the problem.
> > [Any tips?] > Try a release22-maint build? That fixes the problem, so I'm now pretty confident it is a Python 2.2.1 problem (haven't tried with 2.2.0 yet, would that be of any use?). -- Gerhard From haering_python@gmx.de Fri Sep 27 17:24:43 2002 From: haering_python@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Fri, 27 Sep 2002 18:24:43 +0200 Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: <200209271604.g8RG41M06049@pcp02138704pcs.reston01.va.comcast.net> References: <20020927004254.GA2069@lilith.ghaering.test> <200209271604.g8RG41M06049@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020927162443.GB6854@lilith.ghaering.test> * Guido van Rossum [2002-09-27 12:04 -0400]: > It's also possible that this occurs in the Python code (have you tried > the 2.2.2 CVS? Yep, the problem goes away, then. > Use "cvs update -r release22-maint") but if I had to bet, I'd bet on > your SQL extension. :-) How much? ;-) -- Gerhard From guido@python.org Fri Sep 27 18:03:08 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 13:03:08 -0400 Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: Your message of "Fri, 27 Sep 2002 18:23:13 +0200." <20020927162313.GA6854@lilith.ghaering.test> References: <20020927004254.GA2069@lilith.ghaering.test> <20020927162313.GA6854@lilith.ghaering.test> Message-ID: <200209271703.g8RH38q07337@pcp02138704pcs.reston01.va.comcast.net> > That fixes the problem, so I'm now pretty confident it is a Python 2.2.1 > problem (haven't tried with 2.2.0 yet, would that be of any use?). No. If it's really fixed in 2.2.2, there's nothing else we can do. But I'm curious what caused this. Can you show self-contained example code (not using SQL) that shows this behavior in 2.2.1? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Sep 27 19:09:32 2002 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 27 Sep 2002 20:09:32 +0200 Subject: [Python-Dev] User extendable literal modifiers ?! References: <3D9423FB.9070303@lemburg.com> <3D946B8B.3080303@lemburg.com> <200209271504.g8RF4at05601@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D949EDC.5040202@lemburg.com> Guido van Rossum wrote: >>What's more important is whether this ideas raises interest or >>not. I'm not sure myself whether it's a good idea and that's why I >>posted the idea here. > > > There are lots of possibilities for overgeneralization here. > E.g. most of the examples of the $x"..." syntax are just as easily > done using a function call, either passing a string or a few numbers. > > One danger of new notations is that it could be much harder to find > out what it means if you're not familiar with a program. If you see a > call to Frobozz(1, 2), it usually isn't hard to find the definition of > Frobozz -- at the worst, it's hidden in an "import *", and that's one > reason to avoid those. But if you see $f"1 2" in a file, you may have > to grep all code that is imported by the program containing that file > for calls to sys.register. You're probably right. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From tim.one@comcast.net Fri Sep 27 19:10:08 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 27 Sep 2002 14:10:08 -0400 Subject: [Python-Dev] sorted() In-Reply-To: <200209271519.g8RFJQK05777@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido] > ... > A sorted() method for lists would require a copy. François argues > that the extra space could be used by the sorting algorithm.
But if > the requirement is that the original array must not be shuffled at > all, I expect that there's no way you can make use of the extra space: > you have to make a copy of the whole list first, which then gets > shuffled in various ways. > > I suppose it would be possible to write a sorting algorithm that made > some use of the availability of an output array, but rewriting the > sort code once again so that you can avoid writing a three line > function doesn't seem a good trade-off. There's no efficiency argument to be made here unless someone can write a sort function this way and demonstrate an improvement. I expect that would be hard. Back when I wrote the samplesort hybrid, I tried several ways of coding mergesorts too, and they all lost on random data. They all used a temp array of the same size as the original array. The current mergesort does not: it uses a temp array at most half the size. This effectively doubled the amount of code needed, but cut the size of the working set. I first tried the current mergesort again with a temp array the same size as the original, but it again lost (a little on random data, a lot on many kinds of partially ordered data -- for example, take a sorted array, and move its last element to the front; no matter how large the array, the current mergesort only needs a few dozen temp words to get it sorted again, and caches are much happier with that). From jrw@pobox.com Fri Sep 27 19:51:57 2002 From: jrw@pobox.com (John Williams) Date: Fri, 27 Sep 2002 13:51:57 -0500 Subject: [Python-Dev] proposal for interfaces Message-ID: <005001c26657$05000330$0100a8c0@shura> I have an idea for an interface mechanism for Python, and I'd like to see if anyone likes it before writing an actual PEP. The key features are: - It's implementable in pure Python (I've already started working on it). - The syntax to use it is fairly concise. - Interfaces are inherited by default, but can be turned off.
- Classes are made to implement interfaces without altering the class definition in any way. - A class can support any number of interfaces, even multiple interfaces that define methods with the same names. - It's easily extensible to add new features in a backward-compatible way. - It has support for design-by-contract idioms (this part is not essential to the proposal, so I won't discuss it further here, but interfaces without DBC seem kind of incomplete to me). Basic Usage =========== In actual practice it would look something like this: Suppose you have a class like this: class SomeClass: def foo(...): ... def bar(...): ... def foo2(...): ... In the simplest case, suppose you have an interface Foo that defines a single method, foo. To declare that SomeClass implements Foo, you'd say: Foo.bind(SomeClass) Now, suppose you have a function that requires an argument implementing interface Foo. You would probably code it like this: def foo_proc(foo_arg): foo_proxy = Foo(foo_arg) ... x = foo_proxy.foo(...) ... Two things are happening here. First, foo_arg is being checked to make sure it implements Foo; if not, an InterfaceError will be raised. Then, foo_proxy becomes a proxy for foo_arg, but it *only* supports calling the method foo, since that's all the interface defines. (If foo_arg is already a proxy object, the call to Foo will just return foo_arg.) Defining an Interface ===================== How is "Foo" defined? It could look something like this: class Foo(interface): def doc_foo(self,...): "Docstring for foo method." The "doc_" prefix on foo is not part of the method name; it is needed to control how the interface treats the method. If the method has default behavior, you could say this instead: class Foo(interface): def default_foo(self,...): "Docstring for foo method." print "Defaults can be handy." In this version, unlike the first, classes implementing Foo need not define their own "foo" method if the default will suffice.
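The bind-then-proxy behaviour described above can be sketched in a few dozen lines of present-day Python. This is only an illustration under stated assumptions, not John's implementation: the names InterfaceMeta, Proxy, _methods and _bound are invented here, the code targets Python 3 rather than 2.2, and inheritance between interfaces is not handled.

```python
import functools


class InterfaceError(Exception):
    """Raised when an object does not implement the requested interface."""


class Proxy:
    """Exposes only the methods that an interface defines."""

    def __init__(self, iface, methods):
        self._iface = iface
        self.__dict__.update(methods)


class InterfaceMeta(type):
    """Hypothetical metaclass: collects doc_/default_ names, adds bind()."""

    def __init__(cls, name, bases, ns):
        super().__init__(name, bases, ns)
        cls._methods = {}    # method name -> default function, or None
        cls._bound = set()   # classes declared to implement the interface
        for key, value in ns.items():
            for prefix in ("doc_", "default_"):
                if key.startswith(prefix):
                    default = value if prefix == "default_" else None
                    cls._methods[key[len(prefix):]] = default

    def bind(cls, klass):
        # Every method without a default must exist on the class.
        for name, default in cls._methods.items():
            if default is None and not callable(getattr(klass, name, None)):
                raise InterfaceError(f"{klass.__name__} lacks method {name!r}")
        cls._bound.add(klass)

    def __call__(cls, obj):
        # Foo(proxy) returns the proxy unchanged.
        if isinstance(obj, Proxy) and obj._iface is cls:
            return obj
        # isinstance() makes bindings inherited by subclasses.
        if not any(isinstance(obj, k) for k in cls._bound):
            raise InterfaceError(
                f"{type(obj).__name__} does not implement {cls.__name__}")
        methods = {}
        for name, default in cls._methods.items():
            impl = getattr(obj, name, None)
            if impl is None and default is not None:
                impl = functools.partial(default, obj)
            if impl is None:
                raise InterfaceError(f"method {name!r} is missing")
            methods[name] = impl
        return Proxy(cls, methods)


class interface(metaclass=InterfaceMeta):
    pass


class Foo(interface):
    def doc_foo(self):
        "Docstring for foo method."


class Greeter(interface):
    def default_hello(self):
        return "hello from the default"


class SomeClass:
    def foo(self):
        return "SomeClass.foo"

    def bar(self):
        return "SomeClass.bar"


Foo.bind(SomeClass)
Greeter.bind(SomeClass)
foo_proxy = Foo(SomeClass())
print(foo_proxy.foo())               # the proxy forwards foo
print(hasattr(foo_proxy, "bar"))     # bar is not part of Foo
print(Greeter(SomeClass()).hello())  # falls back to the default
```

Building the proxy from bound methods at call time keeps the target class untouched, which matches the stated goal of implementing an interface without altering the class definition.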
Requiring some sort of prefix attached to every name defined by the interface is a little ugly, but it opens up a lot of possibilities for creating different behaviors with a minimum of fuss--I have a lot of uses in mind for different prefixes that I won't go into here. Advanced Examples ================= Let's say we have a new interface, FooBar, defined like this: class FooBar(interface): def doc_foo(...): "Method foo." def doc_bar(...): "Method quux." And suppose we'd like to make SomeClass above implement FooBar, but we want FooBar.foo to call SomeClass.foo2 instead of SomeClass.foo. It's easy! FooBar.bind(SomeClass) FooBar[SomeClass].foo = "foo2" # Override default binding. Now we can do really confusing stuff: x = SomeClass() Foo(x).foo() # Calls x.foo() FooBar(x).foo() # Calls x.foo2() Of course you probably wouldn't do something so confusing on purpose, but it could be useful when an object must support two different interfaces (written by different people) that happen to have method names in common, or to connect a class to an interface where the class defines all the needed functionality but the methods have the wrong names. For the last trick, let's imagine you want to derive a subclass of SomeClass. If you want the new class to inherit all the interfaces, do nothing. To remove an interface, just do something like this: class AnotherClass(SomeClass): ... Foo.unbind(AnotherClass) And it's done! Please let me know if you like this idea (or hate it). If I get a good response I'll try to write a PEP this weekend and make the implementation available to try out. --jw From mark@freelance-developer.com Fri Sep 27 20:44:37 2002 From: mark@freelance-developer.com (Mark Nenadov) Date: Fri, 27 Sep 2002 15:44:37 -0400 Subject: [Python-Dev] proposal for interfaces In-Reply-To: <005001c26657$05000330$0100a8c0@shura> References: <005001c26657$05000330$0100a8c0@shura> Message-ID: <200209271544.37296.mark@freelance-developer.com> John, I like your idea!
I would look forward to seeing the implementation. I personally prefer to have the "interface binding" to be part of a class definition. However, I can see some advantages to having the binding process separate from the interface. Good day, ~Mark On September 27, 2002 02:51 pm, John Williams wrote: > And it's done! > > Please let me know if you like this idea (or hate it). If I get a good > response I'll try to write a PEP this weekend and make the implementation > available to try out. From jriehl@spaceship.com Fri Sep 27 20:44:19 2002 From: jriehl@spaceship.com (Jonathan Riehl) Date: Fri, 27 Sep 2002 14:44:19 -0500 (CDT) Subject: [Python-Dev] Extension module difficulty w/pgen. Message-ID: Hi all, I know this is what I get for trying to integrate pgen into an extension module: I can't get it to link properly. I first saw the following problem on a FreeBSD box. Now, I have the following two external dependencies in my extension module (.../src/Modules/pgenmodule.c): extern grammar * _Py_pgen (node * n); extern grammar * _Py_meta_grammar (void); I added an entry to setup.py for it: exts.append( Extension('pgen', ['pgenmodule.c'])) Now when I run make, the extension module is not built, with the system complaining about being unable to resolve "_Py_meta_grammar", but not "_Py_pgen". When I run nm, I can see both symbols in libpython.2.3.a (these symbols are in pgen.c and metagrammar.c, both of which have been added to the libpython build target): ~/cvs/python/dist/src> nm ./libpython2.3.a | grep _Py_pgen [26] | 2676| 48|FUNC |GLOB |0 |2 |_Py_pgen ~/cvs/python/dist/src> nm ./libpython2.3.a | grep _Py_meta [36] | 0| 12|FUNC |GLOB |0 |2 |_Py_meta_grammar On the FreeBSD box, I was able to add "-L. -lpython2.3" to the command line, and this builds. However, when I use this hack on a Solaris platform, it complains about being unable to reserve a text offset for most if not all of the symbols in libpython.
It seems to me that I should not have to use this workaround, which only works on one of the systems I use. Does anyone have an idea as to what I should do now? I am a bit confused by this, since Fred Drake's parser extension does not require any of this wackiness. As an aside, the code for the modules I am working on and the diffs are on Sourceforge (PEP 269 implementation), so you can play too, if so inclined. Thanks! -Jon From guido@python.org Fri Sep 27 21:24:58 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 16:24:58 -0400 Subject: [Python-Dev] Extension module difficulty w/pgen. In-Reply-To: Your message of "Fri, 27 Sep 2002 14:44:19 CDT." References: Message-ID: <200209272024.g8RKOwq22992@pcp02138704pcs.reston01.va.comcast.net> > I know this is what I get for trying to integrate pgen into an > extension module: I can't get it to link properly. I first saw the > following problem on a FreeBSD box. > > Now, I have the following two external dependencies in my extension > module (.../src/Modules/pgenmodule.c): > > extern grammar * _Py_pgen (node * n); > extern grammar * _Py_meta_grammar (void); > > I added an entry to setup.py for it: > > exts.append( Extension('pgen', ['pgenmodule.c'])) > > Now when I run make, the extension module is not built, with the system > complaining about being unable to resolve "_Py_meta_grammar", but not > "_Py_pgen". Maybe you only get an error for the first unresolved symbol? > When I run nm, I can see both symbols in libpython.2.3.a > (these symbols are in pgen.c and metagrammar.c, both of which have been > added to the libpython build target): > > ~/cvs/python/dist/src> nm ./libpython2.3.a | grep _Py_pgen > [26] | 2676| 48|FUNC |GLOB |0 |2 |_Py_pgen > ~/cvs/python/dist/src> nm ./libpython2.3.a | grep _Py_meta > [36] | 0| 12|FUNC |GLOB |0 |2 |_Py_meta_grammar Linux nm output looks very different, so I don't know what this means. 
Are you *sure* it doesn't mean that there are global references but no definitions for these symbols? And what does the 0 in the second column for _Py_meta_grammar mean? > On the FreeBSD box, I was able to add "-L. -lpython2.3" to the command > line, and this builds. However, when I use this hack on a Solaris > platform, it complains about being unable to reserve a text offset for > most if not all of the symbols in libpython. > > It seems to me that I should not have to use this workaround, which only > works on one of the systems I use. Does anyone have an idea as to what I > should do now? I am a bit confused by this, since Fred Drake's parser > extension does not require any of this wackiness. > > As an aside, the code for the modules I am working on and the diffs are on > Sourceforge (PEP 269 implementation), so you can play too, if so inclined. Maybe the problem is that nothing else uses these symbols? Try sticking dummy references (e.g. an unreachable call) to them in main.c, to see if that makes a difference. I recall we had to do this for something else that wasn't used by Python itself. --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis@bluewin.ch Fri Sep 27 21:40:30 2002 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 27 Sep 2002 22:40:30 +0200 Subject: [Python-Dev] buitlins instance have modifiable __class__? Message-ID: <0a0301c26666$1f0f7800$6d94fea9@newmexico> question on bultin types (under 2.2): >>> d={} >>> class ndict(dict): ... __slots__ = () ... def __getitem__(self,k): ... print "__getitem__" ... return dict.__getitem__(self,k) ... >>> d.items() [] >>> d['a']=3 >>> d.__class__=ndict is intended to work? it seems it does, but is that the intention? >>> d['a'] __getitem__ 3 [ >>> exec "print a" in d 3 Ok, that is the non cooperative behavior I already know about. ] Thanks. 
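For readers who want to try the question above on a current interpreter, here is a minimal sketch. It assumes CPython 3 rather than the 2.2 under discussion, and the helper name try_reclass is invented for illustration. Guido's reply later in this thread says the assignment is disallowed from 2.3 on; on such interpreters the helper is expected to report a refusal, and constructing an instance of the subclass is the supported route.

```python
class ndict(dict):
    __slots__ = ()

    def __getitem__(self, key):
        print("__getitem__")
        return dict.__getitem__(self, key)


def try_reclass(obj, new_class):
    """Return True if obj.__class__ could be rebound, False on TypeError."""
    try:
        obj.__class__ = new_class
    except TypeError:
        return False
    return True


d = {"a": 3}
# Probe whether a plain dict instance accepts a new __class__.
print(try_reclass(d, ndict))

# Supported alternative: build a subclass instance from the dict.
n = ndict(d)
print(n["a"])
```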
From pedronis@bluewin.ch Fri Sep 27 22:07:21 2002 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 27 Sep 2002 23:07:21 +0200 Subject: [Python-Dev] buitlins instance have modifiable __class__? References: <0a0301c26666$1f0f7800$6d94fea9@newmexico> Message-ID: <0ac201c26669$dee14660$6d94fea9@newmexico> typos apart, there was also another question, sorry I was typing and reflecting on the consequences of all of this on Jython ... [me] > > >>> exec "print a" in d > 3 > > Ok, that is the non cooperative behavior I already know about. ] > I recall this was already discussed here, what is the idea, to leave it as it is or make this work? Thanks. From jriehl@spaceship.com Fri Sep 27 22:20:20 2002 From: jriehl@spaceship.com (Jonathan Riehl) Date: Fri, 27 Sep 2002 16:20:20 -0500 (CDT) Subject: [Python-Dev] Extension module difficulty w/pgen. In-Reply-To: <200209272024.g8RKOwq22992@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Fri, 27 Sep 2002, Guido van Rossum wrote: > > Maybe you only get an error for the first unresolved symbol? > Yup. When I comment out the references to _Py_meta_grammar(), it still complains about not being able to see _Py_pgen(). > > When I run nm, I can see both symbols in libpython.2.3.a > > (these symbols are in pgen.c and metagrammar.c, both of which have been > > added to the libpython build target): > > > > ~/cvs/python/dist/src> nm ./libpython2.3.a | grep _Py_pgen > > [26] | 2676| 48|FUNC |GLOB |0 |2 |_Py_pgen > > ~/cvs/python/dist/src> nm ./libpython2.3.a | grep _Py_meta > > [36] | 0| 12|FUNC |GLOB |0 |2 |_Py_meta_grammar > > Linux nm output looks very different, so I don't know what this > means. Are you *sure* it doesn't mean that there are global > references but no definitions for these symbols? And what does the 0 > in the second column for _Py_meta_grammar mean? Come on Guido, I thought you were at one point a Solaris hack ;-). FUNC and GLOB means it is a global function defined in the module. 
The second column is some sort of memory offset value, and incidently the third column is the byte size reserved for the object (i.e. the objects should be in the library, and are not just place holders). Here is the output from Linux (where I have just duplicated the problem): $ nm libpython2.3.a | egrep "_Py_(meta|pgen)" 00000000 T _Py_meta_grammar 00000e38 T _Py_pgen If I understood the GNU info file for binutils, this means that the symbols are defined in the text segment, and should be available for external linkage. > Maybe the problem is that nothing else uses these symbols? Try > sticking dummy references (e.g. an unreachable call) to them in > main.c, to see if that makes a difference. I recall we had to do this > for something else that wasn't used by Python itself. I tried this just now, but to no avail. Maybe I am not being thorough enough. If the linker is excluding these symbols because they are not used, why would nm seem to say they are there, and why would statically linking libpython work (on FreeBSD, anyway)? Conversely, I seem to remember this working on an earlier, but abandoned attempt I made on a Linux box. Maybe I just need more vacation. :P Thanks! -Jon From gerhard.haering@gmx.de Fri Sep 27 23:07:40 2002 From: gerhard.haering@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Sat, 28 Sep 2002 00:07:40 +0200 Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: References: <20020927004254.GA2069@lilith.ghaering.test> Message-ID: <20020927220740.GA7751@lilith.ghaering.test> * Michael Hudson [2002-09-27 16:39 +0100]: > On Fri, 27 Sep 2002, Gerhard Häring wrote: > > > This is somewhat off-topic, but I'm hoping maybe someone can give a hint > > why this only happens on Python 2.2.1. > > Guessing, but the (Jeremy's?) changes I recently backported to > classobject.c on the release22-maint branch might relate to this. Maybe. 
I've not viewed the control flow in a debugger, but my tries to come up with a minimalistic test case and my gut feeling says that this piece of code has something to do with it: static PyObject *PgLargeObject_getattr(PgLargeObject *self, char* attr) { PyObject *res; res = Py_FindMethod(PgLargeObject_methods, (PyObject *)self, attr); if (res != NULL) return res; PyErr_Clear(); if (strcmp(attr, "closed") == 0) return Py_BuildValue("l", (long)(self->lo_fd == -1)); if (!strcmp(attr, "__module__")) return Py_BuildValue("s", MODULE_NAME); if (!strcmp(attr, "__class__")) { printf("__class__ accessed!\n"); return Py_BuildValue("s", self->ob_type->tp_name); } return PyMember_Get((char *)self, PgLargeObject_members, attr); } from which I can see that isinstance tries to access the __class__ attribute. Am I supposed to /not/ provide a __class__ attribute for classic types? I haven't looked into the python22-maint changelogs yet, but I couldn't find any related registered SF bug. -- Gerhard From haering_python@gmx.de Fri Sep 27 23:53:50 2002 From: haering_python@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Sat, 28 Sep 2002 00:53:50 +0200 Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: <20020927220740.GA7751@lilith.ghaering.test> References: <20020927004254.GA2069@lilith.ghaering.test> <20020927220740.GA7751@lilith.ghaering.test> Message-ID: <20020927225349.GA8862@lilith.ghaering.test> * Gerhard Häring [2002-09-28 00:07 +0200]: > * Michael Hudson [2002-09-27 16:39 +0100]: > > On Fri, 27 Sep 2002, Gerhard Häring wrote: > > > > > This is somewhat off-topic, but I'm hoping maybe someone can give a hint > > > why this only happens on Python 2.2.1. > > > > Guessing, but the (Jeremy's?) changes I recently backported to > > classobject.c on the release22-maint branch might relate to this. > > Maybe. 
I've not viewed the control flow in a debugger, but my tries to come up > with a minimalistic test case and my gut feeling says that this piece of code > has something to do with it: > > static PyObject *PgLargeObject_getattr(PgLargeObject *self, char* attr) > { > PyObject *res; > > res = Py_FindMethod(PgLargeObject_methods, (PyObject *)self, attr); > if (res != NULL) > return res; > PyErr_Clear(); > > if (strcmp(attr, "closed") == 0) > return Py_BuildValue("l", (long)(self->lo_fd == -1)); > > if (!strcmp(attr, "__module__")) > return Py_BuildValue("s", MODULE_NAME); > > if (!strcmp(attr, "__class__")) { > printf("__class__ accessed!\n"); > return Py_BuildValue("s", self->ob_type->tp_name); > } > > return PyMember_Get((char *)self, PgLargeObject_members, attr); > } > > from which I can see that isinstance tries to access the __class__ attribute. > Am I supposed to /not/ provide a __class__ attribute for classic types? > > I haven't looked into the python22-maint changelogs yet, but I couldn't find > any related registered SF bug. Ok, I've now further narrowed down this isinstance issue: python22-maint ==> bug does not appear python22-maint with abstract.c from Python 2.2.1 ==> bug appears So for what it's worth (i. e. not much), I'd say please /do/ include the abstract.c changes into the upcoming Python 2.2.2 :-) -- Gerhard From guido@python.org Sat Sep 28 01:11:54 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 20:11:54 -0400 Subject: [Python-Dev] Strange bug only happens with Python 2.2 In-Reply-To: Your message of "Sat, 28 Sep 2002 00:53:50 +0200." 
<20020927225349.GA8862@lilith.ghaering.test> References: <20020927004254.GA2069@lilith.ghaering.test> <20020927220740.GA7751@lilith.ghaering.test> <20020927225349.GA8862@lilith.ghaering.test> Message-ID: <200209280011.g8S0BsF17313@pcp02138704pcs.reston01.va.comcast.net> > Ok, I've now further narrowed down this isinstance issue: > > python22-maint ==> bug does not appear > python22-maint with abstract.c from Python 2.2.1 ==> bug appears > > So for what it's worth (i. e. not much), I'd say please /do/ include the > abstract.c changes into the upcoming Python 2.2.2 :-) I'm sure it's this change, which was backported to the 2.2 maintenance branch (and hence will be in 2.2.2). It fixes several several occurrences where an error is not cleared. ---------------------------- revision 2.101 date: 2002/04/23 22:45:44; author: bwarsaw; state: Exp; lines: +46 -9 abstract_get_bases(): Clarify exactly what the return values and states can be for this function, and ensure that only AttributeErrors are masked. Any other exception raised via the equivalent of getattr(cls, '__bases__') should be propagated up. abstract_issubclass(): If abstract_get_bases() returns NULL, we must call PyErr_Occurred() to see if an exception is being propagated, and return -1 or 0 as appropriate. This is the specific fix for a problem whereby if getattr(derived, '__bases__') raised an exception, an "undetected error" would occur (under a debug build). This nasty situation was uncovered when writing a security proxy extension type for the Zope3 project, where the security proxy raised a Forbidden exception on getattr of __bases__. PyObject_IsInstance(), PyObject_IsSubclass(): After both calls to abstract_get_bases(), where we're setting the TypeError if the return value is NULL, we must first check to see if an exception occurred, and /not/ mask an existing exception. 
Neil Schemenauer should double check that these changes don't break his ExtensionClass examples (there aren't any test cases for those examples and abstract_get_bases() was added by him in response to problems with ExtensionClass). Neil, please add test cases if possible! I belive this is a bug fix candidate for Python 2.2.2. ---------------------------- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Sep 28 01:17:50 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 27 Sep 2002 20:17:50 -0400 Subject: [Python-Dev] buitlins instance have modifiable __class__? In-Reply-To: Your message of "Fri, 27 Sep 2002 22:40:30 +0200." <0a0301c26666$1f0f7800$6d94fea9@newmexico> References: <0a0301c26666$1f0f7800$6d94fea9@newmexico> Message-ID: <200209280017.g8S0Hol17357@pcp02138704pcs.reston01.va.comcast.net> > question on bultin types (under 2.2): > > >>> d={} > >>> class ndict(dict): > ... __slots__ = () > ... def __getitem__(self,k): > ... print "__getitem__" > ... return dict.__getitem__(self,k) > ... > >>> d.items() > [] > >>> d['a']=3 > >>> d.__class__=ndict > > is intended to work? > > it seems it does, but is that the intention? It was a mistake. In 2.3, it's disallowed. In 2.2.2, it'll still be allowed, but you shouldn't do this -- all sorts of bizarre stuff can happen because you can do this. So if you're asking about this for Jython, please don't allow this in Jython! > typos apart, there was also another question, sorry I was typing and > reflecting on the consequences of all of this on Jython ... > > [me] > > > > >>> exec "print a" in d > > 3 > > > > Ok, that is the non cooperative behavior I already know about. ] > > > > I recall this was already discussed here, what is the idea, to leave > it as it is or make this work? That's not going to change in CPython, because I believe it would slow down lookup for builtins and globals too much if we had to check for a custom __getitem__. 
But if you can fix it for Jython, go ahead. I don't mind if there are places where Jython is "purer" than CPython. --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis@bluewin.ch Sat Sep 28 01:15:11 2002 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Sat, 28 Sep 2002 02:15:11 +0200 Subject: [Python-Dev] buitlins instance have modifiable __class__? References: <0a0301c26666$1f0f7800$6d94fea9@newmexico> <200209280017.g8S0Hol17357@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <0ec901c26684$1cae5ae0$6d94fea9@newmexico> From: Guido van Rossum > It was a mistake. In 2.3, it's disallowed. In 2.2.2, it'll still be > allowed, but you shouldn't do this -- all sorts of bizarre stuff can > happen because you can do this. > > So if you're asking about this for Jython, please don't allow this in > Jython! honestly I was hoping that it was unintended, implementing this would make already complicated things even more so. So we are happy campers. Maybe others will be less so, I have seen a module referred on c.l.p using the "feature" escaped from the lab to make builtins observable . > > > >>> exec "print a" in d > > > 3 > > > > > > Ok, that is the non cooperative behavior I already know about. ] > > > > > > > I recall this was already discussed here, what is the idea, to leave > > it as it is or make this work? > > That's not going to change in CPython, because I believe it would slow > down lookup for builtins and globals too much if we had to check for a > custom __getitem__. But if you can fix it for Jython, go ahead. I > don't mind if there are places where Jython is "purer" than CPython. Very likely the other way is what we will get in Jython by doing nothing regards. From esteban@ccpgames.com Sat Sep 28 09:26:38 2002 From: esteban@ccpgames.com (Esteban U.C.. 
Castro) Date: Sat, 28 Sep 2002 08:26:38 -0000 Subject: [Python-Dev] proposal for interfaces Message-ID: <25A39AFEB31B06408C675CEF28199E5B073CA3@postur.ccp.cc> Hi, I have just joined python-dev and I saw your very interesting proposal for implementing interfaces. > I have an idea for an interface mechanism for Python, and I'd like to see if > anyone likes it before writing an actual PEP. [...] I like it a lot! Anyway, if it can be implemented in python as is, what is the point of the PEP? Making the 'interface' root class and/or InterfaceError builtins, maybe? I have some comments which I thought I would bounce. I'll organize these attending to the activities they relate to. Don't hesitate to tell me if I'm saying something stupid. :) Define an interface =================== In your example, it seems that class Foo(interface): def default_foo(self, a, b): "Docstring for foo method." print "Defaults can be handy." [I added the a, b arguments to illustrate one point below] Does the following: - Defines the requirement that all objects that implement Foo have a foo function - Defines that foo should accept arguments of the form (a, b) maybe? - Sets a doc string for the foo method. - Sets a default implementation for the method. Some questions on this: - Can an interface only define method names (and maybe argument formats)? I think it would be handy to let it expose attributes. - Are method names (and maybe format of arguments) the only thing you can 'promise' in the interface? In other words, is that the only type of guarantee that code that works against the interface can get? I think a __check__(self, obj) special method in interfaces would be a simple way to boost their flexibility. - For the uses you have given to the prefix_methodname notation so far, I don't think it's really needed. Isn't the following sufficient?
class Foo(interface): def foo(self, a, b): "foo docstring" # nothing here; no default definition def bar(self, a, b): pass # no docstring, _empty definition_ This has the side effect that a method with no default definition and no doc string is a SyntaxError. Is this too bad? - It would maybe be hard to figure out what such a method is supposed to do, so you _should_ provide a docstring. - If you're in a hurry, an empty docstring will do the trick. While in 'quick and dirty mode' you probably won't be using interfaces a lot, anyway. Defaults do look useful, but the really crucial aspect to lay down in an interface definition is what it guarantees about the objects that implement it. If amalgamating this with default defs would otherwise obscure it (there's another issue I'm addressing below), I think defaults belong more properly to a class that implements the interface, not to the interface definition itself. I guess knowing what other uses you have in mind for the prefix_methodname notation could be useful to decide whether it's warranted. Check whether an object implements an interface =============================================== From your examples, I get it that an object implements an interface iff it is an instance of a class that implements that interface. So I guess any checking of the requirements expressed by the interface is done at the time when you bind() a class to an interface. This paradigm perfectly fits static and strongly typed languages, but it falls a bit short of the flexibility of python IMO. You can do very funny things to classes and objects at runtime, which can break any assumptions based on object class. In your example: def foo_proc(foo_arg): foo_proxy = Foo(foo_arg) ...
x = foo_proxy.foo(a, b) [added a, b again] imagine foo_proc only really cares that foo_arg is an object that has a foo() method that takes (a, b) arguments (this is all Foo guarantees). * Will the Foo() call check this, or will it just check that some class in foo_arg's bases is bound to the Foo interface? In the second case, if someone has been fiddling with foo_arg or some of its base classes, foo_arg.foo() may no longer exist or it may have a different signature. * Why should Foo() _always_ fail for objects that _do_ meet the requirements expressed by the interface Foo but have _not_ declared that they implement the interface? If the point is to avoid false positives, interfaces with this concern may still make the class check: class Foo(interface): def __call__(self, obj): error = self.__check__(obj) if error: raise InterfaceError, error else: return self.proxy(obj) def __check__(self, obj): if not hasattr(obj, "foo"): return "No method foo found" ... return interface.check_class(self, obj) Making such a check optional allows implicit (not declared) interface satisfaction for those who want it. This should extend the applicability of interfaces. And this brings up another problem with defaults: they would increase false positives. What if an interface wants to provide defaults for all its methods? Will any object then match it? This would force additional checking. Even though this doesn't look like a big issue to me, I think it's cleanest to leave validation for interfaces and implementation for classes.
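The optional structural check argued for above might look roughly like this in present-day Python. It is a sketch under assumptions, not part of either proposal: the function name check_object and the idea of describing an interface as a mapping from method name to argument count are invented here, and inspect.signature is a Python 3 facility that did not exist in 2.2.

```python
import inspect


def check_object(obj, required):
    """Return an error string, or None if obj meets every requirement.

    required maps a method name to the number of positional arguments
    the caller intends to pass (self excluded).
    """
    for name, nargs in required.items():
        method = getattr(obj, name, None)
        if not callable(method):
            return f"No method {name} found"
        try:
            sig = inspect.signature(method)
        except (TypeError, ValueError):
            # Signature not introspectable; accept on the name alone.
            continue
        try:
            # bind() raises TypeError if the arguments do not fit.
            sig.bind(*range(nargs))
        except TypeError:
            return f"Method {name} does not accept {nargs} arguments"
    return None


class Declared:
    def foo(self, a, b):
        return a + b


class WrongShape:
    def foo(self):
        return None


patched = WrongShape()
patched.foo = lambda a, b: a + b   # instance-level override, as in the text

foo_requirements = {"foo": 2}
print(check_object(Declared(), foo_requirements))
print(check_object(WrongShape(), foo_requirements))
print(check_object(patched, foo_requirements))
print(check_object(object(), foo_requirements))
```

Because the check looks at the object rather than its class, it also covers objects patched at runtime, which is the case a bind-time class check cannot see.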
Declare that an object implements an interface or part of it ============================================================ Foo.bind(SomeClass) Problems: * I agree the declaration had better be included in the class definition, at least as an option. * Declarative better than procedural, for this purpose. * Only classes (not instances) can declare that they implement an interface. * Indexing notation to 'resolve' methods a bit counterintuitive. What about: # interfaces class Foo(interface): def foo(self, a, b): ... def clash1(self, x): ... def clash2(self): ... class Bar(interface): def bar(self, x, y): ... def clash1(self, a, s, d, f): ... def clash2(self): # actually equivalent to Foo.clash2 # we should really factor this out in one interface # but imagine we can't do so for some reason... ... # implementations class SomeClass: # promise that all instances of SomeClass will implement Foo and Bar __implements__ = (Foo, Bar) # automatically assumed to implement Foo.foo() def foo(self, a, b): ... # There is no automatic name clash resolution. InterfaceError unless # we resolve these explicitly def fclash1(self, x): ... fclash1.__implements__ = Foo.clash1 # maybe we should require this # to be a tuple too? def bclash1(self, x): ... bclash1.__implements__ = Bar.clash1 def clash2(self): ... clash2.__implements__ = (Foo.clash2, Bar.clash2) # or maybe this is # not really needed?
# seems to reflect bad # design anyway # 'Remove' an interface from a subclass without actually removing it from # the base (or just cut the search with negative result): class Child(SomeClass): __implements__ = (-Foo,) # will be found before the 'Foo' in the base # class # an object that is *not* an instance of a class that implements Foo wants # to play the Foo obj = Child() obj.foo = lambda a, b: ... obj.clash1 = lambda x: ... obj.for_the_fun_of_it = lambda: ... obj.__implements__ = (Foo,) # will be found before the '-Foo' in the # class obj.for_the_fun_of_it.__implements__ = Foo.clash2 # object dict will be # searched first Restrict ======== That is, to make sure that an object is only accessed in the ways defined in the interface (via the proxy). This should be optional too, but your syntax does this nicely; you can call Foo() as an assertion of sorts and ignore the result. Note that the __implements__ method resolution magic would require that you get a proxy, though. What do you think? Esteban. From esteban@ccpgames.com Sat Sep 28 10:32:28 2002 From: esteban@ccpgames.com (Esteban U.C.. Castro) Date: Sat, 28 Sep 2002 09:32:28 -0000 Subject: [Python-Dev] sorry Message-ID: <25A39AFEB31B06408C675CEF28199E5B06EB0B@postur.ccp.cc> phew! sorry about the formatting of the last one! :( Esteban. From aleax@aleax.it Sat Sep 28 11:30:08 2002 From: aleax@aleax.it (Alex Martelli) Date: Sat, 28 Sep 2002 12:30:08 +0200 Subject: [Python-Dev] Why are useful tools omitted from the Win bin distro?
In-Reply-To: <25A39AFEB31B06408C675CEF28199E5B06EB0B@postur.ccp.cc> References: <25A39AFEB31B06408C675CEF28199E5B06EB0B@postur.ccp.cc> Message-ID: <02092812300806.05324@arthur> Just helped some people on the Italian Python mailing list find some indispensable tool (Tools/i18n/pygettext.py, in this specific case) and they expressed astonishment that the tool isn't in the standard Windows binary distribution, which was all they had downloaded. This set me to wondering -- is there any reason why this and other tools &c should NOT be included in that binary? Could we add them in 2.2.2 and later releases? Alex From mwh@python.net Sat Sep 28 12:37:56 2002 From: mwh@python.net (Michael Hudson) Date: 28 Sep 2002 12:37:56 +0100 Subject: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Guido van Rossum's message of "Fri, 20 Sep 2002 17:26:30 -0400" References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2mn0q2fip7.fsf@starship.python.net> Guido van Rossum writes: > I'd like to release something called Python 2.2.2 in a few weeks (say, > around Oct 8; I like Tuesday release dates). One minor point of concern: I think Jack Jansen's on holiday. Perhaps we should wait for him to get back... Cheers, M. -- There's an aura of unholy black magic about CLISP. It works, but I have no idea how it does it. I suspect there's a goat involved somewhere. -- Johann Hibschman, comp.lang.scheme From guido@python.org Sat Sep 28 15:22:58 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 28 Sep 2002 10:22:58 -0400 Subject: [Python-Dev] ATTENTION! Releasing Python 2.2.2 in a few weeks In-Reply-To: Your message of "Sat, 28 Sep 2002 12:37:56 BST." 
<2mn0q2fip7.fsf@starship.python.net> References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <2mn0q2fip7.fsf@starship.python.net> Message-ID: <200209281422.g8SEMw720102@pcp02138704pcs.reston01.va.comcast.net> > > I'd like to release something called Python 2.2.2 in a few weeks (say, > > around Oct 8; I like Tuesday release dates). > > One minor point of concern: I think Jack Jansen's on holiday. Perhaps > we should wait for him to get back... He should be back by Oct 5 or 6 if what he told me of his schedule is true. I'm not sure that it would matter much if MacPython 2.2.2 was released a week after the main release. Maybe we should do one release candidate anyway and give him space that way. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Sep 28 15:35:40 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 28 Sep 2002 10:35:40 -0400 Subject: [Python-Dev] Why are useful tools omitted from the Win bin distro? In-Reply-To: Your message of "Sat, 28 Sep 2002 12:30:08 +0200." <02092812300806.05324@arthur> References: <25A39AFEB31B06408C675CEF28199E5B06EB0B@postur.ccp.cc> <02092812300806.05324@arthur> Message-ID: <200209281435.g8SEZe620175@pcp02138704pcs.reston01.va.comcast.net> > Just helped some people on the Italian Python mailing list find some > indispensable tool (Tools/i18n/pygettext.py, in this specific case) and > they expressed astonishment that the tool isn't in the standard Windows > binary distribution, which was all they had downloaded. This set me to > wondering -- is there any reason why this and other tools &c should NOT be > included in that binary? Could we add them in 2.2.2 and later releases? Good idea. I think it was a simple oversight -- adding anything from the Tools directory is not automatic, Tim has to add lines to the Windows installer script. 
I believe that the following Tools subdirectories are currently being
distributed:

    idle
    scripts
    webchecker
    versioncheck
    pynche

That means these are not:

    audiopy (Solaris only)
    bgen (only used by Mac developers AFAIK)
    compiler (experimental AFAIK)
    faqwiz (only useful for people running web servers)
    framer (new in 2.3)
    freeze (only useful for developers?)
    i18n
    modulator (only useful for developers)
    unicode (only useful for developers?)
    world

Of these, I think i18n and world are candidates for inclusion on the
Windows installer.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From whisper@oz.net Sat Sep 28 20:03:05 2002
From: whisper@oz.net (David LeBlanc)
Date: Sat, 28 Sep 2002 12:03:05 -0700
Subject: [Python-Dev] Why are useful tools omitted from the Win bin distro?
In-Reply-To: <02092812300806.05324@arthur>
Message-ID:

I agree - downloading the source distro yields some goodies that would be
nice in the binary distro - many windows folks won't d/l the source since
many of them don't have a C compiler, especially the "VB" types.

I nominate Alex as "Mr. WinBin" ;)

Regards,

David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On
> Behalf Of Alex Martelli
> Sent: Saturday, September 28, 2002 3:30
> To: python-dev@python.org
> Subject: [Python-Dev] Why are useful tools omitted from the Win bin
> distro?
>
>
> Just helped some people on the Italian Python mailing list find some
> indispensable tool (Tools/i18n/pygettext.py, in this specific case) and
> they expressed astonishment that the tool isn't in the standard Windows
> binary distribution, which was all they had downloaded. This set me to
> wondering -- is there any reason why this and other tools &c
> should NOT be
> included in that binary? Could we add them in 2.2.2 and later releases?
> > > Alex > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From jriehl@spaceship.com Sun Sep 29 00:18:55 2002 From: jriehl@spaceship.com (Jonathan Riehl) Date: Sat, 28 Sep 2002 18:18:55 -0500 (CDT) Subject: [Python-Dev] Extension module difficulty w/pgen. In-Reply-To: Message-ID: > On Fri, 27 Sep 2002, Guido van Rossum wrote: < > > Maybe the problem is that nothing else uses these symbols? Try > > sticking dummy references (e.g. an unreachable call) to them in > > main.c, to see if that makes a difference. I recall we had to do this > > for something else that wasn't used by Python itself. > > I tried this just now, but to no avail. Maybe I am not being thorough > enough. If the linker is excluding these symbols because they are not > used, why would nm seem to say they are there, and why would statically > linking libpython work (on FreeBSD, anyway)? Conversely, I seem to > remember this working on an earlier, but abandoned attempt I made on a > Linux box. > It turns out I wasn't being thorough enough. I used nm on my python build, and didn't see the symbols there. I moved the dummy calls to a global function in python.c and only then were the symbols linked into the interpreter. Keeping a dummy function in one of the core modules doesn't seem like a terribly elegant solution, even if it allows me to keep developing the pgen module. What would you suggest be done to ensure that statically linked builds link those symbols? I would assume that since the required symbols *are* in libpython (per my modifications to Makefile.pre.in), building a python using dynamic libraries would allow the extension module to "see" those symbols. You had mentioned doing something like this before. Is there some linkage graveyard where I can bury calls to these symbols in order to ensure they are linked? 
Or are some of those fancy API macros used to ensure linkage (perhaps by API functions that are utilities for extension writers and not needed by the python core)? Thanks! -Jon From guido@python.org Sun Sep 29 00:55:26 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 28 Sep 2002 19:55:26 -0400 Subject: [Python-Dev] Extension module difficulty w/pgen. In-Reply-To: Your message of "Sat, 28 Sep 2002 18:18:55 CDT." References: Message-ID: <200209282355.g8SNtQR21241@pcp02138704pcs.reston01.va.comcast.net> > > > Maybe the problem is that nothing else uses these symbols? Try > > > sticking dummy references (e.g. an unreachable call) to them in > > > main.c, to see if that makes a difference. I recall we had to > > > do this for something else that wasn't used by Python itself. > > > > I tried this just now, but to no avail. Maybe I am not being > > thorough enough. If the linker is excluding these symbols because > > they are not used, why would nm seem to say they are there, and > > why would statically linking libpython work (on FreeBSD, anyway)? > > Conversely, I seem to remember this working on an earlier, but > > abandoned attempt I made on a Linux box. > > > It turns out I wasn't being thorough enough. I used nm on my python > build, and didn't see the symbols there. I moved the dummy calls to > a global function in python.c and only then were the symbols linked > into the interpreter. > > Keeping a dummy function in one of the core modules doesn't seem > like a terribly elegant solution, even if it allows me to keep > developing the pgen module. What would you suggest be done to > ensure that statically linked builds link those symbols? I would > assume that since the required symbols *are* in libpython (per my > modifications to Makefile.pre.in), building a python using dynamic > libraries would allow the extension module to "see" those symbols. Yes, I think building a shared lib would work. But we don't build shared libs for all platforms. 
> You had mentioned doing something like this before. Is there some > linkage graveyard where I can bury calls to these symbols in order > to ensure they are linked? Or are some of those fancy API macros > used to ensure linkage (perhaps by API functions that are utilities > for extension writers and not needed by the python core)? No, AFAIK you have to create a dummy reference somewhere. I'd suggest adding it to the end of Python/pythonrun.c. --Guido van Rossum (home page: http://www.python.org/~guido/) From dave@boost-consulting.com Sun Sep 29 01:21:54 2002 From: dave@boost-consulting.com (David Abrahams) Date: Sat, 28 Sep 2002 20:21:54 -0400 Subject: [Python-Dev] Doc location question Message-ID: <096801c2674e$b3dab4c0$6501a8c0@boostconsulting.com> Hi there, Is there a good reason that http://www.python.org/2.2.1/descrintro.html isn't also available as http://www.python.org/current/descrintro.html ? Maybe I don't understand the system. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From guido@python.org Sun Sep 29 02:27:51 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 28 Sep 2002 21:27:51 -0400 Subject: [Python-Dev] Doc location question In-Reply-To: Your message of "Sat, 28 Sep 2002 20:21:54 EDT." <096801c2674e$b3dab4c0$6501a8c0@boostconsulting.com> References: <096801c2674e$b3dab4c0$6501a8c0@boostconsulting.com> Message-ID: <200209290127.g8T1RpD23189@pcp02138704pcs.reston01.va.comcast.net> > Is there a good reason that http://www.python.org/2.2.1/descrintro.html > isn't also available as http://www.python.org/current/descrintro.html ? descrintro.html is "rogue" documentation, i.e. it's not part of the official documentation set. Its contents should eventually be incorporated into the official reference manual. 
Possibly it's good to keep it as a separate tutorial for new-style
classes, in which case it should be converted to LaTeX and
incorporated in the official documentation.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Sun Sep 29 04:48:51 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 28 Sep 2002 23:48:51 -0400
Subject: [Python-Dev] Why are useful tools omitted from the Win bin distro?
In-Reply-To:
Message-ID:

[Alex Martelli]
> Just helped some people on the Italian Python mailing list find some
> indispensable tool (Tools/i18n/pygettext.py, in this specific case) and
> they expressed astonishment that the tool isn't in the standard Windows
> binary distribution, which was all they had downloaded. This set me to
> wondering -- is there any reason why this and other tools &c
> should NOT be included in that binary? Could we add them in 2.2.2 and
> later releases?

Last time this came up, Guido was opposed to it. The problem is that the
majority of Python Windows users aren't particularly clueful, and the Demo
and Tools directories are loaded with stuff that's not maintained, not
documented, platform-dependent, and may not even work anymore. This all
conspires to give it a "developers only" status. Repeated calls for
volunteers to clean this up (i.e., document it, fix it, clean out the crap)
went unanswered.

I've been known to respond to requests to include specific pieces in
the Windows distro, though (for example, IDLE and pynche). This is a PITA
because it requires custom WISE scripting for each one, so someone has to
convince me they really, really want a piece first.

From guido@python.org Sun Sep 29 05:48:25 2002
From: guido@python.org (Guido van Rossum)
Date: Sun, 29 Sep 2002 00:48:25 -0400
Subject: [Python-Dev] New logmerge feature
Message-ID: <200209290448.g8T4mQ607379@pcp02138704pcs.reston01.va.comcast.net>

For those interested in poring over CVS logs, I've added a new feature
to logmerge.py: a -b tag option that restricts the output to a
specific branch tag. Use -b HEAD to show only the CVS HEAD
(a.k.a. trunk). (The default is to show all revisions regardless of
the branch on which they occur, which isn't always so easy if you're
interested in a specific branch.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@manatee.mojam.com Sun Sep 29 13:00:24 2002
From: skip@manatee.mojam.com (Skip Montanaro)
Date: Sun, 29 Sep 2002 07:00:24 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200209291200.g8TC0Okm026518@manatee.mojam.com>

Bug/Patch Summary
-----------------

290 open / 2889 total bugs (+5)
109 open / 1709 total patches (+3)

New Bugs
--------

property missing doc'd __name__ attr (2002-09-22)
    http://python.org/sf/612969
memory leaks when importing posix module (2002-09-23)
    http://python.org/sf/613222
win32 build_ext problem (2002-09-24)
    http://python.org/sf/614051
fpectl module broken on Linux (2002-09-24)
    http://python.org/sf/614060
Rewrite _reduce and _reconstructor in C (2002-09-25)
    http://python.org/sf/614555
LookupError etc. need API to get the key (2002-09-25)
    http://python.org/sf/614557
gethostbyname() blocks when threaded (2002-09-25)
    http://python.org/sf/614791
broken link in documentation (2002-09-25)
    http://python.org/sf/614821
socket.getfqdn() doesn't on Windows (2002-09-27)
    http://python.org/sf/615472
No __mod__ on str subclass (2002-09-27)
    http://python.org/sf/615506
Tkinter.Misc has no __contains__ method (2002-09-27)
    http://python.org/sf/615772
getdefaultlocale failure on OS X (2002-09-28)
    http://python.org/sf/616002
cPickle documentation incomplete (2002-09-28)
    http://python.org/sf/616013
list(xrange(sys.maxint / 4)) -> swapping (2002-09-28)
    http://python.org/sf/616019

New Patches
-----------

koi8_u codec (2002-09-23)
    http://python.org/sf/613173
add unescape method to xml.sax.saxutils (2002-09-23)
    http://python.org/sf/613256
rm email package dependency on rfc822.py (2002-09-23)
    http://python.org/sf/613434
Bugfix: content-type header parsing (2002-09-23)
    http://python.org/sf/613605
OpenVMS patches (2002-09-24)
    http://python.org/sf/614055
fix for urllib2.AbstractBasicAuthHandler (2002-09-25)
    http://python.org/sf/614596
MSVC 7.0 compiler support (2002-09-25)
    http://python.org/sf/614770
build fixes for SCO (2002-09-26)
    http://python.org/sf/615069
acconfig.h out of date (2002-09-26)
    http://python.org/sf/615343

Closed Bugs
-----------

New features need syntax (2001-08-17)
    http://python.org/sf/452222
ConfigParser has_option case sensitive (2002-05-29)
    http://python.org/sf/561822
resize readonly memory mapped file (2002-07-05)
    http://python.org/sf/577782
ConfigParser spaces in keys not read (2002-07-18)
    http://python.org/sf/583248
exec*() doesn't handle errors well (2002-08-20)
    http://python.org/sf/597797
Lone surrogates cause bad .pyc files (2002-09-17)
    http://python.org/sf/610783
2 bugs in turtle.py (2002-09-21)
    http://python.org/sf/612595

Closed Patches
--------------

pyport.h, Wince and errno getter/setter (2002-01-19)
    http://python.org/sf/505846
Fix "file:" URL to have right no. of /'s (2002-08-06)
    http://python.org/sf/591713
bugfixes and cleanup for _strptime.py (2002-08-10)
    http://python.org/sf/593560
turtle tracer bugfixes and new functions (2002-08-14)
    http://python.org/sf/595111
select problems on Windows (2002-09-19)
    http://python.org/sf/611464
quietly select between 'less' and 'more' (2002-09-20)
    http://python.org/sf/612111

From pedronis@bluewin.ch Sun Sep 29 14:03:01 2002
From: pedronis@bluewin.ch (Samuele Pedroni)
Date: Sun, 29 Sep 2002 15:03:01 +0200
Subject: [Python-Dev] "incriminated" module (was: buitlins instance have modifiable __class__?)
References: <0a0301c26666$1f0f7800$6d94fea9@newmexico> <200209280017.g8S0Hol17357@pcp02138704pcs.reston01.va.comcast.net> <0ec901c26684$1cae5ae0$6d94fea9@newmexico>
Message-ID: <014c01c267b8$8adedd20$6d94fea9@newmexico>

This is a multi-part message in MIME format.

------=_NextPart_000_0149_01C267C9.4E13C0C0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

[me]
> Maybe others will be less so, I have seen a module referred on c.l.p using the
> "feature" escaped from the lab to make builtins observable .
>

the thread on c.l.p is

google: watching mutables? group:comp.lang.python

the package is at

http://oomadness.tuxfamily.org/en/editobj/

author: Jean-Baptiste LAMY -- jiba@tuxfamily

I have directly attached the "incriminated" module for the curious.

Should I give them the bad news?

regards.
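
For readers who have not met the trick under discussion, here is a minimal
sketch in present-day Python 3. It is not the attached module's code, and
add_event, Observed and ObservedList are hypothetical names chosen for
illustration. The sketch swaps an instance's __class__ for a generated
subclass whose __setattr__ reports every change, and then tries the same
swap on a plain list, which current CPython refuses with a TypeError (the
"bad news" for code relying on it).

```python
def add_event(obj, event):
    """Hypothetical helper: route attribute writes on obj through event()."""
    original = obj.__class__

    class Observed(original):
        def __setattr__(self, attr, value):
            old = getattr(self, attr, None)          # old value, None if unset
            original.__setattr__(self, attr, value)  # do the real assignment
            event(self, attr, value, old)            # then notify the listener

    Observed.__name__ = original.__name__  # keep the original class name
    obj.__class__ = Observed               # the class swap the thread is about
    return obj


class C:
    pass


changes = []
c = add_event(C(), lambda obj, attr, value, old: changes.append((attr, old, value)))
c.x = 1
c.x = 2


class ObservedList(list):
    pass


# Assumption being demonstrated: instances of built-in types such as list
# do not accept __class__ assignment in current CPython.
try:
    plain = []
    plain.__class__ = ObservedList
    swap_allowed = True
except TypeError:
    swap_allowed = False
```

After this runs, changes holds one (attribute, old value, new value) tuple
per assignment, and swap_allowed is False. Subclassing list up front and
creating the instance from the subclass avoids the swap altogether.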
------=_NextPart_000_0149_01C267C9.4E13C0C0 Content-Type: text/plain; name="eventobj.py" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="eventobj.py" # EventObj=0A= # Copyright (C) 2001-2002 Jean-Baptiste LAMY -- jiba@tuxfamily=0A= #=0A= # This program is free software; you can redistribute it and/or modify=0A= # it under the terms of the GNU General Public License as published by=0A= # the Free Software Foundation; either version 2 of the License, or=0A= # (at your option) any later version.=0A= #=0A= # This program is distributed in the hope that it will be useful,=0A= # but WITHOUT ANY WARRANTY; without even the implied warranty of=0A= # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the=0A= # GNU General Public License for more details.=0A= #=0A= # You should have received a copy of the GNU General Public License=0A= # along with this program; if not, write to the Free Software=0A= # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 = USA=0A= =0A= # This module is certainly my best hack ;-)=0A= # It is nothing else than a sequence of hacks, from the beginning to the = end !!!!!=0A= =0A= """EventObj -- Allow to add attribute-change, content-change or = hierarchy-change event to any Python instance.=0A= =0A= Provides the following functions (view their doc for more info):=0A= dumper_event(obj, attr, oldvalue, newvalue)=0A= addevent(obj, event)=0A= hasevent(obj[, event])=0A= removeevent(obj, event)=0A= addevent_rec(obj, event)=0A= removeevent_rec(obj, event)=0A= And the following constant for addition/removal events:=0A= ADDITION=0A= REMOVAL=0A= =0A= Events : An event is any callable object that take 4 parameters (=3D the = listened instance, the changed attribute name, the new value, the old = value).=0A= After registration with addevent, it will be called just after any = attribute is change on the instance.=0A= You can add many events on the same instance, and use the same event = many 
times.=0A= An instance can implement __addevent__(event, copiable =3D 0), = __hasevent__(event =3D None) and __removeevent__(event) to allow a = custom event system.=0A= =0A= Notice that the event is weakref'ed, so if it is no longer accessible, = the event is silently removed.=0A= =0A= Caution : As event management is performed by changing the class of the = instance, you use eventobj with critical objects at your own risk... !=0A= =0A= Quick example :=0A= >>> from editobj.eventobj import *=0A= >>> class C: pass=0A= >>> c =3D C()=0A= >>> def event(obj, attr, value, oldvalue):=0A= ... print "c." + attr, "was", `oldvalue` + ", is now", `value`=0A= >>> addevent(c, event)=0A= >>> c.x =3D 1=0A= c.x was None, is now 1=0A= =0A= Addition/Removal events : if you add them to a list / UserList or a dict = / UserDict, or a subclass, events are also called for addition or = removal in the list/dict.=0A= Addition or removal can be performed by any of the methods of = UserList/UserDict (e.g. append, extend, remove, __setitem__,...)=0A= In this case, name will be the ADDITION or REMOVAL constant, value the = added object for a list and the key-value pair for a dict, and oldvalue = None.=0A= =0A= Quick example :=0A= >>> from editobj.eventobj import *=0A= >>> c =3D []=0A= >>> def event(obj, attr, value, oldvalue):=0A= ... if attr is ADDITION: # Only for list / dict=0A= ... print `value`, "added to c"=0A= ... elif attr is REMOVAL: # Only for list / dict=0A= ... print `value`, "removed from c"=0A= ... else:=0A= ... print "c." + attr, "was", `oldvalue` + ", is now", `value`=0A= >>> addevent(c, event)=0A= >>> c.append(0)=0A= 0 added to c=0A= =0A= Hierachy events : such events are used with UserList or UserDict, and = are usefull to listen an entire hierarchy (e.g. 
a list that contains = other lists that can contain other lists...).=0A= The event apply to the registred instance, and any other item it = contains, and so on if the list/dict is a deeper hierarchy.=0A= If you want to automatically add or remove the event when the hierarchy = change (addition or removal at any level), use a HierarchyEvent (see = example below).=0A= =0A= Quick example :=0A= >>> from editobj.eventobj import *=0A= >>> c =3D []=0A= >>> def event(obj, attr, value, oldvalue):=0A= ... if attr is ADDITION:=0A= ... print `value`, "added to", `obj`=0A= ... elif attr is REMOVAL:=0A= ... print `value`, "removed from", `obj`=0A= >>> addevent_rec(c, event)=0A= >>> c.append([]) # This sub-list was not in c when we add = the event...=0A= [] added to [[]] # [[]] is c=0A= >>> c[0].append(12) # ...but the hierarchical event has been = automatically added !=0A= 12 added to [12] # [12] is c[0]=0A= """=0A= =0A= __all__ =3D [=0A= "addevent",=0A= "hasevent",=0A= "removeevent",=0A= "addevent_rec",=0A= "removeevent_rec",=0A= "dumper_event",=0A= "ADDITION",=0A= "REMOVAL",=0A= ]=0A= =0A= import new, weakref, types #, copy=0A= from UserList import UserList=0A= from UserDict import UserDict=0A= =0A= =0A= def to_list(obj):=0A= if isinstance(obj, list) or isinstance(obj, UserList): return obj=0A= if hasattr(obj, "children"):=0A= items =3D obj.children=0A= if callable(items): return items()=0A= return items=0A= if hasattr(obj, "items"):=0A= items =3D obj.items=0A= if callable(items): return items()=0A= return items=0A= if hasattr(obj, "__getitem__"): return obj=0A= return None=0A= =0A= def to_dict(obj):=0A= if isinstance(obj, dict) or isinstance(obj, UserDict): return obj=0A= return None=0A= =0A= def to_dict_or_list(obj):=0A= content =3D to_dict(obj) # Try dict...=0A= if content is None:=0A= content =3D to_list(obj) # Try list...=0A= if content is None: return None, None=0A= return list, content=0A= else:=0A= return dict, content=0A= =0A= def to_content(obj):=0A= content =3D 
to_dict(obj) # Try dict...=0A= if content is None: return to_list(obj) or () # Try list...=0A= return content.values()=0A= =0A= =0A= def dumper_event(obj, attr, value, old):=0A= """dumper_event -- an event that dump a stacktrace when called. = Usefull for debugging !=0A= =0A= dumper_event is the default value for add_event.=0A= """=0A= import traceback=0A= traceback.print_stack()=0A= =0A= if attr is ADDITION: print "%s now contains %s." % (obj, value)=0A= elif attr is REMOVAL : print "%s no longer contains %s." % (obj, value)=0A= else: print "%s.%s was %s, is now %s." % (obj, attr, = old, value)=0A= =0A= =0A= def addevent(obj, event =3D dumper_event):=0A= """addevent(obj, event =3D dumper_event)=0A= Add the given attribute-change event to OBJ. OBJ must be a class = instance (old or new style) or a list / dict, and EVENT a function that = takes 4 args: (obj, attr, value, old).=0A= EVENT will be called when any attribute of obj will be changed; OBJ is = the object, ATTR the name of the attribute, VALUE the new value of the = attribute and OLD the old one.=0A= =0A= EVENT defaults to the "dumper" event, which print each object = modification.=0A= =0A= Raise eventobj.NonEventableError if OBJ cannot support event.=0A= """=0A= =0A= event =3D _wrap_event(obj, event)=0A= =0A= try:=0A= obj.__addevent__(event)=0A= except:=0A= if hasattr(obj, "__dict__"):=0A= # Store the event and the old class of obj in an instance of = _EventObj_stuff (a class we use to contain that).=0A= # Do that BEFORE we link the event (else we'll get a change event = for the "_EventObj_stuff" attrib) !=0A= obj._EventObj_stuff =3D _EventObj_stuff(obj.__class__)=0A= =0A= # Change the class of the object.=0A= obj.__class__ =3D _create_class(obj, obj.__class__)=0A= else:=0A= # Change the class of the object.=0A= old_class =3D obj.__class__=0A= obj.__class__ =3D _create_class(obj, obj.__class__)=0A= =0A= stuff_for_non_dyn_objs[id(obj)] =3D _EventObj_stuff(old_class)=0A= =0A= obj.__addevent__(event)=0A= =0A= 
def hasevent(obj, event =3D None):=0A= """hasevent(obj[, event]) -> Boolean=0A= Return wether the obj instance has the given event (or has any event, if = event is None)."""=0A= return hasattr(obj, "__hasevent__") and obj.__hasevent__(event)=0A= =0A= def removeevent(obj, event =3D dumper_event):=0A= """removeevent(obj, event =3D dumper_event)=0A= Remove the given event from obj."""=0A= hasattr(obj, "__removeevent__") and obj.__removeevent__(event)=0A= =0A= ADDITION =3D "__added__"=0A= REMOVAL =3D "__removed__"=0A= =0A= NonEventableError =3D "NonEventableError"=0A= =0A= # Private stuff :=0A= =0A= # A dict to store the created classes.=0A= #_classes =3D weakref.WeakKeyDictionary() # old-style class cannot be = weakref'ed !=0A= _classes =3D {}=0A= =0A= # Create a with-event class for "clazz". the returned class is a "mixin" = that will extends clazz and _EventObj (see below).=0A= def _create_class(obj, clazz):=0A= try: return _classes[clazz]=0A= except:=0A= if hasattr(obj, "__dict__"):=0A= # The name of the new class is the same name of the original class = (mimetism !)=0A= if issubclass(clazz, list) or issubclass(clazz, UserList): cl = =3D new.classobj(clazz.__name__, (_EventObj_List, _EventObj, clazz), {})=0A= elif issubclass(clazz, dict) or issubclass(clazz, UserDict): cl = =3D new.classobj(clazz.__name__, (_EventObj_Dict, _EventObj, clazz), {})=0A= else:=0A= if issubclass(clazz, object): cl = =3D new.classobj(clazz.__name__, (_EventObj, clazz), {})=0A= else: cl = =3D new.classobj(clazz.__name__, (_EventObj_OldStyle, clazz), {})=0A= =0A= else:=0A= # list and dict were added in _classes at module initialization=0A= # Other types are not supported yet !=0A= raise NonEventableError, obj=0A= =0A= # Change also the module name.=0A= cl.__module__ =3D clazz.__module__=0A= _classes[clazz] =3D cl=0A= return cl=0A= =0A= =0A= # A container for _EventObj attribs.=0A= class _EventObj_stuff:=0A= def __init__(self, clazz):=0A= self.clazz =3D clazz=0A= self.events =3D []=0A= =0A= 
def __call__(self, obj, attr, value, oldvalue):=0A= # Clone the list, since executing an event function may add or = remove some events.=0A= for event in self.events[:]: event(obj, attr, value, oldvalue)=0A= =0A= def remove_event(self, event): self.events.remove(event)=0A= =0A= def has_event(self, event): return event in self.events=0A= =0A= def _wrap_event(obj, event, hi =3D 0):=0A= if not isinstance(event, WrappedEvent):=0A= #dump =3D repr(event)=0A= =0A= try: obj =3D weakref.proxy(obj) # Avoid cyclic ref=0A= except TypeError: pass=0A= =0A= def callback(o):=0A= #print "attention !", dump, "est mourant !"=0A= # This seems buggy since it is called when some objects are being = destructed=0A= try:=0A= ob =3D obj=0A= if ob:=0A= if removeevent and hasevent(ob, event): removeevent(ob, event)=0A= except: pass=0A= if hi:=0A= if type(event) is types.MethodType: event =3D WeakHiMethod(event, = callback)=0A= else: event =3D WeakHiFunc (event, = callback)=0A= else:=0A= if type(event) is types.MethodType: event =3D WeakMethod(event, = callback)=0A= else: event =3D WeakFunc (event, = callback)=0A= =0A= return event=0A= =0A= =0A= class WrappedEvent: pass=0A= =0A= class WeakFunc(WrappedEvent):=0A= def __init__(self, func, callback =3D None):=0A= if callback: self.func =3D weakref.ref(func, callback)=0A= else: self.func =3D weakref.ref(func)=0A= =0A= def original(self): return self.func()=0A= =0A= def __call__(self, *args): self.func()(*args)=0A= =0A= def __eq__(self, other):=0A= return (self.func() =3D=3D other) or (isinstance(other, WeakFunc) = and (self.func() =3D=3D other.func()))=0A= =0A= def __repr__(self): return "" % self.func()=0A= =0A= class WeakMethod(WrappedEvent):=0A= def __init__(self, method, callback =3D None):=0A= if callback: self.obj =3D weakref.ref(method.im_self, callback)=0A= else: self.obj =3D weakref.ref(method.im_self)=0A= self.func =3D method.im_func=0A= =0A= def original(self):=0A= obj =3D self.obj()=0A= return new.instancemethod(self.func, obj, 
obj.__class__)=0A= =0A= def __call__(self, *args): self.func(self.obj(), *args)=0A= =0A= def __eq__(self, other):=0A= return ((type(other) is types.MethodType) and (self.obj() is = other.im_self) and (self.func is other.im_func)) or (isinstance(other, = WeakMethod) and (self.obj() is other.obj()) and (self.func is = other.func))=0A= =0A= def __repr__(self): return "" % self.original()=0A= =0A= class HierarchyEvent:=0A= def __call__(self, obj, attr, value, oldvalue):=0A= if attr is ADDITION:=0A= try: =0A= if isinstance(obj, _EventObj_List): addevent_rec(value, = self.original())=0A= else: addevent_rec(value[1], = self.original())=0A= except NonEventableError: pass=0A= elif attr is REMOVAL:=0A= try: =0A= if isinstance(obj, _EventObj_List): removeevent_rec(value, = self.original())=0A= else: removeevent_rec(value[1], = self.original())=0A= except NonEventableError: pass=0A= =0A= class WeakHiFunc(HierarchyEvent, WeakFunc):=0A= def __call__(self, obj, attr, value, oldvalue):=0A= HierarchyEvent.__call__(self, obj, attr, value, oldvalue)=0A= WeakFunc.__call__(self, obj, attr, value, oldvalue)=0A= =0A= class WeakHiMethod(HierarchyEvent, WeakMethod):=0A= def __call__(self, obj, attr, value, oldvalue):=0A= HierarchyEvent.__call__(self, obj, attr, value, oldvalue)=0A= WeakMethod.__call__(self, obj, attr, value, oldvalue)=0A= =0A= =0A= # Mixin class used as base class for any with-event class.=0A= class _EventObj:=0A= stocks =3D []=0A= def __setattr__(self, attr, value):=0A= # Get the old value of the changing attrib.=0A= oldvalue =3D getattr(self, attr, None)=0A= if attr =3D=3D "__class__":=0A= newclass =3D _create_class(self, value)=0A= self._EventObj_stuff.clazz.__setattr__(self, "__class__", newclass)=0A= self._EventObj_stuff.clazz =3D value=0A= else:=0A= # If a __setattr__ is defined for obj's old class, call it. 
Else, = just set the attrib in obj's __dict__=0A= if hasattr(self._EventObj_stuff.clazz, "__setattr__"): = self._EventObj_stuff.clazz.__setattr__(self, attr, value)=0A= else: = self.__dict__[attr] =3D value=0A= =0A= # Comparison may fail=0A= try:=0A= if value =3D=3D oldvalue: return=0A= except: pass=0A= =0A= # Call registered events, if needed=0A= for event in self._EventObj_stuff.events:=0A= event(self, attr, value, oldvalue)=0A= =0A= def __addevent__(self, event):=0A= self._EventObj_stuff.events.append(event)=0A= l =3D to_list(self)=0A= if (not l is None) and (not l is self): addevent(l, event)=0A= def __hasevent__(self, event =3D None):=0A= return (event is None) or (self._EventObj_stuff.has_event(event))=0A= def __removeevent__(self, event):=0A= self._EventObj_stuff.remove_event(event)=0A= l =3D to_list(self)=0A= if (not l is None) and (not l is self): removeevent(l, event)=0A= if len(self._EventObj_stuff.events) =3D=3D 0: self.__restore__()=0A= =0A= def __restore__(self):=0A= # If no event left, reset obj to its original class.=0A= if hasattr(self._EventObj_stuff.clazz, "__setattr__"):=0A= self._EventObj_stuff.clazz.__setattr__(self, "__class__", = self._EventObj_stuff.clazz)=0A= else:=0A= self.__class__ =3D self._EventObj_stuff.clazz=0A= # And delete the _EventObj_stuff.=0A= del self._EventObj_stuff=0A= =0A= # Called at pickling time=0A= def __getstate__(self):=0A= try:=0A= dict =3D self._EventObj_stuff.clazz.__getstate__(self)=0A= =0A= if dict is self.__dict__: dict =3D dict.copy()=0A= except: dict =3D self.__dict__.copy()=0A= =0A= try:=0A= del dict["_EventObj_stuff"] # Remove what we have added.=0A= if dict.has_key("children"): dict["children"] =3D = list(dict["children"])=0A= elif dict.has_key("items" ): dict["items" ] =3D = list(dict["items" ])=0A= except: pass # Not a dictionary ??=0A= =0A= return dict=0A= =0A= def __reduce__(self):=0A= def rec_check(t):=0A= if t is self.__class__: return self._EventObj_stuff.clazz=0A= if type(t) is tuple: return 
tuple(map(rec_check, t))=0A= return t=0A= =0A= red =3D self._EventObj_stuff.clazz.__reduce__(self)=0A= =0A= if len(red) =3D=3D 2: return red[0], tuple(map(rec_check, red[1]))=0A= else: return red[0], tuple(map(rec_check, red[1])), = red[2]=0A= =0A= class _EventObj_OldStyle(_EventObj):=0A= def __deepcopy__(self, memo):=0A= if hasattr(self._EventObj_stuff.clazz, "__deepcopy__"):=0A= clone =3D self._EventObj_stuff.clazz.__deepcopy__(self, memo)=0A= if clone.__class__ is self.__class__:=0A= clone.__class__ =3D self._EventObj_stuff.clazz=0A= if hasattr(clone, "_EventObj_stuff"): del clone._EventObj_stuff=0A= return clone=0A= else:=0A= import copy=0A= =0A= if hasattr(self, '__getinitargs__'):=0A= args =3D self.__getinitargs__()=0A= copy._keep_alive(args, memo)=0A= args =3D copy.deepcopy(args, memo)=0A= y =3D apply(self._EventObj_stuff.clazz, args)=0A= else:=0A= y =3D copy._EmptyClass()=0A= y.__class__ =3D self._EventObj_stuff.clazz=0A= memo[id(self)] =3D y=0A= if hasattr(self, '__getstate__'):=0A= state =3D self.__getstate__()=0A= copy._keep_alive(state, memo)=0A= else:=0A= state =3D self.__dict__=0A= state =3D copy.deepcopy(state, memo)=0A= if hasattr(y, '__setstate__'): y.__setstate__(state)=0A= else: y.__dict__.update(state)=0A= return y=0A= =0A= =0A= class _EventObj_List(_EventObj):=0A= def __added__ (self, value): self._EventObj_stuff(self, ADDITION, = value, None)=0A= def __removed__(self, value): self._EventObj_stuff(self, REMOVAL , = value, None)=0A= =0A= def append(self, value):=0A= self._EventObj_stuff.clazz.append(self, value)=0A= self.__added__(value)=0A= def insert(self, before, value):=0A= self._EventObj_stuff.clazz.insert(self, before, value)=0A= self.__added__(value)=0A= def extend(self, list):=0A= self._EventObj_stuff.clazz.extend(self, list)=0A= for value in list: self.__added__(value)=0A= =0A= def remove(self, value):=0A= self._EventObj_stuff.clazz.remove(self, value)=0A= self.__removed__(value)=0A= def pop(self, index =3D -1):=0A= value =3D 
self._EventObj_stuff.clazz.pop(self, index)
        self.__removed__(value)
        return value

    def __setitem__(self, index, new):
        old = self[index]
        self._EventObj_stuff.clazz.__setitem__(self, index, new)
        self.__removed__(old)
        self.__added__  (new)
    def __delitem__(self, index):
        value = self[index]
        self._EventObj_stuff.clazz.__delitem__(self, index)
        self.__removed__(value)
    def __setslice__(self, i, j, slice):
        olds = self[i:j]
        self._EventObj_stuff.clazz.__setslice__(self, i, j, slice)
        for value in olds : self.__removed__(value)
        for value in slice: self.__added__  (value)
    def __delslice__(self, i, j):
        olds = self[i:j]
        self._EventObj_stuff.clazz.__delslice__(self, i, j)
        for value in olds : self.__removed__(value)
    def __iadd__(self, list):
        self._EventObj_stuff.clazz.__iadd__(self, list)
        for value in list: self.__added__(value)
        return self
    def __imul__(self, n):
        olds = self[:]
        self._EventObj_stuff.clazz.__imul__(self, n)
        if n == 0:
            for value in olds: self.__removed__(value)
        else:
            for value in olds * (n - 1): self.__added__(value)
        return self


class _EventObj_Dict(_EventObj):
    def __added__  (self, key, value): self._EventObj_stuff(self, ADDITION, (key, value), None)
    def __removed__(self, key, value): self._EventObj_stuff(self, REMOVAL , (key, value), None)

    def update(self, dict):
        old = {}
        for key, value in dict.items():
            if self.has_key(key): old[key] = value
        self._EventObj_stuff.clazz.update(self, dict)
        for key, value in old .items(): self.__removed__(key, value)
        for key, value in dict.items(): self.__added__  (key, value)
    def popitem(self):
        old = self._EventObj_stuff.clazz.popitem(self)
        self.__removed__(old[0], old[1])
        return old
    def clear(self):
        old = self.items()
        self._EventObj_stuff.clazz.clear(self)
        for key, value in old: self.__removed__(key, 
value)

    def __setitem__(self, key, new):
        if self.has_key(key):
            old = self[key]
            self._EventObj_stuff.clazz.__setitem__(self, key, new)
            self.__removed__(key, old)
        else:
            self._EventObj_stuff.clazz.__setitem__(self, key, new)
        self.__added__(key, new)
    def __delitem__(self, key):
        value = self[key]
        self._EventObj_stuff.clazz.__delitem__(self, key)
        self.__removed__(key, value)

# EventObj class for plain list (e.g. []) and plain dict :

# EventObj stuff is not stored in the object's dict (because no such dict...)
# but in this dictionary :
#stuff_for_non_dyn_objs = weakref.WeakKeyDictionary()
stuff_for_non_dyn_objs = {}

class _EventObj_PlainList(_EventObj_List, list):
    __slots__ = []

    #__hash__ = object.__hash__ # Allows to hash it ! (needed to use "self" as a dict key)

    def _get_EventObj_stuff(self): return stuff_for_non_dyn_objs[id(self)]
    def _set_EventObj_stuff(self, stuff): stuff_for_non_dyn_objs[id(self)] = stuff
    _EventObj_stuff = property(_get_EventObj_stuff, _set_EventObj_stuff)

    def __restore__(self):
        # If no event left, delete the _EventObj_stuff and reset obj to its original class.
        # Bypass the _EventObj.__setattr__ (it would crash since _EventObj_stuff is no longer available after the class change)
        self._EventObj_stuff.clazz.__setattr__(self, "__class__", self._EventObj_stuff.clazz)
        del stuff_for_non_dyn_objs[id(self)]

    def __getstate__(self): return None

_classes[list] = _EventObj_PlainList

class _EventObj_PlainDict(_EventObj_Dict, dict):
    __slots__ = []

    def _get_EventObj_stuff(self): return stuff_for_non_dyn_objs[id(self)]
    def _set_EventObj_stuff(self, stuff): stuff_for_non_dyn_objs[id(self)] = stuff
    _EventObj_stuff = property(_get_EventObj_stuff, _set_EventObj_stuff)

    def __restore__(self):
        # If no event left, delete the 
_EventObj_stuff and reset obj to its original class.
        # Bypass the _EventObj.__setattr__ (it would crash since _EventObj_stuff is no longer available after the class change)
        self._EventObj_stuff.clazz.__setattr__(self, "__class__", self._EventObj_stuff.clazz)
        del stuff_for_non_dyn_objs[id(self)]

    def __getstate__(self): return None

_classes[dict] = _EventObj_PlainDict


# Hierarchy stuff :

def addevent_rec(obj, event = dumper_event):
    """addevent_rec(obj, event = dumper_event)
Add event for obj, like addevent, but proceed recursively in all the hierarchy : if obj is a UserList/UserDict, event will be added to each instance obj contains, recursively.
If the hierarchy is changed, the newly added items will DO have the event, and the removed ones will no longer have it."""
    if not hasevent(obj, event): # Avoid problem with cyclic list/dict
        # Wrap event in a hierarchy event
        if not isinstance(event, HierarchyEvent): wevent = _wrap_event(obj, event, 1)
        else: wevent = event # Already a hierarchy event: use it as is (wevent was unbound in this case)

        addevent(obj, wevent)

        for o in to_content(obj):
            try: addevent_rec(o, event)
            except NonEventableError: pass

def removeevent_rec(obj, event = dumper_event):
    """removeevent_rec(obj, event = dumper_event)
Remove event for obj, like removeevent, but proceed recursively."""
    if hasevent(obj, event): # Avoid problem with cyclic list/dict
        removeevent(obj, event)

        for o in to_content(obj):
            if isinstance(o, _EventObj): removeevent_rec(o, event)

def change_class(obj, newclass):
    """Change the class of OBJ to NEWCLASS, but keep the events it may have."""
    events = obj._EventObj_stuff.events[:]
    for event in events: removeevent(obj, event)
    obj.__class__ = newclass
    for event in events: addevent(obj, event)

------=_NextPart_000_0149_01C267C9.4E13C0C0--

From pedronis@bluewin.ch Sun Sep 29 16:23:35 2002
From: pedronis@bluewin.ch (Samuele 
Pedroni)
Date: Sun, 29 Sep 2002 17:23:35 +0200
Subject: [Python-Dev] Re: FYI (was: watching mutables?)
Message-ID: <001701c267cc$3115d960$6d94fea9@newmexico>

"Jiba" wrote in message
news:3d96f25a$0$498$7a628cd7@news.club-internet.fr...
>
> Try EditObj (http://oomadness.tuxfamily.org/en/editobj) and look at the
> module editobj.eventobj.
>
> It does exactly what you need, but it is quite a hack...

FYI in Python 2.3: __class__ will be immutable for builtin types, and
only mutable for (user) subtypes.

See:
http://mail.python.org/pipermail/python-checkins/2002-August/028681.html

regards.

From jrw@pobox.com Sun Sep 29 21:26:54 2002
From: jrw@pobox.com (John Williams)
Date: Sun, 29 Sep 2002 15:26:54 -0500
Subject: [Python-Dev] proposal for interfaces
References: <25A39AFEB31B06408C675CEF28199E5B073CA3@postur.ccp.cc>
Message-ID: <3D97620E.2030405@pobox.com>

Esteban U.C.. Castro wrote:
> I like it a lot! Anyway, if it can be implemented in python as is,
> what is the point of the PEP? Making the 'interface' root class and/or
> InterfaceError builtins, maybe?

Well, aside from the ego boost of having my very own PEP, I think the
nature of interfaces is such that they're vastly more useful if lots of
people use them. Putting something in the standard distribution almost
guarantees that.

> I have some comments which I thought I would bounce. I'll organize
> these attending to the activities they relate to. Don't hesitate to
> tell me if I'm saying something stupid. :)

You have some very good points; I'll address them individually below,
but let me get it out of the way that most of my programming background
is with compiled languages with strong type checking. As much as I love
Python, sometimes I really miss the rigor of more strongly-typed
languages, so rather than trying to make something very Pythonic, I've
tried to go the opposite direction in order to complement what's there
already.
> Define an interface
> ===================
>
> In your example, it seems that
>
>   class Foo(interface):
>     def default_foo(self, a, b):
>       "Docstring for foo method."
>       print "Defaults can be handy."
>
> [I added the a, b arguments to illustrate one point below]
>
> Does the following:
>  - Defines the requirement that all objects that implement Foo have a
>    foo function
>  - Defines that foo should accept arguments of the form (a, b) maybe?
>  - Sets a doc string for the foo method.
>  - Sets a default implementation for the method.

Yes, exactly.

> Some questions on this:
>
>  - Can an interface only define method names (and maybe argument
>    formats)? I think it would be handy to let it expose attributes.

Actually I wanted to have attributes, too (and operators, since they're
just methods). This brings up one of the uses I had in mind for the
prefixes in front of the method names. To add a property, you could do
something like this:

   doc_myProperty = "docstring for myproperty"
   readonly_myOtherProperty = "docstring for a read-only property"
   writeonly_myLastProperty = "docstring for write-only property"

This would require implementations to either have properties named
"myProperty", "myOtherProperty" and "myLastProperty", or define methods
named __get__myProperty, __set__myProperty, __get__myOtherProperty, and
__set__myLastProperty.

>  - Is method names (and maybe format of arguments) the only thing you
>    can 'promise' in the interface? In other words, is that the only
>    type of guarantee that code that works against the interface can
>    get? I think a __check__(self, obj) special method in interfaces
>    would be a simple way to boost their flexibility.
Something I didn't include in the original message was the
design-by-contract feature, which would allow pre- and postconditions to
be specified for any method, like this:

   def check_foo(self, a, b):
     "docstring for foo"
     if not (some precondition for foo):
       raise ExceptionOfYourChoice
     if not (some other precondition for foo):
       raise DifferentExceptionOfYourChoice
     return lambda result: (postconditions of foo) # optional

I'd even like to allow multiple declarations per method, so you could
have a check_foo (which is always called, at least when __debug__ is
true), and default_foo, which is called only in classes that don't give
their own implementation of foo.

>  - For the uses you have given to the prefix_methodname notation so far,
>    I don't think it's really needed. Isn't the following sufficient?
>
>   class Foo(interface):
>     def foo(self, a, b):
>       "foo docstring"
>       # nothing here; no default definition
>     def bar(self, a, b):
>       pass # no docstring, _empty definition_

Not IMHO, since I'd want methods with no implementation to raise
NotImplementedError instead of silently returning None. One thing that
*could* be done to make the simplest case (just a docstring?) easier
(and make the syntax look more Pythonic) would be this:

   def foo():
     "Docstring for foo"
     # Never called, so arguments aren't needed, but they would be nice
     # for documentation purposes.
   def __default__bar(self):
     (default implementation of bar)
   def __check__baz(self):
     (error checking for baz)

This has the side effect of making the __ before the prefixes necessary,
so you can still define methods that start with "default", "check", etc.
I suppose the aesthetics of the __ are a matter of taste, though it does
at least make the "magic" nature of the prefixes stand out better.

> This has the side effect that a method with no default definition and no
> doc string is a SyntaxError. Is this too bad?
>  - It would maybe be hard to figure out what such a method is supposed
>    to do, so you _should_ provide a docstring.
>  - If you're in a hurry, an empty docstring will do the trick. While in
>    'quick and dirty mode' you probably won't be using interfaces a lot,
>    anyway.

Exactly.

> Defaults look indeed useful, but the really crucial aspect to lay down
> in an interface definition is what does it guarantee on the objects
> that implement it. If amalgamating this with default defs would
> otherwise obscure it (there's another issue I'm addressing below), I
> think defaults belong more properly to a class that implements the
> interface, not to the interface definition itself.

I got the idea of defaults from Haskell, where it's common to see an
interface define methods with mutually recursive default definitions,
kind of like (to use a somewhat silly Python example) defining __eq__,
__ne__, __cmp__, etc. all in terms of one another and expecting
implementations to define enough of the methods that everything works.

> Check whether an object implements an interface
> ===============================================
>
> From your examples, I get it that an object implements an interface
> iff it is an instance of a class that implements that interface. So I
> guess any checking of the requirements expressed by the interface is
> done at the time when you bind() a class to an interface. This
> paradigm perfectly fits static and strongly typed languages, but it
> falls a bit short of the flexibility of python IMO. You can do very
> funny things to classes and objects at runtime, which can break any
> assumptions based on object class.

Very good point. The dynamic nature of Python is what makes it possible
to implement interfaces this way. It would be a shame (and a little
ironic) if interfaces didn't play nice with dynamic code.

> In your example:
>
>   def foo_proc(foo_arg):
>     foo_proxy = Foo(foo_arg)
>     ...
>     x = foo_proxy.foo(a, b)
>
> [added a, b again]
>
> imagine foo_proc may only really care that foo_arg is an object that
> has a foo() method that takes (a, b) arguments (this is all Foo
> guarantees).

Here's where my compiled language bias comes in. If you only care that
foo_arg has a certain method, you don't want to use interfaces at all.
Using the interface doesn't just imply that foo_arg has a method named
foo, but also that the method satisfies the requirements laid out in the
interface definition.

> * Will the Foo() call check this, or will it just check that some class
>   in foo_arg's bases is bound to the Foo interface?

Both, in a way. The call to Foo() checks that foo_arg's class has been
declared to implement Foo, no more and no less. Checking that the class
implements Foo *correctly* is a more multifaceted problem. Some parts
(like making sure the right method names exist) could happen when the
interface is bound to the class, but other parts, like checking method
pre- and postconditions, would have to happen at every method call.

> In the second case, if someone has been fiddling with foo_arg or some
> of its base classes, foo_arg.foo() may no longer exist or it may have
> a different signature.

You can't really stop people from shooting themselves in the foot.
Making methods disappear is black magic in my book.

> * Why should Foo() _always_ fail for objects that _do_ meet the
>   requirements expressed by the interface Foo but have _not_ declared
>   that they implement the interface?
>
> Making such check optional allows implicit (not declared) interface
> satisfaction for those who want it. This should extend the
> applicability of interfaces.

One of the major points of my design is to make the interface mechanism
very formal and explicit. Keeping things implicit and informal is what
Python is already good at so I don't want to go there.
Also, having the separate "bind" call means that it's really not
necessary for a class to declare that it implements the interface, only
that some declaration of that fact exists somewhere.

> Declare that an object implements an interface or part of it
> ============================================================
>
>   Foo.bind(SomeClass)
>
>
> Problems:
>
> * I agree the declaration had better be included in the class
>   definition, at least as an option.

I agree, but as an option. The motivation for having interface bindings
separate from the class (and interface) definitions is mainly so that
it's not necessary to modify the source code for your classes to make
them implement an interface, so you can do things like add a new
interface to a builtin or legacy class without having access to the
source code. This is not nearly as big a deal as with a language like
C++ where you often can't get to the source code in any useful way, but
there are other reasons to avoid touching the source, like avoiding the
need to keep track of patches to 3rd-party code.

OTOH, I agree that the level of control the "bind" call gives you is
usually not useful. I left out alternatives for the sake of brevity, but
it would be easy enough to add one in the interest of keeping the most
common case as simple as possible.

> * Declarative better than procedural, for this purpose.

Whether this is declarative or procedural is mostly a matter of
perspective, IMHO. In my imagination, calls to bind occur only at the
module level, and almost always immediately after the class or interface
definition, so the "flavor" is declarative.

> * Only classes (not instances) can declare that they implement an
>   interface.

Maybe there should be something like a "bindinstance" method as well.
I'm sure there's a way to do it, but I don't consider it a very high
priority.

> * Indexing notation to 'resolve' methods a bit counterintuitive.
>   What about:
[snip]

Good ideas.
Your method has a lot of advantages, but it would be hard (and messy) to
make it do everything you can do with separate method calls, and one
thing I'm *very* reluctant to do is have two ways of doing everything
with only subtle or stylistic differences between them. The best thing
to do here may be to allow your style for simple cases (the 90% that
would have required only a single "bind" call), but use the method-based
syntax for anything more elaborate, like method renaming and unbinding
subclasses.

> Restrict
> ========
>
> This is, to make sure that an object is only accessed in the ways
> defined in the interface (via the proxy).
>
> This should be optional too, but your syntax does this nicely; you
> can call Foo() as an assertion of sorts and ignore the result.

I think it would be very misleading to require that an object support a
formal interface but then expect it to also support methods not
specified in the interface. If this is what you want, the right thing to
do is derive a new interface from the old one that has the extra
functionality you need.

> Note that the __implements__ method resolution magic would require
> that you get a proxy, though.

Unfortunately, yes. Of course there'd be nothing stopping you from
calling the class's methods without going through the interface at all
(provided you know the right names for them), but in that case you
really just want to check that the object is an instance of a class, not
that it implements an interface.

OTOH, if you think the main problem with the proxy approach is just that
it's verbose, perhaps I can interest you in this idea (or some variation
thereof). Here's the Python iterator protocol implemented with
interfaces instead of magic method names:

   class Iterable(interface):
     def iter():
       "Return an object implementing the Iterator interface."

   class Iterator(Iterable):
     def next():
       "Return the next item or raise StopIteration."
   def iter(object): return Iterable(object).iter()

If you want to be even lazier, let Foo.foo(x) be a synonym for
Foo(x).foo(), so you can define the "iter" function like this:

   iter = Iterable.iter

Here's how you might add the iterator protocol to builtin lists,
strings, and tuples (if it wasn't already there, of course):

   # Define iteration semantics.
   class SequenceIterator(object):
     def __init__(self, seq):
       self.seq = seq
       self.index = 0
     def next(self):
       try:
         self.index += 1
         return self.seq[self.index - 1]
       except IndexError:
         raise StopIteration

   # Bind the interface to this implementation.
   Iterator.bind(SequenceIterator)

   # Add the Iterable interface to existing classes that don't have an
   # "iter" method.
   for t in list, tuple, str:
     Iterable.bind(t)
     Iterable[t].iter = SequenceIterator

Here are a few variations on the loop body, since you don't like the
indexing notation:

     Iterable.bind(t, {"iter": SequenceIterator})

     binding = Iterable.bind(t)
     binding.iter = SequenceIterator

     Iterable.bind(t)
     Iterable.bindmethod("iter", SequenceIterator)

Whew! Ok, I guess I'm done now. Thanks for your comments!

jw

From martin@v.loewis.de Sun Sep 29 22:06:55 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 29 Sep 2002 23:06:55 +0200
Subject: [Python-Dev] Extension module difficulty w/pgen.
In-Reply-To:
References:
Message-ID:

Jonathan Riehl writes:

> It seems to me that I should not have to use this workaround, which only
> works on one of the systems I use. Does anyone have an idea as to what I
> should do now?

It appears that metagrammar.o is not needed in the Python executable,
that's why the linker does not fetch it from the library when building
Python.

As for pgen, the first question would be why you need metagrammar.o in
your extension module. Assuming there is a good reason to expose it, you
should arrange to

1. exclude metagrammar.o from libpython.a,
2. include it explicitly as a source for building the pgen module.
HTH,
Martin

From greg@cosc.canterbury.ac.nz Mon Sep 30 00:39:44 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 30 Sep 2002 11:39:44 +1200 (NZST)
Subject: [Python-Dev] Re: User extendable literal modifiers ?!
In-Reply-To: <200209271104.MAA27895@synaptics-uk.com>
Message-ID: <200209292339.g8TNdiv28330@oma.cosc.canterbury.ac.nz>

Gareth McCaughan :

> - Rational numbers. $r"123/234"
> - Regular expressions. $/"foo.*bar"
> - Dates and times. $t"2002-09-27 11:38"
> - Hostnames and ports. $h"www.google.com:80"

This strikes me as ugly. There doesn't seem to be much, if any,
syntactical advantage over using a constructor:

   Rat("123/234")
   Regex("foo.*bar")
   Date("2002-09-27 11:38")
   Port("www.google.com:80")

These look cleaner and easier to read to me.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From mhammond@skippinet.com.au Mon Sep 30 02:16:41 2002
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Mon, 30 Sep 2002 11:16:41 +1000
Subject: [Python-Dev] Snapshot win32all builds of interest?
Message-ID:

I'm wondering if there is any interest in me making regular, basically
untested win32all builds against the current Python CVS tree?

It would be fairly simple for me to do - I run against CVS Python, so it
is really just bundling up my latest built files into an installer .EXE.
I would only do it for the current CVS trunk - ie, no Python 2.2 or
earlier builds in this form.

However, I would only bother if there were people willing to use it. I
figure that if there aren't people in this forum who would use it, I
won't find them anywhere ;) OTOH, people in this forum using CVS Python
on Windows may prefer to use CVS and build their own win32all - I really
have no clue ;)

Thoughts?

Mark.
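
A minimal sketch of the constructor style Greg Ewing argues for in his message on literal modifiers above. It is written against the present-day Python 3 standard library rather than the 2002 interpreter, and it is only an illustration: Rat, Regex, Date and Port in his message are placeholder names, so this sketch substitutes fractions.Fraction, re.compile and datetime.strptime, plus a hypothetical helper parse_host_port for the host-and-port case.

```python
import re
from datetime import datetime
from fractions import Fraction

# Ordinary constructor calls stand in for the proposed $r"...", $/"..."
# and $t"..." literal modifiers; no new syntax is involved.
ratio = Fraction("123/234")        # rational number parsed from a string
pattern = re.compile("foo.*bar")   # compiled regular expression
when = datetime.strptime("2002-09-27 11:38", "%Y-%m-%d %H:%M")  # date and time


def parse_host_port(text):
    """Hypothetical helper: split 'host:port' into (host, int(port))."""
    host, _, port = text.rpartition(":")
    return host, int(port)


endpoint = parse_host_port("www.google.com:80")
```

Each value is built by a plain call, so a reader needs no knowledge of a modifier table to see which type is being constructed.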
From dave@boost-consulting.com Mon Sep 30 01:36:58 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Sun, 29 Sep 2002 20:36:58 -0400
Subject: [Python-Dev] Documentation: type-vs.-function
Message-ID: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com>

I note that
http://www.python.org/dev/doc/devel/lib/built-in-funcs.html#l2h-14
describes dict as a built-in function, whereas we all know that Guido's
cool 2.2 changes made it into a type

>>> dict

Does this distinction matter? A little, I think. Calling it a function
makes it sound like we're living in the past. Same goes for str, type,
list, tuple, et al. I realize that the type (especially ) acts like a
function under many circumstances...

un-important-ly y'rs,
dave

-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From martin@v.loewis.de Mon Sep 30 06:41:55 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 30 Sep 2002 07:41:55 +0200
Subject: [Python-Dev] Documentation: type-vs.-function
In-Reply-To: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com>
References: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com>
Message-ID:

"David Abrahams" writes:

> Does this distinction matter?

Yes. However, I think a few patches changing this have been rejected, on
the grounds of being confusing to users. So careful wording is
necessary, which probably requires mastery of the English language.

Regards,
Martin

From esteban@ccpgames.com Mon Sep 30 07:27:00 2002
From: esteban@ccpgames.com (Esteban U.C.. Castro)
Date: Mon, 30 Sep 2002 06:27:00 -0000
Subject: [Python-Dev] proposal for interfaces
Message-ID: <25A39AFEB31B06408C675CEF28199E5B073CA7@postur.ccp.cc>

Thanks for your reply-- and thanks for taking the work of fixing the
formatting in the quotations. I thought it would maybe be considered
spam to send the message again just to fix the mess.
(I think I know what caused the problem last time; I hope this one reads
fine). Is there some sort of policy on this?

> Well, aside from the ego boost of having my very own PEP, I think the
> nature of interfaces is such that they're vastly more useful if lots
> of people use them. Putting something in the standard distribution
> almost guarantees that.

Agreed. I was just curious about what is exactly that _something_ you
want to put in the standard distribution. Just a recommendation on the
docs? Have some of the existing modules refactored to use this method?

[P.S.: I'm afraid your ego will have to give up on having your own
interfaces PEP, since there is already one. See below.]

I don't know if there are precedents of code practices being 'officially'
endorsed by the python development team, when it implies no changes to
either the language or the standard library. To start with a non-intrusive
addition, I can think of a module defining precisely the interfaces of
builtin types, and other commonplace de facto interfaces that already
exist in the standard library.

Imagine you want to use a function that takes a builtin type, but you
want to pass your own fake instance. Making it just inherit the builtin
type is an option, but maybe that's not practical, or yours is, too, a
type not implemented in python, or you don't want to inherit the
behavior of the builtin type.

Having the interface formalized somewhere will help you know what will
be expected from your custom object.

Most 'interfaces' in standard python libraries (e.g. iterable, iterator,
stream...) are really simple, and it has worked quite well to have
them as just de facto. I think code checking for an interface would be
more expressive than code checking for method names or just trying to
use the object and catch exceptions (you can still do this for other
reasons if you want).
The latter also relies on everyone knowing and
agreeing on what makes an int an int, and a list a list.

Now, many functions take lists, but only use a subset of the list
interface. It would be handy if common subsets of (e.g.) the list
interface could be identified and formalized in a standard module. We
could define a SimpleReadList that would only guarantee getting
items by index, a SliceRWDList() that would guarantee index and slice
getting, setting, and deletion, and the complete List interface with
all the append, insert, sort methods.

[Note: of course it wouldn't be done like this, but I don't have a
good idea of what are the most commonly used subsets of the list
interface. Names ugly on purpose, so you won't take them seriously
:).]

This is also an example of why I see the usefulness of implicit
interface satisfaction. This way interface definitions can have
'retroactive' effect, so you don't have to mess with the builtin
types at all, in order to define how they interact with other objects
(this is, to define and use their interfaces).

> As much as I love Python, sometimes I really miss the rigor of more
> strongly-typed languages, so rather than trying to make something
> very Pythonic, I've tried to go the opposite direction in order to
> complement what's there already.

It looks like you are not alone in this-- or at least others have
been where you are. You may want to take a look at the homepage for
this (retired) Special Interest Group:

http://www.python.org/sigs/types-sig/

While looking for this, I found a PEP, proposed by this group, which
addresses interfaces too:

http://www.python.org/peps/pep-0245.html

I think there's a point in continuing this discussion though, as

> Dissenting Opinion
> This PEP has not yet been discussed on python-dev.

If this is not a good time/place to talk about this, I guess we'll
be warned :).

I got a strong deja-vu reading this PEP.
It matches so closely some
of the changes I proposed to your model, that I must have read it
long ago and then forgotten about it. Still, this suffers from some
statically-typed background and may have the problems I pointed out
about your proposal; it seems to limit itself to the functionality and
needs of Java interfaces (the only example I am more or less familiar
with). Since there is a group effort behind it, I guess they have taken
this into account and still agreed on this solution for some reason; I
haven't looked at the archives yet.

Also, this PEP is less shy about proposing changes to the language
itself. This is maybe a good idea; if one of the important features
for the usefulness of interfaces is that they will be widely used,
it would help if there was only one universally accepted way to use
them. Endorsing one such way in the language itself should help.

> Actually I wanted to have attributes, too (and operators, since they're
> just methods). This brings up one of the uses I had in mind for the
> prefixes in front of the method names. To add a property, you could do
> something like this:
>
>    doc_myProperty = "docstring for myproperty"
>    readonly_myOtherProperty = "docstring for a read-only property"
>    writeonly_myLastProperty = "docstring for write-only property"

Again, why not just:

   # name            access  docstring
   myProperty      = "rw",   "docstring for myProperty"
   myOtherProperty = "r",    "docstring for a read-only property"
   myLastProperty  = "w",    "docstring for write-only property"

> This would require implementations to either have properties named
> "myProperty", "myOtherProperty" and "myLastProperty", or define methods
> named __get__myProperty, __set__myProperty, __get__myOtherProperty, and
> __set__myLastProperty.

It could just require that implementations can be called getattr and/or
setattr (depending on the access declared). There is the problem to check
setattr non-destructively when the object is read-only.
Maybe this is an
issue somewhere else too? I wish this can be solved to keep the syntax as
simple as possible.

> Something I didn't include in the original message was the
> design-by-contract feature, which would allow pre- and postconditions to
> be specified for any method, like this:
>
>    def check_foo(self, a, b):
>      "docstring for foo"
>      if not (some precondition for foo):
>        raise ExceptionOfYourChoice
>      if not (some other precondition for foo):
>        raise DifferentExceptionOfYourChoice
>      return lambda result: (postconditions of foo) # optional

I think the ability to set attributes on python functions, or builtin
properties (I'd have to refresh my memory on these :) could be used for
this. Either way, the syntax for the client user could be something
like:

   class SomeInterface(interface):
     def foo(self, a, b): "foo doc"
     def bar(self, a, b): "bar doc"

     foo.__pre__ = some_checking_func    # takes a, b, raises something if
                                         # they're wrong
     foo.__post__ = other_checking_func  # takes the return value, makes sure
                                         # it's not broken
     bar.__around__ = yet_another_checking_func  # this is called _instead_ of
                                         # the function, and _should_
                                         # call the function in turn

This makes it more explicit that __pre__, __post__ and/or __around__ are
something that relates to foo in some way. Your second approach (bar,
__default__bar) comes closer to this, and it may be more convenient than
first defining the function, then assigning it. Having part of the name
of an object have a special meaning is convenient but a bit hacky. I
myself do it often, but I don't think I'd propose a standard based on
that. A matter of personal taste, I guess.

>> - For the uses you have given to the prefix_methodname notation so far,
>>   I don't think it's really needed. Isn't the following sufficient?
>>
>>   class Foo(interface):
>>       def foo(self, a, b):
>>           "foo docstring"
>>           # nothing here; no default definition
>>       def bar(self, a, b):
>>           pass # no docstring, _empty definition_
>
> Not IMHO, since I'd want methods with no implementation to raise
> NotImplementedError instead of silently returning None. One thing that
> *could* be done to make the simplest case (just a docstring?)
> easier (and make the syntax look more Pythonic) would be this:

The absence of 'pass' in an interface method (in foo) would be considered absence of any default implementation and therefore you'd get an InterfaceError when trying to validate/get proxy on the object if it doesn't implement that method. On second thought, I admit this is dirty, and I don't know if this magic is even possible. A more explicit approach could be:

    class Foo(interface):
        def foo(self, a, b):
            "foo docstring"
            raise NotImplementedError
        def bar(self, a, b):
            pass # no docstring, _empty definition_

Or, for consistency with the proposal above (and with the existing PEP):

    class Foo(interface):
        def foo(self, a, b):
            "foo docstring" # no definition allowed here, sorry
        def bar(self, a, b):
            "" # empty docstring required at the very least
        bar.__default__ = lambda a, b: None

Now that I think about it, the __underscores__ could maybe be taken out for interface method special attributes. They remind us that we are looking at magic stuff that the 'system' will be using in special ways, but if you are not expected to assign arbitrary attributes to interface methods, then there is no ambiguity. Aesthetic choice, again.

> I got the idea of defaults from Haskell, where it's common to see an
> interface define methods with mutually recursive default definitions,
> kind of like (to use a somewhat silly Python example) defining __eq__,
> __ne__, __cmp__, etc. all in terms of one another and expecting
> implementations to define enough of the methods that everything works.

I like it!
:) I may be being too purist at this, but I still think that doesn't belong in the interface definition. I admit putting defaults there is convenient, but I wish we could find a solution that is both convenient and keeps implementation details out of the interface definition.

Although the existing interface PEP stresses the separation between interface and class, it provides one possible solution for this: it talks about a deferred() method (in interfaces) that will return a class that implements the interface. In the PEP, it seems this is only meant to provide error reporting, but I guess it could be put to good use in other ways.

I admit I don't understand the 'deferred' name :), I'm not sure how that default class would be defined, and whether it is intended to be customizable in the PEP. Having a convenient, standard way to define a default class and associate it with an interface without looking like something intrinsic to it, _that_ would be, IMHO, the ideal solution.

>> In your example:
>>
>>   def foo_proc(foo_arg):
>>       foo_proxy = Foo(foo_arg)
>>       ...
>>       x = foo_proxy.foo(a, b)
>>
>> [added a, b again]
>>
>> imagine foo_proc may only really care that foo_arg is an object that
>> has a foo() method that takes (a, b) arguments (this is all Foo
>> guarantees).
>
> Here's where my compiled language bias comes in. If you only care that
> foo_arg has a certain method, you don't want to use interfaces at all.
> Using the interface doesn't just imply that foo_arg has a method named
> foo, but also that the method satisfies the requirements laid out in the
> interface definition.

I agree; this case was only simplified for the sake of exposition. The point here is that the requirements laid out in the interface definition may possibly be checked on objects (as opposed to classes), at runtime (as opposed to, um, import-time :).

If you really want to check the class you can do so in one specific interface.
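
A minimal sketch of what checking an interface on an object (rather than a class) at runtime could look like. This is only an illustration in present-day Python, not the mechanism of either proposal: `implements`, `Good` and `Bad` are made-up names, and `Foo` is written as a plain class because the `interface` base class under discussion does not exist.

```python
import inspect


class Foo:
    """Hypothetical interface: only the signature of foo matters here."""

    def foo(self, a, b):
        "foo docstring"


def implements(obj, interface):
    """Return True if obj has every public method of interface
    with the same parameter names (ignoring self)."""
    for name, spec in inspect.getmembers(interface, inspect.isfunction):
        if name.startswith("_"):
            continue
        method = getattr(obj, name, None)
        if not callable(method):
            return False
        # Drop 'self' from the interface signature; a bound method
        # looked up on the object already omits it.
        wanted = list(inspect.signature(spec).parameters)[1:]
        actual = list(inspect.signature(method).parameters)
        if wanted != actual:
            return False
    return True


class Good:
    def foo(self, a, b):
        return a + b


class Bad:
    def foo(self, a):
        return a


good = Good()
ok_before = implements(good, Foo)
good.foo = lambda a: a          # someone fiddles with the instance
ok_after = implements(good, Foo)
```

Because the lookup goes through the object, the check notices a method that was replaced on the instance after the class was defined, which a check done once per class would miss.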
If you don't want the overhead of checking every time, and you know you won't be messing with class instances or objects, you may either check by class, or maybe even cache the results of checking on classes/objects.

>> In the second case, if someone has been fiddling with foo_arg or some
>> of its base classes, foo_arg.foo() may no longer exist or it may have
>> a different signature.
>
> You can't really stop people from shooting themselves in the foot.
> Making methods disappear is black magic in my book.

Very true. This is of course no crucial point. Still, since method calls are always late-bound I think it only makes sense that restrictions on them _can_, at least, be late-checked.

>> * I agree the declaration had better be included in the class
>> definition, at least as an option.
>
> I agree, but as an option.

The __implements__ alternative lets you fiddle with interfaces outside the class too:

    # SomeClass itself defined elsewhere
    SomeClass.__implements__ += (Foo,)

All in all, the syntax I like the most so far is the one described in the existing PEP.

>> * Declarative better than procedural, for this purpose.
>
> Whether this is declarative or procedural is mostly a matter of
> perspective, IMHO. In my imagination, calls to bind occur only at the
> module level, and almost always immediately after the class or interface
> definition, so the "flavor" is declarative.

You're right; an assignment is just as procedural as a function call. It was very personal aesthetic appreciation again; a function call looks more like it's "doing" something, to me. But that is very arguable.

>> * Only classes (not instances) can declare that they implement an
>> interface.
>
> Maybe there should be something like a "bindinstance" method as well.
> I'm sure there's a way to do it, but I don't consider it a very high
> priority.

Me neither. I think declaration would typically be done on a per-class, not per-instance, basis.
Still, due to the nature of python it is the instance (the object, really) that implements (or fails to) the interface. So I would add it, for completeness and to reflect this, *if* it would not require any specific syntax or additional complication. I think the way __implements__ would be searched would be natural for python users, since it is consistent with what is being done for __dict__, for example.

> Good ideas. Your method has a lot of advantages, but it would be hard
> (and messy) to make it do everything you can do with separate method
> calls, and one thing I'm *very* reluctant to do is have two ways of
> doing everything with only subtle or stylistic differences between them.

I agree there should not be two ways. :>

> The best thing to do here may be to allow your style for simple cases
> (the 90% that would have required only a single "bind" call), but
> use the method-based syntax for anything more elaborate, like method
> renaming and unbinding subclasses.

Method renaming and unbinding subclasses are supported in the alternative and rather easy; I guess the weird formatting obscured this :).

Method renaming:
================

    class I1(interface):
        def f(self): ""
    class I2(interface):
        def f(self, a, r, g, s): ""

    class SomeClass:
        def i_f(self): pass
        i1f.__implements__ = I1.f
        def g_f(self, a, r, g, s): pass
        i2f.__implements__ = I2.f

[Note: if 'implements' is introduced as a keyword, as in the PEP, we could just as well declare

    def g_f(self, a, r, g, s) implements I2.f: ...
]

Unbinding subclasses:
=====================

    class I1(interface): ...
    class I2(interface): ...
    class Base:
        __implements__ = (I1, I2)

    class Sub(Base):
        # it would be prettier to say (not I2,), but it would involve
        # some language hacking I guess, while (-I2,) only requires
        # overriding __neg__ in the interface
        __implements__ = (-I2,)

>> [using proxies] should be optional too, but your syntax does this
>> nicely; you can call Foo() as an assertion of sorts and ignore the
>> result.
>
> I think it would be very misleading to require that an object support a
> formal interface but then expect it to also support methods not
> specified in the interface. If this is what you want, the right thing
> to do is derive a new interface from the old one that has the extra
> functionality you need.

As you pointed out, in the real world you won't be defining interfaces for very simple things like "has a write() method". This means you may still want to use other forms of validation, or do without validation at all (for aspects of the code which are still in an experimental phase, for example). You may even use the interface more informally as a mere sanity check.

Anyway, method redirection almost enforces that the interface-specific functionality of an object be accessed through a proxy. Doing so looks like the right thing to do anyway, so I don't consider this unfortunate at all.

I like the proxy idea and the syntax you propose for it. The iterator example was a nice read, anyway :). Was this a typo?

    class Iterator(Iterable):

Meant

    class Iterator(interface):

?

As a conclusion, I am glad about the stress, in python, on generally making things easier for you, rather than paternalistically trying to keep you from doing evil. Since I came from Java, it took a bit of getting used to, but I have seen python scaling nicely to rather big sized projects without this becoming a serious issue.
I think there is a point to standardize and automate type checking in python, but I believe it can and should be done without betraying this philosophy.

Enough ranting for today! :)

Esteban.

From esteban@ccpgames.com Mon Sep 30 07:31:23 2002
From: esteban@ccpgames.com (Esteban U.C.. Castro)
Date: Mon, 30 Sep 2002 06:31:23 -0000
Subject: [Python-Dev] bad formatting
Message-ID: <25A39AFEB31B06408C675CEF28199E5B06EB0E@postur.ccp.cc>

Sorry again. Anyone knows of a 'sandbox' mailing list where I could experiment to fix the formatting problems?

From aleax@aleax.it Mon Sep 30 07:38:54 2002
From: aleax@aleax.it (Alex Martelli)
Date: Mon, 30 Sep 2002 08:38:54 +0200
Subject: [Python-Dev] Documentation: type-vs.-function
In-Reply-To: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com>
References: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com>
Message-ID:

On Monday 30 September 2002 02:36 am, David Abrahams wrote:
> I note that
> http://www.python.org/dev/doc/devel/lib/built-in-funcs.html#l2h-14
> describes dict as a built-in function, whereas we all know that Guido's
> cool 2.2 changes made it into a type
>
> >>> dict
>
>
> Does this distinction matter? A little, I think. Calling it a function
> makes it sound like we're living in the past. Same goes for str, type,
> list, tuple, et. al. I realize that the type (especially )
> acts like a function under many circumstances...

Trying to cover both 2.1 and 2.2 in the coming Nutshell, I've resorted to periphrases such as "the built-in dict" or "the dict built-in" (the latter uses "built-in" as a noun, I'm not yet sure the editor will let that go by).

I've also tried to use 'callable' systematically instead of 'function' wherever other callables (types, bound-methods, etc) can be substituted in lieu of functions. In documenting 2.2 or 2.3 only, I think such hedging is not warranted.
It's important, when feasible, to clarify what built-ins are types -- a type has MORE functionality than a function, after all (in particular, one can subclass it, while one can't subclass a function).

Alex

From esteban@ccpgames.com Mon Sep 30 07:47:34 2002
From: esteban@ccpgames.com (Esteban U.C.. Castro)
Date: Mon, 30 Sep 2002 06:47:34 -0000
Subject: [Python-Dev] proposal for interfaces (errata)
Message-ID: <25A39AFEB31B06408C675CEF28199E5B06EB0F@postur.ccp.cc>

g_f should be i2f. SPAM END :)
___________

Method renaming:
----------------

    class I1(interface):
        def f(self): ""
    class I2(interface):
        def f(self, a, r, g, s): ""

    class SomeClass:
        def i_f(self): pass
        i1f.__implements__ = I1.f
        def g_f(self, a, r, g, s): pass
        i2f.__implements__ = I2.f

[Note: if 'implements' is introduced as a keyword, as in the PEP, we could just as well declare

    def g_f(self, a, r, g, s) implements I2.f: ...
]

From herald@ns1.nabitel.com Mon Sep 30 05:42:36 2002
From: herald@ns1.nabitel.com (herald@ns1.nabitel.com)
Date: Mon, 30 Sep 2002 13:42:36 +0900
Subject: [Python-Dev] (ad)Strong WebRobot/eMailId Collector: Free Download !
Message-ID:

This is a multi-part message in MIME format.

------=_NextPart_000_939A3_01C26887.3C7F6210
Content-Type: text/plain; charset="ks_c_5601-1987"
Content-Transfer-Encoding: quoted-printable

Sorry for interrupting you - click refuse for no more mail...

- Welcome to NabiTel's software products and portal services -

Software Products

Web Robot: also called web spider or web crawler, collects useful web page informations by navigating world wide web sites.
Download free trial version now !

eMail ID Collector: Collects email ids publicly opened on various web pages, with good intention.
Download free trial version now !

Portal Services

Web Portal: Do you have your own home page and want to broadcast it all over the world ? Register your home page to NabiTel Portal Now !!
(nabi=a butterfly)
Register your home page now, it's free !

Automobiles: Do you want to sell or buy automobiles ? Cars, trucks, limos, airplanes, ships,.... All That Cars are here !
Register your vehicles now, it's free !

Computers: Do you want to sell or buy computers ? PCs, printers, scanners, servers, mainframes, .... All That Computers are here !
Register your computers now, it's free !

Food & Restaurants: Are you seeking for a nice place to eat ? Or do you run a restaurant ? Foods of the world, restaurants of the world, .... All That Foods are here !
Register your restaurant now, it's free !

Have a nice day. Thank you.

------=_NextPart_000_939A3_01C26887.3C7F6210
Content-Type: text/html; charset="ks_c_5601-1987"
Content-Transfer-Encoding: quoted-printable

Nabitel information broadcast mail
------=_NextPart_000_939A3_01C26887.3C7F6210-- From aleax@aleax.it Mon Sep 30 08:35:34 2002 From: aleax@aleax.it (Alex Martelli) Date: Mon, 30 Sep 2002 09:35:34 +0200 Subject: [Python-Dev] proposal for interfaces In-Reply-To: <3D97620E.2030405@pobox.com> References: <25A39AFEB31B06408C675CEF28199E5B073CA3@postur.ccp.cc> <3D97620E.2030405@pobox.com> Message-ID: On Sunday 29 September 2002 10:26 pm, John Williams wrote: > Esteban U.C.. Castro wrote: > > I like it a lot! Anyway, if it can be implemented in python as is, > > what is the point of the PEP? Making the 'interface' root class and/or > > InterfaceError builtins, maybe? > > Well, aside from the ego boost of having my very own PEP, I think the > nature of interfaces is such that they're vastly more useful if lots of > people use them. Putting something in the standard distribution almost > guarantees that. I'm not following all of the details (and I'm prejudiced by the mentioned bias for static typing), but I'd like to second this specific point: some features, and interfaces are definitely one, have growing usefulness for all the more people use them. This is what economists call "a network effect" (if you're the only one in the world to own a phone, its usefulness to you is nil; if there are just two phones in the world, each has at least one bit of usefulness; the more people use phones, the more useful each phone becomes to its user). If most Python modules used interfaces (of almost any given kind), those interfaces could be very useful; if almost no module did, the usefulness would be limited to inter-module communication for very complex systems one develops oneself -- not nil, but much less. Alex From mwh@python.net Mon Sep 30 09:14:20 2002 From: mwh@python.net (Michael Hudson) Date: 30 Sep 2002 09:14:20 +0100 Subject: [Python-Dev] ATTENTION! 
Releasing Python 2.2.2 in a few weeks
In-Reply-To: Guido van Rossum's message of "Sat, 28 Sep 2002 10:22:58 -0400"
References: <200209202126.g8KLQVI24554@pcp02138704pcs.reston01.va.comcast.net> <2mn0q2fip7.fsf@starship.python.net> <200209281422.g8SEMw720102@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <2mlm5jlwrn.fsf@starship.python.net>

Guido van Rossum writes:

> > > I'd like to release something called Python 2.2.2 in a few weeks (say,
> > > around Oct 8; I like Tuesday release dates).
> >
> > One minor point of concern: I think Jack Jansen's on holiday. Perhaps
> > we should wait for him to get back...
>
> He should be back by Oct 5 or 6 if what he told me of his schedule is
> true. I'm not sure that it would matter much if MacPython 2.2.2 was
> released a week after the main release. Maybe we should do one
> release candidate anyway and give him space that way.

OK, if you knew about this, then I'll assume you have it in hand...

Cheers,
M.

--
Java is a WORA language! (Write Once, Run Away)
  -- James Vandenberg (on progstone@egroups.com)
     & quoted by David Rush on comp.lang.scheme

From fredrik@pythonware.com Mon Sep 30 10:01:24 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Mon, 30 Sep 2002 11:01:24 +0200
Subject: [Python-Dev] Re: User extendable literal modifiers ?!
References: <200209292339.g8TNdiv28330@oma.cosc.canterbury.ac.nz>
Message-ID: <00f701c2685f$f67611f0$0900a8c0@spiff>

greg wrote:

> > - Rational numbers. $r"123/234"
> > - Regular expressions. $/"foo.*bar"
> > - Dates and times. $t"2002-09-27 11:38"
> > - Hostnames and ports. $h"www.google.com:80"
>
> This strikes me as ugly. There doesn't seem to be much, if any,
> syntactical advantage over using a constructor:
>
>   Rat("123/234")
>   Regex("foo.*bar")
>   Date("2002-09-27 11:38")
>   Port("www.google.com:80")
>
> These look cleaner and easier to read to me.

isn't the whole idea that with a special syntax, you can do some of the processing when compiling the script?
it's pretty pointless to invent more ways to call functions with string literals as arguments...

btw, the following note is slightly related to this topic, and has been generating some buzz lately (at least in my mailbox):

    http://effbot.org/zone/idea-xml-literal.htm

From martin@v.loewis.de Mon Sep 30 13:59:03 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 30 Sep 2002 14:59:03 +0200
Subject: [Python-Dev] Re: User extendable literal modifiers ?!
In-Reply-To: <00f701c2685f$f67611f0$0900a8c0@spiff>
References: <200209292339.g8TNdiv28330@oma.cosc.canterbury.ac.nz> <00f701c2685f$f67611f0$0900a8c0@spiff>
Message-ID:

"Fredrik Lundh" writes:

> isn't the whole idea that with a special syntax, you can do some of the
> processing when compiling the script? it's pretty pointless to invent more
> ways to call functions with string literals as arguments...

That can't be the idea: Marshalling would store the string form, so any compilation done until marshalling must be undone.

Perhaps the idea is that these things are interpreted once before byte code interpretation starts (i.e. after loading a .pyc file). In that case, a number of interesting questions arise:

- in what order, precisely, are those things evaluated? Probably in textual order, but this is not that easy, since the marshalling procedure might make such a requirement unimplementable.

- are duplicate occurrences eliminated? If so, how does one determine duplicates?

In any case, I think users will be surprised if $h"www.google.com:80" causes a dial-up connection to be set up as soon as a module is imported.
Regards, Martin From dave@boost-consulting.com Mon Sep 30 14:27:43 2002 From: dave@boost-consulting.com (David Abrahams) Date: Mon, 30 Sep 2002 09:27:43 -0400 Subject: [Python-Dev] Documentation: type-vs.-function References: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com> Message-ID: <0e9f01c26885$7551a820$6501a8c0@boostconsulting.com> From: "Alex Martelli" > On Monday 30 September 2002 02:36 am, David Abrahams wrote: > > I note that > > http://www.python.org/dev/doc/devel/lib/built-in-funcs.html#l2h-14 > > describes dict as a built-in function, whereas we all know that Guido's > > cool 2.2 changes made it into a type > > > > >>> dict > > > > > > > > Does this distinction matter? A little, I think. Calling it a function > > makes it sound like we're living in the past. Same goes for str, type, > > list, tuple, et. al. I realize that the type (especially ) > > acts like a function under many circumstances... > > Trying to cover both 2.1 and 2.2 in the coming Nutshell, I've resorted to > periphrases such as "the built-in dict" or "the dict built-in" (the latter > uses "built-in" as a noun, I'm not yet sure the editor will let that go by). > > I've also tried to use 'callable' systematically instead of 'function' > wherever other callables (types, bound-methods, etc) can be substituted > in lieu of functions. In documenting 2.2 or 2.3 only, I think such hedging > is not warranted. It's important, when feasible, to clarify what built-ins > are types -- a type has MORE functionality than a function, after all (in > particular, one can subclass it, while one can't subclass a function). It's probably also worth noting that the dict type is not documented anywhere, except as a function. -Dave From guido@python.org Mon Sep 30 15:33:41 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 30 Sep 2002 10:33:41 -0400 Subject: [Python-Dev] Re: User extendable literal modifiers ?! In-Reply-To: Your message of "Mon, 30 Sep 2002 11:01:24 +0200." 
    <00f701c2685f$f67611f0$0900a8c0@spiff>
References: <200209292339.g8TNdiv28330@oma.cosc.canterbury.ac.nz> <00f701c2685f$f67611f0$0900a8c0@spiff>
Message-ID: <200209301433.g8UEXfP14842@pcp02138704pcs.reston01.va.comcast.net>

[effbot]
> isn't the whole idea that with a special syntax, you can do some of
> the processing when compiling the script? it's pretty pointless to
> invent more ways to call functions with string literals as
> arguments...

Not necessarily. Domain-specific notations are useful with or without compile-time processing, and sometimes the added noise of the function call syntax + string literals can get in the way of readability. (Hey, binary operators are [mostly] just another syntax for calling functions, and around here we all agree that they're a good thing. :-)

That said, I'm not very enamored of the $x"foo" notation -- too much line noise. MAL's original minimalistic proposal (123x, or perhaps also 123.456x, and maybe even 1.23e-456x) seems cleaner in cases where it's applicable. I don't expect Python will ever grow date/time or (heaven forbid) IP address literals, and we already have r"regex" literals.

> btw, the following note is slightly related to this topic, and has been
> generating some buzz lately (at least in my mailbox):
>
> http://effbot.org/zone/idea-xml-literal.htm

That looks interesting in a futuristic kind of way. I'm curious why you decided not to return fixed-type tuples of the form (tag, attrs, content) -- that seems easier to deal with than having to deal with both (tag, content) and (tag, attrs, content). Tuples used as records ought to have a fixed lay-out.

Parsing this would be tricky -- the tokenizer would have to know in what state the parser is in order to tell when to switch to XML if it sees a '<'. And if you want to use a standard XML parser you'd have to be careful to stop reading after the final '>'.

And what can this do that you can't do by putting it in a string literal and feeding it to a convenience function?
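
A minimal sketch of such a convenience function, written against the `xml.etree.ElementTree` module of present-day Python's standard library (which did not exist when this was posted). The name `xml_literal` is made up for illustration, and the fixed `(tag, attrs, content)` layout follows the suggestion above rather than the format described in the linked note.

```python
import xml.etree.ElementTree as ET


def xml_literal(source):
    """Parse an XML string into nested (tag, attrs, content) tuples.

    content is a list whose items are either text strings or
    further (tag, attrs, content) tuples, in document order.
    """
    def convert(elem):
        content = []
        if elem.text:
            content.append(elem.text)
        for child in elem:
            content.append(convert(child))
            # Text following a child element belongs to the parent.
            if child.tail:
                content.append(child.tail)
        return (elem.tag, dict(elem.attrib), content)

    return convert(ET.fromstring(source))


tree = xml_literal('<p class="x">hello <b>world</b>!</p>')
```

Every node has the same three-field shape, so code that walks the result never has to test the tuple length to find out whether attributes are present.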
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org Mon Sep 30 15:58:03 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 30 Sep 2002 10:58:03 -0400
Subject: [Python-Dev] Re: Documentation: type-vs.-function
In-Reply-To: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com>
References: <0cf201c2681a$2912c6d0$6501a8c0@boostconsulting.com> <0H3800KZ9NSXEZ@mtain01.icomcast.net>
Message-ID: <15768.26235.434551.130411@grendel.zope.com>

David Abrahams writes:
 > Does this distinction matter? A little, I think. Calling it a function
 > makes it sound like we're living in the past. Same goes for str, type,
 > list, tuple, et. al. I realize that the type (especially )
 > acts like a function under many circumstances...

It definitely matters.

Alex Martelli writes:
 > It's important, when feasible, to clarify what built-ins are types
 > -- a type has MORE functionality than a function, after all (in
 > particular, one can subclass it, while one can't subclass a
 > function).

I agree. The current somewhat-vague plan is to add a new section parallel to the section on built-in functions that lists the built-in types exposed in the __builtin__ module. This would make it easier to describe these types and their ability to be subclassed in a more rational manner than in their current location. Placeholder entries will be maintained for the function entries so people accustomed to looking in the current location won't be completely lost.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From guido@python.org Mon Sep 30 16:00:48 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 30 Sep 2002 11:00:48 -0400
Subject: [Python-Dev] Snapshot win32all builds of interest?
In-Reply-To: Your message of "Mon, 30 Sep 2002 11:16:41 +1000."
References: Message-ID: <200209301500.g8UF0mL18039@pcp02138704pcs.reston01.va.comcast.net> > I'm wondering if there is any interest in me making regular, > basically untested win32all builds against the current Python CVS > tree? > > It would be fairly simple for me to do - I run against CVS Python, > so it is really just bundling up my latest built files into an > installer .EXE. I would only do it for the current CVS trunk - ie, > no Python 2.2 or earlier builds in this form. However, I would only > bother if there were people willing to use it. I figure that if > there aren't people in this forum who would use it, I won't find > them anywhere ;) I think that would be useful, especially if the current 2.2-compatible win32all does not work with 2.3, or if you have added features since that was last released. > OTOH, people in this forum using CVS Python on Windows may prefer to > use CVS and build their own win32all - I really have no clue ;) Not me. I know how to build and install Python from source on Windows, but setting up another project is a major pain for a Unix weenie like me, so I'd much prefer a binary distribution. --Guido van Rossum (home page: http://www.python.org/~guido/) From jepler@unpythonic.net Mon Sep 30 16:16:12 2002 From: jepler@unpythonic.net (Jeff Epler) Date: Mon, 30 Sep 2002 10:16:12 -0500 Subject: [Python-Dev] Re: User extendable literal modifiers ?! In-Reply-To: <00f701c2685f$f67611f0$0900a8c0@spiff> References: <200209292339.g8TNdiv28330@oma.cosc.canterbury.ac.nz> <00f701c2685f$f67611f0$0900a8c0@spiff> Message-ID: <20020930151601.GA20279@unpythonic.net> On Mon, Sep 30, 2002 at 11:01:24AM +0200, Fredrik Lundh wrote: > btw, the following note is slightly related to this topic, and has been > generating some buzz lately (at least in my mailbox): > > http://effbot.org/zone/idea-xml-literal.htm This is a little like what I implemented for 'pyhtml'. 
It was intended to be an extension to the Quixote templating system, so it used the idea that an HTML tag embedded in the code should write itself directly to the output, like the result of expression statements already does in templates.

An excerpt from the README:

The following code:

    <ul>
        for i in range(10):
            <li> i

would output something like

    <ul>
    <li>0</li>
    <li>1</li>
    ...
    <li>9</li>
    </ul>

As you can see, I let <tag> start a block, and let blocks end according to Python's normal indentation rules. The productions added to the grammar were:

    compound_stmt: ... | tag_stmt
    tag_stmt: '<' NAME [tag_args] '>' suite
    tag_args: NAME '=' expr (',' NAME '=' expr)* [',']

so that
    "this might be blue" would also work. I thought it was rather cute to reverse the normal practice of finding a way to shoehorn Python syntax into the midst of an HTML document, but never wrote anything serious using pyhtml. The remains of the project can be seen at http://unpythonic.net/~jepler/falcon/pyhtml/ Jeff From greg@cosc.canterbury.ac.nz Mon Sep 30 23:22:39 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 01 Oct 2002 10:22:39 +1200 (NZST) Subject: [Python-Dev] Re: User extendable literal modifiers ?! In-Reply-To: <00f701c2685f$f67611f0$0900a8c0@spiff> Message-ID: <200209302222.g8UMMda01522@oma.cosc.canterbury.ac.nz> Fredrik Lundh : > isn't the whole idea that with a special syntax, you can do some of the > processing when compiling the script? I suppose the literal object could be precomputed when compiling -- but how would you marshal it when saving the bytecode? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+