From andymac at bullseye.apana.org.au Wed Sep 1 00:11:57 2004 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Wed Sep 1 03:14:52 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Misc NEWS, 1.1125, 1.1126 In-Reply-To: <20040831140352.GC15320@rogue.amk.ca> References: <20040831140352.GC15320@rogue.amk.ca> Message-ID: <20040901080957.D88920@bullseye.apana.org.au> On Tue, 31 Aug 2004, A.M. Kuchling wrote: > On Tue, Aug 31, 2004 at 06:51:03AM -0700, akuchling@users.sourceforge.net wrote: > > Add news item. > > +- The mpz, rotor, and xreadlines modules, all deprecated in earlier > > + versions of Python, have now been removed. > > + > > Well, *that* was messier than I was expecting... Done now. > > I haven't touched the Makefiles for the PC and OS2 ports to remove > these modules; if the maintainers want me to do that, please let me > know. I'll take care of the OS2 fixes (though I probably won't be able to do so for a week or so). Regards, Andrew -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From gvanrossum at gmail.com Wed Sep 1 06:59:25 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Sep 1 06:59:30 2004 Subject: [Python-Dev] Rejecting the J2 decorators proposal Message-ID: Robert and Python-dev, I've read the J2 proposal up and down several times, pondered all the issues, and slept on it for a night, and I still don't like it enough to accept it. The only reason to accept it would be to pacify the supporters of the proposal, and that just isn't a good enough reason in language design. However, it got pretty darn close! I'm impressed with how the community managed to pull together and face the enormous challenge of picking a single alternative (from more than two dozen on the Wiki!) and arguing consistently. 
I expect to see more proposals like this in the future, and I'm sure that some of them will be good enough to make it into the language. I've also (again) learned a lesson: dramatic changes must be discussed with the community at large. In a large enough group there are no uncontroversial proposals, so this will take time, but it's worth it -- one of the main issues with the @decorator syntax was not technical but socio-political, in the sense that it hadn't been properly discussed outside a *very* small circle. I take the full blame for that, and I don't want to hide behind my current lack of time which, realistically, won't change until either my ESI stock options earn me an early retirement, or the PSF strikes it rich and can pay me full time :-). So let me explain why I'm not choosing J2, and what's next. There are two major issues and one minor that made me decide against J2. Major issue one: the syntactic form of an indented block strongly suggests that its contents should be a sequence of statements, but in fact it is not -- only expressions are allowed, and there is an implicit "collecting" of these expressions going on until they can be applied to the subsequent function definition. To me, this is a more serious problem than the namespace questions brought up in the proposal (unfortunately that particular section of the proposal is its most confused part; but even if the text had been crystal clear, the problem remains). The best counter-argument to this I've heard is "you'll get used to it", which is also what I'm saying of @decorators; and many people have already testified that they indeed got used to it and even liked it. Major issue two: the keyword starting the line that heads a block draws a lot of attention to it. This is true for "if", "while", "for", "try", "def" and "class". 
But the "using" keyword (or any other keyword in its place) doesn't deserve that attention; the emphasis should be on the decorator or decorators inside the suite, since those are the important modifiers to the function definition that follows. When a function definition carries one or more decorators, the most important information is not the fact that it has decorators, but the specific decorators used. A classmethod or staticmethod decorator adds a completely different flavor than a decorator that provides an external linkage hint for ObjC, or one that adds synchronization, or one that declares deprecation. I expect that at least 80% of the use of decorators will have a single decorator per function, and it's a pain for that decorator to be hiding behind a content-free keyword. (This is *not* a number-of-keystrokes argument. You know I don't care much about that.) Minor issue: "using" is a poor choice of keyword. It resembles C#'s "using" and perhaps Perl's "use", both of which have completely different meanings. But there don't seem to be any better alternatives (the best I could come up with was "transmogrify" :-). So, what's next? In Python 2.4a3 (to be released this Thursday), everything remains as currently in CVS. For 2.4b1, I will consider a change of @ to some other single character, even though I think that @ has the advantage of being the same character used by a similar feature in Java. It's been argued that it's not quite the same, since @ in Java is used for attributes that don't change semantics. But Python's dynamic nature makes that its syntactic elements never mean quite the same thing as similar constructs in other languages, and there is definitely significant overlap. Regarding the impact on 3rd party tools: IPython's author doesn't think there's going to be much impact; Leo's author has said that Leo will survive (although it will cause him and his users some transitional pain). 
I actually expect that picking a character that's already used elsewhere in Python's syntax might be harder for external tools to adapt to, since parsing will have to be more subtle in that case. But I'm frankly undecided, so there's some wiggle room here. I don't want to consider further syntactic alternatives at this point: the buck has to stop at some point, everyone has had their say, and the show must go on. In the coming years I hope that as a community we'll gain enough experience with decorators to decide whether we need to adopt a different syntax for Python 3000 or not. One of the difficulties with choosing a decorator syntax has definitely been that nobody can predict how they are going to be used predominantly. Different alternatives look better depending on whether there are many or few decorators per function, whether they have long argument lists or not, and perhaps also whether their use is for transformation or for annotation. Despite the novelty of using the @ character, I personally feel that prefix decorators are a huge improvement over the "f = staticmethod(f)" style of decorating. A warning: some people have shown examples of extreme uses of decorators. I've seen decorators proposed for argument and return type annotations, and even one that used a decorator to create an object that did a regular expression substitution. Those uses are cute, but I recommend being conservative when deciding between using a decorator or some other approach, especially in code that will see a large audience (like 3rd party library packages). Using decorators for type annotations in particular looks tedious, and this particular application is so important that I expect Python 3000 will have optional type declarations integrated into the argument list. Thanks to everyone who read until the end of this message! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From kbk at shore.net Wed Sep 1 07:56:36 2004 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Wed Sep 1 07:56:41 2004 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200409010556.i815uaF7026858@h006008a7bda6.ne.client2.attbi.com> Patch / Bug Summary ___________________ Patches : 247 open (-12) / 2596 closed (+23) / 2843 total (+11) Bugs : 758 open (+13) / 4415 closed (+10) / 5173 total (+23) RFE : 148 open ( -2) / 131 closed ( +1) / 279 total ( -1) New / Reopened Patches ______________________ compiler.transformer: correct lineno attribute when possible (2004-08-25) http://python.org/sf/1015989 opened by Thenault Sylvain configure.in change to allow compilation on AIX 5 (2004-08-25) CLOSED http://python.org/sf/1016224 opened by Trent Mick bsddb's DB.keys() method ignores transaction argument (2004-08-27) http://python.org/sf/1017405 opened by Jp Calderone Fix for bug 1017546 (2004-08-27) http://python.org/sf/1017550 opened by Michael ifdeffery patch (2004-08-28) CLOSED http://python.org/sf/1018291 opened by Ilya Sandler fix for several sre escaping bugs (fixes #776311) (2004-08-28) http://python.org/sf/1018386 opened by Mike Coleman fix bug 807871 : tkMessageBox.askyesno wrong result (2004-08-29) http://python.org/sf/1018509 opened by Jiba Multi-line strings and unittest (2004-08-30) http://python.org/sf/1019220 opened by Felix Wiemann Bad test for HAVE_UINTPTR_T in PC/pyconfig.h (2004-08-31) http://python.org/sf/1020042 opened by Scott David Daniels Py_CLEAR to implicitly cast its argument to PyObject * (2004-09-01) http://python.org/sf/1020185 opened by Dima Dorfman Use Py_CLEAR where necessary to avoid crashes (2004-09-01) http://python.org/sf/1020188 opened by Dima Dorfman Patches Closed ______________ expose lowlevel setlocale (2003-07-24) http://python.org/sf/776854 closed by mhammond Docs claim that coerce can return None (2004-08-24) http://python.org/sf/1015021 closed by loewis bug in tarfile.ExFileObject.readline (2004-08-24) http://python.org/sf/1014992 closed by loewis decode message attachments in email.Message 
(2003-08-28) http://python.org/sf/796908 closed by loewis More urllib2 examples (2003-08-31) http://python.org/sf/798244 closed by loewis interpreter final destination location (2003-05-13) http://python.org/sf/736857 closed by loewis docs for interpreter final destination location (2003-05-13) http://python.org/sf/736859 closed by loewis Use a better BuildRoot tag (2004-06-10) http://python.org/sf/970019 closed by loewis Generate a working spec even with wrong version of software (2004-06-10) http://python.org/sf/970015 closed by loewis platform-specific entropy (2004-04-14) http://python.org/sf/934711 closed by loewis configure.in change to allow compilation on AIX 5 (2004-08-25) http://python.org/sf/1016224 closed by tmick Expose current parse location to XMLParser (2004-08-24) http://python.org/sf/1014930 closed by davecole Improve markup and punctuation in libsocket.tex (2004-08-24) http://python.org/sf/1015012 closed by davecole help on re-exported names (bug 925628) (2004-04-13) http://python.org/sf/934356 closed by jlgijsbers socketmodule on OpenBSD/sparc64 (64bit machine) (2004-08-01) http://python.org/sf/1001610 closed by loewis ifdeffery patch (2004-08-28) http://python.org/sf/1018291 closed by loewis difflib side by side diff support, diff.py s/b/s HTML option (2004-03-12) http://python.org/sf/914575 closed by loewis Fix for compilation with runtime_library_dirs (2004-06-15) http://python.org/sf/973204 closed by loewis AUTH_TYPE and REMOTE_USER for CGIHTTPServer.py:run_cgi() (2003-04-25) http://python.org/sf/727483 closed by loewis Backport of recent sre fixes. 
(2003-04-19) http://python.org/sf/723940 closed by loewis Fixes for bug 940578 (glob.glob on broken symlinks) (2004-04-24) http://python.org/sf/941486 closed by jlgijsbers fix for bugs 976878, 926369, 875404 (pdb bkpt handling) (2004-08-05) http://python.org/sf/1003640 closed by jlgijsbers Multi-line imports implementation (2004-08-11) http://python.org/sf/1007189 closed by anthonybaxter New / Reopened Bugs ___________________ __setitem__ for __dict__ ignored (2004-08-25) CLOSED http://python.org/sf/1015792 opened by Viktor A Danilov Don't define _SGAPI on IRIX (2003-04-27) http://python.org/sf/728330 reopened by loewis os.system segmentation fault (2004-08-25) http://python.org/sf/1015937 opened by Tomasz Kowaltowski "reversed" gives its name as "reverse" in docstring (2004-08-25) CLOSED http://python.org/sf/1016181 opened by Hamish Lawson urllib2 bug in proxy auth (2004-08-26) http://python.org/sf/1016563 opened by Christoph Mussenbrock distutils support for swig is under par (2004-08-26) http://python.org/sf/1016626 opened by Sjoerd Mullender urllib.urlretrieve silently truncates downloads (2004-08-26) http://python.org/sf/1016880 opened by David Abrahams email.Message does not allow iteration (2004-08-26) http://python.org/sf/1017329 opened by Paul McGuire including Python.h redefines _POSIX_C_SOURCE (2004-08-27) http://python.org/sf/1017450 opened by Jon K?re Hellan including Python.h redefines _POSIX_C_SOURCE (2004-08-27) CLOSED http://python.org/sf/1017455 opened by Jon K?re Hellan test_inspect.py fails to clean up upon failure (2004-08-27) http://python.org/sf/1017546 opened by Michael filemode() in tarfile.py makes wrong file mode strings (2004-08-27) http://python.org/sf/1017553 opened by Peter Loje Hansen Case sensitivity bug in ConfigParser (2004-08-27) http://python.org/sf/1017864 opened by Dani IDLE DOES NOT START ON WinXP Pro (2004-08-27) http://python.org/sf/1017978 opened by Snake __new__ not defined? 
(2004-08-28) http://python.org/sf/1018315 opened by Skip Montanaro Solaris: reentrancy issues (2004-08-29) http://python.org/sf/1018492 opened by Simon Harrison inspect.getmodule symlink-related failur (2002-06-18) http://python.org/sf/570300 reopened by amitar re.sub: two-digit group-reference hangs (2004-08-29) http://python.org/sf/1018815 opened by Michael Dyck __metaclass__ in locals is ignored (2004-08-30) http://python.org/sf/1019048 opened by Jeff Epler "rich comparison'' methods hide stack overflow (2004-08-30) http://python.org/sf/1019129 opened by boyanb distutils ignores configure's --includedir (2004-08-31) http://python.org/sf/1019715 opened by Joseph Winston wrong socket error returned (2004-08-31) http://python.org/sf/1019808 opened by Federico Schwindt hotshot start / stop stats bug (2004-08-31) http://python.org/sf/1019882 opened by Barry A. Warsaw httplib.HTTPConnection sends extra blank line (2004-08-31) http://python.org/sf/1019956 opened by Antonio Rodriguez Bugs Closed ___________ __setitem__ for __dict__ ignored (2004-08-25) http://python.org/sf/1015792 closed by nnorwitz Building with --disable-toolbox-glue fails (2004-07-15) http://python.org/sf/991962 closed by bcannon "reversed" gives its name as "reverse" in docstring (2004-08-25) http://python.org/sf/1016181 closed by rhettinger including Python.h redefines _POSIX_C_SOURCE (2004-08-27) http://python.org/sf/1017455 closed by nnorwitz glob.glob inconsistent about broken symlinks (2004-04-23) http://python.org/sf/940578 closed by jlgijsbers PDB: unreliable breakpoints on functions (2004-06-21) http://python.org/sf/976878 closed by jlgijsbers global stmt causes breakpoints to be ignored (2004-01-12) http://python.org/sf/875404 closed by jlgijsbers pdb sometimes sets breakpoints in the wrong location (2004-03-31) http://python.org/sf/926369 closed by jlgijsbers help does not help with imported objects (2004-03-29) http://python.org/sf/925628 closed by jlgijsbers Misc/NEWS no valid 
reStructuredText (2004-08-24) http://python.org/sf/1014770 closed by jlgijsbers Misc/NEWS.help (2004-08-24) http://python.org/sf/1014775 closed by jlgijsbers "make pdf" failure w/ 2.4 docs (2004-07-30) http://python.org/sf/1000841 closed by jlgijsbers RFE Closed __________ array.array objects should support sequences (2004-07-17) http://python.org/sf/992967 closed by rhettinger From anthony at interlink.com.au Wed Sep 1 07:11:31 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Sep 1 08:16:01 2004 Subject: [Python-Dev] Freezing for alpha3 - trunk FROZEN from 2004-09-01 13:00 UTC Message-ID: <41355A03.6070405@interlink.com.au> I plan to start alpha3 in about 24 hours time. From about 12 hours from now, the trunk should be considered frozen - that's starting at about 1700 UTC on 2004-09-01. If your name isn't Anthony, Fred or Martin, please do NOT check in while we're doing the release. Really. Really, really, really. I'm cc'ing python-checkins as well this time, as a few folks missed this last time. The trunk will stay frozen until about 6 hours or so after the release is done - this makes it easier for me to do an emergency brown-paper-bag release in the case of a cockup Thanks, Anthony From pf_moore at yahoo.co.uk Wed Sep 1 08:44:58 2004 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Wed Sep 1 08:44:53 2004 Subject: [Python-Dev] Re: Right curry considered harmful References: <20040831164226.96D151E4008@bag.python.org> <4134DE7D.9020204@blueyonder.co.uk> Message-ID: Peter Harris writes: > I think we'll see if partial() is a useful enough feature to be > worth optimising once it actually makes it into a build and gets > used. We've now had a couple of comments regarding efficiency (Raymond Hettinger made this point as well). As a C implementation exists, and I can also imagine that this is the sort of thing that could get used in performance-sensitive areas, why not use the C implementation? Paul. -- Ooh, how Gothic. Barring the milk. 
From pedronis at bluewin.ch Wed Sep 1 14:00:41 2004
From: pedronis at bluewin.ch (Samuele Pedroni)
Date: Wed Sep 1 13:58:25 2004
Subject: [Python-Dev] @ character choice and Jython (was: Rejecting the J2 decorators proposal)
In-Reply-To:
References:
Message-ID: <4135B9E9.9020906@bluewin.ch>

Guido van Rossum wrote:

> So, what's next? In Python 2.4a3 (to be released this Thursday),
> everything remains as currently in CVS. For 2.4b1, I will consider a
> change of @ to some other single character, even though I think that @
> has the advantage of being the same character used by a similar
> feature in Java. It's been argued that it's not quite the same, since
> @ in Java is used for attributes that don't change semantics. But
> Python's dynamic nature makes that its syntactic elements never mean
> quite the same thing as similar constructs in other languages, and
> there is definitely significant overlap.

One issue with the '@' character choice is that in the context of Jython things can get rather confusing and I mean beyond the fact that the need of an "annotation" to get a static method will seem rather bizarre to Java people. It somewhat put the burden on Jython to try to do the obvious thing:

Consider this java annotation definition (concretely these get compiled to interfaces):

  public @interface Author {
      String value() default "";
  }

Now this potential Jython code:

  import Author

  class A: # not inheriting from a Java class
      # this is also the exact legal java syntax for this
      @Author("batman")
      def method(self):
          pass

in the past and at the moment interfaces are not callable, but here we would like to produce a nice error or warning, we are not inheriting from a java class so there is no way to attach the annotation.

But in this case:

  import java
  import Author

  class A(java.lang.Runnable):
      @Author("batman")
      def run(self): # this one
          pass

here we potentially could attach the annotation to the exposed method.
My point is basically that '@' will likely generate more user questions (which are time consuming) and expectations than a different character choice in Jython context.

Samuele

From skip at pobox.com Wed Sep 1 17:41:59 2004
From: skip at pobox.com (Skip Montanaro)
Date: Wed Sep 1 17:42:14 2004
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/peps pep-0318.txt, 1.30, 1.31
In-Reply-To:
References:
Message-ID: <16693.60871.999004.146879@montanaro.dyndns.org>

    anthony> (I'm not sure if the "Community Consensus" section should be
    anthony> trimmed down radically now - it's a lot of words for a rejected
    anthony> form, and the case for the form is still available on the web
    anthony> and in the mailing list archives... opinions, anyone?)

I'd just refer to the wiki and Robert Brewer's J2 proposal.

Skip

From fdrake at acm.org Wed Sep 1 17:45:26 2004
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed Sep 1 17:45:34 2004
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/peps pep-0318.txt, 1.30, 1.31
In-Reply-To:
References:
Message-ID: <200409011145.26772.fdrake@acm.org>

On Wednesday 01 September 2004 11:02 am, anthonybaxter@users.sourceforge.net wrote:
> (I'm not sure if the "Community Consensus" section should be trimmed
> down radically now - it's a lot of words for a rejected form, and the
> case for the form is still available on the web and in the mailing
> list archives... opinions, anyone?)

I'm for leaving the text in; wikis are fragile, and this is a valuable bit of Python history. A reader can ignore it if that's not interesting to them.

-Fred

--
Fred L. Drake, Jr.
From gvanrossum at gmail.com Wed Sep 1 17:52:58 2004
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Wed Sep 1 17:53:05 2004
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/peps pep-0318.txt, 1.30, 1.31
In-Reply-To: <200409011145.26772.fdrake@acm.org>
References: <200409011145.26772.fdrake@acm.org>
Message-ID:

> > (I'm not sure if the "Community Consensus" section should be trimmed
> > down radically now - it's a lot of words for a rejected form, and the
> > case for the form is still available on the web and in the mailing
> > list archives... opinions, anyone?)
>
> I'm for leaving the text in; wikis are fragile, and this is a valuable bit of
> Python history. A reader can ignore it if that's not interesting to them.

+1

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Ask me about gmail.

From gvanrossum at gmail.com Wed Sep 1 17:55:52 2004
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Wed Sep 1 17:55:56 2004
Subject: [Python-Dev] @ character choice and Jython (was: Rejecting the J2 decorators proposal)
In-Reply-To: <4135B9E9.9020906@bluewin.ch>
References: <4135B9E9.9020906@bluewin.ch>
Message-ID:

> One issue with the '@' character choice is that in the context of
> Jython things can get rather confusing and I mean beyond the fact
> that the need of an "annotation" to get a static method will seem
> rather bizarre to Java people. It somewhat put the burden on Jython
> to try to do the obvious thing:

[snipped example showing that Jython can do the right thing, at least for Java-derived classes, with Java annotation interfaces]

> My point is basically that '@' will likely generate more user
> questions (which are time consuming) and expectations than a
> different character choice in Jython context.

Have you gotten cynical? This should be counted as an argument *for* the @ character.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Ask me about gmail.
From anthony at interlink.com.au Wed Sep 1 17:37:48 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Sep 1 18:33:36 2004
Subject: [Python-Dev] (my) revisions to PEP318 finally done.
In-Reply-To: <4133382B.7070308@interlink.com.au>
References: <41332A02.1040902@interlink.com.au> <4133382B.7070308@interlink.com.au>
Message-ID: <4135ECCC.4080704@interlink.com.au>

I've now updated the PEP to the current state of play, which is pretty much done. If there's no significant feedback, I'll post this to c.l.py tomorrow.

-------------- next part --------------
PEP: 318
Title: Decorators for Functions and Methods
Version: $Revision: 1.31 $
Last-Modified: $Date: 2004/09/01 15:02:22 $
Author: Kevin D. Smith, Jim Jewett, Skip Montanaro, Anthony Baxter
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2003
Python-Version: 2.4
Post-History: 09-Jun-2003, 10-Jun-2003, 27-Feb-2004, 23-Mar-2004, 30-Aug-2004, 2-Sep-2004

WarningWarningWarning
=====================

This document is meant to describe the decorator syntax and the process that resulted in the decisions that were made. It does not attempt to cover the huge number of potential alternative syntaxes, nor is it an attempt to exhaustively list all the positives and negatives of each form.

Abstract
========

The current method for transforming functions and methods (for instance, declaring them as a class or static method) is awkward and can lead to code that is difficult to understand. Ideally, these transformations should be made at the same point in the code where the declaration itself is made. This PEP introduces new syntax for transformations of a function or method declaration.

Motivation
==========

The current method of applying a transformation to a function or method places the actual transformation after the function body. For large functions this separates a key component of the function's behavior from the definition of the rest of the function's external interface.
For example::

    def foo(self):
        perform method operation
    foo = classmethod(foo)

This becomes less readable with longer methods. It also seems less than pythonic to name the function three times for what is conceptually a single declaration. A solution to this problem is to move the transformation of the method closer to the method's own declaration. While the new syntax is not yet final, the intent is to replace::

    def foo(cls):
        pass
    foo = synchronized(lock)(foo)
    foo = classmethod(foo)

with an alternative that places the decoration in the function's declaration::

    @classmethod
    @synchronized(lock)
    def foo(cls):
        pass

Modifying classes in this fashion is also possible, though the benefits are not as immediately apparent. Almost certainly, anything which could be done with class decorators could be done using metaclasses, but using metaclasses is sufficiently obscure that there is some attraction to having an easier way to make simple modifications to classes. For Python 2.4, only function/method decorators are being added.

Why Is This So Hard?
--------------------

Two decorators (``classmethod()`` and ``staticmethod()``) have been available in Python since version 2.2. It's been assumed since approximately that time that some syntactic support for them would eventually be added to the language. Given this assumption, one might wonder why it's been so difficult to arrive at a consensus. Discussions have raged off-and-on at times in both comp.lang.python and the python-dev mailing list about how best to implement function decorators. There is no one clear reason why this should be so, but a few problems seem to be most divisive.

* Disagreement about where the "declaration of intent" belongs. Almost everyone agrees that decorating/transforming a function at the end of its definition is suboptimal. Beyond that there seems to be no clear consensus where to place this information.

* Syntactic constraints.
Python is a syntactically simple language with fairly strong constraints on what can and can't be done without "messing things up" (both visually and with regards to the language parser). There's no obvious way to structure this information so that people new to the concept will think, "Oh yeah, I know what you're doing." The best that seems possible is to keep new users from creating a wildly incorrect mental model of what the syntax means. * Overall unfamiliarity with the concept. For people who have a passing acquaintance with algebra (or even basic arithmetic) or have used at least one other programming language, much of Python is intuitive. Very few people will have had any experience with the decorator concept before encountering it in Python. There's just no strong preexisting meme that captures the concept. * Syntax discussions in general appear to cause more contention than almost anything else. Readers are pointed to the ternary operator discussions that were associated with PEP 308 for another example of this. Background ========== There is general agreement that syntactic support is desirable to the current state of affairs. Guido mentioned `syntactic support for decorators`_ in his DevDay keynote presentation at the `10th Python Conference`_, though `he later said`_ it was only one of several extensions he proposed there "semi-jokingly". `Michael Hudson raised the topic`_ on ``python-dev`` shortly after the conference, attributing the initial bracketed syntax to an earlier proposal on ``comp.lang.python`` by `Gareth McCaughan`_. .. _syntactic support for decorators: http://www.python.org/doc/essays/ppt/python10/py10keynote.pdf .. _10th python conference: http://www.python.org/workshops/2002-02/ .. _michael hudson raised the topic: http://mail.python.org/pipermail/python-dev/2002-February/020005.html .. _he later said: http://mail.python.org/pipermail/python-dev/2002-February/020017.html .. 
_gareth mccaughan: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=slrna40k88.2h9o.Gareth.McCaughan%40g.local Class decorations seem like an obvious next step because class definition and function definition are syntactically similar, however Guido remains unconvinced, and class decorators will almost certainly not be in Python 2.4. The discussion continued on and off on python-dev from February 2002 through July 2004. Hundreds and hundreds of posts were made, with people proposing many possible syntax variations. Guido took a list of proposals to `EuroPython 2004`_, where a discussion took place. Subsequent to this, he decided that we'd have the `Java-style`_ @decorator syntax, and this appeared for the first time in 2.4a2. Barry Warsaw named this the 'pie-decorator' syntax, in honor of the Pie-thon Parrot shootout which was occured around the same time as the decorator syntax, and because the @ looks a little like a pie. Guido `outlined his case`_ on Python-dev, including `this piece`_ on some of the (many) rejected forms. .. _EuroPython 2004: http://www.python.org/doc/essays/ppt/euro2004/euro2004.pdf .. _outlined his case: http://mail.python.org/pipermail/python-dev/2004-August/author.html .. _this piece: http://mail.python.org/pipermail/python-dev/2004-August/046672.html .. _Java-style: http://java.sun.com/j2se/1.5.0/docs/guide/language/annotations.html On the name 'Decorator' ======================= There's been a number of complaints about the choice of the name 'decorator' for this feature. The major one is that the name is not consistent with its use in the `GoF book`_. The name 'decorator' probably owes more to its use in the compiler area -- a syntax tree is walked and annotated. It's quite possible that a better name may turn up. .. 
_GoF book: http://patterndigest.com/patterns/Decorator.html Design Goals ============ The new syntax should * work for arbitrary wrappers, including user-defined callables and the existing builtins ``classmethod()`` and ``staticmethod()``. This requirement also means that a decorator syntax must support passing arguments to the wrapper constructor * work with multiple wrappers per definition * make it obvious what is happening; at the very least it should be obvious that new users can safely ignore it when writing their own code * be a syntax "that ... [is] easy to remember once explained" * not make future extensions more difficult * be easy to type; programs that use it are expected to use it very frequently * not make it more difficult to scan through code quickly. It should still be easy to search for all definitions, a particular definition, or the arguments that a function accepts * not needlessly complicate secondary support tools such as language-sensitive editors and other "`toy parser tools out there`_" * allow future compilers to optimize for decorators. With the hope of a JIT compiler for Python coming into existence at some point this tends to require the syntax for decorators to come before the function definition * move from the end of the function, where it's currently hidden, to the front where it is more `in your face`_ Andrew Kuchling has links to a bunch of the discussions about motivations and use cases `in his blog`_. Particularly notable is `Jim Huginin's list of use cases`_. .. _toy parser tools out there: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=mailman.1010809396.32158.python-list%40python.org .. _in your face: http://mail.python.org/pipermail/python-dev/2004-August/047112.html .. _in his blog: http://www.amk.ca/diary/archives/cat_python.html#003255 .. 
_Jim Huginin's list of use cases: http://mail.python.org/pipermail/python-dev/2004-April/044132.html


Current Syntax
==============

The current syntax for function decorators as implemented in Python
2.4a2 is::

    @dec2
    @dec1
    def func(arg1, arg2, ...):
        pass

This is equivalent to::

    def func(arg1, arg2, ...):
        pass
    func = dec2(dec1(func))

without the intermediate assignment to the variable ``func``. The
decorators are near the function declaration. The @ sign makes it
clear that something new is going on here.

The decorator statement is limited in what it can accept -- arbitrary
expressions will not work. Guido preferred this because of a
`gut feeling`_.

.. _gut feeling: http://mail.python.org/pipermail/python-dev/2004-August/046711.html


Syntax Alternatives
===================

There have been `a large number`_ of different syntaxes proposed --
rather than attempting to work through these individual syntaxes, it's
worthwhile to break the syntax discussion down into a number of areas.
Attempting to discuss `each possible syntax`_ individually would be an
act of madness, and produce a completely unwieldy PEP.

.. _a large number: http://www.python.org/moin/PythonDecorators
.. _each possible syntax: http://ucsu.colorado.edu/~bethard/py/decorators-output.py


Decorator Location
------------------

The first syntax point is the location of the decorators. For the
following examples, we use the @syntax used in 2.4a2.

Decorators before the def statement are the first alternative, and the
syntax used in 2.4a2::

    @classmethod
    def foo(arg1,arg2):
        pass

    @accepts(int,int)
    @returns(float)
    def bar(low,high):
        pass

There have been a number of objections raised to this location -- the
primary one is that it's the first real Python case where a line of
code has an effect on a following line. The syntax available in
2.4a3 requires one decorator per line (in a2, multiple decorators
could be specified on the same line).
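
The equivalence given under Current Syntax, where stacked decorators behave like ``func = dec2(dec1(func))``, can be checked with a small sketch. The decorators ``dec1`` and ``dec2`` and the ``applied`` list below are hypothetical names used only for illustration, and the code is written to run on a current Python rather than 2.4a2:

```python
# Hypothetical decorators that record the order in which they are applied.
applied = []


def dec1(f):
    applied.append("dec1")
    return f


def dec2(f):
    applied.append("dec2")
    return f


# Decorator form: the decorator nearest the def is applied first.
@dec2
@dec1
def func(arg1, arg2):
    return arg1 + arg2


# Manual form described in the text.
def func_manual(arg1, arg2):
    return arg1 + arg2


func_manual = dec2(dec1(func_manual))

# Both forms apply dec1 first and dec2 second.
print(applied)
```

Under these assumptions both forms record the same application order, with the decorator closest to the ``def`` applied first.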
People also complained that the syntax got unwieldy quickly when
multiple decorators were used. The point was made, though, that the
chances of a large number of decorators being used on a single function
were small and thus this was not a large worry.

Some of the advantages of this form are that the decorators live
outside the method body -- they are obviously executed at the time
the function is defined.

Another advantage is that a prefix to the function definition tells
the reader about a change to the semantics of the code before the code
itself, so the code can be interpreted properly on first reading
without having to go back and revise initial perceptions.

Guido decided `he preferred`_ having the decorators on the line before
the 'def', because it was felt that a long argument list would mean
that the decorators would be 'hidden'.

.. _he preferred: http://mail.python.org/pipermail/python-dev/2004-March/043756.html

The second form is the decorators between the def and the function
name, or the function name and the argument list::

    def @classmethod foo(arg1,arg2):
        pass

    def @accepts(int,int),@returns(float) bar(low,high):
        pass

    def foo @classmethod (arg1,arg2):
        pass

    def bar @accepts(int,int),@returns(float) (low,high):
        pass

There are a couple of objections to this form. The first is that it
breaks easy 'greppability' of the source -- you can no longer search
for 'def foo(' and find the definition of the function. The second,
more serious, objection is that in the case of multiple decorators,
the syntax would be extremely unwieldy.
The next form, which has had a number of strong proponents, is to have the decorators between the argument list and the trailing ``:`` in the 'def' line:: def foo(arg1,arg2) @classmethod: pass def bar(low,high) @accepts(int,int),@returns(float): pass Guido `summarized the arguments`_ against this form (many of which also apply to the previous form) as: - it hides crucial information (e.g. that it is a static method) after the signature, where it is easily missed - it's easy to miss the transition between a long argument list and a long decorator list - it's cumbersome to cut and paste a decorator list for reuse, because it starts and ends in the middle of a line .. _summarized the arguments: http://mail.python.org/pipermail/python-dev/2004-August/047112.html The next form is that the decorator syntax go inside the method body at the start, in the same place that docstrings currently live: def foo(arg1,arg2): @classmethod pass def bar(low,high): @accepts(int,int) @returns(float) pass The primary objection to this form is that it requires "peeking inside" the method body to determine the decorators. In addition, even though the code is inside the method body, it is not executed when the method is run. Guido felt that docstrings were not a good counter-example, and that it was quite possible that a 'docstring' decorator could help move the docstring to outside the function body. The final form is a new block that encloses the method's code. For this example, we'll use a 'decorate' keyword, as it makes no sense with the @syntax. :: decorate: classmethod def foo(arg1,arg2): pass decorate: accepts(int,int) returns(float) def bar(low,high): pass This form would result in inconsistent indentation for decorated and undecorated methods. In addition, a decorated method's body would start three indent levels in. 
Syntax forms
------------

* ``@decorator``::

    @classmethod
    def foo(arg1,arg2):
        pass

    @accepts(int,int)
    @returns(float)
    def bar(low,high):
        pass

  The major objections against this syntax are that the @ symbol is
  not currently used in Python (and is used in both IPython and Leo),
  and that the @ symbol is not meaningful. Another objection is that
  this "wastes" a currently unused character (from a limited set) on
  something that is not perceived as a major use.

* ``|decorator``::

    |classmethod
    def foo(arg1,arg2):
        pass

    |accepts(int,int)
    |returns(float)
    def bar(low,high):
        pass

  This is a variant on the @decorator syntax -- it has the advantage
  that it does not break IPython and Leo. Its major disadvantage
  compared to the @syntax is that the | symbol looks like both a capital
  I and a lowercase l.

* list syntax::

    [classmethod]
    def foo(arg1,arg2):
        pass

    [accepts(int,int), returns(float)]
    def bar(low,high):
        pass

  The major objection to the list syntax is that it's currently
  meaningful (when used in the form before the method). It's also
  lacking any indication that the expression is a decorator.

* list syntax using other brackets (``<...>``, ``[[...]]``, ...)::

    <classmethod>
    def foo(arg1,arg2):
        pass

    <accepts(int,int), returns(float)>
    def bar(low,high):
        pass

  None of these alternatives gained much traction. The alternatives
  which involve square brackets only serve to make it obvious that the
  decorator construct is not a list. They do nothing to make parsing any
  easier. The '<...>' alternative presents parsing problems because '<'
  and '>' already parse as un-paired. They present a further parsing
  ambiguity because a right angle bracket might be a greater than symbol
  instead of a closer for the decorators.

* ``decorate()``

  The ``decorate()`` proposal was that no new syntax be implemented
  -- instead a magic function that used introspection to manipulate
  the following function. Both Jp Calderone and Philip Eby produced
  implementations of functions that did this.
Guido was pretty firmly against this -- with no new syntax, the magicness of a function like this is extremely high: Using functions with "action-at-a-distance" through sys.settraceback may be okay for an obscure feature that can't be had any other way yet doesn't merit changes to the language, but that's not the situation for decorators. The widely held view here is that decorators need to be added as a syntactic feature to avoid the problems with the postfix notation used in 2.2 and 2.3. Decorators are slated to be an important new language feature and their design needs to be forward-looking, not constrained by what can be implemented in 2.3. * _`new keyword (and block)` This idea was the consensus alternate from comp.lang.python (more on this in `Community Consensus`_ below.) Robert Brewer wrote up a detailed `J2 proposal`_ document outlining the arguments in favor of this form. The initial issues with this form are: - It requires a new keyword, and therefore a ``from __future__ import decorators`` statement. - The choice of keyword is contentious. However ``using`` emerged as the consensus choice, and is used in the proposal and implementation. - The keyword/block form produces something that looks like a normal code block, but isn't. Attempts to use statements in this block will cause a syntax error, which may confuse users. A few days later, Guido `rejected the proposal`_ on two main grounds, firstly: ... the syntactic form of an indented block strongly suggests that its contents should be a sequence of statements, but in fact it is not -- only expressions are allowed, and there is an implicit "collecting" of these expressions going on until they can be applied to the subsequent function definition. ... and secondly: ... the keyword starting the line that heads a block draws a lot of attention to it. This is true for "if", "while", "for", "try", "def" and "class". 
But the "using" keyword (or any other keyword in its place)
doesn't deserve that attention; the emphasis should be on the
decorator or decorators inside the suite, since those are the
important modifiers to the function definition that follows. ...

Readers are invited to read `the full response`_.

.. _J2 proposal: http://www.aminus.org/rbre/python/pydec.html
.. _rejected the proposal: http://mail.python.org/pipermail/python-dev/2004-September/048518.html
.. _the full response: http://mail.python.org/pipermail/python-dev/2004-September/048518.html

* Other forms

  There are plenty of other variants and proposals on `the wiki page`_.

.. _the wiki page: http://www.python.org/moin/PythonDecorators


Why @?
------

There is some history in Java using @ initially as a marker in
`Javadoc comments`_ and later in Java 1.5 for `annotations`_, which
are similar to Python decorators. The fact that @ was previously
unused as a token in Python also means it's clear there is no
possibility of such code being parsed by an earlier version of Python,
leading to possibly subtle semantic bugs. It also means that ambiguity
of what is a decorator and what isn't is removed. That said, @ is
still a fairly arbitrary choice. Some have suggested using | instead.

For syntax options which use a list-like syntax (no matter where it
appears) to specify the decorators a few alternatives were proposed:
``[|...|]``, ``*[...]*``, and ``<...>``.

.. _Javadoc comments: http://java.sun.com/j2se/javadoc/writingdoccomments/
.. _annotations: http://java.sun.com/j2se/1.5.0/docs/guide/language/annotations.html


Current Implementation, History
===============================

Guido asked for a volunteer to implement his preferred syntax, and Mark
Russell stepped up and posted a `patch`_ to SF. This new syntax was
available in 2.4a2.
:: @dec2 @dec1 def func(arg1, arg2, ...): pass This is equivalent to:: def func(arg1, arg2, ...): pass func = dec2(dec1(func)) though without the intermediate creation of a variable named ``func``. The version implemented in 2.4a2 allowed multiple ``@decorator`` clauses on a single line. In 2.4a3, this was tightened up to only allowing one decorator per line. A `previous patch`_ from Michael Hudson which implements the list-after-def syntax is also still kicking around. .. _patch: http://www.python.org/sf/979728 .. _previous patch: http://starship.python.net/crew/mwh/hacks/meth-syntax-sugar-3.diff After 2.4a2 was released, in response to community reaction, Guido stated that he'd re-examine a community proposal, if the community could come up with a community consensus, a decent proposal, and an implementation. After an amazing number of posts, collecting a vast number of alternatives in the `Python wiki`_, a community consensus emerged (below). Guido `subsequently rejected`_ this alternate form, but added: In Python 2.4a3 (to be released this Thursday), everything remains as currently in CVS. For 2.4b1, I will consider a change of @ to some other single character, even though I think that @ has the advantage of being the same character used by a similar feature in Java. It's been argued that it's not quite the same, since @ in Java is used for attributes that don't change semantics. But Python's dynamic nature makes that its syntactic elements never mean quite the same thing as similar constructs in other languages, and there is definitely significant overlap. Regarding the impact on 3rd party tools: IPython's author doesn't think there's going to be much impact; Leo's author has said that Leo will survive (although it will cause him and his users some transitional pain). I actually expect that picking a character that's already used elsewhere in Python's syntax might be harder for external tools to adapt to, since parsing will have to be more subtle in that case. 
But I'm frankly undecided, so there's some wiggle room here. I don't want to consider further syntactic alternatives at this point: the buck has to stop at some point, everyone has had their say, and the show must go on. .. _Python wiki: http://www.python.org/moin/PythonDecorators .. _subsequently rejected: http://mail.python.org/pipermail/python-dev/2004-September/048518.html Community Consensus ------------------- [editor's note: should this section be removed now?] The consensus that emerged on comp.lang.python was the proposed J2 syntax (the "J2" was how it was referenced on the PythonDecorators wiki page): the new keyword ``using`` prefixing a block of decorators before the ``def`` statement. For example:: using: classmethod synchronized(lock) def func(cls): pass The main arguments for this syntax fall under the "readability counts" doctrine. In brief, they are: * A suite is better than multiple @lines. The ``using`` keyword and block transforms the single-block ``def`` statement into a multiple-block compound construct, akin to try/finally and others. * A keyword is better than punctuation for a new token. A keyword matches the existing use of tokens. No new token category is necessary. A keyword distinguishes Python decorators from Java annotations and .Net attributes, which are significantly different beasts. Robert Brewer wrote a `detailed proposal`_ for this form, and Michael Sparks produced `a patch`_. .. _detailed proposal: http://www.aminus.org/rbre/python/pydec.html .. _a patch: http://www.python.org/sf/1013835 As noted previously, Guido rejected this form, outlining his problems with it in `a message`_ to python-dev and comp.lang.python. .. _a message: http://mail.python.org/pipermail/python-dev/2004-September/048518.html Examples ======== Much of the discussion on ``comp.lang.python`` and the ``python-dev`` mailing list focuses on the use of decorators as a cleaner way to use the ``staticmethod()`` and ``classmethod()`` builtins. 
This capability is much more powerful than that. This section
presents some examples of use.

1. Define a function to be executed at exit. Note that the function
   isn't actually "wrapped" in the usual sense. ::

       def onexit(f):
           import atexit
           atexit.register(f)
           return f

       @onexit
       def func():
           ...

   Note that this example is probably not suitable for real usage, but
   is for example purposes only.

2. Define a class with a singleton instance. Note that once the class
   disappears enterprising programmers would have to be more creative
   to create more instances. (From Shane Hathaway on ``python-dev``.)
   ::

       def singleton(cls):
           instances = {}
           def getinstance():
               if cls not in instances:
                   instances[cls] = cls()
               return instances[cls]
           return getinstance

       @singleton
       class MyClass:
           ...

3. Add attributes to a function. (Based on an example posted by
   Anders Munch on ``python-dev``.) ::

       def attrs(**kwds):
           def decorate(f):
               for k in kwds:
                   setattr(f, k, kwds[k])
               return f
           return decorate

       @attrs(versionadded="2.2",
              author="Guido van Rossum")
       def mymethod(f):
           ...

4. Enforce function argument and return types. Note that this
   copies the func_name attribute from the old to the new function.
   func_name was made writable in Python 2.4a3::

       def accepts(*types):
           def check_accepts(f):
               assert len(types) == f.func_code.co_argcount
               def new_f(*args, **kwds):
                   for (a, t) in zip(args, types):
                       assert isinstance(a, t), \
                              "arg %r does not match %s" % (a,t)
                   return f(*args, **kwds)
               new_f.func_name = f.func_name
               return new_f
           return check_accepts

       def returns(rtype):
           def check_returns(f):
               def new_f(*args, **kwds):
                   result = f(*args, **kwds)
                   assert isinstance(result, rtype), \
                          "return value %r does not match %s" % (result,rtype)
                   return result
               new_f.func_name = f.func_name
               return new_f
           return check_returns

       @accepts(int, (int,float))
       @returns((int,float))
       def func(arg1, arg2):
           return arg1 * arg2

5. Declare that a class implements a particular (set of) interface(s).
This is from a posting by Bob Ippolito on ``python-dev`` based on experience with `PyProtocols`_. :: def provides(*interfaces): """ An actual, working, implementation of provides for the current implementation of PyProtocols. Not particularly important for the PEP text. """ def provides(typ): declareImplementation(typ, instancesProvide=interfaces) return typ return provides class IBar(Interface): """Declare something about IBar here""" @provides(IBar) class Foo(object): """Implement something here...""" .. _PyProtocols: http://peak.telecommunity.com/PyProtocols.html Of course, all these examples are possible today, though without syntactic support. Open Issues =========== 1. It's not yet certain that class decorators will be incorporated into the language at a future point. Guido expressed skepticism about the concept, but various people have made some `strong arguments`_ (search for ``PEP 318 -- posting draft``) on their behalf in ``python-dev``. It's exceedingly unlikely that class decorators will be in Python 2.4. .. _strong arguments: http://mail.python.org/pipermail/python-dev/2004-March/thread.html 2. The choice of the ``@`` character will be re-examined before Python 2.4b1. Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From FBatista at uniFON.com.ar Wed Sep 1 19:11:41 2004 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Wed Sep 1 19:16:07 2004 Subject: [Python-Dev] Decimal for Py2.3 (was: about presicion) Message-ID: [Alex Martelli] #- I think that's an excellent policy -- Python 2.3 will no doubt remain #- widely used for a long time to come. I think it would be nice if #- Decimal was packaged up with its own docs for easy download and install #- into an existing 2.3 installation, then... make life as easy as possible #- for 2.3 users who need to do some decimal arithmetic! 
Don't know what's common usage, so I ask.

Should I prepare a "decimal package" with the module and docs and whatever,
in form of a tgz, rpm, .exe, etc, to let people "install" decimal in their
Py2.3?

Or it's better to somewhere tell the user that if he/she wants to use
Decimal in Py2.3 to follow these simple n steps (and the detail of the steps,
of course ;)?

Considering that it's only a file, and the docs could be accessed through
Py2.4 documentation, I'll go for the latter.

.    Facundo

From vishalvkapoor at gmail.com Wed Sep 1 20:51:05 2004
From: vishalvkapoor at gmail.com (Vishal Kapoor)
Date: Wed Sep 1 20:51:09 2004
Subject: [Python-Dev] Installing python-dev
Message-ID: <189a54cc0409011151216ab01@mail.gmail.com>

Hi,
I am trying to install Zope and it requires python2.3-dev.
I downloaded Python and installed it, how do i install python-dev ??

Thank you

Vishal Kapoor

From allison at sumeru.stanford.EDU Wed Sep 1 20:59:20 2004
From: allison at sumeru.stanford.EDU (Dennis Allison)
Date: Wed Sep 1 20:59:27 2004
Subject: [Python-Dev] Installing python-dev
In-Reply-To: <189a54cc0409011151216ab01@mail.gmail.com>
Message-ID:

If you are on an rpm based system (RH, etc) you need to download the
development RPM as well as the binary as that's where all the includes
are that are needed for C extensions.

But, the better installation is to download the source and compile locally
using the usual sequence

    ./configure
    make
    su
    make install

(but read the documents first). In that case the necessary files will be
automatically installed.

On Wed, 1 Sep 2004, Vishal Kapoor wrote:

> Hi,
> I am trying to install Zope and it requires python2.3-dev.
> I downloaded Python and installed it, how do i install python-dev ??
> > Thank you > > Vishal Kapoor > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/allison%40sumeru.stanford.edu > From aahz at pythoncraft.com Wed Sep 1 21:49:57 2004 From: aahz at pythoncraft.com (Aahz) Date: Wed Sep 1 21:50:02 2004 Subject: [Python-Dev] Installing python-dev In-Reply-To: <189a54cc0409011151216ab01@mail.gmail.com> References: <189a54cc0409011151216ab01@mail.gmail.com> Message-ID: <20040901194957.GB25565@panix.com> On Wed, Sep 01, 2004, Vishal Kapoor wrote: > > I am trying to install Zope and it requires python2.3-dev. > I downloaded Python and installed it, how do i install python-dev ?? Sorry, this post is off-topic for python-dev. Please use comp.lang.python or a Zope list for help. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "To me vi is Zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy@lion.austin.ibm.com From martin at v.loewis.de Wed Sep 1 22:32:57 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 1 22:33:03 2004 Subject: [Python-Dev] Installing python-dev In-Reply-To: <20040901194957.GB25565@panix.com> References: <189a54cc0409011151216ab01@mail.gmail.com> <20040901194957.GB25565@panix.com> Message-ID: <413631F9.4070507@v.loewis.de> Aahz wrote: > On Wed, Sep 01, 2004, Vishal Kapoor wrote: > >>I am trying to install Zope and it requires python2.3-dev. >>I downloaded Python and installed it, how do i install python-dev ?? > > > Sorry, this post is off-topic for python-dev. 
Although it is far from obvious that a list called python-dev is
not about python-dev :-)

Martin

From gvanrossum at gmail.com Thu Sep 2 02:42:56 2004
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Thu Sep 2 02:42:59 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1
In-Reply-To:
References:
Message-ID:

On Wed, 01 Sep 2004 15:31:26 -0700, mhammond@users.sourceforge.net wrote:
> Update of /cvsroot/python/python/dist/src/Modules
> In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22421/Modules
>
> Modified Files:
> Tag: release23-maint
> threadmodule.c
> Log Message:
> Backport [ 1010677 ] thread Module Breaks PyGILState_Ensure()
> to the 2.3 maint branch.

As long as we're backporting C APIs to 2.3, can I request that the new
datetime API be backported to 2.3? Anthony Tuininga (the cx_Oracle
author) would be interested in using this and might be willing to help
out with the work. (And yes, I'm encouraging this because I could use
this myself.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Ask me about gmail.

From tim.peters at gmail.com Thu Sep 2 03:43:34 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Thu Sep 2 03:43:38 2004
Subject: [Python-Dev] Coernic Desktop Search versus shutil.rmtree
Message-ID: <1f7befae040901184316a8ebf6@mail.gmail.com>

It took a while to make this connection! Last night I downloaded this
new free (beer) app for Windows:

    http://www.copernic.com/en/products/desktop-search/index.html

Just a note to say that it's fantastic. It builds index files for all
the documents on your drive, including PDFs, HTMLs, and Outlook email
stores. Then you can do seriously fast (sub-second) Boolean searches.
It's indexed about 10GB of data on my drive with about a million
keywords, with 0 errors and 0 glitches, and the search quality is very
good.
Anyway, today I saw a really weird failure in the Zope X3 test suite,
shutil.rmtree() complaining that it couldn't remove a directory.
Studying the test didn't turn up any plausible cause for this.

Tonight I was running the Python CVS test suite, and it failed once in
the same mysterious way. Then it failed again that way, but in
another test. I eventually reduced it to this:

"""
import os
import shutil

LOCALEDIR = os.path.join('xx', 'LC_MESSAGES')
MOFILE = os.path.join(LOCALEDIR, 'gettext.mo')
UMOFILE = os.path.join(LOCALEDIR, 'ugettext.mo')
MMOFILE = os.path.join(LOCALEDIR, 'metadata.mo')

class Drive:
    def setUp(self):
        if os.path.isdir(LOCALEDIR):
            shutil.rmtree(os.path.split(LOCALEDIR)[0])
        os.makedirs(LOCALEDIR)
        fp = open(MOFILE, 'wb'); fp.write('a'); fp.close()
        fp = open(UMOFILE, 'wb'); fp.write('b'); fp.close()
        fp = open(MMOFILE, 'wb'); fp.write('c'); fp.close()
        shutil.rmtree(os.path.split(LOCALEDIR)[0])

d = Drive()
while True:
    d.setUp()
    print '.',
"""

That failed every time, after printing from 0 to 100 dots, while
trying to rmdir xx/LC_MESSAGES.

The cause: Windows has low-level hooks for apps that want to monitor
changes to the filesystem. For example, virus scanners use those
heavily. Copernic also uses them, to reindex changed files in the
background. So it can keep a file open beyond the time Python thinks
it deleted it, and then trying to rmdir its parent directory fails
(because the directory isn't really empty yet).

Stopping the Desktop Search process makes these problems go away. It
also appears to cure a range of incomprehensible complaints from large
CVS updates that started showing up last night.

Ah, there's an option to keep the search app running but to turn off
the filesystem hooking -- that cures it too.
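
A minimal sketch of one way a script could tolerate this kind of race, assuming the failure is transient and clears once the other process closes its handles. This is not something proposed in the thread; the helper name ``rmtree_with_retry`` and its ``attempts`` and ``delay`` parameters are hypothetical, and the code targets a current Python:

```python
import os
import shutil
import tempfile
import time


def rmtree_with_retry(path, attempts=5, delay=0.1):
    """Remove a directory tree, retrying when the OS reports an error.

    Hypothetical helper: the name, attempt count and delay are
    illustrative assumptions, not part of shutil.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError:
            # The tree may already be gone by the time the error surfaces.
            if not os.path.exists(path):
                return
            # Another process may still hold a file open: wait and retry,
            # re-raising the error on the final attempt.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


# Usage: build a small tree like the one in the failing test, then remove it.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "xx", "LC_MESSAGES"))
rmtree_with_retry(root)
print(os.path.exists(root))
```

Retrying only papers over the timing window; it does not remove the underlying cause, which is another process holding files open after they have been unlinked.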
Anyway, this is worth sharing because this has got to be the next PC
Killer App genre: finding info on a 120GB disk has become impossible,
and I switched most of my email to a gmail account because I can't
even find "important" email from last week using Outlook anymore. If
you don't run an app like Copernic yet, you will soon <0.5 wink>.

From anthony at interlink.com.au Thu Sep 2 07:19:10 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Sep 2 07:19:35 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1
In-Reply-To:
References:
Message-ID: <4136AD4E.30503@interlink.com.au>

Guido van Rossum wrote:
> As long as we're backporting C APIs to 2.3, can I request that the new
> datetime API be backported to 2.3? Anthony Tuininga (the cx_Oracle
> author) would be interested in using this and might be willing to help
> out with the work. (And yes, I'm encouraging this because I could use
> this myself.)

Erm - this particular fix was a bug fix. I'm deeply uncomfortable about
adding the C version of datetime to 2.3 at this very late stage of 2.3's
life cycle.

--
Anthony Baxter
It's never too late to have a happy childhood.

From aleaxit at yahoo.com Thu Sep 2 09:32:51 2004
From: aleaxit at yahoo.com (Alex Martelli)
Date: Thu Sep 2 09:32:50 2004
Subject: [Python-Dev] Re: Decimal for Py2.3 (was: about presicion)
In-Reply-To:
References:
Message-ID: <4BB8FEFE-FCB2-11D8-A6A1-000A95EFAE9E@yahoo.com>

On 2004 Sep 01, at 19:11, Batista, Facundo wrote:

> [Alex Martelli]
>
> #- I think that's an excellent policy -- Python 2.3 will no doubt remain
> #- widely used for a long time to come. I think it would be nice if
> #- Decimal was packaged up with its own docs for easy download and install
> #- into an existing 2.3 installation, then... make life as easy as possible
> #- for 2.3 users who need to do some decimal arithmetic!
>
> Don't know what's common usage, so I ask.
>
> Should I prepare a "decimal package" with the module and docs and whatever,
> in form of a tgz, rpm, .exe, etc, to let people "install" decimal in their
> Py2.3?
>
> Or it's better to somewhere tell the user that if he/she wants to use
> Decimal in Py2.3 to follow this simple n steps (and the detail of the steps,
> of course ;)?
>
> Considering that it's only a file, and the docs could be accessed through
> Py2.4 documentation, I'll go for the latter.

Based on little direct experience, I think that the former course would
more than double the usage of Decimal among people still sticking with
Python 2.3 -- having to get files out of a different release, including
a piece of the docs, feels way scarier and more hassle to most people
than just downloading the appropriate package and double clicking (or
unpacking and running python setup.py install, of course). I can't
blame them, most particularly when we're talking about people whose
familiarity with the stuff that installers do to their computers is hazy
and imprecise -- and I see no reason why such people shouldn't be eager
Decimal users as well as people with more system administration nous.

Alex

From revol at free.fr Thu Sep 2 12:24:07 2004
From: revol at free.fr (François Revol)
Date: Thu Sep 2 12:31:02 2004
Subject: [Python-Dev] problem with pymalloc on the BeOS port.
In-Reply-To: <1f7befae04082420104f9158df@mail.gmail.com>
Message-ID: <1540805831-BeMail@taz>

> [François Revol]
> > Now, I don't see why malloc itself would give such a result, it's
> > pyMalloc which places those marks, so the thing malloc does wouldn't
> > place them 4 bytes of each other for no reason, or report 0 bytes
> > where 4 are allocated.
>
> I think you're fooling yourself if you believe 4 *were* allocated.
> The memory dump shows nothing but gibberish, with 4 blocks of > fbfbfbfb > not a one of which makes sense in context (the numbers before and > after them make no sense as "# of bytes allocated" or as "serial > number" values, so these forbidden-byte blocks don't make sense as > either end of an active pymalloc block). > > You should at least try to get a C traceback at this point, on the > chance that the routine passing the pointer is a clue. We don't even > know here yet whether the complaint came from a free() or realloc() > call. I finally found out what was making python throw up when using pymalloc, (and possibly why I'm getting MemoryErrors without it). It's caused by the BeOS exec() which copies the path to argv[0] without telling anyone. I noticed it was overriding argv[0] in the execed process, but didn't think it was doing that before actually doing the syscall. So this results in a double-free if exec fails. posix_fork() posix_fork() 0 posix_fork() 637 posix_execv1 posix_execv2: path @ 0x80010fb8 ='./gcc' posix_execv3 posix_execv4 posix_execv5: argvlist @ 0x8014f8e0 posix_execv5: argv[0] @ 0x80010fa0 = gcc posix_execv5: argv[1] @ 0x80010e20 = -O0 posix_execv5: argv[2] @ 0x801504c0 = -g posix_execv5: argv[3] @ 0x800193c0 = -fno-strict-aliasing posix_execv5: argv[4] @ 0x80150400 = -I. 
posix_execv5: argv[5] @ 0x80160f08 = -I/boot/home/Python-2.3.4/./ Include posix_execv5: argv[6] @ 0x8015c028 = -I/boot/home/config/include posix_execv5: argv[7] @ 0x80160f40 = -I/boot/home/Python-2.3.4/Include posix_execv5: argv[8] @ 0x8015c0e8 = -I/boot/home/Python-2.3.4 posix_execv5: argv[9] @ 0x80150460 = -c posix_execv5: argv[10] @ 0x8015e3a8 = /boot/home/Python-2.3.4/Modules/ structmodule.c posix_execv5: argv[11] @ 0x801503b8 = -o posix_execv5: argv[12] @ 0x8015e068 = build/temp.beos-5.1-BePC-2.3/ structmodule.o posix_execv6 execv: No such file or directory posix_execv7: path @ 0x80010fb8 ='./gcc' posix_execv7: argvlist @ 0x8014f8e0 posix_execv7: argv[0] @ 0x80010fb8 = ./gcc <<<<<< that's the problem ! posix_execv7: argv[1] @ 0x80010e20 = -O0 posix_execv7: argv[2] @ 0x801504c0 = -g posix_execv7: argv[3] @ 0x800193c0 = -fno-strict-aliasing posix_execv7: argv[4] @ 0x80150400 = -I. posix_execv7: argv[5] @ 0x80160f08 = -I/boot/home/Python-2.3.4/./ Include posix_execv7: argv[6] @ 0x8015c028 = -I/boot/home/config/include posix_execv7: argv[7] @ 0x80160f40 = -I/boot/home/Python-2.3.4/Include posix_execv7: argv[8] @ 0x8015c0e8 = -I/boot/home/Python-2.3.4 posix_execv7: argv[9] @ 0x80150460 = -c posix_execv7: argv[10] @ 0x8015e3a8 = /boot/home/Python-2.3.4/Modules/ structmodule.c posix_execv7: argv[11] @ 0x801503b8 = -o posix_execv7: argv[12] @ 0x8015e068 = build/temp.beos-5.1-BePC-2.3/ structmodule.o Debug memory block at address p=0x80010fb8: 0 bytes originally requested The 4 pad bytes at p-4 are FORBIDDENBYTE, as expected. The 4 pad bytes at tail=0x80010fb8 are not all FORBIDDENBYTE (0xfb): at tail+0: 0xdb *** OUCH at tail+1: 0xdb *** OUCH at tail+2: 0xdb *** OUCH at tail+3: 0xdb *** OUCH The block was made by call #3688627195 to debug malloc/realloc. 
Fatal Python error: bad trailing pad byte error: Bad thread ID From aahz at pythoncraft.com Thu Sep 2 15:20:02 2004 From: aahz at pythoncraft.com (Aahz) Date: Thu Sep 2 15:20:16 2004 Subject: [Python-Dev] Coernic Desktop Search versus shutil.rmtree In-Reply-To: <1f7befae040901184316a8ebf6@mail.gmail.com> References: <1f7befae040901184316a8ebf6@mail.gmail.com> Message-ID: <20040902132002.GA13089@panix.com> On Wed, Sep 01, 2004, Tim Peters wrote: > > The cause: Windows has low-level hooks for apps that want to monitor > changes to the filesystem. For example, virus scanners use those > heavily. Coernic also uses them, to reindex changed files in the > background. So it can keep a file open beyond the time Python thinks > it deleted it, and then trying to rmdir its parent directory fails > (because the directory isn't really empty yet). What happens when you use Windows Exploder to delete the folder? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "To me vi is Zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy@lion.austin.ibm.com From tim.peters at gmail.com Thu Sep 2 16:37:03 2004 From: tim.peters at gmail.com (Tim Peters) Date: Thu Sep 2 16:37:05 2004 Subject: [Python-Dev] Coernic Desktop Search versus shutil.rmtree In-Reply-To: <20040902132002.GA13089@panix.com> References: <1f7befae040901184316a8ebf6@mail.gmail.com> <20040902132002.GA13089@panix.com> Message-ID: <1f7befae04090207371f5b2142@mail.gmail.com> [Tim] >> The cause: Windows has low-level hooks for apps that want to >> monitor changes to the filesystem. For example, virus scanners >> use those heavily. Coernic also uses them, to reindex changed >> files in the background. So it can keep a file open beyond the time >> Python thinks it deleted it, and then trying to rmdir its parent >> directory fails (because the directory isn't really empty yet). 
[Aahz] > What happens when you use Windows Exploder to delete the folder? I didn't try Explorer specifically. Since I was in a DOS box anyway, I used rmdir/s to clean it out. I'm sure using Explorer would have worked too. This is a timing problem. By the time I can click on the folder to delete it in Explorer, or by the time I can type "rmdir/s xx", Copernic is long done reindexing the files, so there's no problem nuking the directory then. shutil.rmtree issues the rmdir at machine speed. From tim.peters at gmail.com Thu Sep 2 16:59:58 2004 From: tim.peters at gmail.com (Tim Peters) Date: Thu Sep 2 17:00:00 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <4136AD4E.30503@interlink.com.au> References: <4136AD4E.30503@interlink.com.au> Message-ID: <1f7befae0409020759266a45cb@mail.gmail.com> [Anthony Baxter] > Erm - this particular fix was a bug fix. I'm deeply uncomfortable about > adding the C version of datetime to 2.3 at this very late stage of 2.3's > life cycle. It's quite arguably a bugfix, since datetime.h in 2.3.4 exposes things that can't possibly be used outside of datetimemodule.c (the datetime type objects are referenced in the header, but not exported in a usable way). Anthony Tuininga's patch to *finish* (not really add) the datetime C API is a low-risk change regardless: it doesn't change any existing functionality, it just finishes the job of exposing it to C coders, and adds some new macros for convenience. Now if some platform header file has macros with names like PyDateTime_FromTimestamp or PyDelta_FromDSU then adding these macros to datetime.h could cause new problems. But platform header files don't have macros with names like those (if they did, we would have bumped into it while developing 2.4). 
From gvanrossum at gmail.com Thu Sep 2 17:42:55 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 2 17:42:58 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <4136AD4E.30503@interlink.com.au> References: <4136AD4E.30503@interlink.com.au> Message-ID: > > As long as we're backporting C APIs to 2.3, can I request that the new > > datetime API be backported to 2.3? Anthony Tuininga (the cx_Oracle > > author) would be interested in using this and might be willing to help > > out with the work. (And yes, I'm encouraging this because I could use > > this myself.) > > Erm - this particular fix was a bug fix. I'm deeply uncomfortable about > adding the C version of datetime to 2.3 at this very late stage of 2.3's > life cycle. Fair enough. Let's drop the idea. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From gvanrossum at gmail.com Thu Sep 2 17:45:34 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 2 17:45:40 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <1f7befae0409020759266a45cb@mail.gmail.com> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> Message-ID: > It's quite arguably a bugfix, since datetime.h in 2.3.4 exposes things > that can't possibly be used outside of datetimemodule.c (the datetime > type objects are referenced in the header, but not exported in a > usable way). Anthony Tuininga's patch to *finish* (not really add) > the datetime C API is a low-risk change regardless: it doesn't change > any existing functionality, it just finishes the job of exposing it to > C coders, and adds some new macros for convenience. > > Now if some platform header file has macros with names like > > PyDateTime_FromTimestamp > or > PyDelta_FromDSU > > then adding these macros to datetime.h could cause new problems. 
But > platform header files don't have macros with names like those (if they > did, we would have bumped into it while developing 2.4). Hm, Anthony, what do you think now? (Disregard my previous mail, I was confused by multiple logical threads mixed into the same conversation.) --Guido From anthony at interlink.com.au Thu Sep 2 18:26:12 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Sep 2 18:26:50 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <1f7befae0409020759266a45cb@mail.gmail.com> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> Message-ID: <413749A4.5020902@interlink.com.au> Tim Peters wrote: > [Anthony Baxter] > >>Erm - this particular fix was a bug fix. I'm deeply uncomfortable about >>adding the C version of datetime to 2.3 at this very late stage of 2.3's >>life cycle. > > > It's quite arguably a bugfix, since datetime.h in 2.3.4 exposes things > that can't possibly be used outside of datetimemodule.c Ah - I misunderstood, and thought that 2.3 had no version of datetime.c at all, and Guido was proposing that we add it. So, to get this straight, what _are_ we talking about, exactly? Is there an SF bug/patch with the trunk change? -- Anthony Baxter It's never too late to have a happy childhood. From fredrik at pythonware.com Thu Sep 2 18:34:23 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Sep 2 18:32:37 2004 Subject: [Python-Dev] Re: Coernic Desktop Search versus shutil.rmtree References: <1f7befae040901184316a8ebf6@mail.gmail.com><20040902132002.GA13089@panix.com> <1f7befae04090207371f5b2142@mail.gmail.com> Message-ID: Tim Peters wrote: > This is a timing problem. By the time I can click on the folder to > delete it in Explorer, or by the time I can type "rmdir/s xx", > Copernic is long done reindexing the files, so there's no problem > nuking the directory then. 
shutil.rmtree issues the rmdir at machine
> speed.

so a possible robustification would be to add

    def _rmdir(path):
        try:
            os.rmdir(path)
        except IOError, v:
            if sys.platform == "win32" and (directory not empty):
                time.sleep(0.1)
                os.rmdir(path)
            else:
                raise

and use _rmdir instead of os.rmdir in _build_cmdtuple...

From gvanrossum at gmail.com Thu Sep 2 18:47:45 2004
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Thu Sep 2 18:47:50 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1
In-Reply-To: <413749A4.5020902@interlink.com.au>
References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <413749A4.5020902@interlink.com.au>
Message-ID:

Anthony (the other one), can you explain it?

On Fri, 03 Sep 2004 02:26:12 +1000, Anthony Baxter wrote:
> Tim Peters wrote:
> > [Anthony Baxter]
> >
> >>Erm - this particular fix was a bug fix. I'm deeply uncomfortable about
> >>adding the C version of datetime to 2.3 at this very late stage of 2.3's
> >>life cycle.
> >
> > It's quite arguably a bugfix, since datetime.h in 2.3.4 exposes things
> > that can't possibly be used outside of datetimemodule.c
>
> Ah - I misunderstood, and thought that 2.3 had no version of datetime.c
> at all, and Guido was proposing that we add it. So, to get this
> straight, what _are_ we talking about, exactly? Is there an SF
> bug/patch with the trunk change?
>
> --
> Anthony Baxter
> It's never too late to have a happy childhood.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Ask me about gmail.
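A minimal Python 3 sketch of the retry idea Fredrik Lundh floats in this thread, offered as an illustration rather than as what shutil actually does. It catches OSError (which is what os.rmdir raises) rather than IOError, spells out the "directory not empty" test with errno, and retries on any platform instead of only win32. The name rmdir_with_retry and the retries and delay parameters are assumptions made for this example.

```python
import errno
import os
import time


def rmdir_with_retry(path, retries=1, delay=0.1):
    """Remove an empty directory, retrying briefly if it still looks non-empty.

    Hypothetical helper: the name and parameters are not part of shutil or os.
    """
    for attempt in range(retries + 1):
        try:
            os.rmdir(path)
            return
        except OSError as exc:
            # Some platforms report a non-empty directory as EEXIST.
            not_empty = exc.errno in (errno.ENOTEMPTY, errno.EEXIST)
            if not not_empty or attempt == retries:
                # Unrelated failure, or out of retries: let the caller see it.
                raise
            # Give a background indexer or scanner time to release its handle.
            time.sleep(delay)
```

Errors other than "directory not empty" are re-raised immediately, so they are not hidden behind the sleep.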
From jcarlson at uci.edu Thu Sep 2 18:59:19 2004 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu Sep 2 19:05:59 2004 Subject: [Python-Dev] Re: Coernic Desktop Search versus shutil.rmtree In-Reply-To: References: <1f7befae04090207371f5b2142@mail.gmail.com> Message-ID: <20040902095224.C10B.JCARLSON@uci.edu> > so a possible robustification would be to add > > def _rmdir(path): > try: > os.rmdir(path): > except IOError, v: > if sys.platform == "win32" and (directory not empty): > time.sleep(0.1) > os.rmdir(path) > else: > raise > > and use _rmdir instead of os.rmdir in _build_cmdtuple... Only for this test. In the general case, there could be other reasons why that deletion failed. One that I run into relatively often is... Shell 1: curpath: :\arbitrary\path\name Shell 2: curpath: :\arbitrary\path command: python -c 'import os;os.remove("name")' In this case, the OSError is the correct thing, and shouldn't be hidden with a 'sleep'. - Josiah From martin at v.loewis.de Thu Sep 2 19:36:43 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Sep 2 19:36:45 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> Message-ID: <41375A2B.2020401@v.loewis.de> Guido van Rossum wrote: >>Now if some platform header file has macros with names like >> >> PyDateTime_FromTimestamp >>or >> PyDelta_FromDSU >> >>then adding these macros to datetime.h could cause new problems. But >>platform header files don't have macros with names like those (if they >>did, we would have bumped into it while developing 2.4). > > > Hm, Anthony, what do you think now? I'm not Anthony (neither, actually), but I do think this is a new feature, not a bug fix - assuming we are talking about the changes between datetime.h in 2.3 and 2.4. 
This introduces datetime.datetime_CAPI, which is a C object
allowing cross-module datetime calls at the C level.

This change is very unlikely to break existing code, as existing
code just won't use that new API. This is good for a backport.

At the same time, this also clearly shows it is a new feature:
only new code can use it.

Channelling Anthony (Baxter), this cannot be accepted for 2.3.
It would allow for code that works on 2.3.5, but fails on 2.3.4.
What's worse, the extension module can be built on 2.3.5, and
the binary module will fail when run on 2.3.4, as importing the
CAPI object would fail.

People who rely on that feature should get a compile time
error on 2.3.x, instead of compilation succeeding for some x.
People who need to support 2.3 as well should use the Python
API to the datetime module, not the C API.

Regards,
Martin

From anthony at interlink.com.au Thu Sep 2 19:45:43 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Sep 2 19:46:17 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1
In-Reply-To: <41375A2B.2020401@v.loewis.de>
References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de>
Message-ID: <41375C47.2020608@interlink.com.au>

Martin v. Löwis wrote:
> Channelling Anthony (Baxter), this cannot be accepted for 2.3.
> It would allow for code that works on 2.3.5, but fails on 2.3.4.
> What's worse, the extension module can be built on 2.3.5, and
> the binary module will fail when run on 2.3.4, as importing the
> CAPI object would fail.

Ugh. Thanks for the clarification. I really don't think that this
is something we want to add to 2.3.5.
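The run-time question behind this exchange is whether the datetime module in the running interpreter exposes datetime_CAPI at all. The Python-level sketch below only illustrates that probe and fallback pattern; a real extension would do the equivalent in C when it is initialised. The function name choose_datetime_path and its return strings are hypothetical, chosen for this example.

```python
import datetime


def choose_datetime_path(module=datetime):
    """Report which access path a caller could use for the given module.

    Hypothetical helper: returns "c-api" when the module exposes the
    datetime_CAPI attribute, and "python-api" otherwise.
    """
    if hasattr(module, "datetime_CAPI"):
        return "c-api"
    # Fall back to the ordinary Python-level datetime interface.
    return "python-api"
```

The point of the objection in the thread is that a binary built against the C path has no such fallback unless its author writes one.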
From gvanrossum at gmail.com Thu Sep 2 20:05:11 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 2 20:05:15 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <41375A2B.2020401@v.loewis.de> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> Message-ID: > People who rely on that feature should get a compile time > error on 2.3.x, instead of compilation succeeding for some x. > People who need to support 2.3 as well should use the Python > API to the datetime module, not the C API. Given that it's a CObject, code could easily be written (and I'm sure cx_Oracle will do this) that attempts to import the CObject and uses a fallback if that fails. I expect that cx_Oracle will be just about the only customer of this API. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From gvanrossum at gmail.com Thu Sep 2 20:07:22 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 2 20:07:25 2004 Subject: [Python-Dev] Re: Coernic Desktop Search versus shutil.rmtree In-Reply-To: <20040902095224.C10B.JCARLSON@uci.edu> References: <1f7befae04090207371f5b2142@mail.gmail.com> <20040902095224.C10B.JCARLSON@uci.edu> Message-ID: On Thu, 02 Sep 2004 09:59:19 -0700, Josiah Carlson wrote: > > > so a possible robustification would be to add > > > > def _rmdir(path): > > try: > > os.rmdir(path): > > except IOError, v: > > if sys.platform == "win32" and (directory not empty): > > time.sleep(0.1) > > os.rmdir(path) > > else: > > raise > > > > and use _rmdir instead of os.rmdir in _build_cmdtuple... > > > Only for this test. > > In the general case, there could be other reasons why that deletion > failed. One that I run into relatively often is... 
> > Shell 1: > curpath: :\arbitrary\path\name > > Shell 2: > curpath: :\arbitrary\path > command: python -c 'import os;os.remove("name")' > > In this case, the OSError is the correct thing, and shouldn't be hidden > with a 'sleep'. > > > - Josiah I surely hope Fredrik was being facetious. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From fredrik at pythonware.com Thu Sep 2 20:17:45 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Sep 2 20:15:58 2004 Subject: [Python-Dev] Re: Re: Coernic Desktop Search versus shutil.rmtree References: <1f7befae04090207371f5b2142@mail.gmail.com> <20040902095224.C10B.JCARLSON@uci.edu> Message-ID: Guido van Rossum wrote > I surely hope Fredrik was being facetious. not necessarily. The rmtree function already takes a couple of flags; I wouldn't mind seeing a "try harder" option for platforms like windows. (but a better solution would probably be a way to "override" the functions used to remove files and directories. I've had to copy and tweak the rmtree code quite a few times, usually to deal with cases where the tree might con- tain read-only files...) From jhylton at gmail.com Thu Sep 2 20:26:55 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Thu Sep 2 20:27:05 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <41375A2B.2020401@v.loewis.de> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> Message-ID: On Thu, 02 Sep 2004 19:36:43 +0200, "Martin v. L?wis" wrote: > I'm not Anthony (neither, actually), but I do think this is a new > feature, not a bug fix - assuming we are talking about the changes > between datetime.h in 2.3 and 2.4. > > This introduces datetime.datetime_CAPI, which is a C object > allowing cross-module datetime calls at the C level. 
> > This change is very unlikely to break existing code, as existing > code just won't use that new API. This is good for a backport. > > At the same time, this also clearly shows it is a new feature: > only new code can use it. Of late, I've found the True / False introduction in later 2.2 releases to be a pain. I'm writing code on a machine that has 2.2.2, but I occasionally run into machines with earlier versions of 2.2 and then my code fails. It would be easier if it didn't work on any 2.2 release, then I wouldn't be lulled into thinking it will work. Jeremy From martin at v.loewis.de Thu Sep 2 20:28:40 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Sep 2 20:28:41 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> Message-ID: <41376658.6020502@v.loewis.de> Guido van Rossum wrote: > Given that it's a CObject, code could easily be written (and I'm sure > cx_Oracle will do this) that attempts to import the CObject and uses a > fallback if that fails. I expect that cx_Oracle will be just about the > only customer of this API. If there is a fallback already, why do you want the backport? Just use the fallback. Regards, Martin From gvanrossum at gmail.com Thu Sep 2 21:03:01 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 2 21:03:04 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <41376658.6020502@v.loewis.de> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> <41376658.6020502@v.loewis.de> Message-ID: > If there is a fallback already, why do you want the backport? Just > use the fallback. Because the fallback is slower? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From anthony at interlink.com.au Thu Sep 2 21:37:47 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Sep 2 21:38:22 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> <41376658.6020502@v.loewis.de> Message-ID: <4137768B.9000800@interlink.com.au> Guido van Rossum wrote: >>If there is a fallback already, why do you want the backport? Just >>use the fallback. > > > Because the fallback is slower? This, to me, is a poor reason to break the backwards/forwards compatibility of binary modules. Yes, modules _could_ be written to do the right thing, and cx_Oracle might. But then someone else comes along and uses it, and notices that it works on 2.3.5, so makes a 2.3 binary package. And people on older 2.3's get a broken package. I'm really really unconvinced that this is a good idea. Anthony From aahz at pythoncraft.com Thu Sep 2 22:02:06 2004 From: aahz at pythoncraft.com (Aahz) Date: Thu Sep 2 22:02:09 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> Message-ID: <20040902200205.GA23600@panix.com> On Thu, Sep 02, 2004, Jeremy Hylton wrote: > > Of late, I've found the True / False introduction in later 2.2 > releases to be a pain. I'm writing code on a machine that has 2.2.2, > but I occasionally run into machines with earlier versions of 2.2 and > then my code fails. It would be easier if it didn't work on any 2.2 > release, then I wouldn't be lulled into thinking it will work. +1 -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "To me vi is Zen. To use vi is to practice zen. 
Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy@lion.austin.ibm.com From skip at pobox.com Thu Sep 2 22:06:21 2004 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 2 22:07:15 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <4137768B.9000800@interlink.com.au> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> <41376658.6020502@v.loewis.de> <4137768B.9000800@interlink.com.au> Message-ID: <16695.32061.372611.9265@montanaro.dyndns.org> >>>>> "Anthony" == Anthony Baxter writes: >> Because the fallback is slower? Anthony> This, to me, is a poor reason to break the backwards/forwards Anthony> compatibility of binary modules. +100 At my new job we maintain "stable" and "unstable" versions (*) of our current project. I frequently hold up Python's policy of "only bug fixes are allowed in stable versions" as a shining example of how things should be done. We've violated that policy on occasion. When that happens it generally comes back to bite us, and we only have three users down the hall, not users scattered all over the planet. Skip (*) I use those terms *very* loosely. From gvanrossum at gmail.com Thu Sep 2 22:23:54 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 2 22:24:00 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <16695.32061.372611.9265@montanaro.dyndns.org> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> <41376658.6020502@v.loewis.de> <4137768B.9000800@interlink.com.au> <16695.32061.372611.9265@montanaro.dyndns.org> Message-ID: OK, I withdraw my request. Never mind. 
:-)

--Guido

From martin at v.loewis.de Thu Sep 2 22:25:18 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu Sep 2 22:25:18 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1
In-Reply-To:
References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> <41376658.6020502@v.loewis.de>
Message-ID: <413781AE.8000204@v.loewis.de>

Guido van Rossum wrote:
>>If there is a fallback already, why do you want the backport? Just
>>use the fallback.
>
> Because the fallback is slower?

I see. However, people with existing installations will have to
suffer from the slow-down, anyway; people will need to upgrade
in order to see the speed improvement. If they need the speed
advantage (which is exactly how much?), they should consider
upgrading to 2.4.

That an extension module runs slower in 2.3 than it does in 2.4
is not a bug in 2.3 - a lot of things run slower in 2.3, yet we
don't backport all performance changes to 2.3, especially if code
has to be adapted to make use of it.

Regards,
Martin

From barry at python.org Thu Sep 2 22:27:14 2004
From: barry at python.org (Barry Warsaw)
Date: Thu Sep 2 22:27:22 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1
In-Reply-To:
References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de>
Message-ID: <1094156834.8722.21.camel@geddy.wooz.org>

On Thu, 2004-09-02 at 14:26, Jeremy Hylton wrote:
> Of late, I've found the True / False introduction in later 2.2
> releases to be a pain. I'm writing code on a machine that has 2.2.2,
> but I occasionally run into machines with earlier versions of 2.2 and
> then my code fails. It would be easier if it didn't work on any 2.2
> release, then I wouldn't be lulled into thinking it will work.
Just to add: while this can be worked around in code, it's extremely
tedious both to add those workarounds, and to remove them when they're
no longer necessary. I think it's generally not a good idea to do it.

-Barry

From Scott.Daniels at Acm.Org Thu Sep 2 22:51:15 2004
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Thu Sep 2 22:50:14 2004
Subject: [Python-Dev] Re: Alternative placeholder delimiters for PEP 292
In-Reply-To: <59e9fd3a040829231141cd3fe4@mail.gmail.com>
References: <59e9fd3a040829231141cd3fe4@mail.gmail.com>
Message-ID:

Andrew Durdin wrote:
> A Yet Simpler Proposal, modifying that of PEP 292 ...
> ... placeholders are delimited by braces {}.

Do you know about the technique I use? It works w/o a new library:
Surround-style delimiting, using a single (specifiable) character.

    def subst(template, _sep='$', **kwds):
        if '' not in kwds:
            kwds[''] = _sep  # Allow doubled _sep for _sep.
        parts = template.split(_sep)
        parts[1::2] = [kwds[element] for element in parts[1::2]]
        return template[0:0].join(parts)

For 2.4, use a generator expression, not a list comprehension:

    def subst(template, _sep='$', **kwds):
        if '' not in kwds:
            kwds[''] = _sep  # Allow doubled _sep for _sep.
        parts = template.split(_sep)
        parts[1::2] = (kwds[element] for element in parts[1::2])
        return template[0:0].join(parts)

Then you can use:

    subst('What I $mean$ is $$5.00', mean='really mean')
or
    subst(u'What I $mean$ is $$5.00', mean=u'really mean')
or
    subst('What I $mean$ is $$5.00', mean='really mean', **locals())
or ...
-- Scott David Daniels Scott.Daniels@Acm.Org From Scott.Daniels at Acm.Org Thu Sep 2 23:49:02 2004 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Thu Sep 2 23:47:54 2004 Subject: [Python-Dev] Re: PEP 309 updated slightly In-Reply-To: <36f88922040831075133a98188@mail.gmail.com> References: <16E1010E4581B049ABC51D4975CEDB8803060F72@UKDCX001.uk.int.atosorigin.com> <36f88922040831075133a98188@mail.gmail.com> Message-ID: Alex Naanou wrote: > if isinstance(func, LCurry) or isinstance(func, RCurry): ^^ is better written as: if isinstance(func, (LCurry, RCurry)): -- -- Scott David Daniels Scott.Daniels@Acm.Org From Scott.Daniels at Acm.Org Fri Sep 3 01:50:54 2004 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Fri Sep 3 01:49:48 2004 Subject: [Python-Dev] Re: FW: [Python-checkins] python/nondist/peps pep-0318.txt, 1.25, 1.26 In-Reply-To: References: <004b01c48a29$2b6d6ba0$04f9cc97@oemcomputer> Message-ID: Thomas Heller wrote: > "Raymond Hettinger" writes: >>.... everyone knew they [metaclasses] were powerful when they were >> put in, but no one knew how they would be used or whether they were >>necessary. In fact, two versions later, we still don't know those >>answers. > > Sorry I have to say this, but I don't think you know what you're talking > about in this paragraph. I would suggest we don't know the practical range of application yet. It is clear that some black magicians are happy, but metaclasses are not yet as well understood as list comprehensions in the sense of, "this is when you use them; this is an abuse." Metaclasses are more structural and less linguistic; such things take longer to absorb as design structure elements. This is all by way of saying, "nope, he has a point." 
-- Scott David Daniels Scott.Daniels@Acm.Org From alex.nanou at gmail.com Fri Sep 3 01:59:01 2004 From: alex.nanou at gmail.com (Alex Naanou) Date: Fri Sep 3 01:59:06 2004 Subject: [Python-Dev] Re: PEP 309 updated slightly In-Reply-To: References: <16E1010E4581B049ABC51D4975CEDB8803060F72@UKDCX001.uk.int.atosorigin.com> <36f88922040831075133a98188@mail.gmail.com> Message-ID: <36f88922040902165989cd456@mail.gmail.com> On Thu, 02 Sep 2004 14:49:02 -0700, Scott David Daniels wrote: > Alex Naanou wrote: > > > if isinstance(func, LCurry) or isinstance(func, RCurry): > ^^ is better written as: > if isinstance(func, (LCurry, RCurry)): I know! :) ...and it also is faster! but that particular code was a) written quite a while back. b) at my course at the MSU the students seem to have ALLOT less trouble understanding the existing version of the code.... (I did try both...) yes, this is a bit of a trade-off.... need to think about it a bit more! though this an off-topic here, but it would indeed be interesting to know if anyone has a different opinion.... Thanks! ^_^ --- Alex. From aahz at pythoncraft.com Fri Sep 3 02:11:01 2004 From: aahz at pythoncraft.com (Aahz) Date: Fri Sep 3 02:11:04 2004 Subject: [Python-Dev] Coernic Desktop Search versus shutil.rmtree In-Reply-To: <1f7befae04090207371f5b2142@mail.gmail.com> References: <1f7befae040901184316a8ebf6@mail.gmail.com> <20040902132002.GA13089@panix.com> <1f7befae04090207371f5b2142@mail.gmail.com> Message-ID: <20040903001101.GA16770@panix.com> On Thu, Sep 02, 2004, Tim Peters wrote: > > [Tim] >>> The cause: Windows has low-level hooks for apps that want to >>> monitor changes to the filesystem. For example, virus scanners >>> use those heavily. Coernic also uses them, to reindex changed >>> files in the background. So it can keep a file open beyond the time >>> Python thinks it deleted it, and then trying to rmdir its parent >>> directory fails (because the directory isn't really empty yet). 
> > [Aahz] >> What happens when you use Windows Exploder to delete the folder? > > I didn't try Explorer specifically. Since I was in a DOS box anyway, > I used rmdir/s to clean it out. I'm sure using Explorer would have > worked too. > > This is a timing problem. By the time I can click on the folder to > delete it in Explorer, or by the time I can type "rmdir/s xx", > Copernic is long done reindexing the files, so there's no problem > nuking the directory then. shutil.rmtree issues the rmdir at machine > speed. Question is, what happens when you use Explorer while Coernic is busy inside a folder? If it barfs, then I think it's reasonable for rmtree() to barf. Or are you saying that it's not possible to make that test? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I saw `cout' being shifted "Hello world" times to the left and stopped right there." --Steve Gonedes From anthony at computronix.com Thu Sep 2 17:59:10 2004 From: anthony at computronix.com (Anthony Tuininga) Date: Fri Sep 3 02:58:47 2004 Subject: [SPAM-heur] Re: Re: Re: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> Message-ID: <4137434E.4080505@computronix.com> Well, I find the argument convincing enough and it is quite safe. I am willing to make the necessary patches and it would be quite convenient to be able to use the C API in Python 2.3 as well. So I'm in favor but I'll bow to the greater wisdom of the Python development community since I really am not significantly involved. :-) Guido van Rossum wrote: >>It's quite arguably a bugfix, since datetime.h in 2.3.4 exposes things >>that can't possibly be used outside of datetimemodule.c (the datetime >>type objects are referenced in the header, but not exported in a >>usable way). 
Anthony Tuininga's patch to *finish* (not really add) >>the datetime C API is a low-risk change regardless: it doesn't change >>any existing functionality, it just finishes the job of exposing it to >>C coders, and adds some new macros for convenience. >> >>Now if some platform header file has macros with names like >> >> PyDateTime_FromTimestamp >>or >> PyDelta_FromDSU >> >>then adding these macros to datetime.h could cause new problems. But >>platform header files don't have macros with names like those (if they >>did, we would have bumped into it while developing 2.4). > > > Hm, Anthony, what do you think now? (Disregard my previous mail, I was > confused by multiple logical threads mixed into the same > conversation.) > > --Guido -- Anthony Tuininga anthony@computronix.com Computronix Distinctive Software. Real People. Suite 200, 10216 - 124 Street NW Edmonton, AB, Canada T5N 4A3 Phone: (780) 454-3700 Fax: (780) 454-3838 http://www.computronix.com From anthony at computronix.com Thu Sep 2 19:15:12 2004 From: anthony at computronix.com (Anthony Tuininga) Date: Fri Sep 3 02:58:48 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <413749A4.5020902@interlink.com.au> Message-ID: <41375520.8060503@computronix.com> Yes, there are patches. There happens to be two entries because I missed some documentation changes the first time around. The links are: https://sourceforge.net/tracker/?group_id=5470&atid=305470&func=detail&aid=876130 https://sourceforge.net/tracker/?group_id=5470&atid=305470&func=detail&aid=986010 The files changed are dist/src/Modules/datetimemodule.c dist/src/Include/datetime.h dist/src/Doc/api/concrete.tex To summarize, an attribute named "datetime_CAPI" is added to the datetime module. 
A macro PyDateTime_IMPORT is used to access this attribute and then additional macros are available for manipulating datetime instances. If you want to look at an actual implementation you can take a look at cx_Oracle 4.1 beta 1 available at http://starship.python.net/crew/atuining If you have further questions, let me know and I'll try to answer them. Guido van Rossum wrote: > Anthony (the other one), can you explain it? > > On Fri, 03 Sep 2004 02:26:12 +1000, Anthony Baxter > wrote: > >>Tim Peters wrote: >> >>>[Anthony Baxter] >>> >>> >>>>Erm - this particular fix was a bug fix. I'm deeply uncomfortable about >>>>adding the C version of datetime to 2.3 at this very late stage of 2.3's >>>>life cycle. >>> >>> >>>It's quite arguably a bugfix, since datetime.h in 2.3.4 exposes things >>>that can't possibly be used outside of datetimemodule.c >> >>Ah - I misunderstood, and thought that 2.3 had no version of datetime.c >>at all, and Guido was proposing that we add it. So, to get this >>straight, what _are_ we talking about, exactly? Is there an SF >>bug/patch with the trunk change? >> >> >>-- >>Anthony Baxter >>It's never too late to have a happy childhood. >> >> >>_______________________________________________ >>Python-Dev mailing list >>Python-Dev@python.org >>http://mail.python.org/mailman/listinfo/python-dev >>Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > -- Anthony Tuininga anthony@computronix.com Computronix Distinctive Software. Real People. 
Suite 200, 10216 - 124 Street NW Edmonton, AB, Canada T5N 4A3 Phone: (780) 454-3700 Fax: (780) 454-3838 http://www.computronix.com From anthony at computronix.com Thu Sep 2 19:42:58 2004 From: anthony at computronix.com (Anthony Tuininga) Date: Fri Sep 3 02:58:49 2004 Subject: [SPAM-heur] Re: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c,2.56, 2.56.8.1 In-Reply-To: <41375A2B.2020401@v.loewis.de> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> Message-ID: <41375BA2.5070700@computronix.com> I won't presume to dictate policy on this matter but since this is C API you do have to go through some effort in order to use it. I already have the following in cx_Oracle

#if (PY_VERSION_HEX >= 0x02040000)
    ....do stuff....
#endif

I am assuming that I could (if this patch is accepted) simply change it to

#if (PY_VERSION_HEX >= 0x02030500)
    ....do stuff....
#endif

Whether or not this makes it acceptable or not I leave that to the release manager to decide.... Martin v. Löwis wrote: > Guido van Rossum wrote: > >>> Now if some platform header file has macros with names like >>> >>> PyDateTime_FromTimestamp >>> or >>> PyDelta_FromDSU >>> >>> then adding these macros to datetime.h could cause new problems. But >>> platform header files don't have macros with names like those (if they >>> did, we would have bumped into it while developing 2.4). >> >> >> >> Hm, Anthony, what do you think now? > > > I'm not Anthony (neither, actually), but I do think this is a new > feature, not a bug fix - assuming we are talking about the changes > between datetime.h in 2.3 and 2.4. > > This introduces datetime.datetime_CAPI, which is a C object > allowing cross-module datetime calls at the C level. > > This change is very unlikely to break existing code, as existing > code just won't use that new API. This is good for a backport.
> > At the same time, this also clearly shows it is a new feature: > only new code can use it. > > Channelling Anthony (Baxter), this cannot be accepted for 2.3. > It would allow for code that works on 2.3.5, but fails on 2.3.4. > What's worse, the extension module can be built on 2.3.5, and > the binary module will fail when run on 2.3.4, as importing the > CAPI object would fail. > > People who rely on that feature should get a compile time > error on 2.3.x, instead of compilation succeeding for some x. > People who need to support 2.3 as well should use the Python > API to the datetime module, not the C API. > > Regards, > Martin > -- Anthony Tuininga anthony@computronix.com Computronix Distinctive Software. Real People. Suite 200, 10216 - 124 Street NW Edmonton, AB, Canada T5N 4A3 Phone: (780) 454-3700 Fax: (780) 454-3838 http://www.computronix.com From tim.peters at gmail.com Fri Sep 3 03:30:36 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Sep 3 03:30:51 2004 Subject: [Python-Dev] Coernic Desktop Search versus shutil.rmtree In-Reply-To: <20040903001101.GA16770@panix.com> References: <1f7befae040901184316a8ebf6@mail.gmail.com> <20040902132002.GA13089@panix.com> <1f7befae04090207371f5b2142@mail.gmail.com> <20040903001101.GA16770@panix.com> Message-ID: <1f7befae0409021830219eb2fe@mail.gmail.com> [Aahz] > Question is, what happens when you use Explorer while Coernic is busy > inside a folder? If it barfs, then I think it's reasonable for rmtree() > to barf. Or are you saying that it's not possible to make that test? I didn't claim it was unreasonable for shutil.rmtree to barf, and I have no interest in making that test. As mentioned before, Copernic's use of the filesystem hooks drives CVS crazy too. It's a new app, and using the filesystem hooks transparently is a subtle undertaking. They'll fix it eventually. 
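Editorial sketch, not part of the thread: Tim describes the rmtree failure as a timing problem, where shutil.rmtree issues its rmdir while another process still briefly holds a file open. Assuming the failure really is transient, one possible workaround is to retry after a short pause. The helper name rmtree_with_retry and its attempts/delay parameters are hypothetical, and the code targets present-day Python 3 rather than the 2.3/2.4 interpreters discussed here.

```python
import os
import shutil
import tempfile
import time


def rmtree_with_retry(path, attempts=5, delay=0.1):
    """Remove a directory tree, retrying when the OS reports an error.

    Hypothetical helper: 'attempts' and 'delay' are illustrative defaults.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            # Otherwise wait a little longer each time before retrying.
            time.sleep(delay * (2 ** attempt))


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    with open(os.path.join(scratch, "example.txt"), "w") as handle:
        handle.write("scratch data\n")
    rmtree_with_retry(scratch)
    print(os.path.exists(scratch))
```

A retry loop only papers over the race; it does nothing if the other process keeps the file open indefinitely, in which case the final OSError is still raised to the caller.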
From tim.peters at gmail.com Fri Sep 3 05:52:52 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Sep 3 05:52:54 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c, 2.56, 2.56.8.1 In-Reply-To: <41375A2B.2020401@v.loewis.de> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> Message-ID: <1f7befae04090220522da07284@mail.gmail.com> [Martin v. Löwis] > ... > Channelling Anthony (Baxter), this cannot be accepted for 2.3. > It would allow for code that works on 2.3.5, but fails on 2.3.4. > What's worse, the extension module can be built on 2.3.5, and > the binary module will fail when run on 2.3.4, as importing the > CAPI object would fail. That is a strong argument, and you're right that "the rules" don't allow it. OTOH, unlike Jeremy's True/False example, this is an obscure piece of C with only one known user in the world (Anthony wrote the datetime C API patch, and Anthony wrote the Oracle wrapper which is the datetime C API's only known user). So an opposing "practicality beats purity" argument *could* apply too. I'm not going to make it myself, because I personally have no use for the C datetime API . From anthony at interlink.com.au Fri Sep 3 07:46:36 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Sep 3 07:47:05 2004 Subject: [SPAM-heur] Re: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules threadmodule.c,2.56, 2.56.8.1 In-Reply-To: <41375BA2.5070700@computronix.com> References: <4136AD4E.30503@interlink.com.au> <1f7befae0409020759266a45cb@mail.gmail.com> <41375A2B.2020401@v.loewis.de> <41375BA2.5070700@computronix.com> Message-ID: <4138053C.2070501@interlink.com.au> Anthony Tuininga wrote: [snip] > Whether or not this makes it acceptable or not I leave that to the > release manager to decide....
I can understand why it would be convenient for this to be in 2.3.5, but I really don't want to see this in the release23-maint branch. The advantages (it allows cx_oracle to be faster) are nowhere near strong enough to outweigh the disadvantages (breaking binary compatibility between bugfix releases). From the feedback I've received since I started the current run of bugfix releases, one of the strongest messages I've received is that people _really_ _really_ like the no-new-features rule, because it makes it much easier to justify rolling out a bugfix release. I'm not saying that this rule must never be broken, only that it would need an extremely good reason to do so. This case is even worse, as it is both a new feature _and_ a binary incompatibility. If you wanted to, you could produce a package with a patched datetime module, and instructions for allowing users to install it into their existing installation. This is entirely up to them, then. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at python.org Fri Sep 3 10:36:16 2004 From: anthony at python.org (Anthony Baxter) Date: Fri Sep 3 10:36:31 2004 Subject: [Python-Dev] RELEASED Python 2.4, alpha 3 Message-ID: <41382D00.70906@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team and the Python community, I'm happy to announce the third alpha of Python 2.4. Python 2.4a3 is an alpha release. We'd greatly appreciate it if you could download it, kick the tires and let us know of any problems you find, but it is not suitable for production usage. ~ http://www.python.org/2.4 In this release we have PEP-292 string templates, a new syntax for multi-line imports, and a large number of other bug fixes and improvements. See either the highlights, the What's New in Python 2.4, or the detailed NEWS file -- all available from the Python 2.4 webpage.
This will hopefully be the last alpha in the Python 2.4 cycle - a first beta will follow in a few weeks. Once the first beta is out, we're in feature-freeze mode - so if you've got new things you want in, make sure you hurry! Please log any problems you have with this release in the SourceForge bug tracker (noting that you're using 2.4a3): ~ http://sourceforge.net/bugs/?group_id=5470 Enjoy the new release, Anthony Anthony Baxter anthony@python.org Python Release Manager (on behalf of the entire python-dev team) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBOCz9Dt3F8mpFyBYRAmgqAJ42drhwIe3QLSx6WyUxOUPewUtX4QCgt5Wv mP4MfJRsXy6t0IcS6fY8Mmc= =efD5 -----END PGP SIGNATURE----- From anthony at interlink.com.au Fri Sep 3 11:30:16 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Sep 3 11:30:31 2004 Subject: please keep trunk frozen for a bit (was Re: [Python-Dev] RELEASED Python 2.4, alpha 3) In-Reply-To: <41382D00.70906@python.org> References: <41382D00.70906@python.org> Message-ID: <413839A8.9040700@interlink.com.au> If people could keep the trunk frozen for about 5 or 6 hours (in case of a need for a brown-paper-bag release) I'd appreciate it. Say, until 2004-09-03 14:00 UTC or so... -- Anthony Baxter It's never too late to have a happy childhood. From jim at zope.com Fri Sep 3 13:02:10 2004 From: jim at zope.com (Jim Fulton) Date: Fri Sep 3 13:02:15 2004 Subject: [Python-Dev] Want to make dictproxy objects creatable from Python Message-ID: <41384F32.10801@zope.com> New-style classes use dict proxies to protect their dictionaries from direct manipulation. I would like to be able to use these in other situations, but the dictproxy class doesn't let me create new instances. I propose to give the class __new__ and __init__ methods so that it is callable from Python. Any objections? May I do this for 2.4b1? Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From adurdin at gmail.com Fri Sep 3 15:55:25 2004 From: adurdin at gmail.com (Andrew Durdin) Date: Fri Sep 3 15:55:34 2004 Subject: [Python-Dev] Re: Alternative placeholder delimiters for PEP 292 In-Reply-To: <59e9fd3a040829231141cd3fe4@mail.gmail.com> References: <59e9fd3a040829231141cd3fe4@mail.gmail.com> Message-ID: <59e9fd3a0409030655395c213a@mail.gmail.com> On Mon, 30 Aug 2004 16:11:34 +1000, Andrew Durdin wrote: > A Yet Simpler Proposal, modifying that of PEP 292 > > I propose that the Template module not use $ to set off > placeholders; instead, placeholders are delimited by braces {}. Barry, would you care to comment on my proposal, particularly my points in the rationale for it? I've just taken the 2.4a3 Template class and modified it to fit this proposal. The result is below. I've also got a modified unit test and tex file to account for the changes at http://andy.durdin.net/test_pep292_braces.py and http://andy.durdin.net/bracetmpl.text -- I'd make a complete patch, but I'm not sure what tools to use (I'm running Win2k): can someone point me in the right direction? 
####################################################################

import re as _re

class Template(unicode):
    """A string class for supporting {}-substitutions."""
    __slots__ = []

    # Search for {{, }}, {identifier}, and any bare {'s or }'s
    pattern = _re.compile(r"""
      (?P<escapedlt>\{{2})|             # Escape sequence of two { braces
      (?P<escapedrt>\}{2})|             # Escape sequence of two } braces
      {(?P<braced>[_a-z][_a-z0-9]*)}|   # A brace delimited identifier
      (?P<bogus>\{|\})                  # Other ill-formed { or } expressions
    """, _re.IGNORECASE | _re.VERBOSE)

    def __mod__(self, mapping):
        def convert(mo):
            if mo.group('escapedlt') is not None:
                return '{'
            if mo.group('escapedrt') is not None:
                return '}'
            if mo.group('bogus') is not None:
                raise ValueError('Invalid placeholder at index %d' %
                                 mo.start('bogus'))
            val = mapping[mo.group('braced')]
            return unicode(val)
        return self.pattern.sub(convert, self)


class SafeTemplate(Template):
    """A string class for supporting {}-substitutions.

    This class is 'safe' in the sense that you will never get KeyErrors if
    there are placeholders missing from the interpolation dictionary.  In that
    case, you will get the original placeholder in the value string.
    """
    __slots__ = []

    def __mod__(self, mapping):
        def convert(mo):
            if mo.group('escapedlt') is not None:
                return '{'
            if mo.group('escapedrt') is not None:
                return '}'
            if mo.group('bogus') is not None:
                raise ValueError('Invalid placeholder at index %d' %
                                 mo.start('bogus'))
            braced = mo.group('braced')
            try:
                return unicode(mapping[braced])
            except KeyError:
                return '{' + braced + '}'
        return self.pattern.sub(convert, self)


del _re

From barry at python.org Fri Sep 3 17:30:24 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 3 17:30:29 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292: Simple String Substitutions In-Reply-To: References: <20040827233958.GA5560@panix.com> <000501c48d34$19ce9000$e841fea9@oemcomputer> Message-ID: <1094225424.8811.71.camel@geddy.wooz.org> Skipped content of type multipart/mixed-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040903/1b7fd280/attachment.pgp From barry at python.org Fri Sep 3 17:38:52 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 3 17:38:55 2004 Subject: [Python-Dev] Alternative Implementation for PEP 292: SimpleString Substitutions In-Reply-To: <003301c48e55$04942ac0$e841fea9@oemcomputer> References: <003301c48e55$04942ac0$e841fea9@oemcomputer> Message-ID: <1094225932.8788.85.camel@geddy.wooz.org> On Mon, 2004-08-30 at 01:48, Raymond Hettinger wrote: > By not inheriting from unicode, the bug can be fixed while retaining a > class implementation (see sandbox\curry292.py for an example). > > But, be clear, it *is* a bug. > > If all the inputs are strings, Unicode should not magically appear. See > all the other string methods as an example. But the Template classes aren't string methods, so I don't think the analogy is quite right.
Because the template string itself is by definition a Unicode, it actually makes more sense that everything its mod operator returns is also a Unicode. So I still don't think it's a bug. > Someday, all will be > Unicode, until then, some apps choose to remain Unicode free. Also, > there is a build option to not even compile Unicode support -- it would > be a bummer to have the $ templates fail as a result. Maybe. Like the doctor says, well, don't do that! (i.e. use Templates and disable unicode). -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040903/b8ac2991/attachment.pgp From Paul.Moore at atosorigin.com Fri Sep 3 17:49:59 2004 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Sep 3 17:50:04 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292:Simple String Substitutions Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> From: Barry Warsaw > Attached is a demo using the 2.4a3 implementation of string.Template. > Note that the only change in the Template subclass is the pattern, and > there, it's just that the 'named' and 'braced' groups got a '.' in the > second character class. Ah, I follow. The lookup logic is in the mapping class rather than in the template. Would it be useful to factor out the "identifier syntax" bit of the pattern? The "escaped" and "bogus" groups are less likely to need changing than what constitutes an identifier. Hmm, you'd have to get fancy then, as the "obvious" approach is a class attribute id = "[_a-z][_a-z0-9]*" but then computing pattern while keeping it as a class attribute is harder than I can work out right now. Forget it - let's keep it simple until someone shows a real need. Thanks for the sample, Paul. 
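Editorial sketch, not part of the thread: Paul's idea of factoring the identifier syntax into a class attribute and still computing the pattern per class can be illustrated as follows. This is not the 2.4a3 implementation. The class name SketchTemplate, the attribute idpattern, the helper _build_pattern and the substitute method are all hypothetical; only the group names (escaped, named, braced, bogus) are taken from the messages above. It relies on __init_subclass__, which needs Python 3.6 or later and did not exist in 2004.

```python
import re


class SketchTemplate:
    """Hypothetical $-template whose identifier syntax is a class attribute."""

    # Subclasses override only this; the full pattern is rebuilt for them.
    idpattern = r"[_a-z][_a-z0-9]*"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.pattern = cls._build_pattern()

    @classmethod
    def _build_pattern(cls):
        source = r"""
            (?P<escaped>\$\$)         |  # two dollar signs give a literal one
            \$(?P<named>%(id)s)       |  # dollar sign plus identifier
            \$\{(?P<braced>%(id)s)\}  |  # dollar sign plus braced identifier
            (?P<bogus>\$)                # any other dollar sign is an error
        """ % {"id": cls.idpattern}
        return re.compile(source, re.IGNORECASE | re.VERBOSE)

    def __init__(self, template):
        self.template = template

    def substitute(self, mapping):
        def convert(mo):
            if mo.group("escaped") is not None:
                return "$"
            if mo.group("bogus") is not None:
                raise ValueError("Invalid placeholder at index %d"
                                 % mo.start("bogus"))
            name = mo.group("named") or mo.group("braced")
            return str(mapping[name])
        return self.pattern.sub(convert, self.template)


# __init_subclass__ only fires for subclasses, so build the base pattern here.
SketchTemplate.pattern = SketchTemplate._build_pattern()


class DottedTemplate(SketchTemplate):
    # Only the identifier syntax changes; the other groups are inherited.
    idpattern = r"[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*"


if __name__ == "__main__":
    text = "This has a ${dotted.thing} in it"
    print(DottedTemplate(text).substitute({"dotted.thing": "dotted name"}))
```

The pattern stays a compiled class attribute, so the cost is paid once per class definition rather than per instance or per substitution.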
From bob at redivi.com Fri Sep 3 18:58:46 2004 From: bob at redivi.com (Bob Ippolito) Date: Fri Sep 3 18:58:52 2004 Subject: [Python-Dev] Coernic Desktop Search versus shutil.rmtree In-Reply-To: <1f7befae0409021830219eb2fe@mail.gmail.com> References: <1f7befae040901184316a8ebf6@mail.gmail.com> <20040902132002.GA13089@panix.com> <1f7befae04090207371f5b2142@mail.gmail.com> <20040903001101.GA16770@panix.com> <1f7befae0409021830219eb2fe@mail.gmail.com> Message-ID: <84EC7C1B-FDCA-11D8-95A7-000A95686CD8@redivi.com> On Sep 2, 2004, at 9:30 PM, Tim Peters wrote: > [Aahz] >> Question is, what happens when you use Explorer while Coernic is busy >> inside a folder? If it barfs, then I think it's reasonable for >> rmtree() >> to barf. Or are you saying that it's not possible to make that test? > > I didn't claim it was unreasonable for shutil.rmtree to barf, and I > have no interest in making that test. As mentioned before, Copernic's > use of the filesystem hooks drives CVS crazy too. It's a new app, and > using the filesystem hooks transparently is a subtle undertaking. > They'll fix it eventually. It could very well be a bug in Windows, too. I think I ran across one last night. Sometimes win32's os.stat(...) raises an exception for folders that *do* exist.
The only case I ran into was an iPod though. The iPod is a 3rd generation 15gb (read: not a new click wheel), FAT32 formatted (fresh restore from the win32 iPod Updater) managed by iTunes with "Enable disk use" on, and is plugged into a laptop with a "low speed" USB port (I think 1.1, whatever was before 2.0). It is mounted as 'F:\\' and it has several folders on the root (IIRC ['Notes', 'Calendars', 'Contacts', 'iPod_Control']). Nothing special about any of the folders, except iPod_Control which is attrib +h. os.stat fails on EVERY folder except 'F:\\iPod_Control'. os.listdir('F:\\') shows them all. win32api.GetFileAttributes works on all of these folders and returns just FILE_DIRECTORY (or whatever it's called, I don't have a win32 machine where I'm at right now; the constant is 16). iPod_Control probably returns a slightly different set of flags, I don't remember trying it. os.stat('F:\\Notes\\Instructions') succeeds (Instructions is a file). I believe this may be a bug in _wstat64i (name may be slightly wrong.. from memory) or something? I tried it on Windows XP (not SP2, but should be otherwise patched up) with the python.org distributions of Python 2.3.0 and Python 2.3.4 and got the same results both times. -bob From raymond.hettinger at verizon.net Fri Sep 3 20:51:52 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Fri Sep 3 20:52:37 2004 Subject: [Python-Dev] Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: <1094225932.8788.85.camel@geddy.wooz.org> Message-ID: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> > > By not inheriting from unicode, the bug can be fixed while retaining a > > class implementation (see sandbox\curry292.py for an example). > > > > But, be clear, it *is* a bug. > > > > If all the inputs are strings, Unicode should not magically appear. See > > all the other string methods as an example. > > But the Template classes aren't string methods, so I don't think the > analogy is quite right. 
Because the template string itself is by > definition a Unicode, it actually makes more sense that everything its > mod operator returns is also a Unicode. So I still don't think it's a > bug. Templates are not Unicode by definition. That is an arbitrary implementation quirk and a design flaw. The '%(key)s' forms do not behave this way. They return str unless one of the inputs are unicode. People should be able to use Python and not have to deal with Unicode unless that is an intentional part of their design. Unless there is some compelling advantage to going beyond the PEP and changing all the rules, it is a bug. Raymond From skip at pobox.com Fri Sep 3 21:21:18 2004 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 3 21:21:31 2004 Subject: [Python-Dev] Isn't the trunk still frozen? Message-ID: <16696.50222.258426.648284@montanaro.dyndns.org> Anthony gave me a mild virtual rap on the knuckles for checking in a doc change yesterday while the trunk was frozen for the release. Later on he asked if we could keep it frozen for a bit more. I just saw a bunch of checkin messages float by, but haven't seen an all-clear from Anthony. Isn't the trunk still frozen? Skip From bac at OCF.Berkeley.EDU Fri Sep 3 21:30:48 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Fri Sep 3 21:31:58 2004 Subject: [Python-Dev] Isn't the trunk still frozen? In-Reply-To: <16696.50222.258426.648284@montanaro.dyndns.org> References: <16696.50222.258426.648284@montanaro.dyndns.org> Message-ID: <4138C668.3080009@ocf.berkeley.edu> Skip Montanaro wrote: > Anthony gave me a mild virtual rap on the knuckles for checking in a doc > change yesterday while the trunk was frozen for the release. Later on he > asked if we could keep it frozen for a bit more. I just saw a bunch of > checkin messages float by, but haven't seen an all-clear from Anthony. > Isn't the trunk still frozen? 
> Although Anthony has not given the explicit go-ahead for checkins he said it would be okay after 14:00 UTC today (which is 9:00 CST) in another email to the list so it should be okay. Personally I am still waiting for Anthony to say it is okay to check in again. -Brett From mal at egenix.com Fri Sep 3 22:37:54 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Fri Sep 3 22:37:59 2004 Subject: [Python-Dev] Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> Message-ID: <4138D622.6050807@egenix.com> Raymond Hettinger wrote: >>>By not inheriting from unicode, the bug can be fixed while retaining a >>>class implementation (see sandbox\curry292.py for an example). >>> >>>But, be clear, it *is* a bug. >>> >>>If all the inputs are strings, Unicode should not magically appear. See >>>all the other string methods as an example. >> >>But the Template classes aren't string methods, so I don't think the >>analogy is quite right. Because the template string itself is by >>definition a Unicode, it actually makes more sense that everything its >>mod operator returns is also a Unicode. So I still don't think it's a >>bug. > > > Templates are not Unicode by definition. That is an arbitrary > implementation quirk and a design flaw. > > The '%(key)s' forms do not behave this way. They return str unless one > of the inputs are unicode. > > People should be able to use Python and not have to deal with Unicode > unless that is an intentional part of their design. > > Unless there is some compelling advantage to going beyond the PEP and > changing all the rules, it is a bug. I think Barry needs some backup here. First, please be aware that normal use of Templates is for formatting *text* data. 
Second, it is good design and good practice to store text data in Unicode objects, because that's what they were designed for, while string objects have always been an abstract container for storing bytes with varying meanings and interpretations. The latter is a design flaw that needs to get fixed, not the choice of Unicode as Template base class. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 03 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From bac at OCF.Berkeley.EDU Fri Sep 3 22:43:35 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Fri Sep 3 22:43:50 2004 Subject: [Python-Dev] Making custom patterns for string.Template easier (was: Alternative Implementation for PEP 292) In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> Message-ID: <4138D777.3090009@ocf.berkeley.edu> Moore, Paul wrote: [SNIP] > Hmm, you'd have to get fancy then, as the "obvious" approach is a > class attribute > > id = "[_a-z][_a-z0-9]*" > > but then computing pattern while keeping it as a class attribute is > harder than I can work out right now. > > Forget it - let's keep it simple until someone shows a real need. > OK, but it isn't *that* bad. 
I already have it so that the parts of the pattern can at least be separate and it leaves the class alone::

    >>> test = string.Template('This has a ${dotted.thing} in it')
    [26527 refs]
    >>> test.braced = "\${(?P<braced>[_a-z][_a-z0-9]*(\.[_a-z0-9]+)?)}"
    [26532 refs]
    >>> test % {'dotted.thing': "dotted name"}
    u'This has a dotted name in it'
    >>> string.Template('This has a ${dotted.thing} in it') % {'dotted.thing': "dotted named"}
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/Users/drifty/Code/CVS/python/dist/src/Lib/string.py", line 123, in __mod__
        return self.pattern.sub(convert, self)
      File "/Users/drifty/Code/CVS/python/dist/src/Lib/string.py", line 119, in convert
        raise ValueError('Invalid placeholder at index %d' %
    ValueError: Invalid placeholder at index 11

Making it so that one doesn't have to specify the extra stuff (such as braces, $, group name, etc.) would not be hard but could take away from the power of it all. But it does not in any way mess with the class and the class' regex is still compiled at class creation time so slowdown from anything only happens if someone changes something (did rip out the empty __slots__ value, though). But once again I don't know how useful it would be. The only thing coming off the top of my head is Raymond's Cheetah example of making the rules looser for bogus $s. With this you just need to substitute self.bogus. Nice thing about that is if we change the rules for the other pattern groups later on the past code will get that benefit instead of being locked into the pattern they probably copied at the time of writing and pasted in with their minor tweak. Also won't lead to errors down the road if we add another group to the pattern. Anyway, I did this partially as an exercise so not a huge deal to me if it doesn't make it in, so +0 from me for adding the functionality.
-Brett From anthony at interlink.com.au Fri Sep 3 22:51:13 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Sep 3 22:52:09 2004 Subject: [Python-Dev] Isn't the trunk still frozen? In-Reply-To: <4138C668.3080009@ocf.berkeley.edu> References: <16696.50222.258426.648284@montanaro.dyndns.org> <4138C668.3080009@ocf.berkeley.edu> Message-ID: <4138D941.4050608@interlink.com.au> Brett C. wrote: > Although Anthony has not given the explicit go-ahead for checkins he > said it would be okay after 14:00 UTC today (which is 9:00 CST) in > another email to the list so it should be okay. > > Personally I am still waiting for Anthony to say it is okay to check in > again. I'm very concerned by this bugreport: http://www.python.org/sf/1022010 My windows box has a dodgy RAID card, so I can't check it myself, but the -n -e is definitely in Tools/msi/msi.py. I have no idea whether they're meant to be - the msi stuff is a complete unknown to me. Martin? Anthony -- Anthony Baxter It's never too late to have a happy childhood. From tim.peters at gmail.com Fri Sep 3 23:01:08 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Sep 3 23:01:10 2004 Subject: [Python-Dev] Isn't the trunk still frozen? In-Reply-To: <4138D941.4050608@interlink.com.au> References: <16696.50222.258426.648284@montanaro.dyndns.org> <4138C668.3080009@ocf.berkeley.edu> <4138D941.4050608@interlink.com.au> Message-ID: <1f7befae040903140179f8919@mail.gmail.com> [Anthony Baxter] > I'm very concerned by this bugreport: > > http://www.python.org/sf/1022010 > > My windows box has a dodgy RAID card, so I can't check it myself, > but the -n -e is definitely in Tools/msi/msi.py. I have no idea > whether they're meant to be - the msi stuff is a complete unknown > to me. > > Martin? I think the bug report got it right: due to copy-'n-paste error, the associations for .py (etc) files were mistakenly given IDLE-specific arguments. So they don't work.
Everyone on c.l.py who noticed this (the bug report isn't unique) quickly figured out how to fix it themself (by editing the file assocations created on their box to get rid of the inappropriate arguments). Presumably Martin can fix that by building a new MSI installer after repairing the MSI setup. If he does that before doing a cvs up, it's "just" a matter of cutting a new Windows installer. From amk at amk.ca Sat Sep 4 00:13:39 2004 From: amk at amk.ca (A.M. Kuchling) Date: Sat Sep 4 00:13:44 2004 Subject: [Python-Dev] Isn't the trunk still frozen? In-Reply-To: <1f7befae040903140179f8919@mail.gmail.com> References: <16696.50222.258426.648284@montanaro.dyndns.org> <4138C668.3080009@ocf.berkeley.edu> <4138D941.4050608@interlink.com.au> <1f7befae040903140179f8919@mail.gmail.com> Message-ID: <20040903221339.GA5870@rogue.amk.ca> On Fri, Sep 03, 2004 at 05:01:08PM -0400, Tim Peters wrote: > Presumably Martin can fix that by building a new MSI installer after > repairing the MSI setup. If he does that before doing a cvs up, it's > "just" a matter of cutting a new Windows installer. Should Raymond's last random.py bugfix also be included, if there's going to be a quick 2.4a4? --amk From raymond.hettinger at verizon.net Sat Sep 4 00:36:28 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Sep 4 00:37:15 2004 Subject: [Python-Dev] Isn't the trunk still frozen? In-Reply-To: <20040903221339.GA5870@rogue.amk.ca> Message-ID: <001201c49206$74505fa0$e841fea9@oemcomputer> > > Presumably Martin can fix that by building a new MSI installer after > > repairing the MSI setup. If he does that before doing a cvs up, it's > > "just" a matter of cutting a new Windows installer. > > Should Raymond's last random.py bugfix also be included, if there's > going to be a quick 2.4a4? That's up to Anthony. It is a critical fix. 
Raymond From raymond.hettinger at verizon.net Sat Sep 4 01:15:17 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Sep 4 01:16:06 2004 Subject: [Python-Dev] Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: <4138D622.6050807@egenix.com> Message-ID: <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> [MAL] > Second, it is good design and good practice to store text > data in Unicode objects, because that's what they were designed for, > while string objects have always been an abstract container for storing > bytes with varying meanings and interpretations. IMO, it is subversive to start taking new string functions/methods and coercing their results to Unicode. Someday we may be there, Py3.0 perhaps, but str is not yet deprecated. Until then, a user should reasonably expect SISO str in, str out. This is doubly true when the rest of python makes active efforts to avoid SIUO (see % formatting and''.join() for example). Someday Guido may get wild and turn all text uses of str into unicode. Most likely, it will need a PEP so that all the issues get thought through and everything gets changed at once. Slipping this into the third alpha as if it were part of PEP292 is not a good idea. The PEP was about simplification. Tossing in unnecessary unicode coercions is not in line with that goal. Does anyone else think this is a crummy idea? Is everyone ready for unicode coercions to start sprouting everywhere? Raymond From aahz at pythoncraft.com Sat Sep 4 01:49:45 2004 From: aahz at pythoncraft.com (Aahz) Date: Sat Sep 4 01:49:47 2004 Subject: [Python-Dev] Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> References: <4138D622.6050807@egenix.com> <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> Message-ID: <20040903234945.GA5856@panix.com> On Fri, Sep 03, 2004, Raymond Hettinger wrote: > > The PEP was about simplification. 
Tossing in unnecessary unicode > coercions is not in line with that goal. > > Does anyone else think this is a crummy idea? > Is everyone ready for unicode coercions to start sprouting everywhere? +0 (agreeing with Raymond) Correct me if I'm wrong, but there are a couple of issues here: * First of all, I believe that unicode strings are interoperable (down to hashing) with 8-bit strings, as long as there are no non-7-bit ASCII characters. Where things get icky is with encoded 8-bit strings making use of e.g. Latin-1. So the question is whether we need full interoperability. * Unicode strings take four bytes per character (not counting decomposed characters). Is it fair at this point in Python's evolution to force this kind of change in performance metric, essentially silently? The PEP and docs do make the issue of Unicode fairly clear up-front, so anyone choosing to use template strings knows what zie is getting into. But what about someone grabbing a module that uses template strings internally?.... OTOH, I'm not up for making a big issue out of this. If Raymond really is the only person who feels strongly about it, it probably isn't going to be a big deal in practice. In addition, I think it's the kind of change that could be easily fixed in the next release. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I saw `cout' being shifted "Hello world" times to the left and stopped right there." --Steve Gonedes From martin at v.loewis.de Sat Sep 4 03:46:24 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Sep 4 03:46:21 2004 Subject: [Python-Dev] Isn't the trunk still frozen? In-Reply-To: <4138D941.4050608@interlink.com.au> References: <16696.50222.258426.648284@montanaro.dyndns.org> <4138C668.3080009@ocf.berkeley.edu> <4138D941.4050608@interlink.com.au> Message-ID: <41391E70.6000204@v.loewis.de> > My windows box has a dodgy RAID card, so I can't check it myself, but > the -n -e is definately in Tools/msi/msi.py. 
I have no ideas whether > they're meant to be - the msi stuff is a complete unknown to me. I've now corrected the MSI, and put a new version on http://www.dcl.hpi.uni-potsdam.de/home/loewis/python-2.4a3.2.msi 8e84ce2308613955b54673bbfb47697f python-2.4a3.2.msi This has the very same binaries as the previous package - just the packaging itself has changed (also bringing back "Edit with IDLE" in the context menu). Regards, Martin From fredrik at pythonware.com Sat Sep 4 08:23:26 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 4 08:21:44 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292:SimpleStringSubstitutions References: <4138D622.6050807@egenix.com> <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> Message-ID: Raymond Hettinger wrote: > > The PEP was about simplification. Tossing in unnecessary unicode > coercions is not in line with that goal. > > Does anyone else think this is a crummy idea? Yes. Whatever MAL and Barry thinks, Python's current model is 8+8=8, U+U=U, and 8+U=U for ascii U. That's an advantage, not a bug. > Is everyone ready for unicode coercions to start sprouting everywhere? No. And when that time comes, storing everything as 32-bit characters is not the right answer either. From mal at egenix.com Sat Sep 4 13:51:22 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Sat Sep 4 13:51:40 2004 Subject: [Python-Dev] Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> References: <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> Message-ID: <4139AC3A.8050501@egenix.com> Raymond Hettinger wrote: > [MAL] > >>Second, it is good design and good practice to store text >>data in Unicode objects, because that's what they were designed for, >>while string objects have always been an abstract container for >>storing bytes with varying meanings and interpretations. 
Hmm, I wonder why you cut away the first part: "First, please be aware that normal use of Templates is for formatting *text* data." This is the most important argument for making Template a Unicode-subclass. Coercion to Unicode then is a logical consequence and fully in line with what Python has been doing since version 1.6, ie. U=U+U and U=U+8 (to use /Fs notation). > IMO, it is subversive to start taking new string functions/methods and > coercing their results to Unicode. I don't understand... there's nothing subversive here. If strings meet Unicode the result gets coerced to Unicode. Nothing surprising here. Why are you guys putting so much effort into fighting Unicode ? I often get the impression that you are considering Unicode a nightmare rather than a blessing. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 04 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From mal at egenix.com Sat Sep 4 14:05:32 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Sat Sep 4 14:05:39 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292:SimpleStringSubstitutions In-Reply-To: References: <4138D622.6050807@egenix.com> <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> Message-ID: <4139AF8C.9040907@egenix.com> Fredrik Lundh wrote: > Raymond Hettinger wrote: > >>The PEP was about simplification. Tossing in unnecessary unicode >>coercions is not in line with that goal. >> >>Does anyone else think this is a crummy idea? > > > Yes. Whatever MAL and Barry thinks, Python's current model is 8+8=8, > U+U=U, and 8+U=U for ascii U. That's an advantage, not a bug. Indeed, but I don't see how that's different from what the PEP is saying. 
>>Is everyone ready for unicode coercions to start sprouting everywhere?
>
> No.
>
> And when that time comes, storing everything as 32-bit characters is not the
> right answer either.

I'll leave that for the libc designers to decide :-)

If you look at performance, there's not much difference between 8-bit strings and Unicode, so the only argument against using Unicode for storing text data is memory usage.

--
Marc-Andre Lemburg
eGenix.com

From martin at v.loewis.de Sat Sep 4 15:12:10 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat Sep 4 15:12:06 2004
Subject: [Python-Dev] random.py still broken wrt. urandom
Message-ID: <4139BF2A.6000907@v.loewis.de>

I consider the random module still broken in its current form (1.66). It tries to invoke random.urandom(1) in order to find out whether urandom works. Instead, it should defer that determination until urandom is actually used; i.e. instead of

    if _urandom is None:
        import time
        a = long(time.time() * 256) # use fractional seconds
    else:
        a = long(_hexlify(_urandom(16)), 16)

it should read

    try:
        a = long(_hexlify(os.urandom(16)), 16)
    except NotImplementedError:
        import time
        a = long(time.time() * 256) # use fractional seconds

IMO the patch to random.py should not have been applied without a review.
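The deferred check proposed above can be sketched in present-day Python 3 roughly as follows. This is only an illustration, not the code in random.py: the function name `initial_seed`, its `urandom` parameter and the `broken_urandom` stand-in are assumptions added so that the fallback path can be exercised.

```python
import binascii
import os
import time


def initial_seed(urandom=os.urandom):
    """Return an integer seed, asking for OS randomness only when called."""
    try:
        # 16 bytes of OS randomness, read as one large integer.
        return int(binascii.hexlify(urandom(16)), 16)
    except NotImplementedError:
        # No OS randomness source: fall back to the clock (fractional seconds).
        return int(time.time() * 256)


def broken_urandom(nbytes):
    # Hypothetical stand-in for a platform without a randomness source.
    raise NotImplementedError


seed_from_os = initial_seed()
seed_from_clock = initial_seed(broken_urandom)
```

Nothing is read at import time in this form; the availability question is answered by the call itself.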
Regards, Martin From fredrik at pythonware.com Sat Sep 4 15:20:55 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 4 15:19:08 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation forPEP 292:SimpleStringSubstitutions References: <4138D622.6050807@egenix.com> <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> <4139AF8C.9040907@egenix.com> Message-ID: M.-A. Lemburg wrote: >> Yes. Whatever MAL and Barry thinks, Python's current model is 8+8=8, >> U+U=U, and 8+U=U for ascii U. That's an advantage, not a bug. > > Indeed, but I don't see how that's different from what the PEP > is saying. the current implementation is T(8) % 8 = U. which violates the 8+8=8 rule. >> And when that time comes, storing everything as 32-bit characters is not the >> right answer either. > > I'll leave that for the libc designers to decide :-) > > If you look at performance, there's not much difference between > 8-bit strings and Unicode, so the only argument against using > Unicode for storing text data is memory usage. I used to make that argument, but these days, I no longer think that you can talk about performance without taking memory usage into account. From tim.peters at gmail.com Sat Sep 4 16:40:55 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sat Sep 4 16:40:58 2004 Subject: [Python-Dev] random.py still broken wrt. urandom In-Reply-To: <4139BF2A.6000907@v.loewis.de> References: <4139BF2A.6000907@v.loewis.de> Message-ID: <1f7befae040904074040c1ba4e@mail.gmail.com> [Martin v. L?wis] > I consider the random module still broken in its current form (1.66). > It tries to invoke random.urandom(1) in order to find out whether > urandom works. Instead, it should defer that determination until > urandom is actually used; Why? > i.e. 
instead of
>
>     if _urandom is None:
>         import time
>         a = long(time.time() * 256) # use fractional seconds
>     else:
>         a = long(_hexlify(_urandom(16)), 16)
>
> it should read
>
>     try:
>         a = long(_hexlify(os.urandom(16)), 16)
>     except NotImplementedError:
>         import time
>         a = long(time.time() * 256) # use fractional seconds

Why? I like it better the way it is, in part because this kind of determination is made at least 4 times in random.py, and the "_urandom is None" spelling is quite clear. The

    from binascii import hexlify as _hexlify

import certainly doesn't belong in the try/except block setting that up, though.

> IMO the patch to random.py should not have been applied without a
> review.

I think that falls under the "expert rule": Raymond has done more work on random.py than everyone else combined over the last year or two, and he had no reason to suspect this change would be controversial. To the contrary, I specifically suggested (on python-dev) that using urandom in seed() methods, when available, would be a significant improvement over time.time()-based seeding.

Now that you've made your objection, I confess I still have no idea why you're objecting (see "why?"). I did review the patch (after the fact) for numeric correctness (which did lead to changing the code, due to a subtle numeric flaw in the original HardwareRandom.random).

From martin at v.loewis.de Sat Sep 4 17:55:15 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat Sep 4 17:55:12 2004
Subject: [Python-Dev] random.py still broken wrt. urandom
In-Reply-To: <1f7befae040904074040c1ba4e@mail.gmail.com>
References: <4139BF2A.6000907@v.loewis.de> <1f7befae040904074040c1ba4e@mail.gmail.com>
Message-ID: <4139E563.8010506@v.loewis.de>

Tim Peters wrote:
>>I consider the random module still broken in its current form (1.66).
>>It tries to invoke random.urandom(1) in order to find out whether
>>urandom works.
Instead, it should defer that determination until >>urandom is actually used; > > > Why? Invoking urandom() causes /dev/urandom to be opened on Unix (if available). I really would prefer if merely importing the random module would not open files (and keep them open for the entire Python run). Operations like this contribute to startup time. Now, since the random module also creates a Random object, importing random would still open /dev/urandom even if the logic dealing with its absence was somewhat deferred - unless seeding the RNG would also be deferred until it is first used. Still, consuming randomness just to determine whether it is available seems wrong. Importing a module should not affect system state unless absolutely necessary. > I think that falls under the "expert rule": Raymond has done more > work on random.py than everyone else combined over the last year or > two, and he had no reason to suspect this change would be > controversial. To the contrary, I specifically suggested (on > python-dev) that using urandom in seed() methods, when available, > would be a significant improvement over time.time()-based seeding. I have no problem with that aspect of the change. However, I wish somebody had noticed that unavailability of urandom is expressed through a NotImplementedError, not through absence of the function itself. It is bad luck that this state of the code was released as 2.4a3. Regards, Martin From barry at python.org Sat Sep 4 18:25:38 2004 From: barry at python.org (Barry Warsaw) Date: Sat Sep 4 18:25:46 2004 Subject: [Python-Dev] Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: <4138D622.6050807@egenix.com> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> Message-ID: <1094315138.8696.36.camel@geddy.wooz.org> On Fri, 2004-09-03 at 16:37, M.-A. Lemburg wrote: > I think Barry needs some backup here. Thanks MAL! 
I'll point out that Template was very deliberately subclassed from unicode, so Template instances /are/ unicode objects. From the standpoint of type conversion, using /F's notation, T(8) == U, thus because U % 8 == U, T(8) % 8 == U.

Other than .encode() are there any other methods of unicode objects that return 8bit strings? I don't think so, so it seems completely natural that T % 8 returns U.

Raymond is against the class-based implementation of PEP 292, but if you accept the class implementation of 292 (which I still believe is the right choice), then the fact that the mod operator always returns a unicode makes perfect sense.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/python-dev/attachments/20040904/7a62acc7/attachment.pgp

From barry at python.org Sat Sep 4 18:30:45 2004
From: barry at python.org (Barry Warsaw)
Date: Sat Sep 4 18:30:50 2004
Subject: [Python-Dev] Alternative Implementation for PEP 292:SimpleString Substitutions
In-Reply-To: <4139AC3A.8050501@egenix.com>
References: <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> <4139AC3A.8050501@egenix.com>
Message-ID: <1094315445.8693.40.camel@geddy.wooz.org>

On Sat, 2004-09-04 at 07:51, M.-A. Lemburg wrote:
> Why are you guys putting so much effort into fighting
> Unicode ? I often get the impression that you are considering
> Unicode a nightmare rather than a blessing.

Indeed. For example, the only way to maintain your sanity in an i18n'd application is to convert all text[1] to unicode as early as possible, deal with only unicode internally, and encode to 8bit strings as late as possible, if ever.

-Barry

[1] "text" defined as "strings intended for human consumption".

From barry at python.org Sat Sep 4 18:32:39 2004
From: barry at python.org (Barry Warsaw)
Date: Sat Sep 4 18:32:42 2004
Subject: [Python-Dev] Re: Alternative Implementation for PEP 292:Simple String Substitutions
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com>
References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com>
Message-ID: <1094315559.8696.42.camel@geddy.wooz.org>

On Fri, 2004-09-03 at 11:49, Moore, Paul wrote:
> Would it be useful to factor out the "identifier syntax" bit of the
> pattern? The "escaped" and "bogus" groups are less likely to need
> changing than what constitutes an identifier.

And if they did, you'd want to change them both at the same time. Do you have any ideas for an efficient, easily documented implementation?

-Barry

From barry at python.org Sat Sep 4 18:33:36 2004
From: barry at python.org (Barry Warsaw)
Date: Sat Sep 4 18:33:39 2004
Subject: [Python-Dev] Alternative placeholder delimiters for PEP 292
In-Reply-To: <59e9fd3a040829231141cd3fe4@mail.gmail.com>
References: <59e9fd3a040829231141cd3fe4@mail.gmail.com>
Message-ID: <1094315616.8721.44.camel@geddy.wooz.org>

On Mon, 2004-08-30 at 02:11, Andrew Durdin wrote:
> I propose that the Template module not use $ to set off
> placeholders; instead, placeholders are delimited by braces {}.
> The following rules for {}-placeholders apply:

The PEP 292 rules were specifically chosen for their similarity to placeholder syntaxes in many other languages.

-Barry

From barry at python.org Sat Sep 4 18:54:18 2004
From: barry at python.org (Barry Warsaw)
Date: Sat Sep 4 18:54:22 2004
Subject: [Python-Dev] Re: Making custom patterns for string.Template easier (was: Alternative Implementation for PEP 292)
In-Reply-To: <4138D777.3090009@ocf.berkeley.edu>
References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> <4138D777.3090009@ocf.berkeley.edu>
Message-ID: <1094316858.8696.58.camel@geddy.wooz.org>

On Fri, 2004-09-03 at 16:43, Brett C. wrote:
> Anyway, I did this partially as an exercise so not a huge deal to me if
> it doesn't make it in, so +0 from me for adding the functionality.

So, where's the code?! :)

-Barry

From raymond.hettinger at verizon.net Sat Sep 4 22:03:25 2004
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Sat Sep 4 22:04:12 2004
Subject: [Python-Dev] Alternative Implementation for PEP292:SimpleString Substitutions
In-Reply-To: <1094315138.8696.36.camel@geddy.wooz.org>
Message-ID: <006a01c492ba$3d90f920$e841fea9@oemcomputer>

> Other than .encode() are there any other methods of unicode objects that
> return 8bit strings?

That misses the point. Templates do not have to be unicode objects.
Templates can be their own class rather than a subclass of unicode. The application does not demand that unicode be mentioned at all.

There seems to be a strong "just live with it" argument but no advantages are offered other than it matching your personal approach to text handling. Why force it when you don't have to? At least three of your users (me, Aahz, and Fred) do not want unicode output when we have str inputs.

> Raymond is against the class-based implementation of PEP 292,

That choice is independent of the decision of whether to always coerce to unicode.

Also, it is more accurate to say that I think the __mod__ operator is not ideal. If you want to stay with classes, Guido's __call__ syntax is also fine. It avoids the issues with %, makes it possible to have keyword arguments, and lets you take advantage of polymorphism.

The % operator has several issues:

* it is mnemonic for %(name)s substitution not $ formatting.
* it is hard to find in the docs
* it does not accept tuple/scalar arguments like % formatting
* its precedence is more appropriate for int.__mod__

Raymond

From adurdin at gmail.com Sun Sep 5 00:23:49 2004
From: adurdin at gmail.com (Andrew Durdin)
Date: Sun Sep 5 00:23:52 2004
Subject: [Python-Dev] Alternative placeholder delimiters for PEP 292
In-Reply-To: <59e9fd3a0409041520114d0604@mail.gmail.com>
References: <59e9fd3a040829231141cd3fe4@mail.gmail.com> <1094315616.8721.44.camel@geddy.wooz.org> <59e9fd3a0409041520114d0604@mail.gmail.com>
Message-ID: <59e9fd3a04090415236f9bc0ac@mail.gmail.com>

On Sat, 04 Sep 2004 12:33:36 -0400, Barry Warsaw wrote:
> On Mon, 2004-08-30 at 02:11, Andrew Durdin wrote:
>
> > I propose that the Template module not use $ to set off
> > placeholders; instead, placeholders are delimited by braces {}.
> > The following rules for {}-placeholders apply:
>
> The PEP 292 rules were specifically chosen for their similarity to
> placeholder syntaxes in many other languages.

Sure.
But just because many other languages do it that way doesn't mean that it's the best way for Python. There are significant advantages to using paired delimiters instead of a single prefix delimiter. The "Rationale" section of PEP 292 says only that the desire was for something simpler than the built-in % substitution; if the similarity to many other languages is also an important part of the rationale, then the PEP should be modified to take that into account, should it not? From bac at OCF.Berkeley.EDU Sun Sep 5 01:22:26 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Sep 5 01:22:30 2004 Subject: [Python-Dev] Re: Making custom patterns for string.Template easier (was: Alternative Implementation for PEP 292) In-Reply-To: <1094316858.8696.58.camel@geddy.wooz.org> References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> <4138D777.3090009@ocf.berkeley.edu> <1094316858.8696.58.camel@geddy.wooz.org> Message-ID: <413A4E32.2010104@ocf.berkeley.edu> Barry Warsaw wrote: > On Fri, 2004-09-03 at 16:43, Brett C. wrote: > > >>Anyway, I did this partially as an exercise so not a huge deal to me if >>it doesn't make it in, so +0 from me for adding the functionality. > > > So, where's the code?! :) > Not on SF since the bloody thing is down! I will stick it up on a patch, assign to you, and report the tracker # here as soon as it is back up. -Brett From tim.peters at gmail.com Sun Sep 5 06:32:33 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sun Sep 5 06:32:37 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292:Simple String Substitutions In-Reply-To: <1094315559.8696.42.camel@geddy.wooz.org> References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> <1094315559.8696.42.camel@geddy.wooz.org> Message-ID: <1f7befae040904213277ffa84@mail.gmail.com> [Paul Moore] >> Would it be useful to factor out the "identifier syntax" bit of the >> pattern? 
The "escaped" and "bogus" groups are less likely to need
>> changing than what constitutes an identifier.

[Barry Warsaw]
> And if they did, you'd want to change them both at the same time. Do
> you have any ideas for an efficient, easily documented implementation?

You'll rarely hear me say this, but fiddling classes at class creation time is exactly what metaclasses are for. For example, suppose you said a Template subclass could define a class variable `idpat`, containing a regexp matching that subclass's idea of "an identifier". Then we could define a metaclass once-and-for-all, like so:

class _TemplateFiddler(type):
    pattern = r"""
        (?P<escaped>\${2})|     # Escape sequence of two $ signs
        \$(?P<named>%s)|        # $ and a Python identifier
        \${(?P<braced>%s)}|     # $ and a brace delimited identifier
        (?P<bogus>\$)           # Other ill-formed $ expressions
    """

    def __init__(cls, name, bases, dct):
        super(_TemplateFiddler, cls).__init__(name, bases, dct)
        idpat = cls.idpat
        cls.pattern = _re.compile(_TemplateFiddler.pattern % (idpat, idpat),
                                  _re.IGNORECASE | _re.VERBOSE)

That substitutes the idpat regexp into the base pattern in two spots, compiles it, and attaches the result as the `pattern` attribute of the class being defined. The definition of Template changes like so:

class Template(unicode):  # same
    """A string class for supporting $-substitutions."""  # same
    __metaclass__ = _TemplateFiddler  # this is new
    __slots__ = []  # same
    idpat = r'[_a-z][_a-z0-9]*'  # this replaces the current `pattern`
    # The rest is the same.

While the implementation relies on understanding metaclasses, users don't have to know about that. The docs are easy ("define a class vrbl `idpat`"), and it's as efficient as if subclasses had compiled the full regexp themselves. Indeed, you can do any amount of computation once in the metaclass __init__, and cache the results in attributes of the class.
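A self-contained restatement of the sketch above for present-day Python 3, offered as an illustration rather than as the code that went into string.py. Assumptions: the class inherits from str instead of unicode, the `metaclass=` keyword replaces `__metaclass__`, and the `substitute` method and the `DottedTemplate` subclass are made up here to show the compiled pattern in use.

```python
import re


class _TemplateFiddler(type):
    # Base pattern; each %s is filled in with the class's `idpat`.
    pattern = r"""
        (?P<escaped>\${2})   |  # escape sequence of two $ signs
        \$(?P<named>%s)      |  # $ and an identifier
        \${(?P<braced>%s)}   |  # $ and a brace-delimited identifier
        (?P<bogus>\$)           # any other ill-formed $ expression
    """

    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        idpat = cls.idpat
        # Compiled once per class, when the class statement runs.
        cls.pattern = re.compile(_TemplateFiddler.pattern % (idpat, idpat),
                                 re.IGNORECASE | re.VERBOSE)


class Template(str, metaclass=_TemplateFiddler):
    """A string class supporting $-substitutions."""
    idpat = r'[_a-z][_a-z0-9]*'

    def substitute(self, mapping):
        # Hypothetical helper standing in for the % operator of the original.
        def convert(mo):
            if mo.group('escaped') is not None:
                return '$'
            if mo.group('bogus') is not None:
                raise ValueError('Invalid placeholder at index %d'
                                 % mo.start('bogus'))
            name = mo.group('named') or mo.group('braced')
            return str(mapping[name])
        return self.pattern.sub(convert, self)


class DottedTemplate(Template):
    # Hypothetical subclass: only the identifier rule is overridden.
    idpat = r'[_a-z][_a-z0-9]*(?:\.[_a-z0-9]+)?'


plain = Template('$who owes $$5 to ${whom}').substitute(
    {'who': 'tim', 'whom': 'barry'})
dotted = DottedTemplate('This has a ${dotted.thing} in it').substitute(
    {'dotted.thing': 'dotted name'})
```

The subclass never touches a regular expression beyond its own identifier rule; the metaclass rebuilds and compiles the full pattern when the subclass is defined.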
From tim.peters at gmail.com Sun Sep 5 07:02:35 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sun Sep 5 07:02:38 2004 Subject: [Python-Dev] Dangerous exceptions (was Re: Another test_compiler mystery) In-Reply-To: <20040816112916.GA19969@vicky.ecs.soton.ac.uk> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> Message-ID: <1f7befae04090422024afaee58@mail.gmail.com> [Armin Rigo] > ... Here is a patch attempting to do what I described: > http://www.python.org/sf/1009929 > > It's an extension of the asynchronous exception mecanism used to signal > between threads. PyErr_Clear() can send some exceptions to its own thread > using this mecanism. (So it is thread-safe.) I'm sorry that I haven't had time to look at this. But since I didn't and don't, let's try to complicate it . Some exceptions should never be suppressed unless named explicitly, and a real bitch is that some user-defined exceptions can fit in that category too. The ones that give me (and my employer) the most grief are the tree of exceptions deriving from ZODB's ConflictError. ConflictError is a serious thing: it essentially means the current transaction cannot succeed, and the app should give up (and maybe retry the current transaction from its start). Suppressing ConflictError by accident-- even inside a hasattr() call! --can grossly reduce efficiency, and has a long history too of provoking subtle, catastrophic, database corruption bugs. I would like to see Python's exception hierarchy grow more sophisticated in this respect. MemoryError, SystemExit, and KeyboardInterrupt are things that should not be caught by "except Exception:", neither by a bare "except:", nor by hasattr() or C-level dict lookup. 
ZODB's ConflictError is another of that ilk. I'd like to see "except Exception:" become synonymous with bare "except:", and move the "dangerous exceptions" to subclass off a new branch of the exception hierarchy. It could be that something like your patch is the only practical way to make this work in the C implementation, so I'm keen on it. From raymond.hettinger at verizon.net Sun Sep 5 07:07:35 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sun Sep 5 07:08:24 2004 Subject: [Python-Dev] decorator support Message-ID: <001f01c49306$41f86060$e841fea9@oemcomputer> In my experiments with decorators, it is common to wrap the original function with a new function. After creating the new function, there are efforts to make it look like the old: newf.__doc__ = oldf.__doc__ # copy the docstring newf.__dict__.update(oldf.__dict__) # copy attributes newf.__name__ = oldf.__name__ # keep the name (new in Py2.4) All is well and good except the argspec. Running help() on the new function gives: funcname(*args, **kwds) The original docstring Running help() on the original function gives: funcname(arg1, arg2) The original docstring So, it would be nice if there were some support for carrying forward the argspec to inform help(), calltips(), and inspect(). FWIW, I do know that with sufficient gyrations a decorator could do this on its own, but it is way too difficult for general use. Raymond From anthony at interlink.com.au Sun Sep 5 09:51:07 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Sep 5 09:51:55 2004 Subject: [Python-Dev] random.py fixage Message-ID: <413AC56B.5060308@interlink.com.au> I've made a patch (from CVS) for the random.py breakage and linked it from the top of the 2.4 page. I'll be sending an email out shortly with this and the new windows installer availability. I thought about cutting a 2.4a4, but decided against it. 
For one thing, it'd mean Martin would need to cut more Windows installers, if we don't want to end up with mismatched windows installers and tarballs. For another, it's an _alpha_ release, and so it's not as vital as if it'd been something like a 2.3.4 bug, or a 2.4 final. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From fredrik at pythonware.com Sun Sep 5 10:26:28 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Sep 5 10:24:45 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> Message-ID: Barry wrote: > I'll point out that Template was very deliberately subclassed from > unicode, so Template instances /are/ unicode objects. From the > standpoint of type conversion, using /F's notation, T(8) == U, thus > because U % 8 == U, T(8) % 8 == U. from a user perspective, there's no reason to make templates a sub- class of unicode, so the rest of your argument is irrelevant. instead of looking at use patterns, you're stuck defending the existing code. that's not a good way to design usable code. From anthony at python.org Sun Sep 5 09:58:15 2004 From: anthony at python.org (Anthony Baxter) Date: Sun Sep 5 10:34:09 2004 Subject: [Python-Dev] UPDATE: New 2.4a3 Windows installer and random.py patch available Message-ID: <413AC717.5010707@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Due to a goof in the packaging scripts, the Windows installer that was released on Friday for 2.4a3 broke file associations for .py files. There's a fixed installer (python-2.4a3.2.msi) available from the Python 2.4 web page. There's also a patch for the breakage for random.py on systems that don't have support for the new os.urandom() call. This is also available from the Python 2.4 web page. 
~ http://www.python.org/2.4/ We apologise to those affected by these bugs, and I'd like to thank the folks who downloaded the release and let us know about the problems so promptly. Anthony Baxter anthony@python.org Python Release Manager (on behalf of the entire python-dev team) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBOscVDt3F8mpFyBYRAk2xAJ9KcxQf5JOTmD6dpOiBShe/8jnWSwCgnL/S hN44eBC0mqygEhltFK2W9Ig= =W445 -----END PGP SIGNATURE----- From fredrik at pythonware.com Sun Sep 5 10:36:51 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Sep 5 10:35:08 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleStringSubstitutions References: <1094315138.8696.36.camel@geddy.wooz.org> <006a01c492ba$3d90f920$e841fea9@oemcomputer> Message-ID: Raymond Hettinger wrote: > There seems to be a strong "just live with it" argument but no > advantages are offered other than it matching your personal approach to > text handling. Why force it when you don't have to. At least three of > your users (me, Aahz, and Fred) do not want unicode output when we have > str inputs. one of which wrote the original unicode implementation, and the mixed-type regular expression engine used to implement templates, and a very popular XML library that successfully uses mixed-type text to handle text faster and using less memory than all other Python XML libraries. I've shown over and over again that Unicode-aware text handling in Python doesn't have to be slow and bloated; I'd prefer if we kept it that way. 

From martin at v.loewis.de Sun Sep 5 11:16:45 2004
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun Sep 5 11:16:41 2004
Subject: [Python-Dev] decorator support
In-Reply-To: <001f01c49306$41f86060$e841fea9@oemcomputer>
References: <001f01c49306$41f86060$e841fea9@oemcomputer>
Message-ID: <413AD97D.9060301@v.loewis.de>

Raymond Hettinger wrote:
> So, it would be nice if there were some support for carrying forward the
> argspec to inform help(), calltips(), and inspect().

What were you thinking of? I could imagine a predefined class, such as

    class copyfuncattrs:
        def __init__(self, f):
            self.f = f
        def __call__(self, func):
            res = self.f(func)
            res.__name__ = func.__name__
            res.__doc__ = func.__doc__
            res.__dict__.update(func.__dict__)
            return res

This could be used to define a decorator

    @copyfuncattrs
    def trace(f):
        def do_trace(*args):
            print "invoking", f.__name__, args
            return f(*args)
        return do_trace

which in turn could be used to decorate a function

    @trace
    def hello():
        "Print a nice greeting"
        print "Hello, world"

which in turn could be called and inspected

    hello()
    print hello.__doc__

Then, the question is where copyfuncattrs should live, and I would
object that to be yet another builtin.

Regards,
Martin

From raymond.hettinger at verizon.net Sun Sep 5 11:24:13 2004
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Sun Sep 5 11:25:08 2004
Subject: [Python-Dev] decorator support
In-Reply-To: <413AD97D.9060301@v.loewis.de>
Message-ID: <000d01c4932a$1c912400$e841fea9@oemcomputer>

[Martin]
> Then, the question is where copyfuncattrs should live

I imagine that a number of useful recipes like this will emerge over the
next few months and need to be collected in a module. For now, a wiki
might be a good idea.

Raymond

From anthony at interlink.com.au Sun Sep 5 11:50:09 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Sep 5 11:50:31 2004
Subject: [Python-Dev] decorator support
In-Reply-To: <413AD97D.9060301@v.loewis.de>
References: <001f01c49306$41f86060$e841fea9@oemcomputer> <413AD97D.9060301@v.loewis.de>
Message-ID: <413AE151.9090309@interlink.com.au>

Martin v. Löwis wrote:
> Then, the question is where copyfuncattrs should live, and I would
> object that to be yet another builtin.

I think it's very likely that in 2.5 we'll have some sort of
'decorators' module that captures these sorts of things. I
don't think it's likely we'll know enough about the various ins
and outs of decorators to want to put something in 2.4.

Anthony

--
Anthony Baxter
It's never too late to have a happy childhood.

From jacobs at theopalgroup.com Sun Sep 5 15:13:41 2004
From: jacobs at theopalgroup.com (Kevin Jacobs)
Date: Sun Sep 5 15:13:48 2004
Subject: [Python-Dev] decorator support
In-Reply-To: <001f01c49306$41f86060$e841fea9@oemcomputer>
References: <001f01c49306$41f86060$e841fea9@oemcomputer>
Message-ID: <413B1105.4070305@theopalgroup.com>

Raymond Hettinger wrote:

>In my experiments with decorators, it is common to wrap the original
>function with a new function.
>[...]
>So, it would be nice if there were some support for carrying forward the
>argspec to inform help(), calltips(), and inspect().
>
>FWIW, I do know that with sufficient gyrations a decorator could do this
>on its own, but it is way too difficult for general use.
>
>

The way I look at it, this is a situation analogous to symbolic links at
the filesystem level. My first intuition is to add a function attribute,
say func.__proxyfor__ that indicates that a function is a proxy for
another. That way, introspection and reflection tools that understand
that attribute can extract information from both the proxy and the
proxied functions without too much behind the scenes black magic.
Also note that this is not just an issue with decorators -- I run into it frequently when writing metaclasses. I tend not to copy the proxied functions dictionary, attributes, etc. Rather, I re-introduce the proxied function into the class namespace under a different name. e.g., a Synchronized metaclass, which implements thread synchronization, replaces each method with a proxy that performs the locking gymnastics. However, the original methods are accessible via '__unlocked'. This allows documentation and introspection tools to find and output information on the true function signatures. All that is missing is the '__proxyfor__' attribute to link them and tools that interpret it. I realize that decorators are limited to dealing with one name binding at a time without resorting to _getframe (unlike metaclasses, which can rummage through the entire class with impunity). Thus, I'd like to keep brainstorming on how to address this issue. Once we have something good, I'm even up for contributing some of the necessary code. -Kevin From edcjones at erols.com Sun Sep 5 15:57:36 2004 From: edcjones at erols.com (Edward C. Jones) Date: Sun Sep 5 16:03:57 2004 Subject: [Python-Dev] Re: Alternative placeholder delimiters for PEP 292 In-Reply-To: <20040905082502.9CEB41E4007@bag.python.org> References: <20040905082502.9CEB41E4007@bag.python.org> Message-ID: <413B1B50.9010803@erols.com> On Sat, 04 Sep 2004, Barry Warsaw wrote: >On Mon, 2004-08-30 at 02:11, Andrew Durdin wrote: > >> I propose that the Template module not use $ to set off >> placeholders; instead, placeholders are delimited by braces {}. >> The following rules for {}-placeholders apply: >> >> > >The PEP 292 rules were specifically chosen for their similarity to >placeholder syntaxes in many other languages. > > Just because other languages use "$" does not mean that Python should. IMHO, lines of code with "$"s in them do not read with a smooth flow. They start to look like regex or Perl. 
I suggest "<@...@>" which I think reads more smoothly. If "$" is too locked in to change, please allow users to change the default. From barry at python.org Sun Sep 5 17:25:28 2004 From: barry at python.org (Barry Warsaw) Date: Sun Sep 5 17:25:37 2004 Subject: [Python-Dev] Alternative Implementation for PEP292:SimpleString Substitutions In-Reply-To: <006a01c492ba$3d90f920$e841fea9@oemcomputer> References: <006a01c492ba$3d90f920$e841fea9@oemcomputer> Message-ID: <1094397928.8145.35.camel@geddy.wooz.org> On Sat, 2004-09-04 at 16:03, Raymond Hettinger wrote: > > Other than .encode() are there any other methods of unicode objects > that > > return 8bit strings? > > That misses the point. Templates do not have to be unicode objects. But it's damn convenient for them to be though. Please read the Internationalization section of the PEP. In addition to being able to use them directly as gettext catalog keys, I think there will be /a lot/ of scenarios where you won't want to care whether you have a Template or a unicode -- you will just want to treat everything as a unicode string without having to do tedious type checking. > There seems to be a strong "just live with it" argument but no > advantages are offered other than it matching your personal approach to > text handling. > Why force it when you don't have to. At least three of > your users (me, Aahz, and Fred) do not want unicode output when we have > str inputs. PEP 292 was a direct outgrowth of my experience in trying to internationalize an application and make it (much) easier for my translators to contribute. Many of them are not Python gurus and the existing % syntax is clearly a common tripping point. I'm convinced that the current design of PEP 292 is right for the use cases I originally designed it for. To be generous, if the three of you disagree, then it's because you have other requirements. That's fine; maybe they're just incompatible with mine. 
Maybe I did a poor job of explaining how my uses cases lead to the design of PEP 292. If all that's true, then PEP 292 can't be made general enough and should be rejected, and the code should be ripped out of the standard library. Let applications use whatever is appropriate for their own uses cases. Because PEP 292 is a library addition, Python itself won't suffer in the least. The implementations you proposed won't be of any use to me. Fortunately, the archives will be replete with all the alternatives for future software archaeologists. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040905/94f595a1/attachment.pgp From barry at python.org Sun Sep 5 17:26:49 2004 From: barry at python.org (Barry Warsaw) Date: Sun Sep 5 17:26:52 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> Message-ID: <1094398009.8144.37.camel@geddy.wooz.org> On Sun, 2004-09-05 at 04:26, Fredrik Lundh wrote: > from a user perspective, there's no reason to make templates a sub- > class of unicode, so the rest of your argument is irrelevant. Not true. I had a very specific reason for making Templates subclasses of unicode. Read the Internationalization section of the PEP. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040905/6e10965b/attachment.pgp From barry at python.org Sun Sep 5 17:30:56 2004 From: barry at python.org (Barry Warsaw) Date: Sun Sep 5 17:31:00 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292:Simple String Substitutions In-Reply-To: <1f7befae040904213277ffa84@mail.gmail.com> References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> <1094315559.8696.42.camel@geddy.wooz.org> <1f7befae040904213277ffa84@mail.gmail.com> Message-ID: <1094398256.8140.41.camel@geddy.wooz.org> On Sun, 2004-09-05 at 00:32, Tim Peters wrote: > You'll rarely hear me say this , but fiddling classes at class > creation time is exactly what metaclasses are for. For example, > suppose you said a Template subclass could define a class variable > `idpat`, containing a regexp matching that subclass's idea of "an > identifier". Very cool idea, thanks Tim! I like that much more than forcing users to specify the entire pattern. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040905/b6b0bbb3/attachment.pgp From fredrik at pythonware.com Sun Sep 5 17:58:14 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Sep 5 17:56:36 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <1094398009.8144.37.camel@geddy.wooz.org> Message-ID: Barry wrote: > Not true. I had a very specific reason for making Templates subclasses > of unicode. Read the Internationalization section of the PEP. 
this section? The implementation supports internationalization magic by keeping the original string value intact. In fact, all the work of the special substitution rules are implemented by overriding the __mod__() operator. However the string value of a Template (or SafeTemplate) is the string that was passed to its constructor. This approach allows a gettext-based internationalized program to use the Template instance as a lookup into the catalog; in fact gettext doesn't care that the catalog key is a Template. Because the value of the Template is the original $-string, translators also never need to use %-strings. The right thing will happen at run-time. I don't follow: if you're passing a template to gettext, do you really get a template back? if that's really the case, is being able to write "_(Template(x))" really that much of an advantage over writing "Template(_(x))" ? if that's really the case, you can still get the same effect from a template factory function, or a trivial modification of gettext. From pf_moore at yahoo.co.uk Sun Sep 5 18:13:57 2004 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Sun Sep 5 18:14:02 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292:Simple String Substitutions References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> <1094315559.8696.42.camel@geddy.wooz.org> <1f7befae040904213277ffa84@mail.gmail.com> Message-ID: Tim Peters writes: > [Paul Moore] >>> Would it be useful to factor out the "identifier syntax" bit of the >>> pattern? The "escaped" and "bogus" groups are less likely to need >>> changing than what constitutes an identifier. > > [Barry Warsaw] >> And if they did, you'd want to change them both at the same time. Do >> you have any ideas for an efficient, easily documented implementation? > > You'll rarely hear me say this , but fiddling classes at class > creation time is exactly what metaclasses are for. 
For example, > suppose you said a Template subclass could define a class variable > `idpat`, containing a regexp matching that subclass's idea of "an > identifier". > > Then we could define a metaclass once-and-for-all, like so: [...] That's exactly the type of implementation I was thinking about, but didn't have the knowledge to implement myself! Thanks for doing my thinking for me, Tim :-) Paul. -- The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair. -- Douglas Adams From mal at egenix.com Sun Sep 5 18:48:55 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Sun Sep 5 18:49:04 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation forPEP 292:SimpleStringSubstitutions In-Reply-To: References: <4138D622.6050807@egenix.com> <001f01c4920b$e0ac4ba0$e841fea9@oemcomputer> <4139AF8C.9040907@egenix.com> Message-ID: <413B4377.6060307@egenix.com> Fredrik Lundh wrote: > M.-A. Lemburg wrote: > > >>>Yes. Whatever MAL and Barry thinks, Python's current model is 8+8=8, >>>U+U=U, and 8+U=U for ascii U. That's an advantage, not a bug. >> >>Indeed, but I don't see how that's different from what the PEP >>is saying. > > > the current implementation is > > T(8) % 8 = U. > > which violates the 8+8=8 rule. T is a sub-class of Unicode, so you have: U % 8 = U which is just fine. >>>And when that time comes, storing everything as 32-bit characters is not the >>>right answer either. >> >>I'll leave that for the libc designers to decide :-) >> >>If you look at performance, there's not much difference between >>8-bit strings and Unicode, so the only argument against using >>Unicode for storing text data is memory usage. > > I used to make that argument, but these days, I no longer think that you can > talk about performance without taking memory usage into account. You always have to take both into account. 
I was just saying that 8-bit strings don't buy you much in terms of performance over Unicode these days, so the only argument against using Unicode would be doubled memory usage. Of course, this is a rather mild argument given the problems you face when trying to localize applications - which I see as the main use case for templates. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 05 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From edloper at gradient.cis.upenn.edu Sun Sep 5 20:17:31 2004 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Sun Sep 5 20:17:21 2004 Subject: [Python-Dev] decorator support In-Reply-To: <20040905100006.8B0BB1E403A@bag.python.org> References: <20040905100006.8B0BB1E403A@bag.python.org> Message-ID: Raymond wrote: > So, it would be nice if there were some support for carrying forward > the > argspec to inform help(), calltips(), and inspect(). +1. I've already gotten one complaint about decorators confusing epydoc (since the user wasn't copying the function's __name__), and I expect to get many more. One thing that's always bothered me about classmethod and staticmethod is the fact that you can't inspect them. Compare that to instancemethod, which has the im_func property. 

One way to carry forward the argspec would be to recommend that
decorators add a pointer back to the undecorated function (similar to
instancemethod's im_func):

    newf.__doc__ = oldf.__doc__          # copy the docstring
    newf.__dict__.update(oldf.__dict__)  # copy attributes
    newf.__name__ = oldf.__name__        # keep the name (new in Py2.4)
    newf.__undecorated__ = oldf          # [XX NEW] ptr to undecorated obj

Then tools like pydoc/epydoc could be written to check this property for
the real argspec. (Note that there's precedent for showing the
*undecorated* signature in tools like pydoc/epydoc, since that's what
they both do with instancemethods, classmethods, and staticmethods).

We would want to pick a standard name. __undecorated__ seems reasonable
to me, but I'd be ok with other names.

Martin v. Lowis wrote:
> What were you thinking of? I could imagine a predefined class, such as
> class copyfuncattrs: [...]

This function could take care of adding the __undecorated__ attribute.
But I'm not sure copyfuncattrs is the right name; note that it works
just as well on class decorators (we did decide that we're allowing
those, right?). Perhaps just copyattrs?

Martin v. Lowis wrote:
> Then, the question is where copyfuncattrs should live, and I would
> object that to be yet another builtin.

Having trouble parsing this sentence -- do you mean that you object to
making it a builtin, or did you mean "I would expect that to be..."?

Anthony Baxter wrote:
> I think it's very likely that in 2.5 we'll have some sort of
> 'decorators' module that captures these sorts of things. I
> don't think it's likely we'll know enough about the various ins
> and outs of decorators to want to put something in 2.4.

I disagree. We may not know much about the ins and outs of decorators,
but I feel very confident that I'll want to be able to inspect them,
whatever they are.
I would very much like for one of the following to happen *before* we release 2.4: - Add a prominent note in the docs on decorators that decorators should generally copy the original object's __doc__, __name__, and attributes, unless there's a good reason not to. (Also, create an __undecorated__ property, but I'll wait to see what others see about that idea first.) - Add copyfuncattrs to the standard library (or builtins), and add a prominent note to the docs that you should use it unless you have a good reason not to. (I can write up an appropriate patch for the docs if no one objects; for copyfuncattrs, we'd need to decide what to name it and where it should live, first.) -Edward From bac at OCF.Berkeley.EDU Sun Sep 5 21:16:50 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Sep 5 21:16:58 2004 Subject: [Python-Dev] Re: Making custom patterns for string.Template easier In-Reply-To: <413A4E32.2010104@ocf.berkeley.edu> References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> <4138D777.3090009@ocf.berkeley.edu> <1094316858.8696.58.camel@geddy.wooz.org> <413A4E32.2010104@ocf.berkeley.edu> Message-ID: <413B6622.7030308@ocf.berkeley.edu> Brett C. wrote: > Barry Warsaw wrote: > >> On Fri, 2004-09-03 at 16:43, Brett C. wrote: >> >> >>> Anyway, I did this partially as an exercise so not a huge deal to me >>> if it doesn't make it in, so +0 from me for adding the functionality. >> >> >> >> So, where's the code?! :) >> > > Not on SF since the bloody thing is down! I will stick it up on a > patch, assign to you, and report the tracker # here as soon as it is > back up. > OK, http://www.python.org/sf/1022698 has the patch. But with Tim's metaclass solution I don't know how much people will care about this one. =) I guess the question becomes whether people will prefer to have to define a new class to override the regex or do it per instance (I suspect the former, although my code does the latter). 
If Tim's solution is used I would like to suggest that we also allow for overloading the bogus group since that was the last thing in contention about the implementation (sans the whole unicode subclass deal). -Brett From bac at OCF.Berkeley.EDU Sun Sep 5 22:02:27 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Sep 5 22:02:38 2004 Subject: [Python-Dev] random.py fixage In-Reply-To: <413AC56B.5060308@interlink.com.au> References: <413AC56B.5060308@interlink.com.au> Message-ID: <413B70D3.6010107@ocf.berkeley.edu> Anthony Baxter wrote: [SNIP] > I thought about cutting a 2.4a4, but decided against it. For one thing, > it'd mean Martin would need to cut more Windows installers, if we don't > want to end up with mismatched windows installers and tarballs. For > another, it's an _alpha_ release, and so it's not as vital as if it'd > been something like a 2.3.4 bug, or a 2.4 final. > Does this mean that CVS is open to major checkins again? I assume so, but Misc/NEWS has not been given a new section for 2.4b1 so I thought I would double-check before doing a major checkin. -Brett From martin at v.loewis.de Sun Sep 5 23:40:27 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 5 23:40:23 2004 Subject: [Python-Dev] decorator support In-Reply-To: References: <20040905100006.8B0BB1E403A@bag.python.org> Message-ID: <413B87CB.6080701@v.loewis.de> Edward Loper wrote: >> Then, the question is where copyfuncattrs should live, and I would >> object that to be yet another builtin. > > > Having trouble parsing this sentence -- do you mean that you object to > making it a builtin, or did you mean "I would expect that to be..."? The former. No more builtins. 
Regards, Martin From martin at v.loewis.de Sun Sep 5 23:41:48 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 5 23:41:43 2004 Subject: [Python-Dev] random.py fixage In-Reply-To: <413B70D3.6010107@ocf.berkeley.edu> References: <413AC56B.5060308@interlink.com.au> <413B70D3.6010107@ocf.berkeley.edu> Message-ID: <413B881C.20806@v.loewis.de> Brett C. wrote: > Does this mean that CVS is open to major checkins again? I assume so, > but Misc/NEWS has not been given a new section for 2.4b1 so I thought I > would double-check before doing a major checkin. I think the tradition is that this is created by whoever makes the first NEWS-worthy change. Regards, Martin From jcarlson at uci.edu Mon Sep 6 00:42:56 2004 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon Sep 6 00:50:08 2004 Subject: [Python-Dev] decorator support In-Reply-To: References: <20040905100006.8B0BB1E403A@bag.python.org> Message-ID: <20040905151944.90BC.JCARLSON@uci.edu> > newf.__doc__ = oldf.__doc__ # copy the docstring > newf.__dict__.update(oldf.__dict__) # copy attributes > newf.__name__ = oldf.__name__ # keep the name (new in Py2.4) > newf.__undecorated__ = oldf # [XX NEW] ptr to undecorated obj If one were to consider function decorators as a type of 'subclass' of a particular function, that is, post-decoration a function inherits from the pre-decoration version of the function, we could, with a single attribute (like __undecorated__ or __proxyfor__ as suggested) and proper introspection tools, do iterative attribute lookups similar to the way that it is already done with classes (without diamond-inheritance). That is, one wouldn't need to copy __doc__, __name__, and __dict__ from a decorated function, one would get access to them automatically. In current Python, what I am saying would be equivalent to... 

    def foo(arg):
        pass
    t = foo
    foo = arbitrary_decorator(foo)
    foo.__proxyfor__ = t
    del t

It would be some 'behind-the-scenes-magic', but I think it may be the
right amount and kind of magic.

 - Josiah

From adurdin at gmail.com Mon Sep 6 01:24:05 2004
From: adurdin at gmail.com (Andrew Durdin)
Date: Mon Sep 6 01:24:16 2004
Subject: [Python-Dev] Alternative placeholder delimiters for PEP 292
In-Reply-To: <59e9fd3a040905152243efcb2@mail.gmail.com>
References: <59e9fd3a040829231141cd3fe4@mail.gmail.com> <1094315616.8721.44.camel@geddy.wooz.org> <59e9fd3a0409041520114d0604@mail.gmail.com> <1094398089.8144.39.camel@geddy.wooz.org> <59e9fd3a040905152243efcb2@mail.gmail.com>
Message-ID: <59e9fd3a0409051624705fe938@mail.gmail.com>

(oops -- I accidentally sent this reply to python-list instead of
python-dev)

On Sun, 05 Sep 2004 11:28:09 -0400, Barry Warsaw wrote:
>
> It's explained in the section of the PEP titled "Why $ and Braces?".

I quote that section entirely:

"""The BDFL said it best: The $ means "substitution" in so many
languages besides Perl that I wonder where you've been. [...] We're
copying this from the shell."""

That states that the $-substitution is common in other languages, and
was copied from the shell. It does not provide a case for the merits of
$ substitution (as opposed to using other delimiters), nor a rationale
why copying from another language is a good idea in this instance.
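
For readers following the delimiter question raised in this thread: a
minimal sketch, assuming the string.Template class as it ships in the
present-day standard library (not necessarily the 2.4a3 code being
debated here), of how a subclass can change the delimiter or the
identifier pattern through the `delimiter` and `idpattern` class
attributes. The subclass names AtTemplate and DottedTemplate, and the
sample strings, are hypothetical and chosen only for illustration.

```python
from string import Template


class AtTemplate(Template):
    # Hypothetical subclass: placeholders start with '@' instead of '$'.
    delimiter = "@"


class DottedTemplate(Template):
    # Hypothetical subclass: allow dotted names such as $user.name.
    # Template compiles its pattern case-insensitively by default,
    # so lower-case character classes are enough here.
    idpattern = r"[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*"


# With '@' as the delimiter, '$' is just an ordinary character.
greeting = AtTemplate("Hello, @name; that costs $5").substitute(name="world")

# Dotted placeholders are looked up in the mapping under the full key.
login = DottedTemplate("$user.name logged in").substitute({"user.name": "guido"})
```

Defining a subclass keeps the choice of delimiter in one place, which is
roughly the per-class customization discussed in the thread, as opposed
to passing a pattern to every instance.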
From eppstein at ics.uci.edu Mon Sep 6 03:04:19 2004 From: eppstein at ics.uci.edu (David Eppstein) Date: Mon Sep 6 03:04:25 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compiler mystery) References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> Message-ID: In article <1f7befae04090422024afaee58@mail.gmail.com>, Tim Peters wrote: > Some exceptions should never be suppressed unless named explicitly, > and a real bitch is that some user-defined exceptions can fit in that > category too. The ones that give me (and my employer) the most grief > are the tree of exceptions deriving from ZODB's ConflictError. > ConflictError is a serious thing: it essentially means the current > transaction cannot succeed, and the app should give up (and maybe > retry the current transaction from its start). Suppressing > ConflictError by accident-- even inside a hasattr() call! --can > grossly reduce efficiency, and has a long history too of provoking > subtle, catastrophic, database corruption bugs. > > I would like to see Python's exception hierarchy grow more > sophisticated in this respect. MemoryError, SystemExit, and > KeyboardInterrupt are things that should not be caught by "except > Exception:", neither by a bare "except:", nor by hasattr() or C-level > dict lookup. ZODB's ConflictError is another of that ilk. I'd like > to see "except Exception:" become synonymous with bare "except:", and > move the "dangerous exceptions" to subclass off a new branch of the > exception hierarchy. It could be that something like your patch is > the only practical way to make this work in the C implementation, so > I'm keen on it. 
It's not really the same subject, but the exception that gives me the most grief is StopIteration. I have to keep remembering to never call .next() without catching it; if I forget, I get bugs where some loop several levels back in the call tree mysteriously exits. -- David Eppstein Computer Science Dept., Univ. of California, Irvine http://www.ics.uci.edu/~eppstein/ From jhylton at gmail.com Mon Sep 6 03:42:44 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Mon Sep 6 03:42:50 2004 Subject: [Python-Dev] Dangerous exceptions (was Re: Another test_compiler mystery) In-Reply-To: <1f7befae04090422024afaee58@mail.gmail.com> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> Message-ID: On Sun, 5 Sep 2004 01:02:35 -0400, Tim Peters wrote: > I would like to see Python's exception hierarchy grow more > sophisticated in this respect. MemoryError, SystemExit, and > KeyboardInterrupt are things that should not be caught by "except > Exception:", neither by a bare "except:", nor by hasattr() or C-level > dict lookup. ZODB's ConflictError is another of that ilk. I'd like > to see "except Exception:" become synonymous with bare "except:", and > move the "dangerous exceptions" to subclass off a new branch of the > exception hierarchy. It could be that something like your patch is > the only practical way to make this work in the C implementation, so > I'm keen on it. The current exception hierarchy isn't too far from what you suggest. We just got the names wrong. That is, there is a base class, StandardException, that captures most exceptions other than MemoryError, SystemError, and KeyboardInterrupt. 
If we renamed that Exception, then we'd be 90% of the way there. You could also change your code, right now, to say "except StandardError:" and avoid the problem entirely. Make sure ConflictError does not inherit from StandardError, of course. And make sure you're happy that ImportError is not a StandardError either.

I'm not sure what I think of the change to "except:". It's often the case that someone who has written "except:" really means "except Something:", but I expect that very often Something != StandardError and issubclass(Something, StandardError). In that case, the change doesn't really help them. The code is still wrong.

Jeremy

From xavier.combelle at free.fr Sat Sep 18 02:57:36 2004
From: xavier.combelle at free.fr (Xavier Combelle)
Date: Mon Sep 6 08:46:53 2004
Subject: [Python-Dev] decorator support
In-Reply-To: <413AD97D.9060301@v.loewis.de>
References: <001f01c49306$41f86060$e841fea9@oemcomputer> <413AD97D.9060301@v.loewis.de>
Message-ID: <414B8800.1080204@free.fr>

>
> This could be used to define a decorator
>
> @copyfuncattrs
> def trace(f):
>     def do_trace(*args):
>         print "invoking", f.__name__, args
>         return f(*args)
>     return do_trace

I am quite new to Python, and this is my first suggestion. (I have to start some time.)

If there is general agreement about what to do to wrap a decorator, why not use the following syntax:

@decorator
def trace(f):
    def do_trace(*args):
        print "invoking", f.__name__, args
        return f(*args)
    return do_trace

I would prefer it because it explains what the function does instead of how it is implemented.

Generally speaking, I believe decorators should be very expressive, because even if they are not part of the language, they are a kind of new syntax. For the same reason, Python should incorporate just a reduced set of decorators.
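
For readers following the thread, here is a minimal sketch of what a helper like the quoted `copyfuncattrs` might do. The thread gives only the name, so the body below is an assumption, and it leans on `functools.update_wrapper`, which did not exist when these messages were written. The `calls` list and the `add` function are hypothetical and stand in for the original `print` so the effect can be checked.

```python
import functools


def copyfuncattrs(decorator):
    """Assumed behaviour: make `decorator` return wrappers that look like
    the function they wrap (same __name__, __doc__, and so on)."""
    @functools.wraps(decorator)
    def attr_copying_decorator(f):
        wrapper = decorator(f)
        # Copy metadata such as __name__ and __doc__ from f onto the wrapper.
        functools.update_wrapper(wrapper, f)
        return wrapper
    return attr_copying_decorator


calls = []  # hypothetical log, used here instead of printing


@copyfuncattrs
def trace(f):
    def do_trace(*args):
        calls.append((f.__name__, args))
        return f(*args)
    return do_trace


@trace
def add(a, b):
    """Return a + b."""
    return a + b


result = add(1, 2)
```

With this sketch, `add.__name__` stays `"add"` rather than becoming `"do_trace"`, which is the point of the helper whichever name it is given.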
From fredrik at pythonware.com Mon Sep 6 09:44:12 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon Sep 6 09:42:22 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) References: <002d01c48083$9a89a6c0$5229c797@oemcomputer><1f7befae040812091754035bcb@mail.gmail.com><20040812185521.GA2277@vicky.ecs.soton.ac.uk><1f7befae04081212414007274f@mail.gmail.com><20040812204431.GA31884@vicky.ecs.soton.ac.uk><1f7befae0408151950361f0cb4@mail.gmail.com><20040816112916.GA19969@vicky.ecs.soton.ac.uk><1f7befae04090422024afaee58@mail.gmail.com> Message-ID: Jeremy Hylton wrote: > The current exception hierarchy isn't too far from what you suggest. > We just got the names wrong. That is, there is a base class, > StandardException, that captures most exceptions other than > MemoryError, SystemError, and KeyboardInterrupt. If we renamed that > Exception, then we'd be 90% of the way there. You could also change > your code, right now, to say "except StandardError:" and avoid the > problem entirely. when was that changed? Python 2.4a3 (#1, Sep 3 2004, 11:32:03) >>> StandardException Traceback (most recent call last): File "", line 1, in ? NameError: name 'StandardException' is not defined >>> StandardError >>> issubclass(MemoryError, StandardError) True >>> issubclass(KeyboardInterrupt, StandardError) True >>> issubclass(SystemExit, StandardError) False >>> issubclass(StopIteration, StandardError) False >>> issubclass(ImportError, StandardError) True From aahz at pythoncraft.com Mon Sep 6 19:42:12 2004 From: aahz at pythoncraft.com (Aahz) Date: Mon Sep 6 19:42:14 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation forPEP 292:SimpleStringSubstitutions In-Reply-To: <413B4377.6060307@egenix.com> References: <4138D622.6050807@egenix.com> <4139AF8C.9040907@egenix.com> <413B4377.6060307@egenix.com> Message-ID: <20040906174212.GA7423@panix.com> On Sun, Sep 05, 2004, M.-A. Lemburg wrote: > > You always have to take both into account. 
I was just saying that > 8-bit strings don't buy you much in terms of performance over Unicode > these days, so the only argument against using Unicode would be > doubled memory usage. Only if one sticks with the 2-byte Unicode implementation; you can compile Python with 4-byte Unicode (and I seem to recall that at least one standard distribution does exactly that). > Of course, this is a rather mild argument given the problems you face > when trying to localize applications - which I see as the main use > case for templates. If I18N is intended to be the primary/only use case of templates, then the PEP needs to be updated. It would also explain some of the disagreement about the implementation. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I saw `cout' being shifted "Hello world" times to the left and stopped right there." --Steve Gonedes From barry at python.org Mon Sep 6 19:57:54 2004 From: barry at python.org (Barry Warsaw) Date: Mon Sep 6 19:57:57 2004 Subject: [Python-Dev] Re: Making custom patterns for string.Template easier In-Reply-To: <413B6622.7030308@ocf.berkeley.edu> References: <16E1010E4581B049ABC51D4975CEDB8803060F7E@UKDCX001.uk.int.atosorigin.com> <4138D777.3090009@ocf.berkeley.edu> <1094316858.8696.58.camel@geddy.wooz.org> <413A4E32.2010104@ocf.berkeley.edu> <413B6622.7030308@ocf.berkeley.edu> Message-ID: <1094493474.8144.109.camel@geddy.wooz.org> On Sun, 2004-09-05 at 15:16, Brett C. wrote: > If Tim's solution is used I would like to suggest that we also allow for > overloading the bogus group since that was the last thing in contention > about the implementation (sans the whole unicode subclass deal). +1. I'm going to pursue Tim's approach, but thanks for the patch! -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040906/925d67ff/attachment.pgp From martin at v.loewis.de Mon Sep 6 22:00:59 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon Sep 6 22:00:54 2004 Subject: [Python-Dev] decorator support In-Reply-To: <414B8800.1080204@free.fr> References: <001f01c49306$41f86060$e841fea9@oemcomputer> <413AD97D.9060301@v.loewis.de> <414B8800.1080204@free.fr> Message-ID: <413CC1FB.7060903@v.loewis.de> > If there is a general agreement about what to do to wrap a decorator, > why no use the following syntax > > @decorator > def trace(f): I've thought of that, and it is tempting. However, it does not give you any clue what the decorator actually *does*, that's why I don't like it. People would declare any decorator using @decorator, without thinking whether they actually need to make that declaration. By design, any function (or, any callable for that matter) can serve as a decorator, so having a declaration for it might actually add confusion. Regards, Martin From niemeyer at conectiva.com Mon Sep 6 22:24:21 2004 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Mon Sep 6 22:35:33 2004 Subject: [Python-Dev] Re: FW: [Python-checkins] python/dist/src/Lib sre_parse.py, 1.62, 1.63 In-Reply-To: <000b01c492d9$f9e66140$e841fea9@oemcomputer> References: <000b01c492d9$f9e66140$e841fea9@oemcomputer> Message-ID: <20040906202421.GA4909@burma.localdomain> > FYI, with today's checkins, test_re.py fails. Not here. Do you have any extra information about it? Is it failing for anyone else? 
--
Gustavo Niemeyer
http://niemeyer.net

From raymond.hettinger at verizon.net Mon Sep 6 22:44:13 2004
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Mon Sep 6 22:45:04 2004
Subject: [Python-Dev] FW: FW: [Python-checkins] python/dist/src/Lib sre_parse.py, 1.62, 1.63
Message-ID: <002e01c49452$452b20e0$e841fea9@oemcomputer>

> > FYI, with today's checkins, test_re.py fails.
>
> Not here. Do you have any extra information about it?
>
> Is it failing for anyone else?

Another one of your checkins arrived later and fixed it. All is well.

Raymond

From xavier.combelle at free.fr Sun Sep 19 02:30:40 2004
From: xavier.combelle at free.fr (Xavier Combelle)
Date: Tue Sep 7 02:25:43 2004
Subject: [Python-Dev] decorator support
In-Reply-To: <413CC1FB.7060903@v.loewis.de>
References: <001f01c49306$41f86060$e841fea9@oemcomputer> <413AD97D.9060301@v.loewis.de> <414B8800.1080204@free.fr> <413CC1FB.7060903@v.loewis.de>
Message-ID: <414CD330.2000500@free.fr>

> However, it does not give
> you any clue what the decorator actually *does*, that's why I don't
> like it. People would declare any decorator using @decorator, without
> thinking whether they actually need to make that declaration.

In my opinion, I don't care what happens behind the scenes. If the @decorator syntax transforms the function into a decorator, all is good.

> By
> design, any function (or, any callable for that matter) can serve
> as a decorator, so having a declaration for it might actually add
> confusion.
>

Not every function can act as a useful decorator, in my opinion. It should do something useful around the concept of a callable object. That's why it seems to me more like syntactic sugar than an algorithmic construct.
> From Vladimir.Marangozov at t-online.de Sat Sep 4 04:33:02 2004 From: Vladimir.Marangozov at t-online.de (Vladimir Marangozov) Date: Tue Sep 7 03:29:22 2004 Subject: [Python-Dev] Re: Call for defense of @decorators Message-ID: <000001c49227$85069f00$6c02a8c0@ESII9100> Hi, Having read the PEP, most of the py-dev discussion and the EuroPython slides just now (sorry, maybe I'm late by a couple of months :-), the call for defense is a tough call for me. I am -0. There is no new functionality and actually my impression is that there is potential for code readability to get hurt. The argumentation for the change in the PEP seems to be weak. It is explicitly stated that successive @decos are equivalent to chained function calls without the temp vars at definition time. This is indeed a key point. No new functionality is actually added (or a good amount of typed characters saved for that matter). The staticmethod() readability problem is a code readability problem, and as such it can simply be addressed the "classic" way via comments. (BTW, has anyone suggested changing the @deco syntax to #deco? :-) It works for me <0.5 wink>. I do not necessarily perceive the foo = staticmethod(foo) transformation statements as evil too. They are explicit, normal def-time function calls, and we certainly don't need a new way for the job. We might introduce one indeed (I saw the syntax becoming a fact of life already), but we don't really need it to get the job done. IOW, everything the @decos can bring will remain doable the obvious way via function calls. And that's good enough. As for the arguments about future code tools knowing better about the annotated functions / classes via the @decos at definition time, well, it's common practice really (cf. #pragma and all sorts of special comments for compile-time processing) which does not involve new syntax. I perceive the extravaganza in doing one or the other as comparable. 
That said, looking at what's being done here, and all things being equal, there is certainly some merit in the idea of declaring the def-time caller list of a function before the function itself, but I personally fail to foresee its goodness for Python on most counts. These chained @decos look like a call stack crash dump to me anyway :-) I will happily embrace genexps and multi-line imports though. Thanks! Cheers, Vladimir From gvanrossum at gmail.com Tue Sep 7 03:46:59 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Sep 7 03:47:03 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compiler mystery) In-Reply-To: References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> Message-ID: > It's not really the same subject, but the exception that gives me the > most grief is StopIteration. I have to keep remembering to never call > .next() without catching it; if I forget, I get bugs where some loop > several levels back in the call tree mysteriously exits. Are you sure? This sounds like superstition to me, since that's not how loops work. Raising StopIteration in the middle of a loop does not break out of the loop -- only raising StopIteration from a next() breaks a loop. Or are you talking about nested next() calls? That's the only case where the behavior you are citing occurs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. 
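
A minimal sketch of the two cases distinguished above, written for present-day Python 3 (where `.next()` is spelled `__next__()` and called through `next()`), so it is an assumption about how the 2004 behaviour carries over. `raise_in_body` and `Pairs` are hypothetical names. Raising StopIteration in a loop body propagates like any other exception, while a StopIteration leaking out of a nested `next()` call inside an iterator's `__next__` quietly ends whatever loop is consuming that iterator.

```python
def raise_in_body():
    """StopIteration raised in a loop body is an ordinary exception."""
    seen = []
    try:
        for x in [1, 2, 3]:
            seen.append(x)
            raise StopIteration  # raised by the body, not by the iterator
    except StopIteration:
        return seen, "propagated"
    return seen, "loop ended quietly"


class Pairs:
    """Yield consecutive pairs from an iterable (deliberately fragile)."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        first = next(self._it)
        # Nested next(): if the input has an odd length, StopIteration
        # leaks out here and the dangling item is silently dropped.
        second = next(self._it)
        return first, second


body_result = raise_in_body()
pairs = list(Pairs([1, 2, 3]))
```

Here `pairs` ends up as `[(1, 2)]` with no error about the leftover `3`. Passing a default, as in `next(self._it, None)`, is one way to make the end-of-input case explicit. In generator functions, Python 3.7 and later convert a leaked StopIteration into a RuntimeError, so the silent variant needs a hand-written iterator like the one above.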
From jhylton at gmail.com Tue Sep 7 14:09:49 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Tue Sep 7 14:09:51 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> Message-ID: On Mon, 6 Sep 2004 09:44:12 +0200, Fredrik Lundh wrote: > Jeremy Hylton wrote: > > > The current exception hierarchy isn't too far from what you suggest. > > We just got the names wrong. That is, there is a base class, > > StandardException, that captures most exceptions other than > > MemoryError, SystemError, and KeyboardInterrupt. If we renamed that > > Exception, then we'd be 90% of the way there. You could also change > > your code, right now, to say "except StandardError:" and avoid the > > problem entirely. > > when was that changed? I misread the very long output of pydoc exceptions :-). I don't understand the hierarchy, either, or I would have noticed that the results don't make sense. Why isn't StopIteration a StandardError? I'll second Tim's suggestion that some errors -- like SystemError, MemoryError, and KeyboardInterrupt belong in a different category. I think it would be easier in principle to put them at a different place in the class hierarchy than to make them some special kind of uncatchable exception. 
Jeremy From jim at zope.com Tue Sep 7 15:43:53 2004 From: jim at zope.com (Jim Fulton) Date: Tue Sep 7 15:43:57 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> Message-ID: <413DBB19.40602@zope.com> Jeremy Hylton wrote: ... > I > think it would be easier in principle to put them at a different place > in the class hierarchy than to make them some special kind of > uncatchable exception. Note that we don't want uncatchable exceptions. Rather, we want exceptions that aren't caught by bare excepts or very broad excepts. In many cases, we want certain knowledgeable code to be able to catch these exceptions. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jhylton at gmail.com Tue Sep 7 16:07:25 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Tue Sep 7 16:07:28 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: <413DBB19.40602@zope.com> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> <413DBB19.40602@zope.com> Message-ID: On Tue, 07 Sep 2004 09:43:53 -0400, Jim Fulton wrote: > Jeremy Hylton wrote: > > ... 
> > > I > > think it would be easier in principle to put them at a different place > > in the class hierarchy than to make them some special kind of > > uncatchable exception. > > Note that we don't want uncatchable exceptions. Rather, we want > exceptions that aren't caught by bare excepts or very broad > excepts. In many cases, we want certain knowledgeable code to be able > to catch these exceptions. I agree with half the cause. There ought to be a decent organization of the exception class hierarchy so that exceptions like KeyboardInterrupt are in a special category. Then an "except NormalError:" would catch only the normal errors and not the special ones. I don't think bare exception should change it's meaning; you just shouldn't use it unless it's *really* what you mean. I think backwards compatibility is a really hard issue for any of these changes. It's probably hard to re-arrange the class hierarchy, but I don't know what practical solution there is to these problems that doesn't involve breaking some code. It's even harder to change bare except, but I don't think that's necessary. Jeremy From barry at python.org Tue Sep 7 16:13:51 2004 From: barry at python.org (Barry Warsaw) Date: Tue Sep 7 16:13:58 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: <413DBB19.40602@zope.com> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> <413DBB19.40602@zope.com> Message-ID: <1094566431.8341.25.camel@geddy.wooz.org> On Tue, 2004-09-07 at 09:43, Jim Fulton wrote: > Note that we don't want uncatchable exceptions. Rather, we want > exceptions that aren't caught by bare excepts or very broad > excepts. 
In many cases, we want certain knowledgeable code to be able > to catch these exceptions. I don't agree about having exceptions that pass bare excepts. A typical /valid/ use of bare excepts are in frameworks such as transaction processing, where you need to do some extra work when an exception occurs, then re-raise the original exception, e.g.: try: do_something() except: database.rollback() raise else: database.commit() Even exceptions like SystemError, MemoryError, or KeyboardInterrupt want to adhere to this simple idiom. Bare except should continue to catch all exceptions. Code that wanted to do otherwise should /not/ use a bare except, and +1 on some form of exception hierarchy restructuring that would make that clearer. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040907/3512c996/attachment.pgp From jhylton at gmail.com Tue Sep 7 16:26:34 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Tue Sep 7 16:26:43 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_compiler.py, 1.5, 1.6 test_decimal.py, 1.13, 1.14 In-Reply-To: References: Message-ID: On Sat, 04 Sep 2004 13:09:15 -0700, rhettinger@users.sourceforge.net wrote: > Update of /cvsroot/python/python/dist/src/Lib/test > In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11509 > > Modified Files: > test_compiler.py test_decimal.py > Log Message: > Change the strategy for coping with time intensive tests from > "all or none" to "all or some". > > This provides much greater test coverage without eating much time. > It also makes it more likely that routine regression testing will > unearth bugs. If the time is really a problem, I'd rather select a certain set of files to test all the times. 
I'm interested in making sure the compiler is run against the widest possible range of language constructs, which plain old random testing isn't likely to do. I'd prefer, even, an option to enable random sampling for these tests. When I'm testing the compiler package, I really want to run the entire test, not a small part of it. Jeremy From tim.peters at gmail.com Tue Sep 7 16:43:25 2004 From: tim.peters at gmail.com (Tim Peters) Date: Tue Sep 7 16:43:39 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_compiler.py, 1.5, 1.6 test_decimal.py, 1.13, 1.14 In-Reply-To: References: Message-ID: <1f7befae04090707437ce4aff1@mail.gmail.com> >> wrote: >> Modified Files: >> test_compiler.py test_decimal.py >> Log Message: >> Change the strategy for coping with time intensive tests from >> "all or none" to "all or some". >> >> This provides much greater test coverage without eating much time. >> It also makes it more likely that routine regression testing will >> unearth bugs. [Jeremy Hylton] > If the time is really a problem, I'd rather select a certain set of > files to test all the times. I'm interested in making sure the > compiler is run against the widest possible range of language > constructs, which plain old random testing isn't likely to do. > > I'd prefer, even, an option to enable random sampling for these tests. > When I'm testing the compiler package, I really want to run the > entire test, not a small part of it. If you enable the compiler resource, the entire test is still run. Nothing changed there. What changed is what happens if you don't enable that resource: before, test_compiler was skipped entirely; after, test_compiler tries a small, random subset of files. 
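
The "all or some" strategy described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in test_compiler.py: `choose_test_files`, `sample_size`, and the file names are hypothetical, and the real test decides whether the resource is enabled through the regression-test framework rather than a boolean argument.

```python
import random


def choose_test_files(files, resource_enabled, sample_size=10, rng=None):
    """Return every file when the resource is enabled, else a random subset."""
    files = list(files)
    if resource_enabled or len(files) <= sample_size:
        return files
    if rng is None:
        rng = random.Random()
    # Sampling without replacement keeps the run short but still varied.
    return rng.sample(files, sample_size)


all_files = ["mod%d.py" % i for i in range(50)]
full_run = choose_test_files(all_files, resource_enabled=True)
quick_run = choose_test_files(all_files, resource_enabled=False,
                              sample_size=5, rng=random.Random(0))
```

Seeding the generator, as in `quick_run`, makes a sampled run repeatable when a failure needs to be reproduced.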
From jim at zope.com Tue Sep 7 16:44:17 2004 From: jim at zope.com (Jim Fulton) Date: Tue Sep 7 16:44:21 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: <1094566431.8341.25.camel@geddy.wooz.org> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> <413DBB19.40602@zope.com> <1094566431.8341.25.camel@geddy.wooz.org> Message-ID: <413DC941.60600@zope.com> Barry Warsaw wrote: > On Tue, 2004-09-07 at 09:43, Jim Fulton wrote: > > >>Note that we don't want uncatchable exceptions. Rather, we want >>exceptions that aren't caught by bare excepts or very broad >>excepts. In many cases, we want certain knowledgeable code to be able >>to catch these exceptions. > > > I don't agree about having exceptions that pass bare excepts. A typical > /valid/ use of bare excepts are in frameworks such as transaction > processing, where you need to do some extra work when an exception > occurs, then re-raise the original exception, e.g.: > > try: > do_something() > except: > database.rollback() > raise > else: > database.commit() > > Even exceptions like SystemError, MemoryError, or KeyboardInterrupt want > to adhere to this simple idiom. Bare except should continue to catch > all exceptions. Code that wanted to do otherwise should /not/ use a > bare except, and +1 on some form of exception hierarchy restructuring > that would make that clearer. Fair enough. I also agree with jeremy's points re backward compatability. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jacobs at theopalgroup.com Tue Sep 7 17:11:57 2004 From: jacobs at theopalgroup.com (Kevin Jacobs) Date: Tue Sep 7 17:12:01 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: <1094566431.8341.25.camel@geddy.wooz.org> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> <413DBB19.40602@zope.com> <1094566431.8341.25.camel@geddy.wooz.org> Message-ID: <413DCFBD.7010306@theopalgroup.com> Barry Warsaw wrote: >I don't agree about having exceptions that pass bare excepts. A typical >/valid/ use of bare excepts are in frameworks such as transaction >processing, where you need to do some extra work when an exception >occurs, then re-raise the original exception, [...] > My policy for bare excepts is that without significant justification they _must_ either re-raise the original exception or raise another exception. There are very few circumstances where I have allowed my team to write pure bare excepts. I haven't checked, but a warning for violations of this rule may be a nice addition to pychecker or pylint. -Kevin From jhylton at gmail.com Tue Sep 7 17:12:40 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Tue Sep 7 17:12:46 2004 Subject: [Python-Dev] assert failure on obmalloc Message-ID: Failure running the test suite today with -u compiler enabled on Windows XP. 
test_logging
Assertion failed: bp != NULL, file \code\python\dist\src\Objects\obmalloc.c, line 604

The debugger says the error is here:

msvcr71d.dll!_assert(const char * expr=0x1e22bcc0, const char * filename=0x1e22bc94, unsigned int lineno=604) Line 306 C
python24_d.dll!PyObject_Malloc(unsigned int nbytes=100) Line 604 + 0x1b C
python24_d.dll!_PyObject_DebugMalloc(unsigned int nbytes=84) Line 1014 + 0x9 C
python24_d.dll!PyThreadState_New(_is * interp=0x00951028) Line 136 + 0x7 C
python24_d.dll!PyGILState_Ensure() Line 430 + 0xc C
python24_d.dll!t_bootstrap(void * boot_raw=0x02801d48) Line 431 + 0x5 C
python24_d.dll!bootstrap(void * call=0x04f0d264) Line 166 + 0x7 C
msvcr71d.dll!_threadstart(void * ptd=0x026a2320) Line 196 + 0xd C

I've been seeing this sort of error on-and-off for at least a year with my Python 2.3 install. It's the usual reason my spambayes popproxy dies. I can't recall seeing it before on Windows or while running the test suite.

Jeremy

From jhylton at gmail.com Tue Sep 7 17:14:40 2004
From: jhylton at gmail.com (Jeremy Hylton)
Date: Tue Sep 7 17:14:42 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_compiler.py, 1.5, 1.6 test_decimal.py, 1.13, 1.14
In-Reply-To: <1f7befae04090707437ce4aff1@mail.gmail.com>
References: <1f7befae04090707437ce4aff1@mail.gmail.com>
Message-ID:

On Tue, 7 Sep 2004 10:43:25 -0400, Tim Peters wrote:
> If you enable the compiler resource, the entire test is still run.
> Nothing changed there. What changed is what happens if you don't
> enable that resource: before, test_compiler was skipped entirely;
> after, test_compiler tries a small, random subset of files.

Thanks for clarifying. The checkin comment obviously says that, but somehow I didn't get it when reading the diff for test_compiler.
Jeremy From mwh at python.net Tue Sep 7 17:25:28 2004 From: mwh at python.net (Michael Hudson) Date: Tue Sep 7 17:25:29 2004 Subject: [Python-Dev] assert failure on obmalloc In-Reply-To: (Jeremy Hylton's message of "Tue, 7 Sep 2004 11:12:40 -0400") References: Message-ID: <2msm9u14pj.fsf@starship.python.net> Jeremy Hylton writes: > Failure running the test suite today with -u compiler enabled on Windows XP. > > test_logging > Assertion failed: bp != NULL, file > \code\python\dist\src\Objects\obmalloc.c, line 604 > > The debugger says the error is here: > msvcr71d.dll!_assert(const char * expr=0x1e22bcc0, const char * > filename=0x1e22bc94, unsigned int lineno=604) Line 306 C > python24_d.dll!PyObject_Malloc(unsigned int nbytes=100) Line 604 + 0x1b C > python24_d.dll!_PyObject_DebugMalloc(unsigned int nbytes=84) Line > 1014 + 0x9 C > python24_d.dll!PyThreadState_New(_is * interp=0x00951028) Line 136 + 0x7 C > python24_d.dll!PyGILState_Ensure() Line 430 + 0xc C > python24_d.dll!t_bootstrap(void * boot_raw=0x02801d48) Line 431 + 0x5 C > python24_d.dll!bootstrap(void * call=0x04f0d264) Line 166 + 0x7 C > msvcr71d.dll!_threadstart(void * ptd=0x026a2320) Line 196 + 0xd C > > I've been seeing this sort of error on-and-off for at least a year > with my Python 2.3 install. It's the usual reason my spambayes > popproxy dies. I can't recell seeing it before on Windows or while > running the test suite. Don't debug builds route all PyMem_ calls through PyMalloc? Doesn't pymalloc rely on the GIL being held when it's called? If both of these are true, there's an obvious problem here, because the call to PyMem_NEW in PyThreadState_New certainly isn't called with the GIL held... This would only be a problem in a debug build, though. Cheers, mwh -- Never meddle in the affairs of NT. It is slow to boot and quick to crash. 
-- Stephen Harris -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From eppstein at ics.uci.edu Tue Sep 7 17:31:41 2004 From: eppstein at ics.uci.edu (David Eppstein) Date: Tue Sep 7 17:31:58 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compiler mystery) References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> Message-ID: In article , Guido van Rossum wrote: > > It's not really the same subject, but the exception that gives me the > > most grief is StopIteration. I have to keep remembering to never call > > .next() without catching it; if I forget, I get bugs where some loop > > several levels back in the call tree mysteriously exits. > > Are you sure? This sounds like superstition to me, since that's not > how loops work. Raising StopIteration in the middle of a loop does not > break out of the loop -- only raising StopIteration from a next() > breaks a loop. > > Or are you talking about nested next() calls? That's the only case > where the behavior you are citing occurs. I don't remember, it could have been nested next()s. -- David Eppstein Computer Science Dept., Univ. of California, Irvine http://www.ics.uci.edu/~eppstein/ From tim.peters at gmail.com Tue Sep 7 17:44:48 2004 From: tim.peters at gmail.com (Tim Peters) Date: Tue Sep 7 17:44:51 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_compiler.py, 1.5, 1.6 test_decimal.py, 1.13, 1.14 In-Reply-To: References: <1f7befae04090707437ce4aff1@mail.gmail.com> Message-ID: <1f7befae04090708446feb59c7@mail.gmail.com> [Tim Peters] >> If you enable the compiler resource, the entire test is still run. >> Nothing changed there. 
What changed is what happens if you don't >> enable that resource: before, test_compiler was skipped entirely; >> after, test_compiler tries a small, random subset of files. [Jeremy Hylton] > Thanks for clarifying. The checkin comment obviously says that, but > somehow I didn't get it when reading the diff for test_compiler. Me neither! I misunderstand it until I brought up test_compiler in an editor, and then it was obvious. Let's blame it on diff . From barry at python.org Tue Sep 7 18:01:08 2004 From: barry at python.org (Barry Warsaw) Date: Tue Sep 7 18:01:13 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: <413DCFBD.7010306@theopalgroup.com> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <1f7befae040812091754035bcb@mail.gmail.com> <20040812185521.GA2277@vicky.ecs.soton.ac.uk> <1f7befae04081212414007274f@mail.gmail.com> <20040812204431.GA31884@vicky.ecs.soton.ac.uk> <1f7befae0408151950361f0cb4@mail.gmail.com> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> <413DBB19.40602@zope.com> <1094566431.8341.25.camel@geddy.wooz.org> <413DCFBD.7010306@theopalgroup.com> Message-ID: <1094572868.8342.43.camel@geddy.wooz.org> On Tue, 2004-09-07 at 11:11, Kevin Jacobs wrote: > My policy for bare excepts is that without significant justification > they _must_ either re-raise the original exception or raise another > exception. There are very few circumstances where I have allowed > my team to write pure bare excepts. I haven't checked, but a warning > for violations of this rule may be a nice addition to pychecker or pylint. The other case I've seen are for command-shell like loops, where you might print the exception in the bare except, but not re-raise the exception. Think about the main interactive interpreter loop. But yeah I agree, you want strong justification for any use of bare except. 
-Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040907/0ad56503/attachment.pgp From trentm at ActiveState.com Tue Sep 7 19:54:57 2004 From: trentm at ActiveState.com (Trent Mick) Date: Tue Sep 7 20:01:19 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules socketmodule.c, 1.304, 1.305 In-Reply-To: ; from tmick@users.sourceforge.net on Tue, Sep 07, 2004 at 10:48:29AM -0700 References: Message-ID: <20040907105457.C24597@ActiveState.com> The log message for that was supposed to be: Apply patch from http://python.org/sf/728330 to fix socket module compilation on Solaris 2.6, HP-UX 11, AIX 5.1 and (possibly) some IRIX versions. but "cvs" surprised me with its wonderful and clear UI for specifying the log message. Can the cvs logs be updated after the fact? Trent [tmick@users.sourceforge.net wrote] > Update of /cvsroot/python/python/dist/src/Modules > In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10162 > > Modified Files: > socketmodule.c > Log Message: > > > Index: socketmodule.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Modules/socketmodule.c,v > retrieving revision 1.304 > retrieving revision 1.305 > diff -u -d -r1.304 -r1.305 > --- socketmodule.c 26 Aug 2004 00:51:16 -0000 1.304 > +++ socketmodule.c 7 Sep 2004 17:48:26 -0000 1.305 > @@ -257,7 +257,19 @@ > # define O_NONBLOCK O_NDELAY > #endif > > -#include "addrinfo.h" > +/* include Python's addrinfo.h unless it causes trouble */ > +#if defined(__sgi) && _COMPILER_VERSION>700 && defined(_SS_ALIGNSIZE) > + /* Do not include addinfo.h on some newer IRIX versions. > + * _SS_ALIGNSIZE is defined in sys/socket.h by 6.5.21, > + * for example, but not by 6.5.10. 
> + */ > +#elif defined(_MSC_VER) && _MSC_VER>1200 > + /* Do not include addrinfo.h for MSVC7 or greater. 'addrinfo' and > + * EAI_* constants are defined in (the already included) ws2tcpip.h. > + */ > +#else > +# include "addrinfo.h" > +#endif > > #ifndef HAVE_INET_PTON > int inet_pton(int af, const char *src, void *dst); > > _______________________________________________ > Python-checkins mailing list > Python-checkins@python.org > http://mail.python.org/mailman/listinfo/python-checkins -- Trent Mick TrentM@ActiveState.com From fdrake at acm.org Tue Sep 7 20:09:33 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Sep 7 20:09:47 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules socketmodule.c, 1.304, 1.305 In-Reply-To: <20040907105457.C24597@ActiveState.com> References: <20040907105457.C24597@ActiveState.com> Message-ID: <200409071409.33867.fdrake@acm.org> On Tuesday 07 September 2004 01:54 pm, Trent Mick wrote: > but "cvs" surprised me with its wonderful and clear UI for specifying > the log message. Can the cvs logs be updated after the fact? Yes; run "cvs -H admin" to learn how. -Fred -- Fred L. Drake, Jr. From bac at OCF.Berkeley.EDU Tue Sep 7 21:26:36 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Tue Sep 7 21:26:55 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules socketmodule.c, 1.304, 1.305 In-Reply-To: <200409071409.33867.fdrake@acm.org> References: <20040907105457.C24597@ActiveState.com> <200409071409.33867.fdrake@acm.org> Message-ID: <413E0B6C.8020103@ocf.berkeley.edu> Fred L. Drake, Jr. wrote: > On Tuesday 07 September 2004 01:54 pm, Trent Mick wrote: > > but "cvs" surprised me with its wonderful and clear UI for specifying > > the log message. Can the cvs logs be updated after the fact? > > Yes; run "cvs -H admin" to learn how. 
> You can also read the dev FAQ; it has an entry on just how to do this: http://www.python.org/dev/devfaq.html#how-can-i-fix-a-log-message-from-a-previous-checkin And just so other people know, if you come across something you have to do with CVS that is not clear and think it warrants an entry in the FAQ, let me know. I am trying to make it rather thorough so that no developers have to look up the CVS docs again. -Brett From noamr at myrealbox.com Tue Sep 7 21:34:11 2004 From: noamr at myrealbox.com (Noam Raphael) Date: Tue Sep 7 21:35:28 2004 Subject: [Python-Dev] Missing arguments in RE functions Message-ID: <413E0D33.7030703@myrealbox.com> Hello, I've now finished teaching Python to a group of people, and regular expressions were a part of the course. I have encountered a few missing features (that is, optional arguments) in RE functions. I've checked, and it seems to me that they can be added very easily. The first missing feature is the "flags" argument in the findall and finditer functions. Searching for all occurrences of an RE is, of course, a legitimate action, and I had to use (?s) in my RE, instead of adding re.DOTALL, which, in my opinion, is a lot clearer. The solution is simple: the functions sub, subn, split, findall and finditer all first compile the given RE, with the flags argument set to 0, and then run the appropriate method. As far as I can see, they could all get an additional optional argument, flags=0, and compile the RE with it. The second missing feature is the ability to specify start and end indices when doing matches and searches. This feature is available when using a compiled RE, but isn't mentioned at all in any of the straightforward functions (That's why I didn't even know it was possible, until I now checked - I naturally assumed that all the functionality is available when using the functions). I think these should be added to the functions match, search, findall and finditer.
This feature isn't documented for the findall and finditer methods, but I checked, and it seems to work fine. (In case you are interested in the use case: the exercise was to parse an XML file. It was done by first matching the beginning of a tag, then trying to match attributes, and so on - each match starts from where the previous successful match ended. Since I didn't know of this feature, it was done by replacing the original string with a substring after every match, which is terribly inefficient.) If you approve, I can create a patch in a few minutes and send it. Have a good day, Noam Raphael From fdrake at acm.org Tue Sep 7 21:46:26 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Sep 7 21:46:54 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules socketmodule.c, 1.304, 1.305 In-Reply-To: <413E0B6C.8020103@ocf.berkeley.edu> References: <200409071409.33867.fdrake@acm.org> <413E0B6C.8020103@ocf.berkeley.edu> Message-ID: <200409071546.26539.fdrake@acm.org> On Tuesday 07 September 2004 03:26 pm, Brett C. wrote: > I am trying to make it rather thorough so that no > developers have to look up the CVS docs again. What's the point of this? Reading the CVS docs is good, if only because it makes one realize how fragile the whole thing is. -Fred -- Fred L. Drake, Jr. From bac at OCF.Berkeley.EDU Tue Sep 7 21:56:05 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Tue Sep 7 21:56:19 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules socketmodule.c, 1.304, 1.305 In-Reply-To: <200409071546.26539.fdrake@acm.org> References: <200409071409.33867.fdrake@acm.org> <413E0B6C.8020103@ocf.berkeley.edu> <200409071546.26539.fdrake@acm.org> Message-ID: <413E1255.9050102@ocf.berkeley.edu> Fred L. Drake, Jr. wrote: > On Tuesday 07 September 2004 03:26 pm, Brett C. wrote: > > I am trying to make it rather thorough so that no > > developers have to look up the CVS docs again.
> > What's the point of this? Reading the CVS docs is good, if only because it > makes one realize how fragile the whole thing is. > Well, I know when I was starting out, CVS was a hurdle to deal with, and what seemed like it should be a simple thing was not so simple to extrapolate from the docs. Figured there was no need for anyone else to suffer as well. -Brett From aahz at pythoncraft.com Tue Sep 7 22:15:42 2004 From: aahz at pythoncraft.com (Aahz) Date: Tue Sep 7 22:15:44 2004 Subject: [Python-Dev] Missing arguments in RE functions In-Reply-To: <413E0D33.7030703@myrealbox.com> References: <413E0D33.7030703@myrealbox.com> Message-ID: <20040907201541.GA1083@panix.com> On Tue, Sep 07, 2004, Noam Raphael wrote: > > If you approve, I can create a patch in a few minutes and send it. Go ahead and create the patch -- it's unlikely that you'll get formal approval without it. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I saw `cout' being shifted "Hello world" times to the left and stopped right there." --Steve Gonedes From aahz at pythoncraft.com Tue Sep 7 22:17:20 2004 From: aahz at pythoncraft.com (Aahz) Date: Tue Sep 7 22:17:22 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules socketmodule.c, 1.304, 1.305 In-Reply-To: <200409071546.26539.fdrake@acm.org> References: <200409071409.33867.fdrake@acm.org> <413E0B6C.8020103@ocf.berkeley.edu> <200409071546.26539.fdrake@acm.org> Message-ID: <20040907201720.GB1083@panix.com> On Tue, Sep 07, 2004, Fred L. Drake, Jr. wrote: > On Tuesday 07 September 2004 03:26 pm, Brett C. wrote: >> >> I am trying to make it rather thorough so that no >> developers have to look up the CVS docs again. > > What's the point of this? Reading the CVS docs is good, if only because it > makes one realize how fragile the whole thing is. Heh.
My take on it is that we should minimize the learning curve for new developers whenever possible -- I don't think we should require everyone to become CVS experts. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I saw `cout' being shifted "Hello world" times to the left and stopped right there." --Steve Gonedes From fdrake at acm.org Tue Sep 7 22:23:05 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Sep 7 22:23:16 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules socketmodule.c, 1.304, 1.305 In-Reply-To: <20040907201720.GB1083@panix.com> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> Message-ID: <200409071623.05499.fdrake@acm.org> On Tuesday 07 September 2004 04:17 pm, Aahz wrote: > Heh. My take on it is that we should minimize the learning curve for > new developers whenever possible -- I don't think we should require > everyone to become CVS experts. Agreed. But we do want them to scream out "Why are we still using this piece of junk???" Seriously, I'm not slamming CVS for being evil or bad; it has served well. But there are better options now. (I'm really looking forward to Subversion 1.1; all the advantage of Subversion, without the disadvantage of Berkeley DB...!) -Fred -- Fred L. Drake, Jr. From martin at v.loewis.de Tue Sep 7 22:51:28 2004 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue Sep 7 22:51:22 2004 Subject: [Python-Dev] Subversion In-Reply-To: <200409071623.05499.fdrake@acm.org> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> Message-ID: <413E1F50.90709@v.loewis.de> Fred L. Drake, Jr. wrote: > (I'm really looking forward to Subversion 1.1; all the advantage of > Subversion, without the disadvantage of Berkeley DB...!)
What *is* the disadvantage of Berkeley DB that the file storage of svn 1.1 will remove? One of the things that you could do in CVS that you can't easily do because of the DB approach is to ultimately remove a file, along with its entire history (by removing the ,v file). Along with that goes the option of moving part of a repository into another repository. Neither is easy with svn because of the DB thing. However, I understand that it won't become simpler with the file storage, either, as the files being created don't directly correlate to files of the versions file system. So you still can't delete a single file with all of its history, nor can you move just a part of the repository. Of course, you can do both with svndump|svnfilter|svnload, so that is not a serious obstacle. One problem that I had with svn+bsddb is that the DB files are tied to a DB version, so you can't easily upgrade to a newer DB version (without dump/load cycle). But a) don't do that, then, and b) for the last 3 or so bsddb updates (since 4.0), it wasn't that bad - the SVN repositories would have continued to work, as the bsddb format didn't change in a relevant way. Regards, Martin From exarkun at divmod.com Tue Sep 7 22:59:17 2004 From: exarkun at divmod.com (Jp Calderone) Date: Tue Sep 7 22:59:22 2004 Subject: [Python-Dev] Subversion In-Reply-To: <413E1F50.90709@v.loewis.de> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> Message-ID: <413E2125.1020206@divmod.com> (This is somewhat off-topic for python-dev, so I won't post more than one message unless people really want me to) Martin v. Löwis wrote: > Fred L. Drake, Jr. wrote: > >> (I'm really looking forward to Subversion 1.1; all the advantage of >> Subversion, without the disadvantage of Berkeley DB...!) > > > What *is* the disadvantage of Berkeley DB that the file storage of > svn 1.1 will remove?
One of the things that you could do in CVS that > you can't easily do because of the DB approach is to ultimately > remove a file, along with its entire history (by removing the ,v file). > Along with that goes the option of moving part of a repository into > another repository. Files are, by and large, big blobs of opaque bytes. They don't belong in a database. The subversion developers made a mistake by putting *everything* into bdbs. They should have put metadata into the database and files into the filesystem. I doubt this is the disadvantage perceived by the svn user community at large (they mostly complain about umask problems), but it is the real problem with using bdb in the way svn uses it. > [snip] > > One problem that I had with svn+bsddb is that the DB files are > tied to a DB version, so you can't easily upgrade to a newer DB > version (without dump/load cycle). But This is half-true. You don't have to dump/load to move between incompatible database versions, you just have to run the Sleepycat-supplied upgrade tool. Not to say that dump/load doesn't work... Jp From bob at redivi.com Tue Sep 7 23:02:55 2004 From: bob at redivi.com (Bob Ippolito) Date: Tue Sep 7 23:03:26 2004 Subject: [Python-Dev] Subversion In-Reply-To: <413E1F50.90709@v.loewis.de> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> Message-ID: <4A27E67A-0111-11D9-9892-000A95686CD8@redivi.com> On Sep 7, 2004, at 4:51 PM, Martin v. Löwis wrote: > Fred L. Drake, Jr. wrote: >> (I'm really looking forward to Subversion 1.1; all the advantage of >> Subversion, without the disadvantage of Berkeley DB...!) > > What *is* the disadvantage of Berkeley DB that the file storage of > svn 1.1 will remove? One of the things that you could do in CVS that > you can't easily do because of the DB approach is to ultimately > remove a file, along with its entire history (by removing the ,v file).
> Along with that goes the option of moving part of a repository into > another repository. The biggest complaint I've heard, and I believe the reason for the optional alternative database implementation in 1.1, is that the Berkeley DB must be on a single local volume. -bob From fdrake at acm.org Tue Sep 7 23:06:59 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Sep 7 23:07:07 2004 Subject: [Python-Dev] Subversion In-Reply-To: <413E1F50.90709@v.loewis.de> References: <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> Message-ID: <200409071706.59370.fdrake@acm.org> On Tuesday 07 September 2004 04:51 pm, Martin v. Löwis wrote: > What *is* the disadvantage of Berkeley DB that the file storage of > svn 1.1 will remove? One of the things that you could do in CVS that > you can't easily do because of the DB approach is to ultimately > remove a file, along with its entire history (by removing the ,v file). > Along with that goes the option of moving part of a repository into > another repository. I'm not concerned with people deliberately hosing their repositories; they shouldn't do that. The advantage I see is that we won't have to deal with hosed databases having to be "recovered" to make the Subversion server useful again. I certainly agree with Jp's comments about how databases are used, but as long as the server is working, that's less of an issue for me. > Neither is easy with svn because of the DB thing. However, I > understand that it won't become simpler with the file storage, either, > as the files being created don't directly correlate to files of the > versions file system. So you still can't delete a single file with all > of its history, nor can you move just a part of the repository. Again, that's not my desire. I'm happy to not manipulate the content of the repository directly. -Fred -- Fred L. Drake, Jr.
From barry at python.org Tue Sep 7 23:30:00 2004 From: barry at python.org (Barry Warsaw) Date: Tue Sep 7 23:30:05 2004 Subject: [Python-Dev] Subversion In-Reply-To: <413E2125.1020206@divmod.com> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> <413E2125.1020206@divmod.com> Message-ID: <1094592600.8339.93.camel@geddy.wooz.org> On Tue, 2004-09-07 at 16:59, Jp Calderone wrote: > Files are, by and large, big blobs of opaque bytes. They don't = > belong in a database. The subversion developers made a mistake by = > putting *everything* into bdbs. They should have put metadata into the = > database and files into the filesystem. Right, and Berkeley splits big blobs up into overflow pages, which aren't as efficient as if all the data for a key fits in on one page. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040907/756f9d2e/attachment.pgp From barry at python.org Tue Sep 7 23:32:53 2004 From: barry at python.org (Barry Warsaw) Date: Tue Sep 7 23:32:58 2004 Subject: [Python-Dev] Subversion In-Reply-To: <4A27E67A-0111-11D9-9892-000A95686CD8@redivi.com> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> <4A27E67A-0111-11D9-9892-000A95686CD8@redivi.com> Message-ID: <1094592773.8346.97.camel@geddy.wooz.org> On Tue, 2004-09-07 at 17:02, Bob Ippolito wrote: > The biggest complaint I've heard, and I believe the reason for the > optional alternative database implementation in 1.1, is that the > Berkeley DB must be on a single local volume. Having nothing to do with svn's choice of bdb, my biggest complaint about subversion is its lack of a mature merging algorithm. 
Still, it's not worse than cvs and there are plenty of other advantages. We also had some horrendous stability problems early on, but it looks like the newer versions are pretty stable. We haven't lost any data or seen mysterious re-appearances in months. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040907/b1c38b9c/attachment.pgp From pyth at devel.trillke.net Tue Sep 7 23:50:53 2004 From: pyth at devel.trillke.net (Holger Krekel) Date: Tue Sep 7 23:51:09 2004 Subject: [Python-Dev] Subversion In-Reply-To: <4A27E67A-0111-11D9-9892-000A95686CD8@redivi.com> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> <4A27E67A-0111-11D9-9892-000A95686CD8@redivi.com> Message-ID: <20040907215053.GF5208@solar.trillke> Bob Ippolito wrote: > > On Sep 7, 2004, at 4:51 PM, Martin v. Löwis wrote: > > >Fred L. Drake, Jr. wrote: > >>(I'm really looking forward to Subversion 1.1; all the advantage of > >>Subversion, without the disadvantage of Berkeley DB...!) > > > >What *is* the disadvantage of Berkeley DB that the file storage of > >svn 1.1 will remove? One of the things that you could do in CVS that > >you can't easily do because of the DB approach is to ultimately > >remove a file, along with its entire history (by removing the ,v file). > >Along with that goes the option of moving part of a repository into > >another repository. > > The biggest complaint I've heard, and I believe the reason for the > optional alternative database implementation in 1.1, is that the > Berkeley DB must be on a single local volume. the primary reason was more to be able to have a local svn repository on SMB or NFS network storage.
Apparently there are a lot of commercial environments that used cvs this way and the svn developers responded to this pressure with the new "FSFS" backend. See http://subversion.tigris.org/svn_1.1_releasenotes.html for more info. cheers, holger From raymond.hettinger at verizon.net Wed Sep 8 00:01:30 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Sep 8 00:02:20 2004 Subject: [Python-Dev] Missing arguments in RE functions In-Reply-To: <413E0D33.7030703@myrealbox.com> Message-ID: <006301c49526$3b46ad40$e841fea9@oemcomputer> > The first missing feature is the "flags" argument in the findall and > finditer functions. . . . > The second missing feature is the ability to specify start and end > indices when doing matches and searches. +1 I've needed both of these more than once. Are you up to crafting the code? Raymond From noamr at myrealbox.com Wed Sep 8 00:23:35 2004 From: noamr at myrealbox.com (Noam Raphael) Date: Wed Sep 8 00:24:54 2004 Subject: [Python-Dev] Missing arguments in RE functions In-Reply-To: <006301c49526$3b46ad40$e841fea9@oemcomputer> References: <006301c49526$3b46ad40$e841fea9@oemcomputer> Message-ID: <413E34E7.1030409@myrealbox.com> Raymond Hettinger wrote: >+1 > >I've needed both of these more than once. > >Are you up to crafting the code? > > > Thanks! Are these diffs ok? Noam -------------- next part -------------- *** /home/noam/python/old/sre.py Tue Sep 7 23:36:36 2004 --- sre.py Tue Sep 7 23:40:53 2004 *************** *** 123,147 **** # -------------------------------------------------------------------- # public interface ! def match(pattern, string, flags=0): """Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.""" ! return _compile(pattern, flags).match(string) ! def search(pattern, string, flags=0): """Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.""" ! 
return _compile(pattern, flags).search(string) ! def sub(pattern, repl, string, count=0): """Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a callable, it's passed the match object and must return a replacement string to be used.""" ! return _compile(pattern, 0).sub(repl, string, count) ! def subn(pattern, repl, string, count=0): """Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in the source --- 123,147 ---- # -------------------------------------------------------------------- # public interface ! def match(pattern, string, flags=0, pos=0, endpos=sys.maxint): """Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.""" ! return _compile(pattern, flags).match(string, pos, endpos) ! def search(pattern, string, flags=0, pos=0, endpos=sys.maxint): """Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.""" ! return _compile(pattern, flags).search(string, pos, endpos) ! def sub(pattern, repl, string, count=0, flags=0): """Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a callable, it's passed the match object and must return a replacement string to be used.""" ! return _compile(pattern, flags).sub(repl, string, count) ! def subn(pattern, repl, string, count=0, flags=0): """Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in the source *************** *** 149,162 **** substitutions that were made. 
repl can be either a string or a callable; if a callable, it's passed the match object and must return a replacement string to be used.""" ! return _compile(pattern, 0).subn(repl, string, count) ! def split(pattern, string, maxsplit=0): """Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.""" ! return _compile(pattern, 0).split(string, maxsplit) ! def findall(pattern, string): """Return a list of all non-overlapping matches in the string. If one or more groups are present in the pattern, return a --- 149,162 ---- substitutions that were made. repl can be either a string or a callable; if a callable, it's passed the match object and must return a replacement string to be used.""" ! return _compile(pattern, flags).subn(repl, string, count) ! def split(pattern, string, maxsplit=0, flags=0): """Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.""" ! return _compile(pattern, flags).split(string, maxsplit) ! def findall(pattern, string, flags=0, pos=0, endpos=sys.maxint): """Return a list of all non-overlapping matches in the string. If one or more groups are present in the pattern, return a *************** *** 164,179 **** has more than one group. Empty matches are included in the result.""" ! return _compile(pattern, 0).findall(string) if sys.hexversion >= 0x02020000: __all__.append("finditer") ! def finditer(pattern, string): """Return an iterator over all non-overlapping matches in the string. For each match, the iterator returns a match object. Empty matches are included in the result.""" ! return _compile(pattern, 0).finditer(string) def compile(pattern, flags=0): "Compile a regular expression pattern, returning a pattern object." --- 164,179 ---- has more than one group. Empty matches are included in the result.""" ! return _compile(pattern, flags).findall(string, pos, endpos) if sys.hexversion >= 0x02020000: __all__.append("finditer") ! 
def finditer(pattern, string, flags=0, pos=0, endpos=sys.maxint): """Return an iterator over all non-overlapping matches in the string. For each match, the iterator returns a match object. Empty matches are included in the result.""" ! return _compile(pattern, flags).finditer(string, pos, endpos) def compile(pattern, flags=0): "Compile a regular expression pattern, returning a pattern object." -------------- next part -------------- *** libre.tex Wed Sep 8 01:09:55 2004 --- /home/noam/python/old/libre.tex Wed Sep 8 01:04:53 2004 *************** *** 508,530 **** \end{datadesc} ! \begin{funcdesc}{search}{pattern, string\optional{, ! flags\optional{, pos\optional{, endpos}}}} Scan through \var{string} looking for a location where the regular expression \var{pattern} produces a match, and return a corresponding \class{MatchObject} instance. Return \code{None} if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string. - - The optional \var{pos} and \var{endpos} parameters have the same - meaning as for the \function{match()} function. - \versionchanged[Added the optional parameters - \var{pos} and \var{endpos}]{2.4} \end{funcdesc} ! \begin{funcdesc}{match}{pattern, string\optional{, flags\optional{, ! pos\optional{, endpos}}}} If zero or more characters at the beginning of \var{string} match the regular expression \var{pattern}, return a corresponding \class{MatchObject} instance. Return \code{None} if the string does not --- 508,523 ---- \end{datadesc} ! \begin{funcdesc}{search}{pattern, string\optional{, flags}} Scan through \var{string} looking for a location where the regular expression \var{pattern} produces a match, and return a corresponding \class{MatchObject} instance. Return \code{None} if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string. \end{funcdesc} ! 
\begin{funcdesc}{match}{pattern, string\optional{, flags}} If zero or more characters at the beginning of \var{string} match the regular expression \var{pattern}, return a corresponding \class{MatchObject} instance. Return \code{None} if the string does not *************** *** 533,561 **** \note{If you want to locate a match anywhere in \var{string}, use \method{search()} instead.} - - The optional parameter \var{pos} gives an index in the string - where the search is to start; it defaults to \code{0}. This is not - completely equivalent to slicing the string; the - \code{'\textasciicircum'} pattern - character matches at the real beginning of the string and at positions - just after a newline, but not necessarily at the index where the search - is to start. - - The optional parameter \var{endpos} limits how far the string will - be searched; it will be as if the string is \var{endpos} characters - long, so only the characters from \var{pos} to \code{\var{endpos} - - 1} will be searched for a match. If \var{endpos} is less than - \var{pos}, no match will be found, otherwise, - \code{re.match(\var{string}, \var{pos}=0, \var{endpos}=50)} is - equivalent to \code{re.match(\var{string}[:50], \var{pos}=0)}. - - \versionchanged[Added the optional parameters - \var{pos} and \var{endpos}]{2.4} \end{funcdesc} ! \begin{funcdesc}{split}{pattern, string\optional{, ! maxsplit\code{ = 0}\optional{, flags}}} Split \var{string} by the occurrences of \var{pattern}. If capturing parentheses are used in \var{pattern}, then the text of all groups in the pattern are also returned as part of the resulting list. --- 526,534 ---- \note{If you want to locate a match anywhere in \var{string}, use \method{search()} instead.} \end{funcdesc} ! \begin{funcdesc}{split}{pattern, string\optional{, maxsplit\code{ = 0}}} Split \var{string} by the occurrences of \var{pattern}. 
If capturing parentheses are used in \var{pattern}, then the text of all groups in the pattern are also returned as part of the resulting list. *************** *** 576,613 **** This function combines and extends the functionality of the old \function{regsub.split()} and \function{regsub.splitx()}. - \versionchanged[Added the optional parameter \var{flags}]{2.4} \end{funcdesc} ! \begin{funcdesc}{findall}{pattern, string\optional{, ! flags\optional{, pos\optional{, endpos}}}} Return a list of all non-overlapping matches of \var{pattern} in \var{string}. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match. - - The optional parameters \var{pos} and \var{endpos} limit the search to - a part of the string, just like they do in the \function{match()} - function. \versionadded{1.5.2} - \versionchanged[Added the optional parameters - \var{flags}, \var{pos} and \var{endpos}]{2.4} \end{funcdesc} ! \begin{funcdesc}{finditer}{pattern, string\optional{, ! flags\optional{, pos\optional{, endpos}}}} Return an iterator over all non-overlapping matches for the RE \var{pattern} in \var{string}. For each match, the iterator returns a match object. Empty matches are included in the result unless they touch the beginning of another match. \versionadded{2.2} - \versionchanged[Added the optional parameters - \var{flags}, \var{pos} and \var{endpos}]{2.4} \end{funcdesc} ! \begin{funcdesc}{sub}{pattern, repl, string\optional{, ! count\optional{, flags}}} Return the string obtained by replacing the leftmost non-overlapping occurrences of \var{pattern} in \var{string} by the replacement \var{repl}. If the pattern isn't found, \var{string} is returned --- 549,574 ---- This function combines and extends the functionality of the old \function{regsub.split()} and \function{regsub.splitx()}. \end{funcdesc} ! 
\begin{funcdesc}{findall}{pattern, string} Return a list of all non-overlapping matches of \var{pattern} in \var{string}. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match. \versionadded{1.5.2} \end{funcdesc} ! \begin{funcdesc}{finditer}{pattern, string} Return an iterator over all non-overlapping matches for the RE \var{pattern} in \var{string}. For each match, the iterator returns a match object. Empty matches are included in the result unless they touch the beginning of another match. \versionadded{2.2} \end{funcdesc} ! \begin{funcdesc}{sub}{pattern, repl, string\optional{, count}} Return the string obtained by replacing the leftmost non-overlapping occurrences of \var{pattern} in \var{string} by the replacement \var{repl}. If the pattern isn't found, \var{string} is returned *************** *** 660,674 **** group 2 followed by the literal character \character{0}. The backreference \samp{\e g<0>} substitutes in the entire substring matched by the RE. - - \versionchanged[Added the optional parameter \var{flags}]{2.4} \end{funcdesc} ! \begin{funcdesc}{subn}{pattern, repl, string\optional{, ! count\optional{, flags}}} Perform the same operation as \function{sub()}, but return a tuple \code{(\var{new_string}, \var{number_of_subs_made})}. - \versionchanged[Added the optional parameter \var{flags}]{2.4} \end{funcdesc} \begin{funcdesc}{escape}{string} --- 621,631 ---- group 2 followed by the literal character \character{0}. The backreference \samp{\e g<0>} substitutes in the entire substring matched by the RE. \end{funcdesc} ! \begin{funcdesc}{subn}{pattern, repl, string\optional{, count}} Perform the same operation as \function{sub()}, but return a tuple \code{(\var{new_string}, \var{number_of_subs_made})}. 
\end{funcdesc} \begin{funcdesc}{escape}{string} *************** *** 737,749 **** Identical to the \function{split()} function, using the compiled pattern. \end{methoddesc} ! \begin{methoddesc}[RegexObject]{findall}{string\optional{, ! pos\optional{, endpos}}} Identical to the \function{findall()} function, using the compiled pattern. \end{methoddesc} ! \begin{methoddesc}[RegexObject]{finditer}{string\optional{, ! pos\optional{, endpos}}} Identical to the \function{finditer()} function, using the compiled pattern. \end{methoddesc} --- 694,704 ---- Identical to the \function{split()} function, using the compiled pattern. \end{methoddesc} ! \begin{methoddesc}[RegexObject]{findall}{string} Identical to the \function{findall()} function, using the compiled pattern. \end{methoddesc} ! \begin{methoddesc}[RegexObject]{finditer}{string} Identical to the \function{finditer()} function, using the compiled pattern. \end{methoddesc} From greg at electricrain.com Wed Sep 8 01:37:18 2004 From: greg at electricrain.com (Gregory P. Smith) Date: Wed Sep 8 01:37:31 2004 Subject: [Python-Dev] Subversion, Codeville In-Reply-To: <200409071706.59370.fdrake@acm.org> References: <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> <200409071706.59370.fdrake@acm.org> Message-ID: <20040907233718.GC10869@zot.electricrain.com> On Tue, Sep 07, 2004 at 05:06:59PM -0400, Fred L. Drake, Jr. wrote: > On Tuesday 07 September 2004 04:51 pm, Martin v. L?wis wrote: > > What *is* the disadvantage of Berkeley DB that the file storage of > > svn 1.1 will remove? One of the things that you could do in CVS that > > you can't easily do because of the DB approach is to ultimately > > remove a file, along with its entire history (by removing the ,v file). > > Along with that goes the option of moving part of a repository into > > another repository. > > I'm not concerned with people deliberately hosing their repositories; they > shouldn't do that. 
> > The advantage I see is that we won't have to deal with hosed databases having > to be "recovered" to make the Subversion server useful again. > > I certainly agree with Jp's comments about how databases are used, but as long > as the server is working, that's less of an issue for me. agreed, if someone else makes it work i don't care so much how. I was pretty shocked at svn's use of berkeleydb for the reasons others have already hashed out here. to fuel a fire: given that its written in python i'd suggest codeville as a cvs replacement. Its in very early development but i'll bet by the time anyone actually bothers to take the plunge away from tried-and-true cvs rather than just talk about it, it won't be. i expect to be shot down for this suggestion. ;) > > Neither is either with svn because of the DB thing. However, I > > understand that it won't become simpler with the file storage, either, > > as the files being created don't directly correlate to files of the > > versions file system. So you still can't delete a single file with all > > of its history, nor can you move just a part of the repository. There should -never- be a reason to remove the entire proof of a files past existence from a repository (unless you live in 1984). disk space is effectively free. -g From raymond.hettinger at verizon.net Wed Sep 8 01:50:44 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Sep 8 01:52:30 2004 Subject: [Python-Dev] Missing arguments in RE functions In-Reply-To: <006301c49526$3b46ad40$e841fea9@oemcomputer> Message-ID: <000f01c49535$9ec914c0$e841fea9@oemcomputer> > > The first missing feature is the "flags" argument in the findall and > > finditer functions. > . . . > > The second missing feature is the ability to specify start and end > > indices when doing matches and searches. > > +1 > > I've need both of these more than once. > > Are you up to crafting the code? 
Noam has posted a patch: www.python.org/sf/1024041 After adding the unittests, does anyone see any reason that this should not be in Py2.4? Raymond Hettinger From cce at clarkevans.com Wed Sep 8 03:48:45 2004 From: cce at clarkevans.com (Clark C. Evans) Date: Wed Sep 8 03:48:49 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration Message-ID: <20040908014845.GA52384@prometheusresearch.com> I've packaged up the idea of a coroutine facility using iterators and an exception, SuspendIteration. This would require some rather deep changes to how generators are implemented, however, it seems backwards compatible, implementable /w JVM or CLR, and would make most of my database/web development work far more pleasant. http://www.python.org/peps/pep-0334.html Cheers! Clark ... PEP: 334 Title: Simple Coroutines via SuspendIteration Version: $Revision: 1.1 $ Last-Modified: $Date: 2004/09/08 00:11:18 $ Author: Clark C. Evans Status: Draft Type: Standards Track Python-Version: 3.0 Content-Type: text/x-rst Created: 26-Aug-2004 Post-History: Abstract ======== Asynchronous application frameworks such as Twisted [1]_ and Peak [2]_, are based on a cooperative multitasking via event queues or deferred execution. While this approach to application development does not involve threads and thus avoids a whole class of problems [3]_, it creates a different sort of programming challenge. When an I/O operation would block, a user request must suspend so that other requests can proceed. The concept of a coroutine [4]_ promises to help the application developer grapple with this state management difficulty. This PEP proposes a limited approach to coroutines based on an extension to the iterator protocol [5]_. Currently, an iterator may raise a StopIteration exception to indicate that it is done producing values. 
This proposal adds another exception to this protocol, SuspendIteration, which indicates that the given iterator may have more values to produce, but is unable to do so at this time. Rationale ========= There are two current approaches to bringing co-routines to Python. Christian Tismer's Stackless [6]_ involves a ground-up restructuring of Python's execution model by hacking the 'C' stack. While this approach works, its operation is hard to describe and keep portable. A related approach is to compile Python code to Parrot [7]_, a register-based virtual machine, which has coroutines. Unfortunately, neither of these solutions is portable with IronPython (CLR) or Jython (JavaVM). It is thought that a more limited approach, based on iterators, could provide a coroutine facility to application programmers and still be portable across runtimes. * Iterators keep their state in local variables that are not on the "C" stack. Iterators can be viewed as classes, with state stored in member variables that are persistent across calls to its next() method. * While an uncaught exception may terminate a function's execution, an uncaught exception need not invalidate an iterator. The proposed exception, SuspendIteration, uses this feature. In other words, just because one call to next() results in an exception does not necessarily need to imply that the iterator itself is no longer capable of producing values. There are four places where this new exception impacts: * The simple generator [8]_ mechanism could be extended to safely 'catch' this SuspendIteration exception, stuff away its current state, and pass the exception on to the caller. * Various iterator filters [9]_ in the standard library, such as itertools.izip should be made aware of this exception so that it can transparently propagate SuspendIteration. * Iterators generated from I/O operations, such as a file or socket reader, could be modified to have a non-blocking variety. 
This option would raise a subclass of SuspendIteration if the requested operation would block. * The asyncore library could be updated to provide a basic 'runner' that pulls from an iterator; if the SuspendIteration exception is caught, then it moves on to the next iterator in its runlist [10]_. External frameworks like Twisted would provide alternative implementations, perhaps based on FreeBSD's kqueue or Linux's epoll. While these may seem dramatic changes, it is a very small amount of work compared with the utility provided by continuations. Semantics ========= This section will explain, at a high level, how the introduction of this new SuspendIteration exception would behave. Simple Iterators ---------------- The current functionality of iterators is best seen with a simple example which produces two values 'one' and 'two'. :: class States: def __iter__(self): self._next = self.state_one return self def next(self): return self._next() def state_one(self): self._next = self.state_two return "one" def state_two(self): self._next = self.state_stop return "two" def state_stop(self): raise StopIteration print list(States()) An equivalent iteration could, of course, be created by the following generator:: def States(): yield 'one' yield 'two' print list(States()) Introducing SuspendIteration ---------------------------- Suppose that between producing 'one' and 'two', the generator above could block on a socket read. In this case, we would want to raise SuspendIteration to signal that the iterator is not done producing, but is unable to provide a value at the current moment. 
:: from random import randint from time import sleep class SuspendIteration(Exception): pass class NonBlockingResource: """Randomly unable to produce the second value""" def __iter__(self): self._next = self.state_one return self def next(self): return self._next() def state_one(self): self._next = self.state_suspend return "one" def state_suspend(self): rand = randint(1,10) if 2 == rand: self._next = self.state_two return self.state_two() raise SuspendIteration() def state_two(self): self._next = self.state_stop return "two" def state_stop(self): raise StopIteration def sleeplist(iterator, timeout = .1): """ Do other things (e.g. sleep) while resource is unable to provide the next value """ it = iter(iterator) retval = [] while True: try: retval.append(it.next()) except SuspendIteration: sleep(timeout) continue except StopIteration: break return retval print sleeplist(NonBlockingResource()) In a real-world situation, the NonBlockingResource would be a file iterator, socket handle, or other I/O based producer. The sleeplist would instead be an async reactor, such as those found in asyncore or Twisted. The non-blocking resource could, of course, be written as a generator:: def NonBlockingResource(): yield "one" while True: rand = randint(1,10) if 2 == rand: break raise SuspendIteration() yield "two" It is not necessary to add a keyword, 'suspend', since most real content generators will not be in application code, they will be in low-level I/O based operations. Since most programmers need not be exposed to the SuspendIteration() mechanism, a keyword is not needed. Application Iterators --------------------- The previous example is rather contrived, a more 'real-world' example would be a web page generator which yields HTML content, and pulls from a database. Note that this is an example of neither the 'producer' nor the 'consumer', but rather of a filter. 
::

    def ListAlbums(cursor):
        cursor.execute("SELECT title, artist FROM album")
        yield '<html><body><table><tr><td>Title</td><td>Artist</td></tr>'
        for (title, artist) in cursor:
            yield '<tr><td>%s</td><td>%s</td></tr>' % (title, artist)
        yield '</table></body></html>'

The problem, of course, is that the database may block for some time
before any rows are returned, and that during execution, rows may be
returned in blocks of 10 or 100 at a time. Ideally, if the database
blocks for the next set of rows, another user connection could be
serviced. Note the complete absence of SuspendIteration in the above
code. If done correctly, application developers would be able to
focus on functionality rather than concurrency issues.

The iterator created by the above generator should do the magic
necessary to maintain state, yet pass the exception through to a
lower-level async framework. Here is an example of what the
corresponding iterator would look like if coded up as a class::

    class ListAlbums:

        def __init__(self, cursor):
            self.cursor = cursor

        def __iter__(self):
            self.cursor.execute("SELECT title, artist FROM album")
            self._iter = iter(self.cursor)
            self._next = self.state_head
            return self

        def next(self):
            return self._next()

        def state_head(self):
            self._next = self.state_cursor
            return "<html><body><table><tr><td>Title</td><td>Artist</td></tr>"

        def state_tail(self):
            self._next = self.state_stop
            return "</table></body></html>"

        def state_cursor(self):
            try:
                (title,artist) = self._iter.next()
                return '<tr><td>%s</td><td>%s</td></tr>' % (title, artist)
            except StopIteration:
                self._next = self.state_tail
                return self.next()
            except SuspendIteration:
                # just pass-through
                raise

        def state_stop(self):
            raise StopIteration

Complicating Factors
--------------------

While the above example is straightforward, things are a bit more
complicated if the intermediate generator 'condenses' values, that
is, it pulls in two or more values for each value it produces. For
example, ::

    def pair(iterLeft,iterRight):
        rhs = iter(iterRight)
        lhs = iter(iterLeft)
        while True:
            yield (rhs.next(), lhs.next())

In this case, the corresponding iterator behavior has to be a bit
more subtle to handle the case of either the right or left iterator
raising SuspendIteration. It seems to be a matter of decomposing the
generator to recognize intermediate states where a SuspendIteration
exception from the producing context could happen. ::

    class pair:

        def __init__(self, iterLeft, iterRight):
            self.iterLeft = iterLeft
            self.iterRight = iterRight

        def __iter__(self):
            self.rhs = iter(self.iterRight)
            self.lhs = iter(self.iterLeft)
            self._temp_rhs = None
            self._temp_lhs = None
            self._next = self.state_rhs
            return self

        def next(self):
            return self._next()

        def state_rhs(self):
            self._temp_rhs = self.rhs.next()
            self._next = self.state_lhs
            return self.next()

        def state_lhs(self):
            self._temp_lhs = self.lhs.next()
            self._next = self.state_pair
            return self.next()

        def state_pair(self):
            self._next = self.state_rhs
            return (self._temp_rhs, self._temp_lhs)

This proposal assumes that a corresponding iterator written using
this class-based method is possible for existing generators. The
challenge seems to be the identification of distinct states within
the generator where suspension could occur.

Resource Cleanup
----------------

The current generator mechanism has a strange interaction with
exceptions where a 'yield' statement is not allowed within a
try/finally block.
The SuspendIterator exception provides another similar issue. The impacts of this issue are not clear. However it may be that re-writing the generator into a state machine, as the previous section did, could resolve this issue allowing for the situation to be no-worse than, and perhaps even removing the yield/finally situation. More investigation is needed in this area. API and Limitations ------------------- This proposal only covers 'suspending' a chain of iterators, and does not cover (of course) suspending general functions, methods, or "C" extension function. While there could be no direct support for creating generators in "C" code, native "C" iterators which comply with the SuspendIterator semantics are certainly possible. Low-Level Implementation ======================== The author of the PEP is not yet familiar with the Python execution model to comment in this area. References ========== .. [1] Twisted (http://twistedmatrix.com) .. [2] Peak (http://peak.telecommunity.com) .. [3] C10K (http://www.kegel.com/c10k.html) .. [4] Coroutines (http://c2.com/cgi/wiki?CallWithCurrentContinuation) .. [5] PEP 234, Iterators (http://www.python.org/peps/pep-0234.html) .. [6] Stackless Python (http://stackless.com) .. [7] Parrot /w coroutines (http://www.sidhe.org/~dan/blog/archives/000178.html) .. [8] PEP 255, Simple Generators (http://www.python.org/peps/pep-0255.html) .. [9] itertools - Functions creating iterators (http://docs.python.org/lib/module-itertools.html) .. [10] Microthreads in Python, David Mertz (http://www-106.ibm.com/developerworks/linux/library/l-pythrd.html) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: -- Clark C. Evans Prometheus Research, LLC. 
http://www.prometheusresearch.com/ o office: +1.203.777.2550 ~/ , mobile: +1.203.444.0557 // (( Prometheus Research: Transforming Data Into Knowledge \\ , \/ - Research Exchange Database /\ - Survey & Assessment Technologies ` \ - Software Tools for Researchers ~ * From stephen at xemacs.org Wed Sep 8 04:18:59 2004 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed Sep 8 04:19:14 2004 Subject: [Python-Dev] Python 3.0 list of goals In-Reply-To: (Dennis Allison's message of "Tue, 17 Aug 2004 22:33:54 -0700 (PDT)") References: Message-ID: <87fz5ty030.fsf@tleepslib.sk.tsukuba.ac.jp> >>>>> "Dennis" == Dennis Allison writes: Dennis> Ahhhh... Zeno's paradox again. Nah, you're thinking of Archimedes's "Sand Reckoner". -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. From tim.peters at gmail.com Wed Sep 8 05:14:49 2004 From: tim.peters at gmail.com (Tim Peters) Date: Wed Sep 8 05:14:51 2004 Subject: [Python-Dev] Re: Dangerous exceptions (was Re: Another test_compilermystery) In-Reply-To: <1094572868.8342.43.camel@geddy.wooz.org> References: <002d01c48083$9a89a6c0$5229c797@oemcomputer> <20040816112916.GA19969@vicky.ecs.soton.ac.uk> <1f7befae04090422024afaee58@mail.gmail.com> <413DBB19.40602@zope.com> <1094566431.8341.25.camel@geddy.wooz.org> <413DCFBD.7010306@theopalgroup.com> <1094572868.8342.43.camel@geddy.wooz.org> Message-ID: <1f7befae040907201426c33006@mail.gmail.com> After a bit more thought (and it's hard to measure how little), I'd like to see "bare except" deprecated. That doesn't mean no way to catch all exceptions, it means being explicit about intent. Only a few of the bare excepts I've seen in my Python life did what was actually intended, and there's something off in the design when the easiest thing to say usually does a wrong thing. 
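The hazard described above can be shown with a small sketch. It is illustrative only and assumes current Python 3, where the hierarchy was later reorganized so that SystemExit and KeyboardInterrupt sit outside Exception; the helper names (run_swallowing_everything, run_catching_errors_only, ask_to_exit) are hypothetical and not from the thread.

```python
import sys


def run_swallowing_everything(task):
    """Bare except: traps SystemExit and KeyboardInterrupt as well."""
    try:
        task()
    except:  # deliberately bare, to show the problem
        return "swallowed"
    return "ok"


def run_catching_errors_only(task):
    """Explicit intent: handle ordinary errors, let exit requests through."""
    try:
        task()
    except Exception:
        return "handled"
    return "ok"


def ask_to_exit():
    # sys.exit() works by raising SystemExit.
    sys.exit(1)
```

With the bare except, the exit request from ask_to_exit never reaches the top level; with the explicit clause, SystemExit propagates to the caller while an ordinary error such as ZeroDivisionError is still handled.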
I think Java has a saner model in this particular respect:

    Throwable
        Exception
        Error

Java's distinction between "checked" and "unchecked" exceptions is a
distinct layer of complication on top of that.

All exceptions derive from Throwable. A "catch" clause requires
specifying a class (there's no "bare except"). "An Error is a
subclass of Throwable that indicates serious problems that a
reasonable application should not try to catch". That includes
AssertionError and VirtualMachineError. Those are exceptions that
should never occur. It also includes ThreadDeath, which is expected
to occur, but

    The class ThreadDeath is specifically a subclass of Error rather
    than Exception, even though it is a "normal occurrence", because
    many applications catch all occurrences of Exception and then
    discard the exception.

and it's necessary for ThreadDeath to reach the top level else the
thread never really dies.

In that respect, it's interesting that SystemExit and
KeyboardInterrupt are *intended* to "reach the top level" too, but
can't be relied on to do so because of ubiquitous bare excepts and
even pseudo-careful "except Exception:"s now. If people changed those
to "except StandardError:", SystemExit would make it to the top but
KeyboardInterrupt still wouldn't.

    Raisable
        Exception
        Stubborn
            ControlFlow
                KeyboardInterrupt
                StopIteration
                SystemExit
            MemoryError

introduces a class of stubborn exceptions, those that wouldn't be
caught by "except Exception:", and with the intent that there's no
way you should get the effect of "except Raisable" without explicitly
saying just that (once bare except's deprecation is complete).

Oh well. We should elect a benevolent dictator for Python!

From cce at clarkevans.com Wed Sep 8 05:29:25 2004
From: cce at clarkevans.com (Clark C. Evans)
Date: Wed Sep 8 05:29:28 2004
Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration
In-Reply-To: <20040908014845.GA52384@prometheusresearch.com>
References: <20040908014845.GA52384@prometheusresearch.com>
Message-ID: <20040908032925.GA28079@prometheusresearch.com>

Josiah Carlson kindly pointed out (off list) that my use of
SuspendIteration violates the standard idiom of exceptions
terminating the current function. This got past me, because
I think of a generator not as a function, but rather as a shortcut
to creating iterators. The offending code is,

|    def NonBlockingResource():
|        yield "one"
|        while True:
|            rand = randint(1,10)
|            if 2 == rand:
|                break
|            raise SuspendIteration()
|        yield "two"

There are two solutions:
  (a) introduce a new keyword 'suspend'; or,
  (b) don't do that.

It is not essential to the proposal that the generator syntax produce
iterators that can SuspendIteration, it is only essential that the
implementation of generators pass-through this exception. Most
non-blocking resources will be low-level components from an async
database or socket library; they can make iterators the old way.

Cheers,

Clark

From tim.peters at gmail.com Wed Sep 8 05:40:20 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Wed Sep 8 05:40:25 2004
Subject: [Python-Dev] assert failure on obmalloc
In-Reply-To: <2msm9u14pj.fsf@starship.python.net>
References: <2msm9u14pj.fsf@starship.python.net>
Message-ID: <1f7befae04090720401c415b32@mail.gmail.com>

[Michael Hudson]
> Don't debug builds route all PyMem_ calls through PyMalloc?

Indeed they do.

> Doesn't pymalloc rely on the GIL being held when it's called?

Indeed it does.

> If both of these are true, there's an obvious problem here, because the call to
> PyMem_NEW in PyThreadState_New certainly isn't called with the GIL
> held...

Indeed that sucks.

> This would only be a problem in a debug build, though.

So it's Jeremy's fault, just as we suspected all along.
There are lock macros in obmalloc, which currently expand to nothing. They could be changed to "do something" in a debug build, but I'd rather not -- the debug capabilities of obmalloc are more useful the nastier a memory corruption problem is, and few things make problems nastier than throwing threads into the mix. A cheap trick is to ensure that all code that may be called without the GIL calls the platform malloc()/free() directly. Alas, I haven't been able to reproduce Jeremy's symptom. From martin at v.loewis.de Wed Sep 8 06:41:28 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 8 06:41:21 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions Message-ID: <413E8D78.2030302@v.loewis.de> I recently looked into properly implementing the "Register Extensions" feature in the installer; in 2.4a3, not selecting that doesn't really work. The problem is that MSI only supports installing either both the "extension server" (the .exe) and the extension, or neither. So you can chose not to install word.exe, and it won't install the .doc extension; if you install word.exe, it will associate .doc with it. For Python, this leaves us with three options: 1. Don't make registration of extensions optional; always associate .py, .pyc, .pyw, .pyo. 2. Don't support installation-on-demand for extensions. This means to not use the MSI extension machinery at all, but to directly write the registry keys that build the extension. Installing these keys can then be made optional. 3. Provide another binary that is the "extension server", and install that independently of python.exe, and pythonw.exe. In CVS, I have implemented this approach to see whether it works (it does), and called this binary "launcher.exe". It is a Windows app which supports a -console argument which also makes it a console app. This is the the binary that gets associated with all four extensions, for the "open" verb. 
Currently, I'm in favour of using option 3, but I'd like to hear whether people would prefer something else instead. Regards, Martin From gvanrossum at gmail.com Wed Sep 8 06:53:52 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Sep 8 06:53:55 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions In-Reply-To: <413E8D78.2030302@v.loewis.de> References: <413E8D78.2030302@v.loewis.de> Message-ID: On Wed, 08 Sep 2004 06:41:28 +0200, Martin v. L?wis wrote: > I recently looked into properly implementing the "Register Extensions" > feature in the installer; in 2.4a3, not selecting that doesn't really > work. The problem is that MSI only supports installing either both > the "extension server" (the .exe) and the extension, or neither. So > you can chose not to install word.exe, and it won't install the .doc > extension; if you install word.exe, it will associate .doc with it. > > For Python, this leaves us with three options: > 1. Don't make registration of extensions optional; always associate > .py, .pyc, .pyw, .pyo. > 2. Don't support installation-on-demand for extensions. This means > to not use the MSI extension machinery at all, but to directly > write the registry keys that build the extension. Installing > these keys can then be made optional. > 3. Provide another binary that is the "extension server", and > install that independently of python.exe, and pythonw.exe. > In CVS, I have implemented this approach to see whether it > works (it does), and called this binary "launcher.exe". It > is a Windows app which supports a -console argument which also > makes it a console app. This is the the binary that gets > associated with all four extensions, for the "open" verb. > > Currently, I'm in favour of using option 3, but I'd like to hear > whether people would prefer something else instead. 
> > Regards, > Martin I frequently use the extension feature in a console context; when I am in a directory full of .py files, I can run any one of them by simply typing its name (and possibly command line arguments). The script will then interact through the existing console window. WIll this work? >From your description I fear that this would start the script without console I/O possibility or in a separate window, both of which would make this a no-no. If you can confirm that this works as expected, I think the separate driver is fine, since pretty much by definition you can't pass any command line arguments to Python (although I would hope that the environment variables would still work). -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From erik at heneryd.com Wed Sep 8 09:15:16 2004 From: erik at heneryd.com (Erik Heneryd) Date: Wed Sep 8 09:15:23 2004 Subject: [Python-Dev] Missing arguments in RE functions In-Reply-To: <000f01c49535$9ec914c0$e841fea9@oemcomputer> References: <000f01c49535$9ec914c0$e841fea9@oemcomputer> Message-ID: <413EB184.9030604@heneryd.com> Raymond Hettinger wrote: >>>The first missing feature is the "flags" argument in the findall and >>>finditer functions. >> >> . . . >> >>>The second missing feature is the ability to specify start and end >>>indices when doing matches and searches. >> >>+1 >> >>I've need both of these more than once. >> >>Are you up to crafting the code? > > > Noam has posted a patch: > www.python.org/sf/1024041 > > After adding the unittests, does anyone see any reason that this should > not be in Py2.4? > +0 I rarely use the functions, but rather precompile the pattern myself, even when it's a one-shot throw-away. It happens once in awhile, and I know I've been puzzled by this a few times when I've used the functions for a change. 
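A minimal sketch of the precompile-it-yourself approach mentioned above, assuming Python 3's re module, where flags are given to re.compile() and the compiled pattern's findall() and finditer() accept optional pos and endpos arguments. The sample text and variable names are made up for illustration.

```python
import re

text = "Alpha beta ALPHA gamma alpha"

# Flags go in at compile time; the search window is chosen per call.
word = re.compile(r"alpha", re.IGNORECASE)

# Search the whole string.
all_hits = word.findall(text)

# Start searching at index 6, skipping the leading "Alpha".
later_hits = word.findall(text, 6)

# Treat the string as if it were only 16 characters long.
windowed = [m.span() for m in word.finditer(text, 0, 16)]
```

Here all_hits holds all three spellings, later_hits drops the first one, and windowed reports only the two matches that end at or before index 16.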
Erik From Paul.Moore at atosorigin.com Wed Sep 8 10:21:47 2004 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Sep 8 10:21:52 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> From: "Martin v. L?wis" > 3. Provide another binary that is the "extension server", and > install that independently of python.exe, and pythonw.exe. > In CVS, I have implemented this approach to see whether it > works (it does), and called this binary "launcher.exe". It > is a Windows app which supports a -console argument which also > makes it a console app. This is the the binary that gets > associated with all four extensions, for the "open" verb. > > Currently, I'm in favour of using option 3, but I'd like to hear > whether people would prefer something else instead. With option (3), what happens if you run "launcher -console" from a command prompt? Does it produce output in the same console window, or does it launch a new console? The reason I ask is that cmd.exe uses the association of the .py extension to treat .py files as executable. If the association is to a Windows program, cmd.exe will not wait for the command to finish, but will return a prompt immediately, and the command output will appear in a separate console. If this is the case, I'm -1 on option (3). If it's not, I'd like to see how you coded console.exe, as I've often needed this sort of behaviour, and never been able to achieve it correctly! Paul. __________________________________________________________________________ This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Atos Origin group liability cannot be triggered for the message content. 
Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. __________________________________________________________________________ From exarkun at divmod.com Wed Sep 8 15:07:59 2004 From: exarkun at divmod.com (Jp Calderone) Date: Wed Sep 8 15:08:06 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040908032925.GA28079@prometheusresearch.com> References: <20040908014845.GA52384@prometheusresearch.com> <20040908032925.GA28079@prometheusresearch.com> Message-ID: <413F042F.10803@divmod.com> Clark C. Evans wrote: > Josiah Carlson kindly pointed out (off list), that my use of > SuspendIteration violates the standard idiom of exceptions > terminating the current function. This got past me, beacuse > I think a generator not as a function, but rather as a shortcut > to creating iterators. The offending code is, > > | def NonBlockingResource(): > | yield "one" > | while True: > | rand = randint(1,10) > | if 2 == rand: > | break > | raise SuspendIteration() > | yield "two" > > There are two solutions: > (a) introduce a new keyword 'suspend'; or, > (b) don't do that. > > It is not essential to the proposal that the generator syntax produce > iterators that can SuspendIteration, it is only essential that the > implementation of generators pass-through this exception. Most > non-blocking resources will be low-level components from an async > database or socket library; they can make iterators the old way. > What about this? def somefunc(): raise SuspendIteration() return 'foo' def genfunc(): yield somefunc() Jp From cce at clarkevans.com Wed Sep 8 15:26:03 2004 From: cce at clarkevans.com (Clark C. 
Evans)
Date: Wed Sep 8 15:26:08 2004
Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration
In-Reply-To: <413F042F.10803@divmod.com>
References: <20040908014845.GA52384@prometheusresearch.com> <20040908032925.GA28079@prometheusresearch.com> <413F042F.10803@divmod.com>
Message-ID: <20040908132603.GB66159@prometheusresearch.com>

On Wed, Sep 08, 2004 at 09:07:59AM -0400, Jp Calderone wrote:
|
| def somefunc():
|     raise SuspendIteration()
|     return 'foo'
|
| def genfunc():
|     yield somefunc()

Interesting, but:
  - somefunc is a function, thus SuspendIteration() should
    terminate the function; raising an exception
  - somefunc is not a generator, so it cannot be yielded.

However, perhaps something like...

    def suspend(*args,**kwargs):
        raise SuspendIteration(*args,**kwargs)
        # never ever returns

    def myProducer():
        yield "one"
        suspend()
        yield "two"

Regardless, this is a side point. The authors of iterators that
raise a SuspendIteration() will be low-level code, like a next()
which reads the next block from a socket or row from a database
query. In these cases, the class style iterator is sufficient.

The real point is that user-level generators, such as this example
from the PEP (which is detailed as a class-based iterator), should
transparently handle SuspendIteration() by passing it up the generator
chain without killing the current scope.

| def ListAlbums(cursor):
|     cursor.execute("SELECT title, artist FROM album")
|     yield '<html><body><table><tr><td>Title</td><td>Artist</td></tr>'
|     for (title, artist) in cursor:
|         yield '<tr><td>%s</td><td>%s</td></tr>' % (title, artist)
|     yield '</table></body></html>'

For those who say that this iterator should be invalidated when
cursor.next() raises SuspendIteration(), I point out that it is not
invalidated when cursor.next() raises StopIteration().

Kind Regards,

Clark

--
Clark C. Evans                      Prometheus Research, LLC.
                                    http://www.prometheusresearch.com/
    o                               office: +1.203.777.2550
  ~/ ,                              mobile: +1.203.444.0557
 //
((   Prometheus Research: Transforming Data Into Knowledge
 \\  ,
   \/    - Research Exchange Database
   /\    - Survey & Assessment Technologies
   ` \   - Software Tools for Researchers
    ~ *

From cce at clarkevans.com Wed Sep 8 15:32:15 2004
From: cce at clarkevans.com (Clark C. Evans)
Date: Wed Sep 8 15:32:17 2004
Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration
In-Reply-To: <20040908132603.GB66159@prometheusresearch.com>
References: <20040908014845.GA52384@prometheusresearch.com> <20040908032925.GA28079@prometheusresearch.com> <413F042F.10803@divmod.com> <20040908132603.GB66159@prometheusresearch.com>
Message-ID: <20040908133215.GA86633@prometheusresearch.com>

On Wed, Sep 08, 2004 at 09:26:03AM -0400, Clark C. Evans wrote:
| On Wed, Sep 08, 2004 at 09:07:59AM -0400, Jp Calderone wrote:
| |
| | def somefunc():
| |     raise SuspendIteration()
| |     return 'foo'
| |
| | def genfunc():
| |     yield somefunc()
|
| Interesting, but:
|   - somefunc is a function, thus SuspendIteration() should
|     terminate the function; raising an exception
|   - somefunc is not a generator, so it cannot be yielded.

It's too early for me to be posting; scrap the nonsense in this second
point. I don't think this changes the suggestion below though.

|
| However, perhaps something like...
|
|     def suspend(*args,**kwargs):
|         raise SuspendIteration(*args,**kwargs)
|         # never ever returns
|
|     def myProducer():
|         yield "one"
|         suspend()
|         yield "two"
|
| Regardless, this is a side point. The authors of iterators that
| raise a SuspendIteration() will be low-level code, like a next()
| which reads the next block from a socket or row from a database
| query.
| In these cases, the class style iterator is sufficient.
|
| The real point is that user-level generators, such as this example
| from the PEP (which is detailed as a class-based iterator), should
| transparently handle SuspendIteration() by passing it up the generator
| chain without killing the current scope.
|
| | def ListAlbums(cursor):
| |     cursor.execute("SELECT title, artist FROM album")
| |     yield '<html><body><table><tr><td>Title</td><td>Artist</td></tr>'
| |     for (title, artist) in cursor:
| |         yield '<tr><td>%s</td><td>%s</td></tr>' % (title, artist)
| |     yield '</table></body></html>'
|
| For those who say that this iterator should be invalidated when
| cursor.next() raises SuspendIteration(), I point out that it is not
| invalidated when cursor.next() raises StopIteration().
|
| Kind Regards,
|
| Clark

--
Clark C. Evans                      Prometheus Research, LLC.
                                    http://www.prometheusresearch.com/
    o                               office: +1.203.777.2550
  ~/ ,                              mobile: +1.203.444.0557
 //
((   Prometheus Research: Transforming Data Into Knowledge
 \\  ,
   \/    - Research Exchange Database
   /\    - Survey & Assessment Technologies
   ` \   - Software Tools for Researchers
    ~ *

From mal at egenix.com Wed Sep 8 16:47:35 2004
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed Sep 8 16:47:40 2004
Subject: [Python-Dev] PEP 328 - Relative Imports
Message-ID: <413F1B87.90301@egenix.com>

Hi there,

I know that this has been discussed a few times in the past,
but the more I have to deal with building applications using
third-party libs or packages, the more I get the feeling that
the choice of making "import module" absolute is the wrong
path to follow.

The typical scenario goes like this:

* you build an application that uses various third-party
  packages and has to maintain them inside another package, e.g.
  ThirdPartyCode

* you don't have access to the (third-party) package source code or
  it's not feasible to make changes to it for maintenance reasons

Another common case is that you have to deal with third-party
code that is not properly packaged as Python package, but comes
as a set of top-level modules.

In this scenario you typically put all those files into a
newly created Python package directory and access the modules
in that directory using the package name.

In Python 2.3 and 2.4 (as well as all previous versions), both
scenarios can easily be implemented without having to change
the third-party code.

The PEP however suggests that starting with 2.5, the interpreter
will issue a warning and 2.6 should default to absolute paths.

I'd like to request that the latter change be postponed to
Python 3k, or that some other way of supporting the above
scenarios is provided that can be enabled in the application.

Please remember that changes to application code are well
possible. What's not possible is making changes to the
packaged third-party code.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 08 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From mal at egenix.com Wed Sep 8 16:56:28 2004
From: mal at egenix.com (M.-A.
Lemburg) Date: Wed Sep 8 16:56:31 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> Message-ID: <413F1D9C.20209@egenix.com> Fredrik Lundh wrote: > Barry wrote: > >>I'll point out that Template was very deliberately subclassed from >>unicode, so Template instances /are/ unicode objects. From the >>standpoint of type conversion, using /F's notation, T(8) == U, thus >>because U % 8 == U, T(8) % 8 == U. > > from a user perspective, there's no reason to make templates a sub- > class of unicode, so the rest of your argument is irrelevant. Templates are meant to template *text* data, so Unicode is the right choice of baseclass from a design perspective. > instead of looking at use patterns, you're stuck defending the existing > code. that's not a good way to design usable code. Perhaps I'm missing something, but where would you use Templates for templating binary data (where strings or bytes would be a more appropriate design choice) ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 08 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
:::: From gvanrossum at gmail.com Wed Sep 8 17:03:39 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Sep 8 17:03:42 2004 Subject: [Python-Dev] Subversion In-Reply-To: <20040907215053.GF5208@solar.trillke> References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> <4A27E67A-0111-11D9-9892-000A95686CD8@redivi.com> <20040907215053.GF5208@solar.trillke> Message-ID: Somebody in this thread said that files don't belong in a db, and proposed to only put the metadata in the db. That argument seems misguided to me: when recovering the db and filesystem after a host crash, you'd have to go to extra hoops to make sure the metadata matches the filesystem data. Note that Perforce puts everything in a database and it's rock solid. The main problems I had with svn's use of Berkeley DB were packaging issues (but then, I was using a pre-1.0 beta of svn) and poor management of the Berkeley DB by the svn code, requiring frequent db "recovery" actions; also a bug that caused accesses by different users to hose the database in a subtle way. All of that seems fixable. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From gvanrossum at gmail.com Wed Sep 8 17:08:07 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Sep 8 17:08:10 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions In-Reply-To: <413F1D9C.20209@egenix.com> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> Message-ID: > Templates are meant to template *text* data, so Unicode is > the right choice of baseclass from a design perspective. Only in Python 3.0. But even so, deriving from Unicode (or str) means the template class inherits a lot of unwanted operations. 
While I can see that
concatenating templates probably works, slicing them or converting to
lowercase etc. make no sense. IMO the standard Template class should
implement a "narrow" interface, i.e. *only* the template expansion
method (__mod__ or something else), so it's clear that other
compatible template classes shouldn't have to implement anything
besides that. This avoids the issues we have with the mapping
protocol: when does an object implement enough of the mapping API to
be usable? That's currently ill-defined; sometimes, __getitem__ is all
you need, sometimes __contains__ is required, sometimes keys, rarely
setdefault.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Ask me about gmail.

From gvanrossum at gmail.com Wed Sep 8 17:11:28 2004
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Wed Sep 8 17:11:31 2004
Subject: [Python-Dev] PEP 328 - Relative Imports
In-Reply-To: <413F1B87.90301@egenix.com>
References: <413F1B87.90301@egenix.com>
Message-ID:

> I know that this has been discussed a few times in the past,
> but the more I have to deal with building applications using
> third-party libs or packages, the more I get the feeling that
> the choice of making "import module" absolute is the wrong
> path to follow.
>
> The typical scenario goes like this:
>
> * you build an application that uses various third-party
>   packages and has to maintain them inside another package,
>   e.g. ThirdPartyCode
>
> * you don't have access to the (third-party) package source code or
>   it's not feasible to make changes to it for maintenance reasons
>
> Another common case is that you have to deal with third-party
> code that is not properly packaged as Python package, but comes
> as a set of top-level modules.
>
> In this scenario you typically put all those files into a
> newly created Python package directory and access the modules
> in that directory using the package name.
> > In Python 2.3 and 2.4 (as well as all previous versions), both > scenarios can easily be implemented without having to change > the third-party code. > > The PEP however suggests that starting with 2.5, the interpreter > will issue a warning and 2.6 should default to absolute paths. > > I'd like to request that the latter change be postponed to > Python 3k, or that some other way of supporting the above > scenarios is provided that can be enabled in the application. > > Please remember that changes to application code are well > possible. What's not possible is making changes to the > packaged third-party code. As long as it's clear that this is a compatibility requirement only I think it's a good idea to support this way of developing apps (even though I think that clever sys.path manipulation can probably get around it, it's not worth breaking existing approaches). All new apps should however use relative imports to reference their own code, so the problem won't be repeated in the future. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From gvanrossum at gmail.com Wed Sep 8 17:12:56 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Sep 8 17:13:04 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> Message-ID: One more thing. I'd like the launcher app's name to begin with "Py". Maybe PyLaunch.exe or PyStart.exe? -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From mal at egenix.com Wed Sep 8 17:23:12 2004 From: mal at egenix.com (M.-A. 
Lemburg)
Date: Wed Sep 8 17:23:17 2004
Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions
In-Reply-To:
References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <413F1D9C.20209@egenix.com>
Message-ID: <413F23E0.2090908@egenix.com>

Guido van Rossum wrote:
>>Templates are meant to template *text* data, so Unicode is
>>the right choice of baseclass from a design perspective.
>
> Only in Python 3.0.

We better start early to ever reach the point of making
a clear distinction between text and binary data in P3k.

> But even so, deriving from Unicode (or str) means the template class
> inherits a lot of unwanted operations. While I can see that
> concatenating templates probably works, slicing them or converting to
> lowercase etc. make no sense. IMO the standard Template class should
> implement a "narrow" interface, i.e. *only* the template expansion
> method (__mod__ or something else), so it's clear that other
> compatible template classes shouldn't have to implement anything
> besides that. This avoids the issues we have with the mapping
> protocol: when does an object implement enough of the mapping API to
> be usable? That's currently ill-defined; sometimes, __getitem__ is all
> you need, sometimes __contains__ is required, sometimes keys, rarely
> setdefault.

Looks like it's not even clear what templating itself should
mean... you're talking about a templating interface here, not
an object type, like Barry is (for the sake of making Templates
compatible to i18n tools like gettext).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 08 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From mcherm at mcherm.com Wed Sep 8 17:58:31 2004 From: mcherm at mcherm.com (Michael Chermside) Date: Wed Sep 8 17:57:47 2004 Subject: [Python-Dev] Subversion, Codeville Message-ID: <1094659111.413f2c2781664@mcherm.com> Gregory P. Smith writes: > There should -never- be a reason to remove the entire proof of a files > past existence from a repository (unless you live in 1984). disk space > is effectively free. One day a careless Python developer checks in a new cryptography library based on code she found on the internet. Shortly thereafter, SCO decides to sue the PSF for using and distributing "their" copyrighted code. Removing the library from the distributed version isn't sufficient, we have "a copy" of the code, and that's against the law. I realize that disk space usually isn't the issue, but as long as laws make certain information illegal, there will always be reasons to need to delete information. -- Michael Chermside From fredrik at pythonware.com Wed Sep 8 18:33:00 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Sep 8 18:31:18 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation forPEP292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> Message-ID: M.-A. Lemburg wrote: >> from a user perspective, there's no reason to make templates a sub- >> class of unicode, so the rest of your argument is irrelevant. > > Templates are meant to template *text* data, so Unicode is > the right choice of baseclass from a design perspective. not true. as I've shown in SRE and ElementTree (just to give a few examples), 8-bit strings are superior for the *huge* subset of all text strings that only contain ASCII data. 
>> instead of looking at use patterns, you're stuck defending the existing >> code. that's not a good way to design usable code. > > Perhaps I'm missing something, but where would you use Templates > for templating binary data (where strings or bytes would be a more > appropriate design choice) ? 8-bit strings != binary data. you clearly haven't read my other posts in this thread. please do that, instead of repeating the same bogus arguments over again. From trentm at ActiveState.com Wed Sep 8 18:36:59 2004 From: trentm at ActiveState.com (Trent Mick) Date: Wed Sep 8 18:37:04 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Tools/msi msi.py, 1.7, 1.8 In-Reply-To: ; from loewis@users.sourceforge.net on Wed, Sep 08, 2004 at 09:09:17AM -0700 References: Message-ID: <20040908093659.A11945@ActiveState.com> [loewis@users.sourceforge.net wrote] >... > msi.py >... > add_data(db, "Verb", > - [("py", "open", 1, None, r'-console "%1"'), > + [("py", "open", 1, None, r'"%1"'), > ("pyw", "open", 1, None, r'"%1"'), > - ("pyc", "open", 1, None, r'-console "%1"'), > - ("pyo", "open", 1, None, r'-console "%1"')]) > + ("pyc", "open", 1, None, r'"%1"'), > + ("pyo", "open", 1, None, r'"%1"')]) >... Not sure I am following exactly here, but ActivePython makes the association: "%1" %* I don't know if msi.py adds that '%*' later or not. IIRC that allows for script arguments to be passed as well. C:\> foo.py -h blah blah | `--`----`-------- %* `--- %1 Trent -- Trent Mick TrentM@ActiveState.com From mal at egenix.com Wed Sep 8 18:40:37 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Wed Sep 8 18:40:41 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation forPEP292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> Message-ID: <413F3605.7090707@egenix.com> Fredrik Lundh wrote: > M.-A. 
Lemburg wrote: > > >>>from a user perspective, there's no reason to make templates a sub- >>>class of unicode, so the rest of your argument is irrelevant. >> >>Templates are meant to template *text* data, so Unicode is >>the right choice of baseclass from a design perspective. > > not true. as I've shown in SRE and ElementTree (just to give a few > examples), 8-bit strings are superior for the *huge* subset of all text > strings that only contain ASCII data. > >>>instead of looking at use patterns, you're stuck defending the existing >>>code. that's not a good way to design usable code. >> >>Perhaps I'm missing something, but where would you use Templates >>for templating binary data (where strings or bytes would be a more >>appropriate design choice) ? > > > 8-bit strings != binary data. > > you clearly haven't read my other posts in this thread. please do that, > instead of repeating the same bogus arguments over again. I've read them all and, to be honest, I don't follow your argumentation. The text interpretation of 8-bit strings is only one possible form of their interpretation. You could just as well have image data in your 8-bit string and calling .lower() on such a string is certainly going to render that image data useless. The whole point in adding Unicode to the language was to make the difference between text and binary data clear and visible at the type level. I'm not saying that you can not store text data in 8-bit strings, but that we should start to make use of the distinction between text and binary data. If we start to store text data in Unicode now and leave binary data in 8-bit strings, then the move to Unicode strings literals will be much smoother in P3k. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 08 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From fredrik at pythonware.com Wed Sep 8 18:35:24 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed Sep 8 18:40:43 2004
Subject: [Python-Dev] Re: Missing arguments in RE functions
References: <413E0D33.7030703@myrealbox.com> <006301c49526$3b46ad40$e841fea9@oemcomputer>
Message-ID:

Raymond Hettinger wrote:
. .
>> The second missing feature is the ability to specify start and end
>> indices when doing matches and searches.
>
> +1
>
> I've needed both of these more than once.

any reason why you cannot type "re.compile(p).match(...)" ?

From mcherm at mcherm.com Wed Sep 8 18:49:42 2004
From: mcherm at mcherm.com (Michael Chermside)
Date: Wed Sep 8 18:48:36 2004
Subject: [Python-Dev] decorator support
Message-ID: <1094662182.413f3826f1c1a@mcherm.com>

Raymond Hettinger writes:
> In my experiments with decorators, it is common to wrap the original
> function with a new function.
>
> After creating the new function, there are efforts to make it
> look like
> the old:
[...]
> All is well and good except the argspec. Running help() on the new
> function gives:
[...]
> So, it would be nice if there were some support for carrying
> forward the
> argspec to inform help(), calltips(), and inspect().

I created something to help address this... the "obvious" solution
of a decorator used for making decorators. It's example #6 in:

    http://www.python.org/cgi-bin/moinmoin/PythonDecoratorLibrary

Currently it doesn't handle argspec, but I propose adding argspec
(it'll be slightly tricky, but shouldn't be impossible), then
including it in a "decorators" package. Until we have a decorators
package, I think it can live in the wiki and/or the cookbook.

If asked to do so, I'll see about updating this to fix the argspec.
If not asked to, I'll probably leave it alone, because as-is it is
simple enough to serve as a decent example as well as being useful,
but with argspec handling it would be complex enough to be useful
but too complex to be a good example to learn from.

-- Michael Chermside

From fredrik at pythonware.com Wed Sep 8 18:56:10 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed Sep 8 18:54:20 2004
Subject: [Python-Dev] Re: Re: Alternative Implementationfor PEP292:SimpleString Substitutions
References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <413F1D9C.20209@egenix.com> <413F23E0.2090908@egenix.com>
Message-ID:

M.-A. Lemburg wrote
> for the sake of making Templates compatible to i18n tools like gettext).

assuming that gettext really always returns a template if you hand it
a template, of course.

given that the 2.4 gettext doesn't seem to map templates to templates
on my machine, that there's no sign of template support in the 2.4
gettext source code, and that Barry ignored my question about this,
I have to assume that the I18N argument is yet another bogus argument.

From mcherm at mcherm.com Wed Sep 8 19:19:29 2004
From: mcherm at mcherm.com (Michael Chermside)
Date: Wed Sep 8 19:18:23 2004
Subject: [Python-Dev] Re: Dangerous exceptions (was Re:Another test_compilermystery)
Message-ID: <1094663969.413f3f217532e@mcherm.com>

[various discussion about how to NOT catch exceptions by default]

I've been working in the Java world for a while now, and they seem to
have solved this problem quite handily -- no one ever seems to get
confused about it. Ignoring Java's "checked exceptions" (which are a
different problem), they've done the following:

    Throwable
      |
      +-- Error
      |     |
      |     +-- OutOfMemoryError (and others like it)
      |
      +-- Exception
            |
            +-- IndexOutOfBoundsException (and others like it)
            |
            +--

As I see it, there are several common cases:

(1) You want to catch a particular (normal) exception.
    (eg: to handle the problem -- this is the normal thing!)
(2) You want to catch any normal exception.
    (eg: to log and ignore after calls to some subsystem)
(3) You want to catch a particular special exception
    (eg: to deal with KeyboardInterrupt, or MemoryError)
(4) You want to catch ANYTHING, then re-throw it afterward
    (eg: to cleanup a DB connection)

We currently make (1) easy (of course!) by writing "except Foo:", and
we make (4) easy by writing "except:" (the bare except). But most users
only want to use (1) and (2)... only experts use (3) and (4). So I
certainly agree that bare except is used by people who want (2) and
should be using (4) instead.

I would think that both the end goal (python 3000) AND the transition
plan are straightforward. For now, a bare except can't be changed
because of backward compatibility. So create a top-level class which is
NOT named "Exception" ("Raisable" anyone?). Our hierarchy would look
like this:

    Raisable
      |
      +-- MemoryError
      +-- SystemExit
      +-- KeyboardInterrupt
      +-- oddballs like ZODB's ConflictError
      |
      +-- Exception
            |
            +-- most everything else
            +-- user defined exceptions

That's not quite as flat as today's hierarchy, but it works pretty
well. When non-experts want to catch all exceptions, if bare excepts
are deprecated, they will write "except Exception:" (that's just
psychology), so most users will wind up with (2) when that's what they
want. Experts can easily do (3) by catching the exception by name, and
experts can do (4) by catching Raisable.

When Python 3000 is released and backward compatibility is not
required, we can either remove the bare except, or change it to mean
the same as "except Exception" (Python 3000 will, of course, forbid
raising strings or anything not descended from Raisable).

I don't need anyone to buy into this approach, but since it seems so
straightforward to me I thought I should write it out anyhow.
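
Cases (2) and (4) above can be sketched in runnable form. This is only an
illustration, assuming a present-day Python 3 interpreter in which
BaseException plays roughly the role of the proposed "Raisable" root; that
mapping is an assumption, not something stated in this thread, and the
helper names (run_guarded, run_with_cleanup, and the two raise_* functions)
are hypothetical. Note also that in present-day Python MemoryError sits
under Exception, unlike in the hierarchy drawn above.

```python
def raise_index_error():
    """A 'normal' error: IndexError derives from Exception."""
    [][0]


def raise_interrupt():
    """A 'special' error: KeyboardInterrupt sits outside Exception."""
    raise KeyboardInterrupt()


def run_guarded(action):
    """Case (2): catch any normal exception, let special ones through."""
    try:
        action()
    except Exception as exc:
        return "handled " + type(exc).__name__
    return "ok"


def run_with_cleanup(action, log):
    """Case (4): catch anything at all, clean up, then re-raise."""
    try:
        action()
    except BaseException:
        log.append("cleanup")
        raise


handled = run_guarded(raise_index_error)

# The special exception is not swallowed by "except Exception".
try:
    run_guarded(raise_interrupt)
    escaped = False
except KeyboardInterrupt:
    escaped = True

# The catch-all handler runs its cleanup and still propagates the error.
log = []
try:
    run_with_cleanup(raise_interrupt, log)
except KeyboardInterrupt:
    log.append("propagated")
```

The design point is the same one argued above: the handler most people
reach for ("except Exception") should be the one that leaves interrupts
and exits alone.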
-- Michael Chermside From raymond.hettinger at verizon.net Wed Sep 8 19:39:43 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Sep 8 19:40:36 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: Message-ID: <001501c495ca$d3a8bd40$e841fea9@oemcomputer> > >> The second missing feature is the ability to specify start and end > >> indices when doing matches and searches. > > > > +1 > > > > I've need both of these more than once. > > any reason you why you cannot type "re.compile(p).match(...)" ? That is what I usually do and that is the approach taken by the patch. If you see a downside, feel free to reject his patch. IMO, it is only a small win. Raymond From fredrik at pythonware.com Wed Sep 8 20:04:42 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Sep 8 20:02:53 2004 Subject: [Python-Dev] Re: Re: Missing arguments in RE functions References: <001501c495ca$d3a8bd40$e841fea9@oemcomputer> Message-ID: Raymond Hettinger wrote: >> > I've need both of these more than once. >> >> any reason you why you cannot type "re.compile(p).match(...)" ? > > That is what I usually do and that is the approach taken by the patch. > > If you see a downside, feel free to reject his patch. IMO, it is only a > small win. If it's up to me, it's a clear "not worth it". The function API is only there for trivial cases; if you need the full RE power, use pattern objects (you have to use them anyway if you're serious about RE:s). but I'm an API minimalist; someone else will have to make the final decision on this one (Guido, what's your take on API size issues?) 
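
For reference, a short sketch of the "re.compile(p).match(...)" spelling
discussed above: compiled pattern objects accept optional start and end
indices (pos and endpos) on match() and search(), which the module-level
functions do not. The sample text and pattern are made up for illustration.

```python
import re

text = "key=123; key=456"
pattern = re.compile(r"key=(\d+)")

# search() from the start of the string finds the first occurrence.
first = pattern.search(text)

# Passing pos resumes the search where the previous match ended.
second = pattern.search(text, first.end())

# match() anchors at pos instead of at the start of the string;
# index 9 is where the second "key" begins in this sample text.
anchored = pattern.match(text, 9)

# endpos limits how far the pattern may look; only text[:6] is visible.
truncated = pattern.search(text, 0, 6)
```

With pos and endpos the string is not copied, and "^" still refers to the
real start of the string, which is the difference from slicing the string
first.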
From foom at fuhm.net Wed Sep 8 20:08:17 2004 From: foom at fuhm.net (James Y Knight) Date: Wed Sep 8 20:08:24 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040908014845.GA52384@prometheusresearch.com> References: <20040908014845.GA52384@prometheusresearch.com> Message-ID: <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> On Sep 7, 2004, at 9:48 PM, Clark C. Evans wrote: > I've packaged up the idea of a coroutine facility using iterators and > an > exception, SuspendIteration. Very interesting. > This proposal assumes that a corresponding iterator written using > this class-based method is possible for existing generators. The > challenge seems to be the identification of distinct states within > the generator where suspension could occur. That is basically impossible. Essentially *every* operation could possibly raise SuspendIteration, because essentially every operation can call an arbitrary python function, and python functions can raise any exception they want. I think you could still make the proposal work in CPython: if I understand its internals properly, it doesn't need to do a transformation to a class iterator, it simply suspends the frame wherever it is. Thus, being able to suspend at any point in the function would not cause an undue performance degradation. However, I think it is a deal-breaker for JPython. From the generator PEP: "It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new keyword makes that easy." If this quote is right about the implementation of Jython (and it seems likely, given the JVM), your proposal makes it impossible to implement generators in Jython. Given that the advantage claimed for this proposal over stackless is that it can be implemented in non-CPython runtimes, I think it still needs some reworking. 
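
The difficulty can be seen with a small sketch. SuspendIteration is not a
real built-in; it is defined here as an ordinary exception class purely as
a stand-in for the one the PEP proposes, and suspend(), producer() and
drain() are hypothetical names. The sketch shows how generators behave in
CPython today, which is the behaviour the proposal would have to change:
the exception is raised from a plain function call with no syntactic
marker, and once it propagates out the generator is finished.

```python
class SuspendIteration(Exception):
    """Hypothetical stand-in for the exception proposed in PEP 334."""


def suspend():
    # An ordinary function: nothing at the call site marks it as a
    # potential suspension point.
    raise SuspendIteration()


def producer():
    yield "one"
    suspend()
    yield "two"


def drain(gen):
    """Pull values until the generator stops, recording suspensions."""
    seen = []
    while True:
        try:
            seen.append(next(gen))
        except SuspendIteration:
            seen.append("suspended")
        except StopIteration:
            break
    return seen


result = drain(producer())
```

Here "two" is never produced: after the exception escapes, the next call
to next() raises StopIteration, so result holds only "one" followed by the
"suspended" marker.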
James From raymond.hettinger at verizon.net Wed Sep 8 20:16:26 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Sep 8 20:17:21 2004 Subject: [Python-Dev] Re: Re: Missing arguments in RE functions In-Reply-To: Message-ID: <001d01c495cf$f54623c0$e841fea9@oemcomputer> > > If you see a downside, feel free to reject his patch. IMO, it is only a > > small win. > > If it's up to me, it's a clear "not worth it". The function API is only > there for > trivial cases; if you need the full RE power, use pattern objects (you > have > to use them anyway if you're serious about RE:s). > > but I'm an API minimalist; someone else will have to make the final > decision > on this one (Guido, what's your take on API size issues?) It is up to you. You're still the god of re (among other things). FWIW, I gave extra weight to the OP's usability enhancement request because it was born out of experience teaching Python to newbies. The patch itself is a little rough and needs refinement. Raymond From fredrik at pythonware.com Wed Sep 8 20:20:17 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Sep 8 20:18:28 2004 Subject: [Python-Dev] Re: Re: Re: AlternativeImplementation forPEP292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> Message-ID: M.-A. Lemburg wrote: > The whole point in adding Unicode to the language was to make > the difference between text and binary data clear and visible > at the type level. well, when I wrote the Unicode type, the whole point was to be able to make it easy to handle Unicode text. no more, no less. > If we start to store text data in Unicode now and leave binary > data in 8-bit strings, then the move to Unicode strings literals > will be much smoother in P3k. 
hopefully, the P3K string design will take a lot more into account than text-vs-binary; there are many ways to represent text, and many ways to store binary data, and many usage patterns for them both. a good design should take most of this into account. (google for "stringlib" for some work I'm doing in this area) From martin at v.loewis.de Wed Sep 8 20:20:37 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 8 20:20:29 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Tools/msi msi.py, 1.7, 1.8 In-Reply-To: <20040908093659.A11945@ActiveState.com> References: <20040908093659.A11945@ActiveState.com> Message-ID: <413F4D75.8040803@v.loewis.de> Trent Mick wrote: > Not sure I am following exactly here, but ActivePython makes the > association: > "%1" %* > > I don't know if msi.py adds that '%*' later or not. IIRC that allows for > script arguments to be passed as well. > > C:\> foo.py -h blah blah Until yesterday, I didn't even know that was possible, and today, I did not make the right association (pun intended). Will fix soon. Regards, Martin From paul.dubois at gmail.com Wed Sep 8 20:22:28 2004 From: paul.dubois at gmail.com (Paul Du Bois) Date: Wed Sep 8 20:22:31 2004 Subject: [Python-Dev] Subversion In-Reply-To: References: <200409071546.26539.fdrake@acm.org> <20040907201720.GB1083@panix.com> <200409071623.05499.fdrake@acm.org> <413E1F50.90709@v.loewis.de> <4A27E67A-0111-11D9-9892-000A95686CD8@redivi.com> <20040907215053.GF5208@solar.trillke> Message-ID: <85f6a31f04090811221fbd30fd@mail.gmail.com> On Wed, 8 Sep 2004 08:03:39 -0700, Guido van Rossum wrote: > Note that Perforce puts everything in a database and it's rock solid. It's actually the other way around (about the everything-in-database bit, not the rock-solid bit). Snipped from http://www.perforce.com/perforce/technotes/note033.html: The Perforce Server stores two kinds of data: versioned files, and metadata (changelists, opened files, labels, etc.). 
Both are stored in the Perforce Server's root directory. Versioned files are stored in depot subdirectories; there is one subdirectory for each depot in your Perforce installation. Metadata are stored in the Perforce database. p From martin at v.loewis.de Wed Sep 8 20:28:12 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 8 20:28:05 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions In-Reply-To: References: <413E8D78.2030302@v.loewis.de> Message-ID: <413F4F3C.8060905@v.loewis.de> Guido van Rossum wrote: > I frequently use the extension feature in a console context; when I am > in a directory full of .py files, I can run any one of them by simply > typing its name (and possibly command line arguments). The script will > then interact through the existing console window. WIll this work? No. I didn't (really) know that was possible (although Mr Rivest's bug report should have taught me). I've tried to fix it, and now think this is impossible: Even though XP provides an AttachConsole call (which doesn't exist in earlier releases or W9x), which allows to write in the console from which the binary was started, there is apparently no way to tell cmd.exe that it should wait for completion, instead of immediately giving a prompt. I have now reverted the change to create launcher.exe, and install python.exe and pythonw.exe twice (the second time as extpy.exe and extpyw.exe). P.S. Out of curiosity, and to the WINDOWS GURUS ON THIS LIST: How does cmd.exe know whether the program started is a console application or not? Is there any API for that? Just looking at the file being run is clearly insufficient - if the file is foo.py, it needs to look at python.exe. From martin at v.loewis.de Wed Sep 8 20:30:58 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 8 20:30:52 2004 Subject: [Python-Dev] Install-on-first-use vs. 
optional extensions In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> Message-ID: <413F4FE2.2090602@v.loewis.de> Moore, Paul wrote: > With option (3), what happens if you run "launcher -console" from a > command prompt? Does it produce output in the same console window, > or does it launch a new console? That was a problem originally, which I have now fixed into (3') Install two binaries, extpy.exe, and extpyw.exe. With that approach, what do you think? > If it's not, I'd like to see how you coded console.exe, as I've > often needed this sort of behaviour, and never been able to achieve > it correctly! As I found, you can use AttachConsole in WXP, but that isn't a full solution, either (more like a Unix background process). Regards, Martin From martin at v.loewis.de Wed Sep 8 20:32:36 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 8 20:32:28 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions In-Reply-To: References: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> Message-ID: <413F5044.6080205@v.loewis.de> Guido van Rossum wrote: > One more thing. I'd like the launcher app's name to begin with "Py". > Maybe PyLaunch.exe or PyStart.exe? I now call them extpy.exe and extpyw.exe. I deliberately avoided a py *prefix*, as this really hurts tab completion if you interactively hit c:\py\py. I actually want to ban py.ico for that very reason from the python directory. Regards, Martin From theller at python.net Wed Sep 8 20:34:57 2004 From: theller at python.net (Thomas Heller) Date: Wed Sep 8 20:36:32 2004 Subject: [Python-Dev] Install-on-first-use vs. 
optional extensions In-Reply-To: <413F4F3C.8060905@v.loewis.de> ( =?iso-8859-1?q?Martin_v._L=F6wis's_message_of?= "Wed, 08 Sep 2004 20:28:12 +0200") References: <413E8D78.2030302@v.loewis.de> <413F4F3C.8060905@v.loewis.de> Message-ID: <4qm8a9ta.fsf@python.net> "Martin v. Löwis" writes: > Guido van Rossum wrote: >> I frequently use the extension feature in a console context; when I am >> in a directory full of .py files, I can run any one of them by simply >> typing its name (and possibly command line arguments). The script will >> then interact through the existing console window. WIll this work? > > No. I didn't (really) know that was possible (although Mr Rivest's > bug report should have taught me). > > I've tried to fix it, and now think this is impossible: Even though > XP provides an AttachConsole call (which doesn't exist in earlier > releases or W9x), which allows to write in the console from which > the binary was started, there is apparently no way to tell cmd.exe > that it should wait for completion, instead of immediately giving > a prompt. > > I have now reverted the change to create launcher.exe, and install > python.exe and pythonw.exe twice (the second time as extpy.exe and > extpyw.exe). > > P.S. Out of curiosity, and to the WINDOWS GURUS ON THIS LIST: > How does cmd.exe know whether the program started is a console > application or not? Is there any API for that? Just looking at > the file being run is clearly insufficient - if the file is > foo.py, it needs to look at python.exe. It seems to be a flag in the exe header. A quick google search turned up this: http://www.codeguru.com/Cpp/W-P/system/misc/article.php/c2897/ Thomas From martin at v.loewis.de Wed Sep 8 20:52:35 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 8 20:52:27 2004 Subject: [Python-Dev] Console vs.
GUI applications In-Reply-To: <4qm8a9ta.fsf@python.net> References: <413E8D78.2030302@v.loewis.de> <413F4F3C.8060905@v.loewis.de> <4qm8a9ta.fsf@python.net> Message-ID: <413F54F3.30500@v.loewis.de> Thomas Heller wrote: > It seems to be a flag in the exe header. A quick google search turned > up this: > > http://www.codeguru.com/Cpp/W-P/system/misc/article.php/c2897/ Sure. However, if I do foo.py then some part of the system must determine that python.exe is to be invoked, and then must determine that this is a console binary. Does that all happen in cmd.exe? Regards, Martin From cce at clarkevans.com Wed Sep 8 20:53:54 2004 From: cce at clarkevans.com (Clark C. Evans) Date: Wed Sep 8 20:53:57 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> Message-ID: <20040908185353.GA62848@prometheusresearch.com> On Wed, Sep 08, 2004 at 02:08:17PM -0400, James Y Knight wrote: | >This proposal assumes that a corresponding iterator written using | >this class-based method is possible for existing generators. The | >challenge seems to be the identification of distinct states within | >the generator where suspension could occur. | | That is basically impossible. Essentially *every* operation could | possibly raise SuspendIteration, because essentially every operation | can call an arbitrary python function, and python functions can raise | any exception they want. If the SuspendIteration() was raised in an arbitrary Python function, it would close-out the function call due to exception semantics. So, a brain-dead situation would have to make each time a function is called a separate state. The proposal is not implying that this would be converting arbitrary functions into generators, if they happened to raise SuspendIteration(). 
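A minimal sketch of the distinction drawn above, not taken from the PEP: SuspendIteration here is a hypothetical exception class, and Producer and drain are illustrative names. It uses the modern __next__ spelling rather than the next() method of the Python discussed in this thread. A class-based iterator keeps its state on the instance, so raising from its next method unwinds only that one call and the caller can simply retry.

```python
class SuspendIteration(Exception):
    """Hypothetical exception: 'no value available yet, ask again later'."""


class Producer:
    """Class-based iterator that suspends `stalls` times before each item."""

    def __init__(self, items, stalls):
        self._items = iter(items)
        self._stalls = stalls
        self._pending = stalls  # suspensions left before the next item

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending > 0:
            # The exception unwinds this call only; state lives on self.
            self._pending -= 1
            raise SuspendIteration()
        self._pending = self._stalls
        return next(self._items)  # StopIteration propagates when exhausted


def drain(producer):
    """Toy scheduler: retry after each suspension until the producer ends."""
    results, suspensions = [], 0
    while True:
        try:
            results.append(next(producer))
        except SuspendIteration:
            suspensions += 1  # a real scheduler would run another task here
        except StopIteration:
            return results, suspensions
```

An ordinary generator would not survive this: an exception escaping a generator body finishes the generator, which is why the proposal needs explicit suspension points.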
| I think you could still make the proposal work | in CPython: if I understand its internals properly, it doesn't need to | do a transformation to a class iterator, it simply suspends the frame | wherever it is. Thus, being able to suspend at any point in the | function would not cause an undue performance degradation. Ok. | However, I think it is a deal-breaker for JPython. From the generator | PEP: "It's also believed that efficient implementation in Jython | requires that the compiler be able to determine potential suspension | points at compile-time, and a new keyword makes that easy." If this | quote is right about the implementation of Jython (and it seems likely, | given the JVM), your proposal makes it impossible to implement | generators in Jython. Ok, because suspension points would now include not only 'yield' statements, but potentially any function call. So, it could be quite inefficient, but it is not impossible. For an optimization, you could decorate a function if it could throw a SuspendIteration. If a non-decorated function threw that exception, it would be a deal-breaker. | Given that the advantage claimed for this proposal over stackless is | that it can be implemented in non-CPython runtimes, I think it still | needs some reworking. Thanks for your feedback. Best, Clark From pedronis at bluewin.ch Wed Sep 8 20:58:21 2004 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Sep 8 20:56:05 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> Message-ID: <413F564D.2070708@bluewin.ch> James Y Knight wrote: > > On Sep 7, 2004, at 9:48 PM, Clark C. Evans wrote: > >> I've packaged up the idea of a coroutine facility using iterators and an >> exception, SuspendIteration. > > > Very interesting.
> >> This proposal assumes that a corresponding iterator written using >> this class-based method is possible for existing generators. The >> challenge seems to be the identification of distinct states within >> the generator where suspension could occur. > > > That is basically impossible. Essentially *every* operation could > possibly raise SuspendIteration, because essentially every operation can > call an arbitrary python function, and python functions can raise any > exception they want. I think you could still make the proposal work in > CPython: if I understand its internals properly, it doesn't need to do a > transformation to a class iterator, it simply suspends the frame > wherever it is. Thus, being able to suspend at any point in the function > would not cause an undue performance degradation. I don't think it is that simple for CPython either, a single bytecode can potentially invoke more than just a single builtin or other python code, e.g. calling construction can invoke __new__ and __init__, and then there are all the cases where descriptors are involved with their __get__ etc (and __add__,__radd__...). So bytecodes are not the right suspension/resumption granularity because you don't want to reinvoke things that could have had side-effects. So you have all the points per bytecode where python code/builtins can be invoked or from another POV an exception can be detected. If I understand the proposal (which is quite vague), like restartable syscalls, there is also the matter that whatever raised the SuspendIteration should be retried on resumption of the generator, e.g. calling nested generator next. So one would have to cherry pick for each bytecode or similar abstract operations model relevant suspension/resumption points and it would still be quite a daunting task to implement this adding the code for intra bytecode resumption.
(Of course this assumes that capturing the C stack and similar techniques are out of the question) > > However, I think it is a deal-breaker for JPython. From the generator > PEP: "It's also believed that efficient implementation in Jython > requires that the compiler be able to determine potential suspension > points at compile-time, and a new keyword makes that easy." If this > quote is right about the implementation of Jython (and it seems likely, > given the JVM), your proposal makes it impossible to implement > generators in Jython. a hand coded implementation would be a *lot* of work (beyond practical) for potentially very bad performance and a resulting messy codebase. One could also encounter resulting code size problems or issues with the verifier. From theller at python.net Wed Sep 8 21:10:34 2004 From: theller at python.net (Thomas Heller) Date: Wed Sep 8 21:10:37 2004 Subject: [Python-Dev] Console vs. GUI applications In-Reply-To: <413F54F3.30500@v.loewis.de> ( =?iso-8859-1?q?Martin_v._L=F6wis's_message_of?= "Wed, 08 Sep 2004 20:52:35 +0200") References: <413E8D78.2030302@v.loewis.de> <413F4F3C.8060905@v.loewis.de> <4qm8a9ta.fsf@python.net> <413F54F3.30500@v.loewis.de> Message-ID: "Martin v. Löwis" writes: > Thomas Heller wrote: >> It seems to be a flag in the exe header. A quick google search turned >> up this: >> http://www.codeguru.com/Cpp/W-P/system/misc/article.php/c2897/ > > Sure. However, if I do > > foo.py > > then some part of the system must determine that python.exe is > to be invoked, and then must determine that this is a console > binary. Does that all happen in cmd.exe? I cannot answer this question (and I'm not the windows guru either;), but using regmon from sysinternals shows that cmd.exe does more than 500 registry accesses before python.exe is finally started - so it *does* a lot of work. Thomas From cce at clarkevans.com Wed Sep 8 21:20:57 2004 From: cce at clarkevans.com (Clark C.
Evans) Date: Wed Sep 8 21:20:58 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <413F564D.2070708@bluewin.ch> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> <413F564D.2070708@bluewin.ch> Message-ID: <20040908192056.GB62848@prometheusresearch.com> On Wed, Sep 08, 2004 at 08:58:21PM +0200, Samuele Pedroni wrote: | If I understand the proposal (which is quite vague), like restartable | syscalls, there is also the matter that whatever raised the | SuspendIteration should be retried on resumption of the generator, e.g | calling nested generator next. That's exactly the idea. The SuspendIteration exception could contain, however, the file/socket that it is blocked on, so a smart scheduler need not be blindly restarting things. | So one would have to cherry pick for each bytecode or similar abstract | operations model relevant suspension/resumption points and it would | still be quite a daunting task to implement this adding the code | for intra bytecode resumption. (Of course this assumes that capturing | the C stack and similar techniques are out of question) I was assuming that only calls within the generator to next(), implicit or otherwise, would be suspension points. This covers all of my use cases anyway. In the other situations, if they are even useful, don't do that. Convert the SuspendIteration to a RuntimeError, or resume at the previous suspension point? The idea of the PEP was that a nested-generator context provides this limited set of suspension points to make an implementation possible. Kind Regards, Clark -- Clark C. Evans Prometheus Research, LLC. 
http://www.prometheusresearch.com/ o office: +1.203.777.2550 ~/ , mobile: +1.203.444.0557 // (( Prometheus Research: Transforming Data Into Knowledge \\ , \/ - Research Exchange Database /\ - Survey & Assessment Technologies ` \ - Software Tools for Researchers ~ * From tim.peters at gmail.com Wed Sep 8 21:24:49 2004 From: tim.peters at gmail.com (Tim Peters) Date: Wed Sep 8 21:25:09 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions In-Reply-To: <413E8D78.2030302@v.loewis.de> References: <413E8D78.2030302@v.loewis.de> Message-ID: <1f7befae04090812247d1589b0@mail.gmail.com> [Martin v. Löwis] > I recently looked into properly implementing the "Register Extensions" > feature in the installer; in 2.4a3, not selecting that doesn't really > work. The problem is that MSI only supports installing either both > the "extension server" (the .exe) and the extension, or neither. So > you can choose not to install word.exe, and it won't install the .doc > extension; if you install word.exe, it will associate .doc with it. > > For Python, this leaves us with three options: > 1. Don't make registration of extensions optional; always associate > .py, .pyc, .pyw, .pyo. -1. If we do that, I'll never install an alpha or beta again <0.5 wink>. > 2. Don't support installation-on-demand for extensions. This means > to not use the MSI extension machinery at all, but to directly > write the registry keys that build the extension. Installing > these keys can then be made optional. +1. I may or may not want to change/create .py (etc) extensions. I never before heard of the concept of "install-on-demand for extensions", and I don't think I want to. > 3. Provide another binary that is the "extension server", and > install that independently of python.exe, and pythonw.exe. > In CVS, I have implemented this approach to see whether it > works (it does), and called this binary "launcher.exe".
It > is a Windows app which supports a -console argument which also > makes it a console app. This is the the binary that gets > associated with all four extensions, for the "open" verb. This is soooo convoluted compared to what it's trying to achieve: write the registry entries, or don't, end of story. It would be nicest if the code to fiddle the registry were materialized as a .reg file. Then (later, and manually) switching among multiple installed Pythons would be easy. From pedronis at bluewin.ch Wed Sep 8 21:33:10 2004 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Sep 8 21:30:49 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040908192056.GB62848@prometheusresearch.com> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> <413F564D.2070708@bluewin.ch> <20040908192056.GB62848@prometheusresearch.com> Message-ID: <413F5E76.4050805@bluewin.ch> Clark C. Evans wrote: > On Wed, Sep 08, 2004 at 08:58:21PM +0200, Samuele Pedroni wrote: > | If I understand the proposal (which is quite vague), like restartable > | syscalls, there is also the matter that whatever raised the > | SuspendIteration should be retried on resumption of the generator, e.g > | calling nested generator next. > > That's exactly the idea. The SuspendIteration exception could contain, > however, the file/socket that it is blocked on, so a smart scheduler > need not be blindly restarting things. > > | So one would have to cherry pick for each bytecode or similar abstract > | operations model relevant suspension/resumption points and it would > | still be quite a daunting task to implement this adding the code > | for intra bytecode resumption. (Of course this assumes that capturing > | the C stack and similar techniques are out of question) > > I was assuming that only calls within the generator to next(), implicit > or otherwise, would be suspension points. I missed that. 
> > This covers all of my use cases anyway. In the other situations, if > they are even useful, don't do that. Convert the SuspendIteration to a > RuntimeError, or resume at the previous suspension point? > > The idea of the PEP was that a nested-generator context provides this > limited set of suspension points to make an implementation possible. then the PEP needs clarification because I had the impression that def g(src): data = src.read() yield data data = src.read() yield data the read itself could throw a SuspendIteration, and upon the successive next the src.read() itself would be retried. But if it's only nexts that can be suspension points then the generator would not be resumable in this case. Which is the case? From mal at egenix.com Wed Sep 8 21:44:32 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Wed Sep 8 21:45:00 2004 Subject: [Python-Dev] Re: Re: Re: AlternativeImplementation forPEP292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> Message-ID: <413F6120.7090603@egenix.com> Fredrik Lundh wrote: > M.-A. Lemburg wrote: > >>The whole point in adding Unicode to the language was to make >>the difference between text and binary data clear and visible >>at the type level. > > well, when I wrote the Unicode type, the whole point was to be able to > make it easy to handle Unicode text. no more, no less. ... and the Unicode integration made that a reality :-) In today's globalized world, the only sane way to deal with different scripts is through Unicode, which is why I believe that text data should eventually always be stored in Unicode objects - regardless of whether it takes more memory or not. (If you compare development time to prices of a few GB extra RAM, the effort needed to maintain text in non-Unicode formats simply doesn't pay off anymore.)
>>If we start to store text data in Unicode now and leave binary >>data in 8-bit strings, then the move to Unicode strings literals >>will be much smoother in P3k. > > hopefully, the P3K string design will take a lot more into account than > text-vs-binary; there are many ways to represent text, and many ways > to store binary data, and many usage patterns for them both. a good > design should take most of this into account. (google for "stringlib" for > some work I'm doing in this area) Ah, now I know where you're coming from :-) Shift tables don't work well in the Unicode world with its large alphabet. BTW, you might want to look at the BMS implementation I did for mxTextTools. Here's a nice reference for pattern matching: http://www-igm.univ-mlv.fr/~lecroq/string/index.html -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 08 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From cce at clarkevans.com Wed Sep 8 21:58:53 2004 From: cce at clarkevans.com (Clark C. Evans) Date: Wed Sep 8 21:59:01 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <413F5E76.4050805@bluewin.ch> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> <413F564D.2070708@bluewin.ch> <20040908192056.GB62848@prometheusresearch.com> <413F5E76.4050805@bluewin.ch> Message-ID: <20040908195852.GB98180@prometheusresearch.com> On Wed, Sep 08, 2004 at 09:33:10PM +0200, Samuele Pedroni wrote: | Clark C. Evans wrote: | >I was assuming that only calls within the generator to next(), implicit | >or otherwise, would be suspension points. | | I missed that. 
*nod* I will fix the PEP. | >This covers all of my use cases anyway. In the other situations, if | >they are even useful, don't do that. Convert the SuspendIteration to a | >RuntimeError, or resume at the previous suspension point? | > | >The idea of the PEP was that a nested-generator context provides this | >limited set of suspension points to make an implementation possible. | | then the PEP needs clarification because I had the impression that | | def g(src): | data = src.read() | yield data | data = src.read() | yield data The data producers would all be iterators, ones that could possibly raise SuspendIteration() from within their next() method. | the read itself could throw a SuspendIteration If read() did raise a SuspendIteration() exception, then it would make sense to terminate the generator, perhaps with a RuntimeError. I just hadn't considered this case. If someone has a clever solution that makes this case work, great, but it's not something that I was contemplating. | and upon the sucessive | next the src.read() itself would be retried. | But if it's only nexts than | can be suspension points then the generator would be not resumable in | this case. Right. I was musing (but it's not in the PEP) that iter() would sprout an option that let the producer know if it can suspend. If a generator that was itself called with this suspend flag asked for a child generator, then the suspend flag would be carried. But this is a separate issue. Thanks for thinking about this PEP. Clark -- Clark C. Evans Prometheus Research, LLC.
http://www.prometheusresearch.com/ o office: +1.203.777.2550 ~/ , mobile: +1.203.444.0557 // (( Prometheus Research: Transforming Data Into Knowledge \\ , \/ - Research Exchange Database /\ - Survey & Assessment Technologies ` \ - Software Tools for Researchers ~ * From pedronis at bluewin.ch Wed Sep 8 22:14:54 2004 From: pedronis at bluewin.ch (Samuele Pedroni) Date: Wed Sep 8 22:12:31 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040908195852.GB98180@prometheusresearch.com> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> <413F564D.2070708@bluewin.ch> <20040908192056.GB62848@prometheusresearch.com> <413F5E76.4050805@bluewin.ch> <20040908195852.GB98180@prometheusresearch.com> Message-ID: <413F683E.1050204@bluewin.ch> Clark C. Evans wrote: > On Wed, Sep 08, 2004 at 09:33:10PM +0200, Samuele Pedroni wrote: > | Clark C. Evans wrote: > | >I was assuming that only calls within the generator to next(), implicit > | >or otherwise, would be suspension points. > | > | I missed that. > > *nod* I will fix the PEP. > > | >This covers all of my use cases anyway. In the other situations, if > | >they are even useful, don't do that. Convert the SuspendIteration to a > | >RuntimeError, or resume at the previous suspension point? > | > > | >The idea of the PEP was that a nested-generator context provides this > | >limited set of suspension points to make an implementation possible. > | > | then the PEP needs clarification because I had the impression that > | > | def g(src): > | data = src.read() > | yield data > | data = src.read() > | yield data > > The data producers would all be iterators, ones that that could > possibly raise SuspendIteration() from within their next() method. > > | the read itself could throw a SuspendIteration > > If read() did raise a SuspendIteration() exception, then it would > make sense to terminate the generator, perhaps with a RuntimeError. 
> I just hadn't considered this case. If someone has a clever > solution that makes this case work, great, but its not something > that I was contemplating. thinking about it, but this is not different: def g(src): data = src.next() yield data data = src.next() yield data def g(src): demand = src.next data = demand() yield data data = demand() yield data what is supposed to happen here, notice that you may know that src.next is an iterator 'next' at runtime but not at compile time. From pje at telecommunity.com Wed Sep 8 23:38:32 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 8 23:37:58 2004 Subject: [Python-Dev] PEP 302 and 'reload()' Message-ID: <5.1.1.6.0.20040908172822.020f0a40@mail.telecommunity.com> It appears to me there is an error in both PEP 302's specification and its implementation concerning the correct operation of reload(). First, it says: The load_module() method has a few responsibilities that it must fulfill *before* it runs any code: - It must create the module object. From Python this can be done via the new.module() function, the imp.new_module() function or via the module type object; from C with the PyModule_New() function or the PyImport_ModuleAdd() function. This should probably say that if the module already exists in sys.modules, it should reuse the existing module object, rather than creating a new one. Otherwise, 'reload()' cannot fulfill its contract. Second, the actual implementation of PyImport_ReloadModule doesn't actually use a loader object, so reload() doesn't work with import hooks at all. There's an SF bug report for this, and a patch to fix it (that also adds a test to test_importhooks to ensure that 'reload()' actually invokes the loader. Are there any objections to me fixing either/both of these, and backporting the bugfix to the 2.3 maintenance branch? Also, should PyImport_ReloadModule use the import lock? It doesn't currently, but I'm not clear on why it doesn't. 
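A minimal sketch of the first point in the PEP 302 message above, assuming a hypothetical StringLoader that serves module source from a dict (it is not part of any real import hook). The loader looks in sys.modules first and re-executes the code in the existing module object, which is the in-place behaviour reload() depends on. The load_module protocol is the one PEP 302 describes; the example calls the loader directly rather than going through the import machinery.

```python
import sys
import types


class StringLoader:
    """Hypothetical PEP 302-style loader backed by a dict of source strings."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def load_module(self, fullname):
        code = compile(self.sources[fullname], "<%s>" % fullname, "exec")
        # Reuse the module object if it already exists, so that existing
        # references to the module see the reloaded contents.
        module = sys.modules.get(fullname)
        if module is None:
            module = types.ModuleType(fullname)
            # PEP 302: the module must be in sys.modules before its code runs.
            sys.modules[fullname] = module
        module.__loader__ = self
        exec(code, module.__dict__)
        return module
```

Calling load_module twice for the same name then returns the same object both times, with the namespace updated by the second execution.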
From martin at v.loewis.de Wed Sep 8 23:51:25 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 8 23:51:17 2004 Subject: [Python-Dev] Install-on-first-use vs. optional extensions In-Reply-To: <1f7befae04090812247d1589b0@mail.gmail.com> References: <413E8D78.2030302@v.loewis.de> <1f7befae04090812247d1589b0@mail.gmail.com> Message-ID: <413F7EDD.7030600@v.loewis.de> Tim Peters wrote: > +1. I may or may not want to change/create .py (etc) extensions. I > never before heard of the concept of "install-on-demand for > extensions", and I don't think I want to. Ok, I'll wait for some more votes, and failing them, I'll revert the entire advertisement (of extensions) change. Creating a .reg file is a different issue, which may or may not happen. Regards, Martin From pf_moore at yahoo.co.uk Thu Sep 9 00:03:28 2004 From: pf_moore at yahoo.co.uk (Paul Moore) Date: Thu Sep 9 00:03:11 2004 Subject: [Python-Dev] Re: Install-on-first-use vs. optional extensions References: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> <413F4FE2.2090602@v.loewis.de> Message-ID: "Martin v. Löwis" writes: > Moore, Paul wrote: >> With option (3), what happens if you run "launcher -console" from a >> command prompt? Does it produce output in the same console window, >> or does it launch a new console? > > That was a problem originally, which I have now fixed into > > (3') Install two binaries, extpy.exe, and extpyw.exe. > > With that approach, what do you think? I tend to agree with Tim - I'm not at all sure what "install-on-first- use" is doing, and I don't think I care. All I care about are being able to *not* install the extensions (for alpha/beta releases) and being able *to* install them (for final releases). I don't have any problem that this feature could solve, so I don't have a valid reason to have an opinion... Paul.
-- The only reason some people get lost in thought is because it's unfamiliar territory -- Paul Fix From gvanrossum at gmail.com Thu Sep 9 00:55:16 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 9 00:55:23 2004 Subject: [Python-Dev] Re: Install-on-first-use vs. optional extensions In-Reply-To: References: <16E1010E4581B049ABC51D4975CEDB8803060F85@UKDCX001.uk.int.atosorigin.com> <413F4FE2.2090602@v.loewis.de> Message-ID: I'm with Tim too -- the MSI solution seems too convoluted to bother. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From janssen at parc.com Thu Sep 9 02:19:47 2004 From: janssen at parc.com (Bill Janssen) Date: Thu Sep 9 02:22:09 2004 Subject: [Python-Dev] PEP 328 - Relative Imports In-Reply-To: Your message of "Wed, 08 Sep 2004 07:47:35 PDT." <413F1B87.90301@egenix.com> Message-ID: <04Sep8.171951pdt."58612"@synergy1.parc.xerox.com> > I'd like to request that the latter change be postponed to > Python 3k, or that some other way of supporting the above > scenarios is provided that can be enabled in the application. Well said. Bill From gvanrossum at gmail.com Thu Sep 9 04:20:29 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 9 04:20:39 2004 Subject: [Python-Dev] Re: Re: Missing arguments in RE functions In-Reply-To: References: <001501c495ca$d3a8bd40$e841fea9@oemcomputer> Message-ID: > If it's up to me, it's a clear "not worth it". The function API is only there for > trivial cases; if you need the full RE power, use pattern objects (you have > to use them anyway if you're serious about RE:s). > > but I'm an API minimalist; someone else will have to make the final decision > on this one (Guido, what's your take on API size issues?) I'm with /F here. There are already so many ways to do it, adding more isn't going to make things easier, and I'd rather see the API stable. Also, it's awfully close to 2.4b1. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From gvanrossum at gmail.com Thu Sep 9 04:29:06 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Sep 9 04:29:37 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions In-Reply-To: <413F23E0.2090908@egenix.com> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <413F1D9C.20209@egenix.com> <413F23E0.2090908@egenix.com> Message-ID: > > Only in Python 3.0. > > We better start early to ever reach the point of making > a clear distinction between text and binary data in P3k. The introduction of a bytes type in Python 2.5 should be a good start. > > But even so, deriving from Unicode (or str) means the template class > > inherits a lot of unwanted operations. While I can see that > > concatenating templates probably works, slicing them or converting to > > lowercase etc. make no sense. IMO the standard Template class should > > implement a "narrow" interface, i.e. *only* the template expansion > > method (__mod__ or something else), so it's clear that other > > compatible template classes shouldn't have to implement anything > > besides that. This avoids the issues we have with the mapping > > protocol: when does an object implement enough of the mapping API to > > be usable? That's currently ill-defined; sometimes, __getitem__ is all > > you need, sometimes __contains__ is required, sometimes keys, rarely > > setdefault. > > Looks like it's ont even clear what templating itself should > mean... you're talking about a templating interface here, not an > object type, like Barry is (for the sake of making Templates compatible > to i18n tools like gettext). I don't know zip about i18n or gettext. But I thought we had plenty of time since Barry has offered to withdraw the PEP 292 implementation for 2.4? -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. 
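
As an illustration of the "narrow interface" idea quoted above, here is
a minimal sketch in present-day Python. The class name NarrowTemplate
and the $name placeholder pattern are assumptions made for this example;
this is not the PEP 292 implementation.

```python
import re


class NarrowTemplate:
    """Hypothetical template type that exposes only the expansion operation."""

    # Assumed placeholder syntax for the sketch: "$" followed by word characters.
    _pattern = re.compile(r"\$(\w+)")

    def __init__(self, text):
        self._text = text

    def __mod__(self, mapping):
        # Only __getitem__ is required of `mapping`; no other part of the
        # mapping protocol is used.
        return self._pattern.sub(
            lambda match: str(mapping[match.group(1)]), self._text
        )


class OnlyGetItem:
    """Object that implements nothing of the mapping API except __getitem__."""

    def __getitem__(self, key):
        return key.upper()


greeting = NarrowTemplate("hello $name") % {"name": "world"}
shouted = NarrowTemplate("$a and $b") % OnlyGetItem()
```

Because the class does not derive from str, operations such as slicing
or lower() are simply absent, so a compatible template class only has to
provide the one expansion method.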
From gmccaughan at synaptics-uk.com Thu Sep 9 10:39:41 2004 From: gmccaughan at synaptics-uk.com (Gareth McCaughan) Date: Thu Sep 9 10:40:17 2004 Subject: [Python-Dev] Re: Re: =?iso-8859-1?q?Re=3A=09AlternativeImplementation=09forPEP292=3ASimpleStri?= =?iso-8859-1?q?ng?= Substitutions In-Reply-To: <413F6120.7090603@egenix.com> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <413F6120.7090603@egenix.com> Message-ID: <200409090939.41873.gmccaughan@synaptics-uk.com> Marc-Andre Lemburg wrote: > In todays globalized world, the only sane way to deal with > different scripts is through Unicode, which is why I > believe that text data should eventually always be stored in > Unicode objects - regardless of whether it takes more memory > or not. > > (If you compare development time to prices of a few GB extra > RAM, the effort needed to maintain text in non-Unicode > formats simply doesn't pay off anymore.) This is not as obvious as it seems, because the "few GB extra RAM" is a price paid by everyone who *uses* the software. Granted, it's quite common for software to be only run ever on one or two machines in the company where it was developed, but not all software is used that way. Also: the price of "a few GB extra RAM" is not always as low as it seems. If adding 2GB means moving from 3GB to 5GB, it may mean replacing the CPU and the OS. That said, I strongly agree that all textual data should be Unicode as far as the developer is concerned; but, at least in the USA :-), it makes sense to have an optimized representation that saves space for ASCII-only text, just as we have an optimized representation for small integers. (The benefit is potentially much greater in that case, though.) -- g From Paul.Moore at atosorigin.com Thu Sep 9 10:56:00 2004 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Thu Sep 9 10:56:05 2004 Subject: [Python-Dev] Console vs. 
GUI applications Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060F88@UKDCX001.uk.int.atosorigin.com> From: "Martin v. L?wis" >Thomas Heller wrote: >> It seems to be a flag in the exe header. A quick google search turned >> up this: >> >> http://www.codeguru.com/Cpp/W-P/system/misc/article.php/c2897/ > > Sure. However, if I do > > foo.py > > then some part of the system must determine that python.exe is > to be invoked, and then must determine that this is a console > binary. Does that all happen in cmd.exe? I believe so. The relevant Windows API call is CreateProcess, which only handles EXEs (and maybe some obscure cases like COM and PIF files). Everything else gets done in user code (in this case CMD.EXE). So CMD.EXE runs CreateProcess on your launcher.exe. CreateProcess checks a flag (the "subsystem") in the executable header, and acts dependent on that. For a "console" executable, it leaves the new process attached to the current console (the one CMD.EXE is using) and for a "windows" executable, it detaches the process from any console. (Default behaviour - there are flags which can affect this). I can't work out how CMD.EXE "knows" to wait for a child process to release the console (immediately for a windows process, when it terminates for a console process) but clearly it does... A test shows that it *is* possible for two console processes to share the console. The result is an unusable mess, though, so we should be glad cmd.exe does avoid this :-) Paul. __________________________________________________________________________ This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Atos Origin group liability cannot be triggered for the message content. 
Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. __________________________________________________________________________ From michael.walter at gmail.com Thu Sep 9 11:32:44 2004 From: michael.walter at gmail.com (Michael Walter) Date: Thu Sep 9 11:32:51 2004 Subject: [Python-Dev] Console vs. GUI applications In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060F88@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB8803060F88@UKDCX001.uk.int.atosorigin.com> Message-ID: <877e9a1704090902326f2373e9@mail.gmail.com> I guessed CMD.EXE would run ShellExecute(), to which you can pass a filename such as "foo.py". Didn't verify this tho :) Cheers, Michael On Thu, 9 Sep 2004 09:56:00 +0100, Moore, Paul wrote: > From: "Martin v. L?wis" > >Thomas Heller wrote: > >> It seems to be a flag in the exe header. A quick google search turned > >> up this: > >> > >> http://www.codeguru.com/Cpp/W-P/system/misc/article.php/c2897/ > > > > Sure. However, if I do > > > > foo.py > > > > then some part of the system must determine that python.exe is > > to be invoked, and then must determine that this is a console > > binary. Does that all happen in cmd.exe? > > I believe so. The relevant Windows API call is CreateProcess, which > only handles EXEs (and maybe some obscure cases like COM and PIF files). > Everything else gets done in user code (in this case CMD.EXE). > > So CMD.EXE runs CreateProcess on your launcher.exe. CreateProcess checks > a flag (the "subsystem") in the executable header, and acts dependent on > that. For a "console" executable, it leaves the new process attached to > the current console (the one CMD.EXE is using) and for a "windows" > executable, it detaches the process from any console. (Default behaviour > - there are flags which can affect this). 
> > I can't work out how CMD.EXE "knows" to wait for a child process to release > the console (immediately for a windows process, when it terminates for a > console process) but clearly it does... A test shows that it *is* possible > for two console processes to share the console. The result is an unusable > mess, though, so we should be glad cmd.exe does avoid this :-) > > Paul. > > __________________________________________________________________________ > This e-mail and the documents attached are confidential and intended > solely for the addressee; it may also be privileged. If you receive this > e-mail in error, please notify the sender immediately and destroy it. > As its integrity cannot be secured on the Internet, the Atos Origin group > liability cannot be triggered for the message content. Although the > sender endeavours to maintain a computer virus-free network, the sender > does not warrant that this transmission is virus-free and will not be > liable for any damages resulting from any virus transmitted. > __________________________________________________________________________ > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/michael.walter%40gmail.com > From fredrik at pythonware.com Thu Sep 9 11:49:57 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Sep 9 11:50:01 2004 Subject: [Python-Dev] Re: Console vs. GUI applications References: <16E1010E4581B049ABC51D4975CEDB8803060F88@UKDCX001.uk.int.atosorigin.com> <877e9a1704090902326f2373e9@mail.gmail.com> Message-ID: Michael Walter wrote: > I guessed CMD.EXE would run ShellExecute(), to which you can pass a > filename such as "foo.py". 
Didn't verify this tho :)

> dumpbin /imports \windows\system32\cmd.exe | grep Shell
> dumpbin /imports \windows\system32\cmd.exe | grep Create
    7C81E968    4A  CreateDirectoryW
    7C802332    66  CreateProcessW
    7C810976    52  CreateFileW

(I doubt ShellExecute gives cmd.exe the control it needs. besides, ShellExecute
is part of the shell layer, not the core Windows API. and the shell layer depends
on everyone and his brother; I doubt they want the command line interface to
depend on the GDI layer, RPC services, etc.)

From michael.walter at gmail.com Thu Sep 9 11:53:10 2004
From: michael.walter at gmail.com (Michael Walter)
Date: Thu Sep 9 11:53:13 2004
Subject: [Python-Dev] Re: Console vs. GUI applications
In-Reply-To:
References: <16E1010E4581B049ABC51D4975CEDB8803060F88@UKDCX001.uk.int.atosorigin.com>
 <877e9a1704090902326f2373e9@mail.gmail.com>
Message-ID: <877e9a1704090902534aaa4ec7@mail.gmail.com>

Ah, I see.

Thanks,
Michael

On Thu, 9 Sep 2004 11:49:57 +0200, Fredrik Lundh wrote:
> Michael Walter wrote:
>
> > I guessed CMD.EXE would run ShellExecute(), to which you can pass a
> > filename such as "foo.py". Didn't verify this tho :)
>
> > dumpbin /imports \windows\system32\cmd.exe | grep Shell
> > dumpbin /imports \windows\system32\cmd.exe | grep Create
>     7C81E968    4A  CreateDirectoryW
>     7C802332    66  CreateProcessW
>     7C810976    52  CreateFileW
>
> (I doubt ShellExecute gives cmd.exe the control it needs. besides, ShellExecute
> is part of the shell layer, not the core Windows API. and the shell layer depends
> on everyone and his brother; I doubt they want the command line interface to
> depend on the GDI layer, RPC services, etc.)
From arigo at tunes.org Thu Sep 9 12:14:44 2004
From: arigo at tunes.org (Armin Rigo)
Date: Thu Sep 9 12:20:11 2004
Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration
In-Reply-To: <20040908192056.GB62848@prometheusresearch.com>
References: <20040908014845.GA52384@prometheusresearch.com>
 <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net>
 <413F564D.2070708@bluewin.ch>
 <20040908192056.GB62848@prometheusresearch.com>
Message-ID: <20040909101444.GA2877@vicky.ecs.soton.ac.uk>

Hi,

I agree with Samuele that the proposal is far too vague currently. You
should try to describe what precisely should occur in each situation.

A major problem I see with the proposal is that you can describe what
should occur in some situations by presenting source code snippets; such
descriptions correspond easily to possible semantics at the bytecode
level. But bytecode is not a natural granularity for coroutine issues.
Frames (either of generators or functions) execute operations that may
invoke new frames, and all frames in the chain except possibly the most
recent one need to be suspended *during* the execution of their current
bytecode. For example, a generator f() may currently be calling a
generator g() with a FOR_ITER bytecode ('for' statement), a
CALL_FUNCTION (calling next()), or actually anything else like a
BINARY_ADD which calls a nb_add implemented in C which indirectly calls
back to Python code.

For this reason it is not reasonably possible to implement restartable
exceptions in general: when an exception is caught, not all the C state
is saved (i.e. you don't know where, *within* the execution of a
bytecode, you should restart).
Your PEP is very similar to restartable exceptions: their possible
semantics are difficult to specify in general. You may try to do that to
understand what I mean.

This doesn't mean that it is impossible to figure out a more limited
concept, like you are trying to do. However keeping the "restartable
exception" idea in mind should help focusing on the difficult problems
and where restrictions are needed.

I think that Stackless contains all the solutions in this area, and I'm
not talking about the C stack hacking. Stackless is sometimes able to
switch coroutines without hacking at the C stack. I think that if any
coroutine support is ever going to be added to CPython it will be done
in a similar way. (Generators were also inspired by Stackless, BTW.)

(Also note that although the generator syntax is nice and helpful, it
would have been possible to write generators without any custom 'yield'
syntax if we had restartable exceptions; this makes the latter idea more
general and independent from generators.)

A bientôt,

Armin.

From garth at garthy.com Thu Sep 9 13:20:39 2004
From: garth at garthy.com (Garth)
Date: Thu Sep 9 13:20:51 2004
Subject: [Python-Dev] Re: Console vs. GUI applications
In-Reply-To: <877e9a1704090902534aaa4ec7@mail.gmail.com>
References: <16E1010E4581B049ABC51D4975CEDB8803060F88@UKDCX001.uk.int.atosorigin.com>
 <877e9a1704090902326f2373e9@mail.gmail.com>
 <877e9a1704090902534aaa4ec7@mail.gmail.com>
Message-ID: <41403C87.3070705@garthy.com>

use depends.exe to open cmd.exe

It delay loads shell32.dll and uses the functions ShellExecuteExW and
SHChangeNotify so it may use them.

Garth

Michael Walter wrote:

>Ah, I see.
>
>Thanks,
>Michael
>
>On Thu, 9 Sep 2004 11:49:57 +0200, Fredrik Lundh wrote:
>
>>Michael Walter wrote:
>>
>>>I guessed CMD.EXE would run ShellExecute(), to which you can pass a
>>>filename such as "foo.py".
Didn't verify this tho :)
>>>
>>>dumpbin /imports \windows\system32\cmd.exe | grep Shell
>>>dumpbin /imports \windows\system32\cmd.exe | grep Create
>>    7C81E968    4A  CreateDirectoryW
>>    7C802332    66  CreateProcessW
>>    7C810976    52  CreateFileW
>>
>>(I doubt ShellExecute gives cmd.exe the control it needs. besides, ShellExecute
>>is part of the shell layer, not the core Windows API. and the shell layer depends
>>on everyone and his brother; I doubt they want the command line interface to
>>depend on the GDI layer, RPC services, etc.)

From garth at garthy.com Thu Sep 9 13:39:33 2004
From: garth at garthy.com (Garth)
Date: Thu Sep 9 13:39:35 2004
Subject: [Python-Dev] Install-on-first-use vs. optional extensions
In-Reply-To: <413E8D78.2030302@v.loewis.de>
References: <413E8D78.2030302@v.loewis.de>
Message-ID: <414040F5.4030706@garthy.com>

Couldn't you conditionally run RegisterExtensionInfo? And set this in a
dialog checkbox? (Which I don't know how to do in msi + python)

This is my guess at a patch

--- oldsequence.py      Thu Sep 9 12:35:51 2004
+++ sequence.py Thu Sep 9 12:35:31 2004
@@ -50,7 +50,7 @@
 (u'PublishFeatures', None, 6300),
 (u'PublishProduct', None, 6400),
 (u'RegisterClassInfo', None, 4600),
-(u'RegisterExtensionInfo', None, 4700),
+(u'RegisterExtensionInfo', 'INSTALLEXT=1', 4700),
 (u'RegisterMIMEInfo', None, 4900),
 (u'RegisterProgIdInfo', None, 4800),
 (u'AllocateRegistrySpace', u'NOT Installed', 1550),

Martin v. Löwis wrote:

> I recently looked into properly implementing the "Register Extensions"
> feature in the installer; in 2.4a3, not selecting that doesn't really
> work. The problem is that MSI only supports installing either both
> the "extension server" (the .exe) and the extension, or neither. So
> you can choose not to install word.exe, and it won't install the .doc
> extension; if you install word.exe, it will associate .doc with it.
>
> For Python, this leaves us with three options:
> 1. Don't make registration of extensions optional; always associate
>    .py, .pyc, .pyw, .pyo.
> 2. Don't support installation-on-demand for extensions. This means
>    to not use the MSI extension machinery at all, but to directly
>    write the registry keys that build the extension. Installing
>    these keys can then be made optional.
> 3. Provide another binary that is the "extension server", and
>    install that independently of python.exe, and pythonw.exe.
>    In CVS, I have implemented this approach to see whether it
>    works (it does), and called this binary "launcher.exe". It
>    is a Windows app which supports a -console argument which also
>    makes it a console app. This is the binary that gets
>    associated with all four extensions, for the "open" verb.
>
> Currently, I'm in favour of using option 3, but I'd like to hear
> whether people would prefer something else instead.
>
> Regards,
> Martin

From nas at arctrix.com Thu Sep 9 20:07:44 2004
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu Sep 9 20:07:49 2004
Subject: [Python-Dev] unicode inconsistency?
Message-ID: <20040909180743.GA31140@mems-exchange.org>

Perhaps this is more appropriate for python-list but it looks like a
bug to me.
Example code:

    class A:
        def __str__(self):
            return u'\u1234'

    '%s' % u'\u1234' # this works
    '%s' % A() # this doesn't work

It will work if 'A' subclasses from 'unicode' but should not be
necessary, IMHO. Any reason why this shouldn't be fixed?

  Neil

From aahz at pythoncraft.com Thu Sep 9 20:09:56 2004
From: aahz at pythoncraft.com (Aahz)
Date: Thu Sep 9 20:10:00 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <20040909180743.GA31140@mems-exchange.org>
References: <20040909180743.GA31140@mems-exchange.org>
Message-ID: <20040909180955.GA28902@panix.com>

On Thu, Sep 09, 2004, Neil Schemenauer wrote:
>
> Perhaps this is more appropriate for python-list but it looks like a
> bug to me. Example code:
>
>     class A:
>         def __str__(self):
>             return u'\u1234'
>
>     '%s' % u'\u1234' # this works
>     '%s' % A() # this doesn't work
>
> It will work if 'A' subclasses from 'unicode' but should not be
> necessary, IMHO. Any reason why this shouldn't be fixed?

Check the recent python-dev archives for a long and nauseating thread
about interactions between __str__ and unicode.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"A foolish consistency is the hobgoblin of little minds, adored by little
statesmen and philosophers and divines." --Ralph Waldo Emerson

From martin at v.loewis.de Thu Sep 9 20:29:26 2004
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu Sep 9 20:29:17 2004
Subject: [Python-Dev] Install-on-first-use vs. optional extensions
In-Reply-To: <414040F5.4030706@garthy.com>
References: <413E8D78.2030302@v.loewis.de> <414040F5.4030706@garthy.com>
Message-ID: <4140A106.4020100@v.loewis.de>

Garth wrote:
> Couldn't you conditionally run RegisterExtensionInfo?

This is what I currently do (see msi.py:build_database). Unfortunately,
it doesn't work: Installer then unconditionally runs
UnregisterExtensionInfo first, which removes any old extension
information before installing a new one.
Now, this could also be made conditional, although defining the
condition is difficult: If the user changes the extension from
"installed" to "absent", UnregisterExtensionInfo *should* run. In any
case, uninstalling the entire package (i.e. executing the toplevel
REMOVE action) doesn't run the InstallExecuteSequence (I believe),
which further complicates issues.

I've played with a number of options, and could not make it work.
I have given up now, but if you find a solution, please let me know.

Regards,
Martin

From martin at v.loewis.de Thu Sep 9 20:42:38 2004
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu Sep 9 20:42:29 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <20040909180955.GA28902@panix.com>
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
Message-ID: <4140A41E.9080705@v.loewis.de>

Aahz wrote:
>>It will work if 'A' subclasses from 'unicode' but should not be
>>necessary, IMHO. Any reason why this shouldn't be fixed?
>
> Check the recent python-dev archives for a long and nauseating thread
> about interactions between __str__ and unicode.

Although that really doesn't answer this particular question. It was
about str() and its interaction with __str__ and __unicode__, and
whether Python should support __unicode__.

For the specific issue, I would maintain that str() should always
return string objects. I'm not so sure about %s since, as Neil observes,
'%s' % unicode_string gives a unicode result. I can't see any harm by
supporting this operation also if __str__ returns a Unicode object.

Regards,
Martin

From tim.peters at gmail.com Thu Sep 9 20:44:56 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Thu Sep 9 20:44:58 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <20040909180743.GA31140@mems-exchange.org>
References: <20040909180743.GA31140@mems-exchange.org>
Message-ID: <1f7befae04090911441c85dfd9@mail.gmail.com>

[Neil Schemenauer]
> Perhaps this is more appropriate for python-list but it looks like a
> bug to me. Example code:
>
>     class A:
>         def __str__(self):
>             return u'\u1234'
>
>     '%s' % u'\u1234' # this works
>     '%s' % A() # this doesn't work
>
> It will work if 'A' subclasses from 'unicode' but should not be
> necessary, IMHO.

You know better than to say "doesn't work". I assume you mean the
latter raises UnicodeEncodeError.

> Any reason why this shouldn't be fixed?

Didn't we just go thru this, last week or so? PyObject_Str() never
returns a unicode (it returns a str). That is, str(A()) raises
UnicodeEncodeError, and that's out of interpolation's hands. As Martin
said last time, a __str__ method that returns a unicode doesn't make
much sense.

I'm not sure you really mean "it will work if 'A' subclasses from
'unicode'" either:

>>> class A(unicode):
...     def __str__(self):
...         return u'\u1234'
...
>>> '%s' % A()
u''
>>> len(_)
0
>>>

That is, A.__str__ is ignored if A subclasses from Unicode. So "doesn't
blow up" seems more on-target than "works" -- I don't think you expected
an empty Unicode string here.

From nas at arctrix.com Thu Sep 9 20:50:35 2004
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu Sep 9 20:50:39 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <20040909180955.GA28902@panix.com>
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
Message-ID: <20040909185034.GA31277@mems-exchange.org>

On Thu, Sep 09, 2004 at 02:09:56PM -0400, Aahz wrote:
> Check the recent python-dev archives for a long and nauseating
> thread about interactions between __str__ and unicode.

Using __unicode__ doesn't help. The core problem is that you cannot
create a class that behaves like 'unicode' in this operation without
subclassing from 'unicode'.
That violates the "duck typing" design principle of Python. We violate
it other places, usually in the name of efficiency, but I see no good
reason in this case.

I suspect the fix will be pretty straightforward (call tp_str and
if the result is 'unicode' then produce a 'unicode' string). Again,
is there some reason why we don't want this behavior?

  Neil

From tim.peters at gmail.com Thu Sep 9 21:00:07 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Thu Sep 9 21:00:17 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <4140A41E.9080705@v.loewis.de>
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
 <4140A41E.9080705@v.loewis.de>
Message-ID: <1f7befae04090912007305d532@mail.gmail.com>

[Martin v. Löwis]
> ...
> For the specific issue, I would maintain that str() should always
> return string objects.

__builtin__.str() always does -- or raises an exception. Same for
PyObject_Str() and PyObject_Repr().

> I'm not so sure about %s since, as Neil observes, '%s' % unicode_string
> gives a unicode result.

That's because PyString_Format()'s '%s' processing special-cases the
snot out of unicode *inputs*. All other inputs to '%s' (and '%r') go
thru PyObject_Str() or PyObject_Repr(), and, as above, those never
return a unicode. In Neil's case, they raise the expected exception,
and there's nothing sane PyString_Format can do about that.

> I can't see any harm by supporting this operation also if __str__ returns
> a Unicode object.

It doesn't sound like a good idea to me, at least in part because it
would be darned messy to implement short of saying "OK, we don't give
a rip anymore about what type of objects PyObject_{Str,Repr} return",
and that would have broader consequences than just letting Neil get away
with whatever he's trying to do with str.__mod__.

From tim.peters at gmail.com Thu Sep 9 21:11:51 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Thu Sep 9 21:11:54 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <20040909185034.GA31277@mems-exchange.org>
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
 <20040909185034.GA31277@mems-exchange.org>
Message-ID: <1f7befae040909121126062330@mail.gmail.com>

[Neil Schemenauer]
> ...
> I suspect the fix will be pretty straightforward (call tp_str and
> if the result is 'unicode' then produce a 'unicode' string). Again,
> is there some reason why we don't want this behavior?

Yes: '%s' is documented as "String (converts any python object using
str())". It's str(A()) that raises the exception you're seeing, not
interpolation.

To worm around that, you'll effectively have to duplicate PyObject_Str's
implementation (which is more than just calling tp_str -- that may not
exist -- you'll end up at least duplicating PyObject_Repr's
implementation too) inside PyString_Format(), and end up with a mess
that's harder to explain too.

The *real* problem (IMO) is that we don't have a format code that
means "stick the unicode representation here", i.e. there's no format
code that triggers PyObject_Unicode() directly. unicode.__mod__
treats '%s' that way, but that isn't documented.

From FBatista at uniFON.com.ar Thu Sep 9 21:14:24 2004
From: FBatista at uniFON.com.ar (Batista, Facundo)
Date: Thu Sep 9 21:18:58 2004
Subject: [Python-Dev] unicode inconsistency?
Message-ID:

[Tim Peters]

#- The *real* problem (IMO) is that we don't have a format code that
#- means "stick the unicode representation here", i.e. there's no format
#- code that triggers PyObject_Unicode() directly. unicode.__mod__
#- treats '%s' that way, but that isn't documented.

You mean something like %u? (actually don't know if the "u" is used for
something else)

If %u triggers PyObject_Unicode(), the following will work?

class A:
    def __unicode__(self):
        return u'\u1234'

'%u' % u'\u1234'
'%u' % A()

.    Facundo

From tim.peters at gmail.com Thu Sep 9 21:28:39 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Thu Sep 9 21:28:44 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To:
References:
Message-ID: <1f7befae04090912281fa118fc@mail.gmail.com>

[Batista, Facundo]
> You mean something like %u? (actually don't know if the "u" is used for
> something else)

'%u' is used for unsigned int formats -- although int/long unification
rendered those senseless.

> If %u triggers PyObject_Unicode(), the following will work?
>
> class A:
>     def __unicode__(self):
>         return u'\u1234'
>
> '%u' % u'\u1234'
> '%u' % A()

That's the intent, yes. Neil's original example would *also* "work"
then (because unlike PyObject_Str(), PyObject_Unicode() is happy to
accept a unicode result as-is from a tp_str implementation).

From nas at arctrix.com Thu Sep 9 21:57:32 2004
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu Sep 9 21:57:36 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <1f7befae040909121126062330@mail.gmail.com>
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
 <20040909185034.GA31277@mems-exchange.org>
 <1f7befae040909121126062330@mail.gmail.com>
Message-ID: <20040909195732.GB31277@mems-exchange.org>

On Thu, Sep 09, 2004 at 03:11:51PM -0400, Tim Peters wrote:
> '%s' is documented as "String (converts any python object using
> str())". It's str(A()) that raises the exception you're seeing,
> not interpolation.

Shouldn't '%s' % u'\u1234' also raise an exception then?

> To worm around that, you'll effectively have to duplicate
> PyObject_Str's implementation

Yes. I want something like "PyObject_UnicodeOrStr" that would return
either a unicode object or a str object. That would make it easier
to write code that produces 'str' results if unicode characters
don't appear in any of the inputs.
Having __str__ methods that can return either 'unicode' or 'str'
objects is also very handy (I don't see how you can say that it
doesn't make any sense).

Perhaps I am on the wrong track. However, if I understand the /F
bot correctly, he favours a design that does not force everything to
unicode strings.

  Neil

From nas at arctrix.com Thu Sep 9 22:01:07 2004
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu Sep 9 22:01:10 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <1f7befae04090912281fa118fc@mail.gmail.com>
References: <1f7befae04090912281fa118fc@mail.gmail.com>
Message-ID: <20040909200107.GC31277@mems-exchange.org>

On Thu, Sep 09, 2004 at 03:28:39PM -0400, Tim Peters wrote:
> > '%u' % u'\u1234'
> > '%u' % A()
>
> That's the intent, yes. Neil's original example would *also* "work"
> then (because unlike PyObject_Str(), PyObject_Unicode() is happy to
> accept a unicode result as-is from a tp_str implementation).

No, it would not "work" the way I want. I don't want to force
things to unicode strings unless necessary.

  Neil

From nas at arctrix.com Thu Sep 9 22:03:26 2004
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu Sep 9 22:03:29 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <1f7befae04090912007305d532@mail.gmail.com>
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
 <4140A41E.9080705@v.loewis.de>
 <1f7befae04090912007305d532@mail.gmail.com>
Message-ID: <20040909200326.GD31277@mems-exchange.org>

On Thu, Sep 09, 2004 at 03:00:07PM -0400, Tim Peters wrote:
> [Martin v. Löwis]
> > I can't see any harm by supporting this operation also if __str__ returns
> > a Unicode object.
>
> It doesn't sound like a good idea to me, at least in part because it
> would be darned messy to implement short of saying "OK, we don't give
> a rip anymore about what type of objects PyObject_{Str,Repr} return"

Just to be clear, I don't propose allowing PyObject_Str and
PyObject_Repr to return unicode objects. That would be a disaster,
IMO.

  Neil

From fredrik at pythonware.com Thu Sep 9 22:12:30 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu Sep 9 22:10:43 2004
Subject: [Python-Dev] Re: unicode inconsistency?
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
 <20040909185034.GA31277@mems-exchange.org>
 <1f7befae040909121126062330@mail.gmail.com>
 <20040909195732.GB31277@mems-exchange.org>
Message-ID:

Neil Schemenauer wrote:

> Perhaps I am on the wrong track. However, if I understand the /F
> bot correctly, he favours a design that does not force everything to
> unicode strings.

that's correct.

I'm beginning to think that we need an extra method (__text__), that
can return any kind of string that's compatible with Python's text model.

(in today's CPython, that's an 8-bit string with ASCII only, or a
Unicode string. future Pythons may support more string types, at least
at the C implementation level).

I'm not sure we can change __str__ or __unicode__ without breaking code
in really obscure ways (but I'd be happy to be proven wrong).

From aahz at pythoncraft.com Thu Sep 9 22:21:13 2004
From: aahz at pythoncraft.com (Aahz)
Date: Thu Sep 9 22:21:21 2004
Subject: [Python-Dev] Re: unicode inconsistency?
In-Reply-To:
References: <20040909195732.GB31277@mems-exchange.org>
Message-ID: <20040909202112.GB5485@panix.com>

On Thu, Sep 09, 2004, Fredrik Lundh wrote:
>
> I'm beginning to think that we need an extra method (__text__), that
> can return any kind of string that's compatible with Python's text model.

+1

While we're at it, that would be a good opportunity to add the __index__
method (for int-like objects that actually support indexing). That
would get rid of the issues with using floats as inappropriate inputs.
Can't require __index__ until 3.0, but we can start making it available.
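
To illustrate the __index__ idea, here is a minimal sketch. It assumes an interpreter that consults __index__ when an object is used as a sequence index or slice bound (later Python releases do; in this thread it is only a suggestion). The Offset class is a hypothetical example, not something proposed in the message.

```python
import operator


class Offset:
    """Hypothetical int-like wrapper that opts in to being used as an index."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Called when the object is used where a true integer is required.
        return self._value


letters = ["a", "b", "c", "d"]

picked = letters[Offset(2)]             # plain indexing
sliced = letters[Offset(1):Offset(3)]   # slice bounds go through __index__ too
as_int = operator.index(Offset(3))      # explicit conversion


def float_is_rejected():
    """Floats define no __index__, so using one as an index is an error."""
    try:
        letters[1.0]
    except TypeError:
        return True
    return False
```

The point of a separate hook, as opposed to relying on __int__, is that floats can stay convertible to int without silently becoming acceptable as indexes.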
-- 
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"A foolish consistency is the hobgoblin of little minds, adored by little
statesmen and philosophers and divines." --Ralph Waldo Emerson

From mal at egenix.com Thu Sep 9 22:54:26 2004
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu Sep 9 22:54:28 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <20040909200107.GC31277@mems-exchange.org>
References: <1f7befae04090912281fa118fc@mail.gmail.com>
 <20040909200107.GC31277@mems-exchange.org>
Message-ID: <4140C302.10302@egenix.com>

Neil Schemenauer wrote:
> On Thu, Sep 09, 2004 at 03:28:39PM -0400, Tim Peters wrote:
>
>>> '%u' % u'\u1234'
>>> '%u' % A()
>>
>>That's the intent, yes. Neil's original example would *also* "work"
>>then (because unlike PyObject_Str(), PyObject_Unicode() is happy to
>>accept a unicode result as-is from a tp_str implementation).
>
> No, it would not "work" the way I want. I don't want to force
> things to unicode strings unless necessary.

Unicode always causes coercion towards Unicode, just like floats
always cause coercion towards floats. Nothing's going to
change at that end.

Note that your examples do work with %s if the format string
itself is Unicode, so in P3k, you'll no longer have these
problems.

Since there must be a reason why you have a __str__ method that
returns Unicode, I'd suggest you make the format string itself
a Unicode string as well :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Sep 09 2004)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From tim.peters at gmail.com Thu Sep 9 22:59:18 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Thu Sep 9 22:59:39 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <20040909195732.GB31277@mems-exchange.org>
References: <20040909180743.GA31140@mems-exchange.org>
 <20040909180955.GA28902@panix.com>
 <20040909185034.GA31277@mems-exchange.org>
 <1f7befae040909121126062330@mail.gmail.com>
 <20040909195732.GB31277@mems-exchange.org>
Message-ID: <1f7befae0409091359320e34c9@mail.gmail.com>

[Tim]
>> '%s' is documented as "String (converts any python object using
>> str())". It's str(A()) that raises the exception you're seeing,
>> not interpolation.

[Neil]
> Shouldn't '%s' % u'\u1234' also raise an exception then?

Yes, but the existence of one undocumented extension isn't sufficient
reason to multiply them. The "Unicode exception" here is at least
easy to explain. To make your case work, we somehow have to explain
that although virtually all ways of invoking __str__ produce an 8-bit
encoding of a unicode return value, for some magical reason
str.__mod__ does not. The existing "Unicode exception" consists
solely of saying "but unicode inputs don't invoke str(), and force the
interpolation to get passed to unicode.__mod__ instead".

> Yes. I want something like "PyObject_UnicodeOrStr" that would
> return either a unicode object or a str object. That would make it
> easier to write code that produces 'str' results if unicode
> characters don't appear in any of the inputs.

I think biting the Unicode bullet whole is saner, but suit yourself.

> Having __str__ methods that can return either 'unicode' or 'str' objects
> is also very handy (I don't see how you can say that it doesn't make any
> sense).

Didn't we go thru that last week? Yes:

[Neil]
[... the same class as today's class ...]

[Martin]
> This class is incorrect: it does not support str().

[Neil]
> Can you be more specific about what is incorrect with the above
> class?
[Martin]
In the default installation, it gives a UnicodeEncodeError.

You didn't respond to that (at least not that I saw), so I assumed you
accepted Martin's nag. Having a __str__ that returns a unicode object
that the default encoding can't handle is clearly (IMO) begging for
trouble.

> Perhaps I am on the wrong track. However, if I understand the /F
> bot correctly, he favours a design that does not force everything to
> unicode strings.

Saying it doesn't make sense to have a __str__ method return a Unicode
value that can't be encoded *as* a str isn't asking anyone to force
anything to Unicode. __str__ is still trying hard to retain a
*distinction* between str and unicode. PyObject_Unicode() no longer
plays along with that distinction, but I (mildly) wish it still did.

From martin at v.loewis.de Thu Sep 9 23:00:12 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu Sep 9 23:00:04 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <4140C302.10302@egenix.com>
References: <1f7befae04090912281fa118fc@mail.gmail.com>
 <20040909200107.GC31277@mems-exchange.org>
 <4140C302.10302@egenix.com>
Message-ID: <4140C45C.4050009@v.loewis.de>

M.-A. Lemburg wrote:
>> No, it would not "work" the way I want. I don't want to force
>> things to unicode strings unless necessary.
>
>
> Unicode always causes coercion towards Unicode, just like floats
> always cause coercion towards floats. Nothing's going to
> change at that end.

Not always. As we are discussing right now, str() (and indirectly
%s) coerce Unicode objects into string objects. Also,
PyArg_ParseTuple coerces Unicode into byte strings for the "s"
and "t" formats.

Regards,
Martin

From mal at egenix.com Thu Sep 9 23:11:53 2004
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu Sep 9 23:11:56 2004
Subject: [Python-Dev] unicode inconsistency?
In-Reply-To: <4140C45C.4050009@v.loewis.de>
References: <1f7befae04090912281fa118fc@mail.gmail.com>
 <20040909200107.GC31277@mems-exchange.org>
 <4140C302.10302@egenix.com>
 <4140C45C.4050009@v.loewis.de>
Message-ID: <4140C719.1010906@egenix.com>

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>>> No, it would not "work" the way I want. I don't want to force
>>> things to unicode strings unless necessary.
>>
>> Unicode always causes coercion towards Unicode, just like floats
>> always cause coercion towards floats. Nothing's going to
>> change at that end.
>
> Not always. As we are discussing right now, str() (and indirectly
> %s) coerce Unicode objects into string objects. Also,
> PyArg_ParseTuple coerces Unicode into byte strings for the "s"
> and "t" formats.

I may have been misunderstanding Neil, but I was referring to
Neil's comment that he would not like things to get forced to
Unicode.

If I look at his initial posting, it looks as if Neil wanted
'%s' % A() to return u'\u1234'.

The current implementation tests for Unicode-subclasses, but
does not look at the __str__ return object. In order to add
support for the latter we'd have to add a new C API,
e.g. PyObject_Text() that returns a StringTypes instance, or
catch the UnicodeError caused by the ASCII codec and let this
trigger a redirection to the Unicode formatting routine (however,
this is dangerous since it would cause the object to be evaluated
twice).

-- 
Marc-Andre Lemburg
eGenix.com

From cce at clarkevans.com Thu Sep 9 23:55:48 2004
From: cce at clarkevans.com (Clark C.
Evans) Date: Thu Sep 9 23:55:53 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040909101444.GA2877@vicky.ecs.soton.ac.uk> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> <413F564D.2070708@bluewin.ch> <20040908192056.GB62848@prometheusresearch.com> <20040909101444.GA2877@vicky.ecs.soton.ac.uk> Message-ID: <20040909215548.GB61544@prometheusresearch.com> Armin, On Thu, Sep 09, 2004 at 11:14:44AM +0100, Armin Rigo wrote: | I agree with Samuele that the proposal is far too vague currently. You | should try to describe what precisely should occur in each situation. Oh, absolutely. This was a draft PEP to collect feedback. It will be a bit before I have a chunk of time to assimilate the comments and produce another (more detailed) draft. Your comments were very helpful, I've got a bit of education in my future. | A major problem I see with the proposal is that you can describe what | should occur in some situations by presenting source code snippets; such | descriptions correspond easily to possible semantics at the bytecode | level. But bytecode is not a natural granularity for coroutine issues. *nod* | This doesn't mean that it is impossible to figure out a more limited | concept, like you are trying to do. However keeping the "restartable | exception" idea in mind should help focusing on the difficult problems | and where restrictions are needed. Best, Clark From noamr at myrealbox.com Fri Sep 10 01:03:05 2004 From: noamr at myrealbox.com (Noam Raphael) Date: Fri Sep 10 01:04:22 2004 Subject: [Python-Dev] Missing arguments in RE functions In-Reply-To: <413EB184.9030604@heneryd.com> References: <000f01c49535$9ec914c0$e841fea9@oemcomputer> <413EB184.9030604@heneryd.com> Message-ID: <4140E129.1040700@myrealbox.com> I've read the objections. 
I understand being careful about extending an API, but I still think
that there are things to improve, even when being conservative about
the API.

I think that the straightforward functions should be taken seriously.
The reason is that although you can write
re.compile(pattern).match(...), re.match(pattern, ...) is shorter and
just as clear - I think of the fact that REs are first compiled and
then applied as an implementation issue, which lets you save time when
applying the same RE many times.

The documentation is with me - let me quote:

=====================
The sequence

    prog = re.compile(pat)
    result = prog.match(str)

is equivalent to

    result = re.match(pat, str)

but the version using compile() is more efficient when the expression
will be used several times in a single program.
=====================
findall(string)
    Identical to the findall() function, using the compiled pattern.
=====================

Not only are the straightforward functions not regarded as being "only
there for trivial cases", the methods of the compiled RE are regarded
as sometimes-more-efficient versions of the straightforward functions.

This is why I didn't even know, until I made my research before
sending my message to python-dev, that you could match from a given
start position - I studied the page documenting the functions, because
I didn't want on an early stage to bother my students with the fact
that REs are first compiled and then applied, and I didn't find any
mention of the start position option.

So, as I see it, there are two options. The first one is to decide
that the functions are a legitimate way of using REs in python, and
add the optional parameters that I added in my patch. In this way,
anything you can do with the compiled pattern you could do using the
functions. (I'm not that big an expert in REs, but I checked through
the documentation and didn't find any functionality that was missing
from the functions, after adding these parameters.)
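
To make the gap concrete, here is a small sketch of the asymmetry: the compiled pattern's match() accepts a start position, while the module-level re.match() takes only the pattern, the string and flags. The sample pattern and text are made up for illustration.

```python
import re

text = "id=42; id=7"
digits = re.compile(r"\d+")

# Module-level function: matching always starts at position 0.
from_start = re.match(r"\d+", text)     # None, the text starts with "id="

# Compiled pattern: the optional second argument is the start position.
at_three = digits.match(text, 3)        # matches "42" in place

# With the function API the usual workaround is to slice, which copies
# the tail of the string and shifts the reported offsets.
sliced = re.match(r"\d+", text[3:])
```

Both at_three and sliced find "42", but only the compiled-pattern call reports positions relative to the original string (at_three.start() is 3, sliced.start() is 0).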
The second option is to decide that the functions are only a shortcut, meant for use in trivial cases. In that case, two things should be done, IMHO: The main thing is to update the documentation, to make that clear. It means at least adding a prominent note in the "module contents" page, stating something like "these functions are here only as shortcuts; to access the full functionality, use compiled patterns". I think that in this case, the documentation should be further updated, by changing all the function explanations to something like "equivalent to re.compile(pattern, flags).match(string)", instead of the detailed explanations now given. The second thing that should be done even if the functions are considered shortcuts, is to add the "flags" parameter to the findall() and finditer() functions - I really can't see any reason why the search() and match() functions should have that parameter and findall() and finditer() shouldn't - they all get two arguments, pattern and string. Why should the optional parameter be available only for the older functions? And a final note: the parameters for start and end positions are already available in the findall() and finditer() methods. Should this be left an undocumented feature? It seems to me perfectly legitimate to search for all the matches of a specific RE in a substring without actually copying all the characters of the substring to another string. Noam (P.S. Can you please add me to the CC of your replies? It would make it easier for me to reply, since I'm not a member of python-dev.) 
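
A short sketch of the last two points, using only the compiled-pattern route: flags are supplied through re.compile(), and the pattern's findall() and finditer() methods accept start and end positions. The sample text is invented for the example.

```python
import re

text = "Spam spam SPAM eggs spam"

# Flags go in at compile time.
word = re.compile(r"spam", re.IGNORECASE)

# Search the whole string.
all_hits = word.findall(text)

# Search only text[5:14] ("spam SPAM") without building a substring.
window_hits = word.findall(text, 5, 14)

# finditer() takes the same bounds; spans are relative to the full string.
window_spans = [m.span() for m in word.finditer(text, 5, 14)]
```

Because the bounds are passed to the method, the reported spans stay valid offsets into the original string.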
From greg at cosc.canterbury.ac.nz Fri Sep 10 02:58:23 2004
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri Sep 10 02:58:28 2004
Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators
Message-ID: <200409100058.i8A0wNIV002743@cosc353.cosc.canterbury.ac.nz>

PEP: 335
Title: Overloadable Boolean Operators
Version: $Revision: 1.2 $
Last-Modified: $Date: 2004/09/09 14:17:17 $
Author: Gregory Ewing
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Aug-2004
Python-Version: 2.4
Post-History: 05-Sep-2004


Abstract
========

This PEP proposes an extension to permit objects to define their own
meanings for the boolean operators 'and', 'or' and 'not', and suggests
an efficient strategy for implementation. A prototype of this
implementation is available for download.


Background
==========

Python does not currently provide any '__xxx__' special methods
corresponding to the 'and', 'or' and 'not' boolean operators. In the
case of 'and' and 'or', the most likely reason is that these operators
have short-circuiting semantics, i.e. the second operand is not
evaluated if the result can be determined from the first operand. The
usual technique of providing special methods for these operators
therefore would not work.

There is no such difficulty in the case of 'not', however, and it
would be straightforward to provide a special method for this
operator. The rest of this proposal will therefore concentrate mainly
on providing a way to overload 'and' and 'or'.


Motivation
==========

There are many applications in which it is natural to provide custom
meanings for Python operators, and in some of these, having boolean
operators excluded from those able to be customised can be
inconvenient. Examples include:

1. Numeric/Numarray, in which almost all the operators are defined on
   arrays so as to perform the appropriate operation between
   corresponding elements, and return an array of the results. For
   consistency, one would expect a boolean operation between two
   arrays to return an array of booleans, but this is not currently
   possible.

   There is a precedent for an extension of this kind: comparison
   operators were originally restricted to returning boolean results,
   and rich comparisons were added so that comparisons of Numeric
   arrays could return arrays of booleans.

2. A symbolic algebra system, in which a Python expression is
   evaluated in an environment which results in it constructing a tree
   of objects corresponding to the structure of the expression.

3. A relational database interface, in which a Python expression is
   used to construct an SQL query.

A workaround often suggested is to use the bitwise operators '&', '|'
and '~' in place of 'and', 'or' and 'not', but this has some
drawbacks. The precedence of these is different in relation to the
other operators, and they may already be in use for other purposes (as
in example 1). There is also the aesthetic consideration of forcing
users to use something other than the most obvious syntax for what
they are trying to express. This would be particularly acute in the
case of example 3, considering that boolean operations are a staple of
SQL queries.


Rationale
=========

The requirements for a successful solution to the problem of allowing
boolean operators to be customised are:

1. In the default case (where there is no customisation), the existing
   short-circuiting semantics must be preserved.

2. There must not be any appreciable loss of speed in the default
   case.

3. If possible, the customisation mechanism should allow the object to
   provide either short-circuiting or non-short-circuiting semantics,
   at its discretion.

One obvious strategy, that has been previously suggested, is to pass
into the special method the first argument and a function for
evaluating the second argument. This would satisfy requirements 1 and
3, but not requirement 2, since it would incur the overhead of
constructing a function object and possibly a Python function call on
every boolean operation. Therefore, it will not be considered further
here.

The following section proposes a strategy that addresses all three
requirements. A `prototype implementation`_ of this strategy is
available for download.

.. _prototype implementation:
   http://www.cosc.canterbury.ac.nz/~greg/python/obo//Python_OBO.tar.gz


Specification
=============

Special Methods
---------------

At the Python level, objects may define the following special methods.

=============== ================= ========================
Unary           Binary, phase 1   Binary, phase 2
=============== ================= ========================
* __not__(self) * __and1__(self)  * __and2__(self, other)
                * __or1__(self)   * __or2__(self, other)
                                  * __rand2__(self, other)
                                  * __ror2__(self, other)
=============== ================= ========================

The __not__ method, if defined, implements the 'not' operator. If it
is not defined, or it returns NotImplemented, existing semantics are
used.

To permit short-circuiting, processing of the 'and' and 'or' operators
is split into two phases. Phase 1 occurs after evaluation of the first
operand but before the second. If the first operand defines the
appropriate phase 1 method, it is called with the first operand as
argument. If that method can determine the result without needing the
second operand, it returns the result, and further processing is
skipped.

If the phase 1 method determines that the second operand is needed, it
returns the special value NeedOtherOperand. This triggers the
evaluation of the second operand, and the calling of an appropriate
phase 2 method. During phase 2, the __and2__/__rand2__ and
__or2__/__ror2__ method pairs work as for other binary operators.
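
The two-phase dispatch just described can be emulated in plain Python. The sketch below is an illustration only, not the prototype patch: logical_and(), the NeedOtherOperand stand-in and the Flag class are hypothetical helpers, and the second operand is passed as a zero-argument callable purely to simulate deferred evaluation (the very overhead the real proposal avoids by doing this work in bytecode). Handling of NotImplemented is left out for brevity.

```python
class NeedOtherOperandType:
    """Stand-in for the PEP's NeedOtherOperand marker (hypothetical)."""


NeedOtherOperand = NeedOtherOperandType()


def logical_and(first, evaluate_second):
    """Emulate 'first and second' under the two-phase scheme.

    evaluate_second is a zero-argument callable, so the second operand
    is only computed when phase 1 asks for it.
    """
    phase1 = getattr(type(first), "__and1__", None)
    if phase1 is None:
        # No customisation: ordinary short-circuiting 'and'.
        return first and evaluate_second()

    result = phase1(first)
    if result is not NeedOtherOperand:
        return result                      # phase 1 decided on its own

    second = evaluate_second()             # phase 2 needs both operands
    phase2 = getattr(type(first), "__and2__", None)
    if phase2 is not None:
        return phase2(first, second)
    reflected = getattr(type(second), "__rand2__", None)
    if reflected is not None:
        return reflected(second, first)
    return first and second


class Flag:
    """Toy operand that keeps short-circuiting but returns Flag results."""

    def __init__(self, value):
        self.value = value

    def __and1__(self):
        # A false flag settles the result; a true one needs the other side.
        return self if not self.value else NeedOtherOperand

    def __and2__(self, other):
        return Flag(self.value and other.value)


calls = []


def second_operand():
    calls.append("evaluated")
    return Flag(True)


short = logical_and(Flag(False), second_operand)
calls_after_short = list(calls)
full = logical_and(Flag(True), second_operand)
plain = logical_and(0, second_operand)
```

In this run the second operand is evaluated only for the Flag(True) case; the Flag(False) case is settled in phase 1, and the plain integer falls through to the ordinary 'and'.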

Processing falls back to existing semantics if at any stage a relevant
special method is not found or returns NotImplemented.

As a special case, if the first operand defines a phase 2 method but
no corresponding phase 1 method, the second operand is always
evaluated and the phase 2 method called. This allows an object which
does not want short-circuiting semantics to simply implement the
relevant phase 2 methods and ignore phase 1.


Bytecodes
---------

The patch adds four new bytecodes, LOGICAL_AND_1, LOGICAL_AND_2,
LOGICAL_OR_1 and LOGICAL_OR_2. As an example of their use, the
bytecode generated for an 'and' expression looks like this::

         .
         .
         .
         evaluate first operand
         LOGICAL_AND_1  L
         evaluate second operand
         LOGICAL_AND_2
    L:   .
         .
         .

The LOGICAL_AND_1 bytecode performs phase 1 processing. If it
determines that the second operand is needed, it leaves the first
operand on the stack and continues with the following code. Otherwise
it pops the first operand, pushes the result and branches to L.

The LOGICAL_AND_2 bytecode performs phase 2 processing, popping both
operands and pushing the result.


Type Slots
----------

At the C level, the new special methods are manifested as five new
slots in the type object. In the patch, they are added to the
tp_as_number substructure, since this allowed making use of some
existing code for dealing with unary and binary operators. Their
existence is signalled by a new type flag,
Py_TPFLAGS_HAVE_BOOLEAN_OVERLOAD.

The new type slots are::

    unaryfunc nb_logical_not;
    unaryfunc nb_logical_and_1;
    unaryfunc nb_logical_or_1;
    binaryfunc nb_logical_and_2;
    binaryfunc nb_logical_or_2;


Python/C API Functions
----------------------

There are also five new Python/C API functions corresponding to the
new operations::

    PyObject *PyObject_LogicalNot(PyObject *);
    PyObject *PyObject_LogicalAnd1(PyObject *);
    PyObject *PyObject_LogicalOr1(PyObject *);
    PyObject *PyObject_LogicalAnd2(PyObject *, PyObject *);
    PyObject *PyObject_LogicalOr2(PyObject *, PyObject *);


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:

From fredrik at pythonware.com Fri Sep 10 03:29:27 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri Sep 10 03:27:36 2004
Subject: [Python-Dev] Re: Missing arguments in RE functions
References: <000f01c49535$9ec914c0$e841fea9@oemcomputer>
 <413EB184.9030604@heneryd.com>
 <4140E129.1040700@myrealbox.com>
Message-ID:

Noam Raphael wrote:

> This is why I didn't even know, until I made my research before sending my message to python-dev,
> that you could match from a given start position - I studied the page documenting the functions,
> because I didn't want on an early stage to bother my students with the fact that REs are first
> compiled and then applied, and I didn't find any mention of the start position option.

the "I didn't prepare properly, didn't know what I was talking about,
and didn't know what to answer when my students asked me a legitimate
question" argument isn't a good reason to change the language.

if you're doing Python training, make sure you know your Python. I do,
and I very seldom have problems explaining how things work.
From barry at python.org Fri Sep 10 04:43:55 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 10 04:44:02 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> Message-ID: <1094784234.13055.14.camel@geddy.wooz.org> On Wed, 2004-09-08 at 11:08, Guido van Rossum wrote: > > Templates are meant to template *text* data, so Unicode is > > the right choice of baseclass from a design perspective. > > Only in Python 3.0. > > But even so, deriving from Unicode (or str) means the template class > inherits a lot of unwanted operations. Except that I think in general it'll just be very convenient for Templates to /be/ unicodes. But no matter. It seems like if we make Template a simple class, it will be possible for applications to mix in Template and unicode if they want. E.g. class UTemplate(Template, unicode). If we go that route, then I agree we probably don't want to use __mod__(), but I'm not too crazy about using __call__(). "Calling a template" just seems weird to me. Besides, extrapolating, I don't think we need separate Template and SafeTemplate classes. A single Template class can have both safe and non-safe substitution methods. So, I have working code that integrates these changes, and also uses Tim's metaclass idea to provide a nice, easy-to-document pattern overloading mechanism. I chose methods substitute() and safe_substitute() because, er, that's what they do, and those names also don't interfere with existing str or unicode methods. And to make effbot and Raymond happy, it won't auto-promote to unicode if everything's an 8bit string. I will check this in and hopefully this will put the issue to bed. There will be updated unit tests, and I will update the documentation and the PEP as appropriate -- if we've reached agreement on it. 
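
As an illustration of the two method names mentioned above, here is a minimal sketch. It assumes the Template class as exposed by the string module with substitute() and safe_substitute() methods; the template text and the values are invented for the example.

```python
from string import Template

greeting = Template("Hello $name, you owe ${amount} dollars")

# substitute() requires a value for every placeholder.
full = greeting.substitute(name="Guido", amount=5)

# safe_substitute() leaves unknown placeholders in place instead of failing.
partial = greeting.safe_substitute(name="Guido")


def missing_key_raises():
    """substitute() signals a missing placeholder with KeyError."""
    try:
        greeting.substitute(name="Guido")
    except KeyError:
        return True
    return False
```

Keeping both behaviours on one class means the caller chooses strictness per call rather than per template type.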
-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/python-dev/attachments/20040909/27bcbd95/attachment-0001.pgp

From barry at python.org Fri Sep 10 04:48:56 2004
From: barry at python.org (Barry Warsaw)
Date: Fri Sep 10 04:49:00 2004
Subject: [Python-Dev] Re: Re: Alternative Implementation forPEP292:SimpleString Substitutions
In-Reply-To: <413F3605.7090707@egenix.com>
References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer>
 <4138D622.6050807@egenix.com>
 <1094315138.8696.36.camel@geddy.wooz.org>
 <413F1D9C.20209@egenix.com>
 <413F3605.7090707@egenix.com>
Message-ID: <1094784536.13113.17.camel@geddy.wooz.org>

On Wed, 2004-09-08 at 12:40, M.-A. Lemburg wrote:

> If we start to store text data in Unicode now and leave binary
> data in 8-bit strings, then the move to Unicode string literals
> will be much smoother in P3k.

Not to mention more consistent with established alternative
implementations of the Python language based on Unicode-only runtimes.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/python-dev/attachments/20040909/0f1bf80a/attachment.pgp

From barry at python.org Fri Sep 10 04:53:33 2004
From: barry at python.org (Barry Warsaw)
Date: Fri Sep 10 04:53:38 2004
Subject: [Python-Dev] Re: Alternative Implementation for PEP292:SimpleString Substitutions
In-Reply-To:
References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer>
 <4138D622.6050807@egenix.com>
 <413F1D9C.20209@egenix.com>
 <413F23E0.2090908@egenix.com>
Message-ID: <1094784813.13113.23.camel@geddy.wooz.org>

On Wed, 2004-09-08 at 22:29, Guido van Rossum wrote:

> But I thought we had plenty of time since Barry has offered to
> withdraw the PEP 292 implementation for 2.4?

Which I will still do if we cannot reach community agreement by beta1.
But let's see how the latest proposal goes over.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/python-dev/attachments/20040909/e87f2cf6/attachment.pgp

From fdrake at acm.org Fri Sep 10 05:31:39 2004
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Sep 10 05:32:00 2004
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib string.py, 1.73, 1.74
In-Reply-To:
References:
Message-ID: <200409092331.39486.fdrake@acm.org>

On Thursday 09 September 2004 11:07 pm, bwarsaw@users.sourceforge.net wrote:
> - Adopt Tim Peters' idea for giving Template a metaclass, which makes the
>   delimiter, the identifier pattern, or the entire pattern easy to
>   override and document, while retaining efficiency of class-time
>   compilation of the regexp.

Good documentation would really help for this as well. One simple and
one... interesting example would be nice. ;-)

  -Fred

-- 
Fred L. Drake, Jr.
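
Along the lines of the simple example asked for here, a sketch of overriding the delimiter in a subclass. It assumes the class-level delimiter attribute that string.Template exposes; PercentTemplate and the sample text are hypothetical names made up for the illustration.

```python
from string import Template


class PercentTemplate(Template):
    """Hypothetical subclass using '%' instead of '$' as the delimiter."""

    delimiter = "%"


bill = PercentTemplate("%user paid $5 for %{item}s")

# '$' is now ordinary text; only '%name' and '%{name}' are placeholders.
rendered = bill.substitute(user="tim", item="book")
```

The override is a single class attribute, so the pattern is built once for the subclass rather than on every substitution.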

From nidoizo at yahoo.com Fri Sep 10 05:37:39 2004
From: nidoizo at yahoo.com (Nicolas Fleury)
Date: Fri Sep 10 05:36:21 2004
Subject: [Python-Dev] Re: Missing arguments in RE functions
In-Reply-To:
References: <000f01c49535$9ec914c0$e841fea9@oemcomputer>
 <413EB184.9030604@heneryd.com>
 <4140E129.1040700@myrealbox.com>
Message-ID:

Fredrik Lundh wrote:
> Noam Raphael wrote:
>
>>This is why I didn't even know, until I made my research before sending my message to python-dev,
>>that you could match from a given start position - I studied the page documenting the functions,
>>because I didn't want on an early stage to bother my students with the fact that REs are first
>>compiled and then applied, and I didn't find any mention of the start position option.
>
> the "I didn't prepare properly, didn't know what I was talking about,
> and didn't know what to answer when my students asked me a legitimate
> question" argument isn't a good reason to change the language.
>
> if you're doing Python training, make sure you know your Python. I do,
> and I very seldom have problems explaining how things work.

I don't know what in Noam's request justifies what I read as insults
(and hope were not intended to be). I think Noam's point is just that
the function API can be considered incomplete/incoherent when compared
to the one with pattern objects. It's debatable and personally I always
use pattern objects. It basically depends on the goals of the
redundant function API, and I have no idea what they are. I tend to
agree with Raymond.

FWIW, I think it's clearer to define the function API as equivalent in
functionality to pattern objects than as shortcuts for trivial cases.
However, as you pointed out, the advantage of not extending the API
forces moving to the pattern objects.

(I also give Python courses, but to be honest I teach regular
expressions in Perl, avoiding focusing on compilation issues.)

Regards, Nicolas From stephen at xemacs.org Fri Sep 10 07:38:38 2004 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri Sep 10 07:38:46 2004 Subject: [Python-Dev] AlternativeImplementation forPEP292:SimpleString Substitutions In-Reply-To: <200409090939.41873.gmccaughan@synaptics-uk.com> (Gareth McCaughan's message of "Thu, 9 Sep 2004 09:39:41 +0100") References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <413F6120.7090603@egenix.com> <200409090939.41873.gmccaughan@synaptics-uk.com> Message-ID: <87k6v2smxt.fsf_-_@tleepslib.sk.tsukuba.ac.jp> >>>>> "Gareth" == Gareth McCaughan writes: Gareth> That said, I strongly agree that all textual data should Gareth> be Unicode as far as the developer is concerned; but, at Gareth> least in the USA :-), it makes sense to have an optimized Gareth> representation that saves space for ASCII-only text, just Gareth> as we have an optimized representation for small integers. This is _not at all_ obvious. As MAL just pointed out, if efficiency is a goal, text algorithms often need to be different for operations on texts that are dense in an 8-bit character space, vs texts that are sparse in a 16-bit or 20-bit character space. Note that that is what is talking about too; he points to SRE and ElementTree. When viewed from that point of view, the subtext to 's comment is "I don't want to separately maintain 8-bit versions of new text facilities to support my non-Unicode applications, I want to impose that burden on the authors of text-handling PEPs." That may very well be the best thing for Python; as has done a lot of Unicode implementation for Python, he's in a good position to make such judgements. But the development costs MAL refers to are bigger than you are estimating, and will continue as long as that policy does. 
While I'm very sympathetic to 's view that there's more than one way to skin a cat, and a good cat-handling design should account for that, and conceding his expertise, none-the-less I don't think that Python really wants to _maintain_ more than one text-processing system by default. Of course if you restrict yourself to the class of ASCII- only strings, you can do better, and of course that is a huge class of strings. But that, as such, is important only to efficiency fanatics. The question is, how often are people going to notice that when they have pure ASCII they get a 100% speedup, or that they actually can just suck that 3GB ASCII file into their 4GB memory, rather than buffering it as 3 (or 6) 2GB Unicode strings? Compare how often people are going to notice that a new facility "just works" for Japanese or Hindi. I just don't see the former being worth the extra effort, while the latter makes the "this or that" choice clear. If a single representation is enough, it had better be Unicode-based, and the others can be supported in libraries (which turn binary blobs into non-standard text objects with appropriate methods) as the need arises. -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. From raymond.hettinger at verizon.net Fri Sep 10 07:50:40 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Fri Sep 10 07:51:34 2004 Subject: [Python-Dev] Re: Alternative Implementation forPEP292:SimpleString Substitutions In-Reply-To: <1094784234.13055.14.camel@geddy.wooz.org> Message-ID: <003f01c496fa$1b2f6da0$e841fea9@oemcomputer> [Barry] > And to make effbot and Raymond happy, it won't auto-promote to unicode > if everything's an 8bit string. 
Glad to see that my happiness now ranks as a development objective ;-) > There will be updated unit tests, and I will update the documentation > and the PEP as appropriate -- if we've reached agreement on it. +1 Beautiful job. Barry asked me to bring up one remaining implementation issue for discussion on python-dev. The docs clearly state that only python identifiers are allowed as placeholders: [_A-Za-z][_A-Za-z0-9]* The challenge is that templates can be exposed to non-programmer end-users with no reason to suspect that one letter of their alphabet is different from another. So, as it stands right now, there is a usability issue with placeholder errors passing silently:
    >>> fechas = {u'hoy':u'lunes', u'mañana':u'martes'}
    >>> t = Template(u'¿Puede volver $hoy o $mañana?')
    >>> t.safe_substitute(fechas)
    u'¿Puede volver lunes o $mañana?'
The substitution failed silently (no ValueError as would have occurred with $@ or a dangling $). It may be especially baffling for the user because one placeholder succeeded and the other failed without a hint of why (he can see the key in the mapping, it just won't substitute). No clue is offered that the Template was looking for $ma, a partial token, and didn't find it (the situation is even worse if it does find $ma and substitutes an unintended value). I suggest that the above should raise an error: ValueError: Invalid token $mañana on line 1, column 24 It is easily possible to detect and report such errors (see an example in nondist/sandbox/string/curry292.py). The arguments against such reporting are: * Raymond is smoking crack. End users will never make this mistake. * The docs say python identifiers only. You blew it. Tough. Not a bug. * For someone who understands exactly what they are doing, perhaps $ma is the intended placeholder -- why force them to use braces: ${ma}ñana. In addition to the above usability issue, there is one other nit.
The new invocation syntax offers us the opportunity to also accept keyword arguments as mapping alternatives:
    def substitute(self, mapping=None, **kwds):
        if mapping is None:
            mapping = kwds
        . . .
When applicable, this makes for beautiful, readable calls: t.substitute(who="Barry", what="mailmeister", when=now()) This would be a simple and nice enhancement to Barry's excellent implementation. I recommend that keyword arguments be adopted. Raymond From raymond.hettinger at verizon.net Fri Sep 10 09:50:31 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Fri Sep 10 09:51:25 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: Message-ID: <005801c4970a$d9227e00$e841fea9@oemcomputer> > I tend to agree with Raymond. FWIW, I think it's clearer to define the > function API as pattern objects equivalent in functionality than as > shortcuts for trivial cases I'm down to +0 on the request. Keeping the API stable is also important. And, Fred's effort to separate basic from advanced seems reasonable. Filling in the missing docs for existing flag, start, and stop args is a good idea and should probably be done even if the function API changes are rejected. Raymond From mal at egenix.com Fri Sep 10 11:05:58 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Fri Sep 10 11:06:07 2004 Subject: [Python-Dev] PEP 328 - Relative Imports In-Reply-To: References: <413F1B87.90301@egenix.com> Message-ID: <41416E76.8030603@egenix.com> Guido van Rossum wrote: >>I know that this has been discussed a few times in the past, >>but the more I have to deal with building applications using >>third-party libs or packages, the more I get the feeling that >>the choice of making "import module" absolute is the wrong >>path to follow. >> >>The typical scenario goes like this: >> >>* you build an application that uses various third-party >> packages and has to maintain them inside another package, >> e.g.
ThirdPartyCode >> >>* you don't have access to the (third-party) package source code or >> it's not feasable to make changes to it for maintenance reasons >> >>Another common case is that you have to deal with third-party >>code that is not properly packaged as Python package, but comes >>as a set of top-level modules. >> >>In this scenario you typically put all those files into a >>newly created Python package directory and access the modules >>in that directory using the package name. >> >>In Python 2.3 and 2.4 (as well as all previous versions), both >>scenarios can easily be implemented without having to change >>the third-party code. >> >>The PEP however suggests that starting with 2.5, the interpreter >>will issue a warning and 2.6 should default to absolute paths. >> >>I'd like to request that the latter change be postponed to >>Python 3k, or that some other way of supporting the above >>scenarios is provided that can be enabled in the application. >> >>Please remember that changes to application code are well >>possible. What's not possible is making changes to the >>packaged third-party code. > > As long as it's clear that this is a compatibility requirement only I > think it's a good idea to support this way of developing apps (even > though I think that clever sys.path manipulation can probably get > around it, it's not worth breaking existing approaches). All new apps > should however use relative imports to reference their own code, so > the problem won't be repeated in the future. I have my doubts that this is going to happen. People are more likely going to make all imports absolute (like they already do in Java and other languages) - which is good, since it makes reading code much easier and allows for writing packages which are compatible to older Python version, but it also prevent developing applications using the above approach. 
I also don't think that extension writers will care enough to make their packages fully relocateable by using relative imports all over - these are hard to read and don't buy the developer of the extension anything. Anyway, what should the strategy for the PEP look like ? 1. postpone the defaulting to absolute until P3k 2. provide a way to customize the behaviour using e.g. a sys function -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 10 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From jim at zope.com Fri Sep 10 13:46:58 2004 From: jim at zope.com (Jim Fulton) Date: Fri Sep 10 13:47:02 2004 Subject: [Python-Dev] Re: PEP 328 - Relative Imports In-Reply-To: <41416E76.8030603@egenix.com> References: <413F1B87.90301@egenix.com> <41416E76.8030603@egenix.com> Message-ID: <41419432.2000600@zope.com> M.-A. Lemburg wrote: > Guido van Rossum wrote: > ... >> As long as it's clear that this is a compatibility requirement only I >> think it's a good idea to support this way of developing apps (even >> though I think that clever sys.path manipulation can probably get >> around it, it's not worth breaking existing approaches). All new apps >> should however use relative imports to reference their own code, so >> the problem won't be repeated in the future. > > > I have my doubts that this is going to happen. > > People are more likely going to make all imports absolute (like > they already do in Java and other languages) - which > is good, since it makes reading code much easier and allows for > writing packages which are compatible to older Python version, > but it also prevent developing applications using the above > approach. 
> > I also don't think that extension writers will care enough to > make their packages fully relocateable by using relative > imports all over - these are hard to read and don't buy > the developer of the extension anything. I find explicit relative imports easier to read, as it reduces the noise level. I like the fact that local imports look different from non-local ones. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
From gmccaughan at synaptics-uk.com Fri Sep 10 13:57:13 2004 From: gmccaughan at synaptics-uk.com (Gareth McCaughan) Date: Fri Sep 10 13:57:46 2004 Subject: [Python-Dev] AlternativeImplementation forPEP292:SimpleString Substitutions In-Reply-To: <87k6v2smxt.fsf_-_@tleepslib.sk.tsukuba.ac.jp> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <200409090939.41873.gmccaughan@synaptics-uk.com> <87k6v2smxt.fsf_-_@tleepslib.sk.tsukuba.ac.jp> Message-ID: <200409101257.13802.gmccaughan@synaptics-uk.com> On Friday 2004-09-10 06:38, Stephen J. Turnbull wrote: > >>>>> "Gareth" == Gareth McCaughan writes: > > Gareth> That said, I strongly agree that all textual data should > Gareth> be Unicode as far as the developer is concerned; but, at > Gareth> least in the USA :-), it makes sense to have an optimized > Gareth> representation that saves space for ASCII-only text, just > Gareth> as we have an optimized representation for small integers. > > This is _not at all_ obvious. As MAL just pointed out, if efficiency > is a goal, text algorithms often need to be different for operations > on texts that are dense in an 8-bit character space, vs texts that are > sparse in a 16-bit or 20-bit character space. Note that that is what > is talking about too; he points to SRE and ElementTree. I hope you aren't expecting me to disagree. > When viewed from that point of view, the subtext to 's comment is > "I don't want to separately maintain 8-bit versions of new text > facilities to support my non-Unicode applications, I want to impose > that burden on the authors of text-handling PEPs."
That may very well > be the best thing for Python; as has done a lot of Unicode > implementation for Python, he's in a good position to make such > judgements. But the development costs MAL refers to are bigger than > you are estimating, and will continue as long as that policy does. How do you know what I am estimating? > While I'm very sympathetic to 's view that there's more than one > way to skin a cat, and a good cat-handling design should account for > that, and conceding his expertise, none-the-less I don't think that > Python really wants to _maintain_ more than one text-processing system > by default. Of course if you restrict yourself to the class of ASCII- > only strings, you can do better, and of course that is a huge class of > strings. But that, as such, is important only to efficiency fanatics. No, it's important to ... well, people to whom efficiency matters. There's no need for them to be fanatics. > The question is, how often are people going to notice that when they > have pure ASCII they get a 100% speedup, or that they actually can > just suck that 3GB ASCII file into their 4GB memory, rather than > buffering it as 3 (or 6) 2GB Unicode strings? Compare how often > people are going to notice that a new facility "just works" for > Japanese or Hindi. Why is that the question, rather than "how often are people going to benefit from getting a 100% speedup when they have pure ASCII"? Or even "how often are people going to try out Python on an application that uses pure-ASCII strings, and decide to use some other language that seems to do the job much faster"? > I just don't see the former being worth the extra > effort, while the latter makes the "this or that" choice clear. If a > single representation is enough, it had better be Unicode-based, and > the others can be supported in libraries (which turn binary blobs into > non-standard text objects with appropriate methods) as the need arises. 
No question that if a single representation is enough then it had better be Unicode. -- g From andrew at andreweland.org Fri Sep 10 13:45:25 2004 From: andrew at andreweland.org (Andrew Eland) Date: Fri Sep 10 13:58:10 2004 Subject: [Python-Dev] Adding status code constants to httplib Message-ID: <414193D5.6010405@andreweland.org> Hi, Over in web-sig, we're discussing PEP 333, the Web Server Gateway Interface. Rather than defining our own set of constants for the HTTP status code integers, we thought it would be a good idea to add them to httplib, allowing other applications to benefit. I've uploaded a patch[1] to httplib.py and the corresponding documentation. Do people think this is a good idea? -- Andrew Eland (http://www.andreweland.org) [1] http://sourceforge.net/tracker/index.php?func=detail&aid=1025790&group_id=5470&atid=305470 From skip at pobox.com Fri Sep 10 15:57:12 2004 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 10 15:57:21 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: <005801c4970a$d9227e00$e841fea9@oemcomputer> References: <005801c4970a$d9227e00$e841fea9@oemcomputer> Message-ID: <16705.45752.540765.442498@montanaro.dyndns.org> Raymond> I'm down to +0 on the request. Keeping the API stable is also Raymond> important. And, Fred's effort to separate basic from advanced Raymond> seems reasonable. Adding my two cents, I'm -1 on the idea. I view re.match() and friends as convenience functions. There's no reason to provide all the functionality of the slightly lower-level re.compile(). If we were to do that, I'd propose (facetiously) that we deprecate re.compile() as well. Skip From fdrake at acm.org Fri Sep 10 16:14:52 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri Sep 10 16:15:06 2004 Subject: [Python-Dev] Adding status code constants to httplib In-Reply-To: <414193D5.6010405@andreweland.org> References: <414193D5.6010405@andreweland.org> Message-ID: <200409101014.52091.fdrake@acm.org> On Friday 10 September 2004 07:45 am, Andrew Eland wrote: > Over in web-sig, we're discussing PEP 333, the Web Server Gateway > Interface. Rather than defining our own set of constants for the HTTP > status code integers, we thought it would be a good idea to add them to > httplib, +1 Some of us really don't remember what all the numeric codes mean, especially the ones we don't see often. -Fred -- Fred L. Drake, Jr. From aahz at pythoncraft.com Fri Sep 10 16:26:09 2004 From: aahz at pythoncraft.com (Aahz) Date: Fri Sep 10 16:26:17 2004 Subject: [Python-Dev] PEP292 vs Unicode In-Reply-To: <87k6v2smxt.fsf_-_@tleepslib.sk.tsukuba.ac.jp> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <413F6120.7090603@egenix.com> <200409090939.41873.gmccaughan@synaptics-uk.com> <87k6v2smxt.fsf_-_@tleepslib.sk.tsukuba.ac.jp> Message-ID: <20040910142609.GA15723@panix.com> On Fri, Sep 10, 2004, Stephen J. Turnbull wrote: > > While I'm very sympathetic to 's view that there's more than one > way to skin a cat, and a good cat-handling design should account for > that, and conceding his expertise, none-the-less I don't think that > Python really wants to _maintain_ more than one text-processing system > by default. Of course if you restrict yourself to the class of ASCII- > only strings, you can do better, and of course that is a huge class of > strings. But that, as such, is important only to efficiency fanatics. That's a good point, and that's what Python is moving toward. The thing is, we currently have two text processing systems, and there's no reason (given Python's dynamic dispatch capabilities) to treat one of them as second-class for this issue. 
It's particularly onerous in this instance because Unicode is unfortunately second-class in a number of respects, and doing what is in some respects a silent switch here would be needlessly confusing and irritating for users. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From pje at telecommunity.com Fri Sep 10 17:01:08 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Sep 10 17:01:37 2004 Subject: [Python-Dev] Adding status code constants to httplib In-Reply-To: <414193D5.6010405@andreweland.org> Message-ID: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> At 12:45 PM 9/10/04 +0100, Andrew Eland wrote: >Over in web-sig, we're discussing PEP 333, the Web Server Gateway >Interface. Rather than defining our own set of constants for the HTTP >status code integers, we thought it would be a good idea to add them to >httplib, allowing other applications to benefit. I've uploaded a patch[1] >to httplib.py and the corresponding documentation. Do people think this is >a good idea? I would also put the statuses in a dictionary, such that: status_code[BAD_GATEWAY] = "Bad Gateway" This could be accomplished via something like:
    status_code = dict([
        (val, key.replace('_',' ').title())
        for key,val in globals().items()
        if key==key.upper() and not key.startswith('HTTP')
        and not key.startswith('_')
    ])
From barry at python.org Fri Sep 10 17:04:32 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 10 17:04:37 2004 Subject: [Python-Dev] Re: PEP 328 - Relative Imports In-Reply-To: <41419432.2000600@zope.com> References: <413F1B87.90301@egenix.com> <41416E76.8030603@egenix.com> <41419432.2000600@zope.com> <1094828671.30837.23.camel@geddy.wooz.org> Message-ID: <1094828671.30837.23.camel@geddy.wooz.org> On Fri, 2004-09-10 at 07:46, Jim Fulton wrote: > I find explicit relative imports easier to read, as it > reduces the noise level.
> > I like the fact that local imports look different from non-local ones. Yes, +1. The most important thing IMO is that there be an explicit way to spell whatever the default isn't. I was just grumbling the other day because I had to rename a submodule foologging.py instead of the more natural logging.py because that module suddenly wanted to start importing the global logging package. -Barry From andrew at andreweland.org Fri Sep 10 17:12:02 2004 From: andrew at andreweland.org (Andrew Eland) Date: Fri Sep 10 17:24:53 2004 Subject: [Python-Dev] Adding status code constants to httplib In-Reply-To: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> Message-ID: <4141C442.8050005@andreweland.org> Phillip J. Eby wrote: > I would also put the statuses in a dictionary, such that: > > status_code[BAD_GATEWAY] = "Bad Gateway" There's a table mapping status codes to messages on BaseHTTPRequestHandler at the moment. It could be moved into httplib to make it more publicly visible.
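The status_code[BAD_GATEWAY] idea quoted above can be sketched in a self-contained way. This is only an illustration, not the patch under discussion: the CONSTANTS dictionary and the build_status_table helper are hypothetical stand-ins for whatever names httplib would export, and the filter follows the one suggested earlier in the thread.

```python
# Hypothetical constants standing in for the names a module would export.
CONSTANTS = {
    "OK": 200,
    "NOT_FOUND": 404,
    "BAD_GATEWAY": 502,
    "HTTP_PORT": 80,   # skipped by the filter: starts with 'HTTP'
    "_PRIVATE": -1,    # skipped by the filter: leading underscore
    "version": 11,     # skipped by the filter: not upper case
}


def build_status_table(namespace):
    """Map each status integer to a phrase derived from its constant name."""
    return {
        value: name.replace("_", " ").title()
        for name, value in namespace.items()
        if name == name.upper()
        and not name.startswith("HTTP")
        and not name.startswith("_")
    }


status_code = build_status_table(CONSTANTS)
print(status_code[CONSTANTS["BAD_GATEWAY"]])
```

Two consequences of deriving phrases from names are worth noting: title() turns OK into "Ok", and the startswith('HTTP') test would also drop a constant named HTTP_VERSION_NOT_SUPPORTED, so a hand-written table may still be preferable.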
-- Andrew From andrew at andreweland.org Fri Sep 10 17:46:44 2004 From: andrew at andreweland.org (Andrew Eland) Date: Fri Sep 10 17:59:36 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <4141CC1F.4000207@xhaus.com> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <4141C442.8050005@andreweland.org> <4141CC1F.4000207@xhaus.com> Message-ID: <4141CC64.2090205@andreweland.org> Alan Kennedy wrote: > And that mapping has 2 levels of human readable messages on it, for example > 304: ('Not modified', 'Document has not changed singe given time'), > I think that, since the human readable versions are seldom heeded > anyway, perhaps a single message is all we need? A simple move would mean we'd have to keep both, for backwards compatability. I guess BaseHTTPRequestHandler could mix its long messages in with those in a httplib table, but it sounds ugly. > And I'm -1 on forcing servers, particularly CGI servers, to import the > client-side httplib (2.3 httplib.pyc == 42K) just to get this mapping. I think the number of people who wouldn't import httplib on speed/process size grounds is very small. If they're that worried about efficiency, they could copy and paste the table, and manage the extra development complexity. -- Andrew From pje at telecommunity.com Fri Sep 10 18:08:37 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Sep 10 18:09:09 2004 Subject: [Python-Dev] Adding status code constants to httplib In-Reply-To: <4141C442.8050005@andreweland.org> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040910120714.032b5b80@mail.telecommunity.com> At 04:12 PM 9/10/04 +0100, Andrew Eland wrote: >Phillip J. 
Eby wrote: > >>I would also put the statuses in a dictionary, such that: >> status_code[BAD_GATEWAY] = "Bad Gateway" > >There's a table mapping status codes to messages on BaseHTTPRequestHandler >at the moment. It could be moved into httplib to make it more publically >visible. It doesn't appear to include HTTP/1.1 status codes. From tim.hochberg at ieee.org Fri Sep 10 18:33:09 2004 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Fri Sep 10 18:33:20 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <200409100058.i8A0wNIV002743@cosc353.cosc.canterbury.ac.nz> References: <200409100058.i8A0wNIV002743@cosc353.cosc.canterbury.ac.nz> Message-ID: <4141D745.805@ieee.org> Greg Ewing wrote: [SNIP] Just a couple of quick comments on the motivation: > Motivation > ========== > > There are many applications in which it is natural to provide custom > meanings for Python operators, and in some of these, having boolean > operators excluded from those able to be customised can be > inconvenient. Examples include: > > 1. Numeric/Numarray, in which almost all the operators are defined on > arrays so as to perform the appropriate operation between > corresponding elements, and return an array of the results. For > consistency, one would expect a boolean operation between two > arrays to return an array of booleans, but this is not currently > possible. > > There is a precedent for an extension of this kind: comparison > operators were originally restricted to returning boolean results, > and rich comparisons were added so that comparisons of Numeric > arrays could return arrays of booleans. For Numeric/Numarray, I think and1/or1 would be unnecessary. If that were true in general it would simplify the proposal signifigantly: and2/or2 could be renamed to and/or and and1/or1 could be dropped. > 2. 
A symbolic algebra system, in which a Python expression is > evaluated in an environment which results in it constructing a tree > of objects corresponding to the structure of the expression. > > 3. A relational database interface, in which a Python expression is > used to construct an SQL query. I would be interested in seeing use cases for either or both of these last two examples that show how and1/or1 are useful. Regards, -tim From mal at egenix.com Fri Sep 10 19:05:32 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Fri Sep 10 19:05:36 2004 Subject: [Python-Dev] Re: PEP 328 - Relative Imports In-Reply-To: <1094828671.30837.23.camel@geddy.wooz.org> References: <413F1B87.90301@egenix.com><41416E76.8030603@egenix.com> <41419432.2000600@zope.com> <1094828671.30837.23.camel@geddy.wooz.org> Message-ID: <4141DEDC.8080503@egenix.com> Barry Warsaw wrote: > On Fri, 2004-09-10 at 07:46, Jim Fulton wrote: > > >>I find explicit relative imports easier to read, as it >>reduces the noise level. >> >>I like the fact that local imports look different from non-local ones. > > > Yes, +1. The most important thing IMO is that there be an explicit way > to spell whatever the default isn't. I was just grumbling the other day > because I had to rename a submodule foologging.py instead of the more > natural logging.py because that module suddenly wanted to start > importing the global logging package. If that's the only reason, then placing the whole Python standard lib under a new top-level package name would be the better solution, starting with P3k. I wasn't suggesting not to have relative imports. It is just that most third-party packages nowadays rely on the current import lookup mechanism (first local, then global). All of these would break the day absolute imports become the default. Whether or not relative imports look right is probably more a question of taste than anything else... 
I find getting the number of dots right just as hard as getting the number '../' right in an relative path name. But back to the original question: should absolute imports be made a P3k feature or will we have a sys.setimportscheme() hook to tune the setting on a per application basis ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 10 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From nidoizo at yahoo.com Fri Sep 10 19:18:47 2004 From: nidoizo at yahoo.com (Nicolas Fleury) Date: Fri Sep 10 19:18:47 2004 Subject: [Python-Dev] Re: PEP 328 - Relative Imports In-Reply-To: <4141DEDC.8080503@egenix.com> References: <413F1B87.90301@egenix.com><41416E76.8030603@egenix.com> <41419432.2000600@zope.com> <1094828671.30837.23.camel@geddy.wooz.org> <4141DEDC.8080503@egenix.com> Message-ID: M.-A. Lemburg wrote: > I wasn't suggesting not to have relative imports. It is just > that most third-party packages nowadays rely on the current > import lookup mechanism (first local, then global). All of these > would break the day absolute imports become the default. Don't you think that they have enough time to adapt? We're talking about 2.6 for the final step and 2.4 is not even released. If a third-party package doesn't adapt, I'm sure it's possible to have a wordaround, but isn't that a different issue? Don't forget that you can make you imports in if/else blocks on version. 
Regards, Nicolas From mcherm at mcherm.com Fri Sep 10 19:21:25 2004 From: mcherm at mcherm.com (Michael Chermside) Date: Fri Sep 10 19:19:56 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions Message-ID: <1094836885.4141e2953f7df@mcherm.com> Fredrik Lundh writes: > the "I didn't prepare properly, didn't know what I was talking about, > and didn't know what do answer when my students asked me a legitimate > question" argument isn't a good reason to change the language. > > if you're doing Python training, make sure you know your > Python. I do, > and I very seldom have problems explaining how things work. Fredrik, a less hostile response would be appropriate here. No one knows every detail of every API of any reasonably sized library (like Python's). Students ask questions about the darndest things. If you have never been stumped by a student's question then you're not teaching the right people. My opinion on the underlying question is this: We have two ways of doing things: using compiled REs, and using the RE functions. Our goal is to make Python's API be so simple and easy to understand that people DON'T have to memorize every little detail -- it should be "obvious". That is, in my opinion, the strongest reason in favor of minimal APIs. Right now, there are some things you can do with the RE functions and a DIFFERENT set of things you can do with the compiled REs. That's TWO sets of functionality to learn. If Noam's patch can make the feature set of the RE functions the SAME as the feature set of the compiled REs, then there's only ONE set of features to memorize. On the whole, there are MORE indiviual "pieces" to the API but because of orthogonality the API as a whole is simpler. Therefore in this case I favor using Noam's patch. My next-favorite alternative would actually be to remove the RE functions so there's "only one way to do it". 
But the functions are conceptually simpler (as Noam showed, the docs describe the functions then say the compiled REs work "just the same"), and they've been in place for years... removing them is not an option. -- Michael Chermside From amk at amk.ca Fri Sep 10 19:42:47 2004 From: amk at amk.ca (A.M. Kuchling) Date: Fri Sep 10 19:43:13 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: <1094836885.4141e2953f7df@mcherm.com> References: <1094836885.4141e2953f7df@mcherm.com> Message-ID: <20040910174247.GA17451@rogue.amk.ca> On Fri, Sep 10, 2004 at 10:21:25AM -0700, Michael Chermside wrote: > (as Noam showed, the docs describe the > functions then say the compiled REs work "just the same"), This fact is just a historical accident, because I initially wrote the docs for the re module starting with the functions and then moving on to the methods. I can restructure the docs to make regex objects paramount. (The Regex HOWTO takes this approach; the module-level functions are mentioned in only one section, and not used outside of that section.) --amk From michel at dialnetwork.com Thu Sep 9 09:46:24 2004 From: michel at dialnetwork.com (Michel Pelletier) Date: Fri Sep 10 19:56:43 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators Message-ID: <1094715983.1472.7.camel@debbie> > Message: 4 > Date: Fri, 10 Sep 2004 12:58:23 +1200 > From: Greg Ewing > Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators > To: python-dev@python.org > Message-ID: > <200409100058.i8A0wNIV002743@cosc353.cosc.canterbury.ac.nz> > > > Python does not currently provide any '__xxx__' special methods > corresponding to the 'and', 'or' and 'not' boolean operators. I like the PEP with 'and' and 'or', but isn't the 'not' special method essentially the inverse of __nonzero__? -Michel From pje at telecommunity.com Fri Sep 10 20:18:08 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri Sep 10 20:18:45 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <1094715983.1472.7.camel@debbie> Message-ID: <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> At 12:46 AM 9/9/04 -0700, Michel Pelletier wrote: > > Message: 4 > > Date: Fri, 10 Sep 2004 12:58:23 +1200 > > From: Greg Ewing > > Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators > > To: python-dev@python.org > > Message-ID: > > <200409100058.i8A0wNIV002743@cosc353.cosc.canterbury.ac.nz> > > > > > > Python does not currently provide any '__xxx__' special methods > > corresponding to the 'and', 'or' and 'not' boolean operators. > >I like the PEP with 'and' and 'or', but isn't the 'not' special method >essentially the inverse of __nonzero__? There isn't such a method currently. Also, note that the expression 'not x' is currently guaranteed to return a boolean value. The purpose of the PEP is to allow 'not x' to potentially return an arbitrary object, as for use in algebraic and query systems that want to use Python code as their syntax. Such systems currently use e.g. '~x' instead of 'not x' because the former allows return of arbitrary objects. IMO, the algebraic/query use cases would be better served by some sort of "code literal" or "AST literal" syntax, rather than adding more special methods. The reason is that all too often you want to include "normal" Python values in such an expression, but still manipulate them symbolically, or have some other sort of special treatment. A literal syntax for Python expressions is more useful for this, which is why I've moved to using strings and the parser module to accomplish such processing. At that level, boolean operator methods are moot. (Code literals would be useful primarily in the ability to have them parsed and syntax checked at import time, rather than waiting until runtime. 
This consideration also applies to PEP 335, but PEP 335 may consume all of its compilation performance gains by losing runtime performance at all boolean operation sites.) But anyway, I digress. Since PEP 335 doesn't significantly help (IMO) with algebraic and query systems, that leaves the numeric use cases, which I don't have enough experience to comment on. From barry at python.org Fri Sep 10 20:32:30 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 10 20:32:35 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib string.py, 1.74, 1.75 In-Reply-To: References: Message-ID: <1094841150.30836.38.camel@geddy.wooz.org> On Fri, 2004-09-10 at 02:21, rhettinger@users.sourceforge.net wrote: > Update of /cvsroot/python/python/dist/src/Lib > In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18307 > > Modified Files: > string.py > Log Message: > __slots__ went missing from Template. On purpose though. With __slots__ you can't mix in Template and unicode. I don't see any reason to limit the attributes of a Template instance, so I backed this out. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040910/d80ee6aa/attachment-0001.pgp From barry at python.org Fri Sep 10 20:38:14 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 10 20:38:18 2004 Subject: [Python-Dev] Re: Alternative Implementation forPEP292:SimpleString Substitutions In-Reply-To: <003f01c496fa$1b2f6da0$e841fea9@oemcomputer> References: <003f01c496fa$1b2f6da0$e841fea9@oemcomputer> Message-ID: <1094841494.30829.45.camel@geddy.wooz.org> On Fri, 2004-09-10 at 01:50, Raymond Hettinger wrote: > [Barry] > > And to make effbot and Raymond happy, it won't auto-promote to unicode > > if everything's an 8bit string. 
> > Glad to see that my happiness now ranks as a development objective ;-) Well, if I want to get other work done... :) > > There will be updated unit tests, and I will update the documentation > > and the PEP as appropriate -- if we've reached agreement on it. > > +1 > Beautiful job. Cool! > The arguments against such reporting are: > * Raymond is smoking crack. End users will never make this mistake. > * The docs say python identifiers only. You blew it. Tough. Not a > bug. > * For someone who understands exactly what they are doing, perhaps $ma > is the intended placeholder -- why force them to use braces: > ${ma}ñana. It also makes it more difficult to document. IOW, right now the PEP and the documentation say that the first non-identifier character terminates the placeholder. How would you word the rules with your change? > In addition to the above usability issue, there is one other nit. The > new invocation syntax offers us the opportunity to also accept > keyword arguments as mapping alternatives: > > def substitute(self, mapping=None, **kwds): > if mapping is None: > mapping = kwds > . . . > > When applicable, this makes for beautiful, readable calls: > > t.substitute(who="Barry", what="mailmeister", when=now()) > > This would be a simple and nice enhancement to Barry's excellent > implementation. I recommend that keyword arguments be adopted. My only problem with that is the interference that the 'mapping' argument presents. IOW, kwds can't contain 'mapping'. We could solve that in a couple of ways: 1. ignore the problem and tell people not to do that 2. change 'mapping' to something less likely to collide, such as '_mapping' or '__mapping__', and then see #1. 3. get rid of the mapping altogether and only have kwds. This would change the non-keyword invocation from mytemplate.substitute(mymapping) to mytemplate.substitute(**mymapping) A bit uglier and harder to document. Note that there's also a potential collision on 'self'.
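The 'mapping' collision can be seen in a few lines. This is a minimal sketch rather than the Template implementation: `collect` and `collect_kwds_only` are hypothetical stand-ins that only return the dictionary a substitute() call would end up looking placeholders up in.

```python
def collect(mapping=None, **kwds):
    """Hypothetical stand-in for substitute(self, mapping=None, **kwds)."""
    if mapping is None:
        mapping = kwds
    return mapping


def collect_kwds_only(**kwds):
    """Hypothetical stand-in for option 3: keyword arguments only."""
    return kwds


# Ordinary keyword use works as hoped.
print(collect(who="Barry"))

# A placeholder that happens to be called 'mapping' is swallowed by the
# parameter of the same name and never reaches kwds.
print(collect(mapping="mailmeister"))

# With no 'mapping' parameter the name is free again, at the cost of
# having to unpack an existing dictionary with ** at the call site.
print(collect_kwds_only(**{"mapping": "mailmeister", "who": "Barry"}))
```

The same reasoning applies to 'self' once the function is a method, since 'self' is also an ordinary parameter name.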
-Barry From barry at python.org Fri Sep 10 20:41:24 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 10 20:41:28 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib string.py, 1.73, 1.74 In-Reply-To: <200409092331.39486.fdrake@acm.org> References: <200409092331.39486.fdrake@acm.org> Message-ID: <1094841684.30831.50.camel@geddy.wooz.org> On Thu, 2004-09-09 at 23:31, Fred L. Drake, Jr. wrote: > On Thursday 09 September 2004 11:07 pm, bwarsaw@users.sourceforge.net wrote: > > - Adopt Tim Peters's idea for giving Template a metaclass, which makes the > > delimiter, the identifier pattern, or the entire pattern easy to > > override and document, while retaining efficiency of class-time > > compilation of the regexp. > > Good documentation would really help for this as well. One simple and one... > interesting example would be nice. ;-) Yep. I'm definitely planning on updating the docs. I'll make sure to include some examples. After re-organizing libstring.tex, there's plenty of room to do so without increasing the clutter. -Barry
From barry at python.org Fri Sep 10 20:47:56 2004 From: barry at python.org (Barry Warsaw) Date: Fri Sep 10 20:48:03 2004 Subject: [Python-Dev] Re: PEP 328 - Relative Imports In-Reply-To: <4141DEDC.8080503@egenix.com> References: <413F1B87.90301@egenix.com><41416E76.8030603@egenix.com> <41419432.2000600@zope.com> <1094828671.30837.23.camel@geddy.wooz.org> <4141DEDC.8080503@egenix.com> Message-ID: <1094842075.30831.55.camel@geddy.wooz.org> On Fri, 2004-09-10 at 13:05, M.-A. Lemburg wrote: > If that's the only reason, then placing the whole Python standard > lib under a new top-level package name would be the better > solution, starting with P3k. One of my earliest suggestions on the topic did just that. In fact, you could do it in a backward compatible way, by introducing an optional global package. E.g. import logging That would import the local logging.py module if it existed, otherwise it would import the global logging module. This is exactly what Python does today. from __global__ import logging That would always import the global logging package. __global__ is the optional "fake" global package and would only be used when you want to explicitly skip any local imports. IIRC though, Guido never liked this proposal much. I repost it here on the off chance that he's way too busy to read every message in this thread . -Barry
From edloper at gradient.cis.upenn.edu Fri Sep 10 21:59:31 2004 From: edloper at gradient.cis.upenn.edu (Edward Loper) Date: Fri Sep 10 21:59:44 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: <20040910183258.A1DCE1E4009@bag.python.org> References: <20040910183258.A1DCE1E4009@bag.python.org> Message-ID: > Right now, there are some things you can do with the RE functions > and a DIFFERENT set of things you can do with the compiled REs. > That's TWO sets of functionality to learn. If Noam's patch can > make the feature set of the RE functions the SAME as the feature > set of the compiled REs, then there's only ONE set of features to > memorize. On the whole, there are MORE individual "pieces" to the > API but because of orthogonality the API as a whole is simpler. > Therefore in this case I favor using Noam's patch. +1. Consistency makes the API conceptually simpler, even if the absolute number of parameters is larger. -Edward From martin at v.loewis.de Fri Sep 10 22:59:06 2004 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri Sep 10 22:58:57 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> References: <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> Message-ID: <4142159A.7030309@v.loewis.de> Phillip J. Eby wrote: >> I like the PEP with 'and' and 'or', but isn't the 'not' special method >> essentially the inverse of __nonzero__? > > > There isn't such a method currently. Did you mean to say that there is currently no method named __nonzero__? This is not true: >>> class X: ... def __nonzero__(self): ... print "Called" ... return 13 ... 
>>> not X() Called False Regards, Martin From pje at telecommunity.com Fri Sep 10 23:29:51 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Sep 10 23:30:31 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <4142159A.7030309@v.loewis.de> References: <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040910172922.02a930f0@mail.telecommunity.com> At 10:59 PM 9/10/04 +0200, Martin v. Löwis wrote: >Phillip J. Eby wrote: >>>I like the PEP with 'and' and 'or', but isn't the 'not' special method >>>essentially the inverse of __nonzero__? >> >>There isn't such a method currently. > >Did you mean to say that there is currently no method named __nonzero__? No; that there was no method named '__not__'. From raymond.hettinger at verizon.net Sat Sep 11 00:22:54 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Sep 11 00:23:50 2004 Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions In-Reply-To: <1094841494.30829.45.camel@geddy.wooz.org> Message-ID: <003501c49784$b8292d00$e841fea9@oemcomputer> > > * For someone who understands exactly what they are doing, perhaps $ma > > is the intended placeholder -- why force them to use braces: > > ${ma}ñana. > > It also makes it more difficult to document. IOW, right now the PEP and > the documentation say that the first non-identifier character terminates > the placeholder. How would you word the rules with your change? """Placeholders must be a valid Python identifier (containing only ASCII alphanumeric characters and an underscore). If an unbraced identifier ends with a non-ASCII alphanumeric character, such as the latin letter n with tilde in $mañana, then a ValueError is raised for the specious identifier. > My only problem with that is the interference that the 'mapping' > argument presents. IOW, kwds can't contain 'mapping'.
To support a case where both a mapping and keywords are present, perhaps an auxiliary class could simplify matters: def substitute(self, mapping=None, **kwds): if mapping is None: mapping = kwds elif kwds: mapping = _altmap(kwds, mapping) . . . class _altmap: def __init__(self, primary, secondary): self.primary = primary self.secondary = secondary def __getitem__(self, key): try: return self.primary[key] except KeyError: return self.secondary[key] This matches the way keywords are used with the dict(). Raymond From nidoizo at yahoo.com Sat Sep 11 00:51:26 2004 From: nidoizo at yahoo.com (Nicolas Fleury) Date: Sat Sep 11 00:51:35 2004 Subject: [Python-Dev] Re: PEP 328 - Relative Imports In-Reply-To: <1094842075.30831.55.camel@geddy.wooz.org> References: <413F1B87.90301@egenix.com><41416E76.8030603@egenix.com> <41419432.2000600@zope.com> <1094828671.30837.23.camel@geddy.wooz.org> <4141DEDC.8080503@egenix.com> <1094842075.30831.55.camel@geddy.wooz.org> Message-ID: Barry Warsaw wrote: > from __global__ import logging > > That would always import the global logging package. __global__ is the > optional "fake" global package and would only be used when you want to > explicitly skip any local imports. > > IIRC though, Guido never liked this proposal much. I repost it here on > the off chance that he's way too busy to read every message in this > thread . I agree with Guido. FWIW, I think imports should be absolute by default and that the statu quo is a mistake. The __global__ solution makes absolute imports too verbose, when they are usually in majority. I also don't see any advantage (but clear disadvantages) to mix relative and absolute imports with the same syntax, so PEP328 is the way to go. Third party packages have 3 releases to adapt, so I don't see the problem. You have to understand that with the __global__ solution, I would make all my imports use that syntax, and that's really verbose. 
Where I work, we are many working in a root package and right now it's a mess because any new module can hide global modules to modules in same directory, so modules names must be chosen accordingly (we even run a test at night to make sure no import is relative). And yes, I would want to be able to name modules in a package with names like "math", "os", "pickle", "test", "unittest", etc. and not wait Python 3 for that capability. I also expect more standard modules to be in packages in future. Regards, Nicolas From gvanrossum at gmail.com Sat Sep 11 02:46:20 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat Sep 11 02:46:23 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: References: <20040910183258.A1DCE1E4009@bag.python.org> Message-ID: > +1. Consistency makes the API conceptually simpler, even if the > absolute number of parameters is larger. And how is it more consistent that in one form you have to write re.compile(r"[a-z]+", re.I).search(line) while in the other form you have to write re.search(r"[a-z]+", line, re.I) ??? This parameter ordering issue alone makes me cringe at adding the flags to the functions. -- --Guido van Rossum (home page: http://www.python.org/~guido/) Ask me about gmail. From py-web-sig at xhaus.com Fri Sep 10 17:45:35 2004 From: py-web-sig at xhaus.com (Alan Kennedy) Date: Sat Sep 11 07:00:48 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <4141C442.8050005@andreweland.org> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <4141C442.8050005@andreweland.org> Message-ID: <4141CC1F.4000207@xhaus.com> [Phillip J. Eby] >> I would also put the statuses in a dictionary, such that: >> >> status_code[BAD_GATEWAY] = "Bad Gateway" [Andrew Eland] > There's a table mapping status codes to messages on > BaseHTTPRequestHandler at the moment. It could be moved into httplib to > make it more publically visible. 
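For readers on a current Python 3, the two pieces being discussed can be inspected directly. This sketch uses today's module names (`http.client` and `http.server` rather than the 2004-era `httplib` and `BaseHTTPServer`), so it shows where things ended up, not what either poster had in front of them.

```python
import http.client
from http.server import BaseHTTPRequestHandler

# Named constant plus a code -> reason phrase dictionary, in the spirit of
# status_code[BAD_GATEWAY] = "Bad Gateway" from the quoted proposal.
phrase = http.client.responses[http.client.BAD_GATEWAY]
print(int(http.client.BAD_GATEWAY), phrase)

# The server-side table keeps two human-readable strings per status code:
# a short phrase and a longer explanation.
short_msg, long_msg = BaseHTTPRequestHandler.responses[304]
print(short_msg, "|", long_msg)
```

The constant compares equal to the plain integer 502, so code that passes numeric status codes around can index either table.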
And that mapping has 2 levels of human readable messages on it, for example 304: ('Not modified', 'Document has not changed singe given time'), I think that, since the human readable versions are seldom heeded anyway, perhaps a single message is all we need? And I'm -1 on forcing servers, particularly CGI servers, to import the client-side httplib (2.3 httplib.pyc == 42K) just to get this mapping. If the changes are not going to make it in until the next release of cpython anyway, then maybe we should just aim for a new module? Or is some version of 2.4 the target, in which case minimal patches might make it in, whereas new modules won't? Just my 0,02 euro. Alan. From bac at OCF.Berkeley.EDU Sat Sep 11 07:22:12 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Sep 11 07:22:17 2004 Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions In-Reply-To: <003501c49784$b8292d00$e841fea9@oemcomputer> References: <003501c49784$b8292d00$e841fea9@oemcomputer> Message-ID: <41428B84.8090709@ocf.berkeley.edu> Raymond Hettinger wrote: >>>* For someone who understands exactly what they are doing, perhaps $ma >>>is the intended placeholder -- why force them to use braces: >>>${ma}ñana. >> >>It also makes it more difficult to document. IOW, right now the PEP and >>the documentation say that the first non-identifier character terminates >>the placeholder. How would you word the rules with your change? > > > """Placeholders must be a valid Python identifier (containing only ASCII > alphanumeric characters and an underscore). If an unbraced identifier > ends with a non-ASCII alphanumeric character, such as the latin letter n > with tilde in $mañana, then a ValueError is raised for the specious > identifier. > I don't think any of this is needed.
If a non-programmer is being told to use string substitution chances are someone is either going to explain it to them or there will be another set of docs to explain things in a simple way. I suspect stating exactly what a valid Python identifier contains as you did in parentheses above will be enough. -Brett From shane.holloway at ieee.org Sat Sep 11 08:25:38 2004 From: shane.holloway at ieee.org (Shane Holloway (IEEE)) Date: Sat Sep 11 08:26:06 2004 Subject: [Python-Dev] PEP 328 - Relative Imports In-Reply-To: <41416E76.8030603@egenix.com> References: <413F1B87.90301@egenix.com> <41416E76.8030603@egenix.com> Message-ID: <41429A62.3080201@ieee.org> M.-A. Lemburg wrote: > People are more likely going to make all imports absolute (like > they already do in Java and other languages) - which > is good, since it makes reading code much easier and allows for > writing packages which are compatible to older Python version, > but it also prevent developing applications using the above > approach. > > I also don't think that extension writers will care enough to > make their packages fully relocateable by using relative > imports all over - these are hard to read and don't buy > the developer of the extension anything. > > Anyway, what should the strategy for the PEP look like ? > > 1. postpone the defaulting to absolute until P3k > > 2. provide a way to customize the behaviour using > e.g. a sys function As a package writer, I will go through the effort to write relative imports for a few reasons. * One is that I often don't know what the final layout of the larger package group will be; however, I am usually fairly certain about local dependencies. Having a way to refer to a parent package will enable me to be "complete" in this development style. * Second, it's really handy to develop something in the sandbox, and then move it to production in one fell swooop. BTW, will __path__ work for relative parent references? 
* A third reason I will go through the effort to use relative imports is that I'd like to allow application frameworks (and other package writers) to "scoop up" any whole packages if they so desire. Unfortunately, it'll be a while before I can target Python 2.4 directly. But it will be good when I can! Thanks, -Shane Holloway From shane.holloway at ieee.org Sat Sep 11 08:25:49 2004 From: shane.holloway at ieee.org (Shane Holloway (IEEE)) Date: Sat Sep 11 08:26:14 2004 Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions In-Reply-To: <41428B84.8090709@ocf.berkeley.edu> References: <003501c49784$b8292d00$e841fea9@oemcomputer> <41428B84.8090709@ocf.berkeley.edu> Message-ID: <41429A6D.90405@ieee.org> > Raymond Hettinger wrote: >> """Placeholders must be a valid Python identifier (containing only ASCII >> alphanumeric characters and an underscore). If an unbraced identifier >> ends with a non-ASCII alphanumeric character, such as the latin letter n >> with tilde in $ma?ana, then a ValueError is raised for the specious >> identifier. Brett C. wrote: > I don't think any of this is needed. If a non-programmer is being told > to use string substitution chances are someone is either going to > explain it to them or there will be another set of docs to explain > things in a simple way. I suspect stating exactly what a valid Python > identifier contains as you did in parentheses above will be enough. Also, since Barry has gone to great lengths to make Template overrideable, applications can replace the regular expression in their derived Template class when there is a need to allow for end-users inputing template strings. So, I'd suggest keeping safe_substitute relatively simple, but document the limitation and/or solution. 
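Replacing the regular expression in a derived class, as suggested above, can be sketched against the `string.Template` that ships with current Python 3. The subclass name `WordTemplate` and its pattern are assumptions made for illustration; only the `idpattern` class attribute comes from the standard library.

```python
from string import Template


class WordTemplate(Template):
    # Hypothetical override: a placeholder is any run of word characters
    # that does not start with a digit. For str patterns \w is
    # Unicode-aware, so letters such as ñ are accepted.
    idpattern = r"[^\W\d]\w*"


result = WordTemplate("Vuelva $mañana").substitute({"mañana": "el martes"})
print(result)
```

Because only the derived class changes, templates built with the stock `Template` class keep the ASCII-only rule.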
Thanks, -Shane Holloway From fredrik at pythonware.com Sat Sep 11 08:52:06 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 11 08:50:17 2004 Subject: [Python-Dev] Re: Re: Alternative ImplementationforPEP292:SimpleStringSubstitutions References: <1094841494.30829.45.camel@geddy.wooz.org> <003501c49784$b8292d00$e841fea9@oemcomputer> Message-ID: Raymond Hettinger wrote: > """Placeholders must be a valid Python identifier (containing only ASCII > alphanumeric characters and an underscore). If an unbraced identifier > ends with a non-ASCII alphanumeric character, such as the latin letter n > with tilde in $mañana, then a ValueError is raised for the specious > identifier. so why keep the python identifier limitation? the RE engine you're using to parse the template has a concept of "alphanumeric character". just define the placeholder syntax as "one or more alphanumeric characters or underscores" (\w+), use re.UNICODE if the template is created from a unicode string, and you're done. this doesn't mean that people *have* to use non-ASCII characters, of course. but if they do, things just work. From raymond.hettinger at verizon.net Sat Sep 11 08:57:24 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sat Sep 11 08:58:20 2004 Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions In-Reply-To: <41428B84.8090709@ocf.berkeley.edu> Message-ID: <000e01c497cc$983d6720$e841fea9@oemcomputer> [Brett] > I suspect stating exactly what a valid Python > identifier contains as you did in parentheses above will be enough. Given the template, u'¿Puede volver $hoy o $mañana?', you think $ma is an intended placeholder name and that ñ should be a delimiter just like whitespace and punctuation? If end users always follow the rules, this will never come up. If they don't, should there be an error message or a silent failure?
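What the question looks like in practice can be tried against the `string.Template` class in current Python 3, which stops an unbraced placeholder at the first non-ASCII character. This is a sketch only: the released class may differ from the draft under discussion in this thread, and the replacement values are made up.

```python
from string import Template

t = Template("¿Puede volver $hoy o $mañana?")

# safe_substitute: $ma is parsed as the placeholder, is not found, and is
# left in the output untouched.
silent = t.safe_substitute(hoy="lunes")
print(silent)

# substitute: the same parse, but the missing key 'ma' is reported.
missing = None
try:
    t.substitute(hoy="lunes")
except KeyError as exc:
    missing = exc.args[0]
print("missing key:", missing)

# If a key named 'ma' does exist, the rest of the word is kept as literal
# text after the substituted value.
truncated = t.substitute(hoy="lunes", ma="martes")
print(truncated)
```

So the strict method surfaces the problem as an error about 'ma', while the lenient one leaves the original text visible in the result.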
Raymond From martin at v.loewis.de Sat Sep 11 09:01:34 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Sep 11 09:01:23 2004 Subject: [Web-SIG] Re: [Python-Dev] Adding status code constants to httplib In-Reply-To: <4141CC1F.4000207@xhaus.com> References: <5.1.1.6.0.20040910105252.020caec0@mail.telecommunity.com> <4141C442.8050005@andreweland.org> <4141CC1F.4000207@xhaus.com> Message-ID: <4142A2CE.5060405@v.loewis.de> Alan Kennedy wrote: > And I'm -1 on forcing servers, particularly CGI servers, to import the > client-side httplib (2.3 httplib.pyc == 42K) just to get this mapping. It might be somewhat comforting that the 2.4 httplib.pyc is only 33K. Regards, Martin From stephen at xemacs.org Sat Sep 11 09:35:08 2004 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat Sep 11 09:35:18 2004 Subject: [Python-Dev] AlternativeImplementation forPEP292:SimpleString Substitutions In-Reply-To: <200409101257.13802.gmccaughan@synaptics-uk.com> (Gareth McCaughan's message of "Fri, 10 Sep 2004 12:57:13 +0100") References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <200409090939.41873.gmccaughan@synaptics-uk.com> <87k6v2smxt.fsf_-_@tleepslib.sk.tsukuba.ac.jp> <200409101257.13802.gmccaughan@synaptics-uk.com> Message-ID: <87mzzxqmvn.fsf@tleepslib.sk.tsukuba.ac.jp> >>>>> "Gareth" == Gareth McCaughan writes: Gareth> On Friday 2004-09-10 06:38, Stephen J. Turnbull wrote: >> But [efficiency], as such, is important only to efficiency >> fanatics. Gareth> No, it's important to ... well, people to whom efficiency Gareth> matters. There's no need for them to be fanatics. If it matters just because they care, they're fanatics. If it matters because they get some other benefit (response time less than the threshold of hotice, twice as many searches per unit time, half as many boxes to serve a given load), they're not. 
's talk of many ways to do things "and Python should account for most of them" strikes me as fanaticism by that definition; the vast majority of developers will never deal with the special cases, or write apps that anticipate dealing with huge ASCII strings. Those costs should be borne by the developers who do, and their clients. I apologize for shoehorning that into my reply to you. >> The question is, how often are people going to notice that when >> they have pure ASCII they get a 100% speedup [...]? Gareth> Why is that the question, rather than "how often are Gareth> people going to benefit from getting a 100% speedup when Gareth> they have pure ASCII"? Because "benefit" is very subjective for _one_ person, and I don't want to even think about putting coefficients on your benefit versus mine. If the benefit is large enough, a single person will be willing to do the extra work. The question is, should all Python users and developers bear some burden to make it easier for that person to do what he needs to do? I think "notice" is something you can get consensus on. If a lot of people are _noticing_ the difference, I think that's a reasonable rule of thumb for when we might want to put "it", or facilities for making individual efforts to deal with "it" simpler, into "standard Python" at some level. If only a few people are noticing, let them become expert at dealing with it. Gareth> Or even "how often are people going to try out Python on Gareth> an application that uses pure-ASCII strings, and decide to Gareth> use some other language that seems to do the job much Gareth> faster"? See? You're now using a "notice" standard, too. I don't think that's an accident. >> I just don't see the former being worth the extra effort, while >> the latter makes the "this or that" choice clear. 
If a single >> representation is enough, it had better be Unicode-based, and >> the others can be supported in libraries (which turn binary >> blobs into non-standard text objects with appropriate methods) >> as the need arises. Gareth> No question that if a single representation is enough then Gareth> it had better be Unicode. Not for you, not for me, not for , I'm pretty sure. The point here is that there is a reasonable way to support the others, too, but their users will have to make more effort than if it were a goal to support them in the "standard language and libraries." I think that's the way to go, and thinks the opposite AFAICT. -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba From fredrik at pythonware.com Sat Sep 11 10:13:44 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 11 10:11:57 2004 Subject: [Python-Dev] Re: AlternativeImplementation forPEP292:SimpleStringSubstitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><200409090939.41873.gmccaughan@synaptics-uk.com><87k6v2smxt.fsf_-_@tleepslib.sk.tsukuba.ac.jp><200409101257.13802.gmccaughan@synaptics-uk.com> <87mzzxqmvn.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull wrote: > I think "notice" is something you can get consensus on. If a lot of > people are _noticing_ the difference, I think that's a reasonable rule > of thumb for when we might want to put "it", or facilities for making > individual efforts to deal with "it" simpler, into "standard Python" > at some level. who are "we"? does that group include you? 
From martin at v.loewis.de Sat Sep 11 10:39:14 2004 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sat Sep 11 10:39:04 2004 Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions In-Reply-To: <000e01c497cc$983d6720$e841fea9@oemcomputer> References: <000e01c497cc$983d6720$e841fea9@oemcomputer> Message-ID: <4142B9B2.7060306@v.loewis.de> Raymond Hettinger wrote: > [Brett] > >>I suspect stating exactly what a valid Python >>identifier contains as you did in parentheses above will be enough. > > > Given the template, u'¿Puede volver $hoy o $mañana?', you think $ma is > an intended placeholder name and that ñ should be a delimiter just like > whitespace and punctuation? No, I think Brett (and apparently nearly everybody else) thinks that such a template will not be written over the course of the next five years, except for demonstration purposes. Instead, what will be written is u'¿Puede volver $today o $tomorrow?' because the template will be a translation of the original English template, and, during translation, placeholder names must not be changed (although I have difficulties imagining possible values for today or tomorrow so that this becomes meaningful). > If end users always follow the rules, this will never come up. If they > don't, should there be an error message or a silent failure? There is always a chance of a silent failure in SafeTemplates, even with this rule added - this is the purpose of SafeTemplates. With a Template, you will get a KeyError. In any case, the failure will not be completely silent, as the user will see $mañana show up in the output. My prediction is that the typical application is to use Templates, as users know very well what the placeholders are. Furthermore, the typical application will use locals/globals/vars(), or dict(key="value") to create the replacement dictionary.
In this application, nobody would even think of using mañana as a key,
because you can't get it into the dictionary.

If this never comes up, it is better to not complicate the rules.
Simple is better than complex.

Regards,
Martin

From fredrik at pythonware.com Sat Sep 11 10:47:17 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat Sep 11 10:56:47 2004
Subject: [Python-Dev] Re: Re: Missing arguments in RE functions
References: <1094836885.4141e2953f7df@mcherm.com>
Message-ID:

Michael Chermside wrote:
> Fredrik, a less hostile response would be appropriate here. No one
> knows every detail of every API of any reasonably sized library
> (like Python's).

We're not talking about Python's library, we're talking about Python's
RE library. It's not that big, really. The documentation is five
moderately-sized HTML pages, plus a page with examples. Seven functions
(plus two trivial variations) and two object types. You cannot use the
library at all without knowing the stuff that's discussed on the first,
third, and fifth page; the two other pages discuss pos/endpos issues
within the first few paragraphs.

Are we trying to optimize Python for people who won't read
even-numbered sections?

> On the whole, there are MORE individual "pieces" to the
> API but because of orthogonality the API as a whole is
> simpler.
Given that there's no way to order the arguments consistently (since some arguments apply to the compilation process, other to the match process), you're obviously using "orthogonal" and "simple" in the Perl sense ;-) From fredrik at pythonware.com Sat Sep 11 11:51:23 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 11 11:49:32 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292: Simple String Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> Message-ID: M.-A. Lemburg wrote: >> (google for "stringlib" for some work I'm doing in this area) > > Ah, now I know where you're coming from :-) Shift tables > don't work well in the Unicode world with its large alphabet. since most real-life text use characters from only a small number of regions in that alphabet, compressed shift tables work extremely well (the algorithm on the stringlib page shows one way to do that, in constant space and O(m) time). > BTW, you might want to look at the BMS implementation I did > for mxTextTools. did you ever get around to add Unicode support to mxTextTools ? From erik at heneryd.com Sat Sep 11 13:54:52 2004 From: erik at heneryd.com (Erik Heneryd) Date: Sat Sep 11 13:55:00 2004 Subject: [Python-Dev] PEP 292: method names Message-ID: <4142E78C.7010800@heneryd.com> I haven't followed the template threads very closely, but reading the pep/implementation it clearly looks useful. I don't know if I like the method names substitute/safe_substitute though. * Too long 10/15 character names for something so simple it up until now just needed a %? Programs using templates will probably use them frequently... I'd prefer sub instead of substitute. * Safe? safe_substitution doesn't tell you much upon first glance. Safe? In what way? 
You could even argue that the "plain" version really is the safer one, as you'll notice typos and thus get a more solid program. I think a name hinting that this method uses the var name as a fallback would be better, but can't think of (a short) one... defaultsub? fallbacksub? loosesub? Guess I could live with safe, but... Erik From erik at heneryd.com Sat Sep 11 15:04:16 2004 From: erik at heneryd.com (Erik Heneryd) Date: Sat Sep 11 15:04:20 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <4142E78C.7010800@heneryd.com> References: <4142E78C.7010800@heneryd.com> Message-ID: <4142F7D0.5080807@heneryd.com> Erik Heneryd wrote: > * Safe? > safe_substitution doesn't tell you much upon first glance. Safe? In > what way? You could even argue that the "plain" version really is the > safer one, as you'll notice typos and thus get a more solid program. I > think a name hinting that this method uses the var name as a fallback > would be better, but can't think of (a short) one... defaultsub? > fallbacksub? loosesub? Guess I could live with safe, but... Come to think of it, I really like the more OO-ish approach better, than to cram everything into a single class. Is the safe_substitute really that special it deserves a special method? Is it really the one, true way to do a "safe" substitution? IIRC DOS and sh don't agree, so it's not that obvious. I say keep the inheritance thing, it's much more flexible, and delegate the KeyError condition to an overridable method. Erik From nidoizo at yahoo.com Sat Sep 11 18:01:28 2004 From: nidoizo at yahoo.com (Nicolas Fleury) Date: Sat Sep 11 18:00:15 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: References: <20040910183258.A1DCE1E4009@bag.python.org> Message-ID: Guido van Rossum wrote: > And how is it more consistent that in one form you have to write > > re.compile(r"[a-z]+", re.I).search(line) > > while in the other form you have to write > > re.search(r"[a-z]+", line, re.I) > > ??? 
>
> This parameter ordering issue alone makes me cringe at adding the
> flags to the functions.

I agree. In fact, probably the line parameter should have been the
first parameter for all functions, but anyway it's too late for that.
The few times I have not used pattern objects in quick scripts, I always
put the line at first at the wrong place instinctively, probably for the
reason you mention (and I'm not pretending my instinct is universal). In
that context, keeping the API is even more reasonable.

Regards,
Nicolas

From jlgijsbers at planet.nl Sat Sep 11 18:26:51 2004
From: jlgijsbers at planet.nl (Johannes Gijsbers)
Date: Sat Sep 11 18:25:12 2004
Subject: [Python-Dev] doctest and inspect.getmodule
Message-ID: <20040911162650.GA9132@mail.planet.nl>

I just checked in a change to inspect.getmodule (without running the tests
beforehand, not a smart move) which broke a whole bunch of tests for doctest.
The tests mostly seem to fail because doctest can find modules for objects it
previously couldn't.

I think the change is basically correct, but I'm not sure how to fix doctest.
Should doctest omit the module, or should the doctest tests be changed to
expect the module being printed?

Oh, I promise I'll run the tests before checking in next time.

Johannes

P.S.: here's the checkin message for the change:

Modified Files:
    inspect.py
Log Message:
Use __module__ attribute when available instead of using isclass() predicate
(functions and methods have grown the __module__ attribute too). See bug
#570300.
Index: inspect.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/inspect.py,v
retrieving revision 1.54
retrieving revision 1.55
diff -u -d -r1.54 -r1.55
--- inspect.py 18 Aug 2004 12:40:30 -0000 1.54
+++ inspect.py 11 Sep 2004 15:53:22 -0000 1.55
@@ -370,7 +370,7 @@
     """Return the module an object was defined in, or None if not found."""
     if ismodule(object):
         return object
-    if isclass(object):
+    if hasattr(object, '__module__'):
         return sys.modules.get(object.__module__)
     try:
         file = getabsfile(object)

From fredrik at pythonware.com Sat Sep 11 18:23:27 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat Sep 11 18:30:27 2004
Subject: [Python-Dev] Re: Missing arguments in RE functions
References: <20040910183258.A1DCE1E4009@bag.python.org>
Message-ID:

Nicolas Fleury wrote:
>> This parameter ordering issue alone makes me cringe at adding the
>> flags to the functions.
>
> I agree. In fact, probably the line parameter should have been the first parameter for all
> functions

so where would you put the pattern?

From erik at heneryd.com Sat Sep 11 18:34:40 2004
From: erik at heneryd.com (Erik Heneryd)
Date: Sat Sep 11 18:34:45 2004
Subject: [Python-Dev] Re: Missing arguments in RE functions
In-Reply-To:
References: <20040910183258.A1DCE1E4009@bag.python.org>
Message-ID: <41432920.2020701@heneryd.com>

Nicolas Fleury wrote:
> Guido van Rossum wrote:
>
>> And how is it more consistent that in one form you have to write
>>
>> re.compile(r"[a-z]+", re.I).search(line)
>>
>> while in the other form you have to write
>>
>> re.search(r"[a-z]+", line, re.I)
>>
>> ???
>>
>> This parameter ordering issue alone makes me cringe at adding the
>> flags to the functions.
>
>
> I agree. In fact, probably the line parameter should have been the
> first parameter for all functions, but anyway it's too late for that.
> The few times I have not used pattern objects in quick scripts, I always > put the line at first at the wrong place instinctively, probably for the > reason you mention (and I'm not pretending my instinct is universal). In > that context, keeping the API is even more reasonable. Well, considering that mandatory parameters must come before optional ones, theres really not much to do. At least the mandatory function parameters are in the "right" order. Erik From nidoizo at yahoo.com Sat Sep 11 18:44:01 2004 From: nidoizo at yahoo.com (Nicolas Fleury) Date: Sat Sep 11 18:42:44 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: References: <20040910183258.A1DCE1E4009@bag.python.org> Message-ID: Fredrik Lundh wrote: >>I agree. In fact, probably the line parameter should have been the first parameter for all >>functions > > so where would you put the pattern? Just after. You "insert" the line parameter first, since it's the additional parameter to the pattern objects functions. It's basically the input followed by everything to modify/search it. I think it's better to insert it at first, since it's a mandatory argument, while it can be logical to have optional flags for patterns (and I'm not talking about the current request, but in general in API design). But again, it's too late for that and I don't pretend my instinct is universal. Regards, Nicolas From nidoizo at yahoo.com Sat Sep 11 18:47:32 2004 From: nidoizo at yahoo.com (Nicolas Fleury) Date: Sat Sep 11 18:50:54 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: <41432920.2020701@heneryd.com> References: <20040910183258.A1DCE1E4009@bag.python.org> <41432920.2020701@heneryd.com> Message-ID: Erik Heneryd wrote: > At least the mandatory function > parameters are in the "right" order. I think otherwise. See reply to Fredrik. 
Regards, Nicolas From erik at heneryd.com Sat Sep 11 18:51:32 2004 From: erik at heneryd.com (Erik Heneryd) Date: Sat Sep 11 18:51:38 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: References: <20040910183258.A1DCE1E4009@bag.python.org> Message-ID: <41432D14.6050106@heneryd.com> Nicolas Fleury wrote: > Fredrik Lundh wrote: > >>> I agree. In fact, probably the line parameter should have been the >>> first parameter for all functions >> >> >> so where would you put the pattern? > > > Just after. You "insert" the line parameter first, since it's the > additional parameter to the pattern objects functions. It's basically > the input followed by everything to modify/search it. I think it's > better to insert it at first, since it's a mandatory argument, while it > can be logical to have optional flags for patterns (and I'm not talking > about the current request, but in general in API design). But again, > it's too late for that and I don't pretend my instinct is universal. compile() doesn't have that many additional parameters, just the optional flags. OTOH the regex object methods do (both mandatory and optional). Wouldn't it be stupid to insert the pattern in the middle of the method parameters? Erik From gvanrossum at gmail.com Sat Sep 11 18:55:17 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat Sep 11 18:55:22 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: <41432D14.6050106@heneryd.com> References: <20040910183258.A1DCE1E4009@bag.python.org> <41432D14.6050106@heneryd.com> Message-ID: I don't see any reason to continue this debate. Patch rejected. Go argue somewhere else if you can't stop arguing. In case any of the participants think they can convince the rest of the world with *one* more post, *one* more clever argument: when was the last time that worked? They didn't change their mind on any of your previous posts, so why would they now? Think about it. 
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Ask me about gmail.

From bac at OCF.Berkeley.EDU Sat Sep 11 19:07:29 2004
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Sat Sep 11 19:07:39 2004
Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions
In-Reply-To: <4142B9B2.7060306@v.loewis.de>
References: <000e01c497cc$983d6720$e841fea9@oemcomputer> <4142B9B2.7060306@v.loewis.de>
Message-ID: <414330D1.4060202@ocf.berkeley.edu>

Martin v. Löwis wrote:
> Raymond Hettinger wrote:
>
>> [Brett]
>>
>>> I suspect stating exactly what a valid Python
>>> identifier contains as you did in parentheses above will be enough.
>>
>>
>>
>> Given the template, u'¿Puede volver $hoy o $mañana?', you think $ma is
>> an intended placeholder name and that ñ should be a delimiter just like
>> whitespace and punctuation?
>
>
> No, I think Brett (and apparently nearly everybody else) thinks that
> such a template will not be written over the course of the next five
> years, except for demonstration purposes. Instead, what will be written
> is u'¿Puede volver $today o $tomorrow?' because the template will be
> a translation of the original English template, and, during translation,
> placeholder names must not be changed (although I have difficulties
> imagining possible values for today or tomorrow so that this becomes
> meaningful).
>

Actually, that wasn't what I was thinking, but that also works. My
original thinking is that Template will throw a fit and that's fine
since they didn't follow the rules.

>> If end users always follow the rules, this will never come up. If they
>> don't, should there be error message or a silent failure?
>
>
> There is always a chance of a silent failure in SafeTemplates, even with
> this rule added - this is the purpose of SafeTemplates. With a Template,
> you will get a KeyError. In any case, the failure will not be completely
> silent, as the user will see $mañana show up in the output.
> Right, my other reason for not thinking this is a big issue. If you use SafeTemplate you will have to watch out for silent problems like this anyway. I just don't think it will be a big problem. And if people want the support they will just use a pure Unicode Template subclass (perhaps we should include that in the module?). -Brett From nidoizo at yahoo.com Sat Sep 11 19:28:35 2004 From: nidoizo at yahoo.com (Nicolas Fleury) Date: Sat Sep 11 19:27:19 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: <41432D14.6050106@heneryd.com> References: <20040910183258.A1DCE1E4009@bag.python.org> <41432D14.6050106@heneryd.com> Message-ID: Erik Heneryd wrote: > compile() doesn't have that many additional parameters, just the > optional flags. OTOH the regex object methods do (both mandatory and > optional). Wouldn't it be stupid to insert the pattern in the middle of > the method parameters? Just a last post to end the debate. I agree with you. I looked at the API and I think my suggestion was wrong. I think my basic instinct was due to the fact that I made a lot of Perl and that I use mostly only match/search/sub with only pattern flags. Since there's no really intuitive way to mix two APIs with mandatory and optional arguments, and also that using pattern objects is the way to go, I'm now -1 with the patch. Sorry to have not understood alone, if anyone wants to continue the discussion, I will do it privately. Regards, Nicolas From tim.peters at gmail.com Sat Sep 11 19:46:20 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sat Sep 11 19:46:26 2004 Subject: [Python-Dev] doctest and inspect.getmodule In-Reply-To: <20040911162650.GA9132@mail.planet.nl> References: <20040911162650.GA9132@mail.planet.nl> Message-ID: <1f7befae04091110466ceb66cc@mail.gmail.com> [Johannes Gijsbers] > I just checked in a change to inspect.getmodule (without running the tests > beforehand, not a smart move) which broke a whole bunc of tests for doctest. 
> The tests mostly seem to fail because doctest can find modules for objects it > previously couldn't. All failures were like that. test_doctest.py contains lots of "recursive" uses of doctest, where test_doctest.py functions contain docstrings that themselves contain both definitions of functions with their own docstrings, and calls to doctest functions. Before your change, functions defined inside docstrings and dynamically compiled by doctest.py were a mystery to inspect.getmodule(), but after your change getmodule() figured it knew which module they came from. This had no effect on doctest doctests that showed succeeding doctest examples, but for doctest doctests showing failing doctest examples, the failure-output "and which doctest failed?" meta line changed, from stuff like: Line 3, in f to stuff like: File "C:\Code\python\lib\test\test_doctest.py", line 4, in f Couldn't be more obvious . > I think the change is basically correct, Me too. > but I'm not sure how to fix doctest. That's OK, I already did. doctest didn't need any changes, but the expected output in test_doctest.py had to be fiddled. ... > Oh, I promise I'll run the tests before checking in next time. Everyone is entitled to one screwup per century. This was yours . From mal at egenix.com Sat Sep 11 23:00:18 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Sat Sep 11 23:00:17 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292: Simple String Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> Message-ID: <41436762.7040207@egenix.com> Fredrik Lundh wrote: > M.-A. Lemburg wrote: > > >>>(google for "stringlib" for some work I'm doing in this area) >> >>Ah, now I know where you're coming from :-) Shift tables >>don't work well in the Unicode world with its large alphabet. 
> > since most real-life text use characters from only a small number of regions > in that alphabet, compressed shift tables work extremely well (the algorithm > on the stringlib page shows one way to do that, in constant space and O(m) > time). You mean: a compressed shift table for Unicode patterns ? I'll have a look. >>BTW, you might want to look at the BMS implementation I did >>for mxTextTools. > > > did you ever get around to add Unicode support to mxTextTools ? Yes in egenix-mx-base 2.1.0. It's not yet released, but Google will find the most recent snapshot :-) The package has been available as beta for more than a year now; just haven't found time to cut a release. The search functions from 2.0 were replaced with search objects that can deal with both 8-strings and Unicode. However, the Unicode search implementation uses a rather naive approach due to the shift table problem (and my lack of time). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 11 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From noamr at myrealbox.com Sat Sep 11 23:19:50 2004 From: noamr at myrealbox.com (Noam Raphael) Date: Sat Sep 11 23:21:05 2004 Subject: [Python-Dev] Re: Missing arguments in RE functions In-Reply-To: <1094836885.4141e2953f7df@mcherm.com> References: <1094836885.4141e2953f7df@mcherm.com> Message-ID: <41436BF6.6080903@myrealbox.com> Ok, so I understand that Guido doesn't want to extend the functions' API to have the full functionality. Fine. However, I've suggested three things that I think should be done in that case, and nobody objected. Here they are: 1. 
Add a prominent note in the module contents page or in the module's main page, stating that some functionality can only be acheived by using compiled REs. 2. Document the optional parameters which let you specify the start and end pos in the findall and finditer methods of a compiled RE object. 3. Add the optional parameter "flags" to the findall and finditer functions. Then, the four functions match, search, findall and finditer would have the same interface: function(pattern, string[, flags]). Does anyone have any objections? Noam From tim.peters at gmail.com Sun Sep 12 00:43:27 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sun Sep 12 00:43:41 2004 Subject: [Python-Dev] SHA-256 module Message-ID: <1f7befae04091115436d5a70fa@mail.gmail.com> [Michael Hudson, on 30 June 2004] >> Nevertheless, am I right to still believe that there are no known >> distinct strings which even MD5 to the same hash? [Andrew Kuchling] > Correct. And two months later, the world is all different again: """ import md5 S = ('\xd11\xdd\x02\xc5\xe6\xee\xc4i=\x9a\x06\x98\xaf\xf9\\' '/\xca\xb5\x87\x12F~\xab@\x04X>\xb8\xfb\x7f\x89U\xad4' '\x06\t\xf4\xb3\x02\x83\xe4\x88\x83%qAZ\x08Q%\xe8\xf7' '\xcd\xc9\x9f\xd9\x1d\xbd\xf2\x807<[\x96\x0b\x1d\xd1' '\xdcA{\x9c\xe4\xd8\x97\xf4ZeU\xd55s\x9a\xc7\xf0\xeb' '\xfd\x0c0)\xf1f\xd1\t\xb1\x8fu\'\x7fy0\xd5\\\xeb"' '\xe8\xad\xbay\xcc\x15\\\xedt\xcb\xdd_\xc5\xd3m\xb1' '\x9b\n\xd85\xcc\xa7\xe3') T = ('\xd11\xdd\x02\xc5\xe6\xee\xc4i=\x9a\x06\x98\xaf\xf9\\' '/\xca\xb5\x07\x12F~\xab@\x04X>\xb8\xfb\x7f\x89U\xad4' '\x06\t\xf4\xb3\x02\x83\xe4\x88\x83%\xf1AZ\x08Q%\xe8\xf7' '\xcd\xc9\x9f\xd9\x1d\xbdr\x807<[\x96\x0b\x1d\xd1\xdcA{' '\x9c\xe4\xd8\x97\xf4ZeU\xd55s\x9aG\xf0\xeb\xfd\x0c0)' '\xf1f\xd1\t\xb1\x8fu\'\x7fy0\xd5\\\xeb"\xe8\xad\xbayL' '\x15\\\xedt\xcb\xdd_\xc5\xd3m\xb1\x9b\nX5\xcc\xa7\xe3') assert S != T print md5.new(S).hexdigest() print md5.new(T).hexdigest() print "oops" """ A number of hash functions got cracked since this thread started, by some 
researchers in China: http://eprint.iacr.org/2004/199.pdf MD5 is truly dead now for "secure" applications. Maybe someone who gives a rip could update the docs. Best I understand it, SHA-1 still stands, although a variant with half the rounds has been cracked. It does increase the desirability (IMO) of adding SHA-256, lest SHA-1 get cracked too while Python 2.4.j is still current. From fredrik at pythonware.com Sun Sep 12 13:23:15 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Sep 12 13:24:03 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292: SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <41436762.7040207@egenix.com> Message-ID: M.-A. Lemburg wrote: > You mean: a compressed shift table for Unicode patterns ? > I'll have a look. It's a lossy compression: the entire delta1 table is represented as two 32-bit values, independent of the size of the source alphabet. Works amazingly well, at least when combined with the BM-variant it was designed for... (I suppose it's too late for 2.4, but it would probably be a good idea to switch to this algorithm in 2.5) From mwh at python.net Sun Sep 12 18:02:04 2004 From: mwh at python.net (Michael Hudson) Date: Sun Sep 12 18:02:06 2004 Subject: [Python-Dev] SHA-256 module In-Reply-To: <1f7befae04091115436d5a70fa@mail.gmail.com> (Tim Peters's message of "Sat, 11 Sep 2004 18:43:27 -0400") References: <1f7befae04091115436d5a70fa@mail.gmail.com> Message-ID: <2my8jfxypv.fsf@starship.python.net> Tim Peters writes: > [Michael Hudson, on 30 June 2004] >>> Nevertheless, am I right to still believe that there are no known >>> distinct strings which even MD5 to the same hash? > > [Andrew Kuchling] >> Correct. 
> > And two months later, the world is all different again: Heh, I'd already blogged about that: http://starship.python.net/crew/mwh/blog/nb.cgi/view/weblog/2004/08/18/0 > """ > import md5 > > S = ('\xd11\xdd\x02\xc5\xe6\xee\xc4i=\x9a\x06\x98\xaf\xf9\\' > '/\xca\xb5\x87\x12F~\xab@\x04X>\xb8\xfb\x7f\x89U\xad4' > '\x06\t\xf4\xb3\x02\x83\xe4\x88\x83%qAZ\x08Q%\xe8\xf7' > '\xcd\xc9\x9f\xd9\x1d\xbd\xf2\x807<[\x96\x0b\x1d\xd1' > '\xdcA{\x9c\xe4\xd8\x97\xf4ZeU\xd55s\x9a\xc7\xf0\xeb' > '\xfd\x0c0)\xf1f\xd1\t\xb1\x8fu\'\x7fy0\xd5\\\xeb"' > '\xe8\xad\xbay\xcc\x15\\\xedt\xcb\xdd_\xc5\xd3m\xb1' > '\x9b\n\xd85\xcc\xa7\xe3') > > T = ('\xd11\xdd\x02\xc5\xe6\xee\xc4i=\x9a\x06\x98\xaf\xf9\\' > '/\xca\xb5\x07\x12F~\xab@\x04X>\xb8\xfb\x7f\x89U\xad4' > '\x06\t\xf4\xb3\x02\x83\xe4\x88\x83%\xf1AZ\x08Q%\xe8\xf7' > '\xcd\xc9\x9f\xd9\x1d\xbdr\x807<[\x96\x0b\x1d\xd1\xdcA{' > '\x9c\xe4\xd8\x97\xf4ZeU\xd55s\x9aG\xf0\xeb\xfd\x0c0)' > '\xf1f\xd1\t\xb1\x8fu\'\x7fy0\xd5\\\xeb"\xe8\xad\xbayL' > '\x15\\\xedt\xcb\xdd_\xc5\xd3m\xb1\x9b\nX5\xcc\xa7\xe3') > > assert S != T > print md5.new(S).hexdigest() > print md5.new(T).hexdigest() > print "oops" > """ > > A number of hash functions got cracked since this thread started, by > some researchers in China: > > http://eprint.iacr.org/2004/199.pdf Is there any resource that explains these guys results any more fully? The only examples I've seen only differ in a very few bits. > MD5 is truly dead now for "secure" applications. I'd say it's resting :) > Maybe someone who gives a rip could update the docs. > Best I understand it, SHA-1 still stands, although a variant with half > the rounds has been cracked. It does increase the desirability (IMO) > of adding SHA-256, lest SHA-1 get cracked too while Python 2.4.j is > still current. I'm hardly an expert, but I'd still like to know more about this attack. If it's as limited as it could possibly be (i.e. 
it can only make very specific strings differing by a handful of bits hash the same) then it's only an issue for the paranoid. If it's as wide as it could possibly be it seems that all hash functions we currently know could be doomed. Cheers, mwh -- Q: Isn't it okay to just read Slashdot for the links? A: No. Reading Slashdot for the links is like having "just one hit" off the crack pipe. -- http://www.cs.washington.edu/homes/klee/misc/slashdot.html#faq From tim.peters at gmail.com Sun Sep 12 21:44:30 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sun Sep 12 21:44:33 2004 Subject: [Python-Dev] SHA-256 module In-Reply-To: <2my8jfxypv.fsf@starship.python.net> References: <1f7befae04091115436d5a70fa@mail.gmail.com> <2my8jfxypv.fsf@starship.python.net> Message-ID: <1f7befae0409121244712506d0@mail.gmail.com> [Tim Peters] ... >> A number of hash functions got cracked since this thread started, by >> some researchers in China: >> >> http://eprint.iacr.org/2004/199.pdf [Michael Hudson] > Is there any resource that explains these guys results any more fully? Not that I know of. I've read that they're writing a paper on *how* their approach works, but it will take time to finish it. There's no doubt that they're on to something. Apparently the first version of the paper provided collisions for a hash that wasn't actually MD5, due (at least) to confusing endianness in places. This was pointed out at the conference, and by the next morning they produced two collisions for "the real" MD5. > The only examples I've seen only differ in a very few bits. Probably due to the method, which apparently makes a sequence of small, controlled changes, based more on analysis than on brute force. Given the uses of MD5 for verifying downloads, it doesn't take much of a change to open "a security hole" in C code, so even if they can't extend the method beyond a few bits' difference, that would be cold comfort. 
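
To make the download-verification use mentioned above concrete, here is a minimal sketch of checking data against a published SHA-256 digest. It assumes the hashlib and hmac modules found in later Python releases (not the md5/sha modules current in this thread), and digest_matches is a made-up helper name used only for illustration.

```python
import hashlib
import hmac


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def digest_matches(data, expected_hex):
    """Hypothetical helper: compare data against a published hex digest."""
    # compare_digest does a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


published = sha256_hex(b"abc")           # stands in for a digest from a release page
print(digest_matches(b"abc", published))  # same bytes as were hashed
print(digest_matches(b"abd", published))  # one character changed
```

A collision attack of the kind described here would let someone prepare two different inputs for which such a check passes with MD5; the sketch only shows the mechanics of the check, not any guarantee about a particular hash function.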
I note that they got to pick both msgs here, and haven't claimed to be able to derive a collision for a given msg. When more about their method is known, it may or may not prove feasible to extend. >> MD5 is truly dead now for "secure" applications. > I'd say it's resting :) I based "truly dead" on press reaction. MD5 had been falling out of favor for years anyway (due to earlier cracks of various weakened versions); this is just nail-in-the-coffin news. > ... > I'm hardly an expert, but I'd still like to know more about this > attack. If it's as limited as it could possibly be (i.e. it can only > make very specific strings differing by a handful of bits hash the > same) then it's only an issue for the paranoid. If it's as wide as it > could possibly be it seems that all hash functions we currently know > could be doomed. Security weenies are paranoid by necessity -- paranoia is part of their field. I'm not sure there's ever been a real-world attack based on a "double free" bug, for example, but finding such a bug is sufficient to kill a product release anyway. They don't claim to have an attack against SHA-1, BTW. Someone else reported collisions using a grossly weakened SHA-1, with 42 rounds instead of 80. From martin at v.loewis.de Sun Sep 12 23:51:27 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 12 23:51:16 2004 Subject: [Python-Dev] SHA-256 module In-Reply-To: <2my8jfxypv.fsf@starship.python.net> References: <1f7befae04091115436d5a70fa@mail.gmail.com> <2my8jfxypv.fsf@starship.python.net> Message-ID: <4144C4DF.5020100@v.loewis.de> Michael Hudson wrote: > I'm hardly an expert, but I'd still like to know more about this > attack. If it's as limited as it could possibly be (i.e. it can only > make very specific strings differing by a handful of bits hash the > same) then it's only an issue for the paranoid. If it's as wide as it > could possibly be it seems that all hash functions we currently know > could be doomed. 
The nicest summary I have seen on this so far was Tim Churches'
message . In his terminology, "collision resistance" has been attacked
(i.e. it is now possible to create pairs of plaintext that hash the
same). "Preimage resistance" and "2nd preimage resistance" remain
unattacked, at least with respect to this paper. IOW, it is still not
possible to easily reconstruct some plaintext given the hash (good for
password hashing), and it is still not possible to modify a given
plaintext so that it still hashes the same (good for signing).

However, the trust in the "pseudo-randomness" of the hash is gone
now - for a cryptographically "secure" hash, it should not be possible
to create a collision until the sun collapses.

Regards,
Martin

From dave.l.harrison at gmail.com Mon Sep 13 03:34:06 2004
From: dave.l.harrison at gmail.com (David Harrison)
Date: Mon Sep 13 03:34:08 2004
Subject: [Python-Dev] PEP 265 - Sorting dicts by value
Message-ID:

Hi all,

Quick pep265 summary : People frequently want to count the occurrences
of values in a dict, or sort the results of a d.items() call by value.
This could be done by extending the current items() definition, or by
creating a new function for the dict object (both requiring a C
implementation).

I've had a read through pep265 a few times now, and every time I've had
two immediate reactions. First, that I've been there too. I've found
myself innumerable times needing to count the occurrences of values in
a dict.

However second, that dicts shouldn't be naturally sortable. A dict does
not guarantee the order in which calls such as items(), keys() or
values() return their results.
It's my feeling that we should not encourage people to rely on a dict
returning a set ordering, since as a hash-based data structure they are
designed for key lookup, not sequential traversal - if you want to sort
something, massage the data into a list and then sort the list (I've
seen a proposal before that the sort function be able to handle objects
which would allow sorting of 2 dimensional lists).

With regards to the two arguments put forward by Grant, the first - that
it is an idiom known only to experienced campaigners - does not seem to
be a supportable argument to me. I think the problem has quite a simple
elegant solution which is rather easily discovered - there are lots of
differences in Python that require an inexperienced programmer to learn
a new idiom (such as the looping construct).

The second, that the solution is full of 'grunge', seems a matter of
taste and use to me. As mentioned in the pep there are different kinds
of comparison that may be wanted, but could not be supported. Further,
it is a natural use case of a dict that items held within it need not be
of the same type (and therefore makes the idea of a comparison between
them meaningless).

With respect to implementation suggestions, numbers 1 2 and 3 definitely
don't work for me. To extend the usage of items() without similarly
extending the usage of keys() and values() would mean that we are
special casing the items() function in a way that makes it inconsistent
with the other dict functions. Number 5 seems too specific to me.
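
For reference, the "massage the data into a list and then sort the list" idiom described above can be sketched as follows. This is only an illustration, assuming the sorted() builtin and operator.itemgetter (both new in Python 2.4); the counts dict and its contents are made up.

```python
from operator import itemgetter

# Made-up data: how often each word was seen.
counts = {"spam": 3, "eggs": 1, "ham": 2}

# Copy the (key, value) pairs into a list, then sort that list by value.
by_value = sorted(counts.items(), key=itemgetter(1))

# Largest value first, e.g. for a "most common" report.
most_common = sorted(counts.items(), key=itemgetter(1), reverse=True)

print(by_value)
print(most_common)
```

The dict itself is never reordered; only the list of pairs copied out of it is sorted, which keeps the dict's own interface unchanged.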
I could live with 4 ;-) I think in the end it's my feeling that these kind of idioms belong in the cookbook - which, incidentally, it already is to a certain extent under 'Sorting a Dictionary', another recipe could always be added for this ;-) cheers Dave Harrison From barry at python.org Mon Sep 13 03:40:32 2004 From: barry at python.org (Barry Warsaw) Date: Mon Sep 13 03:40:36 2004 Subject: [Python-Dev] Re: PEP 328 - Relative Imports In-Reply-To: References: <413F1B87.90301@egenix.com><41416E76.8030603@egenix.com> <41419432.2000600@zope.com> <1094828671.30837.23.camel@geddy.wooz.org> <4141DEDC.8080503@egenix.com> <1094842075.30831.55.camel@geddy.wooz.org> Message-ID: <1095039632.30217.7.camel@geddy.wooz.org> On Fri, 2004-09-10 at 18:51, Nicolas Fleury wrote: > I agree with Guido. FWIW, I think imports should be absolute by default > and that the statu quo is a mistake. The __global__ solution makes > absolute imports too verbose, when they are usually in majority. I'm really not trying to argue strongly that __global__ is a solution, but let me just point out that I think they wouldn't be that common. You'd add an __global__ only when the "normal" import statement didn't do what you want, primarily because of a local module name that conflicted with a global module, and you really wanted the global. Ordinarily, those conflicts don't occur. OTOH, when they do, you can often "fix" the problem by renaming your local module, but that's a bit ugly. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040912/22303763/attachment.pgp From barry at python.org Mon Sep 13 03:49:37 2004 From: barry at python.org (Barry Warsaw) Date: Mon Sep 13 03:49:41 2004 Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions In-Reply-To: <4142B9B2.7060306@v.loewis.de> References: <000e01c497cc$983d6720$e841fea9@oemcomputer> <4142B9B2.7060306@v.loewis.de> Message-ID: <1095040177.30216.12.camel@geddy.wooz.org> On Sat, 2004-09-11 at 04:39, "Martin v. Löwis" wrote: > No, I think Brett (and apparently nearly everybody else) thinks that > such a template will not be written over the course of the next five > years, except for demonstration purposes. Instead, what will be written > is u'¿Puede volver $today o $tomorrow?' because the template will be > a translation of the original English template, and, during translation, > placeholder names must not be changed (although I have difficulties > imagining possible values for today or tomorrow so that this becomes > meaningful). > > > If end users always follow the rules, this will never come up. If they > > don't, should there be error message or a silent failure? > > There is always a chance of a silent failure in SafeTemplates, even with > this rule added - this is the purpose of SafeTemplates. With a Template, > you will get a KeyError. In any case, the failure will not be completely > silent, as the user will see $mañana show up in the output. > > My prediction is that the typical application is to use Templates, as > users know very well what the placeholders are. Furthermore, the > typical application will use locals/globals/vars(), or dict(key="value") > to create the replacement dictionary. In this application, nobody > would even think of using mañana as a key, because you can't get > it into the dictionary.
> > If this never comes up, it is better to not complicate the rules. > Simple is better than complex. I tend to agree, so I'd like to keep the rules as they currently stand. Your prediction is aligned with what I think the most common use cases are too. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040912/fe37123b/attachment.pgp From adurdin at gmail.com Mon Sep 13 04:17:13 2004 From: adurdin at gmail.com (Andrew Durdin) Date: Mon Sep 13 04:17:25 2004 Subject: [Python-Dev] PEP 265 - Sorting dicts by value In-Reply-To: References: Message-ID: <59e9fd3a04091219174fa7c0f4@mail.gmail.com> On Mon, 13 Sep 2004 11:34:06 +1000, David Harrison wrote: > > With regards to the two arguments put forward by Grant, the first - that > it is an idiom known only to experienced campaigners - does not seem to > be a supportable argument to me. I think the problem has quite a > simple elegant solution which is rather easily discovered - there are > lots of differences in Python that require an inexperienced programmer > to learn a new idiom (such as the looping construct). And of course, it is better to teach these idioms to newbies so they become competent. A python newbie will be much better off if they learn the decorate, sort, [undecorate] idiom and list comprehensions, neither of which are particularly difficult; and both will serve the newbie well in many other areas. > With respect to implementation suggestions, numbers 1 2 and 3 definitely > don't work for me. To extend the usage of items() without similarly > extending the usage of keys() and values() would mean that we are > special casing the items() function in a way that makes it inconsistant > with the other dict functions. Number 5 seems too specific to me. 
I > could live with 4 ;-) To quote the PEP: """ Alternatively, items() could simply let us control the (key, value) order: (3) items(values_first=0) """ This suggestion No. 3 from the PEP does not special case the items() function in a way that makes it "inconsistent with the other dict functions" (i.e. keys(), values()); however it would suggest that dict() then also ought take such an inverted, values-first list of tuples if given an optional values_first parameter. But this IMHO makes the dict() constructor too complicated, as well as having a potential conflict with named keywords. From dave.l.harrison at gmail.com Mon Sep 13 04:51:13 2004 From: dave.l.harrison at gmail.com (David Harrison) Date: Mon Sep 13 04:51:19 2004 Subject: [Python-Dev] PEP 265 - Sorting dicts by value In-Reply-To: References: <59e9fd3a04091219174fa7c0f4@mail.gmail.com> Message-ID: > > With respect to implementation suggestions, numbers 1 2 and 3 definitely > > don't work for me. To extend the usage of items() without similarly > > extending the usage of keys() and values() would mean that we are > > special casing the items() function in a way that makes it inconsistant > > with the other dict functions. Number 5 seems too specific to me. I > > could live with 4 ;-) > > To quote the PEP: > """ > Alternatively, items() could simply let us control the (key, value) > order: > > (3) items(values_first=0) > """ > This suggestion No. 3 from the PEP does not special case the items() > function in a way that makes it "inconsistent with the other dict > functions" (i.e. keys(), values()); however it would suggest that > dict() then also ought take such an inverted, values-first list of > tuples if given an optional values_first parameter. But this IMHO > makes the dict() constructor too complicated, as well as having a > potential conflict with named keywords. In the sense that items() can still be used as before, it remains consistent. 
However since the same argument could be used to equally promote the modification of other dict functions to accept such arguments - such as keys(values_first=0) - to make the change to items() alone is (in my humble opinion) inconsistent. But that's just my opinion, others may feel differently. From greg at cosc.canterbury.ac.nz Mon Sep 13 04:59:41 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Sep 13 04:59:48 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <4141D745.805@ieee.org> Message-ID: <200409130259.i8D2xfoK008493@cosc353.cosc.canterbury.ac.nz> > For Numeric/Numarray, I think and1/or1 would be unnecessary. If that > were true in general it would simplify the proposal signifigantly: > and2/or2 could be renamed to and/or and and1/or1 could be dropped. It's true that none of the use cases I put forward need and1/or1. But I was trying to think of the future and at least show how the general case could be accommodated. Leaving out and1/or1 would make things simpler, but at the risk of someone coming up with a use case for them in the future, requiring yet another change. Wouldn't it be best to get things right from the beginning if possible? Also, the simplification wouldn't be all that great. There would still be the need for two bytecodes per boolean operation to accommodate either short-circuiting or not. All that would be saved is testing for and calling the and1/or1 methods. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon Sep 13 05:05:26 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon Sep 13 05:05:34 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <1094715983.1472.7.camel@debbie> Message-ID: <200409130305.i8D35QmS008516@cosc353.cosc.canterbury.ac.nz> > I like the PEP with 'and' and 'or', but isn't the 'not' special method > essentially the inverse of __nonzero__? No, because: (1) __nonzero__ is restricted to returning a boolean result. (2) There are other contexts besides 'not' in which __nonzero__ gets called. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.hochberg at ieee.org Mon Sep 13 06:21:14 2004 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Mon Sep 13 06:21:24 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <200409130259.i8D2xfoK008493@cosc353.cosc.canterbury.ac.nz> References: <4141D745.805@ieee.org> <200409130259.i8D2xfoK008493@cosc353.cosc.canterbury.ac.nz> Message-ID: <4145203A.8070303@ieee.org> Greg Ewing wrote: >>For Numeric/Numarray, I think and1/or1 would be unnecessary. If that >>were true in general it would simplify the proposal signifigantly: >>and2/or2 could be renamed to and/or and and1/or1 could be dropped. > > > It's true that none of the use cases I put forward need and1/or1. But > I was trying to think of the future and at least show how the general > case could be accommodated. > > Leaving out and1/or1 would make things simpler, but at the risk of > someone coming up with a use case for them in the future, requiring > yet another change. Wouldn't it be best to get things right from the > beginning if possible? 
Sure, assuming using and1/or1 is the right approach. I'm not convinced it is. I think it would be better to start with something simple, but design the syntax so that it can be gracefully upgraded if compelling use cases emerge. My first thought on seeing the current proposal was that the special method names need changing. and2/or2 should be just and/or since these are the methods that will actually be used. I don't have a good name for and1/or1, but it's probably not hard to be more descriptive than the current names. Some imperfect possibilities: shortcircand, scand, preand, andsc. scand is my favorite of these. I suppose and1 could even be kept and only and2 renamed. After renaming stuff, we're halfway to the simpler solution. The next step is, having established both that it's possible to implement full, custom short circuiting as per your patch and that there are no use cases for the custom short circuiting yet, we then just drop scand/scor until a compelling use case shows up, if it ever does. > Also, the simplification wouldn't be all that great. There would > still be the need for two bytecodes per boolean operation to > accommodate either short-circuiting or not. All that would be saved is > testing for and calling the and1/or1 methods. I'll take your word for it that the implementation would not be appreciably simpler. However, conceptually it's much simpler without __and1__/__or1__. Explaining the full version looks difficult, so why burden ourselves with that if we don't have to. At least not yet. -tim From adurdin at gmail.com Mon Sep 13 06:21:38 2004 From: adurdin at gmail.com (Andrew Durdin) Date: Mon Sep 13 06:21:44 2004 Subject: [Python-Dev] PEP 265 - Sorting dicts by value In-Reply-To: References: <59e9fd3a04091219174fa7c0f4@mail.gmail.com> Message-ID: <59e9fd3a04091221216aae55e9@mail.gmail.com> On Mon, 13 Sep 2004 12:29:11 +1000, David Harrison wrote: > > In the sense that items() can still be used as before, it remains > consistent. 
However since the same argument could be used to equally > promote the modification of other dict functions to accept such > arguments - such as keys(values_first=0) - to make the change to > items() alone is (in my humble opinion) inconsistent. > But that's just my opinion, others may feel differently. To me, neither mydict.keys(values_first=whatever) nor mydict.values(values_first=whatever) make any sense: these methods return a list of only keys or only values, so saying "values first" when you're getting a list of keys is meaningless. From stephen at xemacs.org Mon Sep 13 06:21:32 2004 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon Sep 13 06:21:54 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292: Simple String Substitutions In-Reply-To: (Fredrik Lundh's message of "Sat, 11 Sep 2004 11:51:23 +0200") References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> Message-ID: <87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> >>>>> "Fredrik" == Fredrik Lundh writes: Fredrik> M.-A. Lemburg wrote: >>> (google for "stringlib" for some work I'm doing in this area) >> Ah, now I know where you're coming from :-) Shift tables don't >> work well in the Unicode world with its large alphabet. Fredrik> since most real-life text use characters from only a Fredrik> small number of regions in that alphabet, This is true of "most real-life text", but it's going to be false most of the time for a large (and rapidly growing) minority of users: those working with texts comprised mostly of Asian ideographs. Unihan (spread over about 80 256-character rows) has a potential big problem: because it is ordered by root, then stroke count, the simpler (and usually more frequently used) ideographs with a common root cluster near the root. 
Whether those clusters frequently overlap based on a simple compression method like "lowest 5 bits" I don't know offhand. I don't know whether the composed Hangul (~ 40 rows) would show clustering; that would depend on phonetic frequencies in the Korean language. Of course the find algorithm you present is almost surely a big win over the brute-force method, even in the presence of some degree of clustering in Unihan and Hangul. But I worry that it's an exceptional example, when you use assumptions like "real-life text uses characters drawn from a small number of short contiguous regions in the alphabet." -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. From dave.l.harrison at gmail.com Mon Sep 13 06:44:25 2004 From: dave.l.harrison at gmail.com (David Harrison) Date: Mon Sep 13 06:44:28 2004 Subject: [Python-Dev] PEP 265 - Sorting dicts by value In-Reply-To: References: <59e9fd3a04091219174fa7c0f4@mail.gmail.com> <59e9fd3a04091221216aae55e9@mail.gmail.com> Message-ID: > > In the sense that items() can still be used as before, it remains > > consistent. However since the same argument could be used to equally > > promote the modification of other dict functions to accept such > > arguments - such as keys(values_first=0) - to make the change to > > items() alone is (in my humble opinion) inconsistent. > > But that's just my opinion, others may feel differently. > > To me, neither mydict.keys(values_first=whatever) nor > mydict.values(values_first=whatever) make any sense: these methods > return a list of only keys or only values, so saying "values first" > when you're getting a list of keys is meaningless. Oops my mistake, I was thinking along the lines of requesting a sort order based on value (i.e. keys() returns in the order of its values, increasing or decreasing). 
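For reference, that reading (keys returned in the order of their values) can be sketched without any change to dict. This is a minimal illustration in present-day Python 3, not code from the thread; `keys_by_value` is a hypothetical helper name, not a dict method or anything proposed in the PEP.

```python
def keys_by_value(d, reverse=False):
    """Return the keys of d ordered by their associated values."""
    # sorted() iterates over the keys of d; key=d.get looks up each
    # key's value, so the keys come back in value order.
    return sorted(d, key=d.get, reverse=reverse)


d = {'a': 1, 'b': 2, 'c': 0}
print(keys_by_value(d))                # ['c', 'a', 'b']
print(keys_by_value(d, reverse=True))  # ['b', 'a', 'c']
```

The dict itself is left untouched; the ordering lives only in the returned list, which fits the view that sorting belongs to lists rather than to the hash table.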
So my misunderstanding aside ;-) ... We would have the following situation (just to clarify):

>>> d = { 'a' : 1 , 'b' :2, 'c':0 }
>>> itemList = d.items(values_first=1)
>>> itemList
[(1, 'a'), (2, 'b'), (0, 'c')]
>>> itemList.sort()
>>> itemList
[(0, 'c'), (1, 'a'), (2, 'b')]

That is probably my preferred option for this usage.

But to play devil's advocate for a minute: considering that a dict is a key based data structure and not a sequential structure, does it really make sense to be able to request its inversion? For example:

>>> d = { 'a' : [1,2,3] , 'b' : [4,5,6], 'c':[7,8,9] }
>>> d.items(values_first=1)
[ ([1,2,3], 'a'), ([4,5,6], 'b'), ([7,8,9], 'c') ]

This just doesn't make sense, nor would it make sense if the values were objects. The only use case raised was for counting instances of an item in a dict, and then inverting _that_ dict (i.e. the one that stored the count values), e.g.

for key in d.keys():
    d[key] = d.get(key, 0) + 1
items = [(v,k) for k,v in d.items()]
items.sort()
items.reverse()
items = [(k,v) for v,k in items]

So I'll also raise again my question of whether it is reasonable to implement a functionality that is not going to be used as a part of the primary purpose of a dict. This functionality extension is just an implementation of one way of using a dict - not necessarily an example of 'missing functionality' to me.

From ncoghlan at iinet.net.au Mon Sep 13 06:46:53 2004
From: ncoghlan at iinet.net.au (Nick Coghlan)
Date: Mon Sep 13 06:47:50 2004
Subject: [Python-Dev] PEP 265 - Sorting dicts by value
In-Reply-To:
References:
Message-ID: <4145263D.8010402@iinet.net.au>

David Harrison wrote:
> Hi all,
>
> Quick pep265 summary : People frequently want to count the occurrences
> of values in a dict, or sort the results of a d.items() call by value.
> This could be done by extending the current items() definition, or by
> creating a new function for the dict object (both requiring a C
> implementation).
In Python 2.4:

->>> ud = dict(a=1, b=2, c=3)
->>> from operator import itemgetter
->>> print sorted(ud.items(), key=itemgetter(1), reverse=True)
[('c', 3), ('b', 2), ('a', 1)]

I'm not entirely sure who needs to be thanked for this addition, but it sure makes the 'decorate-sort-undecorate' idiom very, very easy to follow (which was, in fact, the point - I do remember that much of the discussion).

I think the addition of 'sorted', and the keyword arguments for both it and list.sort make PEP 265 somewhat redundant.

Cheers,
Nick.

From shane.holloway at ieee.org Mon Sep 13 06:59:08 2004
From: shane.holloway at ieee.org (Shane Holloway (IEEE))
Date: Mon Sep 13 06:59:39 2004
Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators
In-Reply-To: <4145203A.8070303@ieee.org>
References: <4141D745.805@ieee.org> <200409130259.i8D2xfoK008493@cosc353.cosc.canterbury.ac.nz> <4145203A.8070303@ieee.org>
Message-ID: <4145291C.6000204@ieee.org>

Tim Hochberg wrote:
> that there are no use
> cases for the custom short circuiting yet, we then just drop scand/scor
> until a compelling use case shows up, if it ever does.

A boolean calculus (predicate) engine would make use of short-circuiting. Or perhaps a state machine would make use of this feature. I agree with Greg that I'd rather the implementation be "complete". Computer Scientists have already been down this road, and we know that there are two useful forms. :)

I like [__and1__, __and__, __or__, __or1__] -- the abbreviation would have to be documented anyway, and the '1' says "one argument: self" to me.
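To make the "one argument: self" reading concrete, here is a rough emulation of a two-phase `and`, written in present-day Python 3. It is a sketch under stated assumptions, not the PEP's mechanism (which works at the bytecode level): it assumes the one-argument hook runs before the right operand is evaluated and the two-argument hook runs afterwards. The sentinel `NEED_OTHER`, the helper `emulate_and`, and the toy class `Tri` are all hypothetical names invented for this illustration.

```python
NEED_OTHER = object()  # hypothetical sentinel: "phase 1 cannot decide alone"


def emulate_and(left, right_thunk):
    """Emulate `left and right` under an assumed two-phase protocol.

    right_thunk is a zero-argument callable, so the right operand is
    only evaluated when phase 1 does not short-circuit.
    """
    phase1 = getattr(type(left), "__and1__", None)
    if phase1 is None:
        # No hook defined: ordinary truth-value semantics of `and`.
        return left if not left else right_thunk()
    result = phase1(left)
    if result is not NEED_OTHER:
        return result  # short-circuited; right operand never evaluated
    return type(left).__and2__(left, right_thunk())


class Tri:
    """Toy three-valued logic value: True, False or None (unknown)."""

    def __init__(self, value):
        self.value = value

    def __and1__(self):
        # False and anything is False, so the other operand is not needed.
        return self if self.value is False else NEED_OTHER

    def __and2__(self, other):
        if other.value is False:
            return other
        if self.value is None or other.value is None:
            return Tri(None)
        return Tri(True)


evaluated = []

def right_operand():
    evaluated.append("right")  # record that the right side was evaluated
    return Tri(True)

print(emulate_and(Tri(False), right_operand).value, evaluated)
print(emulate_and(Tri(None), right_operand).value, evaluated)
```

Dropping the one-argument hook, as suggested earlier in the thread, would correspond to always evaluating the right operand before the two-argument method is called.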
Respectfully, -Shane Holloway From dave.l.harrison at gmail.com Mon Sep 13 08:15:42 2004 From: dave.l.harrison at gmail.com (David Harrison) Date: Mon Sep 13 08:15:47 2004 Subject: [Python-Dev] PEP 265 - Sorting dicts by value In-Reply-To: <4145263D.8010402@iinet.net.au> References: <4145263D.8010402@iinet.net.au> Message-ID: > > Quick pep265 summary : People frequently want to count the occurrences > > of values in a dict, or sort the results of a d.items() call by value. > > This could be done by extending the current items() definition, or by > > creating a new function for the dict object (both requiring a C > > implementation). > > In Python 2.4: > > ->>> ud = dict(a=1, b=2, c=3) > ->>> from operator import itemgetter > ->>> print sorted(ud.items(), key=itemgetter(1), reverse=True) > [('c', 3), ('b', 2), ('a', 1)] > > I'm not entirely sure who needs to be thanked for this addition, but it > sure makes the 'decorate-sort-undecorate' idiom very, very easy to > follow (which was, in fact, the point - I do remember that much of the > discussion). > > I think the addition of 'sorted', and the keyword arguments for both it > and list.sort make PEP 265 somewhat redundant. Seems like another solution to the problem, which makes this pep even less meaningful I'd say. Guess this this pep should be closed then ? From fredrik at pythonware.com Mon Sep 13 08:53:53 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon Sep 13 08:52:03 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP 292: SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com><1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: Stephen J. 
Turnbull wrote: > But I worry that it's an exceptional example, when you use assumptions > like "real-life text uses characters drawn from a small number of short > contiguous regions in the alphabet." The problem is that I cannot tell if you've studied search issues, or if you're just applying general "but wait, it's different for asian languages" arguments here. There are many issues here, all pulling in different directions: - If you look at usage statistics, you'll find that the absolute majority of all searches are for a single character (usually separators, like colons, spaces, commas). The second largest category is computer-level keywords (usually pure ASCII, also in localized programs), used to process network protocols, file formats, message headers, etc. Searches for "human text" are not that common, really, and search terms are usually limited to only a few words. - This means that most searches have exactly the same characteristics, independent of the locale. Even if a new algorithm would only be better for pure-ASCII text, everyone would benefit. - As for non-ASCII search terms, the "human text" search terms are usually shorter in languages with many ideographs (my non-scientific tests indicate that chinese text uses about 4 times less symbols than english; I'm sure someone can dig up better figures). - This means that even if you are more likely to get collisions in the compressed skip table, there are fewer characters in the table. - This means that you'll probably be able to make long skips as often as for non-Asian text. - On the other hand, the long skips are shorter than for non-Asian text, so you may have to make more of them. - On the other hand, the target strings are also likely to be shorter, so that might not matter. - And so on. The only way to know for sure is if anyone has the time and energy to carry out tests on real-life datasets. 
(or at least prepare some datasets; I can run the tests if someone provides me with a file with search terms and a number of files containing texts to apply them to, preferrably using UTF-8 encoding). From ncoghlan at iinet.net.au Mon Sep 13 13:46:47 2004 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Mon Sep 13 13:47:37 2004 Subject: [Python-Dev] PEP 265 - Sorting dicts by value In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060F8D@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB8803060F8D@UKDCX001.uk.int.atosorigin.com> Message-ID: <414588A7.3040602@iinet.net.au> Moore, Paul wrote: > From: Nick Coghlan > >>In Python 2.4: >> >>->>> ud = dict(a=1, b=2, c=3) >>->>> from operator import itemgetter >>->>> print sorted(ud.items(), key=itemgetter(1), reverse=True) >>[('c', 3), ('b', 2), ('a', 1)] > > > If you haven't done so already, I think this should be submitted > to the cookbook. It's a nice idiom, and demonstrates some useful > Python 2.4 features, and how they work well in combination. It's submitted now. I have a feeling Raymond is the one who should get the credit for the approach, though. I'd be surprised if he made it through the discussions about the introduction of sorted without using this example at least once :) Cheers, Nick. From gmccaughan at synaptics-uk.com Mon Sep 13 14:32:00 2004 From: gmccaughan at synaptics-uk.com (Gareth McCaughan) Date: Mon Sep 13 14:32:34 2004 Subject: [Python-Dev] =?iso-8859-1?q?AlternativeImplementation=09forPEP292=3ASimpleString?= =?iso-8859-1?q?_Substitutions?= In-Reply-To: <87mzzxqmvn.fsf@tleepslib.sk.tsukuba.ac.jp> References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <200409101257.13802.gmccaughan@synaptics-uk.com> <87mzzxqmvn.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: <200409131332.00927.gmccaughan@synaptics-uk.com> On Saturday 2004-09-11 08:35, Stephen J. Turnbull wrote: > >> But [efficiency], as such, is important only to efficiency > >> fanatics. 
> > Gareth> No, it's important to ... well, people to whom efficiency > Gareth> matters. There's no need for them to be fanatics. > > If it matters just because they care, they're fanatics. If it matters > because they get some other benefit (response time less than the > threshold of hotice, twice as many searches per unit time, half as > many boxes to serve a given load), they're not. 's talk of many > ways to do things "and Python should account for most of them" strikes > me as fanaticism by that definition; the vast majority of developers > will never deal with the special cases, or write apps that anticipate > dealing with huge ASCII strings. Those costs should be borne by the > developers who do, and their clients. I am unconvinced that "the vast majority of developers" will not have work to do that involves a large volume of ASCII data ... but I'm not sure this is something either of us is in a position to know. (If it turns out that you're just completing a PhD thesis entitled "Use of large-volume string data among software developers", or something, then please accept my apologies for guessing wrong and enlighten me!) > I apologize for shoehorning that into my reply to you. That's OK. > >> The question is, how often are people going to notice that when > >> they have pure ASCII they get a 100% speedup [...]? > > Gareth> Why is that the question, rather than "how often are > Gareth> people going to benefit from getting a 100% speedup when > Gareth> they have pure ASCII"? > > Because "benefit" is very subjective for _one_ person, and I don't > want to even think about putting coefficients on your benefit versus > mine. If the benefit is large enough, a single person will be willing > to do the extra work. The question is, should all Python users and > developers bear some burden to make it easier for that person to do > what he needs to do? "Burden" is just as subjective as "benefit". But let's take a look at these burdens and benefits. 
- Burden for a very small number of Python developers: having to write and maintain a larger body of code, with duplication (at least of purpose) between Unicode and ASCII strings.

- Consequent burden on all Python users: more risk of those developers getting burned out and giving up, less time for them to work on other aspects of Python, more danger of bugs in code, larger executables. They won't notice this, of course.

+ Benefit for a small (but not nearly so small) number of Python users: important code runs twice as fast, and this makes a real difference to them.

+ Consequent benefit for all Python users: more use of Python means more people contributing code, bug reports, useful libraries, etc. They won't notice this, either.

+ Benefit for all Python users: some of their code runs a little faster. They won't notice this, either.

Perhaps I'm being obtuse, but it's far from clear to me that this is a net loss for Python users at large. In any case, the burdens seem less likely to be noticed than the benefits.

> I think "notice" is something you can get consensus on. If a lot of
> people are _noticing_ the difference, I think that's a reasonable rule
> of thumb for when we might want to put "it", or facilities for making
> individual efforts to deal with "it" simpler, into "standard Python"
> at some level. If only a few people are noticing, let them become
> expert at dealing with it.

But even if "noticing the difference" is the key point, it is a mistake (I think) to make it specifically "noticing that when they have pure ASCII they get a 100% speedup". Hence my comment quoted below:

> Gareth> Or even "how often are people going to try out Python on
> Gareth> an application that uses pure-ASCII strings, and decide to
> Gareth> use some other language that seems to do the job much
> Gareth> faster"?
>
> See? You're now using a "notice" standard, too. I don't think that's
> an accident.

It isn't.
It's because I was replying to someone who apparently took "notice" standards as the only relevant ones, in order to point out that even with that assumptions there are relevant questions other than "will anyone notice getting a speedup when their data are pure ASCII?". And I, in turn, apologize for shoehorning all *that* into the word "even". :-) I still think, though, that a "notice" standard makes for bad designs. Most people would not notice if all floating-point operations gave results with the last couple of bits wrong, but it is a good thing that they don't. Some people wouldn't notice but would get badly unsatisfactory results. Some people would notice but would find it impractical to work around the problems because that would mean tons of code and major losses in speed. Most people would not notice if by inserting the magic word "wibble" at the start of their programs they could make them 10 times faster, but if for some weird reason it were possible to make that so (but not possible to provide the speedup for programs without "wibble") then it should be done. What people notice is easier to define and to measure than what actually makes a difference to them. That is not enough reason to treat it as the only criterion. -- g From stephen at xemacs.org Mon Sep 13 16:00:57 2004 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon Sep 13 16:01:05 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP 292: SimpleString Substitutions In-Reply-To: (Fredrik Lundh's message of "Mon, 13 Sep 2004 08:53:53 +0200") References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: <87pt4qp8ti.fsf@tleepslib.sk.tsukuba.ac.jp> >>>>> "Fredrik" == Fredrik Lundh writes: Fredrik> Stephen J. 
Turnbull wrote: >> But I worry that it's an exceptional example, when you use >> assumptions like "real-life text uses characters drawn from a >> small number of short contiguous regions in the alphabet." Fredrik> The problem is that I cannot tell if you've studied Fredrik> search issues, Enough to understand Boyer-Moore and how the proposed algorithm differs, and to recognize that your statements about the distribution of search applications are true. Not that I want to argue about search, I'm all in favor of better search. I was startled to read that Python still uses a brute-force algorithm for searching. My point about distribution of ideographs was simply that you made an unjustified assumption in the context of what is (to me, anyway) an important subdomain of text processing. Here, it is "obviously harmless," but that's because brute force search is so bad. In other applications, or with a better status quo, there very well may be real tradeoffs between what's good for 8-bit text and what's good for Unicode. Fredrik> or if you're just applying general "but wait, it's Fredrik> different for asian languages" arguments here. No, I know that ostrich won't fly. Fredrik> Searches for "human text" are not that common, really, Fredrik> and search terms are usually limited to only a few words. In the context of PEP 292 is a focus on "human text" unwarranted? After all, what motivated the PEP and the implementation was evidently "human text" processing. In my experience, the notation for interpolation it uses would have much bigger advantages over the format string style for "human text" than for the "non-human text" applications I know of. Not that it's useless for the latter, just that it's much more of a luxury there. If that's valid, there's a point where it makes sense for people who develop human-text-oriented features based on Unicode strings to say "pick the features you really want for 8-bit strings, because you have to support them yourselves." 
Fredrik> The only way to know for sure is if anyone has the time
Fredrik> and energy to carry out tests on real-life datasets. (or
Fredrik> at least prepare some datasets;

I can prepare datasets and do some statistical work for Japanese, but it probably won't happen this month. Sounds like a worthwhile thing to have around, though.

-- 
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business; ask what your business can "do for" free software.

From barry at python.org  Mon Sep 13 16:32:55 2004
From: barry at python.org (Barry Warsaw)
Date: Mon Sep 13 16:33:01 2004
Subject: [Python-Dev] Re: Alternative ImplementationforPEP292:SimpleString Substitutions
In-Reply-To: <003501c49784$b8292d00$e841fea9@oemcomputer>
References: <003501c49784$b8292d00$e841fea9@oemcomputer>
Message-ID: <1095085975.10676.40.camel@geddy.wooz.org>

On Fri, 2004-09-10 at 18:22, Raymond Hettinger wrote:
> > My only problem with that is the interference that the 'mapping'
> > argument presents. IOW, kwds can't contain 'mapping'.
>
> To support a case where both a mapping and keywords are present, perhaps
> an auxiliary class could simplify matters:
>
> def substitute(self, mapping=None, **kwds):
>     if mapping is None:
>         mapping = kwds
>     elif kwds:
>         mapping = _altmap(kwds, mapping)
>     . . .
>
> class _altmap:
>     def __init__(self, primary, secondary):
>         self.primary = primary
>         self.secondary = secondary
>     def __getitem__(self, key):
>         try:
>             return self.primary[key]
>         except KeyError:
>             return self.secondary[key]
> This matches the way keywords are used with the dict().

This isn't exactly what I was concerned about, but I agree that it's a worthwhile approach. (I'm going to accept your patch and check it in, with slight modifications.)

What I was worried about was if you provide 'mapping' positionally, and kwds contained a 'mapping' key, you'll get a TypeError.
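[Editorial illustration, not part of the original message.] The collision described in the sentence above can be reproduced with a small stand-in. This is only a sketch under stated assumptions: the module-level `substitute` function is hypothetical and merely copies the signature shape from the quoted patch; it is not the code that was checked in, and the argument values are made up.

```python
def substitute(mapping=None, **kwds):
    """Hypothetical stand-in with the same signature shape as the quoted patch."""
    if mapping is None:
        mapping = kwds
    return mapping


# A positional mapping plus a keyword that happens to be called 'mapping':
# Python binds both to the same parameter and refuses the call.
try:
    substitute({"who": "tim"}, mapping="some value")
    collided = False
except TypeError:
    collided = True

print(collided)  # True
```

The keyword never reaches `kwds`, because the interpreter tries to bind it to the named parameter first.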
I'm going to change the positional argument to '__mapping' so collisions of that kind are less likely, and will document it in libstring.tex. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040913/c4d9e37d/attachment.pgp From barry at python.org Mon Sep 13 16:42:12 2004 From: barry at python.org (Barry Warsaw) Date: Mon Sep 13 16:42:18 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <4142E78C.7010800@heneryd.com> References: <4142E78C.7010800@heneryd.com> Message-ID: <1095086532.10677.46.camel@geddy.wooz.org> On Sat, 2004-09-11 at 07:54, Erik Heneryd wrote: > * Too long > 10/15 character names for something so simple it up until now just > needed a %? Programs using templates will probably use them > frequently... I'd prefer sub instead of substitute. Noted, thanks. In general I'm not a fan of abbreviations in APIs though. Also note that it is trivial for applications to derive and override __mod__(), aliasing it to whichever version of substitute() they want. For example, I plan on multiply inheriting Template and unicode, and aliasing __mod__() to safe_substitute(). > * Safe? > safe_substitution doesn't tell you much upon first glance. Safe? In > what way? You could even argue that the "plain" version really is the > safer one, as you'll notice typos and thus get a more solid program. I > think a name hinting that this method uses the var name as a fallback > would be better, but can't think of (a short) one... defaultsub? > fallbacksub? loosesub? Guess I could live with safe, but... Yeah, that's the problem, there are no good alternatives. 
As for which version is "safer", when you're using Templates in an i18n environment, where the actual Template you're going to be interpolating into comes from 3rd party language translation teams, the safe_substitute() version is definitely safer to the application. Thanks for the feedback. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040913/edf62152/attachment.pgp From barry at python.org Mon Sep 13 16:44:37 2004 From: barry at python.org (Barry Warsaw) Date: Mon Sep 13 16:44:42 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <4142F7D0.5080807@heneryd.com> References: <4142E78C.7010800@heneryd.com> <4142F7D0.5080807@heneryd.com> Message-ID: <1095086677.10672.49.camel@geddy.wooz.org> On Sat, 2004-09-11 at 09:04, Erik Heneryd wrote: > Come to think of it, I really like the more OO-ish approach better, than > to cram everything into a single class. Is the safe_substitute really > that special it deserves a special method? Yes. > Is it really the one, true > way to do a "safe" substitution? Probably not. > IIRC DOS and sh don't agree, so it's > not that obvious. I'm sorry I don't follow that one. > I say keep the inheritance thing, it's much more flexible, and delegate > the KeyError condition to an overridable method. After the lengthy discussions on python-dev, I'm viewing the role of the Template class a little differently, so I think it's fine to put them both in one class. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/python-dev/attachments/20040913/81a339b6/attachment.pgp

From Paul.Moore at atosorigin.com  Mon Sep 13 16:50:45 2004
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Mon Sep 13 16:50:50 2004
Subject: [Python-Dev] Re: AlternativeImplementationforPEP292:SimpleString Substitutions
Message-ID: <16E1010E4581B049ABC51D4975CEDB8803060F8F@UKDCX001.uk.int.atosorigin.com>

From: Barry Warsaw
> What I was worried about was if you provide 'mapping' positionally,
> and kwds contained a 'mapping' key, you'll get a TypeError. I'm going
> to change the positional argument to '__mapping' so collisions of that
> kind are less likely, and will document it in libstring.tex.

Can't you do something like

def substitute(self, *args, **kwds):
    if len(args) > 1:
        raise TypeError # mild hack...
    if len(args) == 1:
        mapping = args[0]
        mapping.update(kwds)
    else:
        mapping = kwds

    # etc...

This avoids the use of a strangely-named positional argument, at the cost of a check for too many positional arguments (because the interpreter no longer does it)

Paul.

__________________________________________________________________________
This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Atos Origin group liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted.
__________________________________________________________________________ From mnot at mnot.net Sat Sep 11 07:24:29 2004 From: mnot at mnot.net (Mark Nottingham) Date: Mon Sep 13 17:19:50 2004 Subject: [Python-Dev] Re: [Web-SIG] Adding status code constants to httplib In-Reply-To: <414193D5.6010405@andreweland.org> References: <414193D5.6010405@andreweland.org> Message-ID: FYI; status codes as exceptions; http://www.mnot.net/python/http/status.py On Sep 10, 2004, at 9:45 PM, Andrew Eland wrote: > Hi, > > Over in web-sig, we're discussing PEP 333, the Web Server Gateway > Interface. Rather than defining our own set of constants for the HTTP > status code integers, we thought it would be a good idea to add them > to httplib, allowing other applications to benefit. I've uploaded a > patch[1] to httplib.py and the corresponding documentation. Do people > think this is a good idea? > > -- Andrew Eland (http://www.andreweland.org) > > [1] > http://sourceforge.net/tracker/index.php? > func=detail&aid=1025790&group_id=5470&atid=305470 > _______________________________________________ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > http://mail.python.org/mailman/options/web-sig/mnot%40mnot.net > -- Mark Nottingham http://www.mnot.net/ From tim.hochberg at cox.net Mon Sep 13 17:05:29 2004 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Sep 13 17:19:52 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <4145291C.6000204@ieee.org> References: <4141D745.805@ieee.org> <200409130259.i8D2xfoK008493@cosc353.cosc.canterbury.ac.nz> <4145203A.8070303@ieee.org> <4145291C.6000204@ieee.org> Message-ID: <4145B739.4040803@cox.net> Shane Holloway (IEEE) wrote: > Tim Hochberg wrote: > > that there are no use > > cases for the custom short circuiting yet, we then just drop scand/scor > > until a compelling use case shows up, if it ever does. 
>
> A boolean calculus (predicate) engine would make use of
> short-circuiting. Or perhaps a state machine would make use of this
> feature. I agree with Greg that I'd rather the implementation be
> "complete". Computer Scientists have already been down this road, and
> we know that there are two useful forms. :)

I have no objections if someone can actually come up with use cases. However, I still think the names should change: and2/or2 will be used the vast majority of the time. Of course, my earlier suggestion to use and/or is completely bogus since that's what &/| map to. Doh!

Still, I think the use cases need to be more concrete than what we've seen so far. I can come up with a case where short circuiting could be used in numarray, but not one where I think it should, so I won't be of any help here.

>
> I like [__and1__, __and__, __or__, __or1__] -- the abbreviation would
> have to be documented anyway, and the '1' says "one argument: self" to
> me.

Sadly, and/or are already taken. I don't think this helps the and1/and2 case much though -- having three methods and/and1/and2 is just confusing. Maybe booland or logand or logicaland? I dunno, none of those are particularly satisfying.

Regards,

-tim

From barry at python.org  Mon Sep 13 17:24:19 2004
From: barry at python.org (Barry Warsaw)
Date: Mon Sep 13 17:24:23 2004
Subject: [Python-Dev] Re: AlternativeImplementationforPEP292:SimpleString Substitutions
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB8803060F8F@UKDCX001.uk.int.atosorigin.com>
References: <16E1010E4581B049ABC51D4975CEDB8803060F8F@UKDCX001.uk.int.atosorigin.com>
Message-ID: <1095089059.10677.94.camel@geddy.wooz.org>

On Mon, 2004-09-13 at 10:50, Moore, Paul wrote:
> From: Barry Warsaw
> > What I was worried about was if you provide 'mapping' positionally,
> > and kwds contained a 'mapping' key, you'll get a TypeError.
I'm going
> > to change the positional argument to '__mapping' so collisions of that
> > kind are less likely, and will document it in libstring.tex.
>
> Can't you do something like
>
> def substitute(self, *args, **kwds):
>     if len(args) > 1:
>         raise TypeError # mild hack...
>     if len(args) == 1:
>         mapping = args[0]
>         mapping.update(kwds)
>     else:
>         mapping = kwds
>
>     # etc...
>
> This avoids the use of a strangely-named positional argument, at the cost
> of a check for too many positional arguments (because the interpreter no
> longer does it)

Nice. That's a better hack IMO than the crappy argument name hack.

Thanks,
-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/python-dev/attachments/20040913/0ba5d82b/attachment.pgp

From stephen at xemacs.org  Mon Sep 13 18:01:21 2004
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon Sep 13 18:01:28 2004
Subject: [Python-Dev] AlternativeImplementation forPEP292:SimpleString Substitutions
In-Reply-To: <200409131332.00927.gmccaughan@synaptics-uk.com> (Gareth McCaughan's message of "Mon, 13 Sep 2004 13:32:00 +0100")
References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <200409101257.13802.gmccaughan@synaptics-uk.com> <87mzzxqmvn.fsf@tleepslib.sk.tsukuba.ac.jp> <200409131332.00927.gmccaughan@synaptics-uk.com>
Message-ID: <87d60qp38u.fsf@tleepslib.sk.tsukuba.ac.jp>

>>>>> "Gareth" == Gareth McCaughan writes:

Gareth> I am unconvinced that "the vast majority of developers"
Gareth> will not have work to do that involves a large volume of
Gareth> ASCII data ... but I'm not sure this is something either
Gareth> of us is in a position to know.

Oh, I'm pretty sure that an awful lot of developers _will_ have work to do that involves large volumes of ASCII data.
The question is how much will that work be facilitated by having all (as opposed to a few well-chosen) text processing features support returning 8-bit strings as well as Unicodes?

Gareth> Perhaps I'm being obtuse, but it's far from clear to me
Gareth> that this is a net loss for Python users at large.

It's not clear to me, either. I am just not convinced by hand-waving that says "there's no difference between human text processing and other text processing, so any text processing facility should be available in an 8-bit version." Maybe that's a straw man, but that's what was being advocated, AFAICT.

Gareth> I still think, though, that a "notice" standard makes for
Gareth> bad designs.

We're not talking about design here, IMO. We're talking about requirements. Of course if you're going to implement a capability, you should design it "right."

Gareth> What people notice is easier to define and to measure than
Gareth> what actually makes a difference to them. That is not
Gareth> enough reason to treat it as the only criterion.

It's not. What I'm saying is that if very few people see a noticeable difference, it should be left up to those few to implement what they need.

-- 
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business; ask what your business can "do for" free software.

From erik at heneryd.com  Mon Sep 13 18:16:08 2004
From: erik at heneryd.com (Erik Heneryd)
Date: Mon Sep 13 18:16:16 2004
Subject: [Python-Dev] PEP 292: method names
In-Reply-To: <1095086532.10677.46.camel@geddy.wooz.org>
References: <4142E78C.7010800@heneryd.com> <1095086532.10677.46.camel@geddy.wooz.org>
Message-ID: <4145C7C8.5080101@heneryd.com>

Barry Warsaw wrote:
> On Sat, 2004-09-11 at 07:54, Erik Heneryd wrote:
>
>
>>* Too long
>>10/15 character names for something so simple it up until now just
>>needed a %?
Programs using templates will probably use them >>frequently... I'd prefer sub instead of substitute. > > > Noted, thanks. In general I'm not a fan of abbreviations in APIs > though. Also note that it is trivial for applications to derive and > override __mod__(), aliasing it to whichever version of substitute() > they want. For example, I plan on multiply inheriting Template and > unicode, and aliasing __mod__() to safe_substitute(). -1 Well, even if it's trivial, I still think the out-of-the-box API shouldn't be hostile against frequent use. Subclassing just to get a decent name/operator feels stupid. Why not __mod__ = safe_substitute per default then? Erik From erik at heneryd.com Mon Sep 13 18:18:28 2004 From: erik at heneryd.com (Erik Heneryd) Date: Mon Sep 13 18:18:32 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <1095086677.10672.49.camel@geddy.wooz.org> References: <4142E78C.7010800@heneryd.com> <4142F7D0.5080807@heneryd.com> <1095086677.10672.49.camel@geddy.wooz.org> Message-ID: <4145C854.4070200@heneryd.com> Barry Warsaw wrote: > On Sat, 2004-09-11 at 09:04, Erik Heneryd wrote: > > >>Come to think of it, I really like the more OO-ish approach better, than >>to cram everything into a single class. Is the safe_substitute really >>that special it deserves a special method? > > > Yes. > > >> Is it really the one, true >>way to do a "safe" substitution? > > > Probably not. > > >>IIRC DOS and sh don't agree, so it's >>not that obvious. > > > I'm sorry I don't follow that one. DOS: '%NOTFOUND%' => '%NOTFOUND%' sh: '$NOTFOUND' => '' BTW, what about a closing delimiter in the standard regex? >>I say keep the inheritance thing, it's much more flexible, and delegate >>the KeyError condition to an overridable method. > > > After the lengthy discussions on python-dev, I'm viewing the role of the > Template class a little differently, so I think it's fine to put them > both in one class. 
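[Editorial illustration, not part of the original message.] The two conventions contrasted above (DOS leaving '%NOTFOUND%' in place, sh expanding '$NOTFOUND' to nothing) can be sketched with the `string.Template` class as it ships in current Python 3, which is an assumption: the 2004 draft under discussion may have behaved differently. The `EmptyWhenMissing` helper and the sample values are hypothetical names introduced only for this example.

```python
from string import Template


class EmptyWhenMissing(dict):
    """Hypothetical helper: a dict that yields '' for unknown names (sh-like)."""

    def __missing__(self, key):
        return ""


template = Template("$FOUND/$NOTFOUND")
values = {"FOUND": "usr"}

# DOS-like: the unknown placeholder is left in the output untouched.
dos_style = template.safe_substitute(values)

# sh-like: the unknown placeholder expands to the empty string, because the
# mapping itself supplies a default instead of raising KeyError.
sh_style = template.substitute(EmptyWhenMissing(values))

print(dos_style)  # usr/$NOTFOUND
print(sh_style)   # usr/
```

Putting the fallback policy in the mapping rather than in the template class is one possible design; it leaves the question of a dedicated KeyError hook on the class itself open.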
I think there are more use cases for a KeyError hook than just sh-style substitution; a default value, a computed value (think replacing html entities - returning chr(idpattern) on KeyError) etc...

I hope you don't do pep-292 just to fill your own needs (i18n?), but also keep your mind open to other uses...

Erik

From bac at OCF.Berkeley.EDU  Mon Sep 13 19:41:08 2004
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Mon Sep 13 19:41:32 2004
Subject: [Python-Dev] PEP 292: method names
In-Reply-To: <4145C7C8.5080101@heneryd.com>
References: <4142E78C.7010800@heneryd.com> <1095086532.10677.46.camel@geddy.wooz.org> <4145C7C8.5080101@heneryd.com>
Message-ID: <4145DBB4.8010601@ocf.berkeley.edu>

Erik Heneryd wrote:
> Barry Warsaw wrote:
>
>> On Sat, 2004-09-11 at 07:54, Erik Heneryd wrote:
>>
>>
>>> * Too long
>>> 10/15 character names for something so simple it up until now just
>>> needed a %? Programs using templates will probably use them
>>> frequently... I'd prefer sub instead of substitute.
>>
>>
>>
>> Noted, thanks. In general I'm not a fan of abbreviations in APIs
>> though. Also note that it is trivial for applications to derive and
>> override __mod__(), aliasing it to whichever version of substitute()
>> they want. For example, I plan on multiply inheriting Template and
>> unicode, and aliasing __mod__() to safe_substitute().
>
>
> -1
>
> Well, even if it's trivial, I still think the out-of-the-box API
> shouldn't be hostile against frequent use. Subclassing just to get a
> decent name/operator feels stupid. Why not __mod__ = safe_substitute
> per default then?
>

I'm with Barry on this. Verbosity is going to overtake practicality here. And I think this is a good thing: the last thing the stdlib should start doing is trying to force people to use some shorthand that we come up with that won't necessarily be intuitive to other people ('sub' just doesn't seem right here; and don't ask for justification since this is a gut feeling).
I am sure the way I tend to abbreviate things is not how anyone else would. So why would the stdlib try to? We have tried to come up with good names and this is the best we came up with. And as Barry said, you can add __mod__ to your own subclass. And another option entirely is to just assign the method to a shorter name in your code. And if you *really* want to argue the length thing, you can take into account that "substitute" has a decent amount of hand alternation on QWERTY to allow for pretty good typing speed. -Brett From mal at egenix.com Mon Sep 13 22:15:01 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Mon Sep 13 22:15:11 2004 Subject: [Python-Dev] Re: Alternative Implementation for PEP 292: SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <41436762.7040207@egenix.com> Message-ID: <4145FFC5.8090208@egenix.com> Fredrik Lundh wrote: > M.-A. Lemburg wrote: > > >>You mean: a compressed shift table for Unicode patterns ? >>I'll have a look. > > > It's a lossy compression: the entire delta1 table is represented as > two 32-bit values, independent of the size of the source alphabet. > Works amazingly well, at least when combined with the BM-variant > it was designed for... > > (I suppose it's too late for 2.4, but it would probably be a good > idea to switch to this algorithm in 2.5) Here's a reference that might be interesting for you: http://citeseer.ist.psu.edu/boldi02compact.html They use statistical approaches to dealing with the problem of large alphabets. Their motivation is making Java's Unicode string implementation faster... 
sounds familiar, eh :-) Their motivation was based on work done for the "Managing Gigabytes" project: http://www.cs.mu.oz.au/mg/ and http://www.mds.rmit.edu.au/mg/ Too bad their code is GPLed, but I suppose getting some ideas is OK ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 13 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From fredrik at pythonware.com Mon Sep 13 22:18:01 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon Sep 13 22:16:11 2004 Subject: [Python-Dev] Re: PEP 292: method names References: <4142E78C.7010800@heneryd.com> <1095086532.10677.46.camel@geddy.wooz.org><4145C7C8.5080101@heneryd.com> <4145DBB4.8010601@ocf.berkeley.edu> Message-ID: Brett C wrote: > I am sure the way I tend to abbreviate things is not how anyone > else would. So why would the stdlib try to? it's pretty amazing that you've been able to use Python without noticing that the standard library is full of abbreviations. doesn't anyone here think before they post, these days? From fredrik at pythonware.com Mon Sep 13 22:20:28 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon Sep 13 22:20:30 2004 Subject: [Python-Dev] Re: Re: Re: Alternative Implementation for PEP 292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com><1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> <87pt4qp8ti.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull wrote: > In the context of PEP 292 is a focus on "human text" unwarranted? 
I'm pretty sure this subthread left the PEP quite a few posts ago. The rest of us were talking about string searches, of the find/replace/split variety. From fredrik at pythonware.com Mon Sep 13 22:47:40 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon Sep 13 22:45:54 2004 Subject: [Python-Dev] Re: PEP 335: Overloadable Boolean Operators References: <200409100058.i8A0wNIV002743@cosc353.cosc.canterbury.ac.nz> Message-ID: Greg Ewing wrote: > To permit short-circuiting, processing of the 'and' and 'or' operators > is split into two phases. Phase 1 occurs after evaluation of the first > operand but before the second. If the first operand defines the > appropriate phase 1 method, it is called with the first operand as > argument. If that method can determine the result without needing the > second operand, it returns the result, and further processing is > skipped. nice. +1 from here (but only +0 on the method names). From raymond.hettinger at verizon.net Mon Sep 13 23:23:01 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Mon Sep 13 23:24:05 2004 Subject: [Python-Dev] Decorator PEP elaborations Message-ID: <000a01c499d7$dbed5900$e841fea9@oemcomputer> If one of the authors gets a chance, it would be nice to document the rationale for the order of application being inside-out instead of top-down: @deco3 @deco2 @deco1 def myfunc(args): . . . 
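[Editorial illustration, not part of the original message.] The inside-out order asked about above can be observed directly. This is a minimal sketch, not text from the decorator PEP: `tag` and `calls` are hypothetical helpers, and only the decorator names and `myfunc` are taken from the example in the message.

```python
calls = []


def tag(name):
    """Hypothetical decorator factory: records when it is applied, then wraps."""
    def decorator(func):
        calls.append(name)  # runs at definition time, when the decorator is applied

        def wrapper(arg):
            return f"{name}({func(arg)})"
        return wrapper
    return decorator


@tag("deco3")
@tag("deco2")
@tag("deco1")
def myfunc(arg):
    return arg


print(calls)        # ['deco1', 'deco2', 'deco3']
print(myfunc("x"))  # deco3(deco2(deco1(x)))
```

The decorator written closest to the `def` is applied first and the topmost one last, so the stack reads like nested function calls.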
Also, it would be nice to document the reasons for the approach to argument handling: @deco # calls deco(f) @decomaker(arg) # calls tmp(f) where tmp=decomaker(arg) Raymond Hettinger From fredrik at pythonware.com Mon Sep 13 23:29:24 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon Sep 13 23:27:33 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP292: SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <41436762.7040207@egenix.com> <4145FFC5.8090208@egenix.com> Message-ID: M.-A. Lemburg wrote: >> (I suppose it's too late for 2.4, but it would probably be a good >> idea to switch to this algorithm in 2.5) > > Here's a reference that might be interesting for you: > > http://citeseer.ist.psu.edu/boldi02compact.html > > They use statistical approaches to dealing with the problem of > large alphabets. Their motivation is making Java's Unicode string > implementation faster... sounds familiar, eh :-) thanks for the reference. but I have to admit that I found the following paper by the same authors to be more interesting ... http://citeseer.ist.psu.edu/boldi03rethinking.html ... both because they've looked into efficient designs for mutable strings, and because of how they use a 32-bit "bloom filter" hashed by the least significant bits in the Unicode characters... 
oh well, there are never any new ideas ;-) From barry at python.org Tue Sep 14 00:40:48 2004 From: barry at python.org (Barry Warsaw) Date: Tue Sep 14 00:40:54 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <4145C7C8.5080101@heneryd.com> References: <4142E78C.7010800@heneryd.com> <1095086532.10677.46.camel@geddy.wooz.org> <4145C7C8.5080101@heneryd.com> Message-ID: <1095115247.10672.187.camel@geddy.wooz.org> On Mon, 2004-09-13 at 12:16, Erik Heneryd wrote: > Well, even if it's trivial, I still think the out-of-the-box API > shouldn't be hostile against frequent use. It's no more hostile than os.path.splitext or KeyboardInterrupt . Seriously, although Python does use abbreviations sometimes, I personally think that doing so can create ambiguity and can cause problems for non-native English speakers. Python is not Unix. Besides, don't most editors and IDEs provide completion these days? > Subclassing just to get a > decent name/operator feels stupid. Why not __mod__ = safe_substitute > per default then? Because I don't know which version will be more generally preferred by application authors and in the face of ambiguity I refuse the temptation to guess. (I know which version my own applications will prefer but I think you're the same person arguing that my own needs shouldn't drive all decisions.) -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040913/93f4bf95/attachment.pgp From barry at python.org Tue Sep 14 00:43:37 2004 From: barry at python.org (Barry Warsaw) Date: Tue Sep 14 00:43:48 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <4145C854.4070200@heneryd.com> References: <4142E78C.7010800@heneryd.com> <4142F7D0.5080807@heneryd.com> <1095086677.10672.49.camel@geddy.wooz.org> <4145C854.4070200@heneryd.com> Message-ID: <1095115417.10677.191.camel@geddy.wooz.org> On Mon, 2004-09-13 at 12:18, Erik Heneryd wrote: > >>IIRC DOS and sh don't agree, so it's > >>not that obvious. > > > > > > I'm sorry I don't follow that one. > > DOS: '%NOTFOUND%' => '%NOTFOUND%' > sh: '$NOTFOUND' => '' Okay, thanks. > BTW, what about a closing delimiter in the standard regex? There isn't one. The PEP explains the rationale. > I hope you don't do pep-292 just to fill your own needs (i18n?), but > also keep your mind open to other uses... What can't you do with PEP 292 as it now stands? -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040913/023cab74/attachment.pgp From erik at heneryd.com Tue Sep 14 01:44:26 2004 From: erik at heneryd.com (Erik Heneryd) Date: Tue Sep 14 01:44:31 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <1095115247.10672.187.camel@geddy.wooz.org> References: <4142E78C.7010800@heneryd.com> <1095086532.10677.46.camel@geddy.wooz.org> <4145C7C8.5080101@heneryd.com> <1095115247.10672.187.camel@geddy.wooz.org> Message-ID: <414630DA.1000009@heneryd.com> Barry Warsaw wrote: > On Mon, 2004-09-13 at 12:16, Erik Heneryd wrote: >>Subclassing just to get a >>decent name/operator feels stupid. 
Why not __mod__ = safe_substitute >>per default then? > > > Because I don't know which version will be more generally preferred by > application authors and in the face of ambiguity I refuse the temptation > to guess. (I know which version my own applications will prefer but I > think you're the same person arguing that my own needs shouldn't drive > all decisions.) You know, that's an argument for going back to subclasses. Erik From erik at heneryd.com Tue Sep 14 01:48:30 2004 From: erik at heneryd.com (Erik Heneryd) Date: Tue Sep 14 01:48:35 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <1095115417.10677.191.camel@geddy.wooz.org> References: <4142E78C.7010800@heneryd.com> <4142F7D0.5080807@heneryd.com> <1095086677.10672.49.camel@geddy.wooz.org> <4145C854.4070200@heneryd.com> <1095115417.10677.191.camel@geddy.wooz.org> Message-ID: <414631CE.4060205@heneryd.com> Barry Warsaw wrote: > On Mon, 2004-09-13 at 12:18, Erik Heneryd wrote: >>BTW, what about a closing delimiter in the standard regex? > > > There isn't one. The PEP explains the rationale. Sorry, must've missed it? Note that I'm not saying there should be a default closer, but an empty group in the regex, for subclasses to fill in (for example ml entities could use this). >>I hope you don't do pep-292 just to fill your own needs (i18n?), but >>also keep your mind open to other uses... > > > What can't you do with PEP 292 as it now stands? Examples from previous posts: sh-style safe variables, ml entities. As you really can't reuse anything now, it would be like starting from scratch. 
Erik From gvanrossum at gmail.com Tue Sep 14 02:12:55 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Sep 14 02:13:00 2004 Subject: [Python-Dev] Decorator PEP elaborations In-Reply-To: <000a01c499d7$dbed5900$e841fea9@oemcomputer> References: <000a01c499d7$dbed5900$e841fea9@oemcomputer> Message-ID: No PEP text from me, but: > If one of the authors gets a chance, it would be nice to document the > rationale for the order of application being inside-out instead of > top-down: > > @deco3 > @deco2 > @deco1 > def myfunc(args): > . . . This is the usual order for function application: "@f @g def foo(): ..." means foo = f(g(foo)). > Also, it would be nice to document the reasons for the approach to > argument handling: > > @deco # calls deco(f) > @decomaker(arg) # calls tmp(f) where tmp=decomaker(arg) The thing after the @ can be considered to be an expression (never mind that syntactically you are restricted), and whatever that expression returns is then called with the function as its argument. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From raymond.hettinger at verizon.net Tue Sep 14 04:00:02 2004 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Tue Sep 14 04:01:11 2004 Subject: [Python-Dev] PEP 292: method names In-Reply-To: <414631CE.4060205@heneryd.com> Message-ID: <000d01c499fe$8d71b9c0$e841fea9@oemcomputer> [Barry] > > What can't you do with PEP 292 as it now stands? [Erik] > Examples from previous posts: sh-style safe variables, ml entities. As > you really can't reuse anything now, it would be like starting from > scratch. It took a good while to refine the existing implementation to handle all the nuances of the $var format. My suspicion is that a format with opening and closing delimiters would have its own share of issues (nesting and escaping for example) and would warrant its own separate solution.
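For readers following the PEP 292 thread, here is a minimal sketch of the behaviour under discussion. It assumes the `string.Template` class as it later shipped in the standard library and is written for a current Python 3, not the interpreter of this thread. The `PercentTemplate` subclass and the sample values are hypothetical, for illustration only.

```python
from string import Template

t = Template("Hello $name, your shell is $SHELL")

# substitute() is strict: a placeholder with no value raises KeyError.
try:
    t.substitute(name="Erik")
except KeyError as exc:
    missing = exc.args[0]

# safe_substitute() leaves unknown placeholders untouched. This matches the
# DOS behaviour in Erik's example; sh would expand them to ''.
partial = t.safe_substitute(name="Erik")

# A subclass can change the delimiter without changing client code.
class PercentTemplate(Template):
    delimiter = "%"

custom = PercentTemplate("%who likes %what").substitute(who="Barry", what="i18n")

print(missing)
print(partial)
print(custom)
```

The subclass only overrides a class attribute, which is the kind of limited extensibility the messages here are weighing against a format with closing delimiters.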
The current implementation is pretty darned good and strikes a nice balance between extensibility goals and simplification goals (using $var instead of a %(var)s format). The API is clean and friendly for most purposes. Barry has made it possible to create unicode coercing subclasses, to substitute alternate identifier patterns (such as dotted names), to specify an alternative delimiter, and to use polymorphism for changing the implementation without changing client code. That is quite a bit of extensibility. Further hypergeneralization would stray too far from the original simplification goals. After experimenting with alternative approaches and writing subclasses, I learned that no design easily accommodated the most complex use cases. The pattern, convert function, flags, and invocation are so tightly coupled that you really are better off coding from scratch. Fortunately, with the string.py source available as a model, it is not hard to do. So, for applications beyond the limits of the current design, my suggestion is to use regexes to roll your own. At some point, it is easier to write a regex than to write a subclass overriding all existing behaviors. Raymond From greg at cosc.canterbury.ac.nz Tue Sep 14 04:58:22 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue Sep 14 04:58:28 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> Message-ID: <200409140258.i8E2wMDW010792@cosc353.cosc.canterbury.ac.nz> > IMO, the algebraic/query use cases would be better served by some > sort of "code literal" or "AST literal" syntax You may be right about the symbolic algebra case, if the intent is to be able to write code that manipulates expressions, in which case writing the expressions to be manipulated as literals of some kind may make sense. 
But I don't agree in the SQL case, where my intent is for the user to simply write Python code that performs database queries, not write Python code that constructs trees of SQL expressions that perform database queries. The fact that expression manipulation is going on should be an implementation detail that the user doesn't need to be aware of. Having to write the query expressions using some special syntax would interfere with that. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From bac at OCF.Berkeley.EDU Tue Sep 14 04:58:27 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Tue Sep 14 04:58:35 2004 Subject: [Python-Dev] Re: PEP 292: method names In-Reply-To: References: <4142E78C.7010800@heneryd.com> <1095086532.10677.46.camel@geddy.wooz.org><4145C7C8.5080101@heneryd.com> <4145DBB4.8010601@ocf.berkeley.edu> Message-ID: <41465E53.6050606@ocf.berkeley.edu> Fredrik Lundh wrote: > Brett C wrote: > > >>I am sure the way I tend to abbreviate things is not how anyone >>else would. So why would the stdlib try to? > > > it's pretty amazing that you've been able to use Python without noticing > that the standard library is full of abbreviations. > Just because the stdlib is full of abbreviations does not mean it should be continued. Precedence != acceptance . -Brett From pje at telecommunity.com Tue Sep 14 06:37:06 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Sep 14 06:36:37 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <200409140258.i8E2wMDW010792@cosc353.cosc.canterbury.ac.nz> References: <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> At 02:58 PM 9/14/04 +1200, Greg Ewing wrote: > > IMO, the algebraic/query use cases would be better served by some > > sort of "code literal" or "AST literal" syntax > >You may be right about the symbolic algebra case, if the intent is to >be able to write code that manipulates expressions, in which case >writing the expressions to be manipulated as literals of some kind may >make sense. > >But I don't agree in the SQL case, where my intent is for the user to >simply write Python code that performs database queries, not write >Python code that constructs trees of SQL expressions that perform >database queries. So, something like this: query("x and y or z") isn't "code that performs database queries"? My main concern about the PEP is that it adds overhead to *all* logical operations, but the feature will only benefit code that hasn't yet been written. I also fear that as a result, people will start writing complex if-then blocks to "optimize" performance of conditionals to get them back to where they were before the facility was added. Also, it considerably expands the scope of understanding that someone needs in order to grasp the meaning of a logical expression. For these reasons, I'd feel more comfortable with either a literal syntax (to address algebra, SQL, etc.) or some type of special infix notation to allow new operators to be defined in Python, so that it isn't necessary to use prefix or method notation to perform operations like these. Neither of these solutions burdens applications that don't need the feature(s). 
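As a rough illustration of the string-based approach mentioned above, query("x and y or z"), here is a minimal sketch using the `ast` module of current Python 3 releases. That module postdates this thread; the contemporary counterpart was the `compiler` module. The `parse_query` helper and its tuple representation are hypothetical, chosen only to show that `and`/`or`/`not` can be recovered from a parsed expression without overloading any operator and without binding the names first.

```python
import ast

def parse_query(text):
    """Turn a boolean expression string into a nested tuple tree."""
    def convert(node):
        if isinstance(node, ast.BoolOp):
            op = "and" if isinstance(node.op, ast.And) else "or"
            return (op,) + tuple(convert(value) for value in node.values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return ("not", convert(node.operand))
        if isinstance(node, ast.Name):
            # Names are kept as plain strings; nothing is looked up or evaluated.
            return node.id
        raise ValueError("unsupported syntax: %s" % type(node).__name__)

    # mode="eval" parses a single expression; .body is its root node.
    return convert(ast.parse(text, mode="eval").body)

tree = parse_query("x and y or z")
print(tree)
```

Because the text is only parsed, ordinary boolean expressions elsewhere in a program pay no extra cost, which is the trade-off argued for in the message above.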
From tjreedy at udel.edu Tue Sep 14 08:54:44 2004 From: tjreedy at udel.edu (Terry Reedy) Date: Tue Sep 14 08:54:55 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP 292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com><1094315138.8696.36.camel@geddy.wooz.org><413F1D9C.20209@egenix.com><413F3605.7090707@egenix.com><413F6120.7090603@egenix.com><87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: "Fredrik Lundh" wrote in message news:ci3g2d$m3g$1@sea.gmane.org... > usually shorter in languages with many ideographs (my non-scientific > tests indicate that chinese text uses about 4 times less symbols than > english; I'm sure someone can dig up better figures). This is why I am not especially enamored of Unicode and the prospect of Python becoming married to it. It is heavily weighted in favor of efficiently representing Chinese and inefficiently representing English. To give English equivalent treatment, the 20,000 or so most common words, roots, prefixes, and suffixes would each get its own codepoint. Terry J. Reedy From stephen at xemacs.org Tue Sep 14 09:03:19 2004 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue Sep 14 09:03:30 2004 Subject: [Python-Dev] Re: Re: Re: Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: (Fredrik Lundh's message of "Mon, 13 Sep 2004 22:20:28 +0200") References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> <87pt4qp8ti.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: <87k6uxnxhk.fsf@tleepslib.sk.tsukuba.ac.jp> >>>>> "Fredrik" == Fredrik Lundh writes: Fredrik> Stephen J. Turnbull wrote: >> In the context of PEP 292 is a focus on "human text" >> unwarranted? 
Fredrik> I'm pretty sure this subthread left the PEP quite a few Fredrik> posts ago. That's a funny way to spell "I don't like the way this is going, good-bye", but it works for me. Have a nice day, thanks for the information on search algorithms and usage patterns. -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. From fredrik at pythonware.com Tue Sep 14 10:33:03 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Sep 14 10:33:08 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com><1094315138.8696.36.camel@geddy.wooz.org><413F1D9C.20209@egenix.com><413F3605.7090707@egenix.com><413F6120.7090603@egenix.com><87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: Terry Reedy wrote: >> usually shorter in languages with many ideographs (my non-scientific >> tests indicate that chinese text uses about 4 times less symbols than >> english; I'm sure someone can dig up better figures). > > This is why I am not especially enamored of Unicode and the prospect of Python becoming married to > it. It is heavily weighted in favor of efficiently representing Chinese and inefficiently > representing English. Don't confuse Unicode with its UCS-2 and UCS-4 encodings. On a conceptual level, good old 7-bit ASCII and 8-bit ISO-Latin-1 are both Unicode. 
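The distinction between Unicode code points and their encodings can be made concrete with a small sketch. It is written for a current Python 3, where `str` is a sequence of code points; the helper name `encoded_sizes` and the sample strings are illustrative assumptions, not taken from the thread.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Return {encoding: byte count} for one and the same sequence of code points."""
    return {enc: len(text.encode(enc)) for enc in encodings}

english = "simple string"          # 13 code points, all in the ASCII range
japanese = "\u305f\u306e\u306b"    # 3 hiragana code points

print(len(english), encoded_sizes(english))
print(len(japanese), encoded_sizes(japanese))
print(encoded_sizes(japanese, ("shift_jis",)))

# For the first 256 code points, the Latin-1 byte value is the code point
# itself, which is the sense in which Latin-1 text "is" Unicode.
latin1_matches = "\u00e9".encode("latin-1") == bytes([ord("\u00e9")])
```

The number of code points stays fixed while the byte counts vary with the encoding chosen.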
From jacobs at theopalgroup.com Tue Sep 14 14:04:45 2004 From: jacobs at theopalgroup.com (Kevin Jacobs) Date: Tue Sep 14 14:04:41 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> References: <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> Message-ID: <4146DE5D.3040702@theopalgroup.com> Phillip J. Eby wrote: > My main concern about the PEP is that it adds overhead to *all* > logical operations, but the feature will only benefit code that hasn't > yet been written. Actually, there are several packages that implement ugly workarounds for exactly this issue. So, in a sense, there is a significant amount of code that exists that will benefit from this feature. Some that come to mind are my own SQL ADT library, SQLObject, and several parser tools. > For these reasons, I'd feel more comfortable with either a literal > syntax (to address algebra, SQL, etc.) or some type of special infix > notation to allow new operators to be defined in Python, so that it > isn't necessary to use prefix or method notation to perform operations > like these. Neither of these solutions burdens applications that > don't need the feature(s). Both of your alternatives are being used in some form and neither is really satisfactory. Literal representations require complex parsers, when the Python parser is really what is desired. The infix notation idea is interesting, however the operators desired are usually 'logical and' and 'logical or', which are clearly spelled 'and' and 'or' in Python. I see it as a semantic limitation that Python does not allow overriding these operators. Adding extra indirection (i.e., extra byte codes) _will_ affect performance, but my view is that correctness and completeness are more important than performance. 
-Kevin From ndbecker2 at verizon.net Tue Sep 14 14:48:39 2004 From: ndbecker2 at verizon.net (Neal D. Becker) Date: Tue Sep 14 14:50:59 2004 Subject: [Python-Dev] find_first (and relatives) Message-ID: I was a bit surprised to find out that python doesn't seem to have builtin functors, such as find_first. Although there are ways to simulate such functions, it would be good to have an expanded set of functional programming tools which are coded in C for speed. From pinard at iro.umontreal.ca Tue Sep 14 15:08:08 2004 From: pinard at iro.umontreal.ca (François Pinard) Date: Tue Sep 14 15:08:52 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: References: Message-ID: <20040914130808.GA2294@alcyon.progiciels-bpi.ca> [Terry Reedy] > [Unicode] is heavily weighted in favor of efficiently representing > Chinese and inefficiently representing English. You undoubtedly forgot the smiley! :-) Many people consider that Unicode, or UTF-8 at least, is strongly favouring English (boldly American) over any other script or language. If it has not been so, Americans would never have promoted it so much, and would have rather shown an infinite and eternal reluctance... -- François Pinard http://www.iro.umontreal.ca/~pinard From exarkun at divmod.com Tue Sep 14 15:32:33 2004 From: exarkun at divmod.com (exarkun@divmod.com) Date: Tue Sep 14 15:33:05 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <4146DE5D.3040702@theopalgroup.com> Message-ID: <20040914133233.29723.1901953221.divmod.quotient.1109@ohm> On Tue, 14 Sep 2004 08:04:45 -0400, Kevin Jacobs wrote: >Phillip J. Eby wrote: > > > For these reasons, I'd feel more comfortable with either a literal > > syntax (to address algebra, SQL, etc.)
or some type of special infix > > notation to allow new operators to be defined in Python, so that it > > isn't necessary to use prefix or method notation to perform operations > > like these. Neither of these solutions burdens applications that > > don't need the feature(s). > > Both of your alternatives are being used in some form and > neither is really satisfactory. Literal representations require > complex parsers, when the Python parser is really what is > desired. Python's parser is already available, through the compiler module. The example given earlier, query("x and y or z"), is relatively straightforward to implement as a set of AST manipulations. Jp From mal at egenix.com Tue Sep 14 15:56:09 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Tue Sep 14 15:56:17 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com><1094315138.8696.36.camel@geddy.wooz.org><413F1D9C.20209@egenix.com><413F3605.7090707@egenix.com><413F6120.7090603@egenix.com><87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: <4146F879.6070805@egenix.com> Terry Reedy wrote: > "Fredrik Lundh" wrote in message > news:ci3g2d$m3g$1@sea.gmane.org... > >>usually shorter in languages with many ideographs (my non-scientific >>tests indicate that chinese text uses about 4 times less symbols than >>english; I'm sure someone can dig up better figures). > > This is why I am not especially enamored of Unicode and the prospect of > Python becoming married to it. It is heavily weighted in favor of > efficiently representing Chinese and inefficiently representing English. Hmm, the Asian world has a very different view on these things. Representing English ASCII text in UTF-8 is very efficient (1-1), while typical Asian texts use between 1.5-2 times as much space as their equivalent in one of the resp. Asian encodings, e.g. 
take the Japanese translation of the bible from (only parts of New Testament): http://www.cozoh.org/denmo/

>>> bible = unicode(open('denmo.txt', 'rb').read(), 'shift-jis')
>>> len(bible)
386980
>>> len(bible.encode('utf-8'))
1008272
>>> len(bible.encode('shift-jis'))
697626

Some stats:
-----------
Number of unique code points: 1512

Code point frequency (truncated):
u'\u305f' : =================================
u' '      : =============================
u'\u306e' : ===========================
u'\uff0c' : ==========================
u'\r'     : ========================
u'\n'     : ========================
u'\u306b' : =====================
u'\u3044' : =================
u'\u3066' : =================
u'\u3057' : ================
u'\u3002' : ================
u'\u306f' : ================
u'\u306a' : ===============
u'\u3092' : ==============
u'\u3068' : ============
u'\u308b' : ============
u'\u3089' : ===========
u'\u3063' : ===========
u':'      : ===========
u'}'      : ===========
u'{'      : ===========
u'\u304c' : ==========
u'\u308c' : ==========
u'\u304b' : =========
u'\u3067' : =========
u'1'      : =========
u'\u5f7c' : ========
u'\u3053' : ========
u'\u3042' : =======
u'\u3061' : =======
u'\u3046' : =======
u'2'      : =======
...

As you can see, most code points live in the 0x3000 area. These code points require 3 bytes in UTF-8, 2 bytes in UTF-16. > To give English equivalent treatment, the 20,000 or so most common words, > roots, prefixes, and suffixes would each get its own codepoint. I suggest you take this one up with the Unicode Consortium :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 14 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
:::: From jacobs at theopalgroup.com Tue Sep 14 17:29:10 2004 From: jacobs at theopalgroup.com (Kevin Jacobs) Date: Tue Sep 14 17:29:14 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <20040914133233.29723.1901953221.divmod.quotient.1109@ohm> References: <20040914133233.29723.1901953221.divmod.quotient.1109@ohm> Message-ID: <41470E46.5020802@theopalgroup.com> exarkun@divmod.com wrote: >On Tue, 14 Sep 2004 08:04:45 -0400, Kevin Jacobs wrote: > > >>Phillip J. Eby wrote: >> >> >>>For these reasons, I'd feel more comfortable with either a literal >>>syntax (to address algebra, SQL, etc.) or some type of special infix >>>notation to allow new operators to be defined in Python, so that it >>>isn't necessary to use prefix or method notation to perform operations >>>like these. Neither of these solutions burdens applications that >>>don't need the feature(s). >>> >>> >>Both of your alternatives are being used in some form and >>neither is really satisfactory. Literal representations require >>complex parsers, when the Python parser is really what is >>desired. >> >> > Python's parser is already available, through the compiler module. The example given earlier, query("x and y or z"), is relatively straightforward to implement as a set of AST manipulations. > > While strictly true, your suggestion still requires two distinct parsers (although one implementation) and two distinct parsing contexts (one embedded in a literal string). The use cases I care about involve minimizing the difference between evaluating regular Python expressions and ADT instances -- plus the ability to mix constructs from both in a seamless way. If Python didn't support any over-loadable ADT methods, then this wouldn't be an issue. However, the problem is that virtually all ADT methods _are_ defined _except_ logical conjunction and disjunction. 
Thus, I am more concerned with correcting this oversight than I am with a fraction of a percent in slowdown in real applications. (or at least micro-benchmarks are _not_ representative of any real world situations I've ever cared about) -Kevin From pje at telecommunity.com Tue Sep 14 17:43:05 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Sep 14 17:42:56 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <4146DE5D.3040702@theopalgroup.com> References: <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040914112226.0354d8d0@mail.telecommunity.com> At 08:04 AM 9/14/04 -0400, Kevin Jacobs wrote: >>For these reasons, I'd feel more comfortable with either a literal syntax >>(to address algebra, SQL, etc.) or some type of special infix notation to >>allow new operators to be defined in Python, so that it isn't necessary >>to use prefix or method notation to perform operations like >>these. Neither of these solutions burdens applications that don't need >>the feature(s). > >Both of your alternatives are being used in some form and >neither is really satisfactory. Literal representations require >complex parsers, when the Python parser is really what is >desired. Maybe you missed the earlier part of the thread, where I was suggesting that a Python "code literal" or "AST literal" syntax would be helpful. For example, if backquotes didn't already have a use, one might say something like: db.query(`x.y==z and foo*bar<27`) To pass an AST object to the db.query() method. The advantage would be that the AST would be parsed and syntax checked at compile time, rather than runtime. After several experiments with using &, |, and ~ for query expressions, I've pretty much quit and gone to using string literals, since AST literals don't exist. 
But if AST literals *did* exist, I'd certainly use them in preference to strings. But, even if PEP 335 *were* implemented, creating a query system using Python expressions would *still* be kludgy, because you still need "seed variables" in the current scope to write a query expression. In my example above, I didn't need to bind 'x' or 'y' or 'z' or 'foo' or 'bar', because the db.query() method is going to interpret those in some context. If I were using a PEP 335-based query system, I'd have to initialize those variables to special querying objects first. From my POV, the use of &, |, and ~ was a very minor issue. Being able to use 'and', 'or', and 'not' would provide some minor syntactic sugar at best. Trying to implement every *other* Python operator correctly, and having to have seed variables, is IMO where the bulk of the complexity comes from when trying to use Python syntax as a query language. That's why I say that an AST literal syntax would be much more useful to me than PEP 335 for this type of use case. As for the numeric use cases, I'm not at all clear why &, |, and ~ (or special methods/functions) aren't suitable. > The infix notation idea is interesting, however the >operators desired are usually 'logical and' and 'logical or', >which are clearly spelled 'and' and 'or' in Python. Actually, from a pure functionality perspective, the logical operators are shortcuts for writing if-then-else blocks, and they compile to almost the same bytecode as if-then-else blocks. > I see it >as a semantic limitation that Python does not allow overriding >these operators. Python doesn't allow overriding of 'is' or 'type()' either. I see the logical operators as being rather in the same plane of fundamentals. From dgm at ecs.soton.ac.uk Tue Sep 14 11:14:06 2004 From: dgm at ecs.soton.ac.uk (David G Mills) Date: Tue Sep 14 18:00:12 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed?
Message-ID: As the link below shows httplib can't handle an IPv6 address, it checks for a port number by checking for a : but this simply cuts the IPv6 address in two and tries to set the port variable as nonsense. http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=81926#c2 Are there any plans to rectify this, or even an alternative version of httplib kicking about that can handle IPv6. Regards, David. From skip at pobox.com Tue Sep 14 18:36:45 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 14 18:37:05 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed? In-Reply-To: References: Message-ID: <16711.7709.255870.851658@montanaro.dyndns.org> David> As the link below shows httplib can't handle an IPv6 address, it David> checks for a port number by checking for a : but this simply cuts David> the IPv6 address in two and tries to set the port variable as David> nonsense. David> http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=81926#c2 David> Are there any plans to rectify this, or even an alternative David> version of httplib kicking about that can handle IPv6. I just checked in a fix for this (it was a one-character change, not including unit test update). I don't know if there is a Python bug report open which now needs to be closed. Considering the ease of the fix, I sort of think not (otherwise it would have been fixed long ago). Note that in general we have plenty of other things to do with our time without monitoring other projects' bug trackers looking for possible Python bug reports. If they aren't reported on SF we won't hear about them. (The Debian folks routinely open SF items when Python bug reports wind up in the Debian tracker.) -- Skip Montanaro Got spam?
http://www.spambayes.org/ skip@pobox.com From jcarlson at uci.edu Tue Sep 14 18:58:56 2004 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue Sep 14 19:06:17 2004 Subject: [Python-Dev] find_first (and relatives) In-Reply-To: References: Message-ID: <20040914092447.7B21.JCARLSON@uci.edu> > I was a bit surprised to find out that python doesn't seem to have builtin > functors, such as find_first. Although there are ways to simulate such > functions, it would be good to have an expanded set of functional > programming tools which are coded in C for speed. I think I've been here long enough, and it is getting to be my turn to do this kind of thing once, so here it goes... Address your concerns of "Python should or should not have this thing" on python-list first (available as a newsgroup as comp.lang.python if you so prefer). Python-dev is about developing the core language, not about fielding requests without background, support, salutation, and "thank you for your consideration". As for "find_first", the first mention of such things I found on the net was the Boost C++ libraries for searching strings. Considering your recent posts on the C++ sig with regards to Boost::Python, this seems like what you were referring to. Searching strings is generally done via "string literal".find("substring literal") or string_variable.find(substring_variable), and any other such variations you would care to use. There also exists a findall() function in the regular expression module re, which comes standard with Python. Lists also include find methods, though they call them index(). If your list is sorted, you may want to consider the bisect module. If you desire your finding methods to return an iterable over the sequence of positions of the item, perhaps this little throwaway generator would be sufficient (which I'm sure a python-list user could have helped you with)...
    def find_first(str_or_list, item):
        # Yield each position of item in turn, lowest first.
        if type(str_or_list) in (str, unicode):
            f = str_or_list.find(item)
            while f != -1:
                yield f
                # Resume one past the last hit; searching again from f
                # would return the same position forever.
                f = str_or_list.find(item, f + 1)
        elif type(str_or_list) is list:
            try:
                f = str_or_list.index(item)
            except ValueError:
                return
            while 1:
                yield f
                try:
                    f = str_or_list.index(item, f + 1)
                except ValueError:
                    return
        else:
            raise ValueError(
                "type %s is not supported for searching" % type(str_or_list))

- Josiah From skip at pobox.com Tue Sep 14 19:12:18 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 14 19:12:26 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed? In-Reply-To: References: <16711.7709.255870.851658@montanaro.dyndns.org> Message-ID: <16711.9842.758365.619457@montanaro.dyndns.org> David> Have you actually tested it, because I made my own fix that was at David> least an additional 7 lines.... I don't have access to ipv6. I used the ipaddr:port combination in the RedHat bug report as a test input. Here's the change to httplib.py. Replace:

    i = host.find(':')

with

    i = host.rfind(':')

Like I said, it was a one-character fix. rfind() looks from the back of the host/port combination for the colon separating the host and port. Since port numbers can't contain colons, the first colon found from the back has to be the colon separating the host-or-address from the port number. Here's the change to the test case (Lib/test/test_httplib.py). After the for loop that checks for invalid ports, add this analogous for loop that checks valid host/port combinations:

    for hp in ("[fe80::207:e9ff:fe9b]:8000", "www.python.org:80",
               "www.python.org"):
        try:
            h = httplib.HTTP(hp)
        except httplib.InvalidURL:
            print "InvalidURL raised erroneously"

The test case failed before applying the patch and succeeded after. According to the principles of test-driven development, I'm done until another bug surfaces. In short, I've done what I can to fix the obvious problem. Drilling down any deeper than that is impossible for me.
As I indicated, I have no ipv6 access. If you test it out and still find problems, please submit a bug report on SF. Please *don't* follow up to python-dev. It's not the appropriate place to discuss the ins and outs of specific patches. I only did so because that was the easiest way to tell the other developers that I'd applied a fix for the problem. back-to-my-paying-job-ly, y'rs, Skip From dgm at ecs.soton.ac.uk Tue Sep 14 18:52:11 2004 From: dgm at ecs.soton.ac.uk (David G Mills) Date: Tue Sep 14 19:12:51 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed? In-Reply-To: <16711.7709.255870.851658@montanaro.dyndns.org> Message-ID: Have you actually tested it, because I made my own fix that was at least an additional 7 lines.... David. On Tue, 14 Sep 2004, Skip Montanaro wrote: > > David> As the link below shows httplib can't handle an IPv6 address, it > David> checks for a port number by checking for a : but this simply cuts > David> the IPv6 address in two and tries to set the port variable as > David> nonsense. > > David> http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=81926#c2 > > David> Are there any plans to rectify this, or even an alternative > David> version of httplib kicking about that can handle IPv6. > > I just checked in a fix for this (it was a one-character change, not > including unit test update). I don't know if there is a Python bug report > open which now needs to be closed. Considering the ease of the fix, I sort > of think not (otherwise it would have been fixed long ago). Note that in > general we have plenty of other things to do with our time without > monitoring other projects' bug trackers looking for possible Python bug > reports. If they aren't reported on SF we won't hear about them. (The > Debian folks routinely open SF items when Python bug reports wind up in the > Debian tracker.) > > -- > Skip Montanaro > Got spam?
http://www.spambayes.org/ > skip@pobox.com > From pje at telecommunity.com Tue Sep 14 19:20:08 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Sep 14 19:20:30 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed? In-Reply-To: <16711.9842.758365.619457@montanaro.dyndns.org> References: <16711.7709.255870.851658@montanaro.dyndns.org> Message-ID: <5.1.1.6.0.20040914131736.05819db0@mail.telecommunity.com> At 12:12 PM 9/14/04 -0500, Skip Montanaro wrote: >Here's the change to the test case (Lib/test/test_httplib.py). After the >for loop that checks for invalid ports, add this analogous for loop that >checks valid host/port combinations: > > for hp in ("[fe80::207:e9ff:fe9b]:8000", "www.python.org:80", > "www.python.org"): Here's the test case that's missing, then: "[fe80::207:e9ff:fe9b]" From skip at pobox.com Tue Sep 14 19:55:53 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 14 19:56:03 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed? In-Reply-To: <5.1.1.6.0.20040914131736.05819db0@mail.telecommunity.com> References: <16711.7709.255870.851658@montanaro.dyndns.org> <5.1.1.6.0.20040914131736.05819db0@mail.telecommunity.com> Message-ID: <16711.12457.647107.816397@montanaro.dyndns.org> Phillip> Here's the test case that's missing, then: Phillip> "[fe80::207:e9ff:fe9b]" Whoops. Fixed. Skip From alloydflanagan at comcast.net Tue Sep 14 19:57:47 2004 From: alloydflanagan at comcast.net (alloydflanagan@comcast.net) Date: Tue Sep 14 19:57:50 2004 Subject: [Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292) Message-ID: <091420041757.13290.414731190005E51E000033EA2200751150020E090E020E04000B970104040E@comcast.net> [François Pinard] >>Many people consider that Unicode, or UTF-8 at least, is strongly >>favouring English (boldly American) over any other script or language. 
>>If it has not been so, Americans would never have promoted it so much, >>and would have rather shown an infinite and eternal reluctance... To be fair to the developers of Unicode, I'd suggest that the issue is not favoring (note spelling! :) ) English, but rather keeping compatibility with an enormous amount of existing data which was encoded in ASCII. Which was an English standard, but you can only do so much in 7 bits... As for American reluctance, how are you going to convince anyone to double (at least) the storage requirements for their data, to support languages they never use? That would have cost a great deal of money. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20040914/151e29b1/attachment.htm From foom at fuhm.net Tue Sep 14 20:12:35 2004 From: foom at fuhm.net (James Y Knight) Date: Tue Sep 14 20:12:41 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com><1094315138.8696.36.camel@geddy.wooz.org><413F1D9C.20209@egenix.com><413F3605.7090707@egenix.com><413F6120.7090603@egenix.com><87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: On Sep 14, 2004, at 2:54 AM, Terry Reedy wrote: > This is why I am not especially enamored of Unicode and the prospect of > Python becoming married to it. It is heavily weighted in favor of > efficiently representing Chinese and inefficiently representing > English. > To give English equivalent treatment, the 20,000 or so most common > words, > roots, prefixes, and suffixes would each get its own codepoint. Of course it is perfectly possible to have the Python unicode implementation choose to represent some unicode strings with only 8 bits per character. There is no (conceptual) reason it could not represent (u'a' * 8) with 8 bytes + class header overhead. 
That is simply an implementation detail and really has nothing to do with Unicode itself. It would also be possible to use UTF-8 string storage, although this has the tradeoff that indexing an element takes linear time w.r.t. position instead of constant time. James From fumanchu at amor.org Tue Sep 14 20:39:54 2004 From: fumanchu at amor.org (Robert Brewer) Date: Tue Sep 14 20:45:51 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3027BDA@exchange.hqamor.amorhq.net> Kevin Jacobs: > >Both of your alternatives are being used in some form and > >neither is really satisfactory. Literal representations require > >complex parsers, when the Python parser is really what is > >desired. Phillip J. Eby: > Maybe you missed the earlier part of the thread, where I was > suggesting that a Python "code literal" or "AST literal" > syntax would be helpful. For example, if backquotes didn't > already have a use, one might say something like: > > db.query(`x.y==z and foo*bar<27`) > > To pass an AST object to the db.query() method. The > advantage would be that the AST would be parsed and > syntax checked at compile time, rather than runtime. We already have a de facto "code literal syntax": lambdas. db.query(lambda x: x.y==z and foo*bar<27) I use this technique in my ORM, dejavu. When declared, the lambda gets passed immediately into a wrapper which early-binds as much as possible (using Raymond's cookbook technique). See http://www.aminus.org/rbre/python/logic.py and /codewalk.py for the guts. SQL is generated from the lambda as needed (not online at the moment, sorry, coming soon). The bonus is that you can pass ordinary Python objects into the lambda and evaluate them. The current downside is that it's a bytecode hack and therefore limited to CPython, certain versions. I'd love a generic early-binder mechanism at the language level to help get around that, but it's not critical for my users (= me). 
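Python's `and`, `or` and `not` cannot be overloaded (that is what PEP 335 proposes to change), so expression-building libraries usually fall back on `&`, `|` and `~`. A minimal sketch of that spelling follows; the class `Q` and its `text` attribute are hypothetical names used only for illustration, not part of dejavu or any other library mentioned in this thread.

```python
class Q:
    """Hypothetical query-expression node that records how it was combined."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):          # invoked by  a & b
        return Q("(%s AND %s)" % (self.text, other.text))

    def __or__(self, other):           # invoked by  a | b
        return Q("(%s OR %s)" % (self.text, other.text))

    def __invert__(self):              # invoked by  ~a
        return Q("(NOT %s)" % self.text)


x, y, z = Q("x"), Q("y"), Q("z")

# The overloaded operators build an expression object.
combined = (x & y) | ~z
print(combined.text)        # ((x AND y) OR (NOT z))

# The keyword operator is not overloadable: it short-circuits on truthiness
# and simply hands back one of its operands.
shortcut = x and y
print(shortcut is y)        # True
```

Note that `x`, `y` and `z` have to exist as special objects before the expression can be written, which is the "seed variable" cost raised elsewhere in the thread.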
> After several experiments with using &, |, and ~ for query > expressions, > I've pretty much quit and gone to using string literals, > since AST literals > don't exist. But if AST literals *did* exist, I'd certainly > use them in > preference to strings. I tried &|~ also and quit pretty quickly (sorry, Greg ;). Using the lambdas allowed me to do more of the parsing earlier, much of it at compile-time, the rest at declaration time (I can then pickle the lambdas so users can persist ones they create). > But, even if PEP 335 *were* implemented, creating a query > system using Python expressions would *still* be kludgy, > because you still need "seed variables" in the current > scope to write a query expression. > In my example above, I didn't need to bind 'x' or 'y' > or 'z' or 'foo' or 'bar', because the db.query() method > is going to interpret those in some context. If I > were using a PEP 335-based query system, I'd have to > initialize those variables to special querying objects first. A lot of that becomes a non-issue if you bind early. Once the constants are bound, you're left with attribute access on your core objects (x.y) and special functions (see logic.ieq or logic.today for example). Again, too, I can use the lambda to evaluate Python objects, the 'Object' side of "ORM". In that situation, the binding is a benefit. >8 > > That's why I say that an AST literal syntax would be much > more useful to me than PEP 335 for this type of use case. I seem to recall my AST version was quite slow, in pure Python. Can't recall whether that was all the tuple-unpacking or just my naive function-call overhead at the time. Anyway, for those reasons, I'm -0.5. Robert Brewer MIS Amor Ministries fumanchu@amor.org From pje at telecommunity.com Tue Sep 14 21:02:31 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue Sep 14 21:02:47 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3027BDA@exchange.hqamor.amo rhq.net> Message-ID: <5.1.1.6.0.20040914144853.0230b0e0@mail.telecommunity.com> At 11:39 AM 9/14/04 -0700, Robert Brewer wrote: >We already have a de facto "code literal syntax": lambdas. > >db.query(lambda x: x.y==z and foo*bar<27) Right, but that requires non-portable bytecode disassembly. If there was a simple way to convert from a function object to an AST, I'd be happy with that too. > > But, even if PEP 335 *were* implemented, creating a query > > system using Python expressions would *still* be kludgy, > > because you still need "seed variables" in the current > > scope to write a query expression. > > In my example above, I didn't need to bind 'x' or 'y' > > or 'z' or 'foo' or 'bar', because the db.query() method > > is going to interpret those in some context. If I > > were using a PEP 335-based query system, I'd have to > > initialize those variables to special querying objects first. > >A lot of that becomes a non-issue if you bind early. Once the constants >are bound, you're left with attribute access on your core objects (x.y) >and special functions (see logic.ieq or logic.today for example). Again, >too, I can use the lambda to evaluate Python objects, the 'Object' side >of "ORM". In that situation, the binding is a benefit. I'm not following what you mean by "bind early". My point was that in order to have bindings for seeds like 'x' and 'z' and 'foo', most query languages end up with hacks like 'tables.tablename.columname' or '_.table.column' or other rigamarole, and that this is usually more awkward to deal with than the &/|/~ operator spelling. > > That's why I say that an AST literal syntax would be much > > more useful to me than PEP 335 for this type of use case. > >I seem to recall my AST version was quite slow, in pure Python. 
Can't
>recall whether that was all the tuple-unpacking or just my naive
>function-call overhead at the time.

When I say AST, I just mean "some kind of syntax representation", not
necessarily the 'parser' module's current AST implementation.

However, I have found that it's possible to translate parser-module AST's
to query specifications quite efficiently in pure Python, such that the
overhead is minor compared to whatever actual computation you're doing.

The key is that the vast majority of AST nodes are a trivial wrapper around
another AST node.  The core of my AST-handling engine, therefore, looks
like this:

    def build(builder, nodelist):
        while len(nodelist)==2: nodelist = nodelist[1]
        return production[nodelist[0]](builder,nodelist)

Where 'production' is a table mapping symbol IDs to helper functions that
invoke methods on 'builder', which then may recursively invoke 'build' on
items in 'nodelist'.  The first two lines of this function eliminate
enormous amounts of overhead by ignoring all the zillions of trivial wrapper
nodes.  (Note that you must include line number information in the generated
AST, or it will mistake tokens for unnecessary symbols.)

>Anyway, for those reasons, I'm -0.5.

On what?  AST literals, or PEP 335?

From aahz at pythoncraft.com Tue Sep 14 21:04:36 2004
From: aahz at pythoncraft.com (Aahz)
Date: Tue Sep 14 21:04:39 2004
Subject: [Python-Dev] Re: PEP 292: method names
In-Reply-To: <41465E53.6050606@ocf.berkeley.edu>
References: <4142E78C.7010800@heneryd.com> <4145DBB4.8010601@ocf.berkeley.edu> <41465E53.6050606@ocf.berkeley.edu>
Message-ID: <20040914190436.GA11541@panix.com>

On Mon, Sep 13, 2004, Brett C. wrote:
> Fredrik Lundh wrote:
>>Brett C wrote:
>>>
>>>I am sure the way I tend to abbreviate things is not how anyone
>>>else would.  So why would the stdlib try to?
>>
>>it's pretty amazing that you've been able to use Python without noticing
>>that the standard library is full of abbreviations.
> > Just because the stdlib is full of abbreviations does not mean it should be > continued. Precedence != acceptance . What I find interesting about your responses is that you're using the abbreviation "stdlib", assuming that your audience will understand that easily enough. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From pinard at iro.umontreal.ca Tue Sep 14 21:15:28 2004 From: pinard at iro.umontreal.ca (=?iso-8859-1?Q?Fran=E7ois?= Pinard) Date: Tue Sep 14 21:16:10 2004 Subject: [Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292) In-Reply-To: <091420041757.13290.414731190005E51E000033EA2200751150020E090E020E04000B970104040E@comcast.net> References: <091420041757.13290.414731190005E51E000033EA2200751150020E090E020E04000B970104040E@comcast.net> Message-ID: <20040914191528.GA7964@alcyon.progiciels-bpi.ca> [alloydflanagan@comcast.net] > [Fran?ois Pinard] > >>Many people consider that Unicode, or UTF-8 at least, is strongly > >>favouring English (boldly American) over any other script or > >>language. If it has not been so, Americans would never have > >>promoted it so much, and would have rather shown an infinite and > >>eternal reluctance... > To be fair to the developers of Unicode, I'd suggest that the issue > is not favoring (note spelling! :) ) English, but rather keeping > compatibility with an enormous amount of existing data which was > encoded in ASCII. Of course, this is the standard and official reason. Yet, the net effect of that concern and constraint, noticed by many foreigners, is that Unicode favours English. (About "favouring" spelling, I find it amusing to spell-check my out-going email with a British dictionary.) > Which was an English standard, but you can only do so much in 7 > bits... 
As for American reluctance, how are you going to convince > anyone to double (at least) the storage requirements for their data, > to support languages they never use? That would have cost a great > deal of money. I would not think money has to be expressed in term of storage. Storage considerations are more likely a justification than an explanation for the reluctance. UTF-8 is such that on disk, and for applications using UTF-8 internally (there are a few), not a single bit is spent on extra storage for English. There are cases, and the current Python approach is one of them, Unicode may be made to be fairly unobtrusive on memory consumption, at least in English contexts. The complexity added by Unicode, however, may undoubtedly be a concern, for any implementor wanting to really address that standard, that is, further than merely toying with 16-bit characters. *This* means human time, and this is where the real cost lies. -- Fran?ois Pinard http://www.iro.umontreal.ca/~pinard From bac at OCF.Berkeley.EDU Tue Sep 14 21:44:16 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Tue Sep 14 21:44:28 2004 Subject: [Python-Dev] Re: PEP 292: method names In-Reply-To: <20040914190436.GA11541@panix.com> References: <4142E78C.7010800@heneryd.com> <4145DBB4.8010601@ocf.berkeley.edu> <41465E53.6050606@ocf.berkeley.edu> <20040914190436.GA11541@panix.com> Message-ID: <41474A10.4080008@ocf.berkeley.edu> Aahz wrote: > On Mon, Sep 13, 2004, Brett C. wrote: > >>Fredrik Lundh wrote: >> >>>Brett C wrote: >>> >>>>I am sure the way I tend to abbreviate things is not how anyone >>>>else would. So why would the stdlib try to? >>> >>>it's pretty amazing that you've been able to use Python without noticing >>>that the standard library is full of abbreviations. >> >>Just because the stdlib is full of abbreviations does not mean it should be >>continued. Precedence != acceptance . 
> > > What I find interesting about your responses is that you're using the > abbreviation "stdlib", assuming that your audience will understand that > easily enough. My audience is python-dev, and so I do assume they will know what the abbreviation is. But Template is not just for python-dev but the whole Python community so making assumptions is little dangerous. -Brett From tim.hochberg at ieee.org Tue Sep 14 21:50:00 2004 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Tue Sep 14 21:53:39 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040914112226.0354d8d0@mail.telecommunity.com> References: <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> <5.1.1.6.0.20040910140546.0298f130@mail.telecommunity.com> <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> <4146DE5D.3040702@theopalgroup.com> <5.1.1.6.0.20040914112226.0354d8d0@mail.telecommunity.com> Message-ID: Phillip J. Eby wrote: [CHOP] > > As for the numeric use cases, I'm not at all clear why &, |, and ~ (or > special methods/functions) aren't suitable. They often are, but sometimes you want a logical and/or/not and &/|/~ are mapped to bitwise and/or/not, which isn't always what you want. Presumably, if Gregs proposal were adopted, and/or/not would get mapped to numarray.logical_and/or/not. What I find more interesting about this proposal is that one could probably finagle it so that (A < B < C) worked correctly for arrays. It can't work now since it is equivalent to ((A < B) and (B < C)) and 'and' doesn't do anything sensible for arrays at present. This is one I always expect to work even though I know that and/or/not don't work for arrays. -tim From fumanchu at amor.org Tue Sep 14 22:41:57 2004 From: fumanchu at amor.org (Robert Brewer) Date: Tue Sep 14 22:47:55 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022ED4@exchange.hqamor.amorhq.net> Phillip J. 
Eby wrote: > At 11:39 AM 9/14/04 -0700, Robert Brewer wrote: > >We already have a de facto "code literal syntax": lambdas. > > > >db.query(lambda x: x.y==z and foo*bar<27) > > Right, but that requires non-portable bytecode disassembly. > If there was a simple way to convert from a function object > to an AST, I'd be happy with that too. If it were fast enough, I would too. > > > But, even if PEP 335 *were* implemented, creating a query > > > system using Python expressions would *still* be kludgy, > > > because you still need "seed variables" in the current > > > scope to write a query expression. > > > In my example above, I didn't need to bind 'x' or 'y' > > > or 'z' or 'foo' or 'bar', because the db.query() method > > > is going to interpret those in some context. If I > > > were using a PEP 335-based query system, I'd have to > > > initialize those variables to special querying objects first. > > > >A lot of that becomes a non-issue if you bind early. Once > the constants > >are bound, you're left with attribute access on your core > objects (x.y) > >and special functions (see logic.ieq or logic.today for > example). Again, > >too, I can use the lambda to evaluate Python objects, the > 'Object' side > >of "ORM". In that situation, the binding is a benefit. > > I'm not following what you mean by "bind early". My point > was that in > order to have bindings for seeds like 'x' and 'z' and 'foo', > most query > languages end up with hacks like 'tables.tablename.columname' or > '_.table.column' or other rigamarole, and that this is > usually more awkward > to deal with than the &/|/~ operator spelling. Dejavu addresses that by separating the "table binding" from the expression. That is, given: z = "Hansel" e = logic.Expression(lambda x: x.Name.startswith(z)) books = recall(myapp.Book, e) authors = recall(myapp.Author, e) ...'x' isn't bound within the Expression declaration; it is supplied as the first param to recall(). 
For example, you could apply the same Expression to both a Book class/table and an Author class/table within the same application, as above. IMO, this is a natural way to map the lambda-calculus to a query language, where the bound variable = ORM-object instances (a "table row"). But any free variables need to be resolved ASAP; therefore, z gets evaluated completely and immediately; Expression() rewrites the lambda co_code, replacing the closure lookup with a LOAD_CONST (sticking the value of z into co_consts). > > > That's why I say that an AST literal syntax would be much > > > more useful to me than PEP 335 for this type of use case. > > > >I seem to recall my AST version was quite slow, in pure Python. Can't > >recall whether that was all the tuple-unpacking or just my naive > >function-call overhead at the time. > > When I say AST, I just mean "some kind of syntax representation", not > necessarily the 'parser' module's current AST implementation. Sure. > However, I have found that it's possible to translate parser-module > AST's to query specifications quite efficiently in pure Python, > such that the overhead is minor compared to whatever actual > computation you're doing... Hmm, perhaps I'll look again. > >Anyway, for those reasons, I'm -0.5. > > On what? AST literals, or PEP 335? The PEP. ASTs would be better. A builtin early-binder would make me happiest, but I won't hold my breath. I don't think it would require new syntax, either, just something like codewalk.EarlyBinder() and .LambdaDecompiler() in a standard lib module somewhere. But I may go back and look at ASTs again. 
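The difference between resolving a free variable late and binding it early can be shown without any bytecode rewriting. The sketch below uses default-argument binding, a portable stand-in technique and not the co_code rewriting described in this message; the names `late` and `early` are chosen only for the example.

```python
z = "Hansel"

# Late binding: z is looked up each time the lambda is called.
late = lambda x: x.startswith(z)

# Early binding: the current value of z is captured when the lambda is created.
early = lambda x, z=z: x.startswith(z)

z = "Gretel"   # rebinding the name afterwards only affects the late version

print(late("Hansel and Gretel"))    # False, because it now tests for "Gretel"
print(early("Hansel and Gretel"))   # True, because it still tests for "Hansel"
```

An early-bound expression stays valid after the defining scope changes, which is what makes it reasonable to store or pickle it for later use.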
Robert Brewer MIS Amor Ministries fumanchu@amor.org From nhodgson at bigpond.net.au Tue Sep 14 23:41:45 2004 From: nhodgson at bigpond.net.au (Neil Hodgson) Date: Tue Sep 14 23:41:52 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP292:SimpleString Substitutions References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer> <4138D622.6050807@egenix.com> <1094315138.8696.36.camel@geddy.wooz.org> <413F1D9C.20209@egenix.com> <413F3605.7090707@egenix.com> <413F6120.7090603@egenix.com> <87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: <024501c49aa3$a1afb450$a44a8890@neil> James Y Knight: > It would also be possible to use UTF-8 string storage, although this > has the tradeoff that indexing an element takes linear time w.r.t. > position instead of constant time. At the cost of additional storage, indexing into UTF-8 by character rather than byte can be made better than linear. Two techniques are (1) maintain a list containing the byte index of some character index values (such as each line start) then use linear access from the closest known index and (2) to cache the most recent access due to the likelihood that the next access will be close. While I have thought about this problem, it has only once came up seriously for Scintilla (an editing component) and that was when someone was trying to provide a UCS2 facade that matched existing interfaces. Neil From barry at barrys-emacs.org Tue Sep 14 23:46:41 2004 From: barry at barrys-emacs.org (Barry Scott) Date: Tue Sep 14 23:47:24 2004 Subject: [Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292) In-Reply-To: <20040914191528.GA7964@alcyon.progiciels-bpi.ca> References: <091420041757.13290.414731190005E51E000033EA2200751150020E090E020E04000B970104040E@comcast.net> <20040914191528.GA7964@alcyon.progiciels-bpi.ca> Message-ID: <904FAAD7-0697-11D9-9E6D-000A95A8705A@barrys-emacs.org> On Sep 14, 2004, at 20:15, Fran?ois Pinard wrote: > Of course, this is the standard and official reason. 
Yet, the net
> effect of that concern and constraint, noticed by many foreigners, is
> that Unicode favours English.  (About "favouring" spelling, I find it
> amusing to spell-check my out-going email with a British dictionary.)

First there were national character sets. Working in more than one
language was a nightmare.

Then came ISO 10646, which gave every language its own unique set
of code points. But ISO 10646 is not easy to process, which led to the
development of unicode, which is easier to implement and work with but
could not originally deal with all the code points required for all the
world's languages. I believe that has been fixed now that you can have
32bit unicode.

Somewhere in the code point space you have to have ASCII. I'd be charitable
and say that it's pragmatic that it's in code page 0 given the history of
the computer industry.

From now on if you use unicode no language has an advantage,
all are equal and software authors stand a chance to create international
software.

Barry

From martin at v.loewis.de Tue Sep 14 23:48:16 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue Sep 14 23:48:21 2004
Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed?
In-Reply-To: <16711.12457.647107.816397@montanaro.dyndns.org>
References: <16711.7709.255870.851658@montanaro.dyndns.org> <5.1.1.6.0.20040914131736.05819db0@mail.telecommunity.com> <16711.12457.647107.816397@montanaro.dyndns.org>
Message-ID: <41476720.9060803@v.loewis.de>

Skip Montanaro wrote:
> Phillip> Here's the test case that's missing, then:
>
> Phillip> "[fe80::207:e9ff:fe9b]"
>
> Whoops.  Fixed.

The code was still incorrect. The square brackets don't belong to the
host name - they are part of the URL syntax. Before passing them to
the socket module, they need to be stripped off. I have now changed
httplib to do that right when parsing host:port.
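As an illustration of the parsing rule described above, here is a minimal sketch. It is not the code that was checked in; the function name `split_host_port` and the `default_port` parameter are assumptions made for the example. A colon introduces a port only if it comes after any closing bracket, and the brackets are stripped before the host is handed on.

```python
def split_host_port(netloc, default_port=80):
    """Split 'host[:port]' into (host, port); hypothetical helper for illustration."""
    colon = netloc.rfind(":")
    bracket = netloc.rfind("]")      # a bracketed IPv6 literal ends with ']'
    if colon > bracket:
        # The last colon lies outside the brackets, so it introduces a port.
        try:
            port = int(netloc[colon + 1:])
        except ValueError:
            raise ValueError("nonnumeric port: %r" % netloc[colon + 1:])
        host = netloc[:colon]
    else:
        host, port = netloc, default_port
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]            # brackets are URL syntax, not part of the address
    return host, port


# The host/port strings quoted in this thread:
for hp in ("[fe80::207:e9ff:fe9b]:8000", "[fe80::207:e9ff:fe9b]",
           "www.python.org:80", "www.python.org"):
    print(hp, "->", split_host_port(hp))
```

Searching from the right with `rfind` is what keeps the colons inside the IPv6 literal from being mistaken for the port separator.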
Regards, Martin From martin at v.loewis.de Wed Sep 15 00:03:05 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 15 00:03:10 2004 Subject: [Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292) In-Reply-To: <904FAAD7-0697-11D9-9E6D-000A95A8705A@barrys-emacs.org> References: <091420041757.13290.414731190005E51E000033EA2200751150020E090E020E04000B970104040E@comcast.net> <20040914191528.GA7964@alcyon.progiciels-bpi.ca> <904FAAD7-0697-11D9-9E6D-000A95A8705A@barrys-emacs.org> Message-ID: <41476A99.9030305@v.loewis.de> Barry Scott wrote: > Then came ISO 10646 which gave every language its own unique set > of code points. But ISO 10646 is not easy to process which lead to the > development of unicode that is easier to implement and work but could > not originally deal with all the code points required for all the worlds > languages. I think this is historically incorrect. ISO 10646 and Unicode were developed in lock-step, and the very first publication of ISO 10646 (in 1993) had precisely the same character assignments as Unicode 1.1. Ever since then, both standards are roughly the same. > I believe that was been fixed now you can have 32bit unicode. This is also incorrect. Unicode now has roughly 20.09 bits. ISO 10646 used to have 32 bits, but now also restricts itself to 20.09 bits. There are encodings of it which take four octets per code point. > Somewhere in the code point space you have to have ASCII. I'd be charitable > and say that its pragmatic that its in code page 0 given the history of > the computer > industry. Strictly speaking, this is group 0, plane 0, row 0 (actually, only the first 128 cells of this row). > From now on if you use unicode no language has an advantage, > all are equal and software authors stand a chance to create international > software. ... assuming encodings are the only issue in creating international software. 
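The "roughly 20.09 bits" figure follows from the size of the code space, U+0000 through U+10FFFF. A short check of the arithmetic, written for a current Python 3 interpreter (an assumption; the thread predates it):

```python
import math

# Unicode code points run from U+0000 to U+10FFFF: 17 planes of 65,536 cells.
code_points = 0x10FFFF + 1
assert code_points == 17 * 2**16

bits = math.log2(code_points)
print(round(bits, 2))                 # 20.09

# A fixed-width encoding such as UTF-32 still spends four octets per code point.
octets = len("\U0010FFFF".encode("utf-32-be"))
print(octets)                         # 4
```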
Regards, Martin From martin at v.loewis.de Wed Sep 15 00:04:52 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 15 00:04:55 2004 Subject: [Python-Dev] Re: Re: Alternative Implementation for PEP 292:SimpleString Substitutions In-Reply-To: References: <007301c491e7$13f1c0a0$e841fea9@oemcomputer><4138D622.6050807@egenix.com><1094315138.8696.36.camel@geddy.wooz.org><413F1D9C.20209@egenix.com><413F3605.7090707@egenix.com><413F6120.7090603@egenix.com><87ekl6re7n.fsf@tleepslib.sk.tsukuba.ac.jp> Message-ID: <41476B04.3020009@v.loewis.de> James Y Knight wrote: > Of course it is perfectly possible to have the Python unicode > implementation choose to represent some unicode strings with only 8 bits > per character. That would break the C API, though, which is part of Python. Regards, Martin From pinard at iro.umontreal.ca Wed Sep 15 01:58:19 2004 From: pinard at iro.umontreal.ca (=?iso-8859-1?Q?Fran=E7ois?= Pinard) Date: Wed Sep 15 01:59:01 2004 Subject: [Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292) In-Reply-To: <904FAAD7-0697-11D9-9E6D-000A95A8705A@barrys-emacs.org> References: <091420041757.13290.414731190005E51E000033EA2200751150020E090E020E04000B970104040E@comcast.net> <20040914191528.GA7964@alcyon.progiciels-bpi.ca> <904FAAD7-0697-11D9-9E6D-000A95A8705A@barrys-emacs.org> Message-ID: <20040914235819.GA10975@alcyon.progiciels-bpi.ca> [Barry Scott] > Then came ISO 10646 which gave every language its own unique set > of code points. Many languages at most. That's far from "every language". And some languages, and not the least, were not satisfied with ISO 10646, many countries long resisted its adoption as a national standard. > But ISO 10646 is not easy to process which lead to the development of > unicode [...] ISO 10646 and Unicode converged. Unicode was the fact of an industry consortium, ISO 10646 was more in the realm of international standards. 
Why do you say that ISO 10646 was especially "not easy to process"? > that is easier to implement and work but could not originally deal > with all the code points required for all the worlds languages. Before the convergence, ISO 10646 more than Unicode was designed for many code points, and so, ISO 10646 was more opened to many languages. > I believe that was been fixed now you can have 32bit unicode. Neither ISO 10646 nor Unicode are 32 bits. The limit is 31 bits. > From now on if you use unicode no language has an advantage, all are > equal and software authors stand a chance to create international > software. English has a clear and definite advantage in Unicode, and this is reflected in various Unicode-aware programs. Taking Python as an mere example, English texts may be translated from `unicode' to `str' without raising an exception -- not many languages benefit of this property. Some languages have all their characters pre-combined in Unicode, and these have the advantage over the others of needing only one code point per character. Lately introduced languages met the established resistance of Unicode (and W3C) to any new pre-combined characters, and have to cope with zero-width diacritics, so inducing purely artificial complexities in programs. Unicode might well have granted them the same service as early comers. And there are more complex or difficult things which are needed by some languages when Unicoded, still unneeded by the above languages, directionality marks quickly come to mind. Software authors will support Unicode more or less deeply depending on the fact they aim German, Hebrew or Korean. I do not think most American-centric applications will go very far supporting Unicode. For real and complete Unicode support, software authors are only equal by the hell they have to suffer. I hardly call this a "chance"! 
:-) -- Fran?ois Pinard http://www.iro.umontreal.ca/~pinard From greg at cosc.canterbury.ac.nz Wed Sep 15 03:15:09 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Sep 15 03:15:16 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> Message-ID: <200409150115.i8F1F9Y4012592@cosc353.cosc.canterbury.ac.nz> "Phillip J. Eby" : > So, something like this: > > query("x and y or z") > > isn't "code that performs database queries"? Yes, but it's not Python code - it's SQL code wrapped in a string wrapped in Python code. I want just Python code. > My main concern about the PEP is that it adds overhead to *all* > logical operations, but the feature will only benefit code that > hasn't yet been written. The overhead shouldn't be substantially worse than that already incurred by all the other operators being overloadable. Also, realistically, how much code do you think has boolean operations as a speed bottleneck? I find it hard to imagine what such code would be like. > I also fear that as a result, people will start writing complex > if-then blocks to "optimize" performance of conditionals to get them > back to where they were before the facility was added. If people do that, they're guilty of premature optimisation if they haven't actually measured the speed of their code and found an actual problem with it. I expect such cases will be extremely rare if they occur at all. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Wed Sep 15 03:54:51 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Sep 15 03:54:56 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <20040914133233.29723.1901953221.divmod.quotient.1109@ohm> Message-ID: <200409150154.i8F1spLt012644@cosc353.cosc.canterbury.ac.nz> exarkun@divmod.com: > Python's parser is already available, through the compiler module. > The example given earlier, query("x and y or z"), is relatively > straightforward to implement as a set of AST manipulations. But that misses the point, which is to have the expression blend in seamlessly with the rest of the Python code. Anything which requires the explicit invocation of a separate parsing phase prevents that. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From jhylton at gmail.com Wed Sep 15 04:24:24 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Wed Sep 15 04:24:56 2004 Subject: [Python-Dev] --with-tsc compile fails Message-ID: I'm feeling pretty out of it :-). I'm very happy to see that the Pentium tsc patch made it into the core; I had missed it. I'm amused that the Pentium tsc patch works for PPC, too. 
Anyway, I tried to use it this evening and the compilation failed:

../Python/ceval.c:50:21: asm/msr.h: No such file or directory
../Python/ceval.c: In function `PyEval_EvalFrame':
../Python/ceval.c:575: warning: implicit declaration of function `rdtscll'
../Python/ceval.c:572: warning: `inst0' might be used uninitialized in this function
../Python/ceval.c:572: warning: `inst1' might be used uninitialized in this function
../Python/ceval.c:572: warning: `loop0' might be used uninitialized in this function
../Python/ceval.c:572: warning: `loop1' might be used uninitialized in this function

It sounds like is for Microsoft platforms, but I'm
building on Linux.  Perhaps the change to add PPC support screwed up
the ifdefs that were detecting a Windows compile?  Does it work for
anyone else?

Jeremy

From greg at cosc.canterbury.ac.nz Wed Sep 15 04:44:12 2004
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed Sep 15 04:44:22 2004
Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators
In-Reply-To:
Message-ID: <200409150244.i8F2iCfi012752@cosc353.cosc.canterbury.ac.nz>

> What I find more interesting about this proposal is that one could
> probably finagle it so that (A < B < C) worked correctly for arrays.

Yes. Despite what I said earlier, I've now decided that the
new semantics should be extended to A < B < C as well.
I'll update the pep & patch at some point to reflect this.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From ilya at bluefir.net Wed Sep 15 05:06:30 2004
From: ilya at bluefir.net (Ilya Sandler)
Date: Wed Sep 15 05:07:36 2004
Subject: [Python-Dev] --with-tsc compile fails
In-Reply-To:
References:
Message-ID:

> It sounds like is for Microsoft platforms, but I'm
> building on Linux.
Perhaps the change to add PPC support screwed up > the ifdefs that were detecting a Windows compile? Does it work for > anyone else? /usr/include/asm/msr.h exists on my linux system (mixed Debian 3.0) (msr.h came with linux-kernel-headers package, my kernel version 2.4.25) and compile with WITH_TSC defined worked fine for me about a week ago Ilya PS. just checked my other ancient RedHat 7.2 install and it also has /usr/include/asm/msr.h On Tue, 14 Sep 2004, Jeremy Hylton wrote: > I'm feeling pretty out of it :-). I'm very happy to see that the > Pentium tsc patch made it into the core; I had missed it. I'm amused > that the Pentium tsc patch works for PPC, too. Anyway, I tried to use > it this evening and the compilation failed: > > ../Python/ceval.c:50:21: asm/msr.h: No such file or directory > ../Python/ceval.c: In function `PyEval_EvalFrame': > ../Python/ceval.c:575: warning: implicit declaration of function `rdtscll' > ../Python/ceval.c:572: warning: `inst0' might be used uninitialized in > this function > ../Python/ceval.c:572: warning: `inst1' might be used uninitialized in > this function > ../Python/ceval.c:572: warning: `loop0' might be used uninitialized in > this function > ../Python/ceval.c:572: warning: `loop1' might be used uninitialized in > this function > > It sounds like is for Microsoft platforms, but I'm > building on Linux. Perhaps the change to add PPC support screwed up > the ifdefs that were detecting a Windows compile? Does it work for > anyone else? > > Jeremy > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ilya%40bluefir.net > From pje at telecommunity.com Wed Sep 15 05:15:37 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed Sep 15 05:16:00 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <200409150115.i8F1F9Y4012592@cosc353.cosc.canterbury.ac.nz> References: <5.1.1.6.0.20040914002619.02aaf8c0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040914225734.02433b50@mail.telecommunity.com> At 01:15 PM 9/15/04 +1200, Greg Ewing wrote: >"Phillip J. Eby" : > > > So, something like this: > > > > query("x and y or z") > > > > isn't "code that performs database queries"? > >Yes, but it's not Python code - it's SQL code wrapped >in a string wrapped in Python code. I want just Python >code. But if this were possible: query(``x and y or z``) such that the expression ``x and y or z`` results in a Python AST for that expression, then you'd be able to do whatever you want with it. > > My main concern about the PEP is that it adds overhead to *all* > > logical operations, but the feature will only benefit code that > > hasn't yet been written. > >The overhead shouldn't be substantially worse than that already >incurred by all the other operators being overloadable. Also, >realistically, how much code do you think has boolean operations as a >speed bottleneck? I find it hard to imagine what such code would be >like. So it's acceptable to slow down all logical operations, add new byte codes, and expand the size of the eval loop, all to support a niche usage? That doesn't make sense to me. Again, I'm not familiar with the numeric use cases, but I am familiar with algebraic manipulation of Python code for SQL generation and other purposes, and I honestly don't see any benefit to the PEP for those purposes. AST's are more useful, and I'd support a PEP to make code expressible as literals, because that wouldn't impose overhead on systems that doesn't use them. (For one thing, they could be expressed as constants in code objects, so the bytecode would just be LOAD_CONST.) 
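A rough sketch of the AST-manipulation approach discussed above, for readers following the thread today. It assumes the modern standard-library `ast` module rather than the 2004-era `compiler` or `parser` modules, and `to_sql` is a hypothetical helper name, not part of any library. It handles only bare names combined with `and`, `or`, and `not`.

```python
import ast


def to_sql(expr):
    """Translate a Python boolean expression over bare names into SQL-like text.

    Hypothetical helper for illustration only: it supports names, and/or, and not.
    """
    def render(node):
        if isinstance(node, ast.BoolOp):
            # A chain such as "a and b and c" arrives as one BoolOp with three values.
            joiner = " AND " if isinstance(node.op, ast.And) else " OR "
            return "(" + joiner.join(render(value) for value in node.values) + ")"
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return "(NOT " + render(node.operand) + ")"
        if isinstance(node, ast.Name):
            return node.id
        raise ValueError("unsupported syntax: " + type(node).__name__)

    # mode="eval" parses a single expression; .body is its root node.
    return render(ast.parse(expr, mode="eval").body)


print(to_sql("x and y or z"))
```

Note that the expression still has to be passed in as a string, which is exactly the separate parsing step Greg objects to; the sketch only shows that the tree walk itself is short.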
For the numeric use cases, frankly I don't see why one would want to apply short-circuiting boolean operators to arrays, since presumably the values in them have already been evaluated. And if the idea is to make them *not* be short-circuting operators, that seems to me to corrupt the whole point of the logical operators versus their bitwise counterparts. From greg at cosc.canterbury.ac.nz Wed Sep 15 06:34:46 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed Sep 15 06:34:58 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040914225734.02433b50@mail.telecommunity.com> Message-ID: <200409150434.i8F4YkpL012903@cosc353.cosc.canterbury.ac.nz> "Phillip J. Eby" : > For the numeric use cases, frankly I don't see why one would want to > apply short-circuiting boolean operators to arrays, since presumably > the values in them have already been evaluated. And if the idea is > to make them *not* be short-circuting operators, that seems to me to > corrupt the whole point of the logical operators versus their > bitwise counterparts. There's more to it than short-circuiting. Consider a = array([42, ""]) b = array([(), "spam"]) One might reasonably expect the result of 'a or b' to be array([42, "spam"]) which is considerably different from a bitwise operation. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From martin at v.loewis.de Wed Sep 15 07:53:13 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 15 07:53:17 2004 Subject: [Python-Dev] tempfile.TemporaryFile on Windows NT Message-ID: <4147D8C9.3020508@v.loewis.de> The tempfile module has a wrapper class to implement delete on close. 
On NT+, this is not necessary, since the system supports the O_TEMPORARY flag. However the wrapper is still created 'so that file.name is useful (i.e. not "(fdopen)"'. I find this a weak argument, since file.name is also "fdopen" on POSIX. So I would like to drop the wrapper object on Windows NT, and have tempfile.TemporaryFile return a proper file object. Any objections? If there are objections, would they change if file.name would point uniformly to the file name of the temporary file? If so, should this be better achieved by os.fdopen grow a name argument, or by using builtin open() in the first place? On Windows, one can pass the additional "D" flag to open() to get a delete-on-close file. Regards, Martin From foom at fuhm.net Wed Sep 15 08:33:50 2004 From: foom at fuhm.net (James Y Knight) Date: Wed Sep 15 08:33:57 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <200409150434.i8F4YkpL012903@cosc353.cosc.canterbury.ac.nz> References: <200409150434.i8F4YkpL012903@cosc353.cosc.canterbury.ac.nz> Message-ID: <34EB398C-06E1-11D9-AC9C-000A95A50FB2@fuhm.net> On Sep 15, 2004, at 12:34 AM, Greg Ewing wrote: > There's more to it than short-circuiting. Consider > > a = array([42, ""]) > b = array([(), "spam"]) > > One might reasonably expect the result of 'a or b' to > be > > array([42, "spam"]) > > which is considerably different from a bitwise operation. One might, but *I* would reasonably expect it to give me array a, by extrapolation from every other data type in python. Consider also this: x and 4 or 5 which is of course a common idiom to workaround the lack of an if-then-else expression. So, try with x = array([42, 0]) Currently, doing this with numarray raises an exception "An array doesn't make sense as a truth value. Use sometrue(a) or alltrue(a).". Odd, since nearly all python objects can somehow be turned into a truth value, but ok. 
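A small sketch of why the idiom depends on the truth value of `x`. It is written for modern Python 3, where the hook is spelled `__bool__` (it was `__nonzero__` in the Python 2 of this thread). `NoTruthArray` and `pick` are hypothetical names, and the class is only a stand-in for an array type that refuses a truth value, not numarray's actual implementation.

```python
class NoTruthArray:
    """Hypothetical stand-in for an array type that refuses a truth value."""

    def __init__(self, items):
        self.items = list(items)

    def __bool__(self):  # spelled __nonzero__ in Python 2
        raise ValueError("An array doesn't make sense as a truth value.")


def pick(x):
    # The pre-conditional-expression idiom: 4 if x is true, otherwise 5.
    # "and" and "or" return one of their operands, after asking x for its truth value.
    return x and 4 or 5


print(pick(1))
print(pick(0))
print(pick([]))

try:
    pick(NoTruthArray([42, 0]))
except ValueError as exc:
    print("refused:", exc)
```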
[Forbidding __nonzero__ prevents horrible mistakes from occurring because of the misuse of the comparison operators as element-wise comparison. "if array([1,2,3]) == array([3,2,1]): print 'Bad'" of course oughtn't print 'Bad'.] However, with this change, it may instead return: array([4, 5]) and that's nothing like what was meant. The idiom would change to: bool(x) and 4 or 5 I suppose... James PS: Perl6 has distinct element-wise operators ("hyper" operators). I find that less distasteful than misusing regular operators as element-wise operators, when they really have vastly different semantics. From tim.hochberg at ieee.org Wed Sep 15 08:48:16 2004 From: tim.hochberg at ieee.org (Tim Hochberg) Date: Wed Sep 15 08:48:29 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <200409150434.i8F4YkpL012903@cosc353.cosc.canterbury.ac.nz> References: <5.1.1.6.0.20040914225734.02433b50@mail.telecommunity.com> <200409150434.i8F4YkpL012903@cosc353.cosc.canterbury.ac.nz> Message-ID: Greg Ewing wrote: > "Phillip J. Eby" : > > >>For the numeric use cases, frankly I don't see why one would want to >>apply short-circuiting boolean operators to arrays, since presumably >>the values in them have already been evaluated. And if the idea is >>to make them *not* be short-circuting operators, that seems to me to >>corrupt the whole point of the logical operators versus their >>bitwise counterparts. > > > There's more to it than short-circuiting. Consider > > a = array([42, ""]) > b = array([(), "spam"]) > > One might reasonably expect the result of 'a or b' to > be > > array([42, "spam"]) > > which is considerably different from a bitwise operation. Another example from numarray land. You can pick out subarrays, by indexing with an array of booleans, which can be pretty slick. 
>>> import numarray as na
>>> a = na.arange(9)
>>> a[a < 4]
array([0, 1, 2, 3])

You would like a[2 < a < 4] to work, but instead you need:

>>> a[(2 < a) & (a < 4)]

Greg's proposal could fix this.

Or suppose you want to find the logical and of a, b. Consider trying to use bitwise ops:

>>> a = na.array([1,1,1,1]) # all true
>>> b = na.array([2,2,2,2]) # all true
>>> a & b
array([0, 0, 0, 0]) # oops, that's why there's logical_and
>>> na.logical_and(a,b)
array([1, 1, 1, 1], type=Bool)
>>> (a!=0) & (b!=0) # this also works, but it does 3x as much work
array([1, 1, 1, 1], type=Bool)

Again with Greg's proposal one could write 'a and b' for this. Much nicer.

It's not that you couldn't make numarrays short circuit. In the expression "a and b", if all the elements of a are false, then we can skip evaluating b. I'm just not sure that this is a good idea.

-tim

From dgm at ecs.soton.ac.uk Wed Sep 15 11:24:48 2004
From: dgm at ecs.soton.ac.uk (David G Mills)
Date: Wed Sep 15 11:32:20 2004
Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed?
In-Reply-To: <41476720.9060803@v.loewis.de>
Message-ID:

And where can we get a copy of this new 'official' httplib?

David.

On Tue, 14 Sep 2004, "Martin v. Löwis" wrote:

> Skip Montanaro wrote:
> > Phillip> Here's the test case that's missing, then:
> >
> > Phillip> "[fe80::207:e9ff:fe9b]"
> >
> > Whoops. Fixed.
>
> The code was still incorrect. The square brackets don't belong
> to the host name - they are part of the URL syntax. Before passing
> them to the socket module, they need to be stripped off. I have now
> changed httplib to do that right when parsing host:port.
>
> Regards,
> Martin
>

From FBatista at uniFON.com.ar Wed Sep 15 14:53:55 2004
From: FBatista at uniFON.com.ar (Batista, Facundo)
Date: Wed Sep 15 14:58:36 2004
Subject: [Python-Dev] Trying to extract documentation from the CVS
Message-ID:

I'm packaging decimal.
In part because of the suggestion of Alex Martelli, and in part because, as I need it for my SiGeFi project, it's a must to offer the user to download it separately if he/she has Py2.3 and cannot upgrade. The main issue I have is including the documentation. I want to include something like a "decimal.pdf" with the decimal documentation only. So I copied the libdecimal.tex and tried to convert it, and I couldn't. I'm a completely tex newbie, but I think that there's an issue with the syntax (that the file uses it own and I don't know which files use). I've generated the documentation from the CVS files (with the make), so I guess I have all the necessary support programs in my machine (not here at office, at home). So, the questions are: - Is possible to extract only one file and generate a .pdf from it? And a .html? - There's somewhere a how-to? Or the procedure is so simple that is not needed? - Which files from CVS I need? Sorry if some of these questions are not python-dev specific and could be answered only with tex knowledge. Thank you very much. Facundo Batista Desarrollo de Red fbatista@unifon.com.ar (54 11) 5130-4643 Cel: 15 5097 5024 From mwh at python.net Wed Sep 15 15:35:07 2004 From: mwh at python.net (Michael Hudson) Date: Wed Sep 15 15:35:08 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040914144853.0230b0e0@mail.telecommunity.com> (Phillip J. Eby's message of "Tue, 14 Sep 2004 15:02:31 -0400") References: <5.1.1.6.0.20040914144853.0230b0e0@mail.telecommunity.com> Message-ID: <2m1xh3y7sk.fsf@starship.python.net> "Phillip J. Eby" writes: > When I say AST, I just mean "some kind of syntax representation", not > necessarily the 'parser' module's current AST implementation. That ... thing ... isn't an AST in any sense of the word. 
I know the documentation and the method names (used to) suggest it is, but that doesn't make it true :) Cheers, mwh -- It's relatively seldom that desire for sex is involved in technology procurement decisions. -- ESR at EuroPython 2002 From mwh at python.net Wed Sep 15 15:51:55 2004 From: mwh at python.net (Michael Hudson) Date: Wed Sep 15 15:51:57 2004 Subject: [Python-Dev] --with-tsc compile fails In-Reply-To: (Jeremy Hylton's message of "Tue, 14 Sep 2004 22:24:24 -0400") References: Message-ID: <2mwtyvwsg4.fsf@starship.python.net> Jeremy Hylton writes: > I'm feeling pretty out of it :-). I'm very happy to see that the > Pentium tsc patch made it into the core; I had missed it. I'm amused > that the Pentium tsc patch works for PPC, too. I did consider changing all the names but couldn't be bothered. > Anyway, I tried to use it this evening and the compilation failed: > > ../Python/ceval.c:50:21: asm/msr.h: No such file or directory > ../Python/ceval.c: In function `PyEval_EvalFrame': > ../Python/ceval.c:575: warning: implicit declaration of function `rdtscll' > ../Python/ceval.c:572: warning: `inst0' might be used uninitialized in > this function > ../Python/ceval.c:572: warning: `inst1' might be used uninitialized in > this function > ../Python/ceval.c:572: warning: `loop0' might be used uninitialized in > this function > ../Python/ceval.c:572: warning: `loop1' might be used uninitialized in > this function > > It sounds like is for Microsoft platforms, but I'm > building on Linux. Perhaps the change to add PPC support screwed up > the ifdefs that were detecting a Windows compile? Well, it failed like that for me both before and after my PPC changes. I'm fairly sure I didn't mess this up. Maybe there's some kernel-headers package that's necessary. OTOH, I think one could replace the include by #define rdtscll(val) \ __asm__ __volatile__("rdtsc" : "=A" (val)) if my limited googling is anything to go by. 
It also seems asm/msr.h is a "kernel internal header with absolutely no stable API properties...." (Redhat bugzilla). So, now I've written this email , I think we should take out the include and put in the #define. Anyone who cares about, e.g., Windows can find out how to make their compiler do this. Cheers, mwh -- Presumably pronging in the wrong place zogs it. -- Aldabra Stoddart, ucam.chat From theller at python.net Wed Sep 15 17:27:24 2004 From: theller at python.net (Thomas Heller) Date: Wed Sep 15 17:27:32 2004 Subject: [Python-Dev] PyExc_UnicodeDecodeError Message-ID: Can anyone explain why calling this code in a C extension static PyObject * test(PyObject *self, PyObject *arg) { PyErr_SetString(PyExc_UnicodeDecodeError, "blah blah"); return NULL; } PyMethodDef module_methods[] = { {"test", test, METH_NOARGS}, {NULL, NULL} }; does this (same in 2.3.4, and 2.4 current CVS): >>> from somewhere import test >>> test() Traceback (most recent call last): File "", line 1, in ? TypeError: function takes exactly 5 arguments (1 given) >>> Thomas From mal at egenix.com Wed Sep 15 17:35:36 2004 From: mal at egenix.com (M.-A. Lemburg) Date: Wed Sep 15 17:35:40 2004 Subject: [Python-Dev] PyExc_UnicodeDecodeError In-Reply-To: References: Message-ID: <41486148.7090007@egenix.com> Thomas Heller wrote: > Can anyone explain why calling this code in a C extension > > static PyObject * > test(PyObject *self, PyObject *arg) > { > PyErr_SetString(PyExc_UnicodeDecodeError, "blah blah"); > return NULL; > } > > PyMethodDef module_methods[] = { > {"test", test, METH_NOARGS}, > {NULL, NULL} > }; > > > does this (same in 2.3.4, and 2.4 current CVS): > > >>>>from somewhere import test >>>>test() > > Traceback (most recent call last): > File "", line 1, in ? 
> TypeError: function takes exactly 5 arguments (1 given) > See Python/exceptions.c: PyObject * PyUnicodeDecodeError_Create( const char *encoding, const char *object, int length, int start, int end, const char *reason) { return PyObject_CallFunction(PyExc_UnicodeDecodeError, "ss#iis", encoding, object, length, start, end, reason); } This exception is thrown by codecs that want to signal a decoding error. It includes the context of the problem as well as the reason string. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 15 2004) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From pje at telecommunity.com Wed Sep 15 17:56:31 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 15 17:57:19 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: References: <200409150434.i8F4YkpL012903@cosc353.cosc.canterbury.ac.nz> <5.1.1.6.0.20040914225734.02433b50@mail.telecommunity.com> <200409150434.i8F4YkpL012903@cosc353.cosc.canterbury.ac.nz> Message-ID: <5.1.1.6.0.20040915115358.033df630@mail.telecommunity.com> At 11:48 PM 9/14/04 -0700, Tim Hochberg wrote: >Again with Greg's proposal one could write 'a and b' for this. Much nicer. > >It's not that you couldn't make numarrays short circuit. In the expression >"a and b", if all the elements of a are false, then we can skip evaluating >b. I'm just not sure that this is a good idea. My point is that the idea of using 'and' in order to implement something that's *not* short-circuiting seems like a bad idea. 
I'd rather see array-specific operators added, or some sort of infix notation for functions so that you can define custom operators for such specialized usages. From pje at telecommunity.com Wed Sep 15 17:58:27 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 15 17:59:12 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <2m1xh3y7sk.fsf@starship.python.net> References: <5.1.1.6.0.20040914144853.0230b0e0@mail.telecommunity.com> <5.1.1.6.0.20040914144853.0230b0e0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040915115744.03364ae0@mail.telecommunity.com> At 02:35 PM 9/15/04 +0100, Michael Hudson wrote: >"Phillip J. Eby" writes: > > > When I say AST, I just mean "some kind of syntax representation", not > > necessarily the 'parser' module's current AST implementation. > >That ... thing ... isn't an AST in any sense of the word. > >I know the documentation and the method names (used to) suggest it is, >but that doesn't make it true :) Well, it's definitely syntax and it's definitely a tree, so it's at least an ST. :) From theller at python.net Wed Sep 15 18:01:07 2004 From: theller at python.net (Thomas Heller) Date: Wed Sep 15 18:01:16 2004 Subject: [Python-Dev] PyExc_UnicodeDecodeError In-Reply-To: <41486148.7090007@egenix.com> (M.'s message of "Wed, 15 Sep 2004 17:35:36 +0200") References: <41486148.7090007@egenix.com> Message-ID: "M.-A. Lemburg" writes: > Thomas Heller wrote: >> Can anyone explain why calling this code in a C extension >> static PyObject * >> test(PyObject *self, PyObject *arg) >> { >> PyErr_SetString(PyExc_UnicodeDecodeError, "blah blah"); >> return NULL; >> } >> PyMethodDef module_methods[] = { >> {"test", test, METH_NOARGS}, >> {NULL, NULL} >> }; >> does this (same in 2.3.4, and 2.4 current CVS): >> >>>>>from somewhere import test >>>>>test() >> Traceback (most recent call last): >> File "", line 1, in ? 
>> TypeError: function takes exactly 5 arguments (1 given)
>>
>
> See Python/exceptions.c:
>
> PyObject * PyUnicodeDecodeError_Create(
>     const char *encoding, const char *object, int length,
>     int start, int end, const char *reason)
> {
>     return PyObject_CallFunction(PyExc_UnicodeDecodeError, "ss#iis",
>         encoding, object, length, start, end, reason);
> }
>
> This exception is thrown by codecs that want to signal a
> decoding error. It includes the context of the problem as
> well as the reason string.

Thanks, this makes sense. The real problem I wanted to solve is a little bit less contrived ;-)

In this context: I find Exceptions being much too underdocumented. Not only that a lot of built-in exceptions are not listed in , also I find the description for the exceptions here very difficult to understand, if you want to define a subclass of, for example, WindowsError for your own code.

A much more interesting and understandable reading is the exceptions.py module which was last used in 1.5, afaik.

I'm not sure what can be done about that, maybe keep exceptions.py in sync (although unused) with the current code, and point to it from the docs?

Thomas

From martin at v.loewis.de Wed Sep 15 20:22:18 2004
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed Sep 15 20:22:22 2004
Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed?
In-Reply-To:
References:
Message-ID: <4148885A.5090803@v.loewis.de>

David G Mills wrote:
> And where can we get a copy of this new 'official' httplib?

As usual: In the CVS.
Regards, Martin From martin at v.loewis.de Wed Sep 15 20:43:13 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 15 20:43:16 2004 Subject: [Python-Dev] --with-tsc compile fails In-Reply-To: <2mwtyvwsg4.fsf@starship.python.net> References: <2mwtyvwsg4.fsf@starship.python.net> Message-ID: <41488D41.9090905@v.loewis.de> Michael Hudson wrote: > Well, it failed like that for me both before and after my PPC changes. > I'm fairly sure I didn't mess this up. Maybe there's some > kernel-headers package that's necessary. > > OTOH, I think one could replace the include by > > #define rdtscll(val) \ > __asm__ __volatile__("rdtsc" : "=A" (val)) > > if my limited googling is anything to go by. It also seems asm/msr.h > is a "kernel internal header with absolutely no stable API > properties...." (Redhat bugzilla). I'ld still like to understand why it fails for your system (it works fine on mine). Do you have a definition for rdtscll in /usr/include/asm/msr.h? Is it a define like the one you just put there? If so, why does the macro not expand? Regards, Martin From martin at v.loewis.de Wed Sep 15 20:46:06 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 15 20:46:08 2004 Subject: [Python-Dev] PyExc_UnicodeDecodeError In-Reply-To: References: <41486148.7090007@egenix.com> Message-ID: <41488DEE.1030506@v.loewis.de> Thomas Heller wrote: > In this context: I find Exceptions being much too underdocumented. [...] > I'm not sure what there can be done about that, maybe keep exceptions.py > in sync (although unused) with the current code, and point to it from > the docs? The best solution for missing, incomplete, and incomprehensible documentation is to add, complete, and rewrite the documentation. Do you volunteer? 
Regards,
Martin

From tim.peters at gmail.com Wed Sep 15 20:51:16 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Wed Sep 15 20:51:19 2004
Subject: [Python-Dev] tempfile.TemporaryFile on Windows NT
In-Reply-To: <4147D8C9.3020508@v.loewis.de>
References: <4147D8C9.3020508@v.loewis.de>
Message-ID: <1f7befae0409151151486156b@mail.gmail.com>

[Martin v. Löwis]
> The tempfile module has a wrapper class to implement
> delete on close. On NT+, this is not necessary, since
> the system supports the O_TEMPORARY flag. However
> the wrapper is still created 'so that file.name is useful
> (i.e. not "(fdopen)"'. I find this a weak argument, since
> file.name is also "fdopen" on POSIX.

File names are much more important on Windows, because Windows doesn't allow renaming or deleting an open file, two things Unixheads are apparently incapable of avoiding. Without a name, debugging Unixhead code on Windows gets harder. Files on Windows, at least from the std C level, always have names.

> So I would like to drop the wrapper object on Windows NT,
> and have tempfile.TemporaryFile return a proper file
> object. Any objections?

I would sorely miss knowing the name.

> If there are objections, would they change if file.name
> would point uniformly to the file name of the temporary
> file?

Yes, but file.name is read-only now -- you can't change file.name from Python code.

> If so, should this be better achieved by os.fdopen grow
> a name argument, or by using builtin open() in the first
> place? On Windows, one can pass the additional "D" flag
> to open() to get a delete-on-close file.

The latter doesn't fly, because there's no flag to open() that gets the effect of the Windows O_NOINHERIT, and O_NOINHERIT is a pragmatic necessity for sane use of temp files on Windows.
Indeed, although the docs don't say this, without O_NOINHERIT even O_TEMPORARY isn't reliable (program P creates a file F w/ O_TEMPORARY but not O_NOINHERIT; P spawns program Q; Q inherits F's file descriptor, but does *not* inherit the "delete on close" info about F; P exits; F is not deleted then because a handle is still open on F (in Q); Q exits; F isn't deleted then either because Q never knew that F was a "delete on close" file; P and Q are both gone now, but F never goes away; specifying O_NOINHERIT too stops this). BTW, the docs also don't say this: a file created with O_TEMPORARY cannot be opened by name again, not even by the process that created the file. That's why there's no "security risk" in having a named O_TEMPORARY file visible in the filesystem on Windows (although, as above, that can lose if O_NOINHERIT isn't used too, or even if the creating process goes away without running the C runtime cleanup code). From jhylton at gmail.com Wed Sep 15 20:56:43 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Wed Sep 15 20:56:47 2004 Subject: [Python-Dev] --with-tsc compile fails In-Reply-To: <2mwtyvwsg4.fsf@starship.python.net> References: <2mwtyvwsg4.fsf@starship.python.net> Message-ID: On Wed, 15 Sep 2004 14:51:55 +0100, Michael Hudson wrote: > Jeremy Hylton writes: > > > I'm feeling pretty out of it :-). I'm very happy to see that the > > Pentium tsc patch made it into the core; I had missed it. I'm amused > > that the Pentium tsc patch works for PPC, too. > > I did consider changing all the names but couldn't be bothered. There's nothing wrong with amusing names for obscure stuff like this :-). > OTOH, I think one could replace the include by > > #define rdtscll(val) \ > __asm__ __volatile__("rdtsc" : "=A" (val)) > > if my limited googling is anything to go by. It also seems asm/msr.h > is a "kernel internal header with absolutely no stable API > properties...." (Redhat bugzilla). 
>
> So, now I've written this email, I think we should take out the
> include and put in the #define.

I'll give it a try tonight. I double-checked and my somewhat tweaked RH Linux distro doesn't have an asm/msr.h. I'd rather not try to find out if there is an rdtscll() defined somewhere else.

jeremy

From barry at barrys-emacs.org Wed Sep 15 20:56:34 2004
From: barry at barrys-emacs.org (Barry Scott)
Date: Wed Sep 15 20:57:18 2004
Subject: [Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292)
In-Reply-To: <41476A99.9030305@v.loewis.de>
References: <091420041757.13290.414731190005E51E000033EA2200751150020E090E020E04000B970104040E@comcast.net> <20040914191528.GA7964@alcyon.progiciels-bpi.ca> <904FAAD7-0697-11D9-9E6D-000A95A8705A@barrys-emacs.org> <41476A99.9030305@v.loewis.de>
Message-ID:

On Sep 14, 2004, at 23:03, Martin v. Löwis wrote:

> I think this is historically incorrect. ISO 10646 and Unicode were
> developed in lock-step, and the very first publication of ISO 10646
> (in 1993) had precisely the same character assignments as Unicode 1.1.
> Ever since then, both standards are roughly the same.

ISO is not known for its speed. You are probably right about the publication date. However, I'm sure I had my draft ISO 10646 a long time before Unicode got going. But it's all a long time ago, so I'll not bet on it.

> ... assuming encodings are the only issue in creating international
> software.

Of course you are right, it's one part of the puzzle to allow a piece of software to be acceptable in a particular culture.
Barry

From martin at v.loewis.de Wed Sep 15 21:06:19 2004
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed Sep 15 21:06:22 2004
Subject: [Python-Dev] tempfile.TemporaryFile on Windows NT
In-Reply-To: <1f7befae0409151151486156b@mail.gmail.com>
References: <4147D8C9.3020508@v.loewis.de> <1f7befae0409151151486156b@mail.gmail.com>
Message-ID: <414892AB.7010403@v.loewis.de>

Tim Peters wrote:
> Yes, but file.name is read-only now -- you can't change file.name from
> Python code.
>
>
>>If so, should this be better achieved by os.fdopen grow
>>a name argument

So what about adding a name argument to fdopen?

Regards,
Martin

From bac at OCF.Berkeley.EDU Wed Sep 15 20:26:49 2004
From: bac at OCF.Berkeley.EDU (Brett C.)
Date: Wed Sep 15 21:17:38 2004
Subject: [Python-Dev] python-dev Summary for 2004-08-16 through 2004-08-31 [draft]
Message-ID: <41488969.70909@ocf.berkeley.edu>

OK, using the new coverage style of "whatever Brett finds interesting", here is the next summary. Plan to send this out some time this upcoming weekend so edits need to get in between now and then.

If anyone thinks a thread should be covered (see "Skipped Threads"), write a summary and I will add it with an author mention.

--------------------------------------

=====================
Summary Announcements
=====================

After asking people last week about how they wanted me to change the python-dev Summary so as to allow me to retain my free time, all respondents unanimously went with the option of letting me choose what I wanted to cover.

That made me happy for a couple of reasons. One is that it makes the summaries more enjoyable for me since I mostly cover stuff I like and thus should be less bored at points while writing. It is also flattering in a way that people trusted what I would cover enough to not want to suggest what I should cover.
It also allows me to spend more time on python-dev participating than sitting on the sidelines dreading having to summarize some 100-email thread on whether we should move over to VC 7 or something (can you tell I was bored out of my skull by that thread?).

So this summary starts the new coverage style. As you can see it is much shorter than normal since I didn't try to be as thorough (7 pages compared to the usual 10-20). But I think this style allows what I do summarize to have more details than it would normally have; quality over quantity.

I have reintroduced the "Skipped Threads" section of the Summaries so people can see what I skipped in case there is something they might want to read that I just didn't care about. In places where I remember something partially relevant I added a sentence on it so it isn't just a list of subject lines.

Enjoy.

=========
Summaries
=========

-------------
PEP movements
-------------

`PEP 3000`_ (Python 3.0 Plans) came into creation. This text's point of existence is to list changes known to be planned for Python 3.0 and not hypothetical (Guido suggested all hypothetical talk call the version Python 3000, partially for marketing purposes). Plus I am a co-author and it finally completes my time in the `School of Hard Knocks`_. =)

`PEP 333`_ (Python Web Server Gateway Interface v1.0) proposes a "standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers".

`PEP 309`_ (Partial Function Application) was updated with some details.

`PEP 334`_ (Simple Coroutines via SuspendIteration) came into existence to suggest coming up with some form of lightweight coroutines.

.. _School of Hard Knocks: http://mail.python.org/pipermail/python-dev/2002-September/028725.html
.. _PEP 3000: http://www.python.org/peps/pep-3000.html
.. _PEP 333: http://www.python.org/peps/pep-0333.html
.. _PEP 309: http://www.python.org/peps/pep-0309.html
..
_PEP 334: http://www.python.org/peps/pep-0334.html Contributing threads: - `Minimal 'stackless' PEP using generators? `__ -------------------------------- Decorators "issue" mostly solved -------------------------------- While the hubbub over using a character for decorators was brewing, people began suggesting reserving a character that would never be used in Python for anything. The thought was that people who wanted to use a character to represent application-specific information could use the reserved symbol and not have to worry about clashing with possible future features, as Leo and IPython now do with the use of '@'. But no reservation of a character occurred. Towards the end of the month, to meet the a3 deadline, a unified proposal from the community came forward led by Robert Brewer and Michael Sparks. They pushed the J2 proposal:: using: somedecorator staticmethod def func(): pass Guido contemplated the proposal, saying "it got pretty darn close" to being accepted, but in the end decided not to accept it. For Guido's full reasoning see http://mail.python.org/pipermail/python-dev/2004-September/048518.html . But he said he had two key issues. One was that the indentation "suggests that its contents should be a sequence of statements, but in fact it is not". Issue two was that using a keyword to start a line was a real attention grabber and that "using" did not deserve this. The topic of how the whole decorators situation was handled was touched upon. Guido realized that "dramatic changes must be discussed with the community at large". He was also impressed by how the community pulled together to propose an alternative as it did and hopes to see more proposals of the same quality in the future. So now what? Guido said that he would be willing to change the character used for decorators for 2.4b1. That means if '@' drives you nuts but something else like '!' works for you then speak up and try to get the community to rally behind it.
Contributing threads: - `Decorator order implemented backwards? `__ - `Considering decorator syntax on an individual feature `__ - `PEP 318: Suggest we drop it `__ - `__metaclass__ and __author__ are already decorators `__ - `Reserved Characters `__ - `PEP 318: Can't we all just get along? `__ - `Multiple decorators per line `__ - `Important decorator proposal on c.l.p. `__ - `Re: [Python-checkins] python/nondist/peps pep-0318.txt... `__ - `CO_FUTURE_DECORATORS `__ - `decorators: If you go for it, go all the way!!! :) `__ - `Re: Re: def fn (args) [dec,dec]: `__ - `J2 proposal final `__ - `(my) revisions to PEP318 finally done. `__ - `Rejecting the J2 decorators proposal `__ ----------------------------------------------------------- When should something be put under the great powers of -O ? ----------------------------------------------------------- Python has had a simple peephole optimizer in the compiler since 2.3 that optimized imported bytecode. Raymond Hettinger moved it up, though, so that the optimization would be saved to .pyc files and thus remove the need to repeat the process every time. Guido questioned this move. He thought that since it was an optimization it should fall under the -O command-line option. But then people came forward to suggest that Raymond's move was good, saying that the cost of the optimization was non-existent and thus it should be used. I raised the question of what should be considered an optimization: anything that changes the initial opcode, or something that takes extra time/memory or changes semantics? Tim Peters stepped forward and said that since the optimizations were so simple he thought they should be kept. David Abrahams also came forward and said they should be kept to get more testing on them since they were not complex and thus did not influence debugging of code. In the end Raymond's change was kept in place.
Contributing threads: - `Re: [Python-checkins] python/dist/src/Python compile.c, 2.319, 2.320 `__ ---------------------------------------- 2.4a3 out the doors so kick those tires! ---------------------------------------- `Python 2.4a3`_ has been released. As usual, please download it, run the regression tests, and report any errors you get. Since this will be the last alpha this is your last chance to get new features in before b1 comes out. The use of priorities on the SourceForge tracker has also been clarified. Anything set to 9 **must** be dealt with before the next release. Priority 8 is to be dealt with before b1; it changes functionality so if it isn't in by b1 it won't be in until the next version. Priority 7 is for something that should get in before the final release. Anthony Baxter also gained sole control of setting the priority so as to keep the settings consistent. .. _Python 2.4a3: http://www.python.org/2.4/ Contributing threads: - `2.4a3 release is September 2, SF tracker keywords `__ - - Stemming from a conversation about moving Python over to Unicode only for string representation for 3.0, the discussion of a bytes type came up. People were saying they used str to store binary data and that if str went away or no longer represented straight binary data (since Unicode has different encodings the values can change while meaning the same thing in terms of characters) they would need a way to deal with this. The idea that the array module solved this was basically dismissed since it seemed more built-in support was needed for convenience. It also meant more flexibility in terms of what interfaces were implemented. There were also some issues with getting array to work the exact way people wanted it to. The next question was whether literal support was needed. Would you really need to write something like ``b"\x66\x6f\x6f"`` instead of ``bytes([0x66, 0x6f, 0x6f])``? How all of this would play with Unicode ended up being discussed.
In the end it seemed that one could encode and decode back and forth but that all work with characters should be in Unicode and only encoded into bytes on the I/O barrier (writing to disk or the network, for instance) to minimize any possible encoding errors and to make usage easier. Mutability came up. Being mutable would be handy, but it killed its usage as a dictionary key. It was suggested that bytes hash to a tuple of integers representing the bytes, but nothing more was said. But in general almost everyone agreed that having the bytes type be mutable was best. `PEP 332`_ was sketched out during the early part of this discussion, but has not been updated since it died down. .. _PEP 332: http://www.python.org/peps/pep-0332.html Contributing threads: - `adding a bytes sequence type to Python `__ - `Byte string class hierarchy `__ -------------------------------------------- String substitution sure is a touchy subject -------------------------------------------- `PEP 292`_ (Simpler String Substitutions) got a huge amount of discussion these past two weeks. Ignoring the syntax discussions (that was decided long ago before the PEP was accepted and had consensus and thus was a moot point) and the discussion of whether a trailing ``$`` at the end of the substitution pattern should be considered an error or not (it is), a couple of topics were discussed. To make this summary easier to follow, realize that the class that implements PEP 292 is named "Template" and thus I will just refer to the implementation by that name. The first topic was over whether Template should return Unicode objects. The side supporting it pointed out that Python 3.0 was going to be using Unicode for strings exclusively so it would be good to start using them now. It also went with the initial design of PEP 292 which was to help with i18n where Unicode is constantly used.
People against, though, didn't want to suddenly be given a Unicode object when a plain string was passed in as the template string. That would be too surprising and lead to inconsistent usage thanks to sudden mixing of strings and Unicode objects in code. This issue was resolved by no longer subclassing unicode but making it easy to subclass Template so as to add direct Unicode support with ease. The second issue was over the design of the API. Originally Template was a class that overrode __mod__ to make it work like string interpolation works now for str and unicode. But then some people felt a class was too heavy-handed if there was no way to change the way Template worked through a subclass. This obviously led to a desire for functions to do the work for both Template and SafeTemplate (similar class to Template that left in substitution points if they didn't match any values in the dict passed in). In the end the class design was kept thanks to Tim Peters and metaclasses. Tim came up with a neat way to have the regex be generated at class creation time through a metaclass and thus allow subclasses to change how Template matched substitution points and such, all without a performance hit at instance creation time. Use of __mod__ and the SafeTemplate class were removed and Template grew substitute and safe_substitute methods. Everyone at this point seems happy with the design. .. _PEP 292: http://www.python.org/peps/pep-0292.html Contributing threads: - `Update PEP 292 `__ - `PEP 292 - Simpler String Substitutions `__ - `Alternative Implementation for PEP 292: Simple String Substitutions `__ - `Alternative placeholder delimiters for PEP 292 `__ ------------------------------------------- Private names considered rude in the stdlib ------------------------------------------- Anthony Baxter suggested banning use of mangled private names (names starting with ``__``) in the stdlib.
His argument was that they are a hack and the stdlib is supposed to act as a good example and that name mangling was not good. Guido essentially agreed with the caveat that some uses of private names are justified such as if a private name is storing the equivalent of a 'friend' function from C++. Contributing threads: - `__mangled in stdlib considered poor form `__ =============== Skipped Threads =============== Warnocked emails (i.e., emails that get essentially no response) and very insignificant threads are not listed - Find out whether PyEval_InitThreads has been called? - Unifying Long Integers and Integers: baseint - test_tempfile failure on Mac OSX - Deprecate sys.exitfunc? - multiple instances of python on XP - Adding 'lexists()' to os.path - #ifdeffery - Weekly Python Bug/Patch Summary - problem with pymalloc on the BeOS port. - Proposed change to logging - sre.py backward compatibility and PEP 291 - Dealing with test__locale failure on OS X before a3 - os.urandom API - Decoding incomplete unicode Basically culminated in new stateful UTF-8 and UTF-16 decoders but that's all I know =) - Decimal module portability to 2.3? Looks like this will happen; wait for next summary for a more concrete answer - Python icons If you think you can come up with a good icon for the Windows installer please let c.l.py know and it might get used - [Python-checkins] python/dist/src/Lib/test test_string.py, 1.25, 1.26 - list += string?? From trentm at ActiveState.com Wed Sep 15 20:59:48 2004 From: trentm at ActiveState.com (Trent Mick) Date: Wed Sep 15 21:20:25 2004 Subject: [Python-Dev] [TARGETDIR]lib-tk added to PythonPath in MSI Message-ID: <20040915115947.A26465@ActiveState.com> Round about line 1088 of Tools/msi/msi.py, lib-tk is added to the PythonPath: ("PythonPath", -1, prefix+r"\PythonPath", "", "[TARGETDIR]Lib;[TARGETDIR]DLLs;[TARGETDIR]lib-tk", "REGISTRY"), Shouldn't that be this instead?
("PythonPath", -1, prefix+r"\PythonPath", "", "[TARGETDIR]Lib;[TARGETDIR]DLLs;[TARGETDIR]Lib\\lib-tk", "REGISTRY"), Trent -- Trent Mick TrentM@ActiveState.com From pje at telecommunity.com Wed Sep 15 21:26:11 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Sep 15 21:27:28 2004 Subject: [Python-Dev] python-dev Summary for 2004-08-16 through 2004-08-31 [draft] In-Reply-To: <41488969.70909@ocf.berkeley.edu> Message-ID: <5.1.1.6.0.20040915152555.025ba470@mail.telecommunity.com> At 11:26 AM 9/15/04 -0700, Brett C. wrote: >- > >- >Stemming from a conversation about moving Python over to Unicode only for >string representation for 3.0, the discussion of a bytes type came >up. People were saying they used str to store binary data and that if str >went away or no longer represented straight binary data (since Unicode has >different encodings the values can change while meaning the same thing in >terms of characters) they would need a way to deal with this. Looks like this section was supposed to have a title, but it got lost. From theller at python.net Wed Sep 15 21:30:23 2004 From: theller at python.net (Thomas Heller) Date: Wed Sep 15 21:35:59 2004 Subject: [Python-Dev] PyExc_UnicodeDecodeError In-Reply-To: <41488DEE.1030506@v.loewis.de> ( =?iso-8859-1?q?Martin_v._L=F6wis's_message_of?= "Wed, 15 Sep 2004 20:46:06 +0200") References: <41486148.7090007@egenix.com> <41488DEE.1030506@v.loewis.de> Message-ID: "Martin v. Löwis" writes: > Thomas Heller wrote: >> In this context: I find Exceptions being much too underdocumented. > [...] >> I'm not sure what there can be done about that, maybe keep exceptions.py >> in sync (although unused) with the current code, and point to it from >> the docs? > > The best solution for missing, incomplete, and incomprehensible > documentation is to add, complete, and rewrite the documentation. > Do you volunteer? I know. Maybe I'll do something about it.
So far I think it's difficult to describe the behaviour of the exception classes - that was my impression when I looked at (for example) the description of EnvironmentError in the library docs, and compared that to the code in Do others share this impression, or is it me only? As I said, one idea would be to keep exceptions.py, although unused, in sync with the current C code, and include it in the docs. Another idea that came to my mind is to include the Python code in the docstring. Thomas From bac at OCF.Berkeley.EDU Wed Sep 15 21:40:21 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Wed Sep 15 21:40:32 2004 Subject: [Python-Dev] python-dev Summary for 2004-08-16 through 2004-08-31 [draft] In-Reply-To: <5.1.1.6.0.20040915152555.025ba470@mail.telecommunity.com> References: <5.1.1.6.0.20040915152555.025ba470@mail.telecommunity.com> Message-ID: <41489AA5.8090206@ocf.berkeley.edu> Phillip J. Eby wrote: > At 11:26 AM 9/15/04 -0700, Brett C. wrote: > >> - >> >> - >> Stemming from a conversation about moving Python over to Unicode only >> for string representation for 3.0, the discussion of a bytes type came >> up. People were saying they used str to store binary data and that if >> str went away or no longer represented straight binary data (since >> Unicode has different encodings the values can change while meaning >> the same thing in terms of characters) they would need a way to deal >> with this. > > > Looks like this section was supposed to have a title, but it got lost. > Yep. Fixed in my copy now. Thanks, Phillip. -Brett From martin at v.loewis.de Wed Sep 15 21:57:29 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 15 21:57:32 2004 Subject: [Python-Dev] PyExc_UnicodeDecodeError In-Reply-To: References: <41486148.7090007@egenix.com> <41488DEE.1030506@v.loewis.de> Message-ID: <41489EA9.3040205@v.loewis.de> Thomas Heller wrote: > Do others share this impression, or is it me only?
I never have the need to raise any exception except for standard errors taking a single char*, so I probably haven't noticed, yet. If my impression is right, and people either raise their own exceptions, or "simple" standard errors, your usage of exceptions would count as "guru application". Gurus are expected to read and understand the source code of the interpreter, so I don't care much about the state of the documentation in this area. Regards, Martin From tim.peters at gmail.com Wed Sep 15 22:30:55 2004 From: tim.peters at gmail.com (Tim Peters) Date: Wed Sep 15 22:31:24 2004 Subject: [Python-Dev] tempfile.TemporaryFile on Windows NT In-Reply-To: <414892AB.7010403@v.loewis.de> References: <4147D8C9.3020508@v.loewis.de> <1f7befae0409151151486156b@mail.gmail.com> <414892AB.7010403@v.loewis.de> Message-ID: <1f7befae04091513304eebed0c@mail.gmail.com> [Martin v. L?wis] > So what about adding a name argument to fdopen? I suppose. The file wrapper on Windows never bothered me, although Windows breakage in the tempfile code has cost me plenty of grief. So I'm much more concerned about not breaking it again than in repairing an inelegance that didn't violate my sense of aesthetics to begin with . From mike at skew.org Wed Sep 15 23:04:16 2004 From: mike at skew.org (Mike Brown) Date: Wed Sep 15 23:04:15 2004 Subject: [Python-Dev] urllib.urlopen() vs IDNs, percent-encoded hosts, ':' Message-ID: <200409152104.i8FL4G14033121@chilled.skew.org> Over the last couple of years, while implementing an RFC 2396 and RFC 2396bis compliant URI library for 4Suite, I've amassed a sizable list of, um, complaints about urllib. Many of the issues I have run into are attributable to the age of urllib (I am pretty sure it predates the unicode type) and the obsolescence of the specs on which parts of it are based (it's essentially in RFC 1808 land, with a smattering of patches to bring aspects of it closer to RFC 2396). 
Other issues are matters of API entrenchment, either for the convenience for users (e.g. treating '/' and '\' as equivalent on Windows) or for compatibility with the APIs of other libraries & applications. When I'm comfortable enough with 4Suite's Ft.Lib.Uri APIs I intend to formally propose incorporating updated implementations into Python core, perhaps distributed among urllib, urllib2, and urlparse or maybe in a new module, as appropriate. I'm not really ready to make such a proposal, though, as I still have some philosophical questions about str/unicode transparency in APIs (e.g. urllib.unquote, when given unicode, does not percent-decode characters above \u007f, and I'm wondering if that's ideal), and I am also unclear on what the policy is regarding using regular expressions in core Python modules -- it seems to be a no-no, but I don't know for sure... any comments on that particular matter would be appreciated. Anyway, there's at least one part of Ft.Lib.Uri that I think could stand to be addressed more immediately: there is a bit of transformation that one must perform on a spec-conformant URI in order to get urllib.urlopen() to process it correctly. This should not be necessary, IMHO. The main issues are: 1. urlopen() cannot reliably process unicode unless there are no percent-encoded octets above %7F and no characters above \u007f (I think that's the gist of it, at least). I don't think this is necessarily a bug, as a proper URI will never contain non-ASCII characters. However since urlopen()'s API is unfortunately such that it accepts OS-specific filesystem paths, which nowadays may be unicode, it may be time to tighten up the API and say that the url argument *must* be a URI, and that if unicode is given, it will be converted to str and thus must not contain non-ASCII characters. 2. urlopen() (the URI scheme-specific openers it uses, actually) does not percent-decode the host portion of a URL before doing a DNS lookup. 
This wasn't really a problem until IDNs came along; no one was using non-ASCII in their hostnames. But now we have to deal with URLs where the host component is a string of percent-encoded UTF-8 octets, like 'http://www.%E3%81%BB%E3%82%93%E3%81%A8%E3%81%86%E3%81%AB%E3%81%AA%E3%81%8C%E3%81%84%E3%82%8F%E3%81%91%E3%81%AE%E3%82%8F%E3%81%8B%E3%82%89%E3%81%AA%E3%81%84%E3%81%A9%E3%82%81%E3%81%84%E3%82%93%E3%82%81%E3%81%84%E3%81%AE%E3%82%89%E3%81%B9%E3%82%8B%E3%81%BE%E3%81%A0%E3%81%AA%E3%81%8C%E3%81%8F%E3%81%97%E3%81%AA%E3%81%84%E3%81%A8%E3%81%9F%E3%82%8A%E3%81%AA%E3%81%84.w3.mag.keio.ac.jp/' which are supposed to be decoded back to Unicode (in this case, it's a string of Japanese characters) and then IDNA-encoded for the DNS lookup, so that it will be interpreted as if it were the equally-unintelligible-but-DNS-friendly 'http://www.xn--n8jaaaaai5bhf7as8fsfk3jnknefdde3fg11amb5gzdb4wi9bya3kc6lra.w3.mag.keio.ac.jp/' Even though IDNs are the main application for percent-encoded octets in the host component, it is necessary in simpler cases as well, like 'http://www.w%33.org' which would need to be interpreted as 'http://www.w3.org' Python 2.3 introduced an IDNA codec, and both the socket and httplib modules were updated to accept unicode hostnames (e.g. the Japanese characters represented by, but not shown in, the examples above), automatically applying IDNA encoding prior to doing the DNS lookup. urllib's urlopeners were *not* updated accordingly. This should be changed. The way I do it in Ft.Lib.Uri is to rewrite the hostname, regardless of its URI scheme (since once I pass it to urlopen it's out of my hands), to a percent-decoded, IDNA-encoded version before passing it to urlopen. Ideally it should be handled by each opener as necessary, I think. 3. On Windows, urlopen() only recognizes '|' as a Windows drivespec character, whereas ':' is just as, if not more, common in 'file' URIs.
file:///C:/Windows/notepad.exe is a perfectly valid 'file' URI and should not fail to be interpreted on Windows as C:\Windows\notepad.exe. Currently the only way to get it to work is to replace the ':' with '|', which was established in the days of the Mosaic web browsers, I believe, and that has remained as a widely supported, but arbitrary & unnecessary convention. I would prefer that all the APIs that expect '|' instead of ':' be updated to not consider '|' to be canon, but the simplest workaround for the sake of using ':'-containing URIs with urllib.urlopen() is just to do a simple string replacement in the path, e.g. if os.name == 'nt' and scheme == 'file': path = path.replace(':','|',1) (assuming you've already got the path and scheme components of the given URI split out). I would appreciate any comments that anyone has on the feasibility of these suggestions. Thanks, Mike P.S. If you're curious, the current version of Ft.Lib.Uri is at http://cvs.4suite.org/cgi-bin/viewcvs.cgi/4Suite/Ft/Lib/Uri.py and a test suite for it (which relies on a custom framework, not unittest, but that should be fairly understandable anyway) is at http://cvs.4suite.org/cgi-bin/viewcvs.cgi/4Suite/test/Lib/test_uri.py The function that I am currently using to massage a URI to make it safe for urllib.urlopen() is named MakeUrllibSafe. I wouldn't recommend it as-is, though, since it relies on other functions that deal with more convoluted unicode issues that I'm trying to avoid asking about in this post.
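[Editorial sketch, not part of the original post.] The host rewriting described in point 2 and the drivespec workaround in point 3 can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions: it uses the present-day Python 3 urllib.parse module rather than the 2004-era urllib, it is not the MakeUrllibSafe function from Ft.Lib.Uri, and the names normalize_host and drivespec_to_pipe are hypothetical, made up for this illustration. It also assumes that percent-encoded octets in the host are UTF-8.

```python
import os
from urllib.parse import urlsplit, urlunsplit, unquote


def normalize_host(uri):
    """Percent-decode the host of `uri`, then IDNA-encode it for DNS.

    Hypothetical helper for illustration; not an existing urllib API.
    """
    parts = urlsplit(uri)
    host = parts.hostname
    if not host:
        return uri  # no host component to rewrite (e.g. 'file:///...')
    # Assumption: percent-encoded octets in the host are UTF-8.
    decoded = unquote(host, encoding="utf-8", errors="strict")
    # The stdlib 'idna' codec yields the DNS-friendly ASCII form.
    ascii_host = decoded.encode("idna").decode("ascii")
    # Simplification: any userinfo ('user:pass@') is dropped here.
    if parts.port is None:
        netloc = ascii_host
    else:
        netloc = "%s:%d" % (ascii_host, parts.port)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def drivespec_to_pipe(scheme, path, os_name=os.name):
    """Apply the ':' -> '|' workaround quoted in point 3 above."""
    if os_name == "nt" and scheme == "file":
        return path.replace(":", "|", 1)
    return path
```

With these definitions, normalize_host('http://www.w%33.org/') gives 'http://www.w3.org/', matching the simpler case in the post. Doing the rewrite before the URI reaches the opener mirrors the approach the post describes, although the post itself suggests that each opener would ideally handle this on its own.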
From martin at v.loewis.de Wed Sep 15 23:18:10 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Sep 15 23:18:12 2004 Subject: [Python-Dev] tempfile.TemporaryFile on Windows NT In-Reply-To: <1f7befae04091513304eebed0c@mail.gmail.com> References: <4147D8C9.3020508@v.loewis.de> <1f7befae0409151151486156b@mail.gmail.com> <414892AB.7010403@v.loewis.de> <1f7befae04091513304eebed0c@mail.gmail.com> Message-ID: <4148B192.2040809@v.loewis.de> Tim Peters wrote: > I suppose. The file wrapper on Windows never bothered me, although > Windows breakage in the tempfile code has cost me plenty of grief. So > I'm much more concerned about not breaking it again than in repairing > an inelegance that didn't violate my sense of aesthetics to begin with > . Ah, ok. This is probably the time to present my case: Somebody complained on c.l.p that isinstance(tempfile.TemporaryFile(), file) gives True on Linux but False on Windows. While this result is "in principle correct", I think something can be done to make it correct practically, too. Regards, Martin From martin at v.loewis.de Wed Sep 15 23:40:01 2004 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed Sep 15 23:40:14 2004 Subject: [Python-Dev] urllib.urlopen() vs IDNs, percent-encoded hosts, ':' In-Reply-To: <200409152104.i8FL4G14033121@chilled.skew.org> References: <200409152104.i8FL4G14033121@chilled.skew.org> Message-ID: <4148B6B1.4040902@v.loewis.de> Mike Brown wrote: > 1. urlopen() cannot reliably process unicode unless there are no > percent-encoded octets above %7F and no characters above \u007f > (I think that's the gist of it, at least). And that feature is by design. URLs are conceptually byte strings, not character strings, so passing Unicode strings is mostly a meaningless operation. Mostly - because if the Unicode string is pure ASCII, it probably matches most implementations and user expectations to convert it to pure ASCII first, and then treat it as a URL. 
IETF is working on resolving the issue, by introducing IRIs. It appears that draft-duerst-iri-09.txt is what will become the relevant RFC. Once the RFC is published, urllib and urllib2 should be updated to support IRIs; contributions are welcome. > I don't think this is necessarily a bug, as a proper URI will never contain > non-ASCII characters. However since urlopen()'s API is unfortunately such that > it accepts OS-specific filesystem paths, which nowadays may be unicode, it may > be time to tighten up the API and say that the url argument *must* be a URI, > and that if unicode is given, it will be converted to str and thus must not > contain non-ASCII characters. No. I'd rather specify that if it is a Unicode string, it must be an IRI, and is converted to an URI according to the IRI spec. > 2. urlopen() (the URI scheme-specific openers it uses, actually) does not > percent-decode the host portion of a URL before doing a DNS lookup. > > This wasn't really a problem until IDNs came along; no one was using non-ASCII > in their hostnames. But now we have to deal with URLs where the host component > is a string of percent-encoded UTF-8 octets. Hmm. I think there is no backup in any standard for doing that. Applications that put URL-escaped UTF-8 bytes into host names deserve to lose. There are two valid ways for putting non-ASCII characters into the hostname part of an URL: use Unicode strings, or use IDNA. It may be that IRIs add another way (I haven't checked this aspect specifically), but unless there is some RFC supporting such a protocol, any response by urllib is fine, exceptions preferred. > Even though IDNs are the main application for percent-encoded octets in the > host component, it is necessary in simpler cases as well, like > > 'http://www.w%33.org' > > which would need to be interpreted as > > 'http://www.w3.org' We would have to check: this might be valid usage, but I somewhat doubt it.
> urllib's urlopeners were *not* updated accordingly. This should be changed. The change was deliberately deferred until the IRI RFC is published. > 3. On Windows, urlopen() only recognizes '|' as a Windows drivespec character, > whereas ':' is just as, if not more, common in 'file' URIs. I have long ago given up trying to understand this issue. I'm happy to change this forth and back about once or twice a year, until somebody comes up with a clear and definitive story, backed up by standards and product documentation, so that we might get a stable implementation some day. Feel free to write patches. Regards, Martin From gvanrossum at gmail.com Wed Sep 15 23:46:43 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Sep 15 23:46:46 2004 Subject: [Python-Dev] Strawman decision: @decorator won't change Message-ID: Anthony Baxter asked me for a pronouncement on whether @decorator will change to use some other character instead; I kept this open as a possibility before 2.4b1 (which is tentatively scheduled for Oct 7th). Given the near-complete silence following my rejection of the J2 alternative proposal, I don't expect there to be a massive popular movement to change the character, but I admit I haven't looked for responses outside python-dev. Let's plan on doing the following. If in the next 7 days there's no indication that some group of users wants to rally for a different character, the decision to keep @ is made final on Sept 23. To change the character, somebody will need to start rallying for a different character, and be able to show signs of significant support by that date. The definition of "significant support" is intentionally left open for interpretation, I'll review the evidence on the 23rd. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From s.percivall at chello.se Thu Sep 16 01:03:20 2004 From: s.percivall at chello.se (Simon Percivall) Date: Thu Sep 16 01:03:23 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py Message-ID: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> The title says it all, tabs breaking installation. Lines 537 and 538 in httplib.py Lines 124, 129, 130, 131 in test_httplib.py //Simon From bac at OCF.Berkeley.EDU Thu Sep 16 01:27:29 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Sep 16 01:27:42 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py In-Reply-To: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> Message-ID: <4148CFE1.5010503@ocf.berkeley.edu> Simon Percivall wrote: > The title says it all, tabs breaking installation. > > Lines 537 and 538 in httplib.py > Lines 124, 129, 130, 131 in test_httplib.py > Fixed. Bad, Martin, bad! =) -Brett From mike at skew.org Thu Sep 16 02:10:17 2004 From: mike at skew.org (Mike Brown) Date: Thu Sep 16 02:10:20 2004 Subject: [Python-Dev] urllib.urlopen() vs IDNs, percent-encoded hosts, ':' In-Reply-To: <4148B6B1.4040902@v.loewis.de> =?UTF-8?Q?from_=22Martin_v=2E_L=C3=B6wis=22_at_Sep_15=2C_2004_11=3A40=3A01_p?= =?UTF-8?Q?m?= Message-ID: <200409160010.i8G0AI78034010@chilled.skew.org> "Martin v. L?wis" wrote: > Mike Brown wrote: > > 1. urlopen() cannot reliably process unicode unless there are no > > percent-encoded octets above %7F and no characters above \u007f > > (I think that's the gist of it, at least). > > And that feature is by design. URLs are conceptually byte strings, > not character strings, so passing Unicode strings is mostly a > meaningless operation. No. The intent is actually that a URI is (not conceptually, just *is*) a string of characters; the syntax is only defined in terms of bytes due to peculiarities of the grammar. 
A percent-encoded sequence conceptually represents an encoded character, or part of one in the case of multibyte encodings, that may or may not be allowed by the syntax to appear as a literal character in that part of the URI. This was actually clear in RFC 2396 sections 1.5 and 2, but has been explained somewhat better in the rephrased section 2 of rfc2396bis, which is in Last Call. As for what was by design, the fact that a unicode url arg fails relatively deep in the processing (generally when it gets handed to urllib.unquote) or a resolver, and that it isn't ASCII-fied first, and that this isn't documented, and that urlopen() seems to be designed to be a URL-or-filepath-opener, all seems to indicate to me that this 'design' isn't very deliberate. > Mostly - because if the Unicode string is > pure ASCII, it probably matches most implementations and user > expectations to convert it to pure ASCII first, and then treat it > as a URL. Well, we can take it for granted that an object that purports to be a URI must consist only of characters from a limited subset of ASCII. If the object is unicode, then there is no ambiguity about what each item in the sequence means, it's just a character and it must be in the allowed set in order to be interpreted unambiguously, so unicode is actually the ideal type of argument to urlopen(). If the object is a byte str, then we can pretty much assume that each byte represents its ASCII equivalent and is subject to the same restrictions, although this should be documented, lest someone pass in a UCS-2 or UTF-16 string expecting its characters to be magically decoded. The question is, does the url argument to urlopen() purport to be or is it assumed to be a URL? 
The function is quite lenient about what it accepts as a URL -- it accepts pretty much anything you give it, be it unicode or str, with or without a scheme component, relative to some unknown base, and loaded with illegal characters, and it tries to deal with it as best it can -- yet it still rejects or inconsistently handles some valid URIs, and this is what I want to see changed. Perhaps I should rephrase part of the issue this way: If the argument to urlopen() is assumed to be a URI, then %FF in the argument should not be interpreted any differently when the argument is a str vs when it is unicode. RFC 2396 left it ambiguous as to what characters are represented by %80-%FF, so an implementation thereof may make such interpretations as it pleases. The current implementation doesn't do this in a consistent manner. > IETF is working on resolving the issue, by introducing IRIs. It > appears that draft-duerst-iri-09.txt is what will become the relevant > RFC. Once the RFC is published, urllib and urllib2 should be updated > to support IRIs; contributions are welcome. > > > I don't think this is necessarily a bug, as a proper URI will never contain > > non-ASCII characters. However since urlopen()'s API is unfortunately such that > > it accepts OS-specific filesystem paths, which nowadays may be unicode, it may > > be time to tighten up the API and say that the url argument *must* be a URI, > > and that if unicode is given, it will be converted to str and thus must not > > contain non-ASCII characters. > > No. I'd rather prefer to specify that it if it is a Unicode string, it > must be an IRI, and is converted to an URI according to the IRI spec. OK, that's probably a good way to go about it. You should note however that percent-encoded sequences are legal in IRIs and pass through unchanged in the conversion to URI, so this does not solve the problem of how they are interpreted (i.e. the %80-%FF pass-through in certain situations). 
In an IRI that you construct yourself, you are much less likely to ever see a percent-encoded octet, but nevertheless, being a superset of URI, any IRI may contain them. > > 2. urlopen() (the URI scheme-specific openers it uses, actually) does not > > percent-decode the host portion of a URL before doing a DNS lookup. > > > > This wasn't really a problem until IDNs came along; no one was using non-ASCII > > in their hostnames. But now we have to deal with URLs where the host component > > is a string of percent-encoded UTF-8 octets. > > Hmm. I think there is no backup in any standard for doing that. OK, you're right; it was in an IETF draft of its own (draft-uri-idn-something) and in February of this year was folded into rfc2396bis. How IDNs are represented in URIs is indeed currently restricted to IDNA (RFC 3490) only, by virtue of the fact that RFC 2396 forbids percent-encoding in hostnames. I sometimes forget which aspects of rfc2396bis are changes from RFC 2396 and its predecessors, and which are clarifications / bugfixes. > Applications that put URL-escaped UTF-8 bytes into host names deserve to > lose. Come February or whenever rfc2396bis and the IRI draft become RFCs, that will no longer be a position you can maintain. > There are two valid ways for putting non-ASCII characters into the > hostname part of an URL: use Unicode strings, or use IDNA. It may be > that IRIs add another way (I haven't checked this aspect specifically) They do by virtue of reference to "RFCYYYY" which is a placeholder for the RFC that the rfc2396bis draft will become, pending approval. > but unless there is some RFC supporting such a protocol, any response > by urllib is fine, exceptions preferred. Consider it a feature request then. > > urllib's urlopeners were *not* updated accordingly. This should be changed. > > The change was deliberately deferred until the IRI RFC is published. OK. > > 3. 
On Windows, urlopen() only recognizes '|' as a Windows drivespec character, > > whereas ':' is just as, if not more, common in 'file' URIs. > > I have long ago given up trying to understand this issue. I'm happy to > change this forth and back about once or twice a year, until somebody > comes up with a clear and definitive story, backed up by standards and > product documentation, so that we might get a stable implementation some > day. Feel free to write patches. OK, a few points to understand: - There is no canonical form of 'file' URI for any OS path. All conventions are established by implementations. - 'file' as a URL scheme is very vaguely specified. It is being revised now but the revision may not be any better, from what I've seen so far on the mailing list for it. - No RFC disallows ":" in the path component of any URL, except when it needs to appear in the first segment of the path component of what is now called a relative URI reference, when that path component is hierarchical (as determined by the scheme). In that situation, the segment must be prepended with './' in order to ensure that it is interpreted correctly. Thus 'C:/autoexec.bat' as a URI reference (like in an href in an HTML doc) must be interpreted as scheme 'C' (not 'file'), and (by RFC 2396) non-hierarchical path '/autoexec.bat' or (by rfc2396bis) authority/hostname autoexec.bat, path ''. In either case it shouldn't be resolvable. Meanwhile, './C:/autoexec.bat' is scheme , authority , path './C:/autoexec.bat', which is much less ambiguous. Using '|' allows one to write 'C|/autoexec.bat' as a relative URI reference, but that is, as far as I can tell, the only advantage to using it. Let me be clear though - I am not suggesting getting rid of support for '|'. I am merely saying that there is no reason ':' should, on Windows, fail to be treated the same as '|' for the purpose of representing the ':' in a drivespec. 
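[Editorial sketch, not part of the original message.] The drivespec point above can be illustrated with a small helper that treats ':' and '|' identically when mapping the path component of a 'file' URL to a Windows-style path. This is a minimal sketch under stated assumptions: the name `windows_path_from_file_url_path` and the regular expression are hypothetical, it is not what urllib does or did, and it uses the modern Python 3 spelling `urllib.parse.unquote` rather than the 2004-era `urllib.unquote`.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern (an assumption of this sketch): optional leading
# slashes, one drive letter, then either ':' or '|', then the rest of the path.
_DRIVESPEC = re.compile(r'^/*([A-Za-z])[:|](/.*)?$')


def windows_path_from_file_url_path(url_path):
    """Map the path component of a 'file' URL to a Windows-style path.

    Both 'C:' and 'C|' are accepted as the drivespec, which is the
    behaviour argued for in the message above.
    """
    # Percent-decode first so that '%7C' ('|') and '%3A' (':') also work.
    path = unquote(url_path)
    match = _DRIVESPEC.match(path)
    if match is None:
        raise ValueError("no drivespec found in %r" % (url_path,))
    drive = match.group(1).upper()
    rest = match.group(2) or '/'
    return drive + ':' + rest.replace('/', '\\')
```

With this helper, '/C:/autoexec.bat' and '/C|/autoexec.bat' map to the same path, while a path without a drivespec is rejected rather than guessed at.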
From nnorwitz at gmail.com Thu Sep 16 03:27:57 2004 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu Sep 16 03:28:03 2004 Subject: [Python-Dev] --with-tsc compile fails In-Reply-To: References: <2mwtyvwsg4.fsf@starship.python.net> Message-ID: On Wed, 15 Sep 2004 14:56:43 -0400, Jeremy Hylton wrote: > On Wed, 15 Sep 2004 14:51:55 +0100, Michael Hudson wrote: > > > > if my limited googling is anything to go by. It also seems asm/msr.h > > is a "kernel internal header with absolutely no stable API > > properties...." (Redhat bugzilla). > > > > So, now I've written this email , I think we should take out the > > include and put in the #define. In RedHat 9 and Fedora Core 1, msr.h is not installed under /usr/include/. There are only versions for x86 and amd64 in the kernel source. Michael's suggestion about adding the #define is probably the best way to handle it for now. Neal From greg at cosc.canterbury.ac.nz Thu Sep 16 03:46:52 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Sep 16 03:46:58 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <34EB398C-06E1-11D9-AC9C-000A95A50FB2@fuhm.net> Message-ID: <200409160146.i8G1kqRf014588@cosc353.cosc.canterbury.ac.nz> > Consider also this: > x and 4 or 5 > which is of course a common idiom to workaround the lack of an > if-then-else expression. Actually, I hope it isn't common, because it's flawed. It doesn't always work properly even with current Python semantics. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Sep 16 03:48:25 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Sep 16 03:48:30 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <34EB398C-06E1-11D9-AC9C-000A95A50FB2@fuhm.net> Message-ID: <200409160148.i8G1mPBJ014594@cosc353.cosc.canterbury.ac.nz> > PS: Perl6 has distinct element-wise operators ("hyper" operators). I > find that less distasteful than misusing regular operators as > element-wise operators, when they really have vastly different > semantics. There was a huge discussion about that a while back. I don't think anything came of it, though. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Sep 16 04:01:29 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Sep 16 04:01:34 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: Message-ID: <200409160201.i8G21TCL014610@cosc353.cosc.canterbury.ac.nz> > It's not that you couldn't make numarrays short circuit. In the > expression "a and b", if all the elements of a are false, then we > can skip evaluating b. I'm just not sure that this is a good idea. Whether it would be worth it would be application-dependent, i.e. it would only help if pre-scanning all the elements of a were cheaper enough than evaluating b. Probably not a good idea to make it the default behaviour. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.peters at gmail.com Thu Sep 16 04:34:58 2004 From: tim.peters at gmail.com (Tim Peters) Date: Thu Sep 16 04:35:05 2004 Subject: [Python-Dev] tempfile.TemporaryFile on Windows NT In-Reply-To: <4148B192.2040809@v.loewis.de> References: <4147D8C9.3020508@v.loewis.de> <1f7befae0409151151486156b@mail.gmail.com> <414892AB.7010403@v.loewis.de> <1f7befae04091513304eebed0c@mail.gmail.com> <4148B192.2040809@v.loewis.de> Message-ID: <1f7befae0409151934593ea8b4@mail.gmail.com> [Martin v. Löwis] > Ah, ok. This is probably the time to present my case: Somebody > complained on c.l.p that isinstance(tempfile.TemporaryFile(), file) > gives True on Linux but False on Windows. While this result is > "in principle correct", I think something can be done to make it > correct practically, too. I'm not going to object, but writing "isinstance(..., file)" is almost never a *practical* thing to do in Python code anyway, so I don't personally see the attraction. Since tons of file-like objects aren't instances of __builtin__.file, and it doesn't make a lick of difference that they aren't, "isinstance(..., file)" isn't in the practical Pythoneer's vocabulary. That doesn't mean you can't want this change for inscrutable reasons, though . From anthony at interlink.com.au Thu Sep 16 04:48:10 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Sep 16 04:49:09 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed? In-Reply-To: <4148885A.5090803@v.loewis.de> References: <4148885A.5090803@v.loewis.de> Message-ID: <4148FEEA.4050801@interlink.com.au> Martin v. Löwis wrote: > David G Mills wrote: > >> And where can we get a copy of this new 'official' httplib? > > > As usual: In the CVS. As well as in 2.4b1, and, I assume 2.3.5, assuming the fix gets backported. -- Anthony Baxter It's never too late to have a happy childhood.
From skip at pobox.com Thu Sep 16 05:09:34 2004 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 16 05:10:50 2004 Subject: [Python-Dev] httplib is not v6 compatible, is this going to be fixed? In-Reply-To: <4148FEEA.4050801@interlink.com.au> References: <4148885A.5090803@v.loewis.de> <4148FEEA.4050801@interlink.com.au> Message-ID: <16713.1006.188325.822437@montanaro.dyndns.org> Anthony> ... and, I assume 2.3.5, assuming the fix gets backported. I missed this as well. Will backport. Skip From mike at skew.org Thu Sep 16 06:20:10 2004 From: mike at skew.org (Mike Brown) Date: Thu Sep 16 06:20:08 2004 Subject: [Python-Dev] urllib.urlopen() vs IDNs, percent-encoded hosts, ':' In-Reply-To: <200409160010.i8G0AI78034010@chilled.skew.org> "from Mike Brown at Sep 15, 2004 06:10:17 pm" Message-ID: <200409160420.i8G4KAjI035125@chilled.skew.org> > Meanwhile, './C:/autoexec.bat' is scheme , > authority , path './C:/autoexec.bat', I meant to say, path . From greg at cosc.canterbury.ac.nz Thu Sep 16 06:39:19 2004 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu Sep 16 06:39:34 2004 Subject: [Python-Dev] ANN: PEP 335: Overloadable Boolean Operators In-Reply-To: <5.1.1.6.0.20040915115744.03364ae0@mail.telecommunity.com> Message-ID: <200409160439.i8G4dJU5014867@cosc353.cosc.canterbury.ac.nz> > Well, it's definitely syntax and it's definitely a tree, so it's at least > an ST. :) I'd call it a VST (Verbose Syntax Tree). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From martin at v.loewis.de Thu Sep 16 08:37:39 2004 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu Sep 16 08:37:41 2004 Subject: [Python-Dev] urllib.urlopen() vs IDNs, percent-encoded hosts, ':' In-Reply-To: <200409160010.i8G0AI78034010@chilled.skew.org> References: <200409160010.i8G0AI78034010@chilled.skew.org> Message-ID: <414934B3.6030004@v.loewis.de> Mike Brown wrote: > No. The intent is actually that a URI is (not conceptually, just *is*) a > string of characters You are right: URIs are meant to be written on paper. However, RFC 2396 also acknowledges that the issue of non-ASCII characters is unresolved. It suggests (in 2.1) that the URI scheme should specify the interpretation of byte values. > This was actually clear in RFC 2396 sections 1.5 and 2, but has been explained > somewhat better in the rephrased section 2 of rfc2396bis, which is in Last > Call. This suggests that new URI schemes should mandate UTF-8 in the components, but is silent on the issue of existing schemes. > The question is, does the url argument to urlopen() purport to be or is it > assumed to be a URL? The function is quite lenient about what it accepts as a > URL -- it accepts pretty much anything you give it, be it unicode or str, with > or without a scheme component, relative to some unknown base, and loaded with > illegal characters, and it tries to deal with it as best it can -- yet it > still rejects or inconsistently handles some valid URIs, and this is what I > want to see changed. If something passed to it is clearly a valid URL, and there is a clear definition of how a computer should process it, and urllib doesn't, then this is certainly a bug and should be fixed. Can you give an example of such a URL?
> Perhaps I should rephrase part of the issue this way: If the argument to > urlopen() is assumed to be a URI, then %FF in the argument should not be > interpreted any differently when the argument is a str vs when it is unicode. Certainly. Indeed, urllib makes no difference, AFAICT. "http://localhost/%FF" and u"http://localhost/%FF" are processed in the same way. > RFC 2396 left it ambiguous as to what characters are represented by %80-%FF, > so an implementation thereof may make such interpretations as it pleases. > The current implementation doesn't do this in a consistent manner. No. RFC 2396 defers the specifications to the specific schema. >>Applications that put URL-escaped UTF-8 bytes into host names deserve to >>lose. > > > Come February or whenever rfc2396bis and the IRI draft become RFCs, that > will no longer be a position you can maintain. I see. I think I could accept a patch in this direction for Python 2.4 even if RFC2396bis isn't published, assuming the patch arrives before 2.4b1. > Let me be clear though - I am not suggesting getting rid of support for '|'. > I am merely saying that there is no reason ':' should, on Windows, fail to > be treated the same as '|' for the purpose of representing the ':' in a > drivespec. I know that I personally won't touch this code, except for applying patches. So if you have a clear vision of what needs to be changed and how, submit a patch. As for using regular expressions in the standard library: It seems you believe this is discouraged. I don't know why you think so - I've never heard of such a constraint before (in general - in specific cases, submitters may have been told that alternatives are more efficient). 
Regards, Martin From martin at v.loewis.de Thu Sep 16 08:43:43 2004 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu Sep 16 08:43:45 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py In-Reply-To: <4148CFE1.5010503@ocf.berkeley.edu> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> <4148CFE1.5010503@ocf.berkeley.edu> Message-ID: <4149361F.3030906@v.loewis.de> Brett C. wrote: > Simon Percivall wrote: > >> The title says it all, tabs breaking installation. >> >> Lines 537 and 538 in httplib.py >> Lines 124, 129, 130, 131 in test_httplib.py >> > > Fixed. Bad, Martin, bad! =) I should learn not to use vim for Python editing... Regards, Martin From symbiont+py at berlios.de Thu Sep 16 09:15:56 2004 From: symbiont+py at berlios.de (Jeff Pitman) Date: Thu Sep 16 09:17:45 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py In-Reply-To: <4149361F.3030906@v.loewis.de> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> <4148CFE1.5010503@ocf.berkeley.edu> <4149361F.3030906@v.loewis.de> Message-ID: <200409161515.56466.symbiont+py@berlios.de> On Thursday 16 September 2004 14:43, "Martin v. Löwis" wrote: > I should learn not to use vim for Python editing... in vimrc: set softtabstop=4 set shiftwidth=4 set expandtab -- -jeff From mwh at python.net Thu Sep 16 13:56:38 2004 From: mwh at python.net (Michael Hudson) Date: Thu Sep 16 13:56:43 2004 Subject: [Python-Dev] --with-tsc compile fails In-Reply-To: <41488D41.9090905@v.loewis.de> (Martin v. Löwis's message of "Wed, 15 Sep 2004 20:43:13 +0200") References: <2mwtyvwsg4.fsf@starship.python.net> <41488D41.9090905@v.loewis.de> Message-ID: <2mbrg6whop.fsf@starship.python.net> "Martin v. Löwis" writes: > Michael Hudson wrote: >> Well, it failed like that for me both before and after my PPC changes. >> I'm fairly sure I didn't mess this up. Maybe there's some >> kernel-headers package that's necessary.
>> OTOH, I think one could replace the include by >> #define rdtscll(val) \ >> __asm__ __volatile__("rdtsc" : "=A" (val)) >> if my limited googling is anything to go by. It also seems >> asm/msr.h >> is a "kernel internal header with absolutely no stable API >> properties...." (Redhat bugzilla). > > I'ld still like to understand why it fails for your system (it works > fine on mine). Do you have a definition for rdtscll in > /usr/include/asm/msr.h? I don't *have* asm/msr.h! And the impression I get is that we shouldn't be going near it with the proverbial bargepole. Cheers, mwh -- (ps: don't feed the lawyers: they just lose their fear of humans) -- Peter Wood, comp.lang.lisp From erik at heneryd.com Thu Sep 16 14:15:31 2004 From: erik at heneryd.com (Erik Heneryd) Date: Thu Sep 16 14:15:38 2004 Subject: [Python-Dev] python-dev Summary for 2004-08-16 through 2004-08-31 [draft] In-Reply-To: <41488969.70909@ocf.berkeley.edu> References: <41488969.70909@ocf.berkeley.edu> Message-ID: <414983E3.9090509@heneryd.com> Brett C. wrote: > The second issue was other the design of the API. Originally Template > was a class that overrode __mod__ to make it work like string > interpolation works now for str and unicode. But then some people felt > a class was too heavy-handed if there was no way to change the way > Template worked through a subclass. This obviously led to a desire for > functions to do the work for both Template and SafeTemplate (similar > class to Template that left in substitution points if they didn't match > any values in the dict passed in). > > In the end the class design was kept thanks to Tim Peters and > metaclasses. Tim came up with a neat way to have the regex be generated > at class creation time through a metaclass and thus allow subclasses to > change how Template matched substitution points and such, all without a > performance hit at instance creation time. 
Use of __mod__ and the > SafeTemplate class were removed and Template grew substitute and > safe_substitute methods. Everyone at this point seems happy with the > design. Well, not *everyone*. As expressed in the PEP 292: Method Names thread I (still) think that: 1) substitute() and safe_substitute() are far too long names for such (probably) common/frequent operations. 2) The design would be more flexible if done with the Template/SafeTemplate class approach. Less code duplication, easier to extend and it solves the long method name problem... Didn't get that much (any) positive feedback though... Erik From perry at stsci.edu Tue Sep 14 22:55:53 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Sep 16 15:14:18 2004 Subject: [Python-Dev] Re: ANN: PEP 335: Overloadable Boolean Operators Message-ID: <774A39E3-0690-11D9-8495-000A95B68E50@stsci.edu> Tim Hochberg wrote: > Phillip J. Eby wrote: > > [CHOP] > > > > As for the numeric use cases, I'm not at all clear why &, |, and ~ > (or > > special methods/functions) aren't suitable. > > They often are, but sometimes you want a logical and/or/not and &/|/~ > are mapped to bitwise and/or/not, which isn't always what you want. > Presumably, if Gregs proposal were adopted, and/or/not would get mapped > to numarray.logical_and/or/not. > I'll go further than that. *Most* of the time Numeric/numarray users want logical and/or/not. Bitwise operations are, by comparison, rarely desired. It is true that one can use the bitwise operators in place of the logical ones (and currently, that's what I tell people to use), but you better make sure the arguments are booleans or limited to 0/1 values or the result is not what is expected. In the great majority of cases the arguments are booleans, but there are times when that isn't true, and using the bitwise operators causes real grief if the user has become accustomed to using the bitwise operator mindlessly. 
Furthermore, most new array users naturally expect and/or/not to operate on the array and are usually very annoyed that it doesn't work. This is one of the largest (if not the largest) remaining warts for using arrays with Python. I sure would like to see the PEP accepted. No one who has tried to write many array expressions with the functional or method equivalents would argue for their use in place of the operators. Perry Greenfield From mike at skew.org Thu Sep 16 17:50:24 2004 From: mike at skew.org (Mike Brown) Date: Thu Sep 16 17:50:48 2004 Subject: [Python-Dev] URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <414934B3.6030004@v.loewis.de> from "Martin v. Löwis" at Sep 16, 2004 08:37:39 am Message-ID: <200409161550.i8GFoODM038508@chilled.skew.org> "Martin v. Löwis" wrote: > You are right: URIs are meant to be written on paper. However, RFC 2396 > also acknowledges that the issue of non-ASCII characters is unresolved. > It suggests (in 2.1) that the URI scheme should specify the > interpretation of byte values. Right. This part of the thread was just about how the argument to urllib.urlopen() should be handled when given as unicode vs str. You seemed to be saying it should be str because a URI is fundamentally bytes and should be analyzed as such, whereas I'm saying no, a URI is fundamentally characters and should be analyzed as such. I mentioned %-encoding and the quirk of the BNF just because those are aspects of the syntax that are byte-oriented and are the source of much confusion, and because they may have influenced your assertion. Are we in agreement on these points?
- A URL/URI consists of a finite sequence of Unicode characters; - urlopen(), and anything else that takes a URL/URI argument, must accept both str and unicode; - If given unicode, each character in the string directly represents a character in the URL/URI and needs no interpretation; - If given str, each byte in the string represents a character in the URL/URI according to US-ASCII interpretation; - Characters or bytes outside the ASCII range, and even certain characters in the ASCII range, are not permitted in a URL/URI, and thus the interpretation of a string containing them may result in an exception or other unpredictable results. If even these principles can be agreed upon, then I can submit a documentation patch, at the very least. Furthermore, what about this principle? - The urllib, urllib2, and urlparse modules currently do not claim to conform to any particular standards governing the interpretation of URLs; they merely acknowledge that some standards may be applicable. However, the intent is to provide standards-conformant behavior where possible, to the extent that the module APIs overlap with functionality mandated by current standards. When the relevant standards become obsolete due to publication of updated standards (e.g. RFC 1630 -> 1738 -> 1808 -> 2396), the implementations *may* be updated accordingly, and users should expect behavior that conforms to either the current or obsoleted standards. Which standards are applicable to a particular implementation should be documented in the module and in its functions & classes where necessary. And how about these? - urlopen() is documented as accepting a 'url' argument that is the URL of 'a network object' that can be read; a file-like object, based on either a local file or a socket, is normally returned. This 'network object' may be a local file if the 'file' scheme is used or if the URL's scheme component is omitted. 
For convenience, the 'url' argument is permitted to be given as a str or unicode, and may be 'absolute' or 'relative'. If RFC 2396 or rfc2396bis apply, then the argument is assumed to be what is defined in the grammar as a URI-reference. A fragment component, if present, is stripped (this requires a change to the implementation) and in all cases, the reference is resolved against a default base URI. If RFC 1808 applies (the current implementation is based largely on this spec, which did not clearly distinguish between a reference and a URI), it is what is defined in the grammar as a URL, and if it is relative (relativeURL in the grammar), it is considered to be relative to a default base URL. (This is essentially describing the current implementation in terms used by the standards). - In urlopen() and the URLOpener classes it depends on, the default base URI is the result of resolving the result of os.getcwd(), converted to a URL by some undocumented means, against the base 'file:///'. (I don't think this would require a change to the implementation, but it is a principle that should be agreed upon and documented, and perhaps the nuances of getcwd vs getcwdu should be addressed). - The resolution of URIs having the 'file' scheme is undertaken on the local filesystem according to conventions that should be, but presently aren't, documented. A preferred mapping of filesystem paths to URIs and back should be documented for each platform. - In urlopen(), the processing of a 'url' argument that is syntactically absolute may be nonconformant on platforms that use ":" in their filesystem paths. On such platforms, if the first ":" in what is syntactically an absolute URL/URI appears to be intended for use other than as a scheme component delimiter, the path will assumed to be relative. Furthermore, on Windows, '\', which is not allowed in a URL, or its equivalent percent- encoded sequence '%5C' (case-insensitive), will be interpreted as a '/' in the URL. 
Thus, on Windows, an argument such as r'C:\a\b\c.txt' will be treated as if it were 'file:///C:/a/b/c.txt' by the URLOpeners. This is a convenience feature for the benefit of users who do not have the means to convert an OS path to full 'file' URL. (This mostly describes current behavior, assuming we can reach agreement that the "C:" in the example above should be treated no differently than "C|"). > As for using regular expressions in the standard library: It seems you > believe this is discouraged. I don't know why you think so - I've never > heard of such a constraint before (in general - in specific cases, > submitters may have been told that alternatives are more efficient). I was just surprised to find that regular expressions are not used much in urllib, urllib2, and urlparse. The implementations seem to be going to a lot of trouble to process URLs using find() and string slices. I thought perhaps there was a good reason for this. I must attend to other things right now; will comment on the other issues later. -Mike From bac at OCF.Berkeley.EDU Thu Sep 16 18:41:15 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Sep 16 18:41:34 2004 Subject: [Python-Dev] python-dev Summary for 2004-08-16 through 2004-08-31 [draft] In-Reply-To: <414983E3.9090509@heneryd.com> References: <41488969.70909@ocf.berkeley.edu> <414983E3.9090509@heneryd.com> Message-ID: <4149C22B.8070004@ocf.berkeley.edu> Erik Heneryd wrote: > Brett C. wrote: > >> The second issue was other the design of the API. Originally Template >> was a class that overrode __mod__ to make it work like string >> interpolation works now for str and unicode. But then some people >> felt a class was too heavy-handed if there was no way to change the >> way Template worked through a subclass. This obviously led to a >> desire for functions to do the work for both Template and SafeTemplate >> (similar class to Template that left in substitution points if they >> didn't match any values in the dict passed in). 
>> >> In the end the class design was kept thanks to Tim Peters and >> metaclasses. Tim came up with a neat way to have the regex be >> generated at class creation time through a metaclass and thus allow >> subclasses to change how Template matched substitution points and >> such, all without a performance hit at instance creation time. Use of >> __mod__ and the SafeTemplate class were removed and Template grew >> substitute and safe_substitute methods. Everyone at this point seems >> happy with the design. > > > Well, not *everyone*. OK, it now says "practically everyone". =) -Brett From bac at OCF.Berkeley.EDU Thu Sep 16 18:45:12 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Sep 16 18:45:30 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py In-Reply-To: <200409161515.56466.symbiont+py@berlios.de> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> <4148CFE1.5010503@ocf.berkeley.edu> <4149361F.3030906@v.loewis.de> <200409161515.56466.symbiont+py@berlios.de> Message-ID: <4149C318.2010902@ocf.berkeley.edu> Jeff Pitman wrote: > On Thursday 16 September 2004 14:43, "Martin v. L?wis" wrote: > >>I should learn not to use vim for Python editing... > > > in vimrc: > > set softtabstop=4 > set shiftwidth=4 > set expandtab > I don't want this to explode into a major thread, but if people think coming up with a good vimrc file for Python would be worth having as a separate SF project send me an email **personally**; DON"T CC python-dev! Been contemplating doing this so that there is always up-to-date Vim config stuff (syntax highlighting, ai, etc.) while also leading to code that follows PEP 7 and 8 so that all of us Vim users here can check in without having to worry about not following the style guidelines. 
-Brett From Jack.Jansen at cwi.nl Thu Sep 16 21:48:09 2004 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Sep 16 21:47:59 2004 Subject: [Python-Dev] Python/PSF at SANE 2004 - Announcement and a request for help Message-ID: <5615AEDA-0819-11D9-8800-000D934FF6B4@cwi.nl> At this years' SANE conference (System Administration and Networking Europe, www.sane.nl) in Amsterdam there will be a Free and Open Source Bazar on wednesday evening, september 29, from 18.30 until 22.00. The bazar will be open to the general public (i.e. free as in beer), and about 20 FOSS groups will be present. In addition, Richard Stallman will present a talk. Among the groups present is, you guessed it, the Python Software Foundation. And the person who volunteered for this is, you guessed it, me. The intention is to provide visitors with information on both the Python language and the PSF. The setting is informal: there will be a tabletop and a backdrop we can use to put material up. In addition there are rooms available to hold BOF sessions. That concludes the announcement bit, on to the request bit: I'm looking for people who'd be willing to join me in manning the stand. And, ideally, also with preparing some material to put up on the backdrop and/or demonstrations we could stage (I can supply the computer, provided it's a Macintosh:-) But if you'd just like to loiter at the stand to tell people how wonderful Python is that's also very welcome. Please let me know if you're willing to help, -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From pje at telecommunity.com Thu Sep 16 23:14:07 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 16 23:13:05 2004 Subject: [Python-Dev] PEP 302 and 'reload()' In-Reply-To: <5.1.1.6.0.20040908172822.020f0a40@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040916171009.039ca5a0@mail.telecommunity.com> At 05:38 PM 9/8/04 -0400, Phillip J. 
Eby wrote: >It appears to me there is an error in both PEP 302's specification and its >implementation concerning the correct operation of reload(). First, it says: > > The load_module() method has a few responsibilities that it must > fulfill *before* it runs any code: > > - It must create the module object. From Python this can be done > via the new.module() function, the imp.new_module() function or > via the module type object; from C with the PyModule_New() > function or the PyImport_ModuleAdd() function. > >This should probably say that if the module already exists in sys.modules, >it should reuse the existing module object, rather than creating a new >one. Otherwise, 'reload()' cannot fulfill its contract. > >Second, the actual implementation of PyImport_ReloadModule doesn't >actually use a loader object, so reload() doesn't work with import hooks >at all. There's an SF bug report for this, and a patch to fix it (that >also adds a test to test_importhooks to ensure that 'reload()' actually >invokes the loader). > >Are there any objections to me fixing either/both of these, and >backporting the bugfix to the 2.3 maintenance branch? Since there have been no objections, I'll undertake (schedule permitting) to correct PEP 302, fix PyImport_ReloadModule and Lib/test/test_importhooks, and backport the changes. I'll note that there are other issues that affect reloading from e.g. zipfiles, but those are over my head to tackle at present. However, until the PEP 302-level issues are dealt with, there's no chance of fixing reload-from-zip, since the underlying reload mechanism itself is broken with respect to PEP 302. >Also, should PyImport_ReloadModule use the import lock? It doesn't >currently, but I'm not clear on why it doesn't. Since no one has answered this, I'll have to assume that there is a good reason, and won't fiddle with it. But I'd still appreciate an answer.
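The PEP 302 correction proposed above (reuse the module object already present in sys.modules instead of always creating a new one) might look roughly like the sketch below. It is an illustration only: SourceStringLoader and its in-memory source string are hypothetical names, not part of PEP 302 or the standard library, and the method is meant to be called directly rather than through the real import machinery.

```python
import sys
import types


class SourceStringLoader:
    """Hypothetical PEP 302-style loader that runs source held in memory."""

    def __init__(self, source):
        self.source = source

    def load_module(self, fullname):
        # Reuse an existing module object if one is registered, so that a
        # reload updates the object other code already holds a reference to.
        module = sys.modules.get(fullname)
        is_new = module is None
        if is_new:
            module = types.ModuleType(fullname)
            # Register before running any code, as PEP 302 asks.
            sys.modules[fullname] = module
        module.__loader__ = self
        try:
            exec(self.source, module.__dict__)
        except Exception:
            # Only undo the registration we made ourselves.
            if is_new:
                del sys.modules[fullname]
            raise
        return module
```

Calling load_module() a second time with the same name then re-executes the source in the same module object, which is the property reload() depends on.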
From martin at v.loewis.de Thu Sep 16 23:30:39 2004 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu Sep 16 23:30:42 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <200409161550.i8GFoODM038508@chilled.skew.org> References: <200409161550.i8GFoODM038508@chilled.skew.org> Message-ID: <414A05FF.8080000@v.loewis.de> Mike Brown wrote: > Right. This part of the thread was just about how the argument to > urllib.urlopen() should be handled when given as unicode vs str. You seemed to > be saying it should be str because a URI is fundamentally bytes and should be > analyzed as such, whereas I'm saying no, a URI is fundamentally characters and > should be analyzed as such. I mentioned %-encoding and the quirk of the BNF > just because those are aspects of the syntax that are byte-oriented and are the > source of much confusion, and because they may have influenced your assertion. > > Are we in agreement on these points? I think I have to answer "no". The % notation is not a quirk of the BNF. I.e. when the BNF states that an URI contains %AC (say), this does *not* mean that the actual URI in-memory-or-on-the-wire contains the byte \xAC. The spec actually says that the URI, in memory, on the wire, or on paper, actually contains the three characters '%', 'A', and 'C'. So usage of that escape mechanism is *not* a result of the BNF notation; it is the inherent desire that URIs contain only characters in ASCII. URIs that contain non-ASCII characters have to escape them "somehow", typically using the % notation. > - A URL/URI consists of a finite sequence of Unicode characters; No. An URI consists of a finite sequence of characters. Whether they are Unicode or not is not specified. The assumption certainly is that if the characters are coded (i.e. assigned to numbers), those numbers don't have to match Unicode code points at all.
An URI that consists of KOI-8R characters would very well be possible. > - urlopen(), and anything else that takes a URL/URI argument, > must accept both str and unicode; Certainly. > - If given unicode, each character in the string directly represents > a character in the URL/URI and needs no interpretation; No. Only ASCII characters in the string need no interpretation. For non-ASCII characters, urllib needs to assume some escaping mechanism. > - If given str, each byte in the string represents a character in > the URL/URI according to US-ASCII interpretation; Yes, if the bytes are meaningful in ASCII. > - Characters or bytes outside the ASCII range, and even certain > characters in the ASCII range, are not permitted in a URL/URI, > and thus the interpretation of a string containing them may > result in an exception or other unpredictable results. Yes. > - The urllib, urllib2, and urlparse modules currently do not > claim to conform to any particular standards governing the > interpretation of URLs; they merely acknowledge that some > standards may be applicable. However, the intent is to provide > standards-conformant behavior where possible, to the extent > that the module APIs overlap with functionality mandated by > current standards. Yes. For input that is out of scope of existing standards, backwards > > When the relevant standards become obsolete due to publication > of updated standards (e.g. RFC 1630 -> 1738 -> 1808 -> 2396), > the implementations *may* be updated accordingly, and users > should expect behavior that conforms to either the current or > obsoleted standards. Which standards are applicable to a > particular implementation should be documented in the module > and in its functions & classes where necessary. > > And how about these? > > - urlopen() is documented as accepting a 'url' argument that is > the URL of 'a network object' that can be read; a file-like > object, based on either a local file or a socket, is normally > returned. 
This 'network object' may be a local file if the > 'file' scheme is used or if the URL's scheme component is omitted. > > For convenience, the 'url' argument is permitted to be given as > a str or unicode, and may be 'absolute' or 'relative'. > > If RFC 2396 or rfc2396bis apply, then the argument is assumed to > be what is defined in the grammar as a URI-reference. A fragment > component, if present, is stripped (this requires a change to the > implementation) and in all cases, the reference is resolved > against a default base URI. > > If RFC 1808 applies (the current implementation is based largely > on this spec, which did not clearly distinguish between a reference > and a URI), it is what is defined in the grammar as a URL, and > if it is relative (relativeURL in the grammar), it is considered > to be relative to a default base URL. > > (This is essentially describing the current implementation in > terms used by the standards). > > - In urlopen() and the URLOpener classes it depends on, the default > base URI is the result of resolving the result of os.getcwd(), > converted to a URL by some undocumented means, against the base > 'file:///'. > > (I don't think this would require a change to the implementation, > but it is a principle that should be agreed upon and documented, > and perhaps the nuances of getcwd vs getcwdu should be addressed). > > - The resolution of URIs having the 'file' scheme is undertaken on > the local filesystem according to conventions that should be, but > presently aren't, documented. A preferred mapping of filesystem > paths to URIs and back should be documented for each platform. > > - In urlopen(), the processing of a 'url' argument that is > syntactically absolute may be nonconformant on platforms > that use ":" in their filesystem paths. 
On such platforms, if the > first ":" in what is syntactically an absolute URL/URI appears to > be intended for use other than as a scheme component delimiter, > the path will be assumed to be relative. Furthermore, on Windows, > '\', which is not allowed in a URL, or its equivalent percent- > encoded sequence '%5C' (case-insensitive), will be interpreted as > a '/' in the URL. > > Thus, on Windows, an argument such as r'C:\a\b\c.txt' will be > treated as if it were 'file:///C:/a/b/c.txt' by the URLOpeners. > This is a convenience feature for the benefit of users who do > not have the means to convert an OS path to full 'file' URL. > > (This mostly describes current behavior, assuming we can reach > agreement that the "C:" in the example above should be treated > no differently than "C|"). > > >>As for using regular expressions in the standard library: It seems you >>believe this is discouraged. I don't know why you think so - I've never >>heard of such a constraint before (in general - in specific cases, >>submitters may have been told that alternatives are more efficient). > > > I was just surprised to find that regular expressions are not used much in > urllib, urllib2, and urlparse. The implementations seem to be going to a > lot of trouble to process URLs using find() and string slices. I thought > perhaps there was a good reason for this. > > I must attend to other things right now; will comment on the other issues > later. > > -Mike > > From martin at v.loewis.de Thu Sep 16 23:39:00 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Sep 16 23:39:01 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) Message-ID: <414A07F4.9000905@v.loewis.de> {I hit send too early, here is the rest } Mike Brown wrote: > Right. This part of the thread was just about how the argument to > urllib.urlopen() should be handled when given as unicode vs str.
You seemed to > be saying it should be str because a URI is fundamentally bytes and should be > analyzed as such, whereas I'm saying no, a URI is fundamentally characters and > should be analyzed as such. I mentioned %-encoding and the quirk of the BNF > just because those are aspects of the syntax that are byte-oriented and are the > source of much confusion, and because they may have influenced your assertion. > > Are we in agreement on these points? I think I have to answer "no". The % notation is not a quirk of the BNF. I.e. when the BNF states that an URI contains %AC (say), this does *not* mean that the actual URI in-memory-or-on-the-wire contains the byte \xAC. The spec actually says that the URI, in memory, on the wire, or on paper, actually contains the three characters '%', 'A', and 'C'. So usage of that escape mechanism is *not* a result of the BNF notation; it is the inherent desire that URIs contain only characters in ASCII. URIs that contain non-ASCII characters have to escape them "somehow", typically using the % notation. > - A URL/URI consists of a finite sequence of Unicode characters; No. An URI consists of a finite sequence of characters. Whether they are Unicode or not is not specified. The assumption certainly is that if the characters are coded (i.e. assigned to numbers), those numbers don't have to match Unicode code points at all. An URI that consists of KOI-8R characters would very well be possible. > - urlopen(), and anything else that takes a URL/URI argument, > must accept both str and unicode; Certainly. > - If given unicode, each character in the string directly represents > a character in the URL/URI and needs no interpretation; No. Only ASCII characters in the string need no interpretation. For non-ASCII characters, urllib needs to assume some escaping mechanism. > - If given str, each byte in the string represents a character in > the URL/URI according to US-ASCII interpretation; Yes, if the bytes are meaningful in ASCII.
> - Characters or bytes outside the ASCII range, and even certain > characters in the ASCII range, are not permitted in a URL/URI, > and thus the interpretation of a string containing them may > result in an exception or other unpredictable results. Yes. > - The urllib, urllib2, and urlparse modules currently do not > claim to conform to any particular standards governing the > interpretation of URLs; they merely acknowledge that some > standards may be applicable. However, the intent is to provide > standards-conformant behavior where possible, to the extent > that the module APIs overlap with functionality mandated by > current standards. Yes. For input that is out of scope of existing standards, backwards compatibility is desirable, unless there is a strong indication that Python should have raised an exception for this input all along. > When the relevant standards become obsolete due to publication > of updated standards (e.g. RFC 1630 -> 1738 -> 1808 -> 2396), > the implementations *may* be updated accordingly, and users > should expect behavior that conforms to either the current or > obsoleted standards. Which standards are applicable to a > particular implementation should be documented in the module > and in its functions & classes where necessary. Yes. > - urlopen() is documented as accepting a 'url' argument that is > the URL of 'a network object' that can be read; a file-like > object, based on either a local file or a socket, is normally > returned. This 'network object' may be a local file if the > 'file' scheme is used or if the URL's scheme component is omitted. Yes. > If RFC 1808 applies (the current implementation is based largely > on this spec, which did not clearly distinguish between a reference > and a URI), it is what is defined in the grammar as a URL, and > if it is relative (relativeURL in the grammar), it is considered > to be relative to a default base URL. This is troublesome. What is a meaningful base URL? 
This should be mentioned prominently. > - In urlopen() and the URLOpener classes it depends on, the default > base URI is the result of resolving the result of os.getcwd(), > converted to a URL by some undocumented means, against the base > 'file:///'. > > (I don't think this would require a change to the implementation, > but it is a principle that should be agreed upon and documented, > and perhaps the nuances of getcwd vs getcwdu should be addressed). Sounds good. > - The resolution of URIs having the 'file' scheme is undertaken on > the local filesystem according to conventions that should be, but > presently aren't, documented. A preferred mapping of filesystem > paths to URIs and back should be documented for each platform. Ok. > - In urlopen(), the processing of a 'url' argument that is > syntactically absolute may be nonconformant on platforms > that use ":" in their filesystem paths. On such platforms, if the > first ":" in what is syntactically an absolute URL/URI appears to > be intended for use other than as a scheme component delimiter, > the path will be assumed to be relative. Furthermore, on Windows, > '\', which is not allowed in a URL, or its equivalent percent- > encoded sequence '%5C' (case-insensitive), will be interpreted as > a '/' in the URL. Ok. > (This mostly describes current behavior, assuming we can reach > agreement that the "C:" in the example above should be treated > no differently than "C|"). I have no problem with that. There are no one-letter URL schemata, are there? > I must attend to other things right now; will comment on the other issues > later. Take your time.
This has been sitting around for many releases - one more or less doesn't matter much in the global flow of things :-) Regards, Martin From martin at v.loewis.de Thu Sep 16 23:51:24 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Sep 16 23:51:25 2004 Subject: [Python-Dev] --with-tsc compile fails In-Reply-To: <2mbrg6whop.fsf@starship.python.net> References: <2mwtyvwsg4.fsf@starship.python.net> <41488D41.9090905@v.loewis.de> <2mbrg6whop.fsf@starship.python.net> Message-ID: <414A0ADC.4060806@v.loewis.de> Michael Hudson wrote: > I don't *have* asm/msr.h! And the impression I get is that we > shouldn't be going near it with the proverbial bargepole. Ah, ok - I probably missed the relevant gcc error message about the missing header file in earlier reports. It is fine then to use a copy of the macro. It should probably apply to all installations where both __GNUC__ and __i386__ are defined. Regards, Martin From anthony at interlink.com.au Fri Sep 17 05:19:34 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Sep 17 05:20:22 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py In-Reply-To: <4149C318.2010902@ocf.berkeley.edu> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> <4148CFE1.5010503@ocf.berkeley.edu> <4149361F.3030906@v.loewis.de> <200409161515.56466.symbiont+py@berlios.de> <4149C318.2010902@ocf.berkeley.edu> Message-ID: <414A57C6.9090803@interlink.com.au> Instead of 'softtabstop', use 'set et' (expandtabs). Anthony -- Anthony Baxter It's never too late to have a happy childhood. From kbk at shore.net Fri Sep 17 05:34:23 2004 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Fri Sep 17 05:34:30 2004 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200409170334.i8H3YNF7005875@h006008a7bda6.ne.client2.attbi.com> Patch / Bug Summary ___________________ Patches : 241 open ( -6) / 2622 closed (+26) / 2863 total (+20) Bugs : 764 open ( +6) / 4453 closed (+38) / 5217 total (+44) RFE : 150 open ( +2) / 131 closed ( +0) / 281 total ( +2) New / Reopened Patches ______________________ Use Py_CLEAR where necessary to avoid crashes (2004-09-01) http://python.org/sf/1020188 reopened by mwh Decimal performance enhancements (2004-09-02) http://python.org/sf/1020845 opened by Nick Coghlan topdir calculated incorrectly in bdist_rpm (2004-09-03) http://python.org/sf/1022003 opened by Anthony Tuininga add support for the AutoReq flag in bdist_rpm (2004-09-03) http://python.org/sf/1022011 opened by Anthony Tuininga Improve Template error detection and reporting (2004-09-03) CLOSED http://python.org/sf/1022173 opened by Raymond Hettinger test_random depends on os.urandom (2004-09-03) CLOSED http://python.org/sf/1022176 opened by Raymond Hettinger Conserve memory with list.pop() (2004-09-06) CLOSED http://python.org/sf/1022910 opened by Raymond Hettinger CodeContext - an extension to show you where you are (2004-04-16) http://python.org/sf/936169 reopened by noamr Add arguments to RE functions (2004-09-08) CLOSED http://python.org/sf/1024041 opened by Noam Raphael Fix for 1022152 (2004-09-08) CLOSED http://python.org/sf/1024238 opened by Andrew Durdin Error when int sent to PyLong_AsUnsignedLong (2004-09-08) http://python.org/sf/1024670 opened by Clinton R. 
Nixon Check for NULL returns in compile.c:com_import_stmt (2004-09-10) http://python.org/sf/1025636 opened by Dima Dorfman Add status code constants to httplib (2004-09-10) http://python.org/sf/1025790 opened by Andrew Eland Clarify language in Data Structures chapter of tutorial (2004-09-10) CLOSED http://python.org/sf/1025795 opened by Dima Dorfman Fix TeX pasto in liboptparse.tex (2004-09-10) CLOSED http://python.org/sf/1025800 opened by Dima Dorfman typo repair (2004-09-12) CLOSED http://python.org/sf/1026384 opened by George Yoshida Add keyword arguments to Template substitutions (2004-09-12) CLOSED http://python.org/sf/1026859 opened by Raymond Hettinger building on OpenBSD 3.5 (2004-09-12) CLOSED http://python.org/sf/1026986 opened by Trevor Perrin Specify a source baseurl for bdist_rpm. (2004-09-15) http://python.org/sf/1028432 opened by Chris Ottrey Adding IPv6 host handling to httplib (2004-09-15) http://python.org/sf/1028502 opened by David Mills Changes to cookielib.py & friends for 2.4b1 (2004-09-16) http://python.org/sf/1028908 opened by John J Lee tarfile.py longnames are truncated in getnames() (2004-09-16) http://python.org/sf/1029061 opened by Lars Gustäbel Patches Closed ______________ Use Py_CLEAR where necessary to avoid crashes (2004-09-01) http://python.org/sf/1020188 closed by rhettinger Py_CLEAR to implicitly cast its argument to PyObject * (2004-09-01) http://python.org/sf/1020185 closed by rhettinger Implementation for PEP 318 using syntax J2 (2004-08-22) http://python.org/sf/1013835 closed by ms_ fix for several sre escaping bugs (fixes #776311) (2004-08-29) http://python.org/sf/1018386 closed by niemeyer Improve Template error detection and reporting (2004-09-03) http://python.org/sf/1022173 closed by rhettinger test_random depends on os.urandom (2004-09-03) http://python.org/sf/1022176 closed by rhettinger bsddb's DB.keys() method ignores transaction argument (2004-08-26) http://python.org/sf/1017405 closed by greg Conserve memory with
list.pop() (2004-09-06) http://python.org/sf/1022910 closed by rhettinger PEP 292 reference implementation (2004-03-23) http://python.org/sf/922115 closed by bcannon Multi-line strings and unittest (2004-08-30) http://python.org/sf/1019220 closed by purcell Decoding incomplete unicode (2004-07-27) http://python.org/sf/998993 closed by doerwalter Add arguments to RE functions (2004-09-07) http://python.org/sf/1024041 closed by rhettinger Fix for 1022152 (2004-09-08) http://python.org/sf/1024238 closed by jlgijsbers Fix for duplicate attributes in generated HTML (2004-08-20) http://python.org/sf/1013055 closed by fdrake Address bug 980938, add set_debug_output() function (2004-07-03) http://python.org/sf/984492 closed by jlgijsbers make test_fcntl 64bit clean (2003-09-13) http://python.org/sf/805626 closed by loewis NetBSD py_curses.h fix (2003-09-15) http://python.org/sf/806800 closed by loewis Add script support to bdist_rpm.py (2003-09-17) http://python.org/sf/808115 closed by loewis Add --force-arch=ARCH to bdist_rpm.py (2003-09-17) http://python.org/sf/808120 closed by loewis Clarify language in Data Structures chapter of tutorial (2004-09-10) http://python.org/sf/1025795 closed by jlgijsbers Fix TeX pasto in liboptparse.tex (2004-09-10) http://python.org/sf/1025800 closed by jlgijsbers typo repair (2004-09-11) http://python.org/sf/1026384 closed by jlgijsbers make Demo/scripts/primes.py usable as a module (2004-01-04) http://python.org/sf/870286 closed by jlgijsbers reflect the removal of mpz (2003-11-15) http://python.org/sf/842567 closed by jlgijsbers Add keyword arguments to Template substitutions (2004-09-12) http://python.org/sf/1026859 closed by bwarsaw building on OpenBSD 3.5 (2004-09-13) http://python.org/sf/1026986 closed by loewis fix for glob with directories which contain brackets (2003-05-15) http://python.org/sf/738389 closed by progoth New / Reopened Bugs ___________________ a wrong link from "frame object" in lib index (2004-09-01) CLOSED 
http://python.org/sf/1020540 opened by Ilya Sandler senddigest error (2004-09-01) http://python.org/sf/1020605 opened by James O'Kane PyThreadState_Next not thread safe? (2004-09-02) http://python.org/sf/1021318 opened by John Ehresman Trivial fix for obscure bug in os.urandom() (2004-09-03) http://python.org/sf/1021596 opened by Nick Mathewson use first_name, not first, in code samples (2004-09-02) http://python.org/sf/1021621 opened by Steve R. Hastings 2.4a3: unhelpful error message from distutils (2004-09-03) http://python.org/sf/1021756 opened by Fredrik Lundh Import random fails (2004-09-03) CLOSED http://python.org/sf/1021890 opened by Paul D. Lusk wrong options are set to python.exe (2004-09-03) http://python.org/sf/1022010 reopened by loewis wrong options are set to python.exe (2004-09-04) CLOSED http://python.org/sf/1022010 opened by George Yoshida re.match(), re.MULTILINE and "^" broken (2004-09-03) CLOSED http://python.org/sf/1022030 opened by Pat Notz Bad examples of gettext.translation (2004-09-03) CLOSED http://python.org/sf/1022152 opened by Facundo Batista x, y in curses window object documentation (2004-09-04) http://python.org/sf/1022311 opened by Felix Wiemann Solaris: reentrancy issues (2004-08-29) http://python.org/sf/1018492 reopened by loewis test_xrange fails on osf1 v5.1b (2004-09-06) http://python.org/sf/1022813 opened by roadkill random.shuffle should restrict the type of its argument (2004-09-06) CLOSED http://python.org/sf/1022880 opened by Faheem Mitha Generator exps fail with large value of range (2004-09-06) CLOSED http://python.org/sf/1022912 opened by Andy Elvey make test fails on HP-UX11i (2004-09-06) CLOSED http://python.org/sf/1022951 opened by Richard Townsend binascii.a2b_hqx("") raises SystemError (2004-09-06) CLOSED http://python.org/sf/1022953 opened by Florian Bauer Example does not match diagram. (2004-09-06) CLOSED http://python.org/sf/1023359 opened by Nefarious CodeMonkey, Jr. 
script which sets random.seed still returns random value (2004-09-07) CLOSED http://python.org/sf/1023453 opened by Faheem Mitha test__locale fails (2004-09-07) CLOSED http://python.org/sf/1023798 opened by Michael Hudson Include/pyport.h: Bad LONG_BIT assumption on non-glibc sys (2004-09-07) http://python.org/sf/1023838 opened by Gregor Richards WinCVS doesn't recognize 2.4a3 (2004-09-08) CLOSED http://python.org/sf/1024427 opened by David W. Thomas struct.calcsize() behaves strangely with short type (2004-09-09) CLOSED http://python.org/sf/1024669 opened by Serafeim Zanikolas shutils.rmtree() uses excessive amounts of memory (2004-09-09) http://python.org/sf/1025127 opened by James Henstridge HTML Documentation for 2.4a3 not found (2004-09-09) http://python.org/sf/1025392 opened by Colin J. Williams email.Utils.parseaddr fails to parse valid addresses (2004-09-09) http://python.org/sf/1025395 opened by Charles asyncore.file_dispatcher should not take fd as argument (2004-09-10) http://python.org/sf/1025525 opened by david houlder tkinter.py invalid number of parameter for _tkinet.create (2004-09-10) CLOSED http://python.org/sf/1025599 opened by bertrandbfr X to the power of 0 may give wrong answer (2004-09-10) CLOSED http://python.org/sf/1025872 opened by Nick Coghlan "ASCII" in doc section "String literals" (2004-09-10) CLOSED http://python.org/sf/1026038 opened by Felix Wiemann Confusing error message when subclassing from invalid base (2004-09-11) CLOSED http://python.org/sf/1026269 opened by Gerrit Holl iso-latin-1 strings and functions lower & upper (2004-09-11) CLOSED http://python.org/sf/1026480 opened by Tomasz Kowaltowski HardwareRandom should be renamed OSRandom (2004-09-13) CLOSED http://python.org/sf/1027105 opened by Trevor Perrin unicode DNS names in socket, urllib, urlopen (2004-09-13) http://python.org/sf/1027206 opened by Damjan Georgievski socket.ssl should explain that it is a 2/3 connection (2004-09-13) http://python.org/sf/1027394 opened by 
adam goucher Argument missing from calltip for new-style class init (2004-09-13) http://python.org/sf/1027566 opened by Loren Guthrie os.stat errors when using shared drive on XP or NT (2004-09-13) http://python.org/sf/1027570 opened by zeke In DOM Node Objects, add more explanations for insertBefore (2004-09-14) http://python.org/sf/1027771 opened by M.-A. DARCHE Cookies without values are silently ignored (by design?) (2004-09-14) http://python.org/sf/1028088 opened by Doug Sheppard date-datetime comparison (2004-09-14) CLOSED http://python.org/sf/1028306 opened by Donnal Walter get_installer_filename (2004-09-15) http://python.org/sf/1028334 opened by bingo Python 2.3.4 broken? (2004-09-15) CLOSED http://python.org/sf/1028447 opened by Stan Problem linking on windows using mingw32 and C++ (2004-09-15) http://python.org/sf/1028697 opened by Steve Menard No command line args when script run without python.exe (2004-09-16) http://python.org/sf/1029047 opened by Kerim Borchaev PEP 302 loader not carried through by reload function (2004-09-16) http://python.org/sf/1029475 opened by Stephen Haberman test_pep277 fails (2004-09-17) http://python.org/sf/1029561 opened by Marel Baczynski Bugs Closed ___________ Crash from Rapid Clicks (2004-07-14) http://python.org/sf/990911 closed by kbk a wrong link from "frame object" in lib index (2004-09-01) http://python.org/sf/1020540 closed by rhettinger httplib.HTTPConnection sends extra blank line (2004-08-31) http://python.org/sf/1019956 closed by jhylton re.sub: two-digit group-reference hangs (2004-08-30) http://python.org/sf/1018815 closed by niemeyer re.finditer hangs on final empty match (2003-10-03) http://python.org/sf/817234 closed by niemeyer Make Problem on HPUX (2004-07-14) http://python.org/sf/991125 closed by plusk Import random fails (2004-09-03) http://python.org/sf/1021890 closed by rhettinger Regular expression failure of the sre engine (2003-07-23) http://python.org/sf/776311 closed by niemeyer wrong options 
are set to python.exe (2004-09-03) http://python.org/sf/1022010 closed by loewis wrong options are set to python.exe (2004-09-03) http://python.org/sf/1022010 closed by loewis re.match(), re.MULTILINE and "^" broken (2004-09-03) http://python.org/sf/1022030 closed by effbot Bad examples of gettext.translation (2004-09-04) http://python.org/sf/1022152 closed by jlgijsbers Solaris: reentrancy issues (2004-08-29) http://python.org/sf/1018492 closed by loewis including Python.h redefines _POSIX_C_SOURCE (2004-08-27) http://python.org/sf/1017450 closed by loewis inspect.getmodule symlink-related failur (2002-06-18) http://python.org/sf/570300 closed by jlgijsbers __metaclass__ in locals is ignored (2004-08-30) http://python.org/sf/1019048 closed by bcannon split method documentation can be improved (2004-02-21) http://python.org/sf/901654 closed by rhettinger random.shuffle should restrict the type of its argument (2004-09-05) http://python.org/sf/1022880 closed by rhettinger Generator exps fail with large value of range (2004-09-06) http://python.org/sf/1022912 closed by rhettinger make test fails on HP-UX11i (2004-09-06) http://python.org/sf/1022951 closed by rhettinger binascii.a2b_hqx("") raises SystemError (2004-09-06) http://python.org/sf/1022953 closed by rhettinger mimetypes add_type has bogus self parameter (2004-08-23) http://python.org/sf/1014022 closed by doerwalter Example does not match diagram. 
(2004-09-06) http://python.org/sf/1023359 closed by akuchling "rich comparison'' methods hide stack overflow (2004-08-30) http://python.org/sf/1019129 closed by rhettinger script which sets random.seed still returns random value (2004-09-07) http://python.org/sf/1023453 closed by rhettinger test__locale fails (2004-09-07) http://python.org/sf/1023798 closed by bcannon WinCVS doesn't recognize 2.4a3 (2004-09-08) http://python.org/sf/1024427 closed by loewis os.system segmentation fault (2004-08-25) http://python.org/sf/1015937 closed by nnorwitz struct.calcsize() behaves strangely with short type (2004-09-08) http://python.org/sf/1024669 closed by mwh RE engine internal error with LARGE RE: scalability bug (2003-12-10) http://python.org/sf/857676 closed by effbot "build" target doesn't check umask (2004-06-22) http://python.org/sf/977937 closed by melicertes tkinter.py invalid number of parameter for _tkinet.create (2004-09-10) http://python.org/sf/1025599 closed by loewis X to the power of 0 may give wrong answer (2004-09-10) http://python.org/sf/1025872 closed by tim_one Unspecific errors with metaclass (2004-08-23) http://python.org/sf/1014215 closed by rhettinger "ASCII" in doc section "String literals" (2004-09-10) http://python.org/sf/1026038 closed by loewis Confusing error message when subclassing from invalid base (2004-09-11) http://python.org/sf/1026269 closed by mwh iso-latin-1 strings and functions lower & upper (2004-09-11) http://python.org/sf/1026480 closed by kowaltowski crash error in glob.glob; directories with brackets (2003-05-15) http://python.org/sf/738361 closed by progoth HardwareRandom should be renamed OSRandom (2004-09-13) http://python.org/sf/1027105 closed by rhettinger date-datetime comparison (2004-09-14) http://python.org/sf/1028306 closed by tim_one Python 2.3.4 broken? 
(2004-09-15) http://python.org/sf/1028447 closed by mwh New / Reopened RFE __________________ proposed struct module format code addition (2004-09-06) http://python.org/sf/1023290 opened by Josiah Carlson urllib2 http auth (2004-09-10) http://python.org/sf/1025540 opened by Tim Nelson From symbiont+py at berlios.de Fri Sep 17 06:35:39 2004 From: symbiont+py at berlios.de (Jeff Pitman) Date: Fri Sep 17 06:37:50 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py In-Reply-To: <4149C318.2010902@ocf.berkeley.edu> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> <200409161515.56466.symbiont+py@berlios.de> <4149C318.2010902@ocf.berkeley.edu> Message-ID: <200409171235.39871.symbiont+py@berlios.de> On Friday 17 September 2004 00:45, Brett C. wrote: > would be worth having as a separate SF > project I believe the facilities at www.vim.org are sufficient for such an effort. Additionally, such standards-compliance should be pushed into the upstream vim tarball as well. Account holders on vim.org can develop vim scripts here: http://www.vim.org/scripts/index.php, which allows for a simple release mechanism. For discussion, c.l.python could work, albeit a bit noisy. Maybe a keyword in the subject line or something for those with filtering technologies would be beneficial. take care, -- -jeff From symbiont+py at berlios.de Fri Sep 17 06:39:42 2004 From: symbiont+py at berlios.de (Jeff Pitman) Date: Fri Sep 17 06:41:46 2004 Subject: [Python-Dev] tabs in httplib.py and test_httplib.py In-Reply-To: <414A57C6.9090803@interlink.com.au> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> <4149C318.2010902@ocf.berkeley.edu> <414A57C6.9090803@interlink.com.au> Message-ID: <200409171239.42375.symbiont+py@berlios.de> On Friday 17 September 2004 11:19, Anthony Baxter wrote: > Instead of 'softtabstop', use 'set et' (expandtabs). I use both. et will make it so you don't mess with \t insanity. 
sts makes it nice when you want to mass indent things using a region and the "=" indenter. (Ctrl-V,j,j,j,j,j,=) <-- Something to this effect. have fun, -- -jeff From mike at skew.org Fri Sep 17 07:39:11 2004 From: mike at skew.org (Mike Brown) Date: Fri Sep 17 07:39:09 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <414A05FF.8080000@v.loewis.de> =?UTF-8?Q?from_=22Martin_v=2E_L=C3=B6wis=22_at_Sep_16=2C_2004_11=3A30=3A39_p?= =?UTF-8?Q?m?= Message-ID: <200409170539.i8H5dBfF042148@chilled.skew.org> "Martin v. Löwis" wrote: > > Are we in agreement on these points? > > I think I have to answer "no". The % notation is not a quirk of the BNF. That's not what I said at *all*. The quirk of the BNF is a completely separate issue, and is this: BNF mandates that its terminals are integers, e.g. character ":" in a particular BNF-based grammar represents the value 58 (in decimal). RFC 2396 makes use of the grammar to define the generic syntax, but stipulates (well, rfc2396bis clarifies that the intent was to stipulate) that the intent is to actually define the syntax in terms of characters, so the ":" in the grammar really does mean the colon character, in that spec. So there is no disagreement there, really. > > - A URL/URI consists of a finite sequence of Unicode characters; > > No. An URI consists of a finite sequence of characters. You are correct. This is stated in RFC 2396, and Martin Duerst and I pushed for rfc2396bis to settle upon a definition of character just to make it extra clear, so I should have known better. > > - If given unicode, each character in the string directly represents > > a character in the URL/URI and needs no interpretation; > > No. Only ASCII characters in the string need no interpretation. For > non-ASCII characters, urllib needs to assume some escaping mechanism. Err, no. Let me start over.
The question is: what do we do with a unicode object given as the 'url' argument in urllib.urlopen(), etc.? Assumption 1: Resolution to absolute form and subsequent dereferencing of a character sequence that is intended to identify a resource, in order to be performed in a manner that is conformant with [pick one: RFC 1630, RFC 1738, RFC 1808, RFC 2396, the RFC that rfc2396bis will likely become, or the RFC that the IRIs draft will likely become], requires that the character sequence actually *be* [depending on which spec you chose] a URL, a URI reference, or an IRI reference. Those standards do not define how to resolve & dereference other types of resource identifiers, be they character sequences or otherwise. Assumption 2: The aforementioned standards unambiguously define the syntax to which a resource-identifying character sequence must conform in order to be considered a URL, a URI reference, or an IRI reference. The standards do not define how character sequences that do not conform to the syntax can be processed (but they do not forbid such processing; they just say that they aren't applicable to those situations). Assumption 3: When an argument is given to an RFC 1808-era URL resolution function that is documented as requiring that the argument be [an object that represents] a 'URL', then the caller implicitly asserts that whatever object passed indeed represents a URL. Assumption 4: The object passed into the function, of course, is going to manifest relatively concretely, as, say, a Python str or unicode object, so the function, if it intends to perform standards-conformant resolution, must behave as if it has interpreted the object as a resource-identifying sequence of abstract characters, and must verify somehow that the sequence adheres to the syntax requirements of a URL / URI ref / IRI ref. This verification can either be an explicit syntax check, or can be a feature of the conversion of the object as resource-identifying characters. 
In either case, we need to define the mechanics of that conversion. This is what I am attempting to unambiguously do for str and unicode arguments by saying how each item in a str or unicode object maps to the characters that are going to be treated as a URL/URI ref. It is true that we are under no obligation in our API to assume a one-to-one mapping between the characters in a unicode argument and the characters in the resource-identifying string that, in turn, may or may not be a URL, but to do otherwise seems a bit unintuitive, to me. You seem to be suggesting that a one-to-one mapping be assumed until a syntax error is found. Then, if the syntax error is of a certain type (like the character is > U+007F, then you seem to be saying that you want some kind of cleanup to be performed in order to ensure that the resulting string is conformant to the URL syntax. I feel that since urllib is under no obligation to assume anything about what the syntax-violating characters are intended to mean, it would be within its rights to reject the argument altogether, and I would rather see it do that than try to guess what the user intended -- especially in this domain, where such guesses, if wrong, only lead developers to be even more confused about topics that are already barely understood as it is. For example, some specs (HTML, XHTML, XSLT) suggest that processors of those types of documents perform UTF-8 based percent-encoding of any non-ASCII characters that mistakenly appear in attribute values that are normally supposed to contain URI references (hrefs and the like). Users who rely on this then wonder why many widely-deployed HTTP servers/CGI/PHP apps, etc. -- the ones that assume %-encoded octets in the Request-URI are iso-8859-1 based -- misinterpret the characters. 
To me, convenience afforded by the automatic percent-encoding is outweighed by the harm introduced by the wrong guesses and the reinforcement of the belief in the document author or developer that a URI reference is whatever string of characters they want it to be. I have a feeling this is a matter of personal philosophy. I've never been a huge fan of the "be lenient in what you accept, strict in what you produce" mantra. URLs/URIs have a strict syntax, and IMHO we should enforce it so that developers can learn about and code to standards, rather than becoming reliant upon the crutch of lenient-yet-convenient APIs. But if we are going to accept arbitrary strings and then attempt to make 'em fit the URL syntax, then we should, IMHO, acknowledge (in API documentation) that this is behavior provided for the sake of having a convenient API, and is not within the scope of the standards. Hopefully the marginal percentage of developers who actually read the API docs can then learn that u'http://m.v.l\xd6wis/' is not a URL, even if urllib happens to convert it to one, and in my perfect fantasy-world, they'd be less inclined to give us any reason to make lenient APIs. Actually, in a perfect world I probably would not be inclined to obsess over such things :) -Mike From martin at v.loewis.de Fri Sep 17 08:07:30 2004 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri Sep 17 08:07:31 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <200409170539.i8H5dBfF042148@chilled.skew.org> References: <200409170539.i8H5dBfF042148@chilled.skew.org> Message-ID: <414A7F22.3070006@v.loewis.de> Mike Brown wrote: > It is true that we are under no obligation in our API to assume a one-to-one > mapping between the characters in a unicode argument and the characters in the > resource-identifying string that, in turn, may or may not be a URL, but to do > otherwise seems a bit unintuitive, to me. Not at all. 
If the URI contains the sequence '%A0', does that constitute one or three characters? You suggested earlier that the host part of an URI could be UTF-8 encoded. In that case, a single character translates into, say, 2 octets, which then get %-escaped, translating into 6 ASCII characters. So a single Unicode character may end up in multiple ASCII characters during processing. > You seem to be suggesting that a > one-to-one mapping be assumed until a syntax error is found. Then, if the > syntax error is of a certain type (like the character is > U+007F, then you > seem to be saying that you want some kind of cleanup to be performed in order > to ensure that the resulting string is conformant to the URL syntax. > > I feel that since urllib is under no obligation to assume anything about what > the syntax-violating characters are intended to mean, it would be within its > rights to reject the argument altogether, and I would rather see it do that > than try to guess what the user intended -- especially in this domain, where > such guesses, if wrong, only lead developers to be even more confused about > topics that are already barely understood as it is. Either is fine. It appears that the future URI RFC and the IRI RFC will suggest that the "cleanup" is the right action, and that the implementation should indeed process the string. > To me, convenience afforded by the automatic > percent-encoding is outweighed by the harm introduced by the wrong guesses > and the reinforcement of the belief in the document author or developer that > a URI reference is whatever string of characters they want it to be. I agree. However, I hope that the IRI RFC will resolve the issue for good, at least when the input is a Python Unicode string. When the input is a Python byte string, it seems natural to %-escape the non-ASCII bytes. 
> But if we are going to accept arbitrary strings and then attempt to make 'em > fit the URL syntax, then we should, IMHO, acknowledge (in API documentation) > that this is behavior provided for the sake of having a convenient API, and is > not within the scope of the standards. Hopefully the marginal percentage of > developers who actually read the API docs can then learn that > u'http://m.v.l\xd6wis/' is not a URL, even if urllib happens to convert it to > one, and in my perfect fantasy-world, they'd be less inclined to give us any > reason to make lenient APIs. But it is an IRI reference, isn't it? I think urllib then should process it as such. Regards, Martin From mike at skew.org Fri Sep 17 09:54:21 2004 From: mike at skew.org (Mike Brown) Date: Fri Sep 17 09:54:21 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <414A07F4.9000905@v.loewis.de> =?UTF-8?Q?from_=22Martin_v=2E_L=C3=B6wis=22_at_Sep_16=2C_2004_11=3A39=3A00_p?= =?UTF-8?Q?m?= Message-ID: <200409170754.i8H7sLWr042680@chilled.skew.org> "Martin v. Löwis" wrote: > > If RFC 1808 applies (the current implementation is based largely > > on this spec, which did not clearly distinguish between a reference > > and a URI), it is what is defined in the grammar as a URL, and > > if it is relative (relativeURL in the grammar), it is considered > > to be relative to a default base URL. > > This is troublesome. What is a meaningful base URL? This should be > mentioned prominently. In effect, this is what happens in the current implementation, but I don't think it was ever anyone's intent to think of it in terms of standards-based resolution-to-absolute-form against a base URL, and in any event, it's not as well-documented as it should be.
User expectation in most contexts, even when it doesn't apply (as in the most prominent use of relative references: HTML/XML document processing) is that relative references are relative to a base having something to do with the current working directory of the URL processor. Wrong as it often is to make such an assumption, in the case of urlopen() we have no context that would define a base URL. The documented precedent is that the 'file' scheme is assumed, and the implementation, IIRC, is such that the relative path is run through url2pathname which does very little to it, and it is then passed right to open(), so in effect the current working directory is assumed. For the sake of having a sane policy going forward, I would rather see the behavior expressed in terms that would be governed by standards, which is what I attempted to do. Luckily, the behavior is such that it is possible. There is an issue though: if disallowed/non-ASCII characters or bytes are in the urlopen() argument, and it's a relative URL, then right now the implementation is (I think) such that those characters or bytes pass through unchanged to the open() call. So if we do anything to these characters/bytes beforehand, such as %-encoding them as I think you were suggesting (see previous email), then for compatibility we'd have to specify that we're %-decoding them again in a way that results in the original characters/bytes being passed to open(). > > (This mostly describes current behavior, assuming we can reach > > agreement that the "C:" in the example above should be treated > > no differently than "C|"). > > I have no problem with that. There are no one-letter URL schemata, > are there? There aren't, although in principle I wish the API weren't lenient; people would quickly learn that C:\x\y\z is not a URL and C:/x/y/z is only allowed by the standards to be interpreted in one way: the one they probably don't want, and what they really need to do is learn to use file:///blahblahblah. 
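To make the drive-letter point concrete, here is a minimal sketch using present-day Python 3 (urllib.parse postdates this thread, so this is not what urllib did in 2004). A generic URI parser has no choice but to read the "C" in C:/x/y/z as a scheme; a file URI has to be built explicitly. The helper name windows_path_to_file_uri is hypothetical and only handles absolute drive-letter paths.

```python
from urllib.parse import quote, urlsplit

# A standards-based parser sees "C" as the scheme, which is almost
# certainly not what the caller meant.
parts = urlsplit("C:/x/y/z")
print(parts.scheme, parts.path)


def windows_path_to_file_uri(path):
    """Hypothetical helper: turn an absolute drive-letter path into a file URI.

    Assumes input like C:\\x\\y\\z or C:/x/y/z; UNC paths and relative
    paths are deliberately not handled in this sketch.
    """
    forward = path.replace("\\", "/")
    # Keep "/" and the drive colon literal; percent-encode everything unsafe.
    return "file:///" + quote(forward, safe="/:")


print(windows_path_to_file_uri(r"C:\x\y\z"))
```

The explicit conversion is the design choice being argued for: the caller states that the input is a filesystem path, so nothing has to be guessed from the shape of the string.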
In 4Suite's Ft.Lib.Uri we needed to conduct strictly conformant processing of URI references in our DOM, XPath, XSLT, and HTTP implementations. I found that we couldn't use urllib for hardly anything of this sort without a great deal of working around / closing up the holes opened by all these 'conveniences'. Tightening up the conformance issues meant that we needed to help users produce valid URIs from filesystem paths and vice-versa. Once again, the core Python libs were of little use -- pathname2url and url2pathname are platform-dependent, and are so full of bugs^H^H^H^Hfeatures that I had to start from scratch and roll my own functions. I think what I've got at this point would make great additions to urllib2, but I'll save them for another day... At least with all the "OKs" you've given so far, I can submit a patch or three to get some of the documentation updated. > > I must attend to other things right now; will comment on the other issues > > later. > > Take your time. This has been sitting around for many releases - one > more or less doesn't matter much in the global flow of things :-) Heh, agreed. I wish rfc2396bis and IRIs would hurry on through the IETF's machinery. I've only been actively paying attention to the former, but they both have a lot going for them. From mike at skew.org Fri Sep 17 11:02:39 2004 From: mike at skew.org (Mike Brown) Date: Fri Sep 17 11:02:40 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <414A7F22.3070006@v.loewis.de> =?UTF-8?Q?from_=22Martin_v=2E_L=C3=B6wis=22_at_Sep_17=2C_2004_08=3A07=3A30_a?= =?UTF-8?Q?m?= Message-ID: <200409170902.i8H92dgF042997@chilled.skew.org> "Martin v. 
Löwis" wrote: > > It is true that we are under no obligation in our API to assume a one-to-one > > mapping between the characters in a unicode argument and the characters in the > > resource-identifying string that, in turn, may or may not be a URL, but to do > > otherwise seems a bit unintuitive, to me. > > Not at all. If the URI contains the sequence '%A0', > does that constitute one or three characters? Yes, it does. :) I think I've got this right: %A0 in a URI is three characters in the URI. Together they are representing one octet (byte A0) in much the same way that the 6 characters &#232; represent a single small-e-with-acute character in ISO/IEC 10646-based markup languages. If the sequence were %00-%7F, then the octet represented by that sequence would in turn represent a single character in the ASCII range, and you would be allowed to use equivalence rules and knowledge of the syntax in order to ascertain whether the sequence is interchangeable with the raw character at that position in the URI. But since in this example it is %80-%FF, the octet represented by the sequence does not automatically represent a character; it represents, at best, a scheme- or implementation-defined code unit which may or may not be an encoded character or portion thereof. > You suggested earlier that the host part of an > URI could be UTF-8 encoded. In that case, a single character translates > into, say, 2 octets, which then get %-escaped, translating into 6 ASCII > characters. So a single Unicode character may end up in multiple ASCII > characters during processing. That sounds right, but I think I need an example to understand where the disagreement is. It's not a URI at the point where it contains a non-ASCII character.
Theoretical resolution procedure of argument u'http://m.v.l\xf6wis/': arg u'http://m.v.l\xf6wis/' => IRI ref u'http://m.v.l\xf6wis/' => URI ref u'http://m.v.l%C3%B6wis/' and likewise, just for example, arg u'http://m.v.l%C3%B6wis/' => IRI ref u'http://m.v.l%C3%B6wis/' => URI ref u'http://m.v.l%C3%B6wis/' In any event, the argument has become the URI reference u'http://m.v.l%C3%B6wis/' (which we don't need to necessarily store as unicode, but I prefer to write it as such for clarity): 1. Resolve to absolute form (necessary even with absolute refs in order to eliminate dot segments in the path; the rfc2396bis algorithm is preferable to the buggy ones in older specs for this). The base URI will be based on os.getcwd(). We'll say cwd is '/home/mike/test' to keep it simple. Base URI then is u'file:///home/mike/test'. Resolution to absolute form results in, in this case, no change: the URI represented by the URI ref is the same as the ref itself: u'http://m.v.l%C3%B6wis/'. 2. URI is decomposed into its components: scheme: u'http' authority: u'm.v.l%C3%B6wis' path: u'/' query: undefined fragment: undefined 3. Fragment, if any, is stripped prior to dereference, per specs. 4. For http scheme, authority is split into: user: undefined pass: undefined host: u'm.v.l%C3%B6wis' port: u'80' (default) 5. host is percent-decoded with a UTF-8 basis: host: u'm.v.l\xf6wis' 6. socket object is obtained for host u'm.v.l\xf6wis' and port 80 (int); socket module applies IDNA encoding and does DNS lookup of 'm.v.xn--lwis-5qa', connects to corresponding IP address on port 80 7. properly formatted HTTP request message (a byte string) is sent for Request-URI '/' with Host header 'Host: m.v.xn--lwis-5qa' If the initial argument were a byte string, I agree that any non-ASCIIs should be percent-encoded directly. Processing would then be conducted exactly as above. 
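As a rough sketch of the conversions in the procedure above (argument to URI reference, then steps 2, 5 and 6), here is how they might look in present-day Python 3. This is an illustration under stated assumptions, not what urllib does: the function names iri_to_uri and host_for_dns are hypothetical, the set of characters left unescaped is a simplification of the URI grammar, and the built-in "idna" codec stands in for what the socket module applies.

```python
from urllib.parse import quote, unquote, urlsplit

# Characters left alone when converting: unreserved, reserved, and "%"
# (so sequences that are already percent-encoded are not encoded twice).
_KEEP = "/:?#[]@!$&'()*+,;=%~-._"


def iri_to_uri(iri):
    """Hypothetical helper: percent-encode non-ASCII characters as UTF-8."""
    return quote(iri, safe=_KEEP)


def host_for_dns(uri):
    """Hypothetical helper: recover the host and IDNA-encode it for lookup."""
    host = urlsplit(uri).hostname   # step 2/4: decompose, pick out the host
    decoded = unquote(host)         # step 5: percent-decode on a UTF-8 basis
    return decoded.encode("idna").decode("ascii")  # step 6: IDNA for DNS


uri = iri_to_uri("http://m.v.l\xf6wis/")
print(uri)
print(host_for_dns(uri))
```

Keeping "%" in the unescaped set is what makes the second example in the text work: an argument that already contains %C3%B6 passes through unchanged.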
-Mike From mike at skew.org Fri Sep 17 11:04:27 2004 From: mike at skew.org (Mike Brown) Date: Fri Sep 17 11:04:25 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <200409170902.i8H92dgF042997@chilled.skew.org> "from Mike Brown at Sep 17, 2004 03:02:39 am" Message-ID: <200409170904.i8H94RUC043035@chilled.skew.org> I wrote: > %A0 in a URI is three characters in the URI. Together they are representing > one octet (byte A0) in much the same way that the 6 characters &#232; > represents a single small-e-with-acute character in ISO/IEC 10646-based markup > languages. Actually I suppose it's not 'much the same way' since &#232; does not at any point represent bytes, but you know what I mean, I think. From FBatista at uniFON.com.ar Fri Sep 17 16:10:21 2004 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Fri Sep 17 16:14:48 2004 Subject: [Python-Dev] Decimal, copyright and license Message-ID: People: I'm creating a decimal installer (for Py2.3 users), making tarball, .rpm and .exe versions available. What I don't know is what to put about license and copyright. Regarding copyright, my first draft says: Copyright (c) 2004 Python Software Foundation. All rights reserved. Regarding license, I haven't put anything yet; should I write something like the following and include the file? See the file "LICENSE" for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES. Remember that the "decimal installer" will be available for download not in a Python location. Thanks! Facundo Batista Desarrollo de Red fbatista@unifon.com.ar (54 11) 5130-4643 Cel: 15 5097 5024 -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20040917/516634a9/attachment.html From skip at pobox.com Fri Sep 17 16:20:51 2004 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 17 16:20:59 2004 Subject: [Python-Dev] Re: Weekly Python Patch/Bug Summary In-Reply-To: <200409170414.i8H48pF8022459@h006008a7bda6.ne.client2.attbi.com> References: <200409170414.i8H48pF8022459@h006008a7bda6.ne.client2.attbi.com> Message-ID: <16714.62147.810666.375480@montanaro.dyndns.org> Kurt> Patch / Bug Summary Kurt> ___________________ Kurt> Patches : 241 open ( -6) / 2622 closed (+26) / 2863 total (+20) Kurt> Bugs : 764 open ( +6) / 4453 closed (+38) / 5217 total (+44) Kurt> RFE : 150 open ( +2) / 131 closed ( +0) / 281 total ( +2) Let me take the opportunity to thank Kurt for providing this excellent summary (much better than my original hack) and invite the larger Python community to participate in Python's development by reviewing patches and bug reports. If you're new to Python development, I urge you to read http://www.python.org/dev/dev_intro.html especially the "Helping Out" section. Skip From symbiont+py at berlios.de Fri Sep 17 17:16:18 2004 From: symbiont+py at berlios.de (Jeff Pitman) Date: Fri Sep 17 17:18:33 2004 Subject: [Python-Dev] Decimal, copyright and license In-Reply-To: References: Message-ID: <200409172316.18261.symbiont+py@berlios.de> On Friday 17 September 2004 22:10, Batista, Facundo wrote: > Remember that the "decimal installer" will be available for download > not in a Python location. For the ignorant (me): what's a "decimal installer"? thanks, -- -jeff From FBatista at uniFON.com.ar Fri Sep 17 17:50:16 2004 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Fri Sep 17 17:54:50 2004 Subject: [Python-Dev] Decimal, copyright and license Message-ID: #- > Remember that the "decimal installer" will be available #- for download #- > not in a Python location. #- #- For the ignorant (me): what's a "decimal installer"? 
An installer which puts the decimal module in site-packages. Sorry for the before-turn-on-neurons-written mail. Facundo From fdrake at acm.org Fri Sep 17 19:09:47 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Sep 17 19:10:04 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. Message-ID: <200409171309.48011.fdrake@acm.org> At this point, I'm planning to drop the gzip-compressed archives for all future Python releases. The bzip2 archives are much smaller (saving bandwidth, disk space, and download time), and supporting software seems to have become widely available in both free and commercial tools. I'm still planning to make ZIP archives available. If anyone would like to argue that I should drop that as well, feel free. ;-) -Fred -- Fred L. Drake, Jr. From skip at pobox.com Fri Sep 17 19:43:19 2004 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 17 19:43:38 2004 Subject: [Python-Dev] Decimal, copyright and license In-Reply-To: <200409172316.18261.symbiont+py@berlios.de> References: <200409172316.18261.symbiont+py@berlios.de> Message-ID: <16715.8759.13545.439652@montanaro.dyndns.org> Jeff> On Friday 17 September 2004 22:10, Batista, Facundo wrote: >> Remember that the "decimal installer" will be available for download >> not in a Python location. Jeff> For the ignorant (me): what's a "decimal installer"? It's sort of like an impressionist's tattoo machine. After using it you have lots of little dots all over. ;-) More seriously, I suspect it's an installer for the new Decimal class for use with older versions of Python. Skip From jcarlson at uci.edu Fri Sep 17 20:00:16 2004 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri Sep 17 20:06:54 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases.
In-Reply-To: <200409171309.48011.fdrake@acm.org> References: <200409171309.48011.fdrake@acm.org> Message-ID: <20040917105329.F123.JCARLSON@uci.edu> > At this point, I'm planning to drop the gzip-compressed archives for all > future Python releases. The bzip2 archives are much smaller (saving > bandwidth, disk space, and download time), and supporting software seems to > have become widely available in both free and commercial tools. Sounds good. When are we going to start offering a bzip2 library in Python? > I'm still planning to make ZIP archives available. If anyone would like to > argue that I should drop that as well, feel free. ;-) Zip has been the de-facto standard for compression in the windows world for around 10 years. While other formats are making inroads (rar, ace, bzip2, etc.), they are not supported by the most popular windows archiver, WinZip: http://www.download.com/sort/3150-2250-0-1-4.html? When the most popular compression tool for Windows starts offering bzip2 compression, then it seems like a good idea to toss the zip file format. - Josiah From lalo at laranja.org Fri Sep 17 20:14:09 2004 From: lalo at laranja.org (Lalo Martins) Date: Fri Sep 17 20:17:15 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <200409171309.48011.fdrake@acm.org> References: <200409171309.48011.fdrake@acm.org> Message-ID: <20040917181409.GN21135@laranja.org> On Fri, Sep 17, 2004 at 01:09:47PM -0400, Fred L. Drake, Jr. wrote: > I'm still planning to make ZIP archives available. If anyone would like to > argue that I should drop that as well, feel free. ;-) 1. the main archive software packages for all OSes support tar.bz2 in their current releases. (This includes WinZip, WinRAR and whatnot.) 2. if you can't be bothered to know what is a tar.bz2 and how to open it, you won't be getting the ZIP, but rather the EXE installer. []s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. 
-- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://garfield.laranja.org/~lalo/gpgkey-signed.asc GNU: never give up freedom http://www.gnu.org/ From jcarlson at uci.edu Fri Sep 17 20:27:08 2004 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri Sep 17 20:33:55 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <20040917105329.F123.JCARLSON@uci.edu> References: <200409171309.48011.fdrake@acm.org> <20040917105329.F123.JCARLSON@uci.edu> Message-ID: <20040917112644.F126.JCARLSON@uci.edu> > Sounds good. When are we going to start offering a bzip2 library in > Python? Nevermind, it is already there. - Josiah From tim.peters at gmail.com Fri Sep 17 20:35:41 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Sep 17 20:35:57 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <20040917181409.GN21135@laranja.org> References: <200409171309.48011.fdrake@acm.org> <20040917181409.GN21135@laranja.org> Message-ID: <1f7befae04091711353b5538a8@mail.gmail.com> [Lalo Martins] > 1. the main archive software packages for all OSes support > tar.bz2 in their current releases. (This includes WinZip, > WinRAR and whatnot.) WinZip 9.0 SR-1 (which is the current release) does not support bz2. From mike at skew.org Fri Sep 17 22:37:53 2004 From: mike at skew.org (Mike Brown) Date: Fri Sep 17 22:38:00 2004 Subject: [Python-Dev] Re: URL processing conformance and principles In-Reply-To: <200409170902.i8H92dgF042997@chilled.skew.org> References: <200409170902.i8H92dgF042997@chilled.skew.org> Message-ID: <414B4B21.2060004@skew.org> Oops, found another little mistake in my last email: > The base URI will be based on os.getcwd(). We'll say cwd is > '/home/mike/test' to keep it simple. Base URI then is > u'file:///home/mike/test'. I meant to say u'file:///home/mike/test/' (with trailing slash). Even though the filesystem does not care, the resolution-to-absolute-form algorithm does. 
From Scott.Daniels at Acm.Org Sat Sep 18 00:33:38 2004 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sat Sep 18 00:32:30 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <20040917181409.GN21135@laranja.org> References: <200409171309.48011.fdrake@acm.org> <20040917181409.GN21135@laranja.org> Message-ID: Lalo Martins wrote: > On Fri, Sep 17, 2004 at 01:09:47PM -0400, Fred L. Drake, Jr. wrote: >>I'm still planning to make ZIP archives available. If anyone would like to >>argue that I should drop that as well, feel free. ;-) > > 1. the main archive software packages for all OSes support > tar.bz2 in their current releases. (This includes WinZip, > WinRAR and whatnot.) > > 2. if you can't be bothered to know what is a tar.bz2 and how to > open it, you won't be getting the ZIP, but rather the EXE installer. .zip is the only one of these 3 formats that allows you to decompress a few files without expanding the entire archive. This feature is useful to me at least (and makes up for the larger size). -- -- Scott David Daniels Scott.Daniels@Acm.Org From python at discworld.dyndns.org Sat Sep 18 00:46:15 2004 From: python at discworld.dyndns.org (Charles Cazabon) Date: Sat Sep 18 00:39:11 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: References: <200409171309.48011.fdrake@acm.org> <20040917181409.GN21135@laranja.org> Message-ID: <20040917224615.GC8367@discworld.dyndns.org> Scott David Daniels wrote: > > .zip is the only one of these 3 formats that allows you to decompress a > few files without expanding the entire archive. This feature is useful > to me at least (and makes up for the larger size). tar supports that as well, and with better compression when paired with bzip2. Hint: tar xzf archive [file] [...] 
Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From nbastin at opnet.com Sat Sep 18 00:54:44 2004 From: nbastin at opnet.com (Nick Bastin) Date: Sat Sep 18 00:55:12 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <20040917181409.GN21135@laranja.org> References: <200409171309.48011.fdrake@acm.org> <20040917181409.GN21135@laranja.org> Message-ID: <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> On Sep 17, 2004, at 2:14 PM, Lalo Martins wrote: > On Fri, Sep 17, 2004 at 01:09:47PM -0400, Fred L. Drake, Jr. wrote: >> I'm still planning to make ZIP archives available. If anyone would >> like to >> argue that I should drop that as well, feel free. ;-) > > 1. the main archive software packages for all OSes support > tar.bz2 in their current releases. (This includes WinZip, > WinRAR and whatnot.) If we're only talking binary releases, then I don't really care, but please don't make this change for the source releases. There are several platforms on which Python is supported which do not support bzip2 out of the box (Solaris, as a prime example). It adds just that much more heartache to get python installed on such a system. -- Nick From Scott.Daniels at Acm.Org Sat Sep 18 01:51:18 2004 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sat Sep 18 01:50:13 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <20040917224615.GC8367@discworld.dyndns.org> References: <200409171309.48011.fdrake@acm.org> <20040917181409.GN21135@laranja.org> <20040917224615.GC8367@discworld.dyndns.org> Message-ID: Charles Cazabon wrote: > Scott David Daniels wrote: > >>.zip is the only one of these 3 formats that allows you to decompress a >>few files without expanding the entire archive. 
This feature is useful >>to me at least (and makes up for the larger size). > > tar supports that as well, and with better compression when paired with bzip2. > Hint: tar xzf archive [file] [...] Right, but the only way it can extract the last file of the tar archive is to expand the entire archive (in order to determine the bytes at the end of the archive). .zip looks in the directory for the file, reads the bytes representing the compressed file (and only that file), and uses them to expand the file to its original version. > > Charles -- -- Scott David Daniels Scott.Daniels@Acm.Org From fredrik at pythonware.com Sat Sep 18 09:10:02 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 18 09:08:07 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for futurereleases. References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> Message-ID: Nick Bastin wrote: > If we're only talking binary releases, then I don't really care, but please don't make this change > for the source releases. There are several platforms on which Python is supported which do not > support bzip2 out of the box (Solaris, as a prime example). It adds just that much more heartache > to get python installed on such a system. agreed.
it may come as a surprise to some people, but Linux > is not the only Unix system out there. Python works extremely > well on non-Linux systems too... But then, a Unix system does not have gzip, either. So we probably should use compress(1), or, better yet, distribute uncompressed tar files. Perhaps we should use cpio instead, or pax, because we need to avoid GNU tar extensions. Maybe IP isn't available, either, so we should ship QIC tapes. On Solaris, bzip2 is in the SUNWbzipS package, and installs into /usr/bin. Regards, Martin P.S. Just found this on compress(1) of Solaris 9: NOTES Although compressed files are compatible between machines with large memory, -b 12 should be used for file transfer to architectures with a small process data space (64KB or less). Solaris 9 requires a 512MB swap partition for installation, and the installer makes heavy use of Java... From fredrik at pythonware.com Sat Sep 18 13:00:18 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 18 12:58:28 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414BF4EA.4050600@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> agreed. it may come as a surprise to some people, but Linux >> is not the only Unix system out there. Python works extremely >> well on non-Linux systems too... > > But then, a Unix system does not have gzip, either. Of the build systems I checked, all had gunzip, most had unzip, but only the Linux systems had bunzip2. The bzip2 homepage contains 1.0.2 binaries for exactly three platforms, compared to over 20 systems for gzip and 30 systems for unzip. I suppose older bzip2 versions (0.9.5) are compatible, but someone should verify that they work before you pull the gzip archives. > Maybe IP isn't available, either, so we should ship QIC tapes. That's a really helpful comment.
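Scott David Daniels's point earlier in the thread, that .zip can pull out one member through its directory while a compressed tar has to be read from the front, can be sketched roughly as follows. This is only an illustration using the present-day zipfile and tarfile modules; the file names and contents are made up, and it is not code anyone in the thread posted.

```python
import io
import tarfile
import zipfile

# Hypothetical archive contents, used only for this illustration.
files = {"a.txt": b"first\n", "b.txt": b"second\n", "c.txt": b"last\n"}

# Build a .zip archive in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# Build a .tar.bz2 archive in memory with the same members.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w:bz2") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# zip: the directory at the end of the archive says where "c.txt" lives,
# so only that member's compressed bytes are read and expanded.
zip_buffer.seek(0)
with zipfile.ZipFile(zip_buffer) as zf:
    zip_last = zf.read("c.txt")

# tar.bz2: the compressed stream has no index, so the reader walks the
# members in order and decompresses everything before the one it wants.
tar_buffer.seek(0)
members_seen = []
tar_last = None
with tarfile.open(fileobj=tar_buffer, mode="r|bz2") as tf:
    for member in tf:
        members_seen.append(member.name)
        if member.name == "c.txt":
            tar_last = tf.extractfile(member).read()
            break
```

Both approaches recover the same bytes; the difference is how much of the archive has to be decompressed on the way, which only matters for large archives.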
From martin at v.loewis.de Sat Sep 18 13:27:02 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Sep 18 13:27:02 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414BF4EA.4050600@v.loewis.de> Message-ID: <414C1B86.7070302@v.loewis.de> Fredrik Lundh wrote: > Of the build systems I checked, all had gunzip, most had unzip, but > only the Linux systems had bunzip2. Sure, there are systems that don't have bunzip2 installed. However, what is the problem of installing it? All you need is a C compiler, and I'm sure you have one - how else are you going to install Python? And if building bzip2 yourself is a problem for some reason I cannot imagine, then what is the problem with using a prebuilt binary? As I said, Solaris (at least Solaris 9) comes with bzip2. If you have an older Solaris release, you can get a binary from sunfreeware.com. For HP-UX, you can get it from the HP porting center, e.g. http://hpux.asknet.de/hppd/hpux/Misc/bzip2-1.0.2/ (both PA-RISC and Itanium binaries, for 10.20, 11.00, 11.20, and 11.22) For AIX, you can get it from http://www.bullfreeware.com/. What other systems have you been looking at? Regards, Martin From erik at heneryd.com Sat Sep 18 14:07:54 2004 From: erik at heneryd.com (Erik Heneryd) Date: Sat Sep 18 14:08:00 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <414C1B86.7070302@v.loewis.de> References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414BF4EA.4050600@v.loewis.de> <414C1B86.7070302@v.loewis.de> Message-ID: <414C251A.80108@heneryd.com> Martin v.
Löwis wrote: > Fredrik Lundh wrote: > >> Of the build systems I checked, all had gunzip, most had unzip, but >> only the Linux systems had bunzip2. > > > Sure, there are systems that don't have bunzip2 installed. However, > what is the problem of installing it? All you need is a C compiler, > and I'm sure you have one - how else are you going to install Python? Yes, those with older, bzip2less systems can probably figure out how to get it and build it, but why force them when it's practically no work keeping it? It's one (sic) extra command for the release manager and ~9M extra disk space per release on www.python.org. And besides that... only GNU tar supports the j flag. Erik From fredrik at pythonware.com Sat Sep 18 15:13:47 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Sep 18 15:20:31 2004 Subject: [Python-Dev] Re: Re: Planning to drop gzip compression for futurereleases. References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414BF4EA.4050600@v.loewis.de> <414C1B86.7070302@v.loewis.de> <414C251A.80108@heneryd.com> Message-ID: Erik Heneryd wrote: > It's one (sic) extra command for the release manager and ~9M extra > disk space per release on www.python.org. but at 50 cents a gigabyte, and an endless stream of alphas and release candidates, that might turn out to be rather expensive. oh wait, you wrote megabytes, not gigabytes. From martin at v.loewis.de Sat Sep 18 15:52:05 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Sep 18 15:52:12 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases.
In-Reply-To: <414C251A.80108@heneryd.com> References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414BF4EA.4050600@v.loewis.de> <414C1B86.7070302@v.loewis.de> <414C251A.80108@heneryd.com> Message-ID: <414C3D85.20706@v.loewis.de> Erik Heneryd wrote: > Yes, those with older, bzip2less systems can probably figure out how to > get it and build it, but why force them when it's practically no work > keeping it? It's one (sic) extra command for the release manager and > ~9M extra disk space per release on www.python.org. Fred wouldn't have asked if it was no effort in keeping it. There is certainly more than one command to it - you have to md5sum the file, and copy the md5sum into the release notes. You have to upload the file from your workstation to python.org. I don't know how you do that, but I need to use my DSL link for uploading the MSI files; it takes roughly 30min to upload. Fortunately, I have a DSL flatrate. Regards, Martin From erik at heneryd.com Sat Sep 18 20:00:12 2004 From: erik at heneryd.com (Erik Heneryd) Date: Sat Sep 18 20:00:26 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <414C3D85.20706@v.loewis.de> References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414BF4EA.4050600@v.loewis.de> <414C1B86.7070302@v.loewis.de> <414C251A.80108@heneryd.com> <414C3D85.20706@v.loewis.de> Message-ID: <414C77AC.1070507@heneryd.com> Martin v. Löwis wrote: > Erik Heneryd wrote: > >> Yes, those with older, bzip2less systems can probably figure out how >> to get it and build it, but why force them when it's practically no >> work keeping it? It's one (sic) extra command for the release manager >> and ~9M extra disk space per release on www.python.org. > > > Fred wouldn't have asked if it was no effort in keeping it.
There is > certainly more than one command to it - you have to md5sum the file, > and copy the md5sum into the release notes. You have to upload the file > from your workstation to python.org. I don't know how you do that, but > I need to use my DSL link for uploading the MSI files; it takes roughly > 30min to upload. Fortunately, I have a DSL flatrate. > > Regards, > Martin Yeah, I was a bit hasty. Sure, it's more than one command: * pack it * unpack it * diff it against the known-to-be-good bzip2 tree * md5sum it and add that to the release notes * add another link on the download page * ...something else? but I still think my point stands - it's not that much work, really, and it'd be a nice service to those with bzip2less systems. Regarding upload times I guess I'm just another spoiled Swede; I've been on ethernet for so long I can barely remember what 5k/s was like... That said, I do realise that it all adds up and that doing a release takes some work, so whether you decide to keep it or not: thanks. Erik From aahz at pythoncraft.com Sat Sep 18 20:10:07 2004 From: aahz at pythoncraft.com (Aahz) Date: Sat Sep 18 20:10:10 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: References: <414BF4EA.4050600@v.loewis.de> Message-ID: <20040918181007.GA7132@panix.com> On Sat, Sep 18, 2004, Fredrik Lundh wrote: > > Of the build systems I checked, all had gunzip, most had unzip, but > only the Linux systems had bunzip2. > > The bzip2 homepage contains 1.0.2 binaries for exactly three plat- > forms, compared to over 20 systems for gzip and 30 systems for > unzip. I suppose older bzip2 versions (0.9.5) are compatible, but > someone should verify that they work before you pull the gzip > archives. Granted that bz2-only isn't a viable option, what does gz give us over bz2/zip that makes it worthwhile to keep?
-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From pythondev at bitfurnace.com Sat Sep 18 20:43:03 2004 From: pythondev at bitfurnace.com (damien morton) Date: Sat Sep 18 20:47:45 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <20040918181007.GA7132@panix.com> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> Message-ID: <414C81B7.10901@bitfurnace.com> An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20040918/eed68330/attachment.htm From bac at OCF.Berkeley.EDU Sat Sep 18 21:57:56 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sat Sep 18 21:58:04 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <414C81B7.10901@bitfurnace.com> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> Message-ID: <414C9344.5020501@ocf.berkeley.edu> damien morton wrote: > Umm, gzip compression is also one of the possible http compression > algorithms. bz2 isnt. > What does HTTP compression have to do with whether we have a gzipped release of Python? My personal take on all of this is that we make the release manager's job as simple as possible. That means either ditch gzip files or ditch bzip2 files. If we stick with gzip we basically eat the bandwidth cost. If we go with bzip2 we need to link to where to get the source to compile, if not host a copy of the bzip2 source ourselves. But either way I completely sympathize with the release managers and I am all for making people's lives easier at release time. So I say we should go with bzip2. 
While we might get our bandwidth for free thanks to the good graces of XS4ALL and Thomas, I don't think we should view it as infinite since they are still footing the bill. If we can do something easily that would reduce their cost enough to buy Thomas a soda I think we should do it. If that means some people need to go download some free software, then so be it. Considering Python has practically no required tools beyond a C compiler we have rather low dependency requirements for UNIX in my eyes. Hell, bzip2's source is less than the difference between 2.4's bzip2 source package compared to the gzip one. We could have a copy of the latest bzip2 on our server for people to download and we would still save on bandwidth even when people need both Python and bzip2. Plus, without starting a flame war, bzip2 is under a BSD license so it gets a gold star from me. =) -Brett From erik at heneryd.com Sat Sep 18 22:46:10 2004 From: erik at heneryd.com (Erik Heneryd) Date: Sat Sep 18 22:46:16 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <414C9344.5020501@ocf.berkeley.edu> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> Message-ID: <414C9E92.2030803@heneryd.com> Brett C. wrote: > But either way I completely sympathize with the release managers and I > am all for making people's lives easier at release time. Yep. I suppose that's what this is all about. Should we add 5 minutes of work for: 1) the release manager 2) the n (small integer) people with bzip2less systems Think 1) is the way to go, at least for finals. Oh, whatever, I don't even really care. I'll shut up now. Erik From barry at python.org Sat Sep 18 22:52:27 2004 From: barry at python.org (Barry Warsaw) Date: Sat Sep 18 22:52:33 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. 
In-Reply-To: <414C9344.5020501@ocf.berkeley.edu> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> Message-ID: <1095540746.29261.1.camel@geddy.wooz.org> On Sat, 2004-09-18 at 15:57, Brett C. wrote: > My personal take on all of this is that we make the release manager's job as > simple as possible. Although if someone from the community wanted to volunteer to build tgz files, that might go a long way toward keeping this option available. Disk space on python.org isn't (or shouldn't be) an issue. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040918/ae5bc467/attachment.pgp From phd at mail2.phd.pp.ru Sat Sep 18 22:57:35 2004 From: phd at mail2.phd.pp.ru (Oleg Broytmann) Date: Sat Sep 18 22:57:45 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <414C9E92.2030803@heneryd.com> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> <414C9E92.2030803@heneryd.com> Message-ID: <20040918205735.GA24237@phd.pp.ru> On Sat, Sep 18, 2004 at 10:46:10PM +0200, Erik Heneryd wrote: > Yep. I suppose that's what this is all about. Should we add 5 minutes > of work for: > > 1) the release manager Add 5 minutes for EVERY release. > 2) the n (small integer) people with bzip2less systems Add 5 minutes to install bzip2 ONCE and forever. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
From nhodgson at bigpond.net.au Sat Sep 18 23:01:07 2004 From: nhodgson at bigpond.net.au (Neil Hodgson) Date: Sat Sep 18 23:01:14 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for futurereleases. References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> <414C9E92.2030803@heneryd.com> Message-ID: <008601c49dc2$9e257e10$a44a8890@neil> Are there site statistics that show the current relative demand for .gz versus .bz2? Neil From gvanrossum at gmail.com Sat Sep 18 23:19:21 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat Sep 18 23:19:30 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for futurereleases. In-Reply-To: <008601c49dc2$9e257e10$a44a8890@neil> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> <414C9E92.2030803@heneryd.com> <008601c49dc2$9e257e10$a44a8890@neil> Message-ID: "When in doubt, don't pass." If there was all around agreement to drop gzip, I'd say go for it. But since there isn't, let's keep supporting it and test the waters again in a year or two. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From erik at heneryd.com Sat Sep 18 23:49:40 2004 From: erik at heneryd.com (Erik Heneryd) Date: Sat Sep 18 23:49:44 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <1095540746.29261.1.camel@geddy.wooz.org> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> <1095540746.29261.1.camel@geddy.wooz.org> Message-ID: <414CAD74.3030108@heneryd.com> Barry Warsaw wrote: > On Sat, 2004-09-18 at 15:57, Brett C. wrote: > > >>My personal take on all of this is that we make the release manager's job as >>simple as possible. 
> > > Although if someone from the community wanted to volunteer to build tgz > files, that might go a long way toward keeping this option available. > Disk space on python.org isn't (or shouldn't be) an issue. Sure. I could build the tar.gz if given a tar.bz2/cvs pointer, though I personally think even the coordination overhead wouldn't make it worthwhile. If nothing else, just to end this IMHO silly thread. Erik From python at rcn.com Sun Sep 19 00:34:08 2004 From: python at rcn.com (Raymond Hettinger) Date: Sun Sep 19 00:35:25 2004 Subject: [Python-Dev] Noam's open regex requests In-Reply-To: <41436BF6.6080903@myrealbox.com> Message-ID: <004d01c49dcf$9cff88c0$e841fea9@oemcomputer> [Noam Raphael] > I've suggested three things that I think should be done in that > case, and nobody objected. > > 1. Add a prominent note in the module contents page or in the module's > main page, stating that some functionality can only be acheived by using > compiled REs. I would make that read "The methods of compiled regular expressions allow more options than their simplified function counterparts. Most non-trivial applications always use the compiled form." > 2. Document the optional parameters which let you specify the start and > end pos in the findall and finditer methods of a compiled RE object. This seems reasonable to me. The API is already exposed and is useful. Why not document it. AFAICT, there are no plans to take away the functionality. > 3. Add the optional parameter "flags" to the findall and finditer > functions. Then, the four functions match, search, findall and finditer > would have the same interface: function(pattern, string[, flags]). This also seems reasonable to me. It is marginally useful and it may reduce the learning curve ever so slightly. There is nothing special about findall() and finditer() that makes them different from match() and search() with respect to flags. 
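The three points under discussion can be illustrated with a short sketch against the present-day re module, where the pos/endpos arguments on compiled-pattern methods and the flags argument on the module-level findall() are both available. The sample string and pattern are made up for the example; this is not code from the thread.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "alpha Beta gamma DELTA epsilon"

# A compiled pattern carries its flags with it.
word = re.compile(r"[a-z]+", re.IGNORECASE)

# Methods of the compiled pattern accept optional pos and endpos
# arguments, which the module-level functions do not offer.
all_words = word.findall(text)
window_words = word.findall(text, 6, 16)  # search only text[6:16]
window_spans = [m.span() for m in word.finditer(text, 6, 16)]

# The module-level function takes flags, giving it the same
# function(pattern, string[, flags]) shape as match() and search().
module_words = re.findall(r"[a-z]+", text, re.IGNORECASE)
```

Here window_words covers only "Beta" and "gamma", while module_words matches the same five words as all_words, which is the sense in which the compiled form is a superset of the simplified functions.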
Raymond Hettinger From nbastin at opnet.com Sun Sep 19 04:12:58 2004 From: nbastin at opnet.com (Nick Bastin) Date: Sun Sep 19 04:13:29 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <20040918205735.GA24237@phd.pp.ru> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> <414C9E92.2030803@heneryd.com> <20040918205735.GA24237@phd.pp.ru> Message-ID: <6CDA27F9-09E1-11D9-BB3D-000D932927FE@opnet.com> On Sep 18, 2004, at 4:57 PM, Oleg Broytmann wrote: > On Sat, Sep 18, 2004 at 10:46:10PM +0200, Erik Heneryd wrote: >> Yep. I suppose that's what this is all about. Should we add 5 >> minutes >> of work for: >> >> 1) the release manager > > Add 5 minutes for EVERY release. > >> 2) the n (small integer) people with bzip2less systems > > Add 5 minutes to install bzip2 ONCE and forever. Sure, on every machine that you need to install python on (and it isn't 5 minutes either - most solaris machines aren't that fast). That's assuming that it's acceptable to your corporation to just be adding software to your unix machines that hasn't gone through a qualification process. -- Nick From bac at OCF.Berkeley.EDU Sun Sep 19 07:46:25 2004 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Sep 19 07:46:33 2004 Subject: [Python-Dev] vimrc file in Misc (was: tabs in httplib.py and test_httplib.py) In-Reply-To: <4149C318.2010902@ocf.berkeley.edu> References: <700FAFA4-076B-11D9-B3F2-0003934AD54A@chello.se> <4148CFE1.5010503@ocf.berkeley.edu> <4149361F.3030906@v.loewis.de> <200409161515.56466.symbiont+py@berlios.de> <4149C318.2010902@ocf.berkeley.edu> Message-ID: <414D1D31.9050107@ocf.berkeley.edu> I just checked in a vimrc file in Misc that attempts to set the proper settings to follow PEPs 7 & 8. You can safely source it in your own vimrc file in order to get the proper settings for Python and C files. Hope it proves useful. 
-Brett From paul at pfdubois.com Sun Sep 19 08:05:28 2004 From: paul at pfdubois.com (Paul F. Dubois) Date: Sun Sep 19 08:05:31 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases In-Reply-To: <20040918204637.B934C1E4016@bag.python.org> References: <20040918204637.B934C1E4016@bag.python.org> Message-ID: <414D21A8.4090708@pfdubois.com> Some of this discussion has wandered into the 'we are all competent computer people here so what is the problem going to get x and installing it' line of reasoning. Many Python users are not very good at computing. If they know how to install Python now why make that disappear? The fact that in principle they could manage somehow if we didn't provide a zip file doesn't mean they should have to. It just isn't a big deal on our end. Not only are batteries included, but we have several sizes to fit your equipment. From fredrik at pythonware.com Sun Sep 19 09:25:04 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Sep 19 09:23:09 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com><414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> Message-ID: Brett C wrote: > My personal take on all of this is that we make the release manager's job as simple as possible. first the "no abbreviations in the standard library" and now "who cares about users; releases are for the release manager". have you even seen a Python user in real life? From chris.cavalaria at free.fr Sun Sep 19 11:15:25 2004 From: chris.cavalaria at free.fr (Christophe Cavalaria) Date: Sun Sep 19 11:15:34 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases References: <20040918204637.B934C1E4016@bag.python.org> <414D21A8.4090708@pfdubois.com> Message-ID: Paul F. 
Dubois wrote: > Some of this discussion has wandered into the 'we are all competent > computer people here so what is the problem going to get x and > installing it' line of reasoning. > > Many Python users are not very good at computing. If they know how to > install Python now why make that disappear? The fact that in > principle they could manage somehow if we didn't provide a zip file > doesn't mean they should have to. It just isn't a big deal on our end. > > Not only are batteries included, but we have several sizes to fit your > equipment. Well, we are not talking about a simple click-the-shiny-exe-and-the-yes-button install here. We are talking about source code install of Python on Unix-like computers. The number of users who can install Python that way but can't install bzip2, even with the source code must be very very small. From martin at v.loewis.de Sun Sep 19 11:29:41 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 19 11:29:41 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for futurereleases. In-Reply-To: <008601c49dc2$9e257e10$a44a8890@neil> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> <414C9E92.2030803@heneryd.com> <008601c49dc2$9e257e10$a44a8890@neil> Message-ID: <414D5185.5010002@v.loewis.de> Neil Hodgson wrote: > Are there site statistics that show the current relative demand for .gz > versus .bz2? Within the last few days (since the logs rotated on Sep 13), there have been 1095 accesses to Python-2.3.4.tar.bz2, and 5168 to Python-2.3.4.tgz. 
Regards, Martin From fredrik at pythonware.com Sun Sep 19 14:00:11 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Sep 19 14:00:18 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com><414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu><414C9E92.2030803@heneryd.com><008601c49dc2$9e257e10$a44a8890@neil> <414D5185.5010002@v.loewis.de> Message-ID: Martin v. Löwis wrote: > Within the last few days (since the logs rotated on Sep 13), there have > been 1095 accesses to Python-2.3.4.tar.bz2, and 5168 to Python-2.3.4.tgz. so given the "we'll save 5 minutes for each release, and users stuck with gzip only loses 5 minutes each" rationale, I assume this means that someone's planning to make 314400 Python releases over the next year? From jjl at pobox.com Sun Sep 19 16:03:53 2004 From: jjl at pobox.com (John J Lee) Date: Sun Sep 19 16:04:34 2004 Subject: [Python-Dev] Re: URL processing conformance and principles (was Re: urllib.urlopen...) In-Reply-To: <200409170754.i8H7sLWr042680@chilled.skew.org> References: <200409170754.i8H7sLWr042680@chilled.skew.org> Message-ID: On Fri, 17 Sep 2004, Mike Brown wrote: [...] > Tightening up the conformance issues meant that we needed to help users > produce valid URIs from filesystem paths and vice-versa. Once again, the core > Python libs were of little use -- pathname2url and url2pathname are > platform-dependent, and are so full of bugs^H^H^H^Hfeatures that I had to > start from scratch and roll my own functions. I think what I've got at this > point would make great additions to urllib2, but I'll save them for another > day... You must be worn out after those posts :-), but: Would certainly be nice to have some more compliant, perhaps less forgiving functions for those tasks, so +1 for adding your OsPathToUri() and UriToOsPath() somewhere in the stdlib. Maybe urllib2 is as good a place as any.
I suppose somebody knowledgeable about both Macs and URIs must volunteer to do the Mac work first, though. John From martin at v.loewis.de Sun Sep 19 20:05:53 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 19 20:05:50 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases In-Reply-To: References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com><414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu><414C9E92.2030803@heneryd.com><008601c49dc2$9e257e10$a44a8890@neil> <414D5185.5010002@v.loewis.de> Message-ID: <414DCA81.90007@v.loewis.de> Fredrik Lundh wrote: > so given the "we'll save 5 minutes for each release, and users stuck with > gzip only loses 5 minutes each" rationale, I assume this means that some- > one's planning to make 314400 Python releases over the next year? Talking about helpful comments... Regards, Martin From martin at v.loewis.de Sun Sep 19 20:37:54 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 19 20:37:52 2004 Subject: [Python-Dev] [TARGETDIR]lib-tk added to PythonPath in MSI In-Reply-To: <20040915115947.A26465@ActiveState.com> References: <20040915115947.A26465@ActiveState.com> Message-ID: <414DD202.5060002@v.loewis.de> Trent Mick wrote: > Shouldn't that be this instead? > > ("PythonPath", -1, prefix+r"\PythonPath", "", > "[TARGETDIR]Lib;[TARGETDIR]DLLs;[TARGETDIR]Lib\\lib-tk", "REGISTRY"), Indeed it should; thanks for pointing that out. Fixed in 1.12. Regards, Martin From anthony at interlink.com.au Mon Sep 20 04:06:01 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Sep 20 04:07:03 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. 
In-Reply-To: <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> References: <200409171309.48011.fdrake@acm.org> <20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> Message-ID: <414E3B09.9070407@interlink.com.au> Nick Bastin wrote: > If we're only talking binary releases, then I don't really care, but > please don't make this change for the source releases. There are > several platforms on which Python is supported which do not support > bzip2 out of the box (Solaris, as a prime example). It adds just that > much more heartache to get python installed on such a system. I have no intention of dropping tar.gz source releases. I think Fred was talking about the documentation tarballs. Even then, I think there's some advantages to keeping both, and I don't really see the advantage to dropping the tar.gz format. But hey, that's up to Fred - he's the one who makes the doc releases. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Mon Sep 20 04:08:52 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Sep 20 04:09:10 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <414C3D85.20706@v.loewis.de> References: <200409171309.48011.fdrake@acm.org><20040917181409.GN21135@laranja.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414BF4EA.4050600@v.loewis.de> <414C1B86.7070302@v.loewis.de> <41 Message-ID: <414E3BB4.60505@interlink.com.au> Martin v. Löwis wrote: > Fred wouldn't have asked if it was no effort in keeping it. There is > certainly more than one command to it - you have to md5sum the file, > and copy the md5sum into the release notes. You have to upload the file > from your workstation to python.org. I don't know how you do that, but > I need to use my DSL link for uploading the MSI files; it takes roughly > 30min to upload. Fortunately, I have a DSL flatrate.
Again, we're only talking about the documentation tarballs. I'm still going to be making both tar.gz and tar.bz2 format source releases - yes, it's a bit more work (gotta gpg sign both, upload both) but I'm completely unconvinced that forcing people to install bzip2 everywhere is a useful approach. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Mon Sep 20 05:31:08 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Sep 20 05:31:37 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. In-Reply-To: <414C9344.5020501@ocf.berkeley.edu> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> Message-ID: <414E4EFC.1080003@interlink.com.au> Brett C. wrote: > My personal take on all of this is that we make the release manager's > job as simple as possible. That means either ditch gzip files or ditch > bzip2 files. I disagree, almost 100%. The job of release management is to make it as easy as possible for people to get and use Python. The language isn't being organised for _my_ benefit. Last I looked, tar.bz2 was less than 1/4 of tar.gz in terms of number of downloads (see http://www.python.org/wwwstats/usage_200409.html) That's hardly a case for switching. -- Anthony Baxter It's never too late to have a happy childhood. From martin at v.loewis.de Mon Sep 20 08:25:16 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon Sep 20 08:25:31 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for future releases. 
In-Reply-To: <414E4EFC.1080003@interlink.com.au> References: <414BF4EA.4050600@v.loewis.de> <20040918181007.GA7132@panix.com> <414C81B7.10901@bitfurnace.com> <414C9344.5020501@ocf.berkeley.edu> <414E4EFC.1080003@interlink.com.au> Message-ID: <414E77CC.5030108@v.loewis.de> Anthony Baxter wrote: > Last I looked, tar.bz2 was less than 1/4 of tar.gz in terms of number > of downloads (see http://www.python.org/wwwstats/usage_200409.html) > That's hardly a case for switching. Although that may partly result from http://www.python.org/download/ referring to the .tgz only. Regards, Martin From FBatista at uniFON.com.ar Mon Sep 20 18:17:02 2004 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Mon Sep 20 18:21:42 2004 Subject: [Python-Dev] Copyright and license texts for distributing Decimal Message-ID: I'll reformulate my question. I've got a project, SiGeFi, which needs the decimal module, so for being used with Py2.3 it's a must for the project to let the user install the decimal module "easily". Also, Alex Martelli asked me to create a "decimal module" installer, for reasons related to a recipe for his new book. So I got into distutils and prepared tarball, .rpm and .exe packages to install the decimal module in your system if you have Py2.3. What I didn't solve is the license and copyright to put in the package. Regarding copyright, my first draft says: Copyright (c) 2004 Python Software Foundation. All rights reserved. Regarding license, I didn't put anything yet; should I write something like the following and include the file? See the file "LICENSE" for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES. What license should I include in the packages? And the copyright text? Thank you!
Facundo Batista Network Development fbatista@unifon.com.ar (54 11) 5130-4643 Cell: 15 5097 5024 -----Original Message----- From: Batista, Facundo Sent: Friday, September 17, 2004 11:10 To: Python Dev (E-mail) Subject: [Python-Dev] Decimal, copyright and license People: I'm creating a decimal installer (for Py2.3 users), making tarball, .rpm and .exe versions available. What I don't know is what to put about license and copyright. Regarding copyright, my first draft says: Copyright (c) 2004 Python Software Foundation. All rights reserved. Regarding license, didn't put nothing yet, should I write something like the following and include the file? See the file "LICENSE" for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES. Remember that the "decimal installer" will be available for download not in a Python location. Thanks! Facundo Batista Network Development fbatista@unifon.com.ar (54 11) 5130-4643 Cell: 15 5097 5024 From tim.peters at gmail.com Mon Sep 20 22:34:55 2004 From: tim.peters at gmail.com (Tim Peters) Date: Mon Sep 20 22:35:28 2004 Subject: [Python-Dev] Decimal, copyright and license In-Reply-To: References: Message-ID: <1f7befae04092013342051997e@mail.gmail.com> [Batista, Facundo] > I'm creating a decimal installer (for Py2.3 users), making tarball, .rpm and .exe > versions available. > > What I don't know is what to put about license and copyright. In the language of the PSF license, you're making "a derivative work" then. Your derivative work is *your* work, and you can license it however you like (although, as the PSF license says, you must *include* the PSF license and copyright notice). The copyright is yours, since it's your work. > Regarding copyright, my first draft says: > > Copyright (c) 2004 Python Software Foundation. > All rights reserved. You hold copyright whether you say so or not. You won't get into trouble by claiming the PSF holds copyright, though.
> Regarding license, didn't put nothing yet, should I write something like the > following and include the file? No matter what else you do, you must include the PSF license and copyright. The license you want to use for your part of the work is entirely up to you; the PSF license imposes no restrictions there. > See the file "LICENSE" for information on the history of this > software, terms & conditions for usage, and a DISCLAIMER OF ALL > WARRANTIES. That would be suitable if you want to leave the impression that you're licensing your work under the terms of the PSF license. That's fine, if that's what you want to do. If you want to write a license saying people have to pay you a million dollars each time they use your installer, that's also fine. > Remember that the "decimal installer" will be available for download not in a > Python location. That part doesn't really matter. What you suggest above is all fine. From tommy at ilm.com Mon Sep 20 23:34:31 2004 From: tommy at ilm.com (Tommy Burnette) Date: Mon Sep 20 23:45:40 2004 Subject: [Python-Dev] built on beer? Message-ID: <16719.19687.44576.866934@evoke.lucasdigital.com> hey team, in a completely un-python-related thread about mobile phones last week, a friend (who I did not know knew anything about python), when asked what made a certain nokia phone stand out above one from another company, replied: "... nokia runs python, the language built on beer." does the PSF have any t-shirts that advertise this fact? :) From tommy at ilm.com Tue Sep 21 02:28:05 2004 From: tommy at ilm.com (Tommy Burnette) Date: Tue Sep 21 02:28:11 2004 Subject: [Python-Dev] built on beer? In-Reply-To: <16719.19687.44576.866934@evoke.lucasdigital.com> References: <16719.19687.44576.866934@evoke.lucasdigital.com> Message-ID: <16719.30101.8590.767412@evoke.lucasdigital.com> apologies for replying to my own posting- the "fact" I wished to advertise was the beer one, not the nokia one! 
Tommy Burnette writes: | hey team, | | in a completely un-python-related thread about mobile phones last | week, a friend (who I did not know knew anything about python), when | asked what made a certain nokia phone stand out above one from another | company, replied: | | | "... nokia runs python, the language built on beer." | | | does the PSF have any t-shirts that advertise this fact? :) | | | _______________________________________________ | Python-Dev mailing list | Python-Dev@python.org | http://mail.python.org/mailman/listinfo/python-dev | Unsubscribe: http://mail.python.org/mailman/options/python-dev/tommy%40ilm.com From fdrake at acm.org Tue Sep 21 16:36:15 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Sep 21 16:36:33 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <414E3B09.9070407@interlink.com.au> References: <200409171309.48011.fdrake@acm.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414E3B09.9070407@interlink.com.au> Message-ID: <200409211036.15861.fdrake@acm.org> [Responding to an absolutely enormous deluge of emails on python-dev...] On Sunday 19 September 2004 10:06 pm, Anthony Baxter wrote: > I have no intention of dropping tar.gz source releases. I think Fred > was talking about the documentation tarballs. Even then, I think there's > some advantages to keeping both, and I don't really see the advantage > to dropping the tar.gz format. But hey, that's up to Fred - he's the > one who makes the doc releases. Dang, it doesn't pay to be away from email for three days, does it? Yes, I was only talking about documentation releases. It never occurred to me anyone would think I was talking about Python source releases. Maybe I shouldn't have added python-dev to the recipients list for my original email, but too often objections get heard quite late if I don't include python-dev. For the documentation, there's a much longer history of providing the bz2 versions of the archives. 
There are also many more archives we can drop per release. While in theory disk space isn't supposed to be an issue, it seems to be something our sysadmin group is dealing with on a regular basis (mostly cleaning up old webserver logs). So while the space itself may not be an issue, it certainly generates tedious work for volunteers. My motivation in dropping the bz2 archives is two-fold: 1. Reduce disk space consumed per release, mostly to ease the burden on the sysadmin group. 2. Reduce the number of files posted for the documentation per release, so that choices for end-users are easier to pick through. -Fred -- Fred L. Drake, Jr. From fdrake at acm.org Tue Sep 21 17:07:02 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Sep 21 17:07:29 2004 Subject: [Doc-SIG] Re: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <200409211036.15861.fdrake@acm.org> References: <200409171309.48011.fdrake@acm.org> <414E3B09.9070407@interlink.com.au> <200409211036.15861.fdrake@acm.org> Message-ID: <200409211107.02590.fdrake@acm.org> This morning, I wrote: > My motivation in dropping the bz2 archives is two-fold: That should be the *gz* archives, not the bz2 archives! -Fred -- Fred L. Drake, Jr. From psoberoi at gmail.com Tue Sep 21 17:31:43 2004 From: psoberoi at gmail.com (Paramjit Oberoi) Date: Tue Sep 21 17:31:45 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <200409211036.15861.fdrake@acm.org> References: <200409171309.48011.fdrake@acm.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414E3B09.9070407@interlink.com.au> <200409211036.15861.fdrake@acm.org> Message-ID: I'm not sure how easy or difficult it would be---but it would be very convenient for me if the documentation was also downloadable in windows help (CHM) format. Currently CHM files are only available in windows installers, but I use them on Linux (easier to search, etc). -param On Tue, 21 Sep 2004 10:36:15 -0400, Fred L. 
Drake, Jr. wrote: > [Responding to an absolutely enormous deluge of emails on python-dev...] > > On Sunday 19 September 2004 10:06 pm, Anthony Baxter wrote: > > I have no intention of dropping tar.gz source releases. I think Fred > > was talking about the documentation tarballs. Even then, I think there's > > some advantages to keeping both, and I don't really see the advantage > > to dropping the tar.gz format. But hey, that's up to Fred - he's the > > one who makes the doc releases. > > Dang, it doesn't pay to be away from email for three days, does it? > > Yes, I was only talking about documentation releases. It never occurred to me > anyone would think I was talking about Python source releases. Maybe I > shouldn't have added python-dev to the recipients list for my original email, > but too often objections get heard quite late if I don't include python-dev. > > For the documentation, there's a much longer history of providing the bz2 > versions of the archives. There are also many more archives we can drop per > release. > > While in theory disk space isn't supposed to be an issue, it seems to be > something our sysadmin group is dealing with on a regular basis (mostly > cleaning up old webserver logs). So while the space itself may not be an > issue, it certainly generates tedious work for volunteers. > > My motivation in dropping the bz2 archives is two-fold: > > 1. Reduce disk space consumed per release, mostly to ease the burden on the > sysadmin group. > > 2. Reduce the number of files posted for the documentation per release, so > that choices for end-users are easier to pick through. > > > > > -Fred > > -- > Fred L. Drake, Jr. 
From cce at clarkevans.com Tue Sep 21 17:58:00 2004 From: cce at clarkevans.com (Clark C. Evans) Date: Tue Sep 21 17:58:05 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <200409171309.48011.fdrake@acm.org> References: <200409171309.48011.fdrake@acm.org> Message-ID: <20040921155759.GB91940@prometheusresearch.com> Fred, From what I understand, the algorithmic behavior of bz2 and gz is completely different -- while gzip is incremental, bz2 requires memory proportional to the size of the source information. Furthermore, most browsers now support gzip compression for their web pages; it will be quite some time before bz2 support is ubiquitous. Unless these two issues are different than I understand them, I'd prefer if gzip remain in the standard Python distribution. Best, Clark From cce at clarkevans.com Tue Sep 21 18:00:54 2004 From: cce at clarkevans.com (Clark C. Evans) Date: Tue Sep 21 18:00:58 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <20040921155759.GB91940@prometheusresearch.com> References: <200409171309.48011.fdrake@acm.org> <20040921155759.GB91940@prometheusresearch.com> Message-ID: <20040921160053.GC91940@prometheusresearch.com> *blush* I read the post wrong, please disregard my comment. From fredrik at pythonware.com Tue Sep 21 18:02:57 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Sep 21 18:01:12 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for futurereleases.
References: <200409171309.48011.fdrake@acm.org><91489DBB-08FC-11D9-A518-000D932927FE@opnet.com><414E3B09.9070407@interlink.com.au> <200409211036.15861.fdrake@acm.org> Message-ID: Fred Drake wrote: > While in theory disk space isn't supposed to be an issue, it seems to be > something our sysadmin group is dealing with on a regular basis (mostly > cleaning up old webserver logs). So while the space itself may not be an > issue, it certainly generates tedious work for volunteers. I thought everyone knew that logs always fill up until the disk is almost full, no matter how much disk you have. From fdrake at acm.org Tue Sep 21 18:24:37 2004 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Sep 21 18:24:50 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: References: <200409171309.48011.fdrake@acm.org> <200409211036.15861.fdrake@acm.org> Message-ID: <200409211224.37497.fdrake@acm.org> On Tuesday 21 September 2004 11:31 am, Paramjit Oberoi wrote: > I'm not sure how easy or difficult it would be---but it would be very > convenient for me if the documentation was also downloadable in > windows help (CHM) format. Currently CHM files are only available in > windows installers, but I use them on Linux (easier to search, etc). You do? What software supports them? It would be cool to have a decent single-file documentation browser on Linux. -Fred -- Fred L. Drake, Jr. From gvanrossum at gmail.com Tue Sep 21 18:11:55 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Sep 21 18:25:32 2004 Subject: [Python-Dev] built on beer? In-Reply-To: <16719.30101.8590.767412@evoke.lucasdigital.com> References: <16719.19687.44576.866934@evoke.lucasdigital.com> <16719.30101.8590.767412@evoke.lucasdigital.com> Message-ID: I don't know where that quote comes from, but it's true! During the early days, when hacking on Python, I often lived on stroopwafels and beer. 
(If you've never visited the Netherlands, you *must* Google for stroopwafels. :-) On Mon, 20 Sep 2004 17:28:05 -0700, Tommy Burnette wrote: > > apologies for replying to my own posting- the "fact" I wished to > advertise was the beer one, not the nokia one! > > Tommy Burnette writes: > | hey team, > > > | > | in a completely un-python-related thread about mobile phones last > | week, a friend (who I did not know knew anything about python), when > | asked what made a certain nokia phone stand out above one from another > | company, replied: > | > | > | "... nokia runs python, the language built on beer." > | > | > | does the PSF have any t-shirts that advertise this fact? :) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Tue Sep 21 18:36:47 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Sep 21 18:40:43 2004 Subject: [Python-Dev] Re: Planning to drop gzip compression for futurereleases. References: <200409171309.48011.fdrake@acm.org><200409211036.15861.fdrake@acm.org> <200409211224.37497.fdrake@acm.org> Message-ID: Fred wrote: > You do? What software supports them? It would be cool to have a decent > single-file documentation browser on Linux. http://xchm.sourceforge.net/ From barry at python.org Tue Sep 21 18:47:42 2004 From: barry at python.org (Barry Warsaw) Date: Tue Sep 21 18:47:48 2004 Subject: [Python-Dev] built on beer?
In-Reply-To: References: <16719.19687.44576.866934@evoke.lucasdigital.com> <16719.30101.8590.767412@evoke.lucasdigital.com> Message-ID: <1095785262.8357.62.camel@geddy.wooz.org> On Tue, 2004-09-21 at 12:11, Guido van Rossum wrote: > I don't know where that quote comes from, but it's true! During the > early days, when hacking on Python, I often lived on stroopwafels and > beer. (If you've never visited the Netherlands, you *must* Google for > stroopwafels. :-) http://www.amazingstroopwafels.nl/ I /thought/ that guy in the back right looked familiar! -B -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040921/30ad7de8/attachment.pgp From psoberoi at gmail.com Tue Sep 21 19:14:59 2004 From: psoberoi at gmail.com (Paramjit Oberoi) Date: Tue Sep 21 19:15:01 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <200409211224.37497.fdrake@acm.org> References: <200409171309.48011.fdrake@acm.org> <200409211036.15861.fdrake@acm.org> <200409211224.37497.fdrake@acm.org> Message-ID: > You do? What software supports them? It would be cool to have a decent > single-file documentation browser on Linux. /F pointed out xchm - http://xchm.sourceforge.net/ - that's what I use. There is also GnoCHM - http://gnochm.sourceforge.net/ - which wasn't completely stable when I last tried it. But it's written in Python/PyGTK. Both these readers use CHMLIB - http://66.93.236.84/~jedwin/projects/chmlib/ - to read CHM files. 
Python bindings for this library are available from the GnoCHM project: http://gnochm.sourceforge.net/pychm.html -param From martin at v.loewis.de Tue Sep 21 20:20:15 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Sep 21 20:20:40 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: References: <200409171309.48011.fdrake@acm.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414E3B09.9070407@interlink.com.au> <200409211036.15861.fdrake@acm.org> Message-ID: <415070DF.40201@v.loewis.de> Paramjit Oberoi wrote: > I'm not sure how easy or difficult it would be---but it would be very > convenient for me if the documentation was also downloadable in > windows help (CHM) format. Currently CHM files are only available in > windows installers, but I use them on Linux (easier to search, etc). I could do this along with the Windows installer releases. However, I don't think I can do this whenever Fred makes a snapshot release; I also doubt that Fred can easily do this on his own, since the documentation is build on Unix, and the CHM file on Windows. Regards, Martin From aahz at pythoncraft.com Tue Sep 21 20:25:28 2004 From: aahz at pythoncraft.com (Aahz) Date: Tue Sep 21 20:25:32 2004 Subject: [Python-Dev] built on beer? In-Reply-To: References: <16719.19687.44576.866934@evoke.lucasdigital.com> <16719.30101.8590.767412@evoke.lucasdigital.com> Message-ID: <20040921182528.GA24930@panix.com> On Tue, Sep 21, 2004, Guido van Rossum wrote: > > I don't know where that quote comes from, but it's true! During the > early days, when hacking on Python, I often lived on stroopwafels and > beer. (If you've never visited the Netherlands, you *must* Google for > stroopwafels. :-) Apparently they get imported to the US now.... -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." 
--Ralph Waldo Emerson From psoberoi at gmail.com Tue Sep 21 21:02:47 2004 From: psoberoi at gmail.com (Paramjit Oberoi) Date: Tue Sep 21 21:02:49 2004 Subject: [Python-Dev] Planning to drop gzip compression for future releases. In-Reply-To: <415070DF.40201@v.loewis.de> References: <200409171309.48011.fdrake@acm.org> <91489DBB-08FC-11D9-A518-000D932927FE@opnet.com> <414E3B09.9070407@interlink.com.au> <200409211036.15861.fdrake@acm.org> <415070DF.40201@v.loewis.de> Message-ID: > I could do this along with the Windows installer releases. However, > I don't think I can do this whenever Fred makes a snapshot release; That would be perfectly adequate. The documentation doesn't change that much between snapshots, and anyway, I usually don't use snapshot releases... As far as I am concerned, just having the CHM files corresponding to the official releases would be fine. Thanks, -param From dw-python.org at botanicus.net Wed Sep 22 00:16:08 2004 From: dw-python.org at botanicus.net (David Wilson) Date: Wed Sep 22 00:16:17 2004 Subject: [Python-Dev] [Patch 1032206] Add API to logging package to allow intercooperation. Message-ID: <20040921221608.GA71441@thailand.botanicus.net> Hi there, There are two alternative patches provided to add a single extra API item for this package, which would allow developers the ability to extend the logging package to a certain extent without clobbering each other's work. At present, it isn't possible for a package to customise the logging.Logger class without running the risk of having its changes clobbered by an application using the package, or another package. This small change allows each customiser to inherit changes from the last customiser. Any chance of getting one of these solutions in for 2.4? The "loggerClass" option provides more respectable declaration syntax, but the "getLoggerclass" option provides symmetry. http://sourceforge.net/tracker/index.php?func=detail&aid=1032206&group_id=5470&atid=305470 Thanks, David.
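[Editorial sketch, not part of David's message.] The "each customiser inherits from the last customiser" idea can be illustrated with the getter/setter pair found in today's logging module, logging.getLoggerClass() and logging.setLoggerClass(). That this getter corresponds to the "getLoggerclass" option of the patch is an assumption; the thread does not say which variant was accepted. The class names and the audit/trace methods are hypothetical and exist only for illustration.

```python
import logging

# "Package A" extends whatever logger class is currently installed,
# instead of subclassing logging.Logger directly.
class PackageALogger(logging.getLoggerClass()):
    def audit(self, msg, *args, **kwargs):
        # Hypothetical convenience method, for illustration only.
        self.info("AUDIT: " + msg, *args, **kwargs)

logging.setLoggerClass(PackageALogger)

# "Package B" does the same later on, so it inherits A's changes
# rather than clobbering them.
class PackageBLogger(logging.getLoggerClass()):
    def trace(self, msg, *args, **kwargs):
        # Hypothetical convenience method, for illustration only.
        self.debug("TRACE: " + msg, *args, **kwargs)

logging.setLoggerClass(PackageBLogger)

# Loggers created from now on carry both customisations.
log = logging.getLogger("demo.cooperative")
```

Note that loggers created before a setLoggerClass() call keep their old class, so this only helps when packages install their class before creating loggers.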
-- The next great adventure of mankind is not for people who ask, "What exactly is the point?" They will never get it. -- http://news.bbc.co.uk/1/hi/sci/tech/3302375.stm From kbk at shore.net Wed Sep 22 05:22:11 2004 From: kbk at shore.net (Kurt B. Kaiser) Date: Wed Sep 22 05:22:17 2004 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200409220322.i8M3MBF7005521@h006008a7bda6.ne.client2.attbi.com> Patch / Bug Summary ___________________ Patches : 235 open ( -6) / 2633 closed (+11) / 2868 total ( +5) Bugs : 767 open ( +3) / 4463 closed (+10) / 5230 total (+13) RFE : 151 open ( +1) / 131 closed ( +0) / 282 total ( +1) New / Reopened Patches ______________________ (2004-09-18) CLOSED http://python.org/sf/1030422 opened by Jeff Connelly aka shellreef Patch for bug #780725 (2004-09-20) http://python.org/sf/1031213 opened by atsuo ishimoto Clean up discussion of new C thread idiom (2004-09-20) http://python.org/sf/1031233 opened by Greg Chapman atexit decorator (2004-09-21) http://python.org/sf/1031687 opened by Raymond Hettinger Add API to logging package to allow intercooperation. (2004-09-21) http://python.org/sf/1032206 opened by Dave Wilson Patches Closed ______________ Decimal performance enhancements (2004-09-01) http://python.org/sf/1020845 closed by rhettinger topdir calculated incorrectly in bdist_rpm (2004-09-03) http://python.org/sf/1022003 closed by jafo add support for the AutoReq flag in bdist_rpm (2004-09-03) http://python.org/sf/1022011 closed by jafo Adding IPv6 host handling to httplib (2004-09-15) http://python.org/sf/1028502 closed by loewis Add status code constants to httplib (2004-09-10) http://python.org/sf/1025790 closed by loewis tarfile.py longnames are truncated in getnames() (2004-09-16) http://python.org/sf/1029061 closed by loewis Patch for bug 933795. 
term.h and curses on Solaris (2004-08-19) http://python.org/sf/1012280 closed by loewis fix bug 807871 : tkMessageBox.askyesno wrong result (2004-08-29) http://python.org/sf/1018509 closed by loewis Error when int sent to PyLong_AsUnsignedLong (2004-09-08) http://python.org/sf/1024670 closed by loewis WinSock 2 support on Win32 w/ MSVC++ 6 (fix #860134) (2004-03-03) http://python.org/sf/908631 closed by loewis (2004-09-18) http://python.org/sf/1030422 closed by jeffconnelly New / Reopened Bugs ___________________ email.Utils not mentioned (2004-09-17) http://python.org/sf/1030118 opened by Jeff Blaine rfc822 __iter__ problem (2004-09-17) http://python.org/sf/1030125 opened by Mike Foord socket is not garbage-collected under specific circumstances (2004-09-18) CLOSED http://python.org/sf/1030249 opened by Matthias Klose distutils' dry-run wants to create some real build dirs (2004-09-18) http://python.org/sf/1030250 opened by Matthias Klose os.system exhausts file descriptors (2004-09-18) CLOSED http://python.org/sf/1030388 opened by Eray Ozkural os.path.join() does not raise TypeError (2004-09-18) http://python.org/sf/1030499 opened by Pierre Fortin PyMapping_Check crashes when argument is NULL (2004-09-19) CLOSED http://python.org/sf/1030557 opened by Michiel de Hoon PyOS_InputHook broken (2004-09-19) http://python.org/sf/1030629 opened by Michiel de Hoon Email message croaks the new email pkg parser (2004-09-19) http://python.org/sf/1030941 opened by Skip Montanaro tarfile: dirsize is not zero (2004-09-20) CLOSED http://python.org/sf/1031148 opened by Bertram Scharpf decimal module inconsistent with integers and floats (2004-09-20) CLOSED http://python.org/sf/1031480 opened by Anthony Tuininga Fold tuples of constants into a single constant (2004-09-20) http://python.org/sf/1031667 opened by Raymond Hettinger Conflicting descriptions of application order of decorators (2004-09-21) http://python.org/sf/1031897 opened by Hamish Lawson Bugs Closed ___________ help() 
does not check for chm file (2004-09-09) http://python.org/sf/1025392 closed by loewis socket is not garbage-collected under specific circumstances (2004-09-18) http://python.org/sf/1030249 closed by loewis configure not able to find ncurses/curses in Solaris (2004-04-12) http://python.org/sf/933795 closed by loewis tkMessageBox.askyesno wrong result (2003-09-17) http://python.org/sf/807871 closed by loewis Trivial fix for obscure bug in os.urandom() (2004-09-03) http://python.org/sf/1021596 closed by loewis os.system exhausts file descriptors (2004-09-18) http://python.org/sf/1030388 closed by loewis PyMapping_Check crashes when argument is NULL (2004-09-18) http://python.org/sf/1030557 closed by rhettinger tarfile: dirsize is not zero (2004-09-20) http://python.org/sf/1031148 closed by loewis decimal module inconsistent with integers and floats (2004-09-20) http://python.org/sf/1031480 closed by rhettinger get_installer_filename (2004-09-15) http://python.org/sf/1028334 closed by theller New / Reopened RFE __________________ Update unicodedata to version 4.0.1 (2004-09-20) http://python.org/sf/1031288 opened by Oliver Horn From python at rcn.com Wed Sep 22 22:32:00 2004 From: python at rcn.com (Raymond Hettinger) Date: Wed Sep 22 22:33:08 2004 Subject: FW: [Python-Dev] Noam's open regex requests Message-ID: <000901c4a0e3$36c1afe0$e841fea9@oemcomputer> Haven't heard a peep on this one. Is anyone going to be miffed if I accept Noam's requests? Raymond Hettinger -----Original Message----- From: python-dev-bounces+python=rcn.com@python.org [mailto:python-dev-bounces+python=rcn.com@python.org] On Behalf Of Raymond Hettinger Sent: Saturday, September 18, 2004 6:34 PM To: python-dev@python.org Cc: 'Noam Raphael' Subject: [Python-Dev] Noam's open regex requests [Noam Raphael] > I've suggested three things that I think should be done in that > case, and nobody objected. > > 1. 
Add a prominent note in the module contents page or in the module's > main page, stating that some functionality can only be achieved by using > compiled REs. I would make that read "The methods of compiled regular expressions allow more options than their simplified function counterparts. Most non-trivial applications always use the compiled form." > 2. Document the optional parameters which let you specify the start and > end pos in the findall and finditer methods of a compiled RE object. This seems reasonable to me. The API is already exposed and is useful. Why not document it? AFAICT, there are no plans to take away the functionality. > 3. Add the optional parameter "flags" to the findall and finditer > functions. Then, the four functions match, search, findall and finditer > would have the same interface: function(pattern, string[, flags]). This also seems reasonable to me. It is marginally useful and it may reduce the learning curve ever so slightly. There is nothing special about findall() and finditer() that makes them different from match() and search() with respect to flags. Raymond Hettinger From skip at pobox.com Wed Sep 22 23:43:42 2004 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 22 23:43:47 2004 Subject: FW: [Python-Dev] Noam's open regex requests In-Reply-To: <000901c4a0e3$36c1afe0$e841fea9@oemcomputer> References: <000901c4a0e3$36c1afe0$e841fea9@oemcomputer> Message-ID: <16721.61966.97526.546831@montanaro.dyndns.org> Raymond> Haven't heard a peep on this one. Is anyone going to be miffed Raymond> if I accept Noam's requests? I thought most of the opinion (certainly from Fredrik and Guido) ran counter to the request.
Skip From gvanrossum at gmail.com Wed Sep 22 23:50:15 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Sep 22 23:50:26 2004 Subject: FW: [Python-Dev] Noam's open regex requests In-Reply-To: <16721.61966.97526.546831@montanaro.dyndns.org> References: <000901c4a0e3$36c1afe0$e841fea9@oemcomputer> <16721.61966.97526.546831@montanaro.dyndns.org> Message-ID: We're not against #1 and #2, which are just fixing the docs! I don't know what /F thinks of #3, which is a small subset of the original proposal (to add options that are already present for other APIs), but I'm +0.5 on it. FWIW. On Wed, 22 Sep 2004 16:43:42 -0500, Skip Montanaro wrote: > > Raymond> Haven't heard a peep on this one. Is anyone going to be miffed > Raymond> if I accept Noam's requests? > > I thought most of the opinion (certainly from Fredrik and Guido) ran counter > to the request. > > Skip -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Thu Sep 23 02:15:30 2004 From: python at rcn.com (Raymond Hettinger) Date: Thu Sep 23 02:16:38 2004 Subject: FW: [Python-Dev] Noam's open regex requests In-Reply-To: <16721.61966.97526.546831@montanaro.dyndns.org> Message-ID: <002c01c4a102$6ff12e20$e841fea9@oemcomputer> > Raymond> Haven't heard a peep on this one. Is anyone going to be > miffed > Raymond> if I accept Noam's requests? [Skip] > I thought most of the opinion (certainly from Fredrik and Guido) ran > counter > to the request. IIRC, this is the part of the request that wasn't shot down. Originally, the OP wanted the function API to fully duplicate the method API.
There were several reasons for not doing that: API stability; where to put the flags argument relative to the start/stop arguments; the functions were supposed to be kept simple; and there were unresolvable argument order conflicts. So, the remaining part of the request is more humble: document that the functions are not supposed to be full featured, fully document the existing API, and give findall() and finditer() the same interface as the other functions. I sent Fred a note on the third part and will stick with whatever he says if there is a reply. Raymond From tim.peters at gmail.com Thu Sep 23 10:26:58 2004 From: tim.peters at gmail.com (Tim Peters) Date: Thu Sep 23 10:27:13 2004 Subject: [Python-Dev] A cute new way to get an infinite loop Message-ID: <1f7befae040923012645bc07f8@mail.gmail.com> >>> x = [1] >>> x.extend(-y for y in x) From arigo at tunes.org Thu Sep 23 11:45:02 2004 From: arigo at tunes.org (Armin Rigo) Date: Thu Sep 23 11:50:19 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects floatobject.c, 2.132, 2.133 In-Reply-To: References: Message-ID: <20040923094502.GA10207@vicky.ecs.soton.ac.uk> Hello Tim, Your float.richcompare patch, trying to map the C semantics at the Python level, introduces artificial results when comparing NaN's with longs: >>> float('nan') > 0 False >>> float('nan') > 0L True I am not aware of all the problems and various platforms, but clearly in the patch 'vsign' by itself doesn't make much sense if 'v' is a NaN. Wouldn't all compilers and platforms compare NaNs "strangely", for some detectable definition of "strange"?
Something along the lines of: #define Py_IS_NAN(v) (!Py_IS_INFINITY(v) && \ ( ((v) < 0.0 && (v) > 0.0) || \ !((v) < 1.0 || (v) > -1.0) )) Armin From imbaczek at gmail.com Thu Sep 23 14:46:35 2004 From: imbaczek at gmail.com (Marek "Baczek" Baczyński) Date: Thu Sep 23 14:46:38 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae040923012645bc07f8@mail.gmail.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> Message-ID: <5f3d2c310409230546693ced87@mail.gmail.com> On Thu, 23 Sep 2004 04:26:58 -0400, Tim Peters wrote: > >>> x = [1] > >>> x.extend(-y for y in x) Doesn't it leak memory when Ctrl+C'd (on Windows at least)? -- { Marek Baczyński :: UIN 57114871 :: GG 161671 :: JID imbaczek@jabber.gda.pl } { http://www.vlo.ids.gda.pl/ | imbaczek at poczta fm | http://www.promode.org } .. .. .. .. ... ... ...... evolve or face extinction ...... ... ... .. .. .. .. From jhylton at gmail.com Thu Sep 23 15:01:17 2004 From: jhylton at gmail.com (Jeremy Hylton) Date: Thu Sep 23 15:01:30 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae040923012645bc07f8@mail.gmail.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> Message-ID: On Thu, 23 Sep 2004 04:26:58 -0400, Tim Peters wrote: > >>> x = [1] > >>> x.extend(-y for y in x) It is perhaps surprising that something lazy can work so hard.
Jeremy From python at rcn.com Thu Sep 23 17:33:00 2004 From: python at rcn.com (Raymond Hettinger) Date: Thu Sep 23 17:34:14 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae040923012645bc07f8@mail.gmail.com> Message-ID: <001901c4a182$9c44ee00$e841fea9@oemcomputer> > >>> x = [1] > >>> x.extend(-y for y in x) In comparison, the classic form doesn't seem as magical: x = [1] for y in x: x.append(-y) Raymond From FBatista at uniFON.com.ar Thu Sep 23 17:58:39 2004 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Thu Sep 23 18:03:21 2004 Subject: [Python-Dev] A cute new way to get an infinite loop Message-ID: [Raymond Hettinger] #- In comparison, the classic form doesn't seem as magical: #- #- x = [1] #- for y in x: #- x.append(-y) #- The eternal inherent risk of modify the iterable being iterated. Who didn't ever fall in this? . Facundo From goodger at python.org Thu Sep 23 18:32:38 2004 From: goodger at python.org (David Goodger) Date: Thu Sep 23 18:32:58 2004 Subject: [Python-Dev] Re: A cute new way to get an infinite loop In-Reply-To: <1f7befae040923012645bc07f8@mail.gmail.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> Message-ID: <4152FAA6.70201@python.org> [Tim Peters] > >>> x = [1] > >>> x.extend(-y for y in x) Not quite infinite, since eventually it will raise a MemoryError. So "while 1:" still rules that roost. ;-) -- David Goodger From tim.peters at gmail.com Thu Sep 23 19:39:56 2004 From: tim.peters at gmail.com (Tim Peters) Date: Thu Sep 23 19:40:01 2004 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects floatobject.c, 2.132, 2.133 In-Reply-To: <20040923094502.GA10207@vicky.ecs.soton.ac.uk> References: <20040923094502.GA10207@vicky.ecs.soton.ac.uk> Message-ID: <1f7befae040923103916aba29b@mail.gmail.com> [Armin Rigo] > Your float.richcompare patch, trying to map the C semantics at the Python > level, introduces artificial results when comparing NaN's with longs: Not really. 
All Python behavior in the presence of NaNs was accidental before. That it remains accidental was noted in the checkin comment, and in an XXX block in the new code. The specific form of accidents may or may not have changed, depending on platform. > >>> float('nan') > 0 And it remains an accident that float('nan') didn't raise ValueError on whatever box you're using (it does, e.g., on mine). > False > >>> float('nan') > 0L > True > > I am not aware of all the problems and various platforms, but clearly in the > patch 'vsign' by itself doesn't make much sense if 'v' is a NaN. Right, it makes no sense. > Wouldn't all compilers and platforms compare NaNs "strangely", for some > detectable definition of "stange"? Yes. Some may even raise SIGFPE if you try; that was also true before the patch. > Something along the lines of: > > #define Py_IS_NAN(v) (!Py_IS_INFINITY(v) && \ > ( ((v) < 0.0 && (v) > 0.0) || \ > !((v) < 1.0 || (v) > -1.0) ) As the new code says, /* XXX If we had a reliable way to check whether i is a * XXX NaN, it would belong in this branch too. */ The best candidate for 2.4 may be: #define Py_IS_NAN(v) ((v) != (v)) That works under MS VC 7.1, but didn't work under VC 6.0 (which is why the "for 2.4" qualifier -- Python on Windows is switching to 7.1 for 2.4). If someone can confirm that it works under recent gcc too, let's do that. Nothing exists that will work on all platforms, but all platforms claiming to support 754 have *some* way to spell "true iff a NaN, and don't raise SIGFPE just because I'm asking". C99 spells that isnan(x), from math.h. MS C doesn't have that, but does have _isnan(x), from float.h. That's the maddening part -- it's easy to spell on any specific platform, but nothing about the spelling (neither name nor header file) is the same across platforms. 
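The `(v) != (v)` spelling proposed above relies on an IEEE 754 NaN comparing unequal to everything, itself included. A minimal sketch of the same idea at the Python level, assuming an IEEE 754 platform and a present-day Python in which `float("nan")` is accepted everywhere; the helper name `is_nan` is made up for illustration and is not part of any patch discussed here:

```python
import math

def is_nan(v):
    """Hypothetical helper: true iff v is a NaN, via the self-inequality trick."""
    # On IEEE 754 platforms a NaN is the only float that is not equal to itself.
    return v != v

nan = float("nan")
inf = float("inf")

print(is_nan(nan))        # a NaN is unequal to itself
print(is_nan(inf))        # infinities still compare equal to themselves
print(is_nan(0.0))
print(math.isnan(nan))    # standard-library spelling of the same question
print(nan > 0, nan < 0)   # ordered comparisons against a NaN
```

Later Python versions provide `math.isnan()`, which is probably the better choice in new code; the self-inequality test is shown only because it mirrors the C macro under discussion.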
From tim.peters at gmail.com Thu Sep 23 20:11:34 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Thu Sep 23 20:11:42 2004
Subject: [Python-Dev] A cute new way to get an infinite loop
In-Reply-To: <5f3d2c310409230546693ced87@mail.gmail.com>
References: <1f7befae040923012645bc07f8@mail.gmail.com> <5f3d2c310409230546693ced87@mail.gmail.com>
Message-ID: <1f7befae0409231111171029d2@mail.gmail.com>

[Marek Baczek Baczyński]
> Doesn't it leak memory when Ctrl+C'd (on Windows at least?)

Not really. "Leak" is reserved for cases where memory is unaccounted
for. In this case, the memory is consumed by the ever-growing list:

>>> x = [1]
>>> x.extend(-y for y in x)
Traceback (most recent call last):
  File "", line 1, in ?
  File "", line 1, in
KeyboardInterrupt
>>> len(x)
67090195
>>> x[:10]
[1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
>>>

At that point, doing

>>> del x[:]

reclaimed a few hundred megabytes.

From cben at users.sf.net Thu Sep 23 19:24:35 2004
From: cben at users.sf.net (Beni Cherniavsky)
Date: Thu Sep 23 21:10:30 2004
Subject: [Python-Dev] Re: A cute new way to get an infinite loop
In-Reply-To: <1f7befae040923012645bc07f8@mail.gmail.com>
References: <1f7befae040923012645bc07f8@mail.gmail.com>
Message-ID:

Tim Peters wrote:
>>>>x = [1]
>>>>x.extend(-y for y in x)
>
A simpler way:

>>> x = [1, -1]
>>> x.extend(iter(x))

Curiously, this didn't "work" before 2.4 either:

>>> x = [1]
>>> x.extend(iter(x))
>>> x
[1, 1]

The iterator did see the new elements after the extend call but not
during it:

>>> x = [1]
>>> i = iter(x)
>>> x.extend(x)
>>> list(i)
[1, 1]
>>> x = [1]
>>> i = iter(x)
>>> x.extend([list(i)])
>>> x
[1, [1]]

The reason is that in 2.3 `listextend()` passed the right argument
through `PySequence_Fast`, which copied it before beginning to extend
the list. It's much better now. I mean it! Bugs should be predictable.
An infinite loop should never terminate silently. Unless explicitly
terminated.
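The behaviour discussed in this thread can be reproduced without hanging the interpreter by capping the generator. The following is a small sketch, assuming a current CPython; `itertools.islice` and the limit of 5 are additions for illustration and were not part of the original one-liner. It also contrasts the lazy generator expression with a list comprehension, which is fully built before `extend()` starts:

```python
from itertools import islice

# Generator expression: evaluated lazily, so it keeps seeing the items
# that extend() has just appended. islice() caps the otherwise endless feed.
x = [1]
x.extend(islice((-y for y in x), 5))
print(x)

# List comprehension: the right-hand list is complete before extend()
# runs, so the list being extended is never read while it grows.
cases = [1, 2, 3]
cases.extend([-c for c in cases])
print(cases)
```

Without the `islice()` cap the first form keeps growing `x` until interrupted or until memory runs out, which is the effect shown in the transcripts above.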
From tjreedy at udel.edu Fri Sep 24 02:31:04 2004 From: tjreedy at udel.edu (Terry Reedy) Date: Fri Sep 24 02:31:16 2004 Subject: [Python-Dev] Re: A cute new way to get an infinite loop References: <1f7befae040923012645bc07f8@mail.gmail.com> Message-ID: "Tim Peters" wrote in message news:1f7befae040923012645bc07f8@mail.gmail.com... >>>> x = [1] >>>> x.extend(-y for y in x) Very similar to this old way (2.2 and I presume before): >>> l=[1] >>> for i in l: l.append(i) ... Traceback (most recent call last): File "", line 1, in ? KeyboardInterrupt >>> len(l) 1623613 but admittedly a bit more baroque ;-) So, are things like this a programming bug, interpreter bug, or language definition bug? or just a 'gotcha'? Terry J. Reedy From tdelaney at avaya.com Fri Sep 24 03:17:06 2004 From: tdelaney at avaya.com (Delaney, Timothy C (Timothy)) Date: Fri Sep 24 03:17:12 2004 Subject: [Python-Dev] Re: A cute new way to get an infinite loop Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4A901D@au3010avexu1.global.avaya.com> Terry Reedy wrote: > So, are things like this a programming bug, interpreter bug, or > language definition bug? or just a 'gotcha'? Gotcha. In pretty much every language, you have to be careful about modifying what you're iterating over. I don't see that Python should be any different ;) However, Tim's example is a bit less obvious that you are modifying the thing you're iterating over ... Hence the "cuteness" IMO. Tim Delaney From tim.peters at gmail.com Fri Sep 24 06:14:56 2004 From: tim.peters at gmail.com (Tim Peters) Date: Fri Sep 24 06:14:58 2004 Subject: [Python-Dev] Re: A cute new way to get an infinite loop In-Reply-To: References: <1f7befae040923012645bc07f8@mail.gmail.com> Message-ID: <1f7befae040923211420e5905c@mail.gmail.com> [Terry Reedy] > Very similar to this old way (2.2 and I presume before): Been there forever, yes. > >>> l=[1] > >>> for i in l: l.append(i) > ... > Traceback (most recent call last): > File "", line 1, in ? 
> KeyboardInterrupt > >>> len(l) > 1623613 > > but admittedly a bit more baroque ;-) > > So, are things like this a programming bug, interpreter bug, or language > definition bug? or just a 'gotcha'? They're features, provoked into revealing their dark sides by pilot error. It's not an accident that I posted my note right after checking in a new test, in test_long.py, containing: cases.extend([-x for x in cases]) I will not admit that it didn't always contain the square brackets. And if I won't admit that, I *sure* won't admit that I initially feared hairy new code for mixed float-vs-long comparison contained an infinite loop . never-getting-an-infinite-loop-is-a-symptom-of-not-trying-hard-enough-ly y'rs - tim From imbaczek at gmail.com Fri Sep 24 11:41:50 2004 From: imbaczek at gmail.com (=?UTF-8?Q?Marek_=22Baczek=22_Baczy=C5=84ski?=) Date: Fri Sep 24 11:41:53 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae0409231111171029d2@mail.gmail.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> <5f3d2c310409230546693ced87@mail.gmail.com> <1f7befae0409231111171029d2@mail.gmail.com> Message-ID: <5f3d2c3104092402412b891a11@mail.gmail.com> On Thu, 23 Sep 2004 14:11:34 -0400, Tim Peters wrote: > [Marek Baczek Baczy?ski] > > Doesn't it leak memory when Ctrl+C'd (on Windows at least?) > > Not really. "Leak" is reserved for cases where memory is unaccounted > for. In this case, the memory is consumed by the ever-growing list: [...] I realized that the moment after I pressed 'Send'; felt so embarrassed that I hoped no one would see that post :) Next time I'll think. Twice. -- { Marek Baczy?ski :: UIN 57114871 :: GG 161671 :: JID imbaczek@jabber.gda.pl } { http://www.vlo.ids.gda.pl/ | imbaczek at poczta fm | http://www.promode.org } .. .. .. .. ... ... ...... evolve or face extinction ...... ... ... .. .. .. .. 
From lists at hlabs.spb.ru Fri Sep 24 16:10:21 2004 From: lists at hlabs.spb.ru (Dmitry Vasiliev) Date: Fri Sep 24 12:02:19 2004 Subject: [Python-Dev] Methods identity...? Message-ID: <41542ACD.5080307@hlabs.spb.ru> Is this intended? Seems like a bug... (Python 2.1.3, 2.2.2, 2.3.4, 2.4a3, both old- and new- style classes.) >>> class Test(object): ... def test(self): pass ... >>> Test.test is Test.test False >>> t = Test() >>> t.test is t.test False -- Dmitry Vasiliev (dima at hlabs.spb.ru) http://hlabs.spb.ru From aahz at pythoncraft.com Fri Sep 24 16:11:05 2004 From: aahz at pythoncraft.com (Aahz) Date: Fri Sep 24 16:11:07 2004 Subject: [Python-Dev] Methods identity...? In-Reply-To: <41542ACD.5080307@hlabs.spb.ru> References: <41542ACD.5080307@hlabs.spb.ru> Message-ID: <20040924141105.GA3062@panix.com> On Fri, Sep 24, 2004, Dmitry Vasiliev wrote: > > Is this intended? Seems like a bug... > > (Python 2.1.3, 2.2.2, 2.3.4, 2.4a3, both old- and new- style classes.) > > >>> class Test(object): > ... def test(self): pass > ... > >>> Test.test is Test.test > False > >>> t = Test() > >>> t.test is t.test > False Not a bug. For more discussion, please post to comp.lang.python -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From gerrit at nl.linux.org Fri Sep 24 17:00:56 2004 From: gerrit at nl.linux.org (Gerrit) Date: Fri Sep 24 17:02:01 2004 Subject: [Python-Dev] Methods identity...? In-Reply-To: <20040924141105.GA3062@panix.com> References: <41542ACD.5080307@hlabs.spb.ru> <20040924141105.GA3062@panix.com> Message-ID: <20040924150056.GA4343@nl.linux.org> Aahz wrote: > On Fri, Sep 24, 2004, Dmitry Vasiliev wrote: > > > > Is this intended? Seems like a bug... > > > > (Python 2.1.3, 2.2.2, 2.3.4, 2.4a3, both old- and new- style classes.) > > > > >>> class Test(object): > > ... def test(self): pass > > ... 
> > >>> Test.test is Test.test > > False > > >>> t = Test() > > >>> t.test is t.test > > False > > Not a bug. For more discussion, please post to comp.lang.python Or search the archives, I recall having brought this up on c.l.py once. Gerrit. -- Weather in Twenthe, Netherlands 24/09 16:25: 13.0?C light rain showers; Cumulonimbus clouds observed mostly cloudy wind 5.8 m/s WNW (57 m above NAP) -- In the councils of government, we must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military-industrial complex. The potential for the disastrous rise of misplaced power exists and will persist. -Dwight David Eisenhower, January 17, 1961 From ndbecker2 at verizon.net Fri Sep 24 21:59:44 2004 From: ndbecker2 at verizon.net (Neal D. Becker) Date: Fri Sep 24 21:59:49 2004 Subject: [Python-Dev] python.sty conflict with \newcommand\url Message-ID: I hope this is the correct place to post this question. I'm trying to use python.sty to write some doc for my modules. If I try to use \hyperref package, I get this: (/usr/share/texmf/tex/latex/html/url.sty ! LaTeX Error: Command \url already defined. Or name \end... illegal, see p.192 of the manual. This is what python.sty says: % Use this def/redef approach for \url{} since hyperref defined this already, % but only if we actually used hyperref: \ifpdf \newcommand{\url}[1]{{% The comment suggest a workaround for hyperref, but it doesn't look like the code actually matches the comment. Any ideas? From python at dynkin.com Sat Sep 25 05:33:39 2004 From: python at dynkin.com (George Yoshida) Date: Sat Sep 25 05:32:46 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae040923012645bc07f8@mail.gmail.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> Message-ID: <4154E713.3090206@dynkin.com> Tim Peters wrote: >>>>x = [1] >>>>x.extend(-y for y in x) It does not always go into an infinite loop. 
I was bitten by this: >>> x = [] >>> x.extend(-y for y in x) Segmentation fault George From bob at redivi.com Sat Sep 25 05:36:10 2004 From: bob at redivi.com (Bob Ippolito) Date: Sat Sep 25 05:36:52 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <4154E713.3090206@dynkin.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> <4154E713.3090206@dynkin.com> Message-ID: <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com> On Sep 24, 2004, at 11:33 PM, George Yoshida wrote: > Tim Peters wrote: > >>>>x = [1] > >>>>x.extend(-y for y in x) > > It does not always go into an infinite loop. I was bitten by this: > > >>> x = [] > >>> x.extend(-y for y in x) > Segmentation fault No algorithm that requires infinite memory will run for an infinite amount of time on a finite computer. Of course it should raise an exception instead of segfaulting though.. could it be blowing the stack? -bob From tim.peters at gmail.com Sat Sep 25 06:41:37 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sat Sep 25 06:41:40 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> <4154E713.3090206@dynkin.com> <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com> Message-ID: <1f7befae0409242141ebdcf83@mail.gmail.com> [George Yoshida] >> It does not always go into an infinite loop. I was bitten by this: >> >> >>> x = [] >> >>> x.extend(-y for y in x) >> Segmentation fault [Bob Ippolito] > No algorithm that requires infinite memory will run for an infinite > amount of time on a finite computer. Of course it should raise an > exception instead of segfaulting though.. could it be blowing the > stack? No, its stack use is bounded (and small) no matter how long it runs. On Windows it eventually raises MemoryError. My guess is that George is using Linux. 
"It's a feature" that the Linux malloc() can lie (== malloc(n) can return a non-NULL value p even if you're going to get a segfault if you try to write to p+i for some i in range(n)). Linus likens this to airlines over-selling seats, based on the likelihood that someone will miss their flight. Argue with him . When malloc() claims to return memory that can't actually be used, there's not much Python can do about that (other than blow up when trying to use it). From python at dynkin.com Sat Sep 25 07:09:56 2004 From: python at dynkin.com (George Yoshida) Date: Sat Sep 25 07:09:04 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae0409242141ebdcf83@mail.gmail.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> <4154E713.3090206@dynkin.com> <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com> <1f7befae0409242141ebdcf83@mail.gmail.com> Message-ID: <4154FDA4.7090401@dynkin.com> Tim Peters wrote: > On Windows it eventually raises MemoryError. My guess is that George > is using Linux. That's right! $ uname -a Linux linux 2.6.5-7.108-smp #1 SMP Wed Aug 25 13:34:40 UTC 2004 i686 i686 i386 GNU/Linux George From ronaldoussoren at mac.com Sat Sep 25 13:56:52 2004 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Sat Sep 25 13:57:58 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae0409242141ebdcf83@mail.gmail.com> References: <1f7befae040923012645bc07f8@mail.gmail.com> <4154E713.3090206@dynkin.com> <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com> <1f7befae0409242141ebdcf83@mail.gmail.com> Message-ID: On 25-sep-04, at 6:41, Tim Peters wrote: > [George Yoshida] >>> It does not always go into an infinite loop. I was bitten by this: >>> >>>>>> x = [] >>>>>> x.extend(-y for y in x) >>> Segmentation fault > > [Bob Ippolito] >> No algorithm that requires infinite memory will run for an infinite >> amount of time on a finite computer. 
Of course it should raise an
>> exception instead of segfaulting though.. could it be blowing the
>> stack?
>
> No, its stack use is bounded (and small) no matter how long it runs.

I get a bus error on OSX (although with a slightly out of date
python2.4 from CVS).

Why should this loop at all? x is the empty list, and the generator
comprehension should therefore end up with an empty sequence. It's not
like your initial example where the list was non-empty at the start.

It crashes because of a Py_INCREF(item) at line 2727 in listobject.c
where item is NULL:

2722            assert(PyList_Check(seq));
2723
2724            if (it->it_index < PyList_GET_SIZE(seq)) {
2725                    item = PyList_GET_ITEM(seq, it->it_index);
2726                    ++it->it_index;
2727                    Py_INCREF(item);
2728                    return item;
2729            }
2730
2731            Py_DECREF(seq);

BTW, seq is null as well.

From arigo at tunes.org Sat Sep 25 16:07:23 2004
From: arigo at tunes.org (Armin Rigo)
Date: Sat Sep 25 16:12:38 2004
Subject: [Python-Dev] A cute new way to get an infinite loop
In-Reply-To: <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com>
References: <1f7befae040923012645bc07f8@mail.gmail.com> <4154E713.3090206@dynkin.com> <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com>
Message-ID: <20040925140723.GA13511@vicky.ecs.soton.ac.uk>

Hi Bob,

On Fri, Sep 24, 2004 at 11:36:10PM -0400, Bob Ippolito wrote:
> > >>> x = []
> > >>> x.extend(-y for y in x)
> > Segmentation fault
>
> No algorithm that requires infinite memory will run for an infinite
> amount of time on a finite computer.

The segfault is immediate. And the example is different, as Ronald
pointed out: the list 'x' is empty!

Uh oh. We have a real bug in listextend(): the list being extended is in
a semi-invalid state when it's calling tp_iternext() on the 2nd iterable.
This might call back Python code, which can inspect the list. The above
example does just that. Crash.

"Semi-invalid" means that all invariants are respected but the final
items in the list are NULL. Reading them crashes.
And I'm not even talking about the nasty things you can do if you modify the list while it's being extended :-) The safest solution would be to use a regular app1() to add each item as the iterable produce them instead of optimizing this case. I'm not sure we need the high-flying optimization of listextend() in this case (this is the case where the iterable we extend the list with is neither a list nor a tuple). I believe that the speed of app1() would be acceptable, given the fixed bug and the overall decrease of code complexity (though that should be measured). Armin From tim.peters at gmail.com Sat Sep 25 18:23:57 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sat Sep 25 18:24:00 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <20040925140723.GA13511@vicky.ecs.soton.ac.uk> References: <1f7befae040923012645bc07f8@mail.gmail.com> <4154E713.3090206@dynkin.com> <0AC4DE7F-0EA4-11D9-A344-000A95686CD8@redivi.com> <20040925140723.GA13511@vicky.ecs.soton.ac.uk> Message-ID: <1f7befae040925092332989585@mail.gmail.com> [Armin Rigo, on >>> x = [] >>> x.extend(-y for y in x) Segmentation fault ] > The segfault is immediate. And the example is different, as Ronald pointed > out: the list 'x' is empty! Good eye! I overlooked that too. > Uh oh. We have a real bug in listextend(): the list being extended is in a > semi-invalid state when it's calling tp_iternext() on the 2nd iterable. This > might call back Python code, which can inspect the list. The above example > does just that. Crash. > > "Semi-invalid" means that all invariants are respected but the final items in > the list are NULL. Reading them crashes. And I'm not even talking about the > nasty things you can do if you modify the list while it's being extended :-) Yup. The code doesn't check for C int overflow of m+n either. > The safest solution would be to use a regular app1() to add each item as the > iterable produce them instead of optimizing this case. 
I'm not sure we need
> the high-flying optimization of listextend() in this case (this is the case
> where the iterable we extend the list with is neither a list nor a tuple). I
> believe that the speed of app1() would be acceptable, given the fixed bug and
> the overall decrease of code complexity (though that should be measured).

I think it's easy to fix. "The usual rule" applies: you can't assume
anything about a mutable object after potentially calling back into
Python. So trying to save info in "i", "m", or "n" across loop
iterations can't work, and the list can never be left in an insane
state ("semi" or not) at any time user code may get invoked. But
since we have both "num allocated" and "num used" members in the list
struct now, it's easy to use those instead of trying to carry info in
locals.

Patch attached. Anyone object? Of course in the example at the start
of this msg, it leaves x empty.
-------------- next part --------------
Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.223
diff -c -u -r2.223 listobject.c
--- Objects/listobject.c 12 Sep 2004 19:53:07 -0000 2.223
+++ Objects/listobject.c 25 Sep 2004 16:14:33 -0000
@@ -769,12 +769,20 @@
     }
     m = self->ob_size;
     mn = m + n;
-    if (list_resize(self, mn) == -1)
-        goto error;
-    memset(&(self->ob_item[m]), 0, sizeof(*self->ob_item) * n);
+    if (mn >= m) {
+        /* Make room. */
+        if (list_resize(self, mn) == -1)
+            goto error;
+        /* Make the list sane again. */
+        self->ob_size = m;
+    }
+    /* Else m + n overflowed; on the chance that n lied, and there really
+     * is enough room, ignore it. If n was telling the truth, we'll
+     * eventually run out of memory during the loop.
+     */
 
     /* Run iterator to exhaustion. */
-    for (i = m; ; i++) {
+    for (;;) {
         PyObject *item = iternext(it);
         if (item == NULL) {
             if (PyErr_Occurred()) {
@@ -785,8 +793,11 @@
             }
             break;
         }
-        if (i < mn)
-            PyList_SET_ITEM(self, i, item); /* steals ref */
+        if (self->ob_size < self->allocated) {
+            /* steals ref */
+            PyList_SET_ITEM(self, self->ob_size, item);
+            ++self->ob_size;
+        }
         else {
             int status = app1(self, item);
             Py_DECREF(item); /* append creates a new ref */
@@ -796,10 +807,9 @@
     }
 
     /* Cut back result list if initial guess was too large. */
-    if (i < mn && self != NULL) {
-        if (list_ass_slice(self, i, mn, (PyObject *)NULL) != 0)
-            goto error;
-    }
+    if (self->ob_size < self->allocated)
+        list_resize(self, self->ob_size); /* shrinking can't fail */
+
     Py_DECREF(it);
     Py_RETURN_NONE;
 

From python at rcn.com Sat Sep 25 20:21:39 2004
From: python at rcn.com (Raymond Hettinger)
Date: Sat Sep 25 20:23:20 2004
Subject: [Python-Dev] A cute new way to get an infinite loop
In-Reply-To: <1f7befae040925092332989585@mail.gmail.com>
Message-ID: <000301c4a32c$809bd000$e841fea9@oemcomputer>

> Patch attached. Anyone object? Of course in the example at the start
> of this msg, it leaves x empty.

If you can hold off one day, I'll have time to review it in detail
tomorrow morning. And, I'll check to see if other parts of the code
base are similarly afflicted.

Raymond

From python at rcn.com Sat Sep 25 21:17:15 2004
From: python at rcn.com (Raymond Hettinger)
Date: Sat Sep 25 21:18:57 2004
Subject: [Python-Dev] More data points
In-Reply-To: <20040925140723.GA13511@vicky.ecs.soton.ac.uk>
Message-ID: <000a01c4a334$44b29e40$e841fea9@oemcomputer>

[Bob Ippolito]
> > > >>> x = []
> > > >>> x.extend(-y for y in x)
> > > Segmentation fault

I get a MemoryError.
To help with get a comprehensive view when I look at this more closely tomorrow, can you try out variations on the theme with other mutables: myset.update deque.extend dict.update dict.fromkeys array.extend Raymond From ncoghlan at iinet.net.au Sun Sep 26 01:47:26 2004 From: ncoghlan at iinet.net.au (ncoghlan@iinet.net.au) Date: Sun Sep 26 01:47:32 2004 Subject: [Python-Dev] More data points In-Reply-To: <000a01c4a334$44b29e40$e841fea9@oemcomputer> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> Message-ID: <1096156046.4156038e396cd@mail.iinet.net.au> Quoting Raymond Hettinger : > [Bob Ippolito] > > > > >>> x = [] > > > > >>> x.extend(-y for y in x) > > > > Segmentation fault > > I get a MemoryError. > > To help with get a comprehensive view when I look at this more closely > tomorrow, can you try out variations on the theme with other mutables: > > myset.update > deque.extend > dict.update > dict.fromkeys > array.extend Short answer: all of these work OK for me (i.e. do nothing). Only list.extend suffers from the segmentation fault. Session transcripts (with bonus X's to trick mailreaders): [...@localhost src]$ ./python Python 2.4a3 (#16, Sep 21 2004, 17:33:57) [GCC 3.4.1 20040702 (Red Hat Linux 3.4.1-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. X>> x = [] X>> x.extend(-y for y in x) Segmentation fault [...@localhost src]$ ./python Python 2.4a3 (#16, Sep 21 2004, 17:33:57) [GCC 3.4.1 20040702 (Red Hat Linux 3.4.1-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
X>> x = set() X>> x.update(-y for y in x) X>> x set([]) X>> from collections import deque X>> x = deque() X>> x.extend(-y for y in x) X>> x deque([]) X>> x = {} X>> x.update(-y for y in x) X>> x {} X>> x.fromkeys(-y for y in x) {} X>> from array import array X>> x = array('B') X>> x.extend(-y for y in x) X>> x array('B') From ncoghlan at iinet.net.au Sun Sep 26 07:28:28 2004 From: ncoghlan at iinet.net.au (ncoghlan@iinet.net.au) Date: Sun Sep 26 07:28:34 2004 Subject: [Python-Dev] More data points In-Reply-To: <000a01c4a334$44b29e40$e841fea9@oemcomputer> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> Message-ID: <1096176508.4156537c95cca@mail.iinet.net.au> Quoting Raymond Hettinger : > To help with get a comprehensive view when I look at this more closely > tomorrow, can you try out variations on the theme with other mutables: > > myset.update > deque.extend > dict.update > dict.fromkeys > array.extend Returning to Tim's original infinite loop, the behaviour is interestingly variable. List and array go into the infinite loop. Deque and dictionary both detect that the loop variable has been mutated and throw a specific exception. Set throws the same exception as dictionary does (presumably, the main container inside 'set' is a dictionary) Details of behaviour: Python 2.4a3 (#16, Sep 21 2004, 17:33:57) [GCC 3.4.1 20040702 (Red Hat Linux 3.4.1-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. X>> x = [1] X>> x.extend(-y for y in x) Traceback (most recent call last): File "", line 1, in ? File "", line 1, in KeyboardInterrupt X>> len(x) 73727215 X>> x = set([1]) X>> x set([1]) X>> x.update(-y for y in x) Traceback (most recent call last): File "", line 1, in ? File "", line 1, in RuntimeError: dictionary changed size during iteration X>> x set([1, -1]) X>> from collections import deque X>> x = deque([1]) X>> x.extend(-y for y in x) Traceback (most recent call last): File "", line 1, in ? 
File "", line 1, in RuntimeError: deque changed size during iteration X>> x deque([1, -1]) X>> from array import array X>> x = array('b', '1') X>> x.extend(-y for y in x) Traceback (most recent call last): File "", line 1, in ? File "", line 1, in KeyboardInterrupt X>> len(x) 6327343 X>> x = dict.fromkeys([1]) X>> x {1: None} X>> x.update((-y, None) for y in x) Traceback (most recent call last): File "", line 1, in ? File "", line 1, in RuntimeError: dictionary changed size during iteration X>> x {1: None, -1: None} X>> x.fromkeys(-y for y in x) {-1: None} From tim.peters at gmail.com Sun Sep 26 07:50:50 2004 From: tim.peters at gmail.com (Tim Peters) Date: Sun Sep 26 07:50:54 2004 Subject: [Python-Dev] More data points In-Reply-To: <1096176508.4156537c95cca@mail.iinet.net.au> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> Message-ID: <1f7befae040925225047b6d3f3@mail.gmail.com> [ncoghlan@iinet.net.au] > Returning to Tim's original infinite loop, the behaviour is interestingly variable. > > List and array go into the infinite loop. What happens when you mutate a list while iterating over it is defined, and an infinite loop is expected for that. Ditto for array. > Deque and dictionary both detect that the loop variable has been mutated and > throw a specific exception. That's because they never suffered from list's ill-advised documentation effectively blessing mutation while iterating <0.5 wink>. > Set throws the same exception as dictionary does (presumably, the main > container inside 'set' is a dictionary) > > Details of behaviour: The last one is extremely surprising: > Python 2.4a3 (#16, Sep 21 2004, 17:33:57) > [GCC 3.4.1 20040702 (Red Hat Linux 3.4.1-2)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. ... > >>> x > {1: None, -1: None} > >>> x.fromkeys(-y for y in x) > {-1: None} Are you sure get that? 
I get this: >>> x {1: None, -1: None} >>> x.fromkeys(-y for y in x) {1: None, -1: None} "x.fromkeys()" doesn't have anything to do with x. Any dict works same there: >>> {}.fromkeys(-y for y in x) {1: None, -1: None} >>> {'a': 'b', 'c': 'd', 'e': 'f'}.fromkeys(-y for y in x) {1: None, -1: None} >>> From ncoghlan at iinet.net.au Sun Sep 26 08:22:02 2004 From: ncoghlan at iinet.net.au (ncoghlan@iinet.net.au) Date: Sun Sep 26 08:22:08 2004 Subject: [Python-Dev] More data points In-Reply-To: <1f7befae040925225047b6d3f3@mail.gmail.com> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> Message-ID: <1096179722.4156600a2c219@mail.iinet.net.au> Quoting Tim Peters : > That's because they never suffered from list's ill-advised > documentation effectively blessing mutation while iterating <0.5 > wink>. Ah. Interesting to know. So catching this is recommended when it's feasible? > > Set throws the same exception as dictionary does (presumably, the main > > container inside 'set' is a dictionary) > > > > Details of behaviour: > > The last one is extremely surprising: And it never actually happened, either. It's a transcription error on my part. I made a mistake when testing the dict.update version (I wrote "-y for y in x", instead of "(-y, None) for y in x"). When deleting that from the transcript, I also accidentally deleted the x.fromkeys() example. When I added that example back in, I put it in the wrong spot (after the x.update example, instead of before it). So, no, dict.update isn't randomly eating dictionary entries. Sorry 'bout the false alarm. . . Cheers, Nick. 
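The container differences reported in this sub-thread can be checked with a short script. This is a sketch assuming a current CPython, where the exact exception messages differ from the 2.4a3 transcripts; the helper name `try_self_feed` is invented for illustration. It also shows the point that `fromkeys()` ignores the instance it is called on:

```python
from collections import deque

def try_self_feed(container, feed):
    """Hypothetical helper: run feed(container) and report what was raised."""
    try:
        feed(container)
    except RuntimeError as exc:
        return type(exc).__name__
    return None

# Each container is extended from a generator that iterates over itself.
set_result = try_self_feed({1}, lambda s: s.update(-y for y in s))
deque_result = try_self_feed(deque([1]), lambda d: d.extend(-y for y in d))
dict_result = try_self_feed({1: None}, lambda d: d.update((-y, None) for y in d))
print(set_result, deque_result, dict_result)

# fromkeys() builds a brand-new dict; the contents of the dict it is
# called on play no part in the result.
keys = [1, -1]
fresh = {"a": "b"}.fromkeys(keys)
print(fresh)
```

Lists are left out on purpose: as noted above, a list fed from a generator over itself simply keeps growing rather than raising.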
From raynorj at mn.rr.com Sun Sep 26 08:44:22 2004 From: raynorj at mn.rr.com (J Raynor) Date: Sun Sep 26 08:28:18 2004 Subject: [Python-Dev] using openssh's pty code Message-ID: <41566546.7020601@mn.rr.com> Since openssh must handle pty allocation, its support for pty operations across various platforms is more robust than python's. I'd like to use openssh's code to improve on python's pty handling. I know the licenses for openssh and python are different. Can anyone tell me if it's legal to mix openssh code into python? Assuming it is, are the python maintainers willing to accept a python patch that contains some openssh code? From ncoghlan at iinet.net.au Sun Sep 26 08:28:51 2004 From: ncoghlan at iinet.net.au (ncoghlan@iinet.net.au) Date: Sun Sep 26 08:28:57 2004 Subject: [Python-Dev] More data points In-Reply-To: <1096179722.4156600a2c219@mail.iinet.net.au> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> Message-ID: <1096180131.415661a362685@mail.iinet.net.au> Quoting "ncoghlan@iinet.net.au" : > So, no, dict.update isn't randomly eating dictionary entries. And neither is dict.fromkeys, for that matter (which was what my copy-and-paste error actually showed). Cheers, Nick. With this sort of error rate, it's a good thing I'm not coding right now. . . From python at rcn.com Sun Sep 26 12:17:34 2004 From: python at rcn.com (Raymond Hettinger) Date: Sun Sep 26 12:18:45 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <1f7befae040925092332989585@mail.gmail.com> Message-ID: <000501c4a3b2$0ab29b40$e841fea9@oemcomputer> > I think it's easy to fix. "The usual rule" applies: you can't assume > anything about a mutable object after potentially calling back into > Python. 
So trying to save info in "i", "m", or "n" across loop > iterations can't work, and the list can never be left in an insane > state ("semi" or not) at any time user code may get invoked. But > since we have both "num allocated" and "num used" members in the list > struct now, it's easy to use those instead of trying to carry info in > locals. FWIW, I've searched the codebase and found no other variants on this problem. None of the other update/extend methods try to remember self data between iterations. Other calls to list_resize immediately fill-in the NULLS before calling arbitrary Python code. And, other places that use the over-allocation trick, map() for example, are working with a brand new list or tuple that has not been exposed to the rest of the application. One situation did look suspect. _PySequence_IterSearch() remembers an index/count across calls to PyIter_Next() -- it looks like the worst that could happen is the index or count would be wrong, but no crashers. > Patch attached. Anyone object? Of course in the example at the start > of this msg, it leaves x empty. Looks good. Reads well. Solves the problem. The timings are still fast. The test suite runs w/o exception. Please apply. Raymond From jepler at unpythonic.net Sun Sep 26 15:32:57 2004 From: jepler at unpythonic.net (Jeff Epler) Date: Sun Sep 26 15:33:02 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <41566546.7020601@mn.rr.com> References: <41566546.7020601@mn.rr.com> Message-ID: <20040926133257.GA2645@unpythonic.net> A year or so ago, it was suggested that we take some code from glib for string-to-float conversion(?). As far as I remember, after the license issues were resolved, the remaining issue was that the contributor was not himself familiar with the code. I don't know what eventually happened. You might look for this thread in python-dev archives. 
I think this is an entry point into that thread: http://mail.python.org/pipermail/python-dev/2003-August/037744.html Jeff -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20040926/2e993d24/attachment.pgp From martin at v.loewis.de Sun Sep 26 17:17:38 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 26 17:17:37 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <41566546.7020601@mn.rr.com> References: <41566546.7020601@mn.rr.com> Message-ID: <4156DD92.2040300@v.loewis.de> J Raynor wrote: > > Since openssh must handle pty allocation, its support for pty operations > across various platforms is more robust than python's. I'd like to use > openssh's code to improve on python's pty handling. > > I know the licenses for openssh and python are different. Can anyone > tell me if it's legal to mix openssh code into python? Assuming it is, > are the python maintainers willing to accept a python patch that > contains some openssh code? Could you change Python's pty module to more closely follow the procedures in OpenSSH, in particular those parts where OpenSSH is more robust? Regards, Martin From fredrik at pythonware.com Sun Sep 26 14:06:30 2004 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Sep 26 18:47:57 2004 Subject: [Python-Dev] Re: python/dist/src/Lib httplib.py,1.88,1.89 Message-ID: <200409261204.i8QC4Qk29680@pythonware.com> > +++ httplib.py 14 Sep 2004 17:55:21 -0000 1.89 > @@ -525,7 +525,8 @@ > def _set_hostport(self, host, port): > if port is None: > i = host.rfind(':') > - if i >= 0: > + j = host.rfind(']') # ipv6 addresses have [...] 
> + if i > j: one-line alternative: i = host.find(":", host.rfind("]")) From raynorj at mn.rr.com Sun Sep 26 20:46:41 2004 From: raynorj at mn.rr.com (J Raynor) Date: Sun Sep 26 20:30:28 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <4156DD92.2040300@v.loewis.de> References: <41566546.7020601@mn.rr.com> <4156DD92.2040300@v.loewis.de> Message-ID: <41570E91.2070503@mn.rr.com> I think I could improve the pty module by having it follow openssh's procedures, but I would wind up rewriting several configure checks in python, and I imagine some of them can only reliably be checked by compiling a small C program, like configure does. I think the better solution would be to modify the C code in posixmodule.c, or to provide an alternate module (written in C). For the alternate module idea, the pty module could import it and check to see if it provides openpty() (for example), just as the pty module currently tries to use os.openpty() before it tries its own implementation of openpty(). Martin v. Löwis wrote: > J Raynor wrote: > >> >> Since openssh must handle pty allocation, its support for pty >> operations across various platforms is more robust than python's. I'd >> like to use openssh's code to improve on python's pty handling. >> >> I know the licenses for openssh and python are different. Can anyone >> tell me if it's legal to mix openssh code into python? Assuming it >> is, are the python maintainers willing to accept a python patch that >> contains some openssh code? > > > Could you change Python's pty module to more closely follow the > procedures in OpenSSH, in particular those parts where OpenSSH > is more robust? > > Regards, > Martin > From raynorj at mn.rr.com Sun Sep 26 20:48:01 2004 From: raynorj at mn.rr.com (J Raynor) Date: Sun Sep 26 20:31:45 2004 Subject: [Fwd: Re: [Python-Dev] using openssh's pty code] Message-ID: <41570EE1.1010404@mn.rr.com> I forgot to CC the list with my response.
-------------- next part -------------- An embedded message was scrubbed... From: J Raynor Subject: Re: [Python-Dev] using openssh's pty code Date: Sun, 26 Sep 2004 13:28:38 -0500 Size: 1596 Url: http://mail.python.org/pipermail/python-dev/attachments/20040926/34315f35/Python-Devusingopensshsptycode.mht From martin at v.loewis.de Sun Sep 26 20:49:55 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Sep 26 20:49:53 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <41570E91.2070503@mn.rr.com> References: <41566546.7020601@mn.rr.com> <4156DD92.2040300@v.loewis.de> <41570E91.2070503@mn.rr.com> Message-ID: <41570F53.1010203@v.loewis.de> J Raynor wrote: > I think the better solution would be to modify the C code in > posixmodule.c, or to provide an alternate module (written in C). For > the alternate module idea, the pty module could import it and check to > see if it provides openpty() (for example), just as the pty module > currently tries to use os.openpty() before it tries its own > implementation of openpty(). Either would be fine. For the separate-module approach, I strongly advise that you publish this separately first, and collect user feedback. If a sufficient number of users would like to see it included in Python, and if you volunteer to maintain the module within Python for an extended period of time, we can include it. 
Regards, Martin From gvanrossum at gmail.com Sun Sep 26 23:01:47 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Sep 26 23:01:50 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <41570E91.2070503@mn.rr.com> References: <41566546.7020601@mn.rr.com> <4156DD92.2040300@v.loewis.de> <41570E91.2070503@mn.rr.com> Message-ID: On Sun, 26 Sep 2004 13:46:41 -0500, J Raynor wrote: > > I think I could improve the pty module by having it follow openssh's > procedures, but I would wind up rewriting several configure checks in > python, and I imagine some of them can only reliably be checked by > compiling a small C program, like configure does. > > I think the better solution would be to modify the C code in > posixmodule.c, or to provide an alternate module (written in C). For > the alternate module idea, the pty module could import it and check to > see if it provides openpty() (for example), just as the pty module > currently tries to use os.openpty() before it tries its own > implementation of openpty(). Agreed that this would be best served by writing C code. I hope that it can be done without violating someone else's license *and* without weighing down future Python distributions with someone else's license. No matter how sensible the other license is, adding licenses to the stack of licenses is not a good idea at this point. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.peters at gmail.com Mon Sep 27 00:06:00 2004 From: tim.peters at gmail.com (Tim Peters) Date: Mon Sep 27 00:06:02 2004 Subject: [Python-Dev] A cute new way to get an infinite loop In-Reply-To: <000501c4a3b2$0ab29b40$e841fea9@oemcomputer> References: <1f7befae040925092332989585@mail.gmail.com> <000501c4a3b2$0ab29b40$e841fea9@oemcomputer> Message-ID: <1f7befae040926150618b76b21@mail.gmail.com> [Raymond Hettinger] > ... > One situation did look suspect. 
_PySequence_IterSearch() remembers an > index/count across calls to PyIter_Next() -- it looks like the worst > that could happen is the index or count would be wrong, but no crashers. If the operation is PY_ITERSEARCH_INDEX, n is the 0-based count of the number of times the iterator got poked before the object was found. That's always correct, by definition (given that there's no guarantee the iterator can be rewound and restarted, or even that it would yield the same objects if it could be restarted, what else could "the index of the first occurrence" mean?). If the operation is PY_ITERSEARCH_COUNT, then n is the number of times poking the iterator returned the object in question. That's also correct by definition of what PySequence_Count() means, although there's again no guarantee that the user passes a sensible iterable object (== one that would produce the same objects if crawled over a second time). So those are fine. Thanks for checking the others, and for checking in a test and the fix! From tim.peters at gmail.com Mon Sep 27 00:33:43 2004 From: tim.peters at gmail.com (Tim Peters) Date: Mon Sep 27 00:33:47 2004 Subject: [Python-Dev] More data points In-Reply-To: <1096179722.4156600a2c219@mail.iinet.net.au> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> Message-ID: <1f7befae040926153369d0d50e@mail.gmail.com> [Tim] >> That's because they never suffered from list's ill-advised >> documentation effectively blessing mutation while iterating <0.5 wink>. [Nick] > Ah. Interesting to know. So catching this is recommended when it's feasible? According to me, but perhaps not according to all. You can work very hard to provide predictable semantics for mutation while iterating, by defining cursor objects that somehow retain sensible guarantees even if the object they point into mutates.
In effect, "the current index" is a cursor in this respect when iterating over a list, and the semantics are that "the current index", on each iteration, goes up by one, and is an offset from the start of whatever state the list happens to have at that time. So, e.g., this behavior is guaranteed: >>> x = range(10) >>> for elt in x: ... x.remove(elt) >>> x [1, 3, 5, 7, 9] >>> "Guaranteed" doesn't necessarily mean unsurprising, or even useful, though. I do have uses for this behavior, but I'd be happy to give them up. The "natural" behavior of dicts when mutating while iterating is effectively unexplainable -- it "does whatever it does", based on internal details of the hashed distribution of keys into buckets, and even on the history of insertions (which affects hash collision resolution). I'm glad Python gripes about that now (it didn't always). It would also be possible, but difficult, to implement "sane" iteration+mutation semantics for dicts. A dict cursor object would need to be aware of which objects had and hadn't already been passed out by the iteration, and would even need to be robust against the dict reorganizing itself completely when it changes size. It's a lot easier all around to say "if you have to, iterate over a snapshot of the keys". In some cases, we're reduced to saying that with no way to catch violations. ZODB's BTrees are a good example here. People routinely get in trouble by mutating them while iterating over them, but the implementation is such that it would be very difficult to detect such a thing. 
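A minimal sketch of the three behaviours described above, written for present-day Python 3 (so range() is wrapped in list(); the transcript above is Python 2). It assumes current CPython, where changing a dict's size during iteration raises RuntimeError. The variable names are illustrative only.

```python
# 1. Lists: the iterator is just an index that goes up by one each time,
#    so removing the current element makes the loop skip its neighbour.
x = list(range(10))
for elt in x:
    x.remove(elt)
# x is now [1, 3, 5, 7, 9]

# 2. Dicts: changing the size while iterating is detected and reported.
d = {1: None, 2: None, 3: None}
caught = False
try:
    for key in d:
        del d[key]
except RuntimeError:
    caught = True

# 3. The portable workaround: iterate over a snapshot of the keys,
#    and mutate the real dict as much as you like.
for key in list(d):
    del d[key]
```

The snapshot costs one extra list of keys, which is usually a small price for semantics that can be explained in a sentence.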
From ilya at bluefir.net Mon Sep 27 03:46:00 2004 From: ilya at bluefir.net (Ilya Sandler) Date: Mon Sep 27 03:47:47 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: <1f7befae040926153369d0d50e@mail.gmail.com> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> Message-ID: A problem: a number of standard python modules come with a command line interfaces, e.g. pydoc.py, pdb.py , unittest.py, timeit.py, uu.py But it appears that there is no convenient out-of-the-box way to invoke these tools from command line... Basically one either has to write wrappers or to invoke them like this: python /usr/lib/python2.3/pdb.py Neither approach is convenient... Am I missing something obvious? If not, then would the following make sense? When a script specified from command line is not found and the script name does not end with py, treat the script as a module name and execute that module as __main__ So python pdb would be equivalent to python /usr/lib/python2.3/pdb.py A possible variation of the same idea would be to have an explicit command line option -m (or -M). More typing, but less magic... Ilya PS. An obvious alternative would be to install wrapper scripts/symlinks next to python, but I don't understand python packaging well enough to make a judgement here. 
One obvious problem with wrapper scripts would be a difficulty of versioning, I wouldn't want to have pydoc2.2 pydoc2.3.1 pydoc2.3, etc in my /usr/bin From skip at pobox.com Mon Sep 27 03:59:58 2004 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 27 04:00:09 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> Message-ID: <16727.29726.557153.219020@montanaro.dyndns.org> Ilya> a number of standard python modules come with a command line Ilya> interfaces, e.g. pydoc.py, pdb.py , unittest.py, timeit.py, uu.py Ilya> But it appears that there is no convenient out-of-the-box way to Ilya> invoke these tools from command line... Ilya> Basically one either has to write wrappers or to Ilya> invoke them like this: python /usr/lib/python2.3/pdb.py Ilya> Neither approach is convenient... Ilya> Am I missing something obvious? Search for "Scripts to install" in the setup.py file that comes with the Python distribution. If there are other scripts you'd like to see installed, just submit a patch for setup.py. Skip From martin at v.loewis.de Mon Sep 27 06:48:50 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon Sep 27 06:48:48 2004 Subject: [Fwd: Re: [Python-Dev] using openssh's pty code] In-Reply-To: <41570EE1.1010404@mn.rr.com> References: <41570EE1.1010404@mn.rr.com> Message-ID: <41579BB2.4020709@v.loewis.de> > Well, I'm not sure how that applies. I didn't see any mention of > licenses in the thread you pointed me to, but even if that thread (or > some other one) showed that it was ok to use glib code in python, that > doesn't mean I can put openssh code into python because the glib and > openssh licenses are different. 
My personal view is that we can only accept code contributions from the original author, in general. There have been exceptions in the past, when we have shipped a wrapper for a library along with the source code of the library, however, there should be a very good reason for that, e.g. that the functionality in the library is unique. In the specific case, I do believe it would be better to write a pty module from scratch instead of adjusting openssh code. Regards, Martin From raynorj at mn.rr.com Mon Sep 27 08:57:24 2004 From: raynorj at mn.rr.com (J Raynor) Date: Mon Sep 27 08:41:08 2004 Subject: [Fwd: Re: [Python-Dev] using openssh's pty code] In-Reply-To: <41579BB2.4020709@v.loewis.de> References: <41570EE1.1010404@mn.rr.com> <41579BB2.4020709@v.loewis.de> Message-ID: <4157B9D4.2060401@mn.rr.com> The code that I would borrow from openssh basically states that you can use it if you include in your derived work the copyright notice and disclaimer found in the file you want to borrow from. This sounds like it would pose no problems for incorporating into python, but I'm no expert on this, so I thought I'd ask. Looking at some of the python source, I can see that there are several files that contain just such notices. For example, from the Modules directory: addrinfo.h md5.h regexpr.h timing.h _bsddb.c getaddrinfo.c _localemodule.c parsermodule.c syslogmodule.c Perhaps my original question led you to believe that the openssh license was unusual, or had problematic clauses in it. Given the somewhat clarified description above of what's required to borrow openssh code, do you still have reservations about receiving patches containing it? Martin v. Löwis wrote: >> Well, I'm not sure how that applies.
I didn't see any mention of >> licenses in the thread you pointed me to, but even if that thread (or >> some other one) showed that it was ok to use glib code in python, that >> doesn't mean I can put openssh code into python because the glib and >> openssh licenses are different. > > > My personal view is that we can only accept code contributions from > the original author, in general. There have been exceptions in the > past, when we have shipped a wrapper for a library along with the > source code of the library, however, there should be a very good reason > for that, e.g. that the functionality in the library is unique. > > In the specific case, I do believe it would be better to write > a pty module from scratch instead of adjusting openssh code. > > Regards, > Martin > From jim at zope.com Mon Sep 27 10:59:44 2004 From: jim at zope.com (Jim Fulton) Date: Mon Sep 27 10:59:49 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> Message-ID: <4157D680.4040809@zope.com> Ilya Sandler wrote: > A problem: > > a number of standard python modules come with a command line interfaces, > e.g. pydoc.py, pdb.py , unittest.py, timeit.py, uu.py > But it appears that there is no convenient out-of-the-box way to invoke > these tools from command line... > > Basically one either has to write wrappers or to > invoke them like this: python /usr/lib/python2.3/pdb.py > > Neither approach is convenient... > > Am I missing something obvious? If not, then would the following make > sense? 
> > When a script specified from command line is not found and the script name > does not end with py, treat the script as a module name and execute > that module as __main__ > > So > python pdb > would be equivalent to > python /usr/lib/python2.3/pdb.py > > A possible variation of the same idea would be to have an explicit > command line option -m (or -M). More typing, but less magic... +1 on the -m command-line variation, with the following change: I'd like Python to import the module and then run its main function. I've been meaning to suggest something like this myself. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim at zope.com Mon Sep 27 11:01:42 2004 From: jim at zope.com (Jim Fulton) Date: Mon Sep 27 11:01:52 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: <16727.29726.557153.219020@montanaro.dyndns.org> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> <16727.29726.557153.219020@montanaro.dyndns.org> Message-ID: <4157D6F6.1000307@zope.com> Skip Montanaro wrote: ... > Search for "Scripts to install" in the setup.py file that comes with the > Python distribution. If there are other scripts you'd like to see > installed, just submit a patch for setup.py. But then the same file gets installed twice. I'd really like something like what Ilya suggested for the common case of files that are usually used as modules but that also have a command-line interface. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From anthony at interlink.com.au Mon Sep 27 11:22:31 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Sep 27 11:23:26 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: <4157D680.4040809@zope.com> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> > +1 on the -m command-line variation, with the following change: > > I'd like Python to import the module and then run its main function. > > I've been meaning to suggest something like this myself. I'd prefer it import the module, with __name__ == "__main__", because it's compatible with what we do now for a module that's also a script. But I like the idea, nonetheless. Question: should python -m foo.bar.baz work? I'd say "yes". Anthony From jim at zope.com Mon Sep 27 11:32:24 2004 From: jim at zope.com (Jim Fulton) Date: Mon Sep 27 11:32:27 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: <4157DBD7.1090205@interlink.com.au> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> Message-ID: <4157DE28.5020209@zope.com> Anthony Baxter wrote: > >> +1 on the -m command-line variation, with the following change: >> >> I'd like Python to import the module and then run its main function. >> >> I've been meaning to suggest something like this myself. > > > I'd prefer it import the module, with __name__ == "__main__", > because it's compatible with what we do now for a module that's > also a script. But I like the idea, nonetheless. > > Question: should python -m foo.bar.baz work?
I'd say "yes". Me too. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From lists at hlabs.spb.ru Mon Sep 27 16:12:32 2004 From: lists at hlabs.spb.ru (Dmitry Vasiliev) Date: Mon Sep 27 12:04:24 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> Message-ID: <41581FD0.7090501@hlabs.spb.ru> Ilya Sandler wrote: > A problem: > > a number of standard python modules come with a command line interfaces, > e.g. pydoc.py, pdb.py , unittest.py, timeit.py, uu.py > But it appears that there is no convenient out-of-the-box way to invoke > these tools from command line... > > Basically one either has to write wrappers or to > invoke them like this: python /usr/lib/python2.3/pdb.py > > Neither approach is convenient... > > Am I missing something obvious? If not, then would the following make > sense? > > When a script specified from command line is not found and the script name > does not end with py, treat the script as a module name and execute > that module as __main__ > > So > python pdb > would be equivalent to > python /usr/lib/python2.3/pdb.py > > A possible variation of the same idea would be to have an explicit > command line option -m (or -M). More typing, but less magic... There has already been some discussion about importing from command line: http://mail.python.org/pipermail/python-dev/2003-December/041240.html I suggested the following: 1. python -p package Equivalent to: import package 2.
python -p package.zip Equivalent to: import sys sys.path.insert(0, "package.zip") import package -- Dmitry Vasiliev (dima at hlabs.spb.ru) http://hlabs.spb.ru From alex.nanou at gmail.com Mon Sep 27 15:19:13 2004 From: alex.nanou at gmail.com (Alex A. Naanou) Date: Mon Sep 27 15:19:16 2004 Subject: Fwd: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: <36f8892204092706186f8a277f@mail.gmail.com> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> <41581FD0.7090501@hlabs.spb.ru> <36f8892204092706186f8a277f@mail.gmail.com> Message-ID: <36f88922040927061974e56a66@mail.gmail.com> ---------- Forwarded message ---------- From: Alex A. Naanou Date: Mon, 27 Sep 2004 17:18:20 +0400 Subject: Re: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc To: dima@hlabs.spb.ru On Mon, 27 Sep 2004 14:12:32 +0000, Dmitry Vasiliev wrote: > Ilya Sandler wrote: > > A problem: > > > > a number of standard python modules come with a command line interfaces, > > e.g. pydoc.py, pdb.py , unittest.py, timeit.py, uu.py > > But it appears that there is no convenient out-of-the-box way to invoke > > these tools from command line... > > > > Basically one either has to write wrappers or to > > invoke them like this: python /usr/lib/python2.3/pdb.py > > > > Neither approach is convenient... > > > > Am I missing something obvious? If not, then would the following make > > sense? > > > > When a script specified from command line is not found and the script name > > does not end with py, treat the script as a module name and execute > > that module as __main__ > > > > So > > python pdb > > would be equivalent to > > python /usr/lib/python2.3/pdb.py > > > > A possible variation of the same idea would be to have an explicit > > command line option -m (or -M). More typing, but less magic... 
> > There has already been some discussion about importing from command line: > http://mail.python.org/pipermail/python-dev/2003-December/041240.html > > I suggested the following: > > 1. python -p package > > Equivalent to: > > import package > > 2. python -p package.zip > > Equivalent to: > > import sys > sys.path.insert(0, "package.zip") > import package > This might not be good (IMHO), as: 1) it makes an implicit import from the point of view of the code (an import made from outside the code that the code then uses); 2) it does not solve the problem at hand, because when a module is imported its __name__ is no longer "__main__", so its command-line handler will not start... -- Alex. -- Alex. From alex.nanou at gmail.com Mon Sep 27 15:20:00 2004 From: alex.nanou at gmail.com (Alex A. Naanou) Date: Mon Sep 27 15:20:03 2004 Subject: Fwd: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: <36f88922040927061163019d6@mail.gmail.com> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> <4157DBD7.1090205@interlink.com.au> <4157DE28.5020209@zope.com> <36f88922040927061163019d6@mail.gmail.com> Message-ID: <36f8892204092706205d0a13a3@mail.gmail.com> oops... forgot to CC the messages to python-dev ^_^ ---------- Forwarded message ---------- From: Alex A. Naanou Date: Mon, 27 Sep 2004 17:11:45 +0400 Subject: Re: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc To: jim@zope.com On Mon, 27 Sep 2004 05:32:24 -0400, Jim Fulton wrote: > Anthony Baxter wrote: > > > >> +1 on the -m command-line variation, with the following change: > >> > >> I'd like Python to import the module and then run its main function. > >> > >> I've been meaning to suggest something like this myself.
> > > > > > I'd prefer it import the module, with __name__ == "__main__", > > because it's compatible with what we do now for a module that's > > also a script. But I like the idea, nonetheless. > > > > Question: should python -m foo.bar.baz work? I'd say "yes". > > Me too. Count me in too! ..though I must say I am against the variant with the main function, as there is an accepted and widely used mechanism in python already (__name__ == '__main__'), so why add another one or make the existing mechanism more complex... -- Alex. -- Alex. From FBatista at uniFON.com.ar Mon Sep 27 16:13:31 2004 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Mon Sep 27 16:18:22 2004 Subject: [Python-Dev] A cute new way to get an infinite loop Message-ID: [Raymond Hettinger] #- Looks good. Reads well. Solves the problem. The timings are still #- fast. The test suite runs w/o exception. These should be remembered like "The 5 conditions for a good patch" (or something). . Facundo From niemeyer at conectiva.com Mon Sep 27 16:13:19 2004 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Mon Sep 27 16:24:20 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <4156DD92.2040300@v.loewis.de> References: <41566546.7020601@mn.rr.com> <4156DD92.2040300@v.loewis.de> Message-ID: <20040927141319.GA3105@burma.localdomain> Hello Martin, > >Since openssh must handle pty allocation, its support for pty > >operations across various platforms is more robust than python's. > >I'd like to use openssh's code to improve on python's pty handling. > > > >I know the licenses for openssh and python are different. Can anyone > >tell me if it's legal to mix openssh code into python? Assuming it > >is, are the python maintainers willing to accept a python patch that > >contains some openssh code? > > Could you change Python's pty module to more closely follow the > procedures in OpenSSH, in particular those parts where OpenSSH > is more robust? 
If he's going to copy/base his work on openssh, the openssh license must surely be respected. FWIW, that's the issue I was talking about when we discussed the contributor agreement in the PSF list, regarding inclusion of code with foreign licenses. In this occasion, you said a contributor must not include code not authored by him, and cannot sign an agreement on such code. -- Gustavo Niemeyer http://niemeyer.net From ncoghlan at iinet.net.au Mon Sep 27 16:32:36 2004 From: ncoghlan at iinet.net.au (ncoghlan@iinet.net.au) Date: Mon Sep 27 16:32:47 2004 Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc In-Reply-To: <4157DBD7.1090205@interlink.com.au> References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com> Message-ID: <1096295556.4158248495acf@mail.iinet.net.au> Quoting Anthony Baxter : > > > +1 on the -m command-line variation, with the following change: > > > > I'd like Python to import the module and then run it's main function. > > > > I've been meaning to suggest smething like this myself. > > I'd prefer it import the module, with __name__ == "__main__", > because it's compatible with what we do now for a module that's > also a script. But I like the idea, nonetheless. > > Question: should python -m foo.bar.baz work? I'd say "yes". I was curious how hard this would be to implement. Minus Andrew's addition, the answer is "Not very". So those who are interested in the idea might want to take a look at SF Patch # 1035498. The patch tries to make "./python -m pdb" mean the same thing as "./python Lib/pdb.py" on a development build. (I use that example, because I have only a very vague idea of where the pdb script ends up for an installed version of Python - which is why I think this option would be very useful!) Cheers, Nick. 
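A rough sketch of the semantics being discussed in this thread: locate a module by name on the import path and execute it with __name__ set to "__main__", so that the usual script guard fires. It is written for present-day Python and assumes the standard-library runpy module, which postdates this thread; it is not the code from the SF patch mentioned above. The module demo_tool and the helper run_as_main are hypothetical names made up for the illustration.

```python
import os
import runpy
import sys
import tempfile

# Hypothetical module, used only for this illustration.
MODULE_SOURCE = '''
def main():
    return "ran as script"

result = None
if __name__ == "__main__":
    result = main()
'''


def run_as_main(module_name, search_dir):
    """Find module_name via the import path and run it as __main__.

    Returns the globals of the executed module.
    """
    sys.path.insert(0, search_dir)
    try:
        return runpy.run_module(module_name, run_name="__main__")
    finally:
        sys.path.remove(search_dir)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_tool.py"), "w") as f:
        f.write(MODULE_SOURCE)

    # Executed by name with __name__ == "__main__": the guard runs main().
    as_script = run_as_main("demo_tool", tmp)

    # A plain import sets __name__ to "demo_tool", so the guard is skipped.
    sys.path.insert(0, tmp)
    try:
        import demo_tool
    finally:
        sys.path.remove(tmp)
```

The contrast between the two results is the reason a plain "import package" option would not start a module's command-line handler, while running the module as __main__ does.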
From wiedeman at gmx.net Mon Sep 27 09:39:30 2004
From: wiedeman at gmx.net (Christoph Wiedemann)
Date: Mon Sep 27 16:46:37 2004
Subject: [Python-Dev] Py_NewInterpreter and PyGILState API
Message-ID: <29383.1096270770@www48.gmx.net>

Hello,

first of all, my apologies for sending this message to python-dev, but I
tried comp.lang.python and python-help and didn't get helpful answers.

My problem is that Py_NewInterpreter and the PyGILState API introduced in
2.3 don't play nicely with each other, especially in multithreaded
embedding applications. I think this is because the PyGILState functions
assume there is exactly one PyThreadState instance per thread, and this is
violated when using Py_NewInterpreter, which creates a new PyThreadState
instance.

I've tried to use

    state = PyGILState_Ensure();
    PyThreadState_Get()->interp = interpreterIWantToUse;
    /* code using Python API */
    PyGILState_Release(state);

which seems to work if called from one thread only, but fails when used by
multiple threads, with a "Fatal Python error: PyThreadState_Delete:
invalid tstate."

Now, most of you would say: "Don't use the PyGILState API, use the 2.2 way
of dealing with thread states". Unfortunately, I want to use PyQt, which
uses the PyGILState API, and I found that it's not easy (or even
impossible?) to mix PyGILState calls with PyEval_SaveThread /
PyEval_RestoreThread.

I'm lost with this and would appreciate any help. I'm using Python 2.3 on
Linux x86.
Christoph

From barry at python.org Mon Sep 27 16:51:28 2004
From: barry at python.org (Barry Warsaw)
Date: Mon Sep 27 16:51:35 2004
Subject: [Python-Dev] a simpler way to invoke pydoc, pdb, unittest, etc
In-Reply-To:
References: <000a01c4a334$44b29e40$e841fea9@oemcomputer> <1096176508.4156537c95cca@mail.iinet.net.au> <1f7befae040925225047b6d3f3@mail.gmail.com> <1096179722.4156600a2c219@mail.iinet.net.au> <1f7befae040926153369d0d50e@mail.gmail.com>
Message-ID: <1096296687.23222.142.camel@geddy.wooz.org>

On Sun, 2004-09-26 at 21:46, Ilya Sandler wrote:
> When a script specified from command line is not found and the script name
> does not end with py, treat the script as a module name and execute
> that module as __main__
>
> So
> python pdb
> would be equivalent to
> python /usr/lib/python2.3/pdb.py
>
> A possible variation of the same idea would be to have an explicit
> command line option -m (or -M). More typing, but less magic...

With the command line switch, +1. One problem with the "just install it"
approach is that you often get Python from downstream packagers that make
their own decisions about which additional scripts to include. There are
also namespace collision issues in bin directories to deal with. So Ilya's
suggestion avoids both of those problems.

-Barry

From raymond.hettinger at verizon.net Mon Sep 27 18:35:07 2004
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Mon Sep 27 18:36:20 2004
Subject: [Python-Dev] Socket/Asyncore bug needs attention
Message-ID: <002001c4a4af$f3246fe0$e841fea9@oemcomputer>

Anyone who has worked on sockets or asyncore should take a look at SF
bug #1010098: CPU usage shoots up with asyncore.
Since Py2.3, the behavior changed for the worse. The bug report has been around for about five weeks and doesn't look like it is actively being solved. If you worked on those modules, please review your check-ins to see if they were the cause: www.python.org/sf/1010098 Thx, Raymond From tim.peters at gmail.com Mon Sep 27 18:45:35 2004 From: tim.peters at gmail.com (Tim Peters) Date: Mon Sep 27 18:45:41 2004 Subject: [Python-Dev] Socket/Asyncore bug needs attention In-Reply-To: <002001c4a4af$f3246fe0$e841fea9@oemcomputer> References: <002001c4a4af$f3246fe0$e841fea9@oemcomputer> Message-ID: <1f7befae040927094522854326@mail.gmail.com> [Raymond Hettinger] > Anyone who has worked on sockets or asyncore should take a look at SF > bug #1010098: CPU usage shoots up with asyncore. Since Py2.3, the > behavior changed for the worse. The bug report has been around for > about five weeks and doesn't look like it is actively being solved. If > you worked on those modules, please review your check-ins to see if they > were the cause: > > www.python.org/sf/1010098 FYI, as I noted in a comment there, I stared at that one long enough to determine that asyncore almost certainly wasn't to blame (despite the OP's natural belief that it must be). Instead something "above" asyncore changed so that a socket *always* shows up as "ready to write" in 2.4, but hardly ever in 2.3. CPU usage is nailed to 100% as a consequence in 2.4, while in 2.3 asyncore's select() times out instead, consuming almost no CPU. 
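[Editor's note] The mechanism Tim describes can be sketched with plain select(): if a socket is always reported as ready to write, select() returns at once and a polling loop spins, whereas a loop that does not ask about writability sleeps until the timeout. This is a minimal sketch, not asyncore's actual code; the helper names wait_for_events and demo are hypothetical, and it assumes a platform where socket.socketpair() is available.

```python
import select
import socket
import time


def wait_for_events(sock, watch_writable, timeout):
    """Call select() once; return (is_writable, seconds_spent_waiting)."""
    write_set = [sock] if watch_writable else []
    started = time.monotonic()
    _, writable, _ = select.select([sock], write_set, [], timeout)
    return bool(writable), time.monotonic() - started


def demo():
    """Compare one poll that watches for writability with one that does not."""
    a, b = socket.socketpair()
    try:
        # An idle connected socket has buffer space, so it is writable
        # immediately and select() does not wait for the timeout.
        busy = wait_for_events(a, True, 1.0)
        # Nothing to read and no write interest: select() sleeps instead.
        idle = wait_for_events(a, False, 0.05)
    finally:
        a.close()
        b.close()
    return busy, idle
```

In a dispatcher loop, the first case repeated forever is what pins the CPU; the second is the quiet behaviour reported for 2.3.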
From arigo at tunes.org Mon Sep 27 19:53:19 2004
From: arigo at tunes.org (Armin Rigo)
Date: Mon Sep 27 19:58:24 2004
Subject: [Python-Dev] Socket/Asyncore bug needs attention
In-Reply-To: <002001c4a4af$f3246fe0$e841fea9@oemcomputer>
References: <002001c4a4af$f3246fe0$e841fea9@oemcomputer>
Message-ID: <20040927175319.GA32385@vicky.ecs.soton.ac.uk>

Hello Raymond,

On Mon, Sep 27, 2004 at 12:35:07PM -0400, Raymond Hettinger wrote:
> Anyone who has worked on sockets or asyncore should take a look at SF
> bug #1010098: CPU usage shoots up with asyncore. (...) If
> you worked on those modules, please review your check-ins to see if they
> were the cause:

Funnily enough, the check-in to blame is from you :-)

You replaced asynchat.py's usage of fifo lists with collections.deque()s,
but you overlooked the test for emptiness, which was 'self.list == []'.
This is fine if 'self.list' is really a list, but not if it is a deque :-)

Fixed, checked in.

Armin

From python at rcn.com Mon Sep 27 21:17:14 2004
From: python at rcn.com (Raymond Hettinger)
Date: Mon Sep 27 21:19:02 2004
Subject: [Python-Dev] Socket/Asyncore bug needs attention
In-Reply-To: <20040927175319.GA32385@vicky.ecs.soton.ac.uk>
Message-ID: <000e01c4a4c6$ad3c3320$e841fea9@oemcomputer>

> Funnily enough, the check-in to blame is from you

Oh, for shame!

> Fixed, checked in.

Thanks a million.

Raymond

From arigo at tunes.org Mon Sep 27 22:05:33 2004
From: arigo at tunes.org (Armin Rigo)
Date: Mon Sep 27 22:10:39 2004
Subject: [Python-Dev] open('/dev/null').read() -> MemoryError
Message-ID: <20040927200533.GA29621@vicky.ecs.soton.ac.uk>

Hi,

On my system, which is admittedly an old Linux box (2.2 kernel), one test
fails:

>>> file('/dev/null').read()
Traceback (most recent call last):
  File "", line 1, in ?
MemoryError

This is because:

>>> os.stat('/dev/null').st_size
4540321280L

This looks very broken indeed. I have no idea where this number comes from.
I'd also complain if I was asked to allocate a buffer large enough to hold
that many bytes. If we cared, we could "enhance" the file.read() method to
account for the possibility that maybe stat() lied; maybe it is desirable,
instead of allocating huge amounts of memory, to revert to something like
the following above some large threshold:

    result = []
    while 1:
        buf = f.read(16384)
        if not buf:
            return ''.join(result)
        result.append(buf)

Of course for genuinely large reads it's a disaster to have to allocate
twice as much memory. Anyway I'm not sure we care about going around broken
behaviour. I'm just wondering if os.stat() could lie in other situations
too.

Armin

From bob at redivi.com Mon Sep 27 22:21:03 2004
From: bob at redivi.com (Bob Ippolito)
Date: Mon Sep 27 22:21:10 2004
Subject: [Python-Dev] open('/dev/null').read() -> MemoryError
In-Reply-To: <20040927200533.GA29621@vicky.ecs.soton.ac.uk>
References: <20040927200533.GA29621@vicky.ecs.soton.ac.uk>
Message-ID:

On Sep 27, 2004, at 4:05 PM, Armin Rigo wrote:

> On my system, which is admittedly an old Linux box (2.2 kernel), one test
> fails:
>
>>>> file('/dev/null').read()
> Traceback (most recent call last):
>   File "", line 1, in ?
> MemoryError
>
> This is because:
>
>>>> os.stat('/dev/null').st_size
> 4540321280L
>
> This looks very broken indeed. I have no idea where this number comes
> from. I'd also complain if I was asked to allocate a buffer large enough
> to hold that many bytes. If we cared, we could "enhance" the file.read()
> method to account for the possibility that maybe stat() lied; maybe it is
> desirable, instead of allocating huge amounts of memory, to revert to
> something like the following above some large threshold:
>
>     result = []
>     while 1:
>         buf = f.read(16384)
>         if not buf:
>             return ''.join(result)
>         result.append(buf)
>
> Of course for genuinely large reads it's a disaster to have to allocate
> twice as much memory. Anyway I'm not sure we care about going around
> broken behaviour. I'm just wondering if os.stat() could lie in other
> situations too.

file(path).read() is never really a good idea in the general case -
especially for a device node. It might never terminate and it will get a
MemoryError for genuinely large files anyway, especially on 32-bit
architectures. People should be reading files in chunks or using mmap. Is
there really anything the runtime can or should do about this?

In other words, it sounds like the test should be fixed, not the
implementation.

-bob

From lunz at falooley.org Mon Sep 27 23:33:45 2004
From: lunz at falooley.org (Jason Lunz)
Date: Tue Sep 28 00:00:55 2004
Subject: [Python-Dev] upcoming stable release?
Message-ID:

I'm not a regular around here, so forgive me if this is obvious:

Can anyone give me an idea of when to expect the next 2.3 point release?
2.3.4 dates from May, and I have a vague idea that I need some
more-recent fix from the release23-maint branch.

[background: I have a python-gtk app that may be affected by this bug:
http://bugzilla.gnome.org/show_bug.cgi?id=149845. The changelog for the
debian unstable python2.3 package says:

python2.3 (2.3.4-5) unstable; urgency=medium

  * Updated to CVS release23-maint 20040705.
    - Remove threading patch, integrated upstream.

I have a vague idea that this may address the same issue, and if so, the
2.3.5 release (being based on release23-maint, I assume) will be safe to
use on all platforms, not just Debian sid. If I were on debian unstable
everything would be fine. Unfortunately, I wish to also support windows,
and on that platform I prefer to use official python.org releases, which
makes me wonder whether 2.3.5 is imminent.]

thanks,
Jason

From anthony at interlink.com.au Tue Sep 28 02:40:09 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Sep 28 02:40:39 2004
Subject: [Python-Dev] upcoming stable release?
In-Reply-To: References: Message-ID: <4158B2E9.9040204@interlink.com.au> Jason Lunz wrote: > I'm not a regular around here, so forgive me if this is obvious: > > Can anyone give me an idea of when to expect the next 2.3 point release? > 2.3.4 dates from May, and I have a vague idea that I need some > more-recent fix from the release23-maint branch. After 2.4 final. -- Anthony Baxter It's never too late to have a happy childhood. From greg at electricrain.com Tue Sep 28 03:33:20 2004 From: greg at electricrain.com (Gregory P. Smith) Date: Tue Sep 28 03:33:32 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <41566546.7020601@mn.rr.com> References: <41566546.7020601@mn.rr.com> Message-ID: <20040928013320.GC1530@zot.electricrain.com> On Sun, Sep 26, 2004 at 01:44:22AM -0500, J Raynor wrote: > > Since openssh must handle pty allocation, its support for pty operations > across various platforms is more robust than python's. I'd like to use > openssh's code to improve on python's pty handling. > > I know the licenses for openssh and python are different. Can anyone > tell me if it's legal to mix openssh code into python? Assuming it is, > are the python maintainers willing to accept a python patch that > contains some openssh code? look at the openssh license. yes you can use it. its BSD or better. http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/LICENCE?rev=HEAD From gvanrossum at gmail.com Tue Sep 28 04:51:06 2004 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Sep 28 04:51:09 2004 Subject: [Python-Dev] Fwd: Python binding to Rendezvous In-Reply-To: References: Message-ID: I've been asked to help getting Rendevous (AKA zeroconf I believe) bindings for Python implemented. Anyone interested in helping out? (The person to contact if you're interested is Daniel Steinberg. Please cc me on the initial emails so I know contact is being made. 
--Guido ---------- Forwarded message ---------- From: Joey Trevino Date: Mon, 27 Sep 2004 17:09:48 -0700 Subject: Python binding to Rendezvous To: Guido van Rossum Cc: Daniel H Steinberg Hey Guido, Here's the description of the task from my friend Daniel Steinberg (CC'd here) at O'Reilly. I understand that you don't have time to do the work yourself, but I'm hopeful that you know of someone who would. >>>> Apple has just provided a daemon for Rendezvous for Mac, for >>>> Windows, for Linux, and for UNIX. Rich Kilmer spent an afternoon >>>> with Stuart Cheshire and wrote Ruby bindings for this daemon and so >>>> Ruby developers can easily Rendezvous enable their application. >>>> There are also Parrot and Perl bindings on the way. The question >>>> was whether someone had an afternoon free to spend with Stuart to >>>> write the Python bindings. Stuart is not a Python expert but he is >>>> good at teaming up with someone who is to help them see what hooks >>>> are needed. Thanks, Joey -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Sep 28 07:08:21 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Sep 28 07:08:18 2004 Subject: [Fwd: Re: [Python-Dev] using openssh's pty code] In-Reply-To: <4157B9D4.2060401@mn.rr.com> References: <41570EE1.1010404@mn.rr.com> <41579BB2.4020709@v.loewis.de> <4157B9D4.2060401@mn.rr.com> Message-ID: <4158F1C5.8030300@v.loewis.de> J Raynor wrote: > The code that I would borrow from openssh basically states that you can > use it if you include in your derived work the copyright notice and > disclaimer found in the file you want to borrow from. This sounds like > it would pose no problems for incorporating into python, but I'm no > expert on this, so I thought I'd ask. It still very much depends on the *precise* wording. For example, if we assemble binary releases, what are our obligations wrt. copyright notice? 
If Python users embed Python into their applications, what will be their obligations? > Looking at some of the python source, I can see that there are several > files that contain just such notices. For example, from the Modules > directory: Yes. Each of these cases is somewhat worrysome, and we are working on eliminating them whereever possible. Some of them are harder to resolve than others. I'm certain that users of Python break some of these licenses, by not incorporating the proper clauses into the proper locations. Some users are worried about doing that and have asked to simplify their lifes. > Perhaps my original question led you to believe that the openssh license > was unusual, or had problematic clauses in it. Given the somewhat > clarified description above of what's required to borrow openssh code, > do you still have reservations about receiving patches containing it? I understood from the beginning that the openssh license is not unusual, and that is what worries me. It doesn't worry me so much as to completely object inclusion of code, but only if a suitable replacement is too hard to write. In any case, if you distribute the module separately first, none of this needs to concern you. Regards, Martin From martin at v.loewis.de Tue Sep 28 07:13:17 2004 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Sep 28 07:13:14 2004 Subject: [Python-Dev] using openssh's pty code In-Reply-To: <20040927141319.GA3105@burma.localdomain> References: <41566546.7020601@mn.rr.com> <4156DD92.2040300@v.loewis.de> <20040927141319.GA3105@burma.localdomain> Message-ID: <4158F2ED.5080502@v.loewis.de> Gustavo Niemeyer wrote: > If he's going to copy/base his work on openssh, the openssh license > must surely be respected. I wasn't suggesting literal copying, but rewriting the C code in Python. 
> FWIW, that's the issue I was talking about when we discussed the
> contributor agreement in the PSF list, regarding inclusion of code
> with foreign licenses. In this occasion, you said a contributor
> must not include code not authored by him, and cannot sign an
> agreement on such code.

Yes, and I still stand by this. A contributor can, of course, suggest
that code with a different license is included, explaining what the
consequences of doing so would be, and why we are then permitted
to still distribute the derived work in the way we want.

As this typically involves putting some sort of notice in some
place, I'm concerned that the list of notices grows longer and
longer over time, and becomes unmanageable for us. So a solution
of the original author contributing the code with permission to
distribute it under our own licenses is much preferable.

Regards,
Martin

From arigo at tunes.org Tue Sep 28 11:37:58 2004
From: arigo at tunes.org (Armin Rigo)
Date: Tue Sep 28 11:43:03 2004
Subject: [Python-Dev] open('/dev/null').read() -> MemoryError
In-Reply-To:
References: <20040927200533.GA29621@vicky.ecs.soton.ac.uk>
Message-ID: <20040928093758.GA21112@vicky.ecs.soton.ac.uk>

Hi Bob,

On Mon, Sep 27, 2004 at 04:21:03PM -0400, Bob Ippolito wrote:
> file(path).read() is never really a good idea in the general case -
> especially for a device node.
> In other words, it sounds like the test should be fixed, not the
> implementation.

Sounds good. Does anyone object to the following patch?

Index: test_os.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_os.py,v
retrieving revision 1.27
diff -c -r1.27 test_os.py
*** test_os.py 29 Aug 2004 18:47:31 -0000 1.27
--- test_os.py 28 Sep 2004 09:42:26 -0000
***************
*** 340,346 ****
          f.write('hello')
          f.close()
          f = file(os.devnull, 'r')
!         self.assertEqual(f.read(), '')
          f.close()

  class URandomTests (unittest.TestCase):
--- 340,351 ----
          f.write('hello')
          f.close()
          f = file(os.devnull, 'r')
!         self.assertEqual(f.read(1), '')
!         self.assertEqual(f.read(10), '')
!         self.assertEqual(f.read(100), '')
!         self.assertEqual(f.read(1000), '')
!         self.assertEqual(f.read(10000), '')
!         self.assertEqual(f.read(100000), '')
          f.close()

  class URandomTests (unittest.TestCase):

-+-

Armin

From amk at amk.ca Tue Sep 28 17:37:25 2004
From: amk at amk.ca (A.M. Kuchling)
Date: Tue Sep 28 17:38:45 2004
Subject: [Python-Dev] Fwd: Python binding to Rendezvous
In-Reply-To:
References:
Message-ID: <20040928153725.GB27126@rogue.amk.ca>

On Mon, Sep 27, 2004 at 07:51:06PM -0700, Guido van Rossum wrote:
> I've been asked to help getting Rendezvous (AKA zeroconf I believe)
> bindings for Python implemented. Anyone interested in helping out?

I've volunteered for this.

> >>>> There are also Parrot and Perl bindings on the way. The question

They're working on Ruby and Parrot bindings before Python ones? What
colour is the sky in their world?

--amk

From judson at mcs.anl.gov Tue Sep 28 17:52:53 2004
From: judson at mcs.anl.gov (Ivan R. Judson)
Date: Tue Sep 28 17:54:29 2004
Subject: [Python-Dev] Fwd: Python binding to Rendezvous
In-Reply-To:
Message-ID: <200409281553.i8SFr0r92026@mcs.anl.gov>

We have interest in this here on a project and have been investigating
using SWIG to generate multiple language bindings of the Apple Rendezvous
code. We'd be interested in either helping, or just doing this as we need
it for various things. Suggestions for alternative approaches are welcome.
--Ivan > -----Original Message----- > From: python-dev-bounces+judson=mcs.anl.gov@python.org > [mailto:python-dev-bounces+judson=mcs.anl.gov@python.org] On > Behalf Of Guido van Rossum > Sent: Monday, September 27, 2004 9:51 PM > To: Python-Dev; Daniel H Steinberg > Subject: [Python-Dev] Fwd: Python binding to Rendezvous > > I've been asked to help getting Rendevous (AKA zeroconf I > believe) bindings for Python implemented. Anyone interested > in helping out? > (The person to contact if you're interested is Daniel Steinberg. > Please cc me on the initial emails so I know contact is being made. > > --Guido > > ---------- Forwarded message ---------- > From: Joey Trevino > Date: Mon, 27 Sep 2004 17:09:48 -0700 > Subject: Python binding to Rendezvous > To: Guido van Rossum > Cc: Daniel H Steinberg > > Hey Guido, > > Here's the description of the task from my friend Daniel > Steinberg (CC'd here) at O'Reilly. I understand that you > don't have time to do the work yourself, but I'm hopeful that > you know of someone who would. > > >>>> Apple has just provided a daemon for Rendezvous for Mac, for > >>>> Windows, for Linux, and for UNIX. Rich Kilmer spent an afternoon > >>>> with Stuart Cheshire and wrote Ruby bindings for this > daemon and so > >>>> Ruby developers can easily Rendezvous enable their application. > >>>> There are also Parrot and Perl bindings on the way. The question > >>>> was whether someone had an afternoon free to spend with > Stuart to > >>>> write the Python bindings. Stuart is not a Python expert > but he is > >>>> good at teaming up with someone who is to help them see > what hooks > >>>> are needed. 
> > Thanks, > Joey > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/judson%40mcs.anl.gov > > From kbk at shore.net Wed Sep 29 07:17:48 2004 From: kbk at shore.net (Kurt B. Kaiser) Date: Wed Sep 29 07:17:53 2004 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200409290517.i8T5Hm0L017921@h006008a7bda6.ne.client2.attbi.com> Patch / Bug Summary ___________________ Patches : 235 open ( +0) / 2637 closed ( +4) / 2872 total ( +4) Bugs : 768 open ( +1) / 4480 closed (+17) / 5248 total (+18) RFE : 152 open ( +1) / 131 closed ( +0) / 283 total ( +1) New / Reopened Patches ______________________ unittest.py patch: add skipped test functionality (2004-09-24) http://python.org/sf/1034053 opened by Remy Blank Remove CoreServices / CoreFoundation dependencies in core (2004-09-26) http://python.org/sf/1035255 opened by Bob Ippolito -m option to run a module as a script (2004-09-28) http://python.org/sf/1035498 opened by Nick Coghlan Add New RPM-friendly record option to setup.py (2004-09-28) http://python.org/sf/1035576 opened by Jeff Pitman Patches Closed ______________ Add API to logging package to allow intercooperation. (2004-09-21) http://python.org/sf/1032206 closed by vsajip SystemError generated by struct.pack('P', 'notanumber') (2004-08-18) http://python.org/sf/1011240 closed by arigo (bug 952953) execve empty 2nd arg fix (2004-08-14) http://python.org/sf/1009075 closed by arigo atexit decorator (2004-09-21) http://python.org/sf/1031687 closed by rhettinger New / Reopened Bugs ___________________ idle -n crashes (2004-09-22) CLOSED http://python.org/sf/1032395 opened by Matthias Klose Odd behavior with unicode.translate on OSX. 
(2004-09-22) http://python.org/sf/1032615 opened by Jeremy Fincher ftplib has incomplete transfer when sending files in Windows (2004-09-22) http://python.org/sf/1032875 opened by Ed Sanville Confusing description of strict option for email.Parser (2004-09-23) http://python.org/sf/1032960 opened by Andrew Bennetts Misleading error message in random.choice (2004-09-22) CLOSED http://python.org/sf/1033038 opened by Nefarious CodeMonkey, Jr. build doesn't pick up bsddb w/Mandrake 9.2 (2004-09-23) http://python.org/sf/1033390 opened by Alex Martelli buffer() object broken. (2004-09-23) CLOSED http://python.org/sf/1033720 opened by James Y Knight Can't inherit slots from new-style classes implemented in C (2004-09-24) http://python.org/sf/1034178 opened by Phil Thompson More buffer object brokenness (2004-09-24) CLOSED http://python.org/sf/1034242 opened by James Y Knight Why does Python link to Foundation? (2004-09-24) http://python.org/sf/1034277 opened by Bob Ippolito Configure uses GNU ld flags with non-GNU compilers/linkers (2004-09-25) CLOSED http://python.org/sf/1034496 opened by Drew Schatt hex() and oct() documentation is incorrect (2004-09-27) http://python.org/sf/1035279 opened by Nick Coghlan distutils.util.get_platform() should include sys.version[:2] (2004-09-27) http://python.org/sf/1035703 opened by Bob Ippolito Tix.Grid widgets not implemented yet (2004-09-28) http://python.org/sf/1036406 opened by Christos Georgiou unicode strings cannot be dictionary keys (2004-09-28) http://python.org/sf/1036490 opened by Morten Kjeldgaard Email module's feed parser (2004-09-28) CLOSED http://python.org/sf/1036506 opened by Matthew Cowles file.next() info hidden (2004-09-28) http://python.org/sf/1036626 opened by Nick Jacobson printf() in dynload_shlib.c should be PySys_WriteStderr (2004-09-28) http://python.org/sf/1036752 opened by Jp Calderone Bugs Closed ___________ rfc822 __iter__ problem (2004-09-17) http://python.org/sf/1030125 closed by rhettinger Fold tuples 
of constants into a single constant (2004-09-20) http://python.org/sf/1031667 closed by rhettinger Misleading error message in random.choice (2004-09-22) http://python.org/sf/1033038 closed by rhettinger PEP 302 loader not carried through by reload function (2004-09-16) http://python.org/sf/1029475 closed by pje Float/long comparison anomaly (2002-02-06) http://python.org/sf/513866 closed by tim_one buffer() object broken. (2004-09-23) http://python.org/sf/1033720 closed by nascheme ConfigParser's get method gives utf-8 for a utf-16 config... (2004-01-10) http://python.org/sf/874354 closed by goodger More buffer object brokenness (2004-09-24) http://python.org/sf/1034242 closed by nascheme embedding in multi-threaded & multi sub-interpreter environ (2004-03-22) http://python.org/sf/921077 closed by bcannon Configure uses GNU ld flags with non-GNU compilers/linkers (2004-09-25) http://python.org/sf/1034496 closed by loewis 2.4 asyncore breaks Zope (2004-08-18) http://python.org/sf/1011606 closed by tim_one CPU usage shoots up with asyncore (2004-08-16) http://python.org/sf/1010098 closed by arigo execve rejects empty argument list (2004-05-13) http://python.org/sf/952953 closed by arigo email.Message.Message.__getitem__ doc string wrong (2004-06-25) http://python.org/sf/979924 closed by bwarsaw Email module's feed parser (2004-09-28) http://python.org/sf/1036506 closed by bwarsaw idle -n crashes (2004-09-22) http://python.org/sf/1032395 closed by kbk IDLE hangs when inactive more than 2 hours (2004-08-02) http://python.org/sf/1001869 closed by kbk From nbastin at opnet.com Wed Sep 29 16:52:32 2004 From: nbastin at opnet.com (Nick Bastin) Date: Wed Sep 29 16:52:51 2004 Subject: [Python-Dev] Finding the module from PyTypeObject? Message-ID: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> Is there any way to (reliably) find the module that defined the class represented by a given PyTypeObject in C? 
-- Nick From mwh at python.net Wed Sep 29 18:56:40 2004 From: mwh at python.net (Michael Hudson) Date: Wed Sep 29 18:56:42 2004 Subject: [Python-Dev] Finding the module from PyTypeObject? In-Reply-To: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> (Nick Bastin's message of "Wed, 29 Sep 2004 10:52:32 -0400") References: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> Message-ID: <2m4qlht3nb.fsf@starship.python.net> Nick Bastin writes: > Is there any way to (reliably) find the module that defined the class > represented by a given PyTypeObject in C? Not especially appropriate for python-dev... I think the answer depends on what you mean by "reliably". __module__ is a good first bet, but can be defeated with sufficient malice (or mere inattention, in the case of types defined by C). Cheers, mwh -- ... the U.S. Department of Transportation today disclosed that its agents have recently cleared airport security checkpoints with an M1 tank, a beluga whale, and a fully active South American volcano. -- http://www.satirewire.com/news/march02/screeners.shtml From nbastin at opnet.com Wed Sep 29 19:24:23 2004 From: nbastin at opnet.com (Nick Bastin) Date: Wed Sep 29 19:24:38 2004 Subject: [Python-Dev] Finding the module from PyTypeObject? In-Reply-To: <2m4qlht3nb.fsf@starship.python.net> References: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> <2m4qlht3nb.fsf@starship.python.net> Message-ID: <680B417C-123C-11D9-8F21-000D932927FE@opnet.com> On Sep 29, 2004, at 12:56 PM, Michael Hudson wrote: > Nick Bastin writes: > >> Is there any way to (reliably) find the module that defined the class >> represented by a given PyTypeObject in C? > > Not especially appropriate for python-dev... > > I think the answer depends on what you mean by "reliably". __module__ > is a good first bet, but can be defeated with sufficient malice (or > mere inattention, in the case of types defined by C). 
Ok, maybe more appropriately, what do people think of adding a PyType_GetModule (PyTypeObject *) which basically functions like type_module(PyTypeObject *, void *) (in Objects/typeobject.c) to the public C API, rather than having to dig around in the object themselves? -- Nick From arigo at tunes.org Wed Sep 29 22:17:02 2004 From: arigo at tunes.org (Armin Rigo) Date: Wed Sep 29 22:22:10 2004 Subject: [Python-Dev] Finding the module from PyTypeObject? In-Reply-To: <680B417C-123C-11D9-8F21-000D932927FE@opnet.com> References: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> <2m4qlht3nb.fsf@starship.python.net> <680B417C-123C-11D9-8F21-000D932927FE@opnet.com> Message-ID: <20040929201702.GA19671@vicky.ecs.soton.ac.uk> Hello Nick, On Wed, Sep 29, 2004 at 01:24:23PM -0400, Nick Bastin wrote: > Ok, maybe more appropriately, what do people think of adding a > PyType_GetModule (PyTypeObject *) which basically functions like > type_module(PyTypeObject *, void *) (in Objects/typeobject.c) to the > public C API, rather than having to dig around in the object > themselves? It looks overkill, when you can do instead: PyObject* module_name = PyObject_GetAttrString(type, "__module__"); Armin From nbastin at opnet.com Wed Sep 29 22:29:39 2004 From: nbastin at opnet.com (Nick Bastin) Date: Wed Sep 29 22:30:15 2004 Subject: [Python-Dev] Finding the module from PyTypeObject? 
In-Reply-To: <20040929201702.GA19671@vicky.ecs.soton.ac.uk> References: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> <2m4qlht3nb.fsf@starship.python.net> <680B417C-123C-11D9-8F21-000D932927FE@opnet.com> <20040929201702.GA19671@vicky.ecs.soton.ac.uk> Message-ID: <49A2279C-1256-11D9-8F21-000D932927FE@opnet.com> On Sep 29, 2004, at 4:17 PM, Armin Rigo wrote: > Hello Nick, > > On Wed, Sep 29, 2004 at 01:24:23PM -0400, Nick Bastin wrote: >> Ok, maybe more appropriately, what do people think of adding a >> PyType_GetModule (PyTypeObject *) which basically functions like >> type_module(PyTypeObject *, void *) (in Objects/typeobject.c) to the >> public C API, rather than having to dig around in the object >> themselves? > > It looks overkill, when you can do instead: > > PyObject* module_name = PyObject_GetAttrString(type, "__module__"); That only works most of the time, I think. To be honest, I didn't try that, but it doesn't seem that type_module would jump through the hoops it does if that worked all of the time, unless parsing tp_name is legacy code. -- Nick From tim.peters at gmail.com Wed Sep 29 22:30:10 2004 From: tim.peters at gmail.com (Tim Peters) Date: Wed Sep 29 22:30:33 2004 Subject: [Python-Dev] Odd compile errors for bad genexps Message-ID: <1f7befae04092913305132017c@mail.gmail.com> >>> (i for i in x) = 2 SystemError: assign to generator expression not possible 1. Why is that a SystemError instead of a SyntaxError? SystemError doesn't make sense. 2. Why didn't it echo the offending line? >>> (i for i in x) += 2 SyntaxError: augmented assign to tuple literal not possible 3. That's not a tuple literal. 4. See #2 . 
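[Editor's note] Tim's first point, that an invalid assignment target belongs to the SyntaxError family rather than SystemError, can be checked mechanically with the built-in compile(). This is a small sketch run against a current interpreter, so it says nothing about the exact messages or exception types of the 2.4 pre-release discussed here; the helper name compile_error is hypothetical.

```python
def compile_error(source):
    """Return the name of the exception raised by compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except Exception as exc:
        return type(exc).__name__
    return None


# Statements from the message above, plus one valid control case.
CASES = [
    "(i for i in x) = 2",   # plain assignment to a generator expression
    "(i for i in x) += 2",  # augmented assignment to a generator expression
    "y = (i for i in x)",   # valid: the genexp is only on the right-hand side
]

RESULTS = [compile_error(case) for case in CASES]
```

Because compile() only parses and compiles, the undefined name x is never evaluated, so any exception reported comes from the compiler alone.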
From python at rcn.com Wed Sep 29 23:30:52 2004 From: python at rcn.com (Raymond Hettinger) Date: Wed Sep 29 23:32:59 2004 Subject: [Python-Dev] Odd compile errors for bad genexps In-Reply-To: <1f7befae04092913305132017c@mail.gmail.com> Message-ID: <000601c4a66b$994d2da0$e841fea9@oemcomputer> [Tim] > >>> (i for i in x) = 2 > SystemError: assign to generator expression not possible > > 1. Why is that a SystemError instead of a SyntaxError? SystemError > doesn't make sense. It, of course, should be a SyntaxError. The fix is easy. Put in PyExc_SyntaxError on line 3206 in compile.c > 2. Why didn't it echo the offending line? I don't follow this part. The output is no different from: >>> str(x) = 2 SyntaxError: can't assign to function call > >>> (i for i in x) += 2 > SyntaxError: augmented assign to tuple literal not possible > > 3. That's not a tuple literal. The code for that one was modeled after broken code for list comps: >>> [i for i in x] += 2 SyntaxError: augmented assign to list literal not possible That's not a list literal either. For both genexps and listcomps, the test for augmented assignment should likely be moved before the same test for tuple literals and list literals (they only check for LPAR or LSQB to trigger their message). Is there a compiler weenie in the house who knows how to reliably fix this one? Though I can see the problem clearly enough, I'm just enough out of my element that I don't want to touch it. Raymond From python at rcn.com Thu Sep 30 02:40:42 2004 From: python at rcn.com (Raymond Hettinger) Date: Thu Sep 30 02:43:00 2004 Subject: [Python-Dev] Odd compile errors for bad genexps In-Reply-To: <000601c4a66b$994d2da0$e841fea9@oemcomputer> Message-ID: <000001c4a686$1e4fef00$e841fea9@oemcomputer> > [Tim] > > >>> (i for i in x) = 2 > > SystemError: assign to generator expression not possible > > > > 1. Why is that a SystemError instead of a SyntaxError? SystemError > > doesn't make sense. . . . 
> > >>> (i for i in x) += 2 > > SyntaxError: augmented assign to tuple literal not possible > > > > 3. That's not a tuple literal Okay, those two are fixed. > > 2. Why didn't it echo the offending line? The code for com_error() screens out the line numbering when in the interactive mode. You get the full echo when running a script. What is interesting is that some syntax errors ("2 & * 3" for example) by-pass com_error() and echo the line with a caret pointing at the offending token. These are both probably as they should be. Raymond From tim.peters at gmail.com Thu Sep 30 04:58:38 2004 From: tim.peters at gmail.com (Tim Peters) Date: Thu Sep 30 04:58:43 2004 Subject: [Python-Dev] Odd compile errors for bad genexps In-Reply-To: <000001c4a686$1e4fef00$e841fea9@oemcomputer> References: <000601c4a66b$994d2da0$e841fea9@oemcomputer> <000001c4a686$1e4fef00$e841fea9@oemcomputer> Message-ID: <1f7befae04092919587fcb9eb4@mail.gmail.com> [Raymond Hettinger] > Okay, those two are fixed. Thank you! >> 2. Why didn't it echo the offending line? > The code for com_error() screens out the line numbering when in the > interactive mode. You get the full echo when running a script. > > What is interesting is that some syntax errors ("2 & * 3" for example) > by-pass com_error() and echo the line with a caret pointing at the > offending token. Or plain "+" or plain "if" or "2 &" or "*3" or "from math import sin as" etc etc. That's why I asked. I almost always see an echo echo echo. But those are actually syntax errors, in the sense that can't be derived from the formal grammar. > These are both probably as they should be. Not if it makes life harder for doctest . From ncoghlan at email.com Thu Sep 30 12:11:06 2004 From: ncoghlan at email.com (Nick Coghlan) Date: Thu Sep 30 12:11:14 2004 Subject: [Python-Dev] Running a module as a script Message-ID: <1096539066.415bdbbaaed43@mail.iinet.net.au> Patch # 1035498 attempts to implement the semantics suggested by Ilya and Anthony and co. 
"python -m module"

Means:
- find the source file for the relevant module (using the standard locations for module import)
- run the located script as __main__ (note that containing packages are NOT imported first - it's as if the relevant module was executed directly from the command line)
- as with '-c' anything before the option is an argument to the interpreter, anything after is an argument to the script

The allowed modules are those whose associated source files meet the normal rules for a command line script. I believe that means .py and .pyc files only (e.g. "python -m profile" works, but "python -m hotshot" does not).

Special import hooks (such as zipimport) almost certainly won't work (since I don't believe they work with the current command line script mechanism).

Cheers,
Nick.

--
Nick Coghlan
Brisbane, Australia
From ncoghlan at email.com Thu Sep 30 12:21:47 2004
From: ncoghlan at email.com (Nick Coghlan)
Date: Thu Sep 30 12:21:53 2004
Subject: [Python-Dev] Proposing a sys.special_exceptions tuple
Message-ID: <1096539707.415bde3ba1425@mail.iinet.net.au>

I spent some time the other day looking at the use of bare except statements in the standard library.

Many of them seemed to fall into the category of 'need to catch anything user code is likely to throw, but shouldn't be masking SystemExit, StopIteration, KeyboardInterrupt, MemoryError, etc'.

Changing them to "except Exception:" doesn't help, since all of the above still fit into that category (Tim posted a message recently about rearranging the Exception hierarchy to fix this. Backwards compatibility woes pretty much killed the discussion though).

However, another possibility occurred to me:

    try:
        # Do stuff
    except sys.special_exceptions:
        raise
    except:
        # Deal with all the mundane stuff

With an appropriately defined tuple, that makes it easy for people to "do the right thing" with regards to critical exceptions. Such a tuple could also be useful for invoking isinstance() and issubclass().

Who knows?
If something like this caught on, it might some day be possible to kill a Python script with a single press of Ctrl-C };> Cheers, Nick. -- Nick Coghlan Brisbane, Australia From mwh at python.net Thu Sep 30 13:46:59 2004 From: mwh at python.net (Michael Hudson) Date: Thu Sep 30 13:47:00 2004 Subject: [Python-Dev] Finding the module from PyTypeObject? In-Reply-To: <49A2279C-1256-11D9-8F21-000D932927FE@opnet.com> (Nick Bastin's message of "Wed, 29 Sep 2004 16:29:39 -0400") References: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> <2m4qlht3nb.fsf@starship.python.net> <680B417C-123C-11D9-8F21-000D932927FE@opnet.com> <20040929201702.GA19671@vicky.ecs.soton.ac.uk> <49A2279C-1256-11D9-8F21-000D932927FE@opnet.com> Message-ID: <2mfz50rnbg.fsf@starship.python.net> Nick Bastin writes: > On Sep 29, 2004, at 4:17 PM, Armin Rigo wrote: > >> Hello Nick, >> >> On Wed, Sep 29, 2004 at 01:24:23PM -0400, Nick Bastin wrote: >>> Ok, maybe more appropriately, what do people think of adding a >>> PyType_GetModule (PyTypeObject *) which basically functions like >>> type_module(PyTypeObject *, void *) (in Objects/typeobject.c) to the >>> public C API, rather than having to dig around in the object >>> themselves? >> >> It looks overkill, when you can do instead: >> >> PyObject* module_name = PyObject_GetAttrString(type, "__module__"); > > That only works most of the time, I think. To be honest, I didn't try > that, but it doesn't seem that type_module would jump through the > hoops it does if that worked all of the time, unless parsing tp_name > is legacy code. Huh? The code above *winds up* calling type_module! Cheers, mwh -- i am trying to get Asterisk to work it is stabbing me in the face yes ... i seem to recall that feature in the documentation -- from Twisted.Quotes From nbastin at opnet.com Thu Sep 30 15:43:01 2004 From: nbastin at opnet.com (Nick Bastin) Date: Thu Sep 30 15:43:21 2004 Subject: [Python-Dev] Finding the module from PyTypeObject? 
In-Reply-To: <2mfz50rnbg.fsf@starship.python.net> References: <3182857D-1227-11D9-8F21-000D932927FE@opnet.com> <2m4qlht3nb.fsf@starship.python.net> <680B417C-123C-11D9-8F21-000D932927FE@opnet.com> <20040929201702.GA19671@vicky.ecs.soton.ac.uk> <49A2279C-1256-11D9-8F21-000D932927FE@opnet.com> <2mfz50rnbg.fsf@starship.python.net> Message-ID: On Sep 30, 2004, at 7:46 AM, Michael Hudson wrote: > Nick Bastin writes: > >> On Sep 29, 2004, at 4:17 PM, Armin Rigo wrote: >> >>> Hello Nick, >>> >>> On Wed, Sep 29, 2004 at 01:24:23PM -0400, Nick Bastin wrote: >>>> Ok, maybe more appropriately, what do people think of adding a >>>> PyType_GetModule (PyTypeObject *) which basically functions like >>>> type_module(PyTypeObject *, void *) (in Objects/typeobject.c) to the >>>> public C API, rather than having to dig around in the object >>>> themselves? >>> >>> It looks overkill, when you can do instead: >>> >>> PyObject* module_name = PyObject_GetAttrString(type, "__module__"); >> >> That only works most of the time, I think. To be honest, I didn't try >> that, but it doesn't seem that type_module would jump through the >> hoops it does if that worked all of the time, unless parsing tp_name >> is legacy code. > > Huh? The code above *winds up* calling type_module! Doh, nevermind...I missed the getter def. -- Nick (::slinks off back under his rock now::) From aahz at pythoncraft.com Thu Sep 30 15:57:18 2004 From: aahz at pythoncraft.com (Aahz) Date: Thu Sep 30 15:57:21 2004 Subject: [Python-Dev] Running a module as a script In-Reply-To: <1096539066.415bdbbaaed43@mail.iinet.net.au> References: <1096539066.415bdbbaaed43@mail.iinet.net.au> Message-ID: <20040930135718.GA208@panix.com> On Thu, Sep 30, 2004, Nick Coghlan wrote: > > The allowed modules are those whose associated source file meet the > normal rules for a command line script. I believe that means .py > and .pyc files only (e.g. "python -m profile" works, but "python -m > hotshot" does not). 
Not positive, but if you're allowing .pyc, you should probably allow .pyo if optimize mode is on. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From pje at telecommunity.com Thu Sep 30 16:19:22 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 30 16:19:22 2004 Subject: [Python-Dev] Proposing a sys.special_exceptions tuple In-Reply-To: <1096539707.415bde3ba1425@mail.iinet.net.au> Message-ID: <5.1.1.6.0.20040930101453.0244e8f0@mail.telecommunity.com> At 08:21 PM 9/30/04 +1000, Nick Coghlan wrote: >However, another possibility occurred to me: > >try: > # Do stuff >except sys.special_exceptions: > raise >except: > # Deal with all the mundane stuff > >With an appropriately defined tuple, that makes it easy for people to "do the >right thing" with regards to critical exceptions. Such a tuple could also be >useful for invoking isinstance() and issubclass(). +1. This would be a big help for developers, if only in that it will tell us what exceptions we ought to do this with. IMO, this is probably important enough to make it a builtin; maybe call it CriticalExceptions or some such. Also, maybe in 2.5 we could begin warning about bare excepts that aren't preceded by non-bare exceptions. From barry at python.org Thu Sep 30 16:27:10 2004 From: barry at python.org (Barry Warsaw) Date: Thu Sep 30 16:27:17 2004 Subject: [Python-Dev] Proposing a sys.special_exceptions tuple In-Reply-To: <5.1.1.6.0.20040930101453.0244e8f0@mail.telecommunity.com> References: <5.1.1.6.0.20040930101453.0244e8f0@mail.telecommunity.com> Message-ID: <1096554430.20270.23.camel@geddy.wooz.org> On Thu, 2004-09-30 at 10:19, Phillip J. 
Eby wrote: > At 08:21 PM 9/30/04 +1000, Nick Coghlan wrote: > >However, another possibility occurred to me: > > > >try: > > # Do stuff > >except sys.special_exceptions: > > raise > >except: > > # Deal with all the mundane stuff +0, except that I'd rather see it put in the exceptions module and given a name in builtins. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20040930/18ac1740/attachment.pgp From theller at python.net Thu Sep 30 16:31:22 2004 From: theller at python.net (Thomas Heller) Date: Thu Sep 30 16:31:21 2004 Subject: [Python-Dev] Running a module as a script In-Reply-To: <20040930135718.GA208@panix.com> (aahz@pythoncraft.com's message of "Thu, 30 Sep 2004 09:57:18 -0400") References: <1096539066.415bdbbaaed43@mail.iinet.net.au> <20040930135718.GA208@panix.com> Message-ID: Aahz writes: > On Thu, Sep 30, 2004, Nick Coghlan wrote: >> >> The allowed modules are those whose associated source file meet the >> normal rules for a command line script. I believe that means .py >> and .pyc files only (e.g. "python -m profile" works, but "python -m >> hotshot" does not). > > Not positive, but if you're allowing .pyc, you should probably allow > .pyo if optimize mode is on. Plus .pyw, on Windows. Thomas From pje at telecommunity.com Thu Sep 30 17:16:12 2004 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu Sep 30 17:16:11 2004 Subject: [Python-Dev] Running a module as a script In-Reply-To: References: <20040930135718.GA208@panix.com> <1096539066.415bdbbaaed43@mail.iinet.net.au> <20040930135718.GA208@panix.com> Message-ID: <5.1.1.6.0.20040930111348.038beb60@mail.telecommunity.com> At 04:31 PM 9/30/04 +0200, Thomas Heller wrote: >Aahz writes: > > > On Thu, Sep 30, 2004, Nick Coghlan wrote: > >> > >> The allowed modules are those whose associated source file meet the > >> normal rules for a command line script. I believe that means .py > >> and .pyc files only (e.g. "python -m profile" works, but "python -m > >> hotshot" does not). > > > > Not positive, but if you're allowing .pyc, you should probably allow > > .pyo if optimize mode is on. > >Plus .pyw, on Windows. Using the C equivalent of 'imp.find_module()' should cover all these cases, and any new forms of PY_SOURCE or PY_COMPILED that come up in future. From cce at clarkevans.com Thu Sep 30 17:47:22 2004 From: cce at clarkevans.com (Clark C. Evans) Date: Thu Sep 30 17:47:25 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040909215548.GB61544@prometheusresearch.com> References: <20040908014845.GA52384@prometheusresearch.com> <0F1308BF-01C2-11D9-AC9C-000A95A50FB2@fuhm.net> <413F564D.2070708@bluewin.ch> <20040908192056.GB62848@prometheusresearch.com> <20040909101444.GA2877@vicky.ecs.soton.ac.uk> <20040909215548.GB61544@prometheusresearch.com> Message-ID: <20040930154722.GA79121@prometheusresearch.com> To distill this request to a sentence: I would like syntax-level support in Python for a Continuation Passing Style (CPS) of code execution. It is important to note that Ruby, Parrot (next-generation Perl), and SML-NJ all support this async programming style. In Python land, the Twisted framework uses this style via its Deferred mechanism. This isn't an off-the-wall request.
I currently think that a generator syntax would be the best, and this proposal is for further work via defining a SuspendIterator semantics. However, I'm not tied to this implementation. A pre-parser which made Deferred object handling nicer could also work, or any other option that provides an intuitive syntax for CPS in Python. The hoops that Twisted has to jump through to wrap Exceptions for use in a Deferred processing chain, and also the (completely necessary but yet) convoluted ways of combining Deferreds are, IMHO, a direct result of lack of support for CPS in Python. These items have a huge impact on application program readability and maintenance. Clean syntax-level support for CPS in Python would be a boon for application developers. Best, Clark From lalo at laranja.org Thu Sep 30 17:52:48 2004 From: lalo at laranja.org (Lalo Martins) Date: Thu Sep 30 17:56:37 2004 Subject: [Python-Dev] Proposing a sys.special_exceptions tuple In-Reply-To: <5.1.1.6.0.20040930101453.0244e8f0@mail.telecommunity.com> References: <1096539707.415bde3ba1425@mail.iinet.net.au> <5.1.1.6.0.20040930101453.0244e8f0@mail.telecommunity.com> Message-ID: <20040930155247.GU13993@laranja.org> On Thu, Sep 30, 2004 at 10:19:22AM -0400, Phillip J. Eby wrote: > > Also, maybe in 2.5 we could begin warning about bare excepts that aren't > preceded by non-bare exceptions. try: foo() except: print_or_log_exception_in_a_way_that_is_meaningful() raise doesn't seem to be incorrect to me. For example, if the program is a daemon, I want the exception logged somewhere so that I can see it later, because I won't be watching stderr. []s, |alo +---- -- Those who trade freedom for security lose both and deserve neither.
-- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://garfield.laranja.org/~lalo/gpgkey-signed.asc GNU: never give up freedom http://www.gnu.org/ From jcarlson at uci.edu Thu Sep 30 18:32:37 2004 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu Sep 30 18:39:56 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040930154722.GA79121@prometheusresearch.com> References: <20040909215548.GB61544@prometheusresearch.com> <20040930154722.GA79121@prometheusresearch.com> Message-ID: <20040930090510.FE87.JCARLSON@uci.edu> > It is important to note that Ruby, Parrot (next-generation Perl), > and SML-NJ all support this async programming style. In Python For those of us who aren't current on the latest happenings of Ruby, Parrot and SML/NJ, it may be convenient for us to hear precisely how "async programming style" is done in those languages, so we have a reference, and so that we can agree (or disagree) with you about whether they are equivalent to your PEP. It would also be nice if you were to do a bit of research on the internals of those languages, to discover how it is actually implemented. This would allow Python interpreter hackers to say, "Yes, that kind of thing is possible," "Maybe with a bit of work," "It is not possible with the current interpreter," or even "It wouldn't be usable on Jython." With that said, I believe there is a general consensus that this kind of thing would be useful. For me, if I had greenlets everywhere I'd be happy (though I understand that this may not be technically possible on Jython). 
- Josiah From lt at toetsch.at Thu Sep 30 21:30:17 2004 From: lt at toetsch.at (Leopold Toetsch) Date: Thu Sep 30 21:29:52 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <20040930090510.FE87.JCARLSON@uci.edu> References: <20040909215548.GB61544@prometheusresearch.com> <20040930154722.GA79121@prometheusresearch.com> <20040930090510.FE87.JCARLSON@uci.edu> Message-ID: <415C5EC9.8070308@toetsch.at> Josiah Carlson wrote: >>It is important to note that Ruby, Parrot (next-generation Perl), >>and SML-NJ all support this async programming style. In Python > > > For those of us who aren't current on the latest happenings of Ruby, > Parrot and SML/NJ, it may be convenient for us to hear precisely how > "async programming style" is done in those languages, Some clarifications WRT Parrot. Parrot isn't a language, Parrot isn't "next-generation Perl". Parrot is a virtual machine that will run Perl6. And Parrot currently runs languages like Python, tcl, m4, forth, and others more or less completely[1]. Parrot's function calling scheme is CPS. A Python generator function gets automatically translated to a coroutine. Returning from a plain function is done by invoking a continuation. And you can of course (in Parrot assembly) create a continuation, store it away, and invoke it at any time later, which will continue program execution at that point, where it should continue. Please note that that has nothing to do with "async programming". It's just like a GOTO, but w/o limitation where you'll branch to - or almost no limitations: you can't cross C-stack boundaries or, in other words, you can't branch to other incarnations of the run-loop. (Exceptions are a bit more flexible though, but they still can only jump "up" the C-stack) Using CPS for function calls implies therefore a non-trivial rewrite of CPython, which OTOH and AFAIK is already available as Stackless Python.
Making continuations usable at the language level is a different thing, though. leo [1] http://www.parrotcode.org - in CVS languages/python. The test b2.py from the Pie-thon benchmark has two generators (izip, Pi.__iter__), which are Parrot coroutines, that's working fine. From pje at telecommunity.com Thu Sep 30 21:37:17 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 30 21:37:18 2004 Subject: [Python-Dev] Proposing a sys.special_exceptions tuple In-Reply-To: <20040930155247.GU13993@laranja.org> References: <5.1.1.6.0.20040930101453.0244e8f0@mail.telecommunity.com> <1096539707.415bde3ba1425@mail.iinet.net.au> <5.1.1.6.0.20040930101453.0244e8f0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20040930153454.02bc04c0@mail.telecommunity.com> At 12:52 PM 9/30/04 -0300, Lalo Martins wrote: >On Thu, Sep 30, 2004 at 10:19:22AM -0400, Phillip J. Eby wrote: > > > > Also, maybe in 2.5 we could begin warning about bare excepts that aren't > > preceded by non-bare exceptions. > >try: > foo() >except: > print_or_log_exception_in_a_way_that_is_meaningful() > raise > >doesn't seem to be incorrect to me. For example, if the program >is a daemon, I want the exception logged somewhere so that I can >see it later, because I won't be watching stderr. 1. If the exception raised is a MemoryError, your daemon is in trouble. 2. I said *warn*, and it'd be easy to suppress the warning using 'except Exception:', if that's what you really mean 3. But I suppose this could be considered a job for pychecker. From bac at OCF.Berkeley.EDU Thu Sep 30 22:02:34 2004 From: bac at OCF.Berkeley.EDU (Brett C.) 
Date: Thu Sep 30 22:02:45 2004 Subject: [Python-Dev] [OT] stats on type-inferencing atomic types in local variables in the stdlib Message-ID: <415C665A.4060706@ocf.berkeley.edu>

My thesis (which, for those who don't know, was to come up with a way to do type inferencing in the compiler without requiring any semantic changes; basically type inferencing atomic types assigned to local variables) is now far enough along that I have the algorithm done and I can generate statistics on what opcodes are called with the most common types that I can specifically infer (can also do static type checking on occasion; only triggers were actual unit tests making sure TypeError was raised for certain things like ``~4.2`` and such). Thought some of you might get a kick out of this since the numbers are rather blatant for certain opcodes and methods.

To read the stats, the number to the left is the number of times the opcode was compiled (not executed!) with the specific type(s) known for the opcode (if it took two args, then both types are listed; order was considered irrelevant). Now they are listed as integers, so here is the conversion::

    Basestring      4
    IntegralType    8
    FloatType      16
    ImagType       32
    DictType       64
    ListType      128
    TupleType     256

For the things named "meth_" that is the method being called immediately on the type.

Now realize these numbers are only for opcodes where I could definitely infer the type; ones where it could be more than one type, regardless if those possibilities were very specific, I just ignored it and did not include in the stats.

I also tweaked some opcodes knowing how they are more often used. So, for instance, BINARY_MODULO checks specifically for the case of when the left side is a basestring and then just doesn't worry about the other args. Other ones I just didn't bother with all the args since it was not interesting to me in terms of deciding what type-specific opcodes I want to come up with.
Anyway, here are the numbers on Lib sans Lib/test (129,814 lines according to SLOCCount) for the ones above 100::

    (101, ('BINARY_MULTIPLY', (8, 4))),
    (106, ('BINARY_SUBSCR', 128)),
    (118, ('GET_ITER', 128)),
    (124, ('BINARY_MODULO', None)),
    (195, ('meth_join', 4)),
    (204, ('BINARY_ADD', (8, 8))),
    (331, ('BINARY_ADD', (4, 4))),
    (513, ('BINARY_LSHIFT', (8, 8))),
    (840, ('meth_append', 128)),
    (1270, ('PRINT_ITEM', 4)),
    (1916, ('BINARY_MODULO', 4)),
    (12302, ('STORE_SUBSCR', 64))]

We sure like our dictionaries (for those that don't know, dictionaries are created by making an empty dict and then basically doing an individual assignment for each value). We also seem to love to use string interpolation, and printing stuff. Using list.append is also popular.

Now the BINARY_LSHIFT is rather interesting, and that ties into the whole issue of how much I can actually infer; since binary work tends to be with all constants I can infer it really easily and so its frequency is rather high. Its actual frequency of use compared to other things probably is not high, though. Plus I doubt Plone, for instance, uses ``<<`` very often so I suspect the opcode will get weeded out when I incorporate stats from the other apps I am taking stats from.

As for the stuff I cut out, the surprising thing from those numbers was how few mathematical expressions could be inferred. I checked my numbers with grep and there really are only 3 times where a float constant is divided by a float constant (and they are all in colorsys). I was not expecting that at all. Guess global variables or object attributes tend to have them or I just can't infer the values. Either way I just wasn't expecting that.

Anyway, as I said I just thought some people might find this interesting. Don't read into this too much since I am just using these numbers as guidelines for type-specific opcodes to write for use as a quantifiable measurement of the usefulness of type inferencing like this.
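The integer type codes in the stats can be turned back into names with a small lookup. The following is only a sketch for reading the numbers, assuming each code maps one-to-one to a name in the conversion list given earlier in this message; TYPE_NAMES, describe, and SAMPLE are hypothetical names and are not part of the modified compile.c.

```python
# Conversion list from the message: integer code -> inferred type name.
TYPE_NAMES = {
    4: "Basestring",
    8: "IntegralType",
    16: "FloatType",
    32: "ImagType",
    64: "DictType",
    128: "ListType",
    256: "TupleType",
}


def describe(entry):
    """Render one (count, (opcode, types)) entry in readable form."""
    count, (opcode, types) = entry
    if types is None:
        names = ()                      # no argument types were recorded
    elif isinstance(types, tuple):
        names = tuple(TYPE_NAMES[code] for code in types)
    else:
        names = (TYPE_NAMES[types],)    # single-argument opcode or method
    return "%s[%s]: %d" % (opcode, ", ".join(names), count)


# A few entries copied from the stats above.
SAMPLE = [
    (124, ('BINARY_MODULO', None)),
    (204, ('BINARY_ADD', (8, 8))),
    (1916, ('BINARY_MODULO', 4)),
    (12302, ('STORE_SUBSCR', 64)),
]

for entry in SAMPLE:
    print(describe(entry))
```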
-Brett P.S.: anyone who is *really* interested I can send you the full stats for the apps I have run my modified version of compile.c against. From pje at telecommunity.com Thu Sep 30 22:40:18 2004 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Sep 30 22:40:19 2004 Subject: [Python-Dev] PEP 334 - Simple Coroutines via SuspendIteration In-Reply-To: <415C5EC9.8070308@toetsch.at> References: <20040930090510.FE87.JCARLSON@uci.edu> <20040909215548.GB61544@prometheusresearch.com> <20040930154722.GA79121@prometheusresearch.com> <20040930090510.FE87.JCARLSON@uci.edu> Message-ID: <5.1.1.6.0.20040930153803.02bc0140@mail.telecommunity.com> At 09:30 PM 9/30/04 +0200, Leopold Toetsch wrote: >Please note that that has nothing to do with "async programming". It's just >like a GOTO, but w/o limitation where you'll branch to - or almost no >limitations: you can't cross C-stack boundaries or in other words you >can't branch to other incarnations of the run-loop. (Exceptions are a bit >more flexible though, but they still can only jump "up" the C-stack) > >Using CPS for function calls implies therefore a non-trivial rewrite of >CPython, which OTOH and AFAIK is already available as Stackless Python. Clark is talking about a limited subset of CPS, where continuations are only single-use. That is, a very limited form of continuations roughly equivalent in power to either Greenlets or a stack of generator-iterators. >Making continuations usable at the language level is a different thing, >though. Indeed, and luckily it isn't needed for PEP 334. PEP 334 just needs the interpreter to be able to resume evaluation of a generator frame at any CALL opcode or "for" looping that invokes a generator-iterator's next() method, if SuspendIteration was raised. I don't know if a corresponding operation for Jython is possible.
(In the case of CPython, this could be implemented via a type slot to check whether a callable object is "resumable", so that you actually *could* decorate suitable functions as being resumable, not just generator-iterator next() methods.)

Personally, I'm +0 (at most) on the PEP at the moment, as it doesn't IMO add much over using a generator stack, such as what I use in 'peak.events'. I'd be much more interested in a way to pass values and exceptions *into* generators, which would be more in line with what I'd consider "simple coroutines". A mechanism to pass values or exceptions into generators would let me replace the hackish bits of 'peak.events' with clean language features, but I'm not sure PEP 334 would give me enough to be worth reorganizing my code, as it's presently defined.

Also, I find the current PEP a confusing mishmash of references to various technologies (that are all harder to implement than what's actually desired) and unmotivating implementations of things I can't see wanting to do. It would be helpful for it to focus on motivating usage examples (such as suspending a report while waiting for a database) *early* in the PEP, rather than burying them at the end. And, most of the sample Python code looks to me like examples of how an implementation might work, but they don't illustrate the intended semantics well, nor do they really help with designing an implementation.

Finally, the PEP shouldn't call these co-routines, as co-routines are able to "return" values to other co-routines. The title should be something more like "Resuming Generators after SuspendIteration", which much more accurately describes the scope of the desired result.
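The idea of passing values and exceptions *into* a generator can be sketched with the send() and throw() generator methods that later Python releases gained (PEP 342, Python 2.5). This is an illustration only, assuming a current Python 3 interpreter; it is not part of PEP 334 and not how 'peak.events' works, and the averager generator is a hypothetical example.

```python
def averager():
    """Running average: numbers arrive via send(), a ValueError resets it."""
    total = 0.0
    count = 0
    average = None
    while True:
        try:
            # The value passed to send() becomes the result of this yield.
            value = yield average
        except ValueError:
            # An exception passed in with throw() surfaces at the yield.
            total, count, average = 0.0, 0, None
            continue
        total += value
        count += 1
        average = total / count


gen = averager()
next(gen)                                      # run to the first yield
first = gen.send(10)                           # average of [10]
second = gen.send(20)                          # average of [10, 20]
after_reset = gen.throw(ValueError("reset"))   # state cleared
third = gen.send(4)                            # average of [4]
print(first, second, after_reset, third)
```

The caller drives the generator from outside, so values flow in both directions across each yield without any extra exception type.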