From guido@python.org Sat Mar 1 01:52:58 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Feb 2003 20:52:58 -0500 Subject: [Python-Dev] Traceback problem In-Reply-To: "Your message of Wed, 26 Feb 2003 03:40:11 +0100." <3E5C290B.9010802@tismer.com> References: <200302260124.h1Q1O7D16232@pcp02138704pcs.reston01.va.comcast.net> <3E5C290B.9010802@tismer.com> Message-ID: <200303010152.h211qwZ11517@pcp02138704pcs.reston01.va.comcast.net> (Picking up an old thread.) [Guido] > > Watch out though. There are situations where an exception needs > > to be stored but no frame is available (when executing purely in > > C). There is always a thread state. [Christian] > I've been sitting a while over this puzzle now. > > tstate has two different kinds of exceptions: > There are tstate->exc_XXX and tstate->curexc_XXX. > > I have been searching through the whole source trunk > to validate my thought: > > All internal stuff is only concerned with handling > tstate->curexc_XXX. Correct. This is the "hot" exception that is set by PyErr_SetString() c.s., cleared by PyErr_Clear(), and so on. > The tstate->exc_XXX is *only* used in ceval.c . Once an exception is caught by an except clause, it is transferred from tstate->curexc_XXX to tstate->exc_XXX, from which sys.exc_info() can pick it up. > References to tstate->exc_XXX are only in > pystate.c (clearing stuff) and sysmodule.c (accessing stuff). > The only place where tstate->exc_XXX is filled with life > is ceval.c, which indicates that this is purely interpreter- > -related and has nothing to do with the internal exception > state. It is eval_frame which checks for exceptions, normalizes > them and turns them into interpreter-level exceptions, > around line 2360 of ceval.c . Correct. > After stating that, I conclude that tstate.exc_XXX can only > be in use if there is an existing interpreter with an existing > frame. Nobody else makes use of this structure. 
> So, whenever you have to save this, you can expect a valid
> frame waiting in f_back that will be able to take it.

Right.

Now let me explain the complicated dance with frame->f_exc_XXX.

Long ago, when none of this existed, there were just a few globals:
one set corresponding to the "hot" exception, and one set
corresponding to sys.exc_type etc.  The problem was that in code like
this:

    try:
        "something that may fail"
    except "some exception":
        "do something else first"
        "print the exception from sys.exc_type etc."

if "do something else first" invoked something that raised and caught
an exception, sys.exc_type etc. were overwritten.  That was a frequent
cause of subtle bugs.  I fixed this by changing the semantics as
follows:

- Within one frame, sys.exc_XXX will hold the last exception caught
  *in that frame*.

- But initially, and as long as no exception is caught in a given
  frame, sys.exc_XXX will hold the last exception caught in the
  previous frame (or the frame before that, etc.).

The first bullet fixed the bug in the above example.  The second
bullet was for backwards compatibility: it was (and is) common to have
a function that is called when an exception is caught, and to have
that function access the caught exception via sys.exc_XXX.  (Example:
traceback.print_exc()).

At the same time I fixed the problem that sys.exc_type and friends
weren't thread-safe, by introducing sys.exc_info() which gets it from
tstate; but that's really a separate improvement.

The reset_exc_info() function in ceval.c restores the tstate->exc_XXX
variables to what they were before the current frame was called.  The
set_exc_info() function saves them on the frame so that
reset_exc_info() can restore them.
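
The two bullets above can be observed from Python code.  What follows
is only a minimal sketch, not code from ceval.c: the function names
(noisy_helper, report_current, outer) are hypothetical, and it assumes
a Python 3 interpreter, where these two visible behaviours still hold
even if the internals differ from the f_exc_XXX scheme described here.

```python
import sys


def noisy_helper():
    """Raise and catch an unrelated exception entirely inside this frame."""
    try:
        raise KeyError("inner")
    except KeyError:
        pass


def report_current():
    """Stand-in for traceback.print_exc(): it catches nothing itself,
    so it sees the exception its caller is currently handling."""
    return sys.exc_info()[0]


def outer():
    try:
        raise ValueError("outer")
    except ValueError:
        noisy_helper()                      # "do something else first"
        seen_here = sys.exc_info()[0]       # first bullet: not overwritten
        seen_in_callee = report_current()   # second bullet: inherited
        return seen_here, seen_in_callee


if __name__ == "__main__":
    print(outer())
```

Both elements of the returned tuple are the ValueError class: the
KeyError handled inside noisy_helper() does not leak into outer(), and
report_current() still sees the exception caught by its caller.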
The invariant is that frame->f_exc_XXX is NULL iff the current frame never caught an exception (where "catching" an exception applies only to successful except clauses); and if the current frame ever caught an exception, frame->f_exc_XXX is the exception that was stored in tstate->exc_XXX at the start of the current frame. Now I hope you'll understand why this was never documented exactly. :-) > (This all under the maybe false assumption that I'm not wrong). No; I guess I was wrong in the quoted text at the top. :-) > Still not proposing a change. But thanks for the time, > I understood quite a lot more of the internals, now. Great! Hope this message has shed some additional light. Kevin, I'll try to get to your patch next. --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy@udel.edu Sat Mar 1 03:25:25 2003 From: tjreedy@udel.edu (Terry Reedy) Date: Fri, 28 Feb 2003 22:25:25 -0500 Subject: [Python-Dev] Re: Traceback problem References: <200302260124.h1Q1O7D16232@pcp02138704pcs.reston01.va.comcast.net> <3E5C290B.9010802@tismer.com> <200303010152.h211qwZ11517@pcp02138704pcs.reston01.va.comcast.net> Message-ID: "Guido van Rossum" wrote in message news:200303010152.h211qwZ11517@pcp02138704pcs.reston01.va.comcast.net. .. [explanation of traceback info storage] > Great! Hope this message has shed some additional light. It would be a shame for this to be lost in the archives. If there were a directory of ImplementationNotes somewhere (or an interpreter wiki), this would belong there. And responders to "where are the docs on the implementation" could be told more than "read the source". TJR From guido@python.org Sat Mar 1 03:40:55 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Feb 2003 22:40:55 -0500 Subject: [Python-Dev] Re: Traceback problem In-Reply-To: "Your message of Fri, 28 Feb 2003 22:25:25 EST." 
References: <200302260124.h1Q1O7D16232@pcp02138704pcs.reston01.va.comcast.net> <3E5C290B.9010802@tismer.com> <200303010152.h211qwZ11517@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303010340.h213etD23813@pcp02138704pcs.reston01.va.comcast.net> > > Great! Hope this message has shed some additional light. > > It would be a shame for this to be lost in the archives. If there > were a directory of ImplementationNotes somewhere (or an interpreter > wiki), this would belong there. And responders to "where are the docs > on the implementation" could be told more than "read the source". Good idea. I hate separating implementation notes from the code by more than absolutely necessary (Zope's cobweb of Wikis drives me nuts :-), so I added the essence of that message to ceval.c as a big comment block. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Sat Mar 1 04:06:20 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 28 Feb 2003 22:06:20 -0600 Subject: [Python-Dev] Re: Traceback problem In-Reply-To: References: <200302260124.h1Q1O7D16232@pcp02138704pcs.reston01.va.comcast.net> <3E5C290B.9010802@tismer.com> <200303010152.h211qwZ11517@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15968.12732.65507.874387@montanaro.dyndns.org> Terry> [explanation of traceback info storage] >> Great! Hope this message has shed some additional light. Terry> It would be a shame for this to be lost in the archives. If Terry> there were a directory of ImplementationNotes somewhere (or an Terry> interpreter wiki), this would belong there. Feel free to add it to http://manatee.mojam.com/pyvmwiki Skip From niemeyer@conectiva.com Sat Mar 1 08:00:43 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Sat, 1 Mar 2003 05:00:43 -0300 Subject: [Python-Dev] [663074] Codec registry Message-ID: <20030301080043.GA28745@ibook.distro.conectiva> Can someone please review the proposed solution to bug #663074? 
If accepted, should it be backported to 2.2.3 as well?

-- 
Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From mwh@python.net Sat Mar 1 11:12:21 2003
From: mwh@python.net (Michael Hudson)
Date: Sat, 01 Mar 2003 11:12:21 +0000
Subject: [Python-Dev] syntax for funcion attributes
In-Reply-To: <006101c2df5b$5bbc72d0$a502200a@mnotlaptop> ("Mark
 Nottingham"'s message of "Fri, 28 Feb 2003 10:58:11 -0800")
References: <006101c2df5b$5bbc72d0$a502200a@mnotlaptop>
Message-ID: <2mznofwdbe.fsf@starship.python.net>

"Mark Nottingham" writes:

> Hello,
>
> I'm not a python-dev regular, so sorry if this is a FAQ. What's the status
> of defining a syntax for function attributes (PEP 232)?

I don't think changes here are likely.

> I'm using __doc__ to carry metadata about methods right now, but
> would very much like to use function attributes. However, without a
> specialized syntax, I'm stuck doing things like
>
> VeryLongMethodName.MetadataName = "foo"
>
> which is fine if it's a one-off, but I'd like others to use the code, and
> this isn't exactly a friendly mechanism.

If PEP gets accepted, you'll be able to do

    def with_metadata(func):
        func.metadata = "yes"
        return func

    def f(blah) [with_metadata]:
        ....

or even

    def with_metadata(data):
        def inner(func):
            func.metadata = data
            return func
        return inner

    def f(blah) [with_metadata("yes")]:
        ....

Would that suit you?

Cheers,
M.

-- 
  > Why are we talking about bricks and concrete in a lisp newsgroup?
  After long experiment it was found preferable to talking about why
  Lisp is slower than C++...
                       -- Duane Rettig & Tim Bradshaw, comp.lang.lisp

From newsgroups1@bitfurnace.com Sat Mar 1 22:54:49 2003
From: newsgroups1@bitfurnace.com (Damien Morton)
Date: Sat, 1 Mar 2003 17:54:49 -0500
Subject: [Python-Dev] Re: new bytecode results
References: <001301c2def2$09d374a0$6401a8c0@damien> <3E5F2260.3080808@lemburg.com>
Message-ID:

I just realised that scoring layouts based on adjacency is the traveling
salesman problem, where the distance between two opcodes is
freq[op1][op2]+freq[op2][op1], and the goal is to maximise the total
distance traveled. Solving for 150 or so opcodes is well within reach.

"Damien Morton" wrote in message news:b3o4ti$nsl$1@main.gmane.org...
> > >>c) ordering cases in the switch statements by usage frequency
> > >>   (using average opcode usage frequencies obtained by
> > >>   instrumenting the interpreter)
> > >
> > > I might try a little simulated annealing to generate layouts with high
> > > frequency opcodes near the front and co-occurring opcodes near each
> > > other.
> >
> > I did that by hand, sort of :-) The problem is that the
> > scoring phase takes rather long, so you better start with
> > a good guess.
>
> I'm wondering what a good scoring scheme would look like.
>
> I tried a scoring scheme in which layouts were scored thusly:
>
>   for (i = 0; i < MAXOP; i++)
>     for (j = 0; j < MAXOP; j++)
>       score += pairfreq[layout[i]][layout[j]] * (i < j ? j-i : i-j);
>
> This works fine, but I'm thinking that a simpler scoring scheme which looks
> only at the frequencies of adjacent ops might be sufficient, and would
> certainly be faster.
>
>   for (i = 1; i < MAXOP; i++)
>     score += pairfreq[layout[i-1]][layout[i]];
>
> The idea is that while caches favour locality of reference, because a cache
> line is finite in size and relatively small (16 or 64 bytes), there aren't
> any long-range effects. In other words, caches favour adjacency of reference
> rather than locality of reference.
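
The two scoring schemes quoted above can be sketched in Python as
follows.  This is only an illustration: the 3x3 pairfreq table is
made-up data rather than measured opcode frequencies, the function
names are hypothetical, and reading pairfreq[a][b] as "how often b
follows a" is an assumption.  Note that the two scores point in
opposite directions: the first acts as a cost (frequent pairs placed
far apart score high), the second as a benefit (frequent pairs placed
next to each other score high).

```python
def distance_score(pairfreq, layout):
    """Sum of pair frequency times slot distance over all ordered pairs."""
    total = 0
    for i, op_i in enumerate(layout):
        for j, op_j in enumerate(layout):
            total += pairfreq[op_i][op_j] * abs(i - j)
    return total


def adjacency_score(pairfreq, layout):
    """Sum of pair frequencies for neighbouring slots only."""
    return sum(pairfreq[a][b] for a, b in zip(layout, layout[1:]))


# Hypothetical data: pairfreq[a][b] is assumed to count how often
# opcode b follows opcode a.
pairfreq = [
    [0, 5, 1],
    [2, 0, 3],
    [1, 4, 0],
]

if __name__ == "__main__":
    for layout in ([0, 1, 2], [1, 0, 2]):
        print(layout,
              distance_score(pairfreq, layout),
              adjacency_score(pairfreq, layout))
```

The first function does work quadratic in the number of opcodes per
candidate layout, the second only linear work, which is the speed
difference the quoted message is after.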
From drifty@alum.berkeley.edu Sun Mar 2 02:25:28 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Sat, 1 Mar 2003 18:25:28 -0800 (PST)
Subject: [Python-Dev] python-dev from 2003-02-16 through 2003-02-28
Message-ID:

Since this is falling on the weekend, you guys have until Monday night to tell me how I fouled up.

+++++++++++++++++++++++++++++++++++++++++++++++++++++
python-dev Summary for 2003-02-16 through 2003-02-28
+++++++++++++++++++++++++++++++++++++++++++++++++++++

.. _comp.lang.python:
.. _rest:
.. _last summary:

======================
Summary Announcements
======================

Nothing specific about the Summary to mention. I am starting to lean more and more towards starting summaries out in Quickies_ and then making them a full-fledged summary when they end up requiring more than a short paragraph of explanation. Helps me keep my sanity since I plan on sticking with having some summarization for every thread on python-dev. But this summary is on the lean side because traffic was lower than normal. I am sure this is in reaction to what happened last month with the massive amount of emails and various negativity that sprung up around the list. Made my life easier. =)

PyCon_ is moving forward! Early-bird registration is over, but regular registration for $200 is still available. It has already shaped up to be a fun conference. If you come you can hear me make a fool of myself trying to teach the conference reST_. =) T-shirts are also available so even if you don't go to the conference you can buy a shirt at http://www.cafeshops.com/pycon and fool people into thinking you went. =)

As for the `pre-PyCon sprint`_, that is also shaping up. There is already a sprint for Zope_, Twisted_, and Webware_. And now there is a sprint in the works for working on the Python core! If you are interested just .

.. _PyCon: http://www.python.org/pycon/
.. _pre-PyCon sprint: http://www.python.org/cgi-bin/moinmoin/SprintPlan
.. _Zope: http://www.zope.org/
.. 
_Twisted: http://twistedmatrix.com/
.. _Webware: http://webware.sourceforge.net/

===========================
`RELEASED: Python 2.3a2`__
===========================
__ http://mail.python.org/pipermail/python-dev/2003-February/033537.html

Guido released `Python 2.3a2`_ on Feb. 19. Please download it, run the regression tests, and then test some of your own code. The more bugs we can squash before we hit 2.3b1 the better.

.. _Python 2.3a2: http://www.python.org/2.3/

===================================
`new format codes for getargs.c`__
===================================
__ http://mail.python.org/pipermail/python-dev/2003-February/033579.html

Thomas Heller implemented a new 'k' format code for `getargs.c`_ that "accepts integers or longs, does no range checking, and returns the lower bits in an unsigned long". After Tim Peters said that tests should be added to `_testcapimodule.c`_ the conversation was moved over to http://www.python.org/sf/595026 .

.. _getargs.c: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Python/getargs.c
.. __testcapimodule.c: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/_testcapimodule.c

=====================================
`assymetry in descriptor behavior`__
=====================================
__ http://mail.python.org/pipermail/python-dev/2003-February/033583.html

This summary is going to assume you understand descriptors. If you don't, read `What's New in 2.2`_ for a nice, simple overview or `PEP 252`_ for the technical explanation (the initial email for this thread has simple code showing how descriptors are used). If you have ever been interested in how property(), classmethod(), and staticmethod() work, this will tell you.
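
The asymmetry named in the thread title can be seen in a few lines. This is a minimal sketch under stated assumptions, not code from the thread: the class names ``Verbose`` and ``Holder`` are hypothetical, and it is written for Python 3, where every class is new-style.

```python
class Verbose:
    """Data descriptor that reports how it was accessed (hypothetical name)."""

    def __get__(self, obj, type=None):
        # obj is None when the attribute is looked up on the class itself.
        return ("get", obj, type)

    def __set__(self, obj, value):
        # Only reached for assignments made through an instance.
        obj.__dict__["_stored"] = value


class Holder:
    attr = Verbose()


h = Holder()
via_instance = h.attr      # goes through Verbose.__get__ with obj=h
via_class = Holder.attr    # goes through Verbose.__get__ with obj=None

h.attr = 1                 # goes through Verbose.__set__
Holder.attr = 2            # no __set__ call: the class attribute is rebound
```

After the last line the descriptor is gone from ``Holder``; intercepting that assignment would need a second descriptor on the metaclass.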
David Abrahams wondered why it was possible to invoke a descriptor's __get__() from either the class it is defined in or an instance of that class, while __set__() for the same descriptor cannot be called from the class directly without having defined the descriptor a second time in the metaclass; David thought this was a little difficult. He also asked about the arguments to the descriptor API methods.

Guido responded by saying that it wasn't difficult considering you could do it and that Python was pulling something off with a single notation (by using '.' for class and instance accesses) that C++ uses two notations for ('.' and '::'). As for the arguments to the various methods, they are as follows:

__get__(self, obj, type)
    'self' gets bound to the descriptor instance. When called for a class, obj = None, while for an instance it is bound to the instance containing the descriptor. 'type' is set to the class that has the descriptor regardless of the context in which the descriptor is being called. The duality is so that descriptors can be happy being called either just on an instance or just a class (such as classmethod()).

__set__(self, obj, value)
    'self' is the descriptor, obj is the instance, and 'value' is what the assignment is being passed. As mentioned above, this only works with classes if you create the descriptor *twice*; once in the class and once in the class's metaclass.

__delete__(self, obj)
    Guess what gets bound to these parameters? =) A historical note: Guido said "In an early alpha release it was actually __del__, but that didn't work very well. :-)" for obvious reasons.

David also submitted a doc patch for this so we are now one step closer to having new-style classes documented. Still, there is work to be done and if you care to help please do so.

.. _What's New in 2.2: http://www.python.org/doc/2.2.1/whatsnew/sect-rellinks.html#SECTION000320000000000000000
.. 
_PEP 252: http://www.python.org/peps/pep-0252.html

======================
`Bytecode analysis`__
======================
__ http://mail.python.org/pipermail/python-dev/2003-February/033663.html

Splinter threads:

- `Bytecode idea `__
- `Code Generation Idea `__
- `Dynamic bytecode analysis `__
- `new bytecode results `__

Damien Morton posted some opcode statistics and tried to get better performance out of `ceval.c`_ by coming up with a way to do a LOAD_FAST_n call (LOAD_FAST pushes a variable on to the stack) and to cut back on the size of .pyc files. Nothing panned out very much, though (all the benchmarking was done using Pystone_).

Guido said that Christian Tismer's idea of changing some of the rarely-used opcodes to function calls and moving them out of the 'switch' statement might get some performance. Christian also thought that some work could be done to speed up calls that involve a ``goto fast_next_opcode`` call. Changing ceval.c to use a jump table instead of a switch also did not pan out.

Jeremy Hylton spoke up to let people know that having an opcode call out to a function is not necessarily slower than having the code in the switch statement. He said it depended on how much work the opcode had to do and "lots of other hard-to-predict effects" in terms of memory and generated machine code. Jeremy also reminded people that there is a patch out there to use the Pentium's cycle counter to find out how many cycles are spent on each pass through the mainloop.

Guido also said that "If you really want fame and fortune, try designing a more representative benchmark". AM Kuchling requested to be notified when someone decided to take on this project.

Skip Montanaro pointed out that he has "an XML-RPC server available to which applications can connect and upload their dynamic opcode frequencies" at http://manatee.mojam.com:7304 .
Compile with "DYNAMIC_EXECUTION_PROFILE and DXPAIRS defined" and fetch the info from the sys_ module (it's undocumented, but it looks like you can get execution info from sys.getdxp()). If you are interested in how to use Skip's server, see http://mail.python.org/pipermail/python-dev/2003-February/033767.html .

Damien Morton made his modified source code available at http://www.bitfurnace.com/python/modified-source.zip and asked people to give it a try and report their results back to him.

Dan Sugalski suggested putting opcodes that tend to execute in pairs closer together so that they would have a better chance of being in the cache. It seemed that adding opcodes en masse made things slower since the switch got larger and thus made cache hits harder to come by. Various ideas of how to rearrange things so that the switch was not as large were suggested and are most likely still being tested.

.. _ceval.c: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Python/ceval.c
.. _Pystone: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/test/pystone.py
.. _sys: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Python/sysmodule.c

=========
Quickies
=========

`CALL_ATTR, A Method Proposal `__
    Finn Bock added one last comment to this thread from the `last summary`_ about how the proposed implementation of a CALL_ATTR bytecode was how Jython_ handled attribute calls.

`[Python-checkins] python/dist/src/Misc NEWS,1.660,1.661 `__: `Package Install Manager for Python `__
    We learn some people take offense to the word pimp (to the point of considering them rapists), while others think it is fine ("a pretty respectable profession [in Amsterdam]. Definitely higher standing than a cab driver, somewhat on par with a coffeeshop owner").

`incorrect regression tests `__
    Neal Norwitz discovered some regression tests that weren't being executed.
    We also learn that it is better for regression test modules to define a test_main() function that executes all the tests than to have the tests run as a side-effect of importation (prevents the import lock from being held).

`non-binary operators `__
    Hold-over from the `last summary`_ when discussing whether the ternary operator could be just chained binary operators.

`Import lock knowledge required! `__
    Another hold-over from the `last summary`_; Eric Jones says he wouldn't mind more fine-grained import locking.

`various unix platform build/test issues `__
    Neal Norwitz brought to the attention of python-dev some issues that were preventing Python from compiling on some platforms.

`308: the debate is petering out `__
    Samuele Pedroni posted some new stats on the `PEP 308`_ debate going on at `comp.lang.python`_.

`[rfc] map enhancement `__
    Ludovic Aubry proposed a change to map(), but his use-case was eliminated quickly when it was pointed out he could just rewrite his map() or use list comprehensions.

`Python 2.3a2 release today? `__
    Guido asked if anyone objected to releasing Python 2.3a2 on Feb. 18. Jack Jansen asked if Guido could wait a day and he said yes.

`test_timeout fails on Win98SE `__
    The title of the thread states what the issue was and it got resolved.

`privacy in log files? `__
    Guido discovered a comment about not using `PyErr_WarnExplicit()`_ because there was a worry of having code put into a log file. Discussion seemed to end on the idea that the concern was not so much security as throwing a text editor for a loop because of possible non-ASCII getting put into a log file.

`What happened to fixed point? `__
    David LeBlanc asked about the status of FixedPoint_ getting into the stdlib. Raymond Hettinger said he would be getting to it soon.

`test_posix, test_random failing `__
    I bet you can figure out what this thread is about. SF tracker items have now been created.

`pickling of large arrays `__
    Ralf Grosse-Kunstle asked if there was a way to minimize the amount of buffering needed to pickle an array object. It was pointed out that writing a __reduce__() method that used an iterator would prevent the need to do any major buffering. The idea of having a custom append() method for objects to return was also suggested, but didn't get resolved.

`Cygwin build failing `__
    A problem with building under Cygwin was fixed by rebasing the system.

`2.3a2 problem: iconv module raising RuntimeError `__
    `_iconv_codec.c`_ was raising RuntimeError when it was more proper to raise ImportError. It's been fixed.

`SCO Open Server 5.0.x thread support `__
    Someone asked for help compiling on SCO with thread support. He was redirected to `comp.lang.python`_ to get help.

`call for Windows developers `__
    Thomas Heller asked for help from some Windows experts with the goal of getting ctypes_ to the point where one can write ActiveX controls in Python.

`tuning up... `__
    Andrew MacIntyre sent some performance numbers for OS/2 EMX; about 10% performance improvement from Python 2.2 with -O compared to stock Python 2.3 (-O in 2.3 does not do much since the SET_LINENO opcode was removed entirely from Python).

`Weekly Python Bug/Patch Summary `__
    Skip Montanaro's weekly reminder that Python is not perfect yet. =)

`Needed: regexp maintainer `__
    Guido asked for someone to step forward to take over the re_ module.

`_iconv_codec `__
    Guido asked what the `_iconv_codec.c`_ module was for (answer: it's a wrapper for the iconv(3) POSIX module).

`python/dist/src configure,1.279.6.17... `__
    Neil Schemenauer asked what the whitespace rules were for pre-processor directives (e.g., #include, #define, etc.). Tim Peters (the resident C standards know-it-all) said that "Spaces and horizontal tabs are fine before '#', and between '#' and the directive name".

`rename bsddbmodule.c to bsddb185.c `__
    `bsddbmodule.c`_ is now going to compile to the module bsddb185.

`Scheduled downtime for mail.python.org `__
    mail.python.org was scheduled to go down on 2003-02-26 at 10:00 EST.

`Traceback problem `__
    Christian Tismer wanted a way to clear, on demand, the traceback information stored by `sys.exc_info()`_, since it is kept around as long as the frame is alive. Kevin Jacobs wrote a patch to implement this feature and named it sys.exc_clear(). And a word of warning to anyone who stores the info returned by sys.exc_info(): it creates a cycle with the frame and thus can cause a huge chunk of memory to be held, so make sure to delete the info when you are done with it.

`module extension search order - can it be changed? `__
    Skip Montanaro realized that most failed stat() calls occur because the extension search order goes C extension and then Python module; most modules are written in Python and thus the stat() call for a C extension of a module name tends to fail. Guido said it is this way so that if the build of a C extension fails a same-named Python module can be installed instead. This also led to Skip possibly coming up with a build option of creating a zip archive of the stdlib at install time to minimize failed stat() calls.

`Writing a mutable object problem with __setattr__ `__
    Aleksandor Totic asked about classes and an object-persistence setup he was designing. You can learn about __setattr__() and how to find out whether an instance is new-style or classic based on whether it has a __class__ attribute (new-style has this).

`Re: some preliminary timings `__
    Skip Montanaro discovered that importing email_ takes a while.

`GIL Pep commentary `__
    David Abrahams basically said he likes `PEP 311`_.

`test_re failing again on Mac OS X `__
    Someone thought `test_re`_ was failing again on OS X when it turns out it was an isolated incident.

`Slowdown in Python CVS `__
    Someone thought that Python had slowed down for some reason; turned out to be isolated.
    If you ever need to check out a CVS copy from a past date, execute ``cvs update -D '24 Feb 2003'`` (with the proper date, of course).

`Some questions about maintenance of the regular expression code. `__: `New regex syntax? `__
    Gary Herron stepped up to say he was interested in taking over maintenance of the re_ module. He asked, though, how to handle bug reports about ``(.*)?`` and hitting the recursion limit (a patch materialized that solved the recursion issue for non-greedy quantifiers for the common case). The suggestion of coming up with a new syntax for regexes came up but was stopped from forming on the list since that would take "over all available bandwidth in python-dev" as Guido pointed out. It can still be discussed in other forums, though...

`bug? classes whose metclass has __del__ are not collectible `__
    Answer: no. Reason: "The GC implementation has a good reason for this; someone else may be able to explain it".

`Introducing Python `__
    Gustavo Niemeyer sent a link to an mpeg promoting Python at http://www.ibiblio.org/obp/pyBiblio/pythonvideo.php . If you have ever had any desire to see what some of the guys from PythonLabs look and sound like and you are not going to PyCon_, you can now satisfy your curiosity.

`syntax for funcion attributes `__
    Someone asked about a new syntax for setting function attributes but was told that it didn't look like it would fly.

.. _Jython: http://www.jython.org/
.. _PEP 308: http://www.python.org/peps/pep-0308.html
.. _PyErr_WarnExplicit(): http://www.python.org/dev/doc/devel/api/exceptionHandling.html#l2h-92
.. _FixedPoint: http://fixedpoint.sf.net/
.. __iconv_codec.c: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/_iconv_codec.c?sortby=date
.. _ctypes: http://starship.python.net/crew/theller/ctypes.html
.. _re: http://www.python.org/dev/doc/devel/lib/module-re.html
.. _bsddbmodule.c: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/bsddbmodule.c
.. 
_sys.exc_info(): http://www.python.org/dev/doc/devel/lib/module-sys.html .. _email: http://www.python.org/dev/doc/devel/lib/module-email.html .. _PEP 311: http://www.python.org/peps/pep-0311.html .. _test_re: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/test/test_re.py From tismer@tismer.com Sun Mar 2 03:24:28 2003 From: tismer@tismer.com (Christian Tismer) Date: Sun, 02 Mar 2003 04:24:28 +0100 Subject: [Python-Dev] Re: Traceback problem In-Reply-To: <200303010340.h213etD23813@pcp02138704pcs.reston01.va.comcast.net> References: <200302260124.h1Q1O7D16232@pcp02138704pcs.reston01.va.comcast.net> <3E5C290B.9010802@tismer.com> <200303010152.h211qwZ11517@pcp02138704pcs.reston01.va.comcast.net> <200303010340.h213etD23813@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E61796C.4070505@tismer.com> Guido van Rossum wrote: > Great! Hope this message has shed some additional light. It would of course, two years earlier. When I wrote my message, I already had triple-checked that there was no way to contradict me :-) > It would be a shame for this to be lost in the archives. If there > were a directory of ImplementationNotes somewhere (or an interpreter > wiki), this would belong there. And responders to "where are the docs > on the implementation" could be told more than "read the source". Put the whole message into the comments, and all is just fine. > Good idea. I hate separating implementation notes from the code by > more than absolutely necessary (Zope's cobweb of Wikis drives me nuts > :-), so I added the essence of that message to ceval.c as a big > comment block. Hey, that's just great! Guess how often I had to re-read that code, finally concluding that it is all-right that way, but always thinking that I could have saved quite some time by taking some notes :-) The hardest thing to remember always was the fact that the callee is saving the caller's state for the exceptions. 
I always have to go through analysis again to get it right, and I always think this is not the way it should be. but-this-keeps-me-young -- cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From skip@manatee.mojam.com Sun Mar 2 13:00:30 2003 From: skip@manatee.mojam.com (Skip Montanaro) Date: Sun, 2 Mar 2003 07:00:30 -0600 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200303021300.h22D0Un0002864@manatee.mojam.com> Bug/Patch Summary ----------------- 342 open / 3393 total bugs (-16) 125 open / 1999 total patches (+5) New Bugs -------- Let assign to as raise SyntaxWarning as well (2003-02-23) http://python.org/sf/691733 LibRef 4.2.1: {m,n} description update (2003-02-23) http://python.org/sf/692016 tkinter.createfilehandler dumps core (2003-02-24) http://python.org/sf/692416 new.function() leads to segfault (2003-02-25) http://python.org/sf/692776 python always searches python23.zip (2003-02-25) http://python.org/sf/692884 new.function ignores keyword arguments (2003-02-25) http://python.org/sf/692959 Python does not build --with-pydebug on Tru64 with vendor cc (2003-02-25) http://python.org/sf/693094 2.3a2 site.py non-existing dirs (2003-02-25) http://python.org/sf/693255 2.3a2 import after os.chdir difference (2003-02-25) http://python.org/sf/693416 licence allowed, but doesn't work (2003-02-25) http://python.org/sf/693470 Can't multiply str and bool (2003-02-26) http://python.org/sf/693955 email.Parser trashes header (2003-02-26) http://python.org/sf/693996 os.popen() hangs on {Free,Open}BSD (2003-02-26) http://python.org/sf/694062 Python 2.3a2 Build fails on HP-UX11i (2003-02-27) 
http://python.org/sf/694431 setup.py imports pwd before it's built if HOME not set (2003-02-27) http://python.org/sf/694812 complex_new does not always respect subtypes (2003-03-01) http://python.org/sf/695651 Problems with non-greedy match groups (2003-03-01) http://python.org/sf/695688 New Patches ----------- Use datetime in _strptime (2003-02-23) http://python.org/sf/691928 fix bug 625698, speed up some comparisons (2003-02-25) http://python.org/sf/693221 fix for bug 639806: default for dict.pop (2003-02-26) http://python.org/sf/693753 fix for bug 672614 :) (2003-02-28) http://python.org/sf/695250 environment parameter for popen2 (2003-02-28) http://python.org/sf/695275 fix bug 678519: cStringIO self iterator (2003-03-01) http://python.org/sf/695710 Closed Bugs ----------- Compiler complaints in posixmodule.c (2001-11-28) http://python.org/sf/486434 import with undefineds can crash python (2001-12-02) http://python.org/sf/488184 mkcwproject: custom __initialize routine (2001-12-13) http://python.org/sf/492465 macfs.FSSpec and Carbon.File.FSSpec fail for "new" files (2002-07-24) http://python.org/sf/585923 OSA Python integration (2002-07-26) http://python.org/sf/586998 IDE should have "open recent" menu (2002-09-11) http://python.org/sf/607810 IDE output window (2002-09-11) http://python.org/sf/607821 IDE - Breakpoints don't stick to lines (2002-09-11) http://python.org/sf/608085 unicode alphanumeric regexp bug (2002-09-16) http://python.org/sf/610299 Reorganize MacPython resources on OSX (2002-10-19) http://python.org/sf/625725 remove debug prints from macmain.c (2002-11-08) http://python.org/sf/635570 ic module "path too long" error (2002-11-26) http://python.org/sf/644243 dynload_next needs better errors (2002-12-12) http://python.org/sf/652590 Compiling C sources with absolute path bug (2003-01-15) http://python.org/sf/668662 after using pdb readline does not work correctly (2003-01-28) http://python.org/sf/676342 Can't build C ext on OS X with 'altinstall' 
python (2003-01-29) http://python.org/sf/677293 python.exe expected in extension builds (2003-01-30) http://python.org/sf/677753 plistlib.py selftest fails (2003-02-07) http://python.org/sf/682317 Future division breaks mpz (2003-02-16) http://python.org/sf/687654 Bundlebuilder needs to pre-convert resource files (2003-02-17) http://python.org/sf/688007 macresource should handle readonly applesingle files (2003-02-17) http://python.org/sf/688011 IDLE does not work on Mac OS X (2003-02-17) http://python.org/sf/688266 64-bit int and long hash keys incompatible (2003-02-19) http://python.org/sf/689659 Docs page has no PEPs link (2003-02-19) http://python.org/sf/689826 2.3a2 build fails under IRIX 6.5 (2003-02-20) http://python.org/sf/690012 test_posix fails when run in non-interactive mode (2003-02-20) http://python.org/sf/690081 sys.last_type is missing (2003-02-20) http://python.org/sf/690109 lines run together on input (2003-02-20) http://python.org/sf/690285 2.3a2 Sol8 make fails at _iconv_codec. (2003-02-20) http://python.org/sf/690309 apply fails to check if warning raises exception (2003-02-20) http://python.org/sf/690435 _POSIX_C_SOURCE redefined (2003-02-21) http://python.org/sf/691005 shutil.copytree documentation bug (2003-02-22) http://python.org/sf/691276 codecs.open(filename, 'U', 'UTF-16') corrupts text (2003-02-22) http://python.org/sf/691291 Closed Patches -------------- Patch for sre bug 610299 (2002-11-04) http://python.org/sf/633359 array.append is sloooow (2003-02-16) http://python.org/sf/687598 2.3 .spec file for building RPMs. (2003-02-18) http://python.org/sf/688584 From vinay_sajip@red-dove.com Sun Mar 2 14:33:05 2003 From: vinay_sajip@red-dove.com (Vinay Sajip) Date: Sun, 2 Mar 2003 14:33:05 -0000 Subject: [Python-Dev] Changes to logging in CVS Message-ID: <006801c2e0c8$a4227d80$652b6992@alpha> I see that recent changes were made in logging/__init__.py to replace the use of "apply(func, args)" with "func(*args)". 
Doesn't this cause "invalid syntax" problems with 1.5.2? I explicitly coded using apply because I thought it was needed for 1.5.2. There are a few places where I've eschewed use of +=, for the same reason. Any chance we could change back to using apply()? Regards Vinay From mal@lemburg.com Sun Mar 2 19:23:26 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 02 Mar 2003 20:23:26 +0100 Subject: [Python-Dev] Changes to logging in CVS In-Reply-To: <006801c2e0c8$a4227d80$652b6992@alpha> References: <006801c2e0c8$a4227d80$652b6992@alpha> Message-ID: <3E625A2E.1040508@lemburg.com> Vinay Sajip wrote: > I see that recent changes were made in logging/__init__.py to replace the > use of "apply(func, args)" with "func(*args)". Doesn't this cause "invalid > syntax" problems with 1.5.2? I explicitly coded using apply because I > thought it was needed for 1.5.2. There are a few places where I've eschewed > use of +=, for the same reason. Any chance we could change back to using > apply()? You should mark the files you need 1.5.2 compatibility for in the source code. Even though PEP 291 mentions your package, I don't think that everybody knows about this PEP... -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 02 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 30 days left EuroPython 2003, Charleroi, Belgium: 114 days left From vinay_sajip@red-dove.com Sun Mar 2 19:39:16 2003 From: vinay_sajip@red-dove.com (Vinay Sajip) Date: Sun, 2 Mar 2003 19:39:16 -0000 Subject: [Python-Dev] Changes to logging in CVS References: <006801c2e0c8$a4227d80$652b6992@alpha> <3E625A2E.1040508@lemburg.com> Message-ID: <000f01c2e0f3$6a7dcfa0$652b6992@alpha> > > I see that recent changes were made in logging/__init__.py to replace the > > use of "apply(func, args)" with "func(*args)". Doesn't this cause "invalid > > syntax" problems with 1.5.2? I explicitly coded using apply because I > > thought it was needed for 1.5.2. There are a few places where I've eschewed > > use of +=, for the same reason. Any chance we could change back to using > > apply()? > > You should mark the files you need 1.5.2 compatibility for in the > source code. Even though PEP 291 mentions your package, I don't > think that everybody knows about this PEP... Fair enough, but the docstring at the top of __init__.py states: "Should work under Python versions >= 1.5.2, except that source line information is not available unless 'sys._getframe()' is." Do you mean that I need to mention this wherever the source code contains some 1.5.2-constrained idiom like "apply(func, args)" or "a = a + 1", so that it's explicit that it was coded that way for a reason? Regards, Vinay From guido@python.org Sun Mar 2 20:42:37 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 02 Mar 2003 15:42:37 -0500 Subject: [Python-Dev] Re: Changes to logging in CVS In-Reply-To: "Your message of Sun, 02 Mar 2003 14:33:05 GMT." 
<006801c2e0c8$a4227d80$652b6992@alpha> References: <006801c2e0c8$a4227d80$652b6992@alpha> Message-ID: <200303022042.h22KgbX18847@pcp02138704pcs.reston01.va.comcast.net> > I see that recent changes were made in logging/__init__.py to > replace the use of "apply(func, args)" with "func(*args)". Doesn't > this cause "invalid syntax" problems with 1.5.2? I explicitly coded > using apply because I thought it was needed for 1.5.2. There are a > few places where I've eschewed use of +=, for the same reason. Any > chance we could change back to using apply()? My apologies. I forgot about this. I'll roll it back. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Sun Mar 2 21:06:40 2003 From: tismer@tismer.com (Christian Tismer) Date: Sun, 02 Mar 2003 22:06:40 +0100 Subject: [Python-Dev] __slots__ for metatypes Message-ID: <3E627260.5050807@tismer.com> Hi Guido, all, last year, I wrote a patch that allows meta-types to have slots. You had no time to look into it, and I said "take your time and wait until it is stable". I have been using this for quite a while now, and today I updated it for Python 2.3 . The patch is very small and simple. Essentially, it doesn't use a fixed offset into the internal etype structure, but computed this based upon tp_basicsize. This small patch also gives lots of flexibility to people, who like to add extra stuff to their dynamic type objects. (Well, I do this frequently) It would be nice if we could add this small feature, soon. http://www.python.org/sf/696193 Due to some SF bug, I had to submit this patch twice. thanks a lot - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/

From newsgroups1@bitfurnace.com Mon Mar 3 01:55:57 2003
From: newsgroups1@bitfurnace.com (Damien Morton)
Date: Sun, 2 Mar 2003 20:55:57 -0500
Subject: [Python-Dev] Re: new bytecode results
References: <001301c2def2$09d374a0$6401a8c0@damien> <3E5F2260.3080808@lemburg.com>
Message-ID:

I optimised the layout of the Python opcodes using a simulated annealing process that scored adjacent opcodes according to their frequency of co-occurrence. This raised my PyStone benchmark from 22100 to 22700, for a 3% gain.

I've been using Skip's DXP server to gather statistics, but there isn't much data there. I should be able to achieve better results if more people contributed stats to his server; more information about it can be found here:

http://manatee.mojam.com/~skip/python/

The process of laying out the opcodes and switch cases has largely been automated, and generating new layouts is relatively painless and quick. Please do contribute stats for 2.3a2 to Skip's DXP server.

I also implemented a LOAD_FASTER opcode, with the argument encoded into the opcode. This raised my PyStone benchmark from 22700 to 23150, for a total 5% gain.

The main switch loop looks like this now:

    if (opcode >= LOAD_FASTER) {
        load_fast(opcode - LOAD_FASTER);
        ...
        goto fast_next_opcode;
    }
    switch(opcode) {
    case LOAD_ATTR:
        oparg = NEXTARG();
        w = GETITEM(names, oparg);
        ...
        break;
    ...
    }

Each opcode case now loads its own argument as necessary.

The test for HAVE_ARGUMENT is now implemented using an array of bytes. The test now happens very infrequently, so any performance loss is negligible.

    const char HASARG[] = {
        0 , /* STOP_CODE */
        1 , /* LOAD_ATTR */
        1 , /* CALL_FUNCTION */
        1 , /* STORE_FAST */
        0 , /* BINARY_ADD */
        0 , /* SLICE+0 */
        0 , /* SLICE+1 */
        0 , /* SLICE+2 */
        ...
} From tim.one@comcast.net Mon Mar 3 04:06:30 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 02 Mar 2003 23:06:30 -0500 Subject: [Python-Dev] Re: new bytecode results In-Reply-To: Message-ID: [Dan Wolfe] > In the last year of lurking on this list, I've seen requests for a good > python benchmark no less than 4 times - the most recent being damien > morton's attempt to prove/disprove his optimizations. > > Having an "approved" good benchmark/realistic test program would make > it easy to validate optimizations, and head off the consistent 'pystone > is not a realistic benchmark' arguments that come up each time.... pystone is a very good benchmark for one thing: testing the "general speed" of the interpreter. Perhaps because it *is* so atypical, it's hard to do something that gives pystone a significant speed boost. Rewriting the eval loop several years ago managed to do that, and ruthlessly cutting slop out of the dict implementation gave it an 8% boost more recently. I can't recall any other single thing that helped pystone as much as those. Jim Fulton claims that pystone is a good predictor of Zope speed on a new box, and now that I know more about Zope than I used to, I believe that: while Zope may look like Python code, there are so many meta-tricks being played under the covers that it's plausible that the only thing that really matters is how fast you can get around the eval loop. Anyway, several years ago I offered to collect and organize a set of "typical" benchmarks. Nobody responded, so that turned out to be a lot easier than I thought it would be . > Besides, it will take a 6 months just to agree to a basic framework, > and another 6 months to work around all the "competitive optimization > tricks" timbot has up his sleeve... You can't help it. If you know the code in advance, the implementation *will* get warped to favor it. The best you can hope for is that warping won't be done at the expense of other code. 
For example, if you decide to reorder the eval loop case statements, and use pystone as your measure of goodness, you'll end up with a different order than if you use test_descr.py as your measure. Is that cheating? I suppose it depends on who's doing it . From tim.one@comcast.net Mon Mar 3 04:14:55 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 02 Mar 2003 23:14:55 -0500 Subject: [Python-Dev] module extension search order - can it be changed? In-Reply-To: <2m7kbk30li.fsf@starship.python.net> Message-ID: [Michael Hudson] > While we're at it, linecache's charming habit of occasionally giving > out of date information is Pure Evil. Performance my arse, lying to > the user is worse. We don't use linecache often, and I've never tried comparative timing (with and without it). Offhand it's hard to believe that producing tracebacks benefits from it, or that inspect.py does. For 2.3b1, maybe we could change linecache.updatecache() to leave the cache empty and see whether anyone notices <0.7 wink>. From python@rcn.com Mon Mar 3 04:57:01 2003 From: python@rcn.com (Raymond Hettinger) Date: Sun, 2 Mar 2003 23:57:01 -0500 Subject: [Python-Dev] Re: new bytecode results References: Message-ID: <003f01c2e141$53f6c400$125ffea9@oemcomputer> From: "Tim Peters" > [Dan Wolfe] > > In the last year of lurking on this list, I've seen requests for a good > > python benchmark no less than 4 times - the most recent being damien > > morton's attempt to prove/disprove his optimizations. > > > > Having an "approved" good benchmark/realistic test program would make > > it easy to validate optimizations, and head off the consistent 'pystone > > is not a realistic benchmark' arguments that come up each time.... > > pystone is a very good benchmark for one thing: testing the "general speed" > of the interpreter. 
> Perhaps because it *is* so atypical, it's hard to do
> something that gives pystone a significant speed boost

I've been working with Damien to make sure the improvements are not pystone specific. We've run against my highly optimized matrix code, against pybench, and against another one of my programs which heavily exercises a broad range of Python tools. Overall, his improvements have helped across the board.

I think his latest and greatest should be accepted unless there is a maintainability hit. However, the core concept and code seem clean enough to me.

FWIW,

Raymond Hettinger

From ben@algroup.co.uk Mon Mar 3 13:42:43 2003
From: ben@algroup.co.uk (Ben Laurie)
Date: Mon, 03 Mar 2003 13:42:43 +0000
Subject: [Python-Dev] Capabilities in Python
In-Reply-To: <15933.30607.900530.370402@localhost.localdomain>
References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain>
Message-ID: <3E635BD3.9000107@algroup.co.uk>

My attention was drawn to this unanswered email, so here goes...

Jeremy Hylton wrote:
>>>>>>"KPY" == Ka-Ping Yee writes:
>
>
> KPY> Wow, how did this topic end up crossing over to this list while
> KPY> i wasn't looking? :0
>
> You sure react quick for someone who isn't looking .
>
> >> A capability system must have some rules for creating and copying
> >> capabilities, but there is more than one way to realize those
> >> rules in a programming language.
>
> KPY> I suppose there could be, but there is really only one obvious
> KPY> way: creating a capability is equivalent to creating an object
> KPY> -- which you can only do if you hold the constructor. A
> KPY> capability is copied into another object (for the security
> KPY> folks, "object" == "protection domain") when it is transmitted
> KPY> as an argument to a method call.
>
> KPY> To build a capability system, all you need to do is to
> KPY> constrain the transfer of object references such that they can
> KPY> only be transmitted along other object references. That's all.
>
> I don't follow you here. What does it mean to "transmit along other
> object references?" That is, everything in Python is an object and
> the only kind of references that exist are object references.

He's actually going slightly in circles here. The idea is that in order to acquire an object reference you either create the object, or are given the reference by another object you already have a reference to, or are given it by another object that has a reference to you. Where "you" is some object, of course.

What is _not_ supposed to happen is finding objects by poking around in the symbol table, for example.

> I think, based on the rest of your mail, that we're largely on
> the same page, but I'd like to make sure I understand where you're
> coming from.
>
> I don't quite follow the definition of protection domain either, as
> most of the literature I'm familiar with (not much of it about
> capabilities specifically) talks about a protection domain as the set
> of objects a principal has access to. The natural way to extend that
> to capabilities seems to me to be that a protection domain is the set
> of capabilities possessed by a principal.

That sounds right. The transitive closure of the capabilities possessed by a principal is also interesting, though the code in the objects determines whether you have access to any particular member of that set in practice.

> Are these questions off-topic for python-dev?
>
> At any rate, it still seems like there are a variety of ways to
> realize capabilities in a programming language. For example, ZODB
> uses a special base class called Persistent to mark persistent
> objects. One could imagine using the same approach so that only some
> objects have capabilities associated with them.
This was the approach I took initially, but it's substantially messier than using bound methods.

> KPY> The problem for Python, as Jeremy explained, is that there are
> KPY> so many other ways of crawling into objects and pulling out
> KPY> bits of their internals.
>
> KPY> Off the top of my head, i only see two things that would have
> KPY> to be fixed to turn Python into a capability-secure system:
>
> KPY> 1. Access to each object is limited to its declared exposed
> KPY> interface; no introspection allowed.
>
> KPY> 2. No global namespace of modules (sys.modules etc.).
>
> KPY> If there is willingness to consider a "secure mode" for Python
> KPY> in which these two things are enforced, i would be interested
> KPY> in making it happen.
>
> I think there is interest and I agree with your problem statement.
> I'd rephrase 2 to make it more general. Control access to other
> modules. The import statement is just as much of a problem as
> sys.modules, right? In a secure environment, you have to control what
> code can be loaded in the first place.

Correct.

> >> In Python, there is no private.
>
> KPY> Side note (probably irrelevant): in some sense there is, but
> KPY> nobody uses it. Scopes are private. If you were to implement
> KPY> classes and objects using lambdas with message dispatch
> KPY> (i.e. the Scheme way, instead of having a separate "class"
> KPY> keyword), then the scoping would take care of all the
> KPY> private-ness for you.
>
> I was aware of Rees's dissertation when I did the nested scopes and,
> partly as a result, did not provide any introspection mechanism for
> closures. That is, you can get at a function's func_closure slot but
> there's no way to look inside the cells from Python. I was thinking
> that closures could replace Bastions. It still seems possible, but
> on several occasions I've wished I could introspect about closures
> from Python code.
> I'm also unsure that the idea flies so well for
> Python, because you really want secure Python to be as much like
> regular Python as possible. If the mechanism is based on functions,
> it seems hard to make it work naturally for classes and instances.
>
> >> The Zope proxy approach seems a little more promising, because it
> >> centralizes all the security machinery in one object, a security
> >> proxy. A proxy for an object can appear virtually
> >> indistinguishable from the object itself, except that type(proxy)
> >> != type(object_being_proxied). The proxy guarantees that any
> >> object returned through the proxy is wrapped in its own proxy,
> >> except for simple immutable objects like ints or strings.
>
> KPY> The proxy mechanism is interesting, but not for this purpose.
> KPY> A proxy is how you implement revocation of capabilities: if you
> KPY> insert a proxy in front of an object and grant access to that
> KPY> proxy, then you can revoke the access just by telling the proxy
> KPY> to stop responding.
>
> Sure, you can use proxies for revocation, but that's not what I was
> trying to say.
>
> I think the fundamental problem for rexec is that you don't have a
> security kernel. The code for security gets scattered throughout the
> interpreter. It's hard to have much assurance in the security when
> it's tangled up with everything else in the language.
>
> You can use a proxy for an object to deal with goal #1 above --
> enforce an interface for an object. I think about this much like a
> hardware capability architecture. The protected objects live in the
> capability segment and regular code can't access them directly. The
> only access is via a proxy object that is bound to the capability.
>
> Regardless of proxy vs. rexec, I'd be interested to hear what you
> think about a sound way to engineer a secure Python.

I'm told that proxies actually rely on rexec, too. So, I guess whichever approach you take, you need rexec.
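
For readers following the proxy discussion quoted above, here is a minimal sketch of the revocation idea Ka-Ping describes: put a proxy in front of an object, hand out only the proxy, and cut off access by telling the proxy to stop responding. It is written for present-day Python 3 rather than the 2.3-era interpreter under discussion. The names make_revocable, RevokedError and Account are hypothetical and exist only for this illustration; this is not Zope's security proxy, and it is not a security boundary, since ordinary introspection can still reach the wrapped object.

```python
class RevokedError(Exception):
    """Raised when a revoked proxy is used (hypothetical name)."""


def make_revocable(target):
    """Return (proxy, revoke) for target. Illustrative helper, not a real API."""
    # The target is held in the enclosing scope, not on the proxy instance.
    cell = {"target": target}

    class Proxy:
        def __getattr__(self, name):
            # Called only for attributes the proxy itself lacks, i.e. all
            # of the target's attributes and methods.
            obj = cell["target"]
            if obj is None:
                raise RevokedError("access has been revoked")
            return getattr(obj, name)

    def revoke():
        # Drop the reference; every later attribute access fails.
        cell["target"] = None

    return Proxy(), revoke


class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


account = Account(100)
proxy, revoke = make_revocable(account)

new_balance = proxy.deposit(50)  # forwarded to the real Account

revoke()
try:
    proxy.deposit(1)
    still_works = True
except RevokedError:
    still_works = False
```

The holder of the proxy never receives a direct reference to the Account, which is the property the thread is after. In today's CPython, however, closure cells can be inspected from Python code, so a determined caller can still dig the target out; that gap is the enforcement problem raised in the next paragraph.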
The problem is that although you can think about proxies as being like a segmented architecture, you have to enforce that segmentation. And that means doing so throughout the interpreter, doesn't it? I suppose it might be possible to abstract things in some way to make that less widespread, but probably not without having an adverse impact on speed. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From guido@python.org Mon Mar 3 14:40:40 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 03 Mar 2003 09:40:40 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: Your message of "Mon, 03 Mar 2003 13:42:43 GMT." <3E635BD3.9000107@algroup.co.uk> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> Message-ID: <200303031440.h23Eeea16004@odiug.zope.com> > I'm told that proxies actually rely on rexec, too. So, I guess whichever > approach you take, you need rexec. Yes and no. It's unclear what *you* mean when you say "rexec". There is a standard module by that name that employs Python's support for tighter security and sets up an entire restricted execution environment. And then there's the underlying facilities in Python, which allow you to override __import__ and all other built-ins; this facility is often called "restricted execution." Zope security proxies rely on the latter facilities, but not on the rexec module. I suggest that in order to avoid confusion, you should use "restricted execution" when that's what you mean, and use "rexec" only to refer to the standard module by that name. > The problem is that although you can think about proxies as being like a > segmented architecture, you have to enforce that segmentation. And that > means doing so throughout the interpreter, doesn't it? 
I suppose it > might be possible to abstract things in some way to make that less > widespread, but probably not without having an adverse impact on speed. The built-in restricted execution facilities indeed do distinguish between two security domains: restricted and unrestricted. In restricted mode, certain introspection APIs are disallowed. Restricted execution is enabled as soon as a particular global scope's __builtins__ is not the standard __builtins__, which is by definition the __dict__ of the __builtin__ module (note __builtin__, which is a module, vs. __builtins__, which is a global). --Guido van Rossum (home page: http://www.python.org/~guido/) From ben@algroup.co.uk Mon Mar 3 17:56:20 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Mon, 03 Mar 2003 17:56:20 +0000 Subject: [Python-Dev] Capabilities in Python In-Reply-To: <200303031440.h23Eeea16004@odiug.zope.com> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <200303031440.h23Eeea16004@odiug.zope.com> Message-ID: <3E639744.5050407@algroup.co.uk> Guido van Rossum wrote: >>I'm told that proxies actually rely on rexec, too. So, I guess whichever >>approach you take, you need rexec. > > > Yes and no. It's unclear what *you* mean when you say "rexec". There > is a standard module by that name that employs Python's support for > tighter security and sets up an entire restricted execution > environment. And then there's the underlying facilities in Python, > which allow you to override __import__ and all other built-ins; this > facility is often called "restricted execution." Zope security > proxies rely on the latter facilities, but not on the rexec module. > > I suggest that in order to avoid confusion, you should use "restricted > execution" when that's what you mean, and use "rexec" only to refer to > the standard module by that name. OK, I mean restricted execution. 
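
As an aside on the facility Guido describes (overriding __import__ and the other built-ins for a given global scope), the following is a minimal sketch in present-day Python 3. It assumes a hypothetical helper, run_with_builtins, and shows only the name-lookup effect of supplying a custom __builtins__ entry in the globals passed to exec. The "restricted mode" introspection checks discussed in this thread are not part of Python 3, so this should not be read as a sandbox.

```python
def run_with_builtins(source, allowed):
    """Execute source with only the names in `allowed` visible as built-ins.

    Hypothetical helper for illustration; NOT a security mechanism.
    """
    env = {"__builtins__": dict(allowed)}
    exec(source, env)
    return env


def failure_type(source, allowed):
    """Return the exception class name raised by the snippet, or None."""
    try:
        run_with_builtins(source, allowed)
    except Exception as exc:
        return type(exc).__name__
    return None


# A whitelisted built-in is usable.
result = run_with_builtins("n = len('abc')", {"len": len})["n"]

# A built-in left out of the whitelist is simply not found.
open_error = failure_type("open('x')", {"len": len})

# The import statement looks up __import__ among the built-ins,
# so leaving it out makes the statement fail.
import_error = failure_type("import os", {"len": len})
```

This covers the "control what can be named and imported" half of the problem. Objects passed into such a scope still carry references back to the rest of the interpreter, which is the assurance concern raised in this exchange.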
>>The problem is that although you can think about proxies as being like a
>>segmented architecture, you have to enforce that segmentation. And that
>>means doing so throughout the interpreter, doesn't it? I suppose it
>>might be possible to abstract things in some way to make that less
>>widespread, but probably not without having an adverse impact on speed.
>
>
> The built-in restricted execution facilities indeed do distinguish
> between two security domains: restricted and unrestricted. In
> restricted mode, certain introspection APIs are disallowed.
> Restricted execution is enabled as soon as a particular global scope's
> __builtins__ is not the standard __builtins__, which is by definition
> the __dict__ of the __builtin__ module (note __builtin__, which is a
> module, vs. __builtins__, which is a global).

Oh, I understand that, but the complaint was that it is spread all over the interpreter. One of the nice things about hardware-enforced segmentation is that you have a high assurance that it really is segmented.

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html http://www.thebunker.net/

"There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff

From newsgroups1@bitfurnace.com Mon Mar 3 23:55:47 2003
From: newsgroups1@bitfurnace.com (Damien Morton)
Date: Mon, 3 Mar 2003 18:55:47 -0500
Subject: [Python-Dev] JUMP_IF_X opcodes
Message-ID:

I have been reviewing the compile.c module with respect to the use of JUMP_IF_XXX opcodes, and the frequency with which these opcodes are followed by a POP_TOP instruction.

It seems to me that there are two kinds of use cases for these opcodes:

The first use case could be expressed as POP_THEN_JUMP_IF_XXXX
The second use case could be expressed as JUMP_IF_XXX_ELSE_POP

Listed below are the use cases for these instructions, and the functions in compile.c that they appear in.
The form is JUMP_IF_XXX(top-of-stack-if-no-jump, top-of-stack-if-jump)

    com_assert_stmt - JUMP_IF_TRUE(-,-)
    com_if_stmt     - JUMP_IF_FALSE(-,-)
    com_while_stmt  - JUMP_IF_FALSE(-,-)
    com_try_except  - JUMP_IF_FALSE(-,-)
    com_list_if     - JUMP_IF_FALSE(-, -)

    com_comparison  - JUMP_IF_FALSE(-, 0)
    com_and_test    - JUMP_IF_FALSE(-, 0)
    com_test        - JUMP_IF_TRUE(-, 1)

Below is a minimally intrusive implementation of the expansion of JUMP_IF_FALSE into two opcodes for handling the two use cases.

    case JUMP_IF_FALSE_ELSE_POP:
    case POP_THEN_JUMP_IF_FALSE:
        NEXTARG(oparg);
        err = PyObject_IsTrue(TOP());
        if (err > 0) {
            err = 0;
            POP();
        }
        else if (err == 0) {
            if (opcode == POP_THEN_JUMP_IF_FALSE)
                POP();
            JUMPBY(oparg);
        }
        else
            break;
        continue;

Comments, suggestions, etc, appreciated.

From dave@boost-consulting.com Tue Mar 4 00:29:58 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Mon, 03 Mar 2003 19:29:58 -0500
Subject: [Python-Dev] Re: __slots__ for metatypes
References: <3E627260.5050807@tismer.com>
Message-ID:

Christian Tismer writes:

> This small patch also gives lots of flexibility to people,
> who like to add extra stuff to their dynamic type objects.
> (Well, I do this frequently)
>
> It would be nice if we could add this small feature, soon.
>
> http://www.python.org/sf/696193

Yes, please, if possible.

-Dave

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From neal@metaslash.com Tue Mar 4 02:56:26 2003
From: neal@metaslash.com (Neal Norwitz)
Date: Mon, 03 Mar 2003 21:56:26 -0500
Subject: [Python-Dev] JUMP_IF_X opcodes
In-Reply-To:
References:
Message-ID: <20030304025626.GA24615@epoch.metaslash.com>

On Mon, Mar 03, 2003 at 06:55:47PM -0500, Damien Morton wrote:
> I have been reviewing the compile.c module with respect to the use of
> JUMP_IF_XXX opcodes, and the frequency with which these opcodes are followed
> by a POP_TOP instruction.
>
> It seems to me that there are two kinds of uses cases for these opcodes,
>
> The first use case could be expressed as POP_THEN_JUMP_IF_XXXX
> The second use case could be expressed as JUMP_IF_XXX_ELSE_POP
>
> Comments, suggestions, etc, appreciated.

I think you won't get much of a benefit by adding the 2+ instructions necessary for this scheme. I think it would be best to have JUMP_IF_XXX always do a POP_TOP and never jump to a jump. Below is an example of some code and the disassembly.

>>> def f(a, b):
...     if a and b:
...         print 'nope'
...
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP
              7 LOAD_FAST                1 (b)
        >>   10 JUMP_IF_FALSE            9 (to 22)
             13 POP_TOP

  3          14 LOAD_CONST               1 ('no')
             17 PRINT_ITEM
             18 PRINT_NEWLINE
             19 JUMP_FORWARD             1 (to 23)
        >>   22 POP_TOP
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE

Note the first JUMP_IF_FALSE jumps to the second JUMP_IF_FALSE, which then jumps to POP_TOP.

An optimized version of this code where the POP is performed as part of the JUMP_IF_XXX could be:

>>> dis.dis(f)
  2           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE           11 (to 17)
              6 LOAD_FAST                1 (b)
        >>    9 JUMP_IF_FALSE            5 (to 17)

  3          12 LOAD_CONST               1 ('no')
             15 PRINT_ITEM
             16 PRINT_NEWLINE
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE

In the optimized version, there are at least 2 fewer iterations around the eval_frame loop (when a is false): 1 POP_TOP, 1 JUMP_IF_FALSE. If both a and b are true, the if body is executed and there are 3 fewer iterations: 2 POP_TOPs, 1 JUMP_FORWARD.

With more conditions, the savings should be better.

The problem is that it's difficult to get the compiler to output this code AFAIK. I believe Skip's peephole optimizer did the transformation to prevent a jump to a jump, but that was another pass. The new compiler Jeremy is working on should make these sorts of transformations easier.

All that said, the scheme you propose could provide a decent speed up. The only way to know is to try.
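
In the spirit of "the only way to know is to try", here is a small sketch for inspecting what a present-day CPython emits for the same kind of condition. It uses the standard dis module under Python 3; the function f mirrors the example above but returns a value instead of using the old print statement. Opcode names differ between CPython versions, so the snippet collects whatever conditional jumps appear rather than assuming specific names.

```python
import dis


def f(a, b):
    if a and b:
        return "yes"
    return "no"


# Collect the opcode names the running interpreter generated for f.
opnames = [ins.opname for ins in dis.get_instructions(f)]

# Conditional jumps: any opcode whose name mentions both JUMP and IF.
conditional_jumps = [name for name in opnames if "JUMP" in name and "IF" in name]

# Print the full listing for a side-by-side look with the 2003 output above.
dis.dis(f)
```

On recent CPython releases the conditional jumps in such a listing are typically of the POP_JUMP_IF_... family, meaning the pop is folded into the jump, which is close to what is proposed in this thread. The exact names and offsets depend on the interpreter version, so treat the printed listing as the authority.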
:-)

Neal

From guido@python.org Thu Mar 6 03:55:27 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 05 Mar 2003 22:55:27 -0500
Subject: [Python-Dev] Fun with timeit.py
Message-ID: <200303060355.h263tRv28259@pcp02138704pcs.reston01.va.comcast.net>

At Jim's request, I added a utility module to the standard library that implements state-of-the-art timing of code snippets.

Using a slightly modified version of this code, here's the cost in microseconds of one for loop iteration (with 'pass' as the loop body) in various Python versions. All tests were run on my home machine: a 664 MHz Pentium III with 256 KB cache, running Red Hat Linux 7.3, compiled with gcc 2.96. Note the steady improvement over the years. :-)

    version    plain    -O
    -------    -----    -----
    1.3        0.625    n/a
    1.4        0.602    n/a
    1.5.2      0.606    0.466
    2.0        0.561    0.445
    2.1        0.591    0.436
    2.2        0.416    0.277
    2.3a2+     0.246    0.248 (1)

The invocation was "python timeit.py -r5" (with -O added for the last column). This times 5 runs of a million iterations each and prints the time (normalized to usec per iteration) for the fastest run. I ran this twice for each combination and picked the lowest of the two; there was never more than 0.002 usec difference.

(1) A mystery: the Python 2.3 binary installed in /usr/local/bin measured 0.266 for the -O case, but 0.248 without -O; i.e. -O made it slower! The byte-for-byte identical binary in my build tree produced the more reasonable measurements given in the table.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett@python.org Thu Mar 6 06:46:28 2003
From: brett@python.org (Brett Cannon)
Date: Wed, 5 Mar 2003 22:46:28 -0800 (PST)
Subject: [Python-Dev] Pre-PyCon sprint ideas
Message-ID:

In an effort to help Guido (did you get my email about the rough draft of the email to send out to the rest of the world?) I am trying to gather ideas for what the Python core sprint can focus on.
I have set up a wiki page at http://www.python.org/cgi-bin/moinmoin/PyCoreSprint for people to add ideas to. If there is something you think deserves sprint attention then go add it. -Brett From Raymond Hettinger"

Iterators unified access to containers -- let's find more of those.
Substitutability simplifies development so shelves have a full dictionary interface but tuples won't sprout a count method because lists differ in intent.
Deprecation comes at a price but cruft has a cost of its own.
Holistic refactoring beats piecemeal optimization.
Comment generously, the best modules are an education to read.
Be kind on the Usenet some posters are only eleven.

Raymond Hettinger From mwh@python.net Thu Mar 6 11:36:58 2003 From: mwh@python.net (Michael Hudson) Date: Thu, 06 Mar 2003 11:36:58 +0000 Subject: [Python-Dev] More Zen In-Reply-To: <000f01c2e3d1$561b60a0$125ffea9@oemcomputer> ("Raymond Hettinger"'s message of "Thu, 6 Mar 2003 06:12:54 -0500") References: <000f01c2e3d1$561b60a0$125ffea9@oemcomputer> Message-ID: <2mk7fc20bp.fsf@starship.python.net> "Raymond Hettinger" writes: [snip stuff I agree with] > Comment generously, the best modules are an education to read. This one I have mild issues with. Ideally, your code is so clear that it requires no comments to read! And information for users of the code should be in docstrings. If you're implementing a non-obvious algorithm then there's a place for a comment block educating the reader how it works, but I'm leery of anything that might seem to encourage the

    i = i + 1 # add one to i

school of commenting. Cheers, M. -- Our lecture theatre has just crashed. It will currently only silently display an unexplained line-drawing of a large dog accompanied by spookily flickering lights.
-- Dan Sheppard, ucam.chat (from Owen Dunn's summary of the year) From mchermside@ingdirect.com Thu Mar 6 13:09:56 2003 From: mchermside@ingdirect.com (Chermside, Michael) Date: Thu, 6 Mar 2003 08:09:56 -0500 Subject: [Python-Dev] Re: More Zen Message-ID: <7F171EB5E155544CAC4035F0182093F03CF76E@INGDEXCHSANC1.ingdirect.com> Raymond: I particularly like your last zen point: > Be kind on the Usenet some posters are only eleven. I like it for two reasons... one, being that it's an important truth (as recently illustrated ;-)), but secondly that it reminds us that Python is more than a language... it also includes a very strong and helpful community without which all the design principles in the world would never lead to such a successful language. -- Michael Chermside From webmaster@pferdemarkt.ws Thu Mar 6 14:13:07 2003 From: webmaster@pferdemarkt.ws (webmaster@pferdemarkt.ws) Date: Thu, 6 Mar 2003 06:13:07 -0800 Subject: [Python-Dev] Pferdemarkt.ws informs! Newsletter 03/2003 http://www.pferdemarkt.ws Message-ID: <200303061413.GAA26296@eagle.he.net> http://www.pferdemarkt.ws We have made a successful start to the new "horse year" 2003. We would like to thank you for the rapid success of our marketplace. Today, 06.03.2003, we have been online for a good 2 months! Our database grows by 30 new listings every day. As a private individual, you too can put the horses you have for sale on the Internet directly and completely free of charge. To make your listings more visible, you can add up to one picture to your horse ad free of charge! If you want to go straight to the first page, we can recommend our bonus system. Click here: http://www.pferdemarkt.ws/bestellung.html Your http://Pferdemarkt.ws team. Click here to log in directly: http://www.Pferdemarkt.ws List for free, search for free! Directly from private seller to private buyer!
Any further questions? mailto: webmaster@pferdemarkt.ws From jacobs@penguin.theopalgroup.com Thu Mar 6 15:18:48 2003 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Thu, 6 Mar 2003 10:18:48 -0500 (EST) Subject: [Python-Dev] httplib SSLFile broken in CVS Message-ID: Hi all, SourceForge isn't letting me in, so I'm dropping a note here to report that Raymond Hettinger's changes to httplib.py (Rev 1.72 on Wed Feb 26 22:45:18 2003 UTC) have broken the read() method on the SSLFile object. I suspect that he was trying to be clever by adding iterators to code that worked just fine (if not better) without them. Unfortunately, clever code has to be tested. The diff below repairs it, though I'd be just as happy if that part of Rev 1.72 was reverted.

--- httplib.py.orig 2003-03-05 19:37:28.000000000 -0500
+++ httplib.py 2003-03-06 10:11:01.000000000 -0500
@@ -864,13 +864,15 @@
 
     def read(self, size=None):
         L = [self._buf]
+        self._buf = ''
         if size is None:
-            self._buf = ''
             for s in iter(self._read, ""):
                 L.append(s)
-            return "".join(L)
         else:
-            avail = len(self._buf)
+            avail = len(L[0])
+            if avail >= size:
+                self._buf = L[0][size:]
+                return L[0][:size]
             for s in iter(self._read, ""):
                 L.append(s)
                 avail += len(s)
@@ -878,14 +880,19 @@
                     all = "".join(L)
                     self._buf = all[size:]
                     return all[:size]
+        return "".join(L)
 
     def readline(self):
         L = [self._buf]
         self._buf = ''
+        i = L[0].find("\n") + 1
+        if i > 0:
+            self._buf = L[0][i:]
+            return L[0][:i]
         for s in iter(self._read, ""):
             L.append(s)
-            if "\n" in s:
-                i = s.find("\n") + 1
+            i = s.find("\n") + 1
+            if i > 0:
                 self._buf = s[i:]
                 L[-1] = s[:i]
                 break

Regards, -Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From python@rcn.com Thu Mar 6 15:33:21 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 6 Mar 2003 10:33:21 -0500 Subject: [Python-Dev] httplib SSLFile broken in CVS References: Message-ID:
<002301c2e3f5$b84641e0$5c10a044@oemcomputer> I'll put in your SF report and fix it. Raymond ----- Original Message ----- From: "Kevin Jacobs" To: ; "Raymond Hettinger" Sent: Thursday, March 06, 2003 10:18 AM Subject: [Python-Dev] httplib SSLFile broken in CVS > Hi all, > > SourceForge isn't letting me in, so I'm dropping a note here to report that > Raymond Hettinger's changes to httplib.py (Rev 1.72 on Wed Feb 26 22:45:18 > 2003 UTC) have broken the read() method on the SSLFile object. I suspect > that he was trying to be clever by adding iterators to code that worked just > fine (if not better) without them. Unfortunately, clever code has to be > tested. The diff below repairs it, though I'd be just as happy if that part > of Rev 1.72 was reverted. > > --- httplib.py.orig 2003-03-05 19:37:28.000000000 -0500 > +++ httplib.py 2003-03-06 10:11:01.000000000 -0500 > @@ -864,13 +864,15 @@ > > def read(self, size=None): > L = [self._buf] > + self._buf = '' > if size is None: > - self._buf = '' > for s in iter(self._read, ""): > L.append(s) > - return "".join(L) > else: > - avail = len(self._buf) > + avail = len(L[0]) > + if avail >= size: > + self._buf = L[0][size:] > + return L[0][:size] > for s in iter(self._read, ""): > L.append(s) > avail += len(s) > @@ -878,14 +880,19 @@ > all = "".join(L) > self._buf = all[size:] > return all[:size] > + return "".join(L) > > def readline(self): > L = [self._buf] > self._buf = '' > + i = L[0].find("\n") + 1 > + if i > 0: > + self._buf = L[0][i:] > + return L[0][:i] > for s in iter(self._read, ""): > L.append(s) > - if "\n" in s: > - i = s.find("\n") + 1 > + i = s.find("\n") + 1 > + if i > 0: > self._buf = s[i:] > L[-1] = s[:i] > break > > Regards, > -Kevin > > -- > -- > Kevin Jacobs > The OPAL Group - Enterprise Systems Architect > Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com > Fax: (216) 986-0714 WWW: http://www.theopalgroup.com > > > _______________________________________________ > Python-Dev mailing list > 
Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From jacobs@penguin.theopalgroup.com Thu Mar 6 15:40:10 2003 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Thu, 6 Mar 2003 10:40:10 -0500 (EST) Subject: [Python-Dev] httplib SSLFile broken in CVS In-Reply-To: <002301c2e3f5$b84641e0$5c10a044@oemcomputer> Message-ID: On Thu, 6 Mar 2003, Raymond Hettinger wrote: > I'll put in your SF report and fix it. Thanks. Let me know if you'd like me to test any additional changes, since I have a large test suite for my applications that uses httplib+SSL extensively. -Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From python@rcn.com Thu Mar 6 15:52:22 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 6 Mar 2003 10:52:22 -0500 Subject: [Python-Dev] More Zen References: <000f01c2e3d1$561b60a0$125ffea9@oemcomputer> <2mk7fc20bp.fsf@starship.python.net> Message-ID: <003101c2e3f8$60cdf4a0$5c10a044@oemcomputer> From: "Michael Hudson" > > Comment generously, the best modules are an education to read. > > This one I have mild issues with. Ideally, your code is so clear that > it requires no comments to read! And information for users of the > code should be in docstrings. If you're implementing a non-obvious > algorithm then there's a place for a comment block educating the > reader how it works, but I'm leery of anything that might seem to > encourage the "i = i + 1 # add one to i" This ought to be more clear: Reading heapq and timeit makes you smart -- let's comment like that. 
Raymond Hettinger From neal@metaslash.com Thu Mar 6 15:57:06 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 06 Mar 2003 10:57:06 -0500 Subject: [Python-Dev] httplib SSLFile broken in CVS In-Reply-To: References: <002301c2e3f5$b84641e0$5c10a044@oemcomputer> Message-ID: <20030306155706.GE1093@epoch.metaslash.com> On Thu, Mar 06, 2003 at 10:40:10AM -0500, Kevin Jacobs wrote: > > Thanks. Let me know if you'd like me to test any additional changes, since > I have a large test suite for my applications that uses httplib+SSL > extensively. Kevin, Any chance we could get you to augment the regression tests? It would be very helpful. Neal From bbum@codefab.com Thu Mar 6 14:51:55 2003 From: bbum@codefab.com (Bill Bumgarner) Date: Thu, 6 Mar 2003 09:51:55 -0500 Subject: [Python-Dev] xmlrpclib Message-ID: <2C8338BE-4FE3-11D7-AD53-000393877AE4@codefab.com> Is there active work on the xmlrpclib module these days? The HTTPTransport patch/addition should likely go out with 2.3 as it adds easy authentication and proxy support to xmlrpclib. Also, the unicode support in xmlrpclib is broken in that it can't handle subclasses of . b.bum From jacobs@penguin.theopalgroup.com Thu Mar 6 16:49:18 2003 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Thu, 6 Mar 2003 11:49:18 -0500 (EST) Subject: [Python-Dev] httplib SSLFile broken in CVS In-Reply-To: <20030306155706.GE1093@epoch.metaslash.com> Message-ID: On Thu, 6 Mar 2003, Neal Norwitz wrote: > On Thu, Mar 06, 2003 at 10:40:10AM -0500, Kevin Jacobs wrote: > > Thanks. Let me know if you'd like me to test any additional changes, since > > I have a large test suite for my applications that uses httplib+SSL > > extensively. > > Any chance we could get you to augment the regression tests? > It would be very helpful. How many people run the regression suite with 'network' enabled? If nobody does, then it will be a waste of time to add it. 
-Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From skip@pobox.com Thu Mar 6 16:50:48 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 6 Mar 2003 10:50:48 -0600 Subject: [Python-Dev] xmlrpclib In-Reply-To: <2C8338BE-4FE3-11D7-AD53-000393877AE4@codefab.com> References: <2C8338BE-4FE3-11D7-AD53-000393877AE4@codefab.com> Message-ID: <15975.31848.141376.614323@montanaro.dyndns.org> Bill> Is there active work on the xmlrpclib module these days? The Bill> HTTPTransport patch/addition should likely go out with 2.3 as it Bill> adds easy authentication and proxy support to xmlrpclib. Can you provide a SF id? I can't seem to find it. Bill> Also, the unicode support in xmlrpclib is broken in that it can't Bill> handle subclasses of . Does it handle subclasses of str? Skip From python@rcn.com Thu Mar 6 16:55:57 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 6 Mar 2003 11:55:57 -0500 Subject: [Python-Dev] httplib SSLFile broken in CVS References: Message-ID: <009701c2e401$4287ee20$5c10a044@oemcomputer> > How many people run the regression suite with 'network' enabled? If nobody > does, then it will be a waste of time to add it. I *always* run the suite with network enabled and it only takes one person running a suite to detect an error. Also, everyone who makes a change to a network resource should be running the tests with network enabled (at least for that particular change). IOW, it is definitely not a waste of time.
Raymond Hettinger From bbum@codefab.com Thu Mar 6 17:10:42 2003 From: bbum@codefab.com (Bill Bumgarner) Date: Thu, 6 Mar 2003 12:10:42 -0500 Subject: [Python-Dev] xmlrpclib In-Reply-To: <15975.31848.141376.614323@montanaro.dyndns.org> Message-ID: <8FEDD9EF-4FF6-11D7-AD53-000393877AE4@codefab.com> On Thursday, Mar 6, 2003, at 11:50 US/Eastern, Skip Montanaro wrote: > Bill> Is there active work on the xmlrpclib module these days? The > Bill> HTTPTransport patch/addition should likely go out with 2.3 > as it > Bill> adds easy authentication and proxy support to xmlrpclib. > > Can you provide a SF id? I can't seem to find it. It had been closed or moved out of the SF bug queue by Fred about the same time he left python-dev, I believe. I had sent the HTTPTransport source to Fred, but that sounds like a dead end these days. Found it: 648658 > Bill> Also, the unicode support in xmlrpclib is broken in that it > can't > Bill> handle subclasses of . > > Does it handle subclasses of str? I haven't tested, but looking at the implementation, I don't think it will. In my case, I'm using xmlrpclib in the context of a Cocoa/Python based application that frequently uses Objective-C sourced strings as a part of the RPC request. The PyObjC bridge now bridges NSStrings as a subclass of unicode. Currently, the Marshaller class in xmlrpclib builds a simple dictionary of types used to encode raw objects to XML.

    class Marshaller:
        ...
        dispatch = {}
        ...
        def dump_string(self, value, escape=escape):
            self.write("%s\n" % escape(value))
        dispatch[StringType] = dump_string

        if unicode:
            def dump_unicode(self, value, escape=escape):
                value = value.encode(self.encoding)
                self.write("%s\n" % escape(value))
            dispatch[UnicodeType] = dump_unicode
        ...

Where the dump method is:

        def __dump(self, value):
            try:
                f = self.dispatch[type(value)]
            except KeyError:
                raise TypeError, "cannot marshal %s objects" % type(value)
            else:
                f(self, value)

So, no, it doesn't do subclasses properly. The workaround [for me] was easy... and bogus:

    import xmlrpclib
    xmlrpclib.Marshaller.dispatch[type(NSString.stringWithString_(''))] = \
        xmlrpclib.Marshaller.dispatch[type(u'')]

b.bum From jacobs@penguin.theopalgroup.com Thu Mar 6 17:10:52 2003 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Thu, 6 Mar 2003 12:10:52 -0500 (EST) Subject: [Python-Dev] httplib SSLFile broken in CVS In-Reply-To: <009701c2e401$4287ee20$5c10a044@oemcomputer> Message-ID: On Thu, 6 Mar 2003, Raymond Hettinger wrote: > > How many people run the regression suite with 'network' enabled? If nobody > > does, then it will be a waste of time to add it. > > IOW, it is definitely not a waste of time. Great! (Until today I didn't even know how to enable the network resource) I'll submit a patch to test_socket_ssl, since it is already using urllib. -Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From klm@zope.com Thu Mar 6 17:15:09 2003 From: klm@zope.com (Ken Manheimer) Date: Thu, 6 Mar 2003 12:15:09 -0500 (EST) Subject: [Python-Dev] More Zen In-Reply-To: <2mk7fc20bp.fsf@starship.python.net> Message-ID: On Thu, 6 Mar 2003, Michael Hudson wrote: > "Raymond Hettinger" writes: > > [snip stuff I agree with] > > > Comment generously, the best modules are an education to read. > > This one I have mild issues with.
Ideally, your code is so clear that > it requires no comments to read! And information for users of the > code should be in docstrings. If you're implementing a non-obvious > algorithm then there's a place for a comment block educating the > reader how it works, but I'm leery of anything that might seem to > encourage the > > i = i + 1 # add one to i > > school of commenting. I expect that _sometimes_ some code cannot be clear, even on occasions when the algorithm is not, as a whole, particularly abstruse. I agree, though, that unnecessary comments are harmful. How about framing it like this: Comment obscure code, let the obvious speak for itself. -- Ken klm@zope.com From fdrake@acm.org Thu Mar 6 17:19:47 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 6 Mar 2003 12:19:47 -0500 Subject: [Python-Dev] xmlrpclib In-Reply-To: <8FEDD9EF-4FF6-11D7-AD53-000393877AE4@codefab.com> References: <15975.31848.141376.614323@montanaro.dyndns.org> <8FEDD9EF-4FF6-11D7-AD53-000393877AE4@codefab.com> Message-ID: <15975.33587.890152.791815@grendel.zope.com> Bill Bumgarner writes: > It had been closed or moved out of the SF bug queue by Fred about the > same time he left python-dev, I believe. I had sent the > HTTPTransport source to Fred, but that sounds like a dead end these > days. Sorry; I've just been really busy on other things. I'm on python-dev these days, though I skim the messages very quickly. > Found it: 648658 ... > I haven't tested, but looking at the implementation, I don't think it > will. I wish you were wrong on this, but I don't think you are. ;-( > In my case, I'm using xmlrpclib in the context of a Cocoa/Python based > application that frequently uses Objective-C sourced strings as a part > of the RPC request. The PyObjC bridge now bridges NSStrings as a > subclass of unicode. ... > So, no, it doesn't do subclasses properly. The workaround [for me] was > easy... 
and bogus: > > import xmlrpclib > Marshaller.dispatch[type(NSString.stringWithString_(''))] = > Marshaller.dispatch[type(u'')] Yeah, not too pretty. For things like this, where some manner of dispatch is needed based on type, but the type itself doesn't provide some appropriate method, there's a real problem associating the right bit of code, and I'm quite torn as to the right approach to take. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From skip@pobox.com Thu Mar 6 17:35:00 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 6 Mar 2003 11:35:00 -0600 Subject: [Python-Dev] xmlrpclib In-Reply-To: <15975.33587.890152.791815@grendel.zope.com> References: <15975.31848.141376.614323@montanaro.dyndns.org> <8FEDD9EF-4FF6-11D7-AD53-000393877AE4@codefab.com> <15975.33587.890152.791815@grendel.zope.com> Message-ID: <15975.34500.180179.410897@montanaro.dyndns.org> Fred> For things like this, where some manner of dispatch is needed Fred> based on type, but the type itself doesn't provide some Fred> appropriate method, there's a real problem associating the right Fred> bit of code, and I'm quite torn as to the right approach to take. Slower in some cases, but couldn't you walk up the __bases__ chain until you pop off the top or hit a match in the dispatch dict? For stuff like the NSString stuff, perhaps adding a registration function to the marshaller would be appropriate. Of course, there's the problem coming out the other side. Does it matter if you put in a subclass of str and get out a plain old str? Also, xmlrpclib is built to take advantage of a number of speedup helper modules. In production usage I've found it unbearably slow if used without sgmlop, for example. I'd hate to have the default situation work and have it fail if a speedup module was added to the system. Skip From pedronis@bluewin.ch Thu Mar 6 17:21:40 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Thu, 6 Mar 2003 18:21:40 +0100 Subject: [Python-Dev] super() bug (?) 
Message-ID: <009e01c2e404$da3677c0$6d94fea9@newmexico>

>>> class C(object):
...     def f(self): pass
...
>>> class D(C): pass
...
>>> D.f
>>> super(D,D).f
>

I think this should produce the same thing as D.f, that means implementation-wise f.__get__(None,D) should be called not f.__get__(D,D). _.__get__(None,D) would still do the right thing for static AND class methods:

>>> def g(cls): pass
...
>>> classmethod(g).__get__(None,D)
>

From bbum@codefab.com Thu Mar 6 17:48:22 2003 From: bbum@codefab.com (Bill Bumgarner) Date: Thu, 6 Mar 2003 12:48:22 -0500 Subject: [Python-Dev] xmlrpclib: Apology In-Reply-To: <15975.33587.890152.791815@grendel.zope.com> Message-ID: On Thursday, Mar 6, 2003, at 12:19 US/Eastern, Fred L. Drake, Jr. wrote: > I wish you were wrong on this, but I don't think you are. ;-( Fred: I apologize [publicly]..... my bad. I was mistaking you for the other Fred. In any case, let me know what I can do to contribute to the effort. b.bum From fdrake@acm.org Thu Mar 6 18:05:18 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 6 Mar 2003 13:05:18 -0500 Subject: [Python-Dev] Re: xmlrpclib: Apology In-Reply-To: References: <15975.33587.890152.791815@grendel.zope.com> Message-ID: <15975.36318.538470.221747@grendel.zope.com> Bill Bumgarner writes: > On Thursday, Mar 6, 2003, at 12:19 US/Eastern, Fred L. Drake, Jr. wrote: > > I wish you were wrong on this, but I don't think you are. ;-( > > Fred: I apologize [publicly]..... my bad. I was mistaking you for > the other Fred. Aha! Yeah, I think both Fredrik and myself have been transmogrified into black holes, even though you meant Fredrik. > In any case, let me know what I can do to contribute to the effort. Well, patches are cool. ;-) I have a number of other things that need to be dealt with in other projects still, though I'm glad to help out with xmlrpclib. (There are definitely some Expat matters to deal with.) -Fred -- Fred L. Drake, Jr.
PythonLabs at Zope Corporation From tim.one@comcast.net Thu Mar 6 18:42:33 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 06 Mar 2003 13:42:33 -0500 Subject: [Python-Dev] httplib SSLFile broken in CVS In-Reply-To: Message-ID: [Kevin Jacobs] > How many people run the regression suite with 'network' enabled? FYI, I always do on Windows, unless I'm running tests on an unconnected laptop. From erik@pythonware.com Thu Mar 6 18:59:28 2003 From: erik@pythonware.com (erik heneryd) Date: Thu, 06 Mar 2003 19:59:28 +0100 Subject: [Python-Dev] xmlrpclib: Apology In-Reply-To: References: Message-ID: <3E679A90.7030403@pythonware.com> Bill Bumgarner wrote: > Fred: I apologize [publically]..... my bad. I was mistaking you for > the other Fred. In Sweden, Fred is not a very common nickname for Fredrik. In fact I don't know if I ever heard of a Fred-Fredrik. This includes the bot himself, who just smiles when someone calls him Fred (or Frederick or something). :-) Erik From bbum@codefab.com Thu Mar 6 18:55:59 2003 From: bbum@codefab.com (Bill Bumgarner) Date: Thu, 6 Mar 2003 13:55:59 -0500 Subject: [Python-Dev] xmlrpclib: Apology In-Reply-To: <3E679A90.7030403@pythonware.com> Message-ID: <44DEFB74-5005-11D7-BF05-000393877AE4@codefab.com> On Thursday, Mar 6, 2003, at 13:59 US/Eastern, erik heneryd wrote: > In Sweden, Fred is not a very common nickname for Fredrik. In fact I > don't know if I ever heard of a Fred-Fredrik. This includes the bot > himself, who just smiles when someone calls him Fred (or Frederick or > something). :-) I'm just a hillbilly from the midwest of the US... I wouldn't know. 
;-) From jeremy@zope.com Thu Mar 6 20:14:17 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 06 Mar 2003 15:14:17 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: <3E635BD3.9000107@algroup.co.uk> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> Message-ID: <1046981657.15348.80.camel@slothrop.zope.com> On Mon, 2003-03-03 at 08:42, Ben Laurie wrote: > > I think the fundamental problem for rexec is that you don't have a > > security kernel. The code for security gets scatter throughout the > > interpreter. It's hard to have much assurance in the security when > > its tangled up with everything else in the language. > > > > You can use a proxy for an object to deal with goal #1 above -- > > enforce an interface for an object. I think about this much like a > > hardware capability architecture. The protected objects live in the > > capability segment and regular code can't access them directly. The > > only access is via a proxy object that is bound to the capability. > > > > Regardless of proxy vs. rexec, I'd be interested to hear what you > > think about a sound way to engineer a secure Python. > > I'm told that proxies actually rely on rexec, too. So, I guess whichever > approach you take, you need rexec. > > The problem is that although you can think about proxies as being like a > segmented architecture, you have to enforce that segmentation. And that > means doing so throughout the interpreter, doesn't it? I suppose it > might be possible to abstract things in some way to make that less > widespread, but probably not without having an adverse impact on speed. The boundary between the interpreter and the proxy is the generic type object API. The Python code does not know anything about the representation of a proxy object, except that it is a PyObject *. 
As a result, the only way to invoke operations on it is to go through the various APIs in the type object's table of function pointers. There are surely limits to how far the separation can go. I expect you can't inherit from a proxy for a class, such that the base class is in a different protection domain than the subclass. But I think there are fewer ad hoc restrictions than there are in rexec. I think this provides a pretty clean separation of concerns, even if the proxy object were a standard part of Python. The only code that should manipulate the proxy representation is its implementation. The only other step would be to convince yourself that Python does not inspect arbitrary parts of a concrete PyObject * in an unsafe way. Jeremy From ben@algroup.co.uk Fri Mar 7 14:21:40 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Fri, 07 Mar 2003 14:21:40 +0000 Subject: [Python-Dev] Capabilities in Python In-Reply-To: <1046981657.15348.80.camel@slothrop.zope.com> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <1046981657.15348.80.camel@slothrop.zope.com> Message-ID: <3E68AAF4.3060508@algroup.co.uk> Jeremy Hylton wrote: > On Mon, 2003-03-03 at 08:42, Ben Laurie wrote: > >>>I think the fundamental problem for rexec is that you don't have a >>>security kernel. The code for security gets scatter throughout the >>>interpreter. It's hard to have much assurance in the security when >>>its tangled up with everything else in the language. >>> >>>You can use a proxy for an object to deal with goal #1 above -- >>>enforce an interface for an object. I think about this much like a >>>hardware capability architecture. The protected objects live in the >>>capability segment and regular code can't access them directly. The >>>only access is via a proxy object that is bound to the capability. >>> >>>Regardless of proxy vs.
rexec, I'd be interested to hear what you >>>think about a sound way to engineer a secure Python. >> >>I'm told that proxies actually rely on rexec, too. So, I guess whichever >>approach you take, you need rexec. >> >>The problem is that although you can think about proxies as being like a >>segmented architecture, you have to enforce that segmentation. And that >>means doing so throughout the interpreter, doesn't it? I suppose it >>might be possible to abstract things in some way to make that less >>widespread, but probably not without having an adverse impact on speed. > > > The boundary between the interpreter and the proxy is the generic type > object API. The Python code does not know anything about the > representation of a proxy object, except that it is a PyObject *. As a > result, the only way to invoke operations on its is to go through the > various APIs in the type object's table of function pointers. > > There are surely limits to how far the separation can go. I expect you > can't inherit from a proxy for a class, such that the base class is in a > different protection domain than the subclass. But I think there are > fewer ad hoc restrictions than there are in rexec. > > I think this provides a pretty clean separation of concerns, even if the > proxy object were a standard part of Python. The only code that should > manipulate the proxy representation is its implementation. The only > other step would be to convince yourself that Python does not inspect > arbitrary parts of a concrete PyObject * in an unsafe way. I'm obviously missing something - surely you can say pretty much exactly the same thing about a bound method, just replace "type object" with "PyMethodObject"? And in either case, you also need to restrict access to the underlying libraries and (presumably) some of the builtin functions? BTW, Guido pointed out to me that I'm causing confusion by saying "rexec" when I really mean "restricted execution". 
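[Editor's sketch] To make the bound-method versus proxy comparison in this message concrete, here is a minimal sketch in present-day Python. Registry and ReadOnlyProxy are hypothetical names invented for illustration; nothing here comes from rexec or from any proxy implementation discussed on the list. It is also not secure as written: in current CPython a bound method exposes the underlying object through __self__, and name mangling hides nothing. It only shows the shape of the two designs that a restricted-execution mode would have to protect.

```python
class Registry(object):
    """Hypothetical protected object (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def keys(self):
        return sorted(self._data)

    def set(self, key, value):
        self._data[key] = value


class ReadOnlyProxy(object):
    """Forward only an allowed group of method names to the target."""

    def __init__(self, target, allowed):
        self.__target = target              # name-mangled, not truly hidden
        self.__allowed = frozenset(allowed)

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. every
        # method the caller tries to reach on the wrapped object.
        if name in self.__allowed:
            return getattr(self.__target, name)
        raise AttributeError("%r is not exposed by this proxy" % name)


registry = Registry()
registry.set("port", 8080)

# Capability style: possession of one bound method grants one operation.
read_port = registry.get
print(read_port("port"))

# Proxy style: one object grants a whole group of operations.
reader = ReadOnlyProxy(registry, ["get", "keys"])
print(reader.get("port"), reader.keys())
```

Handing out registry.get grants exactly one operation, while the proxy grants a named group at once; in both cases the holder never receives a direct reference to the registry itself, at least in intent.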
In short, it seems to me that proxies and capabilities via bound methods both do the same basic thing: i.e. prevent inspection of what is behind the capability/proxy. Proxies add access control to decide whether you get to use them or not, whereas in a capability system simple possession of the capability is sufficient (i.e. they are like a proxy where the security check always says "yes"). You do access control using capabilities, instead of inside them. Am I not understanding proxies? Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From guido@python.org Fri Mar 7 17:41:16 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Mar 2003 12:41:16 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: "Your message of Fri, 07 Mar 2003 15:42:13 GMT." <3E68BDD5.5020608@algroup.co.uk> Message-ID: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> [Moving a discussion about capabilities to where it arguably belongs] [Ben Laurie] > The point about capabilities is that mere possession of a capability is > all that is required to exercise it. If you start adding security > checkers to them, then you don't have capabilities anymore. But the > point is somewhat deeper than that - given capabilities, you can > implement proxies without requiring any more infrastructure - you can > also implement security schemes that don't really correspond to any kind > of security checking at all (ok, you can probably find some convoluted > way to achieve the same effect, but I'll bet it comes down to having > tokens that correspond to proxies, and security checkers that allow you > to proceed if you have the appropriate token - in other words, > capabilities, but very hard to use).
> > So, it seems to me, it's simpler and more powerful to start with > capabilities and build proxies on top of them (or whatever alternate > scheme you want to build). > > Once more, my apologies for not just getting straight to the point. > > BTW, if you would like to explain why you don't think bound methods are > the way to go on python-dev, I'd love to hear it. It seems to me a matter of convenience. Often objects have many methods to which you want to provide access as a group. E.g. I might have a service configuration registry object. The object behaves roughly like a dictionary. A certain user may be given read-only access to the registry. Using capabilities, I would have to hand her a bunch of capabilities for various methods: __getitem__, has_key, get, keys, items, values, and many more. Using proxies I can simply give her a read-only proxy for the object. So proxies are more powerful. Before you start saying that we should use capabilities as the more fundamental mechanism and build proxies on top of that: as you point out, we already have an equivalent more fundamental mechanism, bound methods, which is equivalent to capabilities. It's just that raw capabilities aren't very usable, so one way or another we've got to build something on top of that. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 7 19:42:12 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Mar 2003 14:42:12 -0500 Subject: [Python-Dev] super() bug (?) In-Reply-To: "Your message of Thu, 06 Mar 2003 18:21:40 +0100." <009e01c2e404$da3677c0$6d94fea9@newmexico> References: <009e01c2e404$da3677c0$6d94fea9@newmexico> Message-ID: <200303071942.h27JgCq23992@pcp02138704pcs.reston01.va.comcast.net> [Samuele] > >>> class C(object): > ... def f(self): pass > ... > >>> class D(C): pass > ... > >>> D.f > > >>> super(D,D).f > > > > I think this should produce the same thing as D.f, Really? It makes no sense either way though.
super(D, D) only makes sense from inside a class method; there the first argument should be the current class and the second should be the cls argument to the class method, e.g.: class C(object): def cm(cls): pass cm = classmethod(cm) class D(C): def cm(cls): super(D, cls).cm() # ~Same as C.cm(cls) And this works. I should also mention that super() should really only be used to call a method with the same name as the currently called method -- I see no use case for using super() with another method. > that means implementation-wise > > f.__get__(None,D) should be called > > not f.__get__(D,D). > > _.__get__(None,D) would still do the right thing for static AND class methods: > > >>> def g(cls): pass > ... > >>> classmethod(g).__get__(None,D) > > It shouldn't be terribly hard to detect this situation and fix it (somewhere in super_init()) but unless you have a use case I'd rather consider this as a "don't care" situation. --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis@bluewin.ch Fri Mar 7 19:47:18 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 7 Mar 2003 20:47:18 +0100 Subject: [Python-Dev] super() bug (?) References: <009e01c2e404$da3677c0$6d94fea9@newmexico> <200303071942.h27JgCq23992@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <02d701c2e4e2$5d214b00$6d94fea9@newmexico> From: "Guido van Rossum" > [Samuele] > > >>> class C(object): > > ... def f(self): pass > > ... > > >>> class D(C): pass > > ... > > >>> D.f > > > > >>> super(D,D).f > > > > > > > I think this should produce the same thing as D.f, > > Really? It makes no sense either way though. It was sloppy phrased, super(D,D).f should return the same value as C.f, that means an unbound method like D.f. 
> super(D, D) only makes > sense from inside a class method; there the first argument should be > the current class and the second should be the cls argument to the > class method, e.g.: > > class C(object): > def cm(cls): pass > cm = classmethod(cm) > > class D(C): > def cm(cls): > super(D, cls).cm() # ~Same as C.cm(cls) you mean C.cm() > And this works. > > I should also mention that super() should really only be used to call > a method with the same name as the currently called method -- I see > no use case for using super() with another method. > > It shouldn't be terribly hard to detect this situation and fix it > (somewhere in super_init()) but unless you have a use case I'd > rather consider this as a "don't care" situation. > no, but passing D,D even to classmethods __get__ is working but conceptually bogus, btw the clarifying example Python impl for super semantics at http://www.python.org/2.2.2/descrintro.html#cooperation is broken wrt to classmethods. From tim.one@comcast.net Fri Mar 7 21:03:36 2003 From: tim.one@comcast.net (Tim Peters) Date: Fri, 07 Mar 2003 16:03:36 -0500 Subject: [Python-Dev] test_popen broken on Win2K Message-ID: Someone changed test_popen to "quote" the path to python: cmd = '"%s" -c "import sys;print sys.argv" %s' % (sys.executable, cmdline) ^ ^ The double-quote characters above the carets are new. This causes test_popen to fail on Win2K, but not on Win98. The relevant difference appears to be the default shell (cmd.exe on the former, command.com on the latter). 
Simplified example, on Win2K:

    >>> p = os.popen('python -c "print 666"')
    >>> p.read()
    '666\n'
    >>> p.close()
    >>>

Worked fine, but doesn't if python is quoted:

    >>> p = os.popen('"python" -c "print 666"')
    >>> p.read()
    ''
    >>> p.close()
    1
    >>>

The same kind of behavior can be observed directly from a DOS-box prompt:

    C:\Code\python\PCbuild>cmd /c python -c "print 666"
    666

    C:\Code\python\PCbuild>

Worked fine, but quoting the program name flops:

    C:\Code\python\PCbuild>cmd /c "python" -c "print 666"
    'python" -c "print' is not recognized as an internal or external command,
    operable program or batch file.

    C:\Code\python\PCbuild>

So it looks like it stripped off the first and last double-quote characters, leaving two senseless double-quote characters "in the middle".

From the docs for cmd.exe:

"""
If /C or /K is specified, then the remainder of the command line after the switch is processed as a command line, where the following logic is used to process quote (") characters:

1. If all of the following conditions are met, then quote characters on the command line are preserved:

   - no /S switch
   - exactly two quote characters
   - no special characters between the two quote characters, where special is one of: &<>()@^|
   - there are one or more whitespace characters between the two quote characters
   - the string between the two quote characters is the name of an executable file.

2. Otherwise, old behavior is to see if the first character is a quote character and if so, strip the leading character and remove the last quote character on the command line, preserving any text after the last quote character.
"""

We're apparently in case #2, if for no other reason than that there aren't "exactly two quote characters".

I'll check in a hack to worm around this in the test, but anyone who can do better, please do (I won't have access to a Win2K box next week).
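
[Editorial sketch] The two cases in the quoted cmd.exe documentation can be modelled roughly as follows, in present-day Python. This is only a reading of the documentation text above, not a description of cmd.exe's real parser. The names effective_command, s_switch and is_executable are made up for illustration, and is_executable is a hypothetical stand-in for whatever check cmd.exe performs for "the name of an executable file".

```python
SPECIAL_CHARS = set('&<>()@^|')


def effective_command(rest, s_switch=False, is_executable=lambda name: False):
    """Return the text that would be parsed after `cmd /c`, per the quoted docs.

    Models only the two documented cases; `is_executable` is a hypothetical
    callback standing in for the "names an executable file" condition.
    """
    quote_positions = [i for i, ch in enumerate(rest) if ch == '"']
    if not s_switch and len(quote_positions) == 2:
        first, last = quote_positions
        inner = rest[first + 1:last]
        if (not SPECIAL_CHARS & set(inner)
                and any(ch.isspace() for ch in inner)
                and is_executable(inner)):
            return rest  # case 1: quote characters are preserved
    if rest.startswith('"'):
        # case 2: drop the leading quote and the last quote on the line,
        # keeping any text that follows that last quote
        last = rest.rindex('"')
        if last == 0:
            return rest[1:]
        return rest[1:last] + rest[last + 1:]
    return rest


print(effective_command('"python" -c "print 666"'))
print(effective_command('python -c "print 666"'))
```

Under this model the quoted-program command loses its outermost quotes, which lines up with the 'python" -c "print' fragment in the error message, while the unquoted form passes through unchanged.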
From nas@python.ca Fri Mar 7 21:21:18 2003 From: nas@python.ca (Neil Schemenauer) Date: Fri, 7 Mar 2003 13:21:18 -0800 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: References: Message-ID: <20030307212117.GG13770@glacier.arctrix.com> Tim Peters wrote: > Someone changed test_popen to "quote" the path to python: > > cmd = '"%s" -c "import sys;print sys.argv" %s' % (sys.executable, > cmdline) > ^ ^ > > The double-quote characters above the carets are new. Having to quote arguments to popen and system is a pet peeve of mine. 99% of the time I do not want or need the shell. Is it possible to write versions of system() and popen() that do not use the shell on Windows? I know it's possible on Unix systems. It would be really nice if both popen() and system() could take a sequence for the command and arguments in addition to a string. Neil From theller@python.net Fri Mar 7 21:48:29 2003 From: theller@python.net (Thomas Heller) Date: 07 Mar 2003 22:48:29 +0100 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: References: Message-ID: Tim Peters writes: > Someone changed test_popen to "quote" the path to python: > > cmd = '"%s" -c "import sys;print sys.argv" %s' % (sys.executable, > cmdline) > ^ ^ > > The double-quote characters above the carets are new. > > This causes test_popen to fail on Win2K, but not on Win98. The relevant > difference appears to be the default shell (cmd.exe on the former, > command.com on the latter). In distutils we had a similar problem. I don't remember the details at the moment exactly, but I think enclosing sys.executable in double quotes *only* when it contains spaces should do the trick. Thomas From tim.one@comcast.net Fri Mar 7 22:02:23 2003 From: tim.one@comcast.net (Tim Peters) Date: Fri, 07 Mar 2003 17:02:23 -0500 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: Message-ID: [Thomas Heller] > ... > In distutils we had a similar problem.
I don't remember the details > at the moment exactly, but I think enclosing sys.executable in double > quotes *only* when it contains spaces should do the trick. That's what I checked in, but doubt it works in general. The cmdline test_popen passes to cmd.exe would have 4 double-quote characters then, and the docs I quoted clearly say it falls into the second case then (so it would strip the first and last quotes, leaving the second and third, which don't make sense anymore). The trick would work if the executable path were the only quoted thing on the cmdline. From tim.one@comcast.net Fri Mar 7 22:16:21 2003 From: tim.one@comcast.net (Tim Peters) Date: Fri, 07 Mar 2003 17:16:21 -0500 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: <20030307212117.GG13770@glacier.arctrix.com> Message-ID: [Neil Schemenauer] > Having to quote arguments to popen and system is a pet peave of mine. > 99% of the time I don't not want or need the shell. Is it possible to > write versions of system() and popen() that do not use the shell on > Windows? I know it's possible on Unix systems. It would be really nice > if both popen() and system() could take a sequence for the command and > arguments in addition to a string. Those would be quite different functions, then, unless you proposed to have Python interpret native shell metacharacters on its own too (e.g., set up pipes, do the indicated file redirections, interpolate envars, and fake whatever other shell gimmicks people may use). The spawn family of functions take a list of arguments and are sometimes more convenient. IIRC, though, on Windows the MS spawn implementation pastes them back into a cmdline, and then you get some *really* bizarre quoting problems. I always thought Tcl's "exec" cmd was worthy of stealing. 
That defines a sh-like syntax for specifying OS commands, but arranges to interpret them the same way on all platforms (so, e.g, "2>&1" redirects stderr to stdout even on Win95; last I looked, there were thousands of lines in the Tcl implementation devoted to making this command work). From altis@semi-retired.com Fri Mar 7 22:31:21 2003 From: altis@semi-retired.com (Kevin Altis) Date: Fri, 7 Mar 2003 14:31:21 -0800 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: Message-ID: > From: Thomas Heller > > Tim Peters writes: > > > Someone changed test_popen to "quote" the path to python: > > > > cmd = '"%s" -c "import sys;print sys.argv" %s' % (sys.executable, > > cmdline) > > ^ ^ > > > > The double-quote characters above the carets are new. > > > > This causes test_popen to fail on Win2K, but not on Win98. The relevant > > difference appears to be the default shell (cmd.exe on the former, > > command.com on the latter). > > In distutils we had a similar problem. I don't remember the details > at the moment exactly, but I think enclosing sys.executable in double > quotes *only* when it contains spaces should do the trick. My example isn't for popen, but this sounds familiar. There are a few places where I had to do things like this for some Win98 folks that installed 'Python22' into 'C:\Program Files\' instead of at 'C:\' if ' ' in sys.executable: python = '"' + sys.executable + '"' else: python = sys.executable os.spawnv(os.P_NOWAIT, python, [python, filename]) there have also been some quote issues with the arguments like filename and I'm still not sure all the cases on various versions of Windows, Mac OS X, and Linux work correctly all the time. 
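
[Editorial sketch] For comparison, later Python releases added the subprocess module, which takes the command as a sequence and does not go through a shell by default, so a space in sys.executable needs no manual quoting. That module did not exist in the Python versions discussed in this thread. A minimal sketch, assuming Python 3.7 or newer; the helper name run_python_snippet is made up for illustration.

```python
import subprocess
import sys


def run_python_snippet(code):
    """Run `code` in a child interpreter without going through a shell."""
    # The command is a list of arguments, so no quoting of the
    # interpreter path is needed even if it contains spaces.
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


print(run_python_snippet("print(666)"), end="")
```

How the argument list is turned back into a command line on Windows is left to the library here; this sketch says nothing about the spawn-family quoting problems mentioned in the thread.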
David Ascher is on vacation, otherwise he could tell us all about the process.py module and how it relates to these issues :) ka From dave@boost-consulting.com Sat Mar 8 02:04:42 2003 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 07 Mar 2003 21:04:42 -0500 Subject: [Python-Dev] [2.3a2+] Change in int() behavior Message-ID: The following change in behavior is causing one of my tests to fail. Is it intentional? Should isinstance(int(x),int) really ever return False? $ python Python 2.2.2 (#1, Feb 3 2003, 14:10:37) [GCC 3.2 20020927 (prerelease)] on cygwin Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> int(sys.maxint * 2) Traceback (most recent call last): File "", line 1, in ? OverflowError: long int too large to convert to int >>> exit 'Use Ctrl-D (i.e. EOF) to exit.' >>> dave@penguin ~ $ /usr/local/pydebug/bin/python Python 2.3a2+ (#1, Feb 24 2003, 15:02:10) [GCC 3.2 20020927 (prerelease)] on cygwin Type "help", "copyright", "credits" or "license" for more information. >>> import sys [17514 refs] >>> int(sys.maxint * 2) 4294967294L [17613 refs] >>> -- Dave Abrahams Boost Consulting www.boost-consulting.com From tim.one@comcast.net Sat Mar 8 04:06:53 2003 From: tim.one@comcast.net (Tim Peters) Date: Fri, 07 Mar 2003 23:06:53 -0500 Subject: [Python-Dev] [2.3a2+] Change in int() behavior In-Reply-To: Message-ID: [David Abrahams] > The following change in behavior is causing one of my tests to fail. Dear Lord, another buggy test . > Is it intentional? Yes, as part of the ongoing push toward int/long unification. If you tried the same test in Python 2.1, it would have blown up in the "sys.maxint * 2" part. In 2.2, it blows up in the "int()" part. In 2.3, it doesn't blow up at all. In 2.4 or 2.5, __builtin__.int and __builtin__.long may well be the same object. > Should isinstance(int(x),int) really ever return False? In 2.3, yes (albeit unfortunately). 
> Python 2.2.2 (#1, Feb 3 2003, 14:10:37) > >>> int(sys.maxint * 2) > Traceback (most recent call last): > File "", line 1, in ? > OverflowError: long int too large to convert to int > Python 2.3a2+ (#1, Feb 24 2003, 15:02:10) > >>> int(sys.maxint * 2) > 4294967294L Adding one more: Python 2.1.3 (#35, Apr 8 2002, 17:47:50) [MSC 32 bit (Intel)] on win32 >>> int(sys.maxint * 2) Traceback (most recent call last): File "", line 1, in ? OverflowError: integer multiplication From dave@boost-consulting.com Sat Mar 8 05:27:07 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sat, 08 Mar 2003 00:27:07 -0500 Subject: [Python-Dev] [2.3a2+] Change in int() behavior In-Reply-To: (Tim Peters's message of "Fri, 07 Mar 2003 23:06:53 -0500") References: Message-ID: Tim Peters writes: > [David Abrahams] >> The following change in behavior is causing one of my tests to fail. > > Dear Lord, another buggy test . > >> Is it intentional? > > Yes, as part of the ongoing push toward int/long unification. If you tried > the same test in Python 2.1, it would have blown up in the "sys.maxint * 2" > part. In 2.2, it blows up in the "int()" part. In 2.3, it doesn't blow up > at all. In 2.4 or 2.5, __builtin__.int and __builtin__.long may well be the > same object. Yes, but in the meantime, PyInt_AS_LONG( invoke_int_conversion(x) ) might be a crash instead of raising an exception. That's what is causing my test to fail. I guess I just need to lowercase a few characters, but it's worth noting that this change breaks existing extension module code. -- Dave Abrahams Boost Consulting www.boost-consulting.com From tim.one@comcast.net Sat Mar 8 06:28:54 2003 From: tim.one@comcast.net (Tim Peters) Date: Sat, 08 Mar 2003 01:28:54 -0500 Subject: [Python-Dev] [2.3a2+] Change in int() behavior In-Reply-To: Message-ID: [David Abrahams] > Yes, but in the meantime, PyInt_AS_LONG( invoke_int_conversion(x) ) > might be a crash instead of raising an exception. 
As the comment before that macro's definition says, /* Macro, trading safety for speed */ PyInt_AsLong() won't crash, but its result needs to be checked for an error return. Note that your example: PyInt_AS_LONG( invoke_int_conversion(x) ) wasn't safe before either: I'm not sure what invoke_int_conversion(x) means exactly, but the plausible meanings I can think of for it *could* yield a NULL pointer, or a pointer to a non-int object, in any version of Python (e.g., PyNumber_Int() calls tp_as_number->nb_int but doesn't check the return value for sanity). In either of those cases PyInt_AS_LONG blindly applied to the result could crash. > That's what is causing my test to fail. I guess I just need to lowercase > a few characters, You also need to check for an error return -- PyInt_AsLong() can fail. > but it's worth noting that this change breaks existing > extension module code. I'm not sure it can break any I wouldn't have considered broken before. It's normal (& expected) not to use the macro unless you *know* you've got an int object, usually by virtue of passing a PyInt_Check() test first. From dave@boost-consulting.com Sat Mar 8 07:27:26 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sat, 08 Mar 2003 02:27:26 -0500 Subject: [Python-Dev] [2.3a2+] Change in int() behavior In-Reply-To: (Tim Peters's message of "Sat, 08 Mar 2003 01:28:54 -0500") References: Message-ID: Tim Peters writes: > [David Abrahams] >> Yes, but in the meantime, PyInt_AS_LONG( invoke_int_conversion(x) ) >> might be a crash instead of raising an exception. > > As the comment before that macro's definition says, > > /* Macro, trading safety for speed */ > > PyInt_AsLong() won't crash, but its result needs to be checked for an error > return. 
> > Note that your example: > > PyInt_AS_LONG( invoke_int_conversion(x) ) > > wasn't safe before either: I'm not sure what invoke_int_conversion(x) means > exactly, but the plausible meanings I can think of for it *could* yield a > NULL pointer, or a pointer to a non-int object, in any version of > Python Yeah, actually invoke_int_conversion basically invoked the nb_int slot, and I _was_ doing the NULL check (don't forget, C++ has exceptions). > (e.g., PyNumber_Int() calls tp_as_number->nb_int but doesn't check the > return value for sanity). In either of those cases PyInt_AS_LONG blindly > applied to the result could crash. > > >> That's what is causing my test to fail. I guess I just need to lowercase >> a few characters, > > You also need to check for an error return -- PyInt_AsLong() can fail. I knew that, but thanks. I was exaggerating for effect; did it, it works. >> but it's worth noting that this change breaks existing >> extension module code. > > I'm not sure it can break any I wouldn't have considered broken before. > It's normal (& expected) not to use the macro unless you *know* you've got > an int object, usually by virtue of passing a PyInt_Check() test first. I guess I was reading http://www.python.org/doc/current/ref/numeric-types.html#l2h-184 a little too strongly when I wrote the code: __complex__(self) __int__(self) __long__(self) __float__(self) Called to implement the built-in functions complex() int() long() and float(). Should return a value of the appropriate type. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ That's pretty weak language, and even if it were strong I should've known better. Some joker would eventually add an __int__ method, that returns, say, a Long. I just didn't expect it to happen in the core (for some reason). 
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From newsgroups1@bitfurnace.com Sat Mar 8 08:26:13 2003
From: newsgroups1@bitfurnace.com (Damien Morton)
Date: Sat, 8 Mar 2003 03:26:13 -0500
Subject: [Python-Dev] acceptability of asm in python code?
Message-ID:

In the BINARY_ADD opcode, and in most arithmetic opcodes, there is a line that checks for overflow that looks like this:

    if ((i^a) < 0 && (i^b) < 0) goto slow_add;

I got a small speedup by replacing this with a macro defined thusly:

    #if defined(_MSC_VER) && defined(_M_IX86)
    #define IF_OVERFLOW_GOTO(X) __asm { jo X };
    #else
    #define IF_OVERFLOW_GOTO(X) if ((i^a) < 0 && (i^b) < 0) goto X;
    #endif

Would this case be an acceptable use of snippets of inline assembler?

From martin@v.loewis.de Sat Mar 8 11:43:22 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: Sat, 8 Mar 2003 12:43:22 +0100
Subject: [Python-Dev] Internationalizing domain names
Message-ID: <200303081143.h28BhMTQ002892@mira.informatik.hu-berlin.de>

IETF has recently published a series of RFCs to support non-ASCII characters in domain names. This is called IDNA, Internationalizing Domain Names in Applications. It works by applications converting Unicode domain names into ASCII ones (using an ACE, ASCII compatible encoding), which are then sent to the DNS.

I have implemented this technology for Python, and would like to see it included in Python 2.3. It consists of the following pieces:

- Tools/unicode/mkstringprep.py, which generates Lib/stringprep.py from the source of RFC 3454,
- Lib/encodings/punycode.py, patch 632643, implementing RFC 3492,
- Lib/encodings/idna.py, implementing both RFC 3491 (nameprep) and RFC 3490 (idna)
- modifications to the socket module, to accept Unicode for host names, and convert it using IDNA.
- various test cases

Changes to httplib, ftplib, etc are not necessary, as they just pass the host names through to the socket calls.
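
[Editorial sketch] The Unicode-to-ACE conversion described above can be tried with the "idna" and "punycode" codecs found in present-day Python's standard library. Whether those codecs behave exactly like the patch under review here is an assumption. The host name and the helper names to_ace and from_ace are made up for illustration.

```python
def to_ace(hostname):
    """Convert a Unicode host name to its ASCII-compatible encoding."""
    return hostname.encode("idna")


def from_ace(ace):
    """Convert an ASCII-compatible host name back to Unicode."""
    return ace.decode("idna")


host = "b\u00fccher.example"  # "bücher.example", a made-up host name
ace = to_ace(host)

print(ace)
print(from_ace(ace))
# The punycode codec works on a single label and adds no "xn--" prefix.
print("b\u00fccher".encode("punycode"))
```

Each label is converted separately, and only labels containing non-ASCII characters get the "xn--" prefix.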
I have no changes to the urllib* modules, as the work on IRIs (internationalized resource identifiers) is still in progress. As the result, if one puts non-ASCII into just the hostname part of an URL, urllib will do the right thing; urllib2 will complain about the non-ASCII characters. Would anybody like to review these changes? Regards, Martin From ben@algroup.co.uk Sat Mar 8 12:27:40 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Sat, 08 Mar 2003 12:27:40 +0000 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E69E1BC.5090508@algroup.co.uk> Guido van Rossum wrote: > [Moving a discussion about capabilities to where it arguably belongs] > > [Ben Laurie] > >>The point about capabilities is that mere possession of a capability is >>all that is required to exercise it. If you start adding security >>checkers to them, then you don't have capabilities anymore. But the >>point is somewhat deeper that than - given capabilities, you can >>implement proxies without requiring any more infrastructure - you can >>also implement security schemes that don't really correspond to any kind >>of security checking at all (ok, you can probably find some convoluted >>way to achieve the same effect, but I'll bet it comes down to having >>tokens that correspond to proxies, and security checkers that allow you >>to proceed if you have the appropriate token - in other words, >>capabilities, but very hard to use). >> >>So, it seems to me, its simpler and more powerful to start with >>capabilities and build proxies on top of them (or whatever alternate >>scheme you want to build). >> >>Once more, my apologies for not just getting straight to the point. >> >>BTW, if you would like to explain why you don't think bound methods are >>the way to go on python-dev, I'd love to hear it. > > > It seems to e a matter of convenience. 
Often objects have many > methods to which you want to provide access as a group. E.g. I might > have a service configuration registry object. The object behaves > roughly like a dictionary. A certain user may be given read-only > access to the registry. Using capabilities, I would have to hand her > a bunch of capabilities for various methods: __getitem__, has_key, > get, keys, items, values, and many more. Using proxies I can simply > give her a read-only proxy for the object. So proxies are more > powerful. > > Before you start saying that we should use capabilities as the more > fundamental mechanism and build proxies on top of that: as you point > out, we already have an equivalent more fundamental mechanism, bound > methods, which is equivalent to capabilities. It's just that raw > capabilities aren't very usable, so one way or another we've got to > build something on top of that. I'm not trying to persuade you that capabilities are better than proxies. I'd prefer to build on them, and it seems you'd prefer to do it another way. That's fine with me - my goal is to make capabilities both possible and easily usable in Python, not to persuade everyone to use them (yet ;-). Bound methods are not capabilities unless they are secured. It seems the correct way to do this is to use restricted execution, and perhaps some other tricks. What I am trying to nail down is exactly what needs doing to get us from where we are now to where capabilities actually work. As I understand it, what is needed is: a) Fix restricted execution, which is in a state of disrepair b) Override import, open (and other stuff? what?) c) Wrap or replace some of the existing libraries, certify that others are "safe" It looks to me like a and b are shared with proxies, and c would be different, by definition. Is there anything else? Am I on the wrong track? I am going to write this all up into a document which can be used as a starting point for work to complete this. Cheers, Ben. 
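
[Editorial sketch] The registry example from the quoted text can be written out in present-day Python to show both styles side by side. All class and function names here (Registry, read_capabilities, ReadOnlyProxy) are made up for illustration. As the message above notes, neither style is secured in plain Python: the last line shows that the underlying object remains reachable from both.

```python
class Registry:
    """A toy configuration registry that behaves roughly like a dict."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def keys(self):
        return list(self._data)


def read_capabilities(registry):
    """Capability style: hand out one bound method per permitted operation."""
    return {
        "getitem": registry.__getitem__,
        "get": registry.get,
        "keys": registry.keys,
    }


class ReadOnlyProxy:
    """Proxy style: one object that forwards only the read operations."""

    _ALLOWED = frozenset({"get", "keys"})

    def __init__(self, target):
        self._target = target

    def __getitem__(self, key):
        return self._target[key]

    def __getattr__(self, name):
        # Only reached for attributes that normal lookup did not find.
        if name in self._ALLOWED:
            return getattr(self._target, name)
        raise AttributeError(name)


registry = Registry()
registry["port"] = 8080
caps = read_capabilities(registry)
proxy = ReadOnlyProxy(registry)

print(caps["get"]("port"), proxy["port"], proxy.keys())
# Without restricted execution, both leak the object they guard.
print(caps["get"].__self__ is registry, proxy._target is registry)
```

The proxy is handed over as a single object, while the capability version needs one entry per method, which is the convenience argument made in the quoted text.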
-- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From guido@python.org Sat Mar 8 13:29:58 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 08 Mar 2003 08:29:58 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: "Your message of Sat, 08 Mar 2003 12:27:40 GMT." <3E69E1BC.5090508@algroup.co.uk> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> Message-ID: <200303081329.h28DTw527129@pcp02138704pcs.reston01.va.comcast.net> > What I am trying to nail down is exactly what needs doing to get us > from where we are now to where capabilities actually work. As I > understand it, what is needed is: > > a) Fix restricted execution, which is in a state of disrepair Yes. > b) Override import, open (and other stuff? what?) Don't worry about this; it's taken care of by the rexec module; each application will probably want to do this a little differently (certainly Zope has its own way). > c) Wrap or replace some of the existing libraries, certify that others > are "safe" This should only be necessary for (core and 3rd party) extension modules. The rexec module has a framework for this. > It looks to me like a and b are shared with proxies, and c would be > different, by definition. Is there anything else? Am I on the wrong track? I don't know why you think (c) is different. > I am going to write this all up into a document which can be used as a > starting point for work to complete this. Excellent. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis@bluewin.ch Sat Mar 8 12:50:50 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Sat, 8 Mar 2003 13:50:50 +0100 Subject: [Python-Dev] Re: Capabilities References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> Message-ID: <013201c2e571$596d5dc0$6d94fea9@newmexico> From: "Ben Laurie" > Guido van Rossum wrote: > > [Moving a discussion about capabilities to where it arguably belongs] > > > > [Ben Laurie] > > > >>The point about capabilities is that mere possession of a capability is > >>all that is required to exercise it. If you start adding security > >>checkers to them, then you don't have capabilities anymore. But the > >>point is somewhat deeper that than - given capabilities, you can > >>implement proxies without requiring any more infrastructure - you can > >>also implement security schemes that don't really correspond to any kind > >>of security checking at all (ok, you can probably find some convoluted > >>way to achieve the same effect, but I'll bet it comes down to having > >>tokens that correspond to proxies, and security checkers that allow you > >>to proceed if you have the appropriate token - in other words, > >>capabilities, but very hard to use). > >> > >>So, it seems to me, its simpler and more powerful to start with > >>capabilities and build proxies on top of them (or whatever alternate > >>scheme you want to build). > >> > >>Once more, my apologies for not just getting straight to the point. > >> > >>BTW, if you would like to explain why you don't think bound methods are > >>the way to go on python-dev, I'd love to hear it. > > > > > > It seems to e a matter of convenience. Often objects have many > > methods to which you want to provide access as a group. E.g. I might > > have a service configuration registry object. The object behaves > > roughly like a dictionary. 
A certain user may be given read-only > > access to the registry. Using capabilities, I would have to hand her > > a bunch of capabilities for various methods: __getitem__, has_key, > > get, keys, items, values, and many more. Using proxies I can simply > > give her a read-only proxy for the object. So proxies are more > > powerful. > > > > Before you start saying that we should use capabilities as the more > > fundamental mechanism and build proxies on top of that: as you point > > out, we already have an equivalent more fundamental mechanism, bound > > methods, which is equivalent to capabilities. It's just that raw > > capabilities aren't very usable, so one way or another we've got to > > build something on top of that. > > I'm not trying to persuade you that capabilities are better than > proxies. I'd prefer to build on them, and it seems you'd prefer to do it > another way. That's fine with me - my goal is to make capabilities both > possible and easily usable in Python, not to persuade everyone to use > them (yet ;-). > > Bound methods are not capabilities unless they are secured. It seems the > correct way to do this is to use restricted execution, and perhaps some > other tricks. What I am trying to nail down is exactly what needs doing > to get us from where we are now to where capabilities actually work. As > I understand it, what is needed is: > > a) Fix restricted execution, which is in a state of disrepair > > b) Override import, open (and other stuff? what?) > > c) Wrap or replace some of the existing libraries, certify that others > are "safe" > > It looks to me like a and b are shared with proxies, and c would be > different, by definition. Is there anything else? Am I on the wrong track? there is a difference: proxies cover indipendently much of the holes in restricted execution ... about restricted execution: - the way a new frame acquires the default built-ins vs. installed resticted bult-ins is likely correct but needs auditing; e.g. 
the last problem fixed related to this was: http://python.org/sf/577530 - under restricted execution some operations, in particular reflective ops, ought to be prohibited; the code that implements this is scattered, and/because these operations share the same execution paths with "normal" ops; so the first thing is to enumerate all that should be prohibited, or to devise an approach to security that can work with just a minimal set of guarantees (disabled ops and/or encapsulated objects) These were e.g. identified "problems": http://mail.python.org/pipermail/python-dev/2002-December/031160.html http://mail.python.org/pipermail/python-dev/2003-January/031851.html From jepler@unpythonic.net Sat Mar 8 14:38:44 2003 From: jepler@unpythonic.net (Jeff Epler) Date: Sat, 8 Mar 2003 08:38:44 -0600 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: References: Message-ID: <20030308143843.GB1025@unpythonic.net> When I tackled this problem for a program of mine, I ended up making sure that I always used the "short filename" form for the program to be executed. This way, there were no spaces in the filename and no need to quote them. However, the function I used to do this comes from win32, so test_popen can't use it. Nor can Python fix this up for all users of os.popen(). Jeff From thomas@xs4all.net Sat Mar 8 14:40:59 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 8 Mar 2003 15:40:59 +0100 Subject: [Python-Dev] Internationalizing domain names In-Reply-To: <200303081143.h28BhMTQ002892@mira.informatik.hu-berlin.de> References: <200303081143.h28BhMTQ002892@mira.informatik.hu-berlin.de> Message-ID: <20030308144059.GI2112@xs4all.nl> On Sat, Mar 08, 2003 at 12:43:22PM +0100, Martin v. Löwis wrote: > Would anybody like to review these changes? I can take a look at it, but I don't actually use IDNA, so don't consider me an authoritative resource :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From ben@algroup.co.uk Sat Mar 8 18:09:46 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Sat, 08 Mar 2003 18:09:46 +0000 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <200303081329.h28DTw527129@pcp02138704pcs.reston01.va.comcast.net> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <200303081329.h28DTw527129@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E6A31EA.4090609@algroup.co.uk> Guido van Rossum wrote: >>What I am trying to nail down is exactly what needs doing to get us >>from where we are now to where capabilities actually work. As I >>understand it, what is needed is: >> >>a) Fix restricted execution, which is in a state of disrepair > > > Yes. > > >>b) Override import, open (and other stuff? what?) > > > Don't worry about this; it's taken care of by the rexec module; each > application will probably want to do this a little differently > (certainly Zope has its own way). I believe I heard way back that there was a lack of confidence rexec overrode everything that needed overriding - or am I getting mixed up with restricted execution? >>c) Wrap or replace some of the existing libraries, certify that others >>are "safe" > > > This should only be necessary for (core and 3rd party) extension > modules. The rexec module has a framework for this. > > >>It looks to me like a and b are shared with proxies, and c would be >>different, by definition. Is there anything else? Am I on the wrong track? > > > I don't know why you think (c) is different. Because with proxies you'd wrap with proxies, and with capabilities you'd wrap with capabilities. Or do you think there's a way that would work for both (which would, of course, be great)? Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." 
- Robert Woodruff From jeremy@alum.mit.edu Sat Mar 8 19:05:22 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 08 Mar 2003 14:05:22 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <3E69E1BC.5090508@algroup.co.uk> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> Message-ID: <1047150320.2347.26.camel@localhost.localdomain> On Sat, 2003-03-08 at 07:27, Ben Laurie wrote: > Bound methods are not capabilities unless they are secured. It seems the > correct way to do this is to use restricted execution, and perhaps some > other tricks. What I am trying to nail down is exactly what needs doing > to get us from where we are now to where capabilities actually work. As > I understand it, what is needed is: > > a) Fix restricted execution, which is in a state of disrepair > > b) Override import, open (and other stuff? what?) > > c) Wrap or replace some of the existing libraries, certify that others > are "safe" > > It looks to me like a and b are shared with proxies, and c would be > different, by definition. Is there anything else? Am I on the wrong track? I have been trying to argue, though I feel a bit muddled at times, that the proxy approach eliminates the need for rexec and makes it possible to build a "restricted environment" without relying on the rexec code in the interpreter. Any security scheme needs some kind of information hiding to guarantee that untrusted code does not break into the representation of an object, so that, for example, an object can be used as a capability. I think we've discussed two different ways to implement information hiding. The rexec approach is to add code to the interpreter to disable certain introspection features when running untrusted code. The proxy approach is to wrap protected objects in proxies before passing them to untrusted code. I think both techniques achieve the same end, but with different limitations. 
I prefer the proxy approach because it is more self contained. The rexec approach requires that all developers working in the core on introspection features be aware of security issues. The security kernel ends up being most of the core interpreter -- anything that can introspection on objects. The proxy approach is to create an object that specifically disables introspection by not exposing internals to the core. We need to do some more careful analysis to be sure that proxies really achieve the goal of information hiding. I think another benefit of proxies vs. rexec is that untrusted code can still use all of the standard introspection features when dealing with objects it creates itself. Code running in rexec can't use any introspective feature, period, because all those features are disabled. With the proxy approach, introspection is only disabled on protected objects. > I am going to write this all up into a document which can be used as a > starting point for work to complete this. It sounds like a PEP would be the right thing. It would be nice if the PEP could explain the rationale for a secure Python environment and then develop (at least) the capability approach to building that environment. Perhaps I could chip in with some explanation of the proxy approach. Jeremy From guido@python.org Sun Mar 9 00:25:13 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 08 Mar 2003 19:25:13 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: "Your message of Sat, 08 Mar 2003 18:09:46 GMT." <3E6A31EA.4090609@algroup.co.uk> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <200303081329.h28DTw527129@pcp02138704pcs.reston01.va.comcast.net> <3E6A31EA.4090609@algroup.co.uk> Message-ID: <200303090025.h290PDY27718@pcp02138704pcs.reston01.va.comcast.net> > >>b) Override import, open (and other stuff? what?) 
> > > > Don't worry about this; it's taken care of by the rexec module; each > > application will probably want to do this a little differently > > (certainly Zope has its own way). > > I believe I heard way back that there was a lack of confidence rexec > overrode everything that needed overriding - or am I getting mixed up > with restricted execution? Indeed. > >>c) Wrap or replace some of the existing libraries, certify that others > >>are "safe" > > > > This should only be necessary for (core and 3rd party) extension > > modules. The rexec module has a framework for this. > > > >>It looks to me like a and b are shared with proxies, and c would be > >>different, by definition. Is there anything else? Am I on the wrong track? > > > > > > I don't know why you think (c) is different. > > Because with proxies you'd wrap with proxies, and with capabilities > you'd wrap with capabilities. Or do you think there's a way that would > work for both (which would, of course, be great)? OK, fair enough. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 9 01:00:02 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 08 Mar 2003 20:00:02 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: "Your message of 08 Mar 2003 14:05:22 EST." <1047150320.2347.26.camel@localhost.localdomain> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> Message-ID: <200303090100.h29102I27782@pcp02138704pcs.reston01.va.comcast.net> [Jeremy] > I have been trying to argue, though I feel a bit muddled at times, that > the proxy approach eliminates the need for rexec and makes it possible > to build a "restricted environment" without relying on the rexec code in > the interpreter. 
There's one rexec-related feature that you'll need to use though: that all built-ins (including __import__) are loaded from the __builtins__ variable in the globals, and that there's no way to get access to the default __builtins__ (assuming the restricted builtins override __import__ with something that won't let you import the real sys module, etc.). I mention this because this is actually a larger part of the restricted execution code than the restrictions on certain introspections that are also part of it. The latter are clearly not enough, and perhaps we should drop them (*requiring* proxies or capabilities to implement the rexec module, rather than the old and wounded Bastion [see Samuele's posts]). But the former (the treatment of __builtins__) is essential. Perhaps mostly unrelated, I'll also note something about proxy implementation. Assuming proxies are instances of a type proxy, that type must derive from a type object. This means that if p is a proxy, object.__getattribute__(p, 'foo') is valid. It will take some very careful analysis to prove that this cannot circumvent the proxy's safeguards. (I believe Zope's proxies are safe.) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Sun Mar 9 03:41:55 2003 From: tim.one@comcast.net (Tim Peters) Date: Sat, 08 Mar 2003 22:41:55 -0500 Subject: [Python-Dev] acceptability of asm in python code? In-Reply-To: Message-ID: [Damien Morton] > In the BINARY_ADD opcode, and in most arithmetic opcodes, Aren't add and subtract the whole story here? > there is a line that checks for overflow that looks like this: > > if ((i^a) < 0 && (i^b) < 0) goto slow_add; > > I got a small speedup by replacing this with a macro defined thusly: > > #if defined(_MSC_VER) and defined(_M_IX86) "and" isn't C, so I assume you were very lucky . 
> #define IF_OVERFLOW_GOTO(X) __asm { jo X }; > #else > #define IF_OVERFLOW_GOTO(X) if ((i^a) < 0 && (i^b) < 0) goto X; > #endif > > Would this case be an acceptable use of snippets of inline assembler? If you had said "a huge speedup, on all programs", on the weak end of maybe. "Small speedup" isn't worth the obscurity. Note that Python contains no assembler now. From tismer@tismer.com Sun Mar 9 04:16:24 2003 From: tismer@tismer.com (Christian Tismer) Date: Sun, 09 Mar 2003 05:16:24 +0100 Subject: [Python-Dev] acceptability of asm in python code? In-Reply-To: References: Message-ID: <3E6AC018.90007@tismer.com> Tim Peters wrote: > [Damien Morton] > >>In the BINARY_ADD opcode, and in most arithmetic opcodes, > > > Aren't add and subtract the whole story here? > > >>there is a line that checks for overflow that looks like this: >> >>if ((i^a) < 0 && (i^b) < 0) goto slow_add; >> >>I got a small speedup by replacing this with a macro defined thusly: >> >>#if defined(_MSC_VER) and defined(_M_IX86) > > > "and" isn't C, so I assume you were very lucky . > > >>#define IF_OVERFLOW_GOTO(X) __asm { jo X }; >>#else >>#define IF_OVERFLOW_GOTO(X) if ((i^a) < 0 && (i^b) < 0) goto X; >>#endif >> >>Would this case be an acceptable use of snippets of inline assembler? > > > If you had said "a huge speedup, on all programs", on the weak end of maybe. > "Small speedup" isn't worth the obscurity. Note that Python contains no > assembler now. Just to add my 0.02 EUR. You know that I'm not reluctant to use assembly for platform specific speedups. But first, I'm with Tim, not going this path for such a small win. Second, I'd like to point out that going to assembly for such a huge function like eval_frame is rather dangerous: All compilers have different ways of handling the appearance of assembly. 
This is a dangerous path, believe me: MS C's behavior is one of the worst, which is the reason why I was very careful to put this in a clean-room for Stackless, for instance: For the appearance of ASM code in some function, the calling sequence and the optimization strategy are changed drastically. Register allocation is changed, the optimization level is reduced, and the calling convention is *never* without stack frames. This might not have changed eval_frame's behavior too much, just because it is too big to benefit from certain optimizations now, but I remember that I changed it once to use about two fewer registers, and I might re-apply these changes to give the eval loop a boost of about 10 percent. The existence of a single asm statement would void this effect! Hint: Write a small, understandable function twice, once using assembly and once without. Compile the stuff, and set the listing option to everything. Then look at the .cod file, and wonder how different the two versions are. This will make you very reluctant to use any asm statement at all, unless you want to re-write the whole function in assembly, including the "naked" option. Doing the latter for eval_frame would be worthwhile, but then I'd suggest doing this as an external .asm file. If you do this right, taking cache lines and probabilities into account, you can for sure create an overall gain of up to 20 percent. But even this remarkable gain wouldn't be enough, even for me, to go this hard path for a single platform. sincerely -- chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From dmorton@bitfurnace.com Sun Mar 9 05:00:09 2003 From: dmorton@bitfurnace.com (damien morton) Date: Sun, 9 Mar 2003 00:00:09 -0500 Subject: [Python-Dev] acceptability of asm in python code? In-Reply-To: Message-ID: <000501c2e5f8$c384b6e0$6401a8c0@damien> > -----Original Message----- > From: Tim Peters [mailto:tim.one@comcast.net] > Sent: Saturday, 8 March 2003 22:42 > To: Damien Morton > Cc: python-dev@python.org > Subject: RE: [Python-Dev] acceptability of asm in python code? > > > [Damien Morton] > > In the BINARY_ADD opcode, and in most arithmetic opcodes, > > Aren't add and subtract the whole story here? ADD, SUBTRACT and INPLACE variants, yes. Potentially also MULTIPLY. > > there is a line that checks for overflow that looks like this: > > > > if ((i^a) < 0 && (i^b) < 0) goto slow_add; > > > > I got a small speedup by replacing this with a macro defined thusly: > > > > #if defined(_MSC_VER) and defined(_M_IX86) > > "and" isn't C, so I assume you were very lucky . I had been using _MSC_VER, but decided to be a bit more specific for my post. You're right, of course, the define I posted would not have worked. > > #define IF_OVERFLOW_GOTO(X) __asm { jo X }; > > #else > > #define IF_OVERFLOW_GOTO(X) if ((i^a) < 0 && (i^b) < 0) goto X; > > #endif > > > > Would this case be an acceptable use of snippets of inline assembler? > > If you had said "a huge speedup, on all programs", on the > weak end of maybe. "Small speedup" isn't worth the obscurity. > Note that Python contains no assembler now. It's arguable which is more obscure, the x86 assembly instruction "jo" (jump if overflow), or the xor trickery in C. I take your point, though, about there being no assembly in python now. 
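[Editorial sketch, not part of the thread.] For readers puzzling over the "xor trickery", here is a minimal Python simulation of the test `(i^a) < 0 && (i^b) < 0`. It assumes 32-bit two's-complement operands and wraparound on overflow, which is what the quoted C line relies on; the helper names to_int32 and add_overflows are made up for illustration and do not appear in ceval.c.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def add_overflows(a, b):
    """Report whether a + b overflows a signed 32-bit integer.

    i is the wrapped sum. (i ^ a) is negative exactly when i and a differ
    in sign, and likewise for b. A sum whose sign differs from the sign of
    both operands can only come from an overflow.
    """
    i = to_int32(a + b)
    return (i ^ a) < 0 and (i ^ b) < 0


# Largest positive int32 plus one wraps to a negative value.
wraps_high = add_overflows(2**31 - 1, 1)
# Mixed-sign operands can never overflow.
mixed_sign = add_overflows(5, -3)
```

The same sign argument explains why Tim asks about add and subtract only: it is specific to addition (subtraction needs the sign of the second operand flipped), and it says nothing about multiplication.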
From cjohns@cybertec.com.au Sun Mar 9 05:17:23 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Sun, 09 Mar 2003 16:17:23 +1100 Subject: [Python-Dev] VERSION in getpath.c Message-ID: <3E6ACE63.8080903@cybertec.com.au> Hello, First, I am new to Python so I hope this is the correct place to post this type of question. I am playing with embedding Python 2.3a and I am trying to get importing to work. I have noticed the following in module_search_path : /tftpboot/lib/python21.zip /python/lib/python2.1/lib-dynload The 2.1 comes from the VERSION label at the start of getpath.c. Should this be PACKAGE_VERSION ? Regards -- Chris Johns, cjohns at cybertec.com.au From eppstein@ics.uci.edu Sun Mar 9 05:33:06 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Sat, 08 Mar 2003 21:33:06 -0800 Subject: [Python-Dev] Re: acceptability of asm in python code? References: <000501c2e5f8$c384b6e0$6401a8c0@damien> Message-ID: In article <000501c2e5f8$c384b6e0$6401a8c0@damien>, "damien morton" wrote: > > If you had said "a huge speedup, on all programs", on the > > weak end of maybe. "Small speedup" isn't worth the obscurity. > > Note that Python contains no assembler now. > > It's arguable which is more obscure, the x86 assembly instruction "jo" > (jump if overflow), or the xor trickery in C. > > I take your point, though, about there being no assembly in python now. The place to put this sort of low-level instruction optimization is in the peepholer of your C compiler. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From jim@zope.com Sun Mar 9 11:01:18 2003 From: jim@zope.com (Jim Fulton) Date: Sun, 09 Mar 2003 06:01:18 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E6B1EFE.4060500@zope.com> Guido van Rossum wrote: > [Moving a discussion about capabilities to where it arguably belongs] Thanks Guido. I'll respond to Ben here. > [Ben Laurie] > >>The point about capabilities is that mere possession of a capability is >>all that is required to exercise it. If you start adding security >>checkers to them, then you don't have capabilities anymore. Right. Jeremy keeps reminding me of this point. Zope 3 uses proxies in a way that doesn't conform to this definition. Zope proxies proxy an object to be protected *and* a policy object called a "checker". The checkers used in Zope perform checks at access time. One could, instead, perform the checks when the proxies are created or earlier and use checkers that simply allowed some names or operations and not others. IOW, you could certainly implement a strict capability model with Zope proxies. ... >>BTW, if you would like to explain why you don't think bound methods are >>the way to go on python-dev, I'd love to hear it. I'll give an answer similar to Guido's but with a different emphasis. I'm an object zealot. :) I like working with object oriented systems. I don't want to lose that and, thus, I don't want computation to be reduced to passing around basic values and functions. I want to be able to pass around objects with interfaces. Zope proxies make it easy to define a capability in terms of an interface. I think this is really important for object-oriented systems. Another feature of Zope proxies that I think is important is that they automate creation of proxies. 
When you get an attribute from a proxy, the value is proxied. (Actually, the checker decides whether the value is proxied. Zope checkers proxy all objects except basic objects such as numbers, strings, and None.) When you perform an operation on a proxied object, the result is proxied. This means that the code being proxied doesn't have to be aware of proxies, capabilities, or a security model. Note that when you access a method on a proxied object, the method itself is proxied. All you can to with a proxied method is call it, get it's name, and convert it to a string. This is true even of the proxied method is passed to unrestricted code. I agree that we all need restricted execution to work better than it does now. I was hoping that we could colaborate at a higher level as well. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Sun Mar 9 11:13:59 2003 From: jim@zope.com (Jim Fulton) Date: Sun, 09 Mar 2003 06:13:59 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <1047150320.2347.26.camel@localhost.localdomain> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> Message-ID: <3E6B21F7.3040300@zope.com> Jeremy Hylton wrote: > On Sat, 2003-03-08 at 07:27, Ben Laurie wrote: > >>Bound methods are not capabilities unless they are secured. It seems the >>correct way to do this is to use restricted execution, and perhaps some >>other tricks. What I am trying to nail down is exactly what needs doing >>to get us from where we are now to where capabilities actually work. As >>I understand it, what is needed is: >> >>a) Fix restricted execution, which is in a state of disrepair >> >>b) Override import, open (and other stuff? what?) 
>> >>c) Wrap or replace some of the existing libraries, certify that others >>are "safe" >> >>It looks to me like a and b are shared with proxies, and c would be >>different, by definition. Is there anything else? Am I on the wrong track? > > > I have been trying to argue, though I feel a bit muddled at times, that > the proxy approach eliminates the need for rexec and makes it possible > to build a "restricted environment" without relying on the rexec code in > the interpreter. > > Any security scheme needs some kind of information hiding to guarantee > that untrusted code does not break into the representation of an object, > so that, for example, an object can be used as a capability. I think > we've discussed two different ways to implement information hiding. > > The rexec approach is to add code to the interpreter to disable certain > introspection features when running untrusted code. > > The proxy approach is to wrap protected objects in proxies before > passing them to untrusted code. > > I think both techniques achieve the same end, but with different > limitations. I prefer the proxy approach because it is more self > contained. The rexec approach requires that all developers working in > the core on introspection features be aware of security issues. The > security kernel ends up being most of the core interpreter -- anything > that can introspection on objects. The proxy approach is to create an > object that specifically disables introspection by not exposing > internals to the core. We need to do some more careful analysis to be > sure that proxies really achieve the goal of information hiding. > > I think another benefit of proxies vs. rexec is that untrusted code can > still use all of the standard introspection features when dealing with > objects it creates itself. Code running in rexec can't use any > introspective feature, period, because all those features are disabled. 
> With the proxy approach, introspection is only disabled on protected > objects. These are all good points. Proxies have a dark side though. They sometimes trip up standard facilities in Python that either depend on specific types or on identity comparisons. With a bit of effort, proxies can be made highly transparent, but they change an object's type and id. For example, you can't proxy exceptions without breaking exception handling. In Zope, we rely on restricted execution to prevent certain kinds of introspection on exceptions and exception classes. In Zope, we also don't proxy None, because None is usually checked for identity. We also don't proxy strings, and numbers. I think I agree that you could build a restricted environment with proxies alone, but, to do so, you would need to make Python far more proxy aware. I think that the language would need to be aware of proxies at a far deeper level. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Sun Mar 9 11:29:15 2003 From: jim@zope.com (Jim Fulton) Date: Sun, 09 Mar 2003 06:29:15 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: <3E68AAF4.3060508@algroup.co.uk> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <1046981657.15348.80.camel@slothrop.zope.com> <3E68AAF4.3060508@algroup.co.uk> Message-ID: <3E6B258B.2080207@zope.com> Ben Laurie wrote: > Jeremy Hylton wrote: > ... > And in either case, you also need to restrict access to the underlying > libraries and (presumably) some of the builtin functions? You don't need restricted execution to make proxies work. In Zope, we choose to use restricted execution in cases where proxies don't work well. (For example, as I mentioned in another note, we can't currently proxy exceptions.) 
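[Editorial sketch, not part of the thread.] The "dark side" Jim describes, a proxy changing an object's type and identity, can be shown with a toy forwarding wrapper. ToyProxy is a hypothetical name; it is not Zope's proxy implementation and offers no security whatsoever (the wrapped object is one attribute lookup away). It only illustrates why exceptions and None are awkward to proxy.

```python
class ToyProxy:
    """Forward attribute reads to a wrapped object (toy, NOT a security proxy)."""

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        return getattr(self._obj, name)


err = ValueError("boom")
p = ToyProxy(err)

# Attribute access is forwarded transparently ...
forwarded = p.args == ("boom",)
# ... but identity and type are those of the proxy, not of the exception,
# so "except ValueError" style matching would not recognise it.
same_identity = p is err
looks_like_exception = isinstance(p, ValueError)
# A proxied None fails the usual "x is None" identity check.
none_survives = ToyProxy(None) is None
```

Here forwarded is true while the other three results are false, which matches the reasons given above for leaving exceptions, None, strings and numbers unproxied.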
> BTW, Guido pointed out to me that I'm causing confusion by saying > "rexec" when I really mean "restricted execution". Right. I think that there is some confusion floating around wrt proxies (not your fault :) ... > In short, it seems to me that proxies and capabilities via bound methods > both do the same basic thing: i.e. prevent inspection of what is behind > the capability/proxy. Proxies add access control to decide whether you > get to use them or not, whereas in a capability system simple possession > of the capability is sufficient (i.e. they are like a proxy where the > security check always says "yes"). You do access control using > capabilities, instead of inside them. > > Am I not understanding proxies? You are understanding proxies as they are *applied* in Zope. This is understandable, since the information I sent you: http://cvs.zope.org/Zope3/src/zope/security/readme.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup talks more about the higher-level application of proxies in Zope than about the basic proxy features. Really, Zope proxies are on about the same level as bound methods. They are a lower-level abstraction than capabilities. You could use them to implement capabilities or you could use them to implement a different approach, as we have done in Zope. As I mentioned in another note, I think proxies provide a better way to implement capabilities than bound methods because they provide access to objects with whole interfaces, rather than just individual functions or methods. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pedronis@bluewin.ch Sun Mar 9 11:30:09 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Sun, 9 Mar 2003 12:30:09 +0100 Subject: [Python-Dev] Re: Capabilities References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> <3E6B21F7.3040300@zope.com> Message-ID: <011f01c2e62f$3e6d5840$6d94fea9@newmexico> From: "Jim Fulton" > For example, you can't proxy exceptions without > breaking exception handling. In Zope, we rely on restricted execution to prevent > certian kinds of introspection on exceptions and exception classes. In Zope, we > also don't proxy None, because None is usually checked for identity. We also don't > proxy strings, and numbers. > That was a question I was asking myself about proxies: exception handling. But I never had the time to play with it to check. Does that mean that restricted code can get unproxied instances of classic classes as caught exceptions? From guido@python.org Sun Mar 9 11:40:27 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 09 Mar 2003 06:40:27 -0500 Subject: [Python-Dev] VERSION in getpath.c In-Reply-To: "Your message of Sun, 09 Mar 2003 16:17:23 +1100." <3E6ACE63.8080903@cybertec.com.au> References: <3E6ACE63.8080903@cybertec.com.au> Message-ID: <200303091140.h29BeRO04633@pcp02138704pcs.reston01.va.comcast.net> > First, I am new to Python so I hope this is the correct place to > post this type of question. It's not, but you're forgiven. > I am playing with embedding Python 2.3a and I am tring to get > importing to work. I have noticed the following in > module_search_path : > > /tftpboot/lib/python21.zip > /python/lib/python2.1/lib-dynload > > The 2.1 comes from the VERSION label at the start of > getpath.c. Should this be PACKAGE_VERSION ? 
No, if you look in the Makefile the VERSION variable is passed in from the Makefile to the compilation of getpath.c (only), so that you can override it (and a few other parameters) from the Makefile command line. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 9 12:03:18 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 09 Mar 2003 07:03:18 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: "Your message of Sun, 09 Mar 2003 06:29:15 EST." <3E6B258B.2080207@zope.com> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <1046981657.15348.80.camel@slothrop.zope.com> <3E68AAF4.3060508@algroup.co.uk> <3E6B258B.2080207@zope.com> Message-ID: <200303091203.h29C3Iu04731@pcp02138704pcs.reston01.va.comcast.net> [Jim] > You don't need restricted execution to make proxies work. Um, I think that's a dangerous mistake, or a confusion in terminology. Without restricted execution, untrusted code would have access to sys.modules, and from there it would be able to access removeAllProxies. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 9 12:06:31 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 09 Mar 2003 07:06:31 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: "Your message of Sun, 09 Mar 2003 06:29:15 EST." <3E6B258B.2080207@zope.com> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <1046981657.15348.80.camel@slothrop.zope.com> <3E68AAF4.3060508@algroup.co.uk> <3E6B258B.2080207@zope.com> Message-ID: <200303091206.h29C6VI04752@pcp02138704pcs.reston01.va.comcast.net> > Really, Zope proxies are on about the same level as bound methods. Another difference is that proxies were *designed* for securing off all access. 
Bound methods have introspection facilities which allow you to go around them. Restricted execution tries to fence off those introspection facilities, but there may be a hole in the fence. --Guido van Rossum (home page: http://www.python.org/~guido/) From zooko@zooko.com Sun Mar 9 12:40:23 2003 From: zooko@zooko.com (Zooko) Date: Sun, 09 Mar 2003 07:40:23 -0500 Subject: [Python-Dev] Re: Capabilities Message-ID: To enforce capability access control, a language requires three things: 1. Pointer-safety. (There must not be a function available which performs the inverse of id().) Python has pointer-safety (unless a 3rd party native extension module has been executed). 2. Mandatory private data (accessible only by the object itself). Normal Python doesn't have mandatory private data. If I understand correctly, both rexec and proxies (attempt to) provide this. They also attempt to provide another safety feature: a wrapper around the standard library and builtins that turns off access to dangerous features according to an overridable security policy. 3. A standard library that follows the Principle of Least Privilege. That is, a library full of tools that you can extend to an object in order to empower it to do specific things (e.g. __builtin__.abs(), os.times(), ...) without thereby also empowering it to do other things (e.g. __builtin__.file(), os.system(), ...). Python doesn't have such a library. Now the Principle of Least Privilege approach to making a library safe is very different from the "sandbox" approach. The latter is to remove all "dangerous" tools from the toolbox (or in our case, to have them dynamically disabled by the "restricted" bit which is determined by an overridable policy). The former is to separate the tools so that dangerous ones don't come tied together with common ones. The security policy, then, is expressed by code that grants or withholds capabilities (== references) rather than by code that toggles the "restricted" bit. 
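[Editorial sketch, not part of the thread.] The idea that policy is "code that grants or withholds references" can be illustrated in a few lines. The names make_appender and untrusted_logger are invented for this example. The caller decides what the untrusted function may do purely by choosing which object to hand it.

```python
import io


def make_appender(stream):
    """Return a narrow capability: a function that can only append lines."""
    def append(line):
        stream.write(line + "\n")
    return append


def untrusted_logger(append):
    # This code is handed one reference and nothing else: no stream object,
    # no open(), no os module. In a capability system that reference is the
    # whole of its authority.
    append("hello")
    append("world")


log = io.StringIO()
untrusted_logger(make_appender(log))
contents = log.getvalue()
```

In ordinary Python this is only a convention: the closure can be inspected (for instance through the function's __closure__ attribute) and builtins remain reachable, which is exactly the missing "mandatory private data" and "ambient authority" problem this message describes.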
Of course, you can start by denying the entire standard library to restricted code, and then incrementally refactor the library or wrap it in Least-Privilege wrappers. Until you have a substantial Least-Privilege-respecting library you can't gain the big benefit of capabilities -- code which is capable of doing something useful without also being capable of doing harm. (You can gain the "sandbox" style of security -- code which is incapable of doing anything useful or harmful.) This requirement also means that there can be no "ambient authority" -- authority that an object receives even if its creator has given it no references. Regards, Zooko P.S. I learned this three-part paradigm from Mark Miller whose paper with Chip Morningstar and Bill Frantz articulates it in more detail: http://www.erights.org/elib/capability/ode/ode-capabilities.html#patt-coop From zooko@zooko.com Sun Mar 9 12:48:31 2003 From: zooko@zooko.com (Zooko) Date: Sun, 09 Mar 2003 07:48:31 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: Message from Zooko of "Sun, 09 Mar 2003 07:40:23 EST." Message-ID: Following-up to my own post in order to apologize for contributing to the tradition of confusing restricted execution with rexec. I, Zooko, wrote: > > 2. Mandatory private data (accessible only by the object itself). Normal > Python doesn't have mandatory private data. If I understand correctly, both > rexec and proxies (attempt to) provide this. They also attempt to provide > another safety feature: a wrapper around the standard library and builtins that > turns off access to dangerous features according to an overridable security > policy. Perhaps it is that "restricted execution" is designed to provide private data, by disabling certain introspection features, and "rexec" and "proxies" are designed to provide the wrapper feature? 
Regards, Zooko From ben@algroup.co.uk Sun Mar 9 12:45:39 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Sun, 09 Mar 2003 12:45:39 +0000 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <1047150320.2347.26.camel@localhost.localdomain> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> Message-ID: <3E6B3773.7070600@algroup.co.uk> Jeremy Hylton wrote: > On Sat, 2003-03-08 at 07:27, Ben Laurie wrote: > >>Bound methods are not capabilities unless they are secured. It seems the >>correct way to do this is to use restricted execution, and perhaps some >>other tricks. What I am trying to nail down is exactly what needs doing >>to get us from where we are now to where capabilities actually work. As >>I understand it, what is needed is: >> >>a) Fix restricted execution, which is in a state of disrepair >> >>b) Override import, open (and other stuff? what?) >> >>c) Wrap or replace some of the existing libraries, certify that others >>are "safe" >> >>It looks to me like a and b are shared with proxies, and c would be >>different, by definition. Is there anything else? Am I on the wrong track? > > > I have been trying to argue, though I feel a bit muddled at times, that > the proxy approach eliminates the need for rexec and makes it possible > to build a "restricted environment" without relying on the rexec code in > the interpreter. Wouldn't that suggest that the way to fix restricted execution is to do something proxylike, then? > Any security scheme needs some kind of information hiding to guarantee > that untrusted code does not break into the representation of an object, > so that, for example, an object can be used as a capability. I think > we've discussed two different ways to implement information hiding. Yes. > The rexec approach is to add code to the interpreter to disable certain > introspection features when running untrusted code. 
> > The proxy approach is to wrap protected objects in proxies before > passing them to untrusted code. Again, this suggests to me that perhaps restricted execution should also use wrapping. I guess I will study this idea in more detail when I start writing. > I think both techniques achieve the same end, but with different > limitations. I prefer the proxy approach because it is more self > contained. The rexec approach requires that all developers working in > the core on introspection features be aware of security issues. The > security kernel ends up being most of the core interpreter -- anything > that can introspection on objects. The proxy approach is to create an > object that specifically disables introspection by not exposing > internals to the core. We need to do some more careful analysis to be > sure that proxies really achieve the goal of information hiding. If restricted execution were implemented in the same way, then proxies and restricted execution would both benefit from this analysis. > I think another benefit of proxies vs. rexec is that untrusted code can > still use all of the standard introspection features when dealing with > objects it creates itself. Code running in rexec can't use any > introspective feature, period, because all those features are disabled. > With the proxy approach, introspection is only disabled on protected > objects. Right - this does seem like a desirable feature. >>I am going to write this all up into a document which can be used as a >>starting point for work to complete this. > > It sounds like a PEP would be the right thing. It would be nice if the > PEP could explain the rationale for a secure Python environment and then > develop (at least) the capability approach to building that > environment. Perhaps I could chip in with some explanation of the proxy > approach. That would be excellent! I will write a draft as specified in PEP 1. Cheers, Ben. 
-- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From skip@manatee.mojam.com Sun Mar 9 13:00:22 2003 From: skip@manatee.mojam.com (Skip Montanaro) Date: Sun, 9 Mar 2003 07:00:22 -0600 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200303091300.h29D0MAE015342@manatee.mojam.com> Bug/Patch Summary ----------------- 349 open / 3432 total bugs (+7) 124 open / 2011 total patches (+1) New Bugs -------- Move modules out of Carbon (2003-03-02) http://python.org/sf/696206 PyMac_GetFSRef should accept unicode (2003-03-02) http://python.org/sf/696253 Carbon.CF module needs new style classes (2003-03-03) http://python.org/sf/696527 Python 2.4: Warn about omitted mutable_flag. (2003-03-03) http://python.org/sf/696535 How to make a class iterable using a member generator. (2003-03-03) http://python.org/sf/696777 CGIHTTPServer doesn't quote arguments correctly on Windows. 
(2003-03-03) http://python.org/sf/696846 gensuitemodule overhaul (2003-03-04) http://python.org/sf/697179 string.strip implementation/doc mismatch (2003-03-04) http://python.org/sf/697220 test_posix fails: getlogin (2003-03-04) http://python.org/sf/697556 string.atoi function causing TypeError (2003-03-04) http://python.org/sf/697591 Mention gmtime in Chapter 6.9 "Time access and conversions" (2003-03-05) http://python.org/sf/697983 Move gmtime function from calendar to time module (2003-03-05) http://python.org/sf/697985 Clarify timegm documentation (2003-03-05) http://python.org/sf/697986 Clarify daylight variable meaning (2003-03-05) http://python.org/sf/697988 Clarify mktime semantics (2003-03-05) http://python.org/sf/697989 Document strptime limitation (2003-03-05) http://python.org/sf/697990 __file__ attribute missing from dynamicly loaded module (2003-03-05) http://python.org/sf/698282 urllib2 Request.get_host and proxies (2003-03-05) http://python.org/sf/698374 Tk 8.4.2 and Tkinter.py _substitue function (2003-03-05) http://python.org/sf/698517 list.index() bhvr change > python2.x (2003-03-06) http://python.org/sf/698561 imaplib: parsing INTERNALDATE (2003-03-06) http://python.org/sf/698706 Provide "plucker" format docs. 
(2003-03-06) http://python.org/sf/698900 Error using Tkinter embeded in C++ (2003-03-06) http://python.org/sf/699068 HTMLParser crash on glued tag attributes (2003-03-06) http://python.org/sf/699079 Tutorial uses omitted slice indices before explaining them (2003-03-06) http://python.org/sf/699237 builtin type inconsistency (2003-03-07) http://python.org/sf/699312 ncurses/curses on solaris (2003-03-07) http://python.org/sf/699379 refcount problem involding debugger (2003-03-07) http://python.org/sf/699594 MIMEText's c'tor adds unwanted trailing newline to text (2003-03-07) http://python.org/sf/699600 Erroneous error message from IDLE (2003-03-07) http://python.org/sf/699630 Canvas Widget origin is off-screen (2003-03-07) http://python.org/sf/699816 Obscure error message (2003-03-08) http://python.org/sf/699934 site.py should ignore trailing CRs in .pth files (2003-03-08) http://python.org/sf/700055 New Patches ----------- allow proxy server authentication with pimp (2003-03-02) http://python.org/sf/696392 fix bug #670311: sys.exit and PYTHONINSPECT (2003-03-04) http://python.org/sf/697613 optparse unit tests + fixes (2003-03-05) http://python.org/sf/697939 optparse OptionGroup docs (2003-03-05) http://python.org/sf/697941 docs for hotshot module (2003-03-05) http://python.org/sf/698505 ZipFile - support for file decryption (2003-03-06) http://python.org/sf/698833 Closed Bugs ----------- threads within an embedded python interpreter (2000-11-03) http://python.org/sf/221327 unreliable file.read() error handling (2002-02-23) http://python.org/sf/521782 Flawed fcntl.ioctl implementation. 
(2002-05-14) http://python.org/sf/555817 Get rid of etype struct (2002-08-06) http://python.org/sf/591586 email 2.4.3 pkg mail header error (2002-10-16) http://python.org/sf/624254 email incompatibility upgrading to 2.2.2 (2002-10-20) http://python.org/sf/626119 HeaderParseError: no header value (2002-11-04) http://python.org/sf/633550 Optional argument for dict.pop() method (2002-11-17) http://python.org/sf/639806 email.Header misparses mixed headers (2002-11-18) http://python.org/sf/640110 datetime docs need review, LaTeX (2002-12-16) http://python.org/sf/654846 long(3.1415) gives zero on Solaris 8 (2003-02-03) http://python.org/sf/679520 Header loses lines, formats strangely (2003-02-03) http://python.org/sf/679827 Carbon.CF.CFString should require ASCII (2003-02-07) http://python.org/sf/682215 email: preamble must be \n terminated (2003-02-07) http://python.org/sf/682504 socket module on solaris (2003-02-11) http://python.org/sf/684903 IDE asks for attention when quitting (2003-02-11) http://python.org/sf/684975 robotparser only applies first applicable rule (2003-02-20) http://python.org/sf/690214 register command not listed in command line help (2003-02-20) http://python.org/sf/690389 can't build bsddb for 2.3a2 (2003-02-20) http://python.org/sf/690419 LibRef 4.2.1: {m,n} description update (2003-02-23) http://python.org/sf/692016 tkinter.createfilehandler dumps core (2003-02-24) http://python.org/sf/692416 licence allowed, but doesn't work (2003-02-25) http://python.org/sf/693470 Can't multiply str and bool (2003-02-26) http://python.org/sf/693955 email.Parser trashes header (2003-02-26) http://python.org/sf/693996 os.popen() hangs on {Free,Open}BSD (2003-02-26) http://python.org/sf/694062 complex_new does not always respect subtypes (2003-03-01) http://python.org/sf/695651 Closed Patches -------------- More DictMixin (2003-01-14) http://python.org/sf/667730 test_pty hanging on hpux11 (2003-01-20) http://python.org/sf/671384 Make the default encoding 
provided on Windows (2003-01-21) http://python.org/sf/671666 unicode support for os.listdir() (2003-02-09) http://python.org/sf/683592 Allow freeze to exclude implicits (2003-02-11) http://python.org/sf/684677 Use datetime in _strptime (2003-02-23) http://python.org/sf/691928 fix for bug 639806: default for dict.pop (2003-02-26) http://python.org/sf/693753 From mats@laplaza.org Sat Mar 8 22:31:25 2003 From: mats@laplaza.org (Mats Wichmann) Date: Sat, 08 Mar 2003 15:31:25 -0700 Subject: [Python-Dev] Re: acceptability of asm in python code? In-Reply-To: <20030308170008.25630.88365.Mailman@mail.python.org> Message-ID: <5.1.0.14.1.20030308152850.01ee2328@mail.laplaza.org>
>
>In the BINARY_ADD opcode, and in most arithmetic opcodes, there is a line
>that checks for overflow that looks like this:
>
>if ((i^a) < 0 && (i^b) < 0) goto slow_add;
>
>I got a small speedup by replacing this with a macro defined thusly:
>
>#if defined(_MSC_VER) && defined(_M_IX86)
>#define IF_OVERFLOW_GOTO(X) __asm { jo X };
>#else
>#define IF_OVERFLOW_GOTO(X) if ((i^a) < 0 && (i^b) < 0) goto X;
>#endif
>
>Would this case be an acceptable use of snippets of inline assembler?

I'd personally be more comfortable if we didn't go down that road; there are compilers that don't support asm's (e.g. the Intel Linux compilers). From pedronis@bluewin.ch Sun Mar 9 19:09:33 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Sun, 9 Mar 2003 20:09:33 +0100 Subject: [Python-Dev] Re: Capabilities References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> <3E6B21F7.3040300@zope.com> <011f01c2e62f$3e6d5840$6d94fea9@newmexico> Message-ID: <05a701c2e66f$6c001be0$6d94fea9@newmexico> From: "Samuele Pedroni" > From: "Jim Fulton" > > For example, you can't proxy exceptions without > > breaking exception handling.
In Zope, we rely on restricted execution to prevent
> > certain kinds of introspection on exceptions and exception classes. In Zope, we
> > also don't proxy None, because None is usually checked for identity. We also don't
> > proxy strings and numbers.
>
> That was a question I was asking myself about proxies: exception handling.
> But I never had the time to play with it to check.
>
> Does that mean that restricted code can get unproxied instances of classic
> classes as caught exceptions?

Maybe the question was unclear, but it was serious. What I was asking is whether some restricted code can do:

    try:
        deliberate code to force exception
    except Exception, e:
        ...

so that e is caught unproxied. Looking at zope/security/_proxy.c it seems this can be the case. Then, to be (likely) on the safe side, all exception class definitions for possible e classes, e.g.

    class MyExc(Exception):
        ...

ought to be executed _in restricted mode_, or be "trivial/empty". Something like

    class MyExc(Exception):
        def __init__(self, msg):
            self.message = msg
            Exception.__init__(self, msg)
        def __str__(self):
            return self.message

is already too much rope. Although it seems not to have the "nice" two-level-of-calls behavior of Bastion instances, an unproxied instance of MyExc, if MyExc was defined outside of restricted execution, can be used to break out of restricted execution.

regards.

From Jack.Jansen@oratrix.com Sun Mar 9 21:24:37 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sun, 9 Mar 2003 22:24:37 +0100 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: Message-ID: <87C29952-5275-11D7-B151-000A27B19B96@oratrix.com> On Friday, Mar 7, 2003, at 22:48 Europe/Amsterdam, Thomas Heller wrote: > In distutils we had a similar problem. I don't remember the details > at the moment exactly, but I think enclosing sys.executable in double > quotes *only* when it contains spaces should do the trick. But only spaces may not be good enough.
What I think we really want is a function that makes any string safe for popen/exec/shell script (or raises an exception if it can't be done?). As this function will have to be platform-specific it seems os.path would be a suitable place for it. Or would this give a false sense of security to people who write cgi scripts or something and then suddenly get hit by an IFS hack or similar trick? -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From logi.stix@verizon.net Sun Mar 9 21:47:10 2003 From: logi.stix@verizon.net (logistix) Date: Sun, 9 Mar 2003 16:47:10 -0500 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: <20030308143843.GB1025@unpythonic.net> Message-ID: <000001c2e685$7114e9b0$20bba8c0@XP> > -----Original Message----- > From: python-dev-admin@python.org > [mailto:python-dev-admin@python.org] On Behalf Of Jeff Epler > Sent: Saturday, March 08, 2003 9:39 AM > To: Tim Peters > Cc: PythonDev > Subject: Re: [Python-Dev] test_popen broken on Win2K > > > When I tackled this problem for a program of mine, I ended up > making sure that I always used the "short filename" form for > the program to be executed. This way, there were no spaces > in the filename and no need to quote them. > > However, the function I used to do this comes from > win32, so test_popen can't use it. Nor can Python > fix this up for all users of > os.popen() > > Jeff > Note that there's a policy/reghack that disables short filenames. This allegedly improves file IO by up to 25 % and is commonly recommended as a performance enhancement for domains that only have W2K + servers and clients. It's also part of Microsoft's Server lockdown documentation. I believe the official stance by Microsoft itself is that short filenames are a legacy feature. 
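For the "make any string safe for the shell" function asked for earlier in this thread, here is a rough sketch using shlex.quote, which exists in later Python versions (3.3 and up) and is not something available at the time of this discussion. It covers POSIX shells only; it does not address Windows cmd.exe quoting, so it would not by itself fix test_popen on Win2K. The helper name posix_command is hypothetical.

```python
import shlex


def posix_command(executable, *args):
    # Quote every word so that a POSIX shell treats it as exactly one token,
    # whatever spaces, quotes or metacharacters it contains.
    return " ".join(shlex.quote(word) for word in (executable,) + args)


# A path with a space and an argument with quotes and parentheses.
command_line = posix_command("/opt/my python/bin/python", "-c", "print('hi')")

# shlex.split parses the way a POSIX shell would, so it recovers the words.
round_trip = shlex.split(command_line)
```

As the thread warns, a helper like this protects word boundaries only; it says nothing about what the invoked program does with its arguments or about environment tricks such as IFS.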
From tim.one@comcast.net Sun Mar 9 21:53:51 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 09 Mar 2003 16:53:51 -0500 Subject: [Python-Dev] acceptability of asm in python code? In-Reply-To: <000501c2e5f8$c384b6e0$6401a8c0@damien> Message-ID: [damien morton] > Its arguable which is more obscure, the x86 assembly instruction "jo" > (jump if overflow), or the xor trickery in C. It's not just the assembler, it's also the world of delicate assumptions about how the compiler interleaves generated C code with the forced inline assembler, how that affects optimization in general (see Chris Tismer's post about that), and how brittle that all is. One example of the latter: an idea that resurfaces from time to time is to make Python "short ints" the platform spelling of a 64-bit int. The C overflow-checking code wouldn't be affected by that (part of the reason it's obscure is that it makes no assumption about the size of a Python int). With the inline assembler, though, it would just break -- jo would pick up some accidental setting of the overflow flag under MSVC, or we'd have to arrange to generate __int64 addition code that set the flag the way the macro expects. For a little speedup on the sole operation(s) it targets, it's just not worth the ongoing puzzles. BTW, I'm not sure it's possible to buy a PC anymore less than twice as fast as the one I'm using right now . > I take your point, though, about there being no assembly in python now. There's one place I wish there were: I wish Jeremy had time to fold in his bit of assembler to read the Pentium's clock register. That's a wonderful facility we can't get at now, and the assembler would be limited to a tiny and isolated function. 
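The "xor trickery" under discussion can be modelled in a few lines of Python. This is only a sketch: Python integers never overflow, so the wrapping C addition is simulated with a mask, and the names BITS, to_signed and add_overflows are made up for the illustration. The idea is that a signed addition overflowed exactly when the result's sign differs from the sign of both operands.

```python
BITS = 32                      # assumed width of the C integer being modelled
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(x):
    """Interpret the low BITS bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << BITS) if x & SIGN else x


def add_overflows(a, b):
    """Mirror the C test (i^a) < 0 && (i^b) < 0, where i = a + b wrapped."""
    i = to_signed(a + b)       # what a wrapping machine add would produce
    # x ^ y is negative exactly when x and y have different signs.
    return (i ^ a) < 0 and (i ^ b) < 0


INT_MAX = SIGN - 1
INT_MIN = -SIGN
```

Changing BITS to 64 leaves add_overflows untouched, which matches the point made above that the C check assumes nothing about the size of the integer, whereas the "jo" variant depends on the flag set by one particular machine instruction.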
From greg@cosc.canterbury.ac.nz Sun Mar 9 22:42:44 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 10 Mar 2003 11:42:44 +1300 (NZDT) Subject: [Python-Dev] Capabilities In-Reply-To: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303092242.h29MgiJ16667@oma.cosc.canterbury.ac.nz> > E.g. I might have a service configuration registry object. The object > behaves roughly like a dictionary. A certain user may be given > read-only access to the registry. Maybe every Python object should have a flag which can be set to prevent introspection -- like the current restricted execution mechanism, but on a per-object basis. Then any object could be used as a capability. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Sun Mar 9 22:52:39 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 10 Mar 2003 11:52:39 +1300 (NZDT) Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: Message-ID: <200303092252.h29Mqdk17019@oma.cosc.canterbury.ac.nz> > Those would be quite different functions, then, unless you proposed to have > Python interpret native shell metacharacters on its own too (e.g., set up > pipes, do the indicated file redirections, interpolate envars, and fake > whatever other shell gimmicks people may use). What we need is a function which does all those things, but uses some way of specifying them *other* than shell metacharacters. E.g. os.plumb(("sed", "-e", "s/dead/resting/", "parrots"), ("grep", "norwegian"), output = myfile)) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Mon Mar 10 00:10:51 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 09 Mar 2003 19:10:51 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: "Your message of Mon, 10 Mar 2003 11:42:44 +1300." <200303092242.h29MgiJ16667@oma.cosc.canterbury.ac.nz> References: <200303092242.h29MgiJ16667@oma.cosc.canterbury.ac.nz> Message-ID: <200303100010.h2A0ApY06031@pcp02138704pcs.reston01.va.comcast.net> > Maybe every Python object should have a flag which > can be set to prevent introspection -- like the current > restricted execution mechanism, but on a per-object > basis. Then any object could be used as a capability. I think the capability folks would object to calling it a capability though. :-) Two questions: - Where to store the flag? It probably would cost 4 bytes per object. - Which attributes are considered introspective? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Mar 10 01:08:20 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 09 Mar 2003 20:08:20 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: "Your message of Sun, 09 Mar 2003 07:48:31 EST." References: Message-ID: <200303100108.h2A18KS06619@pcp02138704pcs.reston01.va.comcast.net> [Zooko] > > 2. Mandatory private data (accessible only by the object itself). > > Normal Python doesn't have mandatory private data. If I > > understand correctly, both rexec and proxies (attempt to) provide > > this. They also attempt to provide another safety feature: a > > wrapper around the standard library and builtins that turns off > > access to dangerous features according to an overridable security > > policy. [Zooko, responding to himself] > Perhaps it is that "restricted execution" is designed to provide > private data, by disabling certain introspection features, and > "rexec" and "proxies" are designed to provide the wrapper feature? Not really. 
Restricted execution doesn't provide private data in general: all instance variables of all user-defined classes are accessible to restricted code. However, restricted execution prevents introspection paths that can lead from a function or bound method to its globals or object, respectively, thereby effectively turning functions and bound methods into capabilities. Security proxies can be used to enforce private data, however. The "rexec" module is used to wrap the standard library. Its approach is the following, implemented by overriding __import__: - For modules written in Python, it gives the untrusted code a separate copy of the module, so that the untrusted code can't mess with module globals that might have a meaning to the trusted kernel. - For extension modules (i.e. modules written in C), it has a list of trusted modules, it provides wrappers for some others that only allow a safe subset, and all others are completely off limits. It also wraps open() and a few other built-ins. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Mar 10 01:10:36 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 09 Mar 2003 20:10:36 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: "Your message of Sun, 09 Mar 2003 07:40:23 EST." References: Message-ID: <200303100110.h2A1AaC06743@pcp02138704pcs.reston01.va.comcast.net> > 3. A standard library that follows the Principle of Least > Privilege. That is, a library full of tools that you can extend to > an object in order to empower it to do specific things > (e.g. __builtin__.abs(), os.times(), ...) without thereby also > empowering it to do other things (e.g. __builtin__.file(), > os.system(), ...). Python doesn't have such a library. > > Now the Principle of Least Privilege approach to making a library > safe is very different from the "sandbox" approach. 
The latter is > to remove all "dangerous" tools from the toolbox (or in our case, to > have them dynamically disabled by the "restricted" bit which is > determined by an overridable policy). The former is to separate the > tools so that dangerous ones don't come tied together with common > ones. The security policy, then, is expressed by code that grants > or withholds capabilities (== references) rather than by code that > toggles the "restricted" bit. This sounds interesting, but I'm not sure I follow it. Can you elaborate by giving a couple of examples? > Of course, you can start by denying the entire standard library to > restricted code, and then incrementally refactor the library or wrap > it in Least-Privilege wrappers. > > Until you have a substantial Least-Privilege-respecting library you > can't gain the big benefit of capabilities -- code which is capable > of doing something useful without also being capable of doing harm. > (You can gain the "sandbox" style of security -- code which is > incapable of doing anything useful or harmful.) > > This requirement also means that there can be no "ambient authority" > -- authority that an object receives even if its creator has given > it no references. Again, I would perhaps understand this if you gave a specific example. Is it like suid in Unix? > Regards, > > Zooko > > P.S. I learned this three-part paradigm from Mark Miller whose > paper with Chip Morningstar and Bill Frantz articulates it in more > detail: > > http://www.erights.org/elib/capability/ode/ode-capabilities.html#patt-coop The paper didn't seem immediately relevant, or perhaps it's too long-winded and I gave up before it touched upon the relevant stuff. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@python.net Mon Mar 10 02:16:47 2003 From: gward@python.net (Greg Ward) Date: Sun, 9 Mar 2003 21:16:47 -0500 Subject: [Python-Dev] Where is OSS used? 
Message-ID: <20030310021647.GA2378@cthulhu.gerg.ca> I'm working on docs for ossaudiodev, and I thought I'd ask here before bugging the OSS people: does anyone know which operating systems use OSS (Open Sound System) as the standard audio interface? I know Linux up to 2.4 does, as do some (all?) versions of FreeBSD. Do any of the other BSD flavours (OpenBSD, NetBSD, ...) use OSS out-of-the-box? (If you have access to a FooBSD box, take a look for /usr/include/*/soundcard.h -- if it looks like this: """ #ifndef SOUNDCARD_H #define SOUNDCARD_H /* * Copyright by Hannu Savolainen 1993-1997 [...] """ then it's OSS.) Anyone know precisely which 2.5.x version of Linux dropped OSS in favour of ALSA? Please reply directly to me -- my python-dev subscription is temporarily disabled because I went on holiday a week ago, and still haven't caught up with my other email backlog... Thanks -- Greg -- Greg Ward http://www.gerg.ca/ I just read that 50% of the population has below median IQ! From thomas@xs4all.net Mon Mar 10 09:31:24 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 10 Mar 2003 10:31:24 +0100 Subject: [Python-Dev] Where is OSS used? In-Reply-To: <20030310021647.GA2378@cthulhu.gerg.ca> References: <20030310021647.GA2378@cthulhu.gerg.ca> Message-ID: <20030310093124.GM2112@xs4all.nl> On Sun, Mar 09, 2003 at 09:16:47PM -0500, Greg Ward wrote: > I'm working on docs for ossaudiodev, and I thought I'd ask here before > bugging the OSS people: does anyone know which operating systems use OSS > (Open Sound System) as the standard audio interface? I know Linux up to > 2.4 does, as do some (all?) versions of FreeBSD. I'd say 'recent'. I don't recall when it was added, definitely a while back, but the oldest machine I have (FreeBSD 4.2) has OSS/Free. From googling I get the impression that it's been there since 3.x, so 'recently' definitely holds.
Likewise, googling shows OpenBSD also uses OSS/Free -- the commercial OSS installation manual tells you to remove references to OSS/Free from the kernel :) And there's a boatload of supported platforms in the commercial OSS of course, see www.opensound.com. But I don't suggest we try and plug OSS :) > Anyone know precisely which 2.5.x version of Linux dropped OSS in favour > of ALSA? OSS wasn't dropped (not yet anyway), ALSA was added. Also, ALSA has an OSS emulation mode, so I think it's safe to say you need to 'have OSS or ALSA with OSS API emulation' enabled. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From ping@zesty.ca Mon Mar 10 10:51:40 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Mon, 10 Mar 2003 04:51:40 -0600 (CST) Subject: [Python-Dev] Capabilities In-Reply-To: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Ben Laurie wrote: > BTW, if you would like to explain why you don't think bound methods are > the way to go on python-dev, I'd love to hear it. Guido van Rossum wrote: > Using capabilities, I would have to hand her > a bunch of capabilities for various methods: __getitem__, has_key, > get, keys, items, values, and many more. Using proxies I can simply > give her a read-only proxy for the object. So proxies are more > powerful. There seems to be a persistent confusion here that i would like to dispel: a capability is not a single lambda. Guido's paragraph, above, seems to believe that it is. In fact, the pattern he described is a common and powerful way of using capabilities. A capability is just an unforgeable object reference. In a pure capability system, the only thing you can do with a capability is to call methods on it (or, if you prefer, all you can do is send messages to it). Interposing an object to expose only a subset of another object's API, such as a read-only subset, is exactly the power capabilities give you. It seems to me that the "rexec vs. 
proxy" debate is really about a very different question: How do we get from Python's currently promiscuous objects to properly restricted objects? (Once we have properly restricted objects in either fashion, yes, definitely, using proxies to restrict access is a great technique.) If i understand correctly, the "proxy" answer is "we create a special wrapper object, then the programmer has to individually wrap any object they want to be secure". And the "rexec" answer is "we create an interpreter mode in which all objects are secure". I think the latter is far better. To have any sort of real chance at establishing security, you have to start from a place where everything is secure, instead of starting from a place where everything is insecure and you have to individually secure every single object with its own wrapper. The eventual ideal is to have a system where all objects are "pure" objects (i.e. non-introspectable capabilities) by default. Anyone wanting to do introspection would simply have to obtain the "introspect" capability from a privileged place (e.g. sys). For example, class Foo: pass print Foo.__dict__ # fails from sys import introspect print introspect(Foo).__dict__ # succeeds When running the interpreter in secure mode, "introspect" would just be missing from the sys module (again, ideally sys.introspect wouldn't exist by default, and a command-line option would turn it on, but i realize that's far away). This would have the effect of the "introspectable flag" that Guido mentioned, but without expending any storage at all, until you actually needed to introspect something. 
-- ?!ng From ping@zesty.ca Mon Mar 10 10:55:42 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Mon, 10 Mar 2003 04:55:42 -0600 (CST) Subject: [Python-Dev] Re: Capabilities In-Reply-To: <3E6A31EA.4090609@algroup.co.uk> Message-ID: On Sat, 8 Mar 2003, Ben Laurie wrote: > >>c) Wrap or replace some of the existing libraries, certify that others > >>are "safe" > > > > This should only be necessary for (core and 3rd party) extension > > modules. The rexec module has a framework for this. > > > >>It looks to me like a and b are shared with proxies, and c would be > >>different, by definition. Is there anything else? Am I on the wrong track? > > > > I don't know why you think (c) is different. > > Because with proxies you'd wrap with proxies, and with capabilities > you'd wrap with capabilities. Or do you think there's a way that would > work for both (which would, of course, be great)? This doesn't make any sense to me. The standard libraries would provide proxy wrappers in either case. The rexec vs. proxy issue doesn't enter into it. By the way -- to avoid confusion between "proxies used to wrap unrestricted objects in order to make them into secure objects" and "proxies used to reduce the interface of an existing secure object", let's call the first "proxy" (as has been used in the "rexec vs. proxy" discussion so far), and call the second a "facet" (which is the term commonly used when capabilities people talk about reducing an interface). We often talk about providing, say, a "read-only facet" on an object.
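A "read-only facet" of the kind just named might look roughly like the sketch below, using the dictionary-like service registry from earlier in the thread as the underlying object. The function name make_read_only_facet and the sample data are hypothetical. Note the important caveat that this whole thread is about: in unrestricted Python the underlying object can still be reached by introspection (for instance through a method's closure), so the sketch shows the interface a facet presents, not an enforced security boundary.

```python
def make_read_only_facet(registry):
    """Return an object exposing only the read side of a mapping."""

    class Facet:
        __slots__ = ()          # the facet itself carries no instance state

        def __getitem__(self, key):
            return registry[key]

        def __contains__(self, key):
            return key in registry

        def get(self, key, default=None):
            return registry.get(key, default)

        def keys(self):
            # Hand out a copy so the caller never holds a live view.
            return list(registry.keys())

    return Facet()


registry = {"port": 8080}
facet = make_read_only_facet(registry)

# The holder of the full registry can still update it; the facet sees the
# change but offers no method for making one.
registry["host"] = "example.invalid"
```

One facet object stands in for the whole read-only interface, which is the point made above: there is no need to hand out a separate capability per method.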
-- ?!ng From ben@algroup.co.uk Mon Mar 10 11:02:26 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Mon, 10 Mar 2003 11:02:26 +0000 Subject: [Python-Dev] Capabilities In-Reply-To: <200303100010.h2A0ApY06031@pcp02138704pcs.reston01.va.comcast.net> References: <200303092242.h29MgiJ16667@oma.cosc.canterbury.ac.nz> <200303100010.h2A0ApY06031@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E6C70C2.2010104@algroup.co.uk> Guido van Rossum wrote: >>Maybe every Python object should have a flag which >>can be set to prevent introspection -- like the current >>restricted execution mechanism, but on a per-object >>basis. Then any object could be used as a capability. > > > I think the capability folks would object to calling it a capability > though. :-) No, objects are another way to do it, though it seems to me with somewhat less ease - because the most common use of capabilities is to restrict the type of access to objects other objects have, so you'd need to have multiple objects proxying to the "real" one if you do it at the object level. If we were going to go this route, I'd like the alternative of _also_ being able to set the flag on a bound method. > Two questions: > > - Where to store the flag? It probably would cost 4 bytes per object. You can swap space for time by storing it as an attribute, of course. > - Which attributes are considered introspective? All of them, except methods. Of course, this is what my first approximation to capabilities did (that's what a "capclass" was). Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." 
- Robert Woodruff From ping@zesty.ca Mon Mar 10 11:14:24 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Mon, 10 Mar 2003 05:14:24 -0600 (CST) Subject: [Python-Dev] Capabilities In-Reply-To: <200303100010.h2A0ApY06031@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Sun, 9 Mar 2003, Guido van Rossum wrote: > - Which attributes are considered introspective? Here's a preliminary description of the boundary between "introspective" and "restricted", off the top of my head: 1. The only thing you can do with a bound method is to call it (bound methods have no attributes except __doc__). 2. The following instance attributes are off limits: __class__, __dict__, __module__. That might be a reasonable start. However, there is still the problem that the established technique for storing instance-specific state in Python is to use globally-accessible data attributes instead of a limited scope. We would also need to add a safe (private) place for instances to put state. -- ?!ng From jim@ZOPE.COM Mon Mar 10 11:26:35 2003 From: jim@ZOPE.COM (Jim Fulton) Date: Mon, 10 Mar 2003 06:26:35 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <011f01c2e62f$3e6d5840$6d94fea9@newmexico> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> <3E6B21F7.3040300@zope.com> <011f01c2e62f$3e6d5840$6d94fea9@newmexico> Message-ID: <3E6C766B.80400@zope.com> Samuele Pedroni wrote: > From: "Jim Fulton" > >>For example, you can't proxy exceptions without >>breaking exception handling. In Zope, we rely on restricted execution to prevent >>certain kinds of introspection on exceptions and exception classes. In Zope, we >>also don't proxy None, because None is usually checked for identity. We also don't >>proxy strings and numbers. >> > > That was a question I was asking myself about proxies: exception handling. > But I never had the time to play with it to check.
> > Does that mean that restricted code can get unproxied instances of classic > classes as caught exceptions? Right. What we can (and will) do is intercept the exceptions and proxy the exception's instance data. So we'll be relying on restricted execution to protect the exception method meta data and on proxies to protect the exception data. Of course, we'd prefer to be able to proxy the exception instances themselves. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Mon Mar 10 11:31:16 2003 From: jim@zope.com (Jim Fulton) Date: Mon, 10 Mar 2003 06:31:16 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: <200303091203.h29C3Iu04731@pcp02138704pcs.reston01.va.comcast.net> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <1046981657.15348.80.camel@slothrop.zope.com> <3E68AAF4.3060508@algroup.co.uk> <3E6B258B.2080207@zope.com> <200303091203.h29C3Iu04731@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E6C7784.5060103@zope.com> Guido van Rossum wrote: > [Jim] > >>You don't need restricted execution to make proxies work. > > > Um, I think that's a dangerous mistake, or a confusion in terminology. All I'm saying is that the proxy mechanism itself doesn't rely on restricted execution. > Without restricted execution, untrusted code would have access to > sys.modules, and from there it would be able to access > removeAllProxies. All we need to be able to do is control imports. It turns out that to prevent access to sys.modules, we have to replace __builtins__, which has the side-effect of enabling restricted execution. You don't need anything but the ability to restrict imports and other unproxied access to sys.modules to use proxies. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Mon Mar 10 11:51:22 2003 From: jim@zope.com (Jim Fulton) Date: Mon, 10 Mar 2003 06:51:22 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <05a701c2e66f$6c001be0$6d94fea9@newmexico> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> <3E6B21F7.3040300@zope.com> <011f01c2e62f$3e6d5840$6d94fea9@newmexico> <05a701c2e66f$6c001be0$6d94fea9@newmexico> Message-ID: <3E6C7C3A.2090104@zope.com> Samuele Pedroni wrote: > From: "Samuele Pedroni" > >>From: "Jim Fulton" >> >>>For example, you can't proxy exceptions without >>>breaking exception handling. In Zope, we rely on restricted execution to >> >>prevent >> >>>certian kinds of introspection on exceptions and exception classes. In > > Zope, > >>we >> >>>also don't proxy None, because None is usually checked for identity. We > > also > >>don't >> >>>proxy strings, and numbers. >>> >> >>That was a question I was asking myself about proxies: exception handling. >>But I never had the time to play with it to check. >> >>Does that mean that restricted code can get unproxied instances of classic >>classes as caught exceptions? > > > maybe the question was unclear, I think it was clear. > but it was serious, what I was asking is > whether some restricted code can do: > > try: > deliberate code to force exception > except Exception,e: > ... > > so that e is caught unproxied. Right. e is caught unproxied. > Looking at zope/security/_proxy.c it seems this > can be the case... Yes, > then to be (likely) on the safe side, all exception class definitions for > possible e classes: like e.g. > > class MyExc(Exception): > ... 
> > > ought to be executed _in restricted mode_, or be "trivial/empty": something > like > > class MyExc(Exception): > def __init__(self, msg): > self.message = msg > Exception.__init__(self, msg) > > def __str__(self): > return self.message > > is already too much rope. I'm not sure if you are saying that this example is "trivial/empty" or not. It seems that you are saying that it is not trivial enough. If so, why? > Although it seems not to have the "nice" two-level-of-calls behavior of Bastion > instances, an unproxied instance of MyExc if MyExc was defined outside of > restricted execution, can be used to break out of restricted execution. How can it be used to break out of restricted execution? I see three risks: 1. The exception provides methods to do harmful things, such as create side effects or provide access to data outside the exception. 2. The exception creates data that needs to be protected. For example Zope uses a NotFoundError exception that contains an object being searched. 3. The exception methods' meta data provide access to module globals. Risk 1 needs to be mitigated through proper exception design. Exceptions need to be limited in what their methods do. This is a bit brittle, but all standard exceptions have this property. Risk 2 is mitigated by proxying exception instance data. Proxies can do this. This is what we've decided to do, although we haven't implemented it yet. Risk 3 is mitigated by restricted execution. Have I missed anything? Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Mon Mar 10 12:23:08 2003 From: jim@zope.com (Jim Fulton) Date: Mon, 10 Mar 2003 07:23:08 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: References: Message-ID: <3E6C83AC.7070100@zope.com> Ka-Ping Yee wrote: > Ben Laurie wrote: > >>BTW, if you would like to explain why you don't think bound methods are >>the way to go on python-dev, I'd love to hear it. > > > Guido van Rossum wrote: > >>Using capabilities, I would have to hand her >>a bunch of capabilities for various methods: __getitem__, has_key, >>get, keys, items, values, and many more. Using proxies I can simply >>give her a read-only proxy for the object. So proxies are more >>powerful. I'm pretty sure that Guido meant to say "bound method" rather than "capability" in the text above. I think that the debate is partly whether to express capabilities (or some other scheme) in terms of bound methods or proxies, which expose entire interfaces. > There seems to be a persistent confusion here that i would like > to dispel: a capability is not a single lambda. There are a bunch of confusions floating around. :) A major one concerns a concise definition of what a capability is and why the capability approach is good or bad. In reading about capabilities in E, http://www.erights.org/. I really need to read all that stuff again. Of course, as others pointed out, I ended up creating something for Zope 3 that isn't capabilities. I think you touch on a reason below. > Guido's paragraph, above, seems to believe that it is. In fact, > the pattern he described is a common and powerful way of using > capabilities. A capability is just an unforgeable object reference. > In a pure capability system, the only thing you can do with a > capability is to call methods on it (or, if you prefer, all you > can do is send messages to it).
Interposing an object to expose > only a subset of another object's API, such as a read-only subset, > is exactly the power capabilities give you. > > It seems to me that the "rexec vs. proxy" debate is really about > a very different question: How do we get from Python's currently > promiscuous objects to properly restricted objects? > > (Once we have properly restricted objects in either fashion, yes, > definitely, using proxies to restrict access is a great technique.) > > If i understand correctly, the "proxy" answer is "we create a > special wrapper object, then the programmer has to individually > wrap any object they want to be secure". And the "rexec" answer > is "we create an interpreter mode in which all objects are secure". > > I think the latter is far better. To have any sort of real chance > at establishing security, you have to start from a place where > everything is secure, instead of starting from a place where > everything is insecure and you have to individually secure every > single object with its own wrapper. > > The eventual ideal is to have a system where all objects are > "pure" objects (i.e. non-introspectable capabilities) by default. > Anyone wanting to do introspection would simply have to obtain > the "introspect" capability from a privileged place (e.g. sys). > For example, > > class Foo: > pass > > print Foo.__dict__ # fails > > from sys import introspect > print introspect(Foo).__dict__ # succeeds > > When running the interpreter in secure mode, "introspect" > would just be missing from the sys module (again, ideally > sys.introspect wouldn't exist by default, and a command-line > option would turn it on, but i realize that's far away). > > This would have the effect of the "introspectable flag" that > Guido mentioned, but without expending any storage at all, > until you actually needed to introspect something. 
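The read-only facet pattern described in the quoted text above (interposing an object that exposes only a subset of another object's API) can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not code from the thread; make_read_only_facet, ReadOnlyFacet and the registry contents are hypothetical names chosen for the example. The last lines show why, in ordinary unrestricted Python, such a facet is not yet an unforgeable capability: introspection reaches the wrapped object.

```python
# Hypothetical sketch (modern Python 3); the names are illustrative
# assumptions and do not come from the thread or from Zope.

def make_read_only_facet(mapping):
    """Return an object exposing only read operations of `mapping`."""

    class ReadOnlyFacet:
        def __getitem__(self, key):
            return mapping[key]

        def __contains__(self, key):
            return key in mapping

        def get(self, key, default=None):
            return mapping.get(key, default)

        def keys(self):
            # Return a copy so callers do not hold a live view.
            return list(mapping)

    return ReadOnlyFacet()


registry = {"mail": "smtp.example.org"}
facet = make_read_only_facet(registry)

print(facet["mail"])            # reads go through the facet
try:
    facet["mail"] = "changed"   # the facet defines no __setitem__
except TypeError as exc:
    print("write refused:", exc)

# Without a restricted mode the facet is NOT unforgeable: the closure
# of any of its methods still leads back to the underlying mapping.
leaked = facet.get.__func__.__closure__[0].cell_contents
print(leaked is registry)
```

Whether the hole is closed by an interpreter mode that hides attributes such as __closure__, or by wrapping every returned object in a proxy, is the "rexec vs. proxy" question under discussion here.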
You seem to be arguing that programmers should not have to explicitly create capabilities, but that everything should be a capability by default. Please correct me if I'm wrong. I thought that the main point of capabilities was that programmers *should* explicitly bother to pass capabilities. Programmers should think about arguments passed to (or returned or raised to) other code as capabilities to do things and pass *just* the capabilities needed. I find a lot of appeal in this idea. Zope employs proxies in a way that falls somewhere between the extremes of capabilities and implicitly protecting everything. (I'm going to be a little sloppy here for brevity. A Zope proxy is made up of two objects, a simple proxy that *could* be used to implement capabilities and a checker that provides policy. The policy we currently use in Zope is not a capability policy.) Zope security proxies assure that "everything" is proxied. (We choose not to proxy simple values like numbers, strings, and None.) Values returned from operations on proxied objects are themselves proxied. This makes it pretty straightforward to set up execution environments where untrusted code only has access to proxies. In addition, if untrusted code calls trusted code, the untrusted code can only pass proxies. This means that trusted code can't be tricked into performing operations that the untrusted code could not perform. Zope proxies achieve this level of automation by providing registries, mostly based on classes, that allow programmers to say how different kinds of objects should be proxied. Programmers decide what capabilities to expose at "compile" time (really program startup) rather than run time. Programmers *can* create proxies explicitly that provide non-default access. In fact, there are APIs that actually provide the equivalent of capabilities. I mention all of this because I think it's worth thinking/debating this issue about how explicit security should be.
On the one hand, explicitly giving *just* the capabilities needed for a task seems very appealing. OTOH, making sure that everything is protected by default is safer. I suspect that there are ways to combine (trade off?) these in reasonable ways. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From ben@algroup.co.uk Mon Mar 10 14:03:28 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Mon, 10 Mar 2003 14:03:28 +0000 Subject: [Python-Dev] Re: Capabilities In-Reply-To: References: Message-ID: <3E6C9B30.8030901@algroup.co.uk> Ka-Ping Yee wrote: > On Sat, 8 Mar 2003, Ben Laurie wrote: > >>>>c) Wrap or replace some of the existing libraries, certify that others >>>>are "safe" >>> >>>This should only be necessary for (core and 3rd party) extension >>>modules. The rexec module has a framework for this. >>> >>> >>>>It looks to me like a and b are shared with proxies, and c would be >>>>different, by definition. Is there anything else? Am I on the wrong track? >>> >>>I don't know why you think (c) is different. >> >>Because with proxies you'd wrap with proxies, and with capabilities >>you'd wrap with capabilities. Or do you think there's a way that would >>work for both (which would, of course, be great)? > > > This doesn't make any sense to me. The standard libraries would provide > proxy wrappers in either case. The rexec vs. proxy issue doesn't enter > into it. We've got too much overloading here! I meant "proxy" as in "Zope proxy". Yes, in either case they'll be wrapped in some kind of (non-Zope) proxy, but the actual wrapper would be different. > By the way -- to avoid confusion between "proxies used to wrap > unrestricted objects in order to make them into secure objects" and > "proxies used to reduce the interface of an existing secure object", > let's call the first "proxy" (as has been used in the "rexec vs.
proxy" > discussion so far), and call the second a "facet" (which is the term > commonly used when capabilities people talk about reducing an interface). > We often talk about providing, say, a "read-only facet" on an object. This would be more applicable to an object-based capability model, which Jim and Guido seem to favour. In fact, perhaps it would be nicest to be able to do both - i.e. bound methods _and_ opaque objects. Then we'd all be happy. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From pedronis@bluewin.ch Mon Mar 10 14:18:43 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Mon, 10 Mar 2003 15:18:43 +0100 Subject: [Python-Dev] Re: Capabilities References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> <3E6B21F7.3040300@zope.com> <011f01c2e62f$3e6d5840$6d94fea9@newmexico> <05a701c2e66f$6c001be0$6d94fea9@newmexico> <3E6C7C3A.2090104@zope.com> Message-ID: <008001c2e70f$f514c520$6d94fea9@newmexico> From: "Jim Fulton" > How can it be used to break out of restricted execution? > > I see three risks: > > 1. The exception provides methods to do harmful things, > such as create side effects or provide access to data outside > the exception. > > 2. The exception creates data that needs to be protected. For example > Zope uses a NotFoundError exception taht contains an object being searched. > > 3. The exception methods meta data provide access to module globals. > > Risk 1 needs to be mitigated through proper exception design. Exceptions > need to be limited in what their methods do. This is a bit brittle, but > all standard exceptions have this property. > > Risk 2 is mitigated by proxying exception instance data. Proxies can do this. 
> This is what we've decided to do, although we haven't implemented it yet. > > Risk 3 is mitigated by restricted execution. > > Have I missed anything? OK, I have had the time to really try what I was thinking about. I have not found a way to really break out from restricted execution (does not mean I'm sure there isn't) BUT: I'm considering: - Python 2.2.2 - Zope 3 3.0a1 and zope.security.interpreter.RestrictedInterpreter with zope.security.simplepolicies.ParanoidSecurityPolicy (the default) so 1. a bug (rexec had it too). If I remember correctly the solution is re-injecting __builtins__ before each exec C:\transit\Zope3-3.0a1\src\zope\security>\usr\python22\python Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path.append('..\..') >>> from zope.security.interpreter import RestrictedInterpreter >>> ri=RestrictedInterpreter() >>> ri.ri_exec("class A: pass") >>> ri.ri_exec("print A.__dict__") Traceback (most recent call last): File "", line 1, in ? File "..\..\zope\security\interpreter.py", line 32, in ri_exec exec code in self.globals File "", line 1, in ? RuntimeError: class.__dict__ not accessible in restricted mode >>> ri.ri_exec("del __builtins__") >>> ri.ri_exec("print A.__dict__") {'__module__': '__builtin__', '__doc__': None} or be sure to call ri_exec only once on each RestrictedInterpreter instance. Assuming that fixed: 2. If code executed under a RestrictedInterpreter could obtain a MyExc instance and had a working unproxied/non-proxying 'property' built-in, it could very likely break out from restricted execution. Fortunately the 'property' passed to such code is not working. Given that that's not the case I skip the illustration. 3. How much this scenario is likely really depend on how RestrictedInterpreter is used, how and where exceptions are defined, if really restricted code can manage to get an instance of one of them ... 
if further restrictions e.g. on subclassing are added or removed ... if the general situation of restricted execution and new-style classes improves. All of this I don't know.

Here I consider: a "dangerous" function ('sys.exit') is imported in the same module where MyExc is defined, MyExc is not defined under restricted execution, a proxied function is passed to restricted code such that it can capture an instance of MyExc (as I said whether this set of things is likely/unlikely I don't know):

import sys
from sys import exit # !!! same module as MyExc
sys.path.append('C:/transit/Zope3-3.0a1/src')
from zope.security.interpreter import RestrictedInterpreter
from zope.security.checker import ProxyFactory

class MyExc(Exception): # !!! definition outside of restricted execution
    def __init__(self,msg):
        self.message = msg
        Exception.__init__(self,msg)
    def __str__(self):
        return self.message

def myfunc():
    raise MyExc('foo')

ri = RestrictedInterpreter()
ri.globals['myfunc'] = ProxyFactory(myfunc)
f = open('c:/Documenti/x.txt','r')
code = f.read()
f.close()
ri.ri_exec(code)
print "OK"

Anyway I have a _very baroque_ x.txt that manages to call sys.exit.

regards

From jim@zope.com Mon Mar 10 15:29:41 2003 From: jim@zope.com (Jim Fulton) Date: Mon, 10 Mar 2003 10:29:41 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: <200303092242.h29MgiJ16667@oma.cosc.canterbury.ac.nz> References: <200303092242.h29MgiJ16667@oma.cosc.canterbury.ac.nz> Message-ID: <3E6CAF65.4040505@zope.com> Greg Ewing wrote: >>E.g. I might have a service configuration registry object. The object >>behaves roughly like a dictionary. A certain user may be given >>read-only access to the registry. > > > Maybe every Python object should have a flag which > can be set to prevent introspection -- like the current > restricted execution mechanism, but on a per-object > basis. Then any object could be used as a capability. Yes, but not a very useful one.
For example, given a file, you often want to create a "file read" capability which is an object that allows reading the file but not writing the file. Just preventing introspection isn't enough. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From ben@algroup.co.uk Mon Mar 10 15:30:13 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Mon, 10 Mar 2003 15:30:13 +0000 Subject: [Python-Dev] Capabilities In-Reply-To: References: Message-ID: <3E6CAF85.30300@algroup.co.uk> Ka-Ping Yee wrote: > Ben Laurie wrote: > >>BTW, if you would like to explain why you don't think bound methods are >>the way to go on python-dev, I'd love to hear it. > > > Guido van Rossum wrote: > >>Using capabilities, I would have to hand her >>a bunch of capabilities for various methods: __getitem__, has_key, >>get, keys, items, values, and many more. Using proxies I can simply >>give her a read-only proxy for the object. So proxies are more >>powerful. > > > There seems to be a persistent confusion here that i would like > to dispel: a capability is not a single lambda. > > Guido's paragraph, above, seems to believe that it is. In fact, > the pattern he described is a common and powerful way of using > capabilities. A capability is just an unforgeable object reference. > In a pure capability system, the only thing you can do with a > capability is to call methods on it (or, if you prefer, all you > can do is send messages to it). Interposing an object to expose > only a subset of another object's API, such as a read-only subset, > is exactly the power capabilities give you. I think this is an implementation detail, as I have mentioned before. A capability is a thing with certain properties, as discussed ad nauseam. You can implement them using bound methods or using opaque objects. Personally, I'd like to do both, but if I had to choose, I'd use bound methods. 
Yes, this probably is a shift in position - I'm still trying to figure this stuff out, is my excuse! Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From guido@python.org Mon Mar 10 15:38:28 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Mar 2003 10:38:28 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Your message of "Mon, 10 Mar 2003 04:51:40 CST." References: Message-ID: <200303101538.h2AFcTR12087@odiug.zope.com> > Ben Laurie wrote: > > BTW, if you would like to explain why you don't think bound methods are > > the way to go on python-dev, I'd love to hear it. > > Guido van Rossum wrote: > > Using capabilities, I would have to hand her > > a bunch of capabilities for various methods: __getitem__, has_key, > > get, keys, items, values, and many more. Using proxies I can simply > > give her a read-only proxy for the object. So proxies are more > > powerful. (Jim surmised that I meant to write "bound methods". Alas, I don't get off that easily: at the time I wrote that I really did think that a capability had to be a single function.) [Ping] > There seems to be a persistent confusion here that i would like > to dispel: a capability is not a single lambda. I guess, I misunderstood.. I was sure that Ben told me this was so. Apparently I misread, or you have a different definition of capability than he does (wouldn't be the first time.) > Guido's paragraph, above, seems to believe that it is. In fact, > the pattern he described is a common and powerful way of using > capabilities. A capability is just an unforgeable object reference. > In a pure capability system, the only thing you can do with a > capability is to call methods on it (or, if you prefer, all you > can do is send messages to it). 
Interposing an object to expose > only a subset of another object's API, such as a read-only subset, > is exactly the power capabilities give you. So a proxy with a fixed (not depending on the caller) policy about which methods you can call should be considered as equivalent to a capability -- in fact this would be a way to implement capabilities. > It seems to me that the "rexec vs. proxy" debate is really about > a very different question: How do we get from Python's currently > promiscuous objects to properly restricted objects? > > (Once we have properly restricted objects in either fashion, yes, > definitely, using proxies to restrict access is a great technique.) > > If i understand correctly, the "proxy" answer is "we create a > special wrapper object, then the programmer has to individually > wrap any object they want to be secure". And the "rexec" answer > is "we create an interpreter mode in which all objects are secure". Well, actually, restricted execution as currently implemented does *not* strive to make all objects secure: untrusted code can still inspect all attributes of an object unless that object is proxied by a Bastion, or unless that object is one of a few built-in types (e.g. bound methods) for which some attributes are privatized. > I think the latter is far better. To have any sort of real chance > at establishing security, you have to start from a place where > everything is secure, instead of starting from a place where > everything is insecure and you have to individually secure every > single object with its own wrapper. But we don't have the latter. > The eventual ideal is to have a system where all objects are > "pure" objects (i.e. non-introspectable capabilities) by default. That wouldn't be Python. > Anyone wanting to do introspection would simply have to obtain > the "introspect" capability from a privileged place (e.g. sys).
> For example, > > class Foo: > pass > > print Foo.__dict__ # fails > > from sys import introspect > print introspect(Foo).__dict__ # succeeds > > When running the interpreter in secure mode, "introspect" > would just be missing from the sys module (again, ideally > sys.introspect wouldn't exist by default, and a command-line > option would turn it on, but i realize that's far away). > > This would have the effect of the "introspectable flag" that > Guido mentioned, but without expending any storage at all, > until you actually needed to introspect something. That flag wasn't my idea, it was someone else's (Greg Ewing?). --Guido van Rossum (home page: http://www.python.org/~guido/) From jim@zope.com Mon Mar 10 15:34:38 2003 From: jim@zope.com (Jim Fulton) Date: Mon, 10 Mar 2003 10:34:38 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <1047150320.2347.26.camel@localhost.localdomain> References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> Message-ID: <3E6CB08E.4030905@zope.com> Jeremy Hylton wrote: ... > I think both techniques achieve the same end, but with different > limitations. I prefer the proxy approach because it is more self > contained. The rexec approach requires that all developers working in > the core on introspection features be aware of security issues. The > security kernel ends up being most of the core interpreter -- anything > that can introspection on objects. I think that there is an important corollary. With the rexec approach, changes to the security policy are very hard to make. For example, if we change our mind about what should be safe or not: we have many places to make the change, we have lots of tests to redo, and people have to reinstall or rebuild Python to get the change. With proxies, the update is provided as a fairly small and self-contained library update. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Mon Mar 10 15:40:08 2003 From: jim@zope.com (Jim Fulton) Date: Mon, 10 Mar 2003 10:40:08 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: References: Message-ID: <3E6CB1D8.4050108@zope.com> Ka-Ping Yee wrote: > On Sun, 9 Mar 2003, Guido van Rossum wrote: > >>- Which attributes are considered introspective? > > > Here's a preliminary description of the boundary between "introspective" > and "restricted", off the top of my head: > > 1. The only thing you can do with a bound method is to call it > (bound methods have no attributes except __doc__). Well, I see no harm and much usefulness in allowing __name__, __repr__, and __str__. > 2. The following instance attributes are off limits: > __class__, __dict__, __module__. > > That might be a reasonable start. I generally want to be able to get the __class__. This is harmless in my case, because I get a proxy back. > However, there is still the problem that the established technique > for storing instance-specific state in Python is to use globally- > accessible data attributes instead of a limited scope. We would > also need to add a safe (private) place for instances to put state. I don't understand why this is necessary. In general, you want to restrict what attributes (data, properties, methods, etc.) are accessible in certain situations. I don't follow what makes data attributes special. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pedronis@bluewin.ch Mon Mar 10 15:44:14 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Mon, 10 Mar 2003 16:44:14 +0100 Subject: [Python-Dev] Capabilities References: Message-ID: <025c01c2e71b$e7a3cc40$6d94fea9@newmexico> From: "Ka-Ping Yee" > However, there is still the problem that the established technique > for storing instance-specific state in Python is to use globally- > accessible data attributes instead of a limited scope. We would > also need to add a safe (private) place for instances to put state.

Indeed, the underlying fact is that method implementations are normal functions that access instance attributes the same way everything else does; that's why Zope proxies become necessary (and a bit brittle):

class A:
    def geta(self):
        return self.a # 1

a=A()
a.a # 2

(1) and (2) are using the same operation/execution path. The other issue, as you wrote, is also that introspection operations are like normal operations too (and they share the same execution path also):

a.__dict__ vs. introspect(a).__dict__

The problem is that there is obviously a flexibility/ease-of-use trade-off. Python is a language that maximizes that and where e.g. introspection feels easy and natural; OTOH analyzing security becomes nightmarish.

regards.

From guido@python.org Mon Mar 10 15:47:53 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Mar 2003 10:47:53 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Your message of "Mon, 10 Mar 2003 11:02:26 GMT."
<3E6C70C2.2010104@algroup.co.uk> References: <200303092242.h29MgiJ16667@oma.cosc.canterbury.ac.nz> <200303100010.h2A0ApY06031@pcp02138704pcs.reston01.va.comcast.net> <3E6C70C2.2010104@algroup.co.uk> Message-ID: <200303101548.h2AFm0212138@odiug.zope.com> [Someone else] > >>Maybe every Python object should have a flag which > >>can be set to prevent introspection -- like the current > >>restricted execution mechanism, but on a per-object > >>basis. Then any object could be used as a capability. > Guido van Rossum wrote: > > I think the capability folks would object to calling it a capability > > though. :-) [Ben] > No, objects are another way to do it, though it seems to me with > somewhat less ease - because the most common use of capabilities is to > restrict the type of access to objects other objects have, so you'd need > to have multiple objects proxying to the "real" one if you do it at the > object level. I'm not sure I understand. Do you mean that because there may be several security levels you'd need different capabilities for an object for each level? Since there are also several methods, you end up managing multiple capabilities in either case. Anyway, Zope security proxies aren't "managed" this way. The trusted code doesn't have a set of objects representing capabilities that it hands out -- a proxy is manufactured freshly on each use. I wonder if this might be one cause of repeated misunderstandings? > If we were going to go this route, I'd like the alternative of _also_ > being able to set the flag on a bound method. > > > Two questions: > > > > - Where to store the flag? It probably would cost 4 bytes per object. > > You can swap space for time by storing it as an attribute, of course. Not all Python objects have a dict where to store arbitrary attributes. And even if they do, that's about the most expensive way to store a flag. 
And you'd have to worry about someone getting a hold of that dict and deleting the attribute (assuming that the flag defaults to allow introspection, otherwise no Python code written today would continue to work). > > - Which attributes are considered introspective? > > All of them, except methods. That's not very Pythonic. > Of course, this is what my first approximation to capabilities did > (that's what a "capclass" was). I never knew what a capclass was. I don't think you ever explained it so clearly ("doesn't allow access to non-method attributes") before. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Mar 10 15:53:14 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Mar 2003 10:53:14 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Your message of "Mon, 10 Mar 2003 05:14:24 CST." References: Message-ID: <200303101553.h2AFrE312165@odiug.zope.com> > On Sun, 9 Mar 2003, Guido van Rossum wrote: > > - Which attributes are considered introspective? > > Here's a preliminary description of the boundary between "introspective" > and "restricted", off the top of my head: > > 1. The only thing you can do with a bound method is to call it > (bound methods have no attributes except __doc__). Plus __repr__ and __str__. And if they have attributes at all they have __getattribute__. And if they are callable they have __call__. > 2. The following instance attributes are off limits: > __class__, __dict__, __module__. > > That might be a reasonable start. Not sure. Classic rexec disallowed these (and a few more), but the problem with disallowing __dict__ of an instance was that this made it impossible for untrusted code to use certain coding patterns like overriding __setattr__. > However, there is still the problem that the established technique > for storing instance-specific state in Python is to use globally- > accessible data attributes instead of a limited scope. 
We would > also need to add a safe (private) place for instances to put state. I wonder if we could write special descriptors for this? The problem as I see it is that the interpreter doesn't keep track of whether a particular function is part of a class definition or not, so there's no way to tell whether it should have access to private data or not. Proxies get around this, but with the stated disadvantages. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Mar 10 15:40:30 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Mar 2003 10:40:30 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: Your message of "Mon, 10 Mar 2003 04:55:42 CST." References: Message-ID: <200303101540.h2AFeW012107@odiug.zope.com> [Ping] > By the way -- to avoid confusion between "proxies used to wrap > unrestricted objects in order to make them into secure objects" and > "proxies used to reduce the interface of an existing secure object", > let's call the first "proxy" (as has been used in the "rexec vs. proxy" > discussion so far), and call the second a "facet" (which is the term > commonly used when capabilities people talk about reducing an interface). > We often talk about providing, say, a "read-only facet" on an object. Hm, I'm not sure I understand the difference between the two definitions you give. What does "making something into a secure object" mean if not "reducing its interface"? And what is the fundamental difference between a secure object and an insecure one? In my world view there's a gradual difference. The only truly secure object is None. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Mar 10 16:12:37 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Mar 2003 11:12:37 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: Your message of "Mon, 10 Mar 2003 06:31:16 EST." 
<3E6C7784.5060103@zope.com> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <1046981657.15348.80.camel@slothrop.zope.com> <3E68AAF4.3060508@algroup.co.uk> <3E6B258B.2080207@zope.com> <200303091203.h29C3Iu04731@pcp02138704pcs.reston01.va.comcast.net> <3E6C7784.5060103@zope.com> Message-ID: <200303101612.h2AGCdV12210@odiug.zope.com> [Jim] > > >You don't need restricted execution to make proxies work. [Guido] > > Um, I think that's a dangerous mistake, or a confusion in terminology. [Jim] > All I'm saying is that the proxy mechanism itself doesn't rely on > restricted execution. > > > Without restricted execution, untrusted code would have access to > > sys.modules, and from there it would be able to access > > removeAllProxies. > > All we need to be able to do is control imports. It turns out that > to prevent access to sys.modules, we have to replace __builtins__, > which has the side-effect of enabling restricted execution. You > don't need anything but the ability to restrict imports and other > unproxied access to sys.modules to use proxies. Turns out this was another terminology misunderstanding. I think of the ability to overload __import__ and set __builtins__ as part of the restricted execution implementation, because that's why they were implemented. Jim thought that these were separate features, and that restricted execution in the interpreter only referred to the closing off of some introspection attributes (e.g. im_self, __dict__ and func_globals). 
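
As a rough illustration of the import-control idea discussed above (replacing __builtins__ so that untrusted code only sees a guarded __import__), here is a sketch written for current Python 3 rather than the 2003 rexec machinery. The names ALLOWED, guarded_import and run_untrusted are hypothetical, and this is not a secure sandbox; it only shows the mechanism.

```python
import builtins

# Hypothetical allow-list of top-level modules the untrusted code may import.
ALLOWED = {"math"}


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Stand-in for __import__ that refuses anything outside ALLOWED."""
    if name.split(".")[0] not in ALLOWED:
        raise ImportError("import of %r is not permitted" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)


def run_untrusted(source):
    """Execute source with a replaced, minimal __builtins__ mapping."""
    env = {
        "__builtins__": {
            "__import__": guarded_import,  # the import statement looks this up
            "abs": abs,
            "len": len,
        }
    }
    exec(source, env)
    return env


env = run_untrusted("import math\nx = abs(math.floor(-2.5))")
print(env["x"])

try:
    run_untrusted("import sys")
except ImportError as exc:
    print("blocked:", exc)
```

The point of the sketch is only that controlling imports and controlling __builtins__ are the same lever: the import statement consults whatever __import__ the executing code's builtins provide.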
--Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@zope.com Mon Mar 10 16:59:26 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 10 Mar 2003 11:59:26 -0500 Subject: [Python-Dev] Capabilities in Python In-Reply-To: <200303091203.h29C3Iu04731@pcp02138704pcs.reston01.va.comcast.net> References: <15930.48758.62473.425111@slothrop.zope.com> <15933.30607.900530.370402@localhost.localdomain> <3E635BD3.9000107@algroup.co.uk> <1046981657.15348.80.camel@slothrop.zope.com> <3E68AAF4.3060508@algroup.co.uk> <3E6B258B.2080207@zope.com> <200303091203.h29C3Iu04731@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1047315566.15066.7.camel@slothrop.zope.com> On Sun, 2003-03-09 at 07:03, Guido van Rossum wrote: > [Jim] > > You don't need restricted execution to make proxies work. > > Um, I think that's a dangerous mistake, or a confusion in terminology. > > Without restricted execution, untrusted code would have access to > sys.modules, and from there it would be able to access > removeAllProxies. Guido and I discovered that we were not using the same terminology in our own discussions. Guido suggests the following terms: rexec -- the rexec module in the Python standard library restricted execution -- the features in the Python code depending on PyEval_GetRestricted(). We still need a term to refer to an arbitrary mechanism for providing a secure environment for untrusted code. (I had been using "restricted execution" to mean this.) Perhaps a "safe interpreter"? 
Jeremy From jim@zope.com Mon Mar 10 17:10:29 2003 From: jim@zope.com (Jim Fulton) Date: Mon, 10 Mar 2003 12:10:29 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: <025c01c2e71b$e7a3cc40$6d94fea9@newmexico> References: <025c01c2e71b$e7a3cc40$6d94fea9@newmexico> Message-ID: <3E6CC705.7000901@zope.com> Samuele Pedroni wrote: > From: "Ka-Ping Yee" > >>However, there is still the problem that the established technique >>for storing instance-specific state in Python is to use globally- >>accessible data attributes instead of a limited scope. We would >>also need to add a safe (private) place for instances to put state. > > > Indeed, that's the fact that implementations of methods are normal functions > that access the instance attributes like everything else do, > that's why Zope-proxies become necessary (and a bit brittle): > > class A: > def geta(self): > return self.a # 1 > > a=A() > > a.a # 2 > > (1) and (2) are using the same operation/execution path. This points out a nice feature of zope proxies. The proxied object's methods are called with an unproxied self, so you can easily allow access to the object's methods without providing access to other attributes. Or, equivalently, you can provide access to one set of methods and those methods can use other methods that you don't provide access to. Could you explain why you say that zope proxies are brittle? Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jeremy@zope.com Mon Mar 10 17:18:29 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 10 Mar 2003 12:18:29 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: References: Message-ID: <1047316709.15064.22.camel@slothrop.zope.com> On Sun, 2003-03-09 at 07:40, Zooko wrote: > 3. A standard library that follows the Principle of Least Privilege. 
That is, > a library full of tools that you can extend to an object in order to empower it > to do specific things (e.g. __builtin__.abs(), os.times(), ...) without thereby > also empowering it to do other things (e.g. __builtin__.file(), os.system(), > ...). Python doesn't have such a library. > > Now the Principle of Least Privilege approach to making a library safe is very > different from the "sandbox" approach. The latter is to remove all "dangerous" > tools from the toolbox (or in our case, to have them dynamically disabled by the > "restricted" bit which is determined by an overridable policy). The former is > to separate the tools so that dangerous ones don't come tied together with > common ones. The security policy, then, is expressed by code that grants or > withholds capabilities (== references) rather than by code that toggles the > "restricted" bit. > > Of course, you can start by denying the entire standard library to restricted > code, and then incrementally refactor the library or wrap it in Least-Privilege > wrappers. > > Until you have a substantial Least-Privilege-respecting library you can't gain > the big benefit of capabilities -- code which is capable of doing something > useful without also being capable of doing harm. (You can gain the "sandbox" > style of security -- code which is incapable of doing anything useful or > harmful.) If you need to rewrite all the libraries to be capability-aware, then you need to trust everyone who writes library code to understand capabilities and be thorough enough to get them right. I think this exacerbates the current problem of restricted execution in Python: The responsibility for making the system secure is spread through the interpreter. To do an audit to convince yourself the system is secure, you have to look at a lot of the interpreter. I don't see how it would help to add the standard library to the mix. 
It seems like we have a conflict between two design principles -- economy of mechanism and least privilege.

> P.S. I learned this three-part paradigm from Mark Miller whose paper with Chip
> Morningstar and Bill Frantz articulates it in more detail:
>
> http://www.erights.org/elib/capability/ode/ode-capabilities.html#patt-coop

I don't see the part of this paper that talks about library design :-). I assume that it's the first section "Only Connectivity Begets Connectivity." But I don't know if I understand how that applies to library design in concrete terms.

Jeremy

From jeremy@alum.mit.edu Mon Mar 10 17:26:26 2003
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: 10 Mar 2003 12:26:26 -0500
Subject: [Python-Dev] Re: Capabilities
In-Reply-To: <05a701c2e66f$6c001be0$6d94fea9@newmexico>
References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net>
    <3E69E1BC.5090508@algroup.co.uk>
    <1047150320.2347.26.camel@localhost.localdomain>
    <3E6B21F7.3040300@zope.com>
    <011f01c2e62f$3e6d5840$6d94fea9@newmexico>
    <05a701c2e66f$6c001be0$6d94fea9@newmexico>
Message-ID: <1047317185.15066.29.camel@slothrop.zope.com>

On Sun, 2003-03-09 at 14:09, Samuele Pedroni wrote:
> maybe the question was unclear, but it was serious, what I was asking is
> whether some restricted code can do:
>
> try:
>     deliberate code to force exception
> except Exception,e:
>     ...
>
> so that e is caught unproxied. Looking at zope/security/_proxy.c it seems this
> can be the case...
>
> then to be (likely) on the safe side, all exception class definitions for
> possible e classes: like e.g.
>
> class MyExc(Exception):
>     ...
>
>
> ought to be executed _in restricted mode_, or be "trivial/empty": something
> like
>
> class MyExc(Exception):
>     def __init__(self, msg):
>         self.message = msg
>         Exception.__init__(self, msg)
>
>     def __str__(self):
>         return self.message
>
> is already too much rope.
> > Although it seems not to have the "nice" two-level-of-calls behavior of Bastion > instances, an unproxied instance of MyExc if MyExc was defined outside of > restricted execution, can be used to break out of restricted execution. Exceptions do seem like a problem. If the exception objects are defined in the safe interpreter, then untrusted code that catches an exception can't follow references to an unsafe interpreter. But it can modify the exception objects and classes, which has the potential to cause a lot of problems. It also complicates the design of systems that want to run untrusted code, because they must be very careful never to pass trusted exception instances to untrusted code. It seems like it would be nice if proxies could be used as exceptions, so that there was a simple mechanism to enforce protection. Jeremy From jeremy@zope.com Mon Mar 10 17:36:22 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 10 Mar 2003 12:36:22 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: References: Message-ID: <1047317782.15066.38.camel@slothrop.zope.com> On Mon, 2003-03-10 at 05:51, Ka-Ping Yee wrote: > It seems to me that the "rexec vs. proxy" debate is really about > a very different question: How do we get from Python's currently > promiscuous objects to properly restricted objects? I think that's the right question. > (Once we have properly restricted objects in either fashion, yes, > definitely, using proxies to restrict access is a great technique.) > > If i understand correctly, the "proxy" answer is "we create a > special wrapper object, then the programmer has to individually > wrap any object they want to be secure". And the "rexec" answer > is "we create an interpreter mode in which all objects are secure". The proxy answer is a bit more complex. Any object returned from a proxy is itself wrapped in a proxy, except for immutable objects like None, ints, and strings. 
The initial proxy creates a barrier between the code that created the proxy and the client that uses the proxy.

> I think the latter is far better.  To have any sort of real chance
> at establishing security, you have to start from a place where
> everything is secure, instead of starting from a place where
> everything is insecure and you have to individually secure every
> single object with its own wrapper.

It would indeed be impractical to wrap every object manually.

I think both approaches tend towards the design principle of fail-safe defaults and complete mediation. A proxy mediates all access to the object it wraps. By default, it allows no access. When it allows access, it creates new proxies that provide the same facilities as the original. The one exception is for immutable objects. (Immutability is good for so many reasons.)

> The eventual ideal is to have a system where all objects are
> "pure" objects (i.e. non-introspectable capabilities) by default.
> Anyone wanting to do introspection would simply have to obtain
> the "introspect" capability from a privileged place (e.g. sys).
> For example,
>
>     class Foo:
>         pass
>
>     print Foo.__dict__                  # fails
>
>     from sys import introspect
>     print introspect(Foo).__dict__      # succeeds
>
> When running the interpreter in secure mode, "introspect"
> would just be missing from the sys module (again, ideally
> sys.introspect wouldn't exist by default, and a command-line
> option would turn it on, but i realize that's far away).
>
> This would have the effect of the "introspectable flag" that
> Guido mentioned, but without expending any storage at all,
> until you actually needed to introspect something.

If Python's introspection were less ad hoc, I suppose this issue would be easier to tackle. (Has anyone done security design for a CLOS-style meta-object protocol?)

Note that the biggest problem with the introspectable flag is that it would need to be checked all over the interpreter internals.
For example, the interpreter optimizes bound method calls by extracting the im_self and im_func and calling im_func directly, passing im_self and the rest of the arguments. This is all done within the mainloop using a single type check and a bunch of macros to extract fields from the bound method. It is pretty common to use macros that depend on the representation of builtin types like functions, methods, dictionaries, etc.

Jeremy

From zooko@zooko.com Mon Mar 10 18:24:11 2003
From: zooko@zooko.com (Zooko)
Date: Mon, 10 Mar 2003 13:24:11 -0500
Subject: [Python-Dev] Re: Capabilities
In-Reply-To: Message from Jeremy Hylton of "10 Mar 2003 12:26:26 EST." <1047317185.15066.29.camel@slothrop.zope.com>
References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net>
    <3E69E1BC.5090508@algroup.co.uk>
    <1047150320.2347.26.camel@localhost.localdomain>
    <3E6B21F7.3040300@zope.com>
    <011f01c2e62f$3e6d5840$6d94fea9@newmexico>
    <05a701c2e66f$6c001be0$6d94fea9@newmexico>
    <1047317185.15066.29.camel@slothrop.zope.com>
Message-ID:

Jeremy Hylton wrote:
>
> Exceptions do seem like a problem.

This reminds me of a similar problem. Object A is a powerful object. Object B is a proxy for A which passes through only a subset of A's methods. So B is suitable to give to Object C, which should be able to use the B subset but not the full A set.

The problem is if the B subset of methods includes a callback idiom, in which Object A calls a function provided by its client and passes a reference to itself as an argument.

    class A:
        def register_event_handler(self, handler):
            self.handlers.append(handler)

        def process_events(self):
            # ...
            for handler in self.handlers:
                handler(self)

This allows C full access to object A's methods if C has access to the register_event_handler() method. (Even if A has private data and even if there is no flaw in the proxy or capability enforcement that prevents C from getting access to A through B.)
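
A minimal runnable sketch of the leak just described, together with a forwarding wrapper that closes it. NaiveFacet, TwoFacedFacet, the dangerous() method and the handler lists are hypothetical names added for illustration. Plain Python classes do not hide _target, so this shows only the flow of references, not an enforced boundary.

```python
class A:
    """Powerful object with one harmless and one dangerous method."""

    def __init__(self):
        self.handlers = []
        self.log = []

    def register_event_handler(self, handler):
        self.handlers.append(handler)

    def process_events(self):
        for handler in self.handlers:
            handler(self)  # hands the unproxied A to whoever registered

    def dangerous(self):
        self.log.append("dangerous called")


class NaiveFacet:
    """B: exposes only register_event_handler, forwarding it unchanged."""

    def __init__(self, target):
        self._target = target

    def register_event_handler(self, handler):
        self._target.register_event_handler(handler)


class TwoFacedFacet(NaiveFacet):
    """B': registers a wrapper so the callback receives the facet, not A."""

    def register_event_handler(self, handler):
        self._target.register_event_handler(lambda _a: handler(self))


# C only ever holds the facet, yet the naive version leaks A itself.
a = A()
received = []
NaiveFacet(a).register_event_handler(received.append)
a.process_events()
print(received[0] is a)

# With the forwarding wrapper, C gets the facet back instead.
a2 = A()
received2 = []
facet = TwoFacedFacet(a2)
facet.register_event_handler(received2.append)
a2.process_events()
print(received2[0] is facet)
```

The wrapper has to be applied to every method whose arguments or callbacks can carry a reference to A, which is why such methods need individual attention when designing a facet.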
So the designer of the B proxy has to not only exclude dangerous methods of A, but also has to either exclude methods that lead to this kind of callback, or else make B a two-faced proxy that registers itself instead of C as the handler, forwards the callback, and passes a reference to itself instead of to A in the callback.

Regards,

Zooko

From pedronis@bluewin.ch Mon Mar 10 18:53:08 2003
From: pedronis@bluewin.ch (Samuele Pedroni)
Date: Mon, 10 Mar 2003 19:53:08 +0100
Subject: [Python-Dev] Capabilities
References: <025c01c2e71b$e7a3cc40$6d94fea9@newmexico> <3E6CC705.7000901@zope.com>
Message-ID: <067a01c2e736$4af64920$6d94fea9@newmexico>

From: "Jim Fulton"
>
> Could you explain why you say that zope proxies are brittle?
>

from my small experience playing with RestrictedInterpreter:

you wrap into proxies a lot of builtins:

*) 'object' for example, then
class C(object): ... does not work

but given that some basic types are left alone, one can use
Type = ''.__class__.__class__

class C:
    __metaclass__ = Type

*) iter seems not to work (deliberate decision or bug?)

*) proxied 'property' is unusable

*) built-in functions return proxies even if the arguments were unproxied:

_12 = map(None,[1,2])

class A: pass
a = A()
a.a = [1,2]

_12 = getattr(a,'a')

in both cases with the proxied version of map and getattr the result _12 would be a proxied list.

deliberate safer-side decisions?

I can see it both ways:
- see other mail
- map(None,[obj])[0] becomes a way to get a proxied version of obj that can be passed to code that would maybe unwrap it and believe that it is some other legit object.

regards.

From zooko@zooko.com Mon Mar 10 20:11:04 2003
From: zooko@zooko.com (Zooko)
Date: Mon, 10 Mar 2003 15:11:04 -0500
Subject: [Python-Dev] Re: Capabilities
In-Reply-To: Message from Jeremy Hylton of "10 Mar 2003 12:18:29 EST." <1047316709.15064.22.camel@slothrop.zope.com>
References: <1047316709.15064.22.camel@slothrop.zope.com>
Message-ID:

(I, Zooko, wrote the lines prepended with "> > ".)
Jeremy Hylton wrote: > > > Until you have a substantial Least-Privilege-respecting library you can't gain > > the big benefit of capabilities -- code which is capable of doing something > > useful without also being capable of doing harm. (You can gain the "sandbox" > > style of security -- code which is incapable of doing anything useful or > > harmful.) > > If you need to rewrite all the libraries to be capability-aware, then > you need to trust everyone who writes library code to understand > capabilities and be thorough enough to get them right. With capabilities, as with any other security regime, you can execute code while denying it access to any of the standard libraries. However if you want to provide code access to some of the standard library's privileges without providing access to all of them, then you in any possible security regime need (a) some way to express which privileges it gets and which it doesn't, with sufficiently fine granularity that you can grant the privileges you want while excluding those you must, and (b) when actually executing the code you have to choose which specific privileges to extend. In a capability secure language the first step, (a) is done by the language designer. Then the library designer provides a library of bundles of privileges, and then (b) a programmer executes the code, passing to that code all and only those privileges which he wants that code to have. The library designer's job is actually pretty easy -- just: 1. try to make privileges which are likely to be wanted separately conveniently separable and 2. try to make privileges which are likely to be wanted together conveniently bundled. If the library designers err on either side, the application programmer can patch it up. 
For example, suppose the library designer made it so that a single object, the "os" object, contained both the "os.system()" method and the "os.times()" method, and the programmer wants to extend the ability to get a timestamp without extending the ability to invoke arbitrary commands. (Note: I'm aware that os is a module and not an object, but for now I want to think of it as an object to be passed by reference instead of as a module to be "import"'ed. If we continue along the cap-Python path we'll have to come back to this.)

So the programmer just defines a proxy:

    class osproxy:
        def __init__(self, os):
            self.os=os
        def times(self):
            return self.os.times()

and gives an instance of osproxy instead of the os object itself. (In practice, when it is only a single method, you would of course prefer to just pass the method itself. The proxy pattern is more general.)

If the library designer has erred on the other side, making separate objects for each of a dozen different related and innocuous functions, the programmer will very likely define one object which contains all of those functions and pass a reference to that object where he would have had to pass a dozen references to a dozen functions.

I may have made too big a deal about this originally. I just spent a few minutes browsing through modindex.html (parts of which I am already intimately familiar with), and nothing jumped out at me as needing to be wrapped or refactored before it could be used in a cap-Python. Perhaps the Python Standard Library's natural modularity has already gotten us most of the way there.

> > http://www.erights.org/elib/capability/ode/ode-capabilities.html#patt-coop
>
> I don't see the part of this paper that talks about library design :-).
> I assume that it's the first section "Only Connectivity Begets
> Connectivity."  But I don't know if I understand how that applies to
> library design in concrete terms.
No, "Only Connectivity Begets Connectivity" is just the "pointer-safety" requirement -- that one can't get a reference to an object, except by either (a) creating the object, or (b) getting the reference from some other object which already had the reference. Hm. Yes, that page doesn't really talk about library design. The authors of E performed a project [1] for DARPA in which they implemented a web browser which could host pluggable renderers, such that a malicious renderer was constrained in the damage it could do. (I have no idea what DARPA wants with such a thing. ;-)) The security review team at the conclusion of the project (which included great cryptographer David Wagner) wrote [2] that E appeared to have advanced the state of the art without breaking a sweat. The security flaws that they uncovered were mostly due to insufficient wrapping of the Java standard libraries. For example, the E folks had allowed an object to access a Java "File" object so that it could access a single file, without realizing that the Java File object has a "getParentFile()" method which returns the parent directory. That was why I made such a big deal about the importance of a secure standard library in my previous message. (As you know, Python's file objects don't have a "getParentFile()" method, so we're already one step ahead of Java there...) Regards, Zooko [1] http://www.combex.com/tech/darpaBrowser.html [2] http://www.combex.com/papers/darpa-review/index.html From greg@cosc.canterbury.ac.nz Mon Mar 10 20:23:10 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 11 Mar 2003 09:23:10 +1300 (NZDT) Subject: [Python-Dev] Capabilities In-Reply-To: Message-ID: <200303102023.h2AKNAw23873@oma.cosc.canterbury.ac.nz> Ka-Ping Yee : > The eventual ideal is to have a system where all objects are > "pure" objects (i.e. non-introspectable capabilities) by default. 
Perhaps it would be useful to distinguish between what might be called "read-only" introspection, and more powerful forms of introspection. Usually it doesn't do any harm to be able to find out things like what class an object belongs to and what methods it supports, so perhaps these kinds of introspections don't need to be restricted by default. But more intrusive things like reading/writing arbitrary attributes or calling arbitrary methods would require special permission. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Mar 10 20:59:59 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 11 Mar 2003 09:59:59 +1300 (NZDT) Subject: [Python-Dev] Capabilities In-Reply-To: <200303101538.h2AFcTR12087@odiug.zope.com> Message-ID: <200303102059.h2AKxxR24396@oma.cosc.canterbury.ac.nz> Guido: > That flag wasn't my idea, it was some one else's (Greg Ewing?). Yes, it was my idea. I was thinking that there was already a word of flags in the object struct that might have some room left, but I may have been thinking of type objects. I'm not sure it's such a good idea now anyway. As has been pointed out, you'd still need proxies of some kind to restrict interfaces. It would just mean you'd be able to build your proxy out of any suitable type of object. The other idea was that trusted code would be able to set the flag on all the objects that it passed to untrusted code, instead of having to proxy them all. But, as has also been pointed out, that's a rather brittle way to enforce security. I think I agree that to really get on top of this security business we need to move towards having dangerous things forbidden by default rather than allowed by default. 
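
The "forbidden by default" stance, applied to the interface-restricting proxies mentioned above, could look roughly like the following allow-list wrapper. AllowListProxy and Account are hypothetical names, the code targets current Python 3, and it is not an enforcement mechanism (the wrapped object is still reachable through _target). It only illustrates that nothing is exposed unless it is named explicitly.

```python
class AllowListProxy:
    """Deny-by-default wrapper: only explicitly allowed names are forwarded."""

    def __init__(self, target, allowed):
        self._target = target
        self._allowed = frozenset(allowed)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for forwarded names.
        if name.startswith("_") or name not in self._allowed:
            raise AttributeError("access to %r is not permitted" % name)
        return getattr(self._target, name)


class Account:
    def __init__(self, balance):
        self.balance = balance

    def get_balance(self):
        return self.balance

    def withdraw(self, amount):
        self.balance -= amount


# A "read-only" view of the account: withdraw() is simply not reachable.
read_only = AllowListProxy(Account(10), ["get_balance"])
print(read_only.get_balance())

try:
    read_only.withdraw(5)
except AttributeError as exc:
    print("refused:", exc)
```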
To that end, it would be useful if we could pin down exactly what's dangerous and what isn't. It seems to me that most uses of introspection by most programs are harmless. Can we sort out those (hopefully few) things that are dangerous, and separate them from the existing introspection mechanisms? Access to sys.modules has been mentioned as a key thing that needs to be restricted. Maybe this shouldn't be an arbitrarily-accessible variable? Maybe the sys module shouldn't be a module at all, but some special object that won't let you do nasty things with its contents unless you've got special privileges (which most code would *not* have by default). One of the "nasty" things would be picking the real __builtins__ out of sys.modules. Are there any others? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From zooko@zooko.com Mon Mar 10 21:15:18 2003 From: zooko@zooko.com (Zooko) Date: Mon, 10 Mar 2003 16:15:18 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: Message from Guido van Rossum of "Sun, 09 Mar 2003 20:10:36 EST." <200303100110.h2A1AaC06743@pcp02138704pcs.reston01.va.comcast.net> References: <200303100110.h2A1AaC06743@pcp02138704pcs.reston01.va.comcast.net> Message-ID: (I, Zooko, wrote the lines prepended with "> > ".) Guido wrote: > > > The [Principle-of-Least-Privilege approach to securing a standard library] > > is to separate the tools so that dangerous ones don't come tied together > > with common ones. The security policy, then, is expressed by code that > > grants or withholds capabilities (== references) rather than by code that > > toggles the "restricted" bit. > > This sounds interesting, but I'm not sure I follow it. Can you > elaborate by giving a couple of examples? 
First let me say that "capability access control" [1] is a theoretical construct, comparable to "access control lists" [2] and "Trust Management" [3]. Each is a formal model for specifying access control rules -- who is allowed to do what.

But in the context of Python we are interested not only in the theoretical model but also in a specific way of implementing it -- by making object references unforgeable and binding all authorities to object references.

So in this discussion it may not be clear whether a claimed advantage of "capabilities" flows from the formal model or from the practice of unifying security programming with object oriented programming. I don't think it is important to differentiate in this discussion.

Now for examples...

Hm, well first of all, where are rexec and Zope proxies currently used? I believe that a "cap-Python" would support those uses, implementing the same security policies, but more cleanly since access control would be a first-class part of the language. I don't know Zope very well, and rather than guess, I'd like to ask someone who does know Zope to give a typical example of how proxies are used in workaday Zope. I suspect that capabilities are quite similar to Zope proxies.

Now for a quick made-up example to demonstrate what I meant about expressing security policy above, consider a tic-tac-toe game that is supposed to draw to the screen.

In "restricted Python v1", certain modules have been flagged as "safe" and others "unsafe". Code can execute other code with a "restricted" flag set, something like this:

    # restricted Python v1
    game = eval(TicTacToeGame, restricted=True)
    game.display()

Unfortunately, in "restricted Python v1", all of the modules that allow drawing to the screen are marked as "unsafe", so the tic-tac-toe-game immediately dies with an exception.
In "restricted Python v2", an arbitrary security policy can be implemented: # restricted Python v2 games=[] def securitypolicy(subject, action, object): if ((subject in games) and (action == "import") and (object == "wxPython")) or (subject in games) and (action == "execute") and (object == "wxPython.Window") or (subject in games) and (action == "execute") and (object == "wxPython.Window.paint")): return True # ... return False game = eval(TicTacToeGame, policy=securitypolicy) gameobjh.append(game) game.display() I think that the "rexec" design was along the lines of "restricted Python v2", but I apologize if this simple analogy insults anyone. I'm not sure whether "restricted Python v2" is expressive enough to implement the capability security access control model or not, but I don't care, because I don't like "restricted Python v2". I like restricted Python v3: # restricted Python v3 game = TicTacToeGame() game.display(wxPython.wxWindow()) Now the game object has a reference to the window object, and it can use that reference to draw the pictures. If I later change this design and decide that instead of drawing to a window, I want the game to write to a file, then I'll change the implementation of the TicTacToeGame class, and then'll I'll come back here to this code and change it from passing a wxWindows to: # restricted Python v3 game = TicTacToeGame() game.display(open("/tmp/tttgame.out","w")) Now if I were writing in "restricted Python v2", then in addition to those two changes I would also have to make a third change, which is to edit my securitypolicy function in order to allow this particular game object to access a file named "/tmp/tttgame.out", and to disallow it access to wxPython: # restricted Python v2 def securitypolicy(subject, action, object): if (subject in games) and (action in ("read", "write",)) and (object == "file:/tmp/tttgame.out"): return True # ... 
        return False

    game = TicTacToeGame()
    game.display("/tmp/tttgame.out")

This is what I meant by saying that the security policy is expressed
in Python instead of by twiddling access bits in an embedded policy
language.  In a capability-secure language, the change (which the
programmer has to make anyway), from "wxPython.wxWindow()" to
"open('/tmp/tttgame.out', 'w')" is necessary and sufficient to enforce
the programmer's intended security policy, so there is no need for the
redundant and brittle "policy" function.

I find this unification of access control and application logic to
resonate deeply with the Zen of Python.

Regards,

Zooko

[1] http://www.eros-os.org/papers/shap-thesis.ps
[2] http://www.research.microsoft.com/~lampson/09-Protection/Acrobat.pdf
[3] http://citeseer.nj.nec.com/blaze96decentralized.html

From jim@zope.com Mon Mar 10 21:16:02 2003
From: jim@zope.com (Jim Fulton)
Date: Mon, 10 Mar 2003 16:16:02 -0500
Subject: [Python-Dev] Capabilities
In-Reply-To: <067a01c2e736$4af64920$6d94fea9@newmexico>
References: <025c01c2e71b$e7a3cc40$6d94fea9@newmexico> <3E6CC705.7000901@zope.com> <067a01c2e736$4af64920$6d94fea9@newmexico>
Message-ID: <3E6D0092.6050404@zope.com>

Samuele Pedroni wrote:
> From: "Jim Fulton"
>
>>Could you explain why you say that zope proxies are brittle?
>>
>
>
> from my small experience playing with RestrictedIntepreter:

Um, er, I've been meaning to mention that RestrictedInterpreter is
not done and isn't used anywhere yet.  It's a bit of a decoy
at this time. :]

At this point, RestrictedInterpreter is really an incomplete
prototype.  OTOH, RestrictedBuiltins is used for Python expressions in
Zope page templates.  Simple Python expressions in zpt don't tend to
run into the sorts of problems you've found.

> you wrap into proxies a lot of builtins:
>
> *) 'object' for example, then
> class C(object): ... does not work

Right.  We will fix this.  It should be possible to subclass proxied
classes.  The resulting classes should then be proxies.
object and type should probably be special cases.

> but given that some basic types are left alone, one can use
> Type = ''.__class__.__class__
>
> class C:
>     __metaclass__ = Type
> *) iter seems not to work (deliberate decision or bug?)

bug I imagine.

> *) proxied 'property' is unusable

ditto

> *) built-in functions return proxies even if the argument were unproxied:
>
> _12 = map(None,[1,2])

Interesting case.  It looks like map shouldn't be proxied.

> class A: pass
> a = A()
> a.a = [1,2]
>
> _12 = getattr(a,'a')

Ditto.

> in both cases with the proxied version of map and getattr the result _12 would
> be a proxied list.
>
> deliberate safer-side decisions?

no.

> I can see it both ways:
> - see other mail

I don't know what other mail you are referring to.  Maybe it doesn't
matter.

> - map(None,[obj])[0] becomes a way to get a a proxied version of obj that can
> be passed to code that would maybe unwrap it and believe
> that is some other legit object.

Any code that unwraps proxies should be viewed with great suspicion.
I currently consider any use of that API without an extensive
accompanying comment to be a virtual "XXX" comment.

I'm sorry to have had you spend so much time on what is a bit of a
decoy.  OTOH, you've pointed out a number of points that we do need to
address to move our RestrictedInterpreter beyond the prototype stage.

You've found a number of problems and issues in deciding how to proxy
builtins.  I would argue that these are not problems in the proxies
themselves but in the applications to builtins.  But perhaps I'm
wrong.

Another area that we haven't dealt with yet is how proxies will work
in untrusted *persistent* modules.  But you probably don't want to
know about that. ;)

Jim

--
Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pedronis@bluewin.ch Mon Mar 10 21:26:41 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Mon, 10 Mar 2003 22:26:41 +0100 Subject: [Python-Dev] Capabilities References: <025c01c2e71b$e7a3cc40$6d94fea9@newmexico> <3E6CC705.7000901@zope.com> <067a01c2e736$4af64920$6d94fea9@newmexico> <3E6D0092.6050404@zope.com> Message-ID: <0b3e01c2e74b$bdff8c00$6d94fea9@newmexico> From: "Jim Fulton" > > I can see it both ways: > > - see other mail > > I don't know what other mail you are refering to. Maybe it doesn't matter. the other side of the coin, is that with a working unproxied/non-proxying property and/or a non-proxied map&getattr and a unproxied MyExc instance, I can break out restricted execution. From pedronis@bluewin.ch Mon Mar 10 22:14:44 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Mon, 10 Mar 2003 23:14:44 +0100 Subject: [Python-Dev] Capabilities References: <025c01c2e71b$e7a3cc40$6d94fea9@newmexico> <3E6CC705.7000901@zope.com> <067a01c2e736$4af64920$6d94fea9@newmexico> <3E6D0092.6050404@zope.com> Message-ID: <0c5a01c2e752$74c99e20$6d94fea9@newmexico> From: "Jim Fulton" > Samuele Pedroni wrote: > > From: "Jim Fulton" > > > >>Could you explain why you say that zope proxies are brittle? > >> > > > > > > from my small experience playing with RestrictedIntepreter: > > Um, er, I've been meaning to mention that RestrictedIntepreter is > not does and isn't used anywhere yet. It's a bit of a decoy > at this time. :] I knew it isn't used. I'm not that naive, it seemed nevertheless a show-case/playground of proxy+restricted execution approach. regards. 
From jepler@unpythonic.net Mon Mar 10 01:27:26 2003 From: jepler@unpythonic.net (Jeff Epler) Date: Sun, 9 Mar 2003 19:27:26 -0600 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: <200303092252.h29Mqdk17019@oma.cosc.canterbury.ac.nz> References: <200303092252.h29Mqdk17019@oma.cosc.canterbury.ac.nz> Message-ID: <20030310012724.GA1144@unpythonic.net> [attribution lost] > > Those would be quite different functions, then, unless you proposed to have > > Python interpret native shell metacharacters on its own too (e.g., set up > > pipes, do the indicated file redirections, interpolate envars, and fake > > whatever other shell gimmicks people may use). On Mon, Mar 10, 2003 at 11:52:39AM +1300, Greg Ewing wrote: > What we need is a function which does all those things, > but uses some way of specifying them *other* than shell > metacharacters. E.g. > > os.plumb(("sed", "-e", "s/dead/resting/", "parrots"), > ("grep", "norwegian"), output = myfile)) +1 on the concept. +1 on something that can be transformed to use tcl's "exec" so that it'll begin working on several common arches immediately. Jeff From tim.one@comcast.net Tue Mar 11 00:41:42 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 10 Mar 2003 19:41:42 -0500 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: <20030310012724.GA1144@unpythonic.net> Message-ID: [Greg Ewing] > What we need is a function which does all those things, > but uses some way of specifying them *other* than shell > metacharacters. E.g. > > os.plumb(("sed", "-e", "s/dead/resting/", "parrots"), > ("grep", "norwegian"), output = myfile)) [Jeff Epler] > +1 on the concept. +1 on something that can be transformed to use tcl's > "exec" so that it'll begin working on several common arches immediately. They're really the same thing -- Tcl's exec would be a simple transformation of a cross-platform sh-like syntax into Greg's hypothesized functions. 
The pain in Tcl's exec implementation was in providing the
functionality across platforms, not in parsing the sh-like syntax.
Then again, Tcl was trying to run all the way back to Windows 3.1, and
Python already gave up on that.

From oren-py-d@hishome.net Tue Mar 11 01:19:20 2003
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 10 Mar 2003 20:19:20 -0500
Subject: [Python-Dev] test_popen broken on Win2K
In-Reply-To: <200303092252.h29Mqdk17019@oma.cosc.canterbury.ac.nz>
References: <200303092252.h29Mqdk17019@oma.cosc.canterbury.ac.nz>
Message-ID: <20030311011920.GA41330@hishome.net>

On Mon, Mar 10, 2003 at 11:52:39AM +1300, Greg Ewing wrote:
> > Those would be quite different functions, then, unless you proposed to have
> > Python interpret native shell metacharacters on its own too (e.g., set up
> > pipes, do the indicated file redirections, interpolate envars, and fake
> > whatever other shell gimmicks people may use).
>
> What we need is a function which does all those things,
> but uses some way of specifying them *other* than shell
> metacharacters. E.g.
>
> os.plumb(("sed", "-e", "s/dead/resting/", "parrots"),
> ("grep", "norwegian"), output = myfile))

How about this:

    cmd.sed('-e', 's/dead/resting/', 'parrots') / cmd.grep('norwegian') >> myfile

or this:

    import re

    def mygrep(pattern):
        def tran(upstream):
            for s in upstream:
                if re.search(pattern, s):
                    yield s
        return transformation(tran)

    open('parrots') / (lambda s:s.replace('dead','resting')) / mygrep('norwegian') >> open('myfile', 'w')

This is not some hypothetical syntax - I have a module that actually
does this.  It can mix python functions, generators and external
commands in the same flow, use any iterable object as source, use a
file, list or other data consumer as destination and a few more
goodies.  It's not finished but it mostly works.  I don't have much
time to work on it, though.
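
The "/" and ">>" flow described above can be approximated with
operator overloading.  What follows is only a sketch in present-day
Python, not the module mentioned in the message: the names Flow and
Transformation are hypothetical, the source has to be wrapped
explicitly in Flow(...), and external commands are left out entirely.

```python
import re


class Transformation:
    """Marks a generator function that consumes a whole upstream iterable."""

    def __init__(self, func):
        self.func = func


class Flow:
    """An iterable that is extended with `/` and drained with `>>`."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def __truediv__(self, stage):
        # A Transformation rewrites the stream as a whole; any other
        # callable is applied to each item in turn.
        if isinstance(stage, Transformation):
            return Flow(stage.func(self))
        return Flow(map(stage, self))

    def __rshift__(self, sink):
        # Drain into a list-like sink (extend) or a file-like one (writelines).
        if hasattr(sink, "extend"):
            sink.extend(self)
        else:
            sink.writelines(self)
        return sink


def mygrep(pattern):
    def tran(upstream):
        for s in upstream:
            if re.search(pattern, s):
                yield s
    return Transformation(tran)


parrots = ["dead norwegian blue\n", "dead macaw\n"]
result = []
Flow(parrots) / (lambda s: s.replace("dead", "resting")) / mygrep("norwegian") >> result
```

Because "/" binds more tightly than ">>", the whole chain is built
lazily first and only consumed when it reaches the sink.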
oh-dear-what-have-I-done-now-I'll-have-to-finish-it-ly yours, Oren From Anthony Baxter Tue Mar 11 02:07:01 2003 From: Anthony Baxter (Anthony Baxter) Date: Tue, 11 Mar 2003 13:07:01 +1100 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <200303101540.h2AFeW012107@odiug.zope.com> Message-ID: <200303110207.h2B272e30128@localhost.localdomain> >>> Guido van Rossum wrote > The only truly secure > object is None. :-) You sure? >>> None.__class__.__class__.mro(type(None))[1] Not sure what else it's possible to get to from None... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From mwh@python.net Tue Mar 11 10:29:45 2003 From: mwh@python.net (Michael Hudson) Date: Tue, 11 Mar 2003 10:29:45 +0000 Subject: [Python-Dev] test_popen broken on Win2K In-Reply-To: (Tim Peters's message of "Mon, 10 Mar 2003 19:41:42 -0500") References: Message-ID: <2mfzpufb6u.fsf@starship.python.net> Tim Peters writes: > [Greg Ewing] >> What we need is a function which does all those things, >> but uses some way of specifying them *other* than shell >> metacharacters. E.g. >> >> os.plumb(("sed", "-e", "s/dead/resting/", "parrots"), >> ("grep", "norwegian"), output = myfile)) > > [Jeff Epler] >> +1 on the concept. +1 on something that can be transformed to use tcl's >> "exec" so that it'll begin working on several common arches immediately. > > They're really the same thing -- Tcl's exec would be a simple transformation > of a cross-platform sh-like syntax into Greg's hypothesized functions. I think Jeff was suggesting that we implement it like this: def plumb(cmd): import Tkinter return Tkinter.call('exec ' + cmd) or whatever. Cheers, M. -- at any rate, I'm satisfied that not only do they know which end of the pointy thing to hold, but where to poke it for maximum effect. 
-- Eric The Read, asr, on google.com From gward@python.net Tue Mar 11 14:15:59 2003 From: gward@python.net (Greg Ward) Date: Tue, 11 Mar 2003 09:15:59 -0500 Subject: [Python-Dev] Audio devices Message-ID: <20030311141559.GA15139@cthulhu.gerg.ca> Back at work on the ossaudiodev docs for a few minutes. Documenting an API is always a great opportunity to clean it up, and the ossaudiodev.open() function has a weird interface right now. From the current docs: """ open([device, ] mode) Open an audio device and return an OSS audio device object. This object supports many file-like methods, such as read(), write(), and fileno() (although there are subtle differences between conventional Unix read/write semantics and those of OSS audio devices). It also supports a number of audio-specific methods; see below for the complete list of methods. Note the unusual calling syntax: the first argument is optional, and the second is required. This is a historical artifact for compatibility with the older linuxaudiodev module which ossaudiodev supersedes. device is the audio device filename to use. If it is not specified, this module first looks in the environment variable AUDIODEV for a device to use. If not found, it falls back to /dev/dsp. mode is one of 'r' for read-only (record) access, 'w' for write-only (playback) access and 'rw' for both. Since many soundcards only allow one process to have the recorder or player open at a time it is a good idea to open the device only for the activity needed. Further, some soundcards are half-duplex: they can be opened for reading or writing, but not both at once. """ The historical background is that in linuxaudiodev prior to Python 2.3, it was *impossible* to specify the device file to open -- you had to do something like this: os.environ['AUDIODEV'] = "/dev/dsp2" dsp = linuxaudiodev.open("w") Fixing that wart is what led me to create ossaudiodev in the first place. 
Cleaning up the remaining ugliness in ossaudiodev.open() brings things nicely full-circle. Anyways, since the module has been renamed, who cares about backwards compatibility with linuxaudiodev? I'd like to change the open() interface to: open(device, mode) where both are required. (Most use of the audio device is for playback, not recording. But a default mode of "w" goes counter to expectations. So I think 'mode' should be required.) This would also mean getting rid of the $AUDIODEV check in ossaudiodev.c. Less C code is a good thing, unless of course it leads to lots of redundant Python code all over the world. Finally, for consistency I should also change openmixer() to require a 'device' argument (currently, it does the same thing, but hardcodes "/dev/mixer" and checks $MIXERDEVICE). Of course, this will lead people to hardcode "/dev/dsp" (and/or "/dev/mixer") into their Python audio scripts. That's bad if other OSS-using operating systems have different names for the standard audio devices. Do they? But it's certainly no *worse* than the situation for C programmers, who have to assume "/dev/dsp" as a default -- the open(2) system call certainly doesn't let you get away with leaving the filename out. And besides, "/dev/dsp" is already hard-coded into ossaudiodev.c, so if that's inappropriate on certain operating systems, somebody's going to lose already. Thoughts? Greg -- Greg Ward http://www.gerg.ca/ Sure, I'm paranoid... but am I paranoid ENOUGH? From guido@python.org Tue Mar 11 14:54:00 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Mar 2003 09:54:00 -0500 Subject: [Python-Dev] Audio devices In-Reply-To: Your message of "Tue, 11 Mar 2003 09:15:59 EST." <20030311141559.GA15139@cthulhu.gerg.ca> References: <20030311141559.GA15139@cthulhu.gerg.ca> Message-ID: <200303111454.h2BEs1B23261@odiug.zope.com> > Back at work on the ossaudiodev docs for a few minutes. Great! 
I wonder if you have any thoughts on why running test_ossaudiodev hangs when run on Linux Red Hat 7.3? I'm currently using a 2.4.18-24.7.x kernel. I have no idea what other info would be useful to debug this. Regarding the changes you propose, I was going to vote +1 on all, but I realize I'm not a user so my vote should only count as epsilon. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@python.net Tue Mar 11 15:58:41 2003 From: gward@python.net (Greg Ward) Date: Tue, 11 Mar 2003 10:58:41 -0500 Subject: [Python-Dev] Audio devices In-Reply-To: <200303111454.h2BEs1B23261@odiug.zope.com> References: <20030311141559.GA15139@cthulhu.gerg.ca> <200303111454.h2BEs1B23261@odiug.zope.com> Message-ID: <20030311155841.GB14963@cthulhu.gerg.ca> On 11 March 2003, Guido van Rossum said: > Great! I wonder if you have any thoughts on why running > test_ossaudiodev hangs when run on Linux Red Hat 7.3? I'm currently > using a 2.4.18-24.7.x kernel. I have no idea what other info would be > useful to debug this. The most obvious cause is that some other process has the audio device open, and your audio {hardware, device driver} only allows one at a time. If you're running one of those newfangled GUI environments like KDE or GNOME, it's quite likely that the esound or aRTSd (however you spell it) daemon started when you logged in, and is thus blocking all access to your /dev/dsp. This sucks, but IMHO it's not ossaudiodev's job to know about esound and similar. One way to test this is to take your system down to single-user (or at least a console-only, no-X11 runlevel) and then try running test_ossaudiodev. Hmmm, it looks like calling open() with O_NONBLOCK helps. I know this does *not* affect later read()/write() -- there's a special ioctl() for non-blocking read/write -- but it *does* appear to fix blocking open(). At least for me it turned a second open() attempt on the same device from "hang" to "IOError: [Errno 16] Device or resource busy: '/dev/dsp2'". 
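
The same open() behaviour can be tried from Python without touching
the C module.  This is a minimal sketch assuming a POSIX system (where
os.O_NONBLOCK exists); the helper name open_device_nonblocking is made
up for illustration and is not part of ossaudiodev.

```python
import errno
import os


def open_device_nonblocking(path, flags=os.O_WRONLY):
    """Return a file descriptor for path, or None if the device is busy.

    O_NONBLOCK only keeps the open() call itself from hanging here;
    it is not meant to change how later reads and writes behave.
    """
    try:
        return os.open(path, flags | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return None  # someone else holds the device
        raise
```

A caller would check for None and report "device busy" instead of
waiting forever; any other error is still raised as usual.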
Try this patch; if it works I'll check it in: --- Modules/ossaudiodev.c 10 Mar 2003 03:17:06 -0000 1.24 +++ Modules/ossaudiodev.c 11 Mar 2003 15:56:24 -0000 @@ -131,7 +131,7 @@ basedev = "/dev/dsp"; } - if ((fd = open(basedev, imode)) == -1) { + if ((fd = open(basedev, imode|O_NONBLOCK)) == -1) { PyErr_SetFromErrnoWithFilename(PyExc_IOError, basedev); return NULL; } test_ossaudiodev.py will still need fixing to handle the EBUSY error, but at least this should prevent hanging on open(). Greg -- Greg Ward http://www.gerg.ca/ Hand me a pair of leather pants and a CASIO keyboard -- I'm living for today! From guido@python.org Tue Mar 11 16:04:54 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Mar 2003 11:04:54 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: Your message of "Mon, 10 Mar 2003 16:15:18 EST." References: <200303100110.h2A1AaC06743@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303111605.h2BG4wA24117@odiug.zope.com> (Zooko, would it kill you to keep your line lengths well below 79?) [Zooko] > > > The [Principle-of-Least-Privilege approach to securing a > > > standard library] is to separate the tools so that dangerous > > > ones don't come tied together with common ones. The security > > > policy, then, is expressed by code that grants or withholds > > > capabilities (== references) rather than by code that toggles > > > the "restricted" bit. [Guido] > > This sounds interesting, but I'm not sure I follow it. Can you > > elaborate by giving a couple of examples? [Zooko] > First let me say that "capability access control" [1] is a > theoretical construct, comparable to "access control lists" [2] and > "Trust Management" [3]. Each is a formal model for specifying > access control rules -- who is allowed to do what. 
> > But in the context of Python we are interested not only in the > theoretical model but also in a specific way of implementing it -- > by making object references unforgeable and binding all authorities > to object references. > > So in this discussion it may not be clear whether a claimed > advantage of "capabilities" flows from the formal model or from the > practice of unifying security programming with object oriented > programming. I don't think it is important to differentiate in this > discussion. > > Now for examples... > > Hm, well first of all, where are rexec and Zope proxies currently > used? I believe that a "cap-Python" would support those uses, > implementing the same security policies, but more cleanly since > access control would be a first-class part of the language. > > I don't know Zope very well, and rather than guess, I'd like to ask > someone who does know Zope to give a typical example of how proxies > are used in workaday Zope. I suspect that capabilities are quite > similar to Zope proxies. Yes. > Now for a quick made-up example to demonstrate what I meant about > expressing security policy above, consider a tic-tac-toe game that > is supposed to draw to the screen. > > In "restricted Python v1", certain modules have been flagged as > "safe" and others "unsafe". Code can execute other code with a > "restricted" flag set, something like this: > > # restricted Python v1 > game = eval(TicTacToeGame, restricted=True) > game.display() > > Unfortunately, in "restricted Python v1", all of the modules that > allow drawing to the screen are marked as "unsafe", so the > tic-tac-toe-game immediately dies with an exception. 
> > In "restricted Python v2", an arbitrary security policy can be implemented: > > # restricted Python v2 > games=[] > def securitypolicy(subject, action, object): > if ((subject in games) and (action == "import") and (object == "wxPython")) or > (subject in games) and (action == "execute") and (object == "wxPython.Window") or > (subject in games) and (action == "execute") and (object == "wxPython.Window.paint")): > return True > # ... > return False > > game = eval(TicTacToeGame, policy=securitypolicy) > gameobjh.append(game) > game.display() > > I think that the "rexec" design was along the lines of "restricted > Python v2", but I apologize if this simple analogy insults anyone. Not really. The rexec design gives you the tools to implement either v1, v2 or v3. Its basic features are more like v1, but it has a concept of Zope-like proxies, named Bastions, and it allows you to use functions or bound methods as capabilities. Bastions are mostly a convenience to allow a bunch of capabilities to be used like an object. The "security policy" you sketch as part of v2 would be possible but there aren't really any hooks to implement this; you'd have to craft it out of Bastions and capabilities. > I'm not sure whether "restricted Python v2" is expressive enough to > implement the capability security access control model or not, but I > don't care, because I don't like "restricted Python v2". I like > restricted Python v3: > > # restricted Python v3 > game = TicTacToeGame() > game.display(wxPython.wxWindow()) > > Now the game object has a reference to the window object, and it can > use that reference to draw the pictures. 
If I later change this > design and decide that instead of drawing to a window, I want the > game to write to a file, then I'll change the implementation of the > TicTacToeGame class, and then'll I'll come back here to this code > and change it from passing a wxWindows to: > > # restricted Python v3 > game = TicTacToeGame() > game.display(open("/tmp/tttgame.out","w")) > > Now if I were writing in "restricted Python v2", then in addition to > those two changes I would also have to make a third change, which is > to edit my securitypolicy function in order to allow this particular > game object to access a file named "/tmp/tttgame.out", and to > disallow it access to wxPython: > > # restricted Python v2 > def securitypolicy(subject, action, object): > if (subject in games) and (action in ("read", "write",)) and (object == "file:/tmp/tttgame.out"): > return True > # ... > return False > > game = TicTacToeGame() > game.display("/tmp/tttgame.out") > > This is what I meant by saying that the security policy is expressed > in Python instead of by twiddling access bits in an embedded policy > language. In a capability-secure language, the change (which the > programmer has to make anyway), from "wxPython.wxWindows()" to > "open('/tmp/tttgame.out', 'w')" is necessary and sufficient to > enforce the programmer's intended security policy, so there is no > need for the redundant and brittle "policy" function. > > I find this unification access control and application logic to > resonate deeply with the Zen of Python. Me too. 
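
The v3 style quoted above can be sketched in ordinary Python.  This
is an illustration only: the board layout and the move() method are
invented here, and plain Python does not enforce the restriction (the
class could still import anything it likes).  In a capability-secure
variant the object passed to display() would be the only output
authority the game holds.

```python
import io


class TicTacToeGame:
    """Holds no output authority of its own; display() is handed one."""

    def __init__(self):
        self.board = [[" "] * 3 for _ in range(3)]

    def move(self, row, col, mark):
        self.board[row][col] = mark

    def display(self, out):
        # `out` is the capability: anything with a write() method.
        for row in self.board:
            out.write("|".join(row) + "\n")


game = TicTacToeGame()
game.move(1, 1, "X")

# Granting a different capability is the whole "policy change":
# pass an in-memory buffer here, or an open file, or a window wrapper.
buffer = io.StringIO()
game.display(buffer)
```

Switching the destination means changing only the argument at the call
site, which is the point made in the quoted message.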
> Regards, > > Zooko > > [1] http://www.eros-os.org/papers/shap-thesis.ps > [2] http://www.research.microsoft.com/~lampson/09-Protection/Acrobat.pdf > [3] http://citeseer.nj.nec.com/blaze96decentralized.html --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@gmx.net Tue Mar 11 15:12:41 2003 From: ark@gmx.net (Arne Koewing) Date: Tue, 11 Mar 2003 16:12:41 +0100 Subject: [Python-Dev] Re: Audio devices References: <20030311141559.GA15139@cthulhu.gerg.ca> Message-ID: <87zno2ym1i.fsf@gmx.net> Greg Ward writes: > Of course, this will lead people to hardcode "/dev/dsp" (and/or > "/dev/mixer") into their Python audio scripts. That's bad if other > OSS-using operating systems have different names for the standard audio > devices. Do they? with devfs (Linux) this would be /dev/sound/dsp and /dev/sound/mixer (but there are compatibility links...) From guido@python.org Tue Mar 11 16:20:18 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Mar 2003 11:20:18 -0500 Subject: [Python-Dev] Audio devices In-Reply-To: Your message of "Tue, 11 Mar 2003 10:58:41 EST." <20030311155841.GB14963@cthulhu.gerg.ca> References: <20030311141559.GA15139@cthulhu.gerg.ca> <200303111454.h2BEs1B23261@odiug.zope.com> <20030311155841.GB14963@cthulhu.gerg.ca> Message-ID: <200303111620.h2BGKJn27187@odiug.zope.com> > On 11 March 2003, Guido van Rossum said: > > Great! I wonder if you have any thoughts on why running > > test_ossaudiodev hangs when run on Linux Red Hat 7.3? I'm currently > > using a 2.4.18-24.7.x kernel. I have no idea what other info would be > > useful to debug this. [Greg] > The most obvious cause is that some other process has the audio device > open, and your audio {hardware, device driver} only allows one at a > time. Hm, but I *do* hear some sound coming out of the speaker: a quiet, sped-up squeaky version of the "nobody expects the spanish inquisition" soundclip that test_linuxaudiodev also used to play. 
(The latter now crashes for me with "linuxaudiodev.error: (0, 'Error')".) > If you're running one of those newfangled GUI environments like KDE or > GNOME, it's quite likely that the esound or aRTSd (however you spell it) > daemon started when you logged in, and is thus blocking all access to > your /dev/dsp. This sucks, but IMHO it's not ossaudiodev's job to know > about esound and similar. > > One way to test this is to take your system down to single-user (or at > least a console-only, no-X11 runlevel) and then try running > test_ossaudiodev. I tried this at runlevel 1, and the symptoms are identical: some squeaks, then it hangs. > Hmmm, it looks like calling open() with O_NONBLOCK helps. I know this > does *not* affect later read()/write() -- there's a special ioctl() for > non-blocking read/write -- but it *does* appear to fix blocking open(). > At least for me it turned a second open() attempt on the same device > from "hang" to "IOError: [Errno 16] Device or resource busy: > '/dev/dsp2'". > > Try this patch; if it works I'll check it in: > > --- Modules/ossaudiodev.c 10 Mar 2003 03:17:06 -0000 1.24 > +++ Modules/ossaudiodev.c 11 Mar 2003 15:56:24 -0000 > @@ -131,7 +131,7 @@ > basedev = "/dev/dsp"; > } > > - if ((fd = open(basedev, imode)) == -1) { > + if ((fd = open(basedev, imode|O_NONBLOCK)) == -1) { > PyErr_SetFromErrnoWithFilename(PyExc_IOError, basedev); > return NULL; > } > > test_ossaudiodev.py will still need fixing to handle the EBUSY error, > but at least this should prevent hanging on open(). Yes, it fixes the hang. Please check it in! The sample is still played at too high a speed, but maybe that's expected? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Mar 11 16:33:51 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Mar 2003 11:33:51 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Your message of "Tue, 11 Mar 2003 09:59:59 +1300." 
<200303102059.h2AKxxR24396@oma.cosc.canterbury.ac.nz> References: <200303102059.h2AKxxR24396@oma.cosc.canterbury.ac.nz> Message-ID: <200303111633.h2BGXum27649@odiug.zope.com> [Greg Ewing] > I think I agree that to really get on top of this security business we > need to move towards having dangerous things forbidden by default > rather than allowed by default. This is more or less what the rexec module implements, except for convenience it has a list of unsafe built-ins rather than a list of safe built-in. > To that end, it would be useful if we could pin down exactly what's > dangerous and what isn't. It seems to me that most uses of > introspection by most programs are harmless. Can we sort out those > (hopefully few) things that are dangerous, and separate them from the > existing introspection mechanisms? Maybe, maybe not. The original restricted execution code (not the rexec module) arbitrarily decided that setting class attributes was dangerous but getting them was not. Samuele found that new-style classes allow both, but always disallows write-access to the class __dict__ (you have to use the setattr protocol); this is good or bad depending on how it's used. The real problem is that harmful access may be granted via innocent-looking access. For example, allowing read-only access to a function's globals gives you access to the unrestricted 'open' function... > Access to sys.modules has been mentioned as a key thing that needs to > be restricted. Maybe this shouldn't be an arbitrarily-accessible > variable? Maybe the sys module shouldn't be a module at all, but some > special object that won't let you do nasty things with its contents > unless you've got special privileges (which most code would *not* > have by default). That's pretty much what the rexec module implements; it overrides __import__ and when you ask for sys, you get a fake sys that only contains stuff that should be safe. 
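
The idea of overriding __import__ and handing out a fake sys can be
sketched as follows.  This is not rexec's code: the whitelist
SAFE_SYS_NAMES, the helper names and the tiny builtins table are all
assumptions made for illustration, and the result is in no way a
secure sandbox (introspection escapes of the kind discussed in this
thread are not addressed).

```python
import builtins
import sys
import types

# Hypothetical whitelist; the names rexec actually exposed differed.
SAFE_SYS_NAMES = ("maxsize", "byteorder", "version")


def make_fake_sys():
    """Build a stand-in module carrying only the whitelisted attributes."""
    fake = types.ModuleType("sys")
    for name in SAFE_SYS_NAMES:
        setattr(fake, name, getattr(sys, name))
    return fake


def make_restricted_import(allowed_modules):
    fake_sys = make_fake_sys()
    real_import = builtins.__import__

    def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
        if name == "sys":
            return fake_sys  # never the real module
        if name not in allowed_modules:
            raise ImportError("import of %r is not permitted" % name)
        return real_import(name, globals, locals, fromlist, level)

    return restricted_import


def run_restricted(source, allowed_modules=("math",)):
    """Execute source with a replacement __import__ in its builtins."""
    namespace = {
        "__builtins__": {
            "__import__": make_restricted_import(allowed_modules),
            "hasattr": hasattr,
        }
    }
    exec(source, namespace)
    return namespace
```

Code run this way sees a sys without sys.modules, so it cannot pick
other modules out of it by that route.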
> One of the "nasty" things would be picking the real __builtins__ out > of sys.modules. Are there any others? Picking an unsafe extension module out of sys.modules. --Guido van Rossum (home page: http://www.python.org/~guido/) From pedronis@bluewin.ch Tue Mar 11 17:02:36 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Tue, 11 Mar 2003 18:02:36 +0100 Subject: [Python-Dev] Capabilities References: <200303102059.h2AKxxR24396@oma.cosc.canterbury.ac.nz> <200303111633.h2BGXum27649@odiug.zope.com> Message-ID: <054701c2e7f0$0435d7c0$6d94fea9@newmexico> From: "Guido van Rossum" > [Greg Ewing] > > I think I agree that to really get on top of this security business we > > need to move towards having dangerous things forbidden by default > > rather than allowed by default. > > This is more or less what the rexec module implements, except for > convenience it has a list of unsafe built-ins rather than a list of > safe built-in. > > > To that end, it would be useful if we could pin down exactly what's > > dangerous and what isn't. It seems to me that most uses of > > introspection by most programs are harmless. Can we sort out those > > (hopefully few) things that are dangerous, and separate them from the > > existing introspection mechanisms? > > Maybe, maybe not. The original restricted execution code (not the > rexec module) arbitrarily decided that setting class attributes was > dangerous but getting them was not. Samuele found that new-style > classes allow both, but always disallows write-access to the class > __dict__ (you have to use the setattr protocol); this is good or bad > depending on how it's used. but given that methods can be overriden per instance with classic-classes: class C: def f(s): ... c=C() c.f = lambda s: s it was not so effective. > The real problem is that harmful access may be granted via > innocent-looking access. For example, allowing read-only access to a > function's globals gives you access to the unrestricted 'open' > function... 
restricted execution alone for example does not have a notion of subclassable vs. non subclassable classes, and given its approach, subclassing can be dangerous. For sure a good thing would be for func_* and im_* attributes of functions and methods to be substituted by special accessor functions/objects, indipendently of restricted mode. Function and method should be for normal code basically opaque. From ping@zesty.ca Tue Mar 11 17:24:13 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Tue, 11 Mar 2003 11:24:13 -0600 (CST) Subject: [Python-Dev] Capabilities In-Reply-To: <3E6CB1D8.4050108@zope.com> Message-ID: On Mon, 10 Mar 2003, Jim Fulton wrote: > Ka-Ping Yee wrote: > > Here's a preliminary description of the boundary between "introspective" > > and "restricted", off the top of my head: > > > > 1. The only thing you can do with a bound method is to call it > > (bound methods have no attributes except __doc__). > > Well, I see no harm and much usefulness > in allowing __name__, __repr__, and __str__. Depends. In a truly secure system, classes would only reveal information about themselves if they wanted to. The default __repr__ gives away to the id() of the instance, and __name__ gives away the name of the method, which would prevent you from creating proxies that are indistinguishable from the original. Sometimes it is useful to be able to do that. > > 2. The following instance attributes are off limits: > > __class__, __dict__, __module__. > > I generally want to be able to get the __class__. This is harmless > in my case, because I get a proxy back. We definitely do not want to provide access to __class__. Access to an instance should not give you the power to create more instances of its class. If you passed somebody a file object, access to the class would convey the power to open any file on the filesystem! 
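
The point about __class__ can be shown in a few lines.  This sketch
uses present-day Python, where an unbuffered binary file is an
io.FileIO instance whose class accepts a path; the helper names and
the temporary files are made up for the demonstration.

```python
import os
import tempfile


def read_via_class(granted_file, other_path):
    # Holding one file object is enough to reach its constructor,
    # which will open any path the process itself is allowed to open.
    file_class = granted_file.__class__
    with file_class(other_path, "r") as other:
        return other.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        granted_path = os.path.join(tmp, "granted.txt")
        secret_path = os.path.join(tmp, "secret.txt")
        with open(granted_path, "wb") as f:
            f.write(b"public")
        with open(secret_path, "wb") as f:
            f.write(b"secret")
        # buffering=0 yields a raw io.FileIO object.
        with open(granted_path, "rb", buffering=0) as granted:
            return read_via_class(granted, secret_path)
```

The caller was only ever given the "granted" file, yet demo() returns
the contents of the other one.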
> > However, there is still the problem that the established technique
> > for storing instance-specific state in Python is to use globally-
> > accessible data attributes instead of a limited scope. We would
> > also need to add a safe (private) place for instances to put state.
>
> I don't understand why this is necessary. In general, you want to
> restrict what attributes (data, properties, methods, etc.) are accessible
> in certain situations. I don't follow what makes data attributes special.

Instances currently don't have a private place to put their state,
and unless there is a convenient way to do that, implementers will tend
to expose their instance state in public data attributes. Even if
the instance had properties, the properties still (as yet) have no
way to conveniently distinguish if access is being attempted from
within an instance method, or from outside the instance.

-- ?!ng

From ping@zesty.ca Tue Mar 11 17:28:59 2003
From: ping@zesty.ca (Ka-Ping Yee)
Date: Tue, 11 Mar 2003 11:28:59 -0600 (CST)
Subject: [Python-Dev] Re: Capabilities
In-Reply-To: <200303101540.h2AFeW012107@odiug.zope.com>
Message-ID:

On Mon, 10 Mar 2003, Guido van Rossum wrote:
> [Ping]
> > By the way -- to avoid confusion between "proxies used to wrap
> > unrestricted objects in order to make them into secure objects" and
> > "proxies used to reduce the interface of an existing secure object",
> > let's call the first "proxy" (as has been used in the "rexec vs. proxy"
> > discussion so far), and call the second a "facet" (which is the term
> > commonly used when capabilities people talk about reducing an interface).
>
> Hm, I'm not sure I understand the difference between the two
> definitions you give. What does "making something into a secure
> object" mean if not "reducing its interface"? And what is the
> fundamental difference between a secure object and an insecure one?
> In my world view there's a gradual difference.

I acknowledge that it's not perfectly black and white, but what i meant in the above is that a "secure object" is one that exposes only its declared interface. The key difference i'm getting at is whether the interface is the one intended by the programmer. Proxies are for ensuring that the interface doesn't leak things the programmer never intended; facets are for the programmer to intentionally reduce the interface of an already secure object to limit its powers. Er, perhaps another way of saying it is that proxies are at the system level and facets are at the user level. -- ?!ng From pedronis@bluewin.ch Tue Mar 11 17:30:27 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Tue, 11 Mar 2003 18:30:27 +0100 Subject: [Python-Dev] Capabilities References: <200303102059.h2AKxxR24396@oma.cosc.canterbury.ac.nz> <200303111633.h2BGXum27649@odiug.zope.com> <054701c2e7f0$0435d7c0$6d94fea9@newmexico> Message-ID: <062d01c2e7f3$e82b0d80$6d94fea9@newmexico> From: "Samuele Pedroni" > For sure a good thing would be for func_* and im_* attributes of functions and > methods to be substituted by special accessor functions/objects, indipendently > of restricted mode. to clarify: I mean something like func_globals(f) vs f.func_globals regards From ben@algroup.co.uk Tue Mar 11 17:50:38 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Tue, 11 Mar 2003 17:50:38 +0000 Subject: [Python-Dev] Capabilities In-Reply-To: <200303101538.h2AFcTR12087@odiug.zope.com> References: <200303101538.h2AFcTR12087@odiug.zope.com> Message-ID: <3E6E21EE.50709@algroup.co.uk> Guido van Rossum wrote: >>Ben Laurie wrote: >>There seems to be a persistent confusion here that i would like >>to dispel: a capability is not a single lambda. > > > I guess, I misunderstood.. I was sure that Ben told me this was so. > Apparently I misread, or you have a different definition of capability > than he does (wouldn't be the first time.) The thing is that a capability is a pretty abstract notion. 
You can implement them as classes or as lambdas - I initially did them
as classes, but decided that lambdas were neater, at least in the
context of Python.

I could be wrong. It could just be my particular bias, which is why I'd
prefer, ideally, to be able to do either.

I'm sure if people want to be definition lawyers they can find
documentation explaining why either of those isn't quite right, but I'm
interested in functionality and the functionality is available either way.

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html http://www.thebunker.net/

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff

From greg@cosc.canterbury.ac.nz Tue Mar 11 21:08:44 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Mar 2003 10:08:44 +1300 (NZDT)
Subject: [Python-Dev] test_popen broken on Win2K
In-Reply-To: <2mfzpufb6u.fsf@starship.python.net>
Message-ID: <200303112108.h2BL8ii29619@oma.cosc.canterbury.ac.nz>

> I think Jeff was suggesting that we implement it like this:
>
> def plumb(cmd):
>     import Tkinter
>     return Tkinter.call('exec ' + cmd)

But then you'd be going through the bottleneck of
a string syntax with all its attendant quoting problems.
The whole point of my suggestion was to avoid that!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+

From gward@python.net Tue Mar 11 21:44:50 2003
From: gward@python.net (Greg Ward)
Date: Tue, 11 Mar 2003 16:44:50 -0500
Subject: [Python-Dev] Audio devices
In-Reply-To: <200303111620.h2BGKJn27187@odiug.zope.com>
References: <20030311141559.GA15139@cthulhu.gerg.ca> <200303111454.h2BEs1B23261@odiug.zope.com> <20030311155841.GB14963@cthulhu.gerg.ca> <200303111620.h2BGKJn27187@odiug.zope.com>
Message-ID: <20030311214450.GA16297@cthulhu.gerg.ca>

On 11 March 2003, Guido van Rossum said:
> Yes, it fixes the hang. Please check it in!

OK, done.

> The sample is still played at too high a speed, but maybe that's
> expected?

No, definitely not. On my system, it sounds the same as it has with
linuxaudiodev for quite a while, and the same as it did with
sunaudiodev on my old Sun box at CNRI before that.

*Maybe* there's something wrong with how setparameters() initializes the
audio device. Try this patch and see how it goes:

--- Lib/test/test_ossaudiodev.py 14 Feb 2003 19:29:22 -0000 1.4
+++ Lib/test/test_ossaudiodev.py 11 Mar 2003 21:42:31 -0000
@@ -52,8 +52,10 @@
     a.fileno()

     # set parameters based on .au file headers
-    a.setparameters(rate, 16, nchannels, fmt)
-    a.write(data)
+    a.setfmt(fmt)
+    a.channels(nchannels)
+    a.speed(rate)
+    a.writeall(data)
     a.flush()
     a.close()

Hmmm, I just noticed that setting O_NONBLOCK at open() time *does* have
an effect -- I needed to change that write() to writeall() in order to
hear the whole test sound. Uh-oh.

        Greg
--
Greg Ward http://www.gerg.ca/
"... but in the town it was well known that when they got home their fat and
psychopathic wives would thrash them to within inches of their lives ..."

From skip@pobox.com Tue Mar 11 22:20:07 2003
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 11 Mar 2003 16:20:07 -0600
Subject: [Python-Dev] bsddb3 test errors - are these expected?
Message-ID: <15982.24855.181016.568236@montanaro.dyndns.org>

I just tried running regrtest with "-uall,-largefile" (after a "cvs up",
"./config.status --recheck", and "make") on my Mac OS X system. It chugged
for awhile, then spit this out several times:

Exception in thread reader 4:
Traceback (most recent call last):
  File "/Users/skip/src/python/head/dist/src/Lib/threading.py", line 411, in __bootstrap
    self.run()
  File "/Users/skip/src/python/head/dist/src/Lib/threading.py", line 399, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/skip/src/python/head/dist/src/Lib/bsddb/test/test_thread.py", line 270, in readerThread
    rec = c.first()
DBLockDeadlockError: (-30995, 'DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock')

once for each thread, then this:

/Users/skip/src/python/head/dist/src/Lib/bsddb/dbutils.py:67: RuntimeWarning: DB_INCOMPLETE: Cache flush was unable to complete
  return function(*_args, **_kwargs)

After chugging awhile longer, it segfaulted.

What (if anything) can I do to provide useful inputs to someone who can
possibly fix the problem?

Skip

From fincher.8@osu.edu Tue Mar 11 19:49:42 2003
From: fincher.8@osu.edu (Jeremy Fincher)
Date: Tue, 11 Mar 2003 14:49:42 -0500
Subject: [Python-Dev] Ridiculously minor tweaks?
In-Reply-To: <15982.24855.181016.568236@montanaro.dyndns.org>
References: <15982.24855.181016.568236@montanaro.dyndns.org>
Message-ID: <200303111449.42035.fincher.8@osu.edu>

There are many places in the standard library where some code either iterates
over a literal list or checks for membership in a literal list. I'm curious
if it would be considered productive and useful to go through and change
those places to iterate over/check for membership in literal tuples instead
of lists. The tuple, I think, more closely reflects the read-only literal
nature of the code and is slightly faster to boot. (Not that the speed
really matters, I'm sure there aren't any such tests in performance-sensitive
locations).

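A rough sketch of how the proposed change could be measured with timeit, written for present-day Python rather than the interpreter under discussion. The sample values are arbitrary, and the numbers depend heavily on the interpreter; some CPython versions already compile a literal list in a membership test to a constant tuple, in which case the two timings should be indistinguishable.

```python
import timeit

# The same membership test written against a literal list and a literal tuple.
LIST_STMT = "x in ['a', 'b', 'c', 'd']"
TUPLE_STMT = "x in ('a', 'b', 'c', 'd')"
SETUP = "x = 'd'"


def best_time(stmt, number=100000, repeat=3):
    """Return the fastest of `repeat` runs of `number` executions each."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))


list_time = best_time(LIST_STMT)
tuple_time = best_time(TUPLE_STMT)
print("list:  %.6f s" % list_time)
print("tuple: %.6f s" % tuple_time)
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; no particular result is claimed here.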
Would such an endeavor be useful? Would a patch to that effect be accepted? Jeremy From nas@python.ca Wed Mar 12 00:05:06 2003 From: nas@python.ca (Neil Schemenauer) Date: Tue, 11 Mar 2003 16:05:06 -0800 Subject: [Python-Dev] Ridiculously minor tweaks? In-Reply-To: <200303111449.42035.fincher.8@osu.edu> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> Message-ID: <20030312000505.GA26614@glacier.arctrix.com> Jeremy Fincher wrote: > Would such an endeavor be useful? Would a patch to that effect be accepted? I doubt it. It would be more useful to look over the list of open patches and bugs and find something that you can help on. Neil From guido@python.org Wed Mar 12 02:11:12 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Mar 2003 21:11:12 -0500 Subject: [Python-Dev] Ridiculously minor tweaks? In-Reply-To: "Your message of Tue, 11 Mar 2003 14:49:42 EST." <200303111449.42035.fincher.8@osu.edu> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> Message-ID: <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> > There are many places in the standard library where some code either > iterates over a literal list or checks for membership in a literal > list. I'm curious if it would be considered productive and useful > to go through and change those places to iterate over/check for > membership in literal tuples instead fo lists. The tuple, I think, > more closely reflects the read-only literal nature of the code and > is slightly faster to boot. -1. I bet you can't prove the speed-up. Tuples are for heterogeneous data, list are for homogeneous data. Tuples are *not* read-only lists. Tuples require extra care in case the number of elements shrinks to 1. 
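One way the "extra care" can show up, sketched in present-day Python with made-up values: when a literal tuple shrinks to a single element and the trailing comma is forgotten, what remains is a plain string, and a membership test silently turns into a substring test.

```python
one_wrong = ("spam")     # parentheses only: this is the str "spam"
one_right = ("spam",)    # the trailing comma is what makes a 1-tuple
one_list = ["spam"]      # a one-element list needs no special syntax

print(type(one_wrong).__name__)   # str
print("pa" in one_wrong)          # True: substring test on a string
print("pa" in one_right)          # False: element test on a tuple
print("pa" in one_list)           # False: element test on a list
```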
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Wed Mar 12 02:15:38 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 11 Mar 2003 21:15:38 -0500 Subject: [Python-Dev] bsddb3 test errors - are these expected? In-Reply-To: <15982.24855.181016.568236@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > I just tried running regrtest with "-uall,-largefile" (after a "cvs up", > "./config.status --recheck", and "make") on my Mac OS X system. > It chugged for awhile, then spit this out several times: > > Exception in thread reader 4: > Traceback (most recent call last): > File > "/Users/skip/src/python/head/dist/src/Lib/threading.py", line > 411, in __bootstrap > self.run() > File > "/Users/skip/src/python/head/dist/src/Lib/threading.py", line 399, in run > self.__target(*self.__args, **self.__kwargs) > File > "/Users/skip/src/python/head/dist/src/Lib/bsddb/test/test_thread.p > y", line 270, in reade > rThread > rec = c.first() > DBLockDeadlockError: (-30995, 'DB_LOCK_DEADLOCK: Locker > killed to resolve a deadlock') > > once for each thread, I believe those are timing-related and harmless. It would be better if the test suite suppressed such msgs, if so. > then this: > > /Users/skip/src/python/head/dist/src/Lib/bsddb/dbutils.py:67: > RuntimeWarning: DB_INCOMPLETE: Ca > che flush was unable to complete > return function(*_args, **_kwargs) That's not good, though. > After chugging awhile longer, it segfaulted. Nor that. > What (if anything) can I do to provide useful inputs to someone who can > possibly fix the problem? Sorry, no idea. Is the pybsddb SF project still open for business? From tismer@tismer.com Wed Mar 12 03:39:26 2003 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Mar 2003 04:39:26 +0100 Subject: [Python-Dev] Ridiculously minor tweaks? 
In-Reply-To: <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E6EABEE.2040108@tismer.com> Guido van Rossum wrote: ... > Tuples are for heterogeneous data, list are for homogeneous data. > Tuples are *not* read-only lists. Oh! Did you point that out anywhere, before, and I missed it? Are you thinking of lists as to be really somehow being homogeneous data, in a sense to be replacible by some array optimization, sometimes, while tuples aren't? I never realized this, and I'm a bit stunned. (but by no means negative about it, just surprized) ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From fincher.8@osu.edu Wed Mar 12 01:03:08 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Tue, 11 Mar 2003 20:03:08 -0500 Subject: [Python-Dev] Ridiculously minor tweaks? In-Reply-To: <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303112003.08689.fincher.8@osu.edu> On Tuesday 11 March 2003 09:11 pm, Guido van Rossum wrote: > I bet you can't prove the speed-up. 
Here's the script I used to test it:

import timeit

report = \
"""
Size: %s
Tuple Time: %s
List Time: %s
List Time - Tuple Time: %s
"""

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        # argv entries are strings; xrange() below needs an int
        upperLimit = int(sys.argv[1])
    else:
        upperLimit = 10
    for i in xrange(upperLimit):
        lst = range(i)
        tpl = tuple(lst)
        tupleTimer = timeit.Timer('%s in %r' % (upperLimit, tpl))
        listTimer = timeit.Timer('%s in %r' % (upperLimit, lst))
        minTupleTime = min(tupleTimer.repeat())
        minListTime = min(listTimer.repeat())
        difference = minListTime - minTupleTime
        print report % (i, minTupleTime, minListTime, difference)

There seems to be a constant 1.3 usec or so difference between creating a
tuple and creating a list. As I mentioned earlier, I seriously doubt it would
have any significant impact on the overall execution speed of any non-trivial
Python program, but it exists nonetheless. Maybe in the realm of 'low
hanging fruit' it's the fruit that's fallen to the ground and begun to rot :)

> Tuples are for heterogeneous data, list are for homogeneous data.
> Tuples are *not* read-only lists.

I understand this in a strictly typed language, but in Python, since lists
can be just as heterogeneous as tuples, it's always seemed to me that the
greatest difference between lists and tuples is the mutability and
extensibility of lists.

Jeremy

From Anthony Baxter Wed Mar 12 06:35:25 2003
From: Anthony Baxter (Anthony Baxter)
Date: Wed, 12 Mar 2003 17:35:25 +1100
Subject: [Python-Dev] Audio devices
In-Reply-To: <200303111620.h2BGKJn27187@odiug.zope.com>
Message-ID: <200303120635.h2C6ZP803522@localhost.localdomain>

>>> Guido van Rossum wrote
> The sample is still played at too high a speed, but maybe that's
> expected?

For what it's worth, my redhat 7.3 Dell laptop does this whenever it gets
a mono sample to play. I'm waiting for RH8.1, this will hopefully fix the
problem.

Anthony
--
Anthony Baxter
It's never too late to have a happy childhood.

From mal@lemburg.com Wed Mar 12 09:01:32 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 12 Mar 2003 10:01:32 +0100 Subject: [Python-Dev] Ridiculously minor tweaks? In-Reply-To: <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E6EF76C.50608@lemburg.com> Guido van Rossum wrote: >>There are many places in the standard library where some code either >>iterates over a literal list or checks for membership in a literal >>list. I'm curious if it would be considered productive and useful >>to go through and change those places to iterate over/check for >>membership in literal tuples instead fo lists. The tuple, I think, >>more closely reflects the read-only literal nature of the code and >>is slightly faster to boot. > > > -1. > > I bet you can't prove the speed-up. He probably can :-) That's why I have so many tools in mxTools which return tuples instead of lists, e.g. trange() and indices(). Both the tuple creation and the iteration are faster than list creation and access (tuples don't use indirection which saves you a second malloc() and dereference). As always: it's the sum of small tweaks like these that makes the difference in the overall performance of an application. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 20 days left EuroPython 2003, Charleroi, Belgium: 104 days left From dave@boost-consulting.com Wed Mar 12 12:24:48 2003 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 12 Mar 2003 07:24:48 -0500 Subject: [Python-Dev] PyObject_New vs PyObject_NEW Message-ID: Someone I work with recently caused a test to start asserting in VC7's instrumented free() call, using a pydebug build. He explained the change this way: "I switched from PyObject_New to PyObject_NEW, which according to it's documentation omits the check for type_object != 0 and consequently should run a little bit faster" [he doesn't ever pass 0 as the typeobject] Did he miss some other important fact about PyObject_NEW? Does the doc need to be fixed? -- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Wed Mar 12 12:48:05 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 12 Mar 2003 07:48:05 -0500 Subject: [Python-Dev] Ridiculously minor tweaks? In-Reply-To: "Your message of Wed, 12 Mar 2003 04:39:26 +0100." <3E6EABEE.2040108@tismer.com> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> <3E6EABEE.2040108@tismer.com> Message-ID: <200303121248.h2CCm5K29846@pcp02138704pcs.reston01.va.comcast.net> > > Tuples are for heterogeneous data, list are for homogeneous data. > > Tuples are *not* read-only lists. > > Oh! > > Did you point that out anywhere, before, and I missed it? Yes. I've been saying this for years whenever people would listen (which is not often :-( ) > Are you thinking of lists as to be really somehow > being homogeneous data, in a sense to be replacible > by some array optimization, sometimes, while tuples aren't? Python is a dynamic language, and you can do whatever you want with the data structures it gives you. 
But when thinking about extending the language with optional type declarations or automatic type inference, I always think of the type of a list as "list of T" while I think of a tuple's type as "tuple of length N with items of types T1, T2, T3, ..., TN". So [1, 2] and [1, 2, 3] are both "list of int" (and "list of Number" and "list of Object", of course) while ("hello", 42) is a "2-tuple with items str and int" and (42, "hello", 3.14) is a "3-tuple with items int, str, float". > I never realized this, and I'm a bit stunned. > (but by no means negative about it, just surprized) You learn something new every day. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Mar 12 12:54:55 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 12 Mar 2003 07:54:55 -0500 Subject: [Python-Dev] Ridiculously minor tweaks? In-Reply-To: "Your message of Tue, 11 Mar 2003 20:03:08 EST." <200303112003.08689.fincher.8@osu.edu> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> <200303112003.08689.fincher.8@osu.edu> Message-ID: <200303121254.h2CCstj29915@pcp02138704pcs.reston01.va.comcast.net> > On Tuesday 11 March 2003 09:11 pm, Guido van Rossum wrote: > > I bet you can't prove the speed-up. > > Here's the script I used to test it: [Good use of 'timeit' module skipped] > There seems to be a constant 1.3 usec or so difference between > creating a tuple and creating a list. As I mentioned earlier, I > seriously doubt it would have any significant impact on the overall > execution speed of any non-trivial Python program, but it exists > nonetheless. Maybe in the realm of 'low hanging fruit' it's the > fruit that's fallen to the ground and begun to rot :) Of course creating a tuple is faster than creating a list. I meant that you wouldn't be able to show a speed difference in any of the places where you would consider adding it (i.e. 
in context). > > Tuples are for heterogeneous data, list are for homogeneous data. > > Tuples are *not* read-only lists. > > I understand this in a strictly typed language, but in Python, since > lists can be just as heterogeneous as tuples, it's always seemed to > me that the greatest difference between lists and tuples is the > mutability and extensibility of lists. Sorry, you're wrong. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Mar 12 12:57:25 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 12 Mar 2003 13:57:25 +0100 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: References: Message-ID: <3E6F2EB5.5020207@lemburg.com> David Abrahams wrote: > Someone I work with recently caused a test to start asserting in VC7's > instrumented free() call, using a pydebug build. He explained the > change this way: > > "I switched from PyObject_New to PyObject_NEW, which according to it's > documentation omits the check for type_object != 0 and consequently > should run a little bit faster" > > [he doesn't ever pass 0 as the typeobject] > > Did he miss some other important fact about PyObject_NEW? Does the > doc need to be fixed? Does he use PyObject_DEL() to free the object ? -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 20 days left EuroPython 2003, Charleroi, Belgium: 104 days left From guido@python.org Wed Mar 12 13:08:20 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 12 Mar 2003 08:08:20 -0500 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: "Your message of Wed, 12 Mar 2003 07:24:48 EST." 
References: Message-ID: <200303121308.h2CD8Kj29996@pcp02138704pcs.reston01.va.comcast.net> > Someone I work with recently caused a test to start asserting in VC7's > instrumented free() call, using a pydebug build. He explained the > change this way: > > "I switched from PyObject_New to PyObject_NEW, which according to it's > documentation omits the check for type_object != 0 and consequently > should run a little bit faster" > > [he doesn't ever pass 0 as the typeobject] > > Did he miss some other important fact about PyObject_NEW? Does the > doc need to be fixed? You can read the source code as well as I can. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Wed Mar 12 14:06:10 2003 From: ark@research.att.com (Andrew Koenig) Date: 12 Mar 2003 09:06:10 -0500 Subject: [Python-Dev] Ridiculously minor tweaks? In-Reply-To: <200303121248.h2CCm5K29846@pcp02138704pcs.reston01.va.comcast.net> References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> <3E6EABEE.2040108@tismer.com> <200303121248.h2CCm5K29846@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido> Python is a dynamic language, and you can do whatever you want Guido> with the data structures it gives you. But when thinking about Guido> extending the language with optional type declarations or Guido> automatic type inference, I always think of the type of a list Guido> as "list of T" while I think of a tuple's type as "tuple of Guido> length N with items of types T1, T2, T3, ..., TN". So [1, 2] Guido> and [1, 2, 3] are both "list of int" (and "list of Number" and Guido> "list of Object", of course) while ("hello", 42) is a "2-tuple Guido> with items str and int" and (42, "hello", 3.14) is a "3-tuple Guido> with items int, str, float". It might interest you to know that Standard ML, which is statically but polymorphically typed, draws exactly that distinction. 
It has both tuple and list types. The type of a tuple includes the type of each of its elements, whereas all of the elements of a list must be the same type. Moreover, although the type of a list includes the type of its elements, it does not include how many elements there are. So in ML, the type of [1, 2, 3] is "int list", and the type of ("hello", 42) is "string * int". -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From barry@python.org Wed Mar 12 14:23:29 2003 From: barry@python.org (Barry A. Warsaw) Date: Wed, 12 Mar 2003 09:23:29 -0500 Subject: [Python-Dev] Ridiculously minor tweaks? References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> <3E6EABEE.2040108@tismer.com> <200303121248.h2CCm5K29846@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15983.17121.164482.184532@gargle.gargle.HOWL> >>>>> "GvR" == Guido van Rossum writes: GvR> I always think of the type of a list as "list of T" while I GvR> think of a tuple's type as "tuple of length N with items of GvR> types T1, T2, T3, ..., TN". So [1, 2] and [1, 2, 3] are both GvR> "list of int" (and "list of Number" and "list of Object", of GvR> course) while ("hello", 42) is a "2-tuple with items str and GvR> int" and (42, "hello", 3.14) is a "3-tuple with items int, GvR> str, float". Of course (1, 2, 3) fits under that description, where, just by chance T1 == T2 == T3. But one of the ways I think about it is the tuple's relationship to argument and return passing. It's the tuple that's used when multiple values are returned from a function and they are almost always heterogeneous. And while lists can be used for unpacking sequences, I tend to think of tuples when I want record types, e.g. 
    rec = magic(blah)
    length, prefix, interface = rec

-Barry

From tismer@tismer.com Wed Mar 12 14:46:55 2003
From: tismer@tismer.com (Christian Tismer)
Date: Wed, 12 Mar 2003 15:46:55 +0100
Subject: [Python-Dev] Ridiculously minor tweaks?
In-Reply-To: <200303121248.h2CCm5K29846@pcp02138704pcs.reston01.va.comcast.net>
References: <15982.24855.181016.568236@montanaro.dyndns.org> <200303111449.42035.fincher.8@osu.edu> <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net> <3E6EABEE.2040108@tismer.com> <200303121248.h2CCm5K29846@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3E6F485F.3030104@tismer.com>

Guido van Rossum wrote:
> Tuples are for heterogeneous data, list are for homogeneous data.
> Tuples are *not* read-only lists.
>
> Oh!
>
> Did you point that out anywhere, before, and I missed it?
>
> Yes. I've been saying this for years whenever people would listen
> (which is not often :-( )

Sorry.

>>Are you thinking of lists as to be really somehow
>>being homogeneous data, in a sense to be replacible
>>by some array optimization, sometimes, while tuples aren't?
>
>
> Python is a dynamic language, and you can do whatever you want with
> the data structures it gives you. But when thinking about extending
> the language with optional type declarations or automatic type
> inference, I always think of the type of a list as "list of T" while I
> think of a tuple's type as "tuple of length N with items of types T1,
> T2, T3, ..., TN". So [1, 2] and [1, 2, 3] are both "list of int" (and
> "list of Number" and "list of Object", of course) while ("hello", 42)
> is a "2-tuple with items str and int" and (42, "hello", 3.14) is a
> "3-tuple with items int, str, float".

Oh yes, after re-thinking this, my question was dumb.
From my own usage of tuples and lists, I know that I
almost always use lists as collections of objects of
the same type, while tuples are often used to group
different things together.
Basically, I knew this all, and I'm asking myself why I asked. Probably since I'm looking at lists and tuples too much technically, these days. cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From pedronis@bluewin.ch Wed Mar 12 16:25:58 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Wed, 12 Mar 2003 17:25:58 +0100 Subject: [Python-Dev] Re: Capabilities References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net> <3E69E1BC.5090508@algroup.co.uk> <1047150320.2347.26.camel@localhost.localdomain> <3E6B21F7.3040300@zope.com> <011f01c2e62f$3e6d5840$6d94fea9@newmexico> <05a701c2e66f$6c001be0$6d94fea9@newmexico> <3E6C7C3A.2090104@zope.com> <008001c2e70f$f514c520$6d94fea9@newmexico> Message-ID: <007a01c2e8b4$1092ab00$6d94fea9@newmexico> This is a multi-part message in MIME format. ------=_NextPart_000_0077_01C2E8BC.71D0CC00 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit I posted > [...] > > class MyExc(Exception): # !!! definition outside of resticted execution > def __init__(self,msg): > self.message = msg > Exception.__init__(self,msg) > > def __str__(self): > return self.message > > def myfunc(): > raise MyExc('foo') > > ri = RestrictedInterpreter() > > ri.globals['myfunc'] = ProxyFactory(myfunc) > > f = open('c:/Documenti/x.txt','r') > code = f.read() > f.close() > > ri.ri_exec(code) > > print "OK" > > > Anyway I have a _very baroque_ x.txt that manages to call sys.exit. > attached is a modified version of s.py that takes a filename for the code to run inside the RestrictedInterpreter. Also myfunc is now myexc_source . 
There is also a new function candy, next mail on that.

Here is a run with xpl1 (was x.txt):

...>\usr\python22\python -i s.py xpl1
restricted execution
no exit
cannot access sys.exit directly
Got sys.exit

...>

no OK, no Python prompt !

here is xpl1 code
[warning: metaclasses, descriptors usage, functional programming ahead :)]
[some things are artifacts of the non-deliberate limitations inside
RestrictedInterpreter]

#Object = ''.__class__.__base__
Type = ''.__class__.__class__

class Iter:
    __metaclass__ = Type

    def __init__(self,v):
        self.v = v
        self.i = 0

    def __iter__(self):
        return self

    def next(self):
        try:
            v = self.v[self.i]
            self.i += 1
            return v
        except IndexError:
            raise StopIteration

class consta:
    __metaclass__ = Type

    def __init__(self,o):
        self.o = o

    def __get__(self,obj,typ):
        return self.o

#

try:
    myexc_source()
except Exception,e:
    pass

MyExc = e.__class__

e__str__ = e.__str__

#

try:
    e__str__.func_globals
except:
    print "restricted execution"

try:
    exit(0)
except:
    print "no exit"

try:
    import sys
    sys.exit(0)
except:
    print "cannot access sys.exit directly"

#

class Y:
    class __metaclass__(Type):
        def __iter__(cls):
            return Iter(['func_globals'])

class X(Y,MyExc):
    message = None

    __call__ = consta(getattr)

    def __iter__(self):
        return Iter([e__str__])

    #def __get__(self,x,X):
    #    print self,x,y
    #    return map(self,x,X)

    __get__ = map

# x isinst MyExc
# x.message === x.__get__(x,X) === map(x,x,X)
# x(o,a) === getattr(o,a)
# map(None,x) === [e__str__]
# map(None,X) === ['func_globals']

x=X()
X.message = x

g = MyExc.__str__(x)

print "Got sys.exit"
g[0]['exit'](0)

------=_NextPart_000_0077_01C2E8BC.71D0CC00
Content-Type: text/plain; name="s.py"
Content-Disposition: attachment; filename="s.py"

import sys
from os.path import join as pathjoin,dirname

code_fname = pathjoin(dirname(sys.argv[0]),sys.argv[1])

from sys import exit # !!! same module as MyExc

sys.path.append('C:/transit/Zope3-3.0a1/src') # ! add Zope3 (alpha) to sys.path

from zope.security.interpreter import RestrictedInterpreter
from zope.security.checker import ProxyFactory

class MyExc(Exception): # !!! definition outside of restricted execution
    def __init__(self,msg):
        self.message = msg
        Exception.__init__(self,msg)

    def __str__(self):
        return self.message

def myexc_source():
    raise MyExc('foo')

def candy(s):
    if s == "yes":
        return 'candy'
    else:
        return 'none'

ri = RestrictedInterpreter()

ri.globals['myexc_source'] = ProxyFactory(myexc_source)
ri.globals['candy'] = ProxyFactory(candy)

f = open(code_fname,'r')
code = f.read()
f.close()

ri.ri_exec(code)

print "OK"

------=_NextPart_000_0077_01C2E8BC.71D0CC00--

From ben@algroup.co.uk Wed Mar 12 16:24:40 2003
From: ben@algroup.co.uk (Ben Laurie)
Date: Wed, 12 Mar 2003 16:24:40 +0000
Subject: [Python-Dev] Re: Capabilities
In-Reply-To:
References: <200303100110.h2A1AaC06743@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3E6F5F48.7040001@algroup.co.uk>

Zooko wrote:
> I suspect that capabilities are quite similar to Zope proxies.

If I understand them correctly, a Zope proxy where the security checker
always says "yes" is a capability. Except, possibly, they may be
forgeable, I don't know them well enough to know.

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html http://www.thebunker.net/

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff

From altis@semi-retired.com Wed Mar 12 16:43:04 2003
From: altis@semi-retired.com (Kevin Altis)
Date: Wed, 12 Mar 2003 08:43:04 -0800
Subject: [Python-Dev] os.path.dirname misleading?
Message-ID:

I'm not sure whether to classify this as a bug or a feature request.
Recently, I got burned by the fact that despite the name, dirname() does not
return the expected directory portion of a path if you pass it a directory;
instead it returns the parent directory, because it uses split.
That it uses split is clearly documented and also evident in the
source, though both fail to point out the case of passing in a
directory path.

"dirname(path)
Return the directory name of pathname path. This is the first half of
the pair returned by split(path)."

# Return the head (dirname) part of a path.

def dirname(p):
    """Returns the directory component of a pathname"""
    return split(p)[0]

However, to get what I would consider correct behavior based on the
function name, the code would need to be:

def dirname(p):
    """Returns the directory component of a pathname"""
    if isdir(p):
        return p
    else:
        return split(p)[0]

Changing dirname() may in fact break existing code if people expect it
to just use split, so a dirname2() function seems called for, but that
seems silly, given that dirname should probably be doing an isdir()
check.

ka

From aahz@pythoncraft.com Wed Mar 12 16:48:58 2003
From: aahz@pythoncraft.com (Aahz)
Date: Wed, 12 Mar 2003 11:48:58 -0500
Subject: [Python-Dev] Tuples vs lists
In-Reply-To: <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net>
References: <15982.24855.181016.568236@montanaro.dyndns.org>
 <200303111449.42035.fincher.8@osu.edu>
 <200303120211.h2C2BCH28989@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20030312164858.GA16021@panix.com>

On Tue, Mar 11, 2003, Guido van Rossum wrote:
>
> Tuples are for heterogeneous data, lists are for homogeneous data.
> Tuples are *not* read-only lists.

It's been on my To-Do list to update PEP 8 since last June; if someone
else wants to do it, be my guest. ;-)
--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Register for PyCon now!          http://www.python.org/pycon/reg.html

From skip@pobox.com Wed Mar 12 16:54:57 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 12 Mar 2003 10:54:57 -0600
Subject: [Python-Dev] os.path.dirname misleading?
In-Reply-To:
References:
Message-ID: <15983.26209.303521.418616@montanaro.dyndns.org>

    Kevin> However, to get what I would consider correct behavior based on
    Kevin> the function name, the code would need to be:

    Kevin> def dirname(p):
    Kevin>     """Returns the directory component of a pathname"""
    Kevin>     if isdir(p):
    Kevin>         return p
    Kevin>     else:
    Kevin>         return split(p)[0]

No can do. On my Mac I could execute:

    >>> import ntpath
    >>> print ntpath.dirname("C:\\system\\win32")
    C:\system

Calling isdir() is not an option. Taken another way, "/usr/bin" is a
path to a file, so "/usr" is its directory component, and "bin" is its
basename:

    >>> os.path.dirname("/usr/bin")
    '/usr'
    >>> os.path.basename("/usr/bin")
    'bin'

That "/usr/bin" happens to also be a directory is beside the point.

Skip

From guido@python.org Wed Mar 12 17:04:38 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 12 Mar 2003 12:04:38 -0500
Subject: [Python-Dev] os.path.dirname misleading?
In-Reply-To: Your message of "Wed, 12 Mar 2003 08:43:04 PST."
References:
Message-ID: <200303121704.h2CH4jn00404@odiug.zope.com>

> I'm not sure whether to classify this as a bug or a feature request.
> Recently, I got burned by the fact that despite the name, dirname()
> does not return the expected directory portion of a path if you pass
> it a directory, instead it will return the parent directory because
> it uses split.

This is the first time I've ever heard of this confusion. dirname is
named after the Unix shell function of the same name, which behaves
the same way.

I'm not even sure I understand what you expected -- you expected
dirname("foo") to return "foo" if foo is a directory? What would be
the point of that?
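
A minimal sketch of the point made above, that dirname and basename only
manipulate the string and never consult the file system. The helper name
split_path and its flavour parameter are made up for this illustration;
posixpath and ntpath are the standard-library modules behind os.path.

```python
import ntpath
import posixpath


def split_path(path, flavour=posixpath):
    """Hypothetical helper: return (dirname, basename) for the given flavour.

    Only string operations are involved, so the result does not depend on
    whether the path exists or on the platform running the code.
    """
    return flavour.dirname(path), flavour.basename(path)


print(split_path("/usr/bin"))                   # ('/usr', 'bin')
print(split_path("/usr/bin/"))                  # ('/usr/bin', '')
print(split_path("C:\\system\\win32", ntpath))  # ('C:\\system', 'win32')
```

Note how a trailing separator changes the answer: with it, the last
component is empty and the "directory part" is the whole path.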
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Wed Mar 12 17:15:21 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 12 Mar 2003 12:15:21 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To: <200303121308.h2CD8Kj29996@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

[David Abrahams]
> Someone I work with recently caused a test to start asserting in VC7's
> instrumented free() call, using a pydebug build. He explained the
> change this way:
>
>    "I switched from PyObject_New to PyObject_NEW, which according to its
>    documentation omits the check for type_object != 0 and consequently
>    should run a little bit faster"
>
>    [he doesn't ever pass 0 as the typeobject]
> Did he miss some other important fact about PyObject_NEW? Does the
> doc need to be fixed?

[Guido]
> You can read the source code as well as I can.

Possibly, but not as well as I can -- the memory API's implementation
is monumentally convoluted, especially before 2.3. Speaking of which,
David, which version of Python was "someone" using? Did they enable
pymalloc? Did they give you a traceback (showing from where free() was
called)? Was it even freeing a Python object at the time? In what code
base did someone make this substitution (e.g., Python core, Boost
sources, someone's own extension module, someone else's extension
module)?

The straight answer to your question is no. A nastier answer is that
many memory mgmt screwups are shy, and can be triggered by seemingly
irrelevant changes.

From altis@semi-retired.com Wed Mar 12 17:45:15 2003
From: altis@semi-retired.com (Kevin Altis)
Date: Wed, 12 Mar 2003 09:45:15 -0800
Subject: [Python-Dev] os.path.dirname misleading?
In-Reply-To: <200303121704.h2CH4jn00404@odiug.zope.com>
Message-ID:

> From: Guido van Rossum
>
> > I'm not sure whether to classify this as a bug or a feature request.
> > Recently, I got burned by the fact that despite the name, dirname()
> > does not return the expected directory portion of a path if you pass
> > it a directory, instead it will return the parent directory because
> > it uses split.
>
> This is the first time I've ever heard of this confusion. dirname is
> named after the Unix shell function of the same name, which behaves
> the same way.

Well that's news. I never heard of or used dirname in the shell. But
with that historical context it makes more sense now.

> I'm not even sure I understand what you expected -- you expected
> dirname("foo") to return "foo" if foo is a directory? What would be
> the point of that?

Yes, I expected to get the directory passed in based on the function
name. In the code in question I don't know whether the path is a
directory or a file when I call dirname. I was simply misled by the
function name.

Looking at this further I can see that I'm just going to have to create
my own directory(path) function because of how os.path.split behaves
which impacts dirname, I definitely need an isdir() check.

>>> os.path.split('c:\\mypython\\bugs\\')
('c:\\mypython\\bugs', '')
>>> os.path.split('c:\\mypython\\bugs')
('c:\\mypython', 'bugs')

Hmm, I may actually switch to using split(path)[0] and split(path)[-1]
(or split(path)[1]) in some cases since those might be more descriptive
of what dirname and basename actually do. Pity the functions aren't
named os.path.head and os.path.tail.
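
The directory(path) helper mentioned above might look roughly like the
sketch below. The name comes from the message; the body is only an
assumption about what was meant, and unlike dirname() it has to touch
the file system, so its answer depends on what exists at call time.

```python
import os
import os.path
import tempfile


def directory(path):
    """Hypothetical helper: the directory a path refers to.

    If path names an existing directory it is returned unchanged;
    otherwise the result falls back to os.path.dirname(path).
    """
    if os.path.isdir(path):
        return path
    return os.path.dirname(path)


# Usage sketch against a throwaway directory.
tmp = tempfile.mkdtemp()
file_path = os.path.join(tmp, "example.txt")
open(file_path, "w").close()

print(directory(tmp) == tmp)        # existing directory comes back as-is
print(directory(file_path) == tmp)  # a file maps to its containing directory

os.remove(file_path)
os.rmdir(tmp)
```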
Sorry for the confusion,

ka

From pedronis@bluewin.ch Wed Mar 12 17:53:21 2003
From: pedronis@bluewin.ch (Samuele Pedroni)
Date: Wed, 12 Mar 2003 18:53:21 +0100
Subject: about candy [Python-Dev] Re: Capabilities
References: <200303071741.h27HfGb23015@pcp02138704pcs.reston01.va.comcast.net>
 <3E69E1BC.5090508@algroup.co.uk>
 <1047150320.2347.26.camel@localhost.localdomain>
 <3E6B21F7.3040300@zope.com>
 <011f01c2e62f$3e6d5840$6d94fea9@newmexico>
 <05a701c2e66f$6c001be0$6d94fea9@newmexico>
 <3E6C7C3A.2090104@zope.com>
 <008001c2e70f$f514c520$6d94fea9@newmexico>
 <007a01c2e8b4$1092ab00$6d94fea9@newmexico>
Message-ID: <031001c2e8c0$45a066a0$6d94fea9@newmexico>

[me]
> attached is a modified version of s.py that takes a filename for the code to
> run inside the RestrictedInterpreter. Also myfunc is now myexc_source . There
> is also a new function candy, next mail on that.

Consider from s.py:

-- * --
from sys import exit
...
def candy(s):
    if s == "yes":
        return 'candy'
    else:
        return 'none'

ri = RestrictedInterpreter()

ri.globals['candy'] = ProxyFactory(candy)
...
ri.ri_exec(code)

print "OK"
-- * --

No unproxied exceptions, on the other hand both rexec and the prototype
RestrictedInterpreter supply code with globals() [!], and apply() ...

I have some _even more baroque_ code (xpl2) that exploits candy and
manages to call sys.exit:

...>\usr\python22\python -i s.py xpl2
candy
Got sys.exit

...>

In this case xpl2 could be rewritten as a single expression of the form:

candy(...)

although that would make for a totally masochistic exercise and a total
obfuscated python entry. No, I haven't done/tried that :)

regards.

From dave@boost-consulting.com Wed Mar 12 18:03:35 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 12 Mar 2003 13:03:35 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To: (Tim Peters's message of "Wed, 12 Mar 2003 12:15:21 -0500")
References:
Message-ID:

Tim Peters writes:

> [Guido]
>> You can read the source code as well as I can.
>
> Possibly, but not as well as I can -- the memory API's
> implementation is monumentally convoluted, especially before 2.3.
> Speaking of which, David, which version of Python was "someone"
> using?

I was the one who discovered the problem, using Python 2.2.2.
Curiously, "someone" missed it because he was using vc6 instead of
vc7.

> Did they enable pymalloc?

I don't think I did that. I don't exactly know what pymalloc is.

> Did they give you a traceback (showing from where free() was
> called)?

I can get one for you. Here:

> MSVCRTD.DLL!_free_dbg_lk(void * pUserData=0x00c46338, int nBlockUse=1) Line 1044 + 0x30 C
  MSVCRTD.DLL!_free_dbg(void * pUserData=0x00c46338, int nBlockUse=1) Line 1001 + 0xd C
  MSVCRTD.DLL!free(void * pUserData=0x00c46338) Line 956 + 0xb C
  python22_d.dll!_PyObject_Del(_object * op=0x00c46338) Line 146 + 0xa C
  opaque_ext_d.pyd!dealloc(_object * self=0x00c46338) Line 12 + 0xa C++
  python22_d.dll!_Py_Dealloc(_object * op=0x00c46338) Line 1837 + 0x7 C
  python22_d.dll!tupledealloc(PyTupleObject * op=0x0093c9a8) Line 147 + 0x70 C
  python22_d.dll!_Py_Dealloc(_object * op=0x0093c9a8) Line 1837 + 0x7 C
  python22_d.dll!do_call(_object * func=0x00c45e00, _object * * * pp_stack=0x0012edf8, int na=1, int nk=0) Line 3273 + 0x43 C
  python22_d.dll!eval_frame(_frame * f=0x008c2e68) Line 2038 + 0x1e C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x00963068, _object * globals=0x008e65a8, _object * locals=0x008e65a8, _object * * args=0x00000000, int argcount=0, _object * * kws=0x00000000, int kwcount=0, _object * * defs=0x00000000, int defcount=0, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!PyEval_EvalCode(PyCodeObject * co=0x00963068, _object * globals=0x008e65a8, _object * locals=0x008e65a8) Line 486 + 0x1f C
  python22_d.dll!exec_statement(_frame * f=0x008efbf0, _object * prog=0x00963068, _object * globals=0x008e65a8, _object * locals=0x008e65a8) Line 3668 + 0x11 C
  python22_d.dll!eval_frame(_frame * f=0x008efbf0) Line 1482 + 0x15 C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x008c79a8, _object * globals=0x008e5940, _object * locals=0x00000000, _object * * args=0x00965a7c, int argcount=7, _object * * kws=0x00965a98, int kwcount=0, _object * * defs=0x00000000, int defcount=0, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!fast_function(_object * func=0x0095cc98, _object * * * pp_stack=0x0012f268, int n=7, int na=7, int nk=0) Line 3173 + 0x41 C
  python22_d.dll!eval_frame(_frame * f=0x00965900) Line 2035 + 0x25 C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x008c6fd8, _object * globals=0x008e5940, _object * locals=0x00000000, _object * * args=0x008d8c64, int argcount=5, _object * * kws=0x008d8c78, int kwcount=0, _object * * defs=0x00000000, int defcount=0, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!fast_function(_object * func=0x009551a8, _object * * * pp_stack=0x0012f494, int n=5, int na=5, int nk=0) Line 3173 + 0x41 C
  python22_d.dll!eval_frame(_frame * f=0x008d8af0) Line 2035 + 0x25 C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x008cee50, _object * globals=0x008e5940, _object * locals=0x00000000, _object * * args=0x008c75d8, int argcount=5, _object * * kws=0x008c75ec, int kwcount=0, _object * * defs=0x0092d7a4, int defcount=3, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!fast_function(_object * func=0x00955220, _object * * * pp_stack=0x0012f6c0, int n=5, int na=5, int nk=0) Line 3173 + 0x41 C
  python22_d.dll!eval_frame(_frame * f=0x008c7450) Line 2035 + 0x25 C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x008ddba0, _object * globals=0x008e5940, _object * locals=0x00000000, _object * * args=0x0089582c, int argcount=3, _object * * kws=0x00895838, int kwcount=0, _object * * defs=0x0092e2cc, int defcount=1, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!fast_function(_object * func=0x0095a850, _object * * * pp_stack=0x0012f8ec, int n=3, int na=3, int nk=0) Line 3173 + 0x41 C
  python22_d.dll!eval_frame(_frame * f=0x008956a8) Line 2035 + 0x25 C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x008e3848, _object * globals=0x008e5940, _object * locals=0x00000000, _object * * args=0x008787f4, int argcount=1, _object * * kws=0x008787f8, int kwcount=0, _object * * defs=0x0092eaa4, int defcount=5, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!fast_function(_object * func=0x0095e620, _object * * * pp_stack=0x0012fb18, int n=1, int na=1, int nk=0) Line 3173 + 0x41 C
  python22_d.dll!eval_frame(_frame * f=0x00878690) Line 2035 + 0x25 C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x008ce618, _object * globals=0x0086f780, _object * locals=0x00000000, _object * * args=0x008777bc, int argcount=0, _object * * kws=0x008777bc, int kwcount=0, _object * * defs=0x0084e56c, int defcount=1, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!fast_function(_object * func=0x008736e8, _object * * * pp_stack=0x0012fd44, int n=0, int na=0, int nk=0) Line 3173 + 0x41 C
  python22_d.dll!eval_frame(_frame * f=0x00877660) Line 2035 + 0x25 C
  python22_d.dll!PyEval_EvalCodeEx(PyCodeObject * co=0x0084d8a8, _object * globals=0x0086f780, _object * locals=0x0086f780, _object * * args=0x00000000, int argcount=0, _object * * kws=0x00000000, int kwcount=0, _object * * defs=0x00000000, int defcount=0, _object * closure=0x00000000) Line 2595 + 0x9 C
  python22_d.dll!PyEval_EvalCode(PyCodeObject * co=0x0084d8a8, _object * globals=0x0086f780, _object * locals=0x0086f780) Line 486 + 0x1f C
  python22_d.dll!run_node(_node * n=0x0088e730, char * filename=0x00842def, _object * globals=0x0086f780, _object * locals=0x0086f780, PyCompilerFlags * flags=0x0012ff38) Line 1079 + 0x11 C
  python22_d.dll!run_err_node(_node * n=0x0088e730, char * filename=0x00842def, _object * globals=0x0086f780, _object * locals=0x0086f780, PyCompilerFlags * flags=0x0012ff38) Line 1066 + 0x19 C
  python22_d.dll!PyRun_FileExFlags(_iobuf * fp=0x10261888, char * filename=0x00842def, int start=257, _object * globals=0x0086f780, _object * locals=0x0086f780, int closeit=1, PyCompilerFlags * flags=0x0012ff38) Line 1057 + 0x19 C
  python22_d.dll!PyRun_SimpleFileExFlags(_iobuf * fp=0x10261888, char * filename=0x00842def, int closeit=1, PyCompilerFlags * flags=0x0012ff38) Line 686 + 0x22 C
  python22_d.dll!PyRun_AnyFileExFlags(_iobuf * fp=0x10261888, char * filename=0x00842def, int closeit=1, PyCompilerFlags * flags=0x0012ff38) Line 495 + 0x15 C
  python22_d.dll!Py_Main(int argc=2, char * * argv=0x00842db8) Line 367 + 0x30 C
  python_d.exe!main(int argc=2, char * * argv=0x00842db8) Line 10 + 0xd C
  python_d.exe!mainCRTStartup() Line 338 + 0x11 C
  kernel32.dll!77e814c7()

> Was it even freeing a Python object at the time?

Yup.

> In what code base did someone make this substitution (e.g., Python
> core, Boost sources, someone's own extension module, someone else's
> extension module)?

Boost sources

> The straight answer to your question is no. A nastier answer is
> that many memory mgmt screwups are shy, and can be triggered by
> seemingly irrelevant changes.

Both answers seem to amount to "'someone' must have a bug in his
code". Am I reading that correctly?

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From tim.one@comcast.net Wed Mar 12 18:18:29 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 12 Mar 2003 13:18:29 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To:
Message-ID:

[David Abrahams]
> I was the one who discovered the problem, using Python 2.2.2.
> Curiously, "someone" missed it because he was using vc6 instead of
> vc7.

So you were using VC7. If so, using it for what? Every stick of code
in question, or were you mixing VC7-compiled code with VC6-compiled
code? If the latter, talk to Microsoft (by most accounts their runtime
support libraries aren't compatible with each other).

[traceback freeing a tuple]

> Both answers seem to amount to "'someone' must have a bug in his
> code". Am I reading that correctly?
Yes, for the right meaning of "someone". Possibilities beyond you
include Python and Microsoft. Best guess I can make based on what you
haven't told us yet is that you were mixing the released Python 2.2.2
Windows core DLL (built with MSVC6) with extension code using MSVC7 C
runtime libraries. Right or wrong?

From dave@boost-consulting.com Wed Mar 12 18:42:53 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 12 Mar 2003 13:42:53 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To: (Tim Peters's message of "Wed, 12 Mar 2003 13:18:29 -0500")
References:
Message-ID:

Tim Peters writes:

> [David Abrahams]
>> I was the one who discovered the problem, using Python 2.2.2.
>> Curiously, "someone" missed it because he was using vc6 instead of
>> vc7.
>
> So you were using VC7. If so, using it for what? Every stick of code in
> question, or were you mixing VC7-compiled code with VC6-compiled
> code?

Python was compiled with vc6, the rest with vc7. I test this
combination regularly and have never seen a problem.

> If the latter, talk to Microsoft (by most accounts their runtime
> support libraries aren't compatible with each other).

Sure, but that's only an issue if you are allocating resources in one
runtime lib and deallocating in another AFAIK. There's nothing
beyond memory allocation going on here, and the type object in
question has a custom deallocator which goes to the same runtime that
allocated it.

>> Both answers seem to amount to "'someone' must have a bug in his
>> code". Am I reading that correctly?
>
> Yes, for the right meaning of "someone". Possibilities beyond you include
> Python and Microsoft. Best guess I can make based on what you haven't told
> us yet is that you were mixing the released Python 2.2.2 Windows core DLL
> (built with MSVC6) with extension code using MSVC7 C runtime libraries.
> Right or wrong?

Totally and absolutely right.
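
A toy model, in Python rather than C, of the rule discussed above: a
block must go back to the runtime that handed it out. The Heap class and
its malloc/free methods are invented purely for illustration and do not
correspond to any real C runtime or Python C API; actual debug heaps
detect foreign blocks by other means.

```python
class Heap:
    """Toy stand-in for one C runtime's heap (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._live = set()   # blocks currently owned by this heap
        self._next = 0

    def malloc(self):
        """Hand out a new block tagged with this heap's name."""
        self._next += 1
        block = (self.name, self._next)
        self._live.add(block)
        return block

    def free(self, block):
        """Release a block, refusing anything this heap did not allocate."""
        if block not in self._live:
            raise RuntimeError(
                "%s heap: block %r was not allocated here" % (self.name, block)
            )
        self._live.remove(block)


vc6 = Heap("vc6")
vc7 = Heap("vc7")

ok = vc6.malloc()
vc6.free(ok)           # allocated and freed by the same runtime: fine

foreign = vc7.malloc()
try:
    vc6.free(foreign)  # allocated by one runtime, freed by the other
except RuntimeError as exc:
    print(exc)
```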
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From tim.one@comcast.net Wed Mar 12 18:57:34 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 12 Mar 2003 13:57:34 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To:
Message-ID:

[David Abrahams]
> Python was compiled with vc6, the rest with vc7. I test this
> combination regularly and have never seen a problem.

You have now .

> Sure, but that's only an issue if you are allocating resources in one
> runtime lib and deallocating in another AFAIK. There's nothing
> beyond memory allocation going on here, and the type object in
> question has a custom deallocator which goes to the same runtime that
> allocated it.

See my later msg -- returning memory to a heap it wasn't obtained from
is fatal enough. The object memory itself is in question here, not
memory allocated *by* the object. Look at the traceback you sent if the
distinction isn't clear.

From dave@boost-consulting.com Wed Mar 12 19:08:00 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 12 Mar 2003 14:08:00 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To: (Tim Peters's message of "Wed, 12 Mar 2003 13:49:22 -0500")
References:
Message-ID:

Tim Peters writes:

> Question: I don't have VC7 and don't know what it does. The traceback
> ended in MSVCRTD.DLL, which I recognize as MS's debug-mode C runtime DLL for
> VC6. Does VC7 use the same DLL name, or some other DLL name?

The same one.

> If the latter, my theory is that PyObject_New used the MSVC6 malloc,
> but that PyObject_NEW used the MSVC7 malloc (due to macro expansion
> in your code).

Brilliant theory!

> In both cases the MSVC6 free() gets called.

Ah, correct. I misread "someone's" code; the delete function just
calls PyObject_Del(). I think "someone" probably ought to do
something more explicit to control where things are allocated/freed.
But for now, I think using PyObject_New/PyObject_Del is reasonable.
> But the MSVC6 and MSVC7 heaps are distinct, so the debug-mode MSVC6
> free() complains because it wasn't the source of the memory getting
> freed. A missing piece of the puzzle: what was the error msg at the
> time this thing died?

unhandled exception at 0x10213638 (MSVCRTD.DLL) in python_d.exe: User
breakpoint.

It seems to me that in light of all this, it's probably worth noting
this difference between PyObject_New and PyObject_NEW in the docs.
People *will* develop extension modules with different compilers from
the one Python was compiled with...

I know, submit a patch.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From dave@boost-consulting.com Wed Mar 12 19:09:17 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 12 Mar 2003 14:09:17 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To: (Tim Peters's message of "Wed, 12 Mar 2003 13:57:34 -0500")
References:
Message-ID:

Tim Peters writes:

>> Sure, but that's only an issue if you are allocating resources in one
>> runtime lib and deallocating in another AFAIK. There's nothing
>> beyond memory allocation going on here, and the type object in
>> question has a custom deallocator which goes to the same runtime that
>> allocated it.
>
> See my later msg -- returning memory to a heap it wasn't obtained from is
> fatal enough.

I think that's exactly what I said.

> The object memory itself is in question here, not memory
> allocated *by* the object.

I think that's also exactly what I thought.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From tim.one@comcast.net Wed Mar 12 19:26:22 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 12 Mar 2003 14:26:22 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To:
Message-ID:

[David Abrahams]
> ...
> It seems to me that in light of all this, it's probably worth noting
> this difference between PyObject_New and PyObject_NEW in the docs.
I don't think the macro versions should ever be used outside the core.
Inside the core, it's safe. So I think the "doc bug" is that the docs
mention PyObject_NEW at all.

> People *will* develop extension modules with different compilers from
> the one Python was compiled with...

Yup.

> I know, submit a patch.

That would be a sociable thing.

From skip@pobox.com Wed Mar 12 19:40:10 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 12 Mar 2003 13:40:10 -0600
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To:
References:
Message-ID: <15983.36122.138471.586434@montanaro.dyndns.org>

    David> But for now, I think using PyObject_New/PyObject_Del is
    David> reasonable.

Or perhaps PyObject_NEW/PyObject_DEL.

S

From theller@python.net Wed Mar 12 19:52:47 2003
From: theller@python.net (Thomas Heller)
Date: 12 Mar 2003 20:52:47 +0100
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To:
References:
Message-ID: <7kb4z7jk.fsf@python.net>

Tim Peters writes:

> [David Abrahams]
> > ...
> > It seems to me that in light of all this, it's probably worth noting
> > this difference between PyObject_New and PyObject_NEW in the docs.
>
> I don't think the macro versions should ever be used outside the core.
> Inside the core, it's safe. So I think the "doc bug" is that the docs
> mention PyObject_NEW at all.
>
Better to explicitly warn about them with a wording similar to that
from the section 9.2 Memory Interface:

  In addition, the following macro sets are provided for calling the
  Python memory allocator directly, without involving the C API
  functions listed above. However, note that their use does not
  preserve binary compatibility across Python versions [] and is
  therefore deprecated in extension modules.

Maybe 'and compilers' should be inserted between the [].

Thomas

From greg@cosc.canterbury.ac.nz Wed Mar 12 21:27:31 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 13 Mar 2003 10:27:31 +1300 (NZDT)
Subject: How long is your shopping tuple?
 (Re: [Python-Dev] Ridiculously minor tweaks?)
In-Reply-To: <3E6EABEE.2040108@tismer.com>
Message-ID: <200303122127.h2CLRVZ02250@oma.cosc.canterbury.ac.nz>

Guido:

> Tuples are for heterogeneous data, lists are for homogeneous data.
> Tuples are *not* read-only lists.

Weird things are happening in my brain this morning.
After reading this thread, I was replying to something
unrelated and had the occasion to use the phrase "It's
on my list"... and I briefly wondered whether I should
use the word "tuple" instead!

Somehow "It's on my tuple" doesn't have quite the same
ring to it. So, yes, tuples ARE different from lists...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From greg@cosc.canterbury.ac.nz Wed Mar 12 21:29:25 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 13 Mar 2003 10:29:25 +1300 (NZDT)
Subject: [Python-Dev] Re: Capabilities
In-Reply-To: <3E6F5F48.7040001@algroup.co.uk>
Message-ID: <200303122129.h2CLTPq02277@oma.cosc.canterbury.ac.nz>

Ben Laurie :

> If I understand them correctly, a Zope proxy where the security checker
> always says "yes" is a capability. Except, possibly, they may be
> forgeable, I don't know them well enough to know.

A security checker that could be easily forged wouldn't
be very, er, secure, would it?-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From martin@v.loewis.de Wed Mar 12 21:33:50 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 12 Mar 2003 22:33:50 +0100
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To:
References:
Message-ID:

David Abrahams writes:

> Sure, but that's only an issue if you are allocating resources in one
> runtime lib and deallocating in another AFAIK.

No. You also cannot pass struct FILE* from one C library to the other;
file locking will then crash.

Regards,
Martin

From brett@python.org Wed Mar 12 21:36:59 2003
From: brett@python.org (Brett Cannon)
Date: Wed, 12 Mar 2003 13:36:59 -0800 (PST)
Subject: [Python-Dev] Care to sprint on the core at PyCon?
Message-ID:

Four members of PythonLabs will be at the pre-PyCon sprint (more info on
sprints at http://www.python.org/cgi-bin/moinmoin/SprintPlan ) running one
for the Python core. If you would like to attend, email me at
brett@python.org to say so. You must be registered for PyCon to be able
to attend. And please do this ASAP so we can get the ball rolling on this
and lock down who will be there.

And regardless whether you care to attend or not, please look at
http://www.python.org/cgi-bin/moinmoin/PyCoreSprint and make suggestions
on what the group should sprint on.

-Brett

From jeremy@zope.com Wed Mar 12 21:43:38 2003
From: jeremy@zope.com (Jeremy Hylton)
Date: 12 Mar 2003 16:43:38 -0500
Subject: [Python-Dev] Care to sprint on the core at PyCon?
In-Reply-To:
References:
Message-ID: <1047505418.21994.150.camel@slothrop.zope.com>

On Wed, 2003-03-12 at 16:36, Brett Cannon wrote:
> Four members of PythonLabs will be at the pre-PyCon sprint (more info on
> sprints at http://www.python.org/cgi-bin/moinmoin/SprintPlan ) running one
> for the Python core. If you would like to attend, email me at
> brett@python.org to say so. You must be registered for PyCon to be able
> to attend. And please do this ASAP so we can get the ball rolling on this
> and lock down who will be there.

Thanks for taking this up! There is still some room for sprinters.
> And regardless whether you care to attend or not, please look at
> http://www.python.org/cgi-bin/moinmoin/PyCoreSprint and make suggestions
> on what the group should sprint on.

I would like to do some sprinting on the ast branch, which I noted in
the wiki.

Jeremy

From dave@boost-consulting.com Wed Mar 12 22:15:12 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 12 Mar 2003 17:15:12 -0500
Subject: [Python-Dev] PyObject_New vs PyObject_NEW
In-Reply-To: (martin@v.loewis.de's message of "12 Mar 2003 22:33:50 +0100")
References:
Message-ID:

martin@v.loewis.de (Martin v. Löwis) writes:

> David Abrahams writes:
>
>> Sure, but that's only an issue if you are allocating resources in one
>> runtime lib and deallocating in another AFAIK.
>
> No. You also cannot pass struct FILE* from one C library to the other;
> file locking will then crash.

A file is a resource.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From greg@cosc.canterbury.ac.nz Thu Mar 13 01:51:27 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 13 Mar 2003 14:51:27 +1300 (NZDT)
Subject: [Python-Dev] os.path.dirname misleading?
In-Reply-To:
Message-ID: <200303130151.h2D1pRf09895@oma.cosc.canterbury.ac.nz>

Kevin Altis :

> Pity the functions aren't named
> os.path.head and os.path.tail.

It wouldn't be entirely clear what they mean even
then -- "head" might mean just the first pathname
component.

In a tool I wrote some years ago in Scheme, I called them
"filename-directory" and "filename-nondirectory". Which suffered
from the same problem, really (they didn't consult the file
system either). But it didn't matter, since I was the only
person who used them, and *I* knew what they meant. :-)

Maybe they should be called
"all_except_the_last_pathname_component" and
"last_pathname_component"?
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From skip@pobox.com Thu Mar 13 03:38:43 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 12 Mar 2003 21:38:43 -0600 Subject: [Python-Dev] os.path.dirname misleading? In-Reply-To: <200303130151.h2D1pRf09895@oma.cosc.canterbury.ac.nz> References: <200303130151.h2D1pRf09895@oma.cosc.canterbury.ac.nz> Message-ID: <15983.64835.715338.915063@montanaro.dyndns.org> Greg> Kevin Altis : >> Pity the functions aren't named os.path.head and os.path.tail. Greg> It wouldn't be entirely clear what they mean even then -- "head" Greg> might mean just the first pathname component. ... Greg> Maybe they should be called Greg> "all_except_the_last_pathname_component" and Greg> "last_pathname_component"? I know, how about car and cdr? ;-) Skip From martin@v.loewis.de Thu Mar 13 07:46:13 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 13 Mar 2003 08:46:13 +0100 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: References: Message-ID: David Abrahams writes: > martin@v.loewis.de (Martin v. Löwis) writes: > > > David Abrahams writes: > > > >> Sure, but that's only an issue if you are allocating resources in one > >> runtime lib and deallocating in another AFAIK. > > > > No. You also cannot pass struct FILE* from one C library to the other; > > file locking will then crash. > > A file is a resource. Yes, but printf is neither allocation nor deallocation.
Regards, Martin From ben@algroup.co.uk Thu Mar 13 10:47:59 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 13 Mar 2003 10:47:59 +0000 Subject: [Python-Dev] Re: Capabilities In-Reply-To: <200303122129.h2CLTPq02277@oma.cosc.canterbury.ac.nz> References: <200303122129.h2CLTPq02277@oma.cosc.canterbury.ac.nz> Message-ID: <3E7061DF.8020207@algroup.co.uk> Greg Ewing wrote: > Ben Laurie : > > >>If I understand them correctly, a Zope proxy where the security checker >>always says "yes" is a capability. Except, possibly, they may be >>forgeable, I don't know them well enough to know. > > > A security checker that could be easily forged wouldn't > be very, er, secure, would it?-) It's the proxy that needs to be unforgeable. And since their model is role-based, I assume it's not a fundamental requirement for them. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From mwh@python.net Thu Mar 13 11:02:31 2003 From: mwh@python.net (Michael Hudson) Date: Thu, 13 Mar 2003 11:02:31 +0000 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: (Tim Peters's message of "Wed, 12 Mar 2003 14:26:22 -0500") References: Message-ID: <2mvfynedh4.fsf@starship.python.net> Tim Peters writes: > [David Abrahams] >> ... >> It seems to me that in light of all this, it's probably worth noting >> this difference between PyObject_New and PyObject_NEW in the docs. > > I don't think the macro versions should ever be used outside the core. > Inside the core, it's safe. So I think the "doc bug" is that the docs > mention PyObject_NEW at all. What, precisely, does PyObject_NEW save you? From a brief squint at the sources, my best guess is "nothing" -- and it may even be a pessimization due to increased code size. Maybe we could kill it entirely (after the usual round of deprecations, of course). Cheers, M.
-- Like most people, I don't always agree with the BDFL (especially when he wants to change things I've just written about in very large books), ... -- Mark Lutz, http://python.oreilly.com/news/python_0501.html From dave@boost-consulting.com Thu Mar 13 12:28:00 2003 From: dave@boost-consulting.com (David Abrahams) Date: Thu, 13 Mar 2003 07:28:00 -0500 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: <7kb4z7jk.fsf@python.net> (Thomas Heller's message of "12 Mar 2003 20:52:47 +0100") References: <7kb4z7jk.fsf@python.net> Message-ID: Thomas Heller writes: >> I don't think the macro versions should ever be used outside the core. >> Inside the core, it's safe. So I think the "doc bug" is that the docs >> mention PyObject_NEW at all. >> > > Better to explicitly warn about them with a wording similar to that > from the section 9.2 Memory Interface: > > In addition, the following macro sets are provided for calling the > Python memory allocator directly, without involving the C API > functions listed above. However, note that their use does not preserve > binary compatibility across Python versions [] and is therefore > deprecated in extension modules. > > Maybe 'and compilers' should be inserted between the []. I'm not in a position to decide which one of these is better. Thomas, maybe you should be submitting the patch? -- Dave Abrahams Boost Consulting www.boost-consulting.com From dave@boost-consulting.com Thu Mar 13 12:26:35 2003 From: dave@boost-consulting.com (David Abrahams) Date: Thu, 13 Mar 2003 07:26:35 -0500 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: (martin@v.loewis.de's message of "13 Mar 2003 08:46:13 +0100") References: Message-ID: martin@v.loewis.de (Martin v. Löwis) writes: > David Abrahams writes: > >> > No. You also cannot pass struct FILE* from one C library to the other; >> > file locking will then crash. >> >> A file is a resource. > > Yes, but printf is neither allocation nor deallocation.
fprintf, but point taken. -- Dave Abrahams Boost Consulting www.boost-consulting.com From theller@python.net Thu Mar 13 12:39:28 2003 From: theller@python.net (Thomas Heller) Date: 13 Mar 2003 13:39:28 +0100 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: References: <7kb4z7jk.fsf@python.net> Message-ID: <8yvjqw3j.fsf@python.net> David Abrahams writes: > Thomas Heller writes: > > >> I don't think the macro versions should ever be used outside the core. > >> Inside the core, it's safe. So I think the "doc bug" is that the docs > >> mention PyObject_NEW at all. > >> > > > > Better to explicitly warn about them with a wording similar to that > > from the section 9.2 Memory Interface: > > I'm not in a position to decide which one of these is better. Thomas, > maybe you should be submitting the patch? Nor am I. You can submit a patch as well, and the discussion will show. Sorry, no time. Thomas From dave@boost-consulting.com Thu Mar 13 13:00:24 2003 From: dave@boost-consulting.com (David Abrahams) Date: Thu, 13 Mar 2003 08:00:24 -0500 Subject: [Python-Dev] PyObject_New vs PyObject_NEW In-Reply-To: <8yvjqw3j.fsf@python.net> (Thomas Heller's message of "13 Mar 2003 13:39:28 +0100") References: <7kb4z7jk.fsf@python.net> <8yvjqw3j.fsf@python.net> Message-ID: Thomas Heller writes: > David Abrahams writes: > >> Thomas Heller writes: >> >> >> I don't think the macro versions should ever be used outside the core. >> >> Inside the core, it's safe. So I think the "doc bug" is that the docs >> >> mention PyObject_NEW at all. >> >> >> > >> > Better to explicitly warn about them with a wording similar to that >> > from the section 9.2 Memory Interface: >> >> I'm not in a position to decide which one of these is better. Thomas, >> maybe you should be submitting the patch? > > Nor am I. You can submit a patch as well, and the discussion will show. > > Sorry, no time.
Fair enough; Tim's patch was much easier ;-)

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From fuf@mageo.cz Thu Mar 13 13:36:09 2003
From: fuf@mageo.cz (Michal Vitecek)
Date: Thu, 13 Mar 2003 14:36:09 +0100
Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed?
Message-ID: <20030313133609.GA23223@foof.i3.cz>

[this was sent to python-list, but i'm re-posting here as told by Skip]

hello,

i had a quick look at Objects/abstract.c in 2.2.2's source. almost every function there checks whether the objects it's passed are not NULL. if they are, SystemError exception occurs. since i've never come across such exception i've commented out those checks.

the resulting python binary did 6.5% more pystones on average (the numbers are below). my question is: are those checks really necessary in non-debug python build?

the pystone results:

BEFORE:
$ for (( i = 0; i <= 5; i++ )); do ./pystone.py; done
Pystone(1.1) time for 10000 passes = 0.6
This machine benchmarks at 16666.7 pystones/second
Pystone(1.1) time for 10000 passes = 0.56
This machine benchmarks at 17857.1 pystones/second
Pystone(1.1) time for 10000 passes = 0.58
This machine benchmarks at 17241.4 pystones/second
Pystone(1.1) time for 10000 passes = 0.57
This machine benchmarks at 17543.9 pystones/second
Pystone(1.1) time for 10000 passes = 0.57
This machine benchmarks at 17543.9 pystones/second

AFTER:
$ for (( i = 0; i <= 5; i++ )); do ./pystone.py; done
Pystone(1.1) time for 10000 passes = 0.54
This machine benchmarks at 18518.5 pystones/second
Pystone(1.1) time for 10000 passes = 0.57
This machine benchmarks at 17543.9 pystones/second
Pystone(1.1) time for 10000 passes = 0.55
This machine benchmarks at 18181.8 pystones/second
Pystone(1.1) time for 10000 passes = 0.52
This machine benchmarks at 19230.8 pystones/second
Pystone(1.1) time for 10000 passes = 0.52
This machine benchmarks at 19230.8 pystones/second
Pystone(1.1) time for 10000 passes = 0.54

--
fuf (fuf@mageo.cz)

From
mwh@python.net Thu Mar 13 13:45:45 2003 From: mwh@python.net (Michael Hudson) Date: Thu, 13 Mar 2003 13:45:45 +0000 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: <20030313133609.GA23223@foof.i3.cz> (Michal Vitecek's message of "Thu, 13 Mar 2003 14:36:09 +0100") References: <20030313133609.GA23223@foof.i3.cz> Message-ID: <2msmtre5x2.fsf@starship.python.net> Michal Vitecek writes: > i had a quick look at Objects/abstract.c in 2.2.2's source. almost > every function there checks whether the objects it's passed are not > NULL. if they are, SystemError exception occurs. since i've never come > across such exception i've commented out those checks. There are a number of bits of stupidly defensive programming in Python... personally, I'd like to see the back of them. > the resulting python binary did 6.5% more pystones on average (the > numbers are below). Wow! Can we persuade you to try CVS HEAD? > my question is: are those checks really necessary > in non-debug python build? This is the tricky bit, of course. I don't think so, but it's hard to be sure. OTOH, it could be the easiest 5% speed up ever... Cheers, M. -- This makes it possible to pass complex object hierarchies to a C coder who thinks computer science has made no worthwhile advancements since the invention of the pointer. -- Gordon McMillan, 30 Jul 1998 From mwh@python.net Thu Mar 13 14:06:56 2003 From: mwh@python.net (Michael Hudson) Date: Thu, 13 Mar 2003 14:06:56 +0000 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: <2msmtre5x2.fsf@starship.python.net> (Michael Hudson's message of "Thu, 13 Mar 2003 13:45:45 +0000") References: <20030313133609.GA23223@foof.i3.cz> <2msmtre5x2.fsf@starship.python.net> Message-ID: <2mptove4xr.fsf@starship.python.net> Michael Hudson writes: >> the resulting python binary did 6.5% more pystones on average (the >> numbers are below). > > Wow! Can we persuade you to try CVS HEAD? 
Actually, I've now tried it, and saw a pystone increase of more like 0.1%. Are you sure the abstract.c changes are the only difference between the two binaries? Cheers, M. -- I've reinvented the idea of variables and types as in a programming language, something I do on every project. -- Greg Ward, September 1998 From fuf@mageo.cz Thu Mar 13 14:18:58 2003 From: fuf@mageo.cz (Michal Vitecek) Date: Thu, 13 Mar 2003 15:18:58 +0100 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: <2msmtre5x2.fsf@starship.python.net> References: <20030313133609.GA23223@foof.i3.cz> <2msmtre5x2.fsf@starship.python.net> Message-ID: <20030313141858.GB23223@foof.i3.cz> Michael Hudson wrote: >Wow! Can we persuade you to try CVS HEAD? okay - i did as you said and the speed-up is only 2.1% so it's not probably worth it. here come the numbers: BEFORE: $ for (( i = 0; i <= 5; i++ )); do ./python Lib/test/pystone.py; done Pystone(1.1) time for 50000 passes = 1.97 This machine benchmarks at 25380.7 pystones/second Pystone(1.1) time for 50000 passes = 1.92 This machine benchmarks at 26041.7 pystones/second Pystone(1.1) time for 50000 passes = 1.96 This machine benchmarks at 25510.2 pystones/second Pystone(1.1) time for 50000 passes = 1.97 This machine benchmarks at 25380.7 pystones/second Pystone(1.1) time for 50000 passes = 1.96 This machine benchmarks at 25510.2 pystones/second Pystone(1.1) time for 50000 passes = 1.96 This machine benchmarks at 25510.2 pystones/second AFTER: $ for (( i = 0; i <= 5; i++ )); do ./python Lib/test/pystone.py; done Pystone(1.1) time for 50000 passes = 1.95 This machine benchmarks at 25641 pystones/second Pystone(1.1) time for 50000 passes = 1.93 This machine benchmarks at 25906.7 pystones/second Pystone(1.1) time for 50000 passes = 1.91 This machine benchmarks at 26178 pystones/second Pystone(1.1) time for 50000 passes = 1.92 This machine benchmarks at 26041.7 pystones/second Pystone(1.1) time for 50000 passes = 1.89 This machine 
benchmarks at 26455 pystones/second Pystone(1.1) time for 50000 passes = 1.89 This machine benchmarks at 26455 pystones/second -- fuf (fuf@mageo.cz) From guido@python.org Thu Mar 13 14:29:38 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 13 Mar 2003 09:29:38 -0500 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: Your message of "Thu, 13 Mar 2003 14:36:09 +0100." <20030313133609.GA23223@foof.i3.cz> References: <20030313133609.GA23223@foof.i3.cz> Message-ID: <200303131429.h2DETem03635@odiug.zope.com> > i had a quick look at Objects/abstract.c in 2.2.2's source. almost > every function there checks whether the objects it's passed are not > NULL. if they are, SystemError exception occurs. since i've never come > across such exception i've commented out those checks. > > the resulting python binary did 6.5% more pystones on average (the > numbers are below). my question is: are those checks really necessary > in non-debug python build? Unfortunately, this is part of the safety net for poor extension writers, and I'm not sure we can drop it. Given that Pystone is so regular, it's probably just one or two of the functions you changed that make the difference. If you can figure out which ones, perhaps you could inline just those (in the switch in ceval.c) and get the same effect. Anyway, I only get a 1% speedup. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Thu Mar 13 14:33:21 2003 From: mwh@python.net (Michael Hudson) Date: Thu, 13 Mar 2003 14:33:21 +0000 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: <20030313141858.GB23223@foof.i3.cz> (Michal Vitecek's message of "Thu, 13 Mar 2003 15:18:58 +0100") References: <20030313133609.GA23223@foof.i3.cz> <2msmtre5x2.fsf@starship.python.net> <20030313141858.GB23223@foof.i3.cz> Message-ID: <2mn0jze3pq.fsf@starship.python.net> Michal Vitecek writes: > Michael Hudson wrote: >>Wow! 
Can we persuade you to try CVS HEAD? > > okay - i did as you said and the speed-up is only 2.1% so it's not > probably worth it. here come the numbers: I didn't say "*two* point one", I said "*nought* point one"!: BEFORE: $ for i in 1 2 3 4 5; do ./python- ../Lib/test/pystone.py; done Pystone(1.1) time for 50000 passes = 3.39 This machine benchmarks at 14749.3 pystones/second Pystone(1.1) time for 50000 passes = 3.39 This machine benchmarks at 14749.3 pystones/second Pystone(1.1) time for 50000 passes = 3.38 This machine benchmarks at 14792.9 pystones/second Pystone(1.1) time for 50000 passes = 3.37 This machine benchmarks at 14836.8 pystones/second Pystone(1.1) time for 50000 passes = 3.39 This machine benchmarks at 14749.3 pystones/second AFTER: $ for i in 1 2 3 4 5; do ./python ../Lib/test/pystone.py; done Pystone(1.1) time for 50000 passes = 3.38 This machine benchmarks at 14792.9 pystones/second Pystone(1.1) time for 50000 passes = 3.38 This machine benchmarks at 14792.9 pystones/second Pystone(1.1) time for 50000 passes = 3.38 This machine benchmarks at 14792.9 pystones/second Pystone(1.1) time for 50000 passes = 3.38 This machine benchmarks at 14792.9 pystones/second Pystone(1.1) time for 50000 passes = 3.4 This machine benchmarks at 14705.9 pystones/second If it was a 2% gain, I'd say go for it (though Guido isn't so sure, it seems). What compiler/platform are you using? Cheers, M. -- languages shape the way we think, or don't. -- Erik Naggum, comp.lang.lisp From skip@pobox.com Thu Mar 13 14:42:26 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 13 Mar 2003 08:42:26 -0600 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? 
In-Reply-To: <2mn0jze3pq.fsf@starship.python.net> References: <20030313133609.GA23223@foof.i3.cz> <2msmtre5x2.fsf@starship.python.net> <20030313141858.GB23223@foof.i3.cz> <2mn0jze3pq.fsf@starship.python.net> Message-ID: <15984.39122.349303.287830@montanaro.dyndns.org> Michal, Can you post your changes to abstract.c as a patch on SourceForge? That would allow multiple people to mull it over and all be sure they are working from the same code base. If Michael Hudson and Guido reported substantially different speedups than you, perhaps you were doing something they weren't. Skip From fuf@mageo.cz Thu Mar 13 15:09:05 2003 From: fuf@mageo.cz (Michal Vitecek) Date: Thu, 13 Mar 2003 16:09:05 +0100 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: <2mn0jze3pq.fsf@starship.python.net> References: <20030313133609.GA23223@foof.i3.cz> <2msmtre5x2.fsf@starship.python.net> <20030313141858.GB23223@foof.i3.cz> <2mn0jze3pq.fsf@starship.python.net> Message-ID: <20030313150905.GC23223@foof.i3.cz> Michael Hudson wrote: >> okay - i did as you said and the speed-up is only 2.1% so it's not >> probably worth it. here come the numbers: > >I didn't say "*two* point one", I said "*nought* point one"!: crap. i found the problem - on a _completely unused_ computer the difference is indeed only ~0.7%. my apologies for false alarm :/ sorry, -- fuf (fuf@mageo.cz) From dave@boost-consulting.com Thu Mar 13 16:27:07 2003 From: dave@boost-consulting.com (David Abrahams) Date: Thu, 13 Mar 2003 11:27:07 -0500 Subject: [Python-Dev] More int/long integration issues Message-ID: I was recently surprised by: Python 2.3a2+ (#1, Feb 24 2003, 15:02:10) [GCC 3.2 20020927 (prerelease)] on cygwin Type "help", "copyright", "credits" or "license" for more information. >>> xrange(2 ** 32) Traceback (most recent call last): File "", line 1, in ? 
OverflowError: long int too large to convert to int Now that we have a kind of long/int integration, maybe it makes sense to update xrange()? Or is that really a 2.4 feature? -- Dave Abrahams Boost Consulting www.boost-consulting.com From mal@lemburg.com Thu Mar 13 16:30:09 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 13 Mar 2003 17:30:09 +0100 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: <20030313150905.GC23223@foof.i3.cz> References: <20030313133609.GA23223@foof.i3.cz> <2msmtre5x2.fsf@starship.python.net> <20030313141858.GB23223@foof.i3.cz> <2mn0jze3pq.fsf@starship.python.net> <20030313150905.GC23223@foof.i3.cz> Message-ID: <3E70B211.4030600@lemburg.com> Michal Vitecek wrote: > Michael Hudson wrote: > >>> okay - i did as you said and the speed-up is only 2.1% so it's not >>> probably worth it. here come the numbers: >> >>I didn't say "*two* point one", I said "*nought* point one"!: > > crap. i found the problem - on a _completely unused_ computer the > difference is indeed only ~0.7%. my apologies for false alarm :/ I'd rather suggest to take a look at making more use of the available Python macros in the interpreter. Things like PyInt_AsLong() can often be written as PyInt_AS_LONG() because there's a type check only a few lines above the call. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 13 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 19 days left EuroPython 2003, Charleroi, Belgium: 103 days left From aahz@pythoncraft.com Thu Mar 13 16:42:47 2003 From: aahz@pythoncraft.com (Aahz) Date: Thu, 13 Mar 2003 11:42:47 -0500 Subject: [Python-Dev] More int/long integration issues In-Reply-To: References: Message-ID: <20030313164247.GB22296@panix.com> On Thu, Mar 13, 2003, David Abrahams wrote: > > I was recently surprised by: > > Python 2.3a2+ (#1, Feb 24 2003, 15:02:10) > [GCC 3.2 20020927 (prerelease)] on cygwin > Type "help", "copyright", "credits" or "license" for more information. > >>> xrange(2 ** 32) > Traceback (most recent call last): > File "", line 1, in ? > OverflowError: long int too large to convert to int > > Now that we have a kind of long/int integration, maybe it makes sense > to update xrange()? Or is that really a 2.4 feature? IIRC, it was decided that doing that wouldn't make sense until the standard sequences (lists/tuples) can support more than 2**31 items. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Register for PyCon now! http://www.python.org/pycon/reg.html From guido@python.org Thu Mar 13 17:24:33 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 13 Mar 2003 12:24:33 -0500 Subject: [Python-Dev] More int/long integration issues In-Reply-To: Your message of "Thu, 13 Mar 2003 11:27:07 EST." References: Message-ID: <200303131724.h2DHOZS05548@odiug.zope.com> > Now that we have a kind of long/int integration, maybe it makes sense > to update xrange()? Or is that really a 2.4 feature? IMO, xrange() must die. As a compromise to practicality, it should lose functionality, not gain any. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From mcherm@mcherm.com Thu Mar 13 17:31:08 2003 From: mcherm@mcherm.com (Chermside, Michael) Date: Thu, 13 Mar 2003 12:31:08 -0500 Subject: [Python-Dev] Re: More int/long integration issues Message-ID: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> Guido writes: > IMO, xrange() must die. Glad to hear it. I always found range() vs xrange() a wart. But if you had it to do over, how would you do it? -- Michael Chermside From dave@boost-consulting.com Thu Mar 13 17:43:04 2003 From: dave@boost-consulting.com (David Abrahams) Date: Thu, 13 Mar 2003 12:43:04 -0500 Subject: [Python-Dev] More int/long integration issues In-Reply-To: <200303131724.h2DHOZS05548@odiug.zope.com> (Guido van Rossum's message of "Thu, 13 Mar 2003 12:24:33 -0500") References: <200303131724.h2DHOZS05548@odiug.zope.com> Message-ID: Guido van Rossum writes: > IMO, xrange() must die. > > As a compromise to practicality, it should lose functionality, not > gain any. OK, range() becomes lazy, then? Or is there another plan? -- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Thu Mar 13 19:03:27 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 13 Mar 2003 14:03:27 -0500 Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: Your message of "Thu, 13 Mar 2003 12:31:08 EST." <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> Message-ID: <200303131903.h2DJ3Ug06240@odiug.zope.com> > Guido writes: > > IMO, xrange() must die. > > > > As a compromise to practicality, it should lose functionality, not > > gain any. [Michael Chermside] > Glad to hear it. I always found range() vs xrange() a wart. It is, and it is one that I hate. > But if you had it to do over, how would you do it? I'd make range() an iterator.
To get a concrete list that you can modify, you'd have to write list(range(N)). But that can't be done without breaking backwards compatibility, so I won't. [David Abrahams] > OK, range() becomes lazy, then? Or is there another plan? The bytecode compiler should be clever enough to see that you're writing for i in range(...): ... and that there's no definition of range other than the built-in one (this requires a subtle change of language rules); it can then substitute an internal equivalent to xrange(). --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Thu Mar 13 19:15:09 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 13 Mar 2003 14:15:09 -0500 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? References: <20030313133609.GA23223@foof.i3.cz> <200303131429.h2DETem03635@odiug.zope.com> Message-ID: <007d01c2e994$ddb73c00$3c10a044@oemcomputer> > > i had a quick look at Objects/abstract.c in 2.2.2's source. almost > > every function there checks whether the objects it's passed are not > > NULL. if they are, SystemError exception occurs. since i've never come > > across such exception i've commented out those checks. > Unfortunately, this is part of the safety net for poor extension > writers, and I'm not sure we can drop it. Can we get most of the same benefit by using an assert() rather than NULL-->SystemError? Raymond Hettinger From jeremy@zope.com Thu Mar 13 20:01:03 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 13 Mar 2003 15:01:03 -0500 Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed? In-Reply-To: <007d01c2e994$ddb73c00$3c10a044@oemcomputer> References: <20030313133609.GA23223@foof.i3.cz> <200303131429.h2DETem03635@odiug.zope.com> <007d01c2e994$ddb73c00$3c10a044@oemcomputer> Message-ID: <1047585663.4296.2.camel@slothrop.zope.com> On Thu, 2003-03-13 at 14:15, Raymond Hettinger wrote: > > > i had a quick look at Objects/abstract.c in 2.2.2's source. 
almost
> > > every function there checks whether the objects it's passed are not
> > > NULL. if they are, SystemError exception occurs. since i've never come
> > > across such exception i've commented out those checks.
>
> > Unfortunately, this is part of the safety net for poor extension
> > writers, and I'm not sure we can drop it.
>
> Can we get most of the same benefit by using
> an assert() rather than NULL-->SystemError?

No. assert() causes the program to fail. SystemError() raises an exception and lets the program keep going. Those are vastly different effects.

Jeremy

From python@rcn.com Thu Mar 13 20:16:03 2003
From: python@rcn.com (Raymond Hettinger)
Date: Thu, 13 Mar 2003 15:16:03 -0500
Subject: [Python-Dev] are NULL checks in Objects/abstract.c really needed?
References: <20030313133609.GA23223@foof.i3.cz> <200303131429.h2DETem03635@odiug.zope.com> <007d01c2e994$ddb73c00$3c10a044@oemcomputer> <1047585663.4296.2.camel@slothrop.zope.com>
Message-ID: <002d01c2e99d$5fd188a0$3c10a044@oemcomputer>

> > Can we get most of the same benefit by using
> > an assert() rather than NULL-->SystemError?
>
> No. assert() causes the program to fail. SystemError() raises an
> exception and lets the program keep going. Those are vastly different
> effects.

Of course. My thought was that either one will come to the attention of the extension writer before the extension goes out. But then, if the code in question never got exercised, then it would crash in the hands of a user.
Raymond Hettinger

From python-kbutler@sabaydi.com Thu Mar 13 20:09:01 2003
From: python-kbutler@sabaydi.com (Kevin J. Butler)
Date: Thu, 13 Mar 2003 13:09:01 -0700
Subject: [Python-Dev] Re: lists v. tuples
In-Reply-To: <20030312164902.10494.64514.Mailman@mail.python.org>
References: <20030312164902.10494.64514.Mailman@mail.python.org>
Message-ID: <3E70E55D.1050102@sabaydi.com>

> *Guido van Rossum * guido@python.org
> //
>
>Tuples are for heterogeneous data, list are for homogeneous data.
>
Only if you include *both* null cases:
- tuple of type( i ) == type( i+1 )
- list of PyObject

Homo-/heterogeneity is orthogonal to the primary benefits of lists (mutability) and of tuples (fixed order/length).

Else why can you do list( (1, "two", 3.0) ) and tuple( [x, y, z] ) ?

>Tuples are *not* read-only lists.
>
It just happens that "tuple( sequence )" is the most easy & obvious (and thus right?) way to spell "immutable sequence".

Stop reading whenever you're convinced.
;-) (not about mutability, but about homo/heterogeneity)

There are three (mostly) independent characteristics of tuples (in most to least important order, by frequency of use, IMO):
- fixed order/fixed length
    - used in function argument/return tuples and all uses as a "struct"
- heterogeneity allowed but not required
    - used in many function argument tuples and many "struct" tuples
- immutability
    - implies fixed-order and fixed-length, and used occasionally for specific needs

The important characteristics of lists are also independent of each other (again, IMO on the order):
- mutability of length & content
    - used for dynamically building collections
- heterogeneity allowed but not required
    - used occasionally for specific needs

It turns out that fixed-length sequences are often useful for heterogeneous data, and that most sequences that require mutability are homogeneous.

Examples from the standard library (found by grep '= (' and grep '= \[' ):

# homogeneous tuple - homogeneity, fixed order, and fixed length are all required
# CVS says Guido wrote/imported this. ;-)
whrandom.py: self._seed = (x or 1, y or 1, z or 1)

# homogeneous tuple - homogeneity is required - all entries must be 'types'
# suitable for passing to 'isinstance( A, typesTuple )', which (needlessly?) requires a tuple to avoid
# possibly recursive general sequences
types.py: StringTypes = (StringType, UnicodeType)

# heterogeneous list of values of all basic types (we need to be able to copy all types of values)
# this could be a tuple, but neither immutability, nor fixed length, nor fixed order are needed, so it makes more sense as a list
# CVS blames Guido here, too, in version 1.1. ;-)
copy.py: l = [None, 1, 2L, 3.14, 'xyzzy', (1, 2L), [3.14, 'abc'], {'abc': 'ABC'}, (), [], {}]

Other homogeneous tuples (may benefit from mutability, but require fixed-length/order):
- 3D coordinates
- RGB color
- binary tree node (child, next)

Other heterogeneous lists (homogeneous lists of base-class instances blah-blah-blah):
- files AND directories to traverse (strings? "File" objects?)
- emails AND faxes AND voicemails AND tasks in your Inbox (items?)
- mail AND newsgroup accounts (accounts?)
- return values OR exceptions from a list of test cases and test suites (PyObjects? introduce an artificial base class?)

Must-be-stubborn-if-you-got-this-far-ly y'rs ;-)

kb

From cnetzer@mail.arc.nasa.gov Thu Mar 13 20:25:17 2003
From: cnetzer@mail.arc.nasa.gov (Chad Netzer)
Date: 13 Mar 2003 12:25:17 -0800
Subject: [Python-Dev] More int/long integration issues
In-Reply-To: <20030313164247.GB22296@panix.com>
References: <20030313164247.GB22296@panix.com>
Message-ID: <1047587117.660.33.camel@sayge.arc.nasa.gov>

On Thu, 2003-03-13 at 08:42, Aahz wrote:
> On Thu, Mar 13, 2003, David Abrahams wrote:
> > Now that we have a kind of long/int integration, maybe it makes sense
> > to update xrange()? Or is that really a 2.4 feature?
>
> IIRC, it was decided that doing that wouldn't make sense until the
> standard sequences (lists/tuples) can support more than 2**31 items.

I'm working on a patch that allows both range() and xrange() to work with large (PyLong) values. Currently, with my patch, the length of range is still limited to a C long (due to memory issues anyway), and xrange() could support longer sequences (conceptually), although indexing them still is limited to C int indices.
I noticed the need for at least supporting long values when I found some
bugs in code that did things like:

    a = 1/1e-5
    range( a-20, a)

or

    a = 1/1e-6
    b = 1/1e-5
    c = 1/1e-4
    range(a, b, c)

Now, this example is hardcoded, but in graphing software, or other
numerical work, the actual values come from the data set. All of a
sudden, you could be dealing with very small numbers (say, because you
want to examine error values), and you get:

    a = 1/1e-21
    b = 1/1e-20
    c = 1/1e-19
    range(a, b, c)

And your piece of code now fails. By the comments I've seen, this
failure tends to come as a big surprise (people are simply expecting
range to be able to work with PyLong values, over short lengths).

Also, someone who is working with large files (> C long on his machine)
claimed to be having problems w/ xrange() failing (although, if he is
indexing the xrange object, my patch can't help anyway).

I've seen enough people asking in the newsgroups about this behavior (at
least four in the past 5 months or so), and I've submitted some
application patches to make things work for these cases (i.e. by
explicitly subtracting out the large common base of each parameter, and
adding it back in after the list is generated), so I decided to make a
patch to change the range behavior.

Fixing range was relatively easy, and could be done with no performance
penalty (the code to handle long ranges is only invoked after the
existing code path fails; the common case is unaltered). Fixing
xrange() is trickier, and I'm opting to maintain backwards compatibility
as much as possible.

In any case, I should have the patch ready to submit within the next
week or so (just a few hours more work is needed, for testing and
cleanup).

Then the argument about whether it should ever be included can begin in
earnest.
But I have seen enough examples of people being surprised that ranges of long values (where the range length is well within the addressable limit, but the range values must be PyLongs) that I think at least range() should be fixed. And if range() is fixed, then sadly, xrange() should be fixed as well (IMO). BTW, I'm all for deprecating xrange() with all deliberate speed. Doing so would only make updating range behavior easier. Chad From jeremy@zope.com Thu Mar 13 20:35:45 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 13 Mar 2003 15:35:45 -0500 Subject: [Python-Dev] are NULL checks in Objects/abstract.c reallyneeded? In-Reply-To: <002d01c2e99d$5fd188a0$3c10a044@oemcomputer> References: <20030313133609.GA23223@foof.i3.cz> <200303131429.h2DETem03635@odiug.zope.com> <007d01c2e994$ddb73c00$3c10a044@oemcomputer> <1047585663.4296.2.camel@slothrop.zope.com> <002d01c2e99d$5fd188a0$3c10a044@oemcomputer> Message-ID: <1047587745.4296.8.camel@slothrop.zope.com> On Thu, 2003-03-13 at 15:16, Raymond Hettinger wrote: > > > Can we get most of the same benefit by using > > > an assert() rather than NULL-->SystemError? > > > > No. assert() causes the program to fail. SystemError() raises an > > exception and lets the program keep going. Those are vastly different > > effects. > > Of course. My thought was that either one will come to the > attention of the extension writer before the extension goes out. > But then, if the code in question never got excercised, then it > would crash in the hands of a user. That's right. We should expect that some number of bugs in extension code are going to be found by end users. An end user is better able to cope with a SystemError than a core file. Long running servers have a different reason to prefer SystemError. A Zope process allows untrusted code to call some extension module, believing it is safe. A bug is found in the extension. If the bug tickles an assert(), Zope crashes. If the bug raises an exception, Zope catches it and continues. 
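A rough Python sketch of the long-running-server argument above. The names buggy_extension_call and serve are hypothetical stand-ins, not Zope code; the first one simulates an extension bug that a NULL check reports as SystemError. A failed assert() in C would abort the whole process instead, which no Python-level loop could intercept.

```python
def buggy_extension_call(request):
    # Hypothetical stand-in for a buggy extension function: the interpreter's
    # NULL check reports the bug as SystemError instead of crashing.
    if request == "bad":
        raise SystemError("null argument to internal routine")
    return "ok: " + request


def serve(requests):
    # Long-running loop: a failing request is recorded, the rest still run.
    results = []
    for request in requests:
        try:
            results.append(buggy_extension_call(request))
        except SystemError as exc:
            results.append("error: %s" % exc)
    return results


print(serve(["a", "bad", "b"]))
```

The request after the failing one is still served, which is the property the server case depends on.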
> Raymond Hettinger > > ################################################################# > ################################################################# > ################################################################# > ##### > ##### > ##### > ################################################################# > ################################################################# > ################################################################# Your funky sig is back :-). Jeremy From skip@pobox.com Thu Mar 13 20:50:36 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 13 Mar 2003 14:50:36 -0600 Subject: [Python-Dev] are NULL checks in Objects/abstract.c reallyneeded? In-Reply-To: <002d01c2e99d$5fd188a0$3c10a044@oemcomputer> References: <20030313133609.GA23223@foof.i3.cz> <200303131429.h2DETem03635@odiug.zope.com> <007d01c2e994$ddb73c00$3c10a044@oemcomputer> <1047585663.4296.2.camel@slothrop.zope.com> <002d01c2e99d$5fd188a0$3c10a044@oemcomputer> Message-ID: <15984.61212.224414.818876@montanaro.dyndns.org> Raymond> Can we get most of the same benefit by using an assert() rather Raymond> than NULL-->SystemError? Jeremy> No. assert() causes the program to fail. SystemError() raises Jeremy> an exception and lets the program keep going. Those are vastly Jeremy> different effects. It's not clear to me that you'd see any benefit anyway. The checking code currently looks like this: if (o == NULL) return null_error(); If you changed it to use assert you'd have assert(o != NULL); which expands to ((o != NULL) ? 0 : __assert(...)); In the common case you still test for either o==NULL or o!=NULL. Unless one test is terrifically faster than the other (and you executed it a helluva lot) you wouldn't gain anything except the loss of the possibility (however slim) that you might be able to recover. Still, for people who's only desire is speed and are willing to sacrifice checks to get it, perhaps we should have a --without-null-checks configure flag. 
;-) I bet if you were ruthless in eliminating checks (especially in ceval.c) you would see an easily measurable speedup. Skip From tim.one@comcast.net Thu Mar 13 21:20:52 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 13 Mar 2003 16:20:52 -0500 Subject: [Python-Dev] are NULL checks in Objects/abstract.c reallyneeded? In-Reply-To: <15984.61212.224414.818876@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > It's not clear to me that you'd see any benefit anyway. The checking code > currently looks like this: > > if (o == NULL) > return null_error(); > > If you changed it to use assert you'd have > > assert(o != NULL); > > which expands to > > ((o != NULL) ? 0 : __assert(...)); > ... In the release build, Python arranges to #define the preprocessor NDEBUG symbol, which in turn causes assert() to expand to nothing (or maybe to (void)0, or something like that, depending on the compiler). That's standard ANSI C behavior for assert(). IOW, asserts cost nothing in a release build -- and don't do anything in a release build either. From greg@cosc.canterbury.ac.nz Thu Mar 13 21:37:46 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Mar 2003 10:37:46 +1300 (NZDT) Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: <200303131903.h2DJ3Ug06240@odiug.zope.com> Message-ID: <200303132137.h2DLbkf06569@oma.cosc.canterbury.ac.nz> Guido mused: > and that there's no definition of range other than the built-in one > (this requires a subtle change of language rules); it can then > substitute an internal equivalent to xrange(). That sounds good. What sort of subtle language change do you have in mind which would permit this deduction? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From skip@pobox.com Thu Mar 13 21:46:54 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 13 Mar 2003 15:46:54 -0600 Subject: [Python-Dev] are NULL checks in Objects/abstract.c reallyneeded? In-Reply-To: References: <15984.61212.224414.818876@montanaro.dyndns.org> Message-ID: <15984.64590.983479.795437@montanaro.dyndns.org> Tim> In the release build, Python arranges to #define the preprocessor Tim> NDEBUG symbol, which in turn causes assert() to expand to nothing Yeah, I forgot about that. Okay, so the analysis was flawed. You didn't comment on the --without-null-checks option. ;-) Skip From guido@python.org Fri Mar 14 00:22:09 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 13 Mar 2003 19:22:09 -0500 Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: "Your message of Fri, 14 Mar 2003 10:37:46 +1300." <200303132137.h2DLbkf06569@oma.cosc.canterbury.ac.nz> References: <200303132137.h2DLbkf06569@oma.cosc.canterbury.ac.nz> Message-ID: <200303140022.h2E0M9400393@pcp02138704pcs.reston01.va.comcast.net> > > and that there's no definition of range other than the built-in one > > (this requires a subtle change of language rules); it can then > > substitute an internal equivalent to xrange(). > > That sounds good. What sort of subtle language change > do you have in mind which would permit this deduction? An official prohibition on inserting names in other namespaces that shadow built-ins. The prohibition needn't be enforced (although that would be nice). A program that does import foomod foomod.range = ... would be invalid, but an implementation might not be able to catch all cases, e.g. import foomod foomod.__dict__['range'] = ... It could be enforced, mostly, by making a module's __dict__ attribute return a read-only proxy like a new-style class's __dict__ attribute does, and putting an explicit ban on setting certain names in the module setattr implementation. 
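To make the problem concrete, here is a small sketch of the first two cases above. It is written for current Python 3 rather than the 2.3 under discussion, foomod is built in memory with types.ModuleType purely for illustration, and count is a made-up function.

```python
import types

# Build a throwaway module whose function uses the built-in range().
foomod = types.ModuleType("foomod")
exec("def count(n):\n    return list(range(n))\n", foomod.__dict__)

before = foomod.count(3)

# Another module inserts a global named 'range' into foomod. The name lookup
# inside count() now finds this global before it reaches the built-ins.
foomod.range = lambda n: ["shadowed"] * n
after_setattr = foomod.count(3)

# The same effect through the module's __dict__, which is harder to police.
foomod.__dict__["range"] = lambda n: []
after_dict = foomod.count(3)

print(before, after_setattr, after_dict)
```

Because either assignment changes what count() does, a compiler cannot assume that range means the built-in unless such assignments are ruled out.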
But the module itself could also play games, e.g. it could do

    exec "range = ..." in globals()

Another module could also do

    from foomod import f   # a function
    f.func_globals['range'] = ...

All these things would be illegal without necessarily being enforced.
(The only way I see for total enforcement would be to change the dict
implementation to trap certain assignments.)

BTW,

    import foomod
    foomod.foo = ...

would still be allowed -- it's only setting previously unset built-in
names (or maybe built-in names that are known to be used by the
module) that would be prohibited.

Also, foomod could explicitly allow setting an attribute by doing
something like

    range = range # copy the built-in into a global

to disable the optimization. I.e. setting something that's already
set should always be allowed.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Fri Mar 14 01:12:01 2003
From: tim.one@comcast.net (Tim Peters)
Date: Thu, 13 Mar 2003 20:12:01 -0500
Subject: [Python-Dev] are NULL checks in Objects/abstract.c reallyneeded?
In-Reply-To: <15984.64590.983479.795437@montanaro.dyndns.org>
Message-ID:

[Skip Montanaro]
> ...
> You didn't comment on the --without-null-checks option. ;-)

That's true!

agreeably y'rs - tim

From andrewm@object-craft.com.au Fri Mar 14 01:47:55 2003
From: andrewm@object-craft.com.au (Andrew McNamara)
Date: Fri, 14 Mar 2003 12:47:55 +1100
Subject: [Python-Dev] Iterable sockets?
Message-ID: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au>

Line oriented network protocols are very common, and I often find myself
calling the socket makefile method so I can read complete lines from a
socket. I'm probably not the first one who's wished that socket objects
were more file-like.

While I don't think we'd want to go as far as to turn them into a stdio
based file object, it might make sense to allow them to be iterated over
(and add a .readline() method, I guess).
This would necessitate adding some input buffering, which will complicate things like the .recv() method, so I'm not sure it's that good an idea, but it removes one gotchya for neophytes (and forgetful veterans). Thoughts? -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From guido@python.org Fri Mar 14 01:57:00 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 13 Mar 2003 20:57:00 -0500 Subject: [Python-Dev] Iterable sockets? In-Reply-To: "Your message of Fri, 14 Mar 2003 12:47:55 +1100." <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> References: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> Message-ID: <200303140157.h2E1v0i00939@pcp02138704pcs.reston01.va.comcast.net> > Line oriented network protocols are very common, and I often find myself > calling the socket makefile method so I can read complete lines from a > socket. I'm probably not the first one who's wished that socket objects > where more file-like. > > While I don't think we'd want to go as far as to turn them into a stdio > based file object, it might make sense to allow them to be iterated over > (and add a .readline() method, I guess). This would necessitate adding some > input buffering, which will complicate things like the .recv() method, so > I'm not sure it's that good an idea, but it removes one gotchya for > neophytes (and forgetful veterans). Thoughts? Um, why doesn't the makefile() method do what you want? --Guido van Rossum (home page: http://www.python.org/~guido/) From andrewm@object-craft.com.au Fri Mar 14 02:38:00 2003 From: andrewm@object-craft.com.au (Andrew McNamara) Date: Fri, 14 Mar 2003 13:38:00 +1100 Subject: [Python-Dev] Iterable sockets? In-Reply-To: Message from Guido van Rossum of "Thu, 13 Mar 2003 20:57:00 CDT." 
<200303140157.h2E1v0i00939@pcp02138704pcs.reston01.va.comcast.net> References: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> <200303140157.h2E1v0i00939@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030314023800.7BA6D3CC5F@coffee.object-craft.com.au> >> Line oriented network protocols are very common, and I often find myself >> calling the socket makefile method so I can read complete lines from a >> socket. I'm probably not the first one who's wished that socket objects >> where more file-like. >> >> While I don't think we'd want to go as far as to turn them into a stdio >> based file object, it might make sense to allow them to be iterated over >> (and add a .readline() method, I guess). This would necessitate adding some >> input buffering, which will complicate things like the .recv() method, so >> I'm not sure it's that good an idea, but it removes one gotchya for >> neophytes (and forgetful veterans). Thoughts? > >Um, why doesn't the makefile() method do what you want? The short answer is that it does, but not very tidily - by turning the socket object into a file object, I lose the original socket object functionality (for example, shutdown()). At another level, the concept of a "file-like" object is a very common python idiom - socket is the odd one out these days. It's really not a big deal - we could regularise the interface at the cost of more implementation complexity. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From guido@python.org Fri Mar 14 02:53:03 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 13 Mar 2003 21:53:03 -0500 Subject: [Python-Dev] More int/long integration issues In-Reply-To: "Your message of 13 Mar 2003 12:25:17 PST." 
<1047587117.660.33.camel@sayge.arc.nasa.gov> References: <20030313164247.GB22296@panix.com> <1047587117.660.33.camel@sayge.arc.nasa.gov> Message-ID: <200303140253.h2E2r3o01163@pcp02138704pcs.reston01.va.comcast.net> > I'm working on a patch that allows both range() and xrange() to work > with large (PyLong) values. I'm not interested for xrange(). As I said, xrange() is a crutch and should not be given features that make it hard to kill. For range(), sure, upload to SF. > I noticed the need for a least supporting long values when I found some > bugs in code that did things like: > > a = 1/1e-5 > range( a-20, a) This should be a TypeError. I'm sorry it isn't. range() is only defined for ints, and unfortunately if you pass it a float it truncates rather than failing. > or > > a = 1/1e-6 > b = 1/1e-5 > c = 1/1e-4 > range(a, b, c) Ditto. (BTW why don't you write this as 1e6, 1e5, 1e4???) > Now, this example is hardcoded, but in graphing software, or other > numerical work, the actual values come from the data set. All of a > sudden, you could be dealing with very small numbers (say, because you > want to examine error values), and you get: > > a = 1/1e-21 > b = 1/1e-20 > c = 1/1e-19 > range(a, b, c) > > And your piece of code now fails. By the comments I've seen, this > failure tends to come as a big surprise (people are simply expecting > range to be able to work with PyLong values, over short lengths). But 1/1e-21 is not a long. It's a float. You're flirting with disaster here. > Also, someone who is working with large files (> C long on his machine) > claimed to be having problems w/ xrange() failing (although, if he is > indexing the xrange object, my patch can't help anyway) That's a totally different problem. Indeed you can't use xrange() with values > sys.maxint. But it should be easy to recode this without xrange. 
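One way to recode such a loop without xrange() is a small generator. This is only a sketch: big_range is a hypothetical helper, not something in the standard library, and it supports iteration only (no len() or indexing).

```python
def big_range(start, stop, step=1):
    """Yield start, start+step, ... lazily; works for integers of any size."""
    if step == 0:
        raise ValueError("step must not be zero")
    value = start
    while (step > 0 and value < stop) or (step < 0 and value > stop):
        yield value
        value += step


offset = 2 ** 40  # e.g. a position in a very large file
print(list(big_range(offset, offset + 3)))
```

Since the loop variable is an ordinary Python integer, nothing here depends on the size of a C long.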
> I've seen enough people asking in the newsgroups about this behavior (at > least four in the past 5 months or so), and I've submitted some > application patches to make things work for these cases (ie. by > explicitly subtracting out the large common base of each parameter, and > adding it back in after the list is generated), so I decided to make a > patch to change the range behavior. > > Fixing range was relatively easy, and could be done with no performance > penalty (the code to handle longs ranges is only invoked after the > existing code path fails; the common case is unaltered). Fixing > xrange() is trickier, and I'm opting to maintain backwards compatibility > as much as possible. > > In any case, I should have the patch ready to submit within the next > week or so (just a few hours more work is needed, for testing and > cleanup) > > Then the argument about whether it should ever be included can begin in > earnest. But I have seen enough examples of people being surprised that > ranges of long values (where the range length is well within the > addressable limit, but the range values must be PyLongs) that I think at > least range() should be fixed. Yes. > And if range() is fixed, then sadly, > xrange() should be fixed as well (IMO). No. > BTW, I'm all for deprecating xrange() with all deliberate speed. Doing > so would only make updating range behavior easier. It can't be deprecated until we have an alternative. That will have to wait until Python 2.4. I fought its addition to the language long and hard, but the arguments from PBP (Practicality Beats Purity) were too strong. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 14 02:56:42 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 13 Mar 2003 21:56:42 -0500 Subject: [Python-Dev] Iterable sockets? In-Reply-To: "Your message of Fri, 14 Mar 2003 13:38:00 +1100." 
<20030314023800.7BA6D3CC5F@coffee.object-craft.com.au> References: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> <200303140157.h2E1v0i00939@pcp02138704pcs.reston01.va.comcast.net> <20030314023800.7BA6D3CC5F@coffee.object-craft.com.au> Message-ID: <200303140256.h2E2ugI01187@pcp02138704pcs.reston01.va.comcast.net> > >> Line oriented network protocols are very common, and I often find > >> myself calling the socket makefile method so I can read complete > >> lines from a socket. I'm probably not the first one who's wished > >> that socket objects where more file-like. > >> > >> While I don't think we'd want to go as far as to turn them into a > >> stdio based file object, it might make sense to allow them to be > >> iterated over (and add a .readline() method, I guess). This would > >> necessitate adding some input buffering, which will complicate > >> things like the .recv() method, so I'm not sure it's that good an > >> idea, but it removes one gotchya for neophytes (and forgetful > >> veterans). Thoughts? > > > >Um, why doesn't the makefile() method do what you want? > > The short answer is that it does, but not very tidily - by turning the > socket object into a file object, I lose the original socket object > functionality (for example, shutdown()). You can just keep the socket around though. > At another level, the concept of a "file-like" object is a very common > python idiom - socket is the odd one out these days. > > It's really not a big deal - we could regularise the interface at the > cost of more implementation complexity. I'm not sure if I'd call that regularizing. It would by necessity become some kind of odd mixture. In any case, I find the file abstraction a bit arcane too. Maybe we should strive to replace all these with something better in Python 3.0, to be prototyped in the standard library starting with 2.4. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Fri Mar 14 03:03:58 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 13 Mar 2003 21:03:58 -0600 Subject: [Python-Dev] Iterable sockets? In-Reply-To: <20030314023800.7BA6D3CC5F@coffee.object-craft.com.au> References: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> <200303140157.h2E1v0i00939@pcp02138704pcs.reston01.va.comcast.net> <20030314023800.7BA6D3CC5F@coffee.object-craft.com.au> Message-ID: <15985.18078.155484.648103@montanaro.dyndns.org> >> Um, why doesn't the makefile() method do what you want? Andrew> The short answer is that it does, but not very tidily - by Andrew> turning the socket object into a file object, I lose the Andrew> original socket object functionality (for example, shutdown()). Would it be sufficient for the close() method on the object returned by sock.makefile() to call shutdown(2) on the underlying socket? Skip From andrewm@object-craft.com.au Fri Mar 14 03:11:41 2003 From: andrewm@object-craft.com.au (Andrew McNamara) Date: Fri, 14 Mar 2003 14:11:41 +1100 Subject: [Python-Dev] Iterable sockets? In-Reply-To: Message from Guido van Rossum of "Thu, 13 Mar 2003 21:56:42 CDT." <200303140256.h2E2ugI01187@pcp02138704pcs.reston01.va.comcast.net> References: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> <200303140157.h2E1v0i00939@pcp02138704pcs.reston01.va.comcast.net> <20030314023800.7BA6D3CC5F@coffee.object-craft.com.au> <200303140256.h2E2ugI01187@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030314031141.E468A3CC5F@coffee.object-craft.com.au> >> The short answer is that it does, but not very tidily - by turning the >> socket object into a file object, I lose the original socket object >> functionality (for example, shutdown()). > >You can just keep the socket around though. Yes. Which has always struck me as slightly ugly. 
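For reference, the "keep the socket around" approach looks roughly like this. The sketch targets current Python 3 and uses socket.socketpair() as a stand-in for a real network connection; the protocol lines are invented for the example.

```python
import socket

client, server = socket.socketpair()

# File-like view for reading lines; the socket object itself stays available.
reader = server.makefile("r", encoding="ascii", newline="\n")

client.sendall(b"HELO example\nQUIT\n")
client.shutdown(socket.SHUT_WR)      # half-close: client sends nothing more

lines = [line.rstrip("\n") for line in reader]   # iterate until EOF

# Socket-only operations still go through the original socket object.
server.sendall(b"221 bye\n")
server.shutdown(socket.SHUT_WR)

reply = b""
while True:
    chunk = client.recv(64)
    if not chunk:
        break
    reply += chunk

reader.close()
server.close()
client.close()
print(lines, reply)
```

Two objects have to be carried around for one connection, which is the slight ugliness being discussed.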
>> At another level, the concept of a "file-like" object is a very common >> python idiom - socket is the odd one out these days. >> >> It's really not a big deal - we could regularise the interface at the >> cost of more implementation complexity. > >I'm not sure if I'd call that regularizing. It would by necessity >become some kind of odd mixture. I guess you would keep the send() and recv() interfaces for raw access, and add read(), write(), readlines(), etc, which would be buffered. I'd chose to then view it as a superset of a file-like object. >In any case, I find the file abstraction a bit arcane too. Maybe we >should strive to replace all these with something better in Python 3.0, to >be prototyped in the standard library starting with 2.4. And get rid of stdio along the way, with any luck... 8-) It would also be nice to make the buffering play nicely with select()/poll()-threaded applications... if we're talking about wishlists... 8-) -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From greg@cosc.canterbury.ac.nz Fri Mar 14 03:26:19 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 14 Mar 2003 16:26:19 +1300 (NZDT) Subject: [Python-Dev] Iterable sockets? In-Reply-To: <15985.18078.155484.648103@montanaro.dyndns.org> Message-ID: <200303140326.h2E3QJZ07822@oma.cosc.canterbury.ac.nz> Skip Montanaro : > Would it be sufficient for the close() method on the object returned by > sock.makefile() to call shutdown(2) on the underlying socket? I don't think that would be very useful - shutdown() is normally used to shut the socket down in one direction only. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From cnetzer@mail.arc.nasa.gov Fri Mar 14 03:52:15 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 13 Mar 2003 19:52:15 -0800 Subject: [Python-Dev] More int/long integration issues In-Reply-To: <200303140253.h2E2r3o01163@pcp02138704pcs.reston01.va.comcast.net> References: <20030313164247.GB22296@panix.com> <1047587117.660.33.camel@sayge.arc.nasa.gov> <200303140253.h2E2r3o01163@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1047613935.661.119.camel@sayge.arc.nasa.gov> On Thu, 2003-03-13 at 18:53, Guido van Rossum wrote: > > a = 1/1e-5 > > range( a-20, a) > > This should be a TypeError. I'm sorry it isn't. Yeah. I can easily make it do this, BTW. (ie. keep it backwards compatible for smaller floats, but disallow it when dealing with PyLong size floats). With large floats, the implicit conversion to PyLong gets even less sensible, due to granularity issues. > > a = 1/1e-6 > > b = 1/1e-5 > > c = 1/1e-4 > > range(a, b, c) > > Ditto. > > (BTW why don't you write this as 1e6, 1e5, 1e4???) Just emphasizing that the coders may not have even expected to be dealing with such "large" values, but they got them anyway because they were plotting very "small" values (and the plotting operation did the inversion). A bad choice of example, I guess. Okay, I decided to go look at the specific code I was talking about. It essentially did stuff like: large_float = 1e20 a = long( math.ceil( large_float ) ) b = a + 10 range( a, b ) So, it actually wasn't submitting floats to range(), but was expecting it to work on long values (within the limits of memory). Again, it is also easy to fix these uses, but we agree that in principle that it should work... I've heard others doing number theory work, who hoped or expected it to work, as well. (Typically, they wanted to use HUGE step sizes, for example) In any case, I'll get the patch submitted fairly soon, for range(). Need to update the tests. 
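The workaround mentioned earlier in this thread (subtract the large common base, build the list, add the base back) can be sketched as follows. range_from_base is a hypothetical name; in current Python 3, range() accepts arbitrarily large integers directly, so this only illustrates the technique.

```python
import math


def range_from_base(start, stop, step=1):
    # range() only ever sees the small offsets; the large base is added back.
    return [start + offset for offset in range(0, stop - start, step)]


large_float = 1e20
a = int(math.ceil(large_float))   # long(...) in the Python 2 original
b = a + 10
values = range_from_base(a, b)
print(len(values), values[0], values[-1])
```

This keeps the arguments passed to range() small as long as the length of the range and the step are small, which covers the plotting case but not huge step sizes.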
> But 1/1e-21 is not a long. It's a float. You're flirting with > disaster here. Yep. I agree. > > And if range() is fixed, then sadly, > > xrange() should be fixed as well (IMO). > > No. Alright. That makes things (fairly) easy. :) > > BTW, I'm all for deprecating xrange() with all deliberate speed. Doing > > so would only make updating range behavior easier. > > It can't be deprecated until we have an alternative. That will have > to wait until Python 2.4. I'm also coding an irange() for consideration in the itertools module. At least an (explicit) replacement for the iteration usage (although, maybe not necessary if you actually do the lazy-list in "for" loop change.) If people need the indexing and length operations, too, I can only suggest a pure python implementation (which could return an irange() iterator when needed). Is that a dead-end idea, or a starter? Chad Netzer From aleax@aleax.it Fri Mar 14 07:44:12 2003 From: aleax@aleax.it (Alex Martelli) Date: Fri, 14 Mar 2003 08:44:12 +0100 Subject: [Python-Dev] Iterable sockets? In-Reply-To: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> References: <20030314014755.BE3AB3CC5F@coffee.object-craft.com.au> Message-ID: <200303140844.12401.aleax@aleax.it> On Friday 14 March 2003 02:47 am, Andrew McNamara wrote: > Line oriented network protocols are very common, and I often find myself > calling the socket makefile method so I can read complete lines from a > socket. I'm probably not the first one who's wished that socket objects > where more file-like. > > While I don't think we'd want to go as far as to turn them into a stdio > based file object, it might make sense to allow them to be iterated over > (and add a .readline() method, I guess). This would necessitate adding some > input buffering, which will complicate things like the .recv() method, so > I'm not sure it's that good an idea, but it removes one gotchya for > neophytes (and forgetful veterans). Thoughts? 
I've had occasion to code a "socket that turns into a filelike object
at need" (back in Python 2.0, I believe) and I used something like
(can't find the original code, but AFAIR it was a bit like the
following):

    import socket

    class richsocket:
        def __init__(self, *args):
            # the wrapped socket is built from the constructor arguments
            self.sock = socket.socket(*args)
            self.file = None    # created lazily, on first file-like access
        def __getattr__(self, name):
            # first try the socket itself...
            try:
                result = getattr(self.sock, name)
            except AttributeError:
                pass
            else:
                return result
            # ...then fall back to a file object made on demand
            if self.file is None:
                self.file = self.sock.makefile()
            return getattr(self.file, name)

This has some issues (e.g. method close goes to self.sock when it
should probably go to self.file if not None; plus, the buffering issues
you mention, etc), but nothing that looks too hard to fix -- in my
use case I applied AGNI and never needed any more than this simple and
smooth "double alternate delegation" pattern.

Today, if type socket supports inheritance (haven't checked), it
should be even easier, I suspect.

Alex

From aleax@aleax.it Fri Mar 14 08:03:10 2003
From: aleax@aleax.it (Alex Martelli)
Date: Fri, 14 Mar 2003 09:03:10 +0100
Subject: [Python-Dev] Re: lists v. tuples
In-Reply-To: <3E70E55D.1050102@sabaydi.com>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <3E70E55D.1050102@sabaydi.com>
Message-ID: <200303140903.10045.aleax@aleax.it>

On Thursday 13 March 2003 09:09 pm, Kevin J. Butler wrote:
...
> The important characteristics of lists are also independent of each
> other (again, IMO on the order):
>
> - mutability of length & content - used for dynamically building
> collections
> - heterogeneity allowed but not required - used occasionally
> for specific needs

I think some methods must also go on this list of important
characteristics -- the sort method, in particular.
If you need to sort stuff (including heterogenous stuff, EXCEPT if the latter includes at least one complex AND at least one other number of any kind) you put it into a list and sort the list -- that's the Python way of sorting, and sorting is an often-needed thing. Sorting plays with mutability by working in-place, but for many uses it would be just as good if sorting returned a sorted copy instead -- the key thing here is the sorting, not the mutability. Alex From guido@python.org Fri Mar 14 12:16:26 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 14 Mar 2003 07:16:26 -0500 Subject: [Python-Dev] More int/long integration issues In-Reply-To: "Your message of 13 Mar 2003 19:52:15 PST." <1047613935.661.119.camel@sayge.arc.nasa.gov> References: <20030313164247.GB22296@panix.com> <1047587117.660.33.camel@sayge.arc.nasa.gov> <200303140253.h2E2r3o01163@pcp02138704pcs.reston01.va.comcast.net> <1047613935.661.119.camel@sayge.arc.nasa.gov> Message-ID: <200303141216.h2ECGQq02022@pcp02138704pcs.reston01.va.comcast.net> > I've heard others doing number theory work, who hoped or expected it to > work, as well. (Typically, they wanted to use HUGE step sizes, for > example) As long as they wanted to use longs, that's fair. E.g. now that we're trying to get rid of the difference between ints and longs, something like range(0, 2**100, 2**99) should really just work (and it better give us [0, 2**99] :-). > In any case, I'll get the patch submitted fairly soon, for range(). > Need to update the tests. Thanks. I had hoped to release beta1 before PyCon, but that's not realistic. But I'll work on it soon after. > I'm also coding an irange() for consideration in the itertools module. > At least an (explicit) replacement for the iteration usage (although, > maybe not necessary if you actually do the lazy-list in "for" loop > change.) 
If people need the indexing and length operations, too, I can > only suggest a pure python implementation (which could return an > irange() iterator when needed). Is that a dead-end idea, or a starter? That's something for Raymond H. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Fri Mar 14 15:28:19 2003 From: nas@python.ca (Neil Schemenauer) Date: Fri, 14 Mar 2003 07:28:19 -0800 Subject: [Python-Dev] More int/long integration issues In-Reply-To: <200303140253.h2E2r3o01163@pcp02138704pcs.reston01.va.comcast.net> References: <20030313164247.GB22296@panix.com> <1047587117.660.33.camel@sayge.arc.nasa.gov> <200303140253.h2E2r3o01163@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030314152819.GA3004@glacier.arctrix.com> Guido van Rossum wrote: > > a = 1/1e-5 > > range( a-20, a) > > This should be a TypeError. I'm sorry it isn't. A least it gives a DeprecationWarning now. Neil From tismer@tismer.com Fri Mar 14 15:42:10 2003 From: tismer@tismer.com (Christian Tismer) Date: Fri, 14 Mar 2003 16:42:10 +0100 Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: <200303140903.10045.aleax@aleax.it> References: <20030312164902.10494.64514.Mailman@mail.python.org> <3E70E55D.1050102@sabaydi.com> <200303140903.10045.aleax@aleax.it> Message-ID: <3E71F851.3030802@tismer.com> Alex Martelli wrote: ... > Sorting plays with mutability by working in-place, but for many > uses it would be just as good if sorting returned a sorted copy > instead -- the key thing here is the sorting, not the mutability. And the key assumption for sorting things is that the things are sortable, which means there exists and order on the basic set. Which again suggests that list elements usually have something in common. ciao - chris p.s.: I'm not using a shopping tuple, since I sort my list be the stores I have to visit. :-) -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! 
Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/

From tismer@tismer.com Fri Mar 14 16:35:56 2003
From: tismer@tismer.com (Christian Tismer)
Date: Fri, 14 Mar 2003 17:35:56 +0100
Subject: [Python-Dev] PyEval_GetFrame() revisited
Message-ID: <3E7204EC.60506@tismer.com>

Hi there!

There has been this patch to the threadstate, which
allows one to override the tstate's frame access.

I just saw the part of the patch that modifies
pyexpat:

      f = PyFrame_New(
              tstate,                 /*back*/
              c,                      /*code*/
!             PyEval_GetGlobals(),    /*globals*/
              NULL                    /*locals*/

where PyEval_GetGlobals() is used instead of
tstate->frame->f_globals

Well, this unfortunately is not sufficient
for this module, since pyexpat still *has* direct
access to tstate->frame, in a much worse way:
pyexpat does read and write the frame variable!

In line 326, function call_with_frame, pyexpat
creates a new frame, assigns it to tstate->frame
and later on assigns f->f_back to it.

Reason why I'm thinking about this:
In order to simplify Stackless, I thought to remove
the frame variable, and let it be accessed always
via my current tasklet, which holds the frame.

Looking for the number of necessary patches, I stumbled
over PyExpat, and thought I should better keep my
hands off. Too bad.

Does it make sense to think about an API for
modifying the frame? Or are we at a dead end here?

cheers - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From just@letterror.com Fri Mar 14 16:06:14 2003 From: just@letterror.com (Just van Rossum) Date: Fri, 14 Mar 2003 17:06:14 +0100 Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: <3E71F851.3030802@tismer.com> Message-ID: Christian Tismer wrote: > And the key assumption for sorting things is that > the things are sortable, which means there > exists and order on the basic set. > Which again suggests that list elements usually > have something in common. To me it suggests that some lists are sortable and others are not... There's one aspect about this discussion that I haven't seen mentioned yet: syntax. I think the suggested usages of lists vs. tuples has more to do with list vs. tuple _syntax_, and less with mutability. From this perspective it is natural that tuples support a different set of methods than lists. However, mutable vs. immutable has it's uses also, and from _that_ perspective it is far less understandable that tuples lack certain methods. FWIW, I quite like the way how the core classes in Cocoa/NextStep are designed. For each container-ish object there's a mutable an immutable variant, where the mutable variant is usually a subclass of the immutable one. Examples: NSString -> NSMutableString NSData -> NSMutableData NSArray -> NSMutableArray NSDictionary -> NSMutableDictionary (But then again, Objective-C doesn't have syntax support for lists _or_ tuples...) Just From guido@python.org Fri Mar 14 17:55:22 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 14 Mar 2003 12:55:22 -0500 Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: "Your message of Fri, 14 Mar 2003 17:06:14 +0100." 
References: Message-ID: <200303141755.h2EHtMA03501@pcp02138704pcs.reston01.va.comcast.net> > FWIW, I quite like the way how the core classes in Cocoa/NextStep are > designed. For each container-ish object there's a mutable an immutable > variant, where the mutable variant is usually a subclass of the > immutable one. Examples: > NSString -> NSMutableString > NSData -> NSMutableData > NSArray -> NSMutableArray > NSDictionary -> NSMutableDictionary This has its downside too though. A function designed to take an immutable class instance cannot rely on the class instance to remain unchanged, because the caller could pass it an instance of the corresponding mutable subclass! (For example, the function might use the argument as a dict key.) In some sense this inheritance pattern breaks the "Liskov substibutability" principle: if B is a base class of C, whenever a B instance is expected, a C instance may be used. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Fri Mar 14 18:05:05 2003 From: tismer@tismer.com (Christian Tismer) Date: Fri, 14 Mar 2003 19:05:05 +0100 Subject: [Python-Dev] PyEval_GetFrame() revisited In-Reply-To: <3E7204EC.60506@tismer.com> References: <3E7204EC.60506@tismer.com> Message-ID: <3E7219D1.6090306@tismer.com> Christian Tismer wrote: > Hi there! > > here has been this patch to the threadstate, which > allows to override the tstate's frame access. > > I just saw the part of the patch that modifies > pyexpat: > > > f = PyFrame_New( > tstate, /*back*/ > c, /*code*/ > ! PyEval_GetGlobals(), /*globals*/ > NULL /*locals*/ > > where the PyEval_GetGLobals is used instead of > tstate->frame->f_globals > > Well, this unfortunately is not sufficient > for this module, since pyexpat still *has* direct > access to tstate->frame, in a much worse way: > pyexpat does read and write the frame variable! Ah!! 
Can it be that PyEval_GetFrame() is just intended to signal
to an extension like Psyco that it needs to quickly
invent a frame now?
So it is *not* thought of as a complete interface that
makes explicit access to tstate->frame unnecessary;
it is only meant for read access?

So I can't move it elsewhere and probably need to
work around that forever, unless we also write
PyEval_SetFrame()

sigh - ciao - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/

From fdrake@acm.org Fri Mar 14 19:07:37 2003
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 14 Mar 2003 14:07:37 -0500
Subject: [Python-Dev] PyEval_GetFrame() revisited
In-Reply-To: <3E7204EC.60506@tismer.com>
References: <3E7204EC.60506@tismer.com> <3E7219D1.6090306@tismer.com>
Message-ID: <15986.10361.70059.323966@grendel.zope.com>

Christian Tismer writes:
 > Hi there!

Good afternoon!

 > There has been this patch to the threadstate, which
 > allows one to override the tstate's frame access.
...
 > Reason why I'm thinking about this:
 > In order to simplify Stackless, I thought to remove
 > the frame variable, and let it be accessed always
 > via my current tasklet, which holds the frame.
 >
 > Looking for the number of necessary patches, I stumbled
 > over PyExpat, and thought I should better keep my
 > hands off. Too bad.
 >
 > Does it make sense to think about an API for
 > modifying the frame? Or are we at a dead end here?

What's being modified isn't the frame but the tstate, but it may be reasonable to provide some API to manipulate the "current" frame.
I think pyexpat is unique in doing this, but it actually makes a lot of sense; there are other modules for which a similar behavior is likely to be appropriate (one example I can think of is Fredrik's sgmlop module).

What pyexpat is trying to achieve is fairly simple, and I don't think there's a better way currently. When Python code calls the Parse() or ParseFile() method of a parser object (returned from pyexpat.ParserCreate()), the parser can generate many different callbacks into Python code. pyexpat generates an artificial code object and frame that can be used to generate more useful tracebacks when exceptions are raised within callbacks; the code object indicates which callback Expat triggered, separately from the function assigned to handle that callback. This makes it much easier to debug handler functions.

If there were API functions to get/set the frame, pyexpat wouldn't need to poke into the tstate at all. Would that alleviate the difficulties this creates for Stackless / Psyco?

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From zooko@zooko.com Fri Mar 14 19:09:02 2003
From: zooko@zooko.com (Zooko)
Date: Fri, 14 Mar 2003 14:09:02 -0500
Subject: [Python-Dev] mutability (was: lists v. tuples)
In-Reply-To: Message from Guido van Rossum of "Fri, 14 Mar 2003 12:55:22 EST." <200303141755.h2EHtMA03501@pcp02138704pcs.reston01.va.comcast.net>
References: <200303141755.h2EHtMA03501@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

GvR wrote:
>
> This has its downside too though. A function designed to take an
> immutable class instance cannot rely on the class instance to remain
> unchanged, because the caller could pass it an instance of the
> corresponding mutable subclass! (For example, the function might use
> the argument as a dict key.) In some sense this inheritance pattern
> breaks the "Liskov substitutability" principle: if B is a base class
> of C, whenever a B instance is expected, a C instance may be used.

Indeed!
Presumably the designers of the NextStep libraries thought to themselves that they couldn't do it the other way (have NSArray subclass NSMutableArray) because NSArray couldn't provide a real implementation of a mutation method like "NSArray addObject". If you include the immutability guarantee as well as the methods in the "contract" offered by an interface, then its clear that neither can be a Liskov-substitution-principle-preserving subtype of the other. The E Language paid careful attention to this issue because a surprise about mutability could easily be a security hole. Their solution is quite Pythonic, inasmuch as type-checking is dynamic, structural (an object matches a type if it offers the interface regardless of whether it is explicitly declared to be a subtype), and soft (an object can implement only part of a type). These are the three noble features of Python's type system. (I occasionally hear about efforts to cripple Python's type system in order to make it as ungainly as Java's, but fortunately they always seem to fade away...) So in E, it's the same: if you are expecting a mutable list (a "FlexList") and you get an immutable one, you'll get an exception at run-time if you try a mutation operation like mylist.append("spam"). Like Python, E's strings do the right thing if you invoke immutable list ("ConstList") methods on them. The syntax for constructing maps and lists and indexing them is similar to Python's. That syntax always constructs immutable structures, a mutable version of which is generated with the method "mylist.diverge()". To get an immutable version of a mutable structure, you write "mylist.snapshot()". http://erights.org/elang/quick-ref.html#Structures Regards, Zooko http://zooko.com/ ^-- newly and incompletely restored From just@letterror.com Fri Mar 14 19:12:06 2003 From: just@letterror.com (Just van Rossum) Date: Fri, 14 Mar 2003 20:12:06 +0100 Subject: [Python-Dev] Re: lists v. 
tuples In-Reply-To: <200303141755.h2EHtMA03501@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum wrote: > > FWIW, I quite like the way how the core classes in Cocoa/NextStep > > are designed. For each container-ish object there's a mutable an > > immutable variant, where the mutable variant is usually a subclass > > of the immutable one. Examples: > > NSString -> NSMutableString > > NSData -> NSMutableData > > NSArray -> NSMutableArray > > NSDictionary -> NSMutableDictionary > > This has its downside too though. A function designed to take an > immutable class instance cannot rely on the class instance to remain > unchanged, because the caller could pass it an instance of the > corresponding mutable subclass! (For example, the function might use > the argument as a dict key.) In some sense this inheritance pattern > breaks the "Liskov substibutability" principle: if B is a base class > of C, whenever a B instance is expected, a C instance may be used. I'm not sure how much relevance this principle has in a language in which the inheritance tree has little meaning, but since I _am_ sure you read more books about this than I did, I'll take your word for it ;-) It's not so much the inheritance hierarchy that I like about the Cocoa core classes, but the fact that mutability is a prominent part of the design. I think Python would be a better language if it had a mutable string type as well as a mutable byte-oriented data type. An immutable dict would be handy at times. An immutable list type would be great, too. Wait, we already have that. Sure, tuples are often used for struct-like things and lists for that other stuff , but I don't think it's right to say you _must_ use them like that, and that seeing/using tuples as immutable lists is _wrong_. Just From guido@python.org Fri Mar 14 19:26:10 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 14 Mar 2003 14:26:10 -0500 Subject: [Python-Dev] Re: lists v. 
tuples In-Reply-To: "Your message of Fri, 14 Mar 2003 20:12:06 +0100." References: Message-ID: <200303141926.h2EJQBA04239@pcp02138704pcs.reston01.va.comcast.net> > Sure, tuples are often used for struct-like things and lists for > that other stuff , but I don't think it's right to say you > _must_ use them like that, and that seeing/using tuples as immutable > lists is _wrong_. it's not wrong, but I find that many people use tuples in situations where they should really use lists, and the immutability is irrelevant. Using tuples seems to be a reflex for some people because creating a tuple saves a microsecond or so. That sounds like the wrong thing to let inform your reflexes... --Guido van Rossum (home page: http://www.python.org/~guido/) From mcherm@mcherm.com Fri Mar 14 20:13:39 2003 From: mcherm@mcherm.com (Chermside, Michael) Date: Fri, 14 Mar 2003 15:13:39 -0500 Subject: [Python-Dev] Re: lists v. tuples Message-ID: <7F171EB5E155544CAC4035F0182093F04211EE@INGDEXCHSANC1.ingdirect.com> Just writes: > It's not so much the inheritance hierarchy that I like about the Cocoa > core classes, but the fact that mutability is a prominent part of the > design. I think Python would be a better language if it had a mutable > string type as well as a mutable byte-oriented data type. An immutable > dict would be handy at times. An immutable list type would be great, > too. Wait, we already have that. I've often had the same thought myself. I'm imagining designing my own language, and I note that both mutable and immutable strings are handy, depending on what you're doing. The same is true of data containers (of all sorts, lists and dicts being examples). "What the heck?" I say to myself, "In *my* perfect language, there'll be mutable and immutable versions of every object. (With the obvious conversion behavior.) Why, you won't even have to code them separately... just specify some property indicating whether or not that instance is mutable." 
Then I realize that C++ has exactly this feature (it's called "const"), and that I find it to be an annoyance far more often than I find it handy. And I begin to question. Wish-I-knew-the-answer-but-I-haven't-been-enlightened-yet -- Michael Chermside From tismer@tismer.com Fri Mar 14 23:29:26 2003 From: tismer@tismer.com (Christian Tismer) Date: Sat, 15 Mar 2003 00:29:26 +0100 Subject: [Python-Dev] PyEval_GetFrame() revisited In-Reply-To: <15986.10361.70059.323966@grendel.zope.com> References: <3E7204EC.60506@tismer.com> <3E7219D1.6090306@tismer.com> <15986.10361.70059.323966@grendel.zope.com> Message-ID: <3E7265D6.3050202@tismer.com> Fred L. Drake, Jr. wrote: > Christian Tismer writes: ... > > Does it make sense to think about an API for > > modifying the frame? Or are we at a dead end here? > > What's being modified isn't the frame but the tstate, but it may be > reasonable to provide some API to manipulate the "current" frame. That was what I intended to say. > I think pyexpat is unique in doing this, but it actually makes a lot > of sense; there are other modules for which a similar behavior is > likely to be appropriate (one example I can think of is Fredrik's > sgmlop module). I just looked into MHammond's files and found AXDebug.cpp reading tstate's frame, too, line 192: PyFrameObject *frame = state ? state->frame : NULL; WOuld be another candidate to use PyEval_GetFrame() > What pyexpat is trying to achieve is fairly simple, and I don't think > there's a better way currently. When Python code calls the Parse() or > ParseFile() method of a parser object (returned from > pyexpat.ParserCreate()), the parser can generate many different > callbacks into Python code. pyexpat generates an artificial code > object and frame that can be used to generate more useful tracebacks > when exceptions are raise within callbacks; the code object indicates > which callback Expat triggered, separately from the function assigned > to handle that callback. 
This makes it much easier to debug handler
> functions.

Yes, this makes a lot of sense, to just wrap a frame around something to get useful tracebacks.

> If there were API functions to get/set the frame, pyexpat wouldn't
> need to poke into the tstate at all. Would that alleviate the
> difficulties this creates for Stackless / Psyco?

I think so. For internal functions, inside the main Python implementation, it is no problem to maintain a number of patches, or even to use interface functions where they make sense, like enabling Psyco. For external modules, it would be nicer if certain implementation details could be hidden, to give more freedom to implement something differently without breaking unknown modules. PyExpat is in a grey zone, since it is already in the Python distribution, and I had no problem patching it.

The reason why I'm asking is that in Stackless 2.0 and above, tstate always carries a so-called tasklet object which holds the current frame. There can be many of these, while there is only one current one per tstate, and they can be switched at certain times. At the moment, I always have to take care to preserve a valid tstate->frame variable whenever I'm switching tasklets. It would make the code much smaller and cleaner if I could simply redefine the current frame to be just the frame held in the current tasklet.

Something like PyEval_SetFrame() would make sense, but there is a problem with the protocol: After changing the frame, the currently running interpreter would not know to execute it, but would continue to run the running one. For normal Python, PyEval_SetFrame() would only make sense if the calling code also made sure that the new frame is run, or popped off after a special action was done. PyExpat, as I understand, uses this frame just as a wrapper for the case of an exception, when the frame would be used only when a failing function returns NULL.
Stackless extends this and also knows to return a special value instead of NULL, in order to tell the calling interpreter to stop its action and let the new frame run.

Even in PyExpat's use, it would be hard to explain the use of such a function, I fear. But I'd really like to have it.

cheers - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/

From tim.one@comcast.net Sat Mar 15 03:00:46 2003
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 14 Mar 2003 22:00:46 -0500
Subject: [Python-Dev] tzset
Message-ID:

We seem to have added tzset() gimmicks to CVS Python.

test_time now fails on Windows, simply because time.tzset raises AttributeError there.

Now Windows does support tzset(), but not TZ values of the form test_time.test_tzset() is testing, like

    environ['TZ'] = 'US/Eastern'

and

    environ['TZ'] = 'Australia/Melbourne'

The rub here is that I haven't found *any* tzset man pages on the web that claim TZ accepts such values (other than to silently ignore them because they're not in a recognized format). The POSIX defn is typical:

    http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html

and search down for TZ. There's no way to read that as supporting the values we're testing.

Anyone have a clue?
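
For concreteness, here is a minimal sketch of the two points raised above: guarding on whether time.tzset exists at all, and using a TZ value in the POSIX rule format rather than a zoneinfo-style name such as 'US/Eastern'. The helper name set_tz is hypothetical and is not taken from test_time; the TZ string is only an illustrative value.

```python
import os
import time

def set_tz(value):
    """Set TZ to `value` and re-read it; return False where tzset is missing.

    Hypothetical helper for illustration only.
    """
    if not hasattr(time, "tzset"):
        # Some builds (e.g. on Windows) do not define time.tzset at all.
        return False
    os.environ["TZ"] = value
    time.tzset()
    return True

# POSIX-style rule: standard time EST is 5 hours west of UTC, DST name EDT,
# followed by explicit start and end rules for daylight saving time.
if set_tz("EST+05EDT,M4.1.0,M10.5.0"):
    print(time.tzname)
else:
    print("time.tzset is not available on this platform")
```

A test written this way can skip itself when set_tz() returns False instead of failing with AttributeError.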
not-all-pits-should-be-dived-into-ly y'rs - tim

From zen@shangri-la.dropbear.id.au Sat Mar 15 06:34:26 2003
From: zen@shangri-la.dropbear.id.au (Stuart Bishop)
Date: Sat, 15 Mar 2003 17:34:26 +1100
Subject: [Python-Dev] tzset
In-Reply-To:
Message-ID: <2ADB787D-56B0-11D7-9648-000393B63DDC@shangri-la.dropbear.id.au>

On Saturday, March 15, 2003, at 02:00 PM, Tim Peters wrote:

> We seem to have added tzset() gimmicks to CVS Python.

That was my patch.

> test_time now fails on Windows, simply because time.tzset raises
> AttributeError there.

test_time.test_tzset should only be run if time.tzset is defined (which should only be there if configure determines that tzset works with the TZ formats we are testing). Feel like adding a clause at the top of test_tzset to skip the test if time.tzset is not defined, or should I submit a patch?

> Now Windows does support tzset(), but not TZ values of the form
> test_time.test_tzset() is testing, like
>
> environ['TZ'] = 'US/Eastern'
> and
> environ['TZ'] = 'Australia/Melbourne'
>
> The rub here is that I haven't found *any* tzset man pages on the web
> that
> claim TZ accepts such values (other than to silently ignore them
> because

It specifies the pathname to a tzfile(5) format file, relative to an OS-defined default. From BSD:

     If its value does not begin with a colon, it is first used as the
     pathname of a file (as described above) from which to read the time
     conversion information. If that file cannot be read, the value is
     then interpreted as a direct specification (the format is described
     below) of the time conversion information.

Solaris has a similar definition. Linux documents this format as needing to start with a ':' but accepts it (at least I think I tested this...)

To me, this is the useful format as all the others require you to know your DST transition times rather than rely on the OS to supply them.
At the moment if the 'path to a tzfile(5)' format is not accepted, your tzset(3) is considered broken and time.tzset is not built.

I'm happy to rewrite the detection in configure.in and the test in test_time.py to lower the bar on this, but I think a better solution may be to determine if Windows has a format that lets us do DST calculations and keep the bar high. I was hoping that such a format would a) exist and b) allow translation between the Unix standard of Country/Region to whatever-windows-uses.

> not-all-pits-should-be-dived-into-ly y'rs - tim

but-i-was-pushed-by-those-damn-politicians-ly y'rs

--
Stuart Bishop
http://shangri-la.dropbear.id.au/

From drifty@alum.berkeley.edu Sat Mar 15 06:40:16 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Fri, 14 Mar 2003 22:40:16 -0800 (PST)
Subject: [Python-Dev] tzset
In-Reply-To: <2ADB787D-56B0-11D7-9648-000393B63DDC@shangri-la.dropbear.id.au>
References: <2ADB787D-56B0-11D7-9648-000393B63DDC@shangri-la.dropbear.id.au>
Message-ID:

[Stuart Bishop]
> I'm happy to rewrite the detection in configure.in and the test in
> test_time.py to lower the bar on this, but I think a better solution
> may be to determine if Windows has a format that lets us to DST
> calculations and keep the bar high. I was hoping that such a format
> would a) exist and b) Allow translation between the Unix standard of
> Country/Region to whatever-windows-uses.
>

If there is one thing I have learned from writing _strptime, it is that you cannot be strict in the slightest for your input when it comes to time-based data. I think this is another case where you need to be loose about input and strict with output.

-Brett

From aleax@aleax.it Sat Mar 15 07:57:53 2003
From: aleax@aleax.it (Alex Martelli)
Date: Sat, 15 Mar 2003 08:57:53 +0100
Subject: [Python-Dev] Re: lists v.
tuples In-Reply-To: <3E71F851.3030802@tismer.com> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> Message-ID: <200303150857.53214.aleax@aleax.it> On Friday 14 March 2003 04:42 pm, Christian Tismer wrote: > Alex Martelli wrote: > ... > > > Sorting plays with mutability by working in-place, but for many > > uses it would be just as good if sorting returned a sorted copy > > instead -- the key thing here is the sorting, not the mutability. > > And the key assumption for sorting things is that > the things are sortable, which means there > exists and order on the basic set. > Which again suggests that list elements usually > have something in common. If a list contains ONE complex number and no other number, then the list can be sorted. If the list contains elements that having something in common, by both being complex numbers, then it cannot be sorted. So, lists whose elements have LESS in common (by being of widely different types) are more likely to be sortable than lists some of whose elements have in common the fact of being numbers (if one or more of those numbers are complex). Although not likely to give practical problems (after all I suspect most Pythonistas never use complex numbers at all), this anomaly (introduced in 1.6, I think) makes conceptualization less uniform and thus somewhat harder to teach. Alex From guido@python.org Sat Mar 15 12:14:50 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 15 Mar 2003 07:14:50 -0500 Subject: [Python-Dev] tzset In-Reply-To: "Your message of Sat, 15 Mar 2003 17:34:26 +1100." 
<2ADB787D-56B0-11D7-9648-000393B63DDC@shangri-la.dropbear.id.au> References: <2ADB787D-56B0-11D7-9648-000393B63DDC@shangri-la.dropbear.id.au> Message-ID: <200303151214.h2FCEoH05946@pcp02138704pcs.reston01.va.comcast.net> [Stuart] > test_time.test_tzset should only be run if time.tzset is defined > (which should only be there if configure determines that tzset works > with the TZ formats we are testing). Feel like adding a clause at the > top of test_tzset to skip the test if time.tzset is not defined, or > should I submit a patch? Done. [Tim] > > Now Windows does support tzset(), but not TZ values of the form > > test_time.test_tzset() is testing, like > > > > environ['TZ'] = 'US/Eastern' > > and > > environ['TZ'] = 'Australia/Melbourne' > > > > The rub here is that I haven't found *any* tzset man pages on the > > web that claim TZ accepts such values (other than to silently > > ignore them because > > It specifies the pathname to a tzfile(5) format file, relative to > a OS defined default. From BSD: > If its value does not begin with a colon, it is first used as > the pathname of a file (as described above) from which to > read the time conversion information. If that file cannot be > read, the value is then interpreted as a direct specification > (the format is described below) of the time conversion > information. > Solaris has a similar definition. Linux documents this format as > needing to start with a ':' but accepts it (at least I think I > tested this...) > > To me, this is the useful format as all the others require you to > know your DST transition times rather rely of the OS to supply them. > At the moment if the 'path to a tzfile(5)' format is not accepted, your > tzset(3) is considered broken and time.tzset not built. 
> > I'm happy to rewrite the detection in configure.in and the test in > test_time.py to lower the bar on this, but I think a better solution > may be to determine if Windows has a format that lets us to DST > calculations and keep the bar high. I was hoping that such a format > would a) exist and b) Allow translation between the Unix standard of > Country/Region to whatever-windows-uses. Nevertheless I don't think that the standard definition for tzset() defines which values will be accepted by a particular tzset implementation. So a test that relies on these is bound to fail on systems, not because tzset is broken, but because the test makes unfair assumptions. Perhaps you can rewrite the test to use only standardized input forms? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Mar 15 12:36:19 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 15 Mar 2003 07:36:19 -0500 Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: "Your message of Sat, 15 Mar 2003 08:57:53 +0100." <200303150857.53214.aleax@aleax.it> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> Message-ID: <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> > > > Sorting plays with mutability by working in-place, but for many > > > uses it would be just as good if sorting returned a sorted copy > > > instead -- the key thing here is the sorting, not the mutability. > > > > And the key assumption for sorting things is that > > the things are sortable, which means there > > exists and order on the basic set. > > Which again suggests that list elements usually > > have something in common. > > If a list contains ONE complex number and no other number, > then the list can be sorted. But the order isn't meaningful. 
> If the list contains elements that having something in common, > by both being complex numbers, then it cannot be sorted. > > So, lists whose elements have LESS in common (by being of > widely different types) are more likely to be sortable than lists > some of whose elements have in common the fact of being > numbers (if one or more of those numbers are complex). > > Although not likely to give practical problems (after all I suspect > most Pythonistas never use complex numbers at all), this > anomaly (introduced in 1.6, I think) makes conceptualization > less uniform and thus somewhat harder to teach. If I had to do it over again, I'd only implement == and != for objects of vastly differing types, and limit <, <=, >, >= to objects that are meaningfully comparable. I'd like to to this in Python 3.0, but that probably means we'd have to start deprecating default comparisons except (in)equality in Python 2.4. --Guido van Rossum (home page: http://www.python.org/~guido/) From bbum@codefab.com Sat Mar 15 13:46:19 2003 From: bbum@codefab.com (Bill Bumgarner) Date: Sat, 15 Mar 2003 08:46:19 -0500 Subject: [Python-Dev] Mutability vs. Immutability (was Re: lists v. tuples) In-Reply-To: <20030315063502.17965.17900.Mailman@mail.python.org> Message-ID: <8068F74D-56EC-11D7-95E7-000393877AE4@codefab.com> On Saturday, Mar 15, 2003, at 01:35 US/Eastern, python-dev-request@python.org wrote: > This has its downside too though. A function designed to take an > immutable class instance cannot rely on the class instance to remain > unchanged, because the caller could pass it an instance of the > corresponding mutable subclass! (For example, the function might use > the argument as a dict key.) In some sense this inheritance pattern > breaks the "Liskov substibutability" principle: if B is a base class > of C, whenever a B instance is expected, a C instance may be used. In practice, this isn't an issue though it does require that the developer follow a couple of simple patterns. 
Since Objective-C is a C-derived language, requiring the developer to follow a couple of extra simple patterns isn't a big deal, considering that the developer already has to deal with all of the really fun memory management issues associated with a pointer-based language.

Namely, if your code takes an array -- for example -- and is going to hang on to the reference for a while and expect immutability, simply copy the array when storing it away:

    - (void) setSearchPath: (NSArray *) anArray
    {
        if (searchPath != anArray) {
            [searchPath release];
            searchPath = [anArray copy];
        }
    }

If anArray is mutable, the invocation of -copy creates an immutable copy of the array without copying its contents. If anArray is immutable, -copy simply returns the same array with the reference count bumped by one:

    // NSArray implementation
    - copy
    {
        return [self retain];
    }

Easy and efficient, as long as the developer remembers to follow the pattern....

b.bum

From tismer@tismer.com Sat Mar 15 15:59:19 2003
From: tismer@tismer.com (Christian Tismer)
Date: Sat, 15 Mar 2003 16:59:19 +0100
Subject: [Python-Dev] Re: lists v. tuples
In-Reply-To: <200303150857.53214.aleax@aleax.it>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it>
Message-ID: <3E734DD7.3080105@tismer.com>

Alex Martelli wrote:
> On Friday 14 March 2003 04:42 pm, Christian Tismer wrote:
...
>>And the key assumption for sorting things is that
>>the things are sortable, which means there
>>exists an order on the basic set.
>>Which again suggests that list elements usually
>>have something in common.
>
> If a list contains ONE complex number and no other number,
> then the list can be sorted.

By a similar argument, tuples of one element can be sorted and reversed, just by doing nothing :-)

> If the list contains elements that have something in common,
> by both being complex numbers, then it cannot be sorted.
Sure it can, by supplying a compare function, which
implements the particular sorting operation that you want.
Perhaps you want to sort them by their abs value or something.
(And then you probably will want a stable sort, which is
meanwhile a nice fact thanks to Tim:

 >>> a=[1, 2, 2+2j, 3+1j, 1+3j, 3-3j, 3+1j, 1+3j]
 >>> a.sort(lambda x, y:cmp(abs(x), abs(y)))
 >>> a
 [1, 2, (2+2j), (3+1j), (1+3j), (3+1j), (1+3j), (3-3j)]
 >>>
)

Complex just has no total order, which makes it impossible to
provide a meaningful default ordering.

> So, lists whose elements have LESS in common (by being of
> widely different types) are more likely to be sortable than lists
> some of whose elements have in common the fact of being
> numbers (if one or more of those numbers are complex).

I agree that my statement does not apply when putting
non-sortable things into a list. But I don't believe
that people are putting widely different types into
a list in order to sort them. (Although there is an
arbitrary order between strings and numbers, which
I would drop in Python 2.4, too).

chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/

From tjreedy@udel.edu Sat Mar 15 17:54:05 2003
From: tjreedy@udel.edu (Terry Reedy)
Date: Sat, 15 Mar 2003 12:54:05 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

"Guido van Rossum" wrote in message news:200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net.
.. > But the order isn't meaningful. .... > If I had to do it over again, I'd only implement == and != for objects > of vastly differing types, and limit <, <=, >, >= to objects that are > meaningfully comparable. For user-defined types/classes, I presume that this would still mean deferring to the appropriate magic method (__cmp__ or __ge__?) to define 'meaningful'. > I'd like to to this in Python 3.0, but that probably means we'd have > to start deprecating default comparisons except (in)equality in Python > 2.4. +1, I think. Based on reading cl.py, the validity of nonsense comparisons is one of the more surprising 'features' of Python for beginners -- who reasonably expect a TypeError or ValueError. Once they get past that, they are then surprised by the unstability across versions. Given that universal sorting of hetero-lists is now broken, I think it would be better to do away with it cleanly. It is seldom needed and would still be available with a user-defined sorting function (which requires some thought as to what is really wanted). A Python version of the present algorithm could be included (in Tools/xx perhaps) for anyone who actually needs it. Terry J. Reedy From aleax@aleax.it Sat Mar 15 18:07:35 2003 From: aleax@aleax.it (Alex Martelli) Date: Sat, 15 Mar 2003 19:07:35 +0100 Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: <3E734DD7.3080105@tismer.com> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303150857.53214.aleax@aleax.it> <3E734DD7.3080105@tismer.com> Message-ID: <200303151907.35471.aleax@aleax.it> On Saturday 15 March 2003 04:59 pm, Christian Tismer wrote: ... > Complex just has no total order, which makes it impossible to > provide a meaningful default ordering. Back in Python 1.5.2 times, the "impossible to provide" ordering was there. No more (and no less!) "meaningful" than, say, comparisons between (e.g.) lists, numbers, strings and dicts, which _are_ still provided as of Python 2.3. 
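A minimal sketch of the distinction drawn above, assuming a modern Python 3 interpreter rather than the 1.5.2 and 2.3 releases under discussion: complex numbers still refuse to be ordered by default, while an explicit key (here abs, as in Christian's example) sorts them without complaint. The helper name try_sort is made up for illustration. Note that current Python 3 also rejects ordering comparisons between strings and numbers, which was not yet the case in 2.3.

```python
def try_sort(items):
    """Return a sorted copy of items, or the error text if they cannot be ordered."""
    try:
        return sorted(items)
    except TypeError as exc:
        return "TypeError: %s" % exc


# A single complex number needs no comparison at all, so the sort succeeds.
one_complex = try_sort([1j])

# Two complex numbers must be compared with <, which complex does not support.
two_complex = try_sort([2j, 1j])

# An explicit key sidesteps the missing default order: sort by magnitude.
by_magnitude = sorted([3 - 3j, 1, 2 + 2j], key=abs)
```

Whether a list is "sortable" therefore depends on the comparisons the sort actually has to perform, not on the mere presence of a complex element.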
> I agree that my statement does not apply when putting > non-sortable things into a list. But I don't believe A list containing ONE complex number and (e.g.) three strings is sortable. So, there are NO "non-sortable things". A list is non-sortable (unless by providing a custom compare, as you pointed out) if it contains a complex number and any other number -- so, there _are_ "non-sortable LISTS" (unless suitable custom compares are used), but still no "non-sortable THINGS" in current Python. > that people are putting widely different types into > a list in order to sort them. (Although there is an > arbitrary order between strings and numbers, which > I would drop in Python 2.4, too). Such a change would indeed enormously increase the number of non-sortable (except by providing custom compares) lists. So, for example, all places which get and sort the list of keys in a dictionary in order to return or display the keys should presumably do the sorting within a try/except? Or do you think a dictionary should also be constrained to have keys that are all comparable with each other (i.e., presumably, never both string AND number keys) as well as hashable? I fail to see how forbidding me to sort the list of keys of any arbitrary dict will enhance my productivity in any way -- it's bad enough (in theory -- in practice it doesn't bite much as complex numbers are used so rarely) with the complex numbers thingy, why make it even worse by inventing a novel "strings vs numbers" split? Since when is Python about forbidding the user to do quite normal things such as sorting the list of keys of any arbitrary dictionary for more elegant display -- for no practically useful purpose that I've ever seen offered, in brisk violation of "practicality beats purity"? Alex From tismer@tismer.com Sat Mar 15 19:50:33 2003 From: tismer@tismer.com (Christian Tismer) Date: Sat, 15 Mar 2003 20:50:33 +0100 Subject: [Python-Dev] Re: lists v. 
tuples In-Reply-To: <200303151907.35471.aleax@aleax.it> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303150857.53214.aleax@aleax.it> <3E734DD7.3080105@tismer.com> <200303151907.35471.aleax@aleax.it> Message-ID: <3E738409.4060500@tismer.com> Alex Martelli wrote: ... > A list containing ONE complex number and (e.g.) three > strings is sortable. So, there are NO "non-sortable things". Quite an academical POV for a practical man like you. > A list is non-sortable (unless by providing a custom compare, > as you pointed out) if it contains a complex number and any > other number -- so, there _are_ "non-sortable LISTS" (unless > suitable custom compares are used), but still no "non-sortable > THINGS" in current Python. Don't understand: How is a tuple not a non-sortable thing, unless I turn it into a list, which is not a tuple? Or do you mean the complex, which refuses to be sorted, unlike other objects, which don't provide any ordering, and are ordered by ID? [number/string comparison] > Such a change would indeed enormously increase the > number of non-sortable (except by providing custom > compares) lists. Theoretical lists, or those existing in real applications? For the latter, mixing ints and strings was most often a programming error in my past. > So, for example, all places which get > and sort the list of keys in a dictionary in order to return > or display the keys should presumably do the sorting > within a try/except? Or do you think a dictionary should > also be constrained to have keys that are all comparable > with each other (i.e., presumably, never both string AND > number keys) as well as hashable? I would like to have sub-classes of dictionaries, which protect me from putting keys into them which I didn't intend to. But that doesn't mean that I want to forbid it once and forever. Concerning general dicts, you are right, sorting the keys makes sense to get them into some arbitrary order.
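The dictionary subclass wished for above might look roughly like the following sketch, written for a modern Python 3 interpreter. The class name TypedKeyDict and its key_type parameter are hypothetical, invented here for illustration; nothing like it is proposed in the thread. Only item assignment and update() are guarded, so other entry points such as setdefault() would need the same treatment in real code.

```python
class TypedKeyDict(dict):
    """Hypothetical dict subclass that only accepts keys of one declared type."""

    def __init__(self, key_type, *args, **kwargs):
        super().__init__()
        self.key_type = key_type
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Reject keys of an unintended type at insertion time.
        if not isinstance(key, self.key_type):
            raise TypeError(
                "expected %s key, got %s"
                % (self.key_type.__name__, type(key).__name__)
            )
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # dict.update() bypasses __setitem__, so route every pair through it.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


names = TypedKeyDict(str, beta=2)
names["alpha"] = 1

try:
    names[3] = "oops"
    rejected = False
except TypeError:
    rejected = True

# With homogeneous keys guaranteed, sorting them never needs a try/except.
ordered_keys = sorted(names)
```

A plain dict remains available for the general case, so the restriction is opt-in rather than "once and forever".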
> I fail to see how forbidding me to sort the list of keys of > any arbitrary dict will enhance my productivity in any way -- I thought it would catch the cases where you failed to build a key of the intended type. Maybe this is worse than what we have now, tho. I have to say that this wasn't the point of my message, so I don't care to discuss it. ... > Since when is Python about forbidding the user to do > quite normal things such as sorting the list of keys of > any arbitrary dictionary for more elegant display -- for > no practically useful purpose that I've ever seen offered, > in brisk violation of "practicality beats purity"? Well, I just don't like such an arbitrary thing, that a string is always bigger than an int. Since we don't allow them to be used as each other by coercion, we also should not compare them. Bean counts are bean counts, and names are names. One could go the AWK way, where ints and strings were converted whenever necessary, but that would be even worse. Maybe the way Python handles it is not so bad. But then it should be consistent and at least move complex objects into their own group in the sorted array, maybe just not sorting themselves. Anyway, this would also not increase your/my productivity in any way, so let's get back to real problems. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido@python.org Sat Mar 15 22:45:34 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 15 Mar 2003 17:45:34 -0500 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: "Your message of Sat, 15 Mar 2003 12:54:05 EST."
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> [Terry Reedy] > For user-defined types/classes, I presume that this would still mean > deferring to the appropriate magic method (__cmp__ or __ge__?) to > define 'meaningful'. Yes. And I'm still hoping to remove __cmp__; there should be only one way to overload comparisons. > > I'd like to to this in Python 3.0, but that probably means we'd have > > to start deprecating default comparisons except (in)equality in > Python > > 2.4. > > +1, I think. > > Based on reading cl.py, the validity of nonsense comparisons is one of > the more surprising 'features' of Python for beginners -- who > reasonably expect a TypeError or ValueError. Once they get past that, > they are then surprised by the unstability across versions. Given > that universal sorting of hetero-lists is now broken, I think it would > be better to do away with it cleanly. It is seldom needed and would > still be available with a user-defined sorting function (which > requires some thought as to what is really wanted). Exactly. > A Python version of the present algorithm could be included (in > Tools/xx perhaps) for anyone who actually needs it. I doubt there will be many takers. Let people make up their own version, so they know its behavior. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Mar 15 22:43:10 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 15 Mar 2003 17:43:10 -0500 Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: "Your message of Sat, 15 Mar 2003 19:07:35 +0100." 
<200303151907.35471.aleax@aleax.it> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303150857.53214.aleax@aleax.it> <3E734DD7.3080105@tismer.com> <200303151907.35471.aleax@aleax.it> Message-ID: <200303152243.h2FMhA706558@pcp02138704pcs.reston01.va.comcast.net> [Christian Tismer] > > that people are putting widely different types into > > a list in order to sort them. (Although there is an > > arbitrary order between strings and numbers, which > > I would drop in Python 2.4, too). [Alex Martelli] > Such a change would indeed enormously increase the > number of non-sortable (except by providing custom > compares) lists. So, for example, all places which get > and sort the list of keys in a dictionary in order to return > or display the keys should presumably do the sorting > within a try/except? I don't believe this argument. I've indeed often sorted a dict's keys (or values), but always in situations where the sorted values were homogeneous as far meaningful comparison goes, e.g. all numbers, or all strings, or all "compatible" tuples. > Or do you think a dictionary should also be constrained to have keys > that are all comparable with each other (i.e., presumably, never > both string AND number keys) as well as hashable? If you know *nothing* about the keys of a dict, you already have to do that if you want to sort the keys. There are lots of apps that have no need to ever sort the keys: if there weren't, it would have been wiser to keep the keys in sorted order in the first place, like ABC did. > I fail to see how forbidding me to sort the list of keys of > any arbitrary dict will enhance my productivity in any way -- > it's bad enough (in theory -- in practice it doesn't bite much > as complex numbers are used so rarely) with the complex > numbers thingy, why make it even worse by inventing a > novel "strings vs numbers" split? To the contrary, I don't see how it will reduce your productivity. 
You seem to be focusing on the wrong thing (sorting dict keys). The right thing to consider here is that "a < b" should only work if a and b can be meaningfully ordered, just like "a + b" only works if a and b can be meaningfully added. > Since when is Python about forbidding the user to do > quite normal things such as sorting the list of keys of > any arbitrary dictionary for more elegant display -- for > no practically useful purpose that I've ever seen offered, > in brisk violation of "practicality beats purity"? I doubt that elegant display of a dictionary with wildly incompatible keys is high on anybody's list of use cases. On the other hand, I'm sure that raising an exception on abominations like 2 < "1" or (1, 2) < 0 is a good thing, just like we all agree that forbidding 1 + "2" is a good thing. Of course, == and != will continue to accept objects of incongruous types -- these will simply be considered inequal. That's the cornerstone of dictionaries, and I see no reason to change this -- while I don't know whether 1 ought to be considered less than or greater than 1j, I damn well know they aren't equal! (And I'm specifically excluding gray areas like comparing tuples and lists. Given that (a, b) = [1, 2] now works, as does [].extend(()), it might be better to allow comparing tuples to lists, and even consider them equal if they have the same length and their items compare equal pairwise. This despite my position about the different idiomatic uses of the two types. And so the circle closes [see Subject]. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Sat Mar 15 23:44:36 2003 From: ark@research.att.com (Andrew Koenig) Date: 15 Mar 2003 18:44:36 -0500 Subject: [Python-Dev] Re: Re: lists v. 
tuples In-Reply-To: <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido> Yes. And I'm still hoping to remove __cmp__; there should be Guido> only one way to overload comparisons. Moreover, for some data structures, the __cmp__ approach can be expensive. For example, if you're comparing sequences of any kind, and you know that the comparison is for == or !=, you have your answer immediately if the sequences differ in length. If you don't know what's being tested, as you wouldn't inside __cmp__, you may spend a lot more time to obtain a result that will be thrown away. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From glyph@twistedmatrix.com Sun Mar 16 04:19:30 2003 From: glyph@twistedmatrix.com (Glyph Lefkowitz) Date: Sat, 15 Mar 2003 22:19:30 -0600 Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: <20030315063502.17965.57753.Mailman@mail.python.org> Message-ID: <7BFB4526-5766-11D7-806A-000393C9700E@twistedmatrix.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Saturday, March 15, 2003, at 12:35 AM, python-dev-request@python.org wrote: > Then I realize that C++ has exactly this feature (it's called "const"), > and that I find it to be an annoyance far more often than I find it > handy. And I begin to question. I have thought about this as well; I think that the problem is that in C++, you have to declare "const" *everywhere* -- you can't just pass a mutable data structure and have the "right thing" happen the way it most obviously should. 
Arguably when one is mucking about at such a low level as is common in
C++, this is something that you have to be really careful about, but I
still think that it's handled badly in the syntax.

Imagine for a moment that dictionaries and lists in Python had a "const"
method which would immutabilify them (if that's even a word). The
following example:

>> const char* strrev(const char* torev) {
>>     // reverse the string
>> }
>> ...
>> char* x = "1234";
>> char* y = (char*) strrev((const char*)x); // tell me I've been bad, g++!
>> ...

As opposed to:

>> def strrev(s):
>>     "Please pass me immutable data!"
>>     # reverse the string
>> ...
>> x = '1234'
>> y = strrev(x.const()).copy()
>> ...

I think that the latter is far less likely to annoy. Of course, in this
hypothetical example, I can design all kinds of convenient behavior for
these Python mutability operations to have, like 'copy' always returning
a mutable shallow copy of the data structure in question, and '.const()'
making an object immutable and then returning 'self'...

This smells like another unformed PEP I don't have the time to think
about or implement :-(, but I would definitely like to see mutability
guarantees worm their way into the language at some point, too.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (Darwin)

iD8DBQE+c/tWvVGR4uSOE2wRAmeoAJ9tSOYOKTCxcl6Aj6reelmFU8OafwCggcNY
smKTK1+HRCCEC9Pl/mhE4cI=
=cMYA
-----END PGP SIGNATURE-----

From tim.one@comcast.net Sun Mar 16 04:36:28 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 15 Mar 2003 23:36:28 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

[Guido]
> Yes. And I'm still hoping to remove __cmp__; there should be only one
> way to overload comparisons.

As long as we're going to break everyone's code, list.sort(f) should also
be redefined then so that f returns a Boolean, f(a, b) meaning a is less
than b.
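A sketch of what a less-than-only sort callback could look like, assuming a modern Python 3 interpreter, where list.sort() and sorted() take a key function rather than a comparison function. The adapter name lt_to_key is hypothetical; it relies only on the fact that the sort compares items with < and nothing else.

```python
def lt_to_key(lt):
    """Hypothetical adapter: wrap a boolean less-than predicate as a sort key."""

    class Key:
        __slots__ = ("obj",)

        def __init__(self, obj):
            self.obj = obj

        def __lt__(self, other):
            # The sort only ever asks "is self less than other?".
            return lt(self.obj, other.obj)

    return Key


def shorter(a, b):
    """Boolean predicate in the style suggested above: True when a sorts before b."""
    return len(a) < len(b)


words = ["pear", "fig", "banana", "kiwi"]
by_length = sorted(words, key=lt_to_key(shorter))
```

Because the sort is stable, "pear" stays ahead of "kiwi": the predicate reports neither as less than the other, so their original order is kept.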
list.sort()'s implementation uses only less-than already, and it seems that all newcomers to Python who didn't come by way of C or Perl (same thing in this respect ) expect sort's comparison function to work that way. From tim.one@comcast.net Sun Mar 16 05:07:47 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 16 Mar 2003 00:07:47 -0500 Subject: [Python-Dev] tzset In-Reply-To: Message-ID: [Brett Cannon] > If there is one thing I have learned from writing _strptime is that you > cannot be strict in the slightest for your input when it comes to > time-based data. I think this is another case where you need to be loose > about input and strict with output. Python doesn't do anything with TZ's value -- it doesn't even look to see whether TZ is set, let alone parse it (well, Python's obsolete tzparse module parses TZ's value, but the new code in question does not). The cross-platform semantics of TZ are a joke. The tests we have rely on non-standard extensions (viewing POSIX as the only definitive std here). Even if they stuffed colons at the front, POSIX leaves the interpretation of colon-initiated TZ values entirely up to the implementation: If TZ is of the first format (that is, if the first character is a colon), the characters following the colon are handled in an implementation-defined manner. Worse, if the platform tzset() isn't happy with TZ's value, it has no way to tell you: the function is declared void, and has no defined effects on errno. I hope the community takes up the challenge of building a sane cross-platform time zone facility building on 2.3 datetime's tzinfo objects. From tim.one@comcast.net Sun Mar 16 05:28:04 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 16 Mar 2003 00:28:04 -0500 Subject: [Python-Dev] tzset In-Reply-To: <2ADB787D-56B0-11D7-9648-000393B63DDC@shangri-la.dropbear.id.au> Message-ID: [Stuart Bishop] > ... 
> To me, this is the useful format as all the others require you to > know your DST transition times rather rely of the OS to supply them. But since there are no defined names in POSIX, supplying transition rules explicitly via the second POSIX format is the only way that has a shot at being portable. > At the moment if the 'path to a tzfile(5)' format is not accepted, your > tzset(3) is considered broken and time.tzset not built. I'll let the Unix weenies straighten out their own mess here . > I'm happy to rewrite the detection in configure.in and the test in > test_time.py to lower the bar on this, but I think a better solution > may be to determine if Windows has a format that lets us to DST > calculations and keep the bar high. I couldn't parse that, but I've got no interest in exposing the Windows version of tzset() to Python users regardless (it's a lame effort to mimic part of the Unixish TZ gimmicks; the Win32 API has a richer way to deal with time zones, which doesn't use environment variables). From arigo@tunes.org Sun Mar 16 07:33:03 2003 From: arigo@tunes.org (Armin Rigo) Date: Sat, 15 Mar 2003 23:33:03 -0800 (PST) Subject: [Python-Dev] PyEval_GetFrame() revisited In-Reply-To: <3E7219D1.6090306@tismer.com>; from tismer@tismer.com on Fri, Mar 14, 2003 at 07:05:05PM +0100 References: <3E7204EC.60506@tismer.com> <3E7219D1.6090306@tismer.com> Message-ID: <20030316073303.31B3E5147@bespin.org> Hello Christian, On Fri, Mar 14, 2003 at 07:05:05PM +0100, Christian Tismer wrote: > > where the PyEval_GetGLobals is used instead of > > tstate->frame->f_globals > Ah!! > Can it be that PyEval_GetFrame() is just indended > to signal to an extension like Psyco that it needs > to quickly invent a frame now? Yes, indeed. This was a very limited hack so that the frame would get the correct locals even in the presence of Psyco. Now I realize that it may have been pointless anyway, if this dummy frame is never really used but for tracebacks. 
Maybe an API to manipulate tstate->frame could be useful and really
lightweight. Alternatively, we could consider what pyexpat does as a
general pattern and have an API for it, e.g.:

  PyFrame_Push(PyFrameObject* f)
    -> pushes 'f' on the frame stack, assert()ing that f->f_back is
       tstate->frame or pushes a new placeholder frame if 'f' is NULL.
       This also calls the profile and trace hooks.

  PyFrame_Pop()
    -> pops the frame, calling profile and trace hooks, and recording
       a traceback if PyErr_Occurred().

and maybe a PyFrame_FromC() function that creates a placeholder with
controllable parameters as in pyexpat.c:getcode().

A bientôt,

Armin.

From guido@python.org Sun Mar 16 12:32:04 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 16 Mar 2003 07:32:04 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: "Your message of 15 Mar 2003 18:44:36 EST."
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net>

> Guido> Yes. And I'm still hoping to remove __cmp__; there should be
> Guido> only one way to overload comparisons.
>
> Moreover, for some data structures, the __cmp__ approach can be
> expensive. For example, if you're comparing sequences of any kind,
> and you know that the comparison is for == or !=, you have your answer
> immediately if the sequences differ in length. If you don't know
> what's being tested, as you wouldn't inside __cmp__, you may spend a
> lot more time to obtain a result that will be thrown away.

Yes. OTOH, as long as cmp() is in the language, these same situations
are more efficiently done by a __cmp__ implementation than by calling
__lt__ and then __eq__ or similar (it's hard to decide which order is
best).
So cmp() should be removed at the same time as __cmp__.

And then we should also change list.sort(), as Tim points out. Maybe we
can start introducing this earlier by using keyword arguments:

  list.sort(lt=function)    sorts using a < implementation
  list.sort(cmp=function)   sorts using a __cmp__ implementation

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Sun Mar 16 13:06:02 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 16 Mar 2003 08:06:02 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: "Your message of Sun, 16 Mar 2003 07:32:04 EST." <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net>

> > Guido> Yes. And I'm still hoping to remove __cmp__; there should be
> > Guido> only one way to overload comparisons.

[Andrew]
> > Moreover, for some data structures, the __cmp__ approach can be
> > expensive. For example, if you're comparing sequences of any kind,
> > and you know that the comparison is for == or !=, you have your answer
> > immediately if the sequences differ in length. If you don't know
> > what's being tested, as you wouldn't inside __cmp__, you may spend a
> > lot more time to obtain a result that will be thrown away.

[Guido]
> Yes. OTOH, as long as cmp() is in the language, these same situations
> are more efficiently done by a __cmp__ implementation than by calling
> __lt__ and then __eq__ or similar (it's hard to decide which order is
> best). So cmp() should be removed at the same time as __cmp__.
I realized the first sentence wasn't very clear. I meant that implementing cmp() is inefficient without __cmp__ for some types (especially container types). Example: cmp(range(1000)+[1], range(1000)+[0]) If the list type implements __cmp__, each of the pairs of items is compared once. OTOH, if the list type only implemented __lt__, __eq__ and __gt__, cmp() presumably would have to try one of those first, and then another one. If it picked __lt__ followed by __eq__, it would get two False results in a row, meaning it could return 1 (cmp() doesn't really expect incomparable results :-), but at the cost of comparing each pair of items twice. If cmp() picked another set of two operators to try, I'd simply adjust the example. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Sun Mar 16 15:37:14 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 16 Mar 2003 10:37:14 -0500 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030316153714.GA17944@panix.com> On Sun, Mar 16, 2003, Guido van Rossum wrote: > > I realized the first sentence wasn't very clear. I meant that > implementing cmp() is inefficient without __cmp__ for some types > (especially container types). Example: > > cmp(range(1000)+[1], range(1000)+[0]) > > If the list type implements __cmp__, each of the pairs of items is > compared once. 
OTOH, if the list type only implemented __lt__, __eq__ > and __gt__, cmp() presumably would have to try one of those first, and > then another one. If it picked __lt__ followed by __eq__, it would > get two False results in a row, meaning it could return 1 (cmp() > doesn't really expect incomparable results :-), but at the cost of > comparing each pair of items twice. If cmp() picked another set of > two operators to try, I'd simply adjust the example. That's something I've been thinking about. I use cmp() for that purpose in the BCD module, because I do need the 3-way result (and it appears that Eric kept that). OTOH, it's certainly easy enough to define a cmp() function, and not having the builtin wouldn't kill performance. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Register for PyCon now! http://www.python.org/pycon/reg.html From ark@research.att.com Sun Mar 16 16:02:13 2003 From: ark@research.att.com (Andrew Koenig) Date: Sun, 16 Mar 2003 11:02:13 -0500 (EST) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Sun, 16 Mar 2003 08:06:02 -0500) References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303161602.h2GG2DO00056@europa.research.att.com> Guido> I realized the first sentence wasn't very clear. I meant that Guido> implementing cmp() is inefficient without __cmp__ for some types Guido> (especially container types). 
Example: Guido> cmp(range(1000)+[1], range(1000)+[0]) Guido> If the list type implements __cmp__, each of the pairs of items Guido> is compared once. OTOH, if the list type only implemented Guido> __lt__, __eq__ and __gt__, cmp() presumably would have to try Guido> one of those first, and then another one. If it picked __lt__ Guido> followed by __eq__, it would get two False results in a row, Guido> meaning it could return 1 (cmp() doesn't really expect Guido> incomparable results :-), but at the cost of comparing each Guido> pair of items twice. If cmp() picked another set of two Guido> operators to try, I'd simply adjust the example. Yes. If you want to present a 3-way comparison to users, an underlying 3-way comparison is the fastest way to do it. The trouble is that a 3-way comparison is definitely not the fastest way to present a 2-way comparison to users. So if you want users to see separate 2-way and 3-way comparisons, I think the fastest way to implement them is not to try to force commonality where none exists. From ark@research.att.com Sun Mar 16 18:59:30 2003 From: ark@research.att.com (Andrew Koenig) Date: 16 Mar 2003 13:59:30 -0500 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: References: Message-ID: Tim> [Guido] >> Yes. And I'm still hoping to remove __cmp__; there should be only one >> way to overload comparisons. Tim> As long as we're going to break everyone's code, list.sort(f) Tim> should also be redefined then so that f returns a Boolean, f(a, Tim> b) meaning a is less than b. I don't think it's necessary to break code in order to accommodate that change, as long as you're willing to tolerate one extra comparison per call to sort, plus a small amount of additional overhead. As I understand it, the problem is to distinguish between a function that returns negative, zero, or positive, depending on the result of the comparison, and a function that returns true or false. 
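The two kinds of comparison function distinguished here, together with the probe spelled out in the remainder of this message (call f(a, b), and if it is nonzero call f(b, a) as well), can be sketched as follows for a modern Python 3 interpreter. The names three_way, less_than and classify are made up for illustration, and the sketch classifies a single pair rather than hooking into a real sort.

```python
def three_way(a, b):
    """Old-style comparison: negative, zero, or positive."""
    return (a > b) - (a < b)


def less_than(a, b):
    """Boolean predicate: true exactly when a sorts before b."""
    return a < b


def classify(f, a, b):
    """Probe f with one pair, following the approach described in this message.

    Returns None when f(a, b) is zero or false, since a < b is then false
    under either reading and the pair reveals nothing about f.
    """
    if not f(a, b):
        return None
    # f(a, b) was nonzero. A boolean predicate answers false for the swapped
    # pair, while a three-way function answers with the opposite sign.
    return "three-way" if f(b, a) else "boolean"


kind_of_three_way = classify(three_way, 1, 2)
kind_of_less_than = classify(less_than, 1, 2)
uninformative = classify(less_than, 2, 1)
```

The extra call happens once, on the first informative pair, which is where the "one extra comparison per call to sort" mentioned above comes from.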
So if we had a way to determine efficiently which kind of function the user supplied, we could maintain compatibility.

Imagine, then, that we have a function f, and we want to figure out which kind of function it is. Assume, furthermore, that the only kind of comparisons we want to perform is to determine whether a < b for various values of a and b.

Note first that whenever f(a, b) returns 0, we don't care which kind of function f is, because a < b will be false in either case.

So we allow our sort algorithm to run until the first time a call to f(a, b) returns a nonzero value. Now we can determine what kind of function f is by calling f(b, a). If f(b, a) is zero, then f is a boolean predicate. If f(b, a) is nonzero, then f returns negative/zero/positive -- and, incidentally, f(b, a) had better have the opposite sign from f(a, b).

I understand that there is some overhead involved in storing the information about which kind of comparison it is, and testing it on each comparison. I suspect, however, that that overhead can be made small compared to the overhead involved in calling the comparison function itself.

--
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark

From thomas@xs4all.net Sun Mar 16 19:42:00 2003
From: thomas@xs4all.net (Thomas Wouters)
Date: Sun, 16 Mar 2003 20:42:00 +0100
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20030316194200.GO2112@xs4all.nl>

On Sun, Mar 16, 2003 at 07:32:04AM -0500, Guido van Rossum wrote:

> > Guido> Yes.
And I'm still hoping to remove __cmp__; there should be
> > Guido> only one way to overload comparisons.

[ Andrew Koenig ]
> > Moreover, for some data structures, the __cmp__ approach can be
> > expensive. For example, if you're comparing sequences of any kind,
> > and you know that the comparison is for == or !=, you have your answer
> > immediately if the sequences differ in length. If you don't know
> > what's being tested, as you wouldn't inside __cmp__, you may spend a
> > lot more time to obtain a result that will be thrown away.

> Yes. OTOH, as long as cmp() is in the language, these same situations
> are more efficiently done by a __cmp__ implementation than by calling
> __lt__ and then __eq__ or similar (it's hard to decide which order is
> best). So cmp() should be removed at the same time as __cmp__.

I'm confused, as well as conflicted. I know I'm not well educated in language design or mathematics, and I realize that comparisons between types don't always make *mathematical* sense, but to go from there to removing type-agnostic (not the right term, but bear with me) list-sorting and three-way comparison altogether is really a big jump, and one I really don't agree with.

I find being able to sort (true) heterogeneous lists in a consistent if not 'purely' sensible manner to be quite useful at times, and all other times I already know I have a homogeneous list and don't care about it. It's a practical approach because I don't have to think about how it's going to be sorted, I don't have to take every edge case into account, and I don't have to know in advance what my list is going to contain (or update all calls to 'sort' when I find I have to add a conflicting type to the list.) I do not see how this is harmful; the cases I've seen where people bump into this the hard way (e.g. doing "0" < 1) were fundamental misunderstandings that would be corrected in a dozen other ways.
Allowing 'senseless' comparisons does not strike me as a major source of confusion or bad code. I was uneasy about the change in complex number comparison, but I didn't mind that, then, because it is a very rarely used feature to start with and when looking at it from a 'unified number' point of view, it makes perfect sense. But the latter does not apply to most other types, and I don't believe it should. My defensive programming nature (I write Perl for a living, if I wasn't defensive by nature I'd have committed suicide by now) would probably make me always use a 'useful sorter' function, possibly by using subclasses of list (so I could guard against other subtle changes, too, by changing one utility library, tw.Tools. Yuck.) I really don't like how that affects the readability of the code. I'd feel better about disallowing '==' for floating point numbers, as I can see why that is a problem. But I don't feel good about that either ;) I really like how most Python objects are convenient. Lists grow when you need them to, slices do what you think they do, dicts can take any (hashable) object as a key (boy, do I miss that in Perl), mathematical operations work with simple operators even between types, objects, instances, classes and modules all behave consistently and have consistent syntax. Yes, Python has some odd quirks, some of which require a comment or two when writing code that will be read by people with little or no Python knowledge (e.g. my colleagues.) But I believe adding a small comment like "'global' is necessary to *assign* to variables in the module namespace" or "'%(var)s' is how we say '$var'" or "'x and y or s' is like 'x ? y : s' if y is true, which it always is here" or any of the half-dozen other things I can imagine, not counting oddities in standard modules, is preferable over forcing the syntax or restricting the usage to try and 'solve' the quirks. > And then we should also change list.sort(), as Tim points out. 
Maybe
> we can start introducing this earlier by using keyword arguments:
> list.sort(lt=function) sorts using a < implementation
> list.sort(cmp=function) sorts using a __cmp__ implementation

Perhaps we need stricter prototypes, that define the return value. Or properties on (or classes of) functions, so we can tell whether a function implements the less-than interface, or the three-way one. It would definitely *look* better than the above ;)

Practically-beats-this-idea-in-my-eyes'ly y'rs ;)
--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido@python.org Sun Mar 16 20:34:17 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 16 Mar 2003 15:34:17 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: "Your message of Sun, 16 Mar 2003 11:02:13 EST." <200303161602.h2GG2DO00056@europa.research.att.com>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> <200303161602.h2GG2DO00056@europa.research.att.com>
Message-ID: <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net>

> Guido> I realized the first sentence wasn't very clear. I meant that
> Guido> implementing cmp() is inefficient without __cmp__ for some types
> Guido> (especially container types). Example:
>
> Guido> cmp(range(1000)+[1], range(1000)+[0])
>
> Guido> If the list type implements __cmp__, each of the pairs of items
> Guido> is compared once. OTOH, if the list type only implemented
> Guido> __lt__, __eq__ and __gt__, cmp() presumably would have to try
> Guido> one of those first, and then another one.
If it picked __lt__ > Guido> followed by __eq__, it would get two False results in a row, > Guido> meaning it could return 1 (cmp() doesn't really expect > Guido> incomparable results :-), but at the cost of comparing each > Guido> pair of items twice. If cmp() picked another set of two > Guido> operators to try, I'd simply adjust the example. [Andrew Koenig] > Yes. If you want to present a 3-way comparison to users, an > underlying 3-way comparison is the fastest way to do it. The trouble > is that a 3-way comparison is definitely not the fastest way to > present a 2-way comparison to users. > > So if you want users to see separate 2-way and 3-way comparisons, > I think the fastest way to implement them is not to try to force > commonality where none exists. This seems an argument for keeping both __cmp__ and the six __lt__ etc. Yet TOOWTDI makes me want to get rid of __cmp__. I wonder, what's the need for cmp()? My hunch is that the main reason for cmp() is that it's specified in various APIs -- e.g. list.sort(), or FP hardware. But don't those APIs usually specify cmp() because their designers (mistakenly) believed the three different outcomes were easy to compute together and it would simplify the API? --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Sun Mar 16 20:43:49 2003 From: python@rcn.com (Raymond Hettinger) Date: Sun, 16 Mar 2003 15:43:49 -0500 Subject: [Python-Dev] Re: Re: lists v. 
tuples References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> <200303161602.h2GG2DO00056@europa.research.att.com> <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <002501c2ebfc$d68f59c0$125ffea9@oemcomputer> > [Andrew Koenig] > > Yes. If you want to present a 3-way comparison to users, an > > underlying 3-way comparison is the fastest way to do it. The trouble > > is that a 3-way comparison is definitely not the fastest way to > > present a 2-way comparison to users. > > > > So if you want users to see separate 2-way and 3-way comparisons, > > I think the fastest way to implement them is not to try to force > > commonality where none exists. > > This seems an argument for keeping both __cmp__ and the six __lt__ > etc. Yet TOOWTDI makes me want to get rid of __cmp__. Recent experience with sets.py shows that __cmp__ has a high PITA factor when combined rich comparisons. There was no good way to produce all of the desired behaviors: * <, <=, >, >= having subset interpretations * __cmp__ being marked as not implemented * cmp(a,b) not by-passing __cmp__ when __lt__ and __eq__ were defined. The source of the complications is that comparing Set('a') and Set('b') returns False for *all* of <, <=, ==, >=, >. Internally, three-way compares relied on the falsehood of some implying the truth of others. 
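A minimal sketch of the situation described above, written for a later Python and using the built-in frozenset as a stand-in for sets.Set (an assumption; this is not code from sets.py). The naive_cmp helper is hypothetical and only illustrates how a three-way result inferred from < and == goes wrong when every comparison is False:

```python
# Sketch only: frozenset stands in for sets.Set, and naive_cmp is a
# hypothetical helper, not the interpreter's actual cmp() fallback.

def naive_cmp(a, b):
    """Three-way compare built from < and ==, assuming a total order."""
    if a < b:
        return -1
    if a == b:
        return 0
    # "Neither less nor equal, so it must be greater" is only valid
    # when the operands are actually comparable.
    return 1

a, b = frozenset("a"), frozenset("b")

# With subset semantics, neither set is a subset of the other.
results = [a < b, a <= b, a == b, a >= b, a > b]
print(results)

# The helper ends up claiming that each operand is greater than the other.
print(naive_cmp(a, b), naive_cmp(b, a))
```

All five comparisons come out False here, and both naive_cmp calls return 1, which is the kind of contradiction a three-way compare derived from rich comparisons runs into.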
Raymond Hettinger

From pedronis@bluewin.ch Sun Mar 16 20:48:11 2003
From: pedronis@bluewin.ch (Samuele Pedroni)
Date: Sun, 16 Mar 2003 21:48:11 +0100
Subject: [Python-Dev] Re: Re: lists v. tuples
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> <200303161602.h2GG2DO00056@europa.research.att.com> <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <028801c2ebfd$5bc6be80$6d94fea9@newmexico>

From: "Guido van Rossum"
>
> This seems an argument for keeping both __cmp__ and the six __lt__
> etc. Yet TOOWTDI makes me want to get rid of __cmp__.
>

one minor problem with the six __lt__ etc is that they should be all defined. For quick things (although I know better) I still define just __cmp__ out of laziness.

regards

From python@rcn.com Sun Mar 16 21:59:56 2003
From: python@rcn.com (Raymond Hettinger)
Date: Sun, 16 Mar 2003 16:59:56 -0500
Subject: [Python-Dev] Re: Re: lists v.
tuples
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> <200303161602.h2GG2DO00056@europa.research.att.com> <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net> <028801c2ebfd$5bc6be80$6d94fea9@newmexico>
Message-ID: <001701c2ec07$624f8520$125ffea9@oemcomputer>

> one minor problem with the six __lt__ etc is that they should be all defined.
> For quick things (although I know better) I still define just __cmp__ out of
> laziness.

Sometime back, I proposed a mixin for this.

    class C(CompareMixin):
        def __eq__(self, other): ...
        def __lt__(self, other): ...

The __eq__ by itself is enough to get __ne__ defined for you. Defining both __eq__ and __lt__ gets you all the rest.

Raymond Hettinger

From tim.one@comcast.net Sun Mar 16 22:09:29 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 16 Mar 2003 17:09:29 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: <028801c2ebfd$5bc6be80$6d94fea9@newmexico>
Message-ID:

[Samuele Pedroni]
> one minor problem with the six __lt__ etc is that they should be
> all defined. For quick things (although I know better) I still define
> just __cmp__ out of laziness.

Or out of sanity. 2.3's datetime type is interesting that way.
The Python implementation of that (which lives in Zope3, and in a Python sandbox, and which you may want to use for Jython) now has lots of trivial variations of

    def __le__(self, other):
        if isinstance(other, date):
            return self.__cmp(other) <= 0
        elif hasattr(other, "timetuple"):
            return NotImplemented
        else:
            _cmperror(self, other)

Before 2.3a2, it just defined __cmp__ and so avoided this code near-duplication, but then we decided it would be better to let == and != return False and True (respectively, and instead of raising TypeError) when mixing a date or time or datetime with some other type.

From tim.one@comcast.net Sun Mar 16 22:32:15 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 16 Mar 2003 17:32:15 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

[Guido]
> ...
> I wonder, what's the need for cmp()? My hunch is that the main reason
> for cmp() is that it's specified in various APIs -- e.g. list.sort(),
> or FP hardware. But don't those APIs usually specify cmp() because
> their designers (mistakenly) believed the three different outcomes
> were easy to compute together and it would simplify the API?

The three possible outcomes from lexicographic comparison are natural to compute together, though (compare elements left to right until hitting the first non-equal element compare). I expect C's designers had string comparison mostly in mind, and it's natural for lots of search structures to need to know which of the three outcomes obtains. For example, probing a vanilla binary search tree needs to stop when it hits a node with key equal to the thing searched for, or move left or right when != obtains.

Long int comparison is a variant of lexicographic comparison, and this problem shows up repeatedly in a multitude of guises: you have positive long ints x and y, and want to find the long int q closest to x/y.
    q, r = divmod(x, y)
    # round nearest/even
    if 2*r > y or (q & 1 and 2*r == y):
        q += 1

is more expensive than necessary when the "q & 1 and 2*r == y" part holds: the "2*r > y" part had to compare 2*r to y all the way to the end to deduce that > wasn't the answer, and then you do it all over again to deduce that equality is the right answer.

    q, r = divmod(x, y)
    c = cmp(2*r, y)
    if c > 0 or (q & 1 and c == 0):
        q += 1

is faster, and Python's long_compare() doesn't do any more work than is really needed by this algorithm.

So sometimes cmp() is exactly what you want. OTOH, if Python never had it, the efficiency gains in such cases probably aren't common enough that a compelling case for adding it could have been made.

From drifty@alum.berkeley.edu Sun Mar 16 23:25:15 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Sun, 16 Mar 2003 15:25:15 -0800 (PST)
Subject: [Python-Dev] tzset
In-Reply-To:
References:
Message-ID:

[Tim Peters]
> The cross-platform semantics of TZ are a joke. The tests we have rely on
> non-standard extensions (viewing POSIX as the only definitive std here).
> Even if they stuffed colons at the front, POSIX leaves the interpretation of
> colon-initiated TZ values entirely up to the implementation:
>
>     If TZ is of the first format (that is, if the first character is a
>     colon), the characters following the colon are handled in an
>     implementation-defined manner.
>
> Worse, if the platform tzset() isn't happy with TZ's value, it has no way to
> tell you: the function is declared void, and has no defined effects on
> errno.
>

If this thing is so broken, why are we bothering with it? It's one thing to want to give people access to facilities that do something useful; it's another thing entirely to give them access to something that is broken.

Perhaps if we are going to bother to make this available the work should be done to make it have more standard output? So take whatever the C function returns and then make it conform to some reasonable output.
-Brett From greg@cosc.canterbury.ac.nz Sun Mar 16 23:45:40 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Mar 2003 11:45:40 +1200 (NZST) Subject: [Python-Dev] Re: lists v. tuples In-Reply-To: <200303150857.53214.aleax@aleax.it> Message-ID: <200303162345.h2GNjeH03921@oma.cosc.canterbury.ac.nz> Alex Martelli : > So, lists whose elements have LESS in common (by being of > widely different types) are more likely to be sortable than lists > some of whose elements have in common the fact of being > numbers (if one or more of those numbers are complex). As I think I've mentioned before, Python really needs two different kinds of comparison: one which does whatever makes sense for objects of compatible types (and which need not be supported by all types), and another which imposes an arbitrary order on all objects. When sorting a list, you would have to specify which kind of ordering you wanted. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Sun Mar 16 23:54:54 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Mar 2003 11:54:54 +1200 (NZST) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303162354.h2GNssR04071@oma.cosc.canterbury.ac.nz> Guido: > And I'm still hoping to remove __cmp__; there should be only one > way to overload comparisons. I'd rather you kept it and re-defined it to mean "compare for arbitrary ordering". (Maybe change its name if there are backwards-compatibility issues.) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Mar 17 00:37:20 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Mar 2003 12:37:20 +1200 (NZST) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303170037.h2H0bKf04632@oma.cosc.canterbury.ac.nz> Guido: > But don't those APIs usually specify cmp() because > their designers (mistakenly) believed the three different outcomes > were easy to compute together and it would simplify the API? I reckon it all goes back to Fortran with its IF (X) 10,20,30 statement. Maybe the first Fortran machine had a 3-way jump instruction? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Mon Mar 17 01:43:33 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 16 Mar 2003 20:43:33 -0500 Subject: [Python-Dev] tzset In-Reply-To: "Your message of Sun, 16 Mar 2003 15:25:15 PST." References: Message-ID: <200303170143.h2H1hXH16557@pcp02138704pcs.reston01.va.comcast.net> > If this thing is so broken, why are we bothering with it? It's one thing > to want to give people access to facilities that do something useful; it's > another thing entirely to give them access to something that is broken. > > Perhaps if we are going to bother to make this available the work should > be done to make it have more standard output? So take whatever the C > function returns and then make it conform to some reasonable output. I look at it differently. It's useful to make the platform tzset() available, because it lets us do something that couldn't be done before: change the definition of local time without restarting Python. 
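For illustration only, a small sketch of what exposing tzset() allows, assuming a Unix platform where time.tzset() exists and a POSIX-style TZ string that the platform accepts. The helper name local_utc_offset_seconds is made up for this example:

```python
import os
import time

def local_utc_offset_seconds(tz_value):
    """Set TZ, call the platform tzset(), and report time.timezone.

    Hypothetical helper for illustration; it restores the previous TZ
    setting before returning. Returns None where tzset() is unavailable.
    """
    if not hasattr(time, "tzset"):  # e.g. Windows builds
        return None
    saved = os.environ.get("TZ")
    try:
        os.environ["TZ"] = tz_value
        time.tzset()  # re-read TZ without restarting the interpreter
        return time.timezone
    finally:
        # Put the environment back the way we found it.
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()

print(local_utc_offset_seconds("UTC0"))
print(local_utc_offset_seconds("EST5EDT"))
```

As noted elsewhere in this thread, tzset() cannot report a TZ value it does not understand, so results for anything beyond plain POSIX-format strings are platform-dependent.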
If tzset() doesn't take standardized arguments, that's the problem of whoever wants to use it. There are lots of functions that have this: for example, anything taking a filename. At least it's there.

The test suite for tzset() probably is too strict; we'll tune it to avoid failures on common platforms during the beta cycle.

I don't know if it makes sense to provide tzset() on Windows; from Tim's description it doesn't sound likely.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Mon Mar 17 01:50:40 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 16 Mar 2003 20:50:40 -0500
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: "Your message of Mon, 17 Mar 2003 11:54:54 +1200." <200303162354.h2GNssR04071@oma.cosc.canterbury.ac.nz>
References: <200303162354.h2GNssR04071@oma.cosc.canterbury.ac.nz>
Message-ID: <200303170150.h2H1of116581@pcp02138704pcs.reston01.va.comcast.net>

> Guido:
> > And I'm still hoping to remove __cmp__; there should be only one
> > way to overload comparisons.

[Greg]
> I'd rather you kept it and re-defined it to mean
> "compare for arbitrary ordering". (Maybe change its
> name if there are backwards-compatibility issues.)

Hm, that's not what it does now, and an arbitrary ordering is better defined by a "less" style operator. I've been thinking of __before__ and a built-in before(x, y) -> bool. (Not __less__ / less, because IMO that's too close to __lt__ / <.)

BTW, there are two possible uses for before(): it could be used to impose an arbitrary ordering for types that don't have one now (like complex); and it could be used to impose an ordering between different types (like numbers and strings). I've got a gut feeling that the requirements for these are somewhat different, but can't quite pinpoint it.
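One way to picture the two uses, as a rough sketch for a later Python. before() here is a hypothetical function, not an existing built-in, and ordering unrelated types by type name is an arbitrary choice made only for the example:

```python
from functools import cmp_to_key

def before(x, y):
    """Hypothetical arbitrary-but-consistent ordering (not a real builtin)."""
    name_x, name_y = type(x).__name__, type(y).__name__
    if name_x != name_y:
        # Use 2: order between different types, here simply by type name.
        return name_x < name_y
    if isinstance(x, complex):
        # Use 1: give complex an arbitrary order (real part, then imaginary).
        return (x.real, x.imag) < (y.real, y.imag)
    return x < y

def _three_way(a, b):
    # Adapter so that sorted() can be driven by the two-way before().
    if before(a, b):
        return -1
    if before(b, a):
        return 1
    return 0

def sort_with_before(items):
    return sorted(items, key=cmp_to_key(_three_way))

mixed = ["pear", 3, 2 + 1j, "apple", 1, 1 + 5j]
print(sort_with_before(mixed))
```

Grouping by type name also separates 1 from 1.0, which hints at why the requirements for the two uses may differ.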
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Mar 17 02:17:23 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 16 Mar 2003 21:17:23 -0500 Subject: [Python-Dev] tzset In-Reply-To: <200303170143.h2H1hXH16557@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido] > ... > I don't know if it makes sense to provide tzset() on Windows; from > Tim's description it doesn't sound likely. I wouldn't object if someone else wanted to do the work (which includes documenting it well enough to cut off an endless stream of obvious questions). The Windows tzset is weak but maybe usable for some people. For example, time zone names must be exactly 3 characters, and you can't tell the Windows tzset when daylight time begins or ends: it uses US rules no matter what the time zone. The native Win32 SetTimeZoneInformation() doesn't suffer these idiocies, but I'm not sure whether calling that affects the Unixish _tzname (etc) variables. "Doing the work" also means figuring out all that stuff. From zen@shangri-la.dropbear.id.au Mon Mar 17 03:51:42 2003 From: zen@shangri-la.dropbear.id.au (Stuart Bishop) Date: Mon, 17 Mar 2003 14:51:42 +1100 Subject: [Python-Dev] tzset In-Reply-To: Message-ID: On Sunday, March 16, 2003, at 04:07 PM, Tim Peters wrote: > Worse, if the platform tzset() isn't happy with TZ's value, it has no > way to > tell you: the function is declared void, and has no defined effects on > errno. Yup. It sucks, but is the best there is. I can't even find proprietary solutions for various Unix flavours. Maybe a post to Slashdot saying Zope 3 will be Windows only due to limitations in POSIX would at least get something for the free distros :-) > I hope the community takes up the challenge of building a sane > cross-platform time zone facility building on 2.3 datetime's tzinfo > objects. 
A cross-platform time zone facility isn't a problem - the data we need is available and maintained as part of numerous free Unix distributions. We could even steal C code to decode it if we are particularly lazy. The trick is that updates to countries' timezone changes don't follow the Python release schedule, and I think this was covered in depth on python-dev not long ago in excruciating detail that I'm sure no one wants to repeat :-)

So the actual problem would be how to distribute data file updates to Python installations, which would also mean we could support the various ISO standards relating to things like country codes and languages (which I'm sure many of us are currently doing manually). Possibly a script that could be run as the user-who-owns-the-python-installation to update from source forge, with python-announce as the notification channel when files are updated?

--
Stuart Bishop
http://shangri-la.dropbear.id.au/

From aleax@aleax.it Mon Mar 17 07:25:48 2003
From: aleax@aleax.it (Alex Martelli)
Date: Mon, 17 Mar 2003 08:25:48 +0100
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: <200303170150.h2H1of116581@pcp02138704pcs.reston01.va.comcast.net>
References: <200303162354.h2GNssR04071@oma.cosc.canterbury.ac.nz> <200303170150.h2H1of116581@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200303170825.48556.aleax@aleax.it>

On Monday 17 March 2003 02:50 am, Guido van Rossum wrote:
> > Guido:
> > > And I'm still hoping to remove __cmp__; there should be only one
> > > way to overload comparisons.
>
> [Greg]
>
> > I'd rather you kept it and re-defined it to mean
> > "compare for arbitrary ordering". (Maybe change its
> > name if there are backwards-compatibility issues.)
>
> Hm, that's not what it does now, and an arbitrary ordering is better
> defined by a "less" style operator.

+1.
I entirely agree that any ordering is easier to define with a 2-way comparison with the usual constraints of ordering, i.e., for any x, y, before(x,x) is false, before(x,y) implies not before(y,x), before(x,y) and before(y,z) implies before(x,z) and for this specific purpose of arbitrary ordering, it would, I think, be necessary for 'before' to define a total ordering, i.e. the implied equivalence being equality, i.e. not before(x,y) and not before(y,x) imply x==y (This latter desideratum may be a source of problems, see below). It would also be very nice if before(x,y) were the same as x I've been thinking of __before__ and a built-in before(x, y) -> bool. > (Not __less__ / less, because IMO that's to close to __lt__ / <.) I love the name 'before' and I entirely concur with your decision to avoid the name 'less'. > BTW, there are two possible uses for before(): it could be used to > impose an arbitrary ordering for types that don't have one now (like > complex); and it could be used to impose an ordering between different > types (like numbers and strings). I've got a gut feeling that the > requirements for these are somewhat different, but can't quite > pinpoint it. Perhaps subclassing/subtyping [and other possible cases where x==y may be true yet type(x) is not type(y)] may be the sticky issues, when all desirable constraints are considered together. The simplest problematic case I can think of is before(1,1+0j) -- by the "respect ==" criterion I would want both this comparison, and the same one with operands swapped, to be false; but by the criterion of imposing ordering between different incomparable types, I would like 'before' to range all instances of 'complex' "together", either all before, or all after, "normal" (comparable) numbers (ideally in a way that while arbitrary is repeatable over different runs/platforms/Python releases -- mostly for ease of teaching and explaining, rather than as a specific application need). 
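The constraints listed above can be checked mechanically over a small sample. The following is only a sketch for a later Python: check_strict_order and by_type_then_value are hypothetical names, and the candidate ordering (all complex numbers before all other numbers) is just one of the arbitrary choices discussed:

```python
from itertools import product

def check_strict_order(before, values):
    """Return a list of constraint violations for `before` over `values`."""
    problems = []
    for x in values:
        if before(x, x):
            problems.append(("irreflexive", x))
    for x, y in product(values, repeat=2):
        if before(x, y) and before(y, x):
            problems.append(("asymmetric", x, y))
        if x == y and before(x, y):
            # Equal operands should not be ordered relative to each other.
            problems.append(("respects ==", x, y))
        if x != y and not before(x, y) and not before(y, x):
            # For a total ordering, unordered pairs must be equal.
            problems.append(("total", x, y))
    for x, y, z in product(values, repeat=3):
        if before(x, y) and before(y, z) and not before(x, z):
            problems.append(("transitive", x, y, z))
    return problems

def by_type_then_value(x, y):
    """Candidate ordering: every complex sorts before every other number."""
    cx, cy = isinstance(x, complex), isinstance(y, complex)
    if cx != cy:
        return cx
    if cx:
        return (x.real, x.imag) < (y.real, y.imag)
    return x < y

values = [1, 2, 1 + 0j, 2 + 3j]
print(check_strict_order(by_type_then_value, values))
```

For these sample values the only violation reported is the "respects ==" one for the pair (1+0j, 1): the operands compare equal, yet the candidate ordering places one before the other.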
Alex

From whisper@oz.net Mon Mar 17 07:57:24 2003
From: whisper@oz.net (David LeBlanc)
Date: Sun, 16 Mar 2003 23:57:24 -0800
Subject: [Python-Dev] Windows IO
Message-ID:

It looks as though IO in Python (2.2.1), regardless of platform or device, happens in Objects/fileobject.c and, in particular, writing occurs in file_write(...)?

A few questions I hope a lurking (timbot? ;) ) person can answer:

1. Is the above true, or does something different happen when using a Windows console/commandline?
2. Is there any way to know if a console is being used (that a device is the console)?
3. What's the purpose of the PC/msvcrtmodule.c file? Does it play any role in the regular pythonic IO scheme of things?

I'm interested in discovering if the Win32 API for screen reading/writing can be used so that character color attributes and cursor commands can be manipulated. It would be nice if those could be used transparently to a python application so that an application sending (for instance) ANSI color codes would succeed and one that didn't wouldn't care.

I realize this is sort of like curses - is there a Windows version of curses that plays well with Python and isn't GPL?

TIA,

David LeBlanc
Seattle, WA USA

From thomas@xs4all.net Mon Mar 17 10:57:56 2003
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 17 Mar 2003 11:57:56 +0100
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules _hotshot.c,1.32,1.33 arraymodule.c,2.84,2.85 xreadlinesmodule.c,1.13,1.14
In-Reply-To:
References:
Message-ID: <20030317105756.GP2112@xs4all.nl>

On Mon, Mar 17, 2003 at 12:35:54AM -0800, rhettinger@users.sourceforge.net wrote:

> Created PyObject_GenericGetIter().
> Factors out the common case of returning self.

PyObject_GenericGetIter doesn't really describe what it does; I would assume that it tried to get the iter by assuming obj was a sequence type, and returning an iter that wraps that.
Wouldn't "PyObject_GetSelfIter" or "PyObject_GenericSelfIter" or "PyObject_SelfIter" be a better name?

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mwh@python.net Mon Mar 17 11:19:38 2003
From: mwh@python.net (Michael Hudson)
Date: Mon, 17 Mar 2003 11:19:38 +0000
Subject: [Python-Dev] PyEval_GetFrame() revisited
In-Reply-To: <20030316073303.31B3E5147@bespin.org> (Armin Rigo's message of "Sat, 15 Mar 2003 23:33:03 -0800 (PST)")
References: <3E7204EC.60506@tismer.com> <3E7219D1.6090306@tismer.com> <20030316073303.31B3E5147@bespin.org>
Message-ID: <2mhea29r5h.fsf@starship.python.net>

Armin Rigo writes:

> Maybe an API to manipulate tstate->frame could be useful and really
> lightweight. Alternatively, we could consider what pyexpat does as
> a general pattern and have an API for it, e.g.:

Yes please!

Cheers,
M.

--
Our lecture theatre has just crashed. It will currently only silently display an unexplained line-drawing of a large dog accompanied by spookily flickering lights.
  -- Dan Sheppard, ucam.chat (from Owen Dunn's summary of the year)

From mwh@python.net Mon Mar 17 11:33:30 2003
From: mwh@python.net (Michael Hudson)
Date: Mon, 17 Mar 2003 11:33:30 +0000
Subject: [Python-Dev] Re: lists v. tuples
In-Reply-To: <3E734DD7.3080105@tismer.com> (Christian Tismer's message of "Sat, 15 Mar 2003 16:59:19 +0100")
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <3E734DD7.3080105@tismer.com>
Message-ID: <2mel569qid.fsf@starship.python.net>

Christian Tismer writes:

> >>> a=[1, 2, 2+2j, 3+1j, 1+3j, 3-3j, 3+1j, 1+3j]
> >>> a.sort(lambda x, y:cmp(abs(x), abs(y)))
> >>> a
> [1, 2, (2+2j), (3+1j), (1+3j), (3+1j), (1+3j), (3-3j)]
> >>>

Ooh, now I get to mention the list.sort feature request I came up with this weekend: I'd like to be able to write the above call as:

    >>> a.sort(key=abs)

Cheers,
M.

-- 112.
Computer Science is embarrassed by the computer. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From mal@lemburg.com Mon Mar 17 11:34:00 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 17 Mar 2003 12:34:00 +0100 Subject: [Python-Dev] tzset In-Reply-To: References: Message-ID: <3E75B2A8.3030102@lemburg.com> Stuart Bishop wrote: > On Sunday, March 16, 2003, at 04:07 PM, Tim Peters wrote: > >> Worse, if the platform tzset() isn't happy with TZ's value, it has no >> way to tell you: the function is declared void, and has no defined effects on >> errno. > > Yup. It sucks, but is the best there is. I can't even find proprietary > solutions for various Unix flavours. Maybe a post to Slashdot saying > Zope 3 will be Windows only due to limitations in POSIX would at least > get something for the free distros :-) I wonder why we need a TZ-parser then ? If it's non-standard anyway, the module is probably better off outside the core as separate download from e.g. SF. >> I hope the community takes up the challenge of building a sane >> cross-platform time zone facility building on 2.3 datetime's tzinfo >> objects. > > A cross-platform time zone facility isn't a problem - the data we need is > available and maintained as part of numerous free Unix distributions. We > could even steal C code to decode it if we are particularly lazy. -1 Why bloat the Python distribution with yet another locale implementation ? -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 17 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 15 days left EuroPython 2003, Charleroi, Belgium: 99 days left From guido@python.org Mon Mar 17 12:26:24 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 17 Mar 2003 07:26:24 -0500 Subject: [Python-Dev] Who approved PyObject_GenericGetIter()??? In-Reply-To: "Your message of Mon, 17 Mar 2003 00:22:59 PST." References: Message-ID: <200303171226.h2HCQOS17719@pcp02138704pcs.reston01.va.comcast.net> > Modified Files: > object.c > Log Message: > Created PyObject_GenericGetIter(). > Factors out the common case of returning self. > > > > Index: object.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v > retrieving revision 2.199 > retrieving revision 2.200 > diff -C2 -d -r2.199 -r2.200 > *** object.c 19 Feb 2003 03:19:29 -0000 2.199 > --- object.c 17 Mar 2003 08:22:56 -0000 2.200 > *************** > *** 1302,1305 **** > --- 1302,1312 ---- > > PyObject * > + PyObject_GenericGetIter(PyObject *obj) > + { > + Py_INCREF(obj); > + return obj; > + } > + > + PyObject * > PyObject_GenericGetAttr(PyObject *obj, PyObject *name) > { Huh? Where was this agreed upon? __iter__ returning self doesn't sound very generic to me, so at the very least the name should be changed IMO. Also, adding a standard API for a helper function this trivial doesn't really make sense to me. So maybe I'm missing something. Please explain. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Mar 17 12:35:23 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 17 Mar 2003 07:35:23 -0500 Subject: [Python-Dev] tzset In-Reply-To: "Your message of Mon, 17 Mar 2003 12:34:00 +0100." 
<3E75B2A8.3030102@lemburg.com> References: <3E75B2A8.3030102@lemburg.com> Message-ID: <200303171235.h2HCZNC17807@pcp02138704pcs.reston01.va.comcast.net> > > A cross-platform time zone facility isn't a problem - the data we need is > > available and maintained as part of numerous free Unix distributions. We > > could even steal C code to decode it if we are particularly lazy. > > -1 > > Why bloat the Python distribution with yet another locale > implementation ? Agreed. This should be a 3rd party add-on. (Especially since in many cases, tzset() does all you need.) --Guido van Rossum (home page: http://www.python.org/~guido/) From mchermside@ingdirect.com Mon Mar 17 13:50:22 2003 From: mchermside@ingdirect.com (Chermside, Michael) Date: Mon, 17 Mar 2003 08:50:22 -0500 Subject: [Python-Dev] Re: lists v. tuples Message-ID: <7F171EB5E155544CAC4035F0182093F04211EF@INGDEXCHSANC1.ingdirect.com> [Christian Tismer] > that people are putting widely different types into > a list in order to sort them. (Although there is an > arbitrary order between strings and numbers, which > I would drop in Python 2.4, too). [Alex Martelli] > Such a change would indeed enormously increase the > number of non-sortable (except by providing custom > compares) lists. So, for example, all places which get > and sort the list of keys in a dictionary in order to return > or display the keys should presumably do the sorting > within a try/except? [...] > Or do you think a dictionary should also be constrained to have keys > that are all comparable with each other (i.e., presumably, never > both string AND number keys) as well as hashable? [Guido van Rossum] > I don't believe this argument. I've indeed often sorted a dict's keys > (or values), but always in situations where the sorted values were > homogeneous as far meaningful comparison goes, e.g. all numbers, or > all strings, or all "compatible" tuples.
> If you know *nothing* about the keys of a dict, you already have to do > that if you want to sort the keys. > > There are lots of apps that have no need to ever sort the keys: if > there weren't, it would have been wiser to keep the keys in sorted > order in the first place, like ABC did. Actually, I found Alex's example to be quite persuasive. I had been reading this thread and thinking how I essentially never create and sort lists containing mixed arbitrary objects. But I DO use dicts, and while most of my dicts have string-only keys, there are others that don't. I wouldn't want to maintain the keys in sorted order, because I don't have to sort my dictionaries (at least the ones that have mixed arbitrary objects for keys), *EXCEPT* that I *DO* sort them when I'm debugging! It's a pain (as I'm sure you know) to examine two dicts in a logfile or debug session and find how they differ, a task made much easier by sorting the keys before listing. So Alex convinced me that I *DO* have a use-case for sorting arbitrary things after all... in code (like my dict prettifier) used for coding and debugging. And if I ever used complex numbers in my lists, I'd already be in trouble... but somehow it's never come up. (I guess complex #s as keys are unusual ;-).) I think the lesson is that we shouldn't break arbitrary object comparison (more than it's already broken) until AFTER Guido's OTHER proposal (the "before()" comparison) is in place to be used in this sort of situation. I wouldn't mind switching over to a slightly different syntax as long as I don't have to write a custom sort routine each and every time I want to print a dict to the logs. [Guido van Rossum] > I'm sure that raising an exception on abominations like 2 < "1" or > (1, 2) < 0 is a good thing, just like we all agree that forbidding > 1 + "2" is a good thing. I agree with you there!
-- Michael Chermside From python@rcn.com Mon Mar 17 14:02:02 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 17 Mar 2003 09:02:02 -0500 Subject: [Python-Dev] PyObject_GenericGetIter() References: <200303171226.h2HCQOS17719@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <005801c2ec8d$c8fd9100$125ffea9@oemcomputer> > Where was this agreed upon? Perhaps I overstepped. It's been on my todo list for a couple of months and didn't seem to be even slightly controversial. > __iter__ returning self doesn't sound very generic to me, > so at the very least the name should be changed IMO. Thomas suggested PyObject_GetSelfIter, PyObject_GenericSelfIter, or PyObject_SelfIter. Consistent with the other tp_slot fillers, I suggest PyObject_GenericIter. > Also, adding a standard API for a helper function this > trivial doesn't really make sense to me. This identical code was duplicated in a dozen different modules in the same context. It comes up when writing most iterators and needed to be factored out. Raymond Hettinger From python@rcn.com Mon Mar 17 14:41:01 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 17 Mar 2003 09:41:01 -0500 Subject: [Python-Dev] PyObject_GenericGetIter() References: <200303171226.h2HCQOS17719@pcp02138704pcs.reston01.va.comcast.net> <005801c2ec8d$c8fd9100$125ffea9@oemcomputer> Message-ID: <001e01c2ec93$3bcc7d40$125ffea9@oemcomputer> > > __iter__ returning self doesn't sound very generic to me, > > so at the very least the name should be changed IMO. > > Thomas suggested PyObject_GetSelfIter, PyObject_GenericSelfIter, > or PyObject_SelfIter. Consistent with the other tp_slot fillers, I > suggest PyObject_GenericIter. A couple of other thoughts. While Thomas found the "getiter" part of the name to be unintuitive, the type of the slot is named (getiterfunc) and most of the replaced functions had names like dictiter_getiter, enum_getiter, iter_getiter, listiter_getiter, xreadlines_getiter ...
So, my first preference is the name in the subject line. Raymond Hettinger From mchermside@ingdirect.com Mon Mar 17 14:59:33 2003 From: mchermside@ingdirect.com (Chermside, Michael) Date: Mon, 17 Mar 2003 09:59:33 -0500 Subject: [Python-Dev] Re: lists v. tuples Message-ID: <7F171EB5E155544CAC4035F0182093F03CF7A5@INGDEXCHSANC1.ingdirect.com> [Glyph Lefkowitz] > This smells like another unformed PEP I don't have the time to think > about or implement :-(, but I would definitely like to see mutability > guarantees worm their way into the language at some point, too. Hmm... as far as I can tell, this would be a fairly trivial change. All we'd need to do is make a slight modification (just adding one method to each) to ints, strings, and tuples (and perhaps a couple of others) and you'd have your guarantee! Of-course-guaranteeing-that-everything-is-mutable-might-not-be-what-you-expected -- Michael Chermside From thomas@xs4all.net Mon Mar 17 15:10:40 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 17 Mar 2003 16:10:40 +0100 Subject: [Python-Dev] PyObject_GenericGetIter() In-Reply-To: <001e01c2ec93$3bcc7d40$125ffea9@oemcomputer> References: <200303171226.h2HCQOS17719@pcp02138704pcs.reston01.va.comcast.net> <005801c2ec8d$c8fd9100$125ffea9@oemcomputer> <001e01c2ec93$3bcc7d40$125ffea9@oemcomputer> Message-ID: <20030317151040.GQ2112@xs4all.nl> On Mon, Mar 17, 2003 at 09:41:01AM -0500, Raymond Hettinger wrote: > > Thomas suggested PyObject_GetSelfIter, PyObject_GenericSelfIter, > > or PyObject_SelfIter. Consistent with the other tp_slot fillers, I > > suggest PyObject_GenericIter. > A couple of other thoughts. While Thomas found the "getiter" part > of the name to be unintuitive, the type of the slot is named (getiterfunc) > and most of the replaced functions had names like dictiter_getiter, > enum_getiter, iter_getiter, listiter_getiter, xreadlines_getiter ... > So, my first preference is the name in the subject line. But my original point still stands.
dictiter_getiter, enum_getiter, iter_getiter are all fairly clear: they get the iter for an (existing) dictiter/enum/iter object. PyObject_GenericGetIter does not return an iterator for a generic object, it's a generic way to return an iterator *for an iterator*. PyIter_GenericGetIter, PyObject_IterGetIter, etc are all more descriptive. I also agree with Guido on that this should not be a public API function (and if it is, it should be documented .) Functions that aren't part of the public API but can't be static should be prefixed with _Py. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From ark@research.att.com Mon Mar 17 15:17:33 2003 From: ark@research.att.com (Andrew Koenig) Date: 17 Mar 2003 10:17:33 -0500 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: References: Message-ID: Tim> For example, probing a vanilla binary search tree needs to stop Tim> when it hits a node with key equal to the thing searched for, or Tim> move left or right when != obtains. The binary-search routines in the C++ standard library mostly avoid having to do != comparisons by defining their interfaces in the following clever way: binary_search returns a boolean that indicates whether the value sought is in the sequence. It does not say where that value is. lower_bound returns the first position ahead of which the given value could be inserted without disrupting the ordering of the sequence. upper_bound returns the last position ahead of which the given value could be inserted without disrupting the ordering of the sequence. equal_range returns (lower_bound, upper_bound) as a pair. In Python terms: binary_search([3, 5, 7], 6) would yield False binary_search([3, 5, 7], 7) would yield True lower_bound([1, 3, 5, 7, 9, 11], 9) would yield 4 lower_bound([1, 3, 5, 7, 9, 11], 8) would also yield 4 upper_bound([1, 3, 5, 7, 9, 11], 9) would yield 5 equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 3) would yield (2, 5). 
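The four routines described above can be sketched in Python roughly as follows. This is only an illustration: the function names mirror the C++ ones, the bodies are not taken from any C++ library, and the only comparison applied to elements is `<`. (The standard `bisect` module's `bisect_left` and `bisect_right` play the same roles as `lower_bound` and `upper_bound`.)

```python
def lower_bound(seq, x):
    """First index at which x could be inserted keeping seq sorted."""
    lo, hi = 0, len(seq)
    while lo < hi:
        mid = (lo + hi) // 2
        if seq[mid] < x:        # element strictly before x: look right
            lo = mid + 1
        else:
            hi = mid
    return lo


def upper_bound(seq, x):
    """Last index at which x could be inserted keeping seq sorted."""
    lo, hi = 0, len(seq)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < seq[mid]:        # element strictly after x: look left
            hi = mid
        else:
            lo = mid + 1
    return lo


def equal_range(seq, x):
    """(lower_bound, upper_bound) as a pair."""
    return lower_bound(seq, x), upper_bound(seq, x)


def binary_search(seq, x):
    """True if x occurs in sorted seq; says nothing about where."""
    i = lower_bound(seq, x)
    # seq[i] is not < x by construction, so "not x < seq[i]" means equal.
    return i < len(seq) and not (x < seq[i])
```

Equality is never tested directly: "equal" falls out as "neither is less than the other", which is why `<` alone suffices.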
If you like, equal_range(seq, x) returns (l, h) such that all the elements of seq[l:h] are equal to x. If l == h, the subsequence is the empty sequence between the two adjacent elements with values that bracket x. These definitions turn out to be useful in practice, and are also easy to implement efficiently using only < comparisons. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From guido@python.org Mon Mar 17 15:22:00 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 17 Mar 2003 10:22:00 -0500 Subject: [Python-Dev] PyObject_GenericGetIter() In-Reply-To: "Your message of Mon, 17 Mar 2003 09:02:02 EST." <005801c2ec8d$c8fd9100$125ffea9@oemcomputer> References: <200303171226.h2HCQOS17719@pcp02138704pcs.reston01.va.comcast.net> <005801c2ec8d$c8fd9100$125ffea9@oemcomputer> Message-ID: <200303171522.h2HFM0R18130@pcp02138704pcs.reston01.va.comcast.net> > > Where was this agreed upon? > > Perhaps I overstepped. It's been on my todo list for a > couple of months and didn't seem to be even slightly > controversial. > > > > __iter__ returning self doesn't sound very generic to me, > > so at the very least the name should be changed IMO. > > Thomas suggested PyObject_GetSelfIter, PyObject_GenericSelfIter, > or PyObject_SelfIter. Consistent with the other tp_slot fillers, I > suggest PyObject_GenericIter. The "generic" functions aren't just slot fillers, they do a lot of work that is typical for most types. The self-iter, OTOH, doesn't do what most types' iterators need -- it only does what most *iterators* need for their own iterator. So a name with 'Self' in it is more appropriate. I'd pick PyObject_SelfIter. > > Also, adding a standard API for a helper function this > > trivial doesn't really make sense to me. > > This identical code was duplicated in a dozen different > modules in the same context. It comes up when writing > most iterators and needed to be factored out. "Need" is a strong word. 
It's okay to add this little convenience, but please give it a proper name. Maybe some day we'll have a true generic iterator helper too. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Mon Mar 17 15:28:15 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 17 Mar 2003 10:28:15 -0500 Subject: [Python-Dev] PyObject_GenericGetIter() References: <200303171226.h2HCQOS17719@pcp02138704pcs.reston01.va.comcast.net> <005801c2ec8d$c8fd9100$125ffea9@oemcomputer> <200303171522.h2HFM0R18130@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <001501c2ec99$d50dd480$125ffea9@oemcomputer> > The "generic" functions aren't just slot fillers, they do a lot of > work that is typical for most types. Learned something new today. > I'd pick PyObject_SelfIter. Good. I'll put it in this evening. Raymond Hettinger From tim.one@comcast.net Mon Mar 17 18:45:17 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 17 Mar 2003 13:45:17 -0500 Subject: [Python-Dev] RE: Windows IO In-Reply-To: Message-ID: [David LeBlanc] > It looks as though IO in Python (2.2.1), regardless of platform or device, > happens in Objects/fileobject.c and, in particular, writing occurs in > file_write(...)? For builtin file objects, at least there, and in file_writelines(), and it's also possible to use f.fileno() and then use lower-level facilities (like os.write()). > A few questions I hope a lurking (timbot? ;) ) person can answer: > > 1. Is the above true, or does something different happen when using a > Windows console/commandline? Using one how? If via a Python file object, yes, the above is true. > 2. Is there any way to know if a console is being used (that a > device is the console)? >>> import sys >>> sys.stdin.isatty() True >>> sys.stdout.isatty() True >>> whatever = open('whatever.txt', 'w') >>> whatever.isatty() False >>> > 3. What's the purpose of the PC/msvcrtmodule.c file? 
It implements the Windows-specific msvcrt module: http://www.python.org/doc/current/lib/module-msvcrt.html > Does it play any role in the regular pythonic IO scheme of things? No, and mixing console-mode IO via that module with standard IO can be a disaster. > I'm interested in discovering if the Win32 API for screen reading/writing > can be used so that character color attributes and cursor commands can be > manipulated. It would be nice if those could be used transparently to a > python application so that an application sending (for instance) > ANSI color codes would succeed and one that didn't wouldn't care. I > realize this is sort of like curses - is there a Windows version of curses > that plays well with Python and isn't GPL? This really belongs on c.l.py, where it gets asked frequently enough. I haven't paid attention to the answers. Fredrik's Console extension for Windows should tickle your fancy: http://effbot.org/zone/console-index.htm From martin@v.loewis.de Mon Mar 17 18:47:23 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 17 Mar 2003 19:47:23 +0100 Subject: [Python-Dev] Windows IO In-Reply-To: References: Message-ID: "David LeBlanc" writes: > It looks as though IO in Python (2.2.1), regardless of platform or device, > happens in Objects/fileobject.c and, in particular, writing occurs in > file_write(...)? [...] > 1. Is the above true, or does something different happen when using a > Windows console/commandline? If you ask "does writing occur in file_write, even on Windows", then "yes". If you ask "does all writing occur in file_write, even on Windows", then "no". It also occurs in file_writelines, posix_write, string_print, w_string, and many places that use fprintf (too many to enumerate them here). > 2. Is there any way to know if a console is being used (that a device is the > console)? posix.isatty comes close. > 3. What's the purpose of the PC/msvcrtmodule.c file?
It exposes the following functions {"heapmin", msvcrt_heapmin, METH_VARARGS}, {"locking", msvcrt_locking, METH_VARARGS}, {"setmode", msvcrt_setmode, METH_VARARGS}, {"open_osfhandle", msvcrt_open_osfhandle, METH_VARARGS}, {"get_osfhandle", msvcrt_get_osfhandle, METH_VARARGS}, {"kbhit", msvcrt_kbhit, METH_VARARGS}, {"getch", msvcrt_getch, METH_VARARGS}, {"getche", msvcrt_getche, METH_VARARGS}, {"putch", msvcrt_putch, METH_VARARGS}, {"ungetch", msvcrt_ungetch, METH_VARARGS}, as well as a few symbolic constants. > Does it play any role in the regular pythonic IO scheme of things? No. None of these functions is normally called; getpass.py uses msvcrt. Regards, Martin From jim@interet.com Mon Mar 17 19:05:11 2003 From: jim@interet.com (James C. Ahlstrom) Date: Mon, 17 Mar 2003 14:05:11 -0500 Subject: [Python-Dev] Windows IO References: Message-ID: <3E761C67.1030606@interet.com> David LeBlanc wrote: >1. Is the above true, or does something different happen when using a >Windows console/commandline? AFAIK, all Python IO uses the fprintf() functions of Windows. These stream IO functions are Posix emulations, are not the native Windows IO functions, and are second class citizens. The native Windows IO functions are CreateFile(), ReadFile(), WriteFile() etc. The native Windows functions support additional functionality. >2. Is there any way to know if a console is being used (that a device >is the console)? All Windows programs must provide a window to operate. But to make porting character-mode programs easier, Windows provides a "Console Window" feature. This is a Windows window you can create which contains the handy features needed to support character WriteFile() and fprintf(). Usually there is no need to test if a console is in use. A Windows program created as a console program has that coded into its header, and the console is created when it starts. It is possible to use CreateProcess() to create a process and its console window, but again there is no need to test.
>3. What's the purpose of the PC/msvcrtmodule.c file? Does it play any >role >in the regular pythonic IO scheme of things? This is a handy module, but plays no role in Python IO. >I'm interested in discovering if the Win32 API for screen >reading/writing >can be used so that character color attributes and cursor >commands can be manipulated. A console window supports arrays of cells, and the cell contains the character and the cell attribute. That means you can control color of each cell. Both character input and mouse input are supported. There is a cursor. The whole thing is a lot like a terminal (if anyone out there remembers those). Jim Ahlstrom From barry@python.org Mon Mar 17 19:39:43 2003 From: barry@python.org (Barry A. Warsaw) Date: Mon, 17 Mar 2003 14:39:43 -0500 Subject: [Python-Dev] test_posix failures? Message-ID: <15990.9343.255137.681030@yyz.zope.com> test_posix fails for me in current CVS: ERROR: testNoArgFunctions (__main__.PosixTester) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_posix.py", line 46, in testNoArgFunctions posix_func() OSError: [Errno 2] No such file or directory ---------------------------------------------------------------------- Ran 18 tests in 0.038s narrowed down to posix.getlogin(). FTR I'm on RH7.3. Here's the fun part : this succeeds if running in an xterm, but fails if running in a XEmacs 21.4.11 shell buffer. I tried it with Emacs 21.2 as well and it also fails there. A little C program calling getlogin() gives the same results. It also fails in a XEmacs compilation buffer. So the os.isatty() test isn't enough. This returns True in all three shells but getlogin() still fails. The weird thing is that I've never seen failures here before and I do this type of testing all the time. Does anybody else see this? Maybe we should just remove getlogin() from NO_ARG_FUNCTIONS?
-Barry From thomas@xs4all.net Mon Mar 17 19:57:53 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 17 Mar 2003 20:57:53 +0100 Subject: [Python-Dev] test_posix failures? In-Reply-To: <15990.9343.255137.681030@yyz.zope.com> References: <15990.9343.255137.681030@yyz.zope.com> Message-ID: <20030317195753.GS2112@xs4all.nl> On Mon, Mar 17, 2003 at 02:39:43PM -0500, Barry A. Warsaw wrote: > test_posix fails for me in current CVS: > narrowed down to posix.getlogin(). FTR I'm on RH7.3. > Here's the fun part : this succeeds if running in an xterm, but > fails if running in a XEmacs 21.4.11 shell buffer. I tried it with > Emacs 21.2 as well and it also fails there. A little C program > calling getlogin() gives the same results. It also fails in a XEmacs > compilation buffer. > So the os.isatty() test isn't enough. This returns True in all three > shells but getlogin() still fails. The weird thing is that I've never > seen failures here before and I do this type of testing all the time. Getlogin isn't guaranteed to work even when running in a terminal. Using the excellent 'screen' tool, you can 'log out' your session (on a per-shell basis.) Being 'logged in' just means there is an entry for your terminal in /var/run/utmp (although the location of the 'utmp' file is system-dependent.) Observe: >>> os.getlogin() 'thomas' ^AL This window is no longer logged in. >>> os.getlogin() Traceback (most recent call last): File "<stdin>", line 1, in ? OSError: [Errno 2] No such file or directory >>> In between the two 'getlogin' calls, I hit '^A', 'L', and screen told me "This window is no longer logged in." And look, ma, no login. Note that there isn't really a portable way to write utmp (screen jumps through hoops, and disables the ability if it can't figure out how to do it) and it might need special privileges, so we can't just add a utmp entry to check against. I'm guessing you updated your (X)Emacs, your libc or something else on your platform that causes this problem for you, Barry.
Perhaps the 'getlogin' test needs a 'utmp' resource ? :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From whisper@oz.net Mon Mar 17 20:01:38 2003 From: whisper@oz.net (David LeBlanc) Date: Mon, 17 Mar 2003 12:01:38 -0800 Subject: [Python-Dev] RE: Windows IO In-Reply-To: Message-ID: > > 2. Is there any way to know if a console is being used (that a > > device is the console)? > > >>> import sys > >>> sys.stdin.isatty() > True > >>> sys.stdout.isatty() > True > >>> whatever = open('whatever.txt', 'w') > >>> whatever.isatty() > False > >>> > J:\>python Python 2.2.1 (#34, Jul 16 2002, 16:25:42) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.stdout.isatty() 64 >>> Have we discovered the mystery of life at last? "True" is 64? :) NOTE: PythonDoc says "isatty" is Unix only. > This really belongs on c.l.py, where it gets asked frequently enough. Sorry, I thought a Python "guts" question would be ok here and likely to get better informed answers than over on the (more or less) application side of the house. > I haven't paid attention to the answers. Fredrik's Console extension for > Windows should tickle your fancy: I'll take a look at it. Meanwhile, someone has kindly sent me source for a PDCurses binding for Python. Thanks to everyone for their answers! I'd like to thank the Academy, the Screen Actor's Guild and especially, Timbot, who helped make it all possible :) ;) Regards, Dave LeBlanc Seattle, WA USA From barry@python.org Mon Mar 17 20:06:32 2003 From: barry@python.org (Barry A. Warsaw) Date: Mon, 17 Mar 2003 15:06:32 -0500 Subject: [Python-Dev] test_posix failures?
References: <15990.9343.255137.681030@yyz.zope.com> <20030317195753.GS2112@xs4all.nl> Message-ID: <15990.10952.901572.773533@gargle.gargle.HOWL> >>>>> "TW" == Thomas Wouters writes: TW> I'm guessing you updated your (X)Emacs, your libc or something TW> else on your platform that causes this problem for you, TW> Barry. Who knows? :) TW> Perhaps the 'getlogin' test needs a 'utmp' resource ? TW> :-) Maybe we should just ditch the test. It's only there to make sure that getlogin() takes no arguments. -Barry From neal@metaslash.com Mon Mar 17 20:29:47 2003 From: neal@metaslash.com (Neal Norwitz) Date: Mon, 17 Mar 2003 15:29:47 -0500 Subject: [Python-Dev] test_posix failures? In-Reply-To: <15990.10952.901572.773533@gargle.gargle.HOWL> References: <15990.9343.255137.681030@yyz.zope.com> <20030317195753.GS2112@xs4all.nl> <15990.10952.901572.773533@gargle.gargle.HOWL> Message-ID: <20030317202947.GD14067@epoch.metaslash.com> On Mon, Mar 17, 2003 at 03:06:32PM -0500, Barry A. Warsaw wrote: > > >>>>> "TW" == Thomas Wouters writes: > > TW> I'm guessing you updated your (X)Emacs, your libc or something > TW> else on your platform that causes this problem for you, > TW> Barry. > > Who knows? :) > > TW> Perhaps the 'getlogin' test needs a 'utmp' resource ? > TW> :-) > > Maybe we should just ditch the test. It's only there to make sure > that getlogin() takes no arguments. I think the test should be removed. I'm the one who added it, but there have been too many problems with it to make it useful. I will remove it later, unless someone beats me to it. Neal From tim.one@comcast.net Mon Mar 17 20:38:40 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 17 Mar 2003 15:38:40 -0500 Subject: [Python-Dev] RE: Windows IO In-Reply-To: Message-ID: [David LeBlanc] > Have we discovered the mystery of life at last? "True" is 64? :) > NOTE: PythonDoc says "isatty" is Unix only. I don't know what PythonDoc means. 
The docs for the file-object method isatty (which my examples used) do not say it's Unix only: http://www.python.org/doc/current/lib/bltin-file-objects.html If some other piece of doc contradicts that, please tell mailto:python-docs@python.org or open an SF bug report? From tim.one@comcast.net Mon Mar 17 21:26:08 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 17 Mar 2003 16:26:08 -0500 Subject: [Python-Dev] tzset In-Reply-To: <3E75B2A8.3030102@lemburg.com> Message-ID: [Stuart Bishop] >> Yup. It sucks, but is the best there is. I can't even find proprietary >> solutions for various Unix flavours. Maybe a post to Slashdot saying >> Zope 3 will be Windows only due to limitations in POSIX would at least >> get something for the free distros :-) [M.-A. Lemburg] > I wonder why we need a TZ-parser then ? If it's non-standard > anyway, the module is probably better off outside the core as > separate download from e.g. SF. TZ parsing code hasn't been added to Python, just a wrapper around the platform tzset() function (if any, and for now ignoring the flavor of tzset supplied by Windows). POSIX defines various forms TZ values can take. Some forms have portable meaning across POSIX systems, while others do not. >>> I hope the community takes up the challenge of building a sane >>> cross-platform time zone facility building on 2.3 datetime's tzinfo >>> objects. >> A cross-platform time zone facility isn't a problem - the data >> we need is available and maintained as part of numerous free Unix >> distributions. We could even steal C code to decode it if we are >> particularly lazy. > -1 > > Why bloat the Python distribution with yet another locale > implementation ? Well, I didn't say anything about the std distribution. Whether there or elsewhere, Python didn't and doesn't have any portable (x-platform) way to deal with time zones. 2.3's tzinfo objects are capable of carrying time zone information in a sane x-platform way, but no concrete tzinfo objects are supplied. 
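To make the last point concrete, here is a minimal sketch of what a concrete tzinfo object could look like. `FixedOffset` is a hypothetical class written for illustration, not something shipped with datetime, and it deliberately ignores daylight saving time, which is the hard part a real cross-platform time zone facility would have to solve.

```python
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Hypothetical concrete tzinfo: fixed offset east of UTC, no DST."""

    def __init__(self, minutes, name):
        self._offset = timedelta(minutes=minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        # No daylight saving rules in this sketch.
        return timedelta(0)


utc = FixedOffset(0, "UTC")
est = FixedOffset(-5 * 60, "EST")

noon_utc = datetime(2003, 3, 17, 12, 0, tzinfo=utc)
noon_in_est = noon_utc.astimezone(est)   # same instant, different wall clock
```

Nothing here touches TZ or tzset(), so the behavior does not depend on the platform's C library.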
From whisper@oz.net Mon Mar 17 21:53:55 2003 From: whisper@oz.net (David LeBlanc) Date: Mon, 17 Mar 2003 13:53:55 -0800 Subject: [Python-Dev] RE: Windows IO In-Reply-To: Message-ID: > -----Original Message----- > From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On > Behalf Of Tim Peters > Sent: Monday, March 17, 2003 12:39 > To: David LeBlanc > Cc: Python-Dev@Python. Org > Subject: RE: [Python-Dev] RE: Windows IO > > > [David LeBlanc] > > Have we discovered the mystery of life at last? "True" is 64? :) > > NOTE: PythonDoc says "isatty" is Unix only. > > I don't know what PythonDoc means. The docs for the file-object method > isatty (which my examples used) do not say it's Unix only: > > http://www.python.org/doc/current/lib/bltin-file-objects.html > > If some other piece of doc contradicts that, please tell > > mailto:python-docs@python.org > > or open an SF bug report? > I don't have the capability to open an SF bug report. "isatty" is not documented at all under the Global Modules "sys" entry for Python 2.2.1 documentation (sorry, I thought "PythonDoc" was a recognized name). The following doesn't work: J:\>python Python 2.2.1 (#34, Jul 16 2002, 16:25:42) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> stdout.isatty() Traceback (most recent call last): File "<stdin>", line 1, in ? NameError: name 'stdout' is not defined >>> isatty(stdout) Traceback (most recent call last): File "<stdin>", line 1, in ? NameError: name 'isatty' is not defined >>> isatty(__stdout__) Traceback (most recent call last): File "<stdin>", line 1, in ? NameError: name 'isatty' is not defined >>> import os >>> os.stdout.isatty() Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'module' object has no attribute 'stdout' >>> Is isatty a built-in, a function of os only available on Unix, or a function of sys available on all platforms?
It appears to be a function in the sys module and so the doc for it should go there? Under the "os" entry it's: "isatty(fd) Return 1 if the file descriptor fd is open and connected to a tty(-like) device, else 0. Availability: Unix. " I don't see how to create a file() that is connected to stdout without importing sys...? Is there a way? If there is not, than file.isatty() is moot. So, really, what is the meaning of "64" as the return from sys.stdout.isatty()? Dave LeBlanc Seattle, WA USA From fdrake@acm.org Mon Mar 17 22:06:49 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 17 Mar 2003 17:06:49 -0500 Subject: [Python-Dev] RE: Windows IO In-Reply-To: References: Message-ID: <15990.18169.95695.749468@grendel.zope.com> David LeBlanc writes: > "isatty" is not documented at all under the Global Modules "sys" entry for > Python 2.2.1 documentation (sorry, I thought "PythonDoc" was a recognized > name). The following doesn't work: ... > Is isatty a built-in, a function of os only available on Unix, or a function > of sys available on all platforms? It appears to be a function in the sys > module and so the doc for it should go there? isatty() is a method of a file object. It's documented as part of the file object; see section 2.2.8 of the library reference manual. > Under the "os" entry it's: > "isatty(fd) > Return 1 if the file descriptor fd is open and connected to a tty(-like) > device, else 0. Availability: Unix. " > > I don't see how to create a file() that is connected to stdout without > importing sys...? Is there a way? If there is not, than file.isatty() is > moot. Standard output is the file object sys.stdout. > So, really, what is the meaning of "64" as the return from > sys.stdout.isatty()? It's a true value. That's all. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Zope Corporation From martin@v.loewis.de Mon Mar 17 22:08:31 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Mar 2003 23:08:31 +0100 Subject: [Python-Dev] Windows IO In-Reply-To: <3E761C67.1030606@interet.com> References: <3E761C67.1030606@interet.com> Message-ID: <3E76475F.9080508@v.loewis.de> James C. Ahlstrom wrote: > AFAIK, all Python IO uses the fprintf() functions of Windows. These > stream IO functions are Posix emulations, are not the native > Windows IO functions, and are second class citizens. The native > Windows IO functions are CreateFile(), ReadFile(), WriteFile() etc. > The native Windows functions support additional functionality. This isn't really the case. fprintf is not (primarily) defined in POSIX, but in standard C, and it is part of the standard C library that comes with the C compiler. It is true that fprintf is not a system call on Windows, but neither is it a system call on Unix (the system call on Unix is write(2)). Regards, Martin From martin@v.loewis.de Mon Mar 17 22:12:07 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Mar 2003 23:12:07 +0100 Subject: [Python-Dev] RE: Windows IO In-Reply-To: References: Message-ID: <3E764837.4090400@v.loewis.de> David LeBlanc wrote: > Is isatty a built-in, a function of os only available on Unix, or a function > of sys available on all platforms? It appears to be a function in the sys > module and so the doc for it should go there? *This* question definitely is off-topic for python-dev. Python-dev readers are supposed to study the Python source code to answer such a question. > So, really, what is the meaning of "64" as the return from > sys.stdout.isatty()? Use the source, Luke. 
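For readers following the thread, the distinction between the file-object method and the os function can be sketched as below. This is a minimal illustration assuming a current Python rather than the 2.2.1 interpreter shown earlier; the helper name is_interactive is hypothetical.

```python
import io
import os
import sys


def is_interactive(stream):
    """Return True if the file-like object reports a tty-like device."""
    # isatty() is a method of the file object. bool() normalises any
    # platform-specific nonzero value (such as 64) to True.
    return bool(stream.isatty())


# Method form: works on any file object, including sys.stdout.
print(is_interactive(sys.stdout))

# Function form: os.isatty() takes a file descriptor, not a file object.
# Not every file-like object has a descriptor, so guard the fileno() call.
try:
    print(os.isatty(sys.stdout.fileno()))
except (AttributeError, OSError, ValueError):
    print("stdout has no usable file descriptor here")

# An in-memory stream is never a terminal.
print(is_interactive(io.StringIO()))
```

The first two results depend on how the script is run (terminal, pipe, or redirected output), which is why only the in-memory case has a fixed answer.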
Regards, Martin From tim.one@comcast.net Mon Mar 17 22:09:54 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 17 Mar 2003 17:09:54 -0500 Subject: [Python-Dev] RE: Windows IO In-Reply-To: Message-ID: [David LeBlanc] > I don't have the capability to open an SF bug report. It's not restricted -- anyone can open a bug report. You need a web browser and an internet connection, of course. > "isatty" is not documented at all under the Global Modules "sys" entry for > Python 2.2.1 documentation No, but why would it be? I gave you a link to the current docs before: http://www.python.org/doc/current/lib/bltin-file-objects.html Go there and search down for isatty. In 2.2.1, the link is this instead: http://www.python.org/doc/2.2.1/lib/bltin-file-objects.html > (sorry, I thought "PythonDoc" was a recognized > name). The following doesn't work: > J:\>python > Python 2.2.1 (#34, Jul 16 2002, 16:25:42) [MSC 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> stdout.isatty() > Traceback (most recent call last): > File "", line 1, in ? > NameError: name 'stdout' is not defined I showed concrete examples in the last msg. stdout lives in sys, as was shown there: >>> import sys >>> sys.stdout.isatty() True >>> That's in 2.3. I don't have 2.2.1. Here's under 2.0: >>> import sys >>> sys.stdout.isatty() 64 >>> > >>> isatty(stdout) > Traceback (most recent call last): > File "", line 1, in ? > NameError: name 'isatty' is not defined > >>> isatty(__stdout__) > Traceback (most recent call last): > File "", line 1, in ? > NameError: name 'isatty' is not defined > >>> import os > >>> os.stdout.isatty() > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: 'module' object has no attribute 'stdout' > >>> Please read the docs -- there's no reason to expect any of those to work. > Is isatty a built-in, No. > a function of os only available on Unix, No, although os.isatty exists on some platforms. 
fileobject.isatty() exists on all platforms. > or a function of sys available on all platforms? It's not in sys on any platform. > It appears to be a function in the sys module and so the doc for it should > go there? Nope, isatty() is never in sys. It's primarily a *method* on file objects, as all the examples I've given have used. sys.stdin and sys.stdout are file objects. > Under the "os" entry it's: > "isatty(fd) > Return 1 if the file descriptor fd is open and connected to a tty(-like) > device, else 0. Availability: Unix. " > > I don't see how to create a file() that is connected to stdout without > importing sys...? Is there a way? If there is not, than file.isatty() is > moot. Sorry, I don't understand the question. > So, really, what is the meaning of "64" as the return from > sys.stdout.isatty()? Before Python 2.3, it's simply the value Microsoft's isatty() function returned. Python 2.3 translates it to a bool. Microsoft's docs say: _isatty returns a nonzero value handle is associated with a character device. Otherwise, _isatty returns 0. The grammar errors are copied verbatim from their docs, BTW -- telling me that didn't make sense won't help you . From mstone@ugcs.caltech.edu Mon Mar 17 22:25:56 2003 From: mstone@ugcs.caltech.edu (mstone@ugcs.caltech.edu) Date: Mon, 17 Mar 2003 14:25:56 -0800 (PST) Subject: [Python-Dev] test_posix failures? Message-ID: > I think the test should be removed. I'm the one who added it, but > there have been too many problems with it to make it useful. > I will remove it later, unless someone beats me to it. > > Neal In each case given the exception thrown is an OSError: [Errno 2] No such file or directory, apparently due to inability to locate utmp. The patch you already supplied at SF bug #697556 should fix any of those. Perhaps it's not necessary to scrap the test altogether? (Now if anyone wants to look into the test_socket problem at 697556....) 
-Michael From thomas@xs4all.net Mon Mar 17 22:42:33 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 17 Mar 2003 23:42:33 +0100 Subject: [Python-Dev] test_posix failures? In-Reply-To: References: Message-ID: <20030317224233.GT2112@xs4all.nl> On Mon, Mar 17, 2003 at 02:25:56PM -0800, mstone@ugcs.caltech.edu wrote: > > I think the test should be removed. I'm the one who added it, but > > there have been too many problems with it to make it useful. > > I will remove it later, unless someone beats me to it. > In each case given the exception thrown is an OSError: [Errno 2] No > such file or directory, apparently due to inability to locate utmp. No, the inability to locate the attached terminal in utmp (as I showed in my example.) But this is not documented behaviour -- at least not on Linux, BSDI and FreeBSD. I'm not able to reproduce the behaviour on the latter two, but this may be because screen's "log-out" code isn't working properly. In any case, we can't really rely on it only throwing OSError, but it's probably 'good enough for our purposes'. > (Now if anyone wants to look into the test_socket problem at 697556....) A different bug report in the same bugreport ? Not suprising no one looked at it ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From brett@python.org Mon Mar 17 23:11:46 2003 From: brett@python.org (Brett Cannon) Date: Mon, 17 Mar 2003 15:11:46 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15 Message-ID: Guys have about 24 hours to point out how imperfect the summary is. For those who participated in the `Capabilities` thread, please check my summary. I feel a little shaky about it and feel I could add more detail, but I didn't want to add any wrong info so I kept it rather shallow. ------------------ python-dev Summary for 2003-03-01 through 2003-03-15 +++++++++++++++++++++++++++++++++++++++++++++++++++++ .. 
_last summary: ====================== Summary Announcements ====================== As I am sure most readers of this summary know by now, I am going to PyCon_. This means that I will be occupied the whole last week of this month. I suspect python-dev traffic will be light since I believe most of PythonLabs will be at Pycon and thus not working. =) But still, I will be occupied myself and thus won't have a chance to work on the summary until I come home. This means you should expect the next summary to be rather late. I will get to it, though, at some point. And in case you haven't yet, register for PyCon_. .. _PyCon: http://www.python.org/pycon/ ============================================= `Ridiculously minor tweaks?`__ ============================================= __ http://mail.python.org/pipermail/python-dev/2003-March/033962.html Splinter threads: - `How long is your shopping tuple? `__ - `Tuples vs lists `__ - `Re: lists v. tuples `__ - `mutability `__ The original point of this thread was Jeremy Fincher finding out if patches changing lists to tuples where the list was not mutated would be accepted for a miniscule performance boost (the answer was no). But this wasn't the interesting knowledge that came out of this thread. This thread led to Guido stating his intended uses of tuples and lists. And you might be going, "lists are for mutable sequences of objects while tuples are for immutable sequences of objects". Well, that is not what Guido thinks of lists and tuples (and don't feel bad if you thought otherwise; Christian Tismer didn't even know what Guido had in mind and Python does not exactly require you to agree with Guido on this). Turns out that tuples, in Guido's view of the world, are "for heterogeneous data" and "list[s] are for homogeneous data"; "Tuples are *not* read-only lists". Guido spelled out his thinking on this in a later email. 
He basically said that he viewed lists as "a sequence of items of type X" while tuples are more like "a sequence of length N with items of type X1, X2, X3, ..." This makes sense since lists can be sorted while tuples can't; sorting on different types doesn't necessarily result in a sequence sorted the way you think about it. And if you are still having trouble wrapping your head around this, just view tuples as structs and lists as arrays as in C.

This thread then led to another topic of comparisons_ in Python. Guido ended up mentioning how he wished == and != worked on all types (with disparate types always being !=) while all of the other comparisons only worked on similar types for the interpreter's default comparison abilities.

This then led to Guido saying how he wished the __cmp__() magic method and the cmp() built-in didn't exist. This is because there are currently two ways to do comparisons: __cmp__(), and then all of the other rich comparison magic methods. You can implement the same functionality as __cmp__() using just __lt__() and __eq__(). There can also be an unneeded performance penalty for __cmp__() since (using the previously mentioned way of re-implementing __cmp__()) you might have to do some unneeded comparisons when all you need is __eq__().

This discussion is still going on.

.. _comparisons: http://www.python.org/dev/doc/devel/ref/customization.html#l2h-91

===========================
`Capabilities in Python`__
===========================
__ http://mail.python.org/pipermail/python-dev/2003-March/033820.html

Splinter threads:

- `Capabilities `__
- `about candy `__

This is a continuation of a discussion covered in the `last summary`_.

This was *definitely* the thread from hell for this summary. =) It is very long and there was confusion at multiple points over terminology. You have been warned.

Three things were constantly being discussed in this thread: restricted execution, capabilities, and proxies. We discuss them in this order.
Restricted execution basically cuts out access to certain objects at execution time. Currently, if you replace the global __builtins__ with something other than what __builtin__.__dict__ has, then you enable restricted execution in Python. This cuts off access to built-in objects so as to prevent you from circumventing security code by, for instance, importing the sys_ module so you can replace a module's code in sys.modules. Both capabilities and proxies are worthless without restricted execution since they could be circumvented without it.

Capabilities can loosely be thought of like bound methods. Security with capabilities is done based on possession; if you hold a reference to an object you can use that object.

Proxies are a wrapper around objects that restrict access to the object. This restriction extends all the way to the core; even core code can't get access to parts of a proxied object that it doesn't want any object to get a hold of.

There was talk of a PEP on all of this but one has not appeared yet.

.. _sys: http://www.python.org/dev/doc/devel/lib/module-sys.html

=========
Quickies
=========

`Codec registry `__
Gustavo Niemeyer asked someone to review a patch.

`Changes to logging in CVS `__
Vinay Sajip asked if changes someone had checked in to the logging_ package could be rolled back, since they broke compatibility with Python 1.5.2, which the logging package tries to keep (as mentioned in `PEP 291`_). The changes were removed.

.. _logging: http://www.python.org/dev/doc/devel/lib/module-logging.html
.. _PEP 291: http://www.python.org/peps/pep-0291.html

`__slots__ for metatypes `__
Christian Tismer asked Guido and the list to take a look at a patch that would allow meta-types to have a __slots__. The patch was accepted and applied.

`new bytecode results `__
Damien Morton continues on his quest to get performance boosts from fiddling with the eval loop contained in `ceval.c`_ and trying out various opcode ideas.
It was pointed out that pystone_ is a good indicator of how Zope_ will perform on a new box. It was also stated by Tim Peters that since it is such an atypical test that it helps to make sure any improvements you make *really* do make an improvement. Damien also requested more people contribute statistical information to Skip Montanaro's stat server (more info at http://manatee.mojam.com/~skip/python/ ). .. ceval.c: .. _pystone: .. _Zope: http://www.zope.org/ `module extension search order - can it be changed? `__ This was discussed in the `last summary`_. Tim Peters mentioned how he doesn't use linecache_ often and that it's printing out of date info is of any great use for tracebacks. .. _linecache: http://www.python.org/dev/doc/devel/lib/module-linecache.html `JUMP_IF_X opcodes `__ Damien Morton, still on the prowl for better opcodes, suggested introducing opcodes that combined branching opcodes and POP_TOP (which pops the top of the interpreter stack)and did the pop based on the truth value of what was being tested. Neal Norwitz suggested that instead the branching instructions just always pop the stack. If all of this cool opcode stuff that Damien keeps doing interests you, you will want to read `opcode.h`_, `ceval.c`_, and learn how to use the dis_ module. .. _opcode.h: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Include/opcode.h .. _ceval.c: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Python/ceval.c .. _dis: http://www.python.org/dev/doc/devel/lib/module-dis.html `Fun with timeit.py `__ A new module named timeit_ was added to the stdlib at the request of Jim Fulton. The module times the execution of code snippets. Guido timed the execution of going through a 'for' loop a million times with interpreters from Python 1.3 up to the current CVS (2.3a2 with patches up to that point). The result was that CVS was the fastest. .. 
_timeit: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/timeit.py

`Pre-PyCon sprint ideas `__
I asked the list to suggest ideas to sprint on at PyCon_.

`More Zen `__
Words of wisdom from Raymond Hettinger that everyone should read. And if you have never read Raymond's `School of Hard Knocks`_ email you owe it to yourself to stop whatever you are doing and read it **now**. I can personally vouch that email is right on the money; I have experienced (or suffered, depending on your view =) every single thing on that list sans writing a PEP (although writing the Summary is starting to be enough writing to be equal =) .

.. _School of Hard Knocks: http://mail.python.org/pipermail/python-dev/2002-September/028725.html

`xmlrpclib `__ : `xmlrpclib: Apology `__
Bill Bumgarner, the "hillbilly from the midwest of the US", asked if the xmlrpclib_ module was being maintained. The lesson was also learned to not call Fredrik Lundh "Fred" on the list since Fred L. Drake, Jr. tends to be associated with the name. =)

.. _xmlrpclib: http://www.python.org/dev/doc/devel/lib/module-xmlrpclib.html

`httplib SSLFile broken in CVS `__
Something got broken and fixed.

`super() bug (?) `__
Samuele Pedroni thought he may have found a bug with super() but it turned out it wasn't.

`test_popen broken on Win2K `__
Win2k does not like quoting of commands when there is no space in the command, as Tim Peters discovered. There were discussions on how to deal with this. Coming up with an sh-like syntax that works on all platforms (like what tcl's exec command has) was suggested.

`Change in int() behavior `__
David Abrahams rediscovered the joys of the road which leads to int/long unification when he noticed that ``isinstance(int(sys.maxint*2), int)`` returns False. This will not be an issue once we are farther down this road.

`acceptability of asm in python code? 
`__
Damien Morton popped his optimizing head back up on python-dev asking if assembly code was acceptable in the core. As of right now there is none, but Tim Peters stated that if there were some that had "a huge speedup, on all programs" then it would be considered, although "on the weak end of maybe". Christian Tismer (who plays with assembly in Stackless_) warned against it in a large function since it can mess up caching.

.. _Stackless: http://www.stackless.com/

`Internationalizing domain names `__
Martin v. Löwis asked someone to look over his patches to implement IDNA (Internationalizing Domain Names in Applications) which allows non-ASCII characters in domain names.

`VERSION in getpath.c `__
Guido explains to someone what compile variables are used to generate some compile-based search paths.

`Where is OSS used? `__
Greg Ward asked what OSs use OSS_.

.. _OSS: http://www.opensound.com/

`Audio devices `__
Greg Ward asked for opinions on some API issues for ossaudiodev_.

.. _ossaudiodev: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/ossaudiodev.c

`bsddb3 test errors - are these expected? `__
Skip Montanaro asked if some errors from the testing of bsddb3_ on OS X were expected.

.. _bsddb3: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/_bsddb.c

`os.path.dirname misleading? `__
Kevin Altis was surprised to discover that `os.path.dirname`_ would return the tail end of a directory instead of an empty string when the argument to the function was just a directory name.

.. _os.path.dirname: http://www.python.org/dev/doc/devel/lib/module-os.path.html#l2h-1443

`Care to sprint on the core at PyCon? `__
Me asking the world if they wanted to sprint on the core at the pre-PyCon sprint (if you do, read the email for details).

`Iterable sockets? `__
Andrew McNamara wished that socket objects were iterable on a per-line basis without having to call makefile().
Guido said he would rather come up with a better abstraction for Python 3 and prototype it in Python 2.4 or later. `More int/long integration issues `__ David Abrahams noticed that range() and xrange() couldn't accept a long. It basically led to Guido stating he hates xrange() and wish it didn't exist. But since getting rid of it would break code he can at least prevent it from gaining abilities. It also led to Guido mentioning again how he would like to prohibit shadowing of built-ins. `tzset `__ A new function, time.tzset(), was added to Python and the tests had failed under Windows. The tests and the ./configure check were changed as needed. `PyObject_New vs PyObject_NEW `__ Lesson of the thread: PyObject_NEW is only to be used in the core; use `PyObject_New()`_ for extension modules. .. _PyObject_New(): http://www.python.org/dev/doc/devel/api/allocating-objects.html `are NULL checks in Objects/abstract.c really needed? `__ ... They are not required, but they are there to protect you against poorly written extensions. Skip Montanaro subsequently suggested a --without-null-checks compile option. `PyEval_GetFrame() revisited `__ A possible API for manipulating the current frame was still being discussed. From ark@research.att.com Mon Mar 17 23:34:02 2003 From: ark@research.att.com (Andrew Koenig) Date: Mon, 17 Mar 2003 18:34:02 -0500 (EST) Subject: [Python-Dev] Re: Re: lists v. 
tuples In-Reply-To: <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Sun, 16 Mar 2003 15:34:17 -0500) References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200303161306.h2GD62L15598@pcp02138704pcs.reston01.va.comcast.net> <200303161602.h2GG2DO00056@europa.research.att.com> <200303162034.h2GKYH415958@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303172334.h2HNY2j13488@europa.research.att.com> Guido> This seems an argument for keeping both __cmp__ and the six __lt__ Guido> etc. Yet TOOWTDI makes me want to get rid of __cmp__. I'm beginning to wonder if part of what's going on is that there are really two different concepts that go under the general label of "comparison", namely the cases where trichotomy does and does not apply. In the first case, we have a total ordering; in the second, we have what C++ calls a "strict weak ordering", which is really an ordering of equivalence classes. From greg@cosc.canterbury.ac.nz Tue Mar 18 00:03:19 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 18 Mar 2003 12:03:19 +1200 (NZST) Subject: [Python-Dev] PyObject_GenericGetIter() In-Reply-To: <20030317151040.GQ2112@xs4all.nl> Message-ID: <200303180003.h2I03JJ25264@oma.cosc.canterbury.ac.nz> Thomas Wouters : > it's a generic way to return an iterator *for an > iterator*. PyIter_GenericGetIter, PyObject_IterGetIter, etc are all > more descriptive. It should have Generic in it somewhere to fit the pattern. I'd go for PyIter_GenericGetIter. 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From guido@python.org Tue Mar 18 01:34:01 2003
From: guido@python.org (Guido van Rossum)
Date: Mon, 17 Mar 2003 20:34:01 -0500
Subject: [Python-Dev] PyObject_GenericGetIter()
In-Reply-To: "Your message of Tue, 18 Mar 2003 12:03:19 +1200." <200303180003.h2I03JJ25264@oma.cosc.canterbury.ac.nz>
References: <200303180003.h2I03JJ25264@oma.cosc.canterbury.ac.nz>
Message-ID: <200303180134.h2I1Y1X19684@pcp02138704pcs.reston01.va.comcast.net>

> It should have Generic in it somewhere to fit the pattern.

Why? It's *not* generic. It's *specific* (to iterators).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From Raymond Hettinger"

There is an SF report that Pmw gets TypeErrors under Py2.3 but not under previous versions of Python: www.python.org/sf/697591

There are three parts to the story:

1. A method in _tkinter.c was changed (probably appropriately) to occasionally return ints in addition to strings.

2. Pmw used string.atoi() to coerce the result to an int. This should probably be changed, but I don't want existing Pmw to suddenly fail under 2.3.

3. string.atoi(s) works by calling int(s,10). It is the ten part that makes int() raise a TypeError.

The long way to fix this bug is to 1) have Neal or MvL research the propriety of the changes to _tkinter and possibly find that they needed to be done and have to be left alone and 2) have the Pmw folks update their code to not use string.atoi(s) when s is not a string (seems like a bug, but s used to always be a string when they wrote the code). Step 2 doesn't help existing users of Pmw unless they get a bugfix release.

The shortcut is to fix something that isn't broken and have string.atoi stop automatically appending the ten to the int() call.
Current string.atoi:

    def atoi(s, base=10):
        return _int(s, base)

Proposed string.atoi:

    def atoi(s, *args):
        return _int(s, *args)

The shortcut has some appeal because it lets the improvements to _tkinter stay in place and allows existing Pmw installations to continue to operate. Otherwise, one of the two has to change.

Does anyone think changing string.atoi is the wrong way to go?

Raymond Hettinger

From greg@cosc.canterbury.ac.nz Tue Mar 18 01:41:18 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 18 Mar 2003 13:41:18 +1200 (NZST)
Subject: [Python-Dev] PyObject_GenericGetIter()
In-Reply-To: <200303180134.h2I1Y1X19684@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200303180141.h2I1fIq27804@oma.cosc.canterbury.ac.nz>

> Why? It's *not* generic. It's *specific* (to iterators).

That's why I voted for PyIter_GenericGetIter and not PyObject_GenericGetIter. PyIter_ means it has to do with iterators; Generic means it's a default implementation; GetIter identifies which type slot it implements.

Hmmm... maybe we need a formal grammar for Python/C API function names, to help settle questions like this...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From greg@cosc.canterbury.ac.nz Tue Mar 18 01:47:24 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 18 Mar 2003 13:47:24 +1200 (NZST)
Subject: [Python-Dev] Shortcut bugfix
In-Reply-To: <001501c2ecee$be00f2e0$125ffea9@oemcomputer>
Message-ID: <200303180147.h2I1lOs27924@oma.cosc.canterbury.ac.nz>

Raymond Hettinger :

> Current string.atoi:
>     def atoi(s, base=10):
>         return _int(s, base)
>
> Proposed string.atoi:
>     def atoi(s, *args):
>         return _int(s, *args)

It looks harmless to me.
My only concern would be if it started making things like string.atoi("0x10") accepted, but a quick experiment suggests that this would not be the case (in 2.2, at least). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From martin@v.loewis.de Tue Mar 18 06:58:03 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 18 Mar 2003 07:58:03 +0100 Subject: [Python-Dev] Shortcut bugfix In-Reply-To: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> References: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> Message-ID: "Raymond Hettinger" writes: > There are three parts to the story: > 1. a method in _tkinter.c was changed (probably appropriately) to > occassionally return ints in addition to strings. > > 2. Pmw used string.atoi() to coerce the result to an int. > This should probably be changed, but I don't want > existing Pmw to suddenly fail under 2.3. Is Pmw using _tkinter directly, or indirectly via Tkinter? Neither answer to this question makes sense: a) if Pmw uses _tkinter directly, it should not receive int results. b) if Pmw uses Tkinter, it should not find methods that used to return strings but now return ints. If, for some strange reason, b) does happen, applications can invoke Tkinter.wantobjects = 0 to restore the old behaviour. > Does anyone think changin string.atoi is the wrong way to go? It would change the historical behaviour, I believe: string.atoi(10) used to give a TypeError even back in Python 1.5. 
Regards, Martin From ben@algroup.co.uk Tue Mar 18 09:43:36 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Tue, 18 Mar 2003 09:43:36 +0000 Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15 In-Reply-To: References: Message-ID: <3E76EA48.7070402@algroup.co.uk> Brett Cannon wrote: > There was talk of a PEP on all of this but one has not appeared yet. I am working on it now. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From mal@lemburg.com Tue Mar 18 09:50:29 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 18 Mar 2003 10:50:29 +0100 Subject: [Python-Dev] Shortcut bugfix In-Reply-To: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> References: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> Message-ID: <3E76EBE5.8020200@lemburg.com> Raymond Hettinger wrote: > The shortcut is to fix something that isn't broken and have string.atoi > stop automatically appending the ten to the int() call. > > Current string.atoi: > def atoi(s, base=10): > return _int(s, base) > > Proposed string.atoi: > def atoi(s, *args): > return _int(s, *args) > > The shortcut has some appeal because it lets the improvements > to _tkinter stay in place and allows existing Pmw installations > to continue to operate. Otherwise, one of the two has to change. > > Does anyone think changin string.atoi is the wrong way to go? Yes, because it changes the semantics. string.atoi() would suddenly start to accept non-strings like integers, floats, etc. My suggestion would be to carefully reconsider the changes to _tkinter. If it's true that a method can now return strings *and* integers which previously only returned strings, then such a change is clearly not backward compatible. I'd create a new method for the new semantics in that case. 
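The change in semantics objected to here can be seen in a minimal sketch. It assumes a current Python, where the built-in int() stands in for the _int name used in the definitions quoted above; the function names atoi_current and atoi_proposed are illustrative only.

```python
def atoi_current(s, base=10):
    # Mirrors the existing definition: the base is always passed along,
    # so int() insists that s is a string.
    return int(s, base)


def atoi_proposed(s, *args):
    # Mirrors the proposed definition: the base is passed only if the
    # caller supplied one.
    return int(s, *args)


# Both agree on ordinary string input.
print(atoi_current("42"))
print(atoi_proposed("42"))

# Only the proposed form accepts non-strings.
print(atoi_proposed(42))
print(atoi_proposed(4.9))  # a float is silently truncated by int()

try:
    atoi_current(42)
except TypeError as exc:
    print("TypeError:", exc)
```

The sketch shows both sides of the argument: the proposed form keeps callers such as Pmw working when they receive an int, and it also begins to accept floats and other non-string inputs that previously raised TypeError.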
--
Marc-Andre Lemburg
eGenix.com

Professional Python Software directly from the Source (#1, Mar 18 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
Python UK 2003, Oxford: 14 days left
EuroPython 2003, Charleroi, Belgium: 98 days left

From ben@algroup.co.uk Tue Mar 18 09:54:06 2003
From: ben@algroup.co.uk (Ben Laurie)
Date: Tue, 18 Mar 2003 09:54:06 +0000
Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15
In-Reply-To:
References:
Message-ID: <3E76ECBE.7050200@algroup.co.uk>

Brett Cannon wrote:

> Capabilities can loosely be thought of like bound methods. Security with
> capabilities is done based on possession; if you hold a reference to an
> object you can use that object.

This confusion is my fault: I just happened to like using bound methods as the basis for capabilities, but objects can also be used, so long as access to them is appropriately restricted. This is explained in detail in the PEP I am writing (with help from others, I should note).

> Proxies are a wrapper around objects that restrict access to the object.
> This restriction extends all the way to the core; even core code can't get
> access to parts of a proxied object that it doesn't want any object to get
> a hold of.

It's not clear to me what you mean by "core code" - certainly anything written in C can slice through a proxy without any problems (or, indeed, a capability).

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html http://www.thebunker.net/

"There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff

From mal@lemburg.com Tue Mar 18 10:27:14 2003
From: mal@lemburg.com (M.-A. 
Lemburg) Date: Tue, 18 Mar 2003 11:27:14 +0100 Subject: [Python-Dev] capabilities & proxies (python-dev Summary for 2003-03-01 through 2003-03-15) In-Reply-To: <3E76ECBE.7050200@algroup.co.uk> References: <3E76ECBE.7050200@algroup.co.uk> Message-ID: <3E76F482.7030508@lemburg.com> Ben Laurie wrote: > Brett Cannon wrote: > >> Capabilities can loosely be thought of like bound methods. Security with >> capabilities is done based on possession; if you hold a reference to an >> object you can use that object. > > > This confusion is my fault: I just happened to like using bound methods > as the basis for capabilities, but objects can also be used, so long as > access to them is appropriately restricted. This is explained in detail > in the PEP I am writing (with help from others, I should note). > >> Proxies are a wrapper around objects that restrict access to the object. >> This restriction extends all the way to the core; even core code can't >> get >> access to parts of a proxied object that it doesn't want any object to >> get >> a hold of. > > Its not clear to me what you mean by "core code" - certainly anything > written in C can slice through a proxy without any problems (or, indeed, > a capability). That's certainly true... BTW, just in case you aren't aware of it, mxProxy implements pretty much what Brett summarized here for proxies. You may want to have a look. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 18 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 14 days left EuroPython 2003, Charleroi, Belgium: 98 days left From zooko@zooko.com Tue Mar 18 11:40:41 2003 From: zooko@zooko.com (Zooko) Date: Tue, 18 Mar 2003 06:40:41 -0500 Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15 In-Reply-To: Message from Brett Cannon of "Mon, 17 Mar 2003 15:11:46 PST." References: Message-ID: > Capabilities can loosely be thought of like bound methods. Security with > capabilities is done based on possession; if you hold a reference to an > object you can use that object. No -- capabilities (as envisioned for Python) are references. Whether a reference to an object, to a bound method, or to a function doesn't matter. Note that it isn't that capabilities are "like" references, it is that capabilities *are* references. Every reference is a capability. Every capability is a reference. > Security with > capabilities is done based on possession; if you hold a reference to an > object you can use that object. Yes. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From guido@python.org Tue Mar 18 12:15:09 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 18 Mar 2003 07:15:09 -0500 Subject: [Python-Dev] Shortcut bugfix In-Reply-To: "Your message of Tue, 18 Mar 2003 10:50:29 +0100." <3E76EBE5.8020200@lemburg.com> References: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> <3E76EBE5.8020200@lemburg.com> Message-ID: <200303181215.h2ICF9j21821@pcp02138704pcs.reston01.va.comcast.net> > > Does anyone think changin string.atoi is the wrong way to go? > > Yes, because it changes the semantics. string.atoi() would suddenly > start to accept non-strings like integers, floats, etc. > > My suggestion would be to carefully reconsider the changes to > _tkinter. 
If it's true that a method can now return strings *and* > integers which previously only returned strings, then such a > change is clearly not backward compatible. I'd create a new > method for the new semantics in that case. That's my gut feeling too. --Guido van Rossum (home page: http://www.python.org/~guido/) From ben@algroup.co.uk Tue Mar 18 12:11:09 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Tue, 18 Mar 2003 12:11:09 +0000 Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15 In-Reply-To: References: Message-ID: <3E770CDD.7050206@algroup.co.uk> Zooko wrote: >>Capabilities can loosely be thought of like bound methods. Security with >>capabilities is done based on possession; if you hold a reference to an >>object you can use that object. > > > No -- capabilities (as envisioned for Python) are references. Whether a > reference to an object, to a bound method, or to a function doesn't matter. > > Note that it isn't that capabilities are "like" references, it is that > capabilities *are* references. Every reference is a capability. Every > capability is a reference. I should note that this is a new (and good) idea, not one that we've previously expressed. And, of course, they are references with restrictions, which will be spelt out in the PEP. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From zooko@zooko.com Tue Mar 18 12:41:23 2003 From: zooko@zooko.com (Zooko) Date: Tue, 18 Mar 2003 07:41:23 -0500 Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) In-Reply-To: Message from Zooko of "Tue, 18 Mar 2003 06:40:41 EST." References: Message-ID: brett@python.org wrote: > > Security with capabilities is done based on possession; if you hold a > reference to an object you can use that object. 
Note that you can use capabilities as your sole access control mechanism
if every resource that you want to protect is identifiable with a Python
reference.

For example, suppose you want to control the ability to listen on sockets
for network traffic. If there is a reference (e.g., to an object) that
represents the privilege of listening on sockets, then you can give such a
reference to one object, allowing that object to listen on sockets, while
withholding it from another object, thus preventing that one from
listening on sockets.

The only part of Python that isn't already well-matched with capabilities
is the way that authority is gained by importing modules: you can load a
module even when you weren't given a reference to it. Reconciling Python
modules with capabilities would be the challenging part. Python objects,
bound methods, functions, and suchlike are already well-matched to
capabilities.

In case that isn't clear, I'll write a quick example. Recalling my
"tic-tac-toe game" example from [1], I wrote code which allowed or denied
the tic-tac-toe game permission to paint a window on the screen and to
write to a file. This "allow-or-deny" enforcement was unified with the
designation of which window and which file. That is: by passing a
reference to a certain window, I simultaneously told the tic-tac-toe game
which window to draw in *and* extended to it the privilege of drawing to
the screen. This is a central motto of the capability security crowd:
"Unify designation with authority."

In a capability system *all* authority -- everything that you could ever
want to prevent -- is mediated by capabilities. Code that is loaded and
run, but to which no capabilities are extended, must be incapable of doing
anything dangerous.

Now, what about modules? In current Python, some code can "import os" and
gain all kinds of authority.
In the rexec scheme, as I understand it, there was a handler function
which could be overridden to determine what happens when I try
"import os". This is effectively a "policy mini-language", such as in the
hypothetical "restricted Python v2" [1]. (Guido has pointed out that this
overridable policy handler could be used to implement capabilities as well
as other regimes. I think I agree in principle, but what I am advocating
here is having the core language implement capabilities so that the
programmer-visible part is as minimal and unified as possible.)

Now what I would *like* is that instead of doing "import os" to load code,
instead the caller provides, or doesn't provide the os module as part of
the construction/invocation of A.

I don't have a clear idea yet of how that could be implemented in a
Pythonic, compatible way.

Just to help me think about it I'll suggest a non-Pythonic and
incompatible way: there is no "import" keyword. When you invoke a
constructor, function, method, etc., you have to pass as arguments
references to everything that the code will need to do its job. So,
assuming the tic-tac-toe game requires the "math" module and the "string"
module, I would have to write:

    # restricted Python v3+modules
    game = TicTacToeGame()
    game.display(open("/tmp/tttgame.out","w"), math, string)

The burden of typing in dozens of module names with each invocation can be
eased by:

1. bundling modules together (put math, string, and some other stuff into
   one object/module/package named "standardstuff" and pass that as an
   argument),
2. "safe" modules that nobody could ever wish to prevent could be globally
   available (via the resurrected "import" keyword, I guess).
If math and string are both "safe", then the example goes back to:

    # restricted Python v3+modules
    game = TicTacToeGame()

    # a game against itself that writes results to a file
    game.display(open("/tmp/tttgame.out","w"))

    # a game against remote, listening on a socket
    game.display(open("/tmp/tttgame.out","w"), socket)

(Note that there is a bootstrapping problem -- *some* code has to receive
a reference to the os module ex nihilo. That code should be "trusted"
code -- the Python interpreter, basically.)

Ah, but this last line shows another problem -- the game now has the
socket module, and the ability to open sockets to remote hosts and more.
I just wanted to allow it to listen on a particular port! The code would
be safer if I didn't pass the large-grained module and instead passed a
specific object:

    # a game against remote, listening on a socket
    listensocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listensocket.bind(('daring.cwi.nl', 8901,))
    game.display(open("/tmp/tttgame.out","w"), listensocket)

Regards,

Zooko

http://zooko.com/
         ^-- under re-construction: some new stuff, some broken links

[1] http://mail.python.org/pipermail/python-dev/2003-March/033938.html

From zooko@zooko.com Tue Mar 18 12:44:47 2003
From: zooko@zooko.com (Zooko)
Date: Tue, 18 Mar 2003 07:44:47 -0500
Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15
In-Reply-To: Message from Ben Laurie of "Tue, 18 Mar 2003 12:11:09 GMT." <3E770CDD.7050206@algroup.co.uk>
References: <3E770CDD.7050206@algroup.co.uk>
Message-ID:

(I, Zooko, wrote the lines prepended with "> > ".)

Ben Laurie wrote:
>
> > No -- capabilities (as envisioned for Python) are references. Whether a
> > reference to an object, to a bound method, or to a function doesn't matter.
>
> I should note that this is a new (and good) idea, not one that we've
> previously expressed. And, of course, they are references with
> restrictions, which will be spelt out in the PEP.

Yes.
It isn't that Brett missed something and I corrected him, it's that I just asserted an idea that hasn't previously been posted to the list. Is the python-dev summary allowed to describe ideas posted in discussion of the python-dev summary? ;-) Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From guido@python.org Tue Mar 18 13:56:12 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 18 Mar 2003 08:56:12 -0500 Subject: [Python-Dev] Shortcut bugfix In-Reply-To: Your message of "Tue, 18 Mar 2003 07:15:09 EST." <200303181215.h2ICF9j21821@pcp02138704pcs.reston01.va.comcast.net> References: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> <3E76EBE5.8020200@lemburg.com> <200303181215.h2ICF9j21821@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303181356.h2IDuEv19695@odiug.zope.com> [RH] > > > Does anyone think changin string.atoi is the wrong way to go? [MAL] > > Yes, because it changes the semantics. string.atoi() would suddenly > > start to accept non-strings like integers, floats, etc. > > > > My suggestion would be to carefully reconsider the changes to > > _tkinter. If it's true that a method can now return strings *and* > > integers which previously only returned strings, then such a > > change is clearly not backward compatible. I'd create a new > > method for the new semantics in that case. [GvR] > That's my gut feeling too. I misspoke. I agree with MAL that string.atoi() shouldn't be changed. But I didn't mean to imply that I wanted the changes to _tkinter and Tkinter to be rolled back. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Tue Mar 18 16:25:07 2003 From: aahz@pythoncraft.com (Aahz) Date: Tue, 18 Mar 2003 11:25:07 -0500 Subject: [Python-Dev] Capabilities Message-ID: <20030318162507.GB13338@panix.com> On Tue, Mar 18, 2003, Zooko wrote: > > No -- capabilities (as envisioned for Python) are references. 
Whether > a reference to an object, to a bound method, or to a function doesn't > matter. > > Note that it isn't that capabilities are "like" references, it is > that capabilities *are* references. Every reference is a capability. > Every capability is a reference. Are you saying that an int is a capability? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Register for PyCon now! http://www.python.org/pycon/reg.html From aahz@pythoncraft.com Tue Mar 18 16:26:23 2003 From: aahz@pythoncraft.com (Aahz) Date: Tue, 18 Mar 2003 11:26:23 -0500 Subject: [Python-Dev] capability-mediated modules Message-ID: <20030318162623.GC13338@panix.com> On Tue, Mar 18, 2003, Zooko wrote: > > For example, suppose you want to control the ability to listen on > sockets for network traffic. If there is a reference (e.g., to an > object) that represents the privilege of listening on sockets, then > you can give such a reference to one object, allowing that object it > to listen on sockets, while withholding it from another object, thus > preventing that one from listening on sockets. Doesn't that only work if the second object never gains a reference to the first object? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Register for PyCon now! http://www.python.org/pycon/reg.html From zooko@zooko.com Tue Mar 18 16:55:05 2003 From: zooko@zooko.com (Zooko) Date: Tue, 18 Mar 2003 11:55:05 -0500 Subject: [Python-Dev] Re: Capabilities In-Reply-To: Message from Aahz of "Tue, 18 Mar 2003 11:25:07 EST." <20030318162507.GB13338@panix.com> References: <20030318162507.GB13338@panix.com> Message-ID: (I, Zooko, wrote the lines prepended with "> > ".) Aahz wrote: > > > No -- capabilities (as envisioned for Python) are references. Whether > > a reference to an object, to a bound method, or to a function doesn't > > matter. > > > > Note that it isn't that capabilities are "like" references, it is > > that capabilities *are* references. Every reference is a capability. 
> > Every capability is a reference. > > Are you saying that an int is a capability? Do you mean: references are really just memory addresses? Python has pointer- safety so Python code cannot access a Python object without a reference to it, even if it knows that object's memory address. This is the first requirement listed in this message: [1]. Or do you mean: I could have a reference to some fundamental computational concept like an int -- would that reference be a capability? I would say yes, all references, even to some basic programming language constructs like None or True, are capabilities. Things like None, True, integers, etc., need to be available to all code (just so that we don't have to pass the same bundle of dozens of standard references to every object we create). Fortunately they can also be made safe so that it is okay for untrusted code to access them. Unfortunately the current implementations of things like None are not safe [2]. This is part of the third requirement listed in [1]. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links [1] http://mail.python.org/pipermail/python-dev/2003-March/033891.html [2] http://mail.python.org/pipermail/python-dev/2003-March/033945.html From pje@telecommunity.com Tue Mar 18 17:41:01 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 18 Mar 2003 12:41:01 -0500 Subject: [Python-Dev] capability-mediated modules Message-ID: <5.1.1.6.0.20030318123644.01dfe310@mail.rapidsite.net> Zooko wrote: >Just to help me think about it I'll suggest a non-Pythonic and incompatible way: >there is no "import" keyword. When you invoke a constructor, function, method, >etc., you have to pass as arguments references to everything that the code will >need to do its job. So, assuming the tic-tac-toe game requires the "math" >module and the "string" module, I would have to write: There's a *much* simpler way to do this. 
'import' is implemented by calling an '__import__' function -- which of course is a capability. To do mediated imports, it's only necessary to supply a mediating version of '__import__'. One also needs a specialized version of '__import__' to set up a newly imported module's builtins. From zooko@zooko.com Tue Mar 18 18:01:05 2003 From: zooko@zooko.com (Zooko) Date: Tue, 18 Mar 2003 13:01:05 -0500 Subject: [Python-Dev] capability-mediated modules In-Reply-To: Message from "Phillip J. Eby" of "Tue, 18 Mar 2003 12:41:01 EST." <5.1.1.6.0.20030318123644.01dfe310@mail.rapidsite.net> References: <5.1.1.6.0.20030318123644.01dfe310@mail.rapidsite.net> Message-ID: "Phillip J. Eby" wrote: > > There's a *much* simpler way to do this. 'import' is implemented by > calling an '__import__' function -- which of course is a capability. To do > mediated imports, it's only necessary to supply a mediating version of > '__import__'. One also needs a specialized version of '__import__' to set > up a newly imported module's builtins. I'm aware of this feature, but I was groping for a more elegant (and more capability-flavored) way to do it. As far as I can tell, the technique you describe is the "policy mini-language" way to implement access control. If I want a certain chunk of code to have access to a certain module, I set the policy with an overridable handler or a configuration object, specifying that this code is allowed to have access to this module, and then I invoke the code. This is analogous to "restricted Python v2"'s way of controlling access to certain resources in this e-mail message [1]. If I change my mind about how the code should work, I have to make changes in two places: first the part of the code that says "import spam" has to change to say "import eggs", and second the policy configuration has to change from "suchandsuch is allowed to import spam" to "suchandsuch is allowed to import eggs". 
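The '__import__' approach and the two-place policy it implies can be
sketched as follows. This is a minimal illustration under stated
assumptions, not rexec and not a security boundary in CPython:
make_mediating_import and run_mediated are hypothetical names, and the
policy here is just a set of permitted top-level module names.

```python
import builtins


def make_mediating_import(allowed):
    """Build an __import__ replacement that enforces a name-based policy."""
    real_import = builtins.__import__

    def mediating_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Check only the top-level package name against the policy.
        if name.partition(".")[0] not in allowed:
            raise ImportError("import of %r is not permitted" % name)
        return real_import(name, globals, locals, fromlist, level)

    return mediating_import


def run_mediated(source, allowed):
    """Execute source with builtins that contain only the mediating import."""
    # The import statement looks up '__import__' in the builtins of the
    # code being executed, so supplying our own dict is enough to mediate.
    env = {"__builtins__": {"__import__": make_mediating_import(allowed)}}
    exec(source, env)
    return env


env = run_mediated("import math\nroot = math.sqrt(16)", allowed={"math"})
print(env["root"])
```

Note the duplication described above: the executed code names the module
("import math") and the policy set names it again ({"math"}), so a change
of mind has to be made in both places.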
In a pure capability language like E, whenever a module is imported it comes into life without permission to do anything, and then the importer grants it permission to do whatever it needs to do (by passing references). This is more like "restricted Python v3" in that designating which module the code ought to use, and authorizing the code to use that module, are both done in the same act (by passing a reference to the module in question). There is a description with examples in the "E in a Walnut" book [2]. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links [1] http://mail.python.org/pipermail/python-dev/2003-March/033938.html [2] http://www.skyhunter.com/marcs/ewalnut.html#SEC16 From zooko@zooko.com Tue Mar 18 16:36:42 2003 From: zooko@zooko.com (Zooko) Date: Tue, 18 Mar 2003 11:36:42 -0500 Subject: [Python-Dev] capability-mediated modules In-Reply-To: Message from Aahz of "Tue, 18 Mar 2003 11:26:23 EST." <20030318162623.GC13338@panix.com> References: <20030318162623.GC13338@panix.com> Message-ID: (I, Zooko, wrote the lines prepended with "> > ".) Aahz wrote: > > > For example, suppose you want to control the ability to listen on > > sockets for network traffic. If there is a reference (e.g., to an > > object) that represents the privilege of listening on sockets, then > > you can give such a reference to one object, allowing that object it > > to listen on sockets, while withholding it from another object, thus > > preventing that one from listening on sockets. > > Doesn't that only work if the second object never gains a reference to > the first object? This is why real mandatory private data is needed. The second object could have a reference to the first object, and could use the first object through some interface offered by the first object, without being able to access the first object's socket-listener capability. 
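Why mandatory private data matters here can be shown with a small sketch.
Listener, Client and the stand-in capability are hypothetical names; the
leading underscore is only a naming convention in Python, which is the
gap being pointed out.

```python
class Listener:
    """First object: holds a listening capability (a plain callable here)."""

    def __init__(self, listen_capability):
        self._listen = listen_capability  # private by convention only

    def next_message(self):
        # The public interface: use the capability without handing it out.
        return self._listen()


class Client:
    """Second object: holds a reference to the Listener, not the capability."""

    def __init__(self, listener):
        self.listener = listener

    def poll(self):
        # Intended use: go through the Listener's interface.
        return self.listener.next_message()

    def steal(self):
        # Nothing in the language stops this attribute access.
        return self.listener._listen


def fake_listen():
    # Stand-in for a real socket-listening capability.
    return "hello"


client = Client(Listener(fake_listen))
print(client.poll())
print(client.steal() is fake_listen)
```

Without enforced privacy, holding a reference to the first object is
enough to extract its capability; with it, only poll() would be possible.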
Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From martin@v.loewis.de Tue Mar 18 20:40:32 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 18 Mar 2003 21:40:32 +0100 Subject: [Python-Dev] Shortcut bugfix In-Reply-To: <3E76EBE5.8020200@lemburg.com> References: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> <3E76EBE5.8020200@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > If it's true that a method can now return strings *and* > integers which previously only returned strings That is not true. Regards, Martin From martin@v.loewis.de Tue Mar 18 20:41:50 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 18 Mar 2003 21:41:50 +0100 Subject: [Python-Dev] Shortcut bugfix In-Reply-To: <200303181356.h2IDuEv19695@odiug.zope.com> References: <001501c2ecee$be00f2e0$125ffea9@oemcomputer> <3E76EBE5.8020200@lemburg.com> <200303181215.h2ICF9j21821@pcp02138704pcs.reston01.va.comcast.net> <200303181356.h2IDuEv19695@odiug.zope.com> Message-ID: Guido van Rossum writes: > I misspoke. I agree with MAL that string.atoi() shouldn't be > changed. But I didn't mean to imply that I wanted the changes to > _tkinter and Tkinter to be rolled back. I'd like to understand the problem with Pmw first, and I agree that changing atoi is not the right solution, regardless of what the problem is. Regards, Martin From drifty@alum.berkeley.edu Tue Mar 18 21:45:35 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Tue, 18 Mar 2003 13:45:35 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15 In-Reply-To: References: <3E770CDD.7050206@algroup.co.uk> Message-ID: [Zooko] > Is the python-dev summary allowed to describe ideas posted in discussion of the > python-dev summary? ;-) > This is all going into the current summary. Wasn't expecting this to generate new content, though. 
=)

-Brett

From tismer@tismer.com Tue Mar 18 22:46:11 2003
From: tismer@tismer.com (Christian Tismer)
Date: Tue, 18 Mar 2003 23:46:11 +0100
Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15
In-Reply-To:
References: <3E770CDD.7050206@algroup.co.uk>
Message-ID: <3E77A1B3.7040108@tismer.com>

Brett Cannon wrote:
> [Zooko]
>
>
>>Is the python-dev summary allowed to describe ideas posted in discussion of the
>>python-dev summary? ;-)
>>
>
>
> This is all going into the current summary. Wasn't expecting this to
> generate new content, though. =)

Hee hee, be aware of infinite recursion :-)

--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/

From csv@mail.mojam.com Tue Mar 18 23:09:46 2003
From: csv@mail.mojam.com (Skip Montanaro)
Date: Tue, 18 Mar 2003 17:09:46 -0600
Subject: [Python-Dev] csv package ready for prime-time?
Message-ID: <15991.42810.167975.876841@montanaro.dyndns.org>

I'm ready to move the csv package out of the sandbox into the main CVS
trunk. Since my last post there have been a few changes and comments:

    * Cliff Wells contributed his csv file parameter sniffing code

    * The installation is now a package instead of a single module (not
      sure if the docs have caught up with this change yet)

    * On the mailing list, the following threads of significance are found:

      - John Machin pointed out a few bugs and raised issues with my
        decision to ignore blank lines in my DictReader class. I don't
        believe we ever reached a consensus we were both happy with.
        (That is, John may still be slightly unhappy with the current
        results. I didn't change the behavior as a result of the thread.)
      - Andrew Dalke reported some problems using a space character as the
        delimiter which appear to be resolved.

Is there a formal process for "dusting off" software which has been
playing in the sandbox? What about getting PEP 305 stamped with the BDFL
seal of approval? (I realize Guido's busy in the run-up to PyCon.)

As a gentle reminder, the relevant URLs are

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/
    http://www.python.org/peps/pep-0305.html

You can browse the mailing list archives at

    http://manatee.mojam.com/pipermail/csv

You can check out the code and play with it fairly easily. From your
sandbox directory execute

    cvs up -dP .
    cd csv
    python setup.py install

Feedback to the csv mailing list please (Reply-To: adjusted accordingly).

Skip

From greg@cosc.canterbury.ac.nz Tue Mar 18 23:17:55 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 19 Mar 2003 11:17:55 +1200 (NZST)
Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15)
In-Reply-To:
Message-ID: <200303182317.h2INHtF09051@oma.cosc.canterbury.ac.nz>

Zooko:

> Now what I would *like* is that instead of doing "import os" to load code,
> instead the caller provides, or doesn't provide the os module as part of the
> construction/invocation of A.
>
> I don't have a clear idea yet of how that could be implemented in a
> Pythonic, compatible way.

Maybe, instead of there being one ultra-global namespace for importing
modules from, it should be part of a function's environment. By default a
function invocation would inherit the "import environment" of its caller,
but the caller could override this to provide a more restricted
environment.

This would be equivalent to passing in a set of allowable modules as an
implicit parameter to every call.
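One way to picture an inherited "import environment" is the sketch below.
It is purely illustrative and assumes nothing about how the interpreter
would really do this: it models the implicit parameter with a context
variable, import_environment and env_import are hypothetical names, and
an explicit function stands in for the import statement.

```python
import contextvars
import importlib
from contextlib import contextmanager

# None means "unrestricted"; otherwise a frozenset of permitted module names.
_allowed = contextvars.ContextVar("allowed_modules", default=None)


@contextmanager
def import_environment(names):
    """Run the enclosed calls with a (possibly narrower) import environment."""
    current = _allowed.get()
    requested = frozenset(names)
    # A callee can only narrow what it inherited, never widen it.
    narrowed = requested if current is None else current & requested
    token = _allowed.set(narrowed)
    try:
        yield
    finally:
        _allowed.reset(token)


def env_import(name):
    """Import a module if the inherited environment permits it."""
    allowed = _allowed.get()
    if allowed is not None and name not in allowed:
        raise ImportError("module %r is not in the import environment" % name)
    return importlib.import_module(name)


def callee():
    # Inherits whatever environment its caller was running under.
    return env_import("math").sqrt(9)


with import_environment({"math"}):
    print(callee())
```

The callee never receives the set explicitly; it simply runs under
whatever its caller established, which is the "implicit parameter"
reading of the proposal.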
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From greg@cosc.canterbury.ac.nz Tue Mar 18 23:48:43 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 19 Mar 2003 11:48:43 +1200 (NZST)
Subject: [Python-Dev] Capabilities
In-Reply-To: <20030318162507.GB13338@panix.com>
Message-ID: <200303182348.h2INmhK09178@oma.cosc.canterbury.ac.nz>

Aahz:

> Are you saying that an int is a capability?

Some integers could confer quite powerful capabilities. 42, for example,
apparently gives us the capability of knowing the answer to the ultimate
question of life, the universe and everything, which has to be a pretty
awesome thing to know!

All we need now is a capability which gives us access to the question...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From tdelaney@avaya.com Wed Mar 19 00:05:03 2003
From: tdelaney@avaya.com (Delaney, Timothy C (Timothy))
Date: Wed, 19 Mar 2003 11:05:03 +1100
Subject: [Python-Dev] python-dev Summary for 2003-03-01 through 2003-03-15
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE2E7D33@au3010avexu1.global.avaya.com>

> From: Christian Tismer [mailto:tismer@tismer.com]
> >
> > This is all going into the current summary. Wasn't expecting this to
> > generate new content, though. =)
>
> Hee hee, be aware of infinite recursion :-)

I wouldn't expect it to be that much of a problem. Each time through
seems to take a lot of time, so there are plenty of opportunities to
break out of the infinite recursion before the net fills up ...
Tim Delaney

From guido@python.org Wed Mar 19 01:22:28 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 18 Mar 2003 20:22:28 -0500
Subject: [Python-Dev] csv package ready for prime-time?
In-Reply-To: "Your message of Tue, 18 Mar 2003 17:09:46 CST." <15991.42810.167975.876841@montanaro.dyndns.org>
References: <15991.42810.167975.876841@montanaro.dyndns.org>
Message-ID: <200303190122.h2J1MSI22680@pcp02138704pcs.reston01.va.comcast.net>

> Is there a formal process for "dusting off" software which has been
> playing in the sandbox? What about getting PEP 305 stamped with the
> BDFL seal of approval? (I realize Guido's busy in the run-up to
> PyCon.)

I have no bandwidth for looking at this until after I'm back from the
UK Python conference (April 7), but I think there's no reason why you
should wait for me with moving the csv package to the dist/src tree.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From neal@metaslash.com Wed Mar 19 01:31:48 2003
From: neal@metaslash.com (Neal Norwitz)
Date: Tue, 18 Mar 2003 20:31:48 -0500
Subject: [Python-Dev] string.strip doc vs code mismatch
Message-ID: <20030319013148.GX14067@epoch.metaslash.com>

Could someone please review the patch attached to this bug:

    http://python.org/sf/697220

There are two patches attached--one for 2.3 and one for 2.2.3.

As you may recall, there was a parameter added to all the strip methods
for 2.3. This was inadvertently backported to 2.2.2, but only for string
and unicode methods. Sometime back it was decided to update the string
module for 2.2.2 to stay in sync with the string/unicode methods.
However, there have been various partial fixes.
I believe the patches correct all the problems: * fix differences between extra param name--it was referred to as both chars and sep in code/doc, always use chars * update docstrings, add notes about chars param (sep -> chars) * add chars param to each string.*strip() function (ie, lstrip, rstrip) * update doc with functionality and to correct when the param was added Neal From zen@shangri-la.dropbear.id.au Thu Mar 20 05:01:33 2003 From: zen@shangri-la.dropbear.id.au (Stuart Bishop) Date: Thu, 20 Mar 2003 16:01:33 +1100 Subject: [Python-Dev] tzset In-Reply-To: Message-ID: <057832A9-5A91-11D7-8A30-000393B63DDC@shangri-la.dropbear.id.au> On Monday, March 17, 2003, at 01:17 PM, Tim Peters wrote: > [Guido] >> ... >> I don't know if it makes sense to provide tzset() on Windows; from >> Tim's description it doesn't sound likely. > > I wouldn't object if someone else wanted to do the work (which includes > documenting it well enough to cut off an endless stream of obvious > questions). The Windows tzset is weak but maybe usable for some > people. > For example, time zone names must be exactly 3 characters, and you > can't > tell the Windows tzset when daylight time begins or ends: it uses US > rules > no matter what the time zone. The native Win32 > SetTimeZoneInformation() > doesn't suffer these idiocies, but I'm not sure whether calling that > affects > the Unixish _tzname (etc) variables. "Doing the work" also means > figuring > out all that stuff. I've submitted an update to SF: http://www.python.org/sf/706707 This version should only build time.tzset if it accepts the TZ environment variable formats documented at: http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html So it shouldn't build under Windows. The last alternative would be to expose time.tzset if it exists at all, and the test suite would simply check to make sure it doesn't raise an exception. 
This would leave behaviour totally up to the OS, and the corresponding lack of documentation in the Python library reference. -- Stuart Bishop http://shangri-la.dropbear.id.au/ From tim_one@email.msn.com Thu Mar 20 05:09:53 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 20 Mar 2003 00:09:53 -0500 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200303172334.h2HNY2j13488@europa.research.att.com> Message-ID: [Andrew Koenig] > I'm beginning to wonder if part of what's going on is that there are > really two different concepts that go under the general label of > "comparison", namely the cases where trichotomy does and does not apply. > > In the first case, we have a total ordering; in the second, we have what > C++ calls a "strict weak ordering", which is really an ordering of > equivalence classes. I'm afraid Python people really want a total ordering, because that's what Python gave them at the start (ya, I understand the long vs float business, but nobody in real life ever griped about that). It's a curious thing that the *specific* total ordering Python supplied changed across releases, and nobody complained about that(*). Also curious that, within a release, nobody complained that the specific total ordering can change across program runs (comparisons falling back to comparing object addresses are consistent within a run, but not necessarily across runs). That doesn't deny there are multiple comparison concepts people want, it just speaks against a strict weak ordering being one of them. For example, when using binary search on a sorted list to determine membership, people want total ordering. OTOH, when faced with 42 < "42" in isolation, sane Python people want an exception. When faced with "x in sequence_or_mapping", most people want __eq__ but some people want object identity (e.g., it's not always helpful that 3 == 3.0). One size doesn't fit anyone all the time. 
(*) I have to take that back: people *did* complain when the relative position of None changed. It's an undocumented fact that None compares "less than" non-None objects now (of types that don't force a different outcome), but that wasn't always so, and I clearly recall a few complaints after that changed, from people who apparently deliberately relied on its equally undocumented comparison behavior before. From tim_one@email.msn.com Thu Mar 20 05:58:55 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 20 Mar 2003 00:58:55 -0500 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: Message-ID: [ark@research.att.com] > The binary-search routines in the C++ standard library mostly avoid > having to do != comparisons by defining their interfaces in the > following clever way: > > binary_search returns a boolean that indicates whether the > value sought is in the sequence. It does not > say where that value is. > > lower_bound returns the first position ahead of which > the given value could be inserted without > disrupting the ordering of the sequence. > > upper_bound returns the last position ahead of which > the given value could be inserted without > disrupting the ordering of the sequence. These last two are quite like Python's bisect.bisect_{left, right}, which are implemented using only __lt__ element comparison. > equal_range returns (lower_bound, upper_bound) as a pair. > > In Python terms: > > binary_search([3, 5, 7], 6) would yield False > binary_search([3, 5, 7], 7) would yield True > lower_bound([1, 3, 5, 7, 9, 11], 9) would yield 4 > lower_bound([1, 3, 5, 7, 9, 11], 8) would also yield 4 > upper_bound([1, 3, 5, 7, 9, 11], 9) would yield 5 >>> import bisect >>> x = [1, 3, 5, 7, 9, 11] >>> bisect.bisect_left(x, 9) 4 >>> bisect.bisect_left(x, 8) 4 >>> bisect.bisect_right(x, 9) 5 >>> We conclude that C++ did something right . > equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 3) > would yield (2, 5). 
> > If you like, equal_range(seq, x) returns (l, h) such that all the > elements of seq[l:h] are equal to x. If l == h, the subsequence is > the empty sequence between the two adjacent elements with values that > bracket x. > > These definitions turn out to be useful in practice, and are also > easy to implement efficiently using only < comparisons. I think Python got the most valuable of these, and they're useful in Python too. Nevertheless, if you're coding an explicit conventional binary search tree (nodes containing a value, a reference to "a left" node, and a reference to "a right" node), cmp() is more convenient; and even more so if you're coding a ternary search tree. Sometimes cmp allows for more compact code. Python's previous samplesort implementation endured a *little* clumsiness to infer equality (a == b) from not (a < b) and not (b < a). The current adaptive mergesort feels the restriction to < more acutely and in more places. For example, when merging two runs A and B, part of the adaptive strategy is to precompute, via a form of binary search, where A[0] belongs in B, and where B[-1] belongs in A. This sounds like two instances of the same task, but they're maddeningly different because, in order to preserve stability, the first search needs to be of the bisect_left flavor and the second of bisect_right. Combining both modes of operation in a single search routine with a flag argument, and sticking purely to __lt__, leads to horridly obscure code, so these searches are actually implemented by distinct functions. If it were able to use cmp() instead, folding them into one routine would have been unobjectionable (if < is needed, check for cmp < 0; if <= is needed, check for cmp <= 0, which is the same as cmp < 1; so 0 or 1 could be passed in to select between < and <= very efficiently and reasonably clearly).
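A minimal sketch of the folding described above, under stated assumptions: it is not the listsort code, the names cmp3 and bisect_cmp are made up for illustration, and cmp3 is a stand-in three-way comparison defined locally. Passing 0 or 1 as the bias selects the bisect_left or the bisect_right flavor with a single test:

```python
def cmp3(a, b):
    """Three-way comparison: negative, zero or positive (illustrative helper)."""
    return (a > b) - (a < b)


def bisect_cmp(seq, x, bias):
    """Binary search over a sorted sequence.

    bias=0 behaves like bisect.bisect_left  (advance while seq[mid] <  x).
    bias=1 behaves like bisect.bisect_right (advance while seq[mid] <= x).
    """
    lo, hi = 0, len(seq)
    while lo < hi:
        mid = (lo + hi) // 2
        # cmp < 0 means "<"; cmp < 1 means "<=".
        if cmp3(seq[mid], x) < bias:
            lo = mid + 1
        else:
            hi = mid
    return lo


def equal_range(seq, x):
    """Return (l, h) such that every element of seq[l:h] equals x."""
    return bisect_cmp(seq, x, 0), bisect_cmp(seq, x, 1)


x = [1, 3, 5, 7, 9, 11]
print(bisect_cmp(x, 9, 0), bisect_cmp(x, 8, 0), bisect_cmp(x, 9, 1))
print(equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 3))
```

With the sample lists quoted earlier in the thread, this should agree with bisect.bisect_left and bisect.bisect_right, and equal_range gives the (2, 5) pair from the C++ example.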
From ben@algroup.co.uk Thu Mar 20 10:33:26 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 20 Mar 2003 10:33:26 +0000 Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) In-Reply-To: <200303182317.h2INHtF09051@oma.cosc.canterbury.ac.nz> References: <200303182317.h2INHtF09051@oma.cosc.canterbury.ac.nz> Message-ID: <3E7998F6.6010201@algroup.co.uk> Greg Ewing wrote: > Zooko : > > >>Now what I would *like* is that instead of doing "import os" to load code, >>instead the caller provides, or doesn't provide the os module as part of the >>construction/invocation of A. >> >>I don't have a clear idea yet of how that could be implemented in a >>Pythonic, compatible way. > > > Maybe, instead of there being one ultra-global namespace for importing > modules from, it should be part of a function's environment. By > default a function invocation would inherit the "import environment" > of it's caller, but the caller could override this to provide a more > restricted environment. Inheriting things is not the capability way. Passing capabilities that allow imports is, of course, but isn't very Pythonic. I'm not sure there's a neat way to fix this that keeps both camps happy. > This would be equivalent to passing in a set of allowable > modules as an implicit parameter to every call. Making it explicit would make me happy. Can you pass parameters to an import? Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." 
- Robert Woodruff From pedronis@bluewin.ch Thu Mar 20 12:41:07 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Thu, 20 Mar 2003 13:41:07 +0100 Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) References: <200303182317.h2INHtF09051@oma.cosc.canterbury.ac.nz> <3E7998F6.6010201@algroup.co.uk> Message-ID: <003601c2eedd$fab452e0$6d94fea9@newmexico> > > Making it explicit would make me happy. Can you pass parameters to an > import? > Not directly. An extension like import module(parmmod=....,...) would not seem totally unreasonable. The problem is that normally modules are uniquely globally identified singletons, but the very notion of parametrization implies instantiation and that breaks the singleton part. When to instantiate a new module and when not? A potential problem is not simply module-specific global state, but that e.g. the classes exported from two instances of the same module would be _distinct_ and so not interoperable. Regards. From ben@algroup.co.uk Thu Mar 20 15:33:38 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 20 Mar 2003 15:33:38 +0000 Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) In-Reply-To: <003601c2eedd$fab452e0$6d94fea9@newmexico> References: <200303182317.h2INHtF09051@oma.cosc.canterbury.ac.nz> <3E7998F6.6010201@algroup.co.uk> <003601c2eedd$fab452e0$6d94fea9@newmexico> Message-ID: <3E79DF52.3020305@algroup.co.uk> Samuele Pedroni wrote: >>Making it explicit would make me happy. Can you pass parameters to an >>import? >> > > > not directly, > > an extension like > > import module(parmmod=....,...) > > would not seem totally unreasonable. > > The problem is that normally modules are uniquely globally identified > singletons, but the very notion of parametrization implies instantiation and > that breaks the singleton part. When to instantiate a new module and when not?
> > a potential problem is not simply module specific global state but that the > e.g. classes exported from two instances of the same module would be _distinct_ > and so not interoperable. I don't think we'd want to change them from being singletons, just restrict access to them based on capabilities. So, I was more thinking of something like: import(capability) module where the capability conveys the authority to import the module. Oh. I see the problem: if module A imports module B, and then module A is imported in turn by C and D, with C having a capability to B that it hands to A, but D _not_ doing so, then where are we? I suppose we would say that the import of A into D failed in that case. Of course, this still leaves open the question of how we pass the authority to import into the module, so I guess it would look like: import(cap1) module(cap2,cap3,...) and cap2 etc. would have to only be used in the import statements in the module. And this is getting messy. OTOH, my original idea was that the only modules a capability-enabled module would be allowed to import would be ones that are either capability-safe, or modules in the same "package" (for some definition of package). Any other module would have to be imported by whoever initialised the capability environment, and appropriate capabilities handed in to the capability-enabled objects. This sounds cleaner to me, if somewhat nebulous at the moment. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From zooko@zooko.com Thu Mar 20 21:34:00 2003 From: zooko@zooko.com (Zooko) Date: Thu, 20 Mar 2003 16:34:00 -0500 Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) In-Reply-To: Message from Greg Ewing of "Wed, 19 Mar 2003 11:17:55 +1200." 
<200303182317.h2INHtF09051@oma.cosc.canterbury.ac.nz> References: <200303182317.h2INHtF09051@oma.cosc.canterbury.ac.nz> Message-ID: Greg Ewing wrote: > > Maybe, instead of there being one ultra-global namespace for importing > modules from, it should be part of a function's environment. By > default a function invocation would inherit the "import environment" > of it's caller, but the caller could override this to provide a more > restricted environment. This is a reasonable idea too. It bears an intriguing similarity to scoping in general. One can put capabilities into local variables, and then functions and classes that you define inside that scope automatically have access to them. That doesn't work for separate modules, of course, which have no enclosing lexical scope. So your proposal seems sort of like a kind of dynamic scoping for modules, but instead of the imported module having access to all variables in the scope of the "import" statement (the *lexical* scope of the import statement), it has access to specific ones -- either a special "variables accessible to imported modules" dict or specially flagged ones, or something. For what it's worth, the solution to this problem in E is quite elegant. When code is loaded from a module, it is executed with optional arguments. So if your spam module requires a TCP socket, you can write (transliterating to Python syntax): # Python with E's parameterized import import socket import spam(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) If spam needs access to the eggs module, you could write: import eggs import spam(eggs) But as Samuele Pedroni has pointed out [1] there are deeper problems here, namely that modules are currently singletons, which doesn't fit with the notion of parameterization. It also doesn't fit with security! Consider a module that is safe to use if you give it your credit card number, and safe to use if you give it a network socket, but unsafe if you give it both! 
Capabilities offer this kind of security -- you can arrange it so that nobody else can give privileges to an object, thus allowing you to give the object privileges which would otherwise be dangerous. This is easy with objects in cap-Python, but not with modules: ttt1 = TicTacToe() ttt1.verify_card_number(mycreditcard) ttt2 = TicTacToe() ttt2.connect_to_server(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) So one design for cap-Python might say that only safe modules can be imported by cap-Python code. Every unsafe privilege would have to be granted by using references (passed as arguments, assigned to variables, etc.). No authority would ever be made available to capability-secured code through "import". This might not be much of a loss, since all of the unsafe stuff that you can currently import -- socket, os, etc. -- is rather too coarse-grained anyway and will almost certainly be wrapped in a finer-grained interface before being given to capability-confined code. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links [1] http://mail.python.org/pipermail/python-dev/2003-March/034172.html From dave@boost-consulting.com Thu Mar 20 21:43:51 2003 From: dave@boost-consulting.com (David Abrahams) Date: Thu, 20 Mar 2003 16:43:51 -0500 Subject: [Python-Dev] Re: More int/long integration issues References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> Message-ID: Guido van Rossum writes: > The bytecode compiler should be clever enough to see that you're > writing > > for i in range(...): ... > > and that there's no definition of range other than the built-in one > (this requires a subtle change of language rules); it can then > substitute an internal equivalent to xrange(). Ouch! What happens to: def foo(seq): for x in seq: ... foo(xrange(small, really_big)) if xrange dies?? 
-- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Thu Mar 20 22:33:35 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 20 Mar 2003 17:33:35 -0500 Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: Your message of "Thu, 20 Mar 2003 16:43:51 EST." References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> Message-ID: <200303202233.h2KMXbG07782@odiug.zope.com> > Guido van Rossum writes: > > > The bytecode compiler should be clever enough to see that you're > > writing > > > > for i in range(...): ... > > > > and that there's no definition of range other than the built-in one > > (this requires a subtle change of language rules); it can then > > substitute an internal equivalent to xrange(). > > Ouch! What happens to: > > def foo(seq): > for x in seq: > ... > > foo(xrange(small, really_big)) > > if xrange dies?? Good point. I guess xrange() can't die until range() becomes an iterator (which can't be before Python 3.0). Hm, maybe range() shouldn't be an iterator but an iterator generator. No time to explain; see the discussion about restartable iterators. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Thu Mar 20 22:46:36 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 20 Mar 2003 16:46:36 -0600 Subject: [Python-Dev] socket timeouts fail w/ makefile() Message-ID: <15994.17612.495528.162817@montanaro.dyndns.org> I discovered much to my chagrin today that the socket module's new timeout capability doesn't play well with file objects as returned by a socket's makefile method. Tim O'Malley's timeoutsocket module avoids this problem by implementing a simple file-like object directly on top of the socket without calling makefile(). Is there some reason this approach wasn't adopted when adding timeouts to the socket module?
I would think the greatest use of timeouts would be using higher-level line-oriented modules like urllib and ftplib. In addition, since makefile() isn't always available, it seems worthwhile to implement something in socket.py, thus making makefile() universally available. I filed a bug report about this issue earlier today in case people are interested: http://www.python.org/sf/707074 Skip From greg@cosc.canterbury.ac.nz Thu Mar 20 23:16:13 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 21 Mar 2003 11:16:13 +1200 (NZST) Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) In-Reply-To: <003601c2eedd$fab452e0$6d94fea9@newmexico> Message-ID: <200303202316.h2KNGDM07222@oma.cosc.canterbury.ac.nz> Samuele Pedroni : > The problem is that normally modules are uniquely globally identified > singletons, but the very notion of parametrization implies instantiation and > that breaks the singleton part. Python already has things you can instantiate -- they're called classes! Seems to me if you want instantiation, you should be using a class, not a module. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From skip@pobox.com Thu Mar 20 23:38:54 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 20 Mar 2003 17:38:54 -0600 Subject: [Python-Dev] csv package stitched into CVS hierarchy Message-ID: <15994.20750.356162.465058@montanaro.dyndns.org> The csv package is now in the main branch of the CVS hierarchy. I will leave the structure in the sandbox for a few days before "cvs remove"ing it in case I missed something. 
Skip From greg@cosc.canterbury.ac.nz Thu Mar 20 23:51:06 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 21 Mar 2003 11:51:06 +1200 (NZST) Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) In-Reply-To: Message-ID: <200303202351.h2KNp6707299@oma.cosc.canterbury.ac.nz> Zooko : > So your proposal seems sort of like a kind of dynamic scoping for > modules Yes, it would be dynamic scoping of the import namespace. The reason I think it needs to be dynamic rather than lexical is that it isn't really objects or functions that we want to allow or deny capabilities to, it's *users* (for some suitably general notion of "user"). It may be okay for a particular method to do something when it's called by one user, but not another. The current method of controlling access to modules by overriding __import__ suffers from the problem that a given module can only have one __import__ hook at a time. There's no way for different users of the same module to have different importing abilities. >From what's been said about E, it seems that the solution there is to have instantiable modules (which means they're more like classes than modules, in Python terms) and to explicitly pass a lot of capabilities around. It seems to me that you'd end up with a lot of extra parameters to pass around in calls that way, and most of the time you'd just be passing on what had been passed to you -- hence my suggestion of dynamic scoping. But, not having studied any real E code, it may be that it doesn't turn out to be that bad in practice. Probably I shouldn't say any more until I know what I'm talking about... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From pedronis@bluewin.ch Thu Mar 20 23:54:21 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 21 Mar 2003 00:54:21 +0100 Subject: [Python-Dev] capability-mediated modules (was: python-dev Summary for 2003-03-01 through 2003-03-15) References: <200303202316.h2KNGDM07222@oma.cosc.canterbury.ac.nz> Message-ID: <024c01c2ef3c$076aa120$6d94fea9@newmexico> > Samuele Pedroni : > > > The problem is that normally modules are uniquely globally identified > > singletons, but the very notion of parametrization implies instantiation and > > that breaks the singleton part. To make things clearer: > Python already has things you can instantiate -- they're > called classes! Python already has things you can parametrize -- they're called classes! > Seems to me if you want instantiation, you should be using > a class, not a module. Seems to me if you want parametrization, you should be using a class, not a module. Maybe. [ What is sometimes called a "unit", meaning a parametrizable and instantiable module, can be a useful generic-programming construct. ] The underlying question is how much cap-Python programming can be like, and how much we want it to be like, current Python programming. For example, concretely, modules and imports are often used to access "program-wide" factories. Do we want cap-confined client code to be rewritten in order to pass around the factories or single factory-constructed objects, as in: [Zooko] >So one design for cap-Python might say that only safe modules can be imported by >cap-Python code. Every unsafe privilege would have to be granted by using >references (passed as arguments, assigned to variables, etc.). No authority >would ever be made available to capability-secured code through "import". or not? There are trade-offs between the necessary semantic changes and complexity on one side, and preserving the overall feel of the language and reusing or adapting legacy code on the other.
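The interoperability concern with instantiable modules can be made concrete with a small sketch. It assumes a hypothetical make_unit() factory standing in for a parameterized import (no such import syntax exists in Python, and the names here are invented for illustration). Each call builds a fresh namespace, so the classes it exports are distinct objects even though they come from the same source text:

```python
from types import SimpleNamespace


def make_unit(transport):
    """Hypothetical stand-in for 'import spam(transport)'.

    The capability (here just a callable) is passed in explicitly
    instead of being fetched by the unit through a global import.
    """
    class Message:
        def send(self, text):
            # The unit can only reach what it was handed.
            return transport(text)

    return SimpleNamespace(Message=Message)


unit_a = make_unit(str.upper)
unit_b = make_unit(str.lower)

msg = unit_a.Message()
print(msg.send("Hello"))                 # HELLO
print(unit_a.Message is unit_b.Message)  # False
print(isinstance(msg, unit_b.Message))   # False
```

The last two lines show the cost: an object made by one instance of the unit is not an instance of the "same" class from another instance, which is the breakage of the singleton assumption discussed in this thread.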
From neal@metaslash.com Fri Mar 21 00:33:40 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 20 Mar 2003 19:33:40 -0500 Subject: [Python-Dev] csv package stitched into CVS hierarchy In-Reply-To: <15994.20750.356162.465058@montanaro.dyndns.org> References: <15994.20750.356162.465058@montanaro.dyndns.org> Message-ID: <20030321003340.GN14067@epoch.metaslash.com> On Thu, Mar 20, 2003 at 05:38:54PM -0600, Skip Montanaro wrote: > The csv package is now in the main branch of the CVS hierarchy. I will > leave the structure in the sandbox for a few days before "cvs remove"ing it > in case I missed something. Tim, can you do the magic to make sure the CSV module is in the Windows distribution? I think this means modifying PCbuild/python20.wse at least. Neal From guido@python.org Fri Mar 21 00:57:34 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 20 Mar 2003 19:57:34 -0500 Subject: [Python-Dev] socket timeouts fail w/ makefile() In-Reply-To: "Your message of Thu, 20 Mar 2003 16:46:36 CST." <15994.17612.495528.162817@montanaro.dyndns.org> References: <15994.17612.495528.162817@montanaro.dyndns.org> Message-ID: <200303210057.h2L0vY608028@pcp02138704pcs.reston01.va.comcast.net> > I discovered much to my chagrin today that the socket module's new timeout > capability doesn't play well with file objects as returned by a socket's > makefile method. Can you explain better how it doesn't work? > Tim O'Malley's timeoutsocket module avoids this problem by > implementing a simple file-like object directly on top of the socket > without calling makefile(). Is there some reason this approach > wasn't adopted when adding timeouts to the socket module? I guess nobody thought of this so far. > I would think the greatest use of timeouts would be using > higher-level line-oriented modules like urllib and ftplib. In > addition, since makefile() isn't always available, it seems > worthwhile to implement something in socket.py, thus making > makefile() universally available. 
Um, when is makefile() not available? There's code for Windows that emulates it, returning a file-like object. Maybe that code should be enabled universally rather than only on Windows... > I filed a bug report about this issue earlier today in case people > are interested: > > http://www.python.org/sf/707074 I'm interested, but have no time... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Fri Mar 21 01:39:48 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 20 Mar 2003 20:39:48 -0500 Subject: [Python-Dev] csv package stitched into CVS hierarchy In-Reply-To: <20030321003340.GN14067@epoch.metaslash.com> Message-ID: [Neal Norwitz] > Tim, can you do the magic to make sure the CSV module is in the > Windows distribution? I think this means modifying > PCbuild/python20.wse at least. There's a lot of stuff that needs to be done to add a new separately compiled module. The good news is the same as the bad news here: a piece gets dropped on the floor if and only if it isn't accessed by the standard test suite. In this case, it looks like the test suite covers it, so be of good cheer: nothing will get forgotten (except for whatever pieces test_csv.py forgets to test ). From skip@pobox.com Fri Mar 21 03:15:45 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 20 Mar 2003 21:15:45 -0600 Subject: [Python-Dev] socket timeouts fail w/ makefile() In-Reply-To: <200303210057.h2L0vY608028@pcp02138704pcs.reston01.va.comcast.net> References: <15994.17612.495528.162817@montanaro.dyndns.org> <200303210057.h2L0vY608028@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15994.33761.677545.551348@montanaro.dyndns.org> >> I discovered much to my chagrin today that the socket module's new >> timeout capability doesn't play well with file objects as returned by >> a socket's makefile method. Guido> Can you explain better how it doesn't work? 
When the socket is in non-blocking mode, reads on the file returned by .makefile() will fail with an IOError if there is nothing to return. >> I would think the greatest use of timeouts would be using >> higher-level line-oriented modules like urllib and ftplib. In >> addition, since makefile() isn't always available, it seems >> worthwhile to implement something in socket.py, thus making >> makefile() universally available. Guido> Um, when is makefile() not available? I don't know. I was going by the doc string in socketmodule.c which says, in part: ... makefile([mode, [bufsize]]) -- return a file object for the socket [*]\n\ ... [*] not available on all platforms!"); Maybe the docs are just wrong. According to the #ifdef in the code, if NO_DUP is defined (OS/2, Windows, BeOS), makefile() isn't. Guido> There's code for Windows that emulates it, returning a file-like Guido> object. Maybe that code should be enabled universally rather Guido> than only on Windows... That sounds similar to what is in timeoutsocket.py. It would have the advantage of providing identical semantics across all platforms. Skip From guido@python.org Fri Mar 21 11:17:29 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 21 Mar 2003 06:17:29 -0500 Subject: [Python-Dev] socket timeouts fail w/ makefile() In-Reply-To: "Your message of Thu, 20 Mar 2003 21:15:45 CST." <15994.33761.677545.551348@montanaro.dyndns.org> References: <15994.17612.495528.162817@montanaro.dyndns.org> <200303210057.h2L0vY608028@pcp02138704pcs.reston01.va.comcast.net> <15994.33761.677545.551348@montanaro.dyndns.org> Message-ID: <200303211117.h2LBHTh23630@pcp02138704pcs.reston01.va.comcast.net> > >> I discovered much to my chagrin today that the socket > >> module's new timeout capability doesn't play well with file > >> objects as returned by a socket's makefile method. > > Guido> Can you explain better how it doesn't work? 
> > When the socket is in non-blocking mode, reads on the file returned by > .makefile() will fail with an IOError if there is nothing to return. Isn't that exactly what a timeout is supposed to do? What would you have expected? > >> I would think the greatest use of timeouts would be using > >> higher-level line-oriented modules like urllib and ftplib. In > >> addition, since makefile() isn't always available, it seems > >> worthwhile to implement something in socket.py, thus making > >> makefile() universally available. > > Guido> Um, when is makefile() not available? > > I don't know. I was going by the doc string in socketmodule.c which > says, in part: > > ... > makefile([mode, [bufsize]]) -- return a file object for the socket [*]\n\ > ... > [*] not available on all platforms!"); > > Maybe the docs are just wrong. According to the #ifdef in the code, if > NO_DUP is defined (OS/2, Windows, BeOS), makefile() isn't. That's the docs for the _socket module, which is (nowadays) an implementation detail. Read socket.py instead. > Guido> There's code for Windows that emulates it, returning a > Guido> file-like object. Maybe that code should be enabled > Guido> universally rather than only on Windows... > > That sounds similar to what is in timeoutsocket.py. It would have the > advantage of providing identical semantics across all platforms. Again, I won't have time to do this until after I'm back from Python UK, so I'd appreciate it if someone helped with this, e.g. by filing a patch. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Fri Mar 21 12:28:39 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 21 Mar 2003 06:28:39 -0600 Subject: [Python-Dev] socket timeouts fail w/ makefile() In-Reply-To: <200303211117.h2LBHTh23630@pcp02138704pcs.reston01.va.comcast.net> References: <15994.17612.495528.162817@montanaro.dyndns.org> <200303210057.h2L0vY608028@pcp02138704pcs.reston01.va.comcast.net> <15994.33761.677545.551348@montanaro.dyndns.org> <200303211117.h2LBHTh23630@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15995.1399.226407.866954@montanaro.dyndns.org> >> When the socket is in non-blocking mode, reads on the file returned >> by .makefile() will fail with an IOError if there is nothing to >> return. Guido> Isn't that exactly what a timeout is supposed to do? What would Guido> you have expected? Sorry, I wasn't clear. It fails immediately. The timeout isn't observed. >> makefile([mode, [bufsize]]) -- return a file object for the socket [*]\n\ >> ... >> [*] not available on all platforms!"); Guido> That's the docs for the _socket module, which is (nowadays) an Guido> implementation detail. Read socket.py instead. Maybe _socket shouldn't have such a detailed doc string or should indicate its subservient relationship to socket? I was reading it as if it was a comment in the code, which, in theory, should still be accurate. Guido> Again, I won't have time to do this until after I'm back from Guido> Python UK, so I'd appreciate it if someone helped with this, Guido> e.g. by filing a patch. I'll take a look. There's a bug already in the system (http://www.python.org/sf/707074) to which a patch could be applied, so if someone comes up with something, that's where it goes. 
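For readers following the thread, here is a rough sketch of the general approach being discussed: a file-like object layered directly on recv(), so that the timeout set on the socket is observed by reads. It is only an illustration under stated assumptions. It is not the code from timeoutsocket.py, nor the patch attached to the bug report, and the class name SocketFile is made up:

```python
import socket


class SocketFile:
    """Tiny read-only, line-oriented wrapper built on sock.recv()."""

    def __init__(self, sock, bufsize=4096):
        self._sock = sock
        self._bufsize = bufsize
        self._buf = b""

    def readline(self):
        # Keep receiving until a full line is buffered. recv() honours
        # the timeout set with sock.settimeout(), raising socket.timeout
        # only after that interval has passed with no data.
        while b"\n" not in self._buf:
            chunk = self._sock.recv(self._bufsize)
            if not chunk:
                # Peer closed the connection: return whatever is left.
                line, self._buf = self._buf, b""
                return line
            self._buf += chunk
        line, _, self._buf = self._buf.partition(b"\n")
        return line + b"\n"
```

A caller would set the timeout on the socket itself, for example sock.settimeout(5.0), and then read lines through SocketFile(sock). Partial data received before a timeout stays in the buffer, so a later readline() can complete the line.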
Skip From dave@boost-consulting.com Fri Mar 21 14:43:36 2003 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 21 Mar 2003 09:43:36 -0500 Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: <200303202233.h2KMXbG07782@odiug.zope.com> (Guido van Rossum's message of "Thu, 20 Mar 2003 17:33:35 -0500") References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <200303202233.h2KMXbG07782@odiug.zope.com> Message-ID: Guido van Rossum writes: >> Guido van Rossum writes: >> >> > The bytecode compiler should be clever enough to see that you're >> > writing >> > >> > for i in range(...): ... >> > >> > and that there's no definition of range other than the built-in one >> > (this requires a subtle change of language rules); it can then >> > substitute an internal equivalent to xrange(). >> >> Ouch! What happens to: >> >> def foo(seq): >> for x in seq: >> ... >> >> foo(xrange(small, really_big)) >> >> if xrange dies?? > > Good point. I guess xrange() can't die until range() becomes an > iterator (which can't be before Python 3.0). > > Hm, maybe range() shouldn't be an iterator but an interator > generator. No time to explain; see the discussion about restartable > iterators. I think what you mean is fairly obvious. list et al. are iterator generators, right? It's just a thing with an __iter__ function which produces an iterator? If so, I tend to agree that's the right behavior for range(). range(x,y,z) should be an immutable object. -- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Fri Mar 21 14:55:16 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 21 Mar 2003 09:55:16 -0500 Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: "Your message of Fri, 21 Mar 2003 09:43:36 EST." 
References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <200303202233.h2KMXbG07782@odiug.zope.com> Message-ID: <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> > > Hm, maybe range() shouldn't be an iterator but an interator > > generator. No time to explain; see the discussion about restartable > > iterators. > > I think what you mean is fairly obvious. list et al. are iterator > generators, right? It's just a thing with an __iter__ function which > produces an iterator? > > If so, I tend to agree that's the right behavior for range(). > range(x,y,z) should be an immutable object. Yes. Idioms like this are used fairly often: seq = range(...) for i in seq: ... for i in seq: ... # etc. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Fri Mar 21 17:53:03 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 21 Mar 2003 11:53:03 -0600 Subject: [Python-Dev] socket timeouts fail w/ makefile() Message-ID: <15995.20863.424664.1587@montanaro.dyndns.org> Guido> Again, I won't have time to do this until after I'm back from Guido> Python UK, so I'd appreciate it if someone helped with this, Guido> e.g. by filing a patch. Skip> I'll take a look. I attached a patch to http://www.python.org/sf/707074 which makes the socket wrapper unconditional, and added a new test case (test_urllibnet.py - requires the 'network' resource) which fails before applying the patch and succeeds afterward. Would someone else like to take a look at it? Guido's the natural candidate, but is busy with near-term conferences. Thx, Skip From Tino.Lange@isg.de Fri Mar 21 19:15:20 2003 From: Tino.Lange@isg.de (Tino Lange) Date: Fri, 21 Mar 2003 20:15:20 +0100 Subject: [Python-Dev] New Module? Tiger Hashsum Message-ID: <3E7B64C8.F3302144@isg.de> Hi! Today I suddenly needed the tiger hashsums from python - it's not included in the standard distribution and I couldn't find it somewhere. 
So I thought that it's maybe time again to contribute :-)

It was a quite straightforward task to write a wrapper that is able to
calculate such hash-sums from Python, besides the tiger.c/tiger.h it's
only a few lines of code.

It runs perfectly under Linux with distutils - I guess someone who knows
windows better has to look for a windows port because of the 'long long'
integers (shouldn't be too hard) ...

But at least for me it's really useful:

> >>> import tiger
> >>> tiger.tiger("Python is cool... And now it can even calculate tiger hashsums!")
> (135509944, 135510340, 135510352, 135509920, 135510016, 135197188)
> >>> tiger.__doc__
> 'This module gives you access to the fast, cryptographic tiger hash function from Eli Biham, see http://www.cs.technion.ac.il/~biham/Reports/Tiger/ for details.'
> >>> tiger.tiger.__doc__
> 'tiger(string) -> (int, int, int, int, int, int) -- compute a 192 bit hash-sum of given string (which can contain zero characters)'

Are you interested to get the code, maybe for the next release? Shall I
send it to someone of you developers? Or upload somewhere to your
project page? Or just send it here as attachment?

Just let me know.

Best regards!

Tino

From skip@pobox.com Fri Mar 21 19:36:13 2003
From: skip@pobox.com (Skip Montanaro)
Date: Fri, 21 Mar 2003 13:36:13 -0600
Subject: [Python-Dev] New Module? Tiger Hashsum
In-Reply-To: <3E7B64C8.F3302144@isg.de>
References: <3E7B64C8.F3302144@isg.de>
Message-ID: <15995.27053.118634.347706@montanaro.dyndns.org>

    Tino> I guess someone who knows windows better has to look for a windows
    Tino> port because of the 'long long' integers (shouldn't be too hard)
    Tino> ...

The LONG_LONG macro is defined in Python's Include/pyport.h file. Just use
it instead of 'long long'. On Windows I think 'long long' is spelled
'__int64'.

Skip

From Tino.Lange@isg.de Fri Mar 21 20:02:31 2003
From: Tino.Lange@isg.de (Tino Lange)
Date: Fri, 21 Mar 2003 21:02:31 +0100
Subject: [Python-Dev] New Module?
 Tiger Hashsum
References: <3E7B64C8.F3302144@isg.de>
 <15995.27053.118634.347706@montanaro.dyndns.org>
Message-ID: <3E7B6FD7.193B1981@isg.de>

Skip,

Ah, great! Thank you!
I'll try that tomorrow with MSVC 6.

Cheers,

Tino

----------

Skip Montanaro wrote:
>
>     Tino> I guess someone who knows windows better has to look for a windows
>     Tino> port because of the 'long long' integers (shouldn't be too hard)
>     Tino> ...
>
> The LONG_LONG macro is defined in Python's Include/pyport.h file. Just use
> it instead of 'long long'. On Windows I think 'long long' is spelled
> '__int64'.
>
> Skip

From martin@v.loewis.de Fri Mar 21 22:11:12 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 21 Mar 2003 23:11:12 +0100
Subject: [Python-Dev] New Module? Tiger Hashsum
In-Reply-To: <3E7B64C8.F3302144@isg.de>
References: <3E7B64C8.F3302144@isg.de>
Message-ID:

Tino Lange writes:

> Are you interested to get the code, maybe for the next release? Shall I
> send it to someone of you developers? Or upload somewhere to your
> project page? Or just send it here as attachment?

Dear Tino,

We are usually reluctant to add modules to the Python core
distribution, until there is some user community interested in that
module. Until then, I recommend you submit your module to the Vaults
of Parnassus, and announce it to comp.lang.python.announce.
Regards,
Martin

From cnetzer@mail.arc.nasa.gov Fri Mar 21 22:42:07 2003
From: cnetzer@mail.arc.nasa.gov (Chad Netzer)
Date: 21 Mar 2003 14:42:07 -0800
Subject: [Python-Dev] Re: More int/long integration issues
In-Reply-To: <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net>
References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com>
 <200303131903.h2DJ3Ug06240@odiug.zope.com>
 <200303202233.h2KMXbG07782@odiug.zope.com>
 <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <1048286527.651.29.camel@sayge.arc.nasa.gov>

On Fri, 2003-03-21 at 06:55, Guido van Rossum wrote:
> > > Hm, maybe range() shouldn't be an iterator but an iterator
> > > generator. No time to explain; see the discussion about restartable
> > > iterators.

Hmmm. Now that I've uploaded my patch extending range() to longs, I'd
like to work on this. I've already written a C range() iterator
(incorporating PyLongs), and it would be very nice to have it
automatically be a lazy range() when used in a loop.

In any case, assuming you are quite busy, but would consider this for
the 2.4 timeframe, I will do some work on it. If it is already being
covered, I'll gladly stay away from it. :)

-- 
Bay Area Python Interest Group - http://www.baypiggies.net/

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)

From Tino.Lange@isg.de Sat Mar 22 08:42:20 2003
From: Tino.Lange@isg.de (Tino Lange)
Date: Sat, 22 Mar 2003 09:42:20 +0100
Subject: [Python-Dev] Icon for Python RSS Feed?
Message-ID: <3E7C21EC.1A5BD79@isg.de>

Hi!

Can you add a nice icon for the news XML RDF resource
http://www.python.org/channews.rdf
for example in http://www.python.org/favicon.ico?

I think this is the standard location - at least for my KNewsTicker.
Then your news will be clearly marked as Python News in the Newsticker
:-)

Thanks and have a nice day!
Tino

From dave@boost-consulting.com Sat Mar 22 23:25:00 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Sat, 22 Mar 2003 18:25:00 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
Message-ID:

I am generating extension types derived from a type which is derived
from int 'int' by calling the metaclass; in order to prevent instances
of the most-derived type from getting an instance __dict__ I am
putting an empty tuple in the class __dict__ as '__slots__'. The
problem with this hack is that it disables pickling of these babies:

    "a class that defines __slots__ without defining __getstate__
     cannot be pickled"

Yes, I can define __getstate__, __setstate__, and __getinitargs__ (the
only one that can actually do any work, since ints are immutable),
but I was wondering if there's a more straightforward way to suppress
the instance __dict__ in the derived classes.

TIA,
-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From martin@v.loewis.de Sun Mar 23 08:24:51 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 23 Mar 2003 09:24:51 +0100
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To:
References:
Message-ID:

David Abrahams writes:

> Yes, I can define __getstate__, __setstate__, and __getinitargs__ (the
> only one that can actually do any work, since ints are immutable),
> but I was wondering if there's a more straightforward way to suppress
> the instance __dict__ in the derived classes.

Setting tp_dictoffset to 0 might help. However, I'm unsure what
consequences this has; read the source.

Regards,
Martin

From dave@boost-consulting.com Sun Mar 23 12:58:30 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Sun, 23 Mar 2003 07:58:30 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: (martin@v.loewis.de's message of "23 Mar 2003 09:24:51 +0100")
References:
Message-ID:

martin@v.loewis.de (Martin v.
Löwis) writes:

> David Abrahams writes:
>
>> Yes, I can define __getstate__, __setstate__, and __getinitargs__ (the
>> only one that can actually do any work, since ints are immutable),
>> but I was wondering if there's a more straightforward way to suppress
>> the instance __dict__ in the derived classes.
>
> Setting tp_dictoffset to 0 might help.

AFAICT I don't get to do that, since as I wrote:

    I am generating extension types derived from a type which is derived
    from int 'int' by calling the metaclass
                   ^^^^^^^^^^^^^^^^^^^^^^^^

> However, I'm unsure what consequences this has; read the source.

Unfortunately, this is one of the twistiest areas of the Python
source, so while I could struggle through it I'm hoping there's
someone around here who knows the answer off the top of his benevolent
Dutch head

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From skip@mojam.com Sun Mar 23 13:00:21 2003
From: skip@mojam.com (Skip Montanaro)
Date: Sun, 23 Mar 2003 07:00:21 -0600
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200303231300.h2ND0LH14101@manatee.mojam.com>

Bug/Patch Summary
-----------------

372 open / 3479 total bugs (+23)
133 open / 2033 total patches (+11)

New Bugs
--------

ncurses/curses on solaris (2003-03-10)
    http://python.org/sf/700780
Compiler Limits (indentation) (2003-03-10)
    http://python.org/sf/700827
WINDOW in py_curses.h needs ncurses-devel (2003-03-11)
    http://python.org/sf/701751
configure option --enable-shared make problems (2003-03-11)
    http://python.org/sf/701823
Thread running (os.system or popen#) (2003-03-11)
    http://python.org/sf/701836
getsockopt/setsockopt with SO_RCVTIMEO are inconsistent (2003-03-11)
    http://python.org/sf/701936
--without-cxx flag of configure isn't documented. (2003-03-12)
    http://python.org/sf/702147
No documentation of static/dynamic python modules. (2003-03-12)
    http://python.org/sf/702157
dumbdbm __del__ bug (2003-03-12)
    http://python.org/sf/702775
deepcopy can't copy self-referential new-style classes (2003-03-13)
    http://python.org/sf/702858
os.utime can fail with TypeError (2003-03-13)
    http://python.org/sf/703066
os.popen with mode "rb" fails on Unix (2003-03-13)
    http://python.org/sf/703198
Several objects don't decref tmp on failure in subtype_new (2003-03-14)
    http://python.org/sf/703666
strange warnings messages in interpreter (2003-03-14)
    http://python.org/sf/703779
Problems printing and sleep (2003-03-15)
    http://python.org/sf/704194
_tkinter.c won't build w/o threads? (2003-03-16)
    http://python.org/sf/704641
Problems building python with tkinter on HPUX... (2003-03-17)
    http://python.org/sf/704919
python-mode.el: sexp commands don't understand Python (2003-03-17)
    http://python.org/sf/705005
imap docs: s/criterium/criterion/ (2003-03-17)
    http://python.org/sf/705120
Assertion failed, python aborts (2003-03-17)
    http://python.org/sf/705231
Error when using PyZipFile to create archive (2003-03-17)
    http://python.org/sf/705295
test_atexit fails in directories with spaces (2003-03-18)
    http://python.org/sf/705792
python accepts illegal "import mod.sub as name" syntax (2003-03-19)
    http://python.org/sf/706253
print raises exception when no console available (2003-03-19)
    http://python.org/sf/706263
test_socket fails when not connected (2003-03-19)
    http://python.org/sf/706450
u''.translate not documented (2003-03-19)
    http://python.org/sf/706546
Expose FinderInfo in FSCatalogInfo (2003-03-19)
    http://python.org/sf/706585
Crbon.File.FSSpec should accept non-existing pathnames (2003-03-19)
    http://python.org/sf/706592
codecs.open and iterators (2003-03-19)
    http://python.org/sf/706595
timeouts incompatible w/ line-oriented protocols (2003-03-20)
    http://python.org/sf/707074
-i -u options give SyntaxError on Windows (2003-03-21)
    http://python.org/sf/707576
elisp: IM-python menu and newline in function defs (2003-03-21)
    http://python.org/sf/707707
math.fabs documentation is misleading (2003-03-22)
    http://python.org/sf/708205
DistributionMetaData error ? (2003-03-23)
    http://python.org/sf/708320

New Patches
-----------

Replacing and deleting files in a zipfile archive. (2003-03-10)
    http://python.org/sf/700858
Wrong prototype for PyUnicode_Splitlines on documentation (2003-03-11)
    http://python.org/sf/701395
more apply removals (2003-03-11)
    http://python.org/sf/701494
Reloading pseudo modules (2003-03-11)
    http://python.org/sf/701743
AE Inheritance fixes (2003-03-12)
    http://python.org/sf/702620
Kill off docs for unsafe macros (2003-03-13)
    http://python.org/sf/702933
add direct access to MD5 compression function to md5 module (2003-03-16)
    http://python.org/sf/704676
Fix a few broken links in pydoc (2003-03-19)
    http://python.org/sf/706338
fix bug #685846: raw_input defers signals (2003-03-19)
    http://python.org/sf/706406
Adds Mock Object support to unittest.TestCase (2003-03-19)
    http://python.org/sf/706590
time.tzset standards compliance update (2003-03-19)
    http://python.org/sf/706707
fix bug #682813: dircache.listdir doesn't signal error (2003-03-20)
    http://python.org/sf/707167
Improve code generation (2003-03-20)
    http://python.org/sf/707257
Allow range() to return long integer values (2003-03-21)
    http://python.org/sf/707427
fix for #698517, Tkinter and tk8.4.2 (2003-03-21)
    http://python.org/sf/707701
bug fix 702858: deepcopying reflexive objects (2003-03-21)
    http://python.org/sf/707900
TelnetPopen3, TelnetBase, Expect split (2003-03-22)
    http://python.org/sf/708007
unchecked return value in import.c (2003-03-22)
    http://python.org/sf/708201

Closed Bugs
-----------

printing email object deletes whitespace (2002-08-13)
    http://python.org/sf/594893
asynchat problems multi-threaded (2002-08-14)
    http://python.org/sf/595217
plat-mac not on sys.path (2003-01-03)
    http://python.org/sf/661521
codec registry and Python embedding problem (2003-01-06)
    http://python.org/sf/663074
email.Header() encoding does not work properly (2003-01-27)
    http://python.org/sf/675420
Applet support is broken (2003-02-18)
    http://python.org/sf/688907
test_cpickle overflows stack on MacOS9 (2003-02-21)
    http://python.org/sf/690622
PyMac_GetFSRef should accept unicode (2003-03-02)
    http://python.org/sf/696253
test_posix fails: getlogin (2003-03-04)
    http://python.org/sf/697556
list.index() bhvr change > python2.x (2003-03-06)
    http://python.org/sf/698561
Tutorial uses omitted slice indices before explaining them (2003-03-06)
    http://python.org/sf/699237
ncurses/curses on solaris (2003-03-07)
    http://python.org/sf/699379
MIMEText's c'tor adds unwanted trailing newline to text (2003-03-07)
    http://python.org/sf/699600

Closed Patches
--------------

Put IDE scripts in ~/Library (2002-07-08)
    http://python.org/sf/578667
Fix: asynchat.py: endless loop (2002-12-06)
    http://python.org/sf/649762
(email) Escape backslashes in specialsre and escapesre (2003-01-06)
    http://python.org/sf/663369
HTMLParser -- allow "," in attributes (2003-01-17)
    http://python.org/sf/669683
test_htmlparser.py -- "," in attributes (2003-01-24)
    http://python.org/sf/674448
Add tzset method to time module (2003-01-27)
    http://python.org/sf/675422
bundlebuilder: Add dylibs, frameworks to the bundle (2003-02-06)
    http://python.org/sf/681927
allow proxy server authentication with pimp (2003-03-02)
    http://python.org/sf/696392
optparse unit tests + fixes (2003-03-05)
    http://python.org/sf/697939

From mwh@python.net Sun Mar 23 13:08:30 2003
From: mwh@python.net (Michael Hudson)
Date: Sun, 23 Mar 2003 13:08:30 +0000
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: (David Abrahams's message of "Sun, 23 Mar 2003 07:58:30 -0500")
References:
Message-ID: <2my936kz75.fsf@starship.python.net>

David Abrahams writes:

> Unfortunately, this is one of the twistiest areas of the Python
> source, so while I could struggle through it I'm hoping there's
> someone around here who knows the answer off the top of his benevolent
> Dutch head

Well, I'm familiar enough with that bit of the source (search for
"add_dict" in typeobject.c) to answer your question: no, there's no
more straightforward way to suppress the instance __dict__ in the
derived classes.

Cheers,
M.

-- 
  The rapid establishment of social ties, even of a fleeting nature,
  advance not only that goal but its standing in the uberconscious
  mesh of communal psychic, subjective, and algorithmic interbeing.
  But I fear I'm restating the obvious.
                                    -- Will Ware, comp.lang.python

From guido@python.org Sun Mar 23 13:21:12 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 23 Mar 2003 08:21:12 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: "Your message of Sat, 22 Mar 2003 18:25:00 EST."
References:
Message-ID: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net>

> I am generating extension types derived from a type which is derived
> from int 'int' by calling the metaclass; in order to prevent instances
> of the most-derived type from getting an instance __dict__ I am
> putting an empty tuple in the class __dict__ as '__slots__'. The
> problem with this hack is that it disables pickling of these babies:
>
>     "a class that defines __slots__ without defining __getstate__
>      cannot be pickled"
>
> Yes, I can define __getstate__, __setstate__, and __getinitargs__ (the
> only one that can actually do any work, since ints are immutable),
> but I was wondering if there's a more straightforward way to suppress
> the instance __dict__ in the derived classes.

Actually, even __getinitargs__ won't work, because __init__ is called
after the object is created. In Python 2.3, you'd use __getnewargs__,
but I expect you're still bound to supporting Python 2.2 (Python 2.3
also doesn't have the error message above when pickling).

I think you could subclass the metaclass, override __new__, and delete
the bogus __getstate__ from the type's __dict__. Then you'll get the
default pickling behavior which ignores slots; that should work just
fine in your case. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From dave@boost-consulting.com Sun Mar 23 14:48:53 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Sun, 23 Mar 2003 09:48:53 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net>
 (Guido van Rossum's message of "Sun, 23 Mar 2003 08:21:12 -0500")
References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

Guido van Rossum writes:

>> I am generating extension types derived from a type which is derived
>> from int 'int' by calling the metaclass; in order to prevent instances
>> of the most-derived type from getting an instance __dict__ I am
>> putting an empty tuple in the class __dict__ as '__slots__'. The
>> problem with this hack is that it disables pickling of these babies:
>>
>>     "a class that defines __slots__ without defining __getstate__
>>      cannot be pickled"
>>
>> Yes, I can define __getstate__, __setstate__, and __getinitargs__ (the
>> only one that can actually do any work, since ints are immutable),
>> but I was wondering if there's a more straightforward way to suppress
>> the instance __dict__ in the derived classes.
>
> Actually, even __getinitargs__ won't work, because __init__ is called
> after the object is created.

...and ints are immutable. Right.

> In Python 2.3, you'd use __getnewargs__,

Cute.
It's almost too bad that the distinction between __new__ and
__init__ is there -- as we find we need to legitimize the use of
__new__ with things like __getnewargs__ it becomes a little less
clear which one should be used, and when. TIMTOWDI and all that.

In the absence of clear guidelines I'm tempted to suggest that C++ got
this part right. Occasionally we get people who think they want to
call overridden virtual functions from constructors (I presume the
analogous thing could be done safely from __init__ but not from
__new__) but that's pretty rare. I'm interested in gaining insight
into the Pythonic thinking behind __new__/__init__; I'm sure I don't
have the complete picture.

> but I expect you're still bound to supporting Python 2.2

Yup, I think it would be bad to force my users to move to an
unreleased Python version at this point ;-)

> (Python 2.3 also doesn't have the error message above when
> pickling).

Nice. Too bad about 2.2.

> I think you could subclass the metaclass, override __new__, and delete
> the bogus __getstate__ from the type's __dict__. Then you'll get the
> default pickling behavior which ignores slots; that should work just
> fine in your case. :-)

Ooh, that's sneaky! But I can't quite see how it works. The error
message I quoted at the top about __getstate__ happens when you try to
pickle an instance of the class. If I delete __getstate__ during
__new__, it won't be there for pickle to find when I try to do the
pickling. What will keep it from inducing the same error?

Thanks,
-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From pedronis@bluewin.ch Sun Mar 23 14:46:08 2003
From: pedronis@bluewin.ch (Samuele Pedroni)
Date: Sun, 23 Mar 2003 15:46:08 +0100
Subject: [Python-Dev] [ot] offline
Message-ID: <009801c2f14a$f0ba2480$6d94fea9@newmexico>

I will be essentially offline for the next 2 weeks.

regards.

From dave@boost-consulting.com Sun Mar 23 16:41:17 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Sun, 23 Mar 2003 11:41:17 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net>
 (Guido van Rossum's message of "Sun, 23 Mar 2003 10:46:40 -0500")
References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net>
 <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

Guido van Rossum writes:

>> > I think you could subclass the metaclass, override __new__, and delete
>> > the bogus __getstate__ from the type's __dict__. Then you'll get the
>> > default pickling behavior which ignores slots; that should work just
>> > fine in your case. :-)
>>
>> Ooh, that's sneaky! But I can't quite see how it works. The error
>> message I quoted at the top about __getstate__ happens when you try to
>> pickle an instance of the class. If I delete __getstate__ during
>> __new__, it won't be there for pickle to find when I try to do the
>> pickling. What will keep it from inducing the same error?
>
> Just try it. There are many ways to customize pickling, and if
> __getstate__ doesn't exist, pickling is done differently.

Since this doesn't work:

    >>> d = type('d', (object,), { '__slots__' : ['foo'] } )
    >>> pickle.dumps(d())

I'm still baffled as to why this works:

    >>> class mc(type):
    ...     def __new__(self, *args):
    ...         x = type.__new__(self, *args)
    ...         del args[2]['__getstate__']
    ...         return x
    ...
    >>> c = mc('c', (object,), { '__slots__' : ['foo'], '__getstate__' : lambda self: tuple() } )
    >>> pickle.dumps(c())
    'ccopy_reg\n_reconstructor\np0\n(c__main__\nc\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n.'

especially since:

    >>> dir(d) == dir(c)
    1

I don't see the logic in the source for object.__reduce__(), so where
is it? OK, I see it in typeobject.c.
But now:

    >>> c.__getstate__
    >

OK, this seems to indicate that my attempt to remove __getstate__ from
the class __dict__ was a failure. That explains why pickling c works,
but not why you suggested that I remove __getstate__ inside of
__new__. Did you mean for me to do something different?

I note that c's __slots__ aren't pickled at all, which I guess was the
point of the __getstate__ requirement:

    >>> x = c()
    >>> x.foo = 1
    >>> pickle.dumps(x) == pickle.dumps(c())
    1

Fortunately, in our case the __slots__ are empty so it doesn't matter.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From guido@python.org Sun Mar 23 21:04:16 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 23 Mar 2003 16:04:16 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: "Your message of Sun, 23 Mar 2003 11:41:17 EST."
References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net>
 <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200303232104.h2NL4GQ04819@pcp02138704pcs.reston01.va.comcast.net>

> Guido van Rossum writes:
> >> > I think you could subclass the metaclass, override __new__, and delete
> >> > the bogus __getstate__ from the type's __dict__. Then you'll get the
> >> > default pickling behavior which ignores slots; that should work just
> >> > fine in your case. :-)

[David]
> >> Ooh, that's sneaky! But I can't quite see how it works. The error
> >> message I quoted at the top about __getstate__ happens when you try to
> >> pickle an instance of the class. If I delete __getstate__ during
> >> __new__, it won't be there for pickle to find when I try to do the
> >> pickling. What will keep it from inducing the same error?

[Guido]
> > Just try it. There are many ways to customize pickling, and if
> > __getstate__ doesn't exist, pickling is done differently.
>
> Since this doesn't work:
>
>     >>> d = type('d', (object,), { '__slots__' : ['foo'] } )
>     >>> pickle.dumps(d())

Um, you're changing the rules in the middle of the game. You said you
had an *empty* __slots__. My recommendation only applied to that
case. I also thought you were doing this from C, not from Python, but
I may be mistaken.

> I'm still baffled as to why this works:
>
>     >>> class mc(type):
>     ...     def __new__(self, *args):
>     ...         x = type.__new__(self, *args)
>     ...         del args[2]['__getstate__']

Hm. I don't think that x.__dict__ is args[2]; it's a copy, and
deleting __getstate__ from the arguments doesn't make any difference
to this example.

>     ...         return x
>     ...
>     >>> c = mc('c', (object,), { '__slots__' : ['foo'], '__getstate__' : lambda self: tuple() } )

Why are you passing a __getstate__ in? The point was getting rid of
the __getstate__ that type.__new__ inserts.

>     >>> pickle.dumps(c())
>     'ccopy_reg\n_reconstructor\np0\n(c__main__\nc\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n.'
>
> especially since:
>
>     >>> dir(d) == dir(c)
>     1

I think you have been testing something very different from what you
think you did here. dir(d) == dir(c) because they both have a
__getstate__; but d.__getstate__ is a built-in that raises an
exception, while c.__getstate__ is the lambda you passed in.

And have you tried unpickling yet? I expect it to fail.

> I don't see the logic in the source for object.__reduce__(), so where
> is it? OK, I see it in typeobject.c. But now:
>
>     >>> c.__getstate__
>     >
>
> OK, this seems to indicate that my attempt to remove __getstate__ from
> the class __dict__ was a failure. That explains why pickling c works,
> but not why you suggested that I remove __getstate__ inside of
> __new__. Did you mean for me to do something different?

Yes. I was assuming you'd do this at the C level.
To do what I suggested in Python, I think you'd have to write this:

    class M(type):
        def __new__(cls, name, bases, dict):
            C = type.__new__(cls, name, bases, dict)
            del C.__getstate__
            return C

> I note that c's __slots__ aren't pickled at all, which I guess was the
> point of the __getstate__ requirement:
>
>     >>> x = c()
>     >>> x.foo = 1
>     >>> pickle.dumps(x) == pickle.dumps(c())
>     1
>
> Fortunately, in our case the __slots__ are empty so it doesn't matter.

Right.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Sun Mar 23 21:15:15 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 23 Mar 2003 16:15:15 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: "Your message of Sun, 23 Mar 2003 09:48:53 EST."
References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200303232115.h2NLFFA04846@pcp02138704pcs.reston01.va.comcast.net>

> It's almost too bad that the distinction between __new__ and __init__
> is there -- as we find we need to legitimize the use of __new__ with
> things like __getnewargs__ it becomes a little less clear which one
> should be used, and when. TIMTOWDI and all that.

__new__ creates a new, initialized object. __init__ sets some values
in an existing object. __init__ is a regular method and can be called
to reinitialize an existing object (not that I recommend this, but the
mechanism doesn't forbid it). It follows that immutable objects must
be initialized using __new__, since by the time __init__ is called the
object already exists and is immutable.

> In the absence of clear guidelines I'm tempted to suggest that C++ got
> this part right.

Of course you would.

I tend to think that Python's analogon to C++ constructors is __new__,
and that __init__ is a different mechanism (although it can often be
used where you would use a constructor in C++).
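
A minimal sketch of the point above that an immutable object has to get its value in __new__. It assumes a current Python 3 interpreter rather than the 2.2/2.3 releases under discussion, and the class name Celsius and its clamping rule are hypothetical, chosen only for illustration:

```python
class Celsius(int):
    """Hypothetical immutable int subclass that clamps at -273."""

    # An empty __slots__ keeps instances from getting a __dict__.
    __slots__ = ()

    def __new__(cls, value):
        # The integer value has to be chosen here: once int.__new__
        # returns, the object exists and its value cannot be changed,
        # so an __init__ method would be too late to adjust it.
        return super().__new__(cls, max(int(value), -273))


t = Celsius(-500)
print(int(t))                  # the clamped value picked in __new__
print(hasattr(t, "__dict__"))  # no instance __dict__ (empty __slots__)
```

The empty __slots__ mirrors the trick described earlier in the thread for suppressing the instance __dict__ on a subclass of int.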

> Occasionally we get people who think they want to call overridden
> virtual functions from constructors (I presume the analogous thing
> could be done safely from __init__ but not from __new__)

Whether or not that can be done safely from __init__ depends on the
subclass __init__; it's easy enough to construct examples that don't
work. But yes, for __new__ the situation is more analogous to C++,
except that AFAIK in C++ when you try that you get the base class
virtual function, while in Python you get the overridden method --
which finds an instance that is incompletely initialized.

> but that's pretty rare. I'm interested in gaining insight into the
> Pythonic thinking behind __new__/__init__; I'm sure I don't have the
> complete picture.

__new__ was introduced to allow initializing immutable objects. It
really applies more to types implemented in C than types implemented
in Python. But it is needed so that a Python subclass of an immutable
C base class can pass arguments of its choice to the C base class's
constructor.

> Nice. Too bad about 2.2.

Maybe the new pickling could be backported, but I fear that it depends
on some other 2.3 feature that's harder to backport, so I haven't
looked into this.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From dave@boost-consulting.com Sun Mar 23 21:45:48 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Sun, 23 Mar 2003 16:45:48 -0500
Subject: [Python-Dev] How to suppress instance __dict__?
In-Reply-To: <200303232104.h2NL4GQ04819@pcp02138704pcs.reston01.va.comcast.net>
 (Guido van Rossum's message of "Sun, 23 Mar 2003 16:04:16 -0500")
References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net>
 <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net>
 <200303232104.h2NL4GQ04819@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

Guido van Rossum writes:

>> Guido van Rossum writes:
>> >> > I think you could subclass the metaclass, override __new__, and delete
>> >> > the bogus __getstate__ from the type's __dict__. Then you'll get the
>> >> > default pickling behavior which ignores slots; that should work just
>> >> > fine in your case. :-)
>
> [David]
>> >> Ooh, that's sneaky! But I can't quite see how it works. The error
>> >> message I quoted at the top about __getstate__ happens when you try to
>> >> pickle an instance of the class. If I delete __getstate__ during
>> >> __new__, it won't be there for pickle to find when I try to do the
>> >> pickling. What will keep it from inducing the same error?
>
> [Guido]
>> > Just try it. There are many ways to customize pickling, and if
>> > __getstate__ doesn't exist, pickling is done differently.
>>
>> Since this doesn't work:
>>
>>     >>> d = type('d', (object,), { '__slots__' : ['foo'] } )
>>     >>> pickle.dumps(d())
>
> Um, you're changing the rules in the middle of the game. You said you
> had an *empty* __slots__.

I did. I just stuck something in there so I could verify that things
were working in the expected way.

> My recommendation only applied to that case. I also thought you
> were doing this from C, not from Python, but I may be mistaken.

You're not mistaken; just like Python gives a productivity boost over
C/C++ for ordinary programming, I find I can learn a lot more about
the Python core in a short period of time by writing Python code than
by writing 'C' code, so I usually try that first.

>> I'm still baffled as to why this works:
>>
>>     >>> class mc(type):
>>     ...     def __new__(self, *args):
>>     ...         x = type.__new__(self, *args)
>>     ...         del args[2]['__getstate__']
>
> Hm. I don't think that x.__dict__ is args[2]; it's a copy, and
> deleting __getstate__ from the arguments doesn't make any difference
> to this example.

...as I discovered...

>>     ...         return x
>>     ...
>>     >>> c = mc('c', (object,), { '__slots__' : ['foo'], '__getstate__' : lambda self: tuple() } )
>
> Why are you passing a __getstate__ in? The point was getting rid of
> the __getstate__ that type.__new__ inserts.

Because I didn't understand your intention, nor did I know that the
automatic __getstate__ was responsible for generating the error
message. I thought the idea was to define a __getstate__, which is a
known way to suppress the error message, and then kill it in __new__.
I figured that pickle was looking for __getstate__ and when it wasn't
there but __slots__ was, raising the exception. This may explain why
I didn't see how the approach could work. Now I understand what you
meant.

>>     >>> pickle.dumps(c())
>>     'ccopy_reg\n_reconstructor\np0\n(c__main__\nc\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n.'
>>
>> especially since:
>>
>>     >>> dir(d) == dir(c)
>>     1
>
> I think you have been testing something very different from what you
> think you did here. dir(d) == dir(c) because they both have a
> __getstate__; but d.__getstate__ is a built-in that raises an
> exception, while c.__getstate__ is the lambda you passed in.

Yeah, I found that out below.

> And have you tried unpickling yet? I expect it to fail.

Nope.

>> I don't see the logic in the source for object.__reduce__(), so where
>> is it? OK, I see it in typeobject.c. But now:
>>
>>     >>> c.__getstate__
>>     >
>>
>> OK, this seems to indicate that my attempt to remove __getstate__ from
>> the class __dict__ was a failure. That explains why pickling c works,
>> but not why you suggested that I remove __getstate__ inside of
>> __new__. Did you mean for me to do something different?
>
> Yes.
I was assuming you'd do this at the C level. To do what I > suggested in Python, I think you'd have to write this: > > class M(type): > def __new__(cls, name, bases, dict): > C = type.__new__(cls, name, bases, dict) > del C.__getstate__ > return C I tried to get too fancy with del C.__dict__['__getstate__'] which didn't work of course. Anyway, thanks for spelling it out for me. I think I understand everything now. -- Dave Abrahams Boost Consulting www.boost-consulting.com From dave@boost-consulting.com Sun Mar 23 21:53:09 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 23 Mar 2003 16:53:09 -0500 Subject: [Python-Dev] How to suppress instance __dict__? In-Reply-To: <200303232115.h2NLFFA04846@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Sun, 23 Mar 2003 16:15:15 -0500") References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <200303232115.h2NLFFA04846@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: >> It's almost too bad that the distinction between __new__ and >> __init__ is there -- as we find we need to legitimize the use of >> __new__ with things like __getnewargs__ it becomes a little less >> clear which one should be used, and when. TIMTOWTDI and all that. > > __new__ creates a new, initialized object. __init__ sets some values > in an existing object. __init__ is a regular method and can be called > to reinitialize an existing object (not that I recommend this, but the > mechanism doesn't forbid it). It follows that immutable objects must > be initialized using __new__, since by the time __init__ is called the > object already exists and is immutable. Shouldn't most objects be initialized by __new__, really? IME it's dangerous to have uninitialized objects floating about, especially in the presence of exceptions. >> In the absence of clear guidelines I'm tempted to suggest that C++ got >> this part right. > > Of course you would. Oh, c'mon. 
C++ is ugly, both brittle *and* inflexible, expensive, painful, etc. There must be at least _one_ well-designed thing about it. Maybe this is it! > I tend to think that Python's analogon to C++ constructors is > __new__, Yup. > and that __init__ is a different mechanism (although it can often be > used where you would use a constructor in C++). > >> Occasionally we get people who think they want to call overridden >> virtual functions from constructors (I presume the analogous thing >> could be done safely from __init__ but not from __new__) > > Whether or not that can be done safely from __init__ depends on the > subclass __init__; it's easy enough to construct examples that don't > work. But yes, for __new__ the situation is more analogous to C++, > except that AFAIK in C++ when you try that you get the base class > virtual function, while in Python you get the overridden method -- > which finds an instance that is incompletely initialized. Either one seems equally likely to be what you don't want. >> but that's pretty rare. I'm interested in gaining insight into the >> Pythonic thinking behind __new__/__init__; I'm sure I don't have the >> complete picture. > > __new__ was introduced to allow initializing immutable objects. It > really applies more to types implemented in C than types implemented > in Python. But it is needed so that a Python subclass of an immutable > C base class can pass arguments of its choice to the C base class's > constructor. > >> Nice. Too bad about 2.2. > > Maybe the new pickling could be backported, but I fear that it depends > on some other 2.3 feature that's harder to backport, so I haven't > looked into this. Are people who don't want to upgrade really that much more willing if it doesn't involve a minor revision number? I figure that if I supported 2.2 once, I'd have to be very circumspect about doing something which required an upgrade to 2.2.x.
-- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Sun Mar 23 22:33:59 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 23 Mar 2003 17:33:59 -0500 Subject: [Python-Dev] How to suppress instance __dict__? In-Reply-To: "Your message of Sun, 23 Mar 2003 16:53:09 EST." References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <200303232115.h2NLFFA04846@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303232233.h2NMXx905265@pcp02138704pcs.reston01.va.comcast.net> [David] > >> It's almost too bad that the distinction between __new__ and > >> __init__ is there -- as we find we need to legitimize the use of > >> __new__ with things like __getnewargs__ it becomes a little less > >> clear which one should be used, and when. TIMTOWTDI and all that. [Guido] > > __new__ creates a new, initialized object. __init__ sets some values > > in an existing object. __init__ is a regular method and can be called > > to reinitialize an existing object (not that I recommend this, but the > > mechanism doesn't forbid it). It follows that immutable objects must > > be initialized using __new__, since by the time __init__ is called the > > object already exists and is immutable. [David again] > Shouldn't most objects be initialized by __new__, really? IME it's > dangerous to have uninitialized objects floating about, especially in > the presence of exceptions. Normally, there are no external references to an object until after __init__ returns, so you should be safe unless __init__ saves a reference to self somewhere. It does mean that __del__ can be surprised by an uninitialized object, and that's a known pitfall. And an exception in the middle of __new__ has the same problem. So I don't think __new__ is preferred over __init__, unless you need a feature that only __new__ offers (like initializing an immutable base class or returning an existing object or an object of a different class).
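A minimal sketch of two of the cases listed above where only __new__ will do: initializing an immutable base class and returning an existing object. It assumes a current Python 3 interpreter rather than the 2.3-era one discussed in this thread, and the class names Celsius and Registry are hypothetical, made up purely for illustration.

```python
class Celsius(float):
    """Immutable float subclass: the value has to be fixed in __new__."""

    def __new__(cls, degrees):
        # float is immutable, so the base class must receive its value here;
        # by the time __init__ runs, the float's value can no longer change.
        if degrees < -273.15:
            raise ValueError("below absolute zero")
        return super().__new__(cls, degrees)


class Registry:
    """Hands back an existing object for a repeated key."""

    _instances = {}

    def __new__(cls, key):
        # __init__ only ever receives an already-created object, so the
        # decision to reuse a cached instance has to be made in __new__.
        try:
            return cls._instances[key]
        except KeyError:
            self = super().__new__(cls)
            cls._instances[key] = self
            return self

    def __init__(self, key):
        # __init__ still runs on every call, even for a cached instance.
        self.key = key
```

For everything else in these classes, ordinary attribute setup in __init__ would be enough, which is the point of the paragraph above.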
> >> In the absence of clear guidelines I'm tempted to suggest that C++ got > >> this part right. > > > > Of course you would. > > Oh, c'mon. C++ is ugly, both brittle *and* inflexible, expensive, > painful, etc. There must be at least _one_ well-designed thing about > it. Maybe this is it! You said it. :-) > > I tend to think that Python's analogon to C++ constructors is > > __new__, > > Yup. > > > and that __init__ is a different mechanism (although it can often be > > used where you would use a constructor in C++). > > > >> Occasionally we get people who think they want to call overridden > >> virtual functions from constructors (I presume the analogous thing > >> could be done safely from __init__ but not from __new__) > > > > Whether or not that can be done safely from __init__ depends on the > > subclass __init__; it's easy enough to construct examples that don't > > work. But yes, for __new__ the situation is more analogous to C++, > > except that AFAIK in C++ when you try that you get the base class > > virtual function, while in Python you get the overridden method -- > > which finds an instance that is incompletely initialized. > > Either one seems equally likely to be what you don't want. Yeah, this is something where you can't seem to win. :-( > >> but that's pretty rare. I'm interested in gaining insight into the > >> Pythonic thinking behind __new__/__init__; I'm sure I don't have the > >> complete picture. > > > > __new__ was introduced to allow initializing immutable objects. It > > really applies more to types implemented in C than types implemented > > in Python. But it is needed so that a Python subclass of an immutable > > C base classs can pass arguments of its choice to the C base class's > > constructor. > > > >> Nice. Too bad about 2.2. > > > > Maybe the new pickling could be backported, but I fear that it depends > > on some other 2.3 feature that's harder to backport, so I haven't > > looked into this. 
> > Are people who don't want to upgrade really that much more willing if > it doesn't involve a minor revision number? I figure that if I > supported 2.2 once, I'd have to be very circumspect about doing > something which required an upgrade to 2.2.x. The idea is that an upgrade from 2.2.x to 2.2.(x+1) won't break any code, it will only fix bugs. For example, Zope requires Python 2.2.1 because of a particular bug in Python 2.2[.0] that caused Zope core dumps. Of course, the "breaks no code" promise can't be true 100% (because some code could depend on a bug), but we try a lot harder not to break stuff than with a 2.x to 2.(x+1). Even though there we also try not to break stuff, we're less anal about it (otherwise the language would just get uglier and uglier by maintaining strict backwards compatibility with all past mistakes). --Guido van Rossum (home page: http://www.python.org/~guido/) From dave@boost-consulting.com Sun Mar 23 23:18:26 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 23 Mar 2003 18:18:26 -0500 Subject: [Python-Dev] How to suppress instance __dict__? In-Reply-To: <200303232233.h2NMXx905265@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Sun, 23 Mar 2003 17:33:59 -0500") References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <200303232115.h2NLFFA04846@pcp02138704pcs.reston01.va.comcast.net> <200303232233.h2NMXx905265@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > [Guido] >> > __new__ creates a new, initialized object. __init__ sets some values >> > in an exsting object. __init__ is a regular method and can be called >> > to reinitialize an existing object (not that I recommend this, but the >> > mechanism doesn't forbid it). It follows that immutable objects must >> > be initialized using __new__, since by the time __init__ is called the >> > object already exists and is immutable. 
> > [David again] >> Shouldn't most objects be initialized by __new__, really? IME it's >> dangerous to have uninitialized objects floating about, especially in >> the presence of exceptions. > > Normally, there are no external references to an object until after > __init__ returns Good point; that's a feature you don't get unless you build two-phase initialization into the core language. Two-phase initialization is more dangerous in C++ because it's not a core language feature. > so you should be safe unless __init__ saves a reference to self > somewhere. It does mean that __del__ can be surprised by an > uninitialized object, and that's a known pitfall. And an exception > in the middle of __new__ has the same problem. C++ deals with that by only destroying the fully-initialized bases and subobjects when an exception is thrown during construction. That's hard to do in the presence of two-phase initialization, though. It may be less of a problem for Python because __del__ is much less commonly needed than nontrivial destructors are in C++. > So I don't think __new__ is preferred over __init__, unless you need a > feature that only __new__ offers (like initializing an immutable base > class or returning an existing object or an object of a different > class). In other words, TIMTOWTDI? <0.3 wink> -- Dave Abrahams Boost Consulting www.boost-consulting.com From tismer@tismer.com Mon Mar 24 12:40:44 2003 From: tismer@tismer.com (Christian Tismer) Date: Mon, 24 Mar 2003 13:40:44 +0100 Subject: [Python-Dev] funny leak Message-ID: <3E7EFCCC.2090202@tismer.com> Hi Tim et al, I just tested generators and found a memory leak. (Has nothing to do with generators). The following code adds one to the overall refcount and gc cannot reclaim it. def conjoin(gs): def gen(): gs # unbreakable cycle gen # unless one is commented out Should I send a bug report, or is this known? The above holds for Python 2.2.2 upto the current CVS version. 
ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From neal@metaslash.com Mon Mar 24 13:43:16 2003 From: neal@metaslash.com (Neal Norwitz) Date: Mon, 24 Mar 2003 08:43:16 -0500 Subject: [Python-Dev] funny leak In-Reply-To: <3E7EFCCC.2090202@tismer.com> References: <3E7EFCCC.2090202@tismer.com> Message-ID: <20030324134316.GR25722@epoch.metaslash.com> On Mon, Mar 24, 2003 at 01:40:44PM +0100, Christian Tismer wrote: > > I just tested generators and found a memory leak. > (Has nothing to do with generators). > The following code adds one to the overall refcount > and gc cannot reclaim it. > > def conjoin(gs): > def gen(): > gs # unbreakable cycle > gen # unless one is commented out With current CVS: >>> gc.collect() 0 [23150 refs] >>> conjoin(1) [23160 refs] >>> conjoin(1) [23170 refs] >>> gc.collect() 8 [23151 refs] >>> conjoin(1) [23161 refs] >>> conjoin(1) [23171 refs] >>> gc.collect() 8 [23151 refs] >>> conjoin(1) [23161 refs] >>> conjoin(1) [23171 refs] >>> gc.collect() 8 [23151 refs] One ref may be leaked the first time gc.collect() is called with garbage (23150 -> 23151). But after that, no more refs are leaked (ref count stays at 23151). 
Neal From jepler@unpythonic.net Mon Mar 24 13:51:00 2003 From: jepler@unpythonic.net (Jeff Epler) Date: Mon, 24 Mar 2003 07:51:00 -0600 Subject: [Python-Dev] funny leak In-Reply-To: <20030324134316.GR25722@epoch.metaslash.com> References: <3E7EFCCC.2090202@tismer.com> <20030324134316.GR25722@epoch.metaslash.com> Message-ID: <20030324135059.GB28860@unpythonic.net> On Mon, Mar 24, 2003 at 08:43:16AM -0500, Neal Norwitz wrote: > One ref may be leaked the first time gc.collect() is called with > garbage (23150 -> 23151). But after that, no more refs are leaked > (ref count stays at 23151). If that's true, then running the 'def' block repeatedly will leak references, right? I think from Christian's original message this is what he meant, but I'm not sure. Jeff From neal@metaslash.com Mon Mar 24 14:00:23 2003 From: neal@metaslash.com (Neal Norwitz) Date: Mon, 24 Mar 2003 09:00:23 -0500 Subject: [Python-Dev] funny leak In-Reply-To: <20030324135059.GB28860@unpythonic.net> References: <3E7EFCCC.2090202@tismer.com> <20030324134316.GR25722@epoch.metaslash.com> <20030324135059.GB28860@unpythonic.net> Message-ID: <20030324140023.GT25722@epoch.metaslash.com> On Mon, Mar 24, 2003 at 07:51:00AM -0600, Jeff Epler wrote: > On Mon, Mar 24, 2003 at 08:43:16AM -0500, Neal Norwitz wrote: > > One ref may be leaked the first time gc.collect() is called with > > garbage (23150 -> 23151). But after that, no more refs are leaked > > (ref count stays at 23151). > > If that's true, then running the 'def' block repeatedly will leak > references, right? I think from Christian's original message this is > what he meant, but I'm not sure. I misread the original message. Running the 'def' block does indeed leak a reference and collect() has no effect. Similarly: [23154 refs] >>> def conjoin(gs): ... def gen(): ... gs # unbreakable cycle ... gen # unless one is commented out ... 
[23194 refs] >>> del conjoin [23155 refs] Neal From tismer@tismer.com Mon Mar 24 14:04:46 2003 From: tismer@tismer.com (Christian Tismer) Date: Mon, 24 Mar 2003 15:04:46 +0100 Subject: [Python-Dev] funny leak In-Reply-To: <20030324134316.GR25722@epoch.metaslash.com> References: <3E7EFCCC.2090202@tismer.com> <20030324134316.GR25722@epoch.metaslash.com> Message-ID: <3E7F107E.4020403@tismer.com> Neal Norwitz wrote: > On Mon, Mar 24, 2003 at 01:40:44PM +0100, Christian Tismer wrote: > >>I just tested generators and found a memory leak. >>(Has nothing to do with generators). >>The following code adds one to the overall refcount >>and gc cannot reclaim it. >> >>def conjoin(gs): >> def gen(): >> gs # unbreakable cycle >> gen # unless one is commented out > > > With current CVS: ... > One ref may be leaked the first time gc.collect() is called with > garbage (23150 -> 23151). But after that, no more refs are leaked > (ref count stays at 23151). No, this is not the point. Don't call the function at all, just execute the above code and call gc.collect(). You will see one reference eaten every time you repeat this. ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From tismer@tismer.com Mon Mar 24 14:05:18 2003 From: tismer@tismer.com (Christian Tismer) Date: Mon, 24 Mar 2003 15:05:18 +0100 Subject: [Python-Dev] funny leak In-Reply-To: <20030324135059.GB28860@unpythonic.net> References: <3E7EFCCC.2090202@tismer.com> <20030324134316.GR25722@epoch.metaslash.com> <20030324135059.GB28860@unpythonic.net> Message-ID: <3E7F109E.2060401@tismer.com> Jeff Epler wrote: > On Mon, Mar 24, 2003 at 08:43:16AM -0500, Neal Norwitz wrote: > >>One ref may be leaked the first time gc.collect() is called with >>garbage (23150 -> 23151). But after that, no more refs are leaked >>(ref count stays at 23151). > > > If that's true, then running the 'def' block repeatedly will leak > references, right? I think from Christian's original message this is > what he meant, but I'm not sure. Exactly. -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From oren-py-d@hishome.net Mon Mar 24 14:06:38 2003 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 24 Mar 2003 09:06:38 -0500 Subject: [Python-Dev] How to suppress instance __dict__? In-Reply-To: References: <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <200303232115.h2NLFFA04846@pcp02138704pcs.reston01.va.comcast.net> <200303232233.h2NMXx905265@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030324140638.GA41602@hishome.net> On Sun, Mar 23, 2003 at 06:18:26PM -0500, David Abrahams wrote: > Guido van Rossum writes: ... > > [Guido] ...> > > [David again] ... > > > [Guido] .. Ummm... I'm confused. So what is the recommended way to do it? 
Oren From tim.one@comcast.net Mon Mar 24 15:57:04 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 24 Mar 2003 10:57:04 -0500 Subject: [Python-Dev] funny leak In-Reply-To: <3E7F107E.4020403@tismer.com> Message-ID: [Christian Tismer] > No, this is not the point. Don't call the function > at all, just execute the above code and call > gc.collect(). You will see one reference eaten > every time you repeat this. Can you show explicit evidence instead of trying to describe it? Here's what I tried:

    def one():
        def conjoin(gs):
            def gen():
                gs      # unbreakable cycle
                gen     # unless one is commented out

    import sys, gc
    lastrc = 0
    while 1:
        one()
        gc.collect()
        thisrc = sys.gettotalrefcount()
        print thisrc - lastrc,
        lastrc = thisrc

Running that program under a debug-build CVS Python shows no growth in sys.gettotalrefcount() after the first two iterations. It also displays no process-size growth. IOW, I see no evidence of any flavor of leak. I don't claim that you don't, but I don't know what "just execute the above code ... one reference eaten every time" *means*. It can't mean executing the specific program I pasted in above, because that simply doesn't eat a reference each time.
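The measurement loop above can be sketched for a current Python 3 interpreter roughly as follows. This is an illustration under stated assumptions, not Tim's program: the helper names total_count and deltas are hypothetical, and because sys.gettotalrefcount() exists only in debug builds, the sketch falls back to counting gc-tracked objects, which is a much coarser proxy and would not have caught a leaked reference to a small int.

```python
import gc
import sys


def total_count():
    # sys.gettotalrefcount() is only present in debug builds of CPython.
    # Assumption: on other builds, the number of gc-tracked objects is
    # used as a rough stand-in (it cannot see leaked refs to ints).
    counter = getattr(sys, "gettotalrefcount", None)
    if counter is not None:
        return counter()
    return len(gc.get_objects())


def deltas(func, rounds=10):
    """Call func repeatedly; return the change in the count per round."""
    result = []
    gc.collect()
    last = total_count()
    for _ in range(rounds):
        func()
        gc.collect()          # reclaim cycles before measuring
        this = total_count()
        result.append(this - last)
        last = this
    return result


def one():
    # Same shape as the code under test: the inner functions are only
    # defined here, never called.
    def conjoin(gs):
        def gen():
            gs   # refers to the enclosing function's argument
            gen  # refers to itself through a closure cell
```

Calling deltas(one) and looking for values that stay positive round after round is the same idea as watching the printed differences in the loop above.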
From tim.one@comcast.net Mon Mar 24 16:02:14 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 24 Mar 2003 11:02:14 -0500 Subject: [Python-Dev] funny leak In-Reply-To: <3E7F107E.4020403@tismer.com> Message-ID: OK, *this* program leaks a reference each time around; probably a missing decref in the compiler:

    source = """\
    def conjoin(gs):
        def gen():
            gs      # unbreakable cycle
            gen     # unless one is commented out
    """

    def one():
        exec source in {}

    import sys, gc
    lastrc = 0
    while 1:
        one()
        gc.collect()
        thisrc = sys.gettotalrefcount()
        print thisrc - lastrc,
        lastrc = thisrc

From tim.one@comcast.net Mon Mar 24 16:10:46 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 24 Mar 2003 11:10:46 -0500 Subject: [Python-Dev] funny leak In-Reply-To: <3E7F107E.4020403@tismer.com> Message-ID: OK, there's no leaking memory here, but there is a leaking refcount: the refcount on the int 0 keeps going up. The compiler has leaked references to little integers before, but offhand I don't recall the details. ----- old stuff ----- OK, *this* program leaks a reference each time around; probably a missing decref in the compiler:

    source = """\
    def conjoin(gs):
        def gen():
            gs      # unbreakable cycle
            gen     # unless one is commented out
    """

    def one():
        exec source in {}

    import sys, gc
    lastrc = 0
    while 1:
        one()
        gc.collect()
        thisrc = sys.gettotalrefcount()
        print thisrc - lastrc,
        lastrc = thisrc

From mwh@python.net Mon Mar 24 16:47:34 2003 From: mwh@python.net (Michael Hudson) Date: Mon, 24 Mar 2003 16:47:34 +0000 Subject: [Python-Dev] funny leak In-Reply-To: (Tim Peters's message of "Mon, 24 Mar 2003 11:10:46 -0500") References: Message-ID: <2m7kaoaezd.fsf@starship.python.net> Tim Peters writes: > OK, there's no leaking memory here, but there is a leaking refcount: the > refcount on the int 0 keeps going up. The compiler has leaked references to > little integers before, but offhand I don't recall the details.
This seems to be all it takes: Index: compile.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/compile.c,v retrieving revision 2.275 diff -c -C7 -r2.275 compile.c *** compile.c 12 Feb 2003 16:56:51 -0000 2.275 --- compile.c 24 Mar 2003 16:43:28 -0000 *************** *** 4524,4537 **** --- 4564,4578 ---- d = PyDict_New(); for (i = PyList_GET_SIZE(list); --i >= 0; ) { v = PyInt_FromLong(i); if (v == NULL) goto fail; if (PyDict_SetItem(d, PyList_GET_ITEM(list, i), v) < 0) goto fail; + Py_DECREF(v); if (PyDict_DelItem(*cellvars, PyList_GET_ITEM(list, i)) < 0) goto fail; } pos = 0; i = PyList_GET_SIZE(list); Py_DECREF(list); while (PyDict_Next(*cellvars, &pos, &v, &w)) { ... found by the obscure strategy of searching for "PyInt_FromLong" in Python/compile.c ... A quick eyeballing suggests there are a bunch more of these, but only on error returns. Cheers, M. -- ... when all the programmes on all the channels actually were made by actors with cleft pallettes speaking lines by dyslexic writers filmed by blind cameramen instead of merely seeming like that, it somehow made the whole thing more worthwhile. -- HHGTG, Episode 11 From jacobs@penguin.theopalgroup.com Mon Mar 24 17:37:18 2003 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 24 Mar 2003 12:37:18 -0500 (EST) Subject: [Python-Dev] funny leak In-Reply-To: <2m7kaoaezd.fsf@starship.python.net> Message-ID: On Mon, 24 Mar 2003, Michael Hudson wrote: > Tim Peters writes: > > > OK, there's no leaking memory here, but there is a leaking refcount: the > > refcount on the int 0 keeps going up. The compiler has leaked references to > > little integers before, but offhand I don't recall the details. > > This seems to be all it takes: Your patch isn't a 100% fix, since a reference can still be leaked if the PyDict_SetItem fails. If nobody beats me to it, I can do a validation pass through compile.c and see how many I can squash. 
-Kevin > > Index: compile.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Python/compile.c,v > retrieving revision 2.275 > diff -c -C7 -r2.275 compile.c > *** compile.c 12 Feb 2003 16:56:51 -0000 2.275 > --- compile.c 24 Mar 2003 16:43:28 -0000 > *************** > *** 4524,4537 **** > --- 4564,4578 ---- > d = PyDict_New(); > for (i = PyList_GET_SIZE(list); --i >= 0; ) { > v = PyInt_FromLong(i); > if (v == NULL) > goto fail; > if (PyDict_SetItem(d, PyList_GET_ITEM(list, i), v) < 0) > goto fail; > + Py_DECREF(v); > if (PyDict_DelItem(*cellvars, PyList_GET_ITEM(list, i)) < 0) > goto fail; > } > pos = 0; > i = PyList_GET_SIZE(list); > Py_DECREF(list); > while (PyDict_Next(*cellvars, &pos, &v, &w)) { > > ... found by the obscure strategy of searching for "PyInt_FromLong" in > Python/compile.c ... > > A quick eyeballing suggests there are a bunch more of these, but only > on error returns. > > Cheers, > M. > > -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From tim.one@comcast.net Mon Mar 24 17:51:40 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 24 Mar 2003 12:51:40 -0500 Subject: [Python-Dev] funny leak In-Reply-To: Message-ID: [Michael Hudson] >> This seems to be all it takes: [Kevin Jacobs] > Your patch isn't a 100% fix, since a reference can still be leaked if > the PyDict_SetItem fails. The patch I checked in paid attention to that. > If nobody beats me to it, I can do a validation pass > through compile.c and see how many I can squash. > ... > A quick eyeballing suggests there are a bunch more of these, but only > on error returns. Possibly. If a dict setitem call fails, it's almost certainly because we're out of memory, and the program is going to die soon regardless. How much pain it's worth to die with a refcount that's not one too large is open to debate . 
From mwh@python.net Mon Mar 24 17:55:40 2003 From: mwh@python.net (Michael Hudson) Date: Mon, 24 Mar 2003 17:55:40 +0000 Subject: [Python-Dev] funny leak In-Reply-To: (Tim Peters's message of "Mon, 24 Mar 2003 12:51:40 -0500") References: Message-ID: <2my9348x9f.fsf@starship.python.net> Tim Peters writes: [me] >> A quick eyeballing suggests there are a bunch more of these, but only >> on error returns. > > Possibly. If a dict setitem call fails, it's almost certainly because we're > out of memory, and the program is going to die soon regardless. How much > pain it's worth to die with a refcount that's not one too large is open to > debate . This occurred to me too. I don't think I care enough to do anything about it today. Cheers, M. -- In case you're not a computer person, I should probably point out that "Real Soon Now" is a technical term meaning "sometime before the heat-death of the universe, maybe". -- Scott Fahlman From tim.one@comcast.net Mon Mar 24 17:59:54 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 24 Mar 2003 12:59:54 -0500 Subject: [Python-Dev] funny leak In-Reply-To: <2m7kaoaezd.fsf@starship.python.net> Message-ID: [Michael Hudson] > ... found by the obscure strategy of searching for "PyInt_FromLong" in > Python/compile.c ... Heh. Here at the PyCon sprint, Jeremy & I did the same thing. The mystery for you is how I figured out 0 was leaking to begin with -- but my lips are sealed. From skip@pobox.com Mon Mar 24 18:16:27 2003 From: skip@pobox.com (Skip Montanaro) Date: Mon, 24 Mar 2003 12:16:27 -0600 Subject: [Python-Dev] funny leak In-Reply-To: References: <2m7kaoaezd.fsf@starship.python.net> Message-ID: <15999.19323.486024.376000@montanaro.dyndns.org> Tim> Heh. Here at the PyCon sprint ... So how's it going? 
S From tim.one@comcast.net Mon Mar 24 20:19:55 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 24 Mar 2003 15:19:55 -0500 Subject: [Python-Dev] funny leak In-Reply-To: <15999.19323.486024.376000@montanaro.dyndns.org> Message-ID: [Tim] > Heh. Here at the PyCon sprint ... [Skip Montanaro] > So how's it going? I wouldn't have guessed it, but legions of Indonesian houseboys giving sprinters foot massages really does increase productivity! I'm not so sure about the ubiquitous champagne fountains, though. roughing-it-ly y'rs - tim From tismer@tismer.com Mon Mar 24 22:48:43 2003 From: tismer@tismer.com (Christian Tismer) Date: Mon, 24 Mar 2003 23:48:43 +0100 Subject: [Python-Dev] funny leak In-Reply-To: References: Message-ID: <3E7F8B4B.1040006@tismer.com> Tim Peters wrote: > [Christian Tismer] > >>No, this is not the point. Don't call the function >>at all, just execute the above code and call >>gc.collect(). You will see one reference eaten >>every time you repeat this. > > > Can you show explicit evidence instead of trying to describe it? Here's > what I tried: Sorry, I had to re-read your message several times until I understood where I wasn't clear: By "execute" I meant exec() this piece of Python code. I actually pasted it in, watching the refcount grow. cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From skip@pobox.com Tue Mar 25 00:06:22 2003 From: skip@pobox.com (Skip Montanaro) Date: Mon, 24 Mar 2003 18:06:22 -0600 Subject: [Python-Dev] Checkins to Attic? 
Message-ID: <15999.40318.473557.200043@montanaro.dyndns.org> I noticed on the python-checkins list that several changes (newcompile.c and friends) were checked into what appears to be the Attic, e.g.: Update of /cvsroot/python/python/dist/src/Python In directory sc8-pr-cvs1:/tmp/cvs-serv2961/Python Modified Files: Tag: ast-branch newcompile.c Log Message: Redeclared stuff to stop wngs about signed-vs-unsigned mismatches. Index: newcompile.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/Attic/newcompile.c,v retrieving revision 1.1.2.23 retrieving revision 1.1.2.24 Note the RCS file above. Not all files were in the Attic though: Update of /cvsroot/python/python/dist/src/Include In directory sc8-pr-cvs1:/tmp/cvs-serv2961/Include Modified Files: Tag: ast-branch compile.h Log Message: Redeclared stuff to stop wngs about signed-vs-unsigned mismatches. Index: compile.h =================================================================== RCS file: /cvsroot/python/python/dist/src/Include/compile.h,v I'm probably just missing something obvious, but I thought I'd ask. Skip From tim.one@comcast.net Tue Mar 25 00:10:56 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 24 Mar 2003 19:10:56 -0500 Subject: [Python-Dev] Checkins to Attic? In-Reply-To: <15999.40318.473557.200043@montanaro.dyndns.org> Message-ID: As explained on the checkins list, files that are brand new on a branch live in the Attic. CVS uses the Attic for several things, and there's no problem here. From neal@metaslash.com Tue Mar 25 00:11:31 2003 From: neal@metaslash.com (Neal Norwitz) Date: Mon, 24 Mar 2003 19:11:31 -0500 Subject: [Python-Dev] Checkins to Attic? 
In-Reply-To: <15999.40318.473557.200043@montanaro.dyndns.org> References: <15999.40318.473557.200043@montanaro.dyndns.org> Message-ID: <20030325001130.GD12443@epoch.metaslash.com> On Mon, Mar 24, 2003 at 06:06:22PM -0600, Skip Montanaro wrote: > I noticed on the python-checkins list that several changes (newcompile.c and > friends) were checked into what appears to be the Attic, e.g.: > [snip Attic/newcompile.c] > > Note the RCS file above. Not all files were in the Attic though: > [snip Include/compile.h] > > I'm probably just missing something obvious, but I thought I'd ask. newcompile.c only exists on the branch, not on the head, but compile.h exists in the head. I believe files that are only on the branch reside in the Attic. But shhhh, don't tell the neighbors. :-) Neal From gward@python.net Tue Mar 25 02:04:20 2003 From: gward@python.net (Greg Ward) Date: Mon, 24 Mar 2003 21:04:20 -0500 Subject: [Python-Dev] ossaudiodev tweak needs testing Message-ID: <20030325020420.GA1406@cthulhu.gerg.ca> Hi all -- I have another tweak to the ossaudiodev module that might make it work a little better. Background: some time ago, Jeremy and Guido had problems with test_ossaudiodev hanging due to a blocking open() call. So I made the open() non-blocking in rev 1.25 on 2003/03/11. But that screwed things up for David Hammerton, who emailed me privately the other day that a write() call later on was dying with EAGAIN -- not entirely surprising, since that's how write() is supposed to behave on a file descriptor opened with O_NONBLOCK if it would have blocked. Most OSS device drivers don't actually act that way (sigh), but apparently David's does. So this patch reverses the effect of open() with O_NONBLOCK, meaning the file is back in blocking mode in the conventional Unix sense. (It's in blocking mode in the OSS sense the whole time, or at least until Python code calls the nonblock() method on it.) 
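The open-nonblocking-then-restore idea described above can be sketched in Python as below. This is not the ossaudiodev code: the function name open_then_block is hypothetical, and unlike a plain fcntl(fd, F_SETFL, 0) the sketch reads the current flags first and clears only O_NONBLOCK, which is a design choice of this illustration. It assumes a Unix-like system where the fcntl module is available.

```python
import fcntl
import os


def open_then_block(path, flags=os.O_RDONLY):
    """Open path without blocking, then put the descriptor back in blocking mode."""
    # O_NONBLOCK keeps open() itself from hanging, e.g. on a busy device.
    fd = os.open(path, flags | os.O_NONBLOCK)
    try:
        # Clear only O_NONBLOCK and keep the other status flags, so that
        # later read()/write() calls wait instead of failing with EAGAIN.
        current = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, current & ~os.O_NONBLOCK)
    except OSError:
        # Do not leak the descriptor if the flag change fails.
        os.close(fd)
        raise
    return fd
```

A caller would use the returned descriptor with os.read() or os.write() and close it with os.close() when done.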
If you have a Linux or FreeBSD machine with sound hardware that works,
can you please run

  ./python Lib/test/regrtest.py -uaudio test_ossaudiodev

with the current CVS head (ie. rev 1.25 of ossaudiodev.c and rev 1.4 of
test_ossaudiodev.py), then apply this patch:

--- Modules/ossaudiodev.c 11 Mar 2003 16:53:13 -0000 1.25
+++ Modules/ossaudiodev.c 25 Mar 2003 01:54:46 -0000
@@ -139,6 +139,15 @@
         PyErr_SetFromErrnoWithFilename(PyExc_IOError, basedev);
         return NULL;
     }
+
+    /* And (try to) put it back in blocking mode so we get the
+       expected write() semantics. */
+    if (fcntl(fd, F_SETFL, 0) == -1) {
+        close(fd);
+        PyErr_SetFromErrnoWithFilename(PyExc_IOError, basedev);
+        return NULL;
+    }
+
     if (ioctl(fd, SNDCTL_DSP_GETFMTS, &afmts) == -1) {
         PyErr_SetFromErrnoWithFilename(PyExc_IOError, basedev);
         return NULL;

and try it again?

If it works in both cases, great. If it crashed with CVS head (EAGAIN
from write()?), and now works, wonderful! (Please let me know.) If it
works before this patch but not with it, then PLEASE let me know!
Otherwise I'll check this in.

Thanks --

        Greg
-- 
Greg Ward http://www.gerg.ca/

From graham_guttocks@yahoo.co.nz Tue Mar 25 19:42:00 2003
From: graham_guttocks@yahoo.co.nz (Graham Guttocks)
Date: Wed, 26 Mar 2003 07:42:00 +1200 (NZST)
Subject: [Python-Dev] cvs.python.sourceforge.net fouled up
Message-ID: <20030325194200.52171.qmail@web10305.mail.yahoo.com>

$ cvs update
cvs [update aborted]: recv() from server cvs.python.sourceforge.net: EOF

I've been having this problem on and off for weeks now with anonymous
Python cvs.

=====
Regards,
Graham

http://mobile.yahoo.com.au - Yahoo! Mobile
- Check & compose your email via SMS on your Telstra or Vodafone mobile.
From skip@pobox.com Tue Mar 25 19:58:50 2003
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 25 Mar 2003 13:58:50 -0600
Subject: [Python-Dev] cvs.python.sourceforge.net fouled up
In-Reply-To: <20030325194200.52171.qmail@web10305.mail.yahoo.com>
References: <20030325194200.52171.qmail@web10305.mail.yahoo.com>
Message-ID: <16000.46330.437431.388294@montanaro.dyndns.org>

    Graham> $ cvs update
    Graham> cvs [update aborted]: recv() from server cvs.python.sourceforge.net: EOF

    Graham> I've been having this problem on and off for weeks now with
    Graham> anonymous Python cvs.

I've noticed various problems as well, from extraordinarily slow response
times to failures such as the above. At the moment it seems to be working
reasonably well. I'm using authenticated access, not anonymous, though I
don't think that should make a difference.

Skip

From graham_guttocks@yahoo.co.nz Tue Mar 25 20:22:38 2003
From: graham_guttocks@yahoo.co.nz (Graham Guttocks)
Date: Wed, 26 Mar 2003 08:22:38 +1200 (NZST)
Subject: [Python-Dev] cvs.python.sourceforge.net fouled up
In-Reply-To: <16000.46330.437431.388294@montanaro.dyndns.org>
Message-ID: <20030325202238.72747.qmail@web10304.mail.yahoo.com>

Skip Montanaro wrote:
>
> I'm using authenticated access, not anonymous, though I
> don't think that should make a difference.

Actually, it does make a difference. I've only had the
problem I posted when using anonymous pserver cvs. When
using authenticated (ssh) cvs access to sourceforge, my
results are MUCH better.

=====
Regards,
Graham

From martin@v.loewis.de Tue Mar 25 20:29:04 2003
From: martin@v.loewis.de (Martin v.
Löwis)
Date: 25 Mar 2003 21:29:04 +0100
Subject: [Python-Dev] cvs.python.sourceforge.net fouled up
In-Reply-To: <20030325202238.72747.qmail@web10304.mail.yahoo.com>
References: <20030325202238.72747.qmail@web10304.mail.yahoo.com>
Message-ID:

Graham Guttocks writes:

> Actually, it does make a difference. I've only had the
> problem I posted when using anonymous pserver cvs. When
> using authenticated (ssh) cvs access to sourceforge, my
> results are MUCH better.

This is documented (see site status): in overload situations,
anonymous access is disabled in favour of authenticated access (to let
people who actually work on all this continue to work).

Regards,
Martin

From graham_guttocks@yahoo.co.nz Tue Mar 25 21:23:17 2003
From: graham_guttocks@yahoo.co.nz (Graham Guttocks)
Date: Wed, 26 Mar 2003 09:23:17 +1200 (NZST)
Subject: [Python-Dev] cvs.python.sourceforge.net fouled up
In-Reply-To:
Message-ID: <20030325212317.10077.qmail@web10308.mail.yahoo.com>

"Martin v. Löwis" wrote:
>
> This is documented (see site status): in overload situations,
> anonymous access is disabled in favour of authenticated access

Unfortunately, it seems the "overload" situation is now becoming
the standard. I can't remember the last time I was able to
anonymous cvs update on the first try.

=====
Regards,
Graham

From greg@cosc.canterbury.ac.nz Tue Mar 25 22:49:27 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 26 Mar 2003 10:49:27 +1200 (NZST)
Subject: [Python-Dev] Doc strings for typeslots?
In-Reply-To:
Message-ID: <200303252249.h2PMnR310503@oma.cosc.canterbury.ac.nz>

A Pyrex user recently pointed out to me that trying
to give a docstring to an __xxx__ method of an
extension type doesn't work.
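The symptom can be observed from pure Python on built-in types. The following is a small sketch, assuming a current CPython 3 interpreter (the thread itself concerns Python 2.2, and other implementations may behave differently); the variable names are only illustrative.

```python
# Slot wrappers for the same operator on two different built-in types.
int_add = int.__dict__["__add__"]
float_add = float.__dict__["__add__"]

# Each type gets its own wrapper object...
distinct_objects = int_add is not float_add

# ...but the docstring text is the same generic one for both.
same_docstring = int_add.__doc__ == float_add.__doc__

# Trying to give one wrapper its own docstring is rejected.
try:
    int_add.__doc__ = "Add two ints."
    doc_is_writable = True
except (AttributeError, TypeError):
    doc_is_writable = False
```

Under that assumption, `distinct_objects` and `same_docstring` are both true and `doc_is_writable` is false.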
The reason for this is that the C functions implementing these methods live in slots of the typeobject, and there's apparently nowhere to put docstrings for them. I'm speculating that this could be worked around by getting the slot's wrapper object out of the type dict after the type is initialised, and stuffing a docstring into it. This would only work if a new set of wrappers is created for each type, rather than re-using generic ones. An experiment suggests that this is what happens -- can anyone confirm this? Or, is there a better way of giving these things docstrings that I've missed? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Tue Mar 25 23:02:14 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 25 Mar 2003 18:02:14 -0500 Subject: [Python-Dev] Doc strings for typeslots? In-Reply-To: "Your message of Wed, 26 Mar 2003 10:49:27 +1200." <200303252249.h2PMnR310503@oma.cosc.canterbury.ac.nz> References: <200303252249.h2PMnR310503@oma.cosc.canterbury.ac.nz> Message-ID: <200303252302.h2PN2ER11372@pcp02138704pcs.reston01.va.comcast.net> > A Pyrex user recently pointed out to me that trying > to give a docstring to an __xxx__ method of an > extension type doesn't work. > > The reason for this is that the C functions implementing > these methods live in slots of the typeobject, and there's > apparently nowhere to put docstrings for them. > > I'm speculating that this could be worked around by > getting the slot's wrapper object out of the type > dict after the type is initialised, and stuffing a > docstring into it. > > This would only work if a new set of wrappers is created > for each type, rather than re-using generic ones. An > experiment suggests that this is what happens -- can > anyone confirm this? 
> > Or, is there a better way of giving these things > docstrings that I've missed? Um, I'm afraid this is how it is. __xxx__ methods have generic docstrings. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Wed Mar 26 00:01:00 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 26 Mar 2003 12:01:00 +1200 (NZST) Subject: [Python-Dev] Doc strings for typeslots? In-Reply-To: <200303252302.h2PN2ER11372@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303260001.h2Q010211097@oma.cosc.canterbury.ac.nz> > Um, I'm afraid this is how it is. __xxx__ methods have generic > docstrings. :-( Can you just clarify a bit what you mean by "this": would my idea of poking a docstring into the wrapper object work, or do all types share the same wrappers? It seems as though they *don't* share the same wrappers... Python 2.2 (#1, Jul 11 2002, 14:19:37) >>> id(int.__dict__['__add__']) 135662196 >>> id(float.__dict__['__add__']) 135668268 ...or is there some magic going on there that I'm not aware of? Thanks, Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Wed Mar 26 02:32:55 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 25 Mar 2003 21:32:55 -0500 Subject: [Python-Dev] Doc strings for typeslots? In-Reply-To: "Your message of Wed, 26 Mar 2003 12:01:00 +1200." <200303260001.h2Q010211097@oma.cosc.canterbury.ac.nz> References: <200303260001.h2Q010211097@oma.cosc.canterbury.ac.nz> Message-ID: <200303260232.h2Q2Wts11814@pcp02138704pcs.reston01.va.comcast.net> > > Um, I'm afraid this is how it is. __xxx__ methods have generic > > docstrings. 
:-(
>
> Can you just clarify a bit what you mean by "this":
> would my idea of poking a docstring into the wrapper
> object work, or do all types share the same wrappers?
>
> It seems as though they *don't* share the same wrappers...
>
> Python 2.2 (#1, Jul 11 2002, 14:19:37)
> >>> id(int.__dict__['__add__'])
> 135662196
> >>> id(float.__dict__['__add__'])
> 135668268
>
> ...or is there some magic going on there that I'm
> not aware of?

The descriptors are indeed separate objects, because they wrap
different C implementations (int vs. float add). But they contain a
pointer to a static piece of data which is shared by all wrappers, and
that's where they get their docstring.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From Raymond Hettinger"

From past rumblings, I gather that Python is moving
towards preventing __builtins__ from being shadowed.

I would like to know what you guys think about going ahead
with that idea whenever the -O optimization flag is set.

The idea is to scan the code for lines like:

    LOAD_GLOBAL 2 (range)

and, if the name is found in __builtins__, then look up
the name, add the reference to the constants table, and
replace the code with something like:

    LOAD_CONST 5 ()

The opcode replacement bypasses module level shadowing
but leaves local shadowing intact. For example:

    modglob = 1
    range = xrange
    def f(list):
        for i in list:        # local shadowing of 'list' is unaffected
            print ord(i)      # access to 'ord' is optimized
            j = modglob       # non-shadowed globals are unaffected
            k = range(j)      # shadowing of globals is ignored

I've already tried out a pure python proof-of-concept and it is
straightforward to recode it in C and attach it to PyCode_New().
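The scanning step can be sketched with the standard `dis` module. This is not the proof-of-concept mentioned above, only an illustration written for current Python 3; the function name `builtin_global_loads` is made up here, and the sketch only reports candidates rather than rewriting any bytecode. It is also more conservative than the proposal: names that the module itself defines are skipped rather than optimized.

```python
import builtins
import dis


def builtin_global_loads(func):
    """Return the names func fetches with LOAD_GLOBAL that resolve to builtins.

    Names that are also present in the function's module globals are left
    out, because a module-level binding would take precedence at run time.
    """
    module_globals = func.__globals__
    names = set()
    for ins in dis.get_instructions(func):
        if ins.opname != "LOAD_GLOBAL":
            continue
        name = ins.argval
        if name in module_globals:
            continue  # an ordinary (or shadowing) module global
        if hasattr(builtins, name):
            names.add(name)  # candidate for a constant-style load
    return sorted(names)


modglob = 1


def f(items):
    total = 0
    for ch in items:          # 'items' and 'ch' are locals, never LOAD_GLOBAL
        total += ord(ch)      # 'ord' is looked up as a global, then a builtin
    return total + modglob + len(items)


candidates = builtin_global_loads(f)
```

Here `candidates` contains `len` and `ord` but not `modglob`.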
Raymond Hettinger From jack@performancedrivers.com Thu Mar 27 22:15:05 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Thu, 27 Mar 2003 17:15:05 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer>; from raymond.hettinger@verizon.net on Thu, Mar 27, 2003 at 04:37:59PM -0500 References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> Message-ID: <20030327171505.A1450@localhost.localdomain> On Thu, Mar 27, 2003 at 04:37:59PM -0500, Raymond Hettinger wrote: > >From past rumblings, I gather that Python is moving > towards preventing __builtins__ from being shadowed. > > I would like to know what you guys think about going ahead > with that idea whenever the -O optimization flag is set. > The behavior of a program under -O should be as similar as possible to normal operation. This would break that for some programs. A per-file pragma directive would work. The downside of a pragma module or keyword would be what people try to add to it later. -jackdied From skip@pobox.com Thu Mar 27 22:17:53 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 27 Mar 2003 16:17:53 -0600 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> Message-ID: <16003.30865.470726.301805@montanaro.dyndns.org> Raymond> From past rumblings, I gather that Python is moving towards Raymond> preventing __builtins__ from being shadowed. Raymond> I would like to know what you guys think about going ahead with Raymond> that idea whenever the -O optimization flag is set. Interesting idea, but I think of shadowing builtins being somewhat orthogonal to optimization in the usual sense (speed things up without changing the program's semantics). This is clearly a semantic change, so I'd like to see a different command line flag control this behavior. 
What happens if you run one program with builtin shadowing enabled, it writes some .pyo files with your suggested change, then later you run another program without it? Seems like there should be some memory in the file of how it was generated so the importer can raise an exception if it finds a mismatch between the shadowing command line flag and a previously generated bytecode file. Skip From mal@lemburg.com Thu Mar 27 22:24:50 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 27 Mar 2003 23:24:50 +0100 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> Message-ID: <3E837A32.6010400@lemburg.com> Raymond Hettinger wrote: >>From past rumblings, I gather that Python is moving > towards preventing __builtins__ from being shadowed. > > I would like to know what you guys think about going ahead > with that idea whenever the -O optimization flag is set. > > The idea is to scan the code for lines like: > > LOAD_GLOBAL 2 (range) > > > and, if the name is found in __builtins__, then lookup > the name, add the reference to the constants table and, > replace the code with something like: > > LOAD_CONST 5 () Using the -O for this is not a working possibility. -OO is reserved for optimizations which can change semantics, but even there, I'd rather like a per-module switch than a command line switch. BTW, why not have a new opcode for symbols in the builtins and then only tweak the opcode implementation instead of having the compiler generate different code ? > The opcode replacement bypasses module level shadowing > but leaves local shadowing intact. 
For example: > > modglob = 1 > range = xrange > def f(list): > for i in list: # local shadowing of 'list' is unaffected > print ord(i) # access to 'ord' is optimized > j = modglob # non-shadowed globals are unaffected > k = range(j) # shadowing of globals is ignored > > > I've already tried out a pure python proof-of-concept and it is > straightforward to recode it in C and attach it to PyCode_New(). > > > Raymond Hettinger > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 27 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 5 days left EuroPython 2003, Charleroi, Belgium: 89 days left From python@rcn.com Fri Mar 28 00:35:20 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 27 Mar 2003 19:35:20 -0500 Subject: [Python-Dev] Fast access to __builtins__ References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <3E837A32.6010400@lemburg.com> Message-ID: <003601c2f4c1$ea28c0c0$df10a044@oemcomputer> [Jack Diederich] > The behavior of a program under -O should be as similar as possible to normal > operation. This would break that for some programs. > A per-file pragma directive would work. [Skip Montanero] > This is clearly a semantic change, so > I'd like to see a different command line flag control this behavior [M.-A. Lemburg] > Using the -O for this is not a working possibility. -OO > is reserved for optimizations which can change semantics, > but even there, I'd rather like a per-module switch than > a command line switch. That makes good sense. 
Are you guys thinking of something like this: __fastbuiltins__ = True # optimize all subsequent defs in the module [Jack Diederich] > The downside of a pragma module or > keyword would be what people try to add to it later. Ideally, enabling the pragma would also trigger warnings when the module shadows a builtin. [M.-A. Lemburg] > BTW, why not have a new opcode for symbols in the > builtins and then only tweak the opcode implementation > instead of having the compiler generate different code ? Either way results in changing one opcode/oparg pair, so I don't see how having a new opcode helps. At some point, the name has to be looked-up and a reference to it stored. Afterwards, LOAD_CONST is all that is needed to fetch the reference. Raymond Hettinger From nas@python.ca Fri Mar 28 02:26:49 2003 From: nas@python.ca (Neil Schemenauer) Date: Thu, 27 Mar 2003 18:26:49 -0800 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <3E837A32.6010400@lemburg.com> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <3E837A32.6010400@lemburg.com> Message-ID: <20030328022649.GA30139@glacier.arctrix.com> M.-A. Lemburg wrote: > Using the -O for this is not a working possibility. -OO > is reserved for optimizations which can change semantics, > but even there, I'd rather like a per-module switch than > a command line switch. Optimization options that globally change semantics seem like a bad idea. How would you know some module you are using will not break? I agree with Mark that a per-module switch would be better. In this case, I'm not sure either option is necessary. If I understand Guido correctly, eventually programs may not be allowed to stick names into other modules that override builtins used by that module. If that is disallowed then the compiler knows if a name is a builtin or a global. We could introduce a warning for code that breaks the new rules and have a __future__ statement that implements the optimization. 
Neil From greg@cosc.canterbury.ac.nz Fri Mar 28 02:48:37 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 28 Mar 2003 14:48:37 +1200 (NZST) Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <20030328022649.GA30139@glacier.arctrix.com> Message-ID: <200303280248.h2S2mb621846@oma.cosc.canterbury.ac.nz> Neil Schemenauer : > Optimization options that globally change semantics seem like a bad > idea. How would you know some module you are using will not break? I > agree with Mark that a per-module switch would be better. There's something a bit strange about this situation, though. The compiler knows whether a module shadows any of its *own* builtins, and can avoid applying the optimisation to those names. So the optimisation doesn't change the semantics of the module itself, provided some conditions are met. But those conditions depend on things *outside* the module -- namely, whether any *other* module assigns to one of this module's globals so as to shadow a builtin. This makes me think that having a flag inside the module is not the right thing to do, or at least it's not the only thing that's needed. There needs to be a way to turn the optimisation *off* from outside the affected module. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From jack@performancedrivers.com Fri Mar 28 03:21:10 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Thu, 27 Mar 2003 22:21:10 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <003601c2f4c1$ea28c0c0$df10a044@oemcomputer>; from python@rcn.com on Thu, Mar 27, 2003 at 07:35:20PM -0500 References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <3E837A32.6010400@lemburg.com> <003601c2f4c1$ea28c0c0$df10a044@oemcomputer> Message-ID: <20030327222109.B1450@localhost.localdomain> On Thu, Mar 27, 2003 at 07:35:20PM -0500, Raymond Hettinger wrote: [Raymond proposed this pythonic version of a pragma] > __fastbuiltins__ = True # optimize all subsequent defs in the module > > [Jack Diederich] > > The downside of a pragma module or > > keyword would be what people try to add to it later. > > Ideally, enabling the pragma would also trigger warnings when the > module shadows a builtin. I was thinking that a first-class keyword like pragma no_shadow_builtins would give people a hook to suggest all kinds of nastiness in the future. In your pythonness your suggestion avoided this entirely. We are talking about a very per-module thing, and author's intent. As a progression, how about a subclass of dictionaries that implement warn-on-assign and error-on-assign properties. Having a subclass of dicts specifically for symbol tables has been suggested before and has a wide variety of benefits[1]. This is a good example. -jackdied [1] benefits of a specific 'symtab' type that derives from dict (all variations on 'one stop shop for optimizations') * string-only * assign-once (builtins) * cached lookups From guido@python.org Fri Mar 28 03:23:22 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 27 Mar 2003 22:23:22 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Thu, 27 Mar 2003 16:37:59 EST." 
<000601c2f4ab$32aed9e0$9a0fa044@oemcomputer>
References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer>
Message-ID: <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net>

Hi Raymond. Too bad you couldn't make it to the conference! We're
all having a great time on and off the GWU premises. I used your
"more zen" on a slide in my keynote.

> From past rumblings, I gather that Python is moving
> towards preventing __builtins__ from being shadowed.

You must be misunderstanding. The only thing I want to forbid is to
stick a name in *another* module's globals that would shadow a
builtin. E.g. suppose module A contains:

    def f(a):
        return len(a)

and module B contains:

    import A
    A.len = lambda a: len(a) or 1 # evil len()

The assignment to A.len would be forbidden. OTOH this:

    import random
    if random.random() >= 0.5:
        len = 42
    def f():
        return len

will always be allowed and mean what it currently means.

The difference is that in the first module, analysis of module A does
not reveal that len is shadowed; OTOH in the second example, analyzing
just the module's code shows that len may be a global built-in. This
is important because a programmer shouldn't have to know the names of
built-in objects she doesn't use (also important because in a future
version of the language, a name you've picked for a global may become
a builtin).

The idea of forbidding module B in the first example is that the
optimizer is allowed to replace len(a) with a bytecode that calls
PyObject_Size() rather than looking up "len" in globals and builtins.
The optimizer should only be allowed to make this assumption if
careful analysis of an entire module doesn't reveal any possibility
that "len" can be shadowed. But it cannot be required to look at all
other modules (since those other modules may not even have been
written!).

Hope this helps.

BTW this idea is quite old; I've described it a few years ago under a
subject something like "low-hanging fruit".
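The rule in the module A / module B example can be approximated in pure Python with a module subclass that warns on such assignments. This is a hypothetical sketch for current Python 3, not anything from the interpreter: the class name `ShadowWarningModule` is invented here, and it warns rather than forbids.

```python
import builtins
import types
import warnings


class ShadowWarningModule(types.ModuleType):
    """Hypothetical module type: warn when an attribute assignment from
    outside would newly shadow a builtin name."""

    def __setattr__(self, name, value):
        if name not in self.__dict__ and hasattr(builtins, name):
            warnings.warn(
                "%s.%s shadows a builtin" % (self.__name__, name),
                RuntimeWarning,
                stacklevel=2,
            )
        super().__setattr__(name, value)


# Stand-in for "module A". Its own code runs directly against the module
# dict, so assignments made by the module itself never reach __setattr__.
A = ShadowWarningModule("A")
exec("def f(a):\n    return len(a)\n", A.__dict__)

# Stand-in for what "module B" does from the outside.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    A.helper = 42            # ordinary new global: no warning
    A.len = lambda a: 1      # newly shadows the builtin len: warning

messages = [str(w.message) for w in caught]
```

Because the sketch only warns, `A.f` picks up the replacement `len` afterwards; a stricter variant could raise an exception instead.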
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Fri Mar 28 03:50:31 2003 From: nas@python.ca (Neil Schemenauer) Date: Thu, 27 Mar 2003 19:50:31 -0800 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030328035031.GB30245@glacier.arctrix.com> Guido van Rossum wrote: > BTW this idea is quite old; I've described it a few years ago under a > subject something like "low-hanging fruit". I really like this idea. If a patch appeared on SF soon, do you think 2.3 could include a warning for code that violates the rule? If so, how about also including a flag to allowed optimizations based on the rule? For example, I think we could have the equivalent of LOAD_FAST for builtin names. Implementing the optimizations could be a bit of work, especially with the existing compiler, but I think the warning should be fairly easy. Neil From guido@python.org Fri Mar 28 04:15:41 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 27 Mar 2003 23:15:41 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Thu, 27 Mar 2003 19:50:31 PST." <20030328035031.GB30245@glacier.arctrix.com> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> Message-ID: <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> > Guido van Rossum wrote: > > BTW this idea is quite old; I've described it a few years ago under a > > subject something like "low-hanging fruit". > > I really like this idea. If a patch appeared on SF soon, do you think > 2.3 could include a warning for code that violates the rule? Maybe. 
Though you probably would only want to warn when this is done to a .py module -- C extensions should be exempt. And the warning should only warn about inserting names that are actually builtins. > If so, how about also including a flag to allowed optimizations based on > the rule? For example, I think we could have the equivalent of > LOAD_FAST for builtin names. Implementing the optimizations could be a > bit of work, especially with the existing compiler, but I think the > warning should be fairly easy. Sure, let's experiment! --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Fri Mar 28 04:45:31 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 27 Mar 2003 23:45:31 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1048826730.23083.90.camel@localhost.localdomain> On Thu, 2003-03-27 at 23:15, Guido van Rossum wrote: > > I really like this idea. If a patch appeared on SF soon, do you think > > 2.3 could include a warning for code that violates the rule? > > Maybe. Though you probably would only want to warn when this is done > to a .py module -- C extensions should be exempt. And the warning > should only warn about inserting names that are actually builtins. It seems like C extensions pose thorny problems that need to be solved. In particular, the C API says that module's have a dictionary and that adding a key creates global variable in the module. We'll have to break this one way or another, because we don't want to allow C extensions to add globals that shadow builtins. Right? 
There's a similar problem for Python code, but I imagine it's easy to come up with a dict proxy with the necessary restrictions along the lines of a new-style class dict proxy. How do we break the C API? There's lots of extension code that relies on getting the dict. My first guess is to add an exception that says setting a name that shadows a builtin has no effect. Then extend the getattr code and the module-dict-proxy to ignore those names. Jeremy From guido@python.org Fri Mar 28 04:49:16 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 27 Mar 2003 23:49:16 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of 27 Mar 2003 23:45:31 EST." <1048826730.23083.90.camel@localhost.localdomain> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> Message-ID: <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> > It seems like C extensions pose thorny problems that need to be solved. > In particular, the C API says that module's have a dictionary and that > adding a key creates global variable in the module. We'll have to break > this one way or another, because we don't want to allow C extensions to > add globals that shadow builtins. Right? I don't see the problem. Typically, C extension modules don't have Python code that runs in their globals, so messing with a C extension's globals from the outside has no bad effect on Python code. The problem is more that once a module is loaded, you can't tell from the module whether it was loaded from a .py module or a C extension. > There's a similar problem for Python code, but I imagine it's easy to > come up with a dict proxy with the necessary restrictions along the > lines of a new-style class dict proxy. 
I'd be happy to proclaim that doing something like import X d = X.__dict__ d["spam"] = 42 # or exec "spam = 42" in d is always prohibited. > How do we break the C API? There's lots of extension code that relies > on getting the dict. My first guess is to add an exception that says > setting a name that shadows a builtin has no effect. Then extend the > getattr code and the module-dict-proxy to ignore those names. The C code can continue to access the real dict. This is what happens for new-style classes: in Python, C.__dict__ is a read-only proxy, but in C, C->tp_dict is a real dict. Then the setattr operation can do as it pleases. For new-style classes, it doesn't forbid anything but updates the type struct when an operator was modified; for modules, it could issue a warning when a name is set that didn't exist before and that shadows a built-in. (Ideally, it should only warn about built-ins that are actually used by the module's code, but that requires the parser to make the list of such built-ins available somehow.) Anyway, the C code that accesses the dict usually lives in the extension module's init function. Frankly, I'm a bit confused by your post. Maybe I don't understand what you're proposing? --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Fri Mar 28 05:51:13 2003 From: nas@python.ca (Neil Schemenauer) Date: Thu, 27 Mar 2003 21:51:13 -0800 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030328055113.GA30405@glacier.arctrix.com> Guido van Rossum wrote: > Though you probably would only want to warn when this is done to a .py > module -- C extensions should be exempt. 
Exempt from poking or being poked? > And the warning should only warn about inserting names that are > actually builtins. I have rough patch. The idea is to have the tp_setattro slot of modules check if the name being set is a builtin. It seems to work but perhaps there are cases that make that approach invalid. Time for bed now. :-) Neil From mal@lemburg.com Fri Mar 28 08:34:54 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 28 Mar 2003 09:34:54 +0100 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <003601c2f4c1$ea28c0c0$df10a044@oemcomputer> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <3E837A32.6010400@lemburg.com> <003601c2f4c1$ea28c0c0$df10a044@oemcomputer> Message-ID: <3E84092E.1080803@lemburg.com> Raymond Hettinger wrote: > [per module switch] > That makes good sense. > Are you guys thinking of something like this: > > __fastbuiltins__ = True # optimize all subsequent defs in the module > > [M.-A. Lemburg] > >>BTW, why not have a new opcode for symbols in the >>builtins and then only tweak the opcode implementation >>instead of having the compiler generate different code ? > > Either way results in changing one opcode/oparg pair, so I > don't see how having a new opcode helps. At some point, > the name has to be looked-up and a reference to it stored. > Afterwards, LOAD_CONST is all that is needed to fetch > the reference. Right, but with the new opcode you could have the interpreter decide whether to optimize or not without recompiling the code. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 28 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 4 days left EuroPython 2003, Charleroi, Belgium: 88 days left From mal@lemburg.com Fri Mar 28 08:38:28 2003 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 28 Mar 2003 09:38:28 +0100 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E840A04.6070700@lemburg.com> Guido van Rossum wrote: > I'd be happy to proclaim that doing something like > > import X > d = X.__dict__ > d["spam"] = 42 # or exec "spam = 42" in d > > is always prohibited. That would break lazy module imports such as the one I'm using in mx.Misc.LazyModule.py. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 28 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 4 days left EuroPython 2003, Charleroi, Belgium: 88 days left From aleax@aleax.it Fri Mar 28 09:31:30 2003 From: aleax@aleax.it (Alex Martelli) Date: Fri, 28 Mar 2003 10:31:30 +0100 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303281031.30093.aleax@aleax.it> On Friday 28 March 2003 05:49 am, Guido van Rossum wrote: ... > I don't see the problem. Typically, C extension modules don't have > Python code that runs in their globals, so messing with a C > extension's globals from the outside has no bad effect on Python code. 
It happens, though -- for code whose performance is not important, e.g. initialization and "resetting" kind of stuff, a PyRun_String can be SO much more concise and handier than meticulous expansion of basically the same things into tens of lines of C code... since "messing from the outside" happens after initialization, and the use cases I can easily find are all specifically DURING initialization, it may be that this problem is too rare to worry about, but, I'm not so sure. Alex From oren-py-d@hishome.net Fri Mar 28 09:44:42 2003 From: oren-py-d@hishome.net (Oren Tirosh) Date: Fri, 28 Mar 2003 04:44:42 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <20030328055113.GA30405@glacier.arctrix.com> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <20030328055113.GA30405@glacier.arctrix.com> Message-ID: <20030328094441.GA10818@hishome.net> On Thu, Mar 27, 2003 at 09:51:13PM -0800, Neil Schemenauer wrote: > Guido van Rossum wrote: > > Though you probably would only want to warn when this is done to a .py > > module -- C extensions should be exempt. > > Exempt from poking or being poked? > > > And the warning should only warn about inserting names that are > > actually builtins. > > I have rough patch. The idea is to have the tp_setattro slot of modules > check if the name being set is a builtin. Does it check if it's one of the standard __builtin__ module or whether it is an attribute of whatever object is currently set as the module's __builtins__ attribute? 
Oren

From python@rcn.com Fri Mar 28 10:58:44 2003
From: python@rcn.com (Raymond Hettinger)
Date: Fri, 28 Mar 2003 05:58:44 -0500
Subject: [Python-Dev] Fast access to __builtins__
References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer>
 <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <005d01c2f519$00acb020$aa11a044@oemcomputer>

[GvR]
> Hi Raymond. Too bad you couldn't make it to the conference! We're
> all having a great time on and off the GWU premises.

Glad you guys are having a great time.
I wish I could be there.

> I used your "more zen" on a slide in my keynote.

Cool. Any chance of getting your keynote slides on the net?

> > From past rumblings, I gather that Python is moving
> > towards preventing __builtins__ from being shadowed.
>
> You must be misunderstanding.
>
> The only thing I want to forbid is to stick a name in *another*
> module's globals that would shadow a builtin.

Yes, that *is* different.
Allowing shadows means having to watch out for trees.

> The idea of forbidding module B in the first example is that the
> optimizer is allowed to replace len(a) with a bytecode that calls
> PyObject_Size() rather than looking up "len" in globals and builtins.
> The optimizer should only be allowed to make this assumption if
> careful analysis of an entire module doesn't reveal any possibility
> that "len" can be shadowed
. . .
> BTW this idea is quite old; I've described it a few years ago under a
> subject something like "low-hanging fruit".

The fruit is a bit high. Doing a full module analysis means
deferring the optimization for a second pass after all the code
has already been generated. It's doable, but much harder.

    def f(x):
        return len(x) + 10      # knowing whether to optimize this

    def g():
        global len              # when this is allowed
        len = lambda x: 5       # is a bear

The task is much simpler if it can be known in advance that
the substitution is allowed (i.e. a module level switch like:
__fastbuiltins__ = True).
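
A minimal sketch of the hazard in the f/g example above, written for a
current Python 3 interpreter rather than the 2.3-era one under discussion
(that choice, and the variable names before/after/restored, are
assumptions made for illustration). It shows why an optimizer cannot
hard-wire len() inside f() without first checking the whole module: a
later "global len" assignment changes what f() returns.

```python
def f(x):
    # `len` is resolved at call time: module globals first, then builtins.
    return len(x) + 10

def g():
    # Rebinding `len` as a module global shadows the builtin for every
    # function defined in this module, including f.
    global len
    len = lambda x: 5

before = f("abc")        # builtin len: 3 + 10
g()
after = f("abc")         # shadowing global: 5 + 10

# Remove the shadowing global so the builtin becomes visible again.
del globals()["len"]
restored = f("abc")

print(before, after, restored)
```

If f() had been compiled with a direct call to the builtin, the second
call would not see the replacement, which is the behaviour change the
whole-module analysis is meant to rule out.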
Raymond Hettinger From jeremy@alum.mit.edu Fri Mar 28 12:29:54 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 28 Mar 2003 07:29:54 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1048854593.23083.97.camel@localhost.localdomain> On Thu, 2003-03-27 at 23:49, Guido van Rossum wrote: > Frankly, I'm a bit confused by your post. Maybe I don't understand > what you're proposing? Modules are modules, right? That is, pickle.py and cPickle.so are both represented as module objects at runtime. A C extension can call PyModule_GetDict() on any module. If so, then any extension module can add names to the __dict__ of any Python module. The problem is that modules expose their representation at the C API level (namespace implemented as PyDictObject), so it's difficult to forbid things at the C level. Jeremy From guido@python.org Fri Mar 28 12:23:44 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Mar 2003 07:23:44 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Thu, 27 Mar 2003 21:51:13 PST." 
 <20030328055113.GA30405@glacier.arctrix.com>
References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer>
 <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net>
 <20030328035031.GB30245@glacier.arctrix.com>
 <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net>
 <20030328055113.GA30405@glacier.arctrix.com>
Message-ID: <200303281223.h2SCNiU20028@pcp02138704pcs.reston01.va.comcast.net>

> Guido van Rossum wrote:
> > Though you probably would only want to warn when this is done to a .py
> > module -- C extensions should be exempt.
>
> Exempt from poking or being poked?

From being poked. Poking from C code can't really be prevented, but
isn't a problem.

> > And the warning should only warn about inserting names that are
> > actually builtins.
>
> I have rough patch. The idea is to have the tp_setattro slot of modules
> check if the name being set is a builtin. It seems to work but perhaps
> there are cases that make that approach invalid. Time for bed now. :-)

SF?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Mar 28 12:26:54 2003
From: guido@python.org (Guido van Rossum)
Date: Fri, 28 Mar 2003 07:26:54 -0500
Subject: [Python-Dev] Fast access to __builtins__
In-Reply-To: "Your message of Fri, 28 Mar 2003 09:38:28 +0100."
 <3E840A04.6070700@lemburg.com>
References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer>
 <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net>
 <20030328035031.GB30245@glacier.arctrix.com>
 <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net>
 <1048826730.23083.90.camel@localhost.localdomain>
 <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net>
 <3E840A04.6070700@lemburg.com>
Message-ID: <200303281226.h2SCQsc20058@pcp02138704pcs.reston01.va.comcast.net>

> > I'd be happy to proclaim that doing something like
> >
> >     import X
> >     d = X.__dict__
> >     d["spam"] = 42    # or exec "spam = 42" in d
> >
> > is always prohibited.
> > That would break lazy module imports such as the one I'm using > in mx.Misc.LazyModule.py. But you could rewrite LazyModule.py to use setattr(X, "spam", 42), right? I don't think it's worth it to have a dict proxy that allows certain keys to be set but not others. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 28 12:30:16 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Mar 2003 07:30:16 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Fri, 28 Mar 2003 10:31:30 +0100." <200303281031.30093.aleax@aleax.it> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> <200303281031.30093.aleax@aleax.it> Message-ID: <200303281230.h2SCUHh20080@pcp02138704pcs.reston01.va.comcast.net> > > I don't see the problem. Typically, C extension modules don't have > > Python code that runs in their globals, so messing with a C > > extension's globals from the outside has no bad effect on Python code. > > It happens, though -- for code whose performance is not important, > e.g. initialization and "resetting" kind of stuff, a PyRun_String can be > SO much more concise and handier than meticulous expansion of > basically the same things into tens of lines of C code... since > "messing from the outside" happens after initialization, and the use > cases I can easily find are all specifically DURING initialization, it may > be that this problem is too rare to worry about, but, I'm not so sure. I think this use case won't have a problem. The C code has access to the real dict, so PyRun_String() never knows that it's poking into a module's globals. Also this is done during module initialization. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Mar 28 12:28:25 2003 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 28 Mar 2003 13:28:25 +0100 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303281226.h2SCQsc20058@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> <3E840A04.6070700@lemburg.com> <200303281226.h2SCQsc20058@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E843FE9.6030400@lemburg.com> Guido van Rossum wrote: >>>I'd be happy to proclaim that doing something like >>> >>> import X >>> d = X.__dict__ >>> d["spam"] = 42 # or exec "spam = 42" in d >>> >>>is always prohibited. >> >>That would break lazy module imports such as the one I'm using >>in mx.Misc.LazyModule.py. > > But you could rewrite LazyModule.py to use setattr(X, "spam", 42), right? Sure. > I don't think it's worth it to have a dict proxy that allows certain > keys to be set but not others. The question is: why make this complicated ? If the programmer enables __fast_builtins__ (or similar) in the module scope, she should be aware that tweaking the module globals from the outside won't have the desired effect. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 28 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 4 days left EuroPython 2003, Charleroi, Belgium: 88 days left From guido@python.org Fri Mar 28 12:33:19 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Mar 2003 07:33:19 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Fri, 28 Mar 2003 04:44:42 EST." <20030328094441.GA10818@hishome.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <20030328055113.GA30405@glacier.arctrix.com> <20030328094441.GA10818@hishome.net> Message-ID: <200303281233.h2SCXJO20101@pcp02138704pcs.reston01.va.comcast.net> > Does it check if it's one of the standard __builtin__ module or > whether it is an attribute of whatever object is currently set as > the module's __builtins__ attribute? Only standard builtins need to be exempt, because the compiler isn't going to optimize non-standard builtins. That's because (a) there won't be special opcodes that implement those builtins directly, and (b) the bytecode compiler doesn't know the contents of __builtins__ so it can't possibly know about nonstandard builtins anyway to generate a LOAD_BUILTIN opcode. BTW, I expect that nonstandard builtins will be ruled out in some future version of the language, or will have to be declared differently. They are too confusing for the human reader of the code. 
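
A rough pure-Python sketch of the check discussed in this thread: warn
when an attribute set on a module introduces a new name that shadows a
standard builtin. The patch mentioned earlier works in the C-level
tp_setattro slot; the WarningModule class, the STANDARD_BUILTINS set and
the warning text here are hypothetical stand-ins written for Python 3,
not the actual patch.

```python
import builtins
import types
import warnings

# Names of the standard builtins, captured when this sketch is loaded.
STANDARD_BUILTINS = frozenset(dir(builtins))

class WarningModule(types.ModuleType):
    """Hypothetical module type that warns when a builtin is shadowed."""

    def __setattr__(self, name, value):
        # Warn only for a name that is new to the module and that
        # would shadow a standard builtin.
        if name in STANDARD_BUILTINS and name not in self.__dict__:
            warnings.warn(
                "setting %r shadows a builtin" % name,
                RuntimeWarning,
                stacklevel=2,
            )
        super().__setattr__(name, value)

mod = WarningModule("demo")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mod.spam = 42                  # not a builtin name: no warning
    mod.len = lambda x: 5          # shadows builtins.len: warning
    mod.__dict__["open"] = None    # direct dict poke bypasses __setattr__

for w in caught:
    print(w.category.__name__, w.message)
```

Only the setattr path is covered. Writing through the module's __dict__
goes unnoticed, which is why a dict proxy keeps coming up in the
discussion.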
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mcherm@mcherm.com Fri Mar 28 12:34:53 2003
From: mcherm@mcherm.com (Michael Chermside)
Date: Fri, 28 Mar 2003 04:34:53 -0800
Subject: [Python-Dev] Re: Fast access to __builtins__
Message-ID: <1048854893.3e84416d25541@mcherm.com>

Raymond writes:
> I've already tried out a pure python proof-of-concept

Does that mean that you can give us some idea what kind of
performance boost this actually resulted in?

-- Michael Chermside

From guido@python.org Fri Mar 28 12:39:04 2003
From: guido@python.org (Guido van Rossum)
Date: Fri, 28 Mar 2003 07:39:04 -0500
Subject: [Python-Dev] Fast access to __builtins__
In-Reply-To: "Your message of Fri, 28 Mar 2003 05:58:44 EST."
 <005d01c2f519$00acb020$aa11a044@oemcomputer>
References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer>
 <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net>
 <005d01c2f519$00acb020$aa11a044@oemcomputer>
Message-ID: <200303281239.h2SCd4K20135@pcp02138704pcs.reston01.va.comcast.net>

> Cool. Any chance of getting your keynote slides on the net?

Yes, after the conference.

> > > From past rumblings, I gather that Python is moving
> > > towards preventing __builtins__ from being shadowed.
> >
> > You must be misunderstanding.
> >
> > The only thing I want to forbid is to stick a name in *another*
> > module's globals that would shadow a builtin.
>
> Yes, that *is* different.
> Allowing shadows means having to watch out for trees.

Being poetic?

> > The idea of forbidding module B in the first example is that the
> > optimizer is allowed to replace len(a) with a bytecode that calls
> > PyObject_Size() rather than looking up "len" in globals and builtins.
> > The optimizer should only be allowed to make this assumption if
> > careful analysis of an entire module doesn't reveal any possibility
> > that "len" can be shadowed
> . . .
> > BTW this idea is quite old; I've described it a few years ago under a > > subject something like "low-hanging fruit". > > The fruit is a bit high. Doing a full module analysis means > deferring the optimization for a second pass after all the code > has already been generated. It's doable, but much harder. You're stuck in a one-pass compiler mindset. We build a parse tree for the entire module before we start generating bytecode. We already have tools to do namespace analysis for the entire tree (Jeremy added these to implement nested scopes). > def f(x): > return len(x) + 10 # knowing whether to optimize this > > def g(): > global len # when this is allowed > len = lambda x: 5 # is a bear > > The task is much simpler if it can be known in advance that > the substitution is allowed (i.e. a module level switch like: > __fastbuiltins__ = True). -1000. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 28 12:45:03 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Mar 2003 07:45:03 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of 28 Mar 2003 07:29:54 EST." <1048854593.23083.97.camel@localhost.localdomain> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> <1048854593.23083.97.camel@localhost.localdomain> Message-ID: <200303281245.h2SCj3n20179@pcp02138704pcs.reston01.va.comcast.net> > > Frankly, I'm a bit confused by your post. Maybe I don't understand > > what you're proposing? > > Modules are modules, right? That is, pickle.py and cPickle.so are both > represented as module objects at runtime. A C extension can call > PyModule_GetDict() on any module. 
If so, then any extension module can > add names to the __dict__ of any Python module. The problem is that > modules expose their representation at the C API level (namespace > implemented as PyDictObject), so it's difficult to forbid things at the > C level. Oh sure. I don't think it's necessary to forbid things at the C API level in the sense of making it impossible to do. We'll just document that C code shouldn't do that. There's plenty that C code could do but shouldn't because it breaks the world. I don't expect there will be much C in violation of this prohibition. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 28 12:48:42 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Mar 2003 07:48:42 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Fri, 28 Mar 2003 13:28:25 +0100." <3E843FE9.6030400@lemburg.com> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> <3E840A04.6070700@lemburg.com> <200303281226.h2SCQsc20058@pcp02138704pcs.reston01.va.comcast.net> <3E843FE9.6030400@lemburg.com> Message-ID: <200303281248.h2SCmgP20200@pcp02138704pcs.reston01.va.comcast.net> > The question is: why make this complicated ? > > If the programmer enables __fast_builtins__ (or similar) in the > module scope, she should be aware that tweaking the module globals > from the outside won't have the desired effect. I don't want programmers to have to add all sorts of magical incantations to their top to guide the optimizer. Today it's __fast_builtins__, tomorrow it's a promise that a class won't be poked. 
Poking a module from the outside is frequent enough, but poking names that shadow builtins is extremely rare. So almost all modules would need __fast_builtins__, because it would almost always help. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Fri Mar 28 12:46:37 2003 From: barry@python.org (Barry Warsaw) Date: Fri, 28 Mar 2003 07:46:37 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303281233.h2SCXJO20101@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <50AD9BA1-611B-11D7-98FE-003065EEFAC8@python.org> On Friday, March 28, 2003, at 07:33 AM, Guido van Rossum wrote: > BTW, I expect that nonstandard builtins will be ruled out in some > future version of the language, or will have to be declared > differently. They are too confusing for the human reader of the code. When you say "nonstandard builtins", do you mean nonstandard names or nonstandard values, or both? E.g. assigning gettext.ugettext() to builtin _() or setting open() to some debugging func. I wouldn't want to completely disallow these, but I'd be happy if you had to do something special and/or (more) explicit to make them work. -Barry From skip@pobox.com Fri Mar 28 13:30:08 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 28 Mar 2003 07:30:08 -0600 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <50AD9BA1-611B-11D7-98FE-003065EEFAC8@python.org> References: <200303281233.h2SCXJO20101@pcp02138704pcs.reston01.va.comcast.net> <50AD9BA1-611B-11D7-98FE-003065EEFAC8@python.org> Message-ID: <16004.20064.67576.926961@montanaro.dyndns.org> Barry> ... or setting open() to some debugging func. Barry> I wouldn't want to completely disallow these, but I'd be happy if Barry> you had to do something special and/or (more) explicit to make Barry> them work. Like a compiler flag to disable the run-time optimization so your debugging open() would be seen everywhere? 
Sort of like Guido's observation about __fastbuiltins__ = True, the frequent case (regular, optimized version of open()) should be the default, while the exception requires programmer or user action. Skip From nas@python.ca Fri Mar 28 13:47:34 2003 From: nas@python.ca (Neil Schemenauer) Date: Fri, 28 Mar 2003 05:47:34 -0800 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303281245.h2SCj3n20179@pcp02138704pcs.reston01.va.comcast.net> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> <1048854593.23083.97.camel@localhost.localdomain> <200303281245.h2SCj3n20179@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030328134734.GA30759@glacier.arctrix.com> Guido van Rossum wrote: > I don't think it's necessary to forbid things at the C API level in > the sense of making it impossible to do. We'll just document > that C code shouldn't do that. What about Python code that modifies that module __dict__ directly? For example, using vars() or globals() to get a reference to it and doing __setitem__ on it. My warning code only catches assignments that go through the module tp_setattro slot. I suppose warning about direct __dict__ poking would require a proxy object to wrap the module dict. Neil From guido@python.org Fri Mar 28 14:26:11 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Mar 2003 09:26:11 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Fri, 28 Mar 2003 07:46:37 EST." 
<50AD9BA1-611B-11D7-98FE-003065EEFAC8@python.org> References: <50AD9BA1-611B-11D7-98FE-003065EEFAC8@python.org> Message-ID: <200303281426.h2SEQB720449@pcp02138704pcs.reston01.va.comcast.net> > > BTW, I expect that nonstandard builtins will be ruled out in some > > future version of the language, or will have to be declared > > differently. They are too confusing for the human reader of the code. > > When you say "nonstandard builtins", do you mean nonstandard names > or nonstandard values, or both? E.g. assigning gettext.ugettext() > to builtin _() or setting open() to some debugging func. Nonstandard names. The compiler can't know what's in __builtin__, but it can know the names of the official built-ins. > I wouldn't want to completely disallow these, but I'd be happy if > you had to do something special and/or (more) explicit to make them > work. "from __builtin__ import open" should do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 28 14:33:19 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Mar 2003 09:33:19 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: "Your message of Fri, 28 Mar 2003 05:47:34 PST." <20030328134734.GA30759@glacier.arctrix.com> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <20030328035031.GB30245@glacier.arctrix.com> <200303280415.h2S4Ff019207@pcp02138704pcs.reston01.va.comcast.net> <1048826730.23083.90.camel@localhost.localdomain> <200303280449.h2S4nGn19327@pcp02138704pcs.reston01.va.comcast.net> <1048854593.23083.97.camel@localhost.localdomain> <200303281245.h2SCj3n20179@pcp02138704pcs.reston01.va.comcast.net> <20030328134734.GA30759@glacier.arctrix.com> Message-ID: <200303281433.h2SEXJq20499@pcp02138704pcs.reston01.va.comcast.net> > > I don't think it's necessary to forbid things at the C API level in > > the sense of making it impossible to do. 
We'll just document > > that C code shouldn't do that. > > What about Python code that modifies that module __dict__ directly? For > example, using vars() or globals() to get a reference to it and doing > __setitem__ on it. My warning code only catches assignments that go > through the module tp_setattro slot. I suppose warning about direct > __dict__ poking would require a proxy object to wrap the module dict. Yeah, that's another niggling issue. It would be a shame if using globals() or vars() anywhere in a module would disable this optimization. But we can't make these return a proxy either, because they are frequently used with e.g. "exec ... in globals()". --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Fri Mar 28 15:19:15 2003 From: barry@python.org (Barry Warsaw) Date: 28 Mar 2003 10:19:15 -0500 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <16004.20064.67576.926961@montanaro.dyndns.org> References: <200303281233.h2SCXJO20101@pcp02138704pcs.reston01.va.comcast.net> <50AD9BA1-611B-11D7-98FE-003065EEFAC8@python.org> <16004.20064.67576.926961@montanaro.dyndns.org> Message-ID: <1048864755.1753.3.camel@geddy> On Fri, 2003-03-28 at 08:30, Skip Montanaro wrote: > Like a compiler flag to disable the run-time optimization so your debugging > open() would be seen everywhere? Sure, that would work. I'm still thinking about "from __builtins__ import open". Part of the issue there is that you might not be sure /which/ open is causing the problems. But I agree that this is not a common case; I don't even think it would be common programming practice (i.e. my use case is primarily debugging). -Barry P.S. 
I don't actually poke _() into builtins :)

From python@rcn.com Fri Mar 28 23:05:17 2003
From: python@rcn.com (Raymond Hettinger)
Date: Fri, 28 Mar 2003 18:05:17 -0500
Subject: [Python-Dev] Re: Fast access to __builtins__
References: <1048854893.3e84416d25541@mcherm.com>
Message-ID: <004b01c2f57e$7fb13700$bf11a044@oemcomputer>

[Raymond]
> > I've already tried out a pure python proof-of-concept

[Michael Chermside]
> Does that mean that you can give us some idea what kind of
> performance boost this actually resulted in?

It depends on what you're timing but it is not a big win.

* Speed doubles in demo code that just makes references to globals
  but is much more modest when the builtins are called. This shows
  that the call time is more significant than the reference time:

    def f(i):
        dict; hasattr; float; pow; list; range      # speed more than doubles
        hex(i); str(i); oct(i); int(i); float(i)    # 12% gain

* Contrived examples show the best gains while code from real apps
  show smaller improvements:

    def shuffle(random=random.random):              # 6% gain
        x = list('abcdefghijklmnopqrstuvwyz0123456789')
        for i in xrange(len(x)-1, 0, -1):
            j = int(random() * (i+1))
            x[i], x[j] = x[j], x[i]

* PyStone does not use any builtins.

* Scanning my own sources, it looks like some of the builtins almost
  never appear inside loops (dir, map, filter, zip, dict, range).
  The ones that are in loops usually do something simple (int, str,
  chr, len). Either way, builtin access never seems to dominate the
  running time. OTOH, maybe that's just the way I write code.
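
A small timing sketch in the spirit of the measurements above, assuming a
current Python 3 interpreter and the standard timeit module. The function
names and the measure() helper are made up for illustration, and binding
builtins as default arguments only stands in for the faster access being
discussed; it is not the proof-of-concept itself. No figures are quoted
because the ratios depend heavily on the interpreter version.

```python
import timeit

def lookup_only():
    # Each bare name is a globals lookup followed by a builtins lookup.
    len; str; int; float; list

def lookup_cached(len=len, str=str, int=int, float=float, list=list):
    # Default arguments turn the same names into fast local variables.
    len; str; int; float; list

def call_builtins(i=12345):
    # Here the cost of calling the builtins is paid on top of the lookups.
    str(i); int(i); float(i); hex(i); oct(i)

def call_cached(i=12345, str=str, int=int, float=float, hex=hex, oct=oct):
    str(i); int(i); float(i); hex(i); oct(i)

def measure(func, number=100000):
    # Best of three runs, to reduce noise from other processes.
    return min(timeit.repeat(func, number=number, repeat=3))

results = {}
for func in (lookup_only, lookup_cached, call_builtins, call_cached):
    results[func.__name__] = measure(func)

for name, seconds in results.items():
    print("%-14s %.4f s" % (name, seconds))
```

Comparing the two lookup_* rows with the two call_* rows gives a feel for
how much of the total time is name lookup and how much is the call.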
Raymond Hettinger From python@rcn.com Sat Mar 29 07:11:03 2003 From: python@rcn.com (Raymond Hettinger) Date: Sat, 29 Mar 2003 02:11:03 -0500 Subject: [Python-Dev] Fast access to __builtins__ References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <005d01c2f519$00acb020$aa11a044@oemcomputer> <200303281239.h2SCd4K20135@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <001e01c2f5c2$5e490720$990ca044@oemcomputer> > > The fruit is a bit high. Doing a full module analysis means > > deferring the optimization for a second pass after all the code > > has already been generated. It's doable, but much harder. > > You're stuck in a one-pass compiler mindset. We build a parse tree > for the entire module before we start generating bytecode. We already > have tools to do namespace analysis for the entire tree (Jeremy added > these to implement nested scopes). . . . > > The task is much simpler if it can be known in advance that > > the substitution is allowed (i.e. a module level switch like: > > __fastbuiltins__ = True). > > -1000. Having ruled out a module level switch, the -O flag, and the -OO flag, that leaves the namespace analysis of the entire tree or taking an approach that doesn't change the bytecode. Taking the second approach, I've loaded a small patch for caching lookups into the __builtins__ namespace: www.python.org/sf/711722 It's not as fast as using LOAD_CONST, but is safe in all but one extreme case: calling the function, having an intervening poke into the __builtins__ module, and then calling the function again. I put the cache lookup in the safest possible place. It can be made twice as fast by putting it before the func_globals() lookup. That works in all cases except: calling the function, having an intervening shadowing global assignment, and then calling the function again. This doesn't come-up anywhere in the test suite, my own apps, or apps I've downloaded. 
Note, regular shadowing (before the first function call) continues to work fine. The bad news is that I've made many timings and found only modest speed-ups in real code. It turns out that access time for builtins is less significant than the time to call and execute those builtins. But, every little bit helps. Raymond Hettinger From mal@lemburg.com Sat Mar 29 11:09:46 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 29 Mar 2003 12:09:46 +0100 Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <001e01c2f5c2$5e490720$990ca044@oemcomputer> References: <000601c2f4ab$32aed9e0$9a0fa044@oemcomputer> <200303280323.h2S3NMQ18471@pcp02138704pcs.reston01.va.comcast.net> <005d01c2f519$00acb020$aa11a044@oemcomputer> <200303281239.h2SCd4K20135@pcp02138704pcs.reston01.va.comcast.net> <001e01c2f5c2$5e490720$990ca044@oemcomputer> Message-ID: <3E857EFA.8010205@lemburg.com> Raymond Hettinger wrote: > The bad news is that I've made many timings and found only modest > speed-ups in real code. It turns out that access time for builtins is > less significant than the time to call and execute those builtins. > But, every little bit helps. Perhaps you ought to look into special casing calling builtins, e.g. by adding a byte code CALL_BUILTIN ?! Since the signatures of the builtins are known in advance, the calling overhead could be reduced, though I'm not sure how much more can be gained since the function call code was refactored. Another idea which might be worth looking into is that of speeding up parsing of C function call arguments, e.g. by caching the results or adding fast paths for common combinations. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 29 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 3 days left EuroPython 2003, Charleroi, Belgium: 87 days left From Raymond Hettinger" >>> def f(): None >>> dis(f) 2 0 LOAD_GLOBAL 0 (None) 3 POP_TOP 4 LOAD_CONST 0 (None) 7 RETURN_VALUE >>> None = 1 :1: SyntaxWarning: assignment to None >>> f() == None False Is this a bug? Should the compiler use the GLOBAL in both places? Or, is it reasonable to use CONST in both places? Raymond Hettinger From tjreedy@udel.edu Sat Mar 29 21:41:54 2003 From: tjreedy@udel.edu (Terry Reedy) Date: Sat, 29 Mar 2003 16:41:54 -0500 Subject: [Python-Dev] Re: Compiler treats None both as a constant and variable References: <002001c2f636$77ad2ce0$e60ca044@oemcomputer> Message-ID: "Raymond Hettinger" wrote in message news:002001c2f636$77ad2ce0$e60ca044@oemcomputer... > >>> def f(): > None > > >>> dis(f) > 2 0 LOAD_GLOBAL 0 (None) > 3 POP_TOP > 4 LOAD_CONST 0 (None) > 7 RETURN_VALUE > > >>> None = 1 > :1: SyntaxWarning: assignment to None > >>> f() == None > False > > > Is this a bug? If one understands (as I do) the default return 'None' to mean the singleton NoneType object that the name 'None' is assumed to be bound to in the docs and which it is bound to on startup, then no. > Should the compiler use the GLOBAL in both places? > Or, is it reasonable to use CONST in both places? I understood the latter to be the plan after a sufficient warning period. TJR From ping@zesty.ca Sun Mar 30 00:27:50 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Sat, 29 Mar 2003 18:27:50 -0600 (CST) Subject: [Python-Dev] Capabilities In-Reply-To: <200303102023.h2AKNAw23873@oma.cosc.canterbury.ac.nz> Message-ID: On Tue, 11 Mar 2003, Greg Ewing wrote: > Perhaps it would be useful to distinguish between what > might be called "read-only" introspection, and more > powerful forms of introspection. 
> > Usually it doesn't do any harm to be able to find out > things like what class an object belongs to and what > methods it supports, so perhaps these kinds of > introspections don't need to be restricted by default. A serious flaw with this particular point is that Python does not separate the identity of a class from the power to create instances of that class. Having access to a particular instance should certainly not allow one to ask it for its class, and then instantiate the class with arbitrary constructor arguments. -- ?!ng From ping@zesty.ca Sun Mar 30 00:31:18 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Sat, 29 Mar 2003 18:31:18 -0600 (CST) Subject: [Python-Dev] Capabilities In-Reply-To: <3E6CAF65.4040505@zope.com> Message-ID: On Mon, 10 Mar 2003, Jim Fulton wrote: > > Maybe every Python object should have a flag which > > can be set to prevent introspection -- like the current > > restricted execution mechanism, but on a per-object > > basis. Then any object could be used as a capability. > > Yes, but not a very useful one. For example, given a file, > you often want to create a "file read" capability which is > an object that allows reading the file but not writing the file. > Just preventing introspection isn't enough. All right. Let me provide an example; maybe this can help ground the discussion a bit. We seem to be doing a lot of dancing around the issue of what a capability is. In my view, it's a red herring to discuss whether or not a particular object "is a capability" or not. It's like asking whether something is an "object". Capabilities are a discipline under which objects are used -- it's better to think of them as a technique or a style of programming. What is at issue here (IMHO) is "how might Python change to facilitate this style of programming?" (The analogy with object-oriented programming holds here also. Even if Python didn't have a "class" keyword, you could still program in an object-oriented style. 
In fact, the C implementation of Python is clearly object-oriented, even though C has no features specifically designed for OOP. But adding "class" made it a lot easier to do a particular style of object-oriented programming in Python. Unfortunately, the particular style encouraged by Python's "class" keyword doesn't work so well for capability-style programming, because all instance state is public. But Python's "class" is not the only way to do object-oriented programming -- see below.)

Okay, at last to the example, then. Here is one way to program in a capability style using today's Python, relying on no changes to the interpreter. This example defines a "class" called DirectoryReader that provides read-only access to only a particular subtree of the filesystem.

    import os

    class Namespace:
        def __init__(self, *args, **kw):
            for value in args:
                self.__dict__[value.__name__] = value
            for name, value in kw.items():
                self.__dict__[name] = value

    class ReadOnly(Namespace):
        def __setattr__(self, name, value):
            raise TypeError('read-only namespace')

    def FileReader(path, name):
        self = Namespace(file=open(path, 'r'))
        def __repr__():
            return '<FileReader %s>' % name
        def reset():
            self.file.seek(0)
        return ReadOnly(__repr__, reset, self.file.read, self.file.close)

    def DirectoryReader(path, name):
        def __repr__():
            return '<DirectoryReader %s>' % name
        def list():
            return os.listdir(path)
        def readfile(name):
            fullpath = os.path.join(path, name)
            if os.path.isfile(fullpath):
                return FileReader(fullpath, name)
        def getdir(name):
            fullpath = os.path.join(path, name)
            if os.path.isdir(fullpath):
                return DirectoryReader(fullpath, name)
        return ReadOnly(__repr__, list, readfile, getdir)

Now, if we pass an instance of DirectoryReader to code running in restricted mode, i think this is actually secure. Specifically, the only introspective attributes we have to disallow, in order for these objects to enforce their intended restrictions, are im_self and func_globals.
Of course, we still have to hide __import__ and sys.modules if we want to prevent code from obtaining access to the filesystem in other ways. Hiding __dict__, while it has no impact on restricting filesystem access, allows us to pass the same DirectoryReader object to two clients without inadvertently creating a communication channel between them. -- ?!ng From tim_one@email.msn.com Sun Mar 30 02:24:29 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 29 Mar 2003 21:24:29 -0500 Subject: [Python-Dev] Compiler treats None both as a constant and variable In-Reply-To: <002001c2f636$77ad2ce0$e60ca044@oemcomputer> Message-ID: [Raymond Hettinger] > >>> def f(): > None > > >>> dis(f) > 2 0 LOAD_GLOBAL 0 (None) > 3 POP_TOP > 4 LOAD_CONST 0 (None) > 7 RETURN_VALUE > > >>> None = 1 > :1: SyntaxWarning: assignment to None > >>> f() == None > False > > > Is this a bug? It's arguable, but it's always been this way, and is so boring that nobody has bothered to argue about it before . It's clear that explicit references to names must follow "the usual" name resolution rules, so you can't gripe about the LOAD_GLOBAL in this function: None is the name of a builtin, and in the language as currently defined, builtin names can be shadowed by globals or locals. What's arguable is the LOAD_CONST, which is generated for the implicit reference to None. The meaning of the Ref Man's A call always returns some value, possibly None, unless it raises an exception. How this value is computed depends on the type of the callable object. is arguably arguable, but I don't think *reasonably* so. To me it clearly intends "the" None, not whatever object you get by evaluating name "None" inside the callable. Likewise when it says the default value of object.__doc__ is None, I think it also clearly means "the" None. > Should the compiler use the GLOBAL in both places? For backward compatibility it has to retain the LOAD_CONST. 
In this specific example, it doesn't matter whether the first is LOAD_CONST, or LOAD_GLOBAL, or simply thrown away, since the LOAD_GLOBAL 0 POP_TOP pair has no visible effect. If it were a more interesting function, like def f(): global aglobal aglobal = None then for backward compatibility it would have to remain LOAD_GLOBAL. > Or, is it reasonable to use CONST in both places? Not today, but we should be moving in that direction. IMO, None should become a keyword. From skip@mojam.com Sun Mar 30 13:00:29 2003 From: skip@mojam.com (Skip Montanaro) Date: Sun, 30 Mar 2003 07:00:29 -0600 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200303301300.h2UD0TN06135@manatee.mojam.com> Bug/Patch Summary ----------------- 379 open / 3499 total bugs (+9) 138 open / 2050 total patches (+7) New Bugs -------- Lineno calculation sometimes broken (2003-03-24) http://python.org/sf/708901 socket timeouts produce wrong errors in win32 (2003-03-24) http://python.org/sf/708927 sgmllib.SGMLParser.reset() problem (2003-03-25) http://python.org/sf/709491 IDE stdin doesn't have readlines (2003-03-26) http://python.org/sf/710373 Raise IDE output window over splash screen on early crash (2003-03-26) http://python.org/sf/710374 math.log(0) differs from math.log(0L) (2003-03-27) http://python.org/sf/711019 A large block of commands after an "if" cannot be compiled (2003-03-28) http://python.org/sf/711268 htmllib.HTMLParser.anchorlist problem (2003-03-28) http://python.org/sf/711632 SEEK_{SET,CUR,END} missing in 2.2.2 (2003-03-29) http://python.org/sf/711830 Lookup of Mac error string can mess up resfile chain (2003-03-29) http://python.org/sf/711967 gensuitemodule needs to be documented (2003-03-29) http://python.org/sf/711986 IDE textwindow scrollbar is over-enthusiastic (2003-03-29) http://python.org/sf/711989 IDE needs easy access to builtin help() (2003-03-29) http://python.org/sf/711991 OpenBSD 3.2: make altinstall dumps core (2003-03-29) http://python.org/sf/712056 New 
Patches ----------- add offset to mmap (2003-03-23) http://python.org/sf/708374 OpenVMS complementary patches (2003-03-23) http://python.org/sf/708495 unchecked return values - compile.c (2003-03-23) http://python.org/sf/708604 remove -static option from cygwinccompiler (2003-03-24) http://python.org/sf/709178 CALL_ATTR opcode (2003-03-25) http://python.org/sf/709744 Make "%c" % u"a" work (2003-03-26) http://python.org/sf/710127 Backport to 2.2.2 of codec registry fix (2003-03-27) http://python.org/sf/710576 new test_urllib and patch for found urllib bug (2003-03-27) http://python.org/sf/711002 Warn about inter-module assignments shadowing builtins (2003-03-28) http://python.org/sf/711448 Removing unnecessary lock operations (2003-03-29) http://python.org/sf/711835 urllib2 doesn't support non-anonymous ftp (2003-03-29) http://python.org/sf/711838 Cause pydoc to show data descriptor __doc__ strings (2003-03-29) http://python.org/sf/711902 Obsolete comment in urlparse.py (2003-03-30) http://python.org/sf/712124 Closed Bugs ----------- 2.3a2 build fails on Solaris: posixmodule (2003-02-20) http://python.org/sf/690317 string.atoi function causing TypeError (2003-03-04) http://python.org/sf/697591 Tk 8.4.2 and Tkinter.py _substitue function (2003-03-05) http://python.org/sf/698517 _tkinter.c won't build w/o threads? (2003-03-16) http://python.org/sf/704641 imap docs: s/criterium/criterion/ (2003-03-17) http://python.org/sf/705120 timeouts incompatible w/ line-oriented protocols (2003-03-20) http://python.org/sf/707074 DistributionMetaData error ? 
(2003-03-23) http://python.org/sf/708320 Closed Patches -------------- fix xmlrpclib float marshalling bug (2002-03-19) http://python.org/sf/532180 Add _winreg support for Cygwin (2002-05-11) http://python.org/sf/554807 New codecs: html, asciihtml (2002-08-03) http://python.org/sf/590682 Check for readline 2.2 features (2002-12-29) http://python.org/sf/659834 AE Inheritance fixes (2003-03-12) http://python.org/sf/702620 Improve code generation (2003-03-20) http://python.org/sf/707257 fix for #698517, Tkinter and tk8.4.2 (2003-03-21) http://python.org/sf/707701 unchecked return value in import.c (2003-03-22) http://python.org/sf/708201 From ping@zesty.ca Sun Mar 30 17:31:37 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Sun, 30 Mar 2003 11:31:37 -0600 (CST) Subject: [Python-Dev] Capabilities In-Reply-To: Message-ID: On Sat, 29 Mar 2003, Ka-Ping Yee wrote: > Okay, at last to the example, then. The following is a better formulation in the capability style -- please ignore the previous one. The previously posted code allows names to carry authority, which is a big no-no. This code gets rid of names altogether in the API for file access; it's better to deal with just objects. 

    import os, __builtin__

    class Namespace:
        def __init__(self, *args, **kw):
            for value in args:
                self.__dict__[value.__name__] = value
            for name, value in kw.items():
                self.__dict__[name] = value

    class ImmutableNamespace(Namespace):
        def __setattr__(self, name, value):
            raise TypeError('read-only namespace')

    def ReadStream(file, name):
        def __repr__():
            return '<ReadStream %s>' % name
        return ImmutableNamespace(__repr__, file.read, file.close, name=name)

    def FileReader(path, name):
        def __repr__():
            return '<FileReader %s>' % name
        def open():
            return ReadStream(__builtin__.open(path, 'r'), name)
        def getsize():
            return os.path.getsize(path)
        def getmtime():
            return os.path.getmtime(path)
        return ImmutableNamespace(__repr__, open, getsize, getmtime, name=name)

    def DirectoryReader(path, name):
        def __repr__():
            return '<DirectoryReader %s>' % name
        def getfiles():
            files = []
            for name in os.listdir(path):
                fullpath = os.path.join(path, name)
                if os.path.isfile(fullpath):
                    files.append(FileReader(fullpath, name))
            return files
        def getdirs():
            dirs = []
            for name in os.listdir(path):
                fullpath = os.path.join(path, name)
                if os.path.isdir(fullpath):
                    dirs.append(DirectoryReader(fullpath, name))
            return dirs
        return ImmutableNamespace(__repr__, getfiles, getdirs, name=name)

-- ?!ng

From paul@prescod.net Sun Mar 30 18:43:12 2003
From: paul@prescod.net (Paul Prescod)
Date: Sun, 30 Mar 2003 10:43:12 -0800
Subject: [Python-Dev] Capabilities
In-Reply-To:
References:
Message-ID: <3E873AC0.2050004@prescod.net>

Ka-Ping Yee wrote:
>...
>
> Specifically, the only introspective attributes we have to disallow, in
> order for these objects to enforce their intended restrictions, are
> im_self and func_globals. Of course, we still have to hide __import__ and
> sys.modules if we want to prevent code from obtaining access to the
> filesystem in other ways.

It wouldn't have hurt for you to describe how the code achieves security by using lexical closure namespaces instead of dictionary-backed namespaces.
;)

Part of the trick is that the external names are irrelevant to the functioning of the object.

I don't understand one thing.

The immutability imposed by the "ImmutableNamespace" trick is easy to turn off. But once I turn it off, I couldn't figure out any way to violate the security because the closure's variables are invisible to any code that is not defined within its block. Why bother with the ImmutableNamespace bit at all?

    x = DirectoryReader(".", "foo")
    print x.getfiles()
    del x.__class__.__setattr__
    x.foo = 5
    del x.getfiles
    del x.getdirs
    x.getfiles()

    Traceback (most recent call last):
      File "../foo.py", line 64, in ?
        x.getfiles()
    AttributeError: ImmutableNamespace instance has no attribute 'getfiles'

But I couldn't figure out how to use this to get access to the file system because as I said before, the external names are irrelevant to the object's implementation. They are early bound.

    def FileReader(path, name):
        ...
        def open2():
            print "open2"
            return open()

    direct = DirectoryReader(".", "foo")
    file = direct.getfiles()[0]
    print file.open2()
    FileReaderClass = file.__class__
    del FileReaderClass.__setattr__
    del file.open
    print file.open2()

"open2" binds to open at definition time, not at runtime.

I can't see in this model how to implement what C++ calls a "friend" class. Even C++ and Java have ways that related classes can poke around each other's internals. So perhaps this is part of what would need to change in Python to have a first-class capabilities feature.

If this technique became widespread, Python's restrictions on assigning to lexically inherited variables would probably become annoying.

Paul Prescod

From guido@python.org Sun Mar 30 19:02:38 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 30 Mar 2003 14:02:38 -0500
Subject: [Python-Dev] Compiler treats None both as a constant and variable
In-Reply-To: "Your message of Sat, 29 Mar 2003 16:02:10 EST."
<002001c2f636$77ad2ce0$e60ca044@oemcomputer> References: <002001c2f636$77ad2ce0$e60ca044@oemcomputer> Message-ID: <200303301902.h2UJ2cU00562@pcp02138704pcs.reston01.va.comcast.net> > >>> def f(): > None > > >>> dis(f) > 2 0 LOAD_GLOBAL 0 (None) > 3 POP_TOP > 4 LOAD_CONST 0 (None) > 7 RETURN_VALUE > > >>> None = 1 > :1: SyntaxWarning: assignment to None > >>> f() == None > False > > > Is this a bug? Yes, assigning to None is a bug. :-) > Should the compiler use the GLOBAL in both places? No, not until we've officially changed the rules. > Or, is it reasonable to use CONST in both places? No, not until assigning to None is an error rather than a warning. This will have to wait until at least 2.4 -- the warning is new in 2.3. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 30 19:08:19 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 30 Mar 2003 14:08:19 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: "Your message of Sat, 29 Mar 2003 18:27:50 CST." References: Message-ID: <200303301908.h2UJ8Jd00667@pcp02138704pcs.reston01.va.comcast.net> [Ping] > Having access to a particular instance should certainly not allow > one to ask it for its class, and then instantiate the class with > arbitrary constructor arguments. Assuming the Python code in the class itself is not empowered in any special way, I don't see why not. So that suggests that you assume classes can be empowered. I can see this for classes implemented in C; but how can classes implemented in pure Python be empowered? 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ping@zesty.ca Sun Mar 30 20:45:09 2003
From: ping@zesty.ca (Ka-Ping Yee)
Date: Sun, 30 Mar 2003 14:45:09 -0600 (CST)
Subject: [Python-Dev] Capabilities
In-Reply-To: <3E873AC0.2050004@prescod.net>
Message-ID:

On Sun, 30 Mar 2003, Paul Prescod wrote:
> It wouldn't have hurt for you to describe how the code achieves security
> by using lexical closure namespaces instead of dictionary-backed
> namespaces. ;)

Sorry. :) I assumed it would be clear.

> I don't understand one thing.
>
> The immutability imposed by the "ImmutableNamespace" trick is easy to
> turn off. But once I turn it off, I couldn't figure out any way to
> violate the security because the closure's variables are invisible to
> any code that is not defined within its block. Why bother with the
> ImmutableNamespace bit at all?

That immutability isn't required in order to prevent filesystem access. That immutability is only there to prevent multiple clients of the same DirectoryReader from using the DirectoryReader as a communication channel.

> del x.__class__.__setattr__

Sneaky. :) In restricted mode you wouldn't be able to do that.

> I can't see in
> this model how to implement what C++ calls a "friend" class.

I haven't tried an example that requires that yet, but two classes could communicate through access to a shared object if they wanted to.

> If this technique became widespread, Python's restrictions on assigning
> to lexically inherited variables would probably become annoying.

The Namespace offers a possible workaround. I didn't end up using it in my second code example because none of the objects have mutable state, but here's how you could do it:

    def Counter():
        self = Namespace()
        self.i = 0
        def next():
            self.i += 1
            return self.i
        return ImmutableNamespace(next)

It would be cool if you could suggest little "security challenges" to work through.
Given specific scenarios requiring things like mutability or friend classes, i think trying to implement them in this style could be very instructive. -- ?!ng From ping@zesty.ca Sun Mar 30 20:53:59 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Sun, 30 Mar 2003 14:53:59 -0600 (CST) Subject: [Python-Dev] Capabilities In-Reply-To: <200303301908.h2UJ8Jd00667@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Sun, 30 Mar 2003, Guido van Rossum wrote: > [Ping] > > Having access to a particular instance should certainly not allow > > one to ask it for its class, and then instantiate the class with > > arbitrary constructor arguments. > > Assuming the Python code in the class itself is not empowered in any > special way, I don't see why not. So that suggests that you assume > classes can be empowered. I can see this for classes implemented in > C; but how can classes implemented in pure Python be empowered? In many classes, __init__ exercises authority. An obvious C type with the same problem is the "file" type (being able to ask a file object for its type gets you the ability to open any file on the filesystem). But many Python classes are in the same position -- they acquire authority upon initialization. To pick one at random, consider zipfile.ZipFile. At first glance it appears that once you create a ZipFile object with mode "r" you can hand it off to provide read-only access to a zip archive. (Even if a security audit of the code reveals holes, my point is that the API isn't far from accommodating such a design intent.) It's useful to be able to separate the authority to read one particular instance of ZipFile from the authority to instantiate new ZipFiles, which currently allows you to open any zip file on the filesystem for reading or writing. 
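
The point about __init__ acquiring authority can be made concrete with a small sketch. It is written for a current Python 3 rather than the 2.3 interpreter under discussion, and `LogReader` and `untrusted_client` are hypothetical names invented for this illustration, not anything from zipfile or the thread.

```python
class LogReader:
    """Hypothetical read-only view of one file; __init__ opens the path itself."""

    def __init__(self, path):
        # The authority to touch the filesystem is exercised here.
        self._file = open(path, "r")

    def read(self):
        return self._file.read()

    def close(self):
        self._file.close()


def untrusted_client(reader, other_path):
    # The client was handed a single reader, but it can reach the class
    # through the instance and build a second reader on any path it likes.
    cls = type(reader)
    leaked = cls(other_path)
    try:
        return leaked.read()
    finally:
        leaked.close()
```

Handing out `reader` was meant to grant access to one file, yet `type(reader)` gives the client the constructor, and with it whatever the constructor is able to open.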
-- ?!ng

From paul@prescod.net Sun Mar 30 21:59:26 2003
From: paul@prescod.net (Paul Prescod)
Date: Sun, 30 Mar 2003 13:59:26 -0800
Subject: [Python-Dev] Capabilities
In-Reply-To:
References:
Message-ID: <3E8768BE.8010603@prescod.net>

Ka-Ping Yee wrote:
> On Sun, 30 Mar 2003, Paul Prescod wrote:
>
>>It wouldn't have hurt for you to describe how the code achieves security
>>by using lexical closure namespaces instead of dictionary-backed
>>namespaces. ;)
>
> Sorry. :) I assumed it would be clear.

It probably is for those following the thread more closely.

> That immutability isn't required in order to prevent filesystem access.

Okay, now I see that that's what you meant about "__dict__". You were talking about the object's namespace in general, not the magical attribute named __dict__.

>...
>>del x.__class__.__setattr__
>
> Sneaky. :)

I would have complimented you on the elegance of this proposal but I thought it might just be a translation of E's object construct. To whatever extent you innovated in creating it, congratulations, it's very cool.

> ....In restricted mode you wouldn't be able to do that.

I'm not clear (because I've been following the thread with half my brain, over quite a few days) whether you are making or have made some specific proposal. I guess you are proposing a restricted mode that would make this example actually secure as opposed to almost secure. Are you also proposing any changes to the syntax? Also, is restricted mode an interpreter mode or is it scoped by module? I can't see how it would work as an interpreter mode because too much library code depends on introspectability and hackability of Python objects.

>>I can't see in
>>this model how to implement what C++ calls a "friend" class.
>
> I haven't tried an example that requires that yet, but two classes
> could communicate through access to a shared object if they wanted to.
This doesn't actually simulate "friend" but that's probably because friend makes no sense in a capability system. It occurs to me after further thought that there are two orthogonal problems. First is privacy for the sake of software engineering. Python has always rejected that and I'm glad it has (although it makes advocacy harder). This sort of privacy just gets in your way when you're trying to coerce code into doing what you want when it wasn't designed to. Languages like C++ make it really hard to hack when you need to, but they don't really prevent you from doing it if you are determined enough, so you have the worst of both worlds. Second, is safety for the sake of security. IF you have chosen the capabilities model of security, THEN "friend" perhaps doesn't make sense. You either have a capability reference or you don't. The code's compile-time class or package is irrelevant. Allowing classes (as opposed to objects) to declare each other friends probably only opens up security holes. But if you want to have an example of something like this for the record books, perhaps you could implement an iterator over a data structure with the caveat that we'd like to implement the iterator and data structure in separate files (because sometimes the implementation of each could be large and complicated). I think it works like this: The Data structure is one capability class. The iterator is another. The application asks the data structure to create an iterator. The data structure creates one and passes some subset of its internal state to the new object. It probably could not (and anyway should not) pass a pointer to the opaque closure that is its external representation. So instead it passes in whatever state variables the iterator is likely to be interested in. 
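
The data-structure-plus-iterator arrangement described above might look roughly like this in the closure style. It is only a sketch, written for a current Python 3; `make_stack`, `make_iterator` and the use of `types.SimpleNamespace` in place of the thread's ImmutableNamespace are assumptions made for the illustration. It is also not actually secure in Python 3, where closure cells can be reached through a function's `__closure__` attribute.

```python
from types import SimpleNamespace


def make_iterator(snapshot):
    # The iterator receives only the state it needs (an immutable tuple),
    # never a reference to the stack object that created it.
    position = 0

    def next_item():
        nonlocal position
        if position >= len(snapshot):
            raise StopIteration
        value = snapshot[position]
        position += 1
        return value

    return SimpleNamespace(next_item=next_item)


def make_stack():
    items = []  # private state, visible only to the closures below

    def push(value):
        items.append(value)

    def iterate():
        # Hand the new object a subset of the internal state.
        return make_iterator(tuple(items))

    return SimpleNamespace(push=push, iterate=iterate)
```

Because `make_iterator` takes plain state rather than the stack itself, the two functions could live in separate files without either needing to see the other's internals.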
If you did want to emulate class-based "friendship" (can't think of why, off the top of my head) you could do so like this:

    def tellMeYourSecrets(myfriend):
        if isinstance(myfriend, MyFriendClass):
            return my_namespace()
        else:
            raise SecurityViolation, "Bug off"

The example in Stroustrup is where you want a vector class to be able to directly read the internals of a matrix class rather than go through inefficient method calls. But in a capabilities universe, even matrices can't, in general, see the internals of other matrices. I guess they'd have to use the trick above if that was really necessary.

>>If this technique became widespread, Python's restrictions on assigning
>>to lexically inherited variables would probably become annoying.
>
>
> The Namespace offers a possible workaround.

Yes, but why workaround rather than fix? Is there a deep reason Python objects can't write to intermediate namespaces? Is it just a little bit of extra safety against accidentally overwriting something? This is probably overkill in the case of intermediate scopes. And if not, there could be a keyword which is like global but for intermediate scopes.

> ...
> It would be cool if you could suggest little "security challenges"
> to work through. Given specific scenarios requiring things like
> mutability or friend classes, i think trying to implement them in
> this style could be very instructive.

Unfortunately, most of the examples I can come up with seem to be hacks, workarounds and optimizations. It isn't surprising that sometimes you lose some efficiency or simplicity when working in a secure system. It makes me wonder about whether E might be less fun, efficient and productive than Python because security is embedded so deeply within it? (just a speculation...I don't know E) A Python that could go back and forth from secure mode to insecure mode might be a nice compromise.
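
For readers on a current interpreter: Python 3 did later add a keyword of the kind suggested above, `nonlocal`, which rebinds a variable in an enclosing function scope. A minimal sketch of the Counter idea using it follows; `make_counter` and `next_value` are names chosen for this illustration only.

```python
def make_counter():
    count = 0  # lives in the enclosing function scope

    def next_value():
        nonlocal count  # rebind the enclosing variable instead of creating a local
        count += 1
        return count

    return next_value
```

With this, the mutable state needs no Namespace object at all; each call to `make_counter()` yields an independent counter whose `count` is held only in the closure.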
Paul Prescod From paul@prescod.net Sun Mar 30 22:45:38 2003 From: paul@prescod.net (Paul Prescod) Date: Sun, 30 Mar 2003 14:45:38 -0800 Subject: [Python-Dev] Capabilities In-Reply-To: References: Message-ID: <3E877392.9060509@prescod.net> Ka-Ping Yee wrote: >... > In many classes, __init__ exercises authority. An obvious C type with > the same problem is the "file" type (being able to ask a file object > for its type gets you the ability to open any file on the filesystem). > But many Python classes are in the same position -- they acquire > authority upon initialization. Just out of curiosity wouldn't you say that part of the capability zen is that capabilities that allow you to turn global strings into objects should either not exist or be very segmented from other capabilities? (in fact I remember discussing this with you at some Python conference!) In capdesk, I believe you drag a capability for a file from one window to another so that the "drop target" never needs to know or care what the filename was. So it might be better to separate the authority from the __init__ than to separate constructors from classes. Arguably it is better to add to the library than to change the language. return securefile("foo.txt").reader() x = zipfile.Zipfile(securefile("foo.txt").reader()) Paul Prescod From guido@python.org Mon Mar 31 00:09:52 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 30 Mar 2003 19:09:52 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: "Your message of Sun, 30 Mar 2003 14:53:59 CST." References: Message-ID: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> > > [Ping] > > > Having access to a particular instance should certainly not allow > > > one to ask it for its class, and then instantiate the class with > > > arbitrary constructor arguments. [Guido] > > Assuming the Python code in the class itself is not empowered in any > > special way, I don't see why not. So that suggests that you assume > > classes can be empowered. 
I can see this for classes implemented in > > C; but how can classes implemented in pure Python be empowered? [Ping] > In many classes, __init__ exercises authority. An obvious C type with > the same problem is the "file" type (being able to ask a file object > for its type gets you the ability to open any file on the filesystem). > But many Python classes are in the same position -- they acquire > authority upon initialization. What do you mean exactly by "exercise authority"? Again, I understand this for C code, but it would seem that all authority ultimately comes from C code, so I don't understand what authority __init__() can exercise. > To pick one at random, consider zipfile.ZipFile. At first glance it > appears that once you create a ZipFile object with mode "r" you can > hand it off to provide read-only access to a zip archive. (Even if > a security audit of the code reveals holes, my point is that the API > isn't far from accommodating such a design intent.) But is it really ZipFile.__init__ that exercises the authority? Isn't its authority derived from that of the open() function that it calls? > It's useful to be able to separate the authority to read one > particular instance of ZipFile from the authority to instantiate > new ZipFiles, which currently allows you to open any zip file on > the filesystem for reading or writing. In what sense is the ZipFile class an entity by itself, rather than just a pile of Python statements that derive any and all authority from its caller? I understand how class ZipFile could exercise authority in a rexec-based world, if the zipfile module was trusted code. But I thought that a capability view of the world doesn't distinguish between trusted and untrusted code. I guess I need to understand better what kind of "barriers" the capability way of life *does* use. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Mon Mar 31 00:34:16 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 31 Mar 2003 12:34:16 +1200 (NZST) Subject: [Python-Dev] Fast access to __builtins__ In-Reply-To: <200303281031.30093.aleax@aleax.it> Message-ID: <200303310034.h2V0YG902190@oma.cosc.canterbury.ac.nz> Alex Martelli : > It happens, though -- for code whose performance is not important, > e.g. initialization and "resetting" kind of stuff, a PyRun_String can be > SO much more concise and handier than meticulous expansion of > basically the same things into tens of lines of C code... Nowadays you can let Pyrex do the expansion for you...:-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Mar 31 01:49:48 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 31 Mar 2003 13:49:48 +1200 (NZST) Subject: [Python-Dev] Re: Fast access to __builtins__ In-Reply-To: <004b01c2f57e$7fb13700$bf11a044@oemcomputer> Message-ID: <200303310149.h2V1nmM02378@oma.cosc.canterbury.ac.nz> Raymond Hettinger : > * Scanning my own sources, it looks like some of the builtins > almost never appear inside loops (dir, map, filter, zip, dict, range). > The ones that are in loops usually do something simple (int, str, > chr, len). Either way, builtin access never seems to dominate > the running time. OTOH, maybe that's just the way I write code. That's probably true in the large. However, sometimes one has a tight little loop that makes lots of calls to a builtin. I've occasionally improved the speed of something noticeably using the copy-a-builtin-to-a-local trick. 
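
The copy-a-builtin-to-a-local trick mentioned above can be sketched as follows. This is a minimal illustration written for a current Python 3 rather than the 2.3 interpreter under discussion, and the function names are made up for the example; no timing figures are claimed.

```python
def total_length_global(items):
    # Each call to len() is resolved by name: module globals first,
    # then the builtins namespace.
    total = 0
    for item in items:
        total += len(item)
    return total


def total_length_local(items, _len=len):
    # _len is bound once, when the function is defined, and is then
    # read as a fast local variable inside the loop.
    total = 0
    for item in items:
        total += _len(item)
    return total
```

The trade-off is that the local copy is frozen at definition time, so a later module-level shadowing of `len` is seen by the first function but not by the second.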
Maybe for these cases there could be a "builtin" declaration, like "global" but declaring that something is to be found in the builtin scope? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Mar 31 02:14:16 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 31 Mar 2003 14:14:16 +1200 (NZST) Subject: [Python-Dev] Capabilities In-Reply-To: Message-ID: <200303310214.h2V2EGW02500@oma.cosc.canterbury.ac.nz> Ka-Ping Yee : > On Sun, 30 Mar 2003, Guido van Rossum wrote: > > [Ping] > > > Having access to a particular instance should certainly not allow > > > one to ask it for its class, and then instantiate the class with > > > arbitrary constructor arguments. > > > > Assuming the Python code in the class itself is not empowered in any > > special way, I don't see why not. So that suggests that you assume > > classes can be empowered. I can see this for classes implemented in > > C; but how can classes implemented in pure Python be empowered? > > In many classes, __init__ exercises authority. An obvious C type with > the same problem is the "file" type Yes, I think the solution to this is not to forbid getting hold of the class of an object, but to design constructors so that they don't do anything that might be a security problem. In the case of files, that would mean removing the feature that file("foo") means the same as open("foo"), so that only the open() function can open arbitrary files. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From python@rcn.com Mon Mar 31 03:27:07 2003 From: python@rcn.com (Raymond Hettinger) Date: Sun, 30 Mar 2003 22:27:07 -0500 Subject: [Python-Dev] Re: Fast access to __builtins__ References: <200303310149.h2V1nmM02378@oma.cosc.canterbury.ac.nz> Message-ID: <006401c2f735$68b64be0$4010a044@oemcomputer> > Raymond Hettinger : > > * Scanning my own sources, it looks like some of the builtins > > almost never appear inside loops (dir, map, filter, zip, dict, range). > > The ones that are in loops usually do something simple (int, str, > > chr, len). Either way, builtin access never seems to dominate > > the running time. OTOH, maybe that's just the way I write code. > [Greg Ewing] > That's probably true in the large. However, sometimes one has a tight > little loop that makes lots of calls to a builtin. I've occasionally > improved the speed of something noticeably using the > copy-a-builtin-to-a-local trick. It will have to wait until Py2.4 and the issue will likely be subsumed by more sophisticated approaches that optimize all namespace access. Jeremy's DList technique looks especially promising. Similarly, I'm experimenting with a dict subclass that keeps its values in lists of length one and can return the container for clients interested in fast get or set access to the value associated with a given key. Also, I've been working on a faster design for dictionaries that increases overall sparseness (meaning fewer collisions) while increasing the density of entries that fit in a single cache line (reducing the cost of a miss). Increasing density involves splitting the arrays of PyDictEntry into separate arrays of hashes, keys, and values. Further, the entries are clustered into groups of up to 16 hash values that can fit in a single cache line. This also allows for a much tighter inner loop for the lookup function.
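The dict subclass that keeps its values in lists of length one, mentioned above, might look roughly like this. This is only a guess at the idea, not the experimental code itself: the class name CellDict and the method get_cell are hypothetical, and only item assignment and item lookup are handled (the constructor, update(), get() and friends are left alone in this sketch).

```python
class CellDict(dict):
    """Hypothetical sketch: every value lives in a one-element list."""

    def __setitem__(self, key, value):
        # dict.get is the plain C-level lookup; it returns the cell
        # (a one-element list) or None when the key is new.
        cell = dict.get(self, key)
        if cell is None:
            dict.__setitem__(self, key, [value])
        else:
            # Reuse the existing cell so that clients holding it
            # see the new value without another dictionary lookup.
            cell[0] = value

    def __getitem__(self, key):
        return dict.__getitem__(self, key)[0]

    def get_cell(self, key):
        # Hand out the container itself for fast repeated access.
        return dict.__getitem__(self, key)


d = CellDict()
d["x"] = 1
cell = d.get_cell("x")   # one lookup up front
d["x"] = 2               # the cell is updated in place
seen_through_cell = cell[0]
cell[0] = 3              # writing through the cell is visible in the dict
seen_through_dict = d["x"]
```

A client that reads or writes the same key many times pays for one hashed lookup and then works on the cell directly.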
Raymond Hettinger From mal@lemburg.com Mon Mar 31 07:42:50 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 31 Mar 2003 09:42:50 +0200 Subject: [Python-Dev] iconv codec Message-ID: <3E87F17A.3080602@lemburg.com> Since the introduction of the iconv codec there have been numerous bug reports related to the codec and the lack of cross platform support for it (ranging from: the codec doesn't compile and the codec doesn't support standard names for common encodings to core dumps in the linking phase). I'd like to question whether the codec is really ready for prime time yet. Right now it causes people more trouble than it does any good. Some examples: https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=675341 https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=690309 https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=712056 https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=694431 The problem doesn't seem to be related to the code implementation itself, but rather the varying quality of iconv implementations out there. OTOH, without some field testing the codec will never get into shape for prime time, so perhaps it would be better to only enable it via a configure option or make a failure to compile the codec as painless as possible. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 31 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: 1 days left EuroPython 2003, Charleroi, Belgium: 85 days left From perky@fallin.lv Mon Mar 31 08:04:17 2003 From: perky@fallin.lv (Hye-Shik Chang) Date: Mon, 31 Mar 2003 17:04:17 +0900 Subject: [Python-Dev] iconv codec In-Reply-To: <3E87F17A.3080602@lemburg.com> References: <3E87F17A.3080602@lemburg.com> Message-ID: <20030331080417.GA52581@fallin.lv> On Mon, Mar 31, 2003 at 09:42:50AM +0200, M.-A. Lemburg wrote: > Since the introduction of the iconv codec there have been numerous > bug reports related to the codec and the lack of cross platform > support for it (ranging from: the codec doesn't compile and the > codec doesn't support standard names for common encodings to > core dumps in the linking phase). > > I'd like to question whether the codec is really ready for prime > time yet. Right now it causes people more trouble than it does > any good. iconv_codec NG is ready to submit to SF. I think the newer implementation can resolve many of the patch reports. I'll submit it in a few days. If you have a time, you can review my patches before my submission. The patch includes ko, zh_CN, zh_TW codecs, also. A note about another problems on the current iconv_codec: http://fallin.lv/cvs/~checkout~/py-multibytecodec/reports/iconv.1 The multibytecodecs which is in my patch submission queue: http://fallin.lv/distfiles/py-multibytecodec-030331.tar.gz > OTOH, without some field testing the codec will never get into > shape for prime time, so perhaps it would be better to only > enable it via a configure option or make a failure to compile > the codec as painless as possible. I agree. Hye-Shik =) From mal@lemburg.com Mon Mar 31 09:38:02 2003 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 31 Mar 2003 11:38:02 +0200 Subject: [Python-Dev] iconv codec In-Reply-To: <20030331080417.GA52581@fallin.lv> References: <3E87F17A.3080602@lemburg.com> <20030331080417.GA52581@fallin.lv> Message-ID: <3E880C7A.4060405@lemburg.com> Hye-Shik Chang wrote: > On Mon, Mar 31, 2003 at 09:42:50AM +0200, M.-A. Lemburg wrote: > >>Since the introduction of the iconv codec there have been numerous >>bug reports related to the codec and the lack of cross platform >>support for it (ranging from: the codec doesn't compile and the >>codec doesn't support standard names for common encodings to >>core dumps in the linking phase). >> >>I'd like to question whether the codec is really ready for prime >>time yet. Right now it causes people more trouble than it does >>any good. > > iconv_codec NG is ready to submit to SF. I think the newer > implementation can resolve many of the patch reports. Are you sure ? As I mentioned in my mail, most problems seem to be related to the platform's iconv implementation, not so much to the Python one. > I'll submit > it in a few days. If you have a time, you can review my patches > before my submission. Sorry, no time for that. I'm heading off to Python UK today. > The patch includes ko, zh_CN, zh_TW codecs, also. > > A note about another problems on the current iconv_codec: > http://fallin.lv/cvs/~checkout~/py-multibytecodec/reports/iconv.1 > > The multibytecodecs which is in my patch submission queue: > http://fallin.lv/distfiles/py-multibytecodec-030331.tar.gz > >>OTOH, without some field testing the codec will never get into >>shape for prime time, so perhaps it would be better to only >>enable it via a configure option or make a failure to compile >>the codec as painless as possible. > > I agree. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Mar 31 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ Python UK 2003, Oxford: one day left EuroPython 2003, Charleroi, Belgium: 85 days left From guido@python.org Mon Mar 31 12:21:04 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 31 Mar 2003 07:21:04 -0500 Subject: [Python-Dev] iconv codec In-Reply-To: "Your message of Mon, 31 Mar 2003 17:04:17 +0900." <20030331080417.GA52581@fallin.lv> References: <3E87F17A.3080602@lemburg.com> <20030331080417.GA52581@fallin.lv> Message-ID: <200303311221.h2VCL4Y03446@pcp02138704pcs.reston01.va.comcast.net> > iconv_codec NG is ready to submit to SF. Assuming the NG label means this is a completely new implementation, I propose we drop the current iconv implementation immediately and consider the NG version as we would consider any newly contributed module at this point in time (i.e. at most two weeks before the first beta of 2.3 is released). --Guido van Rossum (home page: http://www.python.org/~guido/) From zooko@zooko.com Mon Mar 31 17:51:03 2003 From: zooko@zooko.com (Zooko) Date: Mon, 31 Mar 2003 12:51:03 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Message from Guido van Rossum of "Sun, 30 Mar 2003 19:09:52 EST." <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> References: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido wrote: > > I understand how class ZipFile could exercise authority in a > rexec-based world, if the zipfile module was trusted code. But I > thought that a capability view of the world doesn't distinguish > between trusted and untrusted code. I guess I need to understand > better what kind of "barriers" the capability way of life *does* use. I think you are on track with regard to the deeper question you are grappling with. Almost all dangerous things come ultimately from C code. 
(I can think of one danger that can come from pure Python code: it can provide an illicit communications channel between other objects.) So in the "separate policy language" way of life, access to the ZipFile class gives you the ability to open files anywhere in the filesystem. The ZipFile class therefore has the "dangerous" flag set, and when you run code that you think might misuse this feature, you set the "can't use dangerous things" flag on that code. In the capability way of life, it is still the case that access to the ZipFile class gives you the ability to open files anywhere in the system! (That is: I'm assuming for now that we implement capabilities without re-writing every dangerous class in the Library.) In this scheme, there are no flags, and when you run code that you think might misuse this feature, you simply don't give that code a reference to the ZipFile class. (Also, we have to arrange that it can't acquire a reference by "import zipfile".) So far the two approaches have the same effect, and the difference, for better or for worse, is that the policy of "this code can't use ZipFile" is encoded in Python reference-management code in the latter and encoded in a pair of flags in the former. Now, we might want to allow certain code to use something else dangerous (such as the socket module) while simultaneously disallowing it from using ZipFile. As we add N more dangerous modules, and M more objects of untrusted code that we want to control, we have an N*M access control matrix to configure which code can use which modules. (In an access control matrix, rows are "subjects" -- things that can exercise authority and columns are "resources" -- things that might require authority when used.) In a system where designation is not unified with authority, you tell this untrusted code "I want you to do this action X.", and then you also have to go update the policy specification to say that the code in question is allowed to do the action X. 
This "say it twice if you really mean it" overhead puts a practical limit on how fine-grained your policies can be, and it adds a source of accidents that lead to security holes. So now with a large or fine-grained access control matrix, we see the "unify designation and authority" maxim really shines, and really matches well with the Zen of Python. But there is still another advantage that capabilities offer over other access control systems. With normal access control (and an extremely diligent and patient programmer and user) you can in theory achieve the Principle of Least Privilege -- that the untrusted code runs with the minimal set of authorities necessary to do its job. However, this is implemented by creating a new "principal" -- a new row in the access control matrix, setting the access control bits in each element of that row, and preventing any other code from setting the bits in that row. Now, observe that only maximally trusted code -- with "root" authority -- is allowed to make these kinds of updates to the access control matrix. This means that all code is divided into two kinds: the kind that can impose Least-Privilege on code that it invokes (this code has root authority), and the kind that can be constrained by Least-Privilege when it is invoked (this code doesn't). With capabilities there is no such distinction. All code can be constrained to have access to only the privileges that it requires, and at the same time all code can constrain other code that it invokes. This feature, which I call "Higher-Order Principle of Least Privilege" [*] enables new applications. For example, using first-order Least-Privilege a web browser which runs cap-Python "caplets" could extend selective privileges to the caplets, such as permission to read a certain file, while withholding others, such as permission to write to that file, or permission to send the contents of the file to a remote computer. 
In addition, if cap-Python supports Higher-Order Least-Privilege, those caplets could themselves use other caplets ("web services"?) without unnecessarily exposing their privileges to those sub-caplets. One could imagine, for example, a web browser written in cap-Python, which runs inside the first web browser (e.g. Mozilla with a cap-Python plug-in), and uses cap-Python caplets to extend its (the cap-Python web browser's) functionality. If people already had the cap-Python plug-in installed in their local Mozilla, then simply visiting the "cap-python-browser.com" site would be sufficient to launch the cap-Python web browser. Of course, this could lead straight to a fully functional desktop, making good on Marc Andreesen's old threat to turn the browser into the operating system and the operating system into the device driver. This would be effectively the "virtualization" of access control. I regard it as a kind of holy Grail for internet computing. Regards, Zooko [*] I call it that because it is the application of the Principle of Least Privilege to the implementation of the Principle of Least Privilege. One should be able to impose least-privilege constraints on the code one uses without requiring full root privileges oneself! http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From guido@python.org Mon Mar 31 19:43:52 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 31 Mar 2003 14:43:52 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Your message of "Mon, 31 Mar 2003 12:51:03 EST." References: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303311944.h2VJhsA16638@odiug.zope.com> > Guido wrote: > > > > I understand how class ZipFile could exercise authority in a > > rexec-based world, if the zipfile module was trusted code. But I > > thought that a capability view of the world doesn't distinguish > > between trusted and untrusted code. 
I guess I need to understand > > better what kind of "barriers" the capability way of life *does* use. [Zooko] > I think you are on track with regard to the deeper question you are > grappling with. Almost all dangerous things come ultimately from C > code. (I can think of one danger that can come from pure Python > code: it can provide an illicit communications channel between other > objects.) > > So in the "separate policy language" way of life, access to the > ZipFile class gives you the ability to open files anywhere in the > filesystem. The ZipFile class therefore has the "dangerous" flag > set, and when you run code that you think might misuse this feature, > you set the "can't use dangerous things" flag on that code. But that's not how rexec works. In the rexec world, the zipfile module has no special privileges; when it is imported by untrusted code, it is reloaded from disk as if it were untrusted itself. The zipfile.ZipFile class is a client of "open", an implementation of which is provided to the untrusted code by the trusted code. This implementation does access checking (according to a separate policy language, indeed). So importing Python modules is always safe for untrusted code, because the imported Python code derives its authority from whatever the untrusted user already has. (It's different for C extension modules of course.) > In the capability way of life, it is still the case that access to > the ZipFile class gives you the ability to open files anywhere in > the system! (That is: I'm assuming for now that we implement > capabilities without re-writing every dangerous class in the > Library.) In this scheme, there are no flags, and when you run code > that you think might misuse this feature, you simply don't give that > code a reference to the ZipFile class. (Also, we have to arrange > that it can't acquire a reference by "import zipfile".) The rexec world solves this very nicely IMO. Can't the capability world do it the same way? 
The only difference might be that 'open' would have to be a capability. > So far the two approaches have the same effect, and the difference, > for better or for worse, is that the policy of "this code can't use > ZipFile" is encoded in Python reference-management code in the > latter and encoded in a pair of flags in the former. But I think "this code can't use ZipFile" is the wrong thing to say. You should only have to say "this code can't write files" (or something more specific). > Now, we might want to allow certain code to use something else > dangerous (such as the socket module) while simultaneously > disallowing it from using ZipFile. As we add N more dangerous > modules, and M more objects of untrusted code that we want to > control, we have an N*M access control matrix to configure which > code can use which modules. (In an access control matrix, rows are > "subjects" -- things that can exercise authority and columns are > "resources" -- things that might require authority when used.) In the rexec world, modules and classes don't have separate privileges -- the privileges are held by a larger concept, which we might call a "workspace". The rexec world allows many workspaces with different privileges -- but no communication between them. > In a system where designation is not unified with authority, you > tell this untrusted code "I want you to do this action X.", and then > you also have to go update the policy specification to say that the > code in question is allowed to do the action X. Sorry, you've lost me here. Which part is the "designation" (new word for me) and which part is the "authority"? > This "say it twice if you really mean it" overhead puts a practical > limit on how fine-grained your policies can be, and it adds a source > of accidents that lead to security holes. 
> > So now with a large or fine-grained access control matrix, we see > the "unify designation and authority" maxim really shines, and > really matches well with the Zen of Python. Sorry, this is too abstract for me to see (yet). You are sounding a bit like a used-car salesman here, or "Proof by using Big Words". :-) > But there is still another advantage that capabilities offer over > other access control systems. With normal access control (and an > extremely diligent and patient programmer and user) you can in > theory achieve the Principle of Least Privilege -- that the > untrusted code runs with the minimal set of authorities necessary to > do its job. However, this is implemented by creating a new > "principal" -- a new row in the access control matrix, setting the > access control bits in each element of that row, and preventing any > other code from setting the bits in that row. > > Now, observe that only maximally trusted code -- with "root" > authority -- is allowed to make these kinds of updates to the access > control matrix. This means that all code is divided into two kinds: > the kind that can impose Least-Privilege on code that it invokes > (this code has root authority), and the kind that can be constrained > by Least-Privilege when it is invoked (this code doesn't). In the rexec world, it is possible for a restricted workspace (at least in theory -- the rexec module may not be directly usable but something similar could) to create another workspace and selectively pass privileges into that workspace. > With capabilities there is no such distinction. All code can be > constrained to have access to only the privileges that it requires, > and at the same time all code can constrain other code that it > invokes. > > This feature, which I call "Higher-Order Principle of Least > Privilege" [*] enables new applications. Sorry, more "Big Words". 
:-) > For example, using first-order Least-Privilege a web browser which > runs cap-Python "caplets" could extend selective privileges to the > caplets, such as permission to read a certain file, while > withholding others, such as permission to write to that file, or > permission to send the contents of the file to a remote computer. > > In addition, if cap-Python supports Higher-Order Least-Privilege, > those caplets could themselves use other caplets ("web services"?) > without unnecessarily exposing their privileges to those > sub-caplets. It really sounds to me like at least one of our fundamental (?) differences is the autonomicity of code units. I think of code (at least Python code) as a passive set of instructions that has no inherent authority but derives authority from the built-ins passed to it; you seem to describe code as having inherent authority. > One could imagine, for example, a web browser written in cap-Python, > which runs inside the first web browser (e.g. Mozilla with a > cap-Python plug-in), and uses cap-Python caplets to extend its (the > cap-Python web browser's) functionality. If people already had the > cap-Python plug-in installed in their local Mozilla, then simply > visiting the "cap-python-browser.com" site would be sufficient to > launch the cap-Python web browser. > > Of course, this could lead straight to a fully functional desktop, > making good on Marc Andreesen's old threat to turn the browser into > the operating system and the operating system into the device > driver. > > This would be effectively the "virtualization" of access control. I > regard it as a kind of holy Grail for internet computing. How practical is this dream? How useful? > Regards, > > Zooko > > [*] I call it that because it is the application of the Principle of > Least Privilege to the implementation of the Principle of Least > Privilege. 
One should be able to impose least-privilege constraints > on the code one uses without requiring full root privileges oneself! > > http://zooko.com/ > ^-- under re-construction: some new stuff, some broken links --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Mon Mar 31 21:41:50 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 31 Mar 2003 23:41:50 +0200 Subject: [Python-Dev] iconv codec In-Reply-To: <200303311221.h2VCL4Y03446@pcp02138704pcs.reston01.va.comcast.net> References: <3E87F17A.3080602@lemburg.com> <20030331080417.GA52581@fallin.lv> <200303311221.h2VCL4Y03446@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > Assuming the NG label means this is a completely new implementation, I > propose we drop the current iconv implementation immediately and > consider the NG version as we would consider any newly contributed > module at this point in time (i.e. at most two weeks before the first > beta of 2.3 is released). Ok. Given the reported problems with the iconv module, and the prospect of getting a complete rewrite, I'll back out the current code. This is quite sad, IMO, as the code *is* useful for the platforms on which it works, and this *is* the majority of the installations on which it is currently built. Regards, Martin From guido@python.org Mon Mar 31 22:10:36 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 31 Mar 2003 17:10:36 -0500 Subject: [Python-Dev] iconv codec In-Reply-To: Your message of "31 Mar 2003 23:41:50 +0200."
References: <3E87F17A.3080602@lemburg.com> <20030331080417.GA52581@fallin.lv> <200303311221.h2VCL4Y03446@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200303312210.h2VMAba24516@odiug.zope.com> > > Assuming the NG label means this is a completely new implementation, I > > propose we drop the current iconv implementation immediately and > > consider the NG version as we would consider any newly contributed > > module at this point in time (i.e. at most two weeks before the first > > beta of 2.3 is released). > > Ok. Given the reported problems with the iconv module, and the > prospect of getting a complete rewrite, I'll back out the current > code. > > This is quite sad, IMO, as the code *is* useful for the platforms on > which it works, and this *is* the majority of the installations on > which it is currently built. But given that it's only got a small audience, a 3rd party module would satisfy the need just as well, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From zooko@zooko.com Mon Mar 31 22:22:41 2003 From: zooko@zooko.com (Zooko) Date: Mon, 31 Mar 2003 17:22:41 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Message from Guido van Rossum of "Mon, 31 Mar 2003 14:43:52 EST." <200303311944.h2VJhsA16638@odiug.zope.com> References: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <200303311944.h2VJhsA16638@odiug.zope.com> Message-ID: It's apparent that I didn't explain capabilities clearly enough. Also I misunderstood something about rexec in general and ZipFile in particular. Once we succeed at understanding each other, I'll then inquire whether you agree with my Big Word Proofs. (I, Zooko, wrote lines prepended with "> > ".) Guido wrote: > > > So in the "separate policy language" way of life, access to the > > ZipFile class gives you the ability to open files anywhere in the > > filesystem. 
The ZipFile class therefore has the "dangerous" flag > > set, and when you run code that you think might misuse this feature, > > you set the "can't use dangerous things" flag on that code. > > But that's not how rexec works. In the rexec world, the zipfile > module has no special privileges; when it is imported by untrusted > code, it is reloaded from disk as if it were untrusted itself. The > zipfile.ZipFile class is a client of "open", an implementation of > which is provided to the untrusted code by the trusted code. How is the implementation of "open" provided by the trusted code to the untrusted code? Is it possible to provide a different "open" implementation to different "instances" of the zipfile module? (I think not, as there is no such thing as "a different instance of a module", but perhaps you could have two rexec "workspaces" each of which has a zipfile module with a different "open"?) > > In this scheme, there are no flags, and when you run code > > that you think might misuse this feature, you simply don't give that > > code a reference to the ZipFile class. (Also, we have to arrange > > that it can't acquire a reference by "import zipfile".) > > The rexec world solves this very nicely IMO. Can't the capability > world do it the same way? The only difference might be that 'open' > would have to be a capability. I don't understand exactly how rexec works yet, but so far it sounds like capabilities. Here's a two sentence definition of capabilities: Authority originates in C code (in the interpreter or C extension modules), and is passed from thing to thing. A given thing "X" -- an instance of ZipFile, for example -- has the authority to use a given authority -- to invoke the real open(), for example -- if and only if some thing "Y" previously held both the "open()" authority and the "authority to extend authorities to X" authority, and chose to extend the "open()" authority to X. That rule could be enforced with the rexec system, right? 
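The two-sentence rule above can be illustrated with a small sketch in which authority travels only as an object reference. All names here (ReadOnlySource, Worker, grant) are hypothetical, and plain Python does not actually enforce the rule, since attributes and classes remain reachable by introspection; the sketch only shows the intended flow of authority.

```python
import io


class ReadOnlySource:
    """Hypothetical authority: read access to one in-memory buffer."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self):
        self._buffer.seek(0)
        return self._buffer.read()


class Worker:
    """A 'thing' X that starts out holding no authority at all."""

    def __init__(self):
        self._source = None

    def grant(self, source):
        # Authority arrives only because some holder Y chose to
        # pass a reference to it.
        self._source = source

    def byte_count(self):
        if self._source is None:
            raise PermissionError("no authority was extended to this worker")
        return len(self._source.read())


source = ReadOnlySource(b"hello")  # Y holds the authority...
worker = Worker()                  # ...and holds a reference to X,
worker.grant(source)               # so Y can extend the authority to X.
count = worker.byte_count()
```

A second Worker that nobody calls grant() on has no way, within this discipline, to reach the buffer.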
Here is a graphical representation of this rule. (Taken from [1].) http://www.erights.org/elib/capability/ode/images/fundamental.gif In the diagram, the authority is "Carol", the thing that started with the authority is "Alice", and Alice is in the process of extending to Bob the authority to use Carol. This act -- the extending of authority from Alice to Bob -- is the only way that Bob can gain authority, and it can only happen if Alice has both the authority to use Carol and the authority to extend authorities to Bob. Those two sentences above (and equivalently the graph) completely define capabilities, in the abstract. They don't say how they are implemented. A particular implementation that I find deeply appealing is to make "has a reference to 'open'" be the determiner of whether a thing has the authority to use "open", and to make "has a reference to X" be the determiner of whether a thing has the authority to extend authorities to X. That's "unifying designation with authority", and that's what the E language does. > But I think "this code can't use ZipFile" is the wrong thing to say. > You should only have to say "this code can't write files" (or > something more specific). I agree. I incorrectly inferred from previous messages that the current problem under discussion was allowing or denying access to the ZipFile class. But whatever resource we wish to control access to, these same techniques will apply. > > In a system where designation is not unified with authority, you > > tell this untrusted code "I want you to do this action X.", and then > > you also have to go update the policy specification to say that the > > code in question is allowed to do the action X. > > Sorry, you've lost me here. Which part is the "designation" (new word > for me) and which part is the "authority"? Sorry. First let me point out that the issue of unifying designation with authority is separable from "the capability access control rule" described above. 
The two have good synergy, but aren't identical.

By "designation" I meant "naming". For example... Let's see, I think I'll go back to my toy tictactoe example from [2].

In the tictactoe example, you have to specify which wxWindow the tictactoe game object should draw into. This is "designation" -- you pass a reference, which designates which specific window you are talking about. If you use the principle of unifying designation and authority, then this same act -- passing a reference to this particular wxWindows object -- conveys both the identification of which window to draw into and the authority to draw into it.

    # access control system with unified designation and authority
    game = TicTacToeGame()
    game.display(wxPython.wxWindow())

If you have separate designation and authority, then the same code has to look something like this:

    # access control system with separate designation and authority
    game = TicTacToeGame()
    window = wxPython.wxWindow()
    def policy(subject, resource, operation):
        if (subject is game) and (resource is window) and \
           (operation == "invoke methods of"):
            return True
        return False
    rexec.register_policy_hook(policy)
    game.display(window)

This is what I call "say it twice if you really mean it".

Hm. Reviewing the rexec docs, I begin to suspect that the "access control system with unified designation and authority" *is* how Python does access control in restricted mode, and that rexec itself is just to manage module import and certain dangerous builtins.

> It really sounds to me like at least one of our fundamental (?)
> differences is the autonomicity of code units. I think of code (at
> least Python code) as a passive set of instructions that has no
> inherent authority but derives authority from the built-ins passed to
> it; you seem to describe code as having inherent authority.

I definitely don't intend for code to have inherent authority (other than the Trusted Code Base -- the interpreter -- which can't help but have it).
The word "thing" in my two-sentence definition (a white circle in the
diagram) means "computational things that can have state and
behavior". (This includes Python objects, closures, stack frames,
etc... In another context I would call them "objects", but Python uses
the word "object" for something more specific -- an instance of a
class.)

> > This would be effectively the "virtualization" of access control. I
> > regard it as a kind of holy Grail for internet computing.
>
> How practical is this dream? How useful?

Let's revisit the issue once we understand one another's access
control schemes. ;-)

Regards,

Zooko

[1] http://www.erights.org/elib/capability/ode/overview.html
[2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html

http://zooko.com/
         ^-- under re-construction: some new stuff, some broken links

From guido@python.org Mon Mar 31 22:43:09 2003
From: guido@python.org (Guido van Rossum)
Date: Mon, 31 Mar 2003 17:43:09 -0500
Subject: [Python-Dev] Capabilities
In-Reply-To: Your message of "Mon, 31 Mar 2003 17:22:41 EST."
References: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net>
 <200303311944.h2VJhsA16638@odiug.zope.com>
Message-ID: <200303312243.h2VMhCC24639@odiug.zope.com>

[Zooko]
> It's apparent that I didn't explain capabilities clearly enough.
> Also I misunderstood something about rexec in general and ZipFile in
> particular. Once we succeed at understanding each other, I'll then
> inquire whether you agree with my Big Word Proofs.

It's apparent that you don't understand rexec enough; I'll try to
explain.

> (I, Zooko, wrote lines prepended with "> > ".)
>
> Guido wrote:
> >
> > > So in the "separate policy language" way of life, access to the
> > > ZipFile class gives you the ability to open files anywhere in the
> > > filesystem. The ZipFile class therefore has the "dangerous" flag
> > > set, and when you run code that you think might misuse this feature,
> > > you set the "can't use dangerous things" flag on that code.
> >
> > But that's not how rexec works. In the rexec world, the zipfile
> > module has no special privileges; when it is imported by untrusted
> > code, it is reloaded from disk as if it were untrusted itself. The
> > zipfile.ZipFile class is a client of "open", an implementation of
> > which is provided to the untrusted code by the trusted code.
> >
>
> How is the implementation of "open" provided by the trusted code to
> the untrusted code? Is it possible to provide a different "open"
> implementation to different "instances" of the zipfile module? (I
> think not, as there is no such thing as "a different instance of a
> module", but perhaps you could have two rexec "workspaces" each of
> which has a zipfile module with a different "open"?)

To the contrary, it is very easy to provide code with a different
version of open(). E.g.:

    # this executes as trusted code
    def my_open(...):
        "open() variant that only allows reading"
    my_builtins = {"len": len, "open": my_open, "range": range, ...}
    namespace = {"__builtins__": my_builtins}
    exec "..." in namespace

The final exec executes the untrusted code string "..." in a
restricted environment where the built-in 'open' refers to my_open.

Because import statements are also treated this way (they call the
builtin function __import__), the same applies for import.

IOW, namespace["__builtins__"] acts as the set of "root capabilities"
given to the untrusted code.

> > > In this scheme, there are no flags, and when you run code that
> > > you think might misuse this feature, you simply don't give that
> > > code a reference to the ZipFile class. (Also, we have to
> > > arrange that it can't acquire a reference by "import zipfile".)
> >
> > The rexec world solves this very nicely IMO. Can't the capability
> > world do it the same way? The only difference might be that
> > 'open' would have to be a capability.
>
> I don't understand exactly how rexec works yet, but so far it sounds
> like capabilities.

Yes.
That may be why the demand for capabilities has been met with
resistance: to quote the French in "Monty Python and the Holy Grail",
"we already got one!" :-)

> Here's a two sentence definition of capabilities:

I've heard too many of these. They are all too abstract.

> Authority originates in C code (in the interpreter or C extension
> modules), and is passed from thing to thing.

This part I like.

> A given thing "X" -- an instance of ZipFile, for example -- has the
> authority to use a given authority -- to invoke the real open(), for
> example -- if and only if some thing "Y" previously held both the
> "open()" authority and the "authority to extend authorities to X"
> authority, and chose to extend the "open()" authority to X.

But the instance of ZipFile is not really a protection domain.
Methods on the instance may have different authority.

> That rule could be enforced with the rexec system, right?

Yes, except that there are currently design bugs (starting in Python
2.2) that open holes; see Samuele Pedroni's posts here.

> Here is a graphical representation of this rule. (Taken from [1].)
>
> http://www.erights.org/elib/capability/ode/images/fundamental.gif
>
> In the diagram, the authority is "Carol", the thing that started
> with the authority is "Alice", and Alice is in the process of
> extending to Bob the authority to use Carol. This act -- the
> extending of authority from Alice to Bob -- is the only way that Bob
> can gain authority, and it can only happen if Alice has both the
> authority to use Carol and the authority to extend authorities to
> Bob.

Sure. The question is, what exactly are Alice, Bob and Carol? I
claim that they are not specific class instances but they are each a
"workspace" as I tried to explain before. A workspace is more or less
the contents of a particular "sys.modules" dictionary.

> Those two sentences above (and equivalently the graph) completely
> define capabilities, in the abstract. They don't say how they are
> implemented.
> A particular implementation that I find deeply
> appealing is to make "has a reference to 'open'" be the determiner
> of whether a thing has the authority to use "open", and to make "has
> a reference to X" be the determiner of whether a thing has the
> authority to extend authorities to X. That's "unifying designation
> with authority", and that's what the E language does.

Yes. And then "has a reference to 'open'" is bootstrapped by sticking
(some variant of) 'open' in the __builtin__ module of a particular
"workspace". (Note that workspace is a term I'm inventing here, you
won't find it in the Python literature.)

> > But I think "this code can't use ZipFile" is the wrong thing to
> > say. You should only have to say "this code can't write files"
> > (or something more specific).
>
> I agree. I incorrectly inferred from previous messages that the
> current problem under discussion was allowing or denying access to
> the ZipFile class. But whatever resource we wish to control access
> to, these same techniques will apply.
>
> > > In a system where designation is not unified with authority, you
> > > tell this untrusted code "I want you to do this action X.", and
> > > then you also have to go update the policy specification to say
> > > that the code in question is allowed to do the action X.
> >
> > Sorry, you've lost me here. Which part is the "designation" (new
> > word for me) and which part is the "authority"?
>
> Sorry. First let me point out that the issue of unifying
> designation with authority is separable from "the capability access
> control rule" described above. The two have good synergy, but
> aren't identical.
>
> By "designation" I meant "naming". For example... Let's see, I
> think I'll go back to my toy tictactoe example from [2].
>
> In the tictactoe example, you have to specify which wxWindow the
> tictactoe game object should draw into.
> This is "designation" --
> you pass a reference, which designates which specific window you are
> talking about. If you use the principle of unifying designation and
> authority, then this same act -- passing a reference to this
> particular wxWindows object -- conveys both the identification of
> which window to draw into and the authority to draw into it.
>
>     # access control system with unified designation and authority
>     game = TicTacToeGame()
>     game.display(wxPython.wxWindow())
>
> If you have separate designation and authority, then the same code
> has to look something like this:
>
>     # access control system with separate designation and authority
>     game = TicTacToeGame()
>     window = wxPython.wxWindow()
>     def policy(subject, resource, operation):
>         if (subject is game) and (resource is window) and \
>            (operation == "invoke methods of"):
>             return True
>         return False
>     rexec.register_policy_hook(policy)
>     game.display(window)
>
> This is what I call "say it twice if you really mean it".
>
> Hm. Reviewing the rexec docs, I begin to suspect that the "access
> control system with unified designation and authority" *is* how
> Python does access control in restricted mode, and that rexec itself
> is just to manage module import and certain dangerous builtins.

Yes.

> > It really sounds to me like at least one of our fundamental (?)
> > differences is the autonomicity of code units. I think of code
> > (at least Python code) as a passive set of instructions that has
> > no inherent authority but derives authority from the built-ins
> > passed to it; you seem to describe code as having inherent
> > authority.
>
> I definitely don't intend for code to have inherent authority (other
> than the Trusted Code Base -- the interpreter -- which can't help
> but have it). The word "thing" in my two-sentence definition (a
> white circle in the diagram) means "computational things that can have
> state and behavior". (This includes Python objects, closures, stack
> frames, etc...
> In another context I would call them "objects", but
> Python uses the word "object" for something more specific -- an
> instance of a class.)
>
> > > This would be effectively the "virtualization" of access control. I
> > > regard it as a kind of holy Grail for internet computing.
> >
> > How practical is this dream? How useful?
>
> Let's revisit the issue once we understand one another's access
> control schemes.
> ;-)
>
> Regards,
>
> Zooko
>
> [1] http://www.erights.org/elib/capability/ode/overview.html
> [2] http://mail.python.org/pipermail/python-dev/2003-March/033938.html

I propose to continue this in a week; I'm leaving for Python UK right
now and expect to have scarce connectivity there if at all. Back
Sunday night.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From drifty@alum.berkeley.edu Mon Mar 31 22:49:29 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Mon, 31 Mar 2003 14:49:29 -0800 (PST)
Subject: [Python-Dev] Capabilities
In-Reply-To: <200303312243.h2VMhCC24639@odiug.zope.com>
References: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net>
 <200303311944.h2VJhsA16638@odiug.zope.com>
 <200303312243.h2VMhCC24639@odiug.zope.com>
Message-ID:

[Guido van Rossum]

> I propose to continue this in a week; I'm leaving for Python UK right
> now and expect to have scarce connectivity there if at all. Back
> Sunday night.
>

Which means it will get summarized in *three* separate summaries.
This thread will never die!!! I am going to become a capabilities
expert whether I want to or not. =)

-Brett

From greg@cosc.canterbury.ac.nz Mon Mar 31 22:50:59 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 01 Apr 2003 10:50:59 +1200 (NZST)
Subject: [Python-Dev] Capabilities
In-Reply-To: <200303311944.h2VJhsA16638@odiug.zope.com>
Message-ID: <200303312250.h2VMox816033@oma.cosc.canterbury.ac.nz>

> But that's not how rexec works.

It seems to me that the restricted execution mechanism (is there a
shorter term for this?
calling it rexec is a misnomer, as has been pointed
out -- let's call it the REM for now) really is a kind of capability
system.

The REM works by closing off a bunch of loopholes and then controlling
which builtins a piece of code has access to. That code can then pass
them on to other code or withhold them. Sounds a lot like
capabilities, doesn't it?

So the hypothesised "capability python" would be rather like having
REM permanently in effect...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From martin@v.loewis.de Mon Mar 31 23:11:16 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 01 Apr 2003 01:11:16 +0200
Subject: [Python-Dev] iconv codec
In-Reply-To: <200303312210.h2VMAba24516@odiug.zope.com>
References: <3E87F17A.3080602@lemburg.com>
 <20030331080417.GA52581@fallin.lv>
 <200303311221.h2VCL4Y03446@pcp02138704pcs.reston01.va.comcast.net>
 <200303312210.h2VMAba24516@odiug.zope.com>
Message-ID:

Guido van Rossum writes:

> But given that it's only got a small audience, a 3rd party module
> would satisfy the need just as well, right?

The audience is actually quite large: any call to .encode could invoke
this codec, if Python does not provide a builtin codec. This includes,
in particular, all CJK codecs. Together with a platform-specific codec
wrapper for Windows and OS X, the need to package Python-specific CJK
codecs (with the size and maintenance issues that come with them)
might vanish.
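The lookup path behind "any call to .encode could invoke this codec"
can be sketched with the standard codec registry. This is only an
illustration written for current Python 3: the name `x_demo_latin1`
and the function `fallback_search` are made up, and the built-in
Latin-1 codec stands in for whatever an iconv wrapper would actually
return.

```python
import codecs


def fallback_search(name):
    """Hypothetical search function standing in for a platform codec wrapper."""
    # The registry passes a lower-cased name; hyphens are tidied here so
    # the check does not depend on how the interpreter normalizes them.
    if name.replace("-", "_") == "x_demo_latin1":
        # A real wrapper would build a CodecInfo around iconv or an OS
        # conversion API; this sketch simply reuses the Latin-1 codec.
        return codecs.lookup("latin-1")
    return None  # not handled here, so the lookup moves on


# Search functions are tried in registration order, and the standard
# encodings package is registered first, so this one is only reached
# for names that Python itself does not provide.
codecs.register(fallback_search)

encoded = "caf\u00e9".encode("x_demo_latin1")
decoded = encoded.decode("x_demo_latin1")
```

The caller never mentions the wrapper: an ordinary `.encode` call
reaches it simply because no built-in codec claimed the name.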
Regards,
Martin

From ping@zesty.ca Mon Mar 31 23:15:09 2003
From: ping@zesty.ca (Ka-Ping Yee)
Date: Mon, 31 Mar 2003 17:15:09 -0600 (CST)
Subject: [Python-Dev] Capabilities
In-Reply-To: <3E8768BE.8010603@prescod.net>
Message-ID:

On Sun, 30 Mar 2003, Paul Prescod wrote:
> I'm not clear (because I've been following the thread with half my
> brain, over quite a few days) whether you are making or have made some
> specific proposal. I guess you are proposing a restricted mode that
> would make this example actually secure as opposed to almost secure. Are
> you also proposing any changes to the syntax?

Not yet. Although it's certainly tempting to propose syntax changes,
it makes more sense to really understand what we want first. We can't
know that until we've actually tried programming in the capability
style in Python. That's why i want to explore the possibilities and
try these exercises -- it will help us discover the shortest path from
here to there.

> Also, is restricted mode an interpreter mode or is it scoped by module?

Whether restricted mode is activated depends on the __builtins__ of
the current namespace. So the short answer is "by module".

> Yes, but why workaround rather than fix? Is there a deep reason Python
> objects can't write to intermediate namespaces?

No. There's just no syntax for it yet. But let's figure out what we
can get away with first.

> > It would be cool if you could suggest little "security challenges"
> > to work through. Given specific scenarios requiring things like
> > mutability or friend classes, i think trying to implement them in
> > this style could be very instructive.
>
> Unfortunately, most of the examples I can come up with seem to be hacks,
> workarounds and optimizations. It isn't surprising that sometimes you
> lose some efficiency or simplicity when working in a secure system.

Hmm, i'm not sure you understood what i meant.
The code example i posted is a solution to the design challenge:
"provide read-only access to a directory and its subdirectories, but
no access to the rest of the filesystem".

I'm looking for other security design challenges to tackle in Python.
Once enough of them have been tried, we'll have a better understanding
of what Python would need to do to make secure programming easier.

-- ?!ng
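A minimal sketch of how that design challenge might be approached in
the capability style. This is not the code posted earlier in the
thread; `make_reader`, `resolve`, `listdir` and `read` are hypothetical
names, and the sketch is written for current Python 3. In ordinary
Python a closure can still be inspected from outside, so this shows
the shape of the design rather than real confinement.

```python
import os
import tempfile


def make_reader(root):
    """Return (listdir, read) functions confined to `root` (hypothetical API)."""
    root = os.path.realpath(root)

    def resolve(relpath):
        # Resolve "..", absolute paths and symlinks, then refuse
        # anything that ends up outside the granted directory.
        full = os.path.realpath(os.path.join(root, relpath))
        if os.path.commonpath([root, full]) != root:
            raise PermissionError("path escapes the granted directory")
        return full

    def listdir(relpath="."):
        return sorted(os.listdir(resolve(relpath)))

    def read(relpath):
        # Opened for reading only; no write or delete facet is handed out.
        with open(resolve(relpath), "rb") as f:
            return f.read()

    return listdir, read


# Usage: grant read-only access to a scratch directory and nothing else.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "hello.txt"), "w") as f:
    f.write("hi")

listdir, read = make_reader(demo_root)
print(listdir())
print(read("hello.txt"))
```

Code that receives only `listdir` and `read` has no name for the rest
of the filesystem; the remaining gap in plain Python is introspection
(for example via a function's `__closure__`), which is the kind of
loophole a restricted mode would have to close.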